scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Benny Halevy	8a56050507	main: handle abort_requested_exception on startup Handle abort_requested_exception exactly like sleep_aborted, as an expected error when startup is aborted mid-way. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#15443	2023-09-18 15:05:52 +03:00
Botond Dénes	f7557a4891	Merge 'updating presto integration page documentation' from Guy Shtub null Closes scylladb/scylladb#15342 * github.com:scylladb/scylladb: Update integration-presto.rst Update integration-presto.rst Update docs/using-scylla/integrations/integration-presto.rst updating presto integration page	2023-09-18 14:41:16 +03:00
Botond Dénes	edb50c27ec	Merge 'Use sstable_state in sstables populator' from Pavel Emelyanov Some time ago populating of tables from sstables was reworked to use sstable states instead of full paths (#12707). Since then few places in the populator was left that still operate on the state-based subdirectory name. This PR collects most of those dangling ends refs: #13020 Closes scylladb/scylladb#15421 * github.com:scylladb/scylladb: distributed_loader: Print sstable state explicitly distributed_loader: Move check for the missing dir upper distributed_loader: Use state as _sstable_directories key	2023-09-18 14:38:49 +03:00
Kefu Chai	054beb6377	tests: tablets: do not compare signed integer with unsigned integer when compiling the tests with -Wsign-compare, the compiler complains like: ``` /home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_BROKEN_SOURCE_LOCATION -DSEASTAR_DEBUG -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -isystem /home/kefu/dev/scylladb/build/cmake/rust -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -march=westmere -Og -g -gz -std=gnu++20 -fvisibility=hidden -U_FORTIFY_SOURCE -DSEASTAR_SSTRING -Wno-error=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/tablets_test.dir/tablets_test.cc.o -MF test/boost/CMakeFiles/tablets_test.dir/tablets_test.cc.o.d -o test/boost/CMakeFiles/tablets_test.dir/tablets_test.cc.o -c /home/kefu/dev/scylladb/test/boost/tablets_test.cc /home/kefu/dev/scylladb/test/boost/tablets_test.cc:1335:53: error: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare] for (int log2_tablets = 0; log2_tablets < tablet_count_bits; ++log2_tablets) { ~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~ ``` in this case, it should be safe to use an signed int as the loop variable to be compared with `tablet_count_bits`, but let's just appease the compiler so we can enable the warning option project-wide to prevent any potential issues caused by signed-unsigned comparision. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15449	2023-09-18 13:17:16 +02:00
Kamil Braun	bc6f7d1b20	Merge 'raft topology: add garbage collection for internal CDC generations table' from Patryk Jędrzejczak We add garbage collection for the `CDC_GENERATIONS_V3` table to prevent it from endlessly growing. This mechanism is especially needed because we send the entire contents of `CDC_GENERATIONS_V3` as a part of the group 0 snapshot. The solution is to keep a clean-up candidate, which is one of the already published CDC generations. The CDC generation publisher introduced in #15281 continually uses this candidate to remove all generations with timestamps not exceeding the candidate's and sets a new candidate when needed. We also add `test_cdc_generation_clearing.py` that verifies this new mechanism. Fixes #15323 Closes scylladb/scylladb#15413 * github.com:scylladb/scylladb: test: add test_cdc_generation_clearing raft topology: remove obsolete CDC generations raft topology: set CDC generation clean-up candidate topology_coordinator: refactor publish_oldest_cdc_generation system_keyspace: introduce decode_cdc_generation_id system_keyspace: add cleanup_candidate to CDC_GENERATIONS_V3	2023-09-18 11:30:10 +02:00
Pavel Emelyanov	30959fc9b1	lsa, test: Extend memory footprint test with per-type total sizes When memory footprint test is over it prints total size taken by row cache, memtable and sstables as well as individual objects' sizes. It's also nice to know the details on the row-cache's individual objects. This patch extends the printing with total size of allocated object types according to migrator_fn types. Sample output: mutation footprint: - in cache: 11040928 - in memtable: 9142424 - in sstable: mc: 2160000 md: 2160000 me: 2160000 - frozen: 540 - canonical: 827 - query result: 342 sizeof(cache_entry) = 64 sizeof(memtable_entry) = 64 sizeof(bptree::node) = 288 sizeof(bptree::data) = 72 -- sizeof(decorated_key) = 32 -- sizeof(mutation_partition) = 96 -- -- sizeof(_static_row) = 8 -- -- sizeof(_rows) = 24 -- -- sizeof(_row_tombstones) = 40 sizeof(rows_entry) = 144 sizeof(evictable) = 24 sizeof(deletable_row) = 72 sizeof(row) = 16 radix_tree::inner_node::node_sizes = 48 80 144 272 528 1040 radix_tree::leaf_node::node_sizes = 120 216 416 816 3104 sizeof(atomic_cell_or_collection) = 16 btree::linear_node_size(1) = 24 btree::inner_node_size = 216 btree::leaf_node_size = 120 LSA stats: N18compact_radix_tree4treeI13cell_and_hashjE9leaf_nodeE: 360 N5bplus4dataIl15intrusive_arrayI11cache_entryEN3dht25raw_token_less_comparatorELm16ELNS_10key_searchE0ELNS_10with_debugE0EEE: 5040 N5bplus4nodeIl15intrusive_arrayI11cache_entryEN3dht25raw_token_less_comparatorELm16ELNS_10key_searchE0ELNS_10with_debugE0EEE: 19296 17partition_version: 952416 N11intrusive_b4nodeI10rows_entryXadL_ZNS1_5_linkEEENS1_11tri_compareELm12ELm20ELNS_10key_searchE0ELNS_10with_debugE0EEE: 317472 10rows_entry: 1429056 12blob_storage: 254 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#15434	2023-09-18 11:23:18 +02:00
Guy Shtub	5d833b2ee7	Update integration-presto.rst	2023-09-18 11:29:38 +03:00
Botond Dénes	bb7121a1fb	Merge 'tools/scylla-nodetools: do not create unowned bpo::value ' from Kefu Chai in other words, do not create bpo::value unless transfer it to an option_description. `boost::program_options::value()` create a new typed_value<T> object, without holding it with a shared_ptr. boost::program_options expects developer to construct a `bpo::option_description` right away from it. and `boost::program_options::option_description` takes the ownership of the `type_value<T>` raw pointer, and manages its life cycle with a shared_ptr. but before passing it to a `bpo::option_description`, the pointer created by `boost::program_options::value()` is a still a raw pointer. before this change, we initialize `operations_with_func` as global variables using `boost::program_options::value()`. but unfortunately, we don't always initialize a `bpo::option_description` from it -- we only do this on demand when the corresponding subcommand is called. so, if the corresponding subcommand is not called, the created `typed_value<T>` objects are leaked. hence LeakSanitizer warns us. after this change, we create the option map as a static local variable in a function so it is created on demand as well. as an alternative, we could initialize the options map as local variable where it used. but to be more consistent with how `global_option` is specified. and to colocate them in a single place, let's keep the existing code layout. this change is quite similar to `374bed8c3d` Fixes https://github.com/scylladb/scylladb/issues/15429 Closes scylladb/scylladb#15430 github.com:scylladb/scylladb: tools/scylla-nodetools: reindent tools/scylla-nodetools: do not create unowned bpo::value	2023-09-18 11:09:46 +03:00
Kefu Chai	a51b14d4c4	sstables/metadata_collector: drop unused functions column_stats::update_local_deletion_time() is not used anywhere, what is being used is `column_stats::update_local_deletion_time_and_tombstone_histogram(time_point)`. while `update_local_deletion_time_and_tombstone_histogram(int32_t)` is only used internally by a single caller. neither is `column_stats::update(const deletion_time&)` used. so let's drop them. and merge `update_local_deletion_time_and_tombstone_histogram(int32_t)` into its caller. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15189	2023-09-18 10:18:56 +03:00
Botond Dénes	b97778e4b2	Merge 'create-relocatable-package.py: do not assume "build" build directory' from Kefu Chai in this series, we do not assume the existence of "build" build directory. and prefer using the version files located under the directory specified with the `--build-dir` option. Refs #15241 Closes scylladb/scylladb#15402 * github.com:scylladb/scylladb: create-relocatable-package.py: prefer $build_dir/SCYLLA-RELEASE-FILE create-relocatable-package.py: create SCYLLA-RELOCATABLE-FILE with tempfile	2023-09-18 09:07:37 +03:00
Kefu Chai	a03dc92cb5	tools/scylla-nodetools: reindent Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-09-18 13:57:37 +08:00
Kefu Chai	ed41c725f3	tools/scylla-nodetools: do not create unowned bpo::value in other words, do not create bpo::value unless transfer it to an option_description. `boost::program_options::value()` create a new typed_value<T> object, without holding it with a shared_ptr. boost::program_options expects developer to construct a `bpo::option_description` right away from it. and `boost::program_options::option_description` takes the ownership of the `type_value<T>*` raw pointer, and manages its life cycle with a shared_ptr. but before passing it to a `bpo::option_description`, the pointer created by `boost::program_options::value()` is a still a raw pointer. before this change, we initialize `operations_with_func` as global variables using `boost::program_options::value()`. but unfortunately, we don't always initialize a `bpo::option_description` from it -- we only do this on demand when the corresponding subcommand is called. so, if the corresponding subcommand is not called, the created `typed_value<T>` objects are leaked. hence LeakSanitizer warns us. after this change, we create the option map as a static local variable in a function so it is created on demand as well. as an alternative, we could initialize the options map as local variable where it used. but to be more consistent with how `global_option` is specified. and to colocate them in a single place, let's keep the existing code layout. this change is quite similar to `374bed8c3d` Fixes #15429 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-09-18 13:57:37 +08:00
Kefu Chai	b350596656	docs: correct the code sample for checking service status ```console $ journalctl --user start scylla-server -xe Failed to add match 'start': Invalid argument ``` `journalctl` expects a match filter as its positional arguments. but apparently, start is not a filter. we could use `--unit` to specify a unit though, like: ```console $ journalctl --user --unit scylla-server.service -xe ``` but it would flood the stdout with the logging messages printed by scylla. this is not what a typical user expects. probably a better use experience can be achieved using ```console $ systemctl --user status scylla-server ``` which also print the current status reported by the service, and the command line arguments. they would be more informative in typical use cases. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15390	2023-09-18 08:37:42 +03:00
Avi Kivity	67a0c865cf	tools: toolchain: prepare: don't overwrite existing images The docker/podman tooling is destructive: it will happily overwrite images locally and on the server. If a maintainer forgets to update tools/toolchain/image, this can result in losing an older toolchain container image. To prevent that, check that the image name is new. Closes scylladb/scylladb#15397	2023-09-18 08:35:01 +03:00
Kefu Chai	a04fa0b41e	conf: update commented out experimental_features update commented out experimental_features to reflect the latest experimental features: - in `4f23eec4`, "raft" was renamed to "consistent-topology-changes". - in `2dedb5ea`, "alternator-ttl" was moved out of experimental features. - in `5b1421cc`, "broadcast-tables" was added to experimental features. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15407	2023-09-18 08:31:01 +03:00
Guy Shtub	b8693636b8	Update integration-presto.rst Removing link to forum, will be added as general footer	2023-09-18 06:50:11 +03:00
Guy Shtub	7d0691b348	Update docs/using-scylla/integrations/integration-presto.rst Co-authored-by: Anna Stuchlik <37244380+annastuchlik@users.noreply.github.com>	2023-09-18 06:46:02 +03:00
Avi Kivity	4eb4ac4634	scripts: pull_gitgub_pr.sh: absolutize project reference pull_gitgub_pr.sh adds a "Closes #xyz" tag so github can close the pull request after next promotion. Convert it to an absolute refefence (scylladb/scylladb#xyz) so the commit can be cherry-picked into another repository without the reference dangling. Closes #15424	2023-09-15 19:29:50 +03:00
Kefu Chai	1e6b2eb4c8	tools/scylla-nodetool: mark format string as constexpr this change change `const` to `constexpr`. because the string literal defined here is not only immutable, but also initialized at compile-time, and can be used by constexpr expressions and functions. this change is introduced to reduce the size of the change when moving to compile-time format string in future. so far, seastar::format() does not use the compile-time format string, but we have patches pending on review implementing this. and the author of this change has local branches implementing the changes on scylla side to support compile-time format string, which practically replaces most of the `format()` calls with `seastar::format()`. without this change, if we use compile-time format check, compiler fails like: ``` /home/kefu/dev/scylladb/tools/scylla-nodetool.cc:276:44: error: call to consteval function 'fmt::basic_format_string<char, const char const &, seastar::basic_sstring<char, unsigned int, 15>>::basic_format_string<const char , 0>' is not a constant expression .description = seastar::format(description_template, app_name, boost::algorithm::join(operations \| boost::adaptors::transformed([] (const auto& op) { ^ /usr/include/fmt/core.h:3148:67: note: read of non-constexpr variable 'description_template' is not allowed in a constant expression FMT_CONSTEVAL FMT_INLINE basic_format_string(const S& s) : str_(s) { ^ /home/kefu/dev/scylladb/tools/scylla-nodetool.cc:276:44: note: in call to 'basic_format_string(description_template)' .description = seastar::format(description_template, app_name, boost::algorithm::join(operations \| boost::adaptors::transformed([] (const auto& op) { ^ /home/kefu/dev/scylladb/tools/scylla-nodetool.cc:258:16: note: declared here const auto description_template = ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15432	2023-09-15 19:28:38 +03:00
Kefu Chai	6c75dc4be8	tools/scylla-nodetool: do not compare unsigned with int change the loop variable to `int` to silence warning like ``` /home/kefu/.local/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_BROKEN_SOURCE_LOCATION -DSEASTAR_DEBUG -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -I/home/kefu/dev/scylladb/build/cmake/gen -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -march=westmere -Og -g -gz -std=gnu++20 -fvisibility=hidden -U_FORTIFY_SOURCE -DSEASTAR_SSTRING -Wno-error=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT tools/CMakeFiles/tools.dir/scylla-nodetool.cc.o -MF tools/CMakeFiles/tools.dir/scylla-nodetool.cc.o.d -o tools/CMakeFiles/tools.dir/scylla-nodetool.cc.o -c /home/kefu/dev/scylladb/tools/scylla-nodetool.cc /home/kefu/dev/scylladb/tools/scylla-nodetool.cc:215:28: error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare] for (unsigned i = 0; i < argc; ++i) { ~ ^ ~~~~ ``` `i` is used as the index in a plain C-style array, it's perfectly fine to use a signed integer as index in this case. as per C++ standard, > The expression E1[E2] is identical (by definition) to *((E1)+(E2)) Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15431	2023-09-15 19:28:14 +03:00
Kefu Chai	30ef69fcb2	docs/dev/object_store: add more samples in hope to lower the bar to testing object store. * add language specifier for better readability of the document. to highlight the config with YAML syntax * add more specific comment on the AWS related settings * explain that endpoint should match in the CREATE KEYSPACE statement and the one defined by the YAML configuration. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15433	2023-09-15 17:35:17 +03:00
Pavel Emelyanov	cce2752b64	Merge 'node_ops: move node_ops related classes to node_ops/' from Aleksandra Martyniuk Move node_ops related classes to node_ops/ so that they are consistently grouped and could be access from many modules. Closes #15351 * github.com:scylladb/scylladb: node_ops: extract classes related to node operations node_ops: repair: move node_ops_id to node_ops directory	2023-09-15 15:12:00 +03:00
Anna Stuchlik	fb635dccaa	doc: add info - support for FIPS-compliant systems This commit adds the information that ScyllaDB Enterprise supports FIPS-compliant systems in versions 2023.1.1 and later. The information is excluded from OSS docs with the "only" directive, because the support was not added in OSS. This commit must be backported to branch-5.2 so that it appears on version 2023.1 in the Enterprise docs. Closes #15415	2023-09-15 11:08:34 +02:00
Patryk Jędrzejczak	840e1c5185	test: add test_cdc_generation_clearing We add a test for the new CDC generation garbage collection mechanism.	2023-09-15 09:28:32 +02:00
Patryk Jędrzejczak	0cc54e0da7	raft topology: remove obsolete CDC generations We make the CDC generation publisher continually remove the obsolete CDC generation data to prevent CDC_GENERATIONS_V3 from endlessly growing. To achieve this, we use the clean-up candidate. If it exists and can be safely removed, we remove it together with all older CDC generations. We also mark the lack of a new candidate. The next published CDC generation will become one. Note this solution does not have any guarantee about "when" it removes obsolete generations. Formally, it guarantees that if there is a candidate that can be removed and the CDC generation publisher attempts to remove it, all generations up to the candidate are removed. In practice, when a new generation appears, the publisher makes a new candidate or tries to remove an old candidate, so obsolete generations can stay for a long time only if no generation appears for a long time. But it is fine because we only want to prevent CDC_GENERATIONS_V3 from growing too much. Moreover, providing any guarantees would require a new wake-up mechanism for the publisher, which would be hard to implement.	2023-09-15 09:26:58 +02:00
Patryk Jędrzejczak	e375e769b9	raft topology: set CDC generation clean-up candidate We want to use the clean-up candidates to remove the obsolete CDC generation data, but first, we need to set suitable generations as a candidate when there is no candidate. Since CDC generations must be published before we remove them, a generation that is being published is a good candidate.	2023-09-15 09:23:59 +02:00
Patryk Jędrzejczak	b84e097c28	topology_coordinator: refactor publish_oldest_cdc_generation In the following commits, we add a new task for the CDC generation publisher -- clearing obsolete CDC generation data. This task can be done together with the publishing under one group 0 guard. We refactor publish_oldest_cdc_generation to make it possible. Now, this function is more like a command builder. It takes guard by const reference and updates the vector of mutations and the reason string. The CDC generation publisher uses them directly to update the topology at the end after finishing building the command. This logic will be more visible after adding the clearing task.	2023-09-15 09:04:23 +02:00
Botond Dénes	b87660f90c	tools/scylla-sstable: log where schema was obtained from Currently, we only log anything about what was tried w.r.t. obtaining the schema if it failed. Add a log message to the success path too, so in case the wrong schema was successfully loaded, the user can find the problem. The log message is printed with debug-level, so it doesn't distrurb output by default. Fixes: #15384 Closes #15417	2023-09-14 23:09:30 +03:00
Botond Dénes	0f8b297d07	Merge 'build: cmake: add targets for building deb and rpm packages' from Kefu Chai in this series, - the build of unstripped package is fixed, and - the targets for building deb and rpm packages are added. these targets builds deb and rpm packages from the unstripped package. Closes #15403 * github.com:scylladb/scylladb: build: cmake: add targets for building deb and rpm packages build: cmake: correct the paths used when building unstripped pkg	2023-09-14 18:22:30 +03:00
Kefu Chai	60db7f8ae3	doc: do not suggest "-node xxx" when running c-s cassandra-stress connects to "localhost" by default. that's exactly the use case when we install scylla using the unified installer. so do not suggest "-node xxx" option. the "xxx" part is but confusing. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #15411	2023-09-14 18:21:46 +03:00
Petr Gusev	6c3cc7d6e0	test_fence_hints: increase timeouts We saw failures on CI in debug mode, probably the machine running the test is shared, and we starved for some resources. Fix #15285 Closes #15388	2023-09-14 16:22:50 +02:00
Avi Kivity	d9a453e72e	Merge 'Introduce a scylla-native nodetool' from Botond Dénes This series introduces a scylla-native nodetool. It is invokable via the main scylla executable as the other native tools we have. It uses the seastar's new `http::client` to connect to the specified node and execute the desired commands. For now a single command is implemented: `nodetool compact`, invokable as `scylla nodetool compact`. Once all the boilerplate is added to create a new tool, implementing a single command is not too bad, in terms of code-bloat. Certainly not as clean as a python implementation would be, but good enough. The advantages of a C++ implementation is that all of us in the core team know C++ and that it is shipped right as part of the scylla executable.. Closes #14841 * github.com:scylladb/scylladb: test: add nodetool tests test.py: add ToolTestSuite and ToolTest tools/scylla-nodetool: implement compact operation tools/scylla-nodetool: implement basic scylla_rest_api_client tools: introduce scylla-nodetool utils: export dns_connection_factory from s3/client.cc to http.hh utils/s3/client: pass logger to dns_connection_factory in constructor tools/utils: tool_app_template::run_async(): also detect --help* as --help	2023-09-14 17:20:40 +03:00
Avi Kivity	a3d73bfba7	Merge 'Add support for decommission with tablets' from Tomasz Grabiec Load balancer will recognize decommissioning nodes and will move tablet replicas away from such nodes with highest priority. Topology changes have now an extra step called "tablet draining" which calls the load balancer. The step will execute tablet migration track as long as there are nodes which require draining. It will not do regular load balancing. If load balancer is unable to find new tablet replicas, because RF cannot be met or availability is at risk due to insufficient node distribution in racks, it will throw an exception. Currently, topology change will retry in a loop. We should make this error cause topology change to be aborted. There is no infrastructure for aborts yet, so this is not implemented. Closes #15197 * github.com:scylladb/scylladb: tablets, raft topology: Add support for decommission with tablets tablet_allocator: Compute load sketch lazily tablet_allocator: Set node id correctly tablet_allocator: Make migration_plan a class tablets: Implement cleanup step storage_service, tablets: Prevent stale RPCs from running beyond their stage locator: Introduce tablet_metadata_guard locator, replica: Add a way to wait for table's effective_replication_map change storage_service, tablets: Extract do_tablet_operation() from stream_tablet() raft topology: Add break in the final case clause raft topology: Fix SIGSEGV when trace-level logging is enabled raft topology: Set node state in topology raft topology: Always set host id in topology	2023-09-14 17:16:23 +03:00
Kamil Braun	0564d000c6	Merge 'Validate compaction strategy options' from Aleksandra Martyniuk When a column family's schema is changed new compaction strategy type may be applied. To make sure that it will behave as expected, compaction strategy need to contain only the allowed options and values. Methods throwing exception on invalid options are added. Fixes: #2336. Closes #13956 * github.com:scylladb/scylladb: test: add test for compaction strategy validation compaction: unify exception messages compaction: cql3: validate options in check_restricted_table_properties compaction: validate options used in different compaction strategies compaction: validate common compaction strategy options compaction: split compaction_strategy_impl constructor compaction: validate size_tiered_compaction_strategy specific options compaction: validate time_window_compaction_strategy specific options compaction: add method to validate min and max threshold compaction: split size_tiered_compaction_strategy_options constructor compaction: make compaction strategy keys static constexpr compaction: use helpers in validate_* functions compaction: split time_window_compaction_strategy_options construtor compaction: add validate method to compaction_strategy_options time_window_compaction_strategy_options: make copy and move-able size_tiered_compaction_strategy_options: make copy and move-able	2023-09-14 16:11:52 +02:00
Pavel Emelyanov	4370e6c8d0	distributed_loader: Print sstable state explicitly When populating from a particular directory, populator code converts state to subdir name, then prints the path. The conversion is pretty much artificial, it's better to provide printer for state and print state explicitly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-14 16:41:26 +03:00
Pavel Emelyanov	b19e6a68f8	distributed_loader: Move check for the missing dir upper The quarantine directory can be missing on the datadir and that's OK. In order to check that and skip population the populator code uses two-step logic -- first it checks if the directory exists and either puts or not the sstable_directory object into the map. Later it checks the map and decide whether to throw or not if the directory is missing. Let's keep both check and throw in one place for brevity. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-14 16:39:56 +03:00
Pavel Emelyanov	74eef029e2	distributed_loader: Use state as _sstable_directories key The populator maintains a map of path -> sstable_directory pairs one for each subdirectory for every sstable state. The "path" is in fact not used by the logic as it's just a subdirectory name for the state and the rest of the core operates on state. So it's good to make the map of directories also be indexed by the state. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-14 16:38:03 +03:00
Benny Halevy	a5a22fe5b7	tools/scylla-sstable: load_sstables: handle load errors Currently, exceptions thrown from `sst->load` are unhandled, resulting in, e.g.: ``` ERROR 2023-09-12 08:02:58,124 [shard 0:main] seastar - Exiting on unhandled exception: std::runtime_error (SSTable /home/bhalevy/.dtest/dtest-dxg4xdxg/test/node1/data/ks/cf-a3009f20512911ee8000d81cd2da3fd7/me-3g9b_0e0x_39vtt1y2rcqrffz55j-big-Data.db uses org.apache.cassandra.dht.Murmur3Partitioner partitioner which is different than com.scylladb.dht.CDCPartitioner partitioner used by the database) ``` Log the errors and exit the tool with non-zero status in this case. Fixes #15359 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15376	2023-09-14 14:27:38 +03:00
Tomasz Grabiec	551cc0233d	tablets, raft topology: Add support for decommission with tablets Load balancer will recognize decommissioning nodes and will move tablet replicas away from such nodes with highest priority. Topology changes have now an extra step called "tablet draining" which calls the load balancer. The step will execute tablet migration track as long as there are nodes which require draining. It will not do regular load balancing. If load balancer is unable to find new tablet replicas, because RF cannot be met or availability is at risk due to insufficient node distribution in racks, it will throw an exception. Currently, topology change will retry in a loop. We should make this error cause topology change to be paused so that admin becomes aware of the problem and issues an abort on the topology change. There is no infrastructure for aborts yet, so this is not implemented.	2023-09-14 13:05:49 +02:00
Tomasz Grabiec	8565af4dd3	tablet_allocator: Compute load sketch lazily This allows any node to act as a target later.	2023-09-14 13:04:49 +02:00
Tomasz Grabiec	1c595ab7f4	tablet_allocator: Set node id correctly It was unset and unused.	2023-09-14 13:04:49 +02:00
Tomasz Grabiec	389573543e	tablet_allocator: Make migration_plan a class It will be extended with more fields so that load balancer can communicate more information to the coordinator.	2023-09-14 13:04:47 +02:00
Tomasz Grabiec	d5539e080d	tablets: Implement cleanup step This change adds a stub for tablet cleanup on the replica side and wires it into the tablet migration process. The handling on replica side is incomplete because it doesn't remove the actual data yet. It only flushes the memtables, so that all data is in sstables and none requires a memtable flush. This patch is necessary to make decommission work. Otherwise, a memtable flush would happen when the decommissioned node is put in the drained state (as in nodetool drain) and it would fail on missing host id mapping (node is no longer in topology), which is examined by the tablet sharder when producing sstable sharding metadata. Leading to abort due to failed memtable flush.	2023-09-14 12:45:10 +02:00
Tomasz Grabiec	5cf035878d	storage_service, tablets: Prevent stale RPCs from running beyond their stage Example scenario: 1. coordinator A sends RPC #1 to trigger streaming 2. coordinator fails over to B 3. coordinator B performs streaming successfully 4. RPC #1 arrives and starts streaming 5. coordinator B commits the transition to the post-streaming stage 6. coordinator B executes global token metadata barrier We end up with streaming running despite the fact that the current coordinator moved on. Currently, this won't happen, because streaming holds on to erm. But we want to change that (see #14995), so that it does not block barriers for migrations of other tablets. The same problem applies to tablet cleanup. The fix is to use tablet_metadata_guard around such long running operations, which will keep hold to erm so that in the above scenario coordinator B will wait for it in step 6. The guard ensures that erm doesn't block other migrations because it switches to the latest erm if it's compatible. If it's not, it signals abort_source for the guard so that such stale operation aborts soon and the barrier in step 6 doesn't wait for long.	2023-09-14 12:45:10 +02:00
Tomasz Grabiec	6a62aca3a9	locator: Introduce tablet_metadata_guard Will be used to synchronize long-running tablet operations with topology coordinator. It blocks barriers like erm_ptr, but refreshes if change is irrelevant, so behaves as if the erm_ptr's scope was narrowed down to a single tablet.	2023-09-14 12:45:10 +02:00
Patryk Jędrzejczak	c0fd42ead4	system_keyspace: introduce decode_cdc_generation_id The decode_cdc_generations_ids function allows us to decode a vector of CDC generation IDs. After adding cleanup_candidate to CDC_GENERATIONS_V3, we need a similar function that decodes a single ID.	2023-09-14 12:09:14 +02:00
Patryk Jędrzejczak	6db325fb69	system_keyspace: add cleanup_candidate to CDC_GENERATIONS_V3 In the following commits, we implement a garbage collection for CDC_GENERATIONS_V3. The first step is introducing the clean-up candidate. It will be continually updated by the CDC generation publisher and used to remove obsolete data.	2023-09-14 12:09:10 +02:00
Tomasz Grabiec	532ec84210	locator, replica: Add a way to wait for table's effective_replication_map change	2023-09-14 12:08:54 +02:00
Tomasz Grabiec	2c6785dc8f	storage_service, tablets: Extract do_tablet_operation() from stream_tablet() It will be shared with cleanup_tablet(). Minor changes: - ditch the redundant optional<> around shared_future<>	2023-09-14 12:08:52 +02:00
Tomasz Grabiec	e2c1f904c8	raft topology: Add break in the final case clause To be safe in case we add more cases.	2023-09-14 12:07:59 +02:00

1 2 3 4 5 ...

38914 Commits