scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 04:37:00 +00:00

Author	SHA1	Message	Date
Kamil Braun	3b108f2e31	Merge 'db: config: make consistent_cluster_management mandatory' from Patryk Jędrzejczak We make `consistent_cluster_management` mandatory in 5.5. This option will be always unused and assumed to be true. Additionally, we make `override_decommission` deprecated, as this option has been supported only with `consistent_cluster_management=false`. Making `consistent_cluster_management` mandatory also simplifies the code. Branches that execute only with `consistent_cluster_management` disabled are removed. We also update documentation by removing information irrelevant in 5.5. Fixes scylladb/scylladb#15854 Note about upgrades: this PR does not introduce any more limitations to the upgrade procedure than there are already. As in scylladb/scylladb#16254, we can upgrade from the first version of Scylla that supports the schema commitlog feature, i.e. from 5.1 (or corresponding Enterprise release) or later. Assuming this PR ends up in 5.5, the documented upgrade support is from 5.4. For corresponding Enterprise release, it's from 2023.x (based on 5.2), so all requirements are met. Closes scylladb/scylladb#16334 * github.com:scylladb/scylladb: docs: update after making consistent_cluster_management mandatory system_keyspace, main, cql_test_env: fix indendations db: config: make consistent_cluster_management mandatory test: boost: schema_change_test: replace disable_raft_schema_config db: config: make override_decommission deprecated db: config: make force_schema_commit_log deprecated	2023-12-18 09:44:52 +01:00
Raphael S. Carvalho	546b31846a	replica: Introduce storage group splitting This introduces the ability to split a storage group. The main compaction group is split into left and right groups. set_split() is used to set the storage group to splitting mode, which will create left and right compaction groups. Incoming writes will now be placed into memtable of either left or right groups. split() is used to complete the splitting of a group. It only returns when all preexisting data is split. That means main compaction group will be empty and all the data will be stored in either left or right group. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 12:02:01 -03:00
Raphael S. Carvalho	bcbba9a5e3	locator: Introduce tablet_map::get_tablet_id_and_range_side(token) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-12-17 11:26:32 -03:00
Patryk Jędrzejczak	5ebfbf42bc	db: config: make consistent_cluster_management mandatory Code that executed only when consistent_cluster_management=false is removed. In particular, after this patch: - raft_group0 and raft_group_registry are always enabled, - raft_group0::status_for_monitoring::disabled becomes unused, - topology tests can only run with consistent_cluster_management.	2023-12-14 16:54:04 +01:00
Petr Gusev	fbf507b1ba	token_metadata: topology: cleanup add_or_update_endpoint Make host_id parameter non-optional and move it to the beginning of the arguments list. Delete unused overloads of add_or_update_endpoint. Delete unused overload of token_metadata::update_topology with inet_address argument.	2023-12-12 23:19:54 +04:00
Petr Gusev	7b55ccbd8e	token_metadata: drop the template Replace token_metadata2 ->token_metadata, make token_metadata back non-template. No behavior changes, just compilation fixes.	2023-12-12 23:19:54 +04:00
Petr Gusev	799f747c8f	shared_token_metadata: switch to the new token_metadata	2023-12-12 23:19:54 +04:00
Petr Gusev	d9283bd025	tablets: switch to token_metadata2 locator_topology_test, network_topology_strategy_test and tablets_test are fully switched to the host_id-based token_metadata, meaning they no longer populate the old token_metadata. All the boost and topology tests pass with this change.	2023-12-12 23:19:53 +04:00
Avi Kivity	9c0f05efa1	Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later. This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted. The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained. The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was. This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas. Closes scylladb/scylladb#15847 * github.com:scylladb/scylladb: test: tablets: Add test for failed streaming being fenced away error_injection: Introduce poll_for_message() error_injection: Make is_enabled() public api: Add API to kill connection to a particular host range_streamer: Do not block topology change barriers around streaming range_streamer, tablets: Do not keep token metadata around streaming tablets: Fail gracefully when migrating tablet has no pending replica storage_service, api: Add API to disable tablet balancing storage_service, api: Add API to migrate a tablet storage_service, raft topology: Run streaming under session topology guard storage_service, tablets: Use session to guard tablet streaming tablets: Add per-tablet session id field to tablet metadata service: range_streamer: Propagate topology_guard to receivers streaming: Always close the rpc::sink storage_service: Introduce concept of a topology_guard storage_service: Introduce session concept tablets: Fix topology_metadata_guard holding on to the old erm docs: Document the topology_guard mechanism	2023-12-07 16:29:02 +02:00
Tomasz Grabiec	d1c1b59236	storage_service, api: Add API to disable tablet balancing Load balancing needs to be disabled before making a series of manual migrations so that we don't fight with the load balancer. Also will be used in tests to ensure tablets stick to expected locations.	2023-12-06 18:36:17 +01:00
Tomasz Grabiec	5381792401	tablets: Add per-tablet session id field to tablet metadata range_streamer will pick it up when creating topology_guard. It's materialized in memory only for migrating tablets in tablet_transition_info.	2023-12-06 18:36:17 +01:00
Benny Halevy	86716b2048	locator: topology: add helpers to retrieve this host_id and address And respective `is_me()` predicates, to prepare for getting rid of fb_utilities. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 08:42:49 +02:00
Kefu Chai	054beb6377	tests: tablets: do not compare signed integer with unsigned integer when compiling the tests with -Wsign-compare, the compiler complains like: ``` /home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_BROKEN_SOURCE_LOCATION -DSEASTAR_DEBUG -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -isystem /home/kefu/dev/scylladb/build/cmake/rust -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -march=westmere -Og -g -gz -std=gnu++20 -fvisibility=hidden -U_FORTIFY_SOURCE -DSEASTAR_SSTRING -Wno-error=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/tablets_test.dir/tablets_test.cc.o -MF test/boost/CMakeFiles/tablets_test.dir/tablets_test.cc.o.d -o test/boost/CMakeFiles/tablets_test.dir/tablets_test.cc.o -c /home/kefu/dev/scylladb/test/boost/tablets_test.cc /home/kefu/dev/scylladb/test/boost/tablets_test.cc:1335:53: error: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare] for (int log2_tablets = 0; log2_tablets < tablet_count_bits; ++log2_tablets) { ~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~ ``` in this case, it should be safe to use an signed int as the loop variable to be compared with `tablet_count_bits`, but let's just appease the compiler so we can enable the warning option project-wide to prevent any potential issues caused by signed-unsigned comparision. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15449	2023-09-18 13:17:16 +02:00
Tomasz Grabiec	551cc0233d	tablets, raft topology: Add support for decommission with tablets Load balancer will recognize decommissioning nodes and will move tablet replicas away from such nodes with highest priority. Topology changes have now an extra step called "tablet draining" which calls the load balancer. The step will execute tablet migration track as long as there are nodes which require draining. It will not do regular load balancing. If load balancer is unable to find new tablet replicas, because RF cannot be met or availability is at risk due to insufficient node distribution in racks, it will throw an exception. Currently, topology change will retry in a loop. We should make this error cause topology change to be paused so that admin becomes aware of the problem and issues an abort on the topology change. There is no infrastructure for aborts yet, so this is not implemented.	2023-09-14 13:05:49 +02:00
Tomasz Grabiec	389573543e	tablet_allocator: Make migration_plan a class It will be extended with more fields so that load balancer can communicate more information to the coordinator.	2023-09-14 13:04:47 +02:00
Raphael S. Carvalho	5d1f60439a	token_metadata: Add this_host_id to topology config The motivation is that token_metadata::get_my_id() is not available early in the bootstrap process, as raft topology is pulled later than new tables are registered and created, and this node is added to topology even later. To allow creation of compaction groups to retrieve "my id" from token metadata early, initialization will now feed local id into topology config which is immutable for each node anyway. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-08-16 18:23:44 -03:00
Tomasz Grabiec	67c7aadded	service, raft: Move balance_tablets() to tablet_allocator The implementation will access metrics registered from tablet_allocator.	2023-08-05 21:48:08 +02:00
Tomasz Grabiec	96d06b58df	tests: tablets: Check that load balancing is interrupted by topology change We add a special mode of load balancing, enabled through error injection, which causes it to continuously generate plans. This should keep the topology coordinator continuously in the tablet migration track. We enable this mode in test_tablets.py:test_bootstrap before bootstrapping nodes to see that bootstrap request interrupts tablet migration track. If this would not be the case, the test will hang.	2023-07-31 01:45:23 +02:00
Tomasz Grabiec	8fdbc42e71	tests: tablets: Add test for load balancing with active migrations	2023-07-31 01:45:23 +02:00
Tomasz Grabiec	5c681a1d63	tablets: Introduce tablet_mutation_builder	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	6f4a35f9ae	service: tablet_allocator: Introduce tablet load balancer Will be invoked by the topology coordinator later to decide which tablets to migrate.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	8cf92d4c86	tablets: Turn tablet_id into a struct The IDL compiler cannot deal with enum classes like this.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	dc2ec3f81c	tablets: Store "stage" in transition info It's needed to implement tablet migration. It stores the current step of tablet migration state machine. The state machine will be advanced by the topology change coordinator. See the "Tablet migration" section of topology-over-raft.md	2023-07-25 21:08:02 +02:00
Tomasz Grabiec	22ab100b41	tablets: Implement tablet sharder	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	e44e6033d8	tablets: Include pending replica in get_shard() We need to move get_shard() from tablet_info to tablet_map in order to have access to transition_info.	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	70a35f70a6	test: Introduce tablets_test	2023-04-24 10:49:37 +02:00

26 Commits