scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-07 15:33:15 +00:00

Author	SHA1	Message	Date
Wojciech Mitros	629ea63922	rust: update dependencies The currently used versions of "time" and "rustix" dependencies had minor security vulnerabilities. In this patch: - the "rustix" crate is updated - the "chrono" crate that we depend on was not compatible with the version of the "time" crate that had fixes, so we updated the "chrono" crate, which actually removed the dependency on "time" completely. Both updates were performed using "cargo update" on the relevant package and the corresponding version. Fixes #15772 Closes scylladb/scylladb#16378	2023-12-17 13:20:25 +02:00
Kefu Chai	273ee36bee	tools/scylla-sstable: add `scylla sstable shard-of` command when migrating to the uuid-based identifiers, the mapping from the integer-based generation to the shard-id is preserved. we used to have "gen % smp_count" for calculating the shard which is responsible to host a given sstable. despite that this is not a documented behavior, this is handy when we try to correlate an sstable to a shard, typically when looking at a performance issue. in this change, a new subcommand is added to expose the connection between the sstable and its "owner" shards. Fixes #16343 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16345	2023-12-15 11:36:45 +02:00
Avi Kivity	2b8392b8b8	Merge 'database, reader_concurrency_semaphore: deduplicate reader_concurrency_semaphore metrics ' from Botond Dénes Reduce code duplication by defining each metric just once, instead of three times, by having the semaphore register metrics by itself. This also makes the lifecycle of metrics contained in that of the semaphore. This is important on enterprise where semaphores are added and removed, together with service levels. We don't want all semaphores to export metrics, so a new parameter is introduced and all call-sites make a call whether they opt-in or not. Fixes: https://github.com/scylladb/scylladb/issues/16402 Closes scylladb/scylladb#16383 * github.com:scylladb/scylladb: database, reader_concurrency_sempaphore: deduplicate reader_concurrency_sempaphore metrics reader_concurrency_semaphore: add register_metrics constructor parameter sstables: name sstables_manager	2023-12-14 18:26:24 +02:00
Patryk Jędrzejczak	dced4bb924	system_keyspace, main, cql_test_env: fix indendations Broken in the previous patch.	2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak	5ebfbf42bc	db: config: make consistent_cluster_management mandatory Code that executed only when consistent_cluster_management=false is removed. In particular, after this patch: - raft_group0 and raft_group_registry are always enabled, - raft_group0::status_for_monitoring::disabled becomes unused, - topology tests can only run with consistent_cluster_management.	2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak	7dd7ec8996	test: boost: schema_change_test: replace disable_raft_schema_config In the following commits, we make consistent cluster management mandatory. This will make disable_raft_schema_config unusable, so we need to get rid of it. However, we don't want to remove tests that use it. The idea is to use the Raft RECOVERY mode instead of disabling consistent cluster management directly.	2023-12-14 16:54:04 +01:00
Kefu Chai	caa0230e5d	test/cql-pytest: use raw string when appropriate we use "\w" to represent a character class in Python. see https://docs.python.org/3/library/re.html. but "\" should be escaped as well, CPython accepts "\w" after trying to find an escaped character of "\." but failed, and leave "\." as it is. but it complains. in this change, we use raw string to avoid escaping "\" in the regular expression. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16405	2023-12-13 21:14:32 +02:00
Kamil Braun	26cbd28883	Merge 'token_metadata: switch to host_id' from Petr Gusev In this PR we refactor `token_metadata` to use `locator::host_id` instead of `gms::inet_address` for node identification in its internal data structures. Main motivation for these changes is to make raft state machine deterministic. The use of IPs is a problem since they are distributed through gossiper and can't be used reliably. One specific scenario is outlined [in this comment](https://github.com/scylladb/scylladb/pull/13655#issuecomment-1521389804) - `storage_service::topology_state_load` can't resolve host_id to IP when we are applying old raft log entries, containing host_id-s of the long-gone nodes. The refactoring is structured as follows: * Turn `token_metadata` into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a `get_new()` method, which can be used to access the new version. * Go over all places which write to the old version and make the corresponding writes to the new version through `get_new()`. When this stage is finished we can use any version of the `token_metadata` for reading. * Go over all the places which read `token_metadata` and switch them to the new version. * Make `host_id`-based `token_metadata` default, drop `inet_address`-based version, change `token_metadata` back to non-template. These series [depends](`1745a1551a`) on RPC sender `host_id` being present in RPC `clent_info` for `bootstrap` and `replace` node_ops commands. This feature was added in [this commit](`95c726a8df`) and released in `5.4`. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially first to `5.4` (or the corresponding Enterprise version) then to the version with these changes (`5.5` or `6.0`) should be fine. 
If for some reason they upgrade from a version without `host_id` in RPC `clent_info` to the version with these changes and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with an error `Coordinator host_id not found` if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can finish the upgrade first to version 5.4 or later, or start bootstrap/replace with an upgraded node. Note that removenode and decommission do not depend on coordinator host_id so they can be started in the middle of upgrade from any node. Closes scylladb/scylladb#15903 * github.com:scylladb/scylladb: topology: remove_endpoint: remove inet_address overload token_metadata: topology: cleanup add_or_update_endpoint token_metadata: add_replacing_endpoint: forbid replacing node with itself topology: drop key_kind, host_id is now the primary key dc_rack_fn: make it non-template token_metadata: drop the template shared_token_metadata: switch to the new token_metadata gossiper: use new token_metadata database: get_token_metadata -> new token_metadata erm: switch to the new token_metadata storage_service: get_token_metadata -> token_metadata2 storage_service: get_token_to_endpoint_map: use new token_metadata api/token_metadata: switch to new version storage_service::on_change: switch to new token_metadata cdc: switch to token_metadata2 calculate_natural_endpoints: fix indentation calculate_natural_endpoints: switch to token_metadata2 storage_service: get_changed_ranges_for_leaving: use new token_metadata decommission_with_repair, removenode_with_repair -> new token_metadata rebuild_with_repair, replace_with_repair: use new token_metadata bootstrap: use new token_metadata tablets: switch to token_metadata2 calculate_effective_replication_map: use new token_metadata calculate_natural_endpoints: fix formatting abstract_replication_strategy: calculate_natural_endpoints: make it work with 
both versions of token_metadata network_topology_strategy_test: update new token_metadata storage_service: on_alive: update new token_metadata storage_service: handle_state_bootstrap: update new token_metadata storage_service: snitch_reconfigured: update new token_metadata storage_service: leave_ring: update new token_metadata storage_service: node_ops_cmd_handler: update new token_metadata storage_service: node_ops_cmd_handler: add coordinator_host_id storage_service: bootstrap: update new token_metadata storage_service: join_token_ring: update new token_metadata storage_service: excise: update new token_metadata storage_service: join_cluster: update new token_metadata storage_service: on_remove: update new token_metadata storage_service: handle_state_normal: fill new token_metadata storage_service: topology_state_load: fill new token_metadata storage_service: adjust update_topology_change_info to update new token_metadata topology: set self host_id on the new topology locator::topology: allow being_replaced and replacing nodes to have the same IP token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known token_metadata: get_host_id: exception -> on_internal_error token_metadata: add get_all_ips method token_metadata: support host_id-based version token_metadata: make it a template with NodeId=inet_address/host_id NodeId is used in all internal token_metadata data structures, that previously used inet_address. We choose topology::key_kind based on the value of the template parameter. locator: make dc_rack_fn a template locator/topology: add key_kind parameter token_metadata: topology_change_info: change field types to token_metadata_ptr token_metadata: drop unused method get_endpoint_to_token_map_for_reading	2023-12-13 16:35:52 +01:00
Botond Dénes	e1b30f50be	reader_concurrency_semaphore: add register_metrics constructor parameter To be used in the next patch to control whether the semaphore registers and exports metrics or not. We want to move metric registration to the semaphore but we don't want all semaphores to export metrics. The decision on whether a semaphore should or shouldn't export metrics should be made on a case-by-case basis so this new parameter has no default value (except for the for_tests constructor).	2023-12-13 06:25:45 -05:00
Avi Kivity	814f3eb6b5	sstables: name sstables_manager Soon, the reader_concurrency_semaphore will require a unique and meaningful name in order to label its metrics. To prepare for that, name sstable_manager instances. This will be used to generate a name for sstable_manager's reader_concurrency_semaphore.	2023-12-13 04:40:33 -05:00
Tomasz Grabiec	cdc53d0a49	test: tablets: Add test case which tests table drop concurrent with migration	2023-12-13 00:06:56 +01:00
Avi Kivity	22b77edef3	Merge 'scylla-nodetool: implement the scrub command' from Botond Dénes On top of the capabilities of the java-nodetool command, the following additional functionality is implemented: * Expose quarantine-mode option of the scrub_keyspace REST API * Exit with error and print a message, when scrub finishes with abort or validation_errors return code The command comes with tests and all tests pass with both the new and the current nodetool implementations. Refs: #15588 Refs: #16208 Closes scylladb/scylladb#16391 * github.com:scylladb/scylladb: tools/scylla-nodetool: implement the scrub command test/nodetool: rest_api_mock.py: add missing "f" to error message f string api: extract scrub_status into its own header	2023-12-12 22:22:35 +02:00
Petr Gusev	9d93a518ac	topology: remove_endpoint: remove inet_address overload The overload was used only in tests.	2023-12-12 23:19:54 +04:00
Petr Gusev	fbf507b1ba	token_metadata: topology: cleanup add_or_update_endpoint Make host_id parameter non-optional and move it to the beginning of the arguments list. Delete unused overloads of add_or_update_endpoint. Delete unused overload of token_metadata::update_topology with inet_address argument.	2023-12-12 23:19:54 +04:00
Petr Gusev	3b59919a9c	topology: drop key_kind, host_id is now the primary key	2023-12-12 23:19:54 +04:00
Petr Gusev	8c551f9104	dc_rack_fn: make it non-template	2023-12-12 23:19:54 +04:00
Petr Gusev	7b55ccbd8e	token_metadata: drop the template Replace token_metadata2 ->token_metadata, make token_metadata back non-template. No behavior changes, just compilation fixes.	2023-12-12 23:19:54 +04:00
Petr Gusev	799f747c8f	shared_token_metadata: switch to the new token_metadata	2023-12-12 23:19:54 +04:00
Petr Gusev	11cc21d0a9	erm: switch to the new token_metadata In this commit we replace token_metadata with token_metadata2 in the erm interface and field types. To accommodate the change some of strategy-related methods are also updated. All the boost and topology tests pass with this change.	2023-12-12 23:19:53 +04:00
Petr Gusev	80ccbc0d53	calculate_natural_endpoints: switch to token_metadata2 All usages of calculate_natural_endpoints are migrated, now we can change its interface to take token_metadata2 instead of token_metadata.	2023-12-12 23:19:53 +04:00
Petr Gusev	d9283bd025	tablets: switch to token_metadata2 locator_topology_test, network_topology_strategy_test and tablets_test are fully switched to the host_id-based token_metadata, meaning they no longer populate the old token_metadata. All the boost and topology tests pass with this change.	2023-12-12 23:19:53 +04:00
Petr Gusev	f5038f6c72	calculate_effective_replication_map: use new token_metadata In this commit we switch the function calculate_effective_replication_map to use the new token_metadata. We do this by employing our new helper calculate_natural_ips function. We can't use this helper for current_endpoints/target_endpoints though, since in that case we won't add the IP to the pending_endpoints in the replace-with-same-ip scenario. The token_metadata_test is migrated to host_ids in the same commit to make it pass. Other tests work because they fill both versions of the token_metadata, but for this test it was simpler to just migrate it straight away. The test constructs the old token_metadata over the new token_metadata, this means only the get_new() method will work on it. That's why we also need to switch some other functions (maybe_remove_node_being_replaced, do_get_natural_endpoints, get_replication_factor) to the new version in the same commit. All the boost and topology tests pass with this change.	2023-12-12 23:19:53 +04:00
Petr Gusev	d5b4b02b28	abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata We've updated all the places where token_metadata is mutated, and now we can progress to the next stage of the refactoring - gradually switching the read code paths. The calculate_natural_endpoints function is at the core of all of them. It decides to what nodes the given token should be replicated to for the given token_metadata. It has a lot of usages in various contexts, we can't switch them all in one commit, so instead we allowed the function to behave in both ways. If use_host_id parameter is false, the function uses the provided token_metadata as is and returns endpoint_set as a result. If it's true, it uses get_new() on the provided token_metadata and returns host_id_set as a result. The scope of the whole refactoring is limited to the erm data structure, its interface will be kept inet_address based for now. This means we'll often need to resolve host_ids to inet_address-es as soon as we got a result from calculated_natural_endpoints. A new calculate_natural_ips function is added for convenience. It uses the new token_metadata and immediately resolves returned host_id-s to inet_address-es. The auxiliary declarations natural_ep_type, set_type, vector_type, get_self_id, select_tm are introduced only for the sake of migration, they will be removed later.	2023-12-12 23:19:53 +04:00
Petr Gusev	1960436d93	network_topology_strategy_test: update new token_metadata	2023-12-12 23:19:53 +04:00
Botond Dénes	47450ae4db	tools/scylla-nodetool: implement the scrub command On top of the capabilities of the java-nodetool command, the following additional functionality is implemented: * Expose quarantine-mode option of the scrub_keyspace REST API * Exit with error and print a message, when scrub finishes with abort or validation_errors return code	2023-12-12 09:39:58 -05:00
Botond Dénes	892683cace	test/nodetool: rest_api_mock.py: add missing "f" to error message f string	2023-12-12 09:33:39 -05:00
Tomasz Grabiec	9b0d9e7c6b	tests: tablets: Do read barrier in get_tablet_replicas() In order for the call to see all prior changes to group0. Also, we should query on the host on which we executed the barrier. I hope this will reduce flakiness observed in CI runs on https://github.com/scylladb/scylladb/pull/16341 where the expected tablet replica didn't match the one returned by get_tablet_replica() after tablet movement, possibly because the node is still behind group0 changes.	2023-12-12 12:46:39 +01:00
Botond Dénes	885a807c71	Merge 'api: storage_service: api for starting async compaction' from Aleksandra Martyniuk For all compaction types which can be started with api, add an asynchronous version of api, which returns task_id of the corresponding task manager task. With the task_id a user can check task status, abort, or wait for it, using task manager api. Closes scylladb/scylladb#15092 * github.com:scylladb/scylladb: test: use async api in test_not_created_compaction_task_abort test: test compaction task started asynchronously api: tasks: api for starting async compaction api: compaction: pass pointer to top level compaction tasks	2023-12-12 12:06:52 +02:00
Calle Wilund	b34366957e	commitlog_test::test_commitlog_reader: handle segment_truncation Fixes #16312 This test replays a segment before it might be closed or even fully flushed, thus it can (with the new semantics) generate a segment_truncation exception if hitting eof earlier than expected. (Note: test does not use pre-allocated segments).	2023-12-11 11:53:12 +00:00
Calle Wilund	d85c0ea26f	commitlog_test: coroutinize test_commitlog_reader To make it easier to read and modify.	2023-12-11 11:47:48 +00:00
Tomasz Grabiec	effb9fb3cb	Merge 'Don't calculate hashes for schema versions in Raft mode' from Kamil Braun When performing a schema change through group 0, extend the schema mutations with a version that's persisted and then used by the nodes in the cluster in place of the old schema digest, which becomes horribly slow as we perform more and more schema changes (#7620). If the change is a table create or alter, also extend the mutations with a version for this table to be used for `schema::version()`s instead of having each node calculate a hash which is susceptible to bugs (#13957). When performing a schema change in Raft RECOVERY mode we also extend schema mutations which forces nodes to revert to the old way of calculating schema versions when necessary. We can only introduce these extensions if all of the cluster understands them, so protect this code by a new cluster/schema feature, `GROUP0_SCHEMA_VERSIONING`. Fixes: #7620 Fixes: #13957 --- This is a reincarnation of PR scylladb/scylladb#15331. The previous PR was reverted due to a bug it unmasked; the bug has now been fixed (scylladb/scylladb#16139). Some refactors from the previous PR were already merged separately, so this one is a bit smaller. I have checked with @Lorak-mmk's reproducer (https://github.com/Lorak-mmk/udt_schema_change_reproducer -- many thanks for it!) that the originally exposed bug is no longer reproducing on this PR, and that it can still be reproduced if I revert the aforementioned fix on top of this PR. 
Closes scylladb/scylladb#16242 * github.com:scylladb/scylladb: docs: describe group 0 schema versioning in raft docs test: add test for group 0 schema versioning feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0 migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations schema_tables: use schema version from group 0 if present migration_manager: store `group0_schema_version` in `scylla_local` during schema changes system_keyspace: make `get/set_scylla_local_param` public feature_service: add `GROUP0_SCHEMA_VERSIONING` feature	2023-12-11 12:17:57 +01:00
Aleksandra Martyniuk	31977a1cde	test: use async api in test_not_created_compaction_task_abort	2023-12-11 11:39:41 +01:00
Aleksandra Martyniuk	68f6886d50	test: test compaction task started asynchronously Check whether task id returned by asynchronous api is correct and whether tasks of proper type are created.	2023-12-11 11:39:41 +01:00
Petr Gusev	66c30e4f8e	topology: set self host_id on the new topology With this commit, we begin the next stage of the refactoring - updating the new version of the token_metadata in all places where the old version is currently being updated. In this commit we assign host_id of this node, both in main.cc and in boost tests.	2023-12-11 12:51:34 +04:00
Petr Gusev	5a1418fdba	token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known This commit fixes an inconsistency in method names: get_host_id and get_host_id_if_known are (internal_error, returns null), but there was only one method for the opposite conversion - get_endpoint_for_host_id, and it returns null. In this commit we change it to on_internal_error if it can't find the argument and add another method get_endpoint_for_host_id_if_known which returns null in this case. We can't use get_endpoint_for_host_id/get_host_id in host_id_or_endpoint::resolve since it's called from storage_service::parse_node_list -> token_metadata::parse_host_id_and_endpoint, and exceptions are caught and handled in `storage_service::parse_node_list`.	2023-12-11 12:51:34 +04:00
Petr Gusev	c9fbe3d377	locator: make dc_rack_fn a template In the next commits token_metadata will be made a template with NodeId=inet_address|host_id parameter. This parameter will be passed to dc_rack_fn function, so it also should be made a template.	2023-12-11 12:51:33 +04:00
Piotr Dulikowski	5227b71363	locator/topology: add key_kind parameter For the host_id-based token_metadata we want host_id to be the main node key, meaning it should be used in add_or_update_endpoint to find the node to update. For the inet_address-based token_metadata version we want to retain the old behaviour during transition period. In this commit we introduce key_kind parameter and use key_kind::inet_address in all current topology usages. Later we'll use key_kind::host_id for the new token_metadata. In the last commits of the series, when the new token_metadata version is used everywhere, we will remove key_kind enum.	2023-12-11 12:51:33 +04:00
Alexander Turetskiy	f30b5473ab	cql: Reject empty options while altering a keyspace Reject ALTER KEYSPACE request for NetworkTopologyStrategy when replication options are missing. Also reject CREATE KEYSPACE with no replication factor options. Cassandra has a default_keyspace_rf configuration that may allow such CREATE KEYSPACE commands, but Scylla doesn't have this option (refs #16028). fixes #10036 Closes scylladb/scylladb#16221	2023-12-10 17:44:35 +02:00
Kefu Chai	818343b57d	build: build session.cc in CMake building system this source file was added in `d3d83869`. so let's update cmake as well. sessions_tests was added in the same commit, so add it as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16344	2023-12-09 22:14:47 +02:00
Avi Kivity	d62a5fc60b	Merge 'tools/scylla-nodetool: implement additional commands, part 5/N ' from Botond Dénes This PR implements the following new nodetool commands: * decommission * rebuild * removenode * getlogginglevels * setlogginglevel * move * refresh All commands come with tests and all tests pass with both the new and the current nodetool implementations. Refs: https://github.com/scylladb/scylladb/issues/15588 Closes scylladb/scylladb#16348 * github.com:scylladb/scylladb: tools/scylla-nodetool: implement the refresh command tools/scylla-nodetool: implement the move command tools/scylla-nodetool: implement setlogginglevel command tools/sclla-sstable: implement the getlogginglevels command tools/scylla-nodetool: implement the removenode command tools/scylla-nodetool: implement the rebuild command tools/scylla-nodetool: implement the decommission command	2023-12-09 21:47:22 +02:00
Kamil Braun	30fc36f8d2	test: add test for group 0 schema versioning Perform schema changes while mixing nodes in RECOVERY mode with nodes in group 0 mode: - schema changes originating from RECOVERY node use digest-based schema versioning. - schema changes originating from group 0 nodes use persisted versions committed through group 0. Verify that schema versions are in sync after each schema change, and that each schema change results in a different version. Also add a simple upgrade test, performing a schema change before we enable Raft (which also enables the new versioning feature) in the entire cluster, then once upgrade is finished. One important upgrade test is missing, which we should add to dtest: create a cluster in Raft mode but in a Scylla version that doesn't understand GROUP0_SCHEMA_VERSIONING. Then start upgrading to a version that has this patchset. Perform schema changes while the cluster is mixed, both on non-upgraded and on upgraded nodes. Such test is especially important because we're adding a new column to the `system.scylla_local` table (which we then redact from the schema definition when we see that the feature is disabled).	2023-12-08 17:46:31 +01:00
Kamil Braun	7dad31c78f	feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode As promised in earlier commits: Fixes: #7620 Fixes: #13957 Also modify two test cases in `schema_change_test` which depend on the digest calculation method in their checks. Details are explained in the comments.	2023-12-08 17:46:31 +01:00
Botond Dénes	496459165e	tools/scylla-nodetool: implement the refresh command	2023-12-08 08:58:16 -05:00
Botond Dénes	58d3850da1	tools/scylla-nodetool: implement setlogginglevel command	2023-12-08 08:18:56 -05:00
Botond Dénes	3a8590e1af	tools/sclla-sstable: implement the getlogginglevels command	2023-12-08 07:32:45 -05:00
Botond Dénes	c35ed794de	tools/scylla-nodetool: implement the removenode command	2023-12-08 07:32:31 -05:00
Botond Dénes	9a484cb145	tools/scylla-nodetool: implement the rebuild command	2023-12-08 07:05:30 -05:00
Botond Dénes	ea62f7c848	tools/scylla-nodetool: implement the decommission command	2023-12-08 06:14:36 -05:00
Kurashkin Nikita	c071cd92b5	cql3:statement_restrictions.cc add more conditions to prevent "allow filtering" error to pop up in delete/update statements Modified Cassandra tests to check for Scylla's error messages Fixes #12474 Closes scylladb/scylladb#15811	2023-12-07 21:25:18 +02:00
Avi Kivity	9c0f05efa1	Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later. This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted. The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained. The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was. This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas. 
Closes scylladb/scylladb#15847 * github.com:scylladb/scylladb: test: tablets: Add test for failed streaming being fenced away error_injection: Introduce poll_for_message() error_injection: Make is_enabled() public api: Add API to kill connection to a particular host range_streamer: Do not block topology change barriers around streaming range_streamer, tablets: Do not keep token metadata around streaming tablets: Fail gracefully when migrating tablet has no pending replica storage_service, api: Add API to disable tablet balancing storage_service, api: Add API to migrate a tablet storage_service, raft topology: Run streaming under session topology guard storage_service, tablets: Use session to guard tablet streaming tablets: Add per-tablet session id field to tablet metadata service: range_streamer: Propagate topology_guard to receivers streaming: Always close the rpc::sink storage_service: Introduce concept of a topology_guard storage_service: Introduce session concept tablets: Fix topology_metadata_guard holding on to the old erm docs: Document the topology_guard mechanism	2023-12-07 16:29:02 +02:00
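
Note on caa0230e5d (test/cql-pytest: use raw string when appropriate): the commit switches regular expressions to raw strings so that the backslash in "\w" no longer needs escaping. A minimal sketch of the idea follows; the `WORD` pattern and the `words` helper are illustrative names and are not taken from the test suite.

```python
import re

# Raw string literal: the backslash is passed to the regex engine as-is,
# so no Python-level escape processing is involved.
WORD = re.compile(r"\w+")


def words(text):
    """Return all runs of word characters (letters, digits, underscore)."""
    return WORD.findall(text)


# An explicitly escaped backslash produces the same two-character pattern text.
assert r"\w" == "\\w"
assert len(r"\w") == 2

print(words("scylla_table-42"))  # ['scylla_table', '42']
```

Writing the pattern as a plain "\w" literal would still match the same text in CPython, but it relies on an unrecognized escape sequence, which recent Python versions warn about.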


11801 Commits
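
Note on 273ee36bee (tools/scylla-sstable: add `scylla sstable shard-of` command): the commit message mentions the historical, undocumented rule "gen % smp_count" for integer-based generations. The sketch below only illustrates that arithmetic. `legacy_shard_of` is a hypothetical helper name, and the code does not model UUID-based identifiers or the behavior of the actual subcommand.

```python
def legacy_shard_of(generation: int, smp_count: int) -> int:
    """Hypothetical helper: shard index under the old integer-generation rule.

    Implements only the "gen % smp_count" arithmetic quoted in the commit
    message; it is an assumption-level illustration, not ScyllaDB code.
    """
    if smp_count <= 0:
        raise ValueError("smp_count must be positive")
    if generation < 0:
        raise ValueError("generation must be non-negative")
    return generation % smp_count


# With 4 shards, consecutive generations rotate through the shard indexes.
for gen in (1, 2, 3, 4, 5):
    print(f"generation {gen} -> shard {legacy_shard_of(gen, 4)}")
```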