scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-06 15:03:06 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	2c90aeb5ee	test.py: Give local variable meaningful name Rename t to testname as it's more informative Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-03-25 14:53:48 +03:00
Pavel Emelyanov	b2f5b63aaa	test.py: Sanitize test list creation To create the list of tests to run there's a loop that first collects all tests from suites, then filters the list in two ways -- excludes opt-out-ed lists (disabled and matching the skip pattern) or leaves there only opt-in-ed (those, specified as positional arguments). This patch keeps both list-checking code close to each other so that the intent is explicitly clear. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-03-25 14:53:20 +03:00
Yaron Kaikov	cb2c69a3f7	github: mergify: Add Ref to original PR When opening a backport PR, adding a reference to the original PR. This will be used later for updating the original PR/issue once the backport is done (with different label) Closes scylladb/scylladb#17973	2024-03-25 08:12:47 +02:00
Raphael S. Carvalho	6bdb456fad	sstables_loader: Fix loader when write selector is previous during tablet migration The loader is writing to pending replica even when write selector is set to previous. If migration is reverted, then the writes won't be rolled back as it assumes pending replicas weren't written to yet. That can cause data resurrection if tablet is later migrated back into the same replica. NOTE: write selector is handled correctly when set to next, because get_natural_endpoints() will return the next replica set, and none of the replicas will be considered leaving. And of course, selector set to both is also handled correctly. Fixes #17892. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#17902	2024-03-24 01:20:50 +01:00
Kamil Braun	230f23004b	Revert "test.py: adjust the test for topology upgrade to write to and read from CDC tables" This reverts commit `b4144d14c6`. The test is flaky and blocks next promotions.	2024-03-22 17:25:04 +01:00
Petr Gusev	2a5f5d1948	test_fencing: fix flakiness To cause the stale topology exception the test reads the version from the last bootstrapped host and assigns its decremented value to version and fence_version fields of system.topology. The test assumes that version == fence_version here, if version is greater than fence_version we won't get stale topology exception in this setup. Tablet balancer can break this -- it may increment the version after the last node is bootstrapped. Fix this by disabling the tablet balancer earlier. fixes scylladb/scylladb#17807 Closes scylladb/scylladb#17940	2024-03-22 12:49:13 +01:00
Piotr Dulikowski	f23f8f81bf	Merge 'Raft-based service levels' from Michał Jadwiszczak This patch introduces raft-based service levels. The difference to the current method of working is: - service levels are stored in `system.service_levels_v2` - reads are executed with `LOCAL_ONE` - writes are done via raft group0 operation Service levels are migrated to v2 in topology upgrade. After the service levels are migrated, `key: service_level_v2_status; value: data_migrated` is written to `system.scylla_local` table. If this row is present, raft data accessor is created from the beginning and it handles recovery mode procedure (service levels will be read from v2 table even if consistent topology is disabled then) Fixes #17926 Closes scylladb/scylladb#16585 * github.com:scylladb/scylladb: test: test service levels v2 works in recovery mode test: add test for service levels migration test: add test for service levels snapshot test:topology: extract `trigger_snapshot` to utils main: create raft dda if sl data was migrated service:qos: store information about sl data migration service:qos: service levels migration main: assign standard service level DDA before starting group0 service:qos: fix `is_v2()` method service:qos: add a method to upgrade data accessor test: add unit_test_raft_service_levels_accessor service:storage_service: add support for service levels raft snapshot service:qos: add abort_source for group0 operations service:qos: raft service level distributed data accessor service:qos: use group0_guard in data accessor cql3:statements: run service level statements on shard0 with raft guard test: fix overrides in unit_test_service_levels_accessor service:qos: fix indentation service:qos: coroutinize some of the methods db:system_keyspace: add `SERVICE_LEVELS_V2` table service:qos: extract common service levels' table functions	2024-03-22 11:51:53 +01:00
Kamil Braun	9979adb670	Merge 'topology_coordinator: do not clear unpublished CDC generation's data' from Patryk Jędrzejczak In this PR, we ensure unpublished CDC generation's data is never removed, which was theoretically possible. If it happened, it could cause problems. CDC generation publisher would then try to publish the generation with its data removed. In particular, the precondition of calling `_sys_ks.read_cdc_generation` wouldn't be satisfied. We also add a test that passes only after the fix. However, this test needs to block execution of the CDC generation publisher's loop twice. Currently, error injections with handlers do not allow it because handlers always share received messages. Apart from the first created handler, all handlers would be instantly unblocked by a message from the past that has already unblocked the first handler. This seems like a general limitation that could cause problems in the future, so in this PR, we extend injections with handlers to solve it once and for all. We add the `share_messages` parameter to the `inject` (with handler) function. Depending on its value, handlers will share messages (as before) or not. Fixes scylladb/scylladb#17497 Closes scylladb/scylladb#17934 * github.com:scylladb/scylladb: topology_coordinator: clean_obsolete_cdc_generations: fix log topology_coordinator: do not clear unpublished CDC generation's data topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages error_injection: allow injection handlers to not share messages	2024-03-22 11:20:26 +01:00
Kamil Braun	4359a1b460	Merge 'raft timeouts: better handling of lost quorum' from Petr Gusev In this PR we add timeouts support to raft groups registry. We introduce the `raft_server_with_timeouts` class, which wraps the `raft::server` and exposes its interface with additional `raft_timeout` parameter. If it's set, the wrapper cancels the `abort_source` after certain amount of time. The value of the timeout can be specified either in the `raft_timeout` parameter, or the default value can be set in the `raft_server_with_timeouts` class constructor. The `raft_group_registry` interface is extended with `group0_with_timeouts()` method. It returns an instance of `raft_server_with_timeouts` for group0 raft server. The timeout value for it is configured in `create_server_for_group0`. It's one minute by default and can be overridden for tests with `group0-raft-op-timeout-in-ms` parameter. The new api allows the client to decide whether to use timeouts or not. In this PR we are reviewing all the group0 call sites and add `raft_timeout` if that makes sense. The general principle is that if the code is handling a client request and the client expects a potential error, we use timeouts. We don't use timeouts for background fibers (such as topology coordinator), since they wouldn't add much value. The only thing the background fiber can do with a timeout is to retry, and this will have the same end effect as not having a timeout at all. Fixes scylladb/scylladb#16604 Closes scylladb/scylladb#17590 * github.com:scylladb/scylladb: migration_manager: use raft_timeout{} storage_service::join_node_response_handler: use raft_timeout{} storage_service::start_upgrade_to_raft_topology: use raft_timeout{} storage_service::set_tablet_balancing_enabled: use raft_timeout{} storage_service::move_tablet: use raft_timeout{} raft_check_and_repair_cdc_streams: use raft_timeout{} raft_timeout: test that node operations fail properly raft_rebuild: use raft_timeout{} do_cluster_cleanup: use raft_timeout{} raft_initialize_discovery_leader: use raft_timeout{} update_topology_with_local_metadata: use with_timeout{} raft_decommission: use raft_timeout{} raft_removenode: use raft_timeout{} join_node_request_handler: add raft_timeout to make_nonvoters and add_entry raft_group0: make_raft_config_nonvoter: add raft_timeout parameter raft_group0: make_raft_config_nonvoter: add abort_source parameter manager_client: server_add with start=false shouldn't call driver_connect scylla_cluster: add seeds parameter to the add_server and servers_add raft_server_with_timeouts: report the lost quorum join_node_request_handler: add raft_timeout{} for start_operation skip_mode: add platform_key auth: use raft_timeout{} raft_group0_client: add raft_timeout parameter raft_group_registry: add group0_with_timeouts utils: add composite_abort_source.hh error_injection: move api registration to set_server_init error_injection: add inject_parameter method error_injection: move injection_name string into injection_shared_data error_injection: pass injection parameters at startup	2024-03-22 10:45:33 +01:00
Botond Dénes	f02baef871	Merge 'test/lib: sstable::test_env consolidate and reduce header footprint' from Avi Kivity Reduce the sprawl of sstables::test_env in .cc and .hh files, to ease maintenance and reduce recompilations. Closes scylladb/scylladb#17965 * github.com:scylladb/scylladb: test: sstables::test_env: complete pimplification test/lib: test_env: move test_env::reusable_sst() to test_services.cc	2024-03-22 11:26:12 +02:00
Botond Dénes	8b2856339a	Merge 'github: sync-labels: use more descriptive name for workflow' from Kefu Chai * rename `sync_labels.yaml` to `sync-labels.yaml` * use more descriptive name for workflow Closes scylladb/scylladb#17971 * github.com:scylladb/scylladb: github: sync-labels: use more descriptive name for workflow github: sync_labels: rename sync_labels to sync-labels	2024-03-22 10:01:56 +02:00
David Garcia	0375faa6aa	docs: add experimental tag Closes scylladb/scylladb#17633	2024-03-22 09:53:30 +02:00
Patryk Wrobel	28ed20d65e	scylla-nodetool: adjust effective ownership handling When a keyspace uses tablets, then effective ownership can be obtained per table. If the user passes only a keyspace, then /storage_service/ownership/{keyspace} returns an error. This change: - adds an additional positional parameter to 'status' command that allows a user to query status for table in a keyspace - makes usage of /storage_service/ownership/{keyspace} optional to avoid errors when user tries to obtain effective ownership of a keyspace that uses tablets - implements new frontend tests in 'test_status.py' that verify the new logic Refs: scylladb#17405 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#17827	2024-03-22 09:51:57 +02:00
Yaron Kaikov	407d25e47b	[mergify] delete backport branch after merge Since those branches clutter the branch search UI and we don't need them after merging Closes scylladb/scylladb#17961	2024-03-22 09:51:22 +02:00
Calle Wilund	7e09517433	Update seastar submodule Submodule seastar 6b7b16a8a3..cd8a9133d2: > abort_source: add fmt::formatter for abort_requested_exception > memory: Ensure thread locals etc are minimally initialized even with non-seastar reactor options for alloc > rpc: add fmt::formatter for rpc::error classes and rpc::optional > Merge 'Adding Metrics family config' from Amnon Heiman > util: add fmt::formatter for bool_class<Tag> > util/bool_class: use the default-generated comparison operators > membarrier: cooperatively serialize calls to sys_membarrier > Merge 'build: relax the version constraint for Protobuf' from Kefu Chai > tls: add fmt::formatter for tls::subject_alt_name > memory.cc: Fix static init fiasco in system malloc override diff --git a/seastar b/seastar index 6b7b16a8a3..cd8a9133d2 160000 --- a/seastar +++ b/seastar @@ -1 +1 @@ -Subproject commit 6b7b16a8a329d831b94fdd4b41f6f55b260e9afd +Subproject commit cd8a9133d2c02f63dbd578d882cf7333a427e194 Closes scylladb/scylladb#17865	2024-03-22 09:49:23 +02:00
Kefu Chai	7ebdfdb705	github: sync-labels: use more descriptive name for workflow "label-sync" is not very helpful for developers to understand what this workflow is for. the "name" field of a job shows in the webpage on github of the pull request against which the job is performed, so if the author or reviewer checks the status of the pull request, he/she would notice these names aside of the workflow's name. for this very job, what we have now is: ``` Sync labels / label-sync ``` after this change it will be: ``` Sync labels / Synchronize labels between PR and the issue(s) fixed by it ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-22 10:41:20 +08:00
Kefu Chai	af879759b9	github: sync_labels: rename sync_labels to sync-labels to be more consistent with other github workflows Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-22 10:31:31 +08:00
Michał Jadwiszczak	c0853b461c	test: test service levels v2 works in recovery mode	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	c551a85cda	test: add test for service levels migration	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	5811f696be	test: add test for service levels snapshot	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	bf3aed1ecb	test:topology: extract `trigger_snapshot` to utils The function was defined separately in a few tests.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	a08918a320	main: create raft dda if sl data was migrated Create `raft_service_levels_distributed_data_accessor` if service levels were migrated to v2 table. This supports raft recovery mode, as service levels will be read from v2 table in the mode.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	dab909b1d1	service:qos: store information about sl data migration Save information whether service levels data was migrated to v2 table. The information is stored in `system.scylla_local` table. It's written with raft command and included in raft snapshot.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	2917ec5d51	service:qos: service levels migration Migrate data from `system_distributed.service_levels` to `system.service_levels_v2` during raft topology upgrade. Migration process reads data from old table with CL ALL and inserts the data to the new table via raft.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	36c9afda99	main: assign standard service level DDA before starting group0 `topology_state_load()` is responsible for upgrading service level DDA, so the standard DDA has to be assigned before to be upgraded	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	159a6a2169	service:qos: fix `is_v2()` method	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	fd32f5162a	service:qos: add a method to upgrade data accessor	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	d403bdfdd5	test: add unit_test_raft_service_levels_accessor Raft service level data accessor with logic similar to `unit_test_service_levels_accessor` to avoid sleeps in boost tests.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	8bbeea0169	service:storage_service: add support for service levels raft snapshot Include mutations from `system.service_levels_v2` in `raft_snapshot`.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	d5fa0747d7	service:qos: add abort_source for group0 operations Add mechanism to abort ongoing group0 operations while draining service_level_controller or leaving the cluster.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	7e61bbb0d5	service:qos: raft service level distributed data accessor `raft_service_level_distributed_data_accessor` works this way: - on read path it reads service levels from `SYSTEM.SERVICE_LEVELS_V2` table with CL = LOCAL_ONE - on write path it starts group0 operation and it makes the change using raft command	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	71c07addb5	service:qos: use group0_guard in data accessor Adjust service_level_controller and service_level_controller::service_level_distributed_data_accessor interfaces to take `group0_guard` while adding/altering/dropping a service level.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	da82c5f0b0	cql3:statements: run service level statements on shard0 with raft guard To migrate service levels to be raft managed, obtain `group0_guard` to be able to pass it to service_level_controller's methods. Using this mechanism also automatically provides retries in case of concurrent group0 operation.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	674286b868	test: fix overrides in unit_test_service_levels_accessor	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	c0e22fcb9c	service:qos: fix indentation	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	1f3c6b2813	service:qos: coroutinize some of the methods Functions: - `service_level_controller::set_distributed_service_level()` - `service_level_controller::drop_distributed_service_level()` - `service_level_controller::drain()` Coroutines increase readability of those functions.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	8e242f5acd	db:system_keyspace: add `SERVICE_LEVELS_V2` table The table has the same schema as `system_distributed.service_levels`. However it's created entirely at once (unlike old table which creates base table first and then it adds other columns) because `system` tables are local to the node.	2024-03-21 23:14:57 +01:00
Michał Jadwiszczak	990c5e7dd0	service:qos: extract common service levels' table functions Getting a service level(s) will be done the same way in raft-based service levels as it's done in standard service levels, so those functions are extracted to be reused.	2024-03-21 23:14:57 +01:00
Avi Kivity	b530dc1e3b	test: sstables::test_env: complete pimplification sstables::test_env uses the pimpl idiom, but incompletely. This prevents reaping some of the benefits. Complete the pimplification: - the `impl` nested struct is moved out-of-line - all non-template member functions are moved out-of-line - a destructor is declared and defined out-of-line - the move constructor is also defined (necessary after the destructor is defined) After this, we can forward-declare more components.	2024-03-21 22:29:01 +02:00
Avi Kivity	d745929b44	test/lib: test_env: move test_env::reusable_sst() to test_services.cc test_env implementation is scattered around two .cc, concentrate it in test_services.cc, which happens to be the file that doesn't cause link errors. Move toc_filename with it, as it is its only caller and it is static.	2024-03-21 22:21:02 +02:00
Kefu Chai	900b56b117	raft_group0: print runtime_error by printing e.what() before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. but fortunately, fmt v10 brings the builtin formatter for classes derived from `std::exception`. but before switching to {fmt} v10, and after dropping `FMT_DEPRECATED_OSTREAM` macro, we need to print out `std::runtime_error`. so far, we don't have a shared place for formatter for `std::runtime_error`. so we are addressing the needs on a case-by-case basis. in this change, we just print it using `e.what()`. its behavior is identical to what we have now. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17954	2024-03-21 19:43:52 +02:00
Avi Kivity	f0ca5e5a08	Merge 'treewide: add fmt::formatter for exception types' from Kefu Chai before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, `fmt::formatter` is added for following types for backward compatibility with {fmt} < 10: * `utils::bad_exception_container_access` * `cdc::no_generation_data_exception` * classes derived from `sstables::malformed_sstable_exception` * classes derived from `cassandra_exception` Refs https://github.com/scylladb/scylladb/issues/13245 Closes scylladb/scylladb#17944 * github.com:scylladb/scylladb: cdc: add fmt::formatter for exception types in data_dictionary.hh utils: add fmt::formatter for utils::bad_exception_container_access sstables: add fmt::formatter for classes derived from sstables::malformed_sstable_exception exceptions: add fmt::formatter for classes derived from cassandra_exception cdc: add fmt::formatter for cdc::no_generation_data_exception	2024-03-21 18:44:37 +02:00
Botond Dénes	f9104fbfa9	tools/toolchain/image: update python driver (implicit) Fixes: #17662 Closes scylladb/scylladb#17956	2024-03-21 18:27:40 +02:00
Andrei Chekun	7de28729e7	test: change maintenance socket location to /tmp Fixes #16912 By default, ScyllaDB stores the maintenance socket in the workdir. Test.py by default uses the location for the ScyllaDB workdir as testlog/{mode}/scylla-#. The usual location for cloning the repo is the user's home folder. In some cases, it can lead to the socket path being too long and the test will start to fail. The simple way is to move the maintenance socket to /tmp folder to eliminate such a possibility. Closes scylladb/scylladb#17941	2024-03-21 18:22:21 +02:00
Patryk Jędrzejczak	33a0864aaa	topology_coordinator: clean_obsolete_cdc_generations: fix log We use a non-inclusive bound here, so the log was incorrect.	2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak	27465a00e0	topology_coordinator: do not clear unpublished CDC generation's data In this commit, we ensure unpublished CDC generation's data is never removed, which was theoretically possible. If it happened, it could cause problems. CDC generation publisher would then try to publish the generation with its data removed. In particular, the precondition of calling `_sys_ks.read_cdc_generation` wouldn't be satisfied. We also add a test that passes only after the fix.	2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak	f45aebeee2	topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages In the following commit, we add a test that needs to block the CDC generation publisher's loop twice. We allow it in this commit by making handlers of the `cdc_generation_publisher_fiber` injection share messages. From now on, unblocking every step of the loop will require sending a new message from the test. This change breaks the test already using the `cdc_generation_publisher_fiber` injection, so we adjust the test.	2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak	c5c4cc7d00	error_injection: allow injection handlers to not share messages For a single injection, all created injection handlers share all received messages. In particular, it means that one received message unblocks all handlers waiting for the first message. This behavior is often desired, for example, if multiple fibers execute the injected code and we want to unblock them all with a single message. However, there is a problem if we want to block every execution of the injected code. Apart from the first created handler, all handlers will be instantly unblocked by messages from the past that have already unblocked the first handler. In one of the following commits, we add a test that needs to block the CDC generation publisher's loop twice. Since it looks like there are no good workarounds for this arguably general problem, we extend injections with handlers in a way that solves it. We introduce the new `share_messages` parameter. Depending on its value, handlers will share messages or not. The details are described in the new comments in `error_injection.hh`. We also add some basic unit tests for the new functionality.	2024-03-21 14:35:38 +01:00
Petr Gusev	ae0ec19537	migration_manager: use raft_timeout{} Checking all the call sites of the migration manager shows that all of them are initiated by user requests, not background activities. Therefore, we add a global raft_timeout{} here.	2024-03-21 16:35:48 +04:00
Petr Gusev	294e1ff464	storage_service::join_node_response_handler: use raft_timeout{} This function is called as part of a node join procedure initiated by the user, so having timeouts here makes sense.	2024-03-21 16:35:48 +04:00


42002 Commits