scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 11:10:40 +00:00

Author	SHA1	Message	Date
Nadav Har'El	b659dfcbe9	test/cqlpy: comment out Cassandra check that is no longer relevant In the test translated from Cassandra validation/operations/alter_test.py we had two lines in the beginning of an unrelated test that verified that CREATE KEYSPACE is not allowed without replication parameters. But starting recently, ScyllaDB does have defaults and does allow these CREATE KEYSPACE. So comment out these two test lines. We didn't notice that this test started to fail, because it was already marked xfail, because in the main part of this test, it reproduces a different issue! The annoying side effect of these no-longer-passing checks was that because the test expected a CREATE KEYSPACE to fail, it didn't bother to delete this keyspace when it finished, which causes test.py to report that there's a problem because some keyspaces still exist at the end of the test. Now that we fixed this problem, we no longer need to list this test in test/cqlpy/suite.yaml as a test that leaves behind undeleted keyspaces. Fixes #26292 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#26341	2025-11-11 10:34:27 +02:00
Botond Dénes	042303f0c9	Merge 'Alternator: enable tablets by default - depending on tablets_mode_for_new_keyspaces' from Nadav Har'El Before this series, Alternator's CreateTable operation defaults to creating a table replicated with vnodes, not tablets. The reasons for this default included missing support for LWT, Materialized Views, Alternator TTL and Alternator Streams if tablets are used. But today, all of these (except the still-experimental Alternator Streams) are now fully available with tablets, so we are finally ready to switch Alternator to use tablets by default in new tables. We will use the same configuration parameter that CQL uses, tablets_mode_for_new_keyspaces, to determine whether new keyspaces use tablets by default. If set to `enabled`, tablets are used by default on new tables. If set to `disabled`, tablets will not be used by default (i.e., vnodes will be used, as before). A third value, `enforced` is similar to `enabled` but forbids overriding the default to vnodes when creating a table. As before, the user can set a tag during the CreateTable operation to override the default choice of tablets or vnodes (unless in `enforced` mode). This tag is now named `system:initial_tablets` - whereas before this patch it was called `experimental:initial_tablets`. The rules stay the same as with the earlier, experimental:initial_tablets tag: when supplied with a numeric value, the table will use tablets. When supplied with something else (like a string "none"), the table will use vnodes. Fixes https://github.com/scylladb/scylladb/issues/22463 Backport to 2025.4, it's important not to delay phasing out vnodes. 
Closes scylladb/scylladb#26836 * github.com:scylladb/scylladb: test,alternator: use 3-rack clusters in tests alternator: improve error in tablets_mode_for_new_keyspaces=enforced config: make tablets_mode_for_new_keyspaces live-updatable alternator: improve comment about non-hidden system tags alternator: Fix test_ttl_expiration_streams() alternator: Fix test_scan_paging_missing_limit() alternator: Don't require vnodes for TTL tests alternator: Remove obsolete test from test_table.py alternator: Fix tag name to request vnodes alternator: Fix test name clash in test_tablets.py alternator: test_tablets.py handles new policy reg. tablets alternator: Update doc regarding tablets support alternator: Support `tablets_mode_for_new_keyspaces` config flag Fix incorrect hint for tablets_mode_for_new_keyspaces Fix comment for tablets_mode_for_new_keyspaces	2025-11-11 09:45:29 +02:00
Nikos Dragazis	94c4f651ca	test/cqlpy: Test secondary index with short reads Add a test to check that paged secondary index queries behave correctly when pages are short. This is currently failing in Scylla, but passes in Cassandra 5, therefore marked as "xfailing". Refer to the test's docstring for more details. The bug is a regression introduced by commit `f6f18b1`. `test/cqlpy/run --release ...` shows that the test passes in 5.1 but fails in 5.2 onwards. Refs #25839. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#25843	2025-11-11 09:28:45 +02:00
Pavel Emelyanov	decf86b146	Merge 'Make AWS & Azure KMS boost testing use fixture + include Azure in pytests' from Calle Wilund * Adds test fixture for AWS KMS * Adds test fixture for Azure KMS * Adds key provider proxy for Azure to pytests (ported dtests) * Make test gather for boost tests handle suites * Fix GCP test snafu Fixes #26781 Fixes #26780 Fixes #26776 Fixes #26775 Closes scylladb/scylladb#26785 * github.com:scylladb/scylladb: gcp_object_storage_test: Re-enable parallelism. test::pylib: Add azure (mock) testing to EAR matrix test::boost::encryption_at_rest: Remove redundant azure test indent test::boost::encryption_at_rest: Move azure tests to use fixture test::lib: Add azure mock/real server fixture test::pylib::boost: Fix test gather to handle test suites utils::gcp::object_storage: Fix typo in semaphore init test::boost::encryption_at_rest_test: Remove redundant indent test::boost::test_encryption_at_rest: Move to AWS KMS fixture for kms test test::boost::test_encryption_at_rest: Reorder tests and helpers ent::encryption: Make text helper routines take std::string test::pylib::dockerized_service: Handle docker/podman bind error message test::lib::aws_kms_fixture: Add a fixture object to run mock AWS KMS test::lib::gcs_fixture: Only set port if running docker image + more retry	2025-11-10 14:35:05 +03:00
Yauheni Khatsianevich	d3e62b15db	fix(test): minor typo fix, removing redundant param from logging Closes scylladb/scylladb#26901	2025-11-10 08:42:11 +03:00
Nadav Har'El	65ed678109	test,alternator: use 3-rack clusters in tests With tablets enabled, we can't create an Alternator table on a three- node cluster with a single rack, since Scylla refuses RF=3 with just one rack and we get the error: An error occurred (InternalServerError) when calling the CreateTable operation: ... Replication factor 3 exceeds the number of racks (1) in dc datacenter1 So in test/cluster/test_alternator.py we need to use the incantation "auto_rack_dc='dc1'" every time that we create a three-node cluster. Before this patch, several tests in test/cluster/test_alternator.py failed on this error, with this patch all of them pass. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-11-09 12:52:29 +02:00
Nadav Har'El	c03081eb12	alternator: improve error in tablets_mode_for_new_keyspaces=enforced When in tablets_mode_for_new_keyspaces=enforced mode, Alternator is supposed to fail when CreateTable asks explicitly for vnodes. Before this patch, this error was an ugly "Internal Server Error" (an exception thrown from deep inside the implementation). This patch checks for this case in the right place, to generate a proper ValidationException with a proper error message. We also enable the test test_tablets_tag_vs_config which should have caught this error, but didn't because it was marked xfail because tablets_mode_for_new_keyspaces had not been live-updatable. Now that it is, we can enable the test. I also improved the test to be slightly faster (no need to change the configuration so many times) and also check the ordinary case - where the schema chooses neither vnodes nor tablets explicitly and we should just use the default. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-11-09 12:52:29 +02:00
Piotr Szymaniak	eeb3a40afb	alternator: Fix test_ttl_expiration_streams() The test is now aware of the new name of the `system:initial_tablets` tag.	2025-11-09 12:52:29 +02:00
Piotr Szymaniak	a659698c6d	alternator: Fix test_scan_paging_missing_limit() With tablets, the test began failing. The failure was correlated with the number of initial tablets, which when kept at default, equals 4 tablets per shard in release build and 2 tablets per shard in dev build. In this patch we split the test into two - one with more data in the table to check the original purpose of this test - that Scan doesn't return the entire table in one page if "Limit" is missing. The other test reproduces issue #10327 - that when the table is small, Scan's page size isn't strictly limited to 1MB as it is in DynamoDB. Experimentally, 8000 KB of data (compared to 6000 KB before this patch) is enough when we have up to 4 initial tablets per shard (so 8 initial tablets on a two-shard node as we typically run in tests). Original patch by Piotr Szymaniak <piotr.szymaniak@scylladb.com> modified by Nadav Har'El <nyh@scylladb.com>	2025-11-09 12:52:29 +02:00
Piotr Szymaniak	345747775b	alternator: Don't require vnodes for TTL tests Since #23662 Alternator supports TTL with tablets too. Let's clear some leftovers causing Alternator to test TTL with vnodes instead of with what is default for Alternator (tablets or vnodes).	2025-11-09 12:52:29 +02:00
Piotr Szymaniak	274d0b6d62	alternator: Remove obsolete test from test_table.py Since Alternator is capable of running with tablets according to the flag in config, remove the obsolete test that is making sure that Alternator runs with vnodes.	2025-11-09 12:52:29 +02:00
Piotr Szymaniak	63897370cb	alternator: Fix tag name to request vnodes The tag was recently renamed from `experimental:initial_tablets` to `system:initial_tablets`. This commit fixes both the tests as well as the exceptions sent to the user instructing how to create table with vnodes.	2025-11-09 12:52:29 +02:00
Piotr Szymaniak	c7de7e76f4	alternator: Fix test name clash in test_tablets.py	2025-11-09 12:52:28 +02:00
Piotr Szymaniak	7466325028	alternator: test_tablets.py handles new policy reg. tablets Adjust the tests so they are in line with the config flag `tablets_mode_for_new_keyspaces` that Alternator learned to honour.	2025-11-09 12:52:28 +02:00
Botond Dénes	cdba3bebda	Merge 'Generalize directory checks in database_test's snapshot test cases' from Pavel Emelyanov Those test cases use lister::scan_dir() to validate the contents of snapshot directory of a table against this table's base directory. This PR generalizes the listing code making it shorter. Also, the snapshot_skip_flush_works case is missing the check for "schema.cql" file. Nothing is wrong with it, but the test is more accurate if checking it. Also, the snapshot_with_quarantine_works case tries to check if one set of names is sub-set of another using lengthy code. Using std::includes improves the test readability a lot. Also, the PR replaces lister::scan_dir() with directory_lister. The former is going to be removed some day (see also #26586) Improving existing working test, no backport is needed. Closes scylladb/scylladb#26693 * github.com:scylladb/scylladb: database_test: Simplify snapshot_with_quarantine_works() test database_test: Improve snapshot_skip_flush_works test database_test: Simplify snapshot_works() tests database_test: Use collect_files() to remove files database_test: Use collect_files() to count files in directory database_test: Introduce collect_files() helper	2025-11-07 16:04:02 +02:00
Michał Chojnowski	b82c2aec96	sstables/trie: fix an assertion violation in bti_partition_index_writer_impl::write_last_key _last_key is a multi-fragment buffer. Some prefix of _last_key (up to _last_key_mismatch) is unneeded because it's already a part of the trie. Some suffix of _last_key (after needed_prefix) is unneeded because _last_key can be differentiated from its neighbors even without it. The job of write_last_key() is to find the middle fragments, (containing the range `[_last_key_mismatch, needed_prefix)`) trim the first and last of the middle fragments appropriately, and feed them to the trie writer. But there's an error in the current logic, in the case where `_last_key_mismatch` falls on a fragment boundary. To describe it with an example, if the key is fragmented like `aaa\|bbb\|ccc`, `_last_key_mismatch == 3`, and `needed_prefix == 7`, then the intended output to the trie writer is `bbb\|c`, but the actual output is `\|bbb\|c`. (I.e. the first fragment is empty). Technically the trie writer could handle empty fragments, but it has an assertion against them, because they are a questionable thing. Fix that. We also extend bti_index_test so that it's able to hit the assert violation (before the patch). The reason why it wasn't able to do that before the patch is that the violation requires decorated keys to differ on the _first_ byte of a partition key column, but the keys generated by the test only differed on the last byte of the column. (Because the test was using sequential integers to make the values more human-readable during debugging). So we modify the key generation to use random values that can differ on any position. Fixes scylladb/scylladb#26819 Closes scylladb/scylladb#26839	2025-11-07 11:25:07 +02:00
Abhinav Jha	ab0e0eab90	raft topology: skip non-idempotent steps in decommission path to avoid problems during races In the present scenario, there are issues in left_token_ring transition state execution in the decommissioning path. In case of concurrent mutation race conditions, we enter left_token_ring more than once, and if we enter left_token_ring a second time, we try to barrier the decommissioned node, which at this point is no longer possible. That's what causes the errors. This PR resolves the issue by adding a check right at the start of left_token_ring to check if the first topology state update, which marks the request as done, is completed. In this case, it's confirmed that this is the second time the flow is entering left_token_ring and the steps preceding the request status update should be skipped. In such cases, the remaining steps are skipped and the topology node status update (which threw an error in the previous trial) is executed directly. Node removal status from group0 is also checked and the remove operation is retried if it failed last time. Although these changes are done with regard to the decommission operation behavior in the `left_token_ring` transition state, since the PR doesn't interfere with the core logic, it should not derail any rollback-specific logic. The changes just prevent some non-idempotent operations from recurring in case of failures. The rest of the core logic remains intact. A test is also added to confirm the proper working of the same. Fixes: scylladb/scylladb#20865 Backport is not needed, since this is not a super critical bug fix. Closes scylladb/scylladb#26717	2025-11-07 10:07:49 +01:00
Asias He	dbeca7c14d	repair: Add metric for time spent on tablet repair It is useful to check time spent on tablet repair. It can be used to compare incremental repair and non-incremental repair. The time does not include the time waiting for the tablet scheduler to schedule the tablet repair task. Fixes #26505 Closes scylladb/scylladb#26502	2025-11-06 10:00:20 +03:00
Calle Wilund	b0061e8c6a	gcp_object_storage_test: Re-enable parallelism. Re-enable parallel execution to get better logs. Note, this is somewhat wasteful, as we won't re-use test fixture here, but in the end, it is probably an improvement.	2025-11-05 15:07:26 +00:00
Wojciech Mitros	0a22ac3c9e	mv: don't mark the view as built if the reader produced no partitions When we build a materialized view we read the entire base table from start to end to generate all required view updates. If a view is created while another view is being built on the same base table, this is optimized - we start generating view updates for the new view from the base table rows that we're currently reading, and we read the missed initial range again after the previous view finishes building. The view building progress is only updated after generating view updates for some read partitions. However, there are scenarios where we'll generate no view updates for the entire read range. If this was not handled we could end up in an infinite view building loop like we did in https://github.com/scylladb/scylladb/issues/17293 To handle this, we mark the view as built if the reader generated no partitions. However, this is not always the correct conclusion. Another scenario where the reader won't encounter any partitions is when view building is interrupted, and then we perform a reshard. In this scenario, we set the reader for all shards to the last unbuilt token for an existing partition before the reshard. However, this partition may not exist on a shard after reshard, and if there are also no partitions with higher tokens, the reader will generate no partitions even though it hasn't finished view building. Additionally, we already have a check that prevents infinite view building loops without taking the partitions generated by the reader into account. At the end of stream, before looping back to the start, we advance current_key to the end of the built range and check for built views in that range. This handles the case where the entire range is empty - the conditions for a built view are: 1. the "next_token" is no greater than "first_token" (the view building process looped back, so we've built all tokens above "first_token") 2. the "current_token" is no less than "first_token" (after looping back, we've built all tokens below "first_token") If the range is empty, we'll pass these conditions on an empty range after advancing "current_key" to the end because: 1. after looping back, "next_token" will be set to `dht::minimum_token` 2. "current_key" will be set to `dht::ring_position::max()` In this patch we remove the check for partitions generated by the reader. This fixes the issue with resharding and it does not resurrect the issue with infinite view building that the check was introduced for. Fixes https://github.com/scylladb/scylladb/issues/26523 Closes scylladb/scylladb#26635	2025-11-05 17:02:32 +02:00
Nadav Har'El	8a07b41ae4	test/cqlpy: add test confirming page_size=0 disables paging In pull request #26384 a discussion started whether page_size=0 really disables paging, or maybe one needs page_size=-1 to truly disable paging. The reason for that discussion was commit `08c81427b` that started to use page_size=-1 for internal unpaged queries, and commit `76b31a3` that incorrectly claimed that page_size>=0 means paging is enabled. This patch introduces a test that confirms that with page_size=0, paging is truly disabled - including the size-based (1MB) paging. The new test is Scylla-only, because Cassandra is anyway missing the size-based page cutoff (see CASSANDRA-11745). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#26742	2025-11-05 15:52:16 +03:00
Tomasz Grabiec	f8879d797d	tablet_allocator: Avoid load balancer failure when replacing the last node in a rack Introduced in `9ebdeb2` The problem is specific to node replacing and rack-list RF. The culprit is in the part of the load balancer which determines rack's shard count. If we're replacing the last node, the rack will contain no normal nodes, and shards_per_rack will have no entry for the rack, on which the table still has replicas. This throws std::out_of_range and fails the tablet draining stage, and node replace is failed. No backport because the problem exists only on master. Fixes #26768 Closes scylladb/scylladb#26783	2025-11-05 15:49:51 +03:00
Pavel Emelyanov	05d711f221	database_test: Simplify snapshot_with_quarantine_works() test The test collects Data files from table dir, then _all_ files from snapshot dir and then checks whether the former is the subset of the latter. Using std::includes over two sets makes the code much shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-11-05 15:35:28 +03:00
Pavel Emelyanov	c8492b3562	database_test: Improve snapshot_skip_flush_works test It has two inaccuracies. First, when checking the contents of table directory, it uses pre-populated expected list with "manifest.json" in it. Weird. Second, when checking the contents of snapshot directory it doesn't check if the "schema.cql" is there. It's always there, but if something breaks in the future it may go unnoticed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-11-05 15:35:26 +03:00
Pavel Emelyanov	5a25d74b12	database_test: Simplify snapshot_works() tests No functional changes here, just make use of the new lister to shorten the code. A small side effect -- if the test fails because contents of directories changes, it will print the exact difference in logs, not just that N files are missing/present. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-11-05 15:34:25 +03:00
Pavel Emelyanov	365044cdbb	database_test: Use collect_files() to remove files Some test cases remove files from table directory to perform some checks over the taken snapshots. Using collect_files() helper makes the code easier to read. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-11-05 15:34:24 +03:00
Pavel Emelyanov	e1f326d133	database_test: Use collect_files() to count files in directory Some test cases want to see that there are more than one file in a directory, so they can just re-use the new helper. Much shorter this way. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-11-05 15:32:58 +03:00
Pavel Emelyanov	60d1f78239	database_test: Introduce collect_files() helper It returns a set of files in a given directory. Will be used by all next patches. Implemented using directory_lister, not lister::scan_dir in order to help removing the latter one in the future. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-11-05 15:32:58 +03:00
Calle Wilund	6c6105e72e	test::pylib: Add azure (mock) testing to EAR matrix Fixes #26782 Adds a provider proxy for azure, using the existing mock server, now as a fixture.	2025-11-05 10:22:23 +00:00
Calle Wilund	b8a6b6dba9	test::boost::encryption_at_rest: Remove redundant azure test indent	2025-11-05 10:22:23 +00:00
Calle Wilund	10e591bd6b	test::boost::encryption_at_rest: Move azure tests to use fixture Fixes #26781 Makes the test independent of wrapping scripts. Note: retains the split into "real" and "mock" tests. For other tests, we either all mock, or allow the environment to select mock or real. Here we have them combined. More expensive, but otoh more thorough.	2025-11-05 10:22:22 +00:00
Calle Wilund	1d37873cba	test::lib: Add azure mock/real server fixture Wraps the real/mock azure server for test in a fixture. Note: retains the current test setup which explicitly runs some tests with "real" azure, if avail, and some always mock.	2025-11-05 10:22:22 +00:00
Calle Wilund	10041419dc	test::pylib::boost: Fix test gather to handle test suites Fixes #26775	2025-11-05 10:22:22 +00:00
Calle Wilund	2edf6cf325	test::boost::encryption_at_rest_test: Remove redundant indent Removed empty scope and reindents kms test using fixtures.	2025-11-05 10:22:22 +00:00
Calle Wilund	286a655bc0	test::boost::test_encryption_at_rest: Move to AWS KMS fixture for kms test Fixes #26780 Uses fake/real CI endpoint for AWS KMS tests, and moves these into a suite for sharing the mock server.	2025-11-05 10:22:22 +00:00
Calle Wilund	a1cc866f35	test::boost::test_encryption_at_rest: Reorder tests and helpers No code changes. Just reorders code to organize more by provider etc, prepping for fixtures and test suites.	2025-11-05 10:22:22 +00:00
Calle Wilund	1b0394762e	test::pylib::dockerized_service: Handle docker/podman bind error message If we run non-dbuild, docker/podman can/will cause first bind error, we should check these too.	2025-11-05 10:22:22 +00:00
Calle Wilund	0842b2ae55	test::lib::aws_kms_fixture: Add a fixture object to run mock AWS KMS Runs local-kms mock AWS KMS server unless overridden by env var. Allows tests to use real or fake AWS KMS endpoint and shared fixture for quicker execution.	2025-11-05 10:22:21 +00:00
Calle Wilund	98c060232e	test::lib::gcs_fixture: Only set port if running docker image + more retry Our connect can spuriously fail. Just retry.	2025-11-05 10:22:21 +00:00
Wojciech Mitros	977fa91e3d	view_building_coordinator: rollback tasks on the leaving tablet replica When a tablet migration is started, we abort the corresponding view building tasks (i.e. we change the state of those tasks to "ABORTED"). However, we don't change the host and shard of these tasks until the migration successfully completes. When for some reason we have to rollback the migration, that means the migration didn't finish and the aborted task still has the host and shard of the migration source. So when we recreate tasks that should no longer be aborted due to a rolled-back migration, we should look at the aborted tasks of the source (leaving) replica. But we don't do it and we look at the aborted tasks of the target replica. In this patch we adjust the rollback mechanism to recreate tasks for the migration source instead of destination. We also fix the test that should have detected this issue - the injection that the test was using didn't make us rollback, but we simply retried a stage of the tablet migration. By using one_shot=False and adding a second injection, we can now guarantee that the migration will eventually fail and we'll continue to the 'cleanup_target' and 'revert_migration' stages. Fixes https://github.com/scylladb/scylladb/issues/26691 Closes scylladb/scylladb#26825	2025-11-05 10:44:06 +01:00
Pavel Emelyanov	59019bc9a9	Merge 'Alternator: allow warning on auth errors before enabling enforcement' from Nadav Har'El An Alternator user was recently "bit" when switching `alternator_enforce_authorization` from "false" to "true": After the configuration change, all application requests suddenly failed because unbeknownst to the user, their application used incorrect secret keys. This series introduces a solution for users who want to safely switch `alternator_enforce_authorization` from "false" to "true": Before switching from "false" to "true", the user can temporarily switch a new option, `alternator_warn_authorization`, to true. In this "warn" mode, authentication and authorization errors are counted in metrics (`scylla_alternator_authentication_failures` and `scylla_alternator_authorization_failures`) and logged as WARNings, but the user's application continues to work. The user can use these metrics or log messages to learn of errors in their application's setup, fix them, and only do the switch of `alternator_enforce_authorization` when the metrics or log messages show there are no more errors. The first patch is the implementation of the feature - the new configuration option, the metrics and the log messages, the second patch is a test for the new feature, and the third patch is documentation recommending how to use the warn mode and the associated metrics or log messages to safely switch `alternator_enforce_authorization` from false to true. Fixes #25308 This is a feature that users need, so it should probably be backported to live branches. Closes scylladb/scylladb#25457 * github.com:scylladb/scylladb: docs/alternator: explain alternator_warn_authorization test/alternator: tests for new auth failure metrics and log messages alternator: add alternator_warn_authorization config	2025-11-05 10:45:17 +03:00
Pavel Emelyanov	fc37518aff	test: Check file existence directly There's a test that checks if temporary-statistics file is gone at some point. It does it by listing the directory it expects the file to be in and then comparing the names met with the temp. stat. file name. It looks like a single file_exists() call is enough for that purpose. As a "sanity" check this patch adds a validation that non-temporary statistics file is there, all the more so this file is removed after the test. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26743	2025-11-04 19:37:55 +01:00
Avi Kivity	95700c5f7f	Merge 'Support counters with tablets' from Michael Litvak Support the counters feature in tablets keyspaces. The main change is to fix the counter update during tablets intranode migration. Counter cell is c = map<host_id, value>. A counter update is applied by doing read-modify-write on a leader replica to retrieve the current host's counter value and transform the mutation to contain the updated value for the host, then apply the mutation and replicate it to other hosts. the read-modify-write is protected against concurrent updates by locking the counter cell. When the counter is migrated between two shards, it's not enough to lock the counter on the read shard, because in the stage write_both_read_new the read shard is switched, and then we can have concurrent updates reach either the old or the new shard. In order to keep the counter update exclusive we lock both shards when in the stage write_both_read_new. Also, when applying the transformed mutation we need to respect write_both stages and apply the mutation on both shards. We change it to use `apply_on_shards` similarly to other methods in storage proxy. The change applies to both tablets and vnodes, they use the same implementation, but for vnodes the behavior should remain equivalent up to some small reordering of the code since it doesn't have intranode migration and reduces to single read shard = write shard. 
Fixes https://github.com/scylladb/scylladb/issues/18180 no backport - new feature Closes scylladb/scylladb#26636 * github.com:scylladb/scylladb: docs: counters now work with tablets pgo: enable counters with tablets test: enable counters tests with tablets test: add counters with tablets test cql3: remove warning when creating keyspace with tablets cql3: allow counters with tablets storage_proxy: lock all read shards for counter update storage_proxy: apply counter mutation on all write shards storage_proxy: move counter update coordination to storage proxy storage_proxy: refactor mutate_counter_on_leader replica/db: add counter update guard replica/db: split counter update helper functions	2025-11-03 22:28:10 +01:00
Raphael S. Carvalho	7f34366b9d	sstables_loader: Don't bypass synchronization with busy topology The patch `c543059f86` fixed the synchronization issue between tablet split and load-and-stream. The synchronization worked only with raft topology, and therefore was disabled with gossip. The check used storage_service::raft_topology_change_enabled(), but the topology kind is only available/set on shard 0, so it caused the synchronization to be bypassed when load-and-stream runs on any shard other than 0. The reason the reproducer didn't catch it is that it was restricted to single cpu. It will now run with multi cpu and catch the problem observed. Fixes #22707 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#26730	2025-11-03 18:10:08 +01:00
Michael Litvak	1dbf53ca29	test: enable counters tests with tablets Enable all counters-related tests that were disabled for tablets because counters was not supported with tablets until now. Some tests were parametrized to run with both vnodes and tablets, and the tablets case was skipped, in order to not lose coverage. We change them to run with the default configuration since now counters is supported with both vnodes and tablets, and the implementation is the same, so there is no benefit in running them with both configurations.	2025-11-03 16:04:37 +01:00
Michael Litvak	a6c12ed1ef	test: add counters with tablets test add a new test for counters with tablets to test things that are specific to tablets. test counter updates that are concurrent with tablet internode and intranode migrations and verify it remains consistent and no updates are lost.	2025-11-03 16:04:37 +01:00
Michael Litvak	60ac13d75d	cql3: remove warning when creating keyspace with tablets When creating a keyspace with tablets, a warning is shown with all the unsupported features for tablets, which is only counters currently. Now that counters is also supported with tablets, we can remove this warning entirely.	2025-11-03 16:04:37 +01:00
Tomasz Grabiec	e878042987	Revert "Revert "tests(lwt): new test for LWT testing during tablet resize"" This reverts commit `6cb14c7793`. The issue causing the previous revert was fixed in `88765f627a`.	2025-11-03 10:38:00 +01:00
Avi Kivity	7a72155374	Merge 'Introduce nodetool excludenode' from Tomasz Grabiec If a node is dead and cannot be brought back, tablet migrations are stuck, until the node is explicitly marked as "permanently dead" / "ignored node" / "excluded" (name differs in different contexts). Currently, this is done during removenode and replace operations but it should be possible to only mark the node as dead, for the purpose of unblocking migrations or other topology operations, without doing the actual removenode, because full removal might be currently impossible, or not desirable due to lack of capacity or priorities. This patch introduces this kind of API: ``` nodetool excludenode <host-id> [ ... <host-id> ] ``` Having this kind of API is an improvement in user experience in several cases. For example, when we lose a rack, the only viable option for recovery is to run removenode with an extra --ignore-dead-nodes option. This removenode will fail in the tablet draining phase, as there is no live node in the rack to rebuild replicas in. This is confusing to the operator. But necessary before ALTER KEYSPACE can proceed in order to change replication options to drop the rack from RF. Having this API allows operators to have more unified procedures, where "nodetool excludenode" is always the first step of recovery, which unblocks further topology operations, both those which restore capacity, but also auto-scaling, tablet split/merge, load balancing, etc. Fixes #21281 The PR also changes "nodetool status" to show excluded nodes, they have 'X' in their status instead of 'D'. Closes scylladb/scylladb#26659 * github.com:scylladb/scylladb: nodetool: status: Show excluded nodes as having status 'X' test: py: Test scenario involving excludenode API nodetool: Introduce excludenode command	2025-10-31 22:14:57 +02:00
Avi Kivity	d458dd41c6	Merge 'Avoid input_/output_stream-s default initialization and move-assignment' from Pavel Emelyanov Recent seastar update deprecated in/out streams usage pattern when a stream is default constructed early and then move-assigned with the proper one (see scylladb/seastar#3051). This PR fixes a few places in Scylla that still use one. Adopting newer seastar API, no need to backport Closes scylladb/scylladb#26747 * github.com:scylladb/scylladb: commitlog: Remove unused work::r stream variable ec2_snitch: Fix indentation after previous patch ec2_snitch: Coroutinize the aws_api_call_once() sstable: Construct output_stream for data instantly test: Don't reuse on-stack input stream	2025-10-31 21:22:41 +02:00
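
The fragment-boundary case described in commit b82c2aec96 (sstables/trie) can be illustrated with a small Python sketch. This is not the ScyllaDB implementation, which is C++; the function name `middle_fragments` and its parameters are hypothetical, chosen only to mirror the commit message's `[_last_key_mismatch, needed_prefix)` range. It shows one way to cut a fragmented key down to a byte range without ever emitting an empty fragment.

```python
def middle_fragments(fragments, start, end):
    """Return the pieces of `fragments` that cover the byte range [start, end).

    `fragments` is a list of bytes objects that together form one key.
    Fragments that contribute no bytes to the range are skipped, so the
    result never contains an empty piece (hypothetical helper, for
    illustration only).
    """
    out = []
    offset = 0  # absolute position of the current fragment's first byte
    for frag in fragments:
        lo = max(start, offset)
        hi = min(end, offset + len(frag))
        if lo < hi:  # this fragment overlaps the requested range
            out.append(frag[lo - offset:hi - offset])
        offset += len(frag)
    return out


# The example from the commit message: key `aaa|bbb|ccc`,
# _last_key_mismatch == 3, needed_prefix == 7.
print(middle_fragments([b"aaa", b"bbb", b"ccc"], 3, 7))  # [b'bbb', b'c']
```

Because the overlap test is strict (`lo < hi`), a range that starts exactly on a fragment boundary does not produce a leading empty fragment, which is the situation the commit message says tripped the trie writer's assertion.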

10036 Commits