cql, schema: Extend name length limit from 48 to 192 bytes
This commit increases the maximum length of names for keyspaces, tables, materialized views, and indexes from 48 to 192 bytes.
The previous 48-byte limit was inherited from Cassandra 3 for compatibility. However, this validation was removed in Cassandra 4 and 5 (see CASSANDRA-20389),
and some usage scenarios (such as some feature store workflows generating long table names) now depend on this relaxed constraint.
This change brings ScyllaDB's behavior in line with modern Cassandra versions and better supports these use cases.
The new limit of 192 bytes is derived from underlying filesystem limitations to prevent runtime errors when creating directories for table data.
When a new table is created, ScyllaDB generates a directory for its SSTables. The directory name is constructed from the table name, a dash, and a 32-character UUID.
For a CDC-enabled table, an associated log table is also created, which has the suffix `_scylla_cdc_log` appended to its name.
The directory name for this log table becomes the longest possible representation.
Additionally, we reserve 15 bytes for future use, allowing for potential future extensions without breaking existing schemas.
To guarantee that directory creation never fails due to exceeding filesystem name limits, the maximum name length is calculated as follows:
255 bytes (common filesystem limit for a path component)
- 32 bytes (for the 32-character UUID string)
- 1 byte (for the '-' separator)
- 15 bytes (for the '_scylla_cdc_log' suffix)
- 15 bytes (reserved for future use)
----------
= 192 bytes (Maximum allowed name length)
This calculation is similar in principle to the one proposed for Cassandra to fix related directory creation failures (see apache/cassandra/pull/4038).
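For illustration, here is the same arithmetic as a minimal C++ sketch (the constant names are hypothetical, not identifiers from the source tree):
```
#include <cstddef>

// Hypothetical constants illustrating the calculation above.
constexpr size_t filesystem_component_limit = 255; // common max bytes per path component
constexpr size_t uuid_string_length = 32;          // 32-character UUID in the directory name
constexpr size_t separator_length = 1;             // the '-' between name and UUID
constexpr size_t cdc_log_suffix_length = 15;       // "_scylla_cdc_log"
constexpr size_t reserved_for_future = 15;

constexpr size_t max_name_length = filesystem_component_limit
    - uuid_string_length - separator_length
    - cdc_log_suffix_length - reserved_for_future;

static_assert(max_name_length == 192);
```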
This patch also updates/adds all associated tests to validate the new 192-byte limit.
The documentation has been updated accordingly.
Fixes #4480
Backport 2025.2: The significantly shorter maximum table name length in Scylla compared to Cassandra is becoming a more common issue for users in the latest release.
Closes scylladb/scylladb#24500
* github.com:scylladb/scylladb:
cql, schema: Extend name length limit from 48 to 192 bytes
replica: Remove unused keyspace::init_storage()
`dirty_memory_manager` tracks two quantities about memtable memory usage:
"real" and "unspooled" memory usage.
"real" is the total memory usage (sum of `occupancy().total_space()`)
by all memtable LSA regions, plus an upper-bound estimate of the size of
memtable data which has already moved to the cache region but isn't
evictable (merged into the cache) yet.
"unspooled" is the difference between total memory usage by all memtable
LSA regions, and the total flushed memory (sum of `_flushed_memory`)
of memtables.
`dirty_memory_manager` controls the shares of compaction and/or blocks
writes when these quantities cross various thresholds.
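To make the relationship concrete, here is a minimal sketch of the two quantities with simplified, made-up names (an illustration of the description above, not the actual `dirty_memory_manager` code):
```
#include <cstddef>

// Simplified, hypothetical model of the accounting described above.
struct memtable_dirty_accounting {
    size_t lsa_total_space = 0;  // sum of occupancy().total_space() over memtable LSA regions
    size_t pending_in_cache = 0; // upper-bound estimate of data moved to cache but not yet evictable
    size_t flushed_memory = 0;   // sum of _flushed_memory estimates from flush readers

    size_t real() const { return lsa_total_space + pending_in_cache; }
    // Underflows (wraps around) if flushed_memory ever exceeds lsa_total_space,
    // which is exactly the invariant violation discussed below.
    size_t unspooled() const { return lsa_total_space - flushed_memory; }
};
```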
"Total flushed memory" isn't a well defined notion,
since the actual consumption of memory by the same data can vary over
time due to LSA compactions, and even the data present in memtable can
change over the course of the flush due to removals of outdated MVCC versions.
So `_flushed_memory` is merely an approximation computed by `flush_reader`
based on the data passing through it.
This approximation is supposed to be a conservative lower bound.
In particular, `_flushed_memory` should not be greater than
`occupancy().total_space()`. Otherwise, for example, "unspooled" memory
could become negative (and/or wrap around) and weird things could happen.
There is an assertion in `~flush_memory_accounter` which checks that
`_flushed_memory < occupancy().total_space()` at the end of flush.
But it can fail. Without additional treatment, the memtable reader sometimes emits
data which is already deleted. (In particular, it emits rows covered by
a partition tombstone in a newer MVCC version.)
This data is seen by `flush_reader` and accounted in `_flushed_memory`.
But this data can be garbage-collected by the `mutation_cleaner` later during the
flush and decrease `total_memory` below `_flushed_memory`.
There is a piece of code in `mutation_cleaner` intended to prevent that.
If `total_memory` decreases during a `mutation_cleaner` run,
`_flushed_memory` is lowered by the same amount, just to preserve the
asserted property. (This could also make `_flushed_memory` quite inaccurate,
but that's considered acceptable).
But that only works if `total_memory` is decreased during that run. It doesn't
work if the `total_memory` decrease (enabled by the new allocator holes made
by `mutation_cleaner`'s garbage collection work) happens asynchronously
(due to memory reclaim for whatever reason) after the run.
This patch fixes that by tracking the decreases of `total_memory` closer to the
source. Instead of relying on `mutation_cleaner` to notify the memtable if it
lowers `total_memory`, the memtable itself listens for notifications about
LSA segment deallocations. It keeps `_flushed_memory` equal to the reader's
estimate of flushed memory decreased by the change in `total_memory` since the
beginning of flush (if it was positive), and it keeps the amount of "spooled"
memory reported to the `dirty_memory_manager` at `max(0, _flushed_memory)`.
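In rough code terms, the rule reads like this (a sketch with illustrative names, not the patch itself):
```
#include <algorithm>
#include <cstdint>

// Illustrative names only; sketches the rule described above.
int64_t flushed_memory(int64_t reader_estimate,      // flush_reader's running estimate
                       int64_t total_at_flush_start, // total_memory when the flush began
                       int64_t current_total) {      // updated on LSA segment deallocations
    int64_t decrease = std::max<int64_t>(0, total_at_flush_start - current_total);
    return reader_estimate - decrease;               // may go negative internally
}

int64_t reported_to_dirty_memory_manager(int64_t flushed) {
    return std::max<int64_t>(0, flushed);            // never reported as negative
}
```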
Fixes scylladb/scylladb#21413
Backport candidate because it fixes a crash that can happen in existing stable branches.
Closes scylladb/scylladb#21638
* github.com:scylladb/scylladb:
memtable: ensure _flushed_memory doesn't grow above total memory usage
replica/memtable: move region_listener handlers from dirty_memory_manager to memtable
The memtable wants to listen for changes in its `total_memory` in order
to decrease its `_flushed_memory` in case some of the freed memory has already
been accounted as flushed. (This can happen because the flush reader sees
and accounts for even outdated MVCC versions, which can be deleted and freed
during the flush).
Today, the memtable doesn't listen to those changes directly. Instead,
some calls which can affect `total_memory` (in particular, the mutation cleaner)
manually check the value of `total_memory` before and after they run, and they
pass the difference to the memtable.
But that's not good enough, because `total_memory` can also change outside
of those manually-checked calls -- for example, during LSA compaction, which
can occur anytime. This makes memtable's accounting inaccurate and can lead
to unexpected states.
But we already have an interface for listening to `total_memory` changes
actively, and `dirty_memory_manager`, which also needs to know it,
does just that. So what happens e.g. when `mutation_cleaner` runs
is that `mutation_cleaner` checks the value of `total_memory` before it runs,
then it runs, causing several changes to `total_memory` which are picked up
by `dirty_memory_manager`, then `mutation_cleaner` checks the end value of
`total_memory` and passes the difference to `memtable`, which corrects
whatever was observed by `dirty_memory_manager`.
To allow memtable to modify its `_flushed_memory` correctly, we need
to make `memtable` itself a `region_listener`. Also, instead of
the situation where `dirty_memory_manager` receives `total_memory`
change notifications from `logalloc` directly, and `memtable` fixes
the manager's state later, we want only the memtable to listen
for the notifications and pass them, already modified accordingly,
to the manager, so there are no intermediate wrong states.
This patch moves the `region_listener` callbacks from the
`dirty_memory_manager` to the `memtable`. It's not intended to be
a functional change, just a source code refactoring.
The next patch will be a functional change enabled by this.
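Conceptually, after the refactoring the notification path looks roughly like this (a sketch; the interface shape and method names are assumptions, not the actual `region_listener` declarations):
```
#include <cstdint>

// Hypothetical sketch of the notification order described above.
struct listener {
    virtual ~listener() = default;
    virtual void on_usage_change(int64_t delta) = 0;
};

struct memtable_listener : listener {
    listener& manager; // the dirty_memory_manager's side
    explicit memtable_listener(listener& m) : manager(m) {}
    void on_usage_change(int64_t delta) override {
        adjust_flushed_memory(delta);   // memtable corrects its own state first...
        manager.on_usage_change(delta); // ...so the manager only sees a consistent view
    }
    void adjust_flushed_memory(int64_t) { /* the functional change comes in the next patch */ }
};
```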
The `drain` method cancels all running compactions and moves the
compaction manager into the disabled state. To move it back to
the enabled state, the `enable` method shall be called.
This, however, throws an assertion error, as the submission timer is
not cancelled and re-enabling the manager tries to arm the already-armed timer.
Thus, cancel the timer when calling the drain method to disable
the compaction manager.
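A minimal sketch of the shape of the fix (names are illustrative, not the actual `compaction_manager` members):
```
// Hypothetical sketch; illustrates the ordering problem described above.
struct timer_like {
    bool armed = false;
    void arm() { /* the real timer asserts it isn't already armed */ armed = true; }
    void cancel() { armed = false; }
};

struct manager {
    timer_like submission_timer;
    bool enabled = true;
    void drain() {
        submission_timer.cancel(); // the fix: disarm here, so enable() can re-arm
        enabled = false;           // cancel compactions, move to the disabled state
    }
    void enable() {
        enabled = true;
        submission_timer.arm();    // would assert if the timer were still armed
    }
};
```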
Fixes https://github.com/scylladb/scylladb/issues/24504
All versions are affected. So it's a good candidate for a backport.
Closes scylladb/scylladb#24505
As test/cqlpy/README.md explains, the way to tell the run-cassandra
script which version of Cassandra should be run is through the
"CASSANDRA" variable, for example:
CASSANDRA=$HOME/apache-cassandra-4.1.6/bin/cassandra \
test/cqlpy/run-cassandra test_file.py::test_function
But all the Cassandra scripts, of all versions, have one strange
feature: If you set CASSANDRA_HOME, then instead of running the
actual Cassandra version you tried to run (in this case, 4.1.6), the
script goes and runs the other Cassandra from CASSANDRA_HOME!
This means that if a user happens to have, for some reason, set
CASSANDRA_HOME, then the documented "CASSANDRA" variable doesn't work.
The simple fix is to clear CASSANDRA_HOME in the environment that
run-cassandra passes to Cassandra.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#24546
The file names for the Boost tests do not use run_id, so each consecutive run
overwrites the logs from the previous one. If the first repeat fails and the
second passes, it overwrites the failed log. This PR allows saving the
failed one.
Closes scylladb/scylladb#24580
In f96d30c2b5
we introduced the maintenance service, which is an additional
instance of auth::service. But this service has a somewhat
confusing 2-level startup mechanism: it's initialized with
sharded<Service>::start and then auth::service::start
(a different method with the same name, to confuse things even more).
When maintenance_socket was disabled (default setting), the code
did only the first part of the startup. This registered a config
observer but didn't create a permission_cache instance.
As a result, a crash can occur on SIGHUP when the config is reloaded.
Fixes: https://github.com/scylladb/scylladb/issues/24528
Backport: all non-EOL versions since 6.0 and 2025.1
Closes scylladb/scylladb#24527
* github.com:scylladb/scylladb:
test: add test for live updates of permissions cache config
main: don't start maintenance auth service if not enabled
This PR adds an upgrade test for SSTable compression with shared dictionaries, and adds some bits to pylib and test.py to support that.
In the series, we:
1. Mount `$XDG_CACHE_DIR` into dbuild.
2. Add a pylib function which downloads and installs a released ScyllaDB package into a subdirectory of `$XDG_CACHE_DIR/scylladb/test.py`, and returns the path to `bin/scylla`.
3. Add new methods and params to the cluster manager, which let the test start nodes with historical Scylla executables, and switch executables during the test.
4. Add a test which uses the above to run an upgrade test between the released package and the current build.
5. Add `--run-internet-dependent-tests` to `test.py` which lets the user of `test.py` skip this test (and potentially other internet-dependent tests in the future).
(The patch modifying `wait_for_cql_and_get_hosts` is a part of the new test — the new test needs it to test how particular nodes in a mixed-version cluster react to some CQL queries.)
This is a follow-up to #23025, split into a separate PR because the potential addition of upgrade tests to `test.py` deserved a separate thread.
Needs backport to 2025.2, because that's where the tested feature is introduced.
Fixes #24110
Closes scylladb/scylladb#23538
* github.com:scylladb/scylladb:
test: add test_sstable_compression_dictionaries_upgrade.py
test.py: add --run-internet-dependent-tests
pylib/manager_client: add server_switch_executable
test/pylib: in add_server, give a way to specify the executable and version-specific config
pylib: pass scylla_env environment variables to the topology suite
test/pylib: add get_scylla_2025_1_executable()
pylib/scylla_cluster: give a way to pass executable-specific options to nodes
dbuild: mount "$XDG_CACHE_HOME/scylladb"
The contract in mutation_reader.hh says:
```
// pr needs to be valid until the reader is destroyed or fast_forward_to()
// is called again.
future<> fast_forward_to(const dht::partition_range& pr) {
```
`test_fast_forwarding_combined_reader_is_consistent_with_slicing` violates
this by passing a temporary to `fast_forward_to`.
Fix that.
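A sketch of the shape of the fix (a fragment assuming a reader honoring the quoted contract and interval's `make_open_ended_both_sides()` factory):
```
// Sketch only; assumes a reader type with the quoted fast_forward_to() contract.
template <typename Reader>
void forward_correctly(Reader& reader) {
    // Bug shape: a temporary range would die before the reader is done with it:
    //   reader.fast_forward_to(dht::partition_range::make_open_ended_both_sides());
    // Fix: keep the range alive until the reader is destroyed or forwarded again.
    auto pr = dht::partition_range::make_open_ended_both_sides();
    reader.fast_forward_to(pr).get();
}
```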
Fixes scylladb/scylladb#24542
Closes scylladb/scylladb#24543
Revamped the `range` class to actively manage its state by enforcing validation on all modifications. This prevents overflow and invalid states, and ensures the object size does not exceed the 5 TiB limit in S3. This should address and prevent future problems related to this issue: https://github.com/minio/minio/issues/21333
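A minimal sketch of what such a self-validating range could look like (illustrative only, not the actual S3 client code):
```
#include <cstdint>
#include <stdexcept>

// Hypothetical sketch of a range that validates every modification.
class range {
    uint64_t _offset = 0;
    uint64_t _length = 0;
    static constexpr uint64_t max_s3_object_size = 5ULL << 40; // 5 TiB
    void validate() const {
        if (_length > max_s3_object_size || _offset > max_s3_object_size - _length) {
            throw std::invalid_argument("range exceeds the S3 object size limit");
        }
    }
public:
    range(uint64_t offset, uint64_t length) : _offset(offset), _length(length) { validate(); }
    void advance(uint64_t n) { // consume n bytes from the front, revalidating
        if (n > _length) {
            throw std::invalid_argument("advance past end of range");
        }
        _offset += n;
        _length -= n;
        validate();
    }
    uint64_t offset() const { return _offset; }
    uint64_t length() const { return _length; }
};
```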
No backport needed since this problem relates only to this change: https://github.com/scylladb/scylladb/pull/23880
Closes scylladb/scylladb#24312
* github.com:scylladb/scylladb:
s3_client: headers cleanup
s3_client: Refactor `range` class for state validation
This reverts commit 0b516da95b, reversing
changes made to 30199552ac. It breaks
cluster.random_failures.test_random_failures.test_random_failures
in debug mode (at least).
Fixes #24513
Revamped the `range` class to actively manage its state by enforcing validation on all modifications. This prevents overflow and invalid states, and ensures the object size does not exceed the 5 TiB limit in S3.
With these changes, pytest executes the Boost tests. Metric gathering was added to the pytest BoostFacade and UnitFacade so that metrics can still be collected for C++ tests as before.
Since the boost, raft, unit, and ldap directories are no longer executed by test.py, their suite.yaml files are renamed to test_config.yaml, preserving the old way of test configuration while removing them from execution by test.py.
Pytest executes all modes by itself, so there is a single JUnit report for the C++ tests per run. That means the reports can no longer be output to per-mode folders in testlog. Instead, the testlog/report directory is used to store all kinds of reports generated during tests: JUnit reports go to testlog/report/junit and Allure reports to testlog/report/allure.
**Breaking changes:**
1. Terminal output changed. test.py will run pytest for the following directories: `test/boost`, `test/ldap`, `test/raft`, `test/unit`. `test.py` will pass the pytest output straight through to the terminal. Then, when all these tests are finished, `test.py` will continue to show its previous output for the rest of the tests.
2. The way C++ tests in the directories mentioned above are addressed has changed. A test is now referenced by a plain path to its file, with extension. For example, instead of `boost/aggregate_fcts_test` you now need to use `test/boost/aggregate_fcts_test.cc`.
3. This PR creates a spike in the test count. The previous logic consolidated the Boost results from different runs and different modes into one report, so for three repeats and three modes (nine test results) CI showed a single result. Now it shows nine results, differentiated by mode and run.
**Note:**
Pytest uses the pytest-xdist module to run tests in parallel. The frozen toolchain has this dependency installed; for local use, please install it manually.
Changes for CI: https://github.com/scylladb/scylla-pkg/pull/4949. It will be merged after the current PR is in master. A short disruption is expected until the scylla-pkg PR is merged.
Fixes: https://github.com/scylladb/qa-tasks/issues/1777
Closes scylladb/scylladb#22894
* github.com:scylladb/scylladb:
test.py: clean code that isn't used anymore
test.py: switch off C++ tests from test.py discovery
test.py: Integrate pytest c++ test execution to test.py
Applier fiber needs local storage, so before shutting down local storage we need to make sure that group0 is stopped.
We also improve the logs for the case when `gate_closed_exception` is thrown while a mutation is being written.
Fixes [scylladb/scylladb#24401](https://github.com/scylladb/scylladb/issues/24401)
Backport: no backport -- not safe and the problem is minor.
Closes scylladb/scylladb#24418
* github.com:scylladb/scylladb:
storage_service: test_group0_apply_while_node_is_being_shutdown
main.cc: fix group0 shutdown order
storage_proxy: log gate_closed_exception
We are about to change start() to return a proxy object rather
than a `const interval_bound<T>&`. This is generally transparent,
except in one case: `auto x = i.start()`. With the current implementation,
we'll copy the object referred to and assign it to `x`. With the planned
implementation, the proxy object will be assigned to `x`, but it
will keep referring to `i`.
To prevent such problems, rename start() to start_ref() and end()
to end_ref(). This forces us to audit all calls, and redirect calls
that will break to new start_copy() and end_copy() methods.
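The hazard, sketched with simplified types (the proxy is the planned change, not current code):
```
// Simplified sketch of the auto-deduction hazard described above.
struct bound { int value; };

struct interval {
    bound b;
    const bound& start() const { return b; } // current: reference to the bound
    // planned: a proxy object that keeps referring back into *this
};

void f(const interval& i) {
    auto x = i.start(); // today: x is a copy of the bound (auto deduces `bound`);
                        // with a proxy return type, x would silently become the
                        // proxy and keep referring into i, a lifetime trap
    (void)x;
}
```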
In this series, we will make interval manage its memory directly;
specifically, it will directly construct and destroy the T values that
it contains rather than let std::optional<T> manage those values
itself.
Add tests that expose bugs encountered during development (actually,
review) of this series. The tests pass before the series, fail
with the series as it was before fixing, and pass with the series as
it is now.
The tests use a class maybe_throwing_interval_payload that can
be set to throw at strategic locations and exercise all the interesting
interval shapes.
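A minimal sketch of the general shape of such a payload (an assumption; the real maybe_throwing_interval_payload lives in the test code):
```
#include <stdexcept>

// Hypothetical simplification of the test helper described above: a payload
// that can be told to throw from its copy constructor at strategic points.
struct maybe_throwing_payload {
    static inline bool throw_on_copy = false;
    int value;
    explicit maybe_throwing_payload(int v) : value(v) {}
    maybe_throwing_payload(const maybe_throwing_payload& o) : value(o.value) {
        if (throw_on_copy) {
            throw std::runtime_error("injected copy failure");
        }
    }
};
```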
Copied the entire audit_test.py from scylladb/scylla-dtest, so that the file can be removed from scylla-dtest after this patch series is merged. The motivation is to move all audit testing out of dtests, making it easier to maintain and more reliable.
After audit_test.py was moved from dtests to test.py, some issues that required fixing arose due to differences between the frameworks.
No backport, moving audit_test.py to test.py is a new testing effort.
Closes scylladb/scylladb#24231
* github.com:scylladb/scylladb:
test: audit: filter out LOGIN and USE audit logs
test: audit: remove require mark
test: audit: wait until raft state is applied in test_permissions
test: audit: fix problems in audit_test.py
test: dtest: add dict support to populate in scylla_cluster.py
test: dtest: copied get_node_ip from dtests to scylla_cluster.py
test: dtest: copy run_rest_api from dtests to cluster.py
test: dtest: copy run_in_parallel from dtests to data.py
test: audit: copy unmodified audit_test.py from dtests
Switch off C++ tests from test.py discovery. With this change, test.py loses
the ability to directly see and run the C++ tests. Instead, it delegates
everything to pytest.
Since the boost, raft, unit, and ldap directories aren't executed by test.py,
suite.yaml files are renamed to test_config.yaml
to preserve the old way of test configuration and remove them from execution
by test.py.
Before this patch, Boost tests were visible to both test.py and pytest. So if
test.py was invoked without a test name, it would execute the Boost tests twice:
with the test.py executor and with the pytest executor. Depending on the test
name, the corresponding executor would be used. For example, the test name
test/boost/aggregate_fcts_test.cc would be executed by pytest, while
boost/aggregate_fcts_test would be executed by the test.py executor.
When a tablet is migrated and cleaned up, deallocate the tablet storage
group state on `end_migration` stage, instead of `cleanup` stage:
* When the stage is updated from `cleanup` to `end_migration`, the
storage group is removed on the leaving replica.
* When the table is initialized, if the tablet stage is `end_migration`
then we don't allocate a storage group for it. This happens for
example if the leaving replica is restarted during tablet migration.
If it's initialized in `cleanup` stage then we allocate a storage
group, and it will be deallocated when transitioning to
`end_migration`.
This guarantees that the storage group is always deallocated on the
leaving replica by `end_migration`, and that it is always allocated if
the tablet wasn't cleaned up fully yet.
The case is similar for the pending replica when the migration is
aborted. We deallocate the state on `revert_migration`, which is the
stage following `cleanup_target`.
Previously the storage group would be allocated when the tablet is
initialized on any of the tablet replicas - also on the leaving replica,
and when the tablet stage is `cleanup` or `end_migration`, and
deallocated during `cleanup`.
This fixes the following issue:
1. A migrating tablet enters the cleanup stage.
2. The tablet is cleaned up successfully.
3. The leaving replica is restarted, and allocates a storage group.
4. Tablet cleanup is not called because it's already cleaned up.
5. The storage group remains allocated on the leaving replica after the
migration is completed - it's not cleaned up properly.
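A sketch of the stage-driven allocation decision described above (the enum and helper are illustrative, not the actual declarations):
```
// Illustrative sketch of the decision on the leaving replica at table
// initialization time, as described above.
enum class stage { cleanup, end_migration /* , ... */ };

bool allocate_storage_group_on_leaving_replica(stage s) {
    switch (s) {
    case stage::cleanup:       return true;  // will be deallocated on end_migration
    case stage::end_migration: return false; // tablet was already cleaned up here
    default:                   return true;
    }
}
```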
Fixes https://github.com/scylladb/scylladb/issues/23481
Backport to all relevant releases, since it's a bug that results in a crash.
Closes scylladb/scylladb#24393
* github.com:scylladb/scylladb:
test/cluster/test_tablets: test restart during tablet cleanup
test: tablets: add get_tablet_info helper
tablets: deallocate storage state on end_migration
LOGIN entries can appear at many points during testing, for example,
when a driver creates a new session. Similarly, `USE ks` statements
can appear unexpectedly, especially when the python-driver calls
`set_keyspace_async` for new connections.
To avoid test check failures,
this commit filters out LOGIN and USE entries in tests that are
not intended to verify these two types of audit logs.
After audit_test.py was moved from dtests to test.py, the
following issues arose due to differences between the frameworks:
- Some imports were unnecessary or broken
- The @pytest.mark.dtest_full decorator was no longer needed
- The `issue_open` attribute in `xmark` is not supported
- Support for sending SIGHUP is encapsulated
by `server_update_config` in `test.py`
- A workaround for scylladb#24473 was required
Moreover, suite.yaml was changed to start running audit_test.py
in dev mode.
Ref. scylladb#24473
Co-authored-by: Marcin Maliszkiewicz <marcinmal@scylladb.com>
Add a system:table_creation_time tag whose value is the table creation timestamp in milliseconds.
If the tag is present, it will be used to fill the creation timestamp value (when CreateTable or DescribeTable is called).
If the tag is missing, a timestamp of 0 will be substituted (in other words, the table appears to have been created on January 1st, 1970).
Update the test to change how we make sure the timestamp is actually used: we create two tables one after another and make sure their creation timestamps are in the correct order.
Update tests that work with tags to filter system tags out.
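A minimal sketch of reading the tag (an illustrative helper, not the actual Alternator code):
```
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper illustrating the fallback described above.
int64_t table_creation_time_ms(const std::map<std::string, std::string>& tags) {
    auto it = tags.find("system:table_creation_time");
    if (it == tags.end()) {
        return 0; // missing tag: reported as the epoch, 1970-01-01
    }
    return std::stoll(it->second); // milliseconds since the epoch
}
```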
Fixes #5013
Closes scylladb/scylladb#24007
This patch adds a couple of basic tests for system tables related to
secondary indexes - system."IndexInfo" and system_schema.indexes.
I wanted to understand these system tables better when writing
documentation for them - so I wrote these tests. These tests can also
serve as regression tests that verify that we don't accidentally lose
support for these system tables. I checked that these tests also pass
in Cassandra 3, 4 and 5.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#24137
This change prepares the ground for state-update unification for raft-bound subsystems. It introduces schema_applier, which in the future will become a generic interface for applying mutations in raft.
Pulling `database::apply()` out of the schema merging code will allow batching changes to subsystems. Future generic code will first call `prepare()` on all implementations, then a single `database::apply()`, then `update()` on all implementations; then on each shard it will call `commit()` for all implementations, without preemption, so that the change is observed as atomic across all subsystems, and then `post_commit()`.
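A sketch of the phased interface implied by this description (its shape is an assumption, not the actual schema_applier declaration):
```
// Hypothetical shape of the phases described above.
struct schema_applier_like {
    virtual ~schema_applier_like() = default;
    virtual void prepare() = 0;     // 1. called on every implementation
                                    // 2. then a single database::apply() for the batch
    virtual void update() = 0;      // 3. then called on every implementation
    virtual void commit() = 0;      // 4. per shard, without preemption: the change
                                    //    appears atomic across all subsystems
    virtual void post_commit() = 0; // 5. finally
};
```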
Backport: no, it's a new feature
Fixes: https://github.com/scylladb/scylladb/issues/19649
Closes scylladb/scylladb#20853
* github.com:scylladb/scylladb:
storage_service: always wake up load balancer on update tablet metadata
db: schema_applier: call destroy also when exception occurs
db: replica: simplify seeding ERM during schema change
db: remove cleanup from add_column_family
db: abort on exception during schema commit phase
db: make user defined types changes atomic
replica: db: make keyspace schema changes atomic
db: atomically apply changes to tables and views
replica: make truncate_table_on_all_shards get whole schema from table_shards
service: split update_tablet_metadata into two phases
service: pull out update_tablet_metadata from migration_listener
db: service: add store_service dependency to schema_applier
service: simplify load_tablet_metadata and update_tablet_metadata
db: don't perform move on tablet_hint reference
replica: split add_column_family_and_make_directory into steps
replica: db: split drop_table into steps
db: don't move map references in merge_tables_and_views()
db: introduce commit_on_shard function
db: access types during schema merge via special storage
replica: make non-preemptive keyspace create/update/delete functions public
replica: split update keyspace into two phases
replica: split creating keyspace into two functions
db: rename create_keyspace_from_schema_partition
db: decouple functions and aggregates schema change notification from merging code
db: store functions and aggregates change batch in schema_applier
db: decouple tables and views schema change notifications from merging code
db: store tables and views schema diff in schema_applier
db: decouple user type schema change notifications from types merging code
service: unify keyspace notification functions arguments
db: replica: decouple keyspace schema change notifications to a separate function
db: add class encapsulating schema merging
The existing `download_source` implementation optimizes performance
by keeping the connection to S3 open and draining data directly from
the socket. While this eliminates the overhead (60-100ms) of repeatedly
establishing new connections, it leads to rapid exhaustion of client-
side connections.
On a single shard, two `mx_readers` for load and stream are enough to
trigger this issue. Since each client typically holds two connections,
readers keeping index and data sources open can cause deadlocks where
processes stall due to unavailable connections.
Introduce `chunked_download_source`, a new S3 download method built on
`download_source`, to dynamically manage connections:
- Buffers data in 5MiB chunks using a producer-consumer model
- Closes connections once buffers reach capacity, returning them to
the pool for other clients
- Uses a filling fiber that resumes fetching once buffers are
consumed from the queue
Performance remains comparable to `download_source`, achieving
95MiB/s for sequential 1GiB downloads from S3. However, preloading
large chunks may cause read amplification.
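A rough sketch of the producer side of such a scheme, using seastar's bounded queue (simplified and hypothetical; the real chunked_download_source also handles errors, aborts, and returning connections to the pool):
```
#include <seastar/core/coroutine.hh>
#include <seastar/core/queue.hh>
#include <seastar/core/temporary_buffer.hh>
#include <functional>

using buf_type = seastar::temporary_buffer<char>;

// Hypothetical filling fiber: fetch_chunk() reads the next ~5MiB chunk from S3
// (closing and returning the connection when the queue is full is elided).
seastar::future<> fill_fiber(seastar::queue<buf_type>& q,
                             std::function<seastar::future<buf_type>()> fetch_chunk) {
    while (true) {
        buf_type chunk = co_await fetch_chunk();
        bool eof = chunk.empty();
        // Blocks when the bounded queue is at capacity; resumes once the
        // consumer pops chunks, mirroring the "resume fetching" behavior above.
        co_await q.push_eventually(std::move(chunk));
        if (eof) {
            co_return;
        }
    }
}
```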
Fixes: https://github.com/scylladb/scylladb/issues/23785
Closes scylladb/scylladb#23880
This patch adds the new option in nodetool, patches the
load_new_ss_tables REST request with a new parameter and
skips the reshape step in refresh if this flag is passed.
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Closes scylladb/scylladb#24409
Fixes: #24365
Add a test that reproduces issue scylladb/scylladb#23481.
The test migrates a tablet from one node to another, and while the
tablet is in some stage of cleanup - either before or right after,
depending on the parameter - the leaving replica, on which the tablet is
cleaned, is restarted.
This is interesting because when the leaving replica starts and loads
its state, the tablet could be in different stages of cleanup - the
SSTables may still exist or they may have been cleaned up already, and
we want to make sure the state is loaded correctly.
In test_cdc_generation_clearing we trigger events that update CDC
generations, verify the generations are updated as expected, and verify
the system topology and CDC generations are consistent on all nodes.
Before checking that all nodes are consistent and have the same CDC
generations, we need to consider that the changes are propagated through
raft and take some time to propagate to all nodes.
Currently, we wait for the change to be applied only on the first server
which runs the CDC generation publisher fiber and read the CDC
generations from this single node. The consistency check that follows
could fail if the change was not propagated to some other node yet.
To fix that, before checking consistency with all nodes, we execute a
read barrier on all nodes so they all see the same state as the leader.
Fixes scylladb/scylladb#24407
Closes scylladb/scylladb#24433
For reasons, we want to be able to disallow dictionary-aware compressors
in chosen deployments.
This patch adds a knob for that. When the knob is disabled,
dictionary-aware compressors will be rejected in the validation
stage of CREATE and ALTER statements.
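A sketch of the validation described above (the knob and helper names are stand-ins, not the actual config option):
```
#include <stdexcept>
#include <string>

bool is_dictionary_aware(const std::string& compressor); // assumed helper for this sketch

// Hypothetical shape of the CREATE/ALTER-time check described above.
void validate_compressor(const std::string& compressor, bool dict_compressors_enabled) {
    if (!dict_compressors_enabled && is_dictionary_aware(compressor)) {
        throw std::invalid_argument("dictionary-aware compressors are disabled");
    }
}
```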
Closes scylladb/scylladb#24355
Truncate doesn't really go well with concurrent writes. The fix (#23560) exposed
a preexisting fragility which I missed.
1) Truncate gets RP mark X, truncated_at = second T.
2) A new sstable is written during the snapshot or later, also at second T (differing only in milliseconds).
3) discard_sstables() gets RP Y > saved RP X, since the creation time of the sstable
with RP Y is equal to truncated_at = second T.
So the problem is that truncate uses a clock of second granularity for
filtering out sstables written later, and after we get the low mark and truncate time,
it can happen that an sstable is flushed later within the same second, but at a
different millisecond.
By switching to a millisecond clock (db_clock), we allow sstables written later
within the same second to be filtered out. It's not perfect, but it is
extremely unlikely that a new write lands and gets flushed in the same
millisecond we recorded the truncated_at timepoint. In practice, truncate
will not be used concurrently with writes, so this should be enough for
our tests performing such concurrent actions.
We're moving away from gc_clock, which is our cheap lowres_clock, but
time is only retrieved when creating sstable objects, whose frequency of
creation is low enough not to have significant consequences; also,
db_clock should be cheap enough since it's usually syscall-less.
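The granularity race, sketched as discard predicates (simplified; the real code compares sstable creation times against truncated_at):
```
#include <cstdint>

// Second granularity: an sstable flushed at T+0.4s still compares equal to
// truncated_at = T, so it is wrongly included in the truncation.
bool discard_with_seconds(int64_t sstable_s, int64_t truncated_at_s) {
    return sstable_s <= truncated_at_s;
}

// Millisecond granularity: T*1000+400 > T*1000, so the later flush is
// correctly filtered out of the truncation.
bool discard_with_millis(int64_t sstable_ms, int64_t truncated_at_ms) {
    return sstable_ms <= truncated_at_ms;
}
```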
Fixes #23771.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#24426
This series introduces per-table metrics support for Alternator. It includes the following commits:
Add optional per-table metrics for Alternator
Introduces a shared_ptr-based mechanism that allows Alternator to register per-table metrics. These metrics follow the table's lifecycle, similar to how CQL metrics are handled. The use of shared_ptr ensures no direct dependency between table stats and Alternator.
Enable registration of stats objects per table
Adds support for registering a stats object using a keyspace and table name. Per-table metrics are prefixed with alternator_table to differentiate them from per-shard metrics. Metrics are reported once per node, and those not meaningful at the table level (e.g. create/delete) are excluded. All metrics use the skip_when_empty flag.
Update per-table metrics handling
Adds a helper function to retrieve the stats object from a table schema. Updates both per-shard and per-table metrics, resulting in some code duplication.
Add tests for per-table metrics
Extends existing tests to also validate the per-table metrics. These tests ensure that the new metrics are correctly registered and updated.
This series improves observability in Alternator by enabling fine-grained per-table metrics without disrupting existing per-shard metrics.
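A minimal sketch of the shared_ptr-based decoupling (hypothetical names):
```
#include <seastar/core/shared_ptr.hh>
#include <cstdint>

// Hypothetical sketch: the table only holds an opaque shared_ptr, so replica
// code gains no direct dependency on Alternator.
struct alternator_table_stats {
    uint64_t reads = 0;
    uint64_t writes = 0;
};

struct table_like {
    // Follows the table's lifecycle; Alternator registers and retrieves it
    // by keyspace and table name.
    seastar::shared_ptr<alternator_table_stats> alternator_stats;
};
```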
**No need to backport**
Fixes #19824
Closes scylladb/scylladb#24046
* github.com:scylladb/scylladb:
alternator/test_metrics.py: Test the per-table metrics
alternator/executor.cc: Update per-table metrics
alternator/stats: Add per-table metrics
replica/database.hh: Add alternator per-table metrics
alternator/stats.hh: Introduce a per-table stats container
Before, for views and indexes, it was fetching the base schema (and a
couple of other properties) from the db object. This is a problem once we
introduce atomic table and view deletion (in the following commit),
because once we delete a table it can no longer be fetched from the db object,
and truncation is performed after atomically deleting all relevant
tables/views/indexes.
Now the whole relevant schema will be fetched via the global_table_ptr
(table_shards) object.
It's not a good usage, as there is only one non-empty implementation.
We also need to change it further in the following commit, which
makes it incompatible with the listener code.
There is already an implicit logical dependency via migration_notifier,
but in the next commits we'll be moving store_service out from it,
as we need better control (i.e. to return a value from the call).
- Remove load_tablet_metadata(); instead we add a wake_up_load_balancer flag
to update_tablet_metadata(). This reduces the number of public functions and
also serves as a comment (a comment with very similar meaning was removed).
- Reimplement the code to not use mutate_token_metadata(). This way
it's more readable, and it's also needed because we'll split
update_tablet_metadata() in the following commits so that we can have a
subroutine which doesn't yield (for ensuring atomicity).
This is done so that the actual dropping can be
an atomic step which can be composed with other
schema operations, with all subsystems eventually modified
via raft, so that we can introduce atomic changes which
span different subsystems.
We split drop_table_on_all_shards() into:
- prepare_tables_metadata_change_on_all_shards()
- prepare_drop_table_on_all_shards()
- drop_table()
- cleanup_drop_table_on_all_shards()
prepare_tables_metadata_change_on_all_shards() is necessary
because when applying multiple schema changes at once (e.g. drop
and add tables) we need to lock only once.
We add legacy_drop_table_on_all_shards(), which
behaves exactly like the old drop_table_on_all_shards(), to stay
compatible with code which doesn't need to play with atomicity.
Usages of legacy_drop_table_on_all_shards() in schema_applier
will be replaced with direct calls to the split functions in the following
commits - that's where we will take advantage of drop_table() not
yielding (as it returns void now).
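Roughly, the sequence the split preserves can be pictured like this (stand-in signatures; only the function names come from the commit message):
```
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>

// Stand-ins for the split functions named above; real signatures are
// assumptions, only the names come from the commit message.
seastar::future<> prepare_tables_metadata_change_on_all_shards() { co_return; } // lock once
seastar::future<int> prepare_drop_table_on_all_shards() { co_return 0; }        // gather state
void drop_table(int) {}                                   // atomic: returns void, cannot yield
seastar::future<> cleanup_drop_table_on_all_shards(int) { co_return; }

// Roughly the sequence that legacy_drop_table_on_all_shards() preserves:
seastar::future<> legacy_drop_table_like() {
    co_await prepare_tables_metadata_change_on_all_shards();
    int state = co_await prepare_drop_table_on_all_shards();
    drop_table(state);
    co_await cleanup_drop_table_on_all_shards(state);
}
```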
The Alternator tests should pass on Alternator (of course), and almost always also on DynamoDB, to verify that the tests themselves are correct and don't just enshrine Alternator's incorrect behavior. Although much less important, it is sometimes useful to be able to check whether the tests also pass on other DynamoDB clones, especially "DynamoDB Local" - Amazon's DynamoDB mock written in Java.
In issue https://github.com/scylladb/scylladb/issues/7775 we noted that some of our tests don't actually pass on DynamoDB Local, for different reasons, but at the time that issue was created most of the tests did work. However, checking now on a newer version of DynamoDB Local (2.6.1), I notice that _all_ tests failed because of some silly reasons that are easy to fix - and this is what the two patches in this series fix. After these fixes, most of the Alternator tests pass on DynamoDB Local. But not all of them - #7775 is still open.
No backport needed - these are just test framework improvements for developers.
Closes scylladb/scylladb#24361
* github.com:scylladb/scylladb:
test/alternator: any response from healthcheck means server is alive
test/alternator: fall back to legal-looking access key id
Both ScyllaDB's and Datastax's documentation suggest that when creating a
view with CREATE MATERIALIZED VIEW, its SELECT clause doesn't need to list
the view's primary key columns because those are selected automatically.
For example, our documentation has an example in
https://docs.scylladb.com/manual/stable/features/materialized-views.html
```
CREATE MATERIALIZED VIEW building_by_city2 AS
SELECT meters FROM buildings
WHERE city IS NOT NULL
PRIMARY KEY(city, name);
```
Note how the primary key columns - city and name - are not explicitly
SELECTed.
I just discovered that while this behavior was indeed true in Cassandra
3 (and still true in ScyllaDB), it actually got broken in Cassandra 4 and 5.
I reported this apparent regression to Cassandra (CASSANDRA-20701), and
I am proposing the regression test in this patch to ensure that Scylla can't
suffer a similar regression in the future.
The new test passes on ScyllaDB and Cassandra 3, but fails on Cassandra
4 and 5 (and is therefore tagged with "cassandra_bug").
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#24399
Support for TTL-based data removal when using tablets.
The essence of this commit is a separate code path for finding token
ranges owned by the current shard for the cases when tablets are used
and not vnodes. At the same time, the vnodes case is not touched, so as not to
cause any regressions.
The TTL-caused data removal is normally performed by the primary
replica (both when using vnodes and tablets). For the tablets case,
the already-existing method tablet_map::get_primary_replica(tablet_id)
is used to know if a shard executing the TTL-related data removal is
the primary replica for each tablet.
A new method tablet_map::get_secondary_replica(tablet_id) has been
added. It is needed by the data invalidation procedure to remove data
when the primary replica node is down - the data is then removed by the
secondary replica node. The mechanism is the same as in the vnodes case.
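A sketch of the replica selection described above (types are simplified stand-ins; get_primary_replica() and get_secondary_replica() are the tablet_map methods named in the commit):
```
// Illustrative sketch of which replica performs TTL-based removal for a tablet.
struct replica_id {
    int node;
    bool operator==(const replica_id&) const = default;
};

struct tablet_map_like {
    replica_id get_primary_replica(int tablet) const;
    replica_id get_secondary_replica(int tablet) const;
};

bool should_expire_here(const tablet_map_like& tmap, int tablet,
                        replica_id me, bool primary_node_alive) {
    if (tmap.get_primary_replica(tablet) == me) {
        return true;  // normal case: the primary replica removes expired data
    }
    // If the primary's node is down, the secondary takes over (as with vnodes).
    return !primary_node_alive && tmap.get_secondary_replica(tablet) == me;
}
```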
Since alternator now supports TTL, the test
`test_ttl_enable_error_with_tablets` has been removed.
Also, the tests in test_ttl.py have been made to run twice, once with
vnodes and once with tablets. When run with tablets, due to the lack of
support for LWT with tablets (#18068), the tests use a
'system:write_isolation' of 'unsafe_rmw'. This approach allows early
regression testing with tablets and is meant only as a tentative
solution.
Fixes scylladb/scylladb#16567
Closes scylladb/scylladb#23662
This patch adds tests for the newly added per-table metrics. It mainly
redoes existing tests, but verifies that the per-table metrics are
updated correctly.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>