scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 12:17:02 +00:00

Author	SHA1	Message	Date
Patryk Jędrzejczak	e3952fbd35	test: test_raft_recovery_user_data: disable hinted handoff The test is currently flaky, writes can fail with "Too many in flight hints: 10485936". See scylladb/scylladb#23565 for more details. We suspect that scylladb/scylladb#23565 is caused by an infrastructure issue - slow disks on some machines we run CI jobs on. Since the test fails often and investigation doesn't seem to be easy, we first deflake the test in this patch by disabling hinted handoff. For replacing nodes, we provide `cfg` because there should have been `cfg` in the first place. The test was correct anyway because: - `tablets_mode_for_new_keyspaces` is set to `true` by default in test/cluster/suite.yaml, - `endpoint_snitch` is set to `GossipingPropertyFileSnitch` by default if the property file is provided in `ScyllaServer.__init__`. Ref scylladb/scylladb#23565 We should backport this patch to 2025.2 because this test is also flaky on CI jobs using 2025.2. Older branches don't have this test. Closes scylladb/scylladb#24364 (cherry picked from commit `8756c233e0`) Fixes #24756 Closes scylladb/scylladb#24757	2025-07-01 19:14:22 +02:00
Avi Kivity	dd509b9513	Merge '[Backport 2025.2] memtable: ensure _flushed_memory doesn't grow above total_memory' from Scylladb[bot] `dirty_memory_manager` tracks two quantities about memtable memory usage: "real" and "unspooled" memory usage. "real" is the total memory usage (sum of `occupancy().total_space()`) by all memtable LSA regions, plus an upper-bound estimate of the size of memtable data which has already moved to the cache region but isn't evictable (merged into the cache) yet. "unspooled" is the difference between total memory usage by all memtable LSA regions, and the total flushed memory (sum of `_flushed_memory`) of memtables. `dirty_memory_manager` controls the shares of compaction and/or blocks writes when these quantities cross various thresholds. "Total flushed memory" isn't a well defined notion, since the actual consumption of memory by the same data can vary over time due to LSA compactions, and even the data present in memtable can change over the course of the flush due to removals of outdated MVCC versions. So `_flushed_memory` is merely an approximation computed by `flush_reader` based on the data passing through it. This approximation is supposed to be a conservative lower bound. In particular, `_flushed_memory` should not be greater than `occupancy().total_space()`. Otherwise, for example, "unspooled" memory could become negative (and/or wrap around) and weird things could happen. There is an assertion in `~flush_memory_accounter` which checks that `_flushed_memory < occupancy().total_space()` at the end of flush. But it can fail. Without additional treatment, the memtable reader sometimes emits data which is already deleted. (In particular, it emits rows covered by a partition tombstone in a newer MVCC version.) This data is seen by `flush_reader` and accounted in `_flushed_memory`. But this data can be garbage-collected by the `mutation_cleaner` later during the flush and decrease `total_memory` below `_flushed_memory`. 
There is a piece of code in `mutation_cleaner` intended to prevent that. If `total_memory` decreases during a `mutation_cleaner` run, `_flushed_memory` is lowered by the same amount, just to preserve the asserted property. (This could also make `_flushed_memory` quite inaccurate, but that's considered acceptable). But that only works if `total_memory` is decreased during that run. It doesn't work if the `total_memory` decrease (enabled by the new allocator holes made by `mutation_cleaner`'s garbage collection work) happens asynchronously (due to memory reclaim for whatever reason) after the run. This patch fixes that by tracking the decreases of `total_memory` closer to the source. Instead of relying on `mutation_cleaner` to notify the memtable if it lowers `total_memory`, the memtable itself listens for notifications about LSA segment deallocations. It keeps `_flushed_memory` equal to the reader's estimate of flushed memory decreased by the change in `total_memory` since the beginning of flush (if it was positive), and it keeps the amount of "spooled" memory reported to the `dirty_memory_manager` at `max(0, _flushed_memory)`. Fixes scylladb/scylladb#21413 Backport candidate because it fixes a crash that can happen in existing stable branches. - (cherry picked from commit `7d551f99be`) - (cherry picked from commit `975e7e405a`) Parent PR: #21638 Closes scylladb/scylladb#24604 * github.com:scylladb/scylladb: memtable: ensure _flushed_memory doesn't grow above total memory usage replica/memtable: move region_listener handlers from dirty_memory_manager to memtable	2025-07-01 12:31:25 +03:00
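A minimal sketch of the accounting rule described in the message above, not the actual ScyllaDB code. The class and field names are hypothetical, and "the change in `total_memory` (if it was positive)" is read here as "the amount by which `total_memory` shrank since the flush began".

```python
from dataclasses import dataclass


@dataclass
class FlushAccounting:
    """Toy model of one memtable's flush accounting (hypothetical names)."""

    total_memory_at_flush_start: int  # bytes used when the flush began
    total_memory: int                 # current bytes used by the memtable region
    reader_estimate: int = 0          # bytes the flush reader believes it flushed

    def flushed_memory(self) -> int:
        # Reader's estimate, lowered by how much total_memory has shrunk since
        # the flush began. Growth of total_memory is ignored.
        decrease = max(0, self.total_memory_at_flush_start - self.total_memory)
        return self.reader_estimate - decrease

    def spooled_memory(self) -> int:
        # The value reported to the memory manager is clamped at zero, so
        # "unspooled = total - spooled" can never exceed total_memory.
        return max(0, self.flushed_memory())


acc = FlushAccounting(total_memory_at_flush_start=100, total_memory=100)
acc.reader_estimate = 40   # the reader has seen 40 bytes so far
acc.total_memory = 50      # segments were freed asynchronously after cleanup
print(acc.flushed_memory(), acc.spooled_memory())
```

Because the correction is derived from `total_memory` itself rather than from a before/after check around one particular call, it also covers decreases that happen asynchronously, which is the case the message says the older scheme missed.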
Michał Chojnowski	50736e9740	test_sstable_compression_dictionaries_basic.py: fix a flaky check test_dict_memory_limit trains new dictionaries and checks (via metrics) that the old dictionaries are appropriately cleaned up. The problem is that the cleanup is asynchronous (because the lifetimes are handled by foreign_ptr, which sends the destructor call to the owner shard asynchronously), so the metrics might be checked a few milliseconds before the old dictionary is cleaned up. The dict lifetimes are lazy on purpose, the right thing to do is to just let the test retry the check. Fixes scylladb/scylladb#24516 Closes scylladb/scylladb#24526 (cherry picked from commit `cace55aaaf`) Closes scylladb/scylladb#24653	2025-07-01 12:30:25 +03:00
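The "let the test retry the check" approach from the message above can be sketched as a small polling helper. This is an illustration only: `retry_until`, its parameters, and the fake metric below are hypothetical and are not the helpers used in the ScyllaDB test suite.

```python
import time


def retry_until(check, timeout_s=5.0, interval_s=0.05):
    """Call check() repeatedly until it returns True or the timeout expires.

    Returns True if the check eventually passed, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in for a metric that only settles after asynchronous cleanup:
# it reports the stale value twice, then the expected one.
readings = iter([2, 2, 1])
ok = retry_until(lambda: next(readings) == 1, timeout_s=1.0, interval_s=0.01)
print(ok)
```

Polling with a deadline keeps the dictionary lifetimes lazy, as the message intends, while still failing the test if the cleanup never happens.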
Avi Kivity	ee733c4d38	Merge '[Backport 2025.2] generic_server: fix connections semaphore config observer' from Scylladb[bot] In `ed3e4f33fd` we introduced new connection throttling feature which is controlled by uninitialized_connections_semaphore_cpu_concurrency config. But live updating of it was broken, this patch fixes it. When the temporary value from observer() is destroyed, it disconnects from updateable_value, so observation stops right away. We need to retain the observer. Backport: to 2025.2 where this feature was added Fixes: https://github.com/scylladb/scylladb/issues/24557 - (cherry picked from commit `c6a25b9140`) - (cherry picked from commit `45392ac29e`) - (cherry picked from commit `68ead01397`) Parent PR: #24484 Closes scylladb/scylladb#24679 * github.com:scylladb/scylladb: test: add test for live updates of generic server config utils: don't allow do discard updateable_value observer generic_server: fix connections semaphore config observer	2025-07-01 12:29:53 +03:00
Lakshmi Narayanan Sreethar	adab525151	utils/big_decimal: fix scale overflow when parsing values with large exponents The exponent of a big decimal string is parsed as an int32, adjusted for the removed fractional part, and stored as an int32. When parsing values like `1.23E-2147483647`, the unscaled value becomes `123`, and the scale is adjusted to `2147483647 + 2 = 2147483649`. This exceeds the int32 limit, and since the scale is stored as an int32, it overflows and wraps around, losing the value. This patch fixes that by parsing the exponent as an int64 value and then adjusting it for the fractional part. The adjusted scale is then checked to see if it is still within int32 limits before storing. An exception is thrown if it is not within the int32 limits. Note that strings with exponents that exceed the int32 range, like `0.01E2147483650`, were previously not parseable as a big decimal. They are now accepted if the final adjusted scale fits within int32 limits. For the above value, unscaled_value = 1 and scale = -2147483648, so it is now accepted. This is in line with how Java's `BigDecimal` parses strings. Fixes: #24581 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#24640 (cherry picked from commit `279253ffd0`) Closes scylladb/scylladb#24692	2025-07-01 12:28:55 +03:00
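The scale adjustment described in the message above can be sketched as follows. This is a simplified illustration, not the `utils/big_decimal` parser: the function name is hypothetical, input validation is omitted, and Python's unbounded integers stand in for the int64 intermediate.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def parse_big_decimal(text):
    """Return (unscaled_value, scale) so that value == unscaled * 10**(-scale)."""
    mantissa, _, exponent = text.upper().partition("E")
    # Parse the exponent in a wide type first (int64 in the commit message).
    exp = int(exponent) if exponent else 0
    integer_part, _, fraction = mantissa.partition(".")
    unscaled = int(integer_part + fraction)
    # Adjust for the removed fractional digits, then range-check the result
    # before it is stored in a 32-bit scale.
    scale = len(fraction) - exp
    if not INT32_MIN <= scale <= INT32_MAX:
        raise ValueError(f"scale {scale} does not fit in int32")
    return unscaled, scale


print(parse_big_decimal("0.01E2147483650"))  # exponent exceeds int32, scale fits
try:
    parse_big_decimal("1.23E-2147483647")    # adjusted scale is 2147483649
except ValueError as e:
    print("rejected:", e)
```

The key point is the order of operations: the range check is applied to the adjusted scale, not to the raw exponent, which is why the first input is accepted and the second is rejected.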
Botond Dénes	5f45cf1683	test/boost/memtable_test: only inject error for test table Currently the test indiscriminately injects failures into the flushes of any table, via the IO extension mechanism. The tests want to check that the node correctly handles the IO error by self isolating, however the indiscriminate IO errors can have unintended consequences when they hit raft, leading to disorderly shutdown and failure of the tests. Testing raft's resiliency to IO errors is of course worth doing, but it is not the goal of this particular test, so to avoid the fallout, the IO errors are limited to the test tables only. Fixes: https://github.com/scylladb/scylladb/issues/24637 Closes scylladb/scylladb#24638 (cherry picked from commit `ee6d7c6ad9`) Closes scylladb/scylladb#24743	2025-07-01 12:28:05 +03:00
Avi Kivity	5e4941a74b	Merge '[Backport 2025.2] sstables/mx/writer: handle non-full prefix row keys' from Scylladb[bot] Although valid for compact tables, non-full (or empty) clustering key prefixes are not handled for row keys when writing sstables. Only the present components are written, consequently if the key is empty, it is omitted entirely. When parsing sstables, the parsing code unconditionally parses a full prefix. This mismatch results in parsing failures, as the parser parses part of the row content as a key resulting in a garbage key and subsequent mis-parsing of the row content and maybe even subsequent partitions. Introduce a new system table: `system.corrupt_data` and infrastructure similar to `large_data_handler`: `corrupt_data_handler` which abstracts how corrupt data is handled. The sstable writer now passes rows with such corrupt keys to the corrupt data handler. This way, we avoid corrupting the sstables beyond parsing and the rows are also kept around in system.corrupt_data for later inspection and possible recovery. Add a full-stack test which checks that rows with bad keys are correctly handled. Fixes: https://github.com/scylladb/scylladb/issues/24489 The bug is present in all versions, has to be backported to all supported versions. 
- (cherry picked from commit `92b5fe8983`) - (cherry picked from commit `0753643606`) - (cherry picked from commit `b0d5462440`) - (cherry picked from commit `093d4f8d69`) - (cherry picked from commit `678deece88`) - (cherry picked from commit `64f8500367`) - (cherry picked from commit `b931145a26`) - (cherry picked from commit `3e1c50e9a7`) - (cherry picked from commit `46ff7f9c12`) - (cherry picked from commit `ebd9420687`) - (cherry picked from commit `aae212a87c`) - (cherry picked from commit `592ca789e2`) - (cherry picked from commit `edc2906892`) Parent PR: #24492 Closes scylladb/scylladb#24744 * github.com:scylladb/scylladb: test/boost/sstable_datafile_test: add test for corrupt data sstables/mx/writer: handler rows with empty keys test/lib/cql_assertions: introduce columns_assertions sstables: add corrupt_data_handler to sstables::sstables tools/scylla-sstable: make large_data_handler a local db: introduce corrupt_data_handler mutation: introduce frozen_mutation_fragment_v2 mutation/mutation_partition_view: read_{clustering,static}_row(): return row type mutation/mutation_partition_view: extract de-ser of {clustering,static} row idl-compiler.py: generate skip() definition for enums serializers idl: extract full_position.idl from position_in_partition.idl db/system_keyspace: add apply_mutation() db/system_keyspace: introduce the corrupt_data table	2025-07-01 12:27:01 +03:00
Abhinav Jha	160c937efe	group0: modify `start_operation` logic to account for synchronize phase race condition In the present scenario, the bootstrapping node undergoes synchronize phase after initialization of group0, then enters post_raft phase and becomes fully ready for group0 operations. The topology coordinator is agnostic of this and issues stream ranges command as soon as the node successfully completes `join_group0`. Although for a node booting into an already upgraded cluster, the time duration for which, node remains in synchronize phase is negligible but this race condition causes trouble in a small percentage of cases, since the stream ranges operation fails and node fails to bootstrap. This commit addresses this issue and updates the error throw logic to account for this edge case and lets the node wait (with timeouts) for synchronize phase to get over instead of throwing error. A regression test is also added to confirm the working of this code change. The test adds a wait in synchronize phase for newly joining node and releases only after the program counter reaches the synchronize case in the `start_operation` function. Hence it indicates that in the updated code, the start_operation will wait for the node to get done with the synchronize phase instead of throwing error. This PR fixes a bug. Hence we need to backport it. Fixes: scylladb/scylladb#23536 Closes scylladb/scylladb#23829 (cherry picked from commit `5ff693eff6`) Closes scylladb/scylladb#24628	2025-07-01 10:10:55 +02:00
Botond Dénes	236cab0f66	test/boost/sstable_datafile_test: add test for corrupt data * create a table with random schema * generate data: random mutations + one row with bad key * write data to sstable * check that only good data is written to sstable * check that the bad data was saved to system.corrupt_data (cherry picked from commit `edc2906892`)	2025-06-30 12:44:29 +00:00
Botond Dénes	7654ccbef5	test/lib/cql_assertions: introduce columns_assertions To enable targeted and optionally typed assertions against individual columns in a row. (cherry picked from commit `aae212a87c`)	2025-06-30 12:44:29 +00:00
Botond Dénes	9eb9ffe4bc	sstables: add corrupt_data_handler to sstables::sstables Similar to how large_data_handler is handled, propagate through sstables::sstables_manager and store its owner: replica::database. Tests and tools are also patched. Mostly mechanical changes, updating constructors and patching callers. (cherry picked from commit `ebd9420687`)	2025-06-30 12:44:29 +00:00
Aleksandra Martyniuk	7fd4d77fdd	test: rest_api: fix test_repair_task_progress test_repair_task_progress checks the progress of children of root repair task. However, nothing ensures that the children are already created. Wait until at least one child of a root repair task is created. Fixes: #24556. Closes scylladb/scylladb#24560 (cherry picked from commit `0deb9209a0`) Closes scylladb/scylladb#24655	2025-06-28 09:39:06 +03:00
Marcin Maliszkiewicz	a54cc8291c	test: add test for live updates of generic server config Affected config: uninitialized_connections_semaphore_cpu_concurrency (cherry picked from commit `68ead01397`)	2025-06-27 16:01:43 +02:00
Michał Chojnowski	9b98bacaa1	replica/memtable: move region_listener handlers from dirty_memory_manager to memtable The memtable wants to listen for changes in its `total_memory` in order to decrease its `_flushed_memory` in case some of the freed memory has already been accounted as flushed. (This can happen because the flush reader sees and accounts even outdated MVCC versions, which can be deleted and freed during the flush). Today, the memtable doesn't listen to those changes directly. Instead, some calls which can affect `total_memory` (in particular, the mutation cleaner) manually check the value of `total_memory` before and after they run, and they pass the difference to the memtable. But that's not good enough, because `total_memory` can also change outside of those manually-checked calls -- for example, during LSA compaction, which can occur anytime. This makes memtable's accounting inaccurate and can lead to unexpected states. But we already have an interface for listening to `total_memory` changes actively, and `dirty_memory_manager`, which also needs to know it, does just that. So what happens e.g. when `mutation_cleaner` runs is that `mutation_cleaner` checks the value of `total_memory` before it runs, then it runs, causing several changes to `total_memory` which are picked up by `dirty_memory_manager`, then `mutation_cleaner` checks the end value of `total_memory` and passes the difference to `memtable`, which corrects whatever was observed by `dirty_memory_manager`. To allow memtable to modify its `_flushed_memory` correctly, we need to make `memtable` itself a `region_listener`. Also, instead of the situation where `dirty_memory_manager` receives `total_memory` change notifications from `logalloc` directly, and `memtable` fixes the manager's state later, we want only the memtable to listen for the notifications and pass them, already modified accordingly, to the manager, so there are no intermediate wrong states. 
This patch moves the `region_listener` callbacks from the `dirty_memory_manager` to the `memtable`. It's not intended to be a functional change, just a source code refactoring. The next patch will be a functional change enabled by this. (cherry picked from commit `7d551f99be`)	2025-06-24 13:06:06 +00:00
Benny Halevy	afa2b40ac9	disk_space_monitor: add space_source_registration Register the current space_source_fn in an RAII object that resets monitor._space_source to the previous function when the RAII object is destroyed. Use space_source_registration in database_test:: mutation_dump_generated_schema_deterministic_id_version to prevent use-after-stack-return in the test. Fixes #24314 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#24342 (cherry picked from commit `8b387109fc`) Closes scylladb/scylladb#24392	2025-06-24 10:02:23 +03:00
Raphael S. Carvalho	fa420f8644	replica: Fix truncate assert failure Truncate doesn't really go well with concurrent writes. The fix (#23560) exposed a preexisting fragility which I missed. 1) truncate gets RP mark X, truncated_at = second T 2) new sstable written during snapshot or later, also at second T (difference of MS) 3) discard_sstables() gets RP Y > saved RP X, since creation time of sstable with RP Y is equal to truncated_at = second T. So the problem is that truncate is using a clock of second granularity for filtering out sstables written later, and after we got low mark and truncate time, it can happen that a sstable is flushed later within the same second, but at a different millisecond. By switching to a millisecond clock (db_clock), we allow sstables written later within the same second to be filtered out. It's not perfect, but it is extremely unlikely that a new write lands and gets flushed in the same millisecond in which we recorded the truncated_at timepoint. In practice, truncate will not be used concurrently with writes, so this should be enough for our tests performing such concurrent actions. We're moving away from gc_clock, which is our cheap lowres_clock, but time is only retrieved when creating sstable objects, which happens infrequently enough not to have significant consequences, and db_clock should also be cheap enough since it's usually syscall-less. Fixes #23771. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#24426 (cherry picked from commit `2d716f3ffe`) Closes scylladb/scylladb#24435	2025-06-24 10:02:06 +03:00
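The granularity problem in the message above can be illustrated with a toy comparison. The function name and the exact `<=` rule are assumptions made for the example; the real filtering lives in `discard_sstables()` and is not reproduced here.

```python
def is_discard_candidate(created_ms, truncated_at_ms, tick_ms):
    """Toy filter: an sstable is a truncate candidate if its creation time,
    as seen by a clock with the given tick, is not later than truncated_at."""
    return created_ms // tick_ms <= truncated_at_ms // tick_ms


truncated_at = 1_000_250   # truncate time recorded 250 ms into second 1000
flushed_later = 1_000_900  # sstable flushed 650 ms later, same second

# Second-granularity clock: both timestamps collapse to second 1000, so the
# later sstable is wrongly treated as written before the truncate.
print(is_discard_candidate(flushed_later, truncated_at, tick_ms=1000))

# Millisecond-granularity clock: the later sstable is filtered out.
print(is_discard_candidate(flushed_later, truncated_at, tick_ms=1))
```

As the message notes, the finer clock narrows the race window to a single millisecond rather than removing it.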
Andrzej Jackowski	60bc1c339c	test: wait for normal state propagation in test_auth_v2_migration By default, cluster tests have skip_wait_for_gossip_to_settle=0 and ring_delay_ms=0. In tests with gossip topology, it may lead to a race, where nodes see different state of each other. In case of test_auth_v2_migration, there are three nodes. If the first node already knows that the third node is NORMAL, and the second node does not, the system_auth tables can return incomplete results. To avoid such a race, this commit adds a check that all nodes see other nodes as NORMAL before any writes are done. Refs: #24163 Closes scylladb/scylladb#24185 (cherry picked from commit `555d897a15`) Closes scylladb/scylladb#24520	2025-06-24 10:01:42 +03:00
Michał Chojnowski	3eba371e09	test/boost/mutation_reader_test: fix a use-after-free in `test_fast_forwarding_combined_reader_is_consistent_with_slicing` The contract in mutation_reader.hh says: ``` // pr needs to be valid until the reader is destroyed or fast_forward_to() // is called again. future<> fast_forward_to(const dht::partition_range& pr) { ``` `test_fast_forwarding_combined_reader_is_consistent_with_slicing` violates this by passing a temporary to `fast_forward_to`. Fix that. Fixes scylladb/scylladb#24542 Closes scylladb/scylladb#24543 (cherry picked from commit `27f66fb110`) Closes scylladb/scylladb#24548	2025-06-24 10:01:19 +03:00
Karol Nowacki	76bd23cddd	cql, schema: Extend name length limit from 48 to 192 bytes This commit increases the maximum length of names for keyspaces, tables, materialized views, and indexes from 48 to 192 bytes. The previous 48-bytes limit was inherited from Cassandra 3 for compatibility. However, this validation was removed in Cassandra 4 and 5 (see CASSANDRA-20389) and some usage scenarios (such as some feature store workflows generating long table names) now depend on this relaxed constraint. This change brings ScyllaDB's behavior in line with modern Cassandra versions and better supports these use cases. The new limit of 192 bytes is derived from underlying filesystem limitations to prevent runtime errors when creating directories for table data. When a new table is created, ScyllaDB generates a directory for its SSTables. The directory name is constructed from the table name, a dash, and a 32-character UUID. For a CDC-enabled table, an associated log table is also created, which has the suffix `_scylla_cdc_log` appended to its name. The directory name for this log table becomes the longest possible representation. Additionally we reserve 15 bytes for future use, allowing for potential future extensions without breaking existing schemas. To guarantee that directory creation never fails due to exceeding filesystem name limits, the maximum name length is calculated as follows: 255 bytes (common filesystem limit for a path component) - 32 bytes (for the 32-character UUID string) - 1 byte (for the '-' separator) - 15 bytes (for the '_scylla_cdc_log' suffix) - 15 bytes (reserved for future use) ---------- = 192 bytes (Maximum allowed name length) This calculation is similar in principle to the one proposed for Cassandra to fix related directory creation failures (see apache/cassandra/pull/4038). This patch also updates/adds all associated tests to validate the new 192-byte limit. The documentation has been updated accordingly. 
(cherry picked from commit `4577c66a04`)	2025-06-22 17:38:30 +00:00
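The 192-byte figure in the message above can be checked with a few lines of arithmetic. The constant and function names are illustrative only and are not taken from the ScyllaDB sources.

```python
FS_COMPONENT_MAX = 255                   # common limit for one path component
UUID_LEN = 32                            # 32-character UUID string
SEPARATOR_LEN = len("-")                 # dash between table name and UUID
CDC_SUFFIX_LEN = len("_scylla_cdc_log")  # suffix of the CDC log table
RESERVED = 15                            # kept free for future extensions

MAX_NAME_LEN = (FS_COMPONENT_MAX - UUID_LEN - SEPARATOR_LEN
                - CDC_SUFFIX_LEN - RESERVED)


def longest_dir_name_len(table_name):
    """Length of the directory name for the CDC log table of `table_name`,
    which the message identifies as the longest possible representation."""
    return len(table_name.encode()) + CDC_SUFFIX_LEN + SEPARATOR_LEN + UUID_LEN


print(MAX_NAME_LEN)
print(longest_dir_name_len("t" * MAX_NAME_LEN) <= FS_COMPONENT_MAX)
```

With a name at the limit, the CDC log directory name is 240 bytes long, which leaves exactly the 15 reserved bytes below the 255-byte component limit.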
Michael Litvak	305f827888	test/cluster/test_tablets: test restart during tablet cleanup Add a test that reproduces issue scylladb/scylladb#23481. The test migrates a tablet from one node to another, and while the tablet is in some stage of cleanup - either before or right after, depending on the parameter - the leaving replica, on which the tablet is cleaned, is restarted. This is interesting because when the leaving replica starts and loads its state, the tablet could be in different stages of cleanup - the SSTables may still exist or they may have been cleaned up already, and we want to make sure the state is loaded correctly. (cherry picked from commit `bd88ca92c8`)	2025-06-17 13:59:10 +00:00
Michael Litvak	d094bc6fc9	test: tablets: add get_tablet_info helper Add a helper for tests to get the tablet info from system.tablets for a tablet owning a given token. (cherry picked from commit `fb18fc0505`)	2025-06-17 13:59:10 +00:00
Botond Dénes	a63b22eec6	Merge '[Backport 2025.2] tablets: fix missing data after tablet merge ' from Scylladb[bot] Consider the following scenario: 1) let's assume tablet 0 has range [1, 5] (pre merge) 2) tablet merge happens, tablet 0 has now range [1, 10] 3) tablet_sstable_set isn't refreshed, so holds a stale state, thinks tablet 0 still has range [1, 5] 4) during a full scan, forward service will intersect the full range with tablet ranges and consume one tablet at a time 5) replica service is asked to consume range [1, 10] of tablet 0 (post merge) We have two possible outcomes: With cache bypass: 1) cache reader is bypassed 2) sstable reader is created on range [1, 10] 3) unrefreshed tablet_sstable_set holds stale state, but select correctly all sstables intersecting with range [1, 10] With cache: 1) cache reader is created 2) finds partition with token 5 is cached 3) sstable reader is created on range [1, 4] (later would fast forward to range [6, 10]; also belongs to tablet 0) 4) incremental selector consumes the pre-merge sstable spanning range [1, 5] 4.1) since the partitioned_sstable_set pre-merge contains only that sstable, EOS is reached 4.2) since EOS is reached, the fast forward to range [6, 10] is not allowed. So with the set refreshed, sstable set is aligned with tablet ranges, and no premature EOS is signalled, otherwise preventing fast forward to from happening and all data from being properly captured in the read. This change fixes the bug and triggers a mutation source refresh whenever the number of tablets for the table has changed, not only when we have incoming tablets. Additionally, includes a fix for range reads that span more than one tablet, which can happen during split execution. Fixes: https://github.com/scylladb/scylladb/issues/23313 This change needs to be backported to all supported versions which implement tablet merge. 
- (cherry picked from commit `d0329ca370`) - (cherry picked from commit `1f9f724441`) - (cherry picked from commit `53df911145`) Parent PR: #24287 Closes scylladb/scylladb#24339 * github.com:scylladb/scylladb: replica: Fix range reads spanning sibling tablets test: add reproducer and test for mutation source refresh after merge tablets: trigger mutation source refresh on tablet count change	2025-06-17 08:35:14 +03:00
Raphael S. Carvalho	79958472bc	replica: Fix range reads spanning sibling tablets We don't guarantee that coordinators will only emit range reads that span only one tablet. Consider this scenario: 1) split is about to be finalized, barrier is executed, completes. 2) coordinator starts a read, uses pre-split erm (split not committed to group0 yet) 3) split is committed to group0, all replicas switch storage. 4) replica-side read is executed, uses a range which spans tablets. We could fix it with two-phase split execution. Rather than pushing the complexity to higher levels, let's fix incremental selector which should be able to serve all the tokens owned by a given shard. During split execution, either of sibling tablets aren't going anywhere since it runs with state machine locked, so a single read spanning both sibling tablets works as long as the selector works across tablet boundaries. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `53df911145`)	2025-06-15 09:14:38 -03:00
Ferenc Szili	ba192c1a29	test: add reproducer and test for mutation source refresh after merge This change adds a reproducer and test for the fix where the local mutation source is not always refreshed after a tablet merge. (cherry picked from commit `1f9f724441`)	2025-06-15 09:14:37 -03:00
Robert Bindar	a926cba476	Add support for nodetool refresh --skip-reshape This patch adds the new option in nodetool, patches the load_new_ss_tables REST request with a new parameter and skips the reshape step in refresh if this flag is passed. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#24409 Fixes: #24365 (cherry picked from commit `ca1a9c8d01`) Closes scylladb/scylladb#24472	2025-06-13 14:06:19 +03:00
Michał Chojnowski	9c28b812ca	db/config: add an option that disables dict-aware sstable compressors in DDL statements For reasons, we want to be able to disallow dictionary-aware compressors in chosen deployments. This patch adds a knob for that. When the knob is disabled, dictionary-aware compressors will be rejected in the validation stage of CREATE and ALTER statements. Closes scylladb/scylladb#24355 (cherry picked from commit `7d26d3c7cb`) Closes scylladb/scylladb#24454	2025-06-13 14:03:32 +03:00
Michael Litvak	d792916e8e	test_cdc_generation_clearing: wait for generations to propagate In test_cdc_generation_clearing we trigger events that update CDC generations, verify the generations are updated as expected, and verify the system topology and CDC generations are consistent on all nodes. Before checking that all nodes are consistent and have the same CDC generations, we need to consider that the changes are propagated through raft and take some time to propagate to all nodes. Currently, we wait for the change to be applied only on the first server which runs the CDC generation publisher fiber and read the CDC generations from this single node. The consistency check that follows could fail if the change was not propagated to some other node yet. To fix that, before checking consistency with all nodes, we execute a read barrier on all nodes so they all see the same state as the leader. Fixes scylladb/scylladb#24407 Closes scylladb/scylladb#24433 (cherry picked from commit `8aeb404893`) Closes scylladb/scylladb#24450	2025-06-10 15:50:40 +03:00
Ernest Zaslavsky	4fed3a5a5a	encryption_test: Catch exact exception Apparently `test_kms_network_error` will succeed under any circumstances since most of our exceptions derive from `std::exception`, so whatever happens to the test, and for whatever reason it throws, the test will be marked as passed. Start catching the exact exception that we expect to be thrown. Maybe somewhat related to https://github.com/scylladb/scylladb/issues/22628 Fixes: https://github.com/scylladb/scylladb/issues/24145 reapplies reverted: https://github.com/scylladb/scylladb/pull/24065 Should be backported to 2025.2. Closes scylladb/scylladb#24242 (cherry picked from commit `a39b773d36`) Closes scylladb/scylladb#24402	2025-06-06 08:48:02 +03:00
Pavel Emelyanov	024af57bd5	nodetool: Add refresh --skip-cleanup option The option "conflicts" with load-and-stream. Tests and doc included. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `c0796244bb`)	2025-06-05 17:52:13 +03:00
Botond Dénes	a5251b4d44	Merge '[Backport 2025.2] Add --scope arg to `notedool refresh`' from Scylladb[bot] This PR adds the `--scope` option to `nodetool refresh`. Like in the case of `nodetool restore`, you can pass either of: * `node` - On the local node. * `rack` - On the local rack. * `dc` - In the datacenter (DC) where the local node lives. * `all` (default) - Everywhere across the cluster. as scope. The feature is based on the existing load_and_stream paths, so it requires passing `--load-and-stream` to the `refresh` command, although this might change in the near future. Fixes https://github.com/scylladb/scylladb/issues/23564 - (cherry picked from commit `c570941692`) Parent PR: #23861 Closes scylladb/scylladb#24379 * github.com:scylladb/scylladb: Add nodetool refresh --scope option Refactor out code from test_restore_with_streaming_scopes Refactor out code from test_restore_with_streaming_scopes Refactor out code from test_restore_with_streaming_scopes Refactor out code from test_restore_with_streaming_scopes Refactor out code from test_restore_with_streaming_scopes	2025-06-05 11:54:17 +03:00
Robert Bindar	b62264e1d9	Add nodetool refresh --scope option This change adds the --scope option to nodetool refresh. Like in the case of nodetool restore, you can pass either of: * node - On the local node. * rack - On the local rack. * dc - In the datacenter (DC) where the local node lives. * all (default) - Everywhere across the cluster. as scope. The feature is based on the existing load_and_stream paths, so it requires passing --load-and-stream to the refresh command. Also, it is not compatible with the --primary-replica-only option. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#23861 (cherry picked from commit `c570941692`)	2025-06-04 11:59:17 +03:00
Robert Bindar	36cc0f8e7e	Refactor out code from test_restore_with_streaming_scopes part 5: check_data_is_back Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> (cherry picked from commit `548a1ec20a`)	2025-06-04 11:54:07 +03:00
Robert Bindar	a885c87547	Refactor out code from test_restore_with_streaming_scopes part 4: compute_scope Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> (cherry picked from commit `29309ae533`)	2025-06-04 11:54:01 +03:00
Robert Bindar	371fc05943	Refactor out code from test_restore_with_streaming_scopes part 3: create_dataset Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> (cherry picked from commit `a0f0580a9c`)	2025-06-04 11:53:51 +03:00
Robert Bindar	4366cd5a81	Refactor out code from test_restore_with_streaming_scopes part 2: take_snapshot Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> (cherry picked from commit `5171ca385a`)	2025-06-04 11:53:43 +03:00
Robert Bindar	38ee119112	Refactor out code from test_restore_with_streaming_scopes part 1: create_cluster Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> (cherry picked from commit `f09bb20ac4`)	2025-06-04 11:53:32 +03:00
Piotr Dulikowski	6edf92a9e3	Merge '[Backport 2025.2] test/boost: Adjust tests to RF-rack-valid keyspaces' from Scylladb[bot] This PR adjusts existing Boost tests so they respect the invariant introduced by enabling `rf_rack_valid_keyspaces` configuration option. We disable it explicitly in more problematic tests. After that, we enable the option by default in the whole test suite. Fixes scylladb/scylladb#23958 Backport: backporting to 2025.1 to be able to test the implementation there too. - (cherry picked from commit `6e2fb79152`) - (cherry picked from commit `e4e3b9c3a1`) - (cherry picked from commit `1199c68bac`) - (cherry picked from commit `cd615c3ef7`) - (cherry picked from commit `fa62f68a57`) - (cherry picked from commit `22d6c7e702`) - (cherry picked from commit `237638f4d3`) - (cherry picked from commit `c60035cbf6`) Parent PR: scylladb/scylladb#23802 Closes scylladb/scylladb#24368 * github.com:scylladb/scylladb: test/lib/cql_test_env.cc: Enable rf_rack_valid_keyspaces by default test/boost/tablets_test.cc: Explicitly disable rf_rack_valid_keyspaces in problematic tests test/boost/tablets_test.cc: Fix indentation in test_load_balancing_with_random_load test/boost/tablets_test.cc: Adjust test_load_balancing_with_random_load to RF-rack-validity test/boost/tablets_test.cc: Adjust test_load_balancing_works_with_in_progress_transitions to RF-rack-validity test/boost/tablets_test.cc: Adjust test_load_balancing_resize_requests to RF-rack-validity test/boost/tablets_test.cc: Adjust test_load_balancing_with_two_empty_nodes to RF-rack-validity test/boost/tablets_test.cc: Adjust test_load_balancer_shuffle_mode to RF-rack-validity	2025-06-04 10:24:35 +02:00
Nadav Har'El	609ad01bbc	alternator: hide internal tags from users The "tags" mechanism in Alternator is a convenient way to attach metadata to Alternator tables. Recently we have started using it more and more for internal metadata storage: * UpdateTimeToLive stores the attribute in a tag system:ttl_attribute * CreateTable stores provisioned throughput in tags system:provisioned_rcu and system:provisioned_wcu * CreateTable stores the table's creation time in a tag called system:table_creation_time. We do not want any of these internal tags to be visible to a ListTagsOfResource request, because if they are visible (as before this patch), systems such as Terraform can get confused when they suddenly see a tag which they didn't set - and may even attempt to delete it (as reported in issue #24098). Moreover, we don't want any of these internal tags to be writable with TagResource or UntagResource: If a user wants to change the TTL setting they should do it via UpdateTimeToLive - not by writing directly to tags. So in this patch we forbid read or write to any tag that begins with the "system:" prefix, except one: "system:write_isolation". That tag is deliberately intended to be writable by the user, as a configuration mechanism, and is never created internally by Scylla. We should have perhaps chosen a different prefix for configurable vs. internal tags, or chosen more unique prefixes - but let's not change these historic names now. This patch also adds regression tests for the internal tags features, failing before this patch and passing after: 1. internal tags, specifically system:ttl_attribute, are not visible in ListTagsOfResource, and cannot be modified by TagResource or UntagResource. 2. system:write_isolation is not internal, and can be written by either TagResource or UntagResource, and read with ListTagsOfResource. This patch also fixes a bug in the test where we added more checks for system:write_isolation - test_tag_resource_write_isolation_values. 
This test forgot to remove the system:write_isolation tags from test_table when it ended, which would lead to other tests that run later to run with a non-default write isolation - something which we never intended. Fixes #24098. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#24299 (cherry picked from commit `6cbcabd100`) Closes scylladb/scylladb#24377	2025-06-04 09:56:33 +03:00
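The tag rule described in this commit (hide and protect everything under the "system:" prefix except "system:write_isolation") can be sketched as follows. This is a minimal Python illustration of the rule only, not Alternator's implementation; the function names `is_internal_tag`, `visible_tags`, and `check_user_write` are hypothetical.

```python
from typing import Dict

INTERNAL_PREFIX = "system:"
# The one "system:" tag that the commit message says stays user-visible and writable.
USER_CONFIGURABLE = frozenset({"system:write_isolation"})


def is_internal_tag(key: str) -> bool:
    """True if the tag is internal metadata that users must not see or modify."""
    return key.startswith(INTERNAL_PREFIX) and key not in USER_CONFIGURABLE


def visible_tags(tags: Dict[str, str]) -> Dict[str, str]:
    """Filter a table's tags the way a ListTagsOfResource response would be filtered."""
    return {k: v for k, v in tags.items() if not is_internal_tag(k)}


def check_user_write(key: str) -> None:
    """Reject a TagResource/UntagResource request that touches an internal tag."""
    if is_internal_tag(key):
        raise PermissionError(f"tag {key!r} is reserved for internal use")


if __name__ == "__main__":
    stored = {
        "system:ttl_attribute": "expiration",
        "system:write_isolation": "always",
        "owner": "team-a",
    }
    print(visible_tags(stored))
```

Keeping the exception list in one set means a future user-configurable tag would only need one new entry there.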
Dawid Mędrek	5130ec84de	test/lib/cql_test_env.cc: Enable rf_rack_valid_keyspaces by default We've adjusted all of the Boost tests so they respect the invariant enforced by the `rf_rack_valid_keyspaces` configuration option, or explicitly disabled the option in those that turned out to be more problematic and will require more attention. Thanks to that, we can now enable it by default in the test suite. (cherry picked from commit `c60035cbf6`)	2025-06-03 11:10:16 +00:00
Dawid Mędrek	9938183ace	test/boost/tablets_test.cc: Explicitly disable rf_rack_valid_keyspaces in problematic tests Some of the tests in the file verify more subtle parts of the behavior of tablets and rely on topology layouts or using keyspaces that violate the invariant the `rf_rack_valid_keyspaces` configuration option is trying to enforce. Because of that, we explicitly disable the option to be able to enable it by default in the rest of the test suite in the following commit. (cherry picked from commit `237638f4d3`)	2025-06-03 11:10:16 +00:00
Dawid Mędrek	1271b42848	test/boost/tablets_test.cc: Fix indentation in test_load_balancing_with_random_load (cherry picked from commit `22d6c7e702`)	2025-06-03 11:10:16 +00:00
Dawid Mędrek	012e248792	test/boost/tablets_test.cc: Adjust test_load_balancing_with_random_load to RF-rack-validity We make sure that the keyspaces created in the test are always RF-rack-valid. To achieve that, we change how the test is performed. Before this commit, we first created a cluster and then ran the actual test logic multiple times. Each of those test cases created a keyspace with a random replication factor. That cannot work with `rf_rack_valid_keyspaces` set to true. We cannot modify the property file of a node (see commit: `eb5b52f598`), so once we set up the cluster, we cannot adjust its layout to work with another replication factor. To solve that issue, we also recreate the cluster in each test case. Now we choose the replication factor at random, create a cluster distributing nodes across as many racks as RF, and perform the rest of the logic. We perform it multiple times in a loop so that the test behaves as before these changes. (cherry picked from commit `fa62f68a57`)	2025-06-03 11:10:16 +00:00
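The per-iteration approach described in this commit (pick a random RF, then build a cluster with as many racks as RF) can be sketched like this. It is a simplified Python model, not the Boost test itself: the rack names, the round-robin placement, and the helpers `assign_racks` and `run_cases` are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple


def assign_racks(node_count: int, rf: int) -> Dict[str, List[int]]:
    """Spread node indices round-robin over exactly `rf` racks (hypothetical layout)."""
    if rf < 1 or node_count < rf:
        raise ValueError("need at least one node per rack")
    racks: Dict[str, List[int]] = {f"rack{r + 1}": [] for r in range(rf)}
    for node in range(node_count):
        racks[f"rack{node % rf + 1}"].append(node)
    return racks


def run_cases(
    iterations: int, node_count: int, max_rf: int, seed: int
) -> List[Tuple[int, Dict[str, List[int]]]]:
    """Recreate the layout for every case, since it cannot be changed afterwards."""
    rng = random.Random(seed)
    cases = []
    for _ in range(iterations):
        rf = rng.randint(1, max_rf)  # random replication factor for this case
        cases.append((rf, assign_racks(node_count, rf)))
    return cases


if __name__ == "__main__":
    for rf, layout in run_cases(iterations=3, node_count=6, max_rf=3, seed=0):
        print(rf, layout)
```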
Dawid Mędrek	1364eec694	test/boost/tablets_test.cc: Adjust test_load_balancing_works_with_in_progress_transitions to RF-rack-validity We distribute the nodes used in the test across two racks so we can run the test with `rf_rack_valid_keyspaces` set to true. We want to avoid cross-rack migrations and keep the test as realistic as possible. Since host3 is supposed to function as a new node in the cluster, we change the layout of it: now, host1 has 2 shards and resides in a separate rack. Most of the remaining test logic is preserved and behaves as before this commit. There is a slight difference in the tablet migrations. Before the commit, we were migrating a tablet between nodes of different shard counts. Now it's impossible because it would force us to migrate tablets between racks. However, since the test wants to simply verify that an ongoing migration doesn't interfere with load balancing and still leads to a perfect balance, that still happens: we explicitly migrate ONLY 1 tablet from host2 to host3, so to achieve the goal, one more tablet needs to be migrated, and we test that. (cherry picked from commit `cd615c3ef7`)	2025-06-03 11:10:16 +00:00
Dawid Mędrek	85fe37a8e4	test/boost/tablets_test.cc: Adjust test_load_balancing_resize_requests to RF-rack-validity We assign the nodes created by the test to separate racks. It has no impact on the test since the keyspace used in the test uses RF=2, so the tablet replicas will still be the same. (cherry picked from commit `1199c68bac`)	2025-06-03 11:10:16 +00:00
Dawid Mędrek	e21bdbb9ef	test/boost/tablets_test.cc: Adjust test_load_balancing_with_two_empty_nodes to RF-rack-validity We distribute the nodes used in the test between two racks. Although that may affect how tablets behave in general, this change will not have any real impact on the test. The test verifies that load balancing eventually balances tablets in the cluster, which will still happen. Because of that, the changes in this commit are safe to apply. (cherry picked from commit `e4e3b9c3a1`)	2025-06-03 11:10:16 +00:00
Dawid Mędrek	ca8762885b	test/boost/tablets_test.cc: Adjust test_load_balancer_shuffle_mode to RF-rack-validity We distribute the nodes used in the test between two racks. Although that may have an impact on how tablets behave, it's orthogonal to what the test verifies -- whether the topology coordinator is continuously in the tablet migration track. Because of that, it's safe to make this change without influencing the test. (cherry picked from commit `6e2fb79152`)	2025-06-03 11:10:15 +00:00
Michał Chojnowski	3a7a1dc4a9	test/boost/sstable_compressor_factory_test: define a test suite name It seems that tests in test/boost/combined_tests have to define a test suite name, otherwise they aren't picked up by test.py. Fixes #24199 Closes scylladb/scylladb#24200 (cherry picked from commit `ff8a119f26`) Closes scylladb/scylladb#24255	2025-06-03 12:01:35 +03:00
Botond Dénes	9a7ea917eb	mutation/mutation_compactor: cache regular/shadowable max-purgeable in separate members Max purgeable has two possible values for each partition: one for regular tombstones and one for shadowable ones. Yet currently a single member is used to cache the max-purgeable value for the partition, so whichever kind of tombstone is checked first, its max-purgeable will become sticky and apply to the other kind of tombstones too. E.g. if the first can_gc() check is for a regular tombstone, its max-purgeable will apply to shadowable tombstones in the partition too, meaning they might not be purged, even though they are purgeable, as the shadowable max-purgeable is expected to be more lenient. The other way around is worse, as it will result in regular tombstone being incorrectly purged, permitted by the more lenient shadowable tombstone max-purgeable. Fix this by caching the two possible values in two separate members. A reproducer unit test is also added. Fixes: scylladb/scylladb#23272 Closes scylladb/scylladb#24171 (cherry picked from commit `7db956965e`) Closes scylladb/scylladb#24329	2025-06-03 09:51:52 +03:00
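The fix described in this commit, caching the two max-purgeable values in separate members, can be illustrated with a simplified Python model. This is not the C++ compactor code: the class `MaxPurgeableCache`, the `compute` callback, and the reduced rule "a tombstone is purgeable if its timestamp is below max-purgeable" are assumptions made for the sketch.

```python
from typing import Callable, Optional


class MaxPurgeableCache:
    """Per-partition cache with one slot per tombstone kind (hypothetical model)."""

    def __init__(self, compute: Callable[[bool], int]) -> None:
        # compute(is_shadowable) returns the max-purgeable timestamp for that kind.
        self._compute = compute
        self._regular: Optional[int] = None
        self._shadowable: Optional[int] = None

    def get(self, is_shadowable: bool) -> int:
        # Each kind fills and reads only its own slot, so the first lookup
        # cannot become "sticky" for the other kind.
        if is_shadowable:
            if self._shadowable is None:
                self._shadowable = self._compute(True)
            return self._shadowable
        if self._regular is None:
            self._regular = self._compute(False)
        return self._regular

    def can_gc(self, tombstone_timestamp: int, is_shadowable: bool) -> bool:
        # Simplified rule: purge only tombstones older than max-purgeable.
        return tombstone_timestamp < self.get(is_shadowable)


if __name__ == "__main__":
    # Shadowable max-purgeable is more lenient (higher) than the regular one.
    cache = MaxPurgeableCache(lambda shadowable: 200 if shadowable else 100)
    print(cache.can_gc(150, is_shadowable=True))
    print(cache.can_gc(150, is_shadowable=False))
```

With a single shared slot, whichever of the two calls ran first would decide the answer for both; with two slots the result does not depend on call order.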
Michael Litvak	5aca2c134d	test_cdc_generation_publishing: fix to read monotonically The test test_multiple_unpublished_cdc_generations reads the CDC generation timestamps to verify they are published in the correct order. To do so it issues reads in a loop with a short sleep period and checks the differences between consecutive reads, assuming they are monotonic. However the assumption that the reads are monotonic is not valid, because the reads are issued with consistency_level=ONE, thus we may read timestamps {A,B} from some node, then read timestamps {A} from another node that didn't apply the write of the new timestamp B yet. This will trigger the assert in the test and fail. To ensure the reads are monotonic we change the test to use consistency level ALL for the reads. Fixes scylladb/scylladb#24262 Closes scylladb/scylladb#24272 (cherry picked from commit `3a1be33143`) Closes scylladb/scylladb#24336	2025-06-02 14:42:57 +03:00
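The non-monotonic read described in this commit can be reproduced with a toy model of two replicas. This is a hedged sketch rather than the test's code: replicas are modeled as plain Python sets, `read_one` and `read_all` are hypothetical stand-ins for reads at consistency levels ONE and ALL, and a read at ALL is approximated as the union of what every replica holds.

```python
from typing import List, Set


def read_one(replicas: List[Set[str]], index: int) -> Set[str]:
    """Model of a CL=ONE read: the answer comes from a single replica."""
    return set(replicas[index])


def read_all(replicas: List[Set[str]]) -> Set[str]:
    """Model of a CL=ALL read: the answer merges the data of every replica."""
    merged: Set[str] = set()
    for replica in replicas:
        merged |= replica
    return merged


def is_monotonic(reads: List[Set[str]]) -> bool:
    """Each read must contain everything the previous read returned."""
    return all(prev <= cur for prev, cur in zip(reads, reads[1:]))


if __name__ == "__main__":
    # Replica 0 has applied the write of timestamp B; replica 1 has not yet.
    replicas = [{"A", "B"}, {"A"}]
    print(is_monotonic([read_one(replicas, 0), read_one(replicas, 1)]))
    print(is_monotonic([read_all(replicas), read_all(replicas)]))
```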
Pavel Emelyanov	eb78d3aefb	test/result_utils: Do not assume map_reduce reducing order When map_reduce is called on a collection, one shouldn't expect that it processes the elements of the collection in any specific order. Current test of map-reduce over boost outcome assumes that if reduce function is the string concatenation, then it would concatenate the given vector of strings in the order they are listed. That requirement should be relaxed, and the result may have reversed concatenation. Fixes scylladb/scylladb#24321 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#24325 (cherry picked from commit `a65ffdd0df`) Closes scylladb/scylladb#24337	2025-06-02 14:00:07 +03:00
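The relaxed check described in this commit can be sketched as an order-tolerant assertion. This Python snippet only illustrates the idea and is not the Boost test: the helper `is_valid_concat` is hypothetical, and it accepts exactly the two outcomes the commit message mentions, the listed order and the reversed order.

```python
from functools import reduce
from typing import Sequence


def is_valid_concat(result: str, parts: Sequence[str]) -> bool:
    """Accept the concatenation in listed order or in reversed order."""
    forward = "".join(parts)
    backward = "".join(reversed(parts))
    return result in (forward, backward)


if __name__ == "__main__":
    parts = ["a", "b", "c"]
    # Two reducers that visit the same elements in opposite orders.
    left_to_right = reduce(lambda acc, s: acc + s, parts, "")
    right_to_left = reduce(lambda acc, s: acc + s, reversed(parts), "")
    print(is_valid_concat(left_to_right, parts), is_valid_concat(right_to_left, parts))
```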

8826 Commits