scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 20:27:03 +00:00

Author	SHA1	Message	Date
Botond Dénes	1ad5214bdd	Merge '[Backport 2025.1] mutation: check key of inserted rows' from Scylladb[bot] Make sure the keys are full prefixes as it is expected to be the case for rows. At severeal occasions we have seen empty row keys make their ways into the sstables, despite the fact that they are not allowed by the CQL frontend. This means that such empty keys are possibly results of memory corruption or use-after-{free,copy} errors. The source of the corruption is impossible to pinpoint when the empty key is discovered in the sstable. So this patch adds checks for such keys to places where mutations are built: when building or unserializing mutations. Fixes: https://github.com/scylladb/scylladb/issues/24506 Not a typical backport candidate (not a bugfix or regression fix), but we should still backport so we have the additional checks deployed to existing production clusters. - (cherry picked from commit `8b756ea837`) - (cherry picked from commit `ab96c703ff`) Parent PR: #24497 Closes scylladb/scylladb#24739 * github.com:scylladb/scylladb: mutation: check key of inserted rows compound: optimize is_full() for single-component types	2025-07-02 12:03:29 +03:00
Lakshmi Narayanan Sreethar	99ec69e27d	utils/big_decimal: fix scale overflow when parsing values with large exponents The exponent of a big decimal string is parsed as an int32, adjusted for the removed fractional part, and stored as an int32. When parsing values like `1.23E-2147483647`, the unscaled value becomes `123`, and the scale is adjusted to `2147483647 + 2 = 2147483649`. This exceeds the int32 limit, and since the scale is stored as an int32, it overflows and wraps around, losing the value. This patch fixes that the by parsing the exponent as an int64 value and then adjusting it for the fractional part. The adjusted scale is then checked to see if it is still within int32 limits before storing. An exception is thrown if it is not within the int32 limits. Note that strings with exponents that exceed the int32 range, like `0.01E2147483650`, were previously not parseable as a big decimal. They are now accepted if the final adjusted scale fits within int32 limits. For the above value, unscaled_value = 1 and scale = -2147483648, so it is now accepted. This is in line with how Java's `BigDecimal` parses strings. Fixes: #24581 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#24640 (cherry picked from commit `279253ffd0`) Closes scylladb/scylladb#24691	2025-07-02 11:34:38 +03:00
Szymon Malewski	1d78f17697	utils/exceptions.cc: Added check for `exceptions::request_timeout_exception` in `is_timeout_exception` function. It solves the issue, where in some cases a timeout exceptions in CAS operations are logged incorrectly as a general failure. Fixes #24591 Closes scylladb/scylladb#24619 (cherry picked from commit `f28bab741d`) Closes scylladb/scylladb#24684	2025-07-02 11:30:26 +03:00
Botond Dénes	864ba576c5	Merge '[Backport 2025.1] tablets: fix missing data after tablet merge ' from Scylladb[bot] Consider the following scenario: 1) let's assume tablet 0 has range [1, 5] (pre merge) 2) tablet merge happens, tablet 0 has now range [1, 10] 3) tablet_sstable_set isn't refreshed, so holds a stale state, thinks tablet 0 still has range [1, 5] 4) during a full scan, forward service will intersect the full range with tablet ranges and consume one tablet at a time 5) replica service is asked to consume range [1, 10] of tablet 0 (post merge) We have two possible outcomes: With cache bypass: 1) cache reader is bypassed 2) sstable reader is created on range [1, 10] 3) unrefreshed tablet_sstable_set holds stale state, but select correctly all sstables intersecting with range [1, 10] With cache: 1) cache reader is created 2) finds partition with token 5 is cached 3) sstable reader is created on range [1, 4] (later would fast forward to range [6, 10]; also belongs to tablet 0) 4) incremental selector consumes the pre-merge sstable spanning range [1, 5] 4.1) since the partitioned_sstable_set pre-merge contains only that sstable, EOS is reached 4.2) since EOS is reached, the fast forward to range [6, 10] is not allowed. So with the set refreshed, sstable set is aligned with tablet ranges, and no premature EOS is signalled, otherwise preventing fast forward to from happening and all data from being properly captured in the read. This change fixes the bug and triggers a mutation source refresh whenever the number of tablets for the table has changed, not only when we have incoming tablets. Additionally, includes a fix for range reads that span more than one tablet, which can happen during split execution. Fixes: https://github.com/scylladb/scylladb/issues/23313 This change needs to be backported to all supported versions which implement tablet merge. - (cherry picked from commit `d0329ca370`) - (cherry picked from commit `1f9f724441`) - (cherry picked from commit `53df911145`) Parent PR: #24287 Closes scylladb/scylladb#24338 * github.com:scylladb/scylladb: replica: Fix range reads spanning sibling tablets test: add reproducer and test for mutation source refresh after merge tablets: trigger mutation source refresh on tablet count change	2025-07-02 11:28:35 +03:00
Gleb Natapov	583d33ac8c	storage_proxy: retry paxos repair even if repair write succeeded After paxos state is repaired in begin_and_repair_paxos we need to re-check the state regardless if write back succeeded or not. This is how the code worked originally but it was unintentionally changed when co-routinized in `61b2e41a23`. Fixes #24630 Closes scylladb/scylladb#24651 (cherry picked from commit `5f953eb092`) Closes scylladb/scylladb#24701	2025-07-02 10:18:56 +02:00
Jenkins Promoter	33c79319d2	Update pgo profiles - aarch64	2025-07-01 04:32:49 +03:00
Jenkins Promoter	c50093a5dc	Update pgo profiles - x86_64	2025-07-01 04:11:16 +03:00
Botond Dénes	76efa77466	test/boost/memtable_test: only inject error for test table Currently the test indiscriminately injects failures into the flushes of any table, via the IO extension mechanism. The tests want to check that the node correctly handles the IO error by self isolating, however the indiscriminate IO errors can have unintended consequences when they hit raft, leading to disorderly shutdown and failure of the tests. Testing raft's resiliency to IO errors if of course worth doing, but it is not the goal of this particular test, so to avoid the fallout, the IO errors are limited to the test tables only. Fixes: https://github.com/scylladb/scylladb/issues/24637 Closes scylladb/scylladb#24638 (cherry picked from commit `ee6d7c6ad9`) Closes scylladb/scylladb#24741	2025-06-30 20:15:07 +03:00
Anna Stuchlik	acbe88172a	doc: remove OSS mention from the SI notes This commit removes a confusing reference to an Open Source version form the Local Secondary Indexes page. Fixes https://github.com/scylladb/scylladb/issues/24668 Closes scylladb/scylladb#24673 (cherry picked from commit `2367330513`) Closes scylladb/scylladb#24722	2025-06-30 18:54:29 +03:00
Raphael S. Carvalho	52ae6c2aba	replica: Fix range reads spanning sibling tablets We don't guarantee that coordinators will only emit range reads that span only one tablet. Consider this scenario: 1) split is about to be finalized, barrier is executed, completes. 2) coordinator starts a read, uses pre-split erm (split not committed to group0 yet) 3) split is committed to group0, all replicas switch storage. 4) replica-side read is executed, uses a range which spans tablets. We could fix it with two-phase split execution. Rather than pushing the complexity to higher levels, let's fix incremental selector which should be able to serve all the tokens owned by a given shard. During split execution, either of sibling tablets aren't going anywhere since it runs with state machine locked, so a single read spanning both sibling tablets works as long as the selector works across tablet boundaries. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `53df911145`)	2025-06-30 10:50:59 -03:00
Ferenc Szili	b0140cb646	test: add reproducer and test for mutation source refresh after merge This change adds a reproducer and test for the fix where the local mutation source is not always refreshed after a tablet merge. (cherry picked from commit `1f9f724441`)	2025-06-30 10:50:53 -03:00
Botond Dénes	d37efb0c08	mutation: check key of inserted rows Make sure the keys are full prefixes as it is expected to be the case for rows. At severeal occasions we have seen empty row keys make their ways into the sstables, despite the fact that they are not allowed by the CQL frontend. This means that such empty keys are possibly results of memory corruption or use-after-{free,copy} errors. The source of the corruption is impossible to pinpoint when the empty key is discovered in the sstable. So this patch adds checks for such keys to places where mutations are built: when building or unserializing mutations. The test row_cache_test/test_reading_of_nonfull_keys needs adjustment to work with the changes: it has to make the schema use compact storage, otherwise the non-full changes used by this tests are rejected by the new checks. Fixes: https://github.com/scylladb/scylladb/issues/24506 (cherry picked from commit `ab96c703ff`)	2025-06-30 12:43:04 +00:00
Botond Dénes	3af0561078	compound: optimize is_full() for single-component types For such compounds, unserializing the key is not necessary to determine whether the key is full or not. (cherry picked from commit `8b756ea837`)	2025-06-30 12:43:03 +00:00
Abhinav Jha	2b43fb9841	group0: modify `start_operation` logic to account for synchronize phase race condition In the present scenario, the bootstrapping node undergoes synchronize phase after initialization of group0, then enters post_raft phase and becomes fully ready for group0 operations. The topology coordinator is agnostic of this and issues stream ranges command as soon as the node successfully completes `join_group0`. Although for a node booting into an already upgraded cluster, the time duration for which, node remains in synchronize phase is negligible but this race condition causes trouble in a small percentage of cases, since the stream ranges operation fails and node fails to bootstrap. This commit addresses this issue and updates the error throw logic to account for this edge case and lets the node wait (with timeouts) for synchronize phase to get over instead of throwing error. A regression test is also added to confirm the working of this code change. The test adds a wait in synchronize phase for newly joining node and releases only after the program counter reaches the synchronize case in the `start_operation` function. Hence it indicates that in the updated code, the start_operation will wait for the node to get done with the synchronize phase instead of throwing error. This PR fixes a bug. Hence we need to backport it. Fixes: scylladb/scylladb#23536 Closes scylladb/scylladb#23829 (cherry picked from commit `5ff693eff6`) Closes scylladb/scylladb#24627	2025-06-29 14:33:01 +03:00
Avi Kivity	837f3eb6c2	Merge '[Backport 2025.1] main: don't start maintenance auth service if not enabled' from Scylladb[bot] In `f96d30c2b5` we introduced the maintenance service, which is an additional instance of auth::service. But this service has a somewhat confusing 2-level startup mechanism: it's initialized with sharded<Service>::start and then auth::service::start (different method with the same name to confuse even more). When maintenance_socket was disabled (default setting), the code did only the first part of the startup. This registered a config observer but didn't create a permission_cache instance. As a result, a crash on SIGHUP when config is reloaded can occur. Fixes: https://github.com/scylladb/scylladb/issues/24528 Backport: all not eol versions since 6.0 and 2025.1 - (cherry picked from commit `97c60b8153`) - (cherry picked from commit `dd01852341`) Parent PR: #24527 Closes scylladb/scylladb#24569 * github.com:scylladb/scylladb: test: add test for live updates of permissions cache config main: don't start maintenance auth service if not enabled	2025-06-29 14:32:35 +03:00
Aleksandra Martyniuk	b7cb4dd413	test: rest_api: fix test_repair_task_progress test_repair_task_progress checks the progress of children of root repair task. However, nothing ensures that the children are already created. Wait until at least one child of a root repair task is created. Fixes: #24556. Closes scylladb/scylladb#24560 (cherry picked from commit `0deb9209a0`) Closes scylladb/scylladb#24654	2025-06-28 09:39:52 +03:00
Marcin Maliszkiewicz	d27150e1ef	test: add test for live updates of permissions cache config (cherry picked from commit `dd01852341`)	2025-06-27 16:06:03 +02:00
Marcin Maliszkiewicz	58446fba62	main: don't start maintenance auth service if not enabled In `f96d30c2b5` we introduced the maintenance service, which is an additional instance of auth::service. But this service has a somewhat confusing 2-level startup mechanism: it's initialized with sharded<Service>::start and then auth::service::start (different method with the same name to confuse even more). When maintenance_socket was disabled (default setting), the code did only the first part of the startup. This registered a config observer but didn't create a permission_cache instance. As a result, a crash on SIGHUP when config is reloaded can occur. (cherry picked from commit `97c60b8153`)	2025-06-27 16:06:03 +02:00
Gleb Natapov	be1fb59806	api: return error from get_host_id_map if gossiper is not enabled yet. Token metadata api is initialized before gossiper is started. get_host_id_map REST endpoint cannot function without the fully initialized gossiper though. The gossiper is started deep in the join_cluster call chain, but if we move token_metadata api initialization after the call it means that no api will be available during bootstrap. This is not what we want. Make a simple fix by returning an error from the api if the gossiper is not initialized yet. Fixes: #24479 Closes scylladb/scylladb#24575 (cherry picked from commit `e364995e28`) Closes scylladb/scylladb#24584	2025-06-24 10:05:15 +03:00
Anna Stuchlik	987c57c889	doc: improve the tablets limitations section This PR improves the Limitations and Unsupported Features section for tablets, as it has been confusing to the customers. Refs https://github.com/scylladb/scylla-enterprise/issues/5465 Fixes https://github.com/scylladb/scylladb/issues/24562 Closes scylladb/scylladb#24563 (cherry picked from commit `17eabbe712`) Closes scylladb/scylladb#24586	2025-06-24 10:05:01 +03:00
Pavel Emelyanov	d431d8a99c	Merge '[Backport 2025.1] memtable: ensure _flushed_memory doesn't grow above total_memory' from Scylladb[bot] `dirty_memory_manager` tracks two quantities about memtable memory usage: "real" and "unspooled" memory usage. "real" is the total memory usage (sum of `occupancy().total_space()`) by all memtable LSA regions, plus a upper-bound estimate of the size of memtable data which has already moved to the cache region but isn't evictable (merged into the cache) yet. "unspooled" is the difference between total memory usage by all memtable LSA regions, and the total flushed memory (sum of `_flushed_memory`) of memtables. `dirty_memory_manager` controls the shares of compaction and/or blocks writes when these quantities cross various thresholds. "Total flushed memory" isn't a well defined notion, since the actual consumption of memory by the same data can vary over time due to LSA compactions, and even the data present in memtable can change over the course of the flush due to removals of outdated MVCC versions. So `_flushed_memory` is merely an approximation computed by `flush_reader` based on the data passing through it. This approximation is supposed to be a conservative lower bound. In particular, `_flushed_memory` should be not greater than `occupancy().total_space()`. Otherwise, for example, "unspooled" memory could become negative (and/or wrap around) and weird things could happen. There is an assertion in `~flush_memory_accounter` which checks that `_flushed_memory < occupancy().total_space()` at the end of flush. But it can fail. Without additional treatment, the memtable reader sometimes emits data which is already deleted. (In particular, it emites rows covered by a partition tombstone in a newer MVCC version.) This data is seen by `flush_reader` and accounted in `_flushed_memory`. But this data can be garbage-collected by the `mutation_cleaner` later during the flush and decrease `total_memory` below `_flushed_memory`. There is a piece of code in `mutation_cleaner` intended to prevent that. If `total_memory` decreases during a `mutation_cleaner` run, `_flushed_memory` is lowered by the same amount, just to preserve the asserted property. (This could also make `_flushed_memory` quite inaccurate, but that's considered acceptable). But that only works if `total_memory` is decreased during that run. It doesn't work if the `total_memory` decrease (enabled by the new allocator holes made by `mutation_cleaner`'s garbage collection work) happens asynchronously (due to memory reclaim for whatever reason) after the run. This patch fixes that by tracking the decreases of `total_memory` closer to the source. Instead of relying on `mutation_cleaner` to notify the memtable if it lowers `total_memory`, the memtable itself listens for notifications about LSA segment deallocations. It keeps `_flushed_memory` equal to the reader's estimate of flushed memory decreased by the change in `total_memory` since the beginning of flush (if it was positive), and it keeps the amount of "spooled" memory reported to the `dirty_memory_manager` at `max(0, _flushed_memory)`. Fixes scylladb/scylladb#21413 Backport candidate because it fixes a crash that can happen in existing stable branches. - (cherry picked from commit `7d551f99be`) - (cherry picked from commit `975e7e405a`) Parent PR: #21638 Closes scylladb/scylladb#24602 * github.com:scylladb/scylladb: memtable: ensure _flushed_memory doesn't grow above total memory usage replica/memtable: move region_listener handlers from dirty_memory_manager to memtable	2025-06-24 10:03:50 +03:00
Piotr Dulikowski	810ce1c67a	Merge '[Backport 2025.1] test/cluster: Adjust tests to RF-rack-valid keyspaces' from Scylladb[bot] In this PR, we're adjusting most of the cluster tests so that they pass with the `rf_rack_valid_keyspaces` configuration option enabled. In most cases, the changes are straightforward and require little to no additional insight into what the tests are doing or verifying. In some, however, doing that does require a deeper understanding of the tests we're modifying. The justification for those changes and their correctness is included in the commit messages corresponding to them. Note that this PR does not cover all of the cluster tests. There are few remaining ones, but they require a bit more effort, so we delegate that work to a separate PR. I tested all of the modified tests locally with `rf_rack_valid_keyspaces` set to true, and they all passed. Fixes scylladb/scylladb#23959 Backport: we want to backport these changes to 2025.1 since that's the version where we introduced RF-rack-valid keyspaces in. Although the tests are not, by default, run with `rf_rack_valid_keyspaces` enabled yet, that will most likely change in the near future and we'll also want to backport those changes too. The reason for this is that we want to verify that Scylla works correctly even with that constraint. - (cherry picked from commit `dbb8835fdf`) - (cherry picked from commit `9281bff0e3`) - (cherry picked from commit `5b83304b38`) - (cherry picked from commit `73b22d4f6b`) - (cherry picked from commit `2882b7e48a`) - (cherry picked from commit `4c46551c6b`) - (cherry picked from commit `92f7d5bf10`) - (cherry picked from commit `5d1bb8ebc5`) - (cherry picked from commit `d3c0cd6d9d`) - (cherry picked from commit `04567c28a3`) - (cherry picked from commit `c8c28dae92`) - (cherry picked from commit `c4b32c38a3`) - (cherry picked from commit `ee96f8dcfc`) Parent PR: #23661 Closes scylladb/scylladb#24120 * github.com:scylladb/scylladb: test/{topology,topology_custom,object_store}/suite.yaml: Enable rf_rack_valid_keyspaces in suites test/topology, test/topology_custom: Disable rf_rack_valid_keyspaces in problematic tests test/topology_custom/test_tablets: Divide rack into two to adjust tests to RF-rack-validity test/topology_custom/test_tablets: Adjust test_tablet_rf_change to RF-rack-validity test/topology_custom/test_tablet_repair_scheduler.py: Adjust to RF-rack-validity test/pylib/repair.py: Assign nodes to multiple racks in create_table_insert_data_for_repair test/topology_custom/test_zero_token_nodes_topology_ops: Adjust to RF-rack-validity test/topology_custom/test_zero_token_nodes_no_replication.py: Adjust to RF-rack-validity test/topology_custom/test_zero_token_nodes_multidc.py: Adjust to RF-rack-validity test/topology_custom/test_not_enough_token_owners.py: Adjust to RF-rack-validity test/topology_custom/test_multidc.py: Adjust to RF-rack-validity test/object_store/test_backup.py: Adjust to RF-rack-validity test/topology, test/topology_custom: Adjust simple tests to RF-rack-validity	2025-06-23 09:05:46 +02:00
Michał Chojnowski	dbb3f5ab17	memtable: ensure _flushed_memory doesn't grow above total memory usage dirty_memory_manager tracks two quantities about memtable memory usage: "real" and "unspooled" memory usage. "real" is the total memory usage (sum of `occupancy().total_space()`) by all memtable LSA regions, plus a upper-bound estimate of the size of memtable data which has already moved to the cache region but isn't evictable (merged into the cache) yet. "unspooled" is the difference between total memory usage by all memtable LSA regions, and the total flushed memory (sum of `_flushed_memory`) of memtables. dirty_memory_manager controls the shares of compaction and/or blocks writes when these quantities cross various thresholds. "Total flushed memory" isn't a well defined notion, since the actual consumption of memory by the same data can vary over time due to LSA compactions, and even the data present in memtable can change over the course of the flush due to removals of outdated MVCC versions. So `_flushed_memory` is merely an approximation computed by `flush_reader` based on the data passing through it. This approximation is supposed to be a conservative lower bound. In particular, `_flushed_memory` should be not greater than `occupancy().total_space()`. Otherwise, for example, "unspooled" memory could become negative (and/or wrap around) and weird things could happen. There is an assertion in ~flush_memory_accounter which checks that `_flushed_memory < occupancy().total_space()` at the end of flush. But it can fail. Without additional treatment, the memtable reader sometimes emits data which is already deleted. (In particular, it emites rows covered by a partition tombstone in a newer MVCC version.) This data is seen `flush_reader` and accounted in `_flushed_memory`. But this data can be garbage-collected by the mutation_cleaner later during the flush and decrease `total_memory` below `_flushed_memory`. There is a piece of code in mutation_cleaner intended to prevent that. If `total_memory` decreases during a `mutation_cleaner` run, `_flushed_memory` is lowered by the same amount, just to preserve the asserted property. (This could also make `_flushed_memory` quite inaccurate, but that's considered acceptable). But that only works if `total_memory` is decreased during that run. It doesn't work if the `total_memory` decrease (enabled by the new allocator holes made by `mutation_cleaner`'s garbage collection work) happens asynchronously (due to memory reclaim for whatever reason) after the run. This patch fixes that by tracking the decreases of `total_memory` closer to the source. Instead of relying on `mutation_cleaner` to notify the memtable if it lowers `total_memory`, the memtable itself listens for notifications about LSA segment deallocations. It keeps `_flushed_memory` equal to the reader's estimate of flushed memory decreased by the change in `total_memory` since the beginning of flush (if it was positive), and it keeps the amount of "spooled" memory reported to the `dirty_memory_manager` at `max(0, _flushed_memory)`. (cherry picked from commit `975e7e405a`)	2025-06-22 17:37:47 +00:00
Michał Chojnowski	b2302d8a46	replica/memtable: move region_listener handlers from dirty_memory_manager to memtable The memtable wants to listen for changes in its `total_memory` in order to decrease its `_flushed_memory` in case some of the freed memory has already been accounted as flushed. (This can happen because the flush reader sees and accounts even outdated MVCC versions, which can be deleted and freed during the flush). Today, the memtable doesn't listen to those changes directly. Instead, some calls which can affect `total_memory` (in particular, the mutation cleaner) manually check the value of `total_memory` before and after they run, and they pass the difference to the memtable. But that's not good enough, because `total_memory` can also change outside of those manually-checked calls -- for example, during LSA compaction, which can occur anytime. This makes memtable's accounting inaccurate and can lead to unexpected states. But we already have an interface for listening to `total_memory` changes actively, and `dirty_memory_manager`, which also needs to know it, does just that. So what happens e.g. when `mutation_cleaner` runs is that `mutation_cleaner` checks the value of `total_memory` before it runs, then it runs, causing several changes to `total_memory` which are picked up by `dirty_memory_manager`, then `mutation_cleaner` checks the end value of `total_memory` and passes the difference to `memtable`, which corrects whatever was observed by `dirty_memory_manager`. To allow memtable to modify its `_flushed_memory` correctly, we need to make `memtable` itself a `region_listener`. Also, instead of the situation where `dirty_memory_manager` receives `total_memory` change notifications from `logalloc` directly, and `memtable` fixes the manager's state later, we want to only the memtable listen for the notifications, and pass them already modified accordingl to the manager, so there is no intermediate wrong states. This patch moves the `region_listener` callbacks from the `dirty_memory_manager` to the `memtable`. It's not intended to be a functional change, just a source code refactoring. The next patch will be a functional change enabled by this. (cherry picked from commit `7d551f99be`)	2025-06-22 17:37:47 +00:00
Pavel Emelyanov	b6a4065ac1	Update seastar submodule (no nested stall backtraces) * seastar ed31c1ce...b2ff08a5 (1): > stall_detector: no backtrace if exception Fixes #24464 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#24540	2025-06-19 10:07:45 +03:00
Dawid Mędrek	5a41282bb3	test/{topology,topology_custom,object_store}/suite.yaml: Enable rf_rack_valid_keyspaces in suites Almost all of the tests have been adjusted to be able to be run with the `rf_rack_valid_keyspaces` configuration option enabled, while the rest, a minority, create nodes with it disabled. Thanks to that, we can enable it by default, so let's do that. (cherry picked from commit `ee96f8dcfc`)	2025-06-18 16:47:49 +02:00
Dawid Mędrek	6506eddcfd	test/topology, test/topology_custom: Disable rf_rack_valid_keyspaces in problematic tests Some of the tests in the test suite have proven to be more problematic in adjusting to RF-rack-validity. Since we'd like to run as many tests as possible with the `rf_rack_valid_keyspaces` configuration option enabled, let's disable it in those. In the following commit, we'll enable it by default. (cherry picked from commit `c4b32c38a3`)	2025-06-18 14:21:53 +02:00
Dawid Mędrek	a63f1737d1	test/topology_custom/test_tablets: Divide rack into two to adjust tests to RF-rack-validity Three tests in the file use a multi-DC cluster. Unfortunately, they put all of the nodes in a DC in the same rack and because of that, they fail when run with the `rf_rack_valid_keyspaces` configuration option enabled. Since the tests revolve mostly around zero-token nodes and how they affect replication in a keyspace, this change should have zero impact on them. (cherry picked from commit `c8c28dae92`)	2025-06-18 14:21:51 +02:00
Dawid Mędrek	720c80239d	test/topology_custom/test_tablets: Adjust test_tablet_rf_change to RF-rack-validity We reduce the number of nodes and the RF values used in the test to make sure that the test can be run with the `rf_rack_valid_keyspaces` configuration option. The test doesn't seem to be reliant on the exact number of nodes, so the reduction should not make any difference. (cherry picked from commit `04567c28a3`)	2025-06-18 14:21:48 +02:00
Dawid Mędrek	dd7da8a9b9	test/topology_custom/test_tablet_repair_scheduler.py: Adjust to RF-rack-validity The change boils down to matching the number of created racks to the number of created nodes in each DC in the auxiliary function `prepare_multi_dc_repair`. This way, we ensure that the created keyspace will be RF-rack-valid and so we can run the test file even with the `rf_rack_valid_keyspaces` configuration option enabled. The change has no impact on the tests that use the function; the distribution of nodes across racks does not affect how repair is performed or what the tests do and verify. Because of that, the change is correct. (cherry picked from commit `d3c0cd6d9d`)	2025-06-18 14:21:46 +02:00
Dawid Mędrek	05fffbdea0	test/pylib/repair.py: Assign nodes to multiple racks in create_table_insert_data_for_repair We assign the newly created nodes to multiple racks. If RF <= 3, we create as many racks as the provided RF. We disallow the case of RF > 3 to avoid trying to create an RF-rack-invalid keyspace; note that no existing test calls `create_table_insert_data_for_repair` providing a higher RF. The rationale for doing this is we want to ensure that the tests calling the function can be run with the `rf_rack_valid_keyspaces` configuration option enabled. (cherry picked from commit `5d1bb8ebc5`)	2025-06-18 14:21:44 +02:00
Dawid Mędrek	b0e60c2f90	test/topology_custom/test_zero_token_nodes_topology_ops: Adjust to RF-rack-validity We assign the nodes to the same DC, but multiple racks to ensure that the created keyspace is RF-rack-valid and we can run the test with the `rf_rack_valid_keyspaces` configuration option enabled. The changes do not affect what the test does and verifies. (cherry picked from commit `92f7d5bf10`)	2025-06-18 14:21:42 +02:00
Dawid Mędrek	358cb2f82f	test/topology_custom/test_zero_token_nodes_no_replication.py: Adjust to RF-rack-validity We simply assign the nodes used in the test to seprate racks to ensure that the created keyspace is RF-rack-valid to be able to run the test with the `rf_rack_valid_keyspaces` configuration option set to true. The change does not affect what the test does and verifies -- it only depends on the type of nodes, whether they are normal token owners or not -- and so the changes are correct in that sense. (cherry picked from commit `4c46551c6b`)	2025-06-18 14:21:40 +02:00
Dawid Mędrek	50793d1109	test/topology_custom/test_zero_token_nodes_multidc.py: Adjust to RF-rack-validity We parameterize the test so it's run with and without enforced RF-rack-valid keyspaces. In the test itself, we introduce a branch to make sure that we won't run into a situation where we're attempting to create an RF-rack-invalid keyspace. Since the `rf_rack_valid_keyspaces` option is not commonly used yet and because its semantics will most likely change in the future, we decide to parameterize the test rather than try to get rid of some of the test cases that are problematic with the option enabled. (cherry picked from commit `2882b7e48a`)	2025-06-18 14:21:37 +02:00
Dawid Mędrek	822a131d30	test/topology_custom/test_not_enough_token_owners.py: Adjust to RF-rack-validity We simply assign DC/rack properties to every node used in the test. We put all of them in the same DC to make sure that the cluster behaves as closely to how it would before these changes. However, we distribute them over multiple racks to ensure that the keyspace used in the test is RF-rack-valid, so we can also run it with the `rf_rack_valid_keyspaces` configuration option set to true. The distribution of nodes between racks has no effect on what the test does and verifies, so the changes are correct in that sense. (cherry picked from commit `73b22d4f6b`)	2025-06-18 14:21:35 +02:00
Dawid Mędrek	8da3f7003c	test/topology_custom/test_multidc.py: Adjust to RF-rack-validity Instead of putting all of the nodes in a DC in the same rack in `test_putget_2dc_with_rf`, we assign them to different racks. The distribution of nodes in racks is orthogonal to what the test is doing and verifying, so the change is correct in that sense. At the same time, it ensures that the test never violates the invariant of RF-rack-valid keyspaces, so we can also run it with `rf_rack_valid_keyspaces` set to true. (cherry picked from commit `5b83304b38`)	2025-06-18 14:21:33 +02:00
Dawid Mędrek	60bc1e6c9d	test/object_store/test_backup.py: Adjust to RF-rack-validity We modify the parameters of `test_restore_with_streaming_scopes` so that it now represents a pair of values: topology layout and the value `rf_rack_valid_keyspaces` should be set to. Two of the already existing parameters violate RF-rack-validity and so the test would fail when run with `rf_rack_valid_keyspaces: true`. However, since the option isn't commonly used yet and since the semantics of RF-rack-valid keyspaces will most likely change in the future, let's keep those cases and just run them with the option disabled. This way, we still test everything we can without running into undesired failures that don't indicate anything. (cherry picked from commit `9281bff0e3`)	2025-06-18 14:21:31 +02:00
Dawid Mędrek	ebc1584823	test/topology, test/topology_custom: Adjust simple tests to RF-rack-validity We adjust all of the simple cases of cluster tests so they work with `rf_rack_valid_keyspaces: true`. It boils down to assigning nodes to multiple racks. For most of the changes, we do that by: * Using `pytest.mark.prepare_3_racks_cluster` instead of `pytest.mark.prepare_3_nodes_cluster`. * Using an additional argument -- `auto_rack_dc` -- when calling `ManagerClient::servers_add()`. In some cases, we need to assign the racks manually, which may be less obvious, but in every such situation, the tests didn't rely on that assignment, so that doesn't affect them or what they verify. (cherry picked from commit `dbb8835fdf`)	2025-06-18 14:21:28 +02:00
Michał Chojnowski	25cb5bc724	test/boost/mutation_reader_test: fix a use-after-free in `test_fast_forwarding_combined_reader_is_consistent_with_slicing` The contract in mutation_reader.hh says: ``` // pr needs to be valid until the reader is destroyed or fast_forward_to() // is called again. future<> fast_forward_to(const dht::partition_range& pr) { ``` `test_fast_forwarding_combined_reader_is_consistent_with_slicing` violates this by passing a temporary to `fast_forward_to`. Fix that. Fixes scylladb/scylladb#24542 Closes scylladb/scylladb#24543 (cherry picked from commit `27f66fb110`) Closes scylladb/scylladb#24547	2025-06-18 14:29:33 +03:00
Pavel Emelyanov	10ad2d9460	Merge '[Backport 2025.1] tablets: deallocate storage state on end_migration' from Scylladb[bot] When a tablet is migrated and cleaned up, deallocate the tablet storage group state on `end_migration` stage, instead of `cleanup` stage: * When the stage is updated from `cleanup` to `end_migration`, the storage group is removed on the leaving replica. * When the table is initialized, if the tablet stage is `end_migration` then we don't allocate a storage group for it. This happens for example if the leaving replica is restarted during tablet migration. If it's initialized in `cleanup` stage then we allocate a storage group, and it will be deallocated when transitioning to `end_migration`. This guarantees that the storage group is always deallocated on the leaving replica by `end_migration`, and that it is always allocated if the tablet wasn't cleaned up fully yet. It is a similar case also for the pending replica when the migration is aborted. We deallocate the state on `revert_migration` which is the stage following `cleanup_target`. Previously the storage group would be allocated when the tablet is initialized on any of the tablet replicas - also on the leaving replica, and when the tablet stage is `cleanup` or `end_migration`, and deallocated during `cleanup`. This fixes the following issue: 1. A migrating tablet enters cleanup stage 2. the tablet is cleaned up successfuly 3. The leaving replica is restarted, and allocates storage group 4. tablet cleanup is not called because it's already cleaned up 5. the storage group remains allocated on the leaving replica after the migration is completed - it's not cleaned up properly. Fixes https://github.com/scylladb/scylladb/issues/23481 backport to all relevant releases since it's a bug that results in a crash - (cherry picked from commit `34f15ca871`) - (cherry picked from commit `fb18fc0505`) - (cherry picked from commit `bd88ca92c8`) Parent PR: #24393 Closes scylladb/scylladb#24487 * github.com:scylladb/scylladb: test/cluster/test_tablets: test restart during tablet cleanup test: tablets: add get_tablet_info helper tablets: deallocate storage state on end_migration	2025-06-18 14:29:04 +03:00
Anna Stuchlik	607d39609d	doc: add support for z3 GCP This commit adds support for z3-highmem-highlssd instance types to Cloud Instance Recommendations for GCP. Fixes https://github.com/scylladb/scylladb/issues/24511 Closes scylladb/scylladb#24533 (cherry picked from commit `648d8caf27`) Closes scylladb/scylladb#24544	2025-06-17 18:26:34 +03:00
Michael Litvak	bc8289da8c	test/cluster/test_tablets: test restart during tablet cleanup Add a test that reproduces issue scylladb/scylladb#23481. The test migrates a tablet from one node to another, and while the tablet is in some stage of cleanup - either before or right after, depending on the parameter - the leaving replica, on which the tablet is cleaned, is restarted. This is interesting because when the leaving replica starts and loads its state, the tablet could be in different stages of cleanup - the SSTables may still exist or they may have been cleaned up already, and we want to make sure the state is loaded correctly. (cherry picked from commit `bd88ca92c8`)	2025-06-17 18:11:26 +03:00
Michael Litvak	7f3883d072	test: tablets: add get_tablet_info helper Add a helper for tests to get the tablet info from system.tablets for a tablet owning a given token. (cherry picked from commit `fb18fc0505`)	2025-06-17 18:11:26 +03:00
Michael Litvak	c45d465100	tablets: deallocate storage state on end_migration When a tablet is migrated and cleaned up, deallocate the tablet storage group state on `end_migration` stage, instead of `cleanup` stage: * When the stage is updated from `cleanup` to `end_migration`, the storage group is removed on the leaving replica. * When the table is initialized, if the tablet stage is `end_migration` then we don't allocate a storage group for it. This happens for example if the leaving replica is restarted during tablet migration. If it's initialized in `cleanup` stage then we allocate a storage group, and it will be deallocated when transitioning to `end_migration`. This guarantees that the storage group is always deallocated on the leaving replica by `end_migration`, and that it is always allocated if the tablet wasn't cleaned up fully yet. It is a similar case also for the pending replica when the migration is aborted. We deallocate the state on `revert_migration` which is the stage following `cleanup_target`. Previously the storage group would be allocated when the tablet is initialized on any of the tablet replicas - also on the leaving replica, and when the tablet stage is `cleanup` or `end_migration`, and deallocated during `cleanup`. This fixes the following issue: 1. A migrating tablet enters cleanup stage 2. the tablet is cleaned up successfuly 3. The leaving replica is restarted, and allocates storage group 4. tablet cleanup is not called because it was already cleaned up 4. the storage group remains allocated on the leaving replica after the migration is completed - it's not cleaned up properly. Fixes scylladb/scylladb#23481 (cherry picked from commit `34f15ca871`)	2025-06-17 18:11:26 +03:00
Anna Stuchlik	6592334520	doc: remove the limitation for disabling CDC This commit removes the instruction to stop all writes before disabling CDC with ALTER. Fixes https://github.com/scylladb/scylla-docs/issues/4020 Closes scylladb/scylladb#24406 (cherry picked from commit `b0ced64c88`) Closes scylladb/scylladb#24475	2025-06-17 13:15:49 +03:00
Andrzej Jackowski	fa1244c2ca	test: wait for normal state propagation in test_auth_v2_migration By default, cluster tests have skip_wait_for_gossip_to_settle=0 and ring_delay_ms=0. In tests with gossip topology, it may lead to a race, where nodes see different state of each other. In case of test_auth_v2_migration, there are three nodes. If the first node already knows that the third node is NORMAL, and the second node does not, the system_auth tables can return incomplete results. To avoid such a race, this commit adds a check that all nodes see other nodes as NORMAL before any writes are done. Refs: #24163 Closes scylladb/scylladb#24185 (cherry picked from commit `555d897a15`) Closes scylladb/scylladb#24519	2025-06-17 13:15:01 +03:00
Ferenc Szili	ea822c6c44	tablets: trigger mutation source refresh on tablet count change Consider the following scenario: - let's assume tablet 0 has range [1, 5] (pre merge) - tablet merge happens, tablet 0 has now range [1, 10] - tablet_sstable_set isn't refreshed, so holds a stale state, thinks tablet 0 still has range [1, 5] - during a full scan, forward service will intersect the full range with tablet ranges and consume one tablet at a time - replica service is asked to consume range [1, 10] of tablet 0 (post merge) We have two possible outcomes: With cache bypass: 1) cache reader is bypassed 2) sstable reader is created on range [1, 10] 3) unrefreshed tablet_sstable_set holds stale state, but select correctly all sstables intersecting with range [1, 10] With cache: 1) cache reader is created 2) finds partition with token 5 is cached 3) sstable reader is created on range [1, 4] (later would fast forward to range [6, 10]; also belongs to tablet 0) 4) incremental selector consumes the pre-merge sstable spanning range [1, 5] 4.1) since the partitioned_sstable_set pre-merge contains only that sstable, EOS is reached 4.2) since EOS is reached, the fast forward to range [6, 10] is not allowed. So with the set refreshed, sstable set is aligned with tablet ranges, and no premature EOS is signalled, otherwise preventing fast forward to from happening and all data from being properly captured in the read. This change fixes the bug and triggeres a mutation source refresh whenever the number of tablets for the table has changed, not only when we have incoming tablets. Fixes: #23313 (cherry picked from commit `d0329ca370`)	2025-06-17 09:23:22 +00:00
Dawid Mędrek	730e16f475	test/cluster/mv: Adjust test to RF-rack-valid keyspaces We adjust the test in the directory so that all of the used keyspaces are RF-rack-valid throughout the their execution. Refs scylladb/scylladb#23428 Closes scylladb/scylladb#23490 (cherry picked from commit `10589e966f`) Closes scylladb/scylladb#24508	2025-06-17 07:33:22 +02:00
Botond Dénes	05476e8aba	Merge '[Backport 2025.1] test/cluster/test_read_repair.py: improve trace logging test (again)' from Scylladb[bot] The test test_read_repair_with_trace_logging wants to test read repair with trace logging. Turns out that node restart + trace-level logging + debug mode is too much and even with 1 minute timeout, the read repair times out sometimes. Refactor the test to use injection point instead of restart. To make sure the test still tests what it supposed to test, use tracing to assert that read repair did indeed happen. Fixes: scylladb/scylladb#23968 Needs backport to 2025.1 and 6.2, both have the flaky test - (cherry picked from commit `51025de755`) - (cherry picked from commit `29eedaa0e5`) Parent PR: #23989 Closes scylladb/scylladb#24049 * github.com:scylladb/scylladb: test/cluster/test_read_repair.py: improve trace logging test (again) test/cluster: extract execute_with_tracing() into pylib/util.py	2025-06-16 06:55:13 +03:00
Jenkins Promoter	e7a1183d9c	Update pgo profiles - aarch64	2025-06-15 04:29:52 +03:00

1 2 3 4 5 ...

46813 Commits