Almost all of the tests have been adjusted so they can run with
the `rf_rack_valid_keyspaces` configuration option enabled, while
the rest, a minority, create nodes with it disabled. Thanks to that,
we can enable the option by default, so let's do that.
Some of the tests in the test suite have proven more problematic
to adjust to RF-rack-validity. Since we'd like to run as many tests
as possible with the `rf_rack_valid_keyspaces` configuration option
enabled, let's disable it in those tests. In the following commit,
we'll enable it by default.
Three tests in the file use a multi-DC cluster. Unfortunately, they put
all of the nodes in a DC in the same rack and because of that, they fail
when run with the `rf_rack_valid_keyspaces` configuration option enabled.
Since the tests revolve mostly around zero-token nodes and how they
affect replication in a keyspace, this change should have zero impact on
them.
We reduce the number of nodes and the RF values used in the test
to make sure that the test can be run with the `rf_rack_valid_keyspaces`
configuration option enabled. The test doesn't seem to rely on the
exact number of nodes, so the reduction should not make any difference.
The change boils down to matching the number of created racks to the number
of created nodes in each DC in the auxiliary function `prepare_multi_dc_repair`.
This way, we ensure that the created keyspace will be RF-rack-valid and so
we can run the test file even with the `rf_rack_valid_keyspaces` configuration
option enabled.
The change has no impact on the tests that use the function; the distribution
of nodes across racks does not affect how repair is performed or what the
tests do and verify. Because of that, the change is correct.
We assign the newly created nodes to multiple racks. If RF <= 3,
we create as many racks as the provided RF. We disallow the case
of RF > 3 to avoid trying to create an RF-rack-invalid keyspace;
note that no existing test calls `create_table_insert_data_for_repair`
with a higher RF. The rationale is that we want to ensure
that the tests calling the function can be run with the
`rf_rack_valid_keyspaces` configuration option enabled.
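For illustration, a minimal sketch of the rack-assignment rule described
above (the helper name and rack naming are made up, not the actual test
code):

```python
def rack_assignments(num_nodes: int, rf: int) -> list[str]:
    # With RF <= 3 we create exactly RF racks and assign nodes to them
    # round-robin, so a keyspace with that RF is RF-rack-valid.
    if rf > 3:
        raise ValueError("RF > 3 would require an RF-rack-invalid keyspace")
    return [f"rack{i % rf + 1}" for i in range(num_nodes)]

# For example, 6 nodes with RF=3:
# ['rack1', 'rack2', 'rack3', 'rack1', 'rack2', 'rack3']
print(rack_assignments(6, 3))
```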
We assign the nodes to the same DC, but multiple racks to ensure that
the created keyspace is RF-rack-valid and we can run the test with
the `rf_rack_valid_keyspaces` configuration option enabled. The changes
do not affect what the test does and verifies.
We simply assign the nodes used in the test to separate racks to
ensure that the created keyspace is RF-rack-valid, so we can run
the test with the `rf_rack_valid_keyspaces` configuration option
set to true. The change does not affect what the test does and
verifies -- it only depends on the type of nodes, whether they are
normal token owners or not -- and so the changes are correct in
that sense.
We parameterize the test so it's run with and without enforced
RF-rack-valid keyspaces. In the test itself, we introduce a branch
to make sure that we won't run into a situation where we're
attempting to create an RF-rack-invalid keyspace.
Since the `rf_rack_valid_keyspaces` option is not commonly used yet
and because its semantics will most likely change in the future, we
decide to parameterize the test rather than try to get rid of some
of the test cases that are problematic with the option enabled.
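As a rough illustration, the parameterization could take a shape like the
following sketch (fixture and helper names are hypothetical; only the
`rf_rack_valid_keyspaces` parameter name comes from this change):

```python
import pytest

@pytest.mark.parametrize("rf_rack_valid_keyspaces", [True, False])
@pytest.mark.asyncio
async def test_example(manager, rf_rack_valid_keyspaces: bool):
    # How the option reaches the node config is an assumption here.
    await start_cluster(manager,
                        config={"rf_rack_valid_keyspaces": rf_rack_valid_keyspaces})
    await run_common_cases(manager)
    if not rf_rack_valid_keyspaces:
        # This branch creates a keyspace that violates RF-rack-validity,
        # so it only runs when the option is disabled.
        await run_rf_rack_invalid_cases(manager)
```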
We simply assign DC/rack properties to every node used in the test.
We put all of them in the same DC to make sure that the cluster behaves
as closely as possible to how it did before these changes. However, we distribute
them over multiple racks to ensure that the keyspace used in the test
is RF-rack-valid, so we can also run it with the `rf_rack_valid_keyspaces`
configuration option set to true. The distribution of nodes between racks
has no effect on what the test does and verifies, so the changes are
correct in that sense.
Instead of putting all of the nodes in a DC in the same rack
in `test_putget_2dc_with_rf`, we assign them to different racks.
The distribution of nodes in racks is orthogonal to what the test
is doing and verifying, so the change is correct in that sense.
At the same time, it ensures that the test never violates the
invariant of RF-rack-valid keyspaces, so we can also run it
with `rf_rack_valid_keyspaces` set to true.
We modify the parameters of `test_restore_with_streaming_scopes`
so that it now represents a pair of values: topology layout and
the value `rf_rack_valid_keyspaces` should be set to.
Two of the already existing parameters violate RF-rack-validity
and so the test would fail when run with `rf_rack_valid_keyspaces: true`.
However, since the option isn't commonly used yet and since the
semantics of RF-rack-valid keyspaces will most likely change in
the future, let's keep those cases and just run them with the
option disabled. This way, we still test everything we can
without running into undesired failures that don't indicate anything.
We adjust all of the simple cases of cluster tests so they work
with `rf_rack_valid_keyspaces: true`. It boils down to assigning
nodes to multiple racks. For most of the changes, we do that by:
* Using `pytest.mark.prepare_3_racks_cluster` instead of
`pytest.mark.prepare_3_nodes_cluster`.
* Using an additional argument -- `auto_rack_dc` -- when calling
`ManagerClient::servers_add()`.
In some cases, we need to assign the racks manually, which may be
less obvious, but in every such situation, the tests didn't rely
on that assignment, so that doesn't affect them or what they verify.
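For illustration, a sketch of the two mechanical changes (the marker and
the `auto_rack_dc` argument are named in this message; the surrounding
call shape and its exact semantics are assumptions):

```python
import pytest

# Before: @pytest.mark.prepare_3_nodes_cluster and a plain servers_add(3).
@pytest.mark.prepare_3_racks_cluster
@pytest.mark.asyncio
async def test_example(manager):
    # auto_rack_dc (assumed semantics) asks the test harness to spread the
    # new servers over distinct racks within the given DC.
    servers = await manager.servers_add(3, auto_rack_dc="dc1")
```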
Some background:
When a merge happens, a background fiber wakes up to merge the compaction
groups of sibling tablets into the main one. This cannot happen while the
storage group list is being rebuilt, since the token metadata update is
not preemptable. So a storage group, post merge, has the main
compaction group and two other groups to be merged into the main one.
When the merge happens, those two groups are empty and will be
freed.
Consider this scenario:
1) merge happens, from 2 to 1 tablet
2) produces a single storage group, containing main and two
other compaction groups to be merged into main.
3) take_storage_snapshot(), triggered by migration post merge,
gets a list of pointers to all compaction groups.
4) t__s__s() iterates first on the main group, then yields.
5) background fiber wakes up, moves the data into the main group
and frees the other two groups.
6) t__s__s() advances to the other groups, which were freed in
step 5.
7) segmentation fault
In addition to memory corruption, there's also a potential for
data to escape the iteration in take_storage_snapshot(), since
data can be moved across compaction groups in the background, all
belonging to the same storage group. That could result in
data loss.
Readers should all operate at the storage group level, since it can
provide a view over all the data owned by a tablet replica.
The movement of an sstable from group A to B is atomic, but
iterating first on A and later on B might miss data that
was moved from B to A before the iteration reached B.
By switching to the storage group in the interface that retrieves
groups by token range, we guarantee that all data of a given
replica can be found regardless of which compaction group it
sits in.
Fixes #23162.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#24058
A decommissioned node is removed from the raft config after the operation is
marked as completed. This is required since otherwise the decommissioned
node will not see that the decommission has completed (the status is
propagated through raft). But right after the decommission is marked as
completed, the decommissioned node may terminate, so in case of a two-node
cluster, the configuration change that removes it from the raft group will
fail, because there will be no quorum.
The solution is to mark the decommissioning node as a non-voter before
reporting the operation as completed.
Fixes: #24026
Backport to 2025.2 because it fixes a potential hang. Don't backport to
branches older than 2025.2 because they don't have
8b186ab0ff, which caused this issue.
Closes scylladb/scylladb#24027
The test is failing in CI sometimes due to performance reasons.
There are at least two problems:
1. The initial 500ms (wall time) sleep might be too short. If the reclaimer
doesn't manage to evict enough memory during this time, the test will fail.
2. During the 100ms (thread CPU time) window given by the test to background
reclaim, the `background_reclaim` scheduling group isn't actually
guaranteed to get any CPU, regardless of shares. If the process is
switched out inside the `background_reclaim` group, it might
accumulate so much vruntime that it won't get any more CPU again
for a long time.
We have seen both.
This kind of timing test can't be run reliably on overcommitted machines
without modifying the Seastar scheduler to support that (by e.g. using
thread clock instead of wall time clock in the scheduler), and that would
require an amount of effort disproportionate to the value of the test.
So for now, to unflake the test, this patch removes the performance test
part. (And the tradeoff is a weakening of the test). After the patch,
we only check that the background reclaim happens *eventually*.
Fixes https://github.com/scylladb/scylladb/issues/15677
Backporting this is optional. The test is flaky even in stable branches, but the failure is rare.
Closes scylladb/scylladb#24030
* github.com:scylladb/scylladb:
logalloc_test: don't test performance in test `background_reclaim`
logalloc: make background_reclaimer::free_memory_threshold publicly visible
Use the host id to check whether the update is for the node itself. Using the IP is unreliable, since if a node is restarted with a different IP, a gossiper message with the previous IP can be misinterpreted as belonging to a different node.
Fixes: #22777
Backport to 2025.1 since this fixes a crash. Older versions do not have the code.
Closes scylladb/scylladb#24000
* https://github.com/scylladb/scylladb:
test: add reproducer for #22777
storage_service: Do not remove gossiper entry on address change
storage_service: use id to check for local node
Materialized Views and Secondary Indexes are two more features that
keyspaces with tablets do not support, but they were not listed in the
warning message returned to the user for the CREATE KEYSPACE statement.
This commit adds the two missing features.
Fixes: #24006
Closes scylladb/scylladb#23902
Before this change, if a read executor had just enough targets to
achieve the query's CL and there was a connection drop (e.g. a node
failure), the read executor waited for the entire request timeout to give
drivers time to execute a speculative read in the meantime. Such behavior
doesn't work well when a very long query timeout (e.g. 1800s) is set,
because the unfinished request blocks topology changes.
This change implements a mechanism to throw a new
read_failure_exception_with_timeout in the aforementioned scenario.
The exception is caught by the CQL server, which conducts the waiting
after the ERM is released. The new exception inherits from
read_failure_exception, because layers that don't catch the exception
(such as the mapreduce service) should handle it just like a regular
read_failure. However, when the CQL server catches the exception, it
returns read_timeout_exception to the client because after the additional
waiting such an error message is more appropriate (read_timeout_exception
was also returned before this change was introduced).
This change:
- Rewrite cql_server::connection::process_request_one to use
seastar::futurize_invoke and try_catch<> instead of utils::result_try
- Add a new read_failure_exception_with_timeout and throw it in storage_proxy
- Add sleep in CQL server when the new exception is caught
- Catch local exceptions in Mapreduce Service and convert them
to std::runtime_error.
- Add get_cql_exclusive to manager_client.py
- Add test_long_query_timeout_erm
No backport needed - minor issue fix.
Closes scylladb/scylladb#23156
* github.com:scylladb/scylladb:
test: add test_long_query_timeout_erm
test: add get_cql_exclusive to manager_client.py
mapreduce: catch local read_failure_exception_with_timeout
transport: storage_proxy: release ERM when waiting for query timeout
transport: remove redundant references in process_request_one
transport: fix the indentation in process_request_one
transport: add futures in CQL server exception handling
Pass through the local containers directory (it cannot
be bind-mounted to /var/lib/containers since podman checks
the path hasn't changed) with overrides to the paths. This
allows containers to be created inside the dbuild container,
so we can enlist pre-packaged software (such as opensearch)
in test.py. If the container images are already downloaded
in the host, they won't be downloaded again.
It turns out that the container ecosystem doesn't support
nested network namespaces well, so we configure the outer
container to use host networking for the inner containers.
It's useful anyway.
The frozen toolchain now installs podman and buildah so
there's something to actually drive those nested containers.
We disable weak dnf dependencies to avoid installing qemu.
The frozen toolchain is regenerated with optimized clang from
https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz
https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz
Closes scylladb/scylladb#24020
compress: distribute compression dictionaries over shards
We don't want each shard to have its own copy of each dictionary.
It would put unnecessary pressure on cache and memory.
Instead, we want to share dictionaries between shards.
Before this commit, all dictionaries live on shard 0.
All other shards borrow foreign shared pointers from shard 0.
There's a problem with this setup: dictionary blobs receive many random
accesses. If shard 0 is on a remote NUMA node, this could pose
a performance problem.
Therefore, for each dictionary, we would like to have one copy per NUMA node,
not one copy per the entire machine. And each shard should use the copy
belonging to its own NUMA node. This is the main goal of this patch.
There is another issue with putting all dicts on shard 0: it eats
an asymmetric amount of memory from shard 0.
This commit spreads the ownership of dicts over all shards within
the NUMA group, to make the situation more symmetric.
(Dict owner is decided based on the hash of dict contents).
It should be noted that the last part isn't necessarily a good thing,
though.
While it makes the situation more symmetric within each node,
it makes it less symmetric across the cluster, if different node
sizes are present.
If dicts occupy 1% of memory on each shard of a 100-shard node,
then the same dicts would occupy 100% of memory on a 1-shard node.
So for the sake of cluster-wide symmetry, we might later want to consider
e.g. making the memory limit for dictionaries inversely proportional
to the number of shards.
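As a toy model of the ownership rule (not ScyllaDB's actual code), picking
the owner by hashing the dictionary contents over the shards of one NUMA
group could look like this:

```python
import hashlib

def dict_owner_shard(dict_blob: bytes, shards_in_numa_group: list[int]) -> int:
    # Every shard computes the same owner for a given dictionary, so all of
    # them borrow it from that shard instead of keeping private copies.
    h = int.from_bytes(hashlib.sha256(dict_blob).digest()[:8], "little")
    return shards_in_numa_group[h % len(shards_in_numa_group)]

# Example: 8 shards on this NUMA node.
print(dict_owner_shard(b"dictionary trained on workload X", list(range(8))))
```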
New functionality, added to a feature which isn't in any stable branch yet. No backporting.
Closes scylladb/scylladb#23590
* github.com:scylladb/scylladb:
test: add test/boost/sstable_compressor_factory_test
compress: add some test-only APIs
compress: rename sstable_compressor_factory_impl to dictionary_holder
compress: fix indentation
compress: remove sstable_compressor_factory_impl::_owner_shard
compress: distribute compression dictionaries over shards
test: switch uses of make_sstable_compressor_factory() to a seastar::thread-dependent version
test: remove sstables::test_env::do_with()
When the schema is changed, the sstable set is updated according to the compaction strategy of the new schema (no changes to the set are actually made, just the underlying set type is updated), but the problem is that it happens without a lock, causing a use-after-free when running concurrently with another set update.
Example:
1) A: sstable set is being updated on compaction completion
2) B: schema change updates the set (it's non deferring, so it happens in one go) and frees the set used by A.
3) when A resumes, the system will likely crash since the set is freed already.
ASAN screams about it:
SUMMARY: AddressSanitizer: heap-use-after-free sstables/sstable_set.cc ...
The fix defers the update of the set on schema change to compaction, which is triggered after the new schema is set. Only the strategy state and backlog tracker are updated immediately, which is fine since the strategy doesn't depend on any particular implementation of the sstable set.
Fixes #22040.
Closes scylladb/scylladb#23680
* github.com:scylladb/scylladb:
replica: Fix use-after-free with concurrent schema change and sstable set update
sstables: Implement sstable_set_impl::all_sstable_runs()
test.py doesn't override stdin when starting Scylla, so when
tests are run from a terminal, isatty() returns true and
parsed command line output is not printed, which is inconvenient.
In this commit we add a check whether the current process group
controls the stdin terminal. This serves two purposes:
* improves the "interactive mode" check from scylladb/scylladb#18309,
as only the controlling process group can interact with the terminal.
* solves the test.py problem above, because test.py runs scylla in a new
session/process group (it calls setsid after fork), and is now
correctly not considered interactive.
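A rough Python model of the improved check (the real check lives in
Scylla's startup code): stdin must be a terminal and our process group
must be the terminal's foreground (controlling) group:

```python
import os

def is_interactive(fd: int = 0) -> bool:
    # Not a terminal at all -> not interactive.
    if not os.isatty(fd):
        return False
    try:
        # Only the foreground process group of the controlling terminal
        # can actually interact with it.
        return os.tcgetpgrp(fd) == os.getpgrp()
    except OSError:
        return False

# test.py runs scylla via setsid() in a new session, so the child's process
# group is not the terminal's foreground group and the check returns False.
print(is_interactive())
```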
Closes scylladb/scylladb#24047
The test checks that merging the partition versions on-the-fly using the
cursor gives the same results as merging them destructively with apply_monotonically.
In particular, it tests that the continuity of both results is equal.
However, there's a subtlety which makes this not true.
The cursor puts empty dummy rows (i.e. dummies shadowed by the partition
tombstone) in the output.
But the destructive merge is allowed (as an exception to the general
rule, for optimization reasons) to remove those dummies and thus reduce
the continuity.
So after this patch we instead check that the output of the cursor
has continuity equal to the merged continuities of the versions
(rather than to the continuity of the merged versions, which can be
smaller, as described above).
Refs https://github.com/scylladb/scylladb/pull/21459, a patch which did
the same in a different test.
Fixes https://github.com/scylladb/scylladb/issues/13642
Closes scylladb/scylladb#24044
The stream sink abort() method wants to remove the component file by its
path. For that, the path is calculated from the storage prefix and the
component basename, but there's already a filename() method for this.
SSTable filenames shouldn't be considered on-disk paths (see #23194),
but places that want that should be explicit and format the filename into
a string by hand.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#24039
The method is internally called by the sstable itself to refresh its state
after opening or assigning (from foreign info) data and index files.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#24041
There are only two callers of the method, and the one that wants
validation (sstable::load()) can do it on its own. This keeps the
other caller (the schema loader) simpler and shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#24038
Since sstable_compressor_factory_impl no longer
implements sstable_compressor_factory, the name can be
misleading. Rename it to something closer to its new role.
Before the series, sstable_compressor_factory_impl was directly
accessed by multiple shards. Now, it's a part of a `sharded`
data structure and is never accessed directly from other shards,
so there's no need to check for that. Remove the leftover logic.
We don't want each shard to have its own copy of each dictionary.
It would put unnecessary pressure on cache and memory.
Instead, we want to share dictionaries between shards.
Before this commit, all dictionaries live on shard 0.
All other shards borrow foreign shared pointers from shard 0.
There's a problem with this setup: dictionary blobs receive many random
accesses. If shard 0 is on a remote NUMA node, this could pose
a performance problem.
Therefore, for each dictionary, we would like to have one copy per NUMA node,
not one copy per the entire machine. And each shard should use the copy
belonging to its own NUMA node. This is the main goal of this patch.
There is another issue with putting all dicts on shard 0: it eats
an asymmetric amount of memory from shard 0.
This commit spreads the ownership of dicts over all shards within
the NUMA group, to make the situation more symmetric.
(Dict owner is decided based on the hash of dict contents).
It should be noted that the last part isn't necessarily a good thing,
though.
While it makes the situation more symmetric within each node,
it makes it less symmetric across the cluster, if different node
sizes are present.
If dicts occupy 1% of memory on each shard of a 100-shard node,
then the same dicts would occupy 100% of memory on a 1-shard node.
So for the sake of cluster-wide symmetry, we might later want to consider
e.g. making the memory limit for dictionaries inversely proportional
to the number of shards.
In the next patches, make_sstable_compressor_factory() will have to
disappear.
In preparation for that, we switch to a seastar::thread-dependent
replacement.
test_tablet_repair_hosts_filter checks whether the host filter
specified for tablet repair is correctly persisted. To check this,
we need to ensure that the repair is still ongoing and its data
is kept. The test achieves that by failing the repair on the replica
side, as the failed repair is going to be retried.
However, if the filter does not contain any host (included_host_count = 0),
the repair is started on no replica, so the request succeeds
and its data is deleted. The test fails if it checks the filter
after repair request data is removed.
Fail the repair on the topology coordinator side instead, so the request
remains ongoing regardless of the specified hosts.
Fixes: #23986.
Closes scylladb/scylladb#24003
`sstable_manager` depends on `sstable_compressor_factory&`.
Currently, `test_env` obtains an implementation of this
interface with the synchronous `make_sstable_compressor_factory()`.
But after this patch, the only implementation of the
`sstable_compressor_factory` interface will use `sharded<...>`,
so its construction will become asynchronous,
and the synchronous `make_sstable_compressor_factory()` must disappear.
There are several possible ways to deal with this, but I think the
easiest one is to write an asynchronous replacement for
`make_sstable_compressor_factory()`
that will keep the same signature but will be only usable
in a `seastar::thread`.
All other uses of `make_sstable_compressor_factory()` outside of
`test_env::do_with()` already are in seastar threads,
so if we just get rid of `test_env::do_with()`, then we will
be able to use that thread-dependent replacement. This is the
purpose of this commit.
We shouldn't be losing much.
I found on StackOverflow an interesting discussion about the fact that
DynamoDB's UpdateExpression documentation "recommends" using SET
instead of ADD, and the rather convoluted expression that is actually
needed to emulate ADD using SET:
```
SET #count = if_not_exists(#count, :zero) + :one
```
https://stackoverflow.com/questions/14077414/dynamodb-increment-a-key-value
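For reference, a sketch of the idiom with boto3 (the table name, key and
attribute name are made up):

```python
import boto3

table = boto3.resource("dynamodb").Table("tab")
# Works whether or not the item/attribute already exists: if_not_exists()
# supplies :zero as the starting value.
table.update_item(
    Key={"p": "item1"},
    UpdateExpression="SET #count = if_not_exists(#count, :zero) + :one",
    ExpressionAttributeNames={"#count": "counter"},
    ExpressionAttributeValues={":zero": 0, ":one": 1},
)
# The naive "SET #count = #count + :one" fails with ValidationException
# when the item or the attribute is missing.
```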
Although we do have separate tests for the different pieces of that
idiom - a SET with missing attribute or item, the if_not_exists()
function, etc. - I thought it would be nice to have a dedicated test
that verifies that this idiom actually works, and moreover that the more
naive "SET #count = #count + :one" does NOT work if the item or the
attribute are missing.
Unsurprisingly, the new test passes on both Alternator and DynamoDB.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#23963
We already have a test, test_limits.py::test_deeply_nested_expression_2,
which checks that a long condition expression
a<b or (a<b or (a<b or (a<b or (....))))
with more than MAX_DEPTH (=400) repeats is rejected by Alternator.
It was added as part of commit 04e5082d52, which
restricted the depth of the recursive parser to prevent crashing Scylla.
However, I got curious what will happen without the parentheses:
a<b or a<b or a<b or a<b or ...
It turns out that our parser actually parses this syntax without
recursion - it's just a loop (a "*" in the Antlr alternator/expressions.g
grammar allows reading more and more ORs in a loop). So Alternator doesn't
limit the length of this expression beyond the 4096-byte length limit
which we also have. We can fit 584 repeats of the above expression in
4096 bytes, and it will not be rejected even though 584 > 400.
This test confirms that this is indeed the case.
The test is Scylla-only because on DynamoDB, this expression is rejected
because it has more than 300 "OR" operators. Scylla doesn't have this
specific limit - we believe the other limitations (on total expression
length, and on depth) are better for protecting Scylla. Remember that
in an expression like "(((((((((((((" there is a very high recursion
depth of the parser but zero operators, so counting the operators does
nothing to protect Scylla.
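For illustration, a quick way to see that 584 flat repeats fit under the
4096-byte limit:

```python
# 584 copies of "a<b" joined by " or ": 584*3 + 583*4 = 4084 bytes <= 4096.
expr = " or ".join(["a<b"] * 584)
assert len(expr) <= 4096
print(len(expr))  # 4084
```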
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#23973
When the topology coordinator is shut down while doing a long-running
operation, the current operation might throw a raft::request_aborted
exception. This is not a critical issue and should not be logged with
ERROR verbosity level.
Make sure that all the try..catch blocks in the topology coordinator
which:
- May try to acquire a new group0 guard in the `try` part
- Have a `catch (...)` block that prints an ERROR-level message
...have a pass-through `catch (raft::request_aborted&)` block which does
not log the exception.
Fixes: scylladb/scylladb#22649
Closes scylladb/scylladb#23962
Currently, stream_session::prepare throws when a table in requests
or summaries is dropped. However, we do not want to fail streaming
if the table is dropped.
Remove the table checks from stream_session::prepare. Further streaming
steps can handle the dropped table and finish the streaming successfully.
Fixes: #15257.
Closes scylladb/scylladb#23915
The test test_read_repair_with_trace_logging wants to test read repair with trace logging. It turns out that node restart + trace-level logging + debug mode is too much, and even with a 1-minute timeout, the read repair sometimes times out. Refactor the test to use an injection point instead of a restart. To make sure the test still tests what it is supposed to test, use tracing to assert that read repair did indeed happen.
Fixes: scylladb/scylladb#23968
Needs backport to 2025.1 and 6.2, both have the flaky test
Closes scylladb/scylladb#23989
* github.com:scylladb/scylladb:
test/cluster/test_read_repair.py: improve trace logging test (again)
test/cluster: extract execute_with_tracing() into pylib/util.py
This is passed by reference to the constructor, but a copy is saved into
the _table_shared_data member. A reference to this member is passed down
to all memtable readers. Because of the copy, the memtable readers save
a reference to the memtable_list's member, which goes away together with
the memtable_list when the storage_group is destroyed.
This causes a use-after-free when a storage group is destroyed while a
memtable read is still ongoing. The memtable reader keeps the memtable
alive, but its reference to the memtable_table_shared_data becomes
stale.
Fix by saving a reference in the memtable_list too, so memtable readers
receive a reference pointing to the original replica::table member,
which is stable across tablet migrations and merges.
The copy was introduced by 2a76065e3d.
There was a copy even before this commit, but in the previous vnode-only
world this was fine -- there was one memtable_list per table and it was
around until the table itself was. In the tablet world, this is no
longer given, but the above commit didn't account for this.
A test is included, which reproduces the use-after-free on memtable
migration. The test is somewhat artificial in that the use-after-free
would be prevented by holding on to an ERM, but this is done
intentionally to keep the test simple. Migration -- unlike merge where
this use-after-free was originally observed -- is easy to trigger from
unit tests.
Fixes: #23762
Closes scylladb/scylladb#23984
The test is failing in CI sometimes due to performance reasons.
There are at least two problems:
1. The initial 500ms (wall time) sleep might be too short. If the reclaimer
doesn't manage to evict enough memory during this time, the test will fail.
2. During the 100ms (thread CPU time) window given by the test to background
reclaim, the `background_reclaim` scheduling group isn't actually
guaranteed to get any CPU, regardless of shares. If the process is
switched out inside the `background_reclaim` group, it might
accumulate so much vruntime that it won't get any more CPU again
for a long time.
We have seen both.
This kind of timing test can't be run reliably on overcommitted machines
without modifying the Seastar scheduler to support that (by e.g. using
thread clock instead of wall time clock in the scheduler), and that would
require an amount of effort disproportionate to the value of the test.
So for now, to unflake the test, this patch removes the performance test
part. (And the tradeoff is a weakening of the test).
This PR contains changes that do not add new functionality, along with small refactorings of the existing code.
The most significant change is the refactoring of resource gathering, so that it no longer creates another cgroup to put itself in. As a result, there are no nested redundant 'initial' groups, e.g. `/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/initial/initial/initial.../initial`
This is part two of splitting the original PR.
This PR is an extraction of several commits from https://github.com/scylladb/scylladb/pull/22894, as requested by the reviewer in https://github.com/scylladb/scylladb/pull/22894?notification_referrer_id=NT_kwDOACiLR7MxNDg0ODk2MDU1MjoyNjU3MDk1&notifications_query=reason%3Aparticipating#pullrequestreview-2778582278.
Closes scylladb/scylladb#23882
* github.com:scylladb/scylladb:
test.py: add awareness of extra_scylla_cmdline_options
test.py: increase timeout for C++ tests in pytest
test.py: switch method of finding the root repo directory
test.py: move get_combined_tests to the correct facade
test.py: add common directory for reports
test.py: add the possibility to provide additional env vars
test.py: move setup cgroups to the generic method
test.py: refactor resource_gather.py
When the schema is changed, the sstable set is updated according to the
compaction strategy of the new schema (no changes to the set are actually
made, just the underlying set type is updated), but the problem is that it
happens without a lock, causing a use-after-free when running concurrently
with another set update.
Example:
1) A: sstable set is being updated on compaction completion
2) B: schema change updates the set (it's non deferring, so it
happens in one go) and frees the set used by A.
3) when A resumes, the system will likely crash since the set is freed
already.
ASAN screams about it:
SUMMARY: AddressSanitizer: heap-use-after-free sstables/sstable_set.cc ...
The fix defers the update of the set on schema change to compaction,
which is triggered after the new schema is set. Only the strategy state
and backlog tracker are updated immediately, which is fine since the
strategy doesn't depend on any particular implementation of the sstable
set, thanks to the patch "sstables: Implement sstable_set_impl::all_sstable_runs()".
Fixes #22040.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
With the upcoming change, where table::set_compaction_strategy() might
delay the update of the sstable set, ICS might temporarily work with
sstable set implementations other than partitioned_sstable_set. ICS relies
on all_sstable_runs() during regular compaction, and today it triggers a
bad_function_call exception if not overridden by the set implementation.
To remove this strong dependency between the compaction strategy and
a particular set implementation, let's provide a default implementation
of all_sstable_runs(), such that ICS will still work until the set
is eventually updated through a process that adds or removes an
sstable.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>