scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 02:20:37 +00:00

Author	SHA1	Message	Date
Dawid Mędrek	5c5911d874	test/cluster/test_tablets: Divide rack into two to adjust tests to RF-rack-validity Three tests in the file use a multi-DC cluster. Unfortunately, they put all of the nodes in a DC in the same rack and because of that, they fail when run with the `rf_rack_valid_keyspaces` configuration option enabled. Since the tests revolve mostly around zero-token nodes and how they affect replication in a keyspace, this change should have zero impact on them. (cherry picked from commit `c8c28dae92`)	2025-05-12 13:10:12 +00:00
Dawid Mędrek	6a2e52d250	test/cluster/test_tablets: Adjust test_tablet_rf_change to RF-rack-validity We reduce the number of nodes and the RF values used in the test to make sure that the test can be run with the `rf_rack_valid_keyspaces` configuration option. The test doesn't seem to be reliant on the exact number of nodes, so the reduction should not make any difference. (cherry picked from commit `04567c28a3`)	2025-05-12 13:10:12 +00:00
Dawid Mędrek	f98c83b92f	test/cluster/test_tablet_repair_scheduler.py: Adjust to RF-rack-validity The change boils down to matching the number of created racks to the number of created nodes in each DC in the auxiliary function `prepare_multi_dc_repair`. This way, we ensure that the created keyspace will be RF-rack-valid and so we can run the test file even with the `rf_rack_valid_keyspaces` configuration option enabled. The change has no impact on the tests that use the function; the distribution of nodes across racks does not affect how repair is performed or what the tests do and verify. Because of that, the change is correct. (cherry picked from commit `d3c0cd6d9d`)	2025-05-12 13:10:12 +00:00
Dawid Mędrek	f5cf4a3893	test/pylib/repair.py: Assign nodes to multiple racks in create_table_insert_data_for_repair We assign the newly created nodes to multiple racks. If RF <= 3, we create as many racks as the provided RF. We disallow the case of RF > 3 to avoid trying to create an RF-rack-invalid keyspace; note that no existing test calls `create_table_insert_data_for_repair` with a higher RF. The rationale for doing this is that we want to ensure that the tests calling the function can be run with the `rf_rack_valid_keyspaces` configuration option enabled. (cherry picked from commit `5d1bb8ebc5`)	2025-05-12 13:10:12 +00:00
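A minimal Python sketch of the rack-assignment rule described in the commit above. `assign_racks` is a hypothetical helper, not the actual code in test/pylib/repair.py, and also rejecting RF < 1 is an extra assumption. It only shows that creating exactly RF racks and spreading the nodes over them round-robin yields one rack per replica.

```python
def assign_racks(num_nodes, rf):
    """Return one rack name per node (hypothetical helper mirroring the commit's rule)."""
    if not 1 <= rf <= 3:
        # RF > 3 is rejected, as in the commit; RF < 1 is rejected here as an assumption.
        raise ValueError(f"unsupported RF: {rf}")
    # Create exactly `rf` racks and spread the nodes over them round-robin.
    return [f"rack{i % rf + 1}" for i in range(num_nodes)]
```

For example, `assign_racks(4, 2)` places the nodes in `rack1`, `rack2`, `rack1`, `rack2`.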
Dawid Mędrek	12f0136b26	test/cluster/test_zero_token_nodes_topology_ops: Adjust to RF-rack-validity We assign the nodes to the same DC, but multiple racks to ensure that the created keyspace is RF-rack-valid and we can run the test with the `rf_rack_valid_keyspaces` configuration option enabled. The changes do not affect what the test does and verifies. (cherry picked from commit `92f7d5bf10`)	2025-05-12 13:10:12 +00:00
Dawid Mędrek	4e45ceda21	test/cluster/test_zero_token_nodes_no_replication.py: Adjust to RF-rack-validity We simply assign the nodes used in the test to separate racks to ensure that the created keyspace is RF-rack-valid, so that the test can be run with the `rf_rack_valid_keyspaces` configuration option set to true. The change does not affect what the test does and verifies -- it only depends on the type of nodes, whether they are normal token owners or not -- and so the changes are correct in that sense. (cherry picked from commit `4c46551c6b`)	2025-05-12 13:10:12 +00:00
Dawid Mędrek	2c8b5143ba	test/cluster/test_zero_token_nodes_multidc.py: Adjust to RF-rack-validity We parameterize the test so it's run with and without enforced RF-rack-valid keyspaces. In the test itself, we introduce a branch to make sure that we won't run into a situation where we're attempting to create an RF-rack-invalid keyspace. Since the `rf_rack_valid_keyspaces` option is not commonly used yet and because its semantics will most likely change in the future, we decide to parameterize the test rather than try to get rid of some of the test cases that are problematic with the option enabled. (cherry picked from commit `2882b7e48a`)	2025-05-12 13:10:12 +00:00
Dawid Mędrek	474de0f048	test/cluster/test_not_enough_token_owners.py: Adjust to RF-rack-validity We simply assign DC/rack properties to every node used in the test. We put all of them in the same DC to make sure that the cluster behaves as closely as possible to how it did before these changes. However, we distribute them over multiple racks to ensure that the keyspace used in the test is RF-rack-valid, so we can also run it with the `rf_rack_valid_keyspaces` configuration option set to true. The distribution of nodes between racks has no effect on what the test does and verifies, so the changes are correct in that sense. (cherry picked from commit `73b22d4f6b`)	2025-05-12 13:10:12 +00:00
Dawid Mędrek	5ac07a6c72	test/cluster/test_multidc.py: Adjust to RF-rack-validity Instead of putting all of the nodes in a DC in the same rack in `test_putget_2dc_with_rf`, we assign them to different racks. The distribution of nodes in racks is orthogonal to what the test is doing and verifying, so the change is correct in that sense. At the same time, it ensures that the test never violates the invariant of RF-rack-valid keyspaces, so we can also run it with `rf_rack_valid_keyspaces` set to true. (cherry picked from commit `5b83304b38`)	2025-05-12 13:10:11 +00:00
Dawid Mędrek	f88d8edcaf	test/cluster/object_store/test_backup.py: Adjust to RF-rack-validity We modify the parameters of `test_restore_with_streaming_scopes` so that it now represents a pair of values: topology layout and the value `rf_rack_valid_keyspaces` should be set to. Two of the already existing parameters violate RF-rack-validity and so the test would fail when run with `rf_rack_valid_keyspaces: true`. However, since the option isn't commonly used yet and since the semantics of RF-rack-valid keyspaces will most likely change in the future, let's keep those cases and just run them with the option disabled. This way, we still test everything we can without running into undesired failures that don't indicate anything. (cherry picked from commit `9281bff0e3`)	2025-05-12 13:10:11 +00:00
Dawid Mędrek	05c70b0820	test/cluster: Adjust simple tests to RF-rack-validity We adjust all of the simple cases of cluster tests so they work with `rf_rack_valid_keyspaces: true`. It boils down to assigning nodes to multiple racks. For most of the changes, we do that by: * Using `pytest.mark.prepare_3_racks_cluster` instead of `pytest.mark.prepare_3_nodes_cluster`. * Using an additional argument -- `auto_rack_dc` -- when calling `ManagerClient::servers_add()`. In some cases, we need to assign the racks manually, which may be less obvious, but in every such situation, the tests didn't rely on that assignment, so that doesn't affect them or what they verify. (cherry picked from commit `dbb8835fdf`)	2025-05-12 13:10:11 +00:00
Patryk Jędrzejczak	2b1b4d1dfc	Merge '[Backport 2025.2] Correctly skip updating node's own ip address due to outdated gossiper data' from Scylladb[bot] Use the host ID to check whether the update is for the node itself. Using the IP is unreliable: if a node is restarted with a different IP, a gossiper message with the previous IP can be misinterpreted as belonging to a different node. Fixes: #22777 Backport to 2025.1 since this fixes a crash. Older versions do not have the code. - (cherry picked from commit `a2178b7c31`) - (cherry picked from commit `ecd14753c0`) - (cherry picked from commit `7403de241c`) Parent PR: #24000 Closes scylladb/scylladb#24089 * https://github.com/scylladb/scylladb: test: add reproducer for #22777 storage_service: Do not remove gossiper entry on address change storage_service: use id to check for local node	2025-05-12 09:31:20 +02:00
Gleb Natapov	827563902c	test: add reproducer for #22777 Add a sleep before starting the gossiper to increase the chance of getting an old gossiper entry about the node itself before the local gossiper info is updated with the new IP address. (cherry picked from commit `7403de241c`)	2025-05-09 12:56:15 +00:00
Gleb Natapov	ccf194bd89	storage_service: Do not remove gossiper entry on address change When the gossiper indexed entries by IP, an old entry had to be removed on an address change, but the index is ID-based, so even if the IP was changed, the entry should stay. The gossiper simply updates the IP address there. (cherry picked from commit `ecd14753c0`)	2025-05-09 12:56:15 +00:00
Gleb Natapov	9b735bb4dc	storage_service: use id to check for local node The IP may change, and an old gossiper message with the previous IP may be processed when it shouldn't be. Fixes: #22777 (cherry picked from commit `a2178b7c31`)	2025-05-09 12:56:15 +00:00
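A sketch of why the host ID is the safer key for recognising the local node, assuming a hypothetical `GossipEntry` record; the real storage_service code is C++ and is not reproduced here.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class GossipEntry:
    """Hypothetical gossip record: a stable host ID plus the IP it was sent from."""
    host_id: uuid.UUID
    ip: str


def is_local_node(entry: GossipEntry, local_host_id: uuid.UUID) -> bool:
    # Compare the stable host ID, not the IP: after a restart with a new IP,
    # a delayed message can still carry this node's previous address.
    return entry.host_id == local_host_id


def is_local_node_by_ip(entry: GossipEntry, local_ip: str) -> bool:
    # The unreliable IP-based check, shown for contrast.
    return entry.ip == local_ip
```

A stale entry sent before the restart still carries the node's own host ID, so the ID-based check recognises it, while the IP-based check would treat it as a different node.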
Michał Chojnowski	f29b87970a	test/boost/mvcc_test: fix an overly-strong assertion in test_snapshot_cursor_is_consistent_with_merging The test checks that merging the partition versions on-the-fly using the cursor gives the same results as merging them destructively with apply_monotonically. In particular, it tests that the continuity of both results is equal. However, there's a subtlety which makes this not true. The cursor puts empty dummy rows (i.e. dummies shadowed by the partition tombstone) in the output. But the destructive merge is allowed (as an exception to the general rule, for optimization reasons) to remove those dummies and thus reduce the continuity. So after this patch we instead check that the output of the cursor has continuity equal to the merged continuities of the versions (rather than to the continuity of the merged versions, which can be smaller as described above). Refs https://github.com/scylladb/scylladb/pull/21459, a patch which did the same in a different test. Fixes https://github.com/scylladb/scylladb/issues/13642 Closes scylladb/scylladb#24044 (cherry picked from commit `746ec1d4e4`) Closes scylladb/scylladb#24083	2025-05-09 13:00:34 +02:00
Botond Dénes	17a76b6264	Merge '[Backport 2025.2] test/cluster/test_read_repair.py: improve trace logging test (again)' from Scylladb[bot] The test test_read_repair_with_trace_logging wants to test read repair with trace logging. It turns out that node restart + trace-level logging + debug mode is too much, and even with a 1-minute timeout, the read repair sometimes times out. Refactor the test to use an injection point instead of a restart. To make sure the test still tests what it is supposed to test, use tracing to assert that read repair did indeed happen. Fixes: scylladb/scylladb#23968 Needs backport to 2025.1 and 6.2; both have the flaky test - (cherry picked from commit `51025de755`) - (cherry picked from commit `29eedaa0e5`) Parent PR: #23989 Closes scylladb/scylladb#24051 * github.com:scylladb/scylladb: test/cluster/test_read_repair.py: improve trace logging test (again) test/cluster: extract execute_with_tracing() into pylib/util.py	2025-05-08 11:01:18 +03:00
Aleksandra Martyniuk	ab45df1aa1	streaming: skip dropped tables Currently, stream_session::prepare throws when a table in requests or summaries is dropped. However, we do not want to fail streaming if the table is dropped. Delete table checks from stream_session::prepare. Further streaming steps can handle the dropped table and finish the streaming successfully. Fixes: #15257. Closes scylladb/scylladb#23915 (cherry picked from commit `20c2d6210e`) Closes scylladb/scylladb#24053	2025-05-08 11:00:27 +03:00
Botond Dénes	97f0f312e0	test/cluster/test_read_repair.py: improve trace logging test (again) The test test_read_repair_with_trace_logging wants to test read repair with trace logging. It turns out that node restart + trace-level logging + debug mode is too much, and even with a 1-minute timeout, the read repair sometimes times out. Refactor the test to use an injection point instead of a restart. To make sure the test still tests what it is supposed to test, use tracing to assert that read repair did indeed happen. (cherry picked from commit `29eedaa0e5`)	2025-05-07 13:26:08 +00:00
Botond Dénes	4df6a17d30	test/cluster: extract execute_with_tracing() into pylib/util.py To allow reuse in other tests. (cherry picked from commit `51025de755`)	2025-05-07 13:26:08 +00:00
Anna Mikhlin	b3dbfaf27a	Update ScyllaDB version to: 2025.2.0-rc0 scylla-2025.2.0-rc0-candidate-20250508122345 scylla-2025.2.0-rc0-candidate-20250508120337 scylla-2025.2.0-rc0 scylla-2025.2.0-rc0-candidate-20250508124206	2025-05-07 11:41:33 +03:00
Botond Dénes	0a9ca52cfd	replica/database: memtable_list: save ref to memtable_table_shared_data This is passed by reference to the constructor, but a copy is saved into the _table_shared_data member. A reference to this member is passed down to all memtable readers. Because of the copy, the memtable readers save a reference to the memtable_list's member, which goes away together with the memtable_list when the storage_group is destroyed. This causes a use-after-free when a storage group is destroyed while a memtable read is still ongoing. The memtable reader keeps the memtable alive, but its reference to the memtable_table_shared_data becomes stale. Fix by saving a reference in the memtable_list too, so memtable readers receive a reference pointing to the original replica::table member, which is stable across tablet migrations and merges. The copy was introduced by `2a76065e3d`. There was a copy even before this commit, but in the previous vnode-only world this was fine -- there was one memtable_list per table and it was around until the table itself was. In the tablet world, this is no longer guaranteed, but the above commit didn't account for this. A test is included, which reproduces the use-after-free on memtable migration. The test is somewhat artificial in that the use-after-free would be prevented by holding on to an ERM, but this is done intentionally to keep the test simple. Migration -- unlike merge, where this use-after-free was originally observed -- is easy to trigger from unit tests. Fixes: #23762 Closes scylladb/scylladb#23984	2025-05-06 22:13:17 +03:00
David Garcia	b1ee0e2a6a	docs: fix AttributeError with 'myst_enable_extensions' in publication workflow Rolled back some dependencies in `poetry.lock` to previous versions while we investigate how to make the extension `sphinx_scylladb_markdown` compatible with the latest versions. This should fix the error in https://github.com/scylladb/scylladb/actions/runs/14708656912/job/41275115239, which currently prevents publishing new versions of https://opensource.docs.scylladb.com/ Closes scylladb/scylladb#23969	2025-05-06 16:33:00 +03:00
Pavel Emelyanov	1b5bbc2433	Merge 'test.py: split boost pytest integration' from Andrei Chekun This PR contains no new functionality, only small refactorings of the existing code. The most significant change is the refactoring of resource gathering so that it no longer creates another cgroup to put itself in. As a result, there will be no nested redundant 'initial' groups, e.g. `/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/initial/initial/initial.../initial` This is part two of splitting the original PR. This PR is an extraction of several commits from https://github.com/scylladb/scylladb/pull/22894, as requested by the reviewer in https://github.com/scylladb/scylladb/pull/22894?notification_referrer_id=NT_kwDOACiLR7MxNDg0ODk2MDU1MjoyNjU3MDk1&notifications_query=reason%3Aparticipating#pullrequestreview-2778582278. Closes scylladb/scylladb#23882 * github.com:scylladb/scylladb: test.py: add awareness of extra_scylla_cmdline_options test.py: increase timeout for C++ tests in pytest test.py: switch method of finding the root repo directory test.py: move get_combined_tests to the correct facade test.py: add common directory for reports test.py: add the possibility to provide additional env vars test.py: move setup cgroups to the generic method test.py: refactor resource_gather.py	2025-05-06 16:22:49 +03:00
Botond Dénes	3c3f6ca233	tools/scylla-sstable: scrub: use UUID sstable identifiers This makes it much easier to avoid sstable collisions. It makes it possible to scrub multiple sstables, with multiple calls to scylla-sstable, reusing the same output directory. Previously, each new call to scylla-sstable scrub would start from generation 0, guaranteeing a collision. Remove the unit test for generation clash -- with UUID generations, this is no longer possible to reproduce in practice. Refs: #21387 Closes scylladb/scylladb#23990	2025-05-06 15:09:53 +03:00
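A small illustration of the collision described in the commit above. `counter_ids` models the old per-invocation numbering that restarts at generation 0, and `uuid_ids` uses Python's `uuid.uuid4()` purely as a stand-in; the exact UUID scheme scylla-sstable uses is not stated in the commit.

```python
import itertools
import uuid


def counter_ids(n):
    # Old behaviour as described: every invocation starts counting from generation 0.
    gen = itertools.count(0)
    return [next(gen) for _ in range(n)]


def uuid_ids(n):
    # UUID-style identifiers: independent invocations practically never collide.
    return [uuid.uuid4() for _ in range(n)]
```

Two separate "runs" of `counter_ids(2)` both produce `[0, 1]`, so writing both into one output directory clashes; two runs of `uuid_ids(2)` do not overlap.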
Patryk Jędrzejczak	7f843e0a5c	Merge 'raft: make sure to retain the existing voters including the current leader (topology coordinator)' from Emil Maskovsky Fix an issue in the voter calculator where existing voters were not retained across data centers and racks in certain scenarios. This occurred when voters were distributed across more data centers and racks than the maximum allowed number of voters. Previously, the prioritization logic for data centers and racks did not consider the number of existing assigned voters. It only prioritized nodes within a single data center or rack, which could result in unnecessary reassignment of voters. Improved the prioritization logic to account for the number of existing assigned voters in each data center and rack. Additionally, the limited voters feature did not account for the existing topology coordinator (Raft leader) when selecting voters to be removed. As a result, the limited voters calculator could inadvertently remove the votership of the topology coordinator, triggering unnecessary Raft leader re-election. To address this, the topology coordinator's votership status is now preserved unless absolutely necessary. When choosing between otherwise equivalent voters, the node other than the existing topology coordinator is prioritized for removal. This change ensures a more stable voter distribution and reduces unnecessary voter reassignments. The limited voters calculator is refactored to use a priority queue for sorting nodes by their priorities. This change simplifies the voter selection logic and makes it more extensible for future enhancements, such as supporting more complex priority calculations. Fixes: scylladb/scylladb#23950 Fixes: scylladb/scylladb#23588 Fixes: scylladb/scylladb#23786 No backport: The limited voters feature is currently only present in master. 
Closes scylladb/scylladb#23888 * https://github.com/scylladb/scylladb: raft: ensure topology coordinator retains votership raft: retain existing voters across data centers and racks raft: refactor limited voters calculator to prioritize nodes raft: replace pointer with reference for non-null output parameter raft: reduce code duplication in group0 voter handler raft: unify and optimize datacenter and rack info creation	2025-05-06 13:49:55 +02:00
Nadav Har'El	252c5b5c9d	Merge 'Alternator batch_write_item wcu' from Amnon Heiman This series adds support for WCU tracking in batch_write_item and tests it. The patches include: Switching the metrics (RCU and WCU) to count whole units instead of the half units they counted before, to make the metrics clearer for users. Adding a public static get_half_units function to wcu_consumed_capacity_counter for use by batch write item, which cannot directly use the counter object. Adding WCU calculation support to batch_write_item, based on item size for puts and a fixed 1 WCU for deletes. WCU metrics are updated, and consumed capacity is returned per table when requested. The return handling was refactored to be coroutine-like for easier management of the consumed capacity array. Adding tests that validate WCU calculation for batch put requests on a single table and across multiple tables, ensuring delete operations are counted correctly. Adding a test that validates that WCU metrics are updated correctly during batch write item operations, ensuring the WCU of each item is calculated independently. Needs backport: WCU is only partially supported and is missing from batch_write_item. Fixes #23940 Closes scylladb/scylladb#23941 * github.com:scylladb/scylladb: alternator/test_metrics.py: batch_write validate WCU alternator/test_returnconsumedcapacity.py: Add tests for batch write WCU alternator/executor: add WCU for batch_write_items alternator/consumed_capacity: make wcu get_units public Alternator: Change the WCU/RCU to use units	2025-05-06 13:31:53 +03:00
Avi Kivity	fc2204cea0	Merge 'test/boost/multishard_mutation_query_test: fix test_read_with_partition_row_limits' from Botond Dénes This test has multiple problems: * it has 3 embedded loops to run different scenarios, but ignores the variables from 2 of these, running with hardcoded settings instead * it initializes misses and lookups to 0 at the start of each scenario; this throws off the per-page increment checks when the previous scenario moved these metrics and they don't start from 0, which causes the test to sometimes fail * it has a duplicate check of drops == 0 (just cosmetic) Fix all three problems; the second is especially important because it made the test flaky. Additionally, ensure the test will keep using vnodes in the future, by explicitly creating a vnodes keyspace for them. Fixes: #16794 Test fix, not normally a backport candidate; we can backport to 2025.1 if the test becomes too unstable there Closes scylladb/scylladb#23783 * github.com:scylladb/scylladb: test/boost/multishard_mutation_query_test: ensure test runs with vnodes test/boost/multishard_mutation_query_test: fix test_read_with_partition_row_limits	2025-05-05 20:49:03 +03:00
Emil Maskovsky	24dfd2034b	raft: ensure topology coordinator retains votership The limited voters feature did not account for the existing topology coordinator (Raft leader) when selecting voters to be removed. As a result, the limited voters calculator could inadvertently remove the votership of the current topology coordinator, triggering an unnecessary Raft leader re-election. This change ensures that the existing topology coordinator's votership status is preserved unless absolutely necessary. When choosing between otherwise equivalent voters, the node other than the topology coordinator is prioritized for removal. This helps maintain stability in the cluster by avoiding unnecessary leader re-elections. Additionally, only the alive leader node is considered relevant for this logic. A dead existing leader (topology coordinator) is excluded from consideration, as it is already in the process of losing leadership. Fixes: scylladb/scylladb#23588 Fixes: scylladb/scylladb#23786	2025-05-05 16:58:34 +02:00
Emil Maskovsky	2ae59e8a87	raft: retain existing voters across data centers and racks Fix an issue in the voter calculator where existing voters were not retained across data centers and racks in certain scenarios. This occurred when voters were distributed across more data centers and racks than the maximum allowed number of voters. Previously, the prioritization logic for data centers and racks did not consider the number of existing assigned voters. It only prioritized nodes within a single data center or rack, which could result in unnecessary reassignment of voters. Improved the prioritization logic to account for the number of existing voters in each data center and rack. This change ensures a more stable voter distribution and reduces unnecessary voter reassignments. Fixes: scylladb/scylladb#23950	2025-05-05 16:51:48 +02:00
Emil Maskovsky	018fb63305	raft: refactor limited voters calculator to prioritize nodes Refactor the limited voters calculator to use a priority queue for sorting nodes by their priorities. This change simplifies the voter selection logic and makes it more extensible for future enhancements, such as supporting more complex priority calculations. The priority value is determined based on the node's existing status, including whether it is alive, a voter, or any further criteria.	2025-05-05 16:36:17 +02:00
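A minimal sketch of priority-based voter selection with a heap, assuming a hypothetical `Node` record with `alive`, `is_voter` and `is_leader` flags. It ranks live nodes first and existing voters next, and uses the live leader as the final tie-break, echoing the "ensure topology coordinator retains votership" commit. The real calculator also balances voters across data centers and racks, which this sketch omits.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    alive: bool
    is_voter: bool
    is_leader: bool = False


def node_priority(node):
    # Larger tuple = more preferred: alive first, then existing voters,
    # then the live leader among otherwise equivalent voters.
    return (node.alive, node.is_voter, node.alive and node.is_leader)


def select_voters(nodes, max_voters):
    # heapq is a min-heap, so push negated priorities; the index breaks ties
    # deterministically and avoids comparing Node objects.
    heap = [(tuple(-int(p) for p in node_priority(n)), i, n) for i, n in enumerate(nodes)]
    heapq.heapify(heap)
    chosen = []
    while heap and len(chosen) < max_voters:
        _, _, node = heapq.heappop(heap)
        if node.alive:  # dead nodes are never picked as voters in this sketch
            chosen.append(node.name)
    return chosen
```

With two voter slots, the live leader and the other live voter are kept, and the live non-voter is left out, so no existing votership is reassigned unnecessarily.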
Emil Maskovsky	26fdc7b8f8	raft: replace pointer with reference for non-null output parameter The output parameter cannot be `null`. Previously, a pointer was used to make it explicit that the parameter is an output parameter being modified. However, this is unnecessary, as references are more appropriate for parameters that cannot be `null`. Switching to a reference improves code readability and ensures the parameter's non-null constraint is enforced at the type level.	2025-05-05 16:12:00 +02:00
Emil Maskovsky	f0468860a3	raft: reduce code duplication in group0 voter handler Refactor the group0 voter handler by introducing a helper lambda to handle the common logic for adding a node. This eliminates unnecessary code duplication. This refactor does not introduce any functional changes but prepares the codebase for easier future modifications.	2025-05-05 16:09:53 +02:00
Botond Dénes	855411caad	test/boost/multishard_mutation_query_test: ensure test runs with vnodes All tests in this suite use the default "ks" keyspace from cql_test_env. This keyspace has tablet support and at any time we might decide to make it use tablets by default. This would make all these tests use the tablet path in multishard_mutation_query.cc. These tests were created to test the vastly more complex vnodes code path in said file. The tablet path is much simpler and it is only used by SELECT * FROM MUTATION_FRAGMENTS(), which has its own correctness tests. So explicitly create a vnodes keyspace and use it in all the tests to restore the test functionality.	2025-05-05 09:22:54 -04:00
Botond Dénes	1175e1ed49	test/boost/multishard_mutation_query_test: fix test_read_with_partition_row_limits This test has multiple problems: * it has 3 embedded loops to run different scenarios, but ignores the variables from 2 of these, running with hardcoded settings instead * it initializes misses and lookups to 0 at the start of each scenario; this throws off the per-page increment checks when the previous scenario moved these metrics and they don't start from 0, which causes the test to sometimes fail * it has a duplicate check of drops == 0 (just cosmetic) Fix all three problems; the second is especially important because it made the test flaky.	2025-05-05 09:22:53 -04:00
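The second problem in the commit above can be reproduced in miniature: a counter that is never reset between scenarios only yields correct per-page increments if each scenario snapshots its starting value instead of assuming 0. `Metric` and `per_page_increments` are hypothetical names; the real test reads cache metrics from the database.

```python
class Metric:
    """Hypothetical stand-in for a cache counter that is never reset between scenarios."""

    def __init__(self):
        self.value = 0


def per_page_increments(metric, pages):
    # Snapshot the current value; assuming it starts at 0 is the bug described above.
    previous = metric.value
    increments = []
    for misses in pages:
        metric.value += misses  # simulate a page read bumping the counter
        increments.append(metric.value - previous)
        previous = metric.value
    return increments
```

Running a second scenario on the same `Metric` still reports the right increments, because the baseline is whatever value the first scenario left behind.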
Emil Maskovsky	2ef654149f	raft: unify and optimize datacenter and rack info creation Refactor the code to use a consistent pattern for creating the datacenter info list and the rack info list. Both now use a map of vectors, which improves efficiency by reducing temporary conversions to maps/sets during node list processing. Also ensure the node descriptor is passed by reference instead of by copy, leveraging the guaranteed lifetime of the descriptors.	2025-05-05 15:15:17 +02:00
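A sketch of the "map of vectors" pattern for both lists, assuming nodes arrive as hypothetical `(node_id, dc, rack)` tuples; the real code is C++ and works on node descriptors.

```python
from collections import defaultdict


def group_by_dc_and_rack(nodes):
    """nodes: iterable of (node_id, dc, rack) tuples (a hypothetical input shape)."""
    dcs = defaultdict(list)    # dc -> [node_id, ...]
    racks = defaultdict(list)  # (dc, rack) -> [node_id, ...]
    for node_id, dc, rack in nodes:
        # A single pass fills both maps, with no temporary set/map conversions.
        dcs[dc].append(node_id)
        racks[(dc, rack)].append(node_id)
    return dict(dcs), dict(racks)
```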
Pavel Emelyanov	cf1ffd6086	Merge 'sstables_loader: fix the racing between get_progress() and release_resources()' from Kefu Chai This change addresses a critical race condition in the sstables_loader where `get_progress()` could access invalid `progress_holder` instances after `release_resources()` destroyed them. Problem: - Progress tracking uses two components: `_progress_state` (tracks state) and `_progress_per_shard` (sharded service with actual progress data) - `get_progress()` first checks if `_progress_state` is initialized, then accumulates progress from `_progress_per_shard` - As both functions are coroutines, `get_progress()` could be preempted after state check but before accessing `_progress_per_shard` - If `release_resources()` runs during this preemption, it destroys the `progress_holder` instances in `_progress_per_shard`, causing `get_progress()` to access invalid memory. Solution: - Implemented shared/exclusive locking to protect access to both state and sharded progress data - Multiple `get_progress()` calls can execute in parallel (shared access) - `release_resources()` acquires exclusive access before modifying resources - This prevents potential memory corruption and ensures consistent progress reporting Fixes #23801 --- this change addresses a racing related to tracking the restore progress from S3 using scylla's native API, which is not used in production yet, hence no need to backport. Closes scylladb/scylladb#23808 * github.com:scylladb/scylladb: sstables_loader: fix the indent sstables_loader: fix the racing between get_progress() and release_resources()	2025-05-05 15:45:15 +03:00
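A Python asyncio sketch of the shared/exclusive locking described in the commit above. `SharedMutex` and `Loader` are hypothetical stand-ins for the C++ code, and `asyncio.sleep(0)` marks the preemption point between the state check and the use. Without the lock, `release_resources()` could run at that point and `sum(None)` would raise `TypeError`. With the lock, the release waits until in-flight readers are done.

```python
import asyncio


class SharedMutex:
    """Minimal shared/exclusive lock for asyncio (reader-preferring; a sketch)."""

    def __init__(self):
        self._cond = asyncio.Condition()
        self._readers = 0
        self._writer = False

    async def lock_shared(self):
        async with self._cond:
            await self._cond.wait_for(lambda: not self._writer)
            self._readers += 1

    async def unlock_shared(self):
        async with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    async def lock(self):
        async with self._cond:
            await self._cond.wait_for(lambda: not self._writer and self._readers == 0)
            self._writer = True

    async def unlock(self):
        async with self._cond:
            self._writer = False
            self._cond.notify_all()


class Loader:
    """Hypothetical stand-in for sstables_loader's progress tracking."""

    def __init__(self):
        self._mutex = SharedMutex()
        self._progress_per_shard = [3, 4]  # stand-in for per-shard progress holders

    async def get_progress(self):
        await self._mutex.lock_shared()  # many readers may hold this at once
        try:
            if self._progress_per_shard is None:
                return None
            await asyncio.sleep(0)  # preemption point between the check and the use
            return sum(self._progress_per_shard)
        finally:
            await self._mutex.unlock_shared()

    async def release_resources(self):
        await self._mutex.lock()  # waits until no get_progress() is in flight
        try:
            self._progress_per_shard = None
        finally:
            await self._mutex.unlock()


async def demo():
    loader = Loader()
    results = await asyncio.gather(
        loader.get_progress(), loader.release_resources(), loader.get_progress())
    return results, await loader.get_progress()
```

This lock lets new readers in while a writer is waiting, so a steady stream of `get_progress()` calls could delay `release_resources()`; a production lock would usually guard against that writer starvation.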
Avi Kivity	e688e89430	tools: toolchain: clear .cache and .cargo directories The .cache and .cargo directories are used during pip and rust builds when preparing the toolchain, but aren't useful afterwards. Remove them to save a bit of space. Closes scylladb/scylladb#23955	2025-05-05 14:43:14 +03:00
Avi Kivity	4c1f4c419c	tools: toolchain: dbuild: run as root in container under podman Running as root enables nested containers under podman without trouble from uid remapping. Unlike docker, under podman uid 0 in the container is remapped to the host uid for bind mounts, so writes to the build directory do not end up owned by root on the host. Nested containers will allow us to consume opensearch, cassandra-stress, and minio as containers rather than embedding them into the frozen toolchain. Closes scylladb/scylladb#23954	2025-05-05 14:40:43 +03:00
Amnon Heiman	2ab99d7a07	alternator/test_metrics.py: batch_write validate WCU This patch adds a test that verifies the WCU metrics are updated correctly during a batch_write_item operation. It ensures that the WCU of each item is calculated independently. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2025-05-05 13:20:24 +03:00
Amnon Heiman	14570f1bb5	alternator/test_returnconsumedcapacity.py: Add tests for batch write WCU This patch adds two tests: A test that validates WCU calculation for batch put requests on a single table. A test that validates WCU calculation for batch requests across multiple tables, including ensuring that delete operations are counted as 1 WCU. Both tests verify that the consumed capacity is reported correctly according to the WCU rules. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2025-05-05 13:20:23 +03:00
Amnon Heiman	68db77643f	alternator/executor: add WCU for batch_write_items This patch adds consumed capacity unit support to batch_write_item. It calculates the WCU based on an item's length (for put) or a static 1 WCU (for delete), for each item on each table. The WCU metrics are always updated. If the user requests consumed capacity, a vector of consumed capacity is returned with an entry for each of the tables. For code simplicity, the return part of batch_write_item was updated to be coroutine-like; this makes it easier to manage the life cycle of the returned consumed_capacity array. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2025-05-05 13:20:14 +03:00
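A sketch of per-table WCU accounting for a batch, assuming DynamoDB's documented rule of one WCU per 1 KB written (rounded up, minimum 1) for puts, plus the flat 1 WCU for deletes stated in the commit above. How Alternator measures an item's length is not reproduced: `item_size` is simply given in bytes, and the output only mimics DynamoDB's `ConsumedCapacity` entries (`TableName`, `CapacityUnits`).

```python
import math
from collections import defaultdict

WCU_BYTES = 1024  # assumption: DynamoDB-style accounting of 1 WCU per 1 KB written


def put_wcu(item_size):
    # A put costs at least 1 WCU, rounded up to the next whole KB.
    return max(1, math.ceil(item_size / WCU_BYTES))


def batch_write_consumed_capacity(requests):
    """requests: iterable of (table, op, item_size) tuples, op being "put" or "delete"."""
    per_table = defaultdict(int)
    for table, op, item_size in requests:
        # Each item is costed independently; deletes are a flat 1 WCU.
        per_table[table] += put_wcu(item_size) if op == "put" else 1
    # One entry per table, in the order the tables were first seen.
    return [{"TableName": t, "CapacityUnits": float(u)} for t, u in per_table.items()]
```

For example, a 100-byte put and a 1500-byte put on one table cost 1 + 2 = 3 WCU, because each item is rounded up on its own rather than summed first.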
Amnon Heiman	f2ade71f4f	alternator/consumed_capacity: make wcu get_units public This patch adds a public static get_units function to wcu_consumed_capacity_counter. It will be used by the batch write item implementation, which cannot use the wcu_consumed_capacity_counter directly. Signed-off-by: Amnon Heiman <amnon@scylladb.com> consume_capacity need merge	2025-05-05 13:19:04 +03:00
Amnon Heiman	5ae11746fa	Alternator: Change the WCU/RCU to use units This patch changes the RCU/WCU Alternator metrics to use whole units instead of half units. The change includes the following: Change the metrics documentation. Keep the RCU counter internally in half units, but return the actual (whole unit) value. Change the RCU name to rcu_half_units_total to indicate that it counts half units. Change the WCU to count in whole units instead of half units. Update the tests accordingly. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2025-05-05 13:18:09 +03:00
Anna Stuchlik	851a433663	doc: add a link to the previous Enterprise documentation This commit adds a link to the docs for previous Enterprise versions at https://enterprise.docs.scylladb.com/ to the left menu. As we still support versions 2024.1 and 2024.2, we need to ensure easier access to those docs sets. Fixes https://github.com/scylladb/scylladb/issues/23870 Closes scylladb/scylladb#23945	2025-05-05 12:16:47 +03:00
Avi Kivity	04fb2c026d	config: decrease default large allocation warning threshold to 128k Back in 2017 (`5a2439e702`), we introduced a check for large allocations as they can stall the memory allocator. The warning threshold was set at 1 MB. Since then many fixes for large allocations went in and it is now time to reduce the threshold further. We reduce it here to 128 kB, the natural allocation size for the system. A quick run showed no warnings. Closes scylladb/scylladb#23975	2025-05-05 12:13:48 +03:00
Pavel Emelyanov	b56d6fbb84	Merge 'sstables: Fix quadratic space complexity in partitioned_sstable_set' from Raphael Raph Carvalho An interval map is very susceptible to quadratic space behavior when it's flooded with many entries overlapping all (or most) of the intervals, since each such entry is present in every interval it overlaps with. A trigger we observed was a memtable flush storm, which creates many small "L0" sstables that span roughly the entire token range. Since we cannot rely on insertion order, the solution is to store sstables with such wide ranges in a vector (unleveled). There should be no consequence for single-key reads, since the upper layer applies additional filtering based on the token of the key being queried. For range scans, there can be an increase in memory usage, but not a significant one, because the sstables span a wide range and would have been selected in the combined reader anyway if the scan range overlaps with them. In any case, this is a protection against a storm of memtable flushes, which shouldn't be the common scenario. It works with both tablets and vnodes, by adjusting the token range spanned by the compaction group accordingly. Fixes #23634. We can backport this into 2024.2 and 2025.1, but we should let this cook in master for 1 month or so. Closes scylladb/scylladb#23806 * github.com:scylladb/scylladb: test: Verify partitioned set store split and unsplit correctly sstables: Fix quadratic space complexity in partitioned_sstable_set compaction: Wire table_state into make_sstable_set() compaction: Introduce token_range() to table_state dht: Add overlap_ratio() for token range	2025-05-05 11:28:38 +03:00
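A sketch of the idea in the merge above. It assumes a hypothetical `overlap_ratio()` that returns the fraction of the compaction group's token range covered by an sstable, and a hypothetical 0.9 cutoff for "wide"; the real `dht` function and threshold may differ. Wide sstables are stored once in a flat list instead of appearing in every interval of the map, and single-key lookups still return them for the caller to filter by token.

```python
from dataclasses import dataclass

WIDE_RATIO = 0.9  # hypothetical cutoff for "spans roughly the entire range"


@dataclass(frozen=True)
class TokenRange:
    start: int  # inclusive
    end: int    # inclusive

    def overlap_ratio(self, other):
        """Fraction of this range that `other` covers (hypothetical semantics)."""
        lo, hi = max(self.start, other.start), min(self.end, other.end)
        if lo > hi:
            return 0.0
        return (hi - lo + 1) / (self.end - self.start + 1)


class PartitionedSet:
    def __init__(self, group_range):
        self.group_range = group_range  # token range owned by the compaction group
        self.unleveled = []             # wide sstables, stored exactly once
        self.leveled = []               # (range, name) pairs; stand-in for the interval map

    def insert(self, name, rng):
        if self.group_range.overlap_ratio(rng) >= WIDE_RATIO:
            self.unleveled.append(name)
        else:
            self.leveled.append((rng, name))

    def select(self, token):
        # Wide sstables are always candidates; the caller filters them by token.
        narrow = [name for rng, name in self.leveled if rng.start <= token <= rng.end]
        return self.unleveled + narrow
```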
David Garcia	4ba7182515	docs: fix md redirections for multiversion support This change resolves an issue where selecting a version from the multiversion dropdown on Markdown pages (e.g. https://docs.scylladb.com/manual/stable/alternator/getting-started.html) incorrectly redirected users to the main page instead of the corresponding versioned page. The underlying cause was that the `multiversion` extension relies on `source_suffix` to identify available pages for URL mapping. Without this configuration, proper redirection fails for `.md` files. This fix should be backported to `2025.1` to ensure correct behavior. Otherwise, the fix will only take effect in future releases. Testing locally is non-trivial: clone the repository, apply the changes to each relevant branch, set `smv_remote_whitelist` to "", then run `make multiversionpreview`. Afterward, switch between versions in the dropdown to verify behavior. I've tested it locally, so the best next step is to merge and confirm that it works as expected in the live environment. Closes scylladb/scylladb#23957	2025-05-05 10:39:39 +03:00
Pavel Emelyanov	7b786d9398	topology_coordinator: Use this->_feature_service directly This dependency is already there, topology coordinator doesn't need to use database reference to get to the features. Previous patch of the same kind: `b79137eaa4` Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#23777	2025-05-05 09:37:29 +02:00
Piotr Dulikowski	05c797795f	Merge 'Simplify test/sstable_assertions class API' from Pavel Emelyanov It had recently been patched to re-use the sstables::test class functionality (scylladb/scylladb#23697); now it can be put on a stricter diet. Closes scylladb/scylladb#23815 * github.com:scylladb/scylladb: test: Remove sstable_assertions::get_stats_metadata() test: Add sstable_assertions::operator->()	2025-05-05 09:33:45 +02:00


47701 Commits