scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 19:10:42 +00:00

Author	SHA1	Message	Date
Ernest Zaslavsky	6a3cef5703	metadata: Correct "DESCRIBE" output for keyspace metadata Update the "DESCRIBE" command output to accurately display `tablet` settings in keyspace metadata. Closes scylladb/scylladb#23056	2025-03-09 14:50:08 +02:00
Robert Bindar	27f2d64725	Remove object storage config credentials provider During development of #22428 we decided that we have no need for `object-storage.yaml`, and we'd rather store the endpoints in `scylla.yaml` and get a REST API to expose the endpoints for free. This patch removes the credentials provider used to read the AWS keys from this YAML file. Follow-up work will remove the `object-storage.yaml` file altogether and move the endpoints to `scylla.yaml`. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#22951	2025-03-07 10:40:58 +03:00
Avi Kivity	28906c9261	Merge 'scylla-sstable: introduce the query command' from Botond Dénes The scylla-sstable dump-* command suite has proven invaluable in many investigations. In certain cases however, I found that `dump-data` is quite cumbersome. An example would be trying to find certain values in an sstable, or trying to read the content of system tables when a node is down. For these cases, `dump-data` is very cumbersome: one has to trudge through tons of uninteresting metadata and do compaction in their heads. This PR introduces the new scylla-sstable query command, specifically targeted at situations like this: it allows executing queries on sstables, exposing to the user all the power of CQL, to tailor the output as they see fit. Select everything from a table: $ scylla sstable query --system-schema /path/to/data/system_schema/keyspaces-/-big-Data.db keyspace_name | durable_writes | replication -------------------------------+----------------+------------------------------------------------------------------------------------- system_replicated_keys | true | ({class : org.apache.cassandra.locator.EverywhereStrategy}) system_auth | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 1}) system_schema | true | ({class : org.apache.cassandra.locator.LocalStrategy}) system_distributed | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 3}) system | true | ({class : org.apache.cassandra.locator.LocalStrategy}) ks | true | ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1}) system_traces | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 2}) system_distributed_everywhere | true | ({class : org.apache.cassandra.locator.EverywhereStrategy}) Select everything from a single SSTable, use the JSON output (filtered through [jq](https://jqlang.github.io/jq/) for better readability): $ scylla sstable query --system-schema --output-format=json /path/to/data/system_schema/keyspaces-/me-3gm7_127s_3ndxs28xt4llzxwqz6-big-Data.db | jq [ { "keyspace_name": "system_schema", "durable_writes": true, "replication": { "class": "org.apache.cassandra.locator.LocalStrategy" } }, { "keyspace_name": "system", "durable_writes": true, "replication": { "class": "org.apache.cassandra.locator.LocalStrategy" } } ] Select a specific field in a specific partition using the command-line: $ scylla sstable query --system-schema --query "select replication from scylla_sstable.keyspaces where keyspace_name='ks'" ./scylla-workdir/data/system_schema/keyspaces-/-Data.db replication ------------------------------------------------------------------------------------- ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1}) Select a specific field in a specific partition using ``--query-file``: $ echo "SELECT replication FROM scylla_sstable.keyspaces WHERE keyspace_name='ks';" > query.cql $ scylla sstable query --system-schema --query-file=./query.cql ./scylla-workdir/data/system_schema/keyspaces-/-Data.db replication ------------------------------------------------------------------------------------- ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1}) New functionality: no backport needed. Closes scylladb/scylladb#22007 github.com:scylladb/scylladb: docs/operating-scylla: document scylla-sstable query test/cqlpy/test_tools.py: add tests for scylla-sstable query test/cqlpy/test_tools.py: make scylla_sstable() return table name also scylla-sstable: introduce the query command tools/utils: get_selected_operation(): use std::string for operation_options utils/rjson: streaming_writer: add RawValue() cql3/type_json: add to_json_type() test/lib/cql_test_env: introduce do_with_cql_env_noreentrant_in_thread()	2025-03-06 13:42:45 +02:00
Botond Dénes	1139cf3a98	Merge 'Speed up (and generalize) the way API calculates sstable disk usage' from Pavel Emelyanov There are several API endpoints that walk a specific list of sstables and sum up their bytes_on_disk() values. All those endpoints accumulate a map of sstable names to their sizes, then squash the maps together and, finally, sum up the map values to report the result back. Maintaining these intermediate collections is a waste of CPU and memory; the usage values can be summed up directly. Also add a test for per-cf endpoints to validate the change, and generalize the helper functions while at it. Closes scylladb/scylladb#23143 * github.com:scylladb/scylladb: api: Generalize disk space counting for table and system api: Use map_reduce_cf_raw() overload with table name api: Don't collect sstables map to count disk space usage test: Add unit test for total/live sstable sizes	2025-03-06 11:26:35 +02:00
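The difference between the two approaches described in this merge can be illustrated with a small Python sketch. It is not the actual API code: the `SSTable` class, both function names, and the sample file names are hypothetical, and the sketch assumes sstable names are unique across shards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable; only the fields the sketch needs."""
    name: str
    bytes_on_disk: int


def disk_usage_via_maps(shards):
    """Old shape: build a name -> size map per shard, merge the maps, then sum."""
    merged = {}
    for sstables in shards:
        per_shard = {sst.name: sst.bytes_on_disk for sst in sstables}
        merged.update(per_shard)
    return sum(merged.values())


def disk_usage_direct(shards):
    """New shape: add the sizes up while visiting them, with no intermediate maps."""
    return sum(sst.bytes_on_disk for sstables in shards for sst in sstables)


# Two hypothetical shards holding three sstables in total.
shards = [
    [SSTable("me-1-big-Data.db", 100), SSTable("me-2-big-Data.db", 250)],
    [SSTable("me-3-big-Data.db", 50)],
]
total_via_maps = disk_usage_via_maps(shards)
total_direct = disk_usage_direct(shards)
```

Both functions report the same total here; the direct version simply never allocates the per-shard and merged dictionaries.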
Raphael S. Carvalho	fedd838b9d	replica: Fix race of some operations like cleanup with snapshot There are two semaphores in table for synchronizing changes to the sstable list: sstable_set_mutation_sem: used to serialize two concurrent operations updating the list, to prevent them from racing with each other. sstable_deletion_sem: A deletion guard, used to serialize deletion and iteration over the list, to prevent iteration from finding deleted files on disk. They're always taken in this order to avoid deadlocks: sstable_set_mutation_sem -> sstable_deletion_sem. Problem: A = tablet cleanup B = take_snapshot() 1) A acquires sstable_set_mutation_sem for updating the list 2) A acquires sstable_deletion_sem, then deletes the sstable before updating the list 3) A releases sstable_deletion_sem, then yields 4) B acquires sstable_deletion_sem 5) B iterates through the list and bumps into the sstable deleted in step 2 6) B fails since it cannot find the file on disk The initial reaction is to say that no procedure must delete an sstable before updating the list, and that's true. But we want an iteration, running concurrently to cleanup, to not find sstables being removed from the system. Otherwise, e.g. snapshot works with sstables of a tablet that was just cleaned up. That's achieved by serializing iteration with list update. Since sstable_deletion_sem is used within the scope of deletion only, it's useless for achieving this. Cleanup could acquire the deletion sem when preparing list updates, and then pass the "permit" to the deletion function, but then sstable_deletion_sem would essentially become sstable_set_mutation_sem, which was created exactly to protect the list update. That being said, it makes sense to merge both semaphores. Also things become easier to reason about, and we don't have to worry about deadlocks anymore. The deletion goes through sstable_list_builder, which holds a permit throughout its lifetime, which guarantees that list updates and deletion are atomic to other concurrent operations. 
The interface becomes less error prone with that. It allowed us to find that discard_sstables() was doing deletion without any permit, meaning another race could happen between truncate and snapshot. So we're fixing the race of (truncate|cleanup) with take_snapshot, as far as we know. It's possible that other unknown races are fixed as well. Fixes #23049. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#23117	2025-03-06 11:00:48 +02:00
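The effect of merging the two semaphores can be sketched with Python's `asyncio`. This is only an analogy for the scheme the commit describes, not Scylla's implementation: the `Table` class, its fields, and the sstable names are hypothetical, and a single `asyncio.Lock` stands in for the merged semaphore and its permit.

```python
import asyncio


class Table:
    """Hypothetical table with one lock guarding both list updates and deletion."""

    def __init__(self, sstables):
        self.sstables = list(sstables)    # names visible to readers of the list
        self.on_disk = set(sstables)      # names whose files still exist
        self._list_lock = asyncio.Lock()  # the single, merged semaphore

    async def cleanup(self, victim):
        # Deletion and list update happen under one permit, so the yield
        # in between cannot expose a half-finished state.
        async with self._list_lock:
            self.on_disk.discard(victim)  # delete the file first
            await asyncio.sleep(0)        # yield, as in step 3 of the race
            self.sstables.remove(victim)  # then update the list

    async def take_snapshot(self):
        # Iteration takes the same lock, so it sees the list either before
        # or after cleanup, never in between.
        async with self._list_lock:
            missing = [s for s in self.sstables if s not in self.on_disk]
            if missing:
                raise FileNotFoundError(missing)
            return list(self.sstables)


async def main():
    table = Table(["sst-1", "sst-2"])
    _, snapshot = await asyncio.gather(table.cleanup("sst-1"), table.take_snapshot())
    return snapshot


snapshot = asyncio.run(main())
```

With two separate locks, the snapshot could run during the yield and trip over the already deleted file; with one lock it waits until the list no longer references it.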
Nadav Har'El	e0f24c03e7	Merge 'test.py: merge all 'Topology' suite types int one folder 'cluster'' from Artsiom Mishuta Now that we support suite subfolders, there is no need to create separate suites for object_store, auth_cluster, topology and topology_custom. This PR merges all these folders into one: 'cluster'. This PR also introduces and applies the 'prepare_3_nodes_cluster' fixture, which allows preparing a non-dirty 3-node cluster that can be reused between tests (for tests that were in the topology folder). Number of tests in master: release - 3461, dev - 3472, debug - 3446. Number of tests in this PR: release - 3460, dev - 3471, debug - 3445. There is one test fewer in each mode because there were 2 test_topology_failure_recovery files (topology and topology_custom) with the same utility functions but different test cases. This PR merged them into one. Closes scylladb/scylladb#22917 * github.com:scylladb/scylladb: test.py: merge object_store into cluster folder test.py: merge auth_cluster into cluster folter test.py: rename topology_custom folder to cluster test.py: merge topology test suite into topology_custom test.py delete conftest in topology_custom test.py apply prepare_3_nodes_cluster in topology test.py: introduce prepare_3_nodes_cluster marker	2025-03-04 19:26:32 +02:00
Pavel Emelyanov	a8fc1d64bc	test: Add unit test for total/live sstable sizes The pair of column_family/metrics/(total|live)_disk_space_used/{name} reports the disk usage by sstables. The test creates a table, populates and flushes it, and checks that the size corresponds to what stat(2) reports for the respective files. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-03-04 19:52:33 +03:00
Patryk Jędrzejczak	c13b6c91d3	Merge 'raft topology: drop changing the raft voters config via storage_service' from Emil Maskovsky For the limited voters feature to work properly we need to make sure that we are only managing the voter status through the topology coordinator. This means that we should not change the node votership from the storage_service module for the raft topology directly. We can drop the voter status changes from the storage_service module because the topology coordinator will handle the votership changes eventually. The calls in the storage_service module were not essential and were only used for optimization (improving the HA under certain conditions). Furthermore, the other bundled commit improves the reaction again by reacting to the node `on_up()` and `on_down()` events, which again shortens the reaction time and improves the HA. The change has an effect on the timing in the tablets migration test though, as it previously relied on the node being made non-voter from the storage_service `raft_removenode()` function. The fix is to add another server to the topology to make sure we will keep the quorum. Previously the test worked because the test waits for an injection to be reached and it was ensured that the injection (log line) has only been triggered after the node has been made non-voter from the `raft_removenode()`. This is not the case anymore. An alternative fix would be to wait for the first node to be made non-voter before stopping the second server, but this would make the test more complex (and it is not strictly required to only use 4 servers in the test, it has been only done for optimization purposes). Fixes: scylladb/scylladb#22860 Refs: scylladb/scylladb#18793 Refs: scylladb/scylladb#21969 No backport: Part of the limited voters new feature, so this shouldn't be backported. 
Closes scylladb/scylladb#22847 * https://github.com/scylladb/scylladb: raft: use direct return of future for `run_op_with_retry` raft: adjust the voters interface to allow atomic changes raft topology: drop removing the node from raft config via storage_service raft topology: drop changing the raft voters config via storage_service	2025-03-04 13:59:47 +01:00
Nadav Har'El	d096aac200	test/cqlpy/run: reduce number of tablets In commit `2463e524ed`, Scylla's default changed from starting with one tablet per shard to starting with 10 per shard. The functional tests don't need more tablets and it can only slow down the tests, so the patch added --tablets-initial-scale-factor=1 to test//suite.yaml but forgot to add it to test/cqlpy/run.py (to affect test/cqlpy/run), so this patch does this now. This patch should only be about making tests faster, although to be honest, I don't see any measurable improvement in test speed (10 isn't so many). But, unfortunately, this is only part of the story. Over time we allowed a few cqlpy tests to be written in a way that relies on having only a small number of tablets or even exactly one tablet per shard (!). These tests are buggy and should be fixed - see issues #23115 and #23116 as examples. But adding the option --tablets-initial-scale-factor=1 also to run.py will make these bugs not affect test/cqlpy/run, in the same way that they don't affect test.py. These buggy tests will still break with `pytest cqlpy` against a Scylla you ran yourself manually, so eventually we will still need to fix those test bugs. Refs #23115 Refs #23116 Closes scylladb/scylladb#23125	2025-03-04 15:39:21 +03:00
Artsiom Mishuta	97a620cda9	test.py: merge object_store into cluster folder Now that we support suite subfolders, there is no need to create an own suite for object_store	2025-03-04 10:32:44 +01:00
Artsiom Mishuta	a283b391c2	test.py: merge auth_cluster into cluster folter Now that we support suite subfolders, there is no need to create an own suite for auth_cluster	2025-03-04 10:32:44 +01:00
Artsiom Mishuta	d1198f8318	test.py: rename topology_custom folder to cluster rename topology_custom folder to cluster as it contains not only topology test cases	2025-03-04 10:32:44 +01:00
Artsiom Mishuta	d8e17c4356	test.py: merge topology test suite into topology_custom Now that we support suite subfolders, there is no need to create an own suite for topology	2025-03-04 10:32:44 +01:00
Artsiom Mishuta	ef62dfa6a9	test.py delete conftest in topology_custom Delete conftest in a separate commit for a better diff listing when merging topology_custom and topology	2025-03-04 10:32:43 +01:00
Artsiom Mishuta	cf48444e3b	test.py apply prepare_3_nodes_cluster in topology Apply prepare_3_nodes_cluster to all tests in the topology folder by applying the mark at the test module level using pytestmark https://docs.pytest.org/en/stable/example/markers.html#marking-whole-classes-or-modules Set initial_size for the topology folder to 0	2025-03-04 10:32:43 +01:00
Artsiom Mishuta	20777d7fc6	test.py: introduce prepare_3_nodes_cluster marker prepare_3_nodes_cluster marker will allow preparing non-dirty 3 nodes cluster that can be reused between tests	2025-03-04 10:32:43 +01:00
Nadav Har'El	a56751e71b	test/cqlpy: fix test assuming just one tablet The cqlpy test test_compaction.py::test_compactionstats_after_major_compaction was written to assume we have just one tablet per shard - if there are many tablets compaction splitting the data, the test scenario might not need compaction in the way that the test assumes it does. Recently (commit `2463e524ed`) Scylla's default was changed to have 10 tablets per shard - not one. This broke this test. The same commit modified test/cqlpy/suite.yaml, but that affects only test.py and not test/cqlpy/run, and also not manual runs against a manually-installed Scylla. If this test absolutely requires a keyspace with 1 and not 10 tablets, then it should create one explicitly. So this is what this test does (but only if tablets are in use; if vnodes are used that's fine too). Before this patch, test/cqlpy/run test_compaction.py::test_compactionstats_after_major_compaction fails. After the patch, it passes. Fixes #23116 Closes scylladb/scylladb#23121	2025-03-04 10:15:29 +02:00
Kefu Chai	a43072a21e	cql3,test: replace boost::range::adjacent_find with std::ranges to reduce third-party dependencies and modernize the codebase. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22998	2025-03-04 10:08:02 +02:00
Artsiom Mishuta	d7f9c5654b	test.py: change test uname This commit changes the test uname replacement from "_" to "." to be able to support sub-folders in the scylla-pkg scripts logic Closes scylladb/scylladb#23130	2025-03-04 09:58:58 +02:00
Wojciech Mitros	dae7221342	rust: update dependencies The currently used versions of "wasmtime", "idna", "cap-std" and "cap-primitives" packages had low to moderate security issues. In this patch we update the dependencies to versions with these issues fixed. The update was performed by changing the "wasmtime" (and "wasmtime-wasi") version in rust/wasmtime_bindings/Cargo.toml and updating rust/Cargo.lock using the "cargo update" command with the affected package. To fix an issue with different dependencies having different versions of sub-dependencies, the package "smallvec" was also updated to "1.13.1". After the dependency update, the Rust code also needed to be updated because of the slightly changed API. One Wasm test case needed to be updated, as it was actually using an incorrect Wat module and not failing before. The crate also no longer allows multiple tables in Wasm modules by default - it is now enabled by setting the "gc" crate feature and configuring the Engine with config.wasm_reference_types(true). Fixes https://github.com/scylladb/scylladb/issues/23127 Closes scylladb/scylladb#23128	2025-03-04 09:45:23 +02:00
Pavel Emelyanov	e4e15a00b7	Merge 'reader_concurrency_semaphore: register_inactive_read(): handle aborted permit' from Botond Dénes It is possible that the permit handed in to register_inactive_read() is already aborted (currently only possible if permit timed out). If the permit also happens to have wait for memory, the current code will attempt to call promise<>::set_exception() on the permit's promise to abort its waiters. But if the permit was already aborted via timeout, this promise will already have an exception and this will trigger an assert. Add a separate case for checking if the permit is aborted already. If so, treat it as immediate eviction: close the reader and clean up. Fixes: scylladb/scylladb#22919 Bug is present in all live versions, backports are required. Closes scylladb/scylladb#23044 * github.com:scylladb/scylladb: reader_concurrency_semaphore: register_inactive_read(): handle aborted permit test/boost/reader_concurrency_semaphore_test: move away from db::timeout_clock::now()	2025-03-04 10:40:28 +03:00
Emil Maskovsky	834f506790	raft topology: drop changing the raft voters config via storage_service For the limited voters feature to work properly we need to make sure that we are only managing the voter status through the topology coordinator. This means that we should not change the node votership from the storage_service module for the raft topology directly. We can drop the voter status changes from the storage_service module because the topology coordinator will handle the votership changes eventually. The calls in the storage_service module were not essential and were only used for optimization (improving the HA under certain conditions). This has an effect on the timing in the tablets migration test though, as it relied on the node being made non-voter from the storage_service `raft_removenode()` function. The fix is to add another server to the topology to make sure we will keep the quorum. Previously the test worked because the test waits for an injection to be reached and it was ensured that the injection (log line) has only been triggered after the node has been made non-voter from the `raft_removenode()`. This is not the case anymore. An alternative fix would be to wait for the first node to be made non-voter before stopping the second server, but this would make the test more complex (and it is not strictly required to only use 4 servers in the test, it has been only done for optimization purposes). Fixes: scylladb/scylladb#22860 Refs: scylladb/scylladb#18793 Refs: scylladb/scylladb#21969	2025-03-03 15:15:43 +01:00
Artsiom Mishuta	90106c6f19	test.py: skip test_incremental_read_repair[row-tombstone] skip test test_incremental_read_repair[row-tombstone] due to https://github.com/scylladb/scylladb/issues/21179 Closes scylladb/scylladb#23126	2025-03-03 15:26:28 +02:00
Kefu Chai	5571b537b5	tree: Make values mutable to enable move semantics Previously, variables were marked as const, causing std::move() calls to be redundant as reported by GCC warnings. This change either removes const qualifiers or marks related lambdas as mutable, allowing the compiler to properly utilize move constructors for better performance. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#23066	2025-03-03 13:53:02 +03:00
Evgeniy Naydanov	cb0e0ebcf7	test.py: extract prepare dirs and S3 mock steps to test/conftest.py As a part of the move to bare pytest we need to extract the required test environment preparation steps into pytest's hooks/fixtures. Do this for the S3 mock stuff (MinioServer, MockS3Server, and S3ProxyServer) and for directories with test artifacts. For compatibility reasons, add the --test-py-init CLI option for the bare pytest test runner: it needs to be added to the pytest command if you need test.py stuff in your tests (boost, topology, etc.) Also, postpone initialization of TestSuite.artifacts and TestSuite.hosts from import-time to runtime. Closes scylladb/scylladb#23087	2025-03-03 13:24:37 +03:00
Paweł Zakrzewski	9e7f79d1ab	cql3/select_statement: require LIMIT and PER PARTITION LIMIT to be strictly positive LIMIT and PER PARTITION LIMIT limit the number of rows returned or taken into consideration by a query. It makes no logical sense to have this value at less than 1. Cassandra also has this requirement. This patch ensures that the limit value is strictly positive and adds an explicit test for it - it was only tested in a test ported from Cassandra, that is disabled due to other issues. Closes scylladb/scylladb#23013	2025-03-03 08:13:27 +02:00
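The rule this commit enforces is small enough to sketch in a few lines of Python. The sketch below is illustrative only: `InvalidRequest`, `validate_limit`, and the error text are hypothetical names and wording, not the ones used in cql3/select_statement.

```python
class InvalidRequest(Exception):
    """Hypothetical error type standing in for the server's invalid-request error."""


def validate_limit(value, name="LIMIT"):
    """Return the limit if it is strictly positive, otherwise raise."""
    if value < 1:
        raise InvalidRequest(f"{name} must be strictly positive")
    return value


# A positive limit passes through unchanged.
accepted = validate_limit(10)

# Zero and negative limits are both rejected.
rejected = []
for bad in (0, -1):
    try:
        validate_limit(bad, "PER PARTITION LIMIT")
    except InvalidRequest:
        rejected.append(bad)
```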
Tomasz Grabiec	0343235aa2	Merge 'tablets: repair: fix hosts and dcs filters behavior for tablet repair' from Aleksandra Martyniuk If hosts and/or dcs filters are specified for tablet repair and some replicas match these filters, choose the replica that will be the repair master according to round-robin principle (currently it's always the first replica). If hosts and/or dcs filters are specified for tablet repair and no replica matches these filters, the repair succeeds and the repair request is removed (currently an exception is thrown and tablet repair scheduler reschedules the repair forever). Fixes: https://github.com/scylladb/scylladb/issues/23100. Needs backport to 2025.1 that introduces hosts and dcs filters for tablet repair Closes scylladb/scylladb#23101 * github.com:scylladb/scylladb: test: add new cases to tablet_repair tests test: extract repiar check to function locator: add round-robin selection of filtered replicas locator: add tablet_task_info::selected_by_filters service: finish repair successfully if no matching replica found	2025-03-01 14:47:43 +01:00
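The two behaviors described in this merge (round-robin choice among matching replicas, and a clean "nothing to do" result when no replica matches) can be sketched as follows. This is a simplified illustration under stated assumptions: `pick_repair_master`, the host names, and the use of a shared counter as the round-robin state are all hypothetical, and only a hosts filter is modeled.

```python
from itertools import count


def pick_repair_master(replicas, allowed_hosts, counter):
    """Pick a repair master among the replicas that match the host filter.

    Returns None when nothing matches, which the caller treats as
    "nothing to repair" rather than as an error.
    """
    matching = [r for r in replicas if r in allowed_hosts]
    if not matching:
        return None
    # Rotate through the matching replicas instead of always taking the first.
    return matching[next(counter) % len(matching)]


counter = count()
replicas = ["host-a", "host-b", "host-c"]
picks = [pick_repair_master(replicas, {"host-a", "host-c"}, counter) for _ in range(4)]
no_match = pick_repair_master(replicas, {"host-z"}, counter)
```

Successive calls alternate between the two matching hosts, and a filter that matches no replica yields `None` instead of raising, so a scheduler built on it would not retry forever.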
Aleksandra Martyniuk	c7c6d820d7	test: add new cases to tablet_repair tests Add tests for tablet repair with host and dc filters that select one or no replica.	2025-02-28 13:03:04 +01:00
Aleksandra Martyniuk	c40eaa0577	test: extract repiar check to function	2025-02-28 13:01:10 +01:00
Botond Dénes	7ba29ec46c	reader_concurrency_semaphore: register_inactive_read(): handle aborted permit It is possible that the permit handed in to register_inactive_read() is already aborted (currently only possible if permit timed out). If the permit also happens to have wait for memory, the current code will attempt to call promise<>::set_exception() on the permit's promise to abort its waiters. But if the permit was already aborted via timeout, this promise will already have an exception and this will trigger an assert. Add a separate case for checking if the permit is aborted already. If so, treat it as immediate eviction: close the reader and clean up. Fixes: scylladb/scylladb#22919	2025-02-28 01:32:46 -05:00
Botond Dénes	4d8eb02b8d	test/boost/reader_concurrency_semaphore_test: move away from db::timeout_clock::now() Unless the test in question actually wants to test timeouts. Timeouts will have more pronounced consequences soon and thus using db::timeout_clock::now() becomes a sure way to make tests flaky. To avoid this, use db::no_timeout in the tests that don't care about timeouts.	2025-02-28 01:31:33 -05:00
Artsiom Mishuta	cd5d34f9b7	test.py: fix failed_test collection After introducing the test.py subfolders support, test.py started creating weird log files like testlog/topology_custom.mv/tablets/test_mv_tablets.1, which affect the failed test collection logic. This commit fixes this, and test.py logs as previously in the testlog directory without any subfolders: topology_custom.mv_tablets_test_mv_tablets.1 Closes scylladb/scylladb#23009	2025-02-27 12:37:11 +03:00
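The naming change can be illustrated with a short Python sketch that reproduces the two file names quoted in the commit message. The helper `flat_log_name` and its parameters are hypothetical; the actual test.py code may build the name differently.

```python
def flat_log_name(suite, test_path, run_id):
    """Build a log file name with no subfolders by replacing '/' with '_'."""
    return f"{suite}.{test_path.replace('/', '_')}.{run_id}"


# The test path from the commit message, which previously produced
# testlog/topology_custom.mv/tablets/test_mv_tablets.1
name = flat_log_name("topology_custom", "mv/tablets/test_mv_tablets", 1)
```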
Avi Kivity	3f05fa3a9b	test: lib: replace boost::generate with std equivalent Reduces dependencies on boost/range. Closes scylladb/scylladb#23034	2025-02-27 01:05:46 +01:00
Kefu Chai	6e4cb20a69	tree: implement boost::accumulate with std::ranges library Replace boost::accumulate() calls with std::ranges facilities. This change reduces external dependencies and modernizes the codebase. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#23062	2025-02-26 23:22:02 +02:00
Piotr Szymaniak	f887466c3f	alternator: Clean error handling on CreateTable without AttributeDefinitions If the user fails to supply the AttributeDefinitions parameter when creating a table, Scylla used to fail on RAPIDJSON_ASSERT. Now it throws a polite exception, which is fully in line with what DynamoDB does. The commit also supplies a new, relevant test routine. Fixes #23043 Closes scylladb/scylladb#23041	2025-02-26 14:24:57 +02:00
Botond Dénes	5d63ef4d15	Merge 'scylla sstable: Add standard extensions and propagate to schema load ' from Calle Wilund Fixes #22314 Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them. Bundles together the setup of "always on" schema extensions into a single call, and uses this from the three (3) init points. Could have opted for static reg via `configurables`, but since we are moving to a single code base, the need for this is going away, hence explicit init seems more in line. Closes scylladb/scylladb#22327 * github.com:scylladb/scylladb: tools: Add standard extensions and propagate to schema load cql_test_env: Use add all extensions instead of inidividually main: Move extensions adding to function tomstone_gc: Make validate work for tools	2025-02-26 13:52:47 +02:00
Kefu Chai	6e4df57f97	mutation,test: replace boost::equal with std::ranges::equal to reduce third-party dependencies and modernize the codebase. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22999	2025-02-26 14:27:42 +03:00
Wojciech Mitros	6bc445b841	test: increase timeout for adding a server in test_mv_topology_change Currently, when we add servers to the cluster in the test, we use a 60s timeout which proved to be not enough in one of the debug runs. There is no reason for this test to use a shorter timeout than all the other tests, so in this patch we reset it to the higher default. Fixes https://github.com/scylladb/scylladb/issues/23047 Closes scylladb/scylladb#23048	2025-02-26 10:18:05 +02:00
Pavel Emelyanov	eff61b167c	treewide: Reduce db/config.hh header fanout Drop it from files that obviously don't need it. Also kill some forward declarations while at it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#22979	2025-02-25 15:16:40 +01:00
Piotr Dulikowski	43ae3ab703	test: test_mv_topology_change: increase timeout for removenode The test `test_mv_topology_change` is a regression test for scylladb/scylladb#19529. The problem was that CL=ANY writes issued when all replicas were down would be kept in memory until the timeout. In particular, MV updates are CL=ANY writes and have a 5 minute timeout. When doing topology operations for vnodes or when migrating tablet replicas, the cluster goes through stages where the replica sets for writes undergo changes, and the writes started with the old replica set need to be drained first. Because of the aforementioned MV updates, the removenode operation could be delayed by 5 minutes or more. Therefore, the `test_mv_topology_change` test uses a short timeout for the removenode operation, i.e. 30s. Apparently, this is too low for the debug mode and the test has been observed to time out even though the removenode operation is progressing fine. Increase the timeout to 60s. This is the lowest timeout for the removenode operation that we currently use among the in-repo tests, and is lower than 5 minutes so the test will still serve its purpose. Fixes: scylladb/scylladb#22953 Closes scylladb/scylladb#22958	2025-02-25 17:00:36 +03:00
Evgeniy Naydanov	e572771f36	test.py: refactor test.py: move test suites classes into pylib Split huge test.py into smaller pieces: test.pylib.suite.* Closes scylladb/scylladb#23005	2025-02-25 14:35:29 +03:00
Avi Kivity	6e70e69246	test/lib: mutation_assertions: deinline While generally better to reduce inline code, here we get rid of the clustering_interval_set.hh dependency, which in turns depends on boost interval_set, a large dependency. incremental_compaction_test.cc is adjusted for a missing header. Closes scylladb/scylladb#22957	2025-02-25 11:40:54 +01:00
Kefu Chai	9fdbe0e74b	tree: Remove unused boost headers This commit eliminates unused boost header includes from the tree. Removing these unnecessary includes reduces dependencies on the external Boost.Adapters library, leading to faster compile times and a slightly cleaner codebase. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22997	2025-02-25 10:32:32 +03:00
Avi Kivity	d99df7af6c	Merge 'Respect per-shard tablet goal and 10x default per-shard tablet count' from Tomasz Grabiec This series achieves two things: 1) changes default number of tablet replicas per shard to be 10 in order to reduce load imbalance between shards This will result in new tables having at least 10 tablet replicas per shard by default. We want this to reduce tablet load imbalance due to differences in tablet count per shard, where some shards have 1 tablet and some shards have 2 tablets. With higher tablet count per shard, this difference-by-one is less relevant. Fixes https://github.com/scylladb/scylladb/issues/21967 2) introduces a global goal for tablet replica count per shard and adds logic to tablet scheduler to respect it by controlling per-table tablet count The per-shard goal is enforced by controlling average per-shard tablet replica count in a given DC, which is controlled by per-table tablet count. This is effective in respecting the limit on individual shards as long as tablet replicas are distributed evenly between shards. There is no attempt to move tablets around in order to enforce limits on individual shards in case of imbalance between shards. If the average per-shard tablet count exceeds the limit, all tables which contribute to it (have replicas in the DC) are scaled down by the same factor. Due to rounding up to the nearest power of 2, we may overshoot the per-shard goal by at most a factor of 2. The scaling is applied after computing desired tablet count due to all other factors: per-table tablet count hints, defaults, average tablet size. If different DCs want different scale factors of a given table, the lowest scale factor is chosen for a given table. When creating a new table, its tablet count is determined by tablet scheduler using the scheduler logic, as if the table was already created. So any scaling due to per-shard tablet count goal is reflected immediately when creating a table. 
It may however still take some time for the system to shrink existing tables. We don't reject requests to create new tables. Fixes #21458 Closes scylladb/scylladb#22522 * github.com:scylladb/scylladb: config, tablets: Allow tablets_initial_scale_factor to be a fraction test: tablets_test: Test scaling when creating lots of tables test: tablets_test: Test tablet count changes on per-table option and config changes test: tablets_test: Add support for auto-split mode test: cql_test_env: Expose db config config: Make tablets_initial_scale_factor live-updateable tablets: load_balancer: Pick initial_scale_factor from config tablets, load_balancer: Fix and improve logging of resize decisions tablets, load_balancer: Log reason for target tablet count tablets: load_balancer: Move hints processing to tablet scheduler tablets: load_balancer: Scale down tablet count to respect per-shard tablet count goal tablets: Use scheduler's make_sizing_plan() to decide about tablet count of a new table tablets: load_balancer: Determine desired count from size separately from count from options tablets: load_balancer: Determine resize decision from target tablet count tablets: load_balancer: Allow splits even if table stats not available tablets: load_balancer: Extract make_sizing_plan() tablets: Add formatter for resize_decision::way_type tablets: load_balancer: Simplify resize_urgency_cmp() tablets: load_balancer: Keep config items as instance members locator: network_topology_strategy: Simplify calculate_initial_tablets_from_topology() tablets: Change the meaning of initial_scale to mean min-avg-tablets-per-shard tablets: Set default initial tablet count scale to 10 tablets: network_topology_stragy: Coroutinize calculate_initial_tablets_from_topology() tablets: load_balancer: Extract get_schema_and_rs() tablets: load_balancer: Drop test_mode	2025-02-24 17:59:26 +02:00
Łukasz Paszkowski	9ec1a457d6	alter_keyspace_statement: Include tablets information in system.topology Altering a keyspace (that has tablets enabled) without changing tablets attributes, i.e. no `AND tablets = {...}` results in incorrect "Update Keyspace..." log message being printed. The printed log contains "tablets={"enabled":false}". Refs https://github.com/scylladb/scylladb/issues/22261 Closes scylladb/scylladb#22324	2025-02-24 15:11:14 +02:00
Paweł Zakrzewski	854d2917a1	cql3/select_statement: reject PER PARTITION LIMIT with SELECT DISTINCT Before this patch we silently allowed and ignored PER PARTITION LIMIT. SELECT DISTINCT requires all the partition key columns, which means that setting PER PARTITION LIMIT is redundant - only one result will be returned from every partition anyway. Cassandra behaves the same way, so this patch also ensures compatibility. Fixes scylladb/scylladb#15109 Closes scylladb/scylladb#22950	2025-02-24 14:50:18 +02:00
Kefu Chai	dfa40972bb	topology_custom/test_zero_token_nodes_multidc: Enhance test logging and error handling Add verbose logging to identify failing test combinations in multi-DC setup: - Log replication factor (RF) and consistency level (CL) for each test iteration - Add validation checks for empty result sets Improve error handling: - Before indexing in a list, use `assert` to check for its emptiness - Use assertion failures instead of exceptions for clearer test diagnostics This change helps debug test failures by showing which RF/CL combinations cause inconsistent results between zero-token and regular nodes. Refs scylladb/scylladb#22967 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22968	2025-02-24 11:09:51 +01:00
Patryk Jędrzejczak	de751cad03	Merge 'test/topology_experimental_raft: add test_topology_upgrade_stuck' from Piotr Dulikowski The test simulates the cluster getting stuck during upgrade to raft topology due to majority loss, and then verifies that it's possible to get out of the situation by performing recovery and redoing the upgrade. Fixes: #17410 Closes scylladb/scylladb#17675 * https://github.com/scylladb/scylladb: test/topology_experimental_raft: add test_topology_upgrade_stuck test.py: bump minimum python version to 3.11 test.py: move gather_safely to pylib utils cdc: generation: don't capture token metadata when retrying update test.py: topology: ignore hosts when waiting for group0 consistency raft: add error injection that drops append_entries topology_coordinator: add injection which makes upgrade get stuck	2025-02-24 11:02:32 +01:00
Evgeniy Naydanov	99be9ac8d8	test.py: test_random_failures: improve handling of hung node In some cases the paused/unpaused node can hang not after the 30s timeout. This makes the test flaky. Change the condition to always check the coordinator's log if there is a hung node. Add `stop_after_streaming` to the list of error injections which can cause a node to hang. Also add a wait for a new coordinator election in cluster events which cause such elections. Closes scylladb/scylladb#22825	2025-02-24 10:23:05 +03:00
Kefu Chai	a80d7e6159	test/pylib: Simplify boolean logic in pagination check Replace complex boolean expression: ```py not driver_response_future.has_more_pages or not all_pages ``` with the clearer positive condition: ```py driver_response_future.has_more_pages and all_pages ``` The new expression is more intuitive as it directly checks for both conditions (having more pages and wanting all pages) rather than combining two negations. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22969	2025-02-21 14:21:09 +03:00
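
A note on the two expressions quoted in a80d7e6159: by De Morgan's law, `not a or not b` is the same as `not (a and b)`, so as quoted the second expression is the logical complement of the first. The rewrite therefore presumably comes with swapped branches or an outer `not` that the message does not show. The sketch below only checks the truth table; `old_condition` and `new_condition` are hypothetical names, and plain booleans stand in for `driver_response_future.has_more_pages` and `all_pages`.

```python
from itertools import product


def old_condition(has_more_pages: bool, all_pages: bool) -> bool:
    # Expression quoted as the "before" form in the commit message.
    return not has_more_pages or not all_pages


def new_condition(has_more_pages: bool, all_pages: bool) -> bool:
    # Expression quoted as the "after" form in the commit message.
    return has_more_pages and all_pages


# Enumerate all four input combinations and compare the two forms.
truth_table = [
    (a, b, old_condition(a, b), new_condition(a, b))
    for a, b in product([False, True], repeat=2)
]

for a, b, old, new in truth_table:
    print(f"has_more_pages={a!s:5} all_pages={b!s:5} old={old!s:5} new={new!s:5}")
```

In every row `old` and `new` differ, which is what De Morgan's law predicts: the positive form selects the "fetch the next page" case, while the negated form selects the "stop paging" case.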

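The per-shard tablet goal described in merge d99df7af6c (scale every table down by the same factor when the average per-shard replica count exceeds the goal, then round up to a power of 2) can be illustrated with a small sketch. This is not ScyllaDB's scheduler code, which is written in C++ and also accounts for hints, tablet sizes, and multiple DCs. The function names, the single-DC model, and the single replication factor shared by all tables are assumptions made for illustration.

```python
import math


def next_power_of_two(n: int) -> int:
    # Smallest power of 2 that is >= n (and at least 1).
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def scale_to_goal(desired: dict, shards: int, rf: int, goal: float) -> dict:
    """Scale per-table tablet counts so the average number of tablet
    replicas per shard approaches `goal` (hypothetical single-DC model).

    desired: table name -> tablet count wanted by all other factors
    shards:  total number of shards in the DC
    rf:      replication factor, assumed identical for every table
    """
    avg_per_shard = sum(desired.values()) * rf / shards
    # Only scale down; every table gets the same factor.
    factor = min(1.0, goal / avg_per_shard) if avg_per_shard > 0 else 1.0
    return {
        table: next_power_of_two(math.ceil(count * factor))
        for table, count in desired.items()
    }


def average_per_shard(counts: dict, shards: int, rf: int) -> float:
    return sum(counts.values()) * rf / shards


desired = {"a": 64, "b": 32}
scaled = scale_to_goal(desired, shards=16, rf=3, goal=4)
print(scaled, average_per_shard(scaled, shards=16, rf=3))
```

With these inputs the average before scaling is 18 replicas per shard. After scaling and rounding up it is 4.5, above the goal of 4 but within the factor-of-2 overshoot that the merge description mentions.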

8428 Commits