Single-row reads from a large partition issue 64 KiB reads to the data file,
which equals the default span of a promoted index block in the data file.
If users want to increase the selectivity of the index to speed up single-row reads,
this won't be effective. The reason is that the reader uses the promoted index
to look up the start position of the read in the data file, but the end position
will in practice extend to the next partition, and the amount of I/O will be
determined by the underlying file input stream implementation and its
read-ahead heuristics. By default, that results in at least 2 I/Os of 32 KiB each.
There is already infrastructure to look up the end position based on the upper
bound of the read, added in anticipation of sharing the promoted index cache,
but it's not effective because it's a non-populating lookup and the upper
bound cursor has its own private cached_promoted_index, which is cold
when positions are computed. The lookup is non-populating on purpose, to avoid
extra index file I/O to read the upper bound; in case the upper bound is far enough
from the lower bound, that extra I/O would only increase the cost of the read.
The solution employed here is to warm up the lower bound cursor's
cache before positions are computed, and use that cursor for a
non-populating lookup of the upper bound.
We use the lower bound cursor and the slice's lower bound so that we
read the same blocks as later lower-bound slicing would, and thus
don't incur extra I/O in cases where looking up the upper bound is not
worth it, that is, when the upper bound is far from the lower bound. If
the upper bound is near the lower bound, then warming up using the lower bound
will populate cached_promoted_index with blocks which allow us to
locate the upper bound block accurately. This is especially important
for single-row reads, where both bounds are around the same key. In
this case we want to read the data file range which belongs to a
single promoted index block. It doesn't matter that the upper bound
is not exactly the same key: both bounds will likely lie in the same block,
and if not, binary search will bring adjacent blocks into the cache. Even
if the upper bound is not near, the binary search will populate the cache
with blocks which can be used to narrow down the data file range
somewhat.
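To make the mechanism above concrete, here is a small, self-contained toy model (illustration only; `pi_block` and `data_range` are made-up names, not the actual ScyllaDB promoted-index structures):
```
// Toy model of locating a data-file range via a promoted index; for
// illustration only -- pi_block/data_range are made-up names, not the actual
// ScyllaDB structures. Each block covers clustering keys >= first_key and
// starts at data_file_offset.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <utility>
#include <vector>

struct pi_block {
    int64_t first_key;
    uint64_t data_file_offset;
};

// Given the blocks brought into cache by warming up the lower-bound cursor,
// return the data-file range [start, end) covering keys in [lower, upper].
// If the upper bound falls past the cached blocks, fall back to partition_end.
std::pair<uint64_t, uint64_t>
data_range(const std::vector<pi_block>& cached, int64_t lower, int64_t upper,
           uint64_t partition_end) {
    auto after = [] (int64_t k, const pi_block& b) { return k < b.first_key; };
    auto start_it = std::upper_bound(cached.begin(), cached.end(), lower, after);
    uint64_t start = (start_it == cached.begin())
            ? cached.front().data_file_offset
            : std::prev(start_it)->data_file_offset;
    auto end_it = std::upper_bound(cached.begin(), cached.end(), upper, after);
    uint64_t end = (end_it == cached.end()) ? partition_end
                                            : end_it->data_file_offset;
    return {start, end};
}

int main() {
    // Blocks around the slice's lower bound, as warmed up by the cursor.
    std::vector<pi_block> cached = {{100, 0}, {200, 2048}, {300, 4096}};
    // Single-row read around key 250: both bounds land in the same block, so
    // the read covers one promoted-index block instead of the partition tail.
    auto [start, end] = data_range(cached, 250, 250, 1 << 20);
    std::cout << start << ".." << end << "\n"; // prints 2048..4096
}
```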
Fixes #10030.
The change was tested with perf-fast-forward.
I populated the data set with `column_index_size_in_kb` set to 1
scylla perf-fast-forward --populate --run-tests=large-partition-slicing --column-index-size-in-kb=1
Test run:
build/release/scylla perf-fast-forward --run-tests=large-partition-select-few-rows -c1 --keep-cache-across-test-cases --test-case-duration=0
This test issues two reads of subsequent keys from the middle of a large partition (1M rows in total). The first read will miss in the index file page cache, the second read will hit.
Notice that before the change, the second read issued 2 aio requests worth of 64 KiB in total.
After the change, the second read issued 1 aio request worth of 2 KiB. That's because the promoted index block is larger than 1 KiB.
I verified using logging that the data file range matches a single promoted index block.
Also, the first read which misses in cache is still faster after the change.
Before:
```
running: large-partition-select-few-rows on dataset large-part-ds1
Testing selecting few rows from a large partition:
stride rows time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk allocs tasks insns/f cpu
500000 1 0.009802 1 1 102 0 102 102 21.0 21 196 2 1 0 1 1 0 0 0 568 269 4716050 53.4%
500001 1 0.000321 1 1 3113 0 3113 3113 2.0 2 64 1 0 1 0 0 0 0 0 116 26 555110 45.0%
```
After:
```
running: large-partition-select-few-rows on dataset large-part-ds1
Testing selecting few rows from a large partition:
stride rows time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk allocs tasks insns/f cpu
500000 1 0.009609 1 1 104 0 104 104 20.0 20 137 2 1 0 1 1 0 0 0 561 268 4633407 43.1%
500001 1 0.000217 1 1 4602 0 4602 4602 1.0 1 2 1 0 1 0 0 0 0 0 110 26 313882 64.1%
```
Backports: none, not a regression
Closes scylladb/scylladb#20522
* github.com:scylladb/scylladb:
perf: perf_fast_forward: Add test case for querying missing rows
perf-fast-forward: Allow overriding promoted index block size
perf-fast-forward: Test subsequent key reads from the middle in test_large_partition_select_few_rows
perf-fast-forward: Allow adding key offset in test_large_partition_select_few_rows
perf-fast-forward: Use single-partition reads in test_large_partition_select_few_rows
sstables: bsearch_clustered_cursor: Add more tracing points
sstables: reader: Log data file range
sstables: bsearch_clustered_cursor: Unify skip_info logging
sstables: bsearch_clustered_cursor: Narrow down range using "end" position of the block
sstables: bsearch_clustered_cursor: Skip even to the first block
test: sstables: sstable_3_x_test: Improve failure message
sstables: mx: writer: Never include partition_end marker in promoted index block width
sstables: Reduce amount of I/O for clustering-key-bounded reads from large partitions
sstables: clustered_cursor: Track current block
In the current scenario, nodetool status doesn't display information regarding zero token nodes. For example, if 5 nodes are spun up by the administrator, of which 2 are zero token nodes, then nodetool status only shows information regarding the 3 non-zero token nodes.
This commit intends to fix this issue by leveraging the "/storage_service/host_id" API and adding appropriate logic in scylla-nodetool.cc to support zero token nodes.
A test is also added in nodetool/test_status.py to verify this logic. This test fails without this commit’s zero token node support logic, hence verifying the behavior.
This PR fixes a bug. Hence we need to backport it. Backporting needs to be done only
to 6.2 version, since earlier versions don't support zero token nodes.
Fixes: scylladb/scylladb#19849
Fixes: scylladb/scylladb#17857
Closes scylladb/scylladb#20909
* github.com:scylladb/scylladb:
fix nodetool status to show zero-token nodes
test: move `wait_for_first_completed` to pylib/util.py
token_metadata: rename endpoint_to_host_id_map getter and add support for joining nodes
before this change, we only record the exception returned
by `upload_file()` in the log message, but do not rethrow it. the exception
thrown by `upload_file()` is not propagated to its caller. instead, the
exceptional future is ignored on purpose -- we need to perform
the uploads in parallel. this is why the task is not marked as failed
even if some of the uploads performed by it fail.
in this change, we
- coroutinize `backup_task_impl::do_backup()`. strictly speaking,
this is not necessary for propagating the exception. but, in order
to ensure that the possible exception is captured before the
gate is closed, and to reduce the indentation, the teardown
steps are performed explicitly.
- in addition to noting down the exception in the logging message,
we also store it in a local variable, which is rethrown
before this function returns.
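a simplified sketch of the resulting pattern (Seastar coroutine style; `files_to_upload()` and `upload_file()` below are placeholder declarations, not the real backup task internals):
```
#include <exception>
#include <vector>
#include <seastar/core/coroutine.hh>
#include <seastar/core/gate.hh>
#include <seastar/core/sstring.hh>
#include <seastar/util/log.hh>

std::vector<seastar::sstring> files_to_upload();          // placeholder
seastar::future<> upload_file(const seastar::sstring&);   // placeholder

seastar::future<> do_backup_sketch(seastar::gate& gate, seastar::logger& log) {
    std::exception_ptr ex;
    for (auto name : files_to_upload()) {
        // uploads run in parallel under the gate; a failure is logged and
        // remembered instead of aborting the loop
        (void)seastar::with_gate(gate, [&ex, &log, name] {
            return upload_file(name).handle_exception(
                    [&ex, &log, name] (std::exception_ptr e) {
                log.warn("upload of {} failed", name);
                if (!ex) {
                    ex = std::move(e);    // remember the first failure
                }
            });
        });
    }
    co_await gate.close();          // every upload has completed by now
    if (ex) {
        std::rethrow_exception(ex); // propagate, so the task is marked failed
    }
}
```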
Fixes scylladb/scylladb#21248
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21254
Currently, test_repair_succeeds_with_unitialized_bm checks whether
repair finishes successfully and the error is properly handled
if batchlog_manager isn't initialized. Error handling depends on
logs, making the test fragile to external conditions and flaky.
Drop the error handling check; a successful repair is a sufficient
passing condition.
Fixes: #21167.
Closes scylladb/scylladb#21208
The skipped ranges should be multiplied by the number of tables.
Otherwise the finished ranges ratio will not reach 100%.
Fixes #21174
Closes scylladb/scylladb#21252
* github.com:scylladb/scylladb:
test: Add test_node_ops_metrics.py
repair: Make the ranges more consistent in the log
repair: Fix finished ranges metrics for removenode
now that we are allowed to use C++23, we have the luxury of using
`std::views::values`.
in this change, we:
- replace `boost::adaptors::map_values` with `std::views::values`
- update affected code to work with `std::views::values`
- the places where we use `boost::join()` are not changed, because
we cannot use `std::views::concat` yet. this helper is only
available in C++26.
this reduces the dependency on boost for better maintainability, and
leverages standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
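a minimal standalone illustration of the replacement (example code, not scylla sources):
```
#include <iostream>
#include <map>
#include <ranges>
#include <string>

int main() {
    std::map<std::string, int> latency_ms = {{"read", 3}, {"write", 5}};
    // previously: for (int v : latency_ms | boost::adaptors::map_values)
    for (int v : latency_ms | std::views::values) {
        std::cout << v << "\n";
    }
}
```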
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21265
Currently, running the `nodetool compactionhistory` command or using the rest api `curl -X GET --header "Accept: application/json" "http://localhost:10000/compaction_manager/compaction_history"` returns the compaction history without the `rows_merged` field.
The series computes rows merged during compaction and provides this information to users via both the nodetool command and the rest api. The `rows_merged` field contains information on merged clustering keys across multiple sstable files. For instance, when compacting two sstables of a table consisting of 7 rows where two rows are present in both sstables, the output would have the following format: {1: 5, 2: 2}.
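A minimal sketch of how such a histogram can be computed (a simplified model for illustration, not the actual compaction code):
```
// Simplified model of computing a rows_merged histogram (illustration only,
// not the actual compaction code). The key is the number of input sstables a
// row was merged from; the value is how many rows were merged from that many.
#include <cstdint>
#include <iostream>
#include <map>
#include <vector>

int main() {
    // For each output row, how many input sstables contributed a version of it
    // (7 rows total, 2 of them present in both input sstables, as above).
    std::vector<int> sstables_per_row = {1, 1, 1, 1, 1, 2, 2};

    std::map<int, uint64_t> rows_merged;
    for (int n : sstables_per_row) {
        ++rows_merged[n];
    }
    for (const auto& [sstables, rows] : rows_merged) {
        std::cout << sstables << ": " << rows << "\n"; // prints "1: 5" and "2: 2"
    }
}
```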
No backport is required. It extends the existing compaction history output.
Fixes https://github.com/scylladb/scylladb/issues/666
Closes scylladb/scylladb#20481
* github.com:scylladb/scylladb:
test/rest_api: Add tests for compactionhistory
nodetool: Add rows merged stats into compactionhistory output
compaction: Update compaction history with collected histogram
compaction: Remove const qualifier from methods creating sstable readers
sstable_set: Add optional statistics to make_local_shard_sstable_reader
make_combined_reader: Add optional parameter, combined_reader_statistics
reader_selector: Extend with maximum reader count
mutation_fragment_merger: Create histogram while consuming mutation fragment batches
A worry was raised that an unprivileged user might be able to read the
system.roles table - which contains the Alternator secret keys (and also
CQL's hashed passwords). This patch adds tests that show that this worry
is unjustified - and acts as a regression test to ensure it never
becomes justified. The tests show that an unprivileged user cannot read
the system.roles table using either CQL or Alternator APIs.
More specifically, the two tests in this patch demonstrate that:
* The Alternator API does not allow an unprivileged user to read ANY system
table, unless explicitly granted permissions for that table.
* The CQL API whitelists (see service::client_state::has_access) specific
system tables - e.g., system_schema.tables - that are made readable to any
unprivileged user. But the system.auth table is NOT whitelisted in this
way - and is unreadable to unprivileged users unless explicitly granted
permissions on that table.
The new tests pass on both Scylla and Cassandra.
Refs #5206 (that issue is about removing the Alternator secret keys from
the roles table - but stealing CQL salted hashes is still pretty bad, so
it's good to know that unprivileged users can't read them).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#21215
While documenting materialized views in a new document (Refs #16569)
I encountered a few questions on how various CQL operations work on
a table that has views, and this patch contains tests that clarify their
answer - and can later guarantee that the answer doesn't unintentionally
change in the future. The questions that these tests answer are:
1. That TRUNCATE on a base table also TRUNCATEs its views. This is just
a basic test, with no attempt to reproduce issue #17635 (which is
about the truncation of the base and views not being atomic).
2. That DROP TABLE is *not allowed* on a base table that has views.
3. That DROP KEYSPACE is allowed, even if there are tables with views.
4. Test that ALTER TABLE tbl DROP is never allowed in Cassandra, but
allowed in some cases by Scylla
5. Test that ALTER TABLE tbl ADD is allowed, and "SELECT *" expands to
select the new column into the materialized view as well.
All the new tests pass on both Scylla and Cassandra.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#21142
There's a `missing_summary_first_last_sane` test case that uses a very specific way of modifying an sstable -- it loads one from resources, then tries to "write" the loaded stuff elsewhere. For that it uses a special-purpose test::store() helper and a bunch of auxiliary ones from the same class. Those aux helpers are not used anywhere else and are also very specific to this test case, so it makes sense to keep this whole functionality in a single helper.
Closes scylladb/scylladb#21255
* github.com:scylladb/scylladb:
test: Squash test::change_generation_number() into test::store()
test: Squash test::change_dir() into test::store()
test: Coroutinize sstables::test::store()
This commit adds a new test case 'test_group_by_static_column_and_tombstones'
to verify the behavior of GROUP BY queries with static columns. The test is
adapted from Cassandra's test suite and aims to reproduce issue #21267.
Original, larger test:
cassandra_tests/validation/operations/select_group_by_test.py::testGroupByWithPaging()
Closes scylladb/scylladb#21270
In the current scenario, nodetool status doesn't display information
regarding zero token nodes. For example, if 5 nodes are spun up by the
administrator, of which 2 are zero token nodes, then nodetool
status only shows information regarding the 3 non-zero token nodes.
This commit intends to fix this issue by leveraging the "/storage_service/host_id"
API and adding appropriate logic in scylla-nodetool.cc to support zero token nodes.
Robust topology tests are added, which spin up scylla nodes and confirm nodetool
status output for various cases, providing good coverage.
A test is also added in nodetool/test_status.py to verify this logic. These tests fail
without this commit’s zero token node support logic, hence verifying the behavior.
The test `test_status_keyspace_joining_node` has been removed. This test is
based on a case where host_id=None, which is impossible. Since we now use
host_id_map for node discovery in nodetool, nodes with "host_id=None"
go undetected. Since this case is impossible anyway, we can get rid of the test.
This PR fixes a bug. Hence we need to backport it. Backporting needs to be done only
to the 6.2 version, since earlier versions don't support zero token nodes.
Fixes: scylladb/scylladb#19849
There are no other usages of the former helper other than ones immediately followed by
the latter, so there is no point in keeping it around.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are no other usages of the former helper other than ones immediately followed by
the latter, so there is no point in keeping it around.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Although OSS doesn't limit the number of created service levels, match the
enterprise limit to decrease divergence between OSS and enterprise in the
test.
Fixes scylladb/scylladb#21044
Closes scylladb/scylladb#21045
We introduce an auxiliary type representing a service level for making
it easier to adjust the tests in Enterprise. We move the responsibility
of producing create statements for service levels to the class, so we
only need to modify the code in one place when necessary.
All existing relevant tests have been adjusted to this change.
Closes scylladb/scylladb#21230
ALTER tablets-enabled KEYSPACES (KS) may fail due to
`group0_concurrent_modification`, in which case it's repeated by a `for`
loop surrounding the code. But because raft's `add_entry` consumes the
raft guard (by `std::move`'ing the guard object), retries of ALTER KS
will use a moved-from guard object, which is UB and potentially a crash.
The fix is to remove the aforementioned `for` loop altogether and rethrow the exception: the `rf_change` event
will be repeated by the topology state machine if it receives the
concurrent modification exception, because the event remains present
in the global requests queue and is therefore going to be executed as the
very next event.
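A minimal, self-contained illustration of the bug pattern and the fix (the `guard`, `add_entry` and `concurrent_modification` types below are simplified stand-ins, not the real raft code):
```
// Self-contained illustration of the bug; `guard` and `add_entry` are
// simplified stand-ins for the raft guard and API, not the real code.
#include <memory>
#include <stdexcept>

struct guard {
    std::unique_ptr<int> token = std::make_unique<int>(0); // null once moved-from
};

struct concurrent_modification : std::runtime_error {
    concurrent_modification() : std::runtime_error{"group0_concurrent_modification"} {}
};

void add_entry(guard g) {                  // consumes the guard by value
    if (!g.token) {
        throw std::logic_error{"used a moved-from guard"};
    }
    throw concurrent_modification{};       // simulate a concurrent group0 write
}

void alter_keyspace_buggy(guard g) {
    for (;;) {
        try {
            add_entry(std::move(g));       // first iteration moves the guard away...
            return;
        } catch (const concurrent_modification&) {
            // ...so the retry re-uses a moved-from guard on the next iteration
        }
    }
}

void alter_keyspace_fixed(guard g) {
    // no retry loop: let the exception propagate; the topology state machine
    // re-executes the rf_change event, which is still in the request queue
    add_entry(std::move(g));
}
```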
Note: refactor is implemented in the follow-up commit.
Fixes: scylladb/scylladb#21102
Should be backported to every 6.x branch, as it may lead to a crash.
Closes scylladb/scylladb#21121
* github.com:scylladb/scylladb:
test: add UT to test retrying ALTER tablets KEYSPACE
cql/tablets: fix indentation in `rf_change` event handler
cql/tablets: fix retrying ALTER tablets KEYSPACE
On the read path, the compacting reader is applied only to the sstable
reader. This can cause an expired tombstone from an sstable to be purged
from the request before it has a chance to merge with deleted data in
the memtable, leading to data resurrection.
Fix this by checking the memtables before deciding to purge tombstones
from the request on the read path. A tombstone will not be purged if a
key exists in any of the table's memtables with a minimum live timestamp
that is lower than the maximum purgeable timestamp.
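A sketch of the check described above, with simplified stand-in types (`memtable_view` and `can_purge_tombstone` are illustrative, not the real ScyllaDB interfaces):
```
// Sketch of the purge check, with simplified types (not the real ScyllaDB
// interfaces): a tombstone may only be purged if no memtable holds the key
// with live data older than the maximum purgeable timestamp.
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

using api_timestamp = int64_t;

struct memtable_view {
    // returns the minimum live timestamp of the key, if the key is present
    std::function<std::optional<api_timestamp>(int64_t key)> min_live_timestamp_of;
};

bool can_purge_tombstone(int64_t key,
                         api_timestamp max_purgeable_timestamp,
                         const std::vector<memtable_view>& memtables) {
    for (const auto& mt : memtables) {
        auto ts = mt.min_live_timestamp_of(key);
        if (ts && *ts < max_purgeable_timestamp) {
            // Live data in a memtable may still be shadowed by this tombstone;
            // purging it now could resurrect that data once the memtable is flushed.
            return false;
        }
    }
    return true;
}
```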
Fixes #20916
`perf-simple-query` stats before and after this fix :
`build/Dev/scylla perf-simple-query --smp=1 --flush` :
```
// Before this Fix
// ---------------
94941.79 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59393 insns/op, 24029 cycles/op, 0 errors)
97551.14 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59376 insns/op, 23966 cycles/op, 0 errors)
96599.92 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59367 insns/op, 23998 cycles/op, 0 errors)
97774.91 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59370 insns/op, 23968 cycles/op, 0 errors)
97796.13 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59368 insns/op, 23947 cycles/op, 0 errors)
throughput: mean=96932.78 standard-deviation=1215.71 median=97551.14 median-absolute-deviation=842.13 maximum=97796.13 minimum=94941.79
instructions_per_op: mean=59374.78 standard-deviation=10.78 median=59369.59 median-absolute-deviation=6.36 maximum=59393.12 minimum=59367.02
cpu_cycles_per_op: mean=23981.67 standard-deviation=32.29 median=23967.76 median-absolute-deviation=16.33 maximum=24029.38 minimum=23947.19
// After this Fix
// --------------
95313.53 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59392 insns/op, 24058 cycles/op, 0 errors)
97311.48 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59375 insns/op, 24005 cycles/op, 0 errors)
98043.10 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59381 insns/op, 23941 cycles/op, 0 errors)
96750.31 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59396 insns/op, 24025 cycles/op, 0 errors)
93381.21 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59390 insns/op, 24097 cycles/op, 0 errors)
throughput: mean=96159.93 standard-deviation=1847.88 median=96750.31 median-absolute-deviation=1151.55 maximum=98043.10 minimum=93381.21
instructions_per_op: mean=59386.60 standard-deviation=8.78 median=59389.55 median-absolute-deviation=6.02 maximum=59396.40 minimum=59374.73
cpu_cycles_per_op: mean=24025.13 standard-deviation=58.39 median=24025.17 median-absolute-deviation=32.67 maximum=24096.66 minimum=23941.22
```
This PR fixes a regression introduced in ce96b472d3 and should be backported to older versions.
Closes scylladb/scylladb#20985
* github.com:scylladb/scylladb:
topology-custom: add test to verify tombstone gc in read path
replica/table: check memtable before discarding tombstone during read
compaction_group: track maximum timestamp across all sstables
* Add `--max-failures` flag to test.py, which will stop the execution after a given number of failures
* Helps with a "fail-fast" approach and can be used to improve CI speed, especially the 100-times run
* Adds the number of cancelled tests to both summary and junit xml. I did not include them in boost, since it does not contain any statistics.
* Removes unnecessary list creation in test.py
* Completely unrelated change, but it is small enough that I feel it can be included as part of this one. If this is an issue I can create separate PR for it
* Add `Test.started` property
* Helps with determining the current status of the Test and differentiating cancelled/not started tests.
* Add `Test.failed` and `Test.did_not_run` read-only computed properties
* Helper methods to determine status, instead of using `Test.success`, which does not tell the entire story
* Fix `ScyllaClusterManager.stop()` method, so it doesn't fail when run multiple times
* This happens when tasks are cancelled; not sure yet why, it is almost certainly unwanted behaviour, but this behaviour was already there and with this fix it no longer causes errors
I will use backport/None for now as it is a new feature.
Fixes https://github.com/scylladb/qa-tasks/issues/1714
Closes scylladb/scylladb#21098
* github.com:scylladb/scylladb:
test.py: Add option to fail after number of failures
test.py: Add started, failed and did_not_run properties to Test
test.py: Remove unnecessary list creation
test: lib: Fix ScyllaClusterManager.stop()
When writing to some tables with materialized views, we need to read from the base table first to perform a delete of the old view row. When doing so, the memory used for the read is tracked by the user read concurrency semaphore. When we have a large number of such reads, we may use up all of the semaphore units, causing the following reads to be queued. When we have some user reads coming at the same time, these reads can have very high latency due to the write workload on the base table. We want to avoid this, so that the write workload doesn't have a high impact on the latency of the read workload.
This is fixed in this patch by adding a separate read concurrency semaphore just for view update read-before-writes. With the new semaphore, even if there are many view update read-before-writes, they will be queued on a different semaphore than the user reads, and they won't impact their latency.
The second issue fixed by this patch is the concurrency of the view updates, which is currently unlimited. Because of that, view updates may take up so much memory that we run out of memory.
This is fixed by using the read admission on the view update concurrency semaphore.
This limits the number of concurrent view update reads to
max_count_concurrent_view_update_reads, all other incoming view update reads are
queued using just a small chunk of memory. Without this, the reads would also get
queued after exceeding view_update_reader_concurrency_semaphore_serialize_limit_multiplier, but they would take much more memory while staying in the queue.
The new semaphore has half the capacity of the regular user read concurrency semaphore and is currently used only for user writes - it's used independently of the scheduling group on which we base the read semaphore selection, but we use a different code path for streaming (not database::do_apply) and we shouldn't have view updates in system writes or during compaction.
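A schematic sketch of the semaphore routing described above (all names below are simplified stand-ins, not the actual database code):
```
// Schematic sketch of the semaphore selection; all names are simplified
// stand-ins, not the actual database code.
#include <cstddef>

struct reader_semaphore {
    std::size_t memory_capacity;   // admission, queueing etc. elided
};

struct read_semaphores {
    reader_semaphore user_reads;
    reader_semaphore view_update_reads;   // new: half the capacity of user reads

    explicit read_semaphores(std::size_t user_capacity)
        : user_reads{user_capacity}
        , view_update_reads{user_capacity / 2} {
    }

    // A base-table read done only to build a view update (read-before-write)
    // is admitted on its own semaphore, so a burst of such reads queues there
    // instead of starving regular user reads.
    reader_semaphore& semaphore_for(bool is_view_update_read) {
        return is_view_update_read ? view_update_reads : user_reads;
    }
};
```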
This patch also adds a test to confirm that the view update workload doesn't impact the read latency, as well as a test which confirms that we do not run out of memory even under a heavy view update workload.
The issue of view updates causing increased latencies most often occurs in the following scenario:
* we have a medium to high write workload to a table with a materialized view which requires reading from the base table before sending the update to delete the old rows
* we have any read workload
* one replica is slower or is handling more writes due to an imbalance of data distribution
* we write with cl < ALL; the mentioned replica replies to write requests more slowly while new ones keep being sent to it.
* each write performs a read first taking resources from the user read concurrency semaphore, so when enough writes accumulate the reads using the semaphore start getting queued
* the queue is shared by regular reads and view update reads. When there's enough view update reads in the queue, regular reads start getting increased latencies
An sct test (perf-regression-latency-mv-read-concurrency) was prepared to somewhat resemble this scenario:
* the tables were prepared satisfying the conditions above
* we use a medium write workload and a very low read workload
* the imbalance is achieved by writing to just a few (10) partitions - some replicas (and shards) can have twice or more used partitions than others. We also keep writing to a limited (though high) number of rows, to cause overwrites which require reading before sending the view update
* to minimize the test case, we use a cluster of 3 nodes and rf=2, we write with cl=ONE to have background replica writes and read with cl=ALL to wait for the slower replica to respond.
In the test above:
* without the fix, the latency of reads increases to over 50 s
* with the fix, the latency of reads stays below 20ms
Fixes https://github.com/scylladb/scylladb/issues/8873
Fixes https://github.com/scylladb/scylladb/issues/15805
The patch is not that small and it isn't fixing a regression, so no backports
Closes scylladb/scylladb#20887
* github.com:scylladb/scylladb:
test: add test for high view update concurrency causing bad_allocs
test: add test for high view update concurrency degrading read latency
mv: add a dedicated read concurrency semaphore for view update read before writes
Before Python 3.12, f-strings could not reuse, inside a replacement field, the
same quote character that encloses the string. Change the type of quotation mark
in get_cgroup so it can be used with earlier Python versions.
Closes scylladb/scylladb#21209
The newly added testcase is based on the already existing
`test_alter_dropped_tablets_keyspace`.
A new error injection is created, which stops the ALTER execution just
before the changes are submitted to RAFT. In the meantime, a new schema
change is performed using the 2nd node in the cluster, thus causing the
1st node to retry the ALTER statement.
For a table with NullCompactionStrategy and
TimeWindowCompactionStrategy, the test
- inserts a bunch of data and flushes the table
- deletes/updates some data, deletes a range of data and flushes
the table
- triggers a major compaction and calls compactionhistory
to retrieve and validate the histogram
Incorporate rows merged statistics into the output of the compactionhistory
command. Depending on the requested format type, the output has
a different form.
For instance, compacting two sstables of a table consisting of 7
rows where two rows are part of both sstables, the output
would have the following format:
text: {1: 5, 2: 2}
json: [{"key":1,"value":5},{"key":2,"value":1}]
yaml: - key: 1
value: 5
- key: 2
value: 1
The maximum reader count makes it possible to predict the number of readers
that can be created with create_new_readers(). This helps to
correctly allocate the vector size in the rows_merged statistics
when a combined reader is created via make_combined_reader.
before this change, we built some tests as if they were Seastar tests.
but after 415c83fa, these tests failed to link, because
Seastar::seastar_testing does not expose `-DSEASTAR_TESTING_MAIN` in
its cflags. this behavior of Seastar::seastar_testing is expected,
because a test linking against this library is not necessarily driven
by the `main()` provided by `testing/seastar_test.hh`.
so, in this change, we correct the `KIND` parameter of these tests,
so that they use `KIND BOOST`, as these tests can be driven by the
`main()` provided by Boost.Test's driver. also, there are some tests
driven by Boost.Test's `main()` which in the meanwhile utilize
seastar_testing, so let's add `Seastar::seastar_testing` to their
`LIBRARIES`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21183
the log.hh under the root of the tree was created to keep backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to the `utils` directory, as it is based solely
on seastar, and can be used by all subsystems.
in this change, we move log.hh into utils/log.hh so that it is more
modularized. this also improves readability: when one sees
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` files in the tree.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This commit adds a test for checking whether a large view update workload
can cause Scylla to run out of memory.
In the test, we keep writing to a table with a materialized view
with a limited number of rows, causing overwrites which require reading
from the table to perform view updates.
Currently, due to the unlimited concurrency of view update reads, we may
use too much memory which can lead to bad_allocs, causing Scylla to fail.
To reach the failing state more consistently, we add a sleep after
reading the old value of the base row, to keep the reader concurrency
semaphore units longer. At the same time, we use high concurrency and
large row size to use up all Scylla's memory quickly.
The test fails if Scylla runs out of memory and aborts, and succeeds
otherwise.
This commit adds a test for checking whether a large view update workload
impacts the latency of other user reads.
In the test, we first create a table for reads and another table with
a materialized view. We then start writing to the table with the view
with a limited number of rows - when overwriting, we need to read the
previous value of the row to prepare a delete of the old row in the view.
This should not impact the latency of the read workload from the other
table that we start at the same time. The test fails if any of the reads
times out.
To reach the failing state more consistently, we add a sleep after
reading the old value of the base row, to keep the reader concurrency
semaphore units longer. At the same time, we use a lower threshold for
queueing reads on the semaphore, to see the impact of view update reads
earlier.
Because of the high load, the writes may time out, but that's expected
- we fail the test only if the user reads time out.
now that we are allowed to use C++23, we have the luxury of using
`std::views::keys`.
in this change, we:
- replace `boost::adaptors::map_keys` with `std::views::keys`
- update affected code to work with `std::views::keys`
this reduces the dependency on boost for better maintainability, and
leverages standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21198
When writing to some tables with materialized views, we need to read from the base
table first to perform a delete of the old view row. When doing so, the memory used
for the read is tracked by the user read concurrency semaphore. When we have a large
number of such reads, we may use up all of the semaphore units, causing the following
reads to be queued. When we have some user reads coming at the same time, these reads
can have very high latency due to the write workload on the base table. We want to avoid
this, so that the write workload doesn't have a high impact on the latency of the
read workload.
This is fixed in this patch by adding a separate read concurrency semaphore just for
view update read-before-writes. With the new semaphore, even if there are many view
update read-before-writes, they will be queued on a different semaphore than the user
reads, and they won't impact their latency.
The second issue fixed by this patch is the concurrency of the view updates that is
currently unlimited. Because of that, view updates may take up so much memory that
we may run out of memory.
This is fixed by using the read admission on the view update concurrency semaphore.
This limits the number of concurrent view update reads to
max_count_concurrent_view_update_reads, all other incoming view update reads are
queued using just a small chunk of memory. Without this, the reads would also get
queued after exceeding view_update_reader_concurrency_semaphore_serialize_limit_multiplier,
but they would take much more memory while staying in the queue.
The new semaphore has half the capacity of the regular user read concurrency semaphore
and is currently used only for user writes - it's used independently of the scheduling
group on which we base the read semaphore selection, but we use a different code path
for streaming (not database::do_apply) and we shouldn't have view updates in system
writes or during compaction.
Fixes https://github.com/scylladb/scylladb/issues/8873
Fixes https://github.com/scylladb/scylladb/issues/15805
std::ranges::to<>() has a little protocol with containers to
allow them to optimize their construction from ranges. Implement it
for small_vector. It optimizes ranges whose size can be determined
quickly, or that can be traversed twice to determine the size, by reserving
up front. Single-pass ranges (std::ranges::input_range) use the less
efficient push_back method.
A unit test (which fails without the new constructor) is added.
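A sketch of the protocol on a simplified vector-like container (illustrative only; `tiny_vector` is not the actual utils::small_vector code):
```
// Illustrative sketch of the optimization on a simplified vector-like
// container; tiny_vector is not the actual utils::small_vector code.
#include <cstddef>
#include <ranges>
#include <utility>
#include <vector>

template <typename T>
class tiny_vector {
    std::vector<T> _storage;   // stand-in for the real inline/heap storage
public:
    tiny_vector() = default;

    // std::ranges::to<tiny_vector<T>>(r) prefers this constructor when it
    // exists, letting the container size itself up front.
    template <std::ranges::input_range R>
    tiny_vector(std::from_range_t, R&& r) {
        if constexpr (std::ranges::sized_range<R> || std::ranges::forward_range<R>) {
            // size is known cheaply, or the range can be traversed twice
            _storage.reserve(static_cast<std::size_t>(std::ranges::distance(r)));
        }
        for (auto&& e : r) {
            // single-pass (input) ranges fall back to plain push_back
            _storage.push_back(std::forward<decltype(e)>(e));
        }
    }

    void push_back(const T& v) { _storage.push_back(v); }
    auto begin() const { return _storage.begin(); }
    auto end() const { return _storage.end(); }
    std::size_t size() const { return _storage.size(); }
};

// usage: auto v = std::views::iota(0, 100) | std::ranges::to<tiny_vector<int>>();
```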
Closes scylladb/scylladb#21094
The adaptors.hpp header includes way too much, including <boost/regex.hpp>, which is huge.
Drop includes of adaptors.hpp and replace them with what is needed.
Closes scylladb/scylladb#21187
now that we are able to use the ranges library provided by the C++
standard library, there is no need to use the homebrew `ranges::to()`.
in this change, we switch from `ranges::to()` to
`std::ranges::to()`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in 787ea4b1, we construct a new `storage_options` for each sstable
to be restored. the `location` of the new `storage_options` instances
is composed of the configured `prefix` and the dirname of each toc
component. but instead of separating them with "/", we just concatenate
them. this breaks the test if the specified keys representing toc
components include a dirname in them.
in this change
- data_directory: instead of using "{prefix}{dirname}", we use
"{prefix}/{dirname}".
- test/object_store: update the existing test to add a suffix
in the keys of the toc objects to mimic the typical use case.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21170
Passing an admitted permit -- i.e. one with count resources on it -- to the multishard reader, will possibly result in a deadlock, because the permit of the multishard reader is destroyed after the permits of its child readers. Therefore its semaphore resources won't be automatically released until children acquire their own resources. This creates a dependency (an edge in the "resource allocation graph"), where the semaphore used by the multishard reader depends on the semaphores used by children. When such dependencies create a cycle, and permits are acquired by different reads in just the right order, a deadlock will happen.
Users of the multishard reader have to be aware of this gotcha -- and of course they aren't. This is small wonder, considering that not even the documentation on the multishard reader mentions this problem. To work around this, the user has to call `reader_permit::release_base_resources()` on the permit, before passing it to the multishard reader. On multiple occasions, developers (including the very author of the multishard reader), forgot or didn't know about this and this resulted in deadlocks down the line. This is a design-flaw of the multishard reader, which is addressed in this PR, after which, it is safe to pass admitted or not admitted permits to the multishard reader, it will handle the call to `release_base_resources()` if needed.
After fixing the problem in the multishard reader, the existing calls to `release_base_resources()` on permits passed to multishard readers are removed. A test is added which reproduces the problem and ensures we don't regress.
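A schematic sketch of the resulting behaviour (the types below are stand-ins; only `release_base_resources()` mirrors a call mentioned above):
```
// Schematic sketch of the change; the types are stand-ins, and only
// release_base_resources() mirrors a call mentioned in the description.
#include <utility>

struct reader_permit {
    bool holds_base_resources = false;
    void release_base_resources() { holds_base_resources = false; }
};

class multishard_combining_reader {
    reader_permit _permit;
public:
    explicit multishard_combining_reader(reader_permit permit)
        : _permit(std::move(permit)) {
        // Previously every caller had to remember to do this before handing
        // the permit over, or risk a cross-semaphore deadlock once child
        // readers on other shards start acquiring their own resources.
        // Now the reader releases the admission resources itself, so both
        // admitted and not-yet-admitted permits are safe to pass in.
        _permit.release_base_resources();
    }
};
```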
Refs: https://github.com/scylladb/scylladb/issues/20885 (partial fix, there is another deadlock in that issue, which this PR doesn't fix)
This fixes (indirectly) a regression introduced by d98708013c, so it has to be backported to 6.2
Closes scylladb/scylladb#21058
* github.com:scylladb/scylladb:
test/boost/mutation_test: add test for multishard permit safety
test/lib/reader_lifecycle_policy: add semaphore factory to constructor
test/lib/reader_lifecycle_policy: rename factory_function
repair/row_level: drop now unneeded release_base_resource() calls
readers/multishard: make multishard reader safe to create with admitted permits
Fixes #18903
Adds a "transitional" internode encryption mode, under which all _outgoing_ RPC connections will use TLS, but we will still accept any incoming non-tls connection.
This allows an operator to perform a move to TLS RPC without cluster downtime:
1. For each server, add certificate etc options to server_encryption_options + internode_encryption=none + set ssl_storage_port + restart (rolling)
2. For each server, set internode_encryption=transitional + RR
3. For each server, set internode_encryption=all + RR
Closes scylladb/scylladb#18939
* github.com:scylladb/scylladb:
test::topology: Add test for TLS upgrade and downgrade of internode encryption
docs: Add internode_encryption=transitional documentation
messaging_service: Add "transitional" internode encryption mode
messaging_service: Create TLS connector even if internode_enc=none when certs set
before this change, scylla's CMake-based build system consumes the Seastar
library by including it directly. but this fails to address the need
to link against Seastar shared libraries in Debug and Dev builds, while
linking against the static libraries in other builds, because Seastar
uses the `BUILD_SHARED_LIBS` CMake variable to determine if it builds
shared libraries, and we cannot assign different values to this
CMake variable based on the current configure type -- CMake does not
support that. see https://gitlab.kitware.com/cmake/cmake/-/issues/19467
in order to address this problem, we have a couple of possible
solutions:
- to enable Seastar to build both shared and static libraries in a
single pass. to do so without sacrificing performance, we have to build
all object files twice: once with -fPIC, once without. in order
to accomplish this goal, we need to develop machinery to
propagate the same settings to these two builds. this would
further complicate the design of Seastar's build system.
- to build the Seastar libraries twice in scylla. we could use
the ExternalProject module to implement this, but it'd be
complicated to extract the compile options and link options
previously populated by Seastar's targets with CMake --
we would have to replicate all of them in scylla. this is
out of the question.
- to build the Seastar libraries twice before building scylla,
and let scylla consume them using CMake config files or
.pc files. this is a compromise. it enables scylla to
drive the build of the Seastar libraries and to consume
the compile options and link options. the downside is:
* the generated compilation database (compile_commands.json)
does not include the commands building Seastar anymore.
* the build system of scylla does not have finer-grained
control over the build process of seastar. for instance,
we cannot specify a build dependency on a certain seastar
library and just build it, instead of building the whole
seastar project.
turns out the last approach is the best one we can have
at this moment. this is also the approach used by the existing
`configure.py`.
in this change, we
- add FindSeastar.cmake to
* detect the preconfigured Seastar builds, and
* extract the build options from .pc files
* expose library targets to be consumed by parent project
- add Seastar as an external project, so we can build it from
the parent project.
this is atypical compared to standard ExternalProject usage:
- Seastar's build system should already be configured at this point.
- We maintain separate project variants for each configuration type.
Benefits of this approach:
- Allows the parent project to consume the compile options exposed by
.pc file. as the compile options vary from one config to another.
- Allows application of config-specific settings
- Enables building Seastar within the parent project's build system
- Facilitates linking of artifacts with the external project target,
establishing proper dependencies between them
we will update `configure.py` to merge the compilation database
of scylla and seastar.
Refs scylladb/scylladb#2717
---
this is a CMake-related change, hence no need to backport.
Closes scylladb/scylladb#21131
* github.com:scylladb/scylladb:
build: cmake: use GENERATOR_IS_MULTI_CONFIG property to detect multi-config
build: cmake: consume Seastar using its .pc files
build: do not use `mode` as the index into `modes`
build: cmake: detect and link against GnuTLS library
build: cmake: detect and link against yaml-cpp
build: cmake: link Seastar with Seastar::<COMPONENT>
build: cmake: define CMake generate helper funcs in scylla
To reduce dependency load, change uses of boost ranges to std::ranges.
The first patch is preparation, replacing a construct that isn't easy to support with std ranges with something simpler.
No backport as this is a code cleanup.
Closes scylladb/scylladb#21122
* github.com:scylladb/scylladb:
schema: replace boost ranges with std ranges
schema: precompute all_columns_in_select_order()
The statistics_rewrite test case copies an sstable from resources two
times:
- first time -- explicitly, by listing resource components and copying
files to the test temp dir
- second time -- implicitly, by calling create_links(), which links the copied
files as a new set in the staging/ subdirectory
The 2nd step is not needed and the history of changes justifies that.
The test itself appeared with 70b793e4d3 and it only contained the 2nd
"copying" -- the test linked files from the resource directory and then worked
on the newly created set.
Later, commit 59c57861ae added the first step and copied the files
from resources into the test temp dir. At this point linking the copied files
became pointless, but it was preserved. Let's remove it now.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#21097
before this change, we link against the targets defined in Seastar's
source tree. but these targets are not part of Seastar's public
interface -- they are not exposed by Seastar's CMake config files.
so, let's link against the target names qualified by the library module
name. this also prepares for the transition to using Seastar without
including it directly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Allow callers to specify how the semaphore is created and stopped,
instead of doing so via boolean flags as is done currently. That
method doesn't scale, so use a factory instead.
To reader_factory_function. We are about to add new factory function
parameters, so the current factory_function has to be renamed to
something more specific.