scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 11:55:15 +00:00

Author	SHA1	Message	Date
Nadav Har'El	b4e3d4ac2f	alternator: nicer error message for integer overflow in list index In the DynamoDB API, when "a" is a list attribute, a[999] returns the 1000th element. But if the list isn't that long (e.g., it only has 5 elements), a[999] returns nothing - it's not an error. But it turns out that when the index is so large that it can't even be parsed as an integer, e.g., 99999999999999, DynamoDB does report an error: Invalid ProjectionExpression: List index is not within the allowable range; index: [99999999999999] Before this patch, Alternator also returned an error in this case, with the right type (ValidationException), but with a strange low-level error text: Failed parsing ProjectionExpression 'a[99999999999999]': std::out_of_range (stoi) The problem was that the code (in alternator/expressions.g) ran stoi() without converting its std::out_of_range exception to a better user-facing message. We do this in this patch, and the error message now looks like: Failed parsing ProjectionExpression 'a[99999999999999]': list index out of integer range This patch also includes a test reproducing this error, which passes on DynamoDB and on Alternator it fails before this patch and passes with the patch. Fixes #25947 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25951	2025-09-15 08:43:00 +03:00
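The commit above translates a low-level overflow error into a user-facing message. The following is only a rough Python analogue of that idea: Alternator's real code is C++ in alternator/expressions.g, and the names `parse_list_index`, `get_element`, `ValidationError` and the 32-bit limit are assumptions made for illustration.

```python
INT_MAX = 2**31 - 1  # assumption: mimics the range of a 32-bit C++ int


class ValidationError(Exception):
    """Hypothetical stand-in for a user-facing ValidationException."""


def parse_list_index(text: str) -> int:
    """Parse the digits of a list index, such as the 999 in a[999]."""
    value = int(text)  # Python ints never overflow, so check the range by hand
    if value > INT_MAX:
        # Report a readable message instead of a low-level parser error.
        raise ValidationError("list index out of integer range")
    return value


def get_element(items, index_text):
    """Return the indexed element, or None when the list is too short."""
    index = parse_list_index(index_text)
    return items[index] if index < len(items) else None
```

With a five-element list, `get_element(items, "999")` returns `None` (not an error), while `get_element(items, "99999999999999")` raises the validation error.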
Nadav Har'El	208d3986a7	alternator: add explanation of internal tags Alternator needs to store a few pieces of information for each table that it can't store in the existing CQL schema. We decided to store this information in hidden tags - tags named with the prefix "system:" - and we already have four of those: Provisioned RCU and WCU, table creation time, and TTL's expiration-time attribute. This patch moves the definition of all four tags to one place in executor.cc, adds a short comment about the content of each tag, and adds a longer comment explaining why we have these hidden tags at all. It is expected that more hidden tags will follow - e.g., to solve issue #5320. So we expect more tags to be added later in the same place in the code. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25980	2025-09-15 08:41:39 +03:00
Emil Maskovsky	99db980899	gossiper: eliminate duplicate code in do_shadow_round Remove a redundant code block inadvertently introduced in commit `4b3d160f34`. While the duplicate did not affect functionality, its presence could cause confusion and maintenance issues. This change does not alter behavior and is purely a cleanup. Fixes: scylladb/scylladb#25999 Backport: The issue exists in all 2025 branches, so it should be backported accordingly. Closes scylladb/scylladb#26001	2025-09-15 08:35:04 +03:00
Jenkins Promoter	c63b335819	Update pgo profiles - aarch64	2025-09-15 05:17:07 +03:00
Jenkins Promoter	e97a0c8b42	Update pgo profiles - x86_64	2025-09-14 21:23:37 -04:00
Aleksandra Martyniuk	75b772adfb	db: optimize cache invalidation following repair/streaming Currently, if a new sstable is created during repair/streaming, we invalidate its whole token range in cache. If the sstable is sparse, we unnecessarily clear too much data. Modify cache invalidation, so that only the partitions present in the sstable are cleared. To check whether a partition is present in the sstable, we use bloom filters. Bloom filters may return false positives and show that an sstable contains a partition, even though it does not. Due to that we may invalidate a bit more than we need to, but the cache will be in valid state. An issue arises when we do not invalidate two consecutive partitions that are continuous. The sstable may contain a token that falls between these partitions, breaking the continuity. To check that, we would need to scan sstable index. However, such a change would noticeably complicate the invalidation, both performance and code. In this change, sstable index reader isn't used. Instead, the continuity flag is unset for all scanned partitions. This comes at a cost of heavier reads, as we will need to verify continuity when reading more than one partition from cache. Fixes: https://github.com/scylladb/scylladb/issues/9136. Closes scylladb/scylladb#25996	2025-09-14 19:48:14 +03:00
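The invalidation strategy described in the commit above can be sketched roughly as follows. This is a toy Python model, not Scylla's C++ row cache: `CacheEntry`, `invalidate` and the `may_contain` predicate (standing in for the sstable's bloom filter) are hypothetical names, and restricting the scan to the sstable's token range is omitted.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    data: str
    continuous: bool = True  # True: nothing is missing between this and the previous entry


def invalidate(cache: dict, may_contain) -> None:
    """Evict partitions the new sstable may contain; distrust continuity of the rest."""
    for key in list(cache):
        if may_contain(key):
            # Possibly a false positive: we may evict more than needed,
            # but the cache stays in a valid state.
            del cache[key]
        else:
            # The sstable may hold a token between two kept partitions,
            # so continuity must be re-verified on the next multi-partition read.
            cache[key].continuous = False
```

The trade-off matches the commit message: no sstable index scan is needed, at the cost of heavier reads that must re-establish continuity.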
Lakshmi Narayanan Sreethar	1d1e572962	sstables: skip bloom filter rebuilds with minimal savings If a bloom filter was built with a bad partition estimate, it is rebuilt right before the sstable is sealed. The rebuild is already skipped if the current bitset size results in a false-positive rate within 75%–125% of the configured value. This patch adds additional conditions to prevent rebuilds when the savings are minimal. It also skips rebuilding for garbage collected sstables, since they will be dropped soon anyway. Also updated and added more test cases to cover these new criteria for bloom filter rebuilds. Fixes #25464 Fixes #25468 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#25968	2025-09-14 18:19:50 +03:00
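The skip conditions named in the commit above might look roughly like this in Python. Only the 75%–125% band and the garbage-collected case come from the commit message; the function name `skip_rebuild`, the inclusive bounds, and passing the rates in directly are assumptions, and the additional "minimal savings" thresholds are not modelled because the message does not state them.

```python
def skip_rebuild(actual_fp_rate: float, configured_fp_rate: float,
                 garbage_collected: bool = False) -> bool:
    """Decide whether rebuilding the bloom filter before sealing can be skipped."""
    if garbage_collected:
        # Such sstables will be dropped soon, so a rebuild is wasted work.
        return True
    # Skip when the current filter is already close to the configured rate.
    low = 0.75 * configured_fp_rate
    high = 1.25 * configured_fp_rate
    return low <= actual_fp_rate <= high
```

For a configured rate of 0.01, an actual rate of 0.011 falls inside the band and the rebuild is skipped, while 0.02 falls outside it.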
Nadav Har'El	5307d1b9a8	Merge 'vector_index: add version to index options' from Dawid Pawlik Since creating the vector index does not lead to creation of a view table [#24438] (whose version info had been logged in `system_schema.scylla_tables`) we lacked the information about the version of the index. The solution we arrived at is to add the version as a field in options column of `system_schema.indexes`. It requires few changes and seems unintrusive for existing infrastructure. This patch implements the solution described above. Refs: VECTOR-142 Closes scylladb/scylladb#25614 * github.com:scylladb/scylladb: cqlpy/test_vector_index: add vector index version test vector_index, index_prop_defs: add version to index options create_index_statement: rename `validator` to `custom_index_factory` custom index: rename `custom_index_option_name` vector_index: rename `supported_options` to `vector_index_options`	2025-09-14 15:35:53 +03:00
Radosław Cybulski	30306c3375	Remove const & from tags_extension constructor `tags_extension` constructor unnecessarily takes `std::map` by const ref, forcing a copy. This patch removes const ref for performance reasons. Closes scylladb/scylladb#25977	2025-09-14 13:32:21 +03:00
Ernest Zaslavsky	44d34663bc	treewide: seastar module update and fix broken rest client start using `write_body` in `rest/client` to properly set headers due to changes applied to seastar's http client Seastar module update ``` b6be384e Merge 'http: generalize Content-Type setting' from Nadav Har'El 74472298 http: generalize request's Content-Type setting 9fd5a1cc http: generalize reply's Content-Type setting a2665f38 memory: Remove deprecated enable_abort_on_allocation_failure() d2a5a8a9 resource.cc: Remove some dead code 7ad9f424 http: Add support of multiple key repetitions for the request a636baca task: Move task::get_backtrace() definition in its class a0101efa Fixed "doxygen" spelling in error message db969482 Merge 'http/reply: introduce set_cookie()' from Botond Dénes 5357b434 http/reply: introduce set_cookie() 1ddcf05f http/reply: make write_reply*() public 4b782d73 http/connection: start_response(): fix indentation 720feca0 http/reply: encapsulate reply writing in write_reply() 3e19917d Merge 'exceptions: log thrown and propagated exception with distinct log levels' from Botond Dénes db9aea93 Merge 'Correctly wrap up abandoned yielding directory lister' from Pavel Emelyanov dbb2bf3f test: Add test for input_stream::read_exactly() a5308ec9 file/directory_lister: Correctly wrap up fallback generator 4f0811f4 file/directory_lister: Convert on-stack queue to shared pointer 59801da7 tests: Add directory lister early drop cases 33233032 http/reply: s/write_reply_to_connection/write_reply/ 69b93620 http/reply: write_reply_{to_connection,headers}(): pass output stream 56e9bda7 test: Convert directory_test into seastar test 96782358 Merge 'Improve io_tester's seqwrite and append workloads' from Pavel Emelyanov 8b46e3d4 SEASTAR_ASSERT: assert to stderr and flush stream 3370e22a tutorial.md: use current_exception_as_future() e977453a Add fixture support for seastar::testing 3e70d7f7 io_tester: Do not set append_is_unlikely unconditionally 2a4ae7b4 io_tester: Count 
file size overflows 5e678bb5 io_tester: Tuneup size overflow check d5dad8ce io_tester: Move position management code to io_class_data 5586a056 io_tester: Rename seqwrite -> overwrite 92df2fb2 io_tester: Relax return value of create_and_fill_file() 03d9500d io_tester: Dont fill file for APPEND d6844a7b io_tester: Indentation fix after previous patch fb9e0088 io_tester: Coroutinize create_and_fill_file() 2f802f57 exceptions: log thrown and propagated exception with distinct log levels 4971fa70 util: move log-level into own header 39448fc1 Merge 'Fix and tune http::request setup by client' from Pavel Emelyanov 52d0c4fb iostream: Move output_stream::write(scattered_message) lower 7a52f734 Merge 'read_first_line: Missing pragma and licence' from Ernest Zaslavsky d0881b7e read_first_line: Add missing license boilerplate 988a0e99 read_first_line:: Add missing `#pragma once` 42675266 http: Make client::make_request accept const request& c7709fb5 http: Make request making API return exceptional future not throw b68ed89b http: Move request content length header setup 1d96dac6 http: Move request version configuration 072e86f6 http: Setup request once ``` Closes scylladb/scylladb#25915	2025-09-13 17:14:28 +03:00
Asias He	9bca90be0d	repair: Fix repair_row_level_stop verb idl The version keyword is missed for the optional mark_as_repaired parameter. This causes the new node to expect more data to come: INFO 2025-09-01 19:23:05,332 [shard 0:strm] rpc - client 127.0.7.6:50116 msg_id 8: caught exception while processing a message: std::out_of_range (deserialization buffer underflow) When the sender is an old node in a mixed cluster, the data will never come. To fix, add the missing version keyword. Our idl-compiler.py should have caught the typo since the keyword was missing in the [[]] tag. Fixes #25666 Closes scylladb/scylladb#25782	2025-09-12 15:58:19 +03:00
Avi Kivity	ef7babda3d	Merge 'test: deflake test_restart_leaving_replica_during_cleanup' from Patryk Jędrzejczak The test started hitting #21779 recently. We deflake it in this commit by disabling the tablet load balancing before dropping the keyspace at the end of the test. We still have to understand why the test started hitting #21779, so we keep #25938 open. Refs #25938 The test was flaky only on master, so no backport needed. Closes scylladb/scylladb#25975 * github.com:scylladb/scylladb: test: enable load balancing on a single node in test_restart_leaving_replica_during_cleanup test: deflake test_restart_leaving_replica_during_cleanup	2025-09-12 15:58:19 +03:00
Sayanta Banerjee	6092520631	Small grammatical changes Closes scylladb/scylladb#24667	2025-09-12 15:58:19 +03:00
Radosław Cybulski	436150eb52	treewide: fix spelling errors Fix spelling errors reported by copilot on github. Remove single use namespace alias. Closes scylladb/scylladb#25960	2025-09-12 15:58:19 +03:00
Patryk Jędrzejczak	aaab71c14e	test: enable load balancing on a single node in test_restart_leaving_replica_during_cleanup Doing it on more than one node is redundant.	2025-09-11 13:19:56 +02:00
Patryk Jędrzejczak	4c9efc08d8	test: deflake test_restart_leaving_replica_during_cleanup The test started hitting #21779 recently. We deflake it in this commit by disabling the tablet load balancing before dropping the keyspace at the end of the test. We still have to understand why the test started hitting #21779, so we keep #25938 open. Refs #25938	2025-09-11 13:19:51 +02:00
Patryk Jędrzejczak	eae12c1717	test: cluster: add a test for restarts with no group 0 quorum We don't have such a test, and we could add a group 0 quorum requirement on the restart path by mistake. A new test, no backport. Closes scylladb/scylladb#25623	2025-09-11 08:56:34 +03:00
Raphael S. Carvalho	b607b1c284	compaction: Fix stop of sstable cleanup The interface suggests the whole sstable cleanup is aborted with 'nodetool stop CLEANUP', but it is currently stopping only the ongoing cleanup task, and the compaction manager will retry the task since the error is not propagated all the way back to the caller. With raft topology, the coordinator should retry it though since cleanup became mandatory with automatic cleanup. So it's only fixing the usage where cleanup is issued manually. The stop exception is only propagated to the caller of cleanup. When stopping tasks during shutdown, the exception is swallowed and the error only returned to the caller. Fixes #20823. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#24996	2025-09-11 08:55:10 +03:00
Yaron Kaikov	902d139c80	tools: toolchain: dbuild: add setuptools_scm as dependency this package was added as a dependency to `cqlsh` in `216d8b0658` Fixes: https://github.com/scylladb/scylladb/issues/25613 [Yaron: regenerate frozen toolchain with optimized clang from https://devpkg.scylladb.com/clang/clang-20.1.8-Fedora-42-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-20.1.8-Fedora-42-x86_64.tar.gz ] Closes scylladb/scylladb#25932	2025-09-11 08:51:28 +03:00
Cezar Moise	20ba8d4e8c	test: skip flaky test test_one_big_mutation_corrupted_on_startup The test is flaky since it tries to corrupt the commitlog in a non-deterministic way that sometimes allows the tested mutation to escape and be replayed anyhow. refs: #25627 Closes scylladb/scylladb#25950	2025-09-11 08:39:24 +03:00
Avi Kivity	c91b326d5a	Merge 'transport: replace throwing protocol_exception with returns' from Dario Mirovic Replace throwing `protocol_exception` with returning it as a result or an exceptional future in the transport server module. The goal is to improve performance. Most of the `protocol_exception` throws were made from `fragmented_temporary_buffer` module, by passing `exception_thrower()` to its `read` methods. `fragmented_temporary_buffer` is changed so that it now accepts an exception creator, not exception thrower. `fragmented_temporary_buffer_concepts::ExceptionCreator` concept replaced `fragmented_temporary_buffer_concepts::ExceptionThrower` and all methods that have been throwing now return failed result of type `utils::result_with_eptr`. This change is then propagated to the callers. The scope of this patch is `protocol_exception`, so commitlog just calls `.value()` method on the result. If the result failed, that will throw the exception from the result, as defined by `utils::result_with_eptr_throw_policy`. This means that the behavior of commitlog module stays the same. transport server module handles results gracefully. All the caller functions that return non-future value `T` now return `utils::result_with_eptr<T>`. When the caller is a function that returns a future, and it receives failed result, `make_exception_future(std::move(failed_result).value())` is returned. The rest of the callstack up to the transport server `handle_error` function is already working without throwing, and that's how zero throws is achieved. cql3 module changes do the same as transport server module. Benchmark that is not yet merged has commit `67fbe35833e2d23a8e9c2dcb5e04580231d8ec96`, [GitHub diff view](https://github.com/scylladb/scylladb/compare/master...nuivall:scylladb:perf_cql_raw). It uses either read or write query. 
Command line used: ``` ./build/release/scylla perf-cql-raw --workdir ~/tmp/scylladir --smp 1 --developer-mode 1 --workload write --duration 300 --concurrency 1000 --username cassandra --password cassandra 2>/dev/null ``` The only thing changed across runs is `--workload write`/`--workload read`. Built and run on `release` target. <details> ``` throughput: mean= 36946.04 standard-deviation=1831.28 median= 37515.49 median-absolute-deviation=1544.52 maximum=39748.41 minimum=28443.36 instructions_per_op: mean= 108105.70 standard-deviation=965.19 median= 108052.56 median-absolute-deviation=53.47 maximum=124735.92 minimum=107899.00 cpu_cycles_per_op: mean= 70065.73 standard-deviation=2328.50 median= 69755.89 median-absolute-deviation=1250.85 maximum=92631.48 minimum=66479.36 ⏱ real=5:11.08 user=2:00.20 sys=2:25.55 cpu=85% ``` ``` throughput: mean= 40718.30 standard-deviation=2237.16 median= 41194.39 median-absolute-deviation=1723.72 maximum=43974.56 minimum=34738.16 instructions_per_op: mean= 117083.62 standard-deviation=40.74 median= 117087.54 median-absolute-deviation=31.95 maximum=117215.34 minimum=116874.30 cpu_cycles_per_op: mean= 58777.43 standard-deviation=1225.70 median= 58724.65 median-absolute-deviation=776.03 maximum=64740.54 minimum=55922.58 ⏱ real=5:12.37 user=27.461 sys=3:54.53 cpu=83% ``` ``` throughput: mean= 37107.91 standard-deviation=1698.58 median= 37185.53 median-absolute-deviation=1300.99 maximum=40459.85 minimum=29224.83 instructions_per_op: mean= 108345.12 standard-deviation=931.33 median= 108289.82 median-absolute-deviation=55.97 maximum=124394.65 minimum=108188.37 cpu_cycles_per_op: mean= 70333.79 standard-deviation=2247.71 median= 69985.47 median-absolute-deviation=1212.65 maximum=92219.10 minimum=65881.72 ⏱ real=5:10.98 user=2:40.01 sys=1:45.84 cpu=85% ``` ``` throughput: mean= 38353.12 standard-deviation=1806.46 median= 38971.17 median-absolute-deviation=1365.79 maximum=41143.64 minimum=32967.57 instructions_per_op: mean= 117270.60 
standard-deviation=35.50 median= 117268.07 median-absolute-deviation=16.81 maximum=117475.89 minimum=117073.74 cpu_cycles_per_op: mean= 57256.00 standard-deviation=1039.17 median= 57341.93 median-absolute-deviation=634.50 maximum=61993.62 minimum=54670.77 ⏱ real=5:12.82 user=4:10.79 sys=11.530 cpu=83% ``` This shows ~240 instructions per op increase for reads and ~180 instructions per op increase for writes. Tests have been run multiple times, with almost identical results. Each run lasted 300 seconds. Number of operations executed is roughly 38k per second × 300 seconds = 11.4m ops. Update: I have repeated the benchmark with clean state - reboot computer, put in performance mode, rebuild, closed other apps that might affect CPU and disk usage. run count: 5 times before and 5 times after the patch duration: 300 seconds Average write throughput median before patch: 41155.99 Average write throughput median after patch: 42193.22 Median absolute deviation is also lower now, with values in range 350-550, while the previous runs' values were in range 750-1350. </details> Built and run on `release` target. 
<details> ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null ``` throughput: mean= 14910.90 standard-deviation=477.72 median= 14956.73 median-absolute-deviation=294.16 maximum=16061.18 minimum=13198.68 instructions_per_op: mean= 659591.63 standard-deviation=495.85 median= 659595.46 median-absolute-deviation=324.91 maximum=661184.94 minimum=658001.49 cpu_cycles_per_op: mean= 213301.49 standard-deviation=2724.27 median= 212768.64 median-absolute-deviation=1403.85 maximum=225837.15 minimum=208110.12 ⏱ real=5:19.26 user=5:00.22 sys=15.827 cpu=98% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null ``` throughput: mean= 93345.45 standard-deviation=4499.00 median= 93915.52 median-absolute-deviation=2764.41 maximum=104343.64 minimum=79816.66 instructions_per_op: mean= 65556.11 standard-deviation=97.42 median= 65545.11 median-absolute-deviation=71.51 maximum=65806.75 minimum=65346.25 cpu_cycles_per_op: mean= 34160.75 standard-deviation=803.02 median= 33927.16 median-absolute-deviation=453.08 maximum=39285.19 minimum=32547.13 ⏱ real=5:03.23 user=4:29.46 sys=29.255 cpu=98% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null ``` throughput: mean= 206982.18 standard-deviation=15894.64 median= 208893.79 median-absolute-deviation=9923.41 maximum=232630.14 minimum=127393.34 instructions_per_op: mean= 35983.27 standard-deviation=6.12 median= 35982.75 median-absolute-deviation=3.75 maximum=36008.24 minimum=35952.14 cpu_cycles_per_op: mean= 17374.87 standard-deviation=985.06 median= 17140.81 median-absolute-deviation=368.86 maximum=26125.38 minimum=16421.99 ⏱ real=5:01.23 user=4:57.88 sys=0.124 cpu=98% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null ``` throughput: mean= 16198.26 
standard-deviation=902.41 median= 16094.02 median-absolute-deviation=588.58 maximum=17890.10 minimum=13458.74 instructions_per_op: mean= 659752.73 standard-deviation=488.08 median= 659789.16 median-absolute-deviation=334.35 maximum=660881.69 minimum=658460.82 cpu_cycles_per_op: mean= 216070.70 standard-deviation=3491.26 median= 215320.37 median-absolute-deviation=1678.06 maximum=232396.48 minimum=209839.86 ⏱ real=5:17.33 user=4:55.87 sys=18.425 cpu=99% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null ``` throughput: mean= 97067.79 standard-deviation=2637.79 median= 97058.93 median-absolute-deviation=1477.30 maximum=106338.97 minimum=87457.60 instructions_per_op: mean= 65695.66 standard-deviation=58.43 median= 65695.93 median-absolute-deviation=37.67 maximum=65947.76 minimum=65547.05 cpu_cycles_per_op: mean= 34300.20 standard-deviation=704.66 median= 34143.92 median-absolute-deviation=321.72 maximum=38203.68 minimum=33427.46 ⏱ real=5:03.22 user=4:31.56 sys=29.164 cpu=99% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null ``` throughput: mean= 223495.91 standard-deviation=6134.95 median= 224825.90 median-absolute-deviation=3302.09 maximum=234859.90 minimum=193209.69 instructions_per_op: mean= 35981.41 standard-deviation=3.16 median= 35981.13 median-absolute-deviation=2.12 maximum=35991.46 minimum=35972.55 cpu_cycles_per_op: mean= 17482.26 standard-deviation=281.82 median= 17424.08 median-absolute-deviation=143.91 maximum=19120.68 minimum=16937.43 ⏱ real=5:01.23 user=4:58.54 sys=0.136 cpu=99% ``` </details> Fixes: #24567 This PR is a continuation of #24738 [transport: remove throwing protocol_exception on connection start](https://github.com/scylladb/scylladb/pull/24738). This PR does not solve a burning issue, but is rather an improvement in the same direction. As it is just an enhancement, it should not be backported. 
Closes scylladb/scylladb#25408 * github.com:scylladb/scylladb: test/cqlpy: add protocol exception tests test/cqlpy: `test_protocol_exceptions.py` refactor message frame building test/cqlpy: `test_protocol_exceptions.py` refactor duplicate code transport: replace `make_frame` throw with return result cql3: remove throwing `protocol_exception` transport: replace throw in validate_utf8 with result_with_exception_ptr return transport: replace throwing protocol_exception with returns utils: add result_with_exception_ptr test/cqlpy: add unknown compression algorithm test case	2025-09-10 21:54:15 +03:00
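The headline numbers in the merge message above can be rechecked from the figures it quotes. The short Python sketch below does that arithmetic. Pairing the four perf-cql-raw runs as (before, after) per workload is an assumption inferred from the "~240" and "~180" claims, and the helper name `percent_change` is made up for illustration.

```python
# Mean instructions_per_op from the four perf-cql-raw runs quoted above.
# Assumption: each tuple is (before the patch, after the patch) for one workload.
runs = [(108105.70, 108345.12), (117083.62, 117270.60)]
deltas = [round(after - before, 2) for before, after in runs]


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


# Average write throughput medians from the repeated clean-state benchmark.
write_median_change = percent_change(41155.99, 42193.22)

# Roughly 38k operations per second over a 300-second run.
total_ops = 38_000 * 300

print(deltas, round(write_median_change, 2), total_ops)
```

The deltas come out at roughly 239 and 187 instructions per op, the write throughput median improves by about 2.5%, and a run executes about 11.4 million operations.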
Yaron Kaikov	0a025d121f	packaging: Add `adduser` as dependency As `adduser` command is being used by `/var/lib/dpkg/info/scylla-server.postinst` and similar during rpm post-install. Fixes: https://github.com/scylladb/scylladb/issues/23722 Closes scylladb/scylladb#25928	2025-09-10 21:51:25 +03:00
Avi Kivity	fc64333040	Merge 'sstables/trie: add BTI index readers and writers' from Michał Chojnowski This is yet another part in the BTI index project. Overarching issue: https://github.com/scylladb/scylladb/issues/19191 Previous part: https://github.com/scylladb/scylladb/pull/25506/ Next part: plugging the BTI index readers and writers into sstable readers and writers. The new code added in this PR isn't used outside of tests yet, but it's posted as a separate PR for reviewability. This series implements, on top of the key translation logic, and abstract trie writing and traversal logic, a writer and a reader of sstable index files (which map primary keys to positions in Data.db), as described in `f16fb6765b/src/java/org/apache/cassandra/io/sstable/format/bti/BtiFormat.md`. Caveats: 1. I think the added test has reasonable coverage, but that depends on running it multiple times. (Though it shouldn't need more than a few runs to catch any bug it covers). It's somewhat awkward as a test meant for running in CI, it's better as something you run many times after a relevant change. 2. These readers and writers are intended to be compatible with Cassandra, but I did NOT do any compatibility testing. The writers and readers added here have only been tested against each other, not against Cassandra's readers and writers. 3. This didn't undergo any proper benchmarking and optimization work. I was doing some measurements in the past, but everything was rewritten so much since then that my old measurements are effectively invalidated. Frankly I have no idea what the performance of all this branchy-branchy logic is now. No backports needed, new functionality. 
Closes scylladb/scylladb#25626 * github.com:scylladb/scylladb: test/manual: add bti_cassandra_compatibility_test test/lib/random_schema: add some constraints for generated uuid and time/date values test/lib/random_utils: add a variant of get_bytes which takes an `engine&` test/boost: add bti_index_test sstables/writer: add an accessor for the current write position in Data.db sstables/trie: introduce bti_index_reader sstables/trie: add bti_partition_index_writer.cc sstables/trie: add bti_row_index_writer.cc utils/bit_cast: add a new overload of write_unaligned() sstables/trie: add trie_writer::add_partial() sstables/consumer: add read_56() sstables/trie: make bti_node_reader::page_ptr copy-constructible sstables: extract abstract_index_reader from index_reader.hh to its own header sstables/trie: add an accessor to the file_writer under bti_node_sink sstables/types: make `deletion_time::operator tombstone()` const sstables/types: add sstables::deletion_time::make_live() sstables/trie: fix a special case in max_offset_from_child sstables/trie: handle `partition_region`s other than `clustered` in BTI position encoding sstables/trie: rewrite lcb_mismatch to handle fragment invalidation test/boost/bti_key_translation_test: fix a compilation error hidden behind `if constexpr`	2025-09-10 21:48:52 +03:00
Pavel Emelyanov	88a01308e7	api: Move /storage_service/keyspaces handler to database module The handler uses database service, not storage_service, and should belong to the corresponding API module from column_family.cc Once moved, the handler can use captured sharded<database> reference and forget about http_context::db. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#25834	2025-09-10 17:01:11 +02:00
Nadav Har'El	ce4592d8fc	Merge 'test: cluster: deflake consistency checks after decommission' from Patryk Jędrzejczak In the Raft-based topology, a decommissioning node is removed from group 0 after the decommission request is considered finished (and the token ring is updated). Therefore, `check_token_ring_and_group0_consistency` called just after decommission might fail when the decommissioned node is still in group 0 (as a non-voter). We deflake all tests that call `check_token_ring_and_group0_consistency` after decommission in this PR. Fixes #25809 This PR improves CI stability and changes only tests, so it should be backported to all supported branches. Closes scylladb/scylladb#25927 * github.com:scylladb/scylladb: test: cluster: deflake consistency checks after decommission test: cluster: util: handle group 0 changes after token ring changes in wait_for_token_ring_and_group0_consistency	2025-09-10 17:57:02 +03:00
Dawid Pawlik	1ce76a6ca2	cqlpy/test_vector_index: add vector index version test Test if the index version is the same as the base table version before the index was created. Test if recreating the index with the same parameters changes the version. Test if altering the base table does not change the version. Test if the user cannot specify the index version option by themselves.	2025-09-10 15:19:36 +02:00
Dawid Pawlik	909a51e524	vector_index, index_prop_defs: add version to index options Since creating the vector index does not lead to creation of a view table [#24438] (whose version info had been logged in `system_schema.scylla_tables`) we lack the information about the version of the index. The mentioned version is used to recognize the quick-drop-create index with the same parameters that needs to be rebuilt. The case is mainly experienced while testing, benchmarking or experimenting with Vector Search. Nevertheless it is important to have it considered, as it is really weird having seen that DROP and CREATE commands did not change anything. Although being nice "optimization" to use the same old index, the rebuild feels more natural for the get-to-know-VS-users. Should not change anything in a real production environment. The solution we arrived at is to add the version as a field in options column of `system_schema.indexes`. The version of vector index is a base table's schema version on which the index was created. The table's schema version changes every time a table is changed, meaning that CREATE INDEX or DROP INDEX statements also change it. Every index has a different index version, so it allows to identify them easily. This patch implements the solution described above.	2025-09-10 15:16:54 +02:00
Michał Chojnowski	47c2d09c22	test/manual: add bti_cassandra_compatibility_test Adds a heavy test which tests compatibility of BTI index files between Cassandra and Scylla. It's composed from a C++ part, used to read and write BTI files with Scylla's readers and writers, and a Python part which uses a Cassandra node and the C++ executable to make them read and write each other's files. The stages of the test are: 1. Use the C++ part to generate a random BIG sstable, and matching BTI index files. 2. Import the BIG files into Cassandra, let it generate its own BTI index files. 3. Read both Scylla's BTI and Cassandra's BTI index files using the C++ part. Check that they return the right positions and tombstones for each partition and row. 4. Sneakily swap Cassandra's BTI files for Scylla's BTI files, and query Cassandra (via CQL) for each row. Check that each query returns the right result. Not much can be inferred about the index via CQL queries, so the check we are doing on Cassandra is relatively weak. But in conjunction with the checks done on the Scylla part, it's probably good enough. The test is weird enough, and with heavy-enough dependencies (it uses a podman container to run the Cassandra) that it has been put in test/manual. To run the test, build `build/$build_mode/test/manual/bti_cassandra_compatibility_test_g`, and run `python test/manual/bti_cassandra_compatibility_test.py`. Note: there's a lot of things that could go wrong in this test. (E.g. file permission issues or port mapping issues due to the container usage, incompatibilities between the Python driver and the random CQL values generated by generate_random_mutations, etc). I hope it works everywhere, but I only tested it on my machine, running it inside the dbuild container.	2025-09-10 13:04:42 +02:00
Botond Dénes	514f59d157	tools/scylla-sstable: write: move to UUID generation We are moving away from integer generations, so stop using them. Also drop the --generation command-line parameter, UUID generations don't have to be provided by the caller, because random UUIDs will not collide with each other. To help the caller still know what generation the output sstable has (previously they provided it via --generation), print the generation to stdout. Closes scylladb/scylladb#25166	2025-09-10 13:47:26 +03:00
Nadav Har'El	5e7251cd40	secondary index: fix xfailing test to pass on Cassandra We have an xfailing test test_secondary_index.py::test_limit_partition which reproduces a Scylla bug in LIMIT when scanning a secondary index (Refs #22158). The point of such a reproducer is to demonstrate the bug by passing on Cassandra but failing on Scylla - yet this specific test doesn't pass on Cassandra because it expects the wrong 3 out of 4 results to be returned: The test begins with LIMIT 1 and sees the first result is (2,1), so we expect when we increase the LIMIT to 3 to see more results from the same partition (2) - and yet the test mistakenly expected the next results to come from partition 1, which is not a reasonable expectation, and doesn't happen in Cassandra (I checked both Cassandra 5 and 4). After this patch, the test passes on Cassandra (I tried 4 and 5), and continues to fail on Scylla - which returns 4 rows despite the LIMIT 3. Note that it is debatable whether this test should insist at all on which 3 items are returned by "LIMIT 3" - In Cassandra the ordering of a SELECT with a secondary index is not well defined (see discussion in Refs #23392). So an alternative implementation of this test would be to just check that LIMIT 3 returns 3 items without insisting which: # In Cassandra the ordering of a SELECT with a secondary index is not # defined (see discussion in #23392), so we don't know which three # results to expect - just that it must be a 3-item subset. rows = list(rs) assert len(rows) == 3 assert set(rows).issubset({(1,1), (1,2), (2,1), (2,2)}) However, as of yet, I did not modify this test to do this. I still believe there is value in secondary index scans having the same order as a scan without a secondary index has - and not an undefined order, and if both Scylla and Cassandra implement that in practice, it's useful for tests to validate this so we'll know if this guarantee is ever broken. 
Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25676	2025-09-10 08:48:52 +03:00
Wojciech Mitros	1f9be235b8	mv: delete previously undetected ghost rows in PRUNE MATERIALIZED VIEW statement The PRUNE MATERIALIZED VIEW statement is supposed to remove ghost rows from the view. Ghost rows are rows in the view with no corresponding row in the base table. Before this patch, only rows whose primary key columns of the base table had different values than any of the base rows were treated as ghost rows by the PRUNE statement. However, view rows which have a column in their primary key that's not in the base primary key can also be ghost rows if this column has a different value than the base row with the same values of the remaining primary key columns. That's because these rows won't be deleted unless we change the value of this column in the base table to this specific value. In this patch we add a check for this column in the PRUNE MATERIALIZED VIEW logic. If this column isn't the same in the base table and the view, these rows are also deleted. Fixes https://github.com/scylladb/scylladb/issues/25655 Closes scylladb/scylladb#25720	2025-09-10 07:35:00 +02:00
Patryk Jędrzejczak	520cc0eeaa	Merge 'test: fix race condition in test_long_join' from Emil Maskovsky The test could trigger gossiper API calls before the API was properly registered, causing intermittent 404 errors. Previously the test waited for the "init - starting gossiper" log, but this appears before API registration completes. Add explicit wait for gossiper API registration to ensure the endpoint is available before making requests, eliminating test flakiness. Fixes: scylladb/scylladb#25582 No backport needed: Issue only observed in master so far. Closes scylladb/scylladb#25583 * https://github.com/scylladb/scylladb: test: improve async execution in test_long_join test: fix race condition in test_long_join	2025-09-09 19:12:59 +02:00
Patryk Jędrzejczak	bb9fb7848a	test: cluster: deflake consistency checks after decommission In the Raft-based topology, a decommissioning node is removed from group 0 after the decommission request is considered finished (and the token ring is updated). Therefore, `check_token_ring_and_group0_consistency` called just after decommission might fail when the decommissioned node is still in group 0 (as a non-voter). We deflake all tests that call `check_token_ring_and_group0_consistency` after decommission in this commit. Fixes #25809	2025-09-09 19:01:12 +02:00
Patryk Jędrzejczak	e41fc841cd	test: cluster: util: handle group 0 changes after token ring changes in wait_for_token_ring_and_group0_consistency In the Raft-based topology, a decommissioning node is removed from group 0 after the decommission request is considered finished (and the token ring is updated). `wait_for_token_ring_and_group0_consistency` doesn't handle such a case; it only handles cases where the token ring is updated later. We fix this in this commit. We rely on the new implementation of `wait_for_token_ring_and_group0_consistency` in the following commit to fix flakiness of some tests. We also update the obsolete docstring in this commit.	2025-09-09 19:01:09 +02:00
Avi Kivity	5237a20993	Merge 'replica: Fix split compaction when tablet boundaries change' from Raphael Raph Carvalho Consider the following: 1) balancer emits split decision 2) split compaction starts 3) split decision is revoked 4) emits merge decision 5) completes merge, before compaction in step 2 finishes After last step, split compaction initiated in step 2 can fail because it works with the global tablet map, rather than the map when the compaction started. With the global state changing under its feet, on merge, the mutation splitting writer will think it's going backwards since sibling tablets are merged. This problem was also seen when running load-and-stream, where split initiated by the sstable writer failed, split completed, and the unsplit sstable is left in the table dir, causing problems in the restart. To fix this, let's make split compaction always work with the state when it started, not a global state. Fixes #24153. All 2025.* versions are vulnerable, so fix must be backported to them. Closes scylladb/scylladb#25690 * github.com:scylladb/scylladb: replica: Fix split compaction when tablet boundaries change replica: Futurize split_compaction_options()	2025-09-09 17:05:32 +03:00
Dawid Mędrek	789a4a1ce7	test/perf: Adjust tablet_load_balancing.cc to RF-rack-validity We modify the logic to make sure that all of the keyspaces that the test creates are RF-rack-valid. For that, we distribute the nodes across two DCs and as many racks as the provided replication factor. That may have an effect on the load balancing logic, but since this is a performance test and since tablet load balancing is still taking place, it should be acceptable. This commit also finishes work in adjusting perf tests to pass with the `rf_rack_valid_keyspaces` configuration option enabled. The remaining tests either don't attempt to create keyspaces or they already create RF-rack-valid keyspaces. We don't need to explicitly enable the configuration option. It's already enabled by default by `cql_test_config`. The reason why we haven't run into any issue because of that is that performance tests are not part of our CI. Fixes scylladb/scylladb#25127 Closes scylladb/scylladb#25728	2025-09-09 12:46:46 +03:00
Botond Dénes	a89d0a747b	Merge 'test.py: add different levels of verbosity for output' from Andrei Chekun Add another level of verbosity: quiet. Previously it was the default, but it does not provide enough information. These changes should be coupled with the pytest-sugar plugin to get the intended information for each level. Invoke pytest as a module, instead of a separate process, to get access to the terminal and be able to use it interactively. Framework change only, so backporting to 2025.3 Fixes: #25403 Closes scylladb/scylladb#25698 * github.com:scylladb/scylladb: test.py: add additional level of verbosity for output test.py: start pytest as a module instead of subprocess	2025-09-09 11:49:51 +03:00
Asias He	cb7db47ae1	repair: Add incremental_mode option for tablet repair This patch introduces a new `incremental_mode` parameter to the tablet repair REST API, providing more fine-grained control over the incremental repair process. Previously, incremental repair was on and could not be turned off. This change allows users to select from three distinct modes: - `regular`: This is the default mode. It performs a standard incremental repair, processing only unrepaired sstables and skipping those that are already repaired. The repair state (`repaired_at`, `sstables_repaired_at`) is updated. - `full`: This mode forces the repair to process all sstables, including those that have been previously repaired. This is useful when a full data validation is needed without disabling the incremental repair feature. The repair state is updated. - `disabled`: This mode completely disables the incremental repair logic for the current repair operation. It behaves like a classic (pre-incremental) repair, and it does not update any incremental repair state (`repaired_at` in sstables or `sstables_repaired_at` in the system.tablets table). The implementation includes: - Adding the `incremental_mode` parameter to the `/storage_service/repair/tablet` API endpoint. - Updating the internal repair logic to handle the different modes. - Adding a new test case to verify the behavior of each mode. - Updating the API documentation and developer documentation. Fixes #25605 Closes scylladb/scylladb#25693	2025-09-09 06:50:21 +03:00
Avi Kivity	c4ed7dd814	Merge 'gossiper: fix issues in processing gossip status during the startup and when messages are delayed to avoid empty host ids' from Emil Maskovsky Populate the local state during gossiper initialization in start_gossiping, preventing an empty state from being added to _endpoint_state_map and returned in get_endpoint_states responses, that was causing an 'empty host id issue' on the other nodes during nodes restart. Check for a race condition in do_apply_state_locally In do_apply_state_locally, a race condition can occur if a task is suspended at a preemption point while the node entry is not locked. During this time, the host may be removed from _endpoint_state_map. When the task resumes, this can lead to inserting an entry with an empty host ID into the map, causing various errors, including a node crash. This change adds a check after locking the map entry: if a gossip ACK update does not contain a host ID, we verify that an entry with that host ID still exists in the gossiper’s _endpoint_state_map. Fixes https://github.com/scylladb/scylladb/issues/25831 Fixes https://github.com/scylladb/scylladb/issues/25803 Fixes https://github.com/scylladb/scylladb/issues/25702 Fixes https://github.com/scylladb/scylladb/issues/25621 Ref https://github.com/scylladb/scylla-enterprise/issues/5613 Backport: The issue affects all current releases(2025.x), therefore this PR needs to be backported to all 2025.1-2025.3. Closes scylladb/scylladb#25849 * github.com:scylladb/scylladb: gossiper: fix empty initial local node state gossiper: add test for a race condition in start_gossiping gossiper: check for a race condition in `do_apply_state_locally` test/gossiper: add reproducible test for race condition during node decommission	2025-09-08 20:51:01 +03:00
Andrei Chekun	ea4cd431c9	test.py: add pytest-sugar plugin to the dependencies This plugin allows having better terminal output with progress bar for the tests. Closes scylladb/scylladb#25845 [avi: regenerate frozen toolchain] Closes scylladb/scylladb#25860	2025-09-08 20:50:02 +03:00
Radosław Cybulski	6d150e2d0c	Fix oversized allocation in paxos under pressure Under CPU pressure, the `_locks` structure in paxos might grow and cause oversized allocations and performance drops. We reserve memory ahead of time. Fixes #25559 Closes scylladb/scylladb#25874	2025-09-08 20:49:00 +03:00
Yaron Kaikov	d57741edc2	build_docker.sh: enable debug symbols installation Add the latest scylla.repo location to our docker container; this allows installing the scylla-debuginfo package in case it's needed Fixes: https://github.com/scylladb/scylladb/issues/24271 Closes scylladb/scylladb#25646	2025-09-08 18:39:27 +03:00
Emil Maskovsky	f8c297ca27	test: improve async execution in test_long_join Replace list comprehensions with asyncio.gather() to await the injection API calls in fully concurrent manner.	2025-09-08 17:14:37 +02:00
Emil Maskovsky	a86bd06f08	test: fix race condition in test_long_join The test could trigger gossiper API calls before the API was properly registered, causing intermittent 404 errors. Previously the test waited for the "init - starting gossiper" log, but this appears before API registration completes. Add explicit wait for gossiper API registration to ensure the endpoint is available before making requests, eliminating test flakiness. Fixes: scylladb/scylladb#25582	2025-09-08 17:14:37 +02:00
Pavel Emelyanov	34d1648d21	main: Properly handle zero allocation warning threshold The --help text says about --large-memory-allocation-warning-threshold: "Warn about memory allocations above this size; set to zero to disable." That's half-true: setting the value to zero spams logs with warnings about allocations of any size, as seastar treats a zero threshold literally. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#25850	2025-09-08 12:41:19 +02:00
Asias He	451e1ec659	streaming: Fix use after move in the tablet_stream_files_handler The files object is moved before the log when the stream finishes. We've logged the files when the stream starts. Skip it at the end of streaming. Fixes #25830 Closes scylladb/scylladb#25835	2025-09-08 11:59:52 +02:00
Sergey Zolotukhin	b34d543f30	gossiper: fix empty initial local node state This change removes the addition of an empty state to `_endpoint_state_map`. Instead, a new state is created locally and then published via replicate, avoiding the issue of an empty state existing in `_endpoint_state_map` before the preemption point. Since this resolves the issue tested in `test_gossiper_empty_self_id_on_shadow_round`, the `xfail` mark has been removed. Fixes: scylladb/scylladb#25831	2025-09-08 11:38:31 +02:00
Sergey Zolotukhin	775642ea23	gossiper: add test for a race condition in start_gossiping This change adds a test for a race condition in `start_gossiping` that can lead to an empty self state sent in `gossip_get_endpoint_states_response`. Test for scylladb/scylladb#25831	2025-09-08 11:38:30 +02:00
Sergey Zolotukhin	f08df7c9d7	gossiper: check for a race condition in `do_apply_state_locally` In do_apply_state_locally, a race condition can occur if a task is suspended at a preemption point while the node entry is not locked. During this time, the host may be removed from _endpoint_state_map. When the task resumes, this can lead to inserting an entry with an empty host ID into the map, causing various errors, including a node crash. This change 1. adds a check after locking the map entry: if a gossip ACK update does not contain a host ID, we verify that an entry with that host ID still exists in the gossiper’s _endpoint_state_map. 2. Removes xfail from the test_gossiper_race test since the issue is now fixed. 3. Adds exception handling in `do_shadow_round` to skip responses from nodes that sent an empty host ID. This re-applies the commit `13392a40d4` that was reverted in `46aa59fe49`, after fixing the issues that caused the CI to fail. Fixes: scylladb/scylladb#25702 Fixes: scylladb/scylladb#25621 Ref: scylladb/scylla-enterprise#5613	2025-09-08 11:38:30 +02:00
Emil Maskovsky	28e0f42a83	test/gossiper: add reproducible test for race condition during node decommission This change introduces a targeted test that simulates the gossiper race condition observed during node decommissioning. The test delays gossip state application and host ID lookup to reliably reproduce the scenario where `gossiper::get_host_id()` is called on a removed endpoint, potentially triggering an abort in `apply_new_states`. There is a specific error injection added to widen the race window, in order to increase the likelihood of hitting the race condition. The error injection is designed to delay the application of gossip state updates for the specific node that is being decommissioned. This should then result in a server abort in the gossiper. This re-applies the commit `5dac4b38fb` that was reverted in `dc44fca67c`, but modified to relax the check from "on_internal_error" to just a warning log. The stricter check can be re-introduced later once we are sure that all remaining problems are resolved and it will not break the CI. Refs: scylladb/scylladb#25621 Fixes: scylladb/scylladb#25721	2025-09-08 11:38:30 +02:00
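
Commit f8c297ca27 in the log above replaces list comprehensions with asyncio.gather() so that the injection API calls are awaited concurrently. The following is a minimal sketch of that pattern, not the actual test code: `enable_injection`, the node names, and the injection name are hypothetical stand-ins for the real HTTP calls made by the test.

```python
import asyncio


async def enable_injection(node: str, name: str) -> str:
    # Hypothetical stand-in for an HTTP request to a node's injection API.
    await asyncio.sleep(0.01)
    return f"{node}:{name}"


async def enable_sequentially(nodes, name):
    # Awaiting inside a list comprehension runs the calls one after another.
    return [await enable_injection(n, name) for n in nodes]


async def enable_concurrently(nodes, name):
    # asyncio.gather() schedules all calls at once and returns their
    # results as a list, in the same order as the inputs.
    return await asyncio.gather(*(enable_injection(n, name) for n in nodes))


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    print(asyncio.run(enable_concurrently(nodes, "example_injection")))
```

Both helpers return the same results; the concurrent variant overlaps the waits, so its total time is roughly that of the slowest call rather than the sum of all calls.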

49369 Commits