scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-22 07:42:16 +00:00

Author	SHA1	Message	Date
Piotr Dulikowski	5b269be37b	Merge 'test/cluster/test_view_building_coordinator: migrate test from dtest' from Michał Jadwiszczak Move `materialized_views_test.py::TestMaterializedViews::test_do_not_finish_view_building_with_hints` test from dtest to test.py. The dtest was throttling down IO throughput in the hope that the view building won't be finished too soon. This introduces some unreliability, which can be solved by using error injection and pausing view building until we stop necessary nodes. This patch adds 2 tests: one for tablet-based view and one for vnode-based. Both of the tests use error injection to pause view building. Fixes [SCYLLADB-1261](https://scylladb.atlassian.net/browse/SCYLLADB-1261) The issue was seen in 2026.2, so we should backport this patch to this version. [SCYLLADB-1261]: https://scylladb.atlassian.net/browse/SCYLLADB-1261?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#29788 * github.com:scylladb/scylladb: test/cluster/mv/test_mv_building: add similar test for vnode-based view test/cluster/test_view_building_coordinator: migrate test from dtest db/view/view_building_worker: add more logs when flushing base table	2026-05-14 15:34:26 +02:00
Michał Jadwiszczak	25c176c1b4	sstables_loader: fix missing include Commit `c97232b` introduced use of `seastar::util::read_entire_stream()` but didn't include the relevant header, which causes a compilation error. It probably slipped through CI unnoticed because of precompiled headers. Refs scylladb#28763 Closes scylladb/scylladb#29901	2026-05-14 15:16:34 +02:00
Piotr Szymaniak	ac3fff897a	alternator/doc: update Streams compatibility docs Alternator Streams graduated from experimental in #29604. Update the compatibility and FAQ docs accordingly: - Replace the "Experimental API features" section with a new "Alternator Streams" section that lists known differences without the experimental framing. - Expand the alternator_streams_increased_compatibility paragraph to explain both consequences of leaving it off (spurious no-op events and inaccurate INSERT/MODIFY distinction) and the performance cost of enabling it (LWT path for every write). - Drop the stale ShardFilter limitation (now implemented). - Replace the alternator-streams FAQ example with strongly-consistent-tables so the multi-feature syntax example remains useful. Fixes SCYLLADB-462 Closes scylladb/scylladb#29695	2026-05-14 15:06:19 +02:00
Michał Jadwiszczak	5c84cff78a	test/cluster/mv/test_mv_building: add similar test for vnode-based view In the dtest repo, the test ran for both vnode-based and tablet-based views. Since the test.py infrastructure uses error injection to pause the view building process, we need separate tests for those two cases.	2026-05-14 10:52:44 +02:00
Piotr Dulikowski	0c016cecc3	Merge 'QOS: self-heal stale V1-to-V2 migration state on upgrade' from Alex Dathskovsky service_levels: self-heal stale v1 marker after raft topology upgrade This PR handles an upgrade corner case where a node may already be using raft topology, while `system.scylla_local` still marks service levels as v1. The problem was introduced by commit `2917ec5d51` ("service:qos: service levels migration"), which added the service-levels migration from `system_distributed.service_levels` to `system.service_levels_v2` as part of the raft topology upgrade. However, if the cluster had no service levels configured, there was no data to migrate. In that case, the migration path could leave the local version marker unchanged, so the node would later observe an inconsistent state: * raft topology is already enabled; * service levels are still marked as v1 in `system.scylla_local`. Such clusters can be left in a stale state and fail startup during upgrade to 2026.2 This PR makes the upgrade path self-healing. The first commit restores `service_level_controller::migrate_to_v2()`, giving us a group0-based path for writing the service-levels v2 state even after raft topology is already in use. The second commit wires this path into startup. When the node detects the stale raft-topology + service-levels-v1 state, it retries the migration a bounded number of times and updates the version marker to v2 instead of failing startup. With this change, clusters that were left in this stale state can recover automatically during upgrade to 2026. Fixes: SCYLLADB-1807 backport: 2026.2 2026.1 we need this functionality when we are upgrading older servers Closes scylladb/scylladb#29749 * github.com:scylladb/scylladb: test/auth_cluster: simulate v1 state in self-heal test When skip_service_levels_v2_initialization is used, write an explicit v1 service level version marker while skipping v2 initialization. This lets the restart test exercise self-healing from v1 to v2. 
qos: self-heal stale service levels version on startup qos: reintroduce service levels v2 migration self-heal	2026-05-14 10:32:43 +02:00
Michał Jadwiszczak	b887f8cb2b	test/cluster/test_view_building_coordinator: migrate test from dtest Move `materialized_views_test.py::TestMaterializedViews::test_do_not_finish_view_building_with_hints` test from dtest to test.py. The dtest was throttling down IO throughput in the hope that the view building won't be finished too soon. This introduces some unreliability, which can be solved by using error injection and pausing view building until we stop necessary nodes. Fixes SCYLLADB-1261	2026-05-14 10:23:42 +02:00
Michał Jadwiszczak	b175f5b97d	db/view/view_building_worker: add more logs when flushing base table Add debug logs around flushing the base table to see how long it takes in case of stalls in view building. Refs SCYLLADB-1261	2026-05-14 10:23:42 +02:00
Avi Kivity	6db152afbb	Update seastar submodule Drop local formatter for seastar::http::reply, which should have been added to Seastar in the first place, and now conflicts. Also drop local formatters for types that are aliases for Seastar types which have gained formatters. Disable recently-gained TLS use of OpenSSL instead of gnutls. We don't need it, and it causes link errors with LTO. Fix incorrect skipping in encrypted_file_test, which computed the remaining stream length but did not account for already consumed size_to_compare. Change utils::gcp::storage::client::object_data_source::skip() to match new Seastar behavior (rejecting skip-past-eof with an exception). This is needed since `30f1075544` switched the test's data source to a Seastar implementation. It is also more correct - if we're asked to skip n bytes but the stream doesn't have n bytes, this is a protocol violation. Contains test fix from Pavel, exposed by [1]: test: Handle premature EOF in test_gcp_storage_skip_read The test intentionally uses file_size larger than the actual object to exercise EOF behavior. When input_stream::skip() is called after EOF, it throws std::runtime_error("premature end of stream"). Catch this specific exception from both streams, verify they agree, and exit the loop gracefully. 
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> [1] `cbd1e17d2f`, included in this Seastar submodule update * seastar 4d268e0e...485a62b2 (50): > reactor: open_directory(): honor bypass_fsync > http: Add formatters for http::request and http::reply > Merge 'Assorted set of io-tester cleanups' from Pavel Emelyanov io_tester: Remove unused and internal-only accessor io_tester: Move think-time machinery into thinker_state io_tester: Move _file to io_class_data io_tester: Replace class_data::_start member with a local variable io_tester: Move _alignment from class_data to io_class_data io_tester: Remove buffer allocation from top-level request issuing io_tester: Cleanup context::stop() invocation io_tester: Allocate write buffer once to fill a file io_tester: Declare quantiles arrays as static constexpr io_tester: Drop class_data::type_str() io_tester: Replace != "" comparisons with .empty() io_tester: Replace gen_class_data() if/else chain with a switch io_tester: Deduplicate vectorized I/O classes > io_tester: fix crash from missing metric during startup > net: tls: adjust openssl integration to new module support > http/client: Count and export integrated queue length > Merge 'Introduce pipe_data_source_impl and pipe_data_sink_impl' from Pavel Emelyanov fstream: add pipe_data_source_impl and pipe_data_sink_impl pollable_fd: add write_some/write_all backed by writev pollable_fd: rename write_some/write_all(iovec) to send_some/send_all > reactor: Make pollable_fd_state helper methods private > module: extend seastar.cppm with comprehensive public API exports > Merge 'Add exhaustive input_stream invariant test + fixes' from Pavel Emelyanov tests: add exhaustive input_stream read/skip invariant test iostream: make skip() reject premature end of stream with exception > Merge 'Allow runtime selectability of GnuTLS or OpenSSL' from Noah Watkins net/tls: avoid potential read-past-buffer net/tls: move 
credential methods to generic tls layer net/tls: rename credentials_impl::dh_params to set_dh_params test/tls: enable openssl tls unit test test/tls: fix CA cert generation to use v3_ca extensions github: disable parallel test execution in alpine workflow crypto: support compiling seastar without gnutls net/tcp: use crypto provider for md5 calculation tls: fix test_peer_certificate_chain_handling for OpenSSL net/tls: fix test for self-signed server cert opoenssl compat net/tls: disable priority strings test for openssl provider core/crypto: expose crypto backend name for introspection test/tls: remove gnutls version guard net/tls: add openssl tls backend http: use backend agnostic tls error code net/tls: make error codes configurable by each tls backend net/tls: move reloadable_credentials to generic tls layer net/tls: move build_certificate to generic tls layer net/tls: move apply_to() to generic tls layer net/tls: move credential methods to generic tls layer net/tls: add OpenSSL-specific methods to public API with no-op defaults net/tls: introduce dh_params and credentials abstraction layer net/tls: add credentials_impl abstract base class net/tls: dispatch tls::error_category() through crypto_provider net/tls: dispatch wrap_client/wrap_server through crypto_provider net/tls: add tls_backend interface to crypto_provider net/tls: move public tls API methods to generic tls layer net/tls: move formatting utilities to generic tls layer net/tls: move credentials_builder blob methods to generic tls layer net/tls: move dh_params::from_file to generic tls layer net/tls: move abstract_credentials file methods to generic tls layer net/tls: move tls_socket_impl to generic tls layer net/tls: move server_session to general tls layer net/tls: move tls_connected_socket_impl to generic tls layer net/tls: move net::get_impl to generic tls layer net/tls: move session_ref to generic tls layer net/tls: add session_impl abstract interface for tls pluggability net/tls: rename tls.cc 
to be gnutls specific crypto: introduce crypto provider abstraction http: remove unused include > tls: test_send_two_large > rpc: include exception type for remote errors > GHA: increase timeout to 60 minutes > apps/httpd: replace deprecated reply::done() with write_body() > missing header(s) > net: Fix missing throw for runtime_error in create_native_net_device > tests/io_queue: account for token bucket refill granularity in bandwidth checks > Merge 'iovec: fix iovec_trim_front infinite loop on zero-length iovecs' from Travis Downs tests: add regression tests for zero-length iovec handling iovec: fix iovec_trim_front infinite loop on zero-length iovecs > util/process: graduate process management API from experimental > cooking: don't register ready.txt as a build output > sstring: make make_sstring not static > Add SparkyLinux to debian list in install-dependencies.sh > http: allow control over default response headers > Merge 'chunked_fifo: make cached chunk retention configurable' from Brandon Allard tests/perf: add chunked_fifo microbenchmarks chunked_fifo: set the default free chunk retention to 0 chunked_fifo: make free chunk retention configurable > Merge 'reactor_backend: fix pollable_fd_state_completion reuse in io_uring' from Kefu Chai tests: add regression test for pollable_fd_state_completion reuse reactor_backend: use reset() in AIO and epoll poll paths reactor_backend: fix pollable_fd_state_completion reuse after co_await in io_uring > Merge 'coroutine: Generator cleanups' from Kefu Chai coroutine/generator: extract schedule_or_resume helper coroutine/generator: remove unused next_awaiter classes coroutine/generator: remove write-only _started field coroutine/generator: assert on unreachable path in buffered await_resume coroutine/generator: add elements_of tag and #include <ranges> coroutine/generator: add empty() to bounded_container concept > cmake: bump minimum Boost version to 1.79.0 > seastar_test: remove unnecessary headers > cmake: bump 
minimum GnuTLS version to 3.7.4 > Merge 'reactor: add get_all_io_queues() method' from Travis Downs tests: add unit test for reactor::get_all_io_queues() reactor: add get_all_io_queues() method reactor: move get_io_queue and try_get_io_queue to .cc file > http: deprecate reply::done(), remove _response_line dead field > core: Deprecate scattered_message > ci: add workflow dispatch to tests workflow > perf_tests: exit non-zero when -t pattern matches no tests > Replace duplicate SEGV_MAPERR check in sigsegv_action() with SEGV_ACCERR. > perf_tests: add total runtime to json output > Merge 'Relax large allocation error originating from json_list_template' from Robert Bindar implement move assignment operator for json_list_template json_list_template copy assignment operator reserves capacity upfront > perf_tests: add --no-perf-counters option > Merge 'Fix to_human_readable_value() ability to work with large values' from Pavel Emelyanov memory: Add compile-time test for value-to-human-readable conversion memory: Extend list of suffixes to have peta-s memory: Fix off-by-one in suffix calculation memory: Mark to_human_readable_value() and others constexpr > http: Improve writing of response_line() into the output > Merge 'websocket: add template parameter for text/binary frame mode and implement client-side WebSocket' from wangyuwei websocket: add template parameter for text/binary frame mode websocket: impl client side websocket function > file: Fix checks for file being read-only > reactor: Make do_dump_task_queue a task_queue method > Merge 'Implement fully mixed mode for output_stream-s' from Pavel Emelyanov tests/output_stream: sample type patterns in sanitizer builds tests/output_stream: extend invariant test to cover mixed write modes iostream: allow unrestricted mixing of buffered and zero-copy writes tests/output_stream: remove obsolete ad-hoc splitting tests tests/output_stream: add invariant-based splitting tests iostream: rename output_stream::_size to 
::_buffer_size > reactor_backend: replace virtual bool methods with const bool_class members > resource: Avoid copying CPU vector to break it into groups > perf_tests: increase overhead column precision to 3 decimal places > Merge 'Move reactor::fdatasync() into posix_file_impl' from Pavel Emelyanov reactor: Deprecate fdatasync() method file: Do fdatasync() right in the posix_file_impl::flush() file: Propagate aio_fdatasync to posix_file_impl reactor: Move reactor::fdatasync() code to file.cc reactor,file: Make full use of file_open_options::durable bit file: Add file_open_options::durable boolean file: Account io_stats::fsyncs in posix_file_impl::flush() reactor: Move _fsyncs counter onto io_stats > http: Remove connection::write_body() Closes scylladb/scylladb#29553	2026-05-14 10:45:39 +03:00
Botond Dénes	1403f18240	Merge 'alternator: add more vector search features' from Nadav Har'El Recently (in commit `37fc1507f0`) we added vector search support for Alternator. That implementation was functional, but did not yet support all the features that we had envisioned. This patch series adds some of the missing features to Alternator's vector search. Each feature is described in more detail in its own patch. * Metrics related to vector search usage in Alternator. * `SimilarityFunction` option when creating a vector index to choose the similarity function. Defaults to `COSINE` (the existing default). Other options are `DOT_PRODUCT` and `EUCLIDEAN`. * An optimized vector type, `{"FLOAT32VECTOR": [1.0, 2.0, ..]}`, which is stored on disk efficiently as 32-bit floats, not a JSON. * A Query VectorSearch option `ReturnScores` asking to return the similarity score calculated for each returned result (the results are sorted in decreasing similarity score - the highest similarity is the best and returned first). Closes scylladb/scylladb#29554 * github.com:scylladb/scylladb: alternator: add ReturnScores option to VectorSearch vector_store_client: read and return similarity_scores alternator: add optimized vector type for vector search alternator: add SimilarityFunction option to vector index creation alternator: add vector search metrics	2026-05-14 10:41:41 +03:00
Andrei Chekun	a09fdfc46a	test.py: fix issue that C++ tests' logs are deleted Skip deleting the log file when a C++ test fails. Closes scylladb/scylladb#29859	2026-05-13 21:31:03 +03:00
Avi Kivity	f2ab911a46	Merge 'test/cluster: fix server-starting functions to wait for all ports' from Nadav Har'El This series fixes a recurring source of flaky tests in the cluster test suite. When a test configures Scylla to listen on non-default ports (e.g. a custom Alternator port, proxy-protocol port or shard-aware port), server_add() and server_start() would declare the server ready by polling the hardcoded standard CQL and Alternator ports. Those ports can become available slightly before the custom ports finish binding, so the test could start using the custom port before it was open, causing intermittent failures. The fix for each affected test was to pass `expected_server_up_state=ServerUpState.SERVING` explicitly, which waits for Scylla's sd_notify("STATUS=serving") signal instead. That signal is sent only after all configured listeners are fully open, so it is always the right readiness signal regardless of the port configuration. This workaround was applied again in PR #29737 and will keep being needed for every new test that uses a non-default port. This series makes ServerUpState.SERVING the default at every level of the server start/add call stack so no test needs to remember it: * Make server_add(), servers_add(), server_start() et al. all default to ServerUpState.SERVING. * Document that server_add/server_start wait for all ports to be ready, so future test authors understand what the functions guarantee. * Remove now-redundant expected_server_up_state=SERVING from existing tests. * A small optimization: Fix check_serving_notification() returning False on first completion. When the sd_notify future completed, the function correctly updated _received_serving but still returned False, wasting one 100ms polling interval. Return self._received_serving directly. 
Closes scylladb/scylladb#29758 * github.com:scylladb/scylladb: test/pylib: fix missing protocol_version=4 on control_cluster scylla_cluster: guard poll_status() set_result() calls against cancelled future test/cluster: avoid repeated CQL checks and leaks while waiting for SERVING test/cluster: fix check_serving_notification() inefficiency test/cluster: remove now-redundant expected_server_up_state=SERVING test/cluster: document that add/start waits for all ports to be ready test/cluster: update remaining CQL_ALTERNATOR_QUERIED defaults to SERVING test/cluster: fix server_add/server_start hanging when starting in maintenance mode main: notify "entering maintenance mode" after the maintenance CQL server is ready test/cluster: make server_start() default to ServerUpState.SERVING test/cluster: make server_add() default to ServerUpState.SERVING	2026-05-13 21:23:18 +03:00
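
The `check_serving_notification()` inefficiency described in the merge above can be illustrated with a minimal Python sketch. Only the names `check_serving_notification` and `_received_serving` come from the commit message; the `ServingWatcher` class, the use of `concurrent.futures.Future`, and the `polls_until_ready` helper are assumptions made for illustration and are not the actual test/pylib code.

```python
from concurrent.futures import Future


class ServingWatcher:
    """Hypothetical stand-in for the object that tracks the sd_notify signal."""

    def __init__(self, serving_future: Future) -> None:
        self._serving_future = serving_future
        self._received_serving = False

    def check_serving_notification_old(self) -> bool:
        # Behaviour described as the bug: the flag is updated when the future
        # completes, but the call that first observes completion returns False.
        if self._received_serving:
            return True
        if self._serving_future.done():
            self._received_serving = True
        return False

    def check_serving_notification(self) -> bool:
        # Behaviour described as the fix: update the flag, then return it.
        if not self._received_serving and self._serving_future.done():
            self._received_serving = True
        return self._received_serving


def polls_until_ready(check) -> int:
    """Count how many calls to `check` are needed before it returns True."""
    polls = 1
    while not check():
        polls += 1
    return polls


# The notification has already arrived before the first poll.
done = Future()
done.set_result("STATUS=serving")

old_polls = polls_until_ready(ServingWatcher(done).check_serving_notification_old)
new_polls = polls_until_ready(ServingWatcher(done).check_serving_notification)
```

In this sketch the old variant needs one extra poll, which corresponds to the wasted 100ms polling interval mentioned in the commit message.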
Alex	6188bf3e01	test/auth_cluster: simulate v1 state in self-heal test When skip_service_levels_v2_initialization is used, write an explicit v1 service level version marker while skipping v2 initialization. This lets the restart test exercise self-healing from v1 to v2.	2026-05-13 17:55:20 +03:00
Alex	c2014f7e50	qos: self-heal stale service levels version on startup Add self_heal_service_levels_version() and use it during startup when the node is already on raft topology but service levels are still marked as v1. In that stale state, migrate service levels to v2 through group0 instead of failing startup.	2026-05-13 17:55:20 +03:00
Piotr Dulikowski	f3ac35f9d2	Merge 'strong_consistency: wait for raft servers to start in create table' from Michael Litvak When creating a strongly consistent table, wait for the table's raft servers to start and be ready to serve queries before completing the operation. We want the create table operation to absorb the delay of starting the raft groups instead of the first queries. The create table coordinator commits and applies the schema statement, then it waits for all hosts that have a tablet replica to create and start the raft groups for the table's tablets. It does this by sending an RPC to all the relevant hosts that executes a group0 barrier, in order to ensure the table and raft groups are created, then waits for all raft groups on the host to finish starting and be ready. Fixes SCYLLADB-807 no backport - strong consistency is still experimental Closes scylladb/scylladb#28843 * github.com:scylladb/scylladb: strong_consistency: wait for leader when starting a group strong_consistency: change wait for groups to start on startup strong_consistency: optimize wait_for_groups_to_start strong_consistency: wait for raft servers to start in create table	2026-05-13 16:42:05 +02:00
Piotr Dulikowski	dc05bd35bb	Merge 'strong_consistency: limit available consistency levels in strong consistent requests' from Michał Jadwiszczak Strongly consistent requests take a different path than EC requests, and consistency levels don't map well onto them. We should limit the available consistency levels in SC requests to avoid silently ignoring them, which may confuse users. For writes, there is only one option: - QUORUM/LOCAL_QUORUM (multi-DC is not supported yet, so both of those CLs have the same effect) - we need a quorum of replicas to successfully commit new mutations to the Raft log. For reads, there are 2 options: - QUORUM/LOCAL_QUORUM - if the user wants to be sure to see the latest data, the query needs to execute `read_barrier()`, which requires a quorum of replicas - ONE/LOCAL_ONE - if the user just wants to read data from one replica without synchronization All tests were updated to use LOCAL_QUORUM for both reads and writes. Fixes SCYLLADB-1766 SC is in experimental phase and this patch is an improvement, no backport needed. Closes scylladb/scylladb#29691 * github.com:scylladb/scylladb: strong_consistency: allow QUORUM/LOCAL_QUORUM and ONE/LOCAL_ONE for reads strong_consistency: allow only QUORUM/LOCAL_QUORUM CL for writes	2026-05-13 16:31:05 +02:00
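
The consistency-level rules listed in the merge above can be summarised in a short Python sketch. This is not the ScyllaDB implementation (which is C++); the `CL` enum, `validate_sc_consistency`, and `read_needs_barrier` are hypothetical names, and rejecting with `ValueError` is an assumption about how an unsupported level might be reported.

```python
from enum import Enum


class CL(Enum):
    ONE = "ONE"
    LOCAL_ONE = "LOCAL_ONE"
    QUORUM = "QUORUM"
    LOCAL_QUORUM = "LOCAL_QUORUM"
    ALL = "ALL"


# Writes must commit to the Raft log, which needs a quorum of replicas.
ALLOWED_WRITE_CLS = frozenset({CL.QUORUM, CL.LOCAL_QUORUM})
# Reads may additionally be served by a single replica without synchronization.
ALLOWED_READ_CLS = ALLOWED_WRITE_CLS | {CL.ONE, CL.LOCAL_ONE}


def validate_sc_consistency(cl: CL, is_write: bool) -> None:
    """Reject levels that a strongly consistent request would otherwise ignore."""
    allowed = ALLOWED_WRITE_CLS if is_write else ALLOWED_READ_CLS
    if cl not in allowed:
        kind = "write" if is_write else "read"
        names = ", ".join(sorted(c.value for c in allowed))
        raise ValueError(f"{cl.value} is not supported for strongly consistent {kind}s; use one of: {names}")


def read_needs_barrier(cl: CL) -> bool:
    """Quorum reads execute read_barrier(); ONE/LOCAL_ONE reads skip it."""
    return cl in ALLOWED_WRITE_CLS
```

Rejecting an unsupported level up front, rather than silently ignoring it, matches the motivation stated in the commit message.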
Piotr Dulikowski	3c2c814215	Merge 'db/view/view_building: replace system keyspace functions with mutation builder' from Michał Jadwiszczak `system.view_building_tasks` is a single partition table, so it makes more sense to use a mutation builder and generate 1 mutation per group0 command instead of generating multiple mutations. This PR removes all `make_..._mutation()` system keyspace functions related to view building tasks and replaces them with mutation builder. Refs https://github.com/scylladb/scylladb/issues/25929 This patch doesn't fix any bug, it only reduces number of generated mutations, no need to backport it. Closes scylladb/scylladb#26557 * github.com:scylladb/scylladb: db/system_keyspace: replace `make_remove_view_building_task_mutation()` with mutation builder db/view/view_building_task_mutation_builder: make uuid generator optional db/system_keyspace: replace `make_view_building_task_mutation()` with mutation builder db/view/view_building_task_mutation_builder: add helper method	2026-05-13 16:10:55 +02:00
Nadav Har'El	5c065c7746	test/cqlpy: make test_vector_search_with_vector_store_mock faster The previous patch made test_vector_search_with_vector_store_mock significantly faster, but at 5 seconds for 7 tests, it was still not fast enough. It turns out that the tests were slow because each test used a function-scoped fixture, which set up the vector store mock again and again, separately for each test. This - especially waiting for the client in Scylla to recognize the new server - took time (before the previous patch it was 5 seconds, after the patch it went down to 0.5 seconds - but still too slow). The solution is simple: 1. Create a module-scoped fixture that creates the mock and connects it to Scylla just once for all the tests in that file. 2. The function-scoped fixture just uses the module-scoped one but resets the saved responses, to avoid one test influencing the other. After this patch, the time to run this test file is down to 1 second (!). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 14:57:56 +03:00
Nadav Har'El	00bad34f12	vector-search: reset DNS timeout after changing host The vector-search client in ScyllaDB limits itself to doing one DNS lookup per 5 seconds. However, when the configuration changes to point to a different host, the DNS lookup should happen immediately, and this patch makes it do that. Before this patch, test/cqlpy/run test_vector_search_with_vector_store_mock.py Takes a whopping 34 seconds, more than 4 seconds per test! The problem is that each test creates a new mock vector-store server and reconfigures Scylla, and when reconfiguring Scylla nothing happens until the 5-second clock runs out. After this patch, the same test run is down to 5 seconds. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 14:49:00 +03:00
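
The rate-limiting behaviour described in the commit above (at most one DNS lookup per 5 seconds, but an immediate lookup after the configured host changes) can be sketched in Python as follows. The real client is part of ScyllaDB's C++ code; the `RateLimitedResolver` class, its methods, and the injectable `resolve` and `clock` callables are all hypothetical and exist only to make the idea concrete.

```python
import time
from typing import Callable, Optional


class RateLimitedResolver:
    """Hypothetical resolver that performs at most one lookup per interval."""

    def __init__(
        self,
        host: str,
        resolve: Callable[[str], str],
        min_interval: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._host = host
        self._resolve = resolve
        self._min_interval = min_interval
        self._clock = clock
        self._last_lookup: Optional[float] = None
        self._cached: Optional[str] = None

    def set_host(self, host: str) -> None:
        if host != self._host:
            self._host = host
            # Forget the previous lookup so the next address() call resolves
            # the new host immediately instead of waiting out the interval.
            self._last_lookup = None
            self._cached = None

    def address(self) -> Optional[str]:
        now = self._clock()
        due = self._last_lookup is None or now - self._last_lookup >= self._min_interval
        if due:
            self._cached = self._resolve(self._host)
            self._last_lookup = now
        return self._cached
```

Without the reset in `set_host()`, a reconfiguration shortly after a lookup would keep using the stale state until the interval expired, which is the delay the commit message attributes to the slow test run.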
Nadav Har'El	4082fdf350	alternator: add ReturnScores option to VectorSearch A vector search operation in Alternator (VectorSearch option to Query) returns items sorted by decreasing similarity to the searched vector. Although the items are sorted by decreasing similarity scores, before this patch the user had no way to see the values of these scores. This patch adds a new VectorSearch option, `ReturnScores`. This option defaults to `NONE`. But if set to `SIMILARITY`, the query will return an array `Scores` with the same length as `Items`, which gives the similarity score for each item. As usual, this patch includes the implementation, the documentation, and tests for the new feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 14:19:17 +03:00
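
As a rough illustration of how a client might consume the `Scores` array described in the commit above, the following Python sketch pairs each item with its score. The `Items` and `Scores` field names come from the commit message; the helper name `items_with_scores`, the sample response, and the assumption that scores arrive as plain numbers are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def items_with_scores(response: Dict[str, Any]) -> List[Tuple[Dict[str, Any], Optional[float]]]:
    """Pair each returned item with its similarity score, if scores were requested."""
    items = response.get("Items", [])
    scores = response.get("Scores")
    if scores is None:
        # ReturnScores was left at its default (NONE), so no Scores array exists.
        return [(item, None) for item in items]
    if len(scores) != len(items):
        raise ValueError("Scores must have the same length as Items")
    return list(zip(items, scores))


# Hypothetical response shape, with items already sorted by decreasing similarity.
sample = {
    "Items": [{"id": {"S": "a"}}, {"id": {"S": "b"}}],
    "Scores": [0.98, 0.75],
}
paired = items_with_scores(sample)
```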
Nadav Har'El	c56361a6d7	vector_store_client: read and return similarity_scores The vector store returns for every ANN search, in addition to the keys of the matching items, two additional vectors - "distances" and "similarity_scores". The "distances" are raw distance metrics - lower scores are better matches, while "similarity_scores" are modified such that higher scores are better matches. Traditionally, search scores in systems like Cassandra and OpenSearch use the "similarity scores" approach (higher is better, results are returned in decreasing similarity order), so this is the more interesting vector of the two. But before this patch, our vector_store_client::ann() inspected only "distances". But... then, it didn't return even that to the caller :-) So in this patch, we: 1. Ignore "distances" and instead look at "similarity_scores", which is what users really want based on their experience with other vector and non-vector search engines. 2. Return the similarity score of each match together with the match. We already have this score (the vector store returns it) and we can add it to the existing primary_key structure of each result. So each result is a "struct primary_key" which has fields partition, clustering, and after this patch - similarity. Existing callers in CQL and Alternator vector search will ignore this "similarity" field in each result, and not notice it was added. But in the next patch, we'll allow Alternator's vector search to return this similarity in each result. The existing unit tests for vector_store_client.cc mocked vector-store responses with "distances", without "similarity_scores", so they no longer represent what we actually expect the vector store to do. So this patch also contains modifications for these tests, to mock and to test "similarity_scores" - not "distances". The more interesting tests, in the next patch, use the real vector store and check that we really do get a "similarity_scores" response from it. 
This patch also handles a small corner case for DOT_PRODUCT, which is the only unbounded similarity function. If the similarity overflows the 32-bit float, the vector store returns a JSON "null" instead of a JSON number (since JSON doesn't support infinite numbers). Our existing vector-store client code errored out when it saw this "null", which is wrong - the request should be allowed to proceed. So in this patch when we see a "null" JSON for similarity, we return +Inf. This is usually correct because the top results really have +Inf, not -Inf, but if we ask for all items we can reach those with similarity -Inf and incorrectly assign +Inf to them (we have a test for this case in the next patch). But this problem won't happen when Limit is low, and in any case it's better than aborting the request after it had already succeeded. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 14:19:17 +03:00
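
The `null` handling described in the commit above can be sketched in a few lines of Python. The actual client is C++; the function name `parse_similarity_scores` and the assumption that `similarity_scores` is a top-level array in the response body are illustrative only.

```python
import json
import math
from typing import List


def parse_similarity_scores(body: str) -> List[float]:
    """Read "similarity_scores", mapping JSON null (an overflowed score) to +Inf."""
    doc = json.loads(body)
    return [math.inf if score is None else float(score) for score in doc["similarity_scores"]]


# Hypothetical response body: the first score overflowed and was sent as null.
scores = parse_similarity_scores('{"similarity_scores": [null, 12.5, 0.25]}')
```

As the commit message notes, mapping every `null` to +Inf is a heuristic: a score that overflowed towards -Inf would also be reported as +Inf.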
Tomasz Grabiec	66439bb753	Merge 'load_balancer: apply balance threshold to intranode shard balancing' from Ferenc Szili - Fix intranode shard balancing to respect the size-based balance threshold, preventing unnecessary migrations when load difference between shards is negligible - Add a regression test that verifies the threshold is respected for intranode balancing The intranode shard balancing loop only stopped when the algorithm exhausted the migration candidates or when a migration would go against convergence (it would increase imbalance instead of decrease it). This caused unnecessary tablet migrations for negligible imbalances (e.g., 0.78% difference between shards). The inter-node balancer already uses `is_balanced()` to stop when the relative load difference is within the configured `size_based_balance_threshold`, but this check was missing from the intranode path. Apply the same `is_balanced()` threshold check that is already used for inter-node balancing to the intranode convergence loop. When the relative load difference between the most-loaded and least-loaded shards on a node is within the threshold, the balancer now stops without issuing further migrations. The test creates a single node with 2 shards and 512 tablets: 1. Balanced scenario (257 vs 255 tablets, same size): relative diff = 0.78% < 1% threshold → verifies no intranode migration is emitted 2. Unbalanced scenario (307 vs 205 tablets, same size): relative diff = 33% >> 1% threshold → verifies intranode migration IS emitted Fixes: SCYLLADB-1775 This is a performance improvement which reduces the number of intranode migrations issued, and needs to be backported to versions with size-based load balancing: 2026.1 and 2026.2 Closes scylladb/scylladb#29756 * github.com:scylladb/scylladb: test: add test for intranode balance threshold in size-based mode tablet_allocator: apply balance threshold to intranode shard balancing	2026-05-13 13:09:52 +02:00
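
The two test scenarios quoted in the merge above can be checked with a small Python sketch. The formula `(max - min) / max` is an assumption chosen because it reproduces both quoted figures (0.78% and 33%); the actual `is_balanced()` in the tablet allocator is C++ and may be defined differently. The function names and the inclusive `<=` comparison are likewise hypothetical.

```python
def relative_load_diff(max_load: float, min_load: float) -> float:
    """Relative difference between the most-loaded and least-loaded shard.

    Assumed formula: (max - min) / max.
    """
    if max_load <= 0:
        return 0.0
    return (max_load - min_load) / max_load


def is_balanced(max_load: float, min_load: float, threshold: float = 0.01) -> bool:
    """True when the relative difference is within the threshold (1% here)."""
    return relative_load_diff(max_load, min_load) <= threshold


# Tablets have the same size in both scenarios, so counts stand in for load.
balanced_diff = relative_load_diff(257, 255)      # about 0.0078, i.e. 0.78%
unbalanced_diff = relative_load_diff(307, 205)    # about 0.33, i.e. 33%
```

Under these assumptions the first scenario is within the 1% threshold, so no intranode migration would be emitted, while the second is far above it.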
Piotr Smaron	0fcae72530	test: bootstrap tombstone gc repair cluster sequentially Avoid concurrent topology changes in the tombstone GC repair setup, where debug-mode nodes running hinted handoff and materialized view startup work can time out while applying Raft entries before the test starts. Keep the sequential path opt-in so unrelated repair tests still exercise concurrent bootstrap behavior. Closes scylladb/scylladb#29829	2026-05-13 13:58:44 +03:00
Nadav Har'El	51c35c05e2	test/cqlpy: teach run-cassandra to use Docker The test/cqlpy/run-cassandra script makes it quite easy to run test/cqlpy tests against Cassandra, which is important for checking compatibility. Unfortunately, because modern Linux distributions like Fedora do not have either Cassandra or the old version of Java that it needs, the user needs to download those manually. This is fairly easy, and explained in detail in test/cqlpy/README.md, but nevertheless is a non-trivial manual step. So this patch adds an even simpler alternative, the "--docker" option which tells the script to run the official Cassandra docker image, complete with the version of Java that it prefers - the user does not need to download or install Cassandra or Java. The image is efficiently cached by Docker, so running run-cassandra again doesn't need to download it again; Moreover, trying several different versions of Cassandra only needs to download and store the shared parts (base image and Java) once. test/cqlpy/run-cassandra --docker test_file.py::test_function Runs by default the latest Cassandra 5 release. You can also use "--docker=4" to get the latest Cassandra 4 release, "--docker=3.11" to get the latest Cassandra 3.11 patch release, or "--docker=3.11.1" to get a specific patch release. In addition to the "--docker" option, this patch also introduces a second option, "--java-docker", which takes only Java from docker, but runs your locally installed Cassandra (to which you should point with the CASSANDRA environment variable, as before). This option can be useful if your host does not have a suitable version of Java, but you want to run a locally-installed or locally-modified version of Cassandra. The "--java-docker" option defaults to getting Java 11, to use other versions you can use for example "--java-docker=17". Fixes #25826. Closes scylladb/scylladb#29860	2026-05-13 11:57:18 +02:00
Nadav Har'El	85c6cafb1d	alternator: add optimized vector type for vector search Today in Alternator vector search, vectors are presented to the API as lists of numbers. I.e., in JSON a vector is sent in requests and responses as: {"L": [{"N": "3.14159"}, {"N": "6.7"}]} This format is verbose and inefficient for long vectors. Even worse, because the "N" number format has precision guarantees in DynamoDB, we cannot optimize the storage of such vectors by, for example, storing the numbers as 32-bit floats. We actually store these vectors as JSON, exactly as shown above. So in this patch we introduce a new DynamoDB type, "FLOAT32VECTOR", for vectors. The above vector will look like this in JSON: {"FLOAT32VECTOR": [3.14159, 6.7]} Note that each number is an unquoted JSON number, not a JSON string. Importantly, the definition of the "FLOAT32VECTOR" type specifies that components of the vector only have 32-bit precision. This means that Scylla may internally store these vectors as lists of 32-bit floats - not as JSON. And indeed, this patch includes this optimization: Top-level vector attributes are now encoded in an optimized way, as a byte 5 (alternator_type::FLOAT32VECTOR) followed by the elements of the vector, just 4 bytes each (the 4-byte big-endian IEEE 754 representation of each floating-point component). This patch also includes documentation, and extensive tests that the new "FLOAT32VECTOR" type works (which also serve as an example of how to use it in the boto3 SDK), that it is indeed encoded internally as 32-bit floats and not wasteful JSON strings, and that vector search on such items works. The last thing requires cooperation from the vector store, of course - it needs to be able to understand the new optimized encoding of vector attributes in addition to the old unoptimized one. Note that the old unoptimized ("list of numbers") vectors are still supported. 
Although not recommended for general use, some users might still want to use the unoptimized type if they have pre-existing data created on DynamoDB or Alternator without vector search in mind, and the vectors already exist as lists of numbers. Although this is less important, the new vector type "FLOAT32VECTOR" is also allowed in a Query's QueryVector. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 11:57:45 +03:00
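The optimized encoding described in the commit above (a type byte 5 followed by one 4-byte big-endian IEEE 754 float per component) can be sketched with Python's `struct` module. This is an illustration of the byte layout only, not Alternator's serialization code; the function and constant names are hypothetical.

```python
import struct

# Type byte named in the commit message (alternator_type::FLOAT32VECTOR).
FLOAT32VECTOR_TAG = 5


def encode_float32_vector(components):
    """Encode as: 1 tag byte + 4 bytes per component (big-endian float32)."""
    return bytes([FLOAT32VECTOR_TAG]) + struct.pack(f">{len(components)}f", *components)


def decode_float32_vector(data):
    """Inverse of encode_float32_vector(); validates the tag and the length."""
    if not data or data[0] != FLOAT32VECTOR_TAG:
        raise ValueError("not a FLOAT32VECTOR value")
    payload = data[1:]
    if len(payload) % 4 != 0:
        raise ValueError("payload is not a whole number of 32-bit floats")
    return list(struct.unpack(f">{len(payload) // 4}f", payload))


encoded = encode_float32_vector([3.14159, 6.7])
print(len(encoded))                    # 1 tag byte + 2 * 4 payload bytes
print(decode_float32_vector(encoded))  # values rounded to 32-bit precision
```

The decoded values differ slightly from the inputs because each component keeps only 32-bit precision, which is the trade-off the new type makes explicit.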
Nadav Har'El	ea910acdd4	alternator: add SimilarityFunction option to vector index creation Before this patch, vector search always used the COSINE similarity function. In this patch we add the ability to choose a different similarity function when creating a new vector index (with CreateTable or UpdateTable) by using the SimilarityFunction option. We still default to "COSINE" if SimilarityFunction isn't specified. Allowed similarity functions are COSINE, DOT_PRODUCT, and EUCLIDEAN. DescribeTable can also retrieve a vector index's SimilarityFunction. As usual, this patch also includes documentation for the new feature, and tests. Some of the tests can run without a vector store - verifying the API syntax and which similarity function is supported - but we also add tests that require the vector store and check that the different similarity functions actually sort the nearest items in the expected order. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 11:57:45 +03:00
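To see why the choice of similarity function can change which item counts as nearest, here is a small sketch using the textbook definitions of the three functions. It assumes that higher cosine and dot-product values mean "nearer" and that a smaller Euclidean distance means "nearer". How the vector store actually scores and normalizes results is not described in the commit, and the `rank` helper is hypothetical.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query, items, similarity_function="COSINE"):
    """Return item names ordered from nearest to farthest (illustrative only)."""
    if similarity_function == "COSINE":
        key = lambda name: -cosine_similarity(query, items[name])
    elif similarity_function == "DOT_PRODUCT":
        key = lambda name: -dot_product(query, items[name])
    elif similarity_function == "EUCLIDEAN":
        key = lambda name: euclidean_distance(query, items[name])
    else:
        raise ValueError(f"unsupported SimilarityFunction: {similarity_function}")
    return sorted(items, key=key)


items = {"same_direction_far": [10.0, 0.0], "off_axis_near": [1.0, 0.5]}
for fn in ("COSINE", "DOT_PRODUCT", "EUCLIDEAN"):
    print(fn, rank([1.0, 0.0], items, fn))
```

In this example COSINE and DOT_PRODUCT rank the long vector pointing in the query's direction first, while EUCLIDEAN prefers the vector that is closer in space.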
Nadav Har'El	70283967d3	alternator: add vector search metrics Before this patch, we did not have any special metrics for vector search in Alternator. We have had count of "Query" operations, but there was no distinction between "standard" queries - of a base table or GSI/LSI - and vector-search queries. This patch adds four new metrics: * vector_search_query - counting how many Query requests are actually vector searches. * vector_search_query_returned_items - counting how many items were returned by vector searches. * vector_search_query_items_from_vs - counting how many results were retrieved from the vector-store backend. * vector_search_query_items_from_base_table - counting how many items were read from the base table during vector-search queries. Some vector search queries using SELECT=ALL_PROJECTED_ATTRIBUTES or COUNT are optimized to not need to read items from the base table. This patch also includes documentation for the new four metrics, and tests that they count what we want them to count. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 11:57:44 +03:00
Patryk Jędrzejczak	3f2ff5a13f	Merge 'Remove raft_group0::finish_setup_after_join' from Gleb Natapov The function does nothing useful now. No backport needed. Removes code. Closes scylladb/scylladb#29828 * https://github.com/scylladb/scylladb: raft_group0: remove finish_setup_after_join function raft_group0: fix indentation after the last change raft_group: drop unneeded checks	2026-05-13 10:53:37 +02:00
Michał Jadwiszczak	1a32ccd8f6	db/system_keyspace: replace `make_remove_view_building_task_mutation()` with mutation builder Again, get rid of system keyspace method in favor of mutation builder, because `system.view_building_tasks` is a single partition table.	2026-05-13 10:06:18 +02:00
Michał Jadwiszczak	2561cc1546	db/view/view_building_task_mutation_builder: make uuid generator optional After scylladb/scylladb#28929 `task_uuid_generator` became a necessary dependency of `view_building_task_mutation_builder`. However, to create the generator we need `view_building_state`, which in some parts of the code (schema_tables.cc, migration_manager.cc) requires remote proxy to be obtained. But sometimes we need the mutation builder to just remove some view building task. In those cases, we don't need the uuid generator and the remote proxy requirement is not necessary.	2026-05-13 09:58:27 +02:00
Alex	ac0a19aab8	qos: reintroduce service levels v2 migration self-heal migrate_to_v2() was removed after gossip-based service level migration support was dropped, since upgraded nodes were expected to already use service levels v2. However, clusters affected by the old migration bug may reach raft topology while system.scylla_local still has a stale service level version. Restore the migration helper so startup can self-heal those nodes by writing the v2 state through group0.	2026-05-13 10:16:02 +03:00
Michael Litvak	80bfc445a8	strong_consistency: wait for leader when starting a group When starting the raft server for a group, wait for the leader before completing the start operation. We want the group to be ready to accept writes by the time the start is reported as complete, so that those writes do not pay the additional latency of waiting for a leader.	2026-05-13 08:43:26 +02:00
Michael Litvak	5f8322a820	strong_consistency: change wait for groups to start on startup Previously, on startup, groups_manager::start() was called and waited for the groups to start. Change it to just start the raft servers in the background without waiting for them to be fully started, and wait for the servers explicitly at a later stage of startup, after starting the messaging service. The reason is that, to be fully started, the servers may require communication that depends on the messaging service. Currently they do not, but this will change in the next commit.	2026-05-13 08:43:26 +02:00
Michael Litvak	e568ca2bd8	strong_consistency: optimize wait_for_groups_to_start Instead of iterating over all raft groups in wait_for_groups_to_start and checking whether each one needs to be waited for, maintain a list of only the raft groups that are starting and need to be waited for.	2026-05-13 08:43:26 +02:00
Michael Litvak	5a5c7c6241	strong_consistency: wait for raft servers to start in create table When creating a strongly consistent table, wait for the table's raft servers to start and be ready to serve queries before completing the operation. We want the create table operation to absorb the delay of starting the raft groups instead of the first queries. The create table coordinator commits and applies the schema statement, then it waits for all hosts that have a tablet replica to create and start the raft groups for the table's tablets. It does this by sending an RPC to all the relevant hosts that executes a group0 barrier, in order to ensure the table and raft groups are created, then waits for all raft groups on the host to finish starting and be ready. Fixes SCYLLADB-807	2026-05-13 08:43:24 +02:00
Yaniv Michael Kaul	5d6f160129	test: update get_scylla_2025_1_executable() to use 2025.1.12 Update the hardcoded 2025.1.0 binary URL to the latest 2025.1.12 release for upgrade tests. The 2025.1.12 binary now supports and enforces the rf_rack_valid_keyspaces option which the test harness enables by default. Since test_sstable_compression_dictionaries_upgrade creates a 2-node cluster in a single rack with RF=2, it violates the constraint. Disable the option explicitly for this test. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#29714	2026-05-12 23:20:55 +02:00
Michał Jadwiszczak	d073097ebf	strong_consistency: allow QUORUM/LOCAL_QUORUM and ONE/LOCAL_ONE for reads We can execute strongly consistent read queries in 2 ways: - with QUORUM/LOCAL_QUORUM CL - this path executes `read_barrier()` before reading the data, which synchronizes the Raft log with the leader. But to execute it, we need a quorum of replicas - with ONE/LOCAL_ONE CL - this path just reads data from one replica without any synchronization (not implemented yet)	2026-05-12 23:20:07 +02:00
Michał Jadwiszczak	68f0cf6fac	strong_consistency: allow only QUORUM/LOCAL_QUORUM CL for writes To successfully write data to a strongly consistent table, a quorum of replicas needs to be used to save the data to the Raft log. So the only reasonable consistency level is QUORUM/LOCAL_QUORUM (currently SC doesn't support multi DC).	2026-05-12 23:20:03 +02:00
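The consistency-level rules from the two strong_consistency commits above (writes and reads) can be summarized as a small validation table. This is a sketch of the rules as stated in the commit messages, not the coordinator's code; the names `check_consistency_level` and `needs_read_barrier` are hypothetical.

```python
QUORUM_LEVELS = frozenset({"QUORUM", "LOCAL_QUORUM"})
SINGLE_REPLICA_LEVELS = frozenset({"ONE", "LOCAL_ONE"})

ALLOWED_WRITE_LEVELS = QUORUM_LEVELS
ALLOWED_READ_LEVELS = QUORUM_LEVELS | SINGLE_REPLICA_LEVELS


def check_consistency_level(cl: str, is_write: bool) -> None:
    """Raise ValueError if cl is not allowed for a strongly consistent table."""
    allowed = ALLOWED_WRITE_LEVELS if is_write else ALLOWED_READ_LEVELS
    if cl not in allowed:
        kind = "write" if is_write else "read"
        raise ValueError(f"{cl} is not supported for strongly consistent {kind}s")


def needs_read_barrier(cl: str) -> bool:
    """Quorum reads synchronize with the leader first; ONE/LOCAL_ONE reads do not."""
    check_consistency_level(cl, is_write=False)
    return cl in QUORUM_LEVELS


print(needs_read_barrier("LOCAL_QUORUM"))
print(needs_read_barrier("ONE"))
```

Per the read commit, the ONE/LOCAL_ONE path is accepted but was not yet implemented at the time, so the sketch only models which levels are admitted.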
Michał Jadwiszczak	e002665aa7	db/system_keyspace: replace `make_view_building_task_mutation()` with mutation builder `system.view_building_tasks` is a single partition table, so it makes more sense to use a mutation builder and generate 1 mutation per group0 command instead of generating multiple mutations.	2026-05-12 21:49:18 +02:00
Michał Jadwiszczak	4227cab5cb	db/view/view_building_task_mutation_builder: add helper method Add a method to set all task's fields.	2026-05-12 21:28:06 +02:00
Wojciech Mitros	f3cf20803b	test: run test_mv_admission_control_exception on one shard In the test we perform 2 consecutive writes where the first write is supposed to increase the view update backlog above the mv admission control threshold and the second one is expected to be rejected because of that. On each node/shard we have 2 types of view update backlogs: 1. for deciding whether we should admit writes 2. for propagating the backlog information to other nodes/shards. For the second write to be rejected, it must be performed on a node and shard which updated its backlog of type 1. The view update backlog of type 2. is immediately increased on the base table replica. For this backlog to be registered as a backlog of type 1., it needs to be either carried by gossip (happening once every second) or by attaching it to a replica write response. We don't want to increase the runtime of tests unnecessarily, so we don't wait and we rely on the second mechanism. The response to the first base table write (the one causing increase in the backlog) carries the increased backlog to the coordinator of this write. So for the second write to observe the increased backlog, it needs to be coordinated on the same node+shard as the first write. We make sure that both writes are coordinated on the same node+shard by using prepared statements combined with setting the host in `run_async`. Both writes target the same partition and with prepared statements we route them directly to the correct shard. That was the idea, at least. In practice, for the driver to learn the correct shard, it first needs to learn the token->shard mapping from the server. For vnodes it can expect a shard by calculating the token of the affected partition, but for tablets, it had no opportunity to learn the tablet->shard mapping so the first write may route to any shard. Additionally, we aren't guaranteed that the driver established connections to all shards on all nodes at the point of any write. 
So if a connection finishes establishing between the two writes, this may also cause us to coordinate these 2 writes on different shards, leading to a missed view backlog growth and not-rejected second write. We fix this in this patch by running the test using one shard on each node. This way, as long as we perform both writes on the same node, they'll also be coordinated on the same shard. This also makes the prepared statement and BoundStatement unnecessary — we can use SimpleStatement with FallthroughRetryPolicy directly. Fixes: SCYLLADB-1901 Closes scylladb/scylladb#29862	2026-05-12 17:34:19 +02:00
Piotr Dulikowski	129f193116	Merge 'strong_consistency: implement basic coordinator metrics' from Michał Jadwiszczak Add per-shard metrics for strong consistency coordinator operations (latency, timeouts, bounces, status unknown) under the `"strong_consistency_coordinator"` category. These are analogous to the eventual consistency metrics in `storage_proxy_stats`, enabling direct performance comparison between the two consistency modes. The metrics are simplified compared to `storage_proxy_stats` — no breakdown by table, tablet, scheduling group, or DC, only per-shard. Fixes SCYLLADB-1343 Strong consistency is still in experimental phase, no need to backport. Closes scylladb/scylladb#29318 * github.com:scylladb/scylladb: test/strong_consistency: verify metrics strong_consistency: wire up metrics to operations strong_consistency: add stats struct and metrics registration	2026-05-12 16:15:51 +02:00
Botond Dénes	e95eb21a16	Merge 'Tablet-aware restore' from Pavel Emelyanov The restore works like this - A /storage_service/tablets/restore API is called with (keyspace, table, endpoint, bucket, manifests) parameters - First, it populates the system_distributed.snapshot_sstables table with the data read from the manifests - Then it emplaces a bunch of tablet transitions (of a new "restore" kind), one for each tablet - The topology coordinator handles the "restore" transition by calling a new RESTORE_TABLET RPC against all the current tablet replicas - Each replica handles the RPC verb by - Reading the snapshot_sstables table - Filtering the read sstable infos against current node and tablet being handled - Downloading and attaching the filtered sstables This PR includes system_distributed.snapshot_sstables table from @robertbindar and preparation work from @kreuzerkrieg that extracts raw sstables downloading and attaching from existing generic sstables loading code. This is a first step towards SCYLLADB-197 and lacks many things. 
In particular - the API only works for a single-DC cluster - the caller needs to "lock" tablet boundaries with min/max tablet count - not abortable - no progress tracking - sub-optimal (re-kicking API on restore will re-download everything again) - not re-attachable (if API node dies, restoration proceeds, but the caller cannot "wait" for it to complete via another node) - nodes download sstables in maintenance/streaming sched group (should be moved to maintenance/backup) Other follow-up items: - have an actual swagger object specification for `backup_location` Closes #28436 Closes #28657 Closes #28773 Closes scylladb/scylladb#28763 * github.com:scylladb/scylladb: docs: Update topology_over_raft.md with `restore` transition kind test: Add test for backup vs migration race test: Restore resilience test sstables_loader: Fail tablet-restore task if not all sstables were downloaded sstables_loader: mark sstables as downloaded after attaching sstables_loader: return shared_sstable from attach_sstable db: add update_sstable_download_status method db: add downloaded column to snapshot_sstables db: extract snapshot_sstables TTL into class constant test: Add a test for tablet-aware restore tablets: Implement tablet-aware cluster-wide restore messaging: Add RESTORE_TABLET RPC verb sstables_loader: Add method to download and attach sstables for a tablet tablets: Add restore_config to tablet_transition_info sstables_loader: Add restore_tablets task skeleton test: Add rest_client helper to kick newly introduced API endpoint api: Add /storage_service/tablets/restore endpoint skeleton sstables_loader: Add keyspace and table arguments to manfiest loading helper sstables_loader_helpers: just reformat the code sstables_loader_helpers: generalize argument and variable names sstables_loader_helpers: generalize get_sstables_for_tablet sstables_loader_helpers: add token getters for tablet filtering sstables_loader_helpers: remove underscores from struct members sstables_loader: move 
download_sstable and get_sstables_for_tablet sstables_loader: extract single-tablet SST filtering sstables_loader: make download_sstable static sstables_loader: fix formating of the new `download_sstable` function sstables_loader: extract single SST download into a function sstables_loader: add shard_id to minimal_sst_info sstables_loader: add function for parsing backup manifests split utility functions for creating test data from database_test export make_storage_options_config from lib/test_services rjson: Add helpers for conversions to dht::token and sstable_id Add system_distributed_keyspace.snapshot_sstables add get_system_distributed_keyspace to cql_test_env code: Add system_distributed_keyspace dependency to sstables_loader storage_service: Export export handle_raft_rpc() helper storage_service: Export do_tablet_operation() storage_service: Split transit_tablet() into two tablets: Add braces around tablet_transition_kind::repair switch	2026-05-12 16:24:13 +03:00
Yaniv Michael Kaul	c359a09189	test: add UDF/UDA keyspace isolation and UDT tests Port 3 tests from scylla-dtest user_functions_test.py: - test_udf_with_udt: UDF taking frozen UDT arg, verifies DROP TYPE blocked - test_udf_with_udt_keyspace_isolation: cross-keyspace UDT references rejected - test_aggregate_with_udt_keyspace_isolation: cross-keyspace UDT in UDA rejected All tests use Lua (Scylla's supported UDF language). Reproduces CASSANDRA-9409. Closes scylladb/scylladb#1928 Closes scylladb/scylladb#29843	2026-05-12 14:57:14 +03:00
Yaniv Michael Kaul	f55a55fbf3	docker: fix coredump collection when host uses pipe-based core_pattern The container image inherits kernel.core_pattern from the host. When the host pipes core dumps to a handler (e.g. Ubuntu's apport), that handler does not exist or work correctly inside the container, so core dumps are silently lost. Override any pipe-based core_pattern with a file-based pattern that writes directly to /var/lib/scylla/coredump/. The override is attempted both from the entrypoint (scyllasetup.coredumpSetup) and from scylla-server.sh when running as root; it succeeds only when the container has write access to /proc/sys/kernel/core_pattern and is silently skipped otherwise. Fixes: SCYLLADB-1366 Closes scylladb/scylladb#29337	2026-05-12 14:16:22 +03:00
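The override logic described in the commit above can be sketched as follows. A Linux `core_pattern` whose first character is `|` pipes dumps to a handler; anything else is treated as a file path template. The sketch is not the code in `scyllasetup` or `scylla-server.sh`: the function names are hypothetical, and the file name template `core.%e.%p.%t` is an assumption (the commit only names the `/var/lib/scylla/coredump/` directory).

```python
def file_based_core_pattern(current: str, coredump_dir: str = "/var/lib/scylla/coredump") -> str:
    """Return a file-based pattern if `current` pipes to a handler, else `current`."""
    current = current.strip()
    if current.startswith("|"):
        # Hypothetical template: %e = executable name, %p = PID, %t = dump time.
        return f"{coredump_dir}/core.%e.%p.%t"
    return current


def override_core_pattern(path: str = "/proc/sys/kernel/core_pattern") -> bool:
    """Try to replace a pipe-based pattern; return True only if it was rewritten.

    Failures (for example, no write access inside the container) are skipped
    silently, mirroring the behaviour the commit message describes.
    """
    try:
        with open(path) as f:
            current = f.read().strip()
        wanted = file_based_core_pattern(current)
        if wanted == current:
            return False
        with open(path, "w") as f:
            f.write(wanted + "\n")
        return True
    except OSError:
        return False


print(file_based_core_pattern("|/usr/share/apport/apport -p%p -s%s"))
print(file_based_core_pattern("core"))
```

Passing `path` explicitly makes the helper testable against an ordinary file instead of the real `/proc` entry.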
Piotr Smaron	1018710e38	test/cqlpy: un-xfail oversized indexed value build test Issue #8627 is fixed, so test_too_large_indexed_value_build now passes and should run normally instead of XPASSing under strict xfail. Fixes: SCYLLADB-1938 Closes scylladb/scylladb#29853	2026-05-12 11:40:53 +02:00
Avi Kivity	ddb1181103	Merge 'load_balance: fix drain with forced capacity-based balancing' from Ferenc Szili When `force_capacity_based_balancing` is enabled and a node is being drained/excluded, the tablet allocator incorrectly aborts balancing due to incomplete tablet stats - even though capacity-based balancing doesn't depend on tablet sizes. The tablet allocator normally waits for complete load stats before balancing. An exception exists for drained+excluded nodes (they're unreachable and won't return stats). However, when forced capacity-based balancing is active, this exception was not being applied, causing the balancer to reject the drain plan. Adjust the condition in `tablet_allocator.cc` so that the "ignore missing data for drained nodes" logic applies regardless of whether capacity-based balancing is forced. Added a Boost unit test that forces capacity-based balancing and verifies a drained/excluded node gets its tablets migrated even when tablet size stats are missing. This bug was introduced in 2026.1, so this needs to be backported to 2026.1 and 2026.2 Fixes: SCYLLADB-1803 Closes scylladb/scylladb#29791 * github.com:scylladb/scylladb: test: boost: add drain test for forced capacity-based balancing service: allow draining with forced capacity-based balancing	2026-05-12 12:38:25 +03:00
Andrzej Jackowski	89261bf759	test: wait for TTL scheduling sanity metric The test samples sl:default runtime before and after setup writes to prove that it measures the scheduling group used by regular CQL writes. The metric is exported in milliseconds, so a single 200-row batch may not be visible immediately, or may be too small in some environments. Keep the original 200-row table size, but wait up to 30 seconds for the metric to advance. If it does not, retry the same writes before TTL is enabled. The retries update the same keys, so the expiration part of the test still waits for exactly the original number of rows. In a local 100-run with N=200 rows, the observed delta of `ms_statement_before - ms_statement_before_write` was: min=4.0, max=16.0, mean=8.13, and median=8.0. Therefore, it looks possible that in a rare corner case the delta drops even to 0. Fixes SCYLLADB-1869 Closes scylladb/scylladb#29797	2026-05-12 12:38:25 +03:00
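The wait-then-retry approach from the commit above can be sketched as a generic polling helper. This is not the test's code: `do_writes` and `read_metric` are hypothetical callables standing in for the 200-row write batch and the `sl:default` runtime metric, and the number of attempts is an assumption.

```python
import time


def wait_for_metric_to_advance(read_metric, baseline, timeout=30.0, interval=0.5):
    """Poll until the metric exceeds baseline; return its value, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        value = read_metric()
        if value > baseline:
            return value
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


def write_until_metric_advances(do_writes, read_metric, attempts=3, timeout=30.0, interval=0.5):
    """Repeat the same writes until the metric visibly advances; return the delta."""
    baseline = read_metric()
    for _ in range(attempts):
        # Retries rewrite the same keys, so the row count stays unchanged.
        do_writes()
        value = wait_for_metric_to_advance(read_metric, baseline, timeout, interval)
        if value is not None:
            return value - baseline
    raise AssertionError("metric did not advance after retries")
```

Because the metric is exported in whole milliseconds, a single small batch may legitimately show a delta of 0, which is the case the retry covers.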
Avi Kivity	6fca064ac8	Merge 'alternator: a couple of small cleanups suggested by copilot' from Nadav Har'El The first patch improves the input validation of the CONTAINS operator. I believe this is not a critical fix, because RapidJSON already has exception-throwing RAPIDJSON_ASSERT() that check for unexpected JSON structure (like something we expect to be a list isn't actually a list), but it's cleaner to do these checks explicitly. The second patch just removes an unnecessary call to format() on a constant string. Closes scylladb/scylladb#28506 * github.com:scylladb/scylladb: alternator: remove unneeded call to format() alternator: improve CONTAINS operator's validity checking	2026-05-12 12:38:25 +03:00
Botond Dénes	8d6f031a4a	schema: fix DESCRIBE showing NullCompactionStrategy when compaction is disabled When a table's compaction is disabled via 'enabled': 'false', the DESCRIBE output incorrectly showed NullCompactionStrategy instead of the actual strategy. This happened because schema_properties() called compaction_strategy(), which returns compaction_strategy_type::null when compaction is disabled. Fix it by using configured_compaction_strategy(), which always returns the real strategy type - consistent with how schema_tables.cc serializes it to disk. Fixes SCYLLADB-1353 Closes scylladb/scylladb#29804	2026-05-12 12:38:25 +03:00
Piotr Dulikowski	7c2b1ea0b5	Merge 'view_building: fix tombstone_warn_threshold warnings' from Michał Jadwiszczak `system.view_building_tasks` is a single-partition Raft group0 table (pk = `"view_building"`, CK = timeuuid). When `clean_finished_tasks()` deletes hundreds of finished tasks, the physical rows remain in SSTables until compaction. Any subsequent read of the partition counts every column of every tombstoned row as a dead cell, triggering `tombstone_warn_threshold` warnings in large clusters. Two-part fix: 1. Range tombstones instead of row tombstones (commits 2–3) Instead of one row tombstone per finished task, find the minimum alive task UUID (`min_alive_uuid`) and emit a single range tombstone `[before_all, min_alive_uuid)` covering all tasks below that boundary. This reduces the tombstone count significantly and also benefits future compaction. 2. Bounded scan with `min_task_id` (commits 4–6) Even with range tombstones, physical rows remain until compaction and still count as dead cells during reads. The only way to avoid them is to not read them at all. - Add a `min_task_id timeuuid` static column to `system.view_building_tasks`. - On every GC, write `min_task_id = min_alive_uuid` atomically with the range tombstone (same Raft batch). - On reload, read `min_task_id` first using a static-only partition slice (empty `_row_ranges` + `always_return_static_content`): the SSTable reader stops immediately after the static row before processing any clustering tombstones — zero dead cells counted. - Use `AND id >= min_task_id` as a lower bound for the main task scan, skipping all tombstoned rows. The static-only read and the bounded scan are gated on the `VIEW_BUILDING_TASKS_MIN_TASK_ID` cluster feature so mixed-version clusters fall back to the full scan. The issue is not critical, so the fix shouldn't be backported. 
Fixes SCYLLADB-657 Closes scylladb/scylladb#28929 * github.com:scylladb/scylladb: test/cluster/test_view_building_coordinator: add reproducer for tombstone threshold warning docs: document tombstone avoidance in view_building_tasks view_building: add `task_uuid_generator` to `view_building_task_mutation_builder` view_building: introduce `task_uuid_generator` view_building: store `min_alive_uuid` in view building state view_building: set min_task_id when GC-ing finished tasks view_building: add min_task_id support to view_building_task_mutation_builder view_building: add min_task_id static column and bounded scan to system_keyspace view_building: use range tombstone when GC-ing finished tasks view_building: add range tombstone support to view_building_task_mutation_builder view_building: introduce VIEW_BUILDING_TASKS_MIN_TASK_ID cluster feature	2026-05-12 12:38:25 +03:00
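The two-part fix in the merge above can be illustrated with a toy model of the partition. This is a sketch under simplifying assumptions, not Scylla's storage engine: integers stand in for timeuuid clustering keys, rows under the range tombstone stay physically present (as they would until compaction), and the class and method names are hypothetical. What happens when no task is alive is not stated in the commit, so the sketch simply does nothing in that case.

```python
class ViewBuildingTasksModel:
    """Toy model of system.view_building_tasks (one partition, sorted by task id)."""

    def __init__(self):
        self.rows = {}               # task id -> finished flag (physical rows)
        self.tombstone_end = None    # range tombstone [before_all, tombstone_end)
        self.min_task_id = None      # static column written together with the tombstone

    def add_task(self, task_id, finished=False):
        self.rows[task_id] = finished

    def gc_finished_tasks(self):
        alive = [tid for tid, finished in self.rows.items() if not finished]
        if not alive:
            return
        min_alive = min(alive)
        # One range tombstone and one static-column write, applied atomically.
        self.tombstone_end = min_alive
        self.min_task_id = min_alive

    def scan(self, bounded):
        """Return (live task ids, number of dead rows the read had to process)."""
        lower = self.min_task_id if bounded else None
        live, dead_rows_seen = [], 0
        for tid in sorted(self.rows):
            if lower is not None and tid < lower:
                continue  # id >= min_task_id: these rows are never read
            if self.tombstone_end is not None and tid < self.tombstone_end:
                dead_rows_seen += 1
                continue
            live.append(tid)
        return live, dead_rows_seen


model = ViewBuildingTasksModel()
for tid in range(1, 6):
    model.add_task(tid, finished=True)
model.add_task(6)
model.add_task(7, finished=True)
model.add_task(8)
model.gc_finished_tasks()
print(model.scan(bounded=False))  # full scan still walks the tombstoned rows
print(model.scan(bounded=True))   # bounded scan skips them entirely
```

Both scans return the same live tasks; only the bounded one avoids touching dead rows, which is what keeps the read under `tombstone_warn_threshold`. Finished tasks above the boundary (task 7 here) are left alone in this sketch.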

53948 Commits