scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 11:10:40 +00:00

Author	SHA1	Message	Date
Botond Dénes	55704908a0	data_dictionary: table: add get_truncation_time() So the batchlog manager can avoid looking up the real table and instead just work with data dictionary.	2025-12-02 14:21:25 +02:00
Avi Kivity	ce2a403f18	Merge 'alternator: implement gzip-compressed requests' from Nadav Har'El In this series we implement Alternator's support for gzip-compressed requests, i.e., requests with the "Content-Encoding: gzip" header, other uncompressed headers, and a gzip-compressed body. The server needs to verify the signature of the compressed content, and then uncompress the body before running the request. We only support gzip compression because this is what DynamoDB supports. But in the future we can easily add support for other compression algorithms like lz4 or zstd. This series Refs #5041 but doesn't "Fixes" it because it only implements compressed requests (Content-Encoding), not compressed responses (Accept-Encoding). In addition to the code changes, the series also contains tests for this feature that make sure it behaves like DynamoDB. Note that while our server will now support compressed requests, just like DynamoDB does, the clients (AWS SDKs) will probably NOT make use of it because they do not enable request compression by default. For example, see the tests for some hoops one needs to jump through in boto3 (the Python SDK) to send compressed requests. However, we are hoping that in the future Alternator's modified clients will use compressed requests and enjoy this feature. Closes scylladb/scylladb#27080 * github.com:scylladb/scylladb: test/alternator: enable, and add, tests for gzip'ed requests alternator: implement gzip-compressed requests	2025-11-30 13:27:46 +02:00
Avi Kivity	d4be9a058c	Update seastar submodule seastar::compat::source_location (which should not have been used outside Seastar) is replaced with std::source_location to avoid deprecation warnings. The relevant header, which was removed, is no longer included. * seastar 8c3fba7a...b5c76d6b (3): > testing: There can be only one memory_data_sink > util: Use std::source_location directly > Merge 'net: support proxy protocol v2' from Avi Kivity apps: httpd: add --load-balancing-algorithm apps: httpd: add /shard endpoint test: socket_test: add proxy protocol v2 test suite test: socket_test: test load balancer with proxy protocol net: posix_connected_socket: specialize for proxied connections net: posix_server_socket_impl: implement proxy protocol in server sockets net: posix_server_socket_impl: adjust indentation net: posix_server_socket_impl: avoid immediately-invoked lambda net: conntrack: complete handle nested class special member functions net: posix_server_socket_impl: coroutinize accept() Closes scylladb/scylladb#27316	2025-11-30 12:38:47 +02:00
Piotr Dulikowski	44c605e59c	Merge 'Fix the types of change events in Alternator Streams' from Piotr Wieczorek This patch increases the compatibility with DynamoDB Streams by integrating the DynamoDB's event type rules (described in https://github.com/scylladb/scylladb/issues/6918) into Alternator. The main changes are: - introduce a new flag `alternator_streams_strict_compatibility`, meant as a guard of performance-intensive operations that increase the compatibility with DynamoDB Streams. If enabled, Alternator always performs a RBW before a data-modifying operation, and propagates its result to CDC. Then, the old item is compared to the new one, to determine the mutation type (INSERT vs MODIFY). This option is a no-op for tables with disabled Alternator Streams, - reduce splitting of simple Alternator mutations, - correctly distinguish event types described in #6918, except for item deletes. Deleting a missing item with DeleteItem, BatchWriteItem, or a missing field with UpdateItem still emit REMOVEs. To summarize, the emitted events of the data manipulation operations should be as follows: - DeleteItem/BatchWriteItem.DeleteItem of existing item: REMOVE (OK) - DeleteItem of nonexistent item: nothing (OK) - BatchWriteItem.DeleteItem of nonexistent item: nothing (OK) - PutItem/UpdateItem/BatchWriteItem.PutItem of existing and not equal item: MODIFY (OK) - PutItem/UpdateItem/BatchWriteItem.PutItem of existing and equal item: nothing (OK) - PutItem/UpdateItem/BatchWriteItem.PutItem of nonexistent item: INSERT (OK) No backport is necessary. 
Refs https://github.com/scylladb/scylladb/pull/26149 Refs https://github.com/scylladb/scylladb/pull/26396 Refs https://github.com/scylladb/scylladb/issues/26382 Fixes https://github.com/scylladb/scylladb/issues/6918 Closes scylladb/scylladb#26121 * github.com:scylladb/scylladb: test/alternator: Enable the tests failing because of #6918 alternator, cdc: Don't emit events for no-op removes alternator, cdc: Don't emit an event for equal items alternator/streams, cdc: Differentiate item replace and item update in CDC alternator: Change the return type of rmw_operation_return config: Add alternator_streams_strict_compatibility flag cdc: Don't split a row marker away from row cells	2025-11-30 07:20:22 +01:00
Asias He	da5cc13e97	repair: Fix deadlock when topology coordinator steps down in the middle Consider this: 1) n1 is the topology coordinator 2) n1 schedules and executes a tablet repair with session id s1 for a tablet on n3 and n4. 3) n3 and n4 take the lock and store it in _rs._repair_compaction_locks[s1] 4) n1 steps down before it executes locator::tablet_transition_stage::end_repair 5) n2 becomes the new topology coordinator 6) n2 runs locator::tablet_transition_stage::repair again 7) n3 and n4 try to take the lock again and hang since the lock is already taken. To avoid the deadlock, we can throw in step 7 so that n2 will proceed to the end_repair stage and release the lock. After that, the scheduler could schedule the tablet repair request again. Fixes #26346 Closes scylladb/scylladb#27163	2025-11-28 15:14:39 +01:00
Radosław Cybulski	b54a9f4613	Fix use-after-free in encode_paging_state in Alternator Fix unlikely use-after-free in `encode_paging_state`. The function incorrectly assumes that the current position to encode will always have data for all clustering columns the schema defines. It's possible to encounter a current position that has fewer than all columns specified, for example in the case of a range tombstone. Those don't happen in Alternator tables, as DynamoDB doesn't allow range deletions and the clustering key has at most one column. However, the Alternator API can be used to read Scylla system tables, and those do have range tombstones with more than a single clustering column. The fix is to stop trying to encode columns that don't have a value - they are not needed anyway, as there's no possible position with those values (the range tombstone made sure of that). Fixes #27001 Fixes #27125 Closes scylladb/scylladb#26960	2025-11-28 16:51:15 +03:00
Pavel Emelyanov	d35ce81ff1	Merge 'test: wait for read_barrier in wait_until_driver_service_level_created' from Andrzej Jackowski Previously, `wait_until_driver_service_level_created` only waited for the `driver` service level to appear in the output of `LIST ALL SERVICE_LEVELS`. However, the fact that one node lists `sl:driver` does not necessarily mean that all other nodes can see it yet. This caused sporadic test failures, especially in DEBUG builds. To prevent these failures, this change adds an extra wait for a `raft/read_barrier` after the `driver` service level first appears. This ensures the service level is globally visible across the cluster. Fixes: https://github.com/scylladb/scylladb/issues/27019 No backport - this is a test fix for `sl:driver` tests, which are only available on `master` Closes scylladb/scylladb#27076 * github.com:scylladb/scylladb: test: wait for read_barrier in wait_until_driver_service_level_created test: use ManagerClient in wait_until_driver_service_level_created	2025-11-28 16:47:29 +03:00
Dawid Mędrek	b76af2d07f	cql3: Improve errors when manipulating default service level Before this commit, any attempt to create, alter, attach, or drop the default service level would result in a syntax error whose error message was unclear: ``` cqlsh> attach service level default to cassandra; SyntaxException: line 1:21 no viable alternative at input 'default' ``` The error stems from the grammar not being able to parse `default` as a correct service level name. To fix that, we cover it manually. This way, the grammar accepts it and we can process it in Scylla. The reason why we'd like to cover the default service level is that it's an actual service level that the user should reference. Getting a syntax error is not what should happen. Hence this fix. We validate the input and if the given role is really the default service level, we reject the query and provide an informative error message. Two validation tests are provided. Fixes scylladb/scylladb#26699 Closes scylladb/scylladb#27162	2025-11-28 15:32:37 +03:00
Calle Wilund	59c87025d1	commitlog::read_log_file: Check for eof position on all data reads Fixes #24346 When reading, we check for each entry and each chunk, if advancing there will hit EOF of the segment. However, IFF the last chunk being read has the last entry _exactly_ matching the chunk size, and the chunk ending at _exactly_ segment size (preset size, typically 32Mb), we did not check the position, and instead complained about not being able to read. This has literally _never_ happened in actual commitlog (that was replayed at least), but has apparently happened more and more in hints replay. Fix is simple, just check the file position against size when advancing said position, i.e. when reading (skipping already does). v2: * Added unit test Closes scylladb/scylladb#27236	2025-11-28 15:26:46 +03:00
Michael Litvak	97b7c03709	tablet: scheduler: Do not emit conflicting migration in merge colocation The tablet scheduler should not emit conflicting migrations for the same tablet. This was addressed initially in scylladb/scylladb#26038 but the check is missing in the merge colocation plan, so add it there as well. Without this check, the merge colocation plan could generate a conflicting migration for a tablet that is already scheduled for migration, as the test demonstrates. This can cause correctness problems, because if the load balancer generates two migrations for a single tablet, both will be written as mutations, and the resulting mutation could contain mixed cells from both migrations. Fixes scylladb/scylladb#27304 Closes scylladb/scylladb#27312	2025-11-28 11:17:12 +01:00
Pavel Emelyanov	54edb44b20	code: Stop using seastar::compat::source_location And switch to std::source_location. Upcoming seastar update will deprecate its compatibility layer. The patch is for f in $(git grep -l 'seastar::compat::source_location'); do sed -e 's/seastar::compat::source_location/std::source_location/g' -i $f; done and removal of few header includes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#27309	2025-11-27 19:10:11 +02:00
Andrzej Jackowski	e366030a92	treewide: seastar module update The reason for this seastar update is to have the fixed handling of the `integer` type in `seastar-json2code` because it's needed for further development of ScyllaDB REST API. The following changes were introduced to ScyllaDB code to ensure it compiles with the updated seastar: - Remove `seastar/util/modules.hh` includes as the file was removed from seastar - Modified `metrics::impl::labels_type` construction in `test/boost/group0_test.cc` because now it requires `escaped_string` * seastar 340e14a7...8c3fba7a (32): > Merge 'Remove net::packet usage from dns.cc' from Pavel Emelyanov dns: Optimize packet sending for newer c-ares versions dns: Replace net::packet with vector<temporary_buffer> dns: Remove unused local variable dns: Remove pointless for () loop wrapping dns: Introduce do_sendv_tcp() method dns: Introduce do_send_udp() method > test: Add http rules test of matching order > Merge 'Generalize packet_data_source into memory_data_source' from Pavel Emelyanov memcached: Patch test to use memory_data_source memcached: Use memory_data_source in server rpc: Use memory_data_sink without constructing net::packet util: Generalize packet_data_source into memory_data_source > tests: coroutines: restore "explicit this" tests > reactor: remove blocking of SIGILL > Merge 'Update compilers in GH actions scripts' from Pavel Emelyanov github: Use gcc-14 github: Use clang-20 > Merge 'Reinforce DNS reverse resolution test ' from Pavel Emelyanov test: Make test_resolve() try several addresses test: Coroutinize test_resolve() helper > modules: make module support standards-compliant > Merge 'Fix incorrect union access in dns resolver' from Pavel Emelyanov dns: Squash two if blocks together dns: Do not check tcp entry for udp type > coroutine: Fix compilation of execute_involving_handle_destruction_in_await_suspend > promise: Document that promise is resolved at most once > coroutine: exception: workaround broken 
destroy coroutine handle in await_suspend > socket: Return unspecified socket_address for unconnected socket > smp: Fix exception safety of invoke_on_... internal copying > Merge 'Improve loads evaluation by reactor' from Pavel Emelyanov reactor: Keep loads timer on reactor reactor: Update loads evaluation loop > Merge 'scripts: add 'integer' type to seastar-json2code' from Andrzej Jackowski test: extend tests/unit/api.json to use 'integer' type scripts: add 'integer' type to seastar-json2code > Merge 'Sanitize tls::session::do_put(_one)? overloads' from Pavel Emelyanov tls: Rename do_put_one(temporary_buffer) into do_put() tls: Fix indentation after previous patch tls: Move semaphore grab into iterating do_put() > net: tcp: change unsent queue from packets to temporary_buffer:s > timer: Enable highres timer based on next timeout value > rpc: Add a new constructor in closed_error to accept string argument > memcache: Implement own data sink for responses > Merge 'file: recursive_remove_directory: general cleanup' from Avi Kivity file: do_recursive_remove_directory(): move object when popping from queue file: do_recursive_remove_directory(): adjust indentation file: do_recursive_remove_directory(): coroutinize file: do_recursive_remove_directory(): simplify conditional file: do_recursive_remove_directory(): remove wrong const file: do_recursive_remove_directory(): clean up work_entry > tests: Move thread_context_switch_test into perf/ > test: Add unit test for append_challenged_posix_file > Merge 'Prometheus metrics handler optimization' from Travis Downs prometheus: optimize metrics aggregation prometheus: move and test aggregate_by helper prometheus: various optimizations metrics: introduce escaped_string for label values metric:value: implement + in terms of += tests: add prometheus text format acceptance tests extract memory_data_sink.hh metrics_perf: enhance metrics bench > demos: Simplify udp_zero_copy_demo's way of preparing the packet > metrics: Remove 
deprecated make_...-ers > Merge 'Make slab_test be BOOST kind' from Pavel Emelyanov test: Use BOOST_REQUIRE checkers test: Replace some SEASTAR_ASSERT-s with static_assert-s test: Convert slab test into boost kind > Merge 'Coroutinize lister_test' from Pavel Emelyanov test: Fix indentation after previuous patch test: Coroutinize lister_test lister::report() method test: Coroutinize lister_test main code > file: recursive_remove_directory(): use a list instead of a deque > Merge 'Stop using packets in tls data_sink and session' from Pavel Emelyanov tls: Stop using net::packet in session::put() tls: Fix indentation after previous patch tls: Split session::do_put() tls: Mark some session methods private Closes scylladb/scylladb#27240	2025-11-27 12:34:22 +02:00
Nadav Har'El	32afcdbaf0	test/alternator: enable, and add, tests for gzip'ed requests After the previous patch implemented support in Alternator for gzip-compressed requests ("Content-Encoding: gzip"), here we enable an existing xfail-ing test for this feature, and also add more tests for more cases: * A test for longer compressed requests, or a short compressed request which expands to a longer request. Since the decompression uses small buffers, this test reaches additional code paths. * Check for various cases of a malformed gzip'ed request, and also an attempt to use an unsupported Content-Encoding. DynamoDB returns error 500 for both cases, so we want to test that we do too - and not silently ignore such errors. * Check that two concatenated gzip'ed streams are a valid request, and check that garbage at the end of the gzip - or a missing character at the end of the gzip - is recognized as an error. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-11-27 09:42:47 +02:00
Patryk Jędrzejczak	cc273e867d	Merge 'fix notification about expiring erm held for to long' from Gleb Natapov Commit `6e4803a750` broke notification about expired erms held for too long since it resets the tracker without calling its destructor (where notification is triggered). Fix the assign operator to call the destructor like it should. Fixes https://github.com/scylladb/scylladb/issues/27141 Closes scylladb/scylladb#27140 * https://github.com/scylladb/scylladb: test: test that expired erm that held for too long triggers notification token_metadata: fix notification about expiring erm held for to long	2025-11-26 12:59:00 +01:00
Nadav Har'El	9cde93e3da	Merge 'db/view/view_building_coordinator: get rid of task's state in group0' from Michał Jadwiszczak Previously, the view building coordinator relied on setting each task's state to STARTED and then explicitly removing these state entries once tasks finished, before scheduling new ones. This approach induced a significant number of group0 commits, particularly in large clusters with many nodes and tablets, negatively impacting performance and scalability. With the update, the coordinator and worker logic has been restructured to operate without maintaining per-task states. Instead, tasks are simply tracked with an aborted boolean flag, which is still essential for certain tablet operations. This change removes much of the coordination complexity, simplifies the view building code, and reduces operational overhead. In addition, the coordinator now batches reports of finished tasks before making commits. Rather than committing task completions individually, it aggregates them and reports in groups, significantly minimizing the frequency of group0 commits. This new approach is expected to improve efficiency and scalability during materialized view construction, especially in large deployments. Fixes https://github.com/scylladb/scylladb/issues/26311 This patch needs to be backported to 2025.4. Closes scylladb/scylladb#26897 * github.com:scylladb/scylladb: docs/dev/view-building-coordinator: update the docs after recent changes db/view/view_building: send coordinator's term in the RPC db/view/view_building_state: replace task's state with `aborted` flag db/view/view_building_coordinator: batch finished tasks reporting db/view/view_building_worker: change internal implementation db/view/view_building_coordinator: change `work_on_tasks` RPC return type	2025-11-26 11:35:44 +02:00
Botond Dénes	384bffb8da	Merge 'compaction: limit the maximum shares allocated to a compaction scheduling class' from Raphael Raph Carvalho This PR adds support for limiting the maximum shares allocated to a compaction scheduling class by the compaction controller. It introduces a new configuration parameter, compaction_max_shares, which, when set to a non zero value, will cap the shares allocated to compaction jobs. This PR also exposes the shares computed by the compaction controller via metrics, for observability purposes. Fixes https://github.com/scylladb/scylladb/issues/9431 Enhancement. No need to backport. NOTE: Replaces PR https://github.com/scylladb/scylladb/pull/26696 Ran a test in which the backlog raised the need for max shares (normalized backlog above normalization_factor), and played with different values for new option compaction_max_shares to see it works (500, 1000, 2000, 250, 50) Closes scylladb/scylladb#27024 * github.com:scylladb/scylladb: db/config: introduce new config parameter `compaction_max_shares` compaction_manager:config: introduce max_shares compaction_controller: add configurable maximum shares compaction_controller: introduce `set_max_shares()`	2025-11-26 06:51:30 +02:00
Botond Dénes	584f4e467e	tools/scylla-sstable: introduce the dump-schema command There is a limited number of ways to obtain the schema of a table: 1) Use DESCRIBE TABLE in cqlsh 2) Find the schema definition in the code (for system tables) 3) Ask support/user to provide schema 4) Piece together the schema definition from the system tables Option (1) is the most convenient but requires access to a live cluster. (2) is limited to system tables only. When investigating issues for customers, we have to rely on (3) and this often adds communication round-trips and delays. (4) requires knowledge of ScyllaDB internals and access to system tables. The new dump-schema command provides a convenient way to obtain the schema of tables, given that there is access to either an sstable or the system tables. It can dump the schema of system tables without either. Closes scylladb/scylladb#26433	2025-11-25 20:32:36 +03:00
Gleb Natapov	5dcdaa6f66	test: test that expired erm that held for too long triggers notification	2025-11-25 17:33:54 +02:00
Piotr Dulikowski	ff5c7bd960	Merge 'topology_coordinator: don't repair colocated tablets' from Michael Litvak With the introduction of colocated tables, all the tablet transitions now operate on groups of colocated tablets instead of individual tablets. This applies to tablet migration and also to tablet repair. The tablet repair currently doesn't work on individual tablets due to the limitations in the tablet map being shared. The way it was implemented to work on a group of colocated tablets is by repairing all the colocated tablets together, using a dedicated rpc, and setting a shared repair_time in the shared tablet map. It was implemented this way because we wanted to have some way to repair the tablets of a colocated table. However, we want to change this in the next release so that it will be possible to repair the tablets of a colocated table individually. In order to simplify and prepare for the future change, we prefer until then to not repair colocated tables at all. Otherwise, we will need to support both the shared repair and individual repair together for a long time, and the upgrade will be more complicated. We change the handling of the tablet 'repair' transition to repair only the base table's tablets. It means it will not be possible to request tablet repair for a non-base colocated table such as local MV, CDC and paxos table. This restriction will be temporary until a later release where we will support repairing colocated tablets. This is a reasonable restriction because repair for these kinds of tables is not required or as important as for normal tables. 
Fixes https://github.com/scylladb/scylladb/issues/27119 backport to 2025.4 since we must change it in the same version it's introduced before it's released Closes scylladb/scylladb#27120 * github.com:scylladb/scylladb: tombstone_gc: don't use 'repair' mode for colocated tables Revert "storage service: add repair colocated tablets rpc" topology_coordinator: don't repair colocated tablets	2025-11-25 14:58:06 +01:00
Michał Jadwiszczak	fb8cbf1615	db/view/view_building: send coordinator's term in the RPC To avoid the case where an old coordinator (which hasn't been stopped yet) dictates what should be done, add the raft term to the `work_on_view_building_tasks` RPC. The worker needs to check whether the term matches the current term from the raft server, and deny the request when it does not.	2025-11-25 12:14:05 +01:00
Nadav Har'El	bcd1758911	Merge 'vector_search: add validator tests' from Pawel Pery The vector-search-validator is a binary tool which runs functional and integration tests between scylla and vector-store. It is built in Rust, mainly in the vector-store repository. This patch adds the possibility to write tests on the scylladb repository side, compile them together with vector-store tests and run them in the `test.py` environment. There are three parts of the change: - add sources of validator to the `test/vector_search_validator` directory - add support for building validator and vector-store in the `build/vector-search-validator/bin` directory with or without cmake - add support for `pytest` and `test.py` to run validator tests locally and in the CI environment; this part also adds a README to the `test/vector_search_validator` directory Design for validator integration tests: https://scylladb.atlassian.net/wiki/spaces/RND/pages/39518215/Vector+Search+Core+Test+Plan+Document References: VECTOR-50 No backport needed as this is a new functionality. Closes scylladb/scylladb#26653 * github.com:scylladb/scylladb: vector_search: add vector-search-validator tests vector_search: implement building vector-search-validator vector_search: add vector-search-validator sources	2025-11-25 10:34:33 +02:00
Michael Litvak	868ac42a8b	tombstone_gc: don't use 'repair' mode for colocated tables For tables of special types that can be colocated (MV, CDC, and paxos tables), we should not use tombstone_gc=repair mode because colocated tablets are never repaired, hence they will not have repair_time set and will never be GC'd using 'repair' mode.	2025-11-25 09:15:46 +01:00
Michael Litvak	273f664496	topology_coordinator: don't repair colocated tablets With the introduction of colocated tables, all the tablet transitions now operate on groups of colocated tablets instead of individual tablets. This applies to tablet migration and also to tablet repair. The tablet repair currently doesn't work on individual tablets due to the limitations in the tablet map being shared. The way it was implemented to work on a group of colocated tablets is by repairing all the colocated tablets together, using a dedicated rpc, and setting a shared repair_time in the shared tablet map. It was implemented this way because we wanted to have some way to repair the tablets of a colocated table. However, we want to change this in the next release so that it will be possible to repair the tablets of a colocated table individually. In order to simplify and prepare for the future change, we prefer until then to not repair colocated tables at all. Otherwise, we will need to support both the shared repair and individual repair together for a long time, and the upgrade will be more complicated. We change the handling of the tablet 'repair' transition to repair only the base table's tablets. It means it will not be possible to request tablet repair for a non-base colocated table such as local MV, CDC and paxos table. This restriction will be temporary until a later release where we will support repairing colocated tablets. This is a reasonable restriction because repair for these kinds of tables is not required or as important as for normal tables. Fixes scylladb/scylladb#27119	2025-11-25 09:05:59 +01:00
Karol Nowacki	ca62effdd2	vector_search: Restrict vector index tests to tablets only Vector indexes are going to be supported only for tablets (see VECTOR-322). As a result, tests using vector indexes will be failing when run with vnodes. This change ensures tests using vector indexes run exclusively with tablets. Fixes: VECTOR-49 Closes scylladb/scylladb#26843	2025-11-25 09:26:16 +02:00
Pawel Pery	9f10aebc66	vector_search: add vector-search-validator tests The commit adds functionality for `pytest` and `test.py` to run `vector-search-validator` in a `sudo unshare` environment. There are already two tests: the first, parametrized `test_validator.py::test_validator[test-case-name]`, runs the validator, and the second, `test_cargo_toml.py::test_cargo_toml`, checks whether the current `Cargo.toml` for the validator is correct. Documentation for these tests is provided in `README.md`.	2025-11-24 17:26:04 +01:00
Pawel Pery	3702e982b9	vector_search: implement building vector-search-validator The commit adds targets building `build/vector-search-validator/bin/{vector-store,vector-search-validator}`. The targets must be built for tests. They don't depend on build mode. The commit adds the target in `configure.py` and also in `cmake`.	2025-11-24 17:26:04 +01:00
Pawel Pery	e569a04785	vector_search: add vector-search-validator sources The commit adds validator sources, which use a combination of local files and vector-store's files. `build-env` defines the vector-store git repository and the revision on which the validator will be built. `cargo-toml-template` is a script that prints the current `Cargo.toml` to stdout. After updating `build-env`, the developer needs to regenerate the configuration with `./cargo-toml-template > Cargo.toml`. The git revision is used in several places in `Cargo.toml` and will be used for building `vector-store`, so for better handling it should be set up in only one place. The validator is divided into several crates so that it can be built within both the scylladb and vector-store repositories. Here we need to create a new validator crate with a simple `main` function and call `validator_engine::main` there. We provide tests written in the scylladb repo in the `validator-scylla` crate. The commit provides an empty `cql` test case, which should be filled in the future.	2025-11-24 17:26:04 +01:00
Gleb Natapov	39cec4ae45	topology: let banned node know that it is banned Currently, if a banned node tries to connect to a cluster, it fails to create connections but has no idea why, so from inside the node it looks like it has communication problems. This patch adds a new rpc, NOTIFY_BANNED, which is sent back to the node when its connection is dropped. On receiving the rpc, the node isolates itself and prints an informative message about why it did so. Closes scylladb/scylladb#26943	2025-11-24 17:12:13 +01:00
Lakshmi Narayanan Sreethar	9cb766f929	db/config: introduce new config parameter `compaction_max_shares` Add support for the new configuration parameter `compaction_max_shares`, and update the compaction manager to pass it down to the compaction controller when it changes. The shares allocated to compaction jobs will be limited by this new parameter. Fixes #9431 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-11-24 12:52:29 -03:00
Lakshmi Narayanan Sreethar	468b800e89	compaction_manager:config: introduce max_shares Introduce an updateable value `max_shares` to compaction manager's config. Also add a method `update_max_shares()` that applies the latest `max_shares` value to the compaction controller’s `max_shares`. This new variable will be connected to a config parameter in the next patch. Refs #9431 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2025-11-24 11:43:38 -03:00
Tomasz Grabiec	d4b77c422f	Merge 'load_stats: leaving replica could be std::nullopt' from Ferenc Szili When migrating tablet size during the end_migration tablet transition stage, we need the pending and leaving replica hosts. The leaving and pending replicas are gathered in objects of type std::optional<tablet_replica> and are not checked for a value before dereferencing, which could cause an exception in the topology coordinator. This patch adds a check for leaving and pending replicas, and only performs the tablet size migration if neither is empty. This bug was introduced in `10f07fb95a` This change also adds the ability to create a tablet size in load_stats during the end_migration stage of a tablet rebuild. We compute the new tablet size by averaging the tablet sizes of the existing replicas. This change also adds the virtual table tablet_sizes which contains tablet sizes of all the replicas of all the tablets in the cluster. A version containing this bug has not yet been released, so a backport is not needed. Closes scylladb/scylladb#27118 * github.com:scylladb/scylladb: test: add tests for tablet size migration during end_migration virtual_table: add tablet_sizes virtual table load_stats: update tablet sizes after migration or rebuild	2025-11-24 15:31:30 +01:00
Botond Dénes	296d7b8595	Merge 'Enable digest+checksum verification for file based streaming' from Taras Veretilnyk This patch enables integrity check in 'create_stream_sources()' by introducing a new 'sstable_data_stream_source_impl' class for handling the Data component of SSTables. The new implementation uses 'sstable::data_stream()' with 'integrity_check::yes' instead of the raw input_stream. These additional checks require reading the digest and CRC components from disk, which may introduce some I/O overhead. For uncompressed SSTables, this involves loading and computing checksums and digest from the data. For compressed SSTables - where checksums are already embedded - the cost comes from reading, calculating and verifying the diges. New test cases were added to verify that the integrity checks work correctly, detecting both data and digest mismatches. Backport is not required, since it is a new feature Fixes #21776 Closes scylladb/scylladb#26702 * github.com:scylladb/scylladb: file_stream_test: add sstable file streaming integrity verification test cases streaming: prioritize sender-side errors in tablet_stream_files sstables: enable integrity check for data file streaming sstables: Add compressed raw streaming support sstables: Allow to read digest and checksum from user provided file instance sstables: add overload of data_stream() to accept custom file_input_stream_options	2025-11-24 06:37:27 +02:00
Aleksandra Martyniuk	76174d1f7a	cql3: reject ALTER KEYSPACE if rf of datacenter with tablets is omitted In ALTER KEYSPACE, when a datacenter name is omitted, its replication factor is implicitly set to zero with vnodes, while with tablets, it remains unchanged. ALTER KEYSPACE should behave the same way for tablets as it does for vnodes. However, this can be dangerous as we may mistakenly drop the whole datacenter. Reject ALTER KEYSPACE if it changes replication factor, but omits a datacenter that currently contains tablet replicas. Fixes: https://github.com/scylladb/scylladb/issues/25549. Closes scylladb/scylladb#25731	2025-11-24 06:36:51 +02:00
Avi Kivity	85db7b1caf	Merge 'address_map: Use more efficient and reliable replication method' from Tomasz Grabiec Primary issue with the old method is that each update is a separate cross-shard call, and all later updates queue behind it. If one of the shards has high latency for such calls, the queue may accumulate and system will appear unresponsive for mapping changes on non-zero shards. This happened in the field when one of the shards was overloaded with sstables and compaction work, which caused frequent stalls which delayed polling for ~100ms. A queue of 3k address updates accumulated, because we update mapping on each change of gossip states. This made bootstrap impossible because nodes couldn't learn about the IP mapping for the bootstrapping node and streaming failed. To protect against that, use a more efficient method of replication which requires a single cross-shard call to replicate all prior updates. It is also more reliable, if replication fails transiently for some reason, we don't give up and fail all later updates. Fixes #26865 Closes scylladb/scylladb#26941 * github.com:scylladb/scylladb: address_map: Use barrier() to wait for replication address_map: Use more efficient and reliable replication method utils: Introduce helper for replicated data structures	2025-11-23 19:15:12 +02:00
Avi Kivity	b0643f8959	Merge 'db/config: enable `ms` sstable format by default' from Michał Chojnowski Trie-based sstable indexes are supposed to be (hopefully) a better default than the old BIG indexes. Make them the new default. If we change our mind, this change can be reverted later. New functionality, and this is a drastic change. No backport needed. Closes scylladb/scylladb#26377 * github.com:scylladb/scylladb: db/config: enable `ms` sstable format by default cluster/dtest/bypass_cache_test: switch from highest_supported_sstable_format to chosen_sstable_format api/system: add /system/chosen_sstable_version test/cluster/dtest: reduce num_tokens to 16	2025-11-23 13:52:57 +02:00
Karol Nowacki	c40b3ba4b3	vector_search: Add HTTPS support for vector store connections This commit introduces TLS encryption support for vector store connections. A new configuration option is added: - vector_store_encryption_options.truststore: path to the trust store file To enable secure connections, use the https:// scheme in the vector_store_primary_uri/vector_store_secondary_uri configuration options. Fixes: VECTOR-327	2025-11-22 08:18:45 +01:00
Ferenc Szili	39711920eb	test: add tests for tablet size migration during end_migration This change adds tests for the correctness of tablet size migration during the end_migration stage. This size migration can happen for tablet migrations and for tablet rebuild.	2025-11-21 16:58:11 +01:00
Taras Veretilnyk	3003669c96	file_stream_test: add sstable file streaming integrity verification test cases Add 'test_sstable_stream' to verify SSTable file streaming integrity check. The new tests cover both compressed and uncompressed SSTables and include: - Checksum mismatch detection verification - Digest mismatch detection verification	2025-11-21 12:52:35 +01:00
Michał Chojnowski	da51a30780	db/config: enable `ms` sstable format by default Trie-based sstable indexes are supposed to be (hopefully) a better default than the old BIG indexes. Make them the new default. If we change our mind, this change can be reverted later.	2025-11-21 12:39:46 +01:00
Michał Chojnowski	73090c0d27	cluster/dtest/bypass_cache_test: switch from highest_supported_sstable_format to chosen_sstable_format Trie-based indexes and older indexes have a difference in metrics, and the test uses the metrics to check for bypass cache. To choose the right metrics, it uses highest_supported_sstable_format, which is inappropriate, because the sstable format chosen for writes by Scylla might be different than highest_supported_sstable_format. Use chosen_sstable_format instead.	2025-11-21 12:39:46 +01:00
Michał Chojnowski	38e14d9cd5	api/system: add /system/chosen_sstable_version Returns the sstable version currently chosen for use in new sstables. We are adding it because some tests want to know what format they are writing (tests using upgradesstable, tests which check stats that only apply to one of the index types, etc). (Currently they are using `highest_supported_sstable_format` for this purpose, which is inappropriate, and will become invalid if a non-latest format is the default).	2025-11-21 12:39:46 +01:00
Botond Dénes	5c6813ccd0	test/cluster/test_repair.py: add test_repair_timestamp_difference Add a test which verifies that if two nodes have the same data, with different timestamps, repair will detect and fix the diverging timestamps. All our repair tests focus on difference in data and I remember writing this test multiple times in the past to quickly verify whether this works. Time to upstream this test. Closes scylladb/scylladb#26900	2025-11-21 14:19:51 +03:00
Nadav Har'El	66bd3dc22c	test/alternator: tests for request compression DynamoDB's documentation https://docs.aws.amazon.com/sdkref/latest/guide/feature-compression.html suggests that DynamoDB allows request bodies to be compressed (currently only by gzip). The purpose of this patch is to have a test reproducing this feature. The test shows us that indeed DynamoDB understands compressed requests using the "gzip" encoding, but Alternator does not, so the new test is xfail. As you can see in the test code, although the low-level SDK (botocore) can send compressed requests, this is not actually enabled for DynamoDB and we need to resort to some trickery to send compressed requests. But the point is that once we do manage to send compressed requests, the test shows us that they work properly on AWS, but fail on Alternator. The failure of the compressed requests on Alternator is reported like: An error occurred (ValidationException) when calling the PutItem operation: Parsing JSON failed: Invalid value. at 70459088 This error message should probably be improved (what is that high number?!) but of course even better would be to make it really work. By enabling tracing on alternator-server (e.g., edit test/cqlpy/run.py and add `'--logger-log-level', 'alternator-server=trace',`) we can see exactly what request the SDK sends Alternator. What we can see in the request is: 1. The request headers are uncompressed (this is expected in HTTP) 2. There is a header "Content-Encoding: gzip" 3. The request's body is binary, a full-fledged gzip output complete with a gzip magic in the beginning. Refs #5041 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27049	2025-11-21 10:48:33 +02:00
Botond Dénes	0cc5208f8e	Merge 'Add sstables_manager::config' from Pavel Emelyanov Currently sstables_manager keeps a reference on global db::config to configure itself. Most other services use their own specific configs with much less data on-board for the same purposes (e.g. #24841, #19051 and #23705 did the same for other services). This PR applies this approach to sstables_manager as well. Mostly it moves various values from db::config onto newly introduced struct sstables_manager::config, but it also adds specific tracking of sstable_file_io_extensions and patches tools/scylla-sstable not to use sstables_manager as "proxy" object to get db::config from along its calls. Shuffling components dependencies, no need to backport Closes scylladb/scylladb#27021 * github.com:scylladb/scylladb: sstables_manager: Drop db::config from sstables_manager tools/sstable: Make shard_of_with_tablets use db::config argument tools/sstable: Add db::config& to all operations tools/sstable: Get endpoints from storage manager sstables_manager: Hold sstable IO extensions on it sstables: Manager helper to grab file io extensions sstables_manager: Move default format on config sstables_manager: Move enable_sstable_data_integrity_check on config sstables_manager: Move data_file_directories on config sstables_manager: Move components_memory_reclaim_threshold on config sstables_manager: Move column_index_auto_scale_threshold on config sstables_manager: Move column_index_size on config sstables_manager: Move sstable_summary_ratio on config sstables_manager: Move enable_sstable_key_validation on config sstables_manager: Move available_memory on config code: Introduce sstables_manager::config sstables: Patch get_local_directories() to work on vector of paths code: Rename sstables_manager::config() into db_config()	2025-11-21 10:21:41 +02:00
Botond Dénes	f89bb68fe2	Merge 'cdc: Preserve properties when reattaching log table' from Dawid Mędrek When we enable CDC on a table, Scylla creates a log table for it. It has default properties, but the user may change them later on. Furthermore, it's possible to detach that log table by simply disabling CDC on the base table: ```cql /* Create a table with CDC enabled. The log table is created. */ CREATE TABLE ks.t (pk int PRIMARY KEY) WITH cdc = {'enabled': true}; /* Detach the log table. */ ALTER TABLE ks.t WITH cdc = {'enabled': false}; /* Modify a property of the log table. */ ALTER TABLE ks.t_scylla_cdc_log WITH bloom_filter_fp_chance = 0.13; ``` The log table can also be reattached by enabling CDC on the base table again: ```cql /* Reattach the log table */ ALTER TABLE ks.t WITH cdc = {'enabled': true}; ``` However, because the process of reattachment goes through the same code that created it in the first place, the properties of the log table are rolled back to their default values. This may be confusing to the user and, if unnoticed, also have other consequences, e.g. affecting performance. To prevent that, we ensure that the properties are preserved. A reproducer test, `test_log_table_preserves_properties_after_reattachment`, has been provided to verify that the changes are correct. Another test, `test_log_table_preserves_id_after_reattachment`, has also been added because the current implementation sets properties and the ID separately. Fixes scylladb/scylladb#25523 Backport: not necessary. Although the behavior may be unexpected, it's not a bug per se.
Closes scylladb/scylladb#26443 * github.com:scylladb/scylladb: cdc: Preserve properties when reattaching log table cdc: Extract creating columns in CDC log table to dedicated function cdc: Extract default properties of CDC log tables to dedicated function schema/schema_builder.hh: Add set_properties schema: Add getter for schema::user_properties schema: Remove underscores in fields of schema::user_properties schema: Extract user properties out of raw_schema	2025-11-21 10:06:05 +02:00
Calle Wilund	03408b185e	utils::gcp::object_storage: Fix buffer alignment reordering trailing data Fixes #26874 Due to certain people (me) not being able to tell forward from backward, the data alignment to ensure partial uploads adhere to the 256k-align rule would potentially _reorder_ trailing buffers generated iff the source buffers input into the sink are small enough. Which, as a fun fact, they are in backup upload. Change the unit test to use raw sink IO and add two unit tests (of which the smaller size provokes the bug) that checks the same 64k buf segmented upload backup uses. Closes scylladb/scylladb#26938	2025-11-21 09:36:13 +02:00
Radosław Cybulski	ce8db6e19e	Add table name to tracing in alternator Add a table name to Alternator's tracing output, as some clients would like to consistently receive this information. - add missing `tracing::add_table_name` in `executor::scan` - add emitting tables' names in `trace_state::build_parameters_map` - update tests, so when tracing is looked for it is filtered by table's name, which confirms table is being output. - change `struct one_session_records` declaration to `class one_session_records`, as `one_session_records` is later defined as class. Refs #26618 Fixes #24031 Closes scylladb/scylladb#26634	2025-11-21 09:33:40 +02:00
Michał Chojnowski	3f11a5ed8c	test/cluster/dtest: reduce num_tokens to 16 cluster.dtest_alternator_tests.test_slow_query_logging performs a bootstrap with 768 token ranges. It works with `me` sstables, which have 2 open file descriptors per open sstable, but with `ms` sstables, which have 3 open file descriptors per open sstable, it fails with EMFILE. To avoid this problem, let's just decrease the number of vnodes in the test suite. It's appropriate anyway, because it avoids some unneeded work without weakening the tests. (Note: pylib-based tests have been setting `num_tokens` to 16 for a long time too). This breaks `bypass_cache_test`, which is written in a way that expects a certain number of token ranges. We adjust the relevant parameter accordingly.	2025-11-21 00:38:50 +01:00
Raphael S. Carvalho	74ecedfb5c	replica: Fail timed-out single-key read on cleaned up tablet replica Consider the following: 1) single-key read starts, blocks on replica e.g. waiting for memory. 2) the same replica is migrated away 3) single-key read expires, coordinator abandons it, releases erm. 4) migration advances to cleanup stage, barrier doesn't wait on timed-out read 5) compaction group of the replica is deallocated on cleanup 6) that single-key resumes, but doesn't find sstable set (post cleanup) 7) with abort-on-internal-error turned on, node crashes It's fine for abandoned (= timed out) reads to fail, since the coordinator is gone. For active reads (non timed out), the barrier will wait for them since their coordinator holds erm. This solution consists of failing reads whose underlying tablet replica has been cleaned up, by just converting internal error to plain exception. Fixes #26229. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#27078	2025-11-20 11:44:03 +02:00
Gleb Natapov	ad3cf2c174	utils: fix get_random_time_UUID_from_micros to generate correct time uuid According to the IETF spec uuid variant bits should be set to '10'. All others are either invalid or reserved. The patch changes the code to follow the spec. Closes scylladb/scylladb#27073	2025-11-20 10:27:29 +02:00
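
The variant-bit rule mentioned in the last commit above (ad3cf2c174) can be illustrated with a minimal Python sketch. This is not Scylla's C++ implementation: the function name `random_time_uuid_from_micros` and the choice of random clock-sequence and node bits are assumptions made for illustration only. It builds a version-1 (time-based) UUID from a microsecond Unix timestamp and forces the two most significant bits of the clock-sequence byte to '10'.

```python
import random
import uuid

# Number of 100 ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def random_time_uuid_from_micros(micros: int) -> uuid.UUID:
    """Hypothetical helper: version-1 UUID for a Unix timestamp in microseconds."""
    ts = micros * 10 + UUID_EPOCH_OFFSET          # 60-bit count of 100 ns ticks
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000   # version nibble = 1

    # Variant: clear the top two bits, then set them to binary '10'.
    clock_seq_hi_variant = (random.getrandbits(8) & 0x3F) | 0x80
    clock_seq_low = random.getrandbits(8)
    node = random.getrandbits(48)                 # random node bits (assumption)

    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


if __name__ == "__main__":
    u = random_time_uuid_from_micros(1_700_000_000_000_000)
    print(u, u.variant, u.version)
```

Python's `uuid.UUID.variant` reports `uuid.RFC_4122` only when those two bits are '10', which makes it a convenient check for this kind of fix.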

10183 Commits