scylladb

Author	SHA1	Message	Date
Botond Dénes	34473302b0	Merge 'docs: document existing guardrails' from Andrzej Jackowski This patch series introduces new documentation for existing guardrails. Moreover: - Warning / failure messages of the recently added write CL guardrails (SCYLLADB-259) are rephrased so that all guardrails have similar messages. - Some new tests are added to help verify the correctness of the documentation and avoid situations where the documentation and implementation diverge. Fixes: [SCYLLADB-257](https://scylladb.atlassian.net/browse/SCYLLADB-257) No backport, just new docs and tests. [SCYLLADB-257]: https://scylladb.atlassian.net/browse/SCYLLADB-257?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#29011 * github.com:scylladb/scylladb: test: add new guardrail tests matching documentation scenarios test: add metric assertions to guardrail replication strategy tests test: use regex matching in guardrail replication strategy tests test: extract ks_opts helper in test_guardrail_replication_strategy docs: document CQL guardrails cql: improve write consistency level guardrail messages	2026-03-20 08:56:00 +02:00
Michael Litvak	9172cc172e	schema: add logstor cf property add a schema property for tables with logstor storage	2026-03-18 19:24:26 +01:00
Piotr Dulikowski	d8b283e1fb	Merge 'Add CQL forwarding for strongly consistent tables' from Wojciech Mitros In this series we add support for forwarding strongly consistent CQL requests to suitable replicas, so that clients can issue reads/writes to any node and have the request executed on an appropriate tablet replica (and, for writes, on the Raft leader). We return the same CQL response that the user would get when sending the request directly to the correct replica, and we perform the same logging/stats updates on the request coordinator as if the coordinator were the appropriate replica. The core mechanism of forwarding a strongly consistent request is sending an RPC containing the user's CQL request frame to the appropriate replica and returning a ready, serialized `cql_transport::response`. We do this in the CQL server - it is best prepared for handling these types, and forwarding a request containing a CQL frame allows us to reuse near-top-level methods for CQL request handling in the new RPC handler (such as the general `process`). To send the RPC, the CQL server needs to know which node it should forward the request to. This requires knowledge of the tablet Raft group members and leader. We obtain this information during the execution of a `cql3/strong_consistency` statement, and we return it to the CQL server using the generalized `bounce_to_shard` `result_message`, in which we now store either a shard or a specific replica to forward to. As with `bounce_to_shard`, we need to handle this `result_message` in a loop - a replica may move during statement execution, or the Raft leader may change.
We also use it to forward strongly consistent writes when we're not a member of the affected tablet Raft group. In that case we need to forward the statement twice: first to any replica of the affected tablet, which can find the leader and return this information to the coordinator, allowing the second request to be directed to the leader. This feature also passes through exception messages raised on the target replica while executing the statement. For that, many of the `cql_transport::cql_server::connection` methods for creating error responses needed to be moved to `cql_transport::cql_server`. For final exception handling on the coordinator, we added additional error info to the RPC response, so that the handling can be performed without the `result_message::exception` or `exception_ptr` itself. Fixes [SCYLLADB-71](https://scylladb.atlassian.net/browse/SCYLLADB-71) [SCYLLADB-71]: https://scylladb.atlassian.net/browse/SCYLLADB-71?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#27517 * github.com:scylladb/scylladb: test: add tests for CQL forwarding transport: enable CQL forwarding for strong consistency statements transport: add remote statement preparation for CQL forwarding transport: handle redirect responses in CQL forwarding transport: add exception handling for forwarded CQL requests transport: add basic CQL request forwarding idl: add a representation of client_state for forwarding cql_server: handle query, execute, batch in one case transport: inline process_on_shard in cql_server::process transport: extract process() to cql_server transport: add messaging_service to cql_server transport: add response reconstruction helpers for forwarding transport: generalize the bounce result message for bouncing to other nodes strong consistency: redirect requests to live replicas from the same rack transport: pass foreign_ptr into 
sleep_until_timeout_passes and move it to cql_server transport: extract the error handling from process_request_one transport: move error response helpers from connection to cql_server	2026-03-13 15:03:10 +01:00
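The redirect loop described above can be sketched as a minimal Python model. It is not ScyllaDB's C++ implementation. `Bounce`, `Response`, `process_with_forwarding` and the `execute(host, shard)` callback are hypothetical names. The `max_hops` cap is an assumption that stands in for the request timeout the real bounce message carries. The sketch covers the non-member write case: the first hop reaches any replica, and the second reaches the Raft leader.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Bounce:
    """Hypothetical stand-in for result_message::bounce."""
    host_id: Optional[str]  # None: bounce to another shard on the same node
    shard: int
    is_write: bool          # lets the coordinator pick the right timeout error


@dataclass
class Response:
    """Hypothetical stand-in for a serialized cql_transport::response."""
    body: str


Executor = Callable[[str, int], Union[Bounce, Response]]


def process_with_forwarding(execute: Executor, local_host: str,
                            max_hops: int = 10) -> Response:
    """Run a request, following bounces until some node answers.

    execute(host, shard) models running the statement on that host/shard,
    either locally or by sending the CQL frame over RPC.
    """
    host, shard = local_host, 0
    for _ in range(max_hops):
        result = execute(host, shard)
        if isinstance(result, Response):
            return result
        # The tablet may have moved or the Raft leader changed:
        # retarget and try again, just as for a shard bounce.
        host = result.host_id if result.host_id is not None else host
        shard = result.shard
    raise TimeoutError("gave up after too many redirects")
```

Treating a cross-node redirect and a cross-shard bounce as one message type lets a single loop handle both cases.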
Andrzej Jackowski	60aaea8547	cql: improve write consistency level guardrail messages Update warn and fail messages for the write_consistency_levels_warned and write_consistency_levels_disallowed guardrails to include the configuration option name and actionable guidance. The main motivation is to make the messages follow the conventions of other guardrails. Refs: SCYLLADB-257	2026-03-13 14:40:45 +01:00
Avi Kivity	03186ce60d	Merge 'Cleanup after auth v1 and default superuser code removal' from Marcin Maliszkiewicz This is short cleanup after recent removal of creating default cassandra superuser and auth-v1 code removal. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1036 Backport: no, just code cleanup Closes scylladb/scylladb#29004 * github.com:scylladb/scylladb: auth: remove DEFAULT_SUPERUSER_NAME constant and dead DEFAULT_USER_PASSWORD auth: use configurable default_superuser in describe_roles auth: move default_superuser to common, remove _superuser member auth: use LOCAL_ONE for all auth queries auth: remove get_auth_ks_name indirection	2026-03-12 23:44:32 +02:00
Avi Kivity	e2eeef3e01	Merge 'service level: remove remnants of version 1 service level' from Gleb Natapov can_use_effective_service_level_cache() always returns true now, so the function can be dropped entirely and all the code that assumes it may return false can be dropped as well. Also drop async versions of find_effective_service_level and get_user_scheduling_group since they are unused. No need to backport, code removal. Closes scylladb/scylladb#29002 * github.com:scylladb/scylladb: service level: make maybe_update_per_service_level_params synchronous service level: remove unused get_user_scheduling_group function service level: drop async find_effective_service_level service level: remove remnants of version 1 service level	2026-03-12 23:39:41 +02:00
Wojciech Mitros	32974770b0	test: add tests for CQL forwarding Add basic cluster tests for CQL forwarding. The test cases include: - basic reads and writes - prepared statements with binds - forwarding from a non-replica - exception passthrough during forwarding (using an injection) - re-preparing a statement on the target node, even if the user query is also an EXECUTE request on a prepared statement - verification of metric updates The existing test_basic_write_read was modified so that a few extra cases could be validated on the same cluster.	2026-03-12 19:43:35 +01:00
Wojciech Mitros	916a9995c1	transport: enable CQL forwarding for strong consistency statements We enable CQL forwarding by starting to return the bounce_to_node result message in redirect_statement() instead of throwing. The forwarding code introduced in the preceding patches reacts to these messages, allowing the requests to be forwarded. With the update, some tests assuming that requests can't be forwarded need to be adjusted, so we do that as well.	2026-03-12 19:43:35 +01:00
Wojciech Mitros	e44820ba1f	transport: generalize the bounce result message for bouncing to other nodes In the following patches, we'll start allowing requests to strongly consistent tables to be forwarded so that they get executed on suitable tablet Raft group members. For that we'll reuse the approach that we already have for bouncing requests to other shards - we'll try to execute a request locally, and the result of that will be a bounce message with another replica as the target. In this patch we generalize the former bounce_to_shard result message so that it can specify the target of the bounce as either another shard or a specific replica. We also rename it to result_message::bounce so that it stops implying that only another shard may be its target. Aside from the host_id and the shard, the new message also includes the timeout, because the service handling the forwarding won't have access to it, and it's needed to determine how long we should wait for the forwarded requests. It also indicates whether this is a write request, so that the correct timeout response can be returned if the deadline is exceeded. We will return other hosts in the new bounce message when executing requests to strongly consistent tables that we can't handle because we aren't a suitable replica. We can't handle this message yet, so we don't return it anywhere and we still assume that every bounce message is a bounce to the same host.	2026-03-12 17:48:57 +01:00
Gleb Natapov	a934d8391d	service level: drop async find_effective_service_level find_cached_effective_service_level does exactly the same thing now, and it is synchronous.	2026-03-12 14:28:26 +02:00
Marcin Maliszkiewicz	2d22eea2f9	Merge 'cql3: Replace SCYLLA_ASSERT and abort by throwing_assert' from Nadav Har'El In this patch we replace every single use of SCYLLA_ASSERT(), abort() and assert() in the cql3/ directory by throwing_assert(). The problem with SCYLLA_ASSERT()/abort()/assert() is that when it fails, it crashes Scylla. This is almost always a bad idea (see #7871 discussing why), but it's even riskier in front-end code like cql3/: In front-end code, there is a risk that due to a bug in our code, a specific user request can cause Scylla to crash. A malicious user can send this query to all nodes and crash the entire cluster. When the user is not malicious, it causes a small problem (a failing request) to become a much worse crash - and worse, the user has no idea which request is causing this crash and the crash will repeat if the same request is tried again. All of this is solved by using the new throwing_assert(), which is the same as SCYLLA_ASSERT() but throws an exception (using on_internal_error()) instead of crashing. The exception will prevent the code path with the invalid assumption from continuing, but will result in only the current user request being aborted, with a clear error message reporting the internal server error due to an assertion failure. I reviewed all the changes that I did in these patches to check that (to the best of my understanding) none of the assertions in cql3/ involve the sort of serious corruption that might require crashing the Scylla node entirely. throwing_assert() also improves logging of assertion failures compared to the original SCYLLA_ASSERT()/abort() - SCYLLA_ASSERT() printed a message to stderr which in many installations is lost, and abort() often prints no message at all. But throwing_assert() uses Scylla's standard logger, and also includes a backtrace in the log message. 
Fixes #13970 (Exorcise assertions from CQL code paths) Refs #7871 (Exorcise assertions from Scylla) Closes scylladb/scylladb#28847 * github.com:scylladb/scylladb: cql3: remove unnecessary assert() cql3: replace abort() by throwing_assert() cql3: Replace SCYLLA_ASSERT by throwing_assert	2026-03-12 09:09:24 +01:00
Marcin Maliszkiewicz	6d1153687a	auth: remove get_auth_ks_name indirection Replace get_auth_ks_name(qp) with db::system_keyspace::NAME directly. The function always returned the constant "system" and its qp parameter was unused.	2026-03-11 16:26:47 +01:00
Botond Dénes	475220b9c9	Merge 'Remove the rest of pre raft topology code' from Gleb Natapov Remove the rest of the code that assumes that either group0 does not exist yet or a cluster is still not upgraded to raft topology. Neither of those is supported any more. No need to backport since we remove functionality here. Closes scylladb/scylladb#28841 * github.com:scylladb/scylladb: service level: remove version 1 service level code features: move GROUP0_SCHEMA_VERSIONING to deprecated features list migration_manager: remove unused forward definitions test: remove unused code auth: drop auth_migration_listener since it does nothing now schema: drop schema_registry_entry::maybe_sync() function schema: drop make_table_deleting_mutations since it should not be needed with raft schema: remove calculate_schema_digest function schema: drop recalculate_schema_version function and its uses migration_manager: drop check for group0_schema_versioning feature cdc: drop usage of cdc_local table and v1 generation definition storage_service: no need to add yourself to the topology during reboot since raft state loading already did it storage_service: remove unused functions group0: drop with_raft() function from group0_guard since it always returns true now gossiper: do not gossip TOKENS and CDC_GENERATION_ID any more gossiper: drop tokens from loaded_endpoint_state gossiper: remove unused functions storage_service: do not pass loaded_peer_features to join_topology() storage_service: remove unused fields from replacement_info gossiper: drop is_safe_for_restart() function and its use storage_service: remove unused variables from join_topology gossiper: remove the code that was only used in gossiper topology storage_service: drop the check for raft mode from recovery code cdc: remove legacy code test: remove unused injection points auth: remove legacy auth mode and upgrade code treewide: remove schema pull code since we never pull schema any more raft topology: drop upgrade_state and 
its type from the topology state machine since it is not used any longer group0: hoist the checks for an illegal upgrade into main.cc api: drop get_topology_upgrade_state and always report upgrade status as done service_level_controller: drop service level upgrade code test: drop run_with_raft_recovery parameter to cql_test_env group0: get rid of group0_upgrade_state storage_service: drop topology_change_kind as it is no longer needed storage_service: drop check_ability_to_perform_topology_operation since no upgrades can happen any more service_storage: remove unused functions storage_service: remove non raft rebuild code storage_service: set topology change kind only once group0: drop in_recovery function and its uses group0: rename use_raft to maintenance_mode and make it sync	2026-03-11 10:24:20 +02:00
Nadav Har'El	00a819bcd8	cql3: remove unnecessary assert() In cql3/, there was one call to assert() (not SCYLLA_ASSERT or throwing_assert), and it was: const auto shard_num = smp::count; assert(shard_num > 0) Rather than converting this assert() to throwing_assert() as I did in previous patches, I decided to outright remove it: Seastar guarantees that smp::count is not zero. Many other places in the code use smp::count assuming that it is correct, no other place bothers to assert it isn't zero. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-11 09:43:24 +02:00
Nadav Har'El	34eec020b3	cql3: replace abort() by throwing_assert() After the previous patch replaced all SCYLLA_ASSERT() calls by throwing_assert(), this patch also replaces all calls to abort(). All these abort() calls guard cases that supposedly can never happen, but if one ever does happen because of a bug, none of these places requires crashing: an exception that aborts the current operation should be enough. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-11 09:43:11 +02:00
Nadav Har'El	c87d6407ed	cql3: Replace SCYLLA_ASSERT by throwing_assert In this patch we replace every single use of SCYLLA_ASSERT() in the cql3/ directory by throwing_assert(). The problem with SCYLLA_ASSERT() is that when it fails, it crashes Scylla. This is almost always a bad idea (see #7871 discussing why), but it's even riskier in front-end code like cql3/: In front-end code, there is a risk that due to a bug in our code, a specific user request can cause Scylla to crash. A malicious user can send this query to all nodes and crash the entire cluster. When the user is not malicious, it causes a small problem (a failing request) to become a much worse crash - and worse, the user has no idea which request is causing this crash and the crash will repeat if the same request is tried again. All of this is solved by using the new throwing_assert(), which is the same as SCYLLA_ASSERT() but throws an exception (using on_internal_error()) instead of crashing. The exception will prevent the code path with the invalid assumption from continuing, but will result in only the current user request being aborted, with a clear error message reporting the internal server error due to an assertion failure. I reviewed all the changes that I did in this patch to check that (to the best of my understanding) none of the assertions in cql3/ involve the sort of serious corruption that might require crashing the Scylla node entirely. throwing_assert() also improves logging of assertion failures compared to the original SCYLLA_ASSERT() - SCYLLA_ASSERT() printed a message to stderr which in many installations is lost, whereas throwing_assert() uses Scylla's standard logger, and also includes a backtrace in the log message. Fixes #13970 (Exorcise assertions from CQL code paths) Refs #7871 (Exorcise assertions from Scylla) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-11 09:41:20 +02:00
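A small Python model illustrates the difference between crashing and throwing. Everything here is hypothetical. `throwing_assert`, `InternalServerError` and the toy `serve` loop only mimic the idea described in the commit: a failed assertion is logged with a backtrace and aborts the current request, while the server keeps handling the next one. The real helper is C++ and reports the failure through on_internal_error().

```python
import logging
import traceback

logger = logging.getLogger("assert_sketch")


class InternalServerError(Exception):
    """Hypothetical stand-in for the exception on_internal_error() throws."""


def throwing_assert(condition: bool, expr: str) -> None:
    """Log the failure with a backtrace and raise instead of aborting."""
    if not condition:
        message = f"Assertion failed: {expr}"
        logger.error("%s\nBacktrace:\n%s", message, "".join(traceback.format_stack()))
        raise InternalServerError(message)


def execute_statement(column_count: int) -> str:
    # A "can never happen" invariant guarded inside front-end code.
    throwing_assert(column_count > 0, "column_count > 0")
    return "ok"


def serve(requests):
    """Each request fails independently; the server keeps running."""
    replies = []
    for column_count in requests:
        try:
            replies.append(execute_statement(column_count))
        except InternalServerError as e:
            replies.append(f"server error: {e}")
    return replies
```

With an abort-style assert, the bad request in the middle would take down the whole process. Here it only turns into an error reply.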
Dawid Mędrek	167feabe1a	cql3: Reject user-provided timestamps for strongly consistent tables Similarly to LWTs, we reject queries with user-provided timestamps when they target strongly consistent tables. Such statements could force us to rewrite history, and that contradicts the philosophy of linearizability we aim for. Fixes SCYLLADB-879 Closes scylladb/scylladb#28867	2026-03-10 22:11:39 +02:00
Gleb Natapov	b59b3d4f8a	service level: remove version 1 service level code	2026-03-10 10:46:48 +02:00
Gleb Natapov	1d188f0394	auth: remove legacy auth mode and upgrade code A system needs to be upgraded to use v2 auth before moving to this ScyllaDB version; otherwise the boot will fail.	2026-03-10 10:09:39 +02:00
Michał Chojnowski	ff60a5f1e5	cql3: suggest ALTER MATERIALIZED VIEW to users trying to use ALTER TABLE on a view When a user tries to use ALTER TABLE on a materialized view, the resulting error message is `Cannot use ALTER TABLE on Materialized View`. The intention behind this error is that ALTER MATERIALIZED VIEW should be used instead. But we observed that some users interpret this error message as a general "You cannot do any ALTER on this thing". This patch enhances the error message (and others similar to it) to prevent the confusion. Closes scylladb/scylladb#28831	2026-03-09 15:07:21 +01:00
Szymon Malewski	f9d213547f	cql3: selection: fix `add_column_for_post_processing` for ORDER BY The purpose of `add_column_for_post_processing` is to add columns that are required for processing of a query, but are not part of the SELECT clause and shouldn't be returned. They are added to the final result set, but later are not serialized. Mainly it is used for filtering and grouping columns, with a special case of `WHERE primary_key IN ... ORDER BY ...` when the whole result set needs additional final sorting, and ordering columns must be added as well. There was a bug that manifested in #9435 and #8100 and was actually identified in #22061. In the case of a selection with processing (e.g. functions involved), a result set row is formed in two stages. Initially it is a list of columns fetched from replicas, on which filtering and grouping are performed. After that the actual selection is resolved and the final number of columns can change. Ordering is performed on this final shape, but the ordering column index returned by `add_column_for_post_processing` referred to the initial shape. If the selection referred to the same column twice (e.g. `v, TTL(v)` as in #9435), the final row was longer than the initial one and ordering referred to an incorrect column. If a function in the selection referred to multiple columns (e.g. as_json(.., ..), which #8100 effectively uses), the final row was shorter and ordering tried to use a non-existing column. This patch fixes the problem by making sure that the column index in the final result set is used for ordering. The previously crashing test `cassandra_tests/validation/entities/json_test.py::testJsonOrdering` no longer has to be skipped, but it now fails on issue #28467. Fixes #9435 Fixes #8100 Fixes #22061 Closes scylladb/scylladb#28472	2026-03-05 19:22:34 +02:00
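A toy model of the two-stage row shape shows why the ordering index must refer to the final row. The names are hypothetical, and the layout is simplified: the ordering column is appended after the selected values. In this sketch, `use_final_index=False` reproduces the old behaviour. With a selection like `v, f(v)` it sorts by the wrong column. With a single function over two columns, it indexes past the end of the row.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[object, ...]
Selector = Callable[[Row], object]


def select_and_order(fetched_rows: Sequence[Row], selectors: Sequence[Selector],
                     order_idx_in_fetched: int,
                     use_final_index: bool = True) -> List[list]:
    final_rows = []
    for row in fetched_rows:
        # Stage 2: resolve the selection. The final row has len(selectors)
        # values, which may differ from len(row).
        final = [sel(row) for sel in selectors]
        # The ordering column is appended for post-processing only.
        final.append(row[order_idx_in_fetched])
        final_rows.append(final)
    # Fixed: index into the *final* shape. Buggy: reuse the fetched-row index.
    idx = len(selectors) if use_final_index else order_idx_in_fetched
    final_rows.sort(key=lambda r: r[idx])
    # Post-processing columns are not serialized back to the client.
    return [r[:len(selectors)] for r in final_rows]
```

For fetched rows `(v, ck)` and the selection `v, -v`, the fixed path sorts by `ck`, while the old path silently sorts by `-v`.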
Marcin Maliszkiewicz	c3f59e4fa1	Merge 'cql3: implement write_consistency_levels guardrails' from Andrzej Jackowski This patch series implements `write_consistency_levels_warned` and `write_consistency_levels_disallowed` guardrails, allowing the configuration of which consistency levels are unwanted for writes. The motivation for these guardrails is to forbid writing with consistency levels that don't provide high durability guarantees (like CL=ANY, ONE, or LOCAL_ONE). Neither guardrail is enabled by default, so as not to disrupt clusters that are currently using any of the CLs for writes. The warning guardrail may seem harmless, as it only adds a warning to the CQL response; however, enabling it can significantly increase network traffic (as a warning message is added to each response) and also decrease throughput due to additional allocations required to prepare the warning. Therefore, both guardrails should be enabled with care. The newly added `writes_per_consistency_level` metric, which is incremented unconditionally, can help decide whether a guardrail can be safely enabled in an existing cluster. This commit adds additional `if` instructions on the critical path. However, based on the `perf_simple_query` benchmark for writes, the difference is marginal (~40 additional instructions, which is a relative difference smaller than 0.001). 
BEFORE: ``` 291443.35 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.2 tasks/op, 48067 insns/op, 18885 cycles/op, 0 errors) throughput: mean= 289743.07 standard-deviation=6075.60 median= 291424.69 median-absolute-deviation=1702.56 maximum=292498.27 minimum=261920.06 instructions_per_op: mean= 48072.30 standard-deviation=21.15 median= 48074.49 median-absolute-deviation=12.07 maximum=48119.87 minimum=48019.89 cpu_cycles_per_op: mean= 18884.09 standard-deviation=56.43 median= 18877.33 median-absolute-deviation=14.71 maximum=19155.48 minimum=18821.57 ``` AFTER: ``` 290108.83 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.2 tasks/op, 48121 insns/op, 18988 cycles/op, 0 errors) throughput: mean= 289105.08 standard-deviation=3626.58 median= 290018.90 median-absolute-deviation=1072.25 maximum=291110.44 minimum=274669.98 instructions_per_op: mean= 48117.57 standard-deviation=18.58 median= 48114.51 median-absolute-deviation=12.08 maximum=48162.18 minimum=48087.18 cpu_cycles_per_op: mean= 18953.43 standard-deviation=28.76 median= 18945.82 median-absolute-deviation=20.84 maximum=19023.93 minimum=18916.46 ``` Fixes: SCYLLADB-259 Refs: SCYLLADB-739 No backport, it's a new feature Closes scylladb/scylladb#28570 * github.com:scylladb/scylladb: scylla.yaml: add write CL guardrails to scylla.yaml scylla.yaml: reorganize guardrails config to be in one place test: add cluster tests for write CL guardrails test: implement test_guardrail_write_consistency_level cql3: start using write CL guardrails cql3/query_processor: implement metrics to track CL of writes db: cql3/query_processor: add write_consistency_levels enum_sets config: add write_consistency_levels_* guardrails configuration	2026-03-05 09:55:38 +01:00
Andrzej Jackowski	bb359b3b78	cql3: start using write CL guardrails Enable verification of write consistency level guardrails in `modification_statement` and `batch_statement`. Neither guardrail is enabled by default, so as not to disrupt clusters that are currently using any of the CLs for writes. The warning guardrail may seem harmless, as it only adds a warning to the CQL response; however, enabling it can significantly increase network traffic (as a warning message is added to each response) and also decrease throughput due to additional allocations required to prepare the warning. Therefore, both guardrails should be enabled with care. The newly added `writes_per_consistency_level` metric, which is incremented unconditionally, can help decide whether a guardrail can be safely enabled in an existing cluster. This commit adds additional `if` instructions on the critical path. However, based on the `perf_simple_query` benchmark for writes, the difference is marginal (~40 additional instructions, which is a relative difference smaller than 0.001). 
BEFORE: ``` 291443.35 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.2 tasks/op, 48067 insns/op, 18885 cycles/op, 0 errors) throughput: mean= 289743.07 standard-deviation=6075.60 median= 291424.69 median-absolute-deviation=1702.56 maximum=292498.27 minimum=261920.06 instructions_per_op: mean= 48072.30 standard-deviation=21.15 median= 48074.49 median-absolute-deviation=12.07 maximum=48119.87 minimum=48019.89 cpu_cycles_per_op: mean= 18884.09 standard-deviation=56.43 median= 18877.33 median-absolute-deviation=14.71 maximum=19155.48 minimum=18821.57 ``` AFTER: ``` 290108.83 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.2 tasks/op, 48121 insns/op, 18988 cycles/op, 0 errors) throughput: mean= 289105.08 standard-deviation=3626.58 median= 290018.90 median-absolute-deviation=1072.25 maximum=291110.44 minimum=274669.98 instructions_per_op: mean= 48117.57 standard-deviation=18.58 median= 48114.51 median-absolute-deviation=12.08 maximum=48162.18 minimum=48087.18 cpu_cycles_per_op: mean= 18953.43 standard-deviation=28.76 median= 18945.82 median-absolute-deviation=20.84 maximum=19023.93 minimum=18916.46 ``` Fixes: SCYLLADB-259	2026-03-04 07:26:00 +01:00
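The guardrail check described above can be sketched in Python. This illustrates the idea under stated assumptions; it is not the code in `modification_statement`/`batch_statement`. The enum, the function name, the check order (disallowed before warned) and the message texts are hypothetical. Several details do come from the commit messages: the option names `write_consistency_levels_warned` and `write_consistency_levels_disallowed`, their empty defaults, and the unconditionally incremented `writes_per_consistency_level` metric.

```python
from collections import Counter
from enum import Enum
from typing import FrozenSet, List


class ConsistencyLevel(Enum):
    ANY = "ANY"
    ONE = "ONE"
    LOCAL_ONE = "LOCAL_ONE"
    QUORUM = "QUORUM"
    LOCAL_QUORUM = "LOCAL_QUORUM"
    ALL = "ALL"


class InvalidRequest(Exception):
    """Hypothetical stand-in for the CQL error returned to the client."""


# Incremented for every write, so operators can see which CLs are in use
# before deciding to enable either guardrail.
writes_per_consistency_level = Counter()


def check_write_consistency_level(
        cl: ConsistencyLevel,
        warnings: List[str],
        warned: FrozenSet[ConsistencyLevel] = frozenset(),      # write_consistency_levels_warned
        disallowed: FrozenSet[ConsistencyLevel] = frozenset(),  # write_consistency_levels_disallowed
) -> None:
    writes_per_consistency_level[cl] += 1
    if cl in disallowed:
        # Placeholder wording, not ScyllaDB's actual message.
        raise InvalidRequest(
            f"Write consistency level {cl.value} is forbidden by "
            "write_consistency_levels_disallowed")
    if cl in warned:
        # The warning is attached to every matching response, which is why
        # enabling this guardrail can cost bandwidth and allocations.
        warnings.append(
            f"Write consistency level {cl.value} is discouraged by "
            "write_consistency_levels_warned")
```

Because both sets default to empty, a cluster that never configures them only pays for the counter increment and two set lookups per write.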
Dario Mirovic	6a1edab2ac	client_state: add has_superuser method Encapsulate the superuser check in client_state so that it respects _bypass_auth_checks. Connections that bypass auth (internal callers and the maintenance socket) are always considered superusers. Migrate existing call sites from auth::has_superuser(service, user) to client_state.has_superuser(). Also add _bypass_auth_checks handling to ensure_not_anonymous(). Refs SCYLLADB-409	2026-03-03 22:31:35 +01:00
Karol Nowacki	30487e8854	index: fix vector index with filtering target column The secondary index mechanism is currently used to determine the target column. This mechanism works incorrectly for vector indexes with filtering because it returns the last specified column as the target (vectors) column. However, the syntax for a vector index requires the first column to be the target: ``` CREATE CUSTOM INDEX ON t(vectors, users) USING 'vector_index'; ``` This discrepancy eventually leads to the following exception when performing an ANN search on a vector index with filtering columns: ``` ANN ordering by vector requires the column to be indexed using 'vector_index' ``` This commit fixes the issue by introducing dedicated logic for vector indexes to correctly identify the target (vectors) column. Fixes: SCYLLADB-635 Closes scylladb/scylladb#28740	2026-03-02 18:47:58 +02:00
Nadav Har'El	eebd7b0fbc	cql ttl: fix ALTER TABLE to disable TTL if column is dropped If "ALTER TABLE tab DROP x" is done to delete column x, and column x was the designated TTL column, then the per-row TTL feature should be disabled on this table. If we don't do this, the expiration scanner will continue to scan the table trying to read the dropped column - which will be wasteful or worse. A test for this case is also included in test/cqlpy/test_ttl_row.py in a later patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:43 +02:00
Nadav Har'El	acbdf637b6	cql ttl: add setting/unsetting of TTL column to ALTER TABLE The previous patch added the ability in CREATE TABLE to designate one of the regular columns as a "TTL column", to be used by the per-row TTL feature (Refs #13000). In this patch we add to ALTER TABLE the ability to enable per-row TTL on an existing table with a given column as the TTL column: ALTER TABLE tab TTL colname, and also the ability to disable per-row TTL with ALTER TABLE tab TTL NULL. As in CREATE TABLE, the designated TTL column must be a regular column (it can't be a primary key column or a static column), and must be of type timestamp, bigint or int. You can't enable per-row TTL if it is already enabled, or disable it if it is already disabled. To change the TTL column on an existing table, you must first disable TTL, and then re-enable it with the new column. A large collection of functional tests (in test/cqlpy), for every detail of this patch, will come in a later patch in this series. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:43 +02:00
Nadav Har'El	22c79b6af8	cql ttl: add TTL column support to CREATE TABLE and DESC TABLE This patch enables the per-row TTL feature in CQL (Refs #13000). This patch allows the user to create a new table with one of its columns designated as the TTL column with a syntax like: CREATE TABLE tab ( id int PRIMARY KEY, t text, expiration timestamp TTL ); The column marked "TTL" must have the "timestamp", "bigint" or "int" types (the choice of these types was explained in the previous patch), and there can only be one such column. We decided not to allow a column to be both a primary key column and a TTL column - although it would have worked (it's supported in Alternator), I considered this non-useful and confusing, and decided not to allow it in CQL. A TTL column also can't be a static column. We save the information of which column is the TTL column in a tag which is read by the "expiration service" - originally a part of Alternator's TTL implementation. After the previous patch, the expiration service is running and knows how to understand CQL tables, so the CQL per-row TTL feature will start to work. This patch also implements DESC TABLE, printing the word "TTL" in the right place of the output. This patch doesn't yet implement ALTER TABLE that should allow enabling or disabling the TTL column setting on an existing table - we'll do that in the next patch. A large collection of functional tests (in test/cqlpy), for every detail of this feature will be added in a later patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:42 +02:00
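The per-row TTL commits above each state validation rules for CREATE TABLE, ALTER TABLE, and DROP of the TTL column. These rules can be condensed into a short Python sketch. The `Column` type, the function names and the error texts are hypothetical; only the rules themselves come from the commit messages.

```python
from dataclasses import dataclass
from typing import List, Optional

TTL_COLUMN_TYPES = {"timestamp", "bigint", "int"}


@dataclass
class Column:
    name: str
    cql_type: str
    kind: str          # "partition_key", "clustering_key", "static" or "regular"
    ttl: bool = False  # column declared with the TTL keyword


class InvalidRequest(Exception):
    pass


def check_ttl_column(col: Column) -> None:
    # Must be a regular column (not primary key, not static) of an allowed type.
    if col.kind != "regular":
        raise InvalidRequest(f"TTL column {col.name} must be a regular column")
    if col.cql_type not in TTL_COLUMN_TYPES:
        raise InvalidRequest(f"TTL column {col.name} must be timestamp, bigint or int")


def create_table_ttl_column(columns: List[Column]) -> Optional[str]:
    """CREATE TABLE: at most one column may be marked TTL."""
    marked = [c for c in columns if c.ttl]
    if len(marked) > 1:
        raise InvalidRequest("only one column can be the TTL column")
    if not marked:
        return None
    check_ttl_column(marked[0])
    return marked[0].name


def alter_table_ttl(current: Optional[str], columns: List[Column],
                    new: Optional[str]) -> Optional[str]:
    """ALTER TABLE tab TTL <col> (new=name) or ALTER TABLE tab TTL NULL (new=None)."""
    if new is None:
        if current is None:
            raise InvalidRequest("per-row TTL is already disabled")
        return None
    if current is not None:
        raise InvalidRequest("per-row TTL is already enabled; disable it first")
    by_name = {c.name: c for c in columns}
    if new not in by_name:
        raise InvalidRequest(f"unknown column {new}")
    check_ttl_column(by_name[new])
    return new


def drop_column(current: Optional[str], dropped: str) -> Optional[str]:
    """ALTER TABLE tab DROP x disables per-row TTL if x was the TTL column."""
    return None if current == dropped else current
```

Changing the TTL column is deliberately a two-step operation in this model (`TTL NULL`, then `TTL newcol`), matching the rule in the ALTER TABLE commit.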
Nadav Har'El	df69dbec2a	Merge ' cql3/statements/describe_statement: hide paxos state tables ' from Michał Jadwiszczak Paxos state tables are internal tables fully managed by Scylla; they shouldn't be exposed to the user, nor should they be backed up. This commit hides these tables from all listings, and if such a table is directly described with `DESC ks."tbl$paxos"`, the description is generated within a comment and a note for the user is added. Fixes https://github.com/scylladb/scylladb/issues/28183 LWT on tablets and paxos state tables are present in 2025.4, so the patch should be backported to this version. Closes scylladb/scylladb#28230 * github.com:scylladb/scylladb: test/cqlpy: add reproducer for hidden Paxos table being shown by DESC cql3/statements/describe_statement: hide paxos state tables	2026-02-02 21:22:59 +02:00
Marcin Maliszkiewicz	e18b519692	cql3: remove find_schema call from select check_access Schema is already a member of select statement, avoiding the call saves around 400 cpu instructions on a select request hot path. Closes scylladb/scylladb#28328	2026-01-30 11:49:09 +01:00
Marcin Maliszkiewicz	a93ad3838f	audit: cql: remove create_no_audit_info We don't need a special guard value; it's only filled for batch statements, for which we can simply ignore the value. Not having a special value allows us to return early when audit is not enabled.	2026-01-26 10:18:38 +01:00
Piotr Dulikowski	3ec4f67407	Merge 'vector_index: Implement rescoring' from Szymon Malewski This series implements the rescoring algorithm. Index options that enable this functionality were introduced in the earlier PR https://github.com/scylladb/scylladb/pull/28165. When a Vector Index has quantization enabled, Vector Store uses a reduced vector representation to save memory, but this may degrade the correctness of ANN queries. For a quantized index we can enable the rescoring algorithm, which recalculates the similarity score from the full vector representation stored in Scylla and reorders the returned result set. It also works with oversampling: we fetch more candidates from Vector Store, rescore them in Scylla and return only the requested number of results.
Example: Creating a Vector Index with Rescoring
```sql
-- Create a table with a vector column
CREATE TABLE ks.products (
    id int PRIMARY KEY,
    embedding vector<float, 128>
);

-- Create a vector index with rescoring enabled
CREATE INDEX products_embedding_idx ON ks.products (embedding)
USING 'vector_index'
WITH OPTIONS = {
    'similarity_function': 'cosine',
    'quantization': 'i8',
    'oversampling': '2.0',
    'rescoring': 'true'
};
```
1. Quantization (`i8`) compresses vectors in the index, reducing memory usage but introducing precision loss in distance calculations.
2. Oversampling (`2.0`) retrieves 2× more candidates than requested from the vector store (e.g., `LIMIT 10` fetches 20 candidates).
3. Rescoring (`true`) recalculates similarity scores using full-precision (`f32`) vectors from the base table and re-ranks the results.

Query example:
```sql
-- Find the 10 most similar products
SELECT id, similarity_cosine(embedding, [0.1, 0.2, ...]) AS score
FROM ks.products
ORDER BY embedding ANN OF [0.1, 0.2, ...]
LIMIT 10;
```
With rescoring enabled, the query:
1. Fetches 20 candidates from the quantized index (due to oversampling=2.0).
2. Reads full-precision embeddings from the base table.
3. Recalculates similarity scores with full precision.
4. Re-ranks and returns the top 10 results.

In this implementation we use the CQL similarity function implementation to calculate the new score values and use them in post-query ordering. We add that column to the selection manually, but it has to be removed from the final response. Follow-up https://github.com/scylladb/scylladb/pull/28165 Fixes https://scylladb.atlassian.net/browse/SCYLLADB-83 New feature - doesn't need backport. Closes scylladb/scylladb#27769 * github.com:scylladb/scylladb: vector_index: rescoring: Fetch oversampled rows vector_index: rescoring: Sort by similarity column select_statement: Modify `needs_post_query_ordering` condition vector_index: rescoring: Add hidden similarity score column vector_index: Refactor extracting ANN query information	2026-01-23 15:20:10 +01:00
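The four query steps of rescoring can be sketched in Python as follows. `vector_store_search` and `base_table` are hypothetical stand-ins for the Vector Store and the base-table read. Scoring uses plain cosine similarity; ScyllaDB's `similarity_cosine` may scale scores differently, which does not change the ranking as long as the scaling is monotonic.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity of two full-precision vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ann_with_rescoring(query, limit, oversampling, vector_store_search, base_table):
    """Return up to `limit` (key, score) pairs, best first."""
    # Oversampling: ask the quantized index for more candidates than requested.
    n_candidates = math.ceil(limit * oversampling)
    candidate_keys = vector_store_search(query, n_candidates)
    # Rescoring: recompute similarity from full-precision vectors in the base table.
    scored = [(key, cosine_similarity(base_table[key], query)) for key in candidate_keys]
    # Re-rank by the new score (higher means more similar) and trim to LIMIT.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

With `limit=10` and `oversampling=2.0`, the sketch asks the index for `math.ceil(10 * 2.0) = 20` candidates, matching the example above. Without oversampling, a better match that the quantized index ranked just below the limit would never reach the rescoring step. This is why the two options are useful together.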
Piotr Dulikowski	fe9237fdc9	Merge 'alternator: don't require rf_rack flag for indexes, validate instead' from Michael Litvak In `8df61f6d99` we changed the requirements for creating materialized views and MV-based indexes - instead of requiring the rf_rack_valid_keyspaces flag to be set, we now require the keyspace to be RF-rack-valid at the time of creation, and it is enforced to remain RF-rack-valid while the MV exists. This validation is done in the cql create view/index statements. The same should be done also for alternator - when creating a table with GSI or LSI, or when adding a GSI to an existing table, previously we required the flag rf_rack_valid_keyspaces to be set. Now we change it to instead check if the keyspace is RF-rack-valid, and if not the operation fails with an appropriate error. Fixes https://github.com/scylladb/scylladb/issues/28214 backport to 2025.4 to add RF-rack-valid enforcements in alternator Closes scylladb/scylladb#28154 * github.com:scylladb/scylladb: locator: document the exception type of assert_rf_rack_valid_keyspace alternator: don't require rf_rack flag for indexes, validate instead	2026-01-23 11:49:02 +01:00
Patryk Jędrzejczak	4e984139b2	Merge 'strongly consistent tables: basic implementation' from Petr Gusev In this PR we add a basic implementation of strongly-consistent tables:
* generate a raft group id when a strongly-consistent table is created
* persist it into the system.tables table
* start raft groups on replicas when a strongly-consistent tablet_map reaches them
* add a strongly-consistent version of the storage_proxy, with the `query` and `mutate` methods
* the `mutate` method submits a command to the tablet's raft group; the `query` method reads the data with `raft.read_barrier()`
* strongly-consistent versions of `select_statement` and `modification_statement` are added
* a basic `test_strong_consistency.py/test_basic_write_read` test is added to check that we can write and read data in a strongly consistent fashion.

Limitations:
* For now, strongly-consistent tables can have tablets only on shard zero. This is because we (ab/re)use the existing raft system tables, which live only on shard 0. In the next PRs we'll create separate tables for the new tablet raft groups.
* No Scylla-side proxying: the test has to figure out who the leader is and submit the command to the right node. This will be fixed separately.
* No tablet balancing: migrations, splits and merges require separate, complicated code.

The new behavior is hidden behind the `STRONGLY_CONSISTENT_TABLES` feature, which is enabled when the `STRONGLY_CONSISTENT_TABLES` experimental feature flag is set. Requirements, specs and a general overview of the feature can be found [here](https://scylladb.atlassian.net/wiki/spaces/RND/pages/91422722/Strong+Consistency). The short-term implementation plan is [here](https://docs.google.com/document/d/1afKeeHaCkKxER7IThHkaAQlh2JWpbqhFLIQ3CzmiXhI/edit?tab=t.0#heading=h.thkorgfek290).

One can check the strongly consistent writes and reads locally via cqlsh:
scylla.yaml:
```
experimental_features:
  - strongly-consistent-tables
```
cqlsh:
```
CREATE KEYSPACE IF NOT EXISTS my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 1} AND tablets = {'initial': 1} AND consistency = 'local';
CREATE TABLE my_ks.test (pk int PRIMARY KEY, c int);
INSERT INTO my_ks.test (pk, c) VALUES (10, 20);
SELECT * FROM my_ks.test WHERE pk = 10;
```
Fixes SCYLLADB-34 Fixes SCYLLADB-32 Fixes SCYLLADB-31 Fixes SCYLLADB-33 Fixes SCYLLADB-56 backport: no need Closes scylladb/scylladb#27614 * https://github.com/scylladb/scylladb: test_encryption: capture stderr test/cluster: add test_strong_consistency.py raft_group_registry: disable metrics for non-0 groups strong consistency: implement select_statement::do_execute() cql: add select_statement.cc strong consistency: implement coordinator::query() cql: add modification_statement cql: add statement_helpers strong consistency: implement coordinator::mutate() raft.hh: make server::wait_for_leader() public strong_consistency: add coordinator modification_statement: make get_timeout public strong_consistency: add groups_manager strong_consistency: add state_machine and raft_command table: add get_max_timestamp_for_tablet tablets: generate raft group_id-s for new table tablet_replication_strategy: add consistency field tablets: add raft_group_id modification_statement: remove virtual where it's not needed modification_statement: inline prepare_statement() system_keyspace: disable tablet_balancing for strongly_consistent_tables cql: rename strongly_consistent statements to broadcast statements	2026-01-23 09:52:33 +01:00
Michael Litvak	d5009882c6	locator: document the exception type of assert_rf_rack_valid_keyspace The function assert_rf_rack_valid_keyspace uses the exception type std::invalid_argument when the RF-rack validation fails. Document it and change all callers to catch this specific exception type when checking for RF-rack validation failures, so that other exception types can be propagated properly.	2026-01-22 16:11:35 +01:00
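The pattern this commit documents looks roughly like this in Python: a validator throws one documented exception type, and callers catch only that type. Here `ValueError` plays the role of `std::invalid_argument`. The keyspace dictionary and its precomputed `rf_rack_valid` field are hypothetical simplifications; the actual RF-rack validity rules are not modeled.

```python
def assert_rf_rack_valid_keyspace(ks):
    """Hypothetical validator: raises ValueError (cf. std::invalid_argument) on failure."""
    if not ks.get("rf_rack_valid", False):
        raise ValueError(f"keyspace {ks['name']} is not RF-rack-valid")


def create_index(ks, validate=assert_rf_rack_valid_keyspace):
    """Hypothetical caller that turns only the documented failure into a user error."""
    try:
        validate(ks)
    except ValueError as e:
        return f"error: {e}"
    # Any other exception type raised by validate() propagates unchanged.
    return "ok"
```

Catching the narrow type means an unrelated internal failure is not misreported to the user as an RF-rack validation error.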
Szymon Malewski	29d090845a	vector_index: rescoring: Fetch oversampled rows So far, with oversampling, the extended set of keys was returned from VS, but the query to the base table was still limited by the query `limit`. Now, for rescoring, we want to fetch rows for all the keys returned from VS. However, we later need to restore the command limit to trim the result_set accordingly. For non-rescoring scenarios we directly trim the key set returned from VS if it happens to exceed the query limit. With this change, the rescoring validation tests (except `no_nulls_in_rescored_results`) pass fully. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-83	2026-01-22 15:38:44 +01:00
Szymon Malewski	0bc95bcf87	vector_index: rescoring: Sort by similarity column This patch implements the second part of rescoring: ordering results by the similarity column added in an earlier patch. For this purpose we define `_ordering_comparator`, which enables the pre-existing post-query ordering functionality. However, no additional tests pass yet, as they include oversampling, which will be the subject of the following patches.	2026-01-22 15:38:44 +01:00
Szymon Malewski	57e7a4fa4f	select_statement: Modify `needs_post_query_ordering` condition Our plan for rescoring is to use the existing post-query ordering mechanism to sort (and trim) the result_set by the similarity column. For the general SELECT case this ordering is permitted only for queries with IN on the partition key and an ORDER BY, which is checked in `needs_post_query_ordering`. Recently this check was overridden for ANN queries in https://github.com/scylladb/scylladb/pull/28109 to enable IN queries handled by VS without excessive post-processing. In this patch we revert that change, so the ANN case will be handled by the general check. However, we change the condition: post-processing is now enabled whenever `_ordering_comparator` is set. In the current implementation `_ordering_comparator` is created only in `select_statement::prepare` with `get_ordering_comparator`, and only under the same conditions as were checked in `needs_post_query_ordering`, so this change should be transparent for general SELECT. For ANN queries it is also not set (yet), so it will not influence ANN filtering, but we confirm that this functionality still works by adding a filtering test: `test/vector_search/filter_test.cc::vector_store_client_test_filtering_ann_cql`. Rescoring ordering for ANN queries will be enabled when we add `_ordering_comparator` in the following patch.	2026-01-22 15:38:44 +01:00
Szymon Malewski	c89957b725	vector_index: rescoring: Add hidden similarity score column Rescoring consists of recalculating the similarity score and reordering results based on it. In this patch we add the calculation of the similarity score as a hidden (non-serialized) column; the following patch will add reordering. Normal ordering uses `add_column_for_post_processing`, but this works only for regular columns, not functions. So we create the column together with the user-requested columns (this also forces the use of `selection_with_processing`) and hide it later. This also requires special handling for the 'SELECT *' case: we need to manually add all columns before adding the similarity column. If the user already asks for the similarity score in the SELECT clause, this value will be calculated twice; this should be optimized in future patches.	2026-01-22 15:38:40 +01:00
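The hidden-column technique works in four steps:

1. Expand `SELECT *` into explicit columns.
2. Append the similarity expression as an extra, last column.
3. Sort and trim on that column.
4. Strip it from each row before returning.

This Python sketch uses hypothetical helper names. Rows are plain tuples, and similarity values are assumed to be already computed.

```python
def build_selection(requested, all_columns, similarity_expr):
    """Return (selection, hidden_index); 'SELECT *' is expanded explicitly first."""
    selection = list(all_columns) if requested == ["*"] else list(requested)
    selection.append(similarity_expr)  # hidden column, always last
    return selection, len(selection) - 1


def finalize(rows, hidden_index, limit):
    """Sort by the hidden score (descending), trim to limit, and drop the hidden column."""
    ordered = sorted(rows, key=lambda row: row[hidden_index], reverse=True)[:limit]
    return [row[:hidden_index] + row[hidden_index + 1:] for row in ordered]
```

Because the hidden column is always appended after the user's columns, stripping it by index is safe even when the user also selected `similarity_cosine(...)` explicitly. As the commit message notes, that case computes the score twice.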
Szymon Malewski	e0cc6ca7e6	vector_index: Refactor extracting ANN query information For the purpose of rescoring we will need to know whether the query is an ANN query, and to access the index options, earlier in `select_statement::prepare` than before. This patch refactors extracting this information into a new helper structure, `ann_ordering_info`, and uses it consistently.	2026-01-22 10:00:47 +01:00
Botond Dénes	7d2e6c0170	Merge 'config: add enforce_rack_list option' from Aleksandra Martyniuk Add the enforce_rack_list option. When the option is set to true, all tablet keyspaces have a rack list replication factor. When the option is on: - CREATE STATEMENT always auto-extends rf to rack lists; - ALTER STATEMENT fails when there is a numeric rf in any DC. The flag is set to false by default, and a node needs to be restarted in order to change its value. Starting a node with the enforce_rack_list option will fail if there are any tablet keyspaces with numeric rf in any DC. enforce_rack_list is a per-node option, and a user needs to ensure that no tablet keyspace is altered or created while nodes in the cluster don't have a consistent value. Mark rf_rack_valid_keyspaces as deprecated. Fixes: https://github.com/scylladb/scylladb/issues/26399. New feature; no backport needed Closes scylladb/scylladb#28084 * github.com:scylladb/scylladb: test: add test for enforce_rack_list option db: mark rf_rack_valid_keyspaces as deprecated config: add enforce_rack_list option Revert "alternator: require rf_rack_valid_keyspaces when creating index"	2026-01-22 10:27:35 +02:00
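Two of the checks can be stated precisely from this description: startup validation, and rejecting an ALTER that leaves a numeric RF. They might look like this Python sketch. Replication is modeled, as an assumption, as a per-DC map whose value is either an integer RF or a list of rack names. The CREATE-time auto-extension to rack lists is not modeled, because the message does not say how racks are chosen.

```python
def has_numeric_rf(ks):
    """True if any DC's replication is a plain number rather than a rack list."""
    return any(isinstance(rf, int) for rf in ks["replication"].values())


def check_startup(keyspaces, enforce_rack_list):
    """Refuse to start if the option is on and a tablet keyspace uses numeric RF."""
    if not enforce_rack_list:
        return
    bad = [ks["name"] for ks in keyspaces if ks["tablets"] and has_numeric_rf(ks)]
    if bad:
        raise RuntimeError(f"enforce_rack_list is set, but these tablet keyspaces use numeric RF: {bad}")


def check_alter(new_ks, enforce_rack_list):
    """Reject an ALTER that leaves a numeric RF in any DC of a tablet keyspace."""
    if enforce_rack_list and new_ks["tablets"] and has_numeric_rf(new_ks):
        raise ValueError(f"keyspace {new_ks['name']}: numeric RF is not allowed with enforce_rack_list")
```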
Botond Dénes	4281d18c2e	Merge 'schema: Apply `sstable_compression_user_table_options` to CQL aux and Alternator tables' from Nikos Dragazis In PR `5b6570be52` we introduced the config option `sstable_compression_user_table_options` to allow adjusting the default compression settings for user tables. However, the new option was hooked into the CQL layer and applied only to CQL base tables, not to the whole spectrum of user tables: CQL auxiliary tables (materialized views, secondary indexes, CDC log tables), Alternator base tables, Alternator auxiliary tables (GSIs, LSIs, Streams). This gap also led to inconsistent default compression algorithms after we changed the option’s default algorithm from LZ4 to LZ4WithDicts (`adf9c426c2`). This series introduces a general “schema initializer” mechanism in `schema_builder` and uses it to apply the default compression settings uniformly across all user tables. This ensures that all base and aux tables take their default compression settings from config. Fixes #26914. Backport justification: LZ4WithDicts is the new default since 2025.4, but the config option exists since 2025.2. Based on severity, I suggest we backport only to 2025.4 to maintain consistency of the defaults. Closes scylladb/scylladb#27204 * github.com:scylladb/scylladb: db/config: Update sstable_compression_user_table_options description schema: Add initializer for compression defaults schema: Generalize static configurators into schema initializers schema: Initialize static properties eagerly db: config: Add accessor for sstable_compression_user_table_options test: Check that CQL and Alternator tables respect compression config	2026-01-22 06:50:48 +02:00
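A "schema initializer" registry of the kind described here can be sketched in Python as hooks that the builder runs for every schema it creates. As a result, base and auxiliary tables pick up the same defaults. `SchemaBuilder`, `register_initializer`, the `CONFIG` dictionary and its option keys are hypothetical simplifications, not ScyllaDB's `schema_builder` API.

```python
# Hypothetical stand-in for the sstable_compression_user_table_options setting.
CONFIG = {"sstable_compression_user_table_options": {"sstable_compression": "LZ4WithDicts"}}


class SchemaBuilder:
    _initializers = []  # hooks run for every schema this builder creates

    @classmethod
    def register_initializer(cls, fn):
        cls._initializers.append(fn)
        return fn

    def __init__(self, keyspace, name, is_system=False):
        self.props = {"keyspace": keyspace, "name": name, "is_system": is_system}
        for init in self._initializers:
            init(self)

    def build(self):
        return dict(self.props)


@SchemaBuilder.register_initializer
def default_compression(builder):
    # User tables (base or auxiliary: views, indexes, CDC logs, Alternator
    # tables) take their default compression from config.
    if not builder.props["is_system"]:
        builder.props["compression"] = dict(CONFIG["sstable_compression_user_table_options"])
```

Hooking the default into the builder, rather than into one statement type, is what lets every code path that creates a user table see the same setting.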
Petr Gusev	1f170d2566	strong consistency: implement select_statement::do_execute()	2026-01-21 14:56:01 +01:00
Petr Gusev	a5d611866e	cql: add select_statement.cc	2026-01-21 14:56:01 +01:00
Petr Gusev	ccf90cfde8	cql: add modification_statement We use decoration instead of inheritance, since inheritance already serves to differentiate statement types (modification_statement has update_statement and delete_statement as descendants). A better solution would likely involve refactoring modification_statement and extracting the mutation-generation logic into a reusable component shared by both eventual and strongly consistent statements.	2026-01-21 14:56:01 +01:00
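The decoration approach can be illustrated with a minimal Python sketch. One wrapper class reuses the mutation-building logic of any `modification_statement` subclass (update or delete) and changes only how the mutations are applied. All class names, the timeout value and the `submit_to_raft` callback are hypothetical.

```python
class ModificationStatement:
    """Hypothetical simplified base class; subclasses build different mutations."""

    def get_timeout(self):
        return 2.0  # seconds, illustrative value

    def build_mutations(self, keys):
        raise NotImplementedError


class UpdateStatement(ModificationStatement):
    def build_mutations(self, keys):
        return [("upsert", k) for k in keys]


class DeleteStatement(ModificationStatement):
    def build_mutations(self, keys):
        return [("tombstone", k) for k in keys]


class StronglyConsistentModification:
    """Decorator: wraps any ModificationStatement instead of subclassing it."""

    def __init__(self, inner, submit_to_raft):
        self._inner = inner
        self._submit_to_raft = submit_to_raft

    def execute(self, keys):
        # Reuse the wrapped statement's mutation building; only the apply path differs.
        mutations = self._inner.build_mutations(keys)
        return self._submit_to_raft(mutations, timeout=self._inner.get_timeout())
```

With inheritance, a strongly consistent variant would be needed for each of `UpdateStatement` and `DeleteStatement`. The wrapper needs only one class, at the cost of reaching into the wrapped statement's internals. Hence the commit's remark that extracting mutation generation into a shared component would be cleaner.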
Petr Gusev	989566e8a3	cql: add statement_helpers Introduce two helper methods that will be used for strongly consistent select_statement and modification_statement. redirect_statement() forwards the request to another shard or node. Currently, only shard forwarding is implemented; node-level proxying will be added in follow-up PRs. is_strongly_consistent() will be used in the prepare() method of raw statements to determine whether a strongly consistent statement should be created for the given CQL statement.	2026-01-21 14:56:01 +01:00
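A caller of something like `redirect_statement()` has to retry until the request reaches the right shard. The Python sketch below makes two assumptions. First, it hypothetically assumes that executing on the wrong shard returns a `Redirect` value naming the target. Second, it bounds the number of hops in case the target keeps changing between attempts. `Redirect` and `execute_with_redirects` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Redirect:
    """Hypothetical result meaning 'run this request on another shard'."""
    shard: int


def execute_with_redirects(execute_on, start_shard, max_hops=10):
    """Run on start_shard and follow redirects until a real result is returned."""
    shard = start_shard
    for _ in range(max_hops):
        result = execute_on(shard)
        if not isinstance(result, Redirect):
            return result
        shard = result.shard  # the target may differ on the next attempt
    raise RuntimeError(f"gave up after {max_hops} redirects")
```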
Petr Gusev	4413142f25	modification_statement: make get_timeout public We'll need to access this method in a new strong_consistency/modification_statement class.	2026-01-21 14:56:00 +01:00
Petr Gusev	cab3e1eea5	modification_statement: remove virtual where it's not needed This is a refactoring/simplification commit.	2026-01-21 14:56:00 +01:00
Petr Gusev	9015bed794	modification_statement: inline prepare_statement() This is a refactoring/simplification commit. There are many 'prepare' functions in this class that don't meaningfully differ from each other. The prepare_statement() adds accidental complexity by adding a level of indirection -- the reader has to jump between the call site and the function body to reconstruct the full picture.	2026-01-21 14:56:00 +01:00
Petr Gusev	6b0d757f28	cql: rename strongly_consistent statements to broadcast statements In preparation for upcoming work on strongly consistent queries in Scylla, this commit renames the existing `strongly_consistent` statements to `broadcast_statements` to avoid confusion. The old code paths are kept temporarily, as they may be useful for reference or for copying parts during the implementation of the new strongly consistent statements.	2026-01-21 14:56:00 +01:00


1962 Commits