scylladb

Author	SHA1	Message	Date
Piotr Dulikowski	d8b283e1fb	Merge 'Add CQL forwarding for strongly consistent tables' from Wojciech Mitros In this series we add support for forwarding strongly consistent CQL requests to suitable replicas, so that clients can issue reads/writes to any node and have the request executed on an appropriate tablet replica (and, for writes, on the Raft leader). We return the same CQL response the user would get by sending the request to the correct replica, and we perform the same logging/stats updates on the request coordinator as if the coordinator were the appropriate replica. The core mechanism of forwarding a strongly consistent request is sending an RPC containing the user's CQL request frame to the appropriate replica and receiving a ready, serialized `cql_transport::response` in return. We do this in the CQL server: it is best equipped to handle these types, and forwarding a request containing a CQL frame allows us to reuse near-top-level CQL request-handling methods (such as the general `process`) in the new RPC handler. To send the RPC, the CQL server needs to know which node it should forward the request to. This requires knowledge of the tablet Raft group members and leader. We obtain this information while executing a `cql3/strong_consistency` statement and return it to the CQL server using the generalized `bounce_to_shard` `result_message`, which can now hold either a target shard or a specific target replica. As with `bounce_to_shard`, we need to handle this `result_message` in a loop, because a replica may move during statement execution or the Raft leader may change.
We also use it to forward strongly consistent writes when the coordinator is not a member of the affected tablet's Raft group. In that case the statement is forwarded twice: first to any replica of the affected tablet, which finds the leader and returns this information to the coordinator, and then to the leader itself. This feature also passes through exception messages raised on the target replica while executing the statement. For that, many `cql_transport::cql_server::connection` methods that create error responses had to be moved to `cql_transport::cql_server`. For final exception handling on the coordinator, we added extra error information to the RPC response, so that the coordinator can handle the error without having the `result_message::exception` or the `exception_ptr` itself. Fixes [SCYLLADB-71](https://scylladb.atlassian.net/browse/SCYLLADB-71) Closes scylladb/scylladb#27517 * github.com:scylladb/scylladb: test: add tests for CQL forwarding transport: enable CQL forwarding for strong consistency statements transport: add remote statement preparation for CQL forwarding transport: handle redirect responses in CQL forwarding transport: add exception handling for forwarded CQL requests transport: add basic CQL request forwarding idl: add a representation of client_state for forwarding cql_server: handle query, execute, batch in one case transport: inline process_on_shard in cql_server::process transport: extract process() to cql_server transport: add messaging_service to cql_server transport: add response reconstruction helpers for forwarding transport: generalize the bounce result message for bouncing to other nodes strong consistency: redirect requests to live replicas from the same rack
transport: pass foreign_ptr into sleep_until_timeout_passes and move it to cql_server transport: extract the error handling from process_request_one transport: move error response helpers from connection to cql_server	2026-03-13 15:03:10 +01:00
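The redirect loop described in this merge can be illustrated with a minimal standalone C++ sketch. It is not written against ScyllaDB's Seastar-based types: `host_id` is a plain string, `execute_fn` is a hypothetical stand-in for the forwarding RPC, and the `max_hops` cap is an assumption of this sketch that the merge message does not mention. It shows why the coordinator has to loop: a write sent to a node outside the tablet's Raft group may bounce once to a replica and once more to the leader.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical stand-ins, not ScyllaDB types.
using host_id = std::string;
struct executed { std::string response; };  // serialized CQL response
struct redirect { host_id target; };        // "bounce": retry on this node
using outcome = std::variant<executed, redirect>;

// Hypothetical transport hook: runs the request on `node`.
using execute_fn = std::function<outcome(const host_id& node)>;

// Follow redirects until some node executes the request. A request can be
// redirected more than once: the tablet replica may have moved, the Raft
// leader may have changed, or (for writes) the first hop only reported
// who the leader is.
inline std::string forward_until_executed(host_id node, const execute_fn& execute,
                                          int max_hops = 8) {
    // max_hops is an assumption of this sketch, to avoid looping forever.
    for (int hop = 0; hop < max_hops; ++hop) {
        outcome out = execute(node);
        if (const auto* done = std::get_if<executed>(&out)) {
            return done->response;
        }
        node = std::get<redirect>(out).target;
    }
    throw std::runtime_error("too many redirects");
}
```

Because each hop returns either a finished response or a better target, the coordinator never needs up-front knowledge of the Raft group topology.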
Wojciech Mitros	e44820ba1f	transport: generalize the bounce result message for bouncing to other nodes In the following patches, we'll start forwarding requests to strongly consistent tables so that they are executed on suitable tablet Raft group members. For that, we'll reuse the approach we already have for bouncing requests to other shards: we'll try to execute a request locally, and the result will be a bounce message with another replica as the target. In this patch we generalize the former bounce_to_shard result message so that it can specify either another shard or a specific replica as the bounce target. We also rename it to result_message::bounce so that it no longer implies that only another shard may be its target. Aside from the host_id and the shard, the new message also includes the timeout, because the service handling the forwarding won't have access to it, and it's needed to determine how long we should wait for the forwarded request. It also records whether this is a write request, so that the correct timeout response can be returned if the deadline is exceeded. We will return other hosts in the new bounce message when a request to a strongly consistent table can't be handled locally because this node isn't a suitable replica. We can't handle this message yet, so nothing returns it for now, and we still assume that every bounce message targets the same host.	2026-03-12 17:48:57 +01:00
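The generalized bounce can be pictured as a small value type carrying the fields the commit message lists: target host, target shard, timeout, and whether the request is a write. The sketch below is hypothetical; the field names, the use of `std::string` for `host_id`, and the `write_timeout`/`read_timeout` labels (modeled on the CQL protocol's Write_timeout and Read_timeout errors) are illustrative assumptions, not ScyllaDB's actual `result_message::bounce` definition.

```cpp
#include <cassert>
#include <chrono>
#include <string>

using host_id = std::string;   // hypothetical stand-in for locator::host_id
using shard_id = unsigned;
using clock_type = std::chrono::steady_clock;

// Hypothetical shape of a generalized bounce: the target is a (host, shard)
// pair, so "another shard on this node" is just the case where the host is
// the local one.
struct bounce {
    host_id target_host;
    shard_id target_shard;
    clock_type::time_point timeout;  // deadline for the forwarded request
    bool is_write;                   // selects the write vs read timeout error

    bool is_local(const host_id& self) const { return target_host == self; }
};

// Which timeout error to report if the deadline passes before a reply arrives.
inline const char* timeout_error_kind(const bounce& b) {
    return b.is_write ? "write_timeout" : "read_timeout";
}
```

Carrying the deadline inside the message lets whichever component performs the forwarding bound its wait without access to the original query options.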
Marcin Maliszkiewicz	b277d9d9aa	cql3: track CQL parsing memory cost and use it for admission control Use rolling_max_tracker to record the gross bytes allocated during each CQL parse. The rolling maximum is then added to the memory estimate for incoming QUERY and PREPARE requests, so that admission control in the CQL transport layer accounts for parsing overhead. The measured memory footprint serves as an upper bound rather than an exact number; its purpose is to prevent OOMs under a heavy load of unprepared statements. In a benchmark on a node with 1G of memory, peak non-LSA memory usage decreased from 320MB (our coordinator budget is 10% of 1G) to 96MB, while throughput dropped from 1.2 kops to 0.8 kops. The throughput drop is expected, as memory admission kicks in to prevent OOM.	2026-03-12 10:16:10 +01:00
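The commit does not describe `rolling_max_tracker`'s internals, so the following is only a sketch of one standard way to keep a rolling maximum: a monotonic deque over the last `window` samples, with amortized O(1) work per sample. The class name `rolling_max_sketch`, the sample-count window, and `estimate_request_memory` are hypothetical; ScyllaDB's tracker may define its window differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>

// Hypothetical rolling maximum over the last `window` recorded samples.
class rolling_max_sketch {
    std::size_t _window;
    std::uint64_t _next_seq = 0;
    // (sequence number, value) pairs with strictly decreasing values;
    // the front is always the maximum of the current window.
    std::deque<std::pair<std::uint64_t, std::size_t>> _q;
public:
    explicit rolling_max_sketch(std::size_t window) : _window(window > 0 ? window : 1) {}

    void record(std::size_t bytes) {
        while (!_q.empty() && _q.back().second <= bytes) {
            _q.pop_back();  // dominated by the new sample, can never be the max
        }
        const std::uint64_t seq = _next_seq++;
        _q.emplace_back(seq, bytes);
        while (_q.front().first + _window <= seq) {
            _q.pop_front();  // fell out of the window
        }
    }

    std::size_t get() const { return _q.empty() ? 0 : _q.front().second; }
};

// Hypothetical admission estimate: base request cost plus worst recent parse cost.
inline std::size_t estimate_request_memory(std::size_t base_estimate,
                                           const rolling_max_sketch& parse_cost) {
    return base_estimate + parse_cost.get();
}
```

Using the maximum rather than an average is what makes the estimate an upper bound: a single expensive parse raises the charge for every incoming request until it ages out of the window.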
Gleb Natapov	1d188f0394	auth: remove legacy auth mode and upgrade code A system needs to be upgraded to use v2 auth before moving to this ScyllaDB version; otherwise, boot will fail.	2026-03-10 10:09:39 +02:00
Andrzej Jackowski	bb359b3b78	cql3: start using write CL guardrails Enable verification of write consistency level guardrails in `modification_statement` and `batch_statement`. Neither guardrail is enabled by default, so as not to disrupt clusters that are currently using any of the CLs for writes. The warning guardrail may seem harmless, as it only adds a warning to the CQL response; however, enabling it can significantly increase network traffic (as a warning message is added to each response) and also decrease throughput due to additional allocations required to prepare the warning. Therefore, both guardrails should be enabled with care. The newly added `writes_per_consistency_level` metric, which is incremented unconditionally, can help decide whether a guardrail can be safely enabled in an existing cluster. This commit adds additional `if` instructions on the critical path. However, based on the `perf_simple_query` benchmark for writes, the difference is marginal (~40 additional instructions, which is a relative difference smaller than 0.001). 
BEFORE:
```
291443.35 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.2 tasks/op, 48067 insns/op, 18885 cycles/op, 0 errors)
throughput: mean= 289743.07 standard-deviation=6075.60 median= 291424.69 median-absolute-deviation=1702.56 maximum=292498.27 minimum=261920.06
instructions_per_op: mean= 48072.30 standard-deviation=21.15 median= 48074.49 median-absolute-deviation=12.07 maximum=48119.87 minimum=48019.89
cpu_cycles_per_op: mean= 18884.09 standard-deviation=56.43 median= 18877.33 median-absolute-deviation=14.71 maximum=19155.48 minimum=18821.57
```
AFTER:
```
290108.83 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.2 tasks/op, 48121 insns/op, 18988 cycles/op, 0 errors)
throughput: mean= 289105.08 standard-deviation=3626.58 median= 290018.90 median-absolute-deviation=1072.25 maximum=291110.44 minimum=274669.98
instructions_per_op: mean= 48117.57 standard-deviation=18.58 median= 48114.51 median-absolute-deviation=12.08 maximum=48162.18 minimum=48087.18
cpu_cycles_per_op: mean= 18953.43 standard-deviation=28.76 median= 18945.82 median-absolute-deviation=20.84 maximum=19023.93 minimum=18916.46
```
Fixes: SCYLLADB-259	2026-03-04 07:26:00 +01:00
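The guardrail check added to `modification_statement` and `batch_statement` can be sketched as two sets of consistency levels (the enum_sets backing `write_consistency_levels_warned` and `write_consistency_levels_disallowed`) plus an unconditional per-CL counter. This is a hypothetical model: `std::bitset` stands in for ScyllaDB's `enum_set`, the enum is a simplified stand-in for `db::consistency_level`, and the exception type is illustrative. With both sets empty, as by default, the check only increments the counter.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for db::consistency_level; `count` is a sentinel.
enum class consistency_level {
    ANY, ONE, TWO, THREE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM,
    SERIAL, LOCAL_SERIAL, LOCAL_ONE, count
};

inline std::size_t idx(consistency_level cl) { return static_cast<std::size_t>(cl); }

// A fixed-size bit set plays the role of an enum_set of consistency levels.
using cl_set = std::bitset<static_cast<std::size_t>(consistency_level::count)>;

struct write_cl_guardrails {
    cl_set warned;      // mirrors write_consistency_levels_warned
    cl_set disallowed;  // mirrors write_consistency_levels_disallowed
    std::vector<unsigned long> writes_per_cl =
        std::vector<unsigned long>(idx(consistency_level::count), 0UL);

    // Returns true if the response should carry a warning; throws if the
    // consistency level is disallowed for writes.
    bool check(consistency_level cl) {
        ++writes_per_cl[idx(cl)];  // counted unconditionally, like the new metric
        if (disallowed.test(idx(cl))) {
            throw std::invalid_argument("write consistency level rejected by guardrail");
        }
        return warned.test(idx(cl));
    }
};
```

The per-CL counter is what lets an operator see which consistency levels are actually in use before turning either guardrail on.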
Andrzej Jackowski	3606934458	db: cql3/query_processor: add write_consistency_levels enum_sets Add enum_sets to query_processor that track the configuration values of `write_consistency_levels_warned` and `write_consistency_levels_disallowed`. Refs: SCYLLADB-259	2026-03-03 20:28:57 +01:00
Patryk Jędrzejczak	4e984139b2	Merge 'strongly consistent tables: basic implementation' from Petr Gusev In this PR we add a basic implementation of strongly consistent tables: * generate a raft group id when a strongly consistent table is created * persist it into the system.tables table * start raft groups on replicas when a strongly consistent tablet_map reaches them * add a strongly consistent version of the storage_proxy, with the `query` and `mutate` methods * the `mutate` method submits a command to the tablet's raft group; the `query` method reads the data using `raft.read_barrier()` * strongly consistent versions of `select_statement` and `modification_statement` are added * a basic `test_strong_consistency.py/test_basic_write_read` test is added to check that we can write and read data in a strongly consistent fashion. Limitations: * for now, strongly consistent tables can have tablets only on shard zero. This is because we (ab/re)use the existing raft system tables, which live only on shard0. In the next PRs we'll create separate tables for the new tablet raft groups. * No Scylla-side proxying: the test has to figure out who the leader is and submit the command to the right node. This will be fixed separately. * No tablet balancing: migration/split/merges require separate, complicated code. The new behavior is hidden behind the `STRONGLY_CONSISTENT_TABLES` feature, which is enabled when the `STRONGLY_CONSISTENT_TABLES` experimental feature flag is set. Requirements, specs and a general overview of the feature can be found [here](https://scylladb.atlassian.net/wiki/spaces/RND/pages/91422722/Strong+Consistency).
The short-term implementation plan is [here](https://docs.google.com/document/d/1afKeeHaCkKxER7IThHkaAQlh2JWpbqhFLIQ3CzmiXhI/edit?tab=t.0#heading=h.thkorgfek290). You can check strongly consistent writes and reads locally via cqlsh:
scylla.yaml:
```
experimental_features:
  - strongly-consistent-tables
```
cqlsh:
```
CREATE KEYSPACE IF NOT EXISTS my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 1} AND tablets = {'initial': 1} AND consistency = 'local';
CREATE TABLE my_ks.test (pk int PRIMARY KEY, c int);
INSERT INTO my_ks.test (pk, c) VALUES (10, 20);
SELECT * FROM my_ks.test WHERE pk = 10;
```
Fixes SCYLLADB-34 Fixes SCYLLADB-32 Fixes SCYLLADB-31 Fixes SCYLLADB-33 Fixes SCYLLADB-56 backport: no need Closes scylladb/scylladb#27614 * https://github.com/scylladb/scylladb: test_encryption: capture stderr test/cluster: add test_strong_consistency.py raft_group_registry: disable metrics for non-0 groups strong consistency: implement select_statement::do_execute() cql: add select_statement.cc strong consistency: implement coordinator::query() cql: add modification_statement cql: add statement_helpers strong consistency: implement coordinator::mutate() raft.hh: make server::wait_for_leader() public strong_consistency: add coordinator modification_statement: make get_timeout public strong_consistency: add groups_manager strong_consistency: add state_machine and raft_command table: add get_max_timestamp_for_tablet tablets: generate raft group_id-s for new table tablet_replication_strategy: add consistency field tablets: add raft_group_id modification_statement: remove virtual where it's not needed modification_statement: inline prepare_statement() system_keyspace: disable tablet_balancing for strongly_consistent_tables cql: rename strongly_consistent statements to broadcast statements	2026-01-23 09:52:33 +01:00
Petr Gusev	7d111f2396	strong_consistency: add coordinator Add the `coordinator` class, which will be responsible for coordinating reads and writes to strongly consistent tables. This commit includes only the boilerplate; the methods will be implemented in separate commits.	2026-01-21 14:56:01 +01:00
Piotr Smaron	3ca6b59f80	cql: extend `query_internal` with `query_state` param This will later be used to pass a query timeout via `qs` to `query_internal`.	2026-01-20 09:15:48 +01:00
Aleksandra Martyniuk	2e7ba1f8ce	cql3: reject concurrent alter of the same keyspace Reject an ALTER KEYSPACE request if there is an unfinished (queued, pending, or paused) alter request for the same keyspace. This is required because, with the following changes, the global request queue will contain RF change requests that are meant to be resumed.	2025-12-16 13:27:48 +01:00
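The rejection rule can be sketched as a scan over pending requests before a new ALTER KEYSPACE is accepted. Everything here is hypothetical: the `request_state` enum, the `alter_request` record, and the flat vector are simplified stand-ins for ScyllaDB's global request queue, which this sketch does not model.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical request lifecycle; only `done` counts as finished.
enum class request_state { queued, pending, paused, done };

struct alter_request {
    std::string keyspace;
    request_state state;
};

// Throws if an unfinished (queued, pending or paused) alter of `ks` exists.
inline void check_no_unfinished_alter(const std::vector<alter_request>& queue,
                                      const std::string& ks) {
    for (const auto& r : queue) {
        if (r.keyspace == ks && r.state != request_state::done) {
            throw std::runtime_error("unfinished ALTER KEYSPACE already exists for " + ks);
        }
    }
}
```

Treating `paused` as unfinished matters here: a paused RF change is expected to resume later, so a second alter of the same keyspace would race with it.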
Ernest Zaslavsky	5ba5aec1f8	treewide: Move mutation related files to a `mutation` directory As requested in #22104, moved the files and fixed the affected includes and the build system. Moved files: - combine.hh - collection_mutation.hh - collection_mutation.cc - converting_mutation_partition_applier.hh - converting_mutation_partition_applier.cc - counters.hh - counters.cc - timestamp.hh Fixes: #22104 This is a cleanup, no need to backport Closes scylladb/scylladb#25085	2025-09-24 13:23:38 +03:00
Karol Nowacki	eae71d3e91	vector_store_client: Move to vector_search module Vector search related implementation moved to a new module vector_search. As the vector search functionality is going to be extended, it is better to keep it in a separate module.	2025-09-22 08:01:47 +02:00
Gleb Natapov	041011b2ee	qp: fold prepare_one function into its only caller	2025-07-31 14:12:34 +03:00
Gleb Natapov	715f1d994f	qp: co-routinize prepare_one function	2025-07-31 14:11:17 +03:00
Petr Gusev	ff1caa9798	query_options: add node_local_only mode We want to access the paxos state table only on the local node and shard (or shards, in the case of intranode_migration). In this commit we add a node_local_only flag to query_options, which makes that possible. This flag can be set for a query via make_internal_options. We handle this flag on the statements layer by forwarding it to either coordinator_query_options or coordinator_mutate_options.	2025-07-24 19:48:08 +02:00
Petr Gusev	7eb198f2cc	storage_proxy: introduce node_local_only flag Add a per-request flag that restricts query execution to the local node by filtering out all non-local replicas. Standard consistency level (CL) rules still apply: if the local node alone cannot satisfy the requested CL, an exception is thrown. This flag is required for Paxos state access, where reads and writes must target only the local node. As a side effect, this also enables the implementation of scylladb/scylladb#16478, which proposes a CQL extension to expose 'local mode' query execution to users. Support for this flag in storage_proxy's read and write code paths will be added in follow-up commits.	2025-07-24 19:48:08 +02:00
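The node_local_only behavior amounts to filtering the replica list down to the local node and then applying the usual consistency-level arithmetic to what remains. The sketch below is hypothetical: the function names, the string `host_id`, and the use of `std::runtime_error` (ScyllaDB would raise its own unavailable-style exception) are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using host_id = std::string;  // hypothetical stand-in

// Drop every replica except the local node when node_local_only is set.
inline std::vector<host_id> filter_replicas(std::vector<host_id> replicas,
                                            const host_id& self, bool node_local_only) {
    if (node_local_only) {
        replicas.erase(std::remove_if(replicas.begin(), replicas.end(),
                                      [&](const host_id& h) { return h != self; }),
                       replicas.end());
    }
    return replicas;
}

// Standard CL rule: fail if fewer replicas remain than the CL must block for.
inline void check_cl(const std::vector<host_id>& replicas, std::size_t block_for) {
    if (replicas.size() < block_for) {
        throw std::runtime_error("cannot achieve consistency level");
    }
}
```

Keeping the CL check after the filter is what preserves the "standard CL rules still apply" guarantee: a QUORUM request with RF=3 cannot be satisfied by the local node alone.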
Petr Gusev	6caa1ae649	qp: make make_internal_options public In upcoming commits, we will switch paxos_store from using internal queries to regular prepared queries, so that prepared statements are correctly updated when the base table is recreated. To support this, we want to reuse the logic for converting parameters from vector<data_value_or_unset> to raw_value_vector_with_unset. This commit makes make_internal_options public to enable that reuse.	2025-07-24 16:39:50 +02:00
Avi Kivity	6fce817aa8	Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz This change prepares the ground for unifying state updates for raft-bound subsystems. It introduces schema_applier, which in the future will become a generic interface for applying mutations in raft. Pulling database::apply() out of the schema merging code will allow batching changes to subsystems. Future generic code will first call prepare() on all implementations, then a single database::apply(), and then update() on all implementations; then, on each shard, it will call commit() for all implementations without preemption, so that the change is observed as atomic across all subsystems, and finally post_commit(). Backport: no, it's a new feature Fixes: https://github.com/scylladb/scylladb/issues/19649 Fixes https://github.com/scylladb/scylladb/issues/24531 Closes scylladb/scylladb#24886 [avi: adjust for std::vector<mutations> -> utils::chunked_vector<mutations>] * github.com:scylladb/scylladb: test: add type creation to test_snapshot storage_service: always wake up load balancer on update tablet metadata db: schema_applier: call destroy also when exception occurs db: replica: simplify seeding ERM during shema change db: remove cleanup from add_column_family db: abort on exception during schema commit phase db: make user defined types changes atomic replica: db: make keyspace schema changes atomic db: atomically apply changes to tables and views replica: make truncate_table_on_all_shards get whole schema from table_shards service: split update_tablet_metadata into two phases service: pull out update_tablet_metadata from migration_listener db: service: add store_service dependency to schema_applier service: simplify load_tablet_metadata and update_tablet_metadata db: don't perform move on tablet_hint reference replica: split add_column_family_and_make_directory into steps replica: db: split drop_table into steps db: don't move map references in merge_tables_and_views() db: 
introduce commit_on_shard function db: access types during schema merge via special storage replica: make non-preemptive keyspace create/update/delete functions public replica: split update keyspace into two phases replica: split creating keyspace into two functions db: rename create_keyspace_from_schema_partition db: decouple functions and aggregates schema change notification from merging code db: store functions and aggregates change batch in schema_applier db: decouple tables and views schema change notifications from merging code db: store tables and views schema diff in schema_applier db: decouple user type schema change notifications from types merging code service: unify keyspace notification functions arguments db: replica: decouple keyspace schema change notifications to a separate function db: add class encapsulating schema merging	2025-07-13 20:47:55 +03:00
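The phase ordering described in this merge (prepare, a single database::apply(), update, per-shard commit, post_commit) can be sketched as a small driver over an abstract interface. The `applier` interface and `apply_schema_change` are hypothetical names; the real schema_applier runs the commit step on each Seastar shard without preemption, whereas this sketch just iterates over shard numbers sequentially.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-subsystem applier with the phases named in the merge message.
struct applier {
    virtual ~applier() = default;
    virtual void prepare() = 0;               // compute the change, no visible effects
    virtual void update() = 0;                // runs after the single database::apply()
    virtual void commit(unsigned shard) = 0;  // make the change visible on one shard
    virtual void post_commit() = 0;           // notifications and cleanup
};

inline void apply_schema_change(const std::vector<applier*>& appliers,
                                const std::function<void()>& database_apply,
                                unsigned shard_count) {
    for (auto* a : appliers) a->prepare();
    database_apply();  // one apply for the whole batch of subsystems
    for (auto* a : appliers) a->update();
    for (unsigned shard = 0; shard < shard_count; ++shard) {
        // Per the merge message, all commits for a shard happen without
        // preemption, so every subsystem's change becomes visible together.
        for (auto* a : appliers) a->commit(shard);
    }
    for (auto* a : appliers) a->post_commit();
}
```

Grouping all subsystems' commits per shard, rather than committing one subsystem across all shards at a time, is what makes a schema change appear atomic to code running on that shard.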
Benny Halevy	3feb759943	everywhere: use utils::chunked_vector for list of mutations Currently, we use std::vector<*mutation> to keep a list of mutations for processing. This can lead to large allocations, e.g. when the vector size is a function of the number of tables. Use a chunked vector instead to prevent oversized allocations. `perf-simple-query --smp 1` results obtained at a fixed 400MHz frequency with PGO disabled:
Before (read path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
89055.97 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39417 insns/op, 18003 cycles/op, 0 errors)
103372.72 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39380 insns/op, 17300 cycles/op, 0 errors)
98942.27 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39413 insns/op, 17336 cycles/op, 0 errors)
103752.93 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39407 insns/op, 17252 cycles/op, 0 errors)
102516.77 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39403 insns/op, 17288 cycles/op, 0 errors)
throughput: mean= 99528.13 standard-deviation=6155.71 median= 102516.77 median-absolute-deviation=3844.59 maximum=103752.93 minimum=89055.97
instructions_per_op: mean= 39403.99 standard-deviation=14.25 median= 39406.75 median-absolute-deviation=9.30 maximum=39416.63 minimum=39380.39
cpu_cycles_per_op: mean= 17435.81 standard-deviation=318.24 median= 17300.40 median-absolute-deviation=147.59 maximum=18002.53 minimum=17251.75
```
After (read path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
59755.04 tps ( 66.2 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39466 insns/op, 22834 cycles/op, 0 errors)
71854.16 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39417 insns/op, 17883 cycles/op, 0 errors)
82149.45 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39411 insns/op, 17409 cycles/op, 0 errors)
49640.04 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.3 tasks/op, 39474 insns/op, 19975 cycles/op, 0 errors)
54963.22 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.3 tasks/op, 39474 insns/op, 18235 cycles/op, 0 errors)
throughput: mean= 63672.38 standard-deviation=13195.12 median= 59755.04 median-absolute-deviation=8709.16 maximum=82149.45 minimum=49640.04
instructions_per_op: mean= 39448.38 standard-deviation=31.60 median= 39466.17 median-absolute-deviation=25.75 maximum=39474.12 minimum=39411.42
cpu_cycles_per_op: mean= 19267.01 standard-deviation=2217.03 median= 18234.80 median-absolute-deviation=1384.25 maximum=22834.26 minimum=17408.67
```
`perf-simple-query --smp 1 --write` results obtained at a fixed 400MHz frequency with PGO disabled:
Before (write path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no}
Disabling auto compaction
63736.96 tps ( 59.4 allocs/op, 16.4 logallocs/op, 14.3 tasks/op, 49667 insns/op, 19924 cycles/op, 0 errors)
64109.41 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 49992 insns/op, 20084 cycles/op, 0 errors)
56950.47 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50005 insns/op, 20501 cycles/op, 0 errors)
44858.42 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50014 insns/op, 21947 cycles/op, 0 errors)
28592.87 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50027 insns/op, 27659 cycles/op, 0 errors)
throughput: mean= 51649.63 standard-deviation=15059.74 median= 56950.47 median-absolute-deviation=12087.33 maximum=64109.41 minimum=28592.87
instructions_per_op: mean= 49941.18 standard-deviation=153.76 median= 50005.24 median-absolute-deviation=73.01 maximum=50027.07 minimum=49667.05
cpu_cycles_per_op: mean= 22023.01 standard-deviation=3249.92 median= 20500.74 median-absolute-deviation=1938.76 maximum=27658.75 minimum=19924.32
```
After (write path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no}
Disabling auto compaction
53395.93 tps ( 59.4 allocs/op, 16.5 logallocs/op, 14.3 tasks/op, 50326 insns/op, 21252 cycles/op, 0 errors)
46527.83 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50704 insns/op, 21555 cycles/op, 0 errors)
55846.30 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50731 insns/op, 21060 cycles/op, 0 errors)
55669.30 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50735 insns/op, 21521 cycles/op, 0 errors)
52130.17 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50757 insns/op, 21334 cycles/op, 0 errors)
throughput: mean= 52713.91 standard-deviation=3795.38 median= 53395.93 median-absolute-deviation=2955.40 maximum=55846.30 minimum=46527.83
instructions_per_op: mean= 50650.57 standard-deviation=182.46 median= 50731.38 median-absolute-deviation=84.09 maximum=50756.62 minimum=50325.87
cpu_cycles_per_op: mean= 21344.42 standard-deviation=202.86 median= 21334.00 median-absolute-deviation=176.37 maximum=21554.61 minimum=21060.24
```
Fixes #24815 Improvement for rare corner cases. No backport required Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#24919	2025-07-13 19:13:11 +03:00
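The idea behind switching to a chunked vector is that elements live in fixed-size chunks, so appending never requires one large contiguous allocation proportional to the total element count. The class below, `chunked_vector_sketch`, is a hypothetical minimal illustration and not the API of `utils::chunked_vector`, which sizes its chunks by a maximum contiguous allocation rather than a fixed element count.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal chunked vector: each inner vector holds at most
// ChunkSize elements, so element storage is allocated in bounded pieces.
// Only the outer table of chunk headers grows contiguously.
template <typename T, std::size_t ChunkSize = 128>
class chunked_vector_sketch {
    std::vector<std::vector<T>> _chunks;
    std::size_t _size = 0;
public:
    void push_back(T value) {
        if (_size % ChunkSize == 0) {
            _chunks.emplace_back();
            _chunks.back().reserve(ChunkSize);  // one bounded allocation per chunk
        }
        _chunks.back().push_back(std::move(value));
        ++_size;
    }
    T& operator[](std::size_t i) { return _chunks[i / ChunkSize][i % ChunkSize]; }
    const T& operator[](std::size_t i) const { return _chunks[i / ChunkSize][i % ChunkSize]; }
    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};
```

The trade-off is an extra division and indirection on every index, in exchange for never asking the allocator for a block whose size scales with, for example, the number of tables.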
Marcin Maliszkiewicz	2f840e51d1	service: pull out update_tablet_metadata from migration_listener This is not a good use of the listener, as there is only one non-empty implementation. Also, the following commit changes it further in a way that makes it incompatible with the listener code.	2025-07-10 10:40:43 +02:00
Pawel Pery	7bf53fc908	vector_store_client: implement initial vector_store_client service This patch is part of the vector_store_client sharded service implementation for communicating with the vector-store service. It adds a `services/vector_store_client.{cc|hh}` sharded service and a configuration parameter `vector_store_uri` in the `http://vector-store.dns.name:port` format. If parsing that parameter fails, an exception is thrown during construction. For future unit-testing purposes, the patch adds `vector_store_client_tester` as a way to inject mock functionality. This service will be used by select statements for Vector search indexes (see VS-46). For this reason, I've added the vector_store_client service to the query processor. Reference: VS-47 VS-45	2025-07-08 16:29:55 +02:00
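Validation of the `http://vector-store.dns.name:port` format, failing with an exception on malformed input, can be sketched as below. The `vector_store_endpoint` struct and `parse_vector_store_uri` function are hypothetical; the real service may accept a different set of URIs (for example, trailing paths), which this sketch deliberately rejects.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

struct vector_store_endpoint {
    std::string host;
    unsigned port;
};

// Hypothetical parser: accepts only "http://<host>:<port>", throws otherwise.
inline vector_store_endpoint parse_vector_store_uri(const std::string& uri) {
    const std::string scheme = "http://";
    if (uri.compare(0, scheme.size(), scheme) != 0) {
        throw std::invalid_argument("vector_store_uri: expected an http:// URI");
    }
    const std::string rest = uri.substr(scheme.size());
    const auto colon = rest.rfind(':');
    if (colon == std::string::npos || colon == 0) {
        throw std::invalid_argument("vector_store_uri: expected host:port");
    }
    const std::string host = rest.substr(0, colon);
    const std::string port_str = rest.substr(colon + 1);
    if (port_str.empty() || port_str.size() > 5) {
        throw std::invalid_argument("vector_store_uri: invalid port");
    }
    unsigned port = 0;
    for (char c : port_str) {
        if (c < '0' || c > '9') {
            throw std::invalid_argument("vector_store_uri: port must be numeric");
        }
        port = port * 10 + static_cast<unsigned>(c - '0');
    }
    if (port == 0 || port > 65535) {
        throw std::invalid_argument("vector_store_uri: port out of range");
    }
    return {host, port};
}
```

Parsing eagerly at construction time surfaces a misconfigured `vector_store_uri` at startup instead of on the first vector search query. The port 6080 used in the checks is arbitrary.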
Avi Kivity	cd79a8fc25	Revert "Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz" This reverts commit `0b516da95b`, reversing changes made to `30199552ac`. It breaks cluster.random_failures.test_random_failures.test_random_failures in debug mode (at least). Fixes #24513	2025-06-16 22:38:12 +03:00
Tomasz Grabiec	0b516da95b	Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz This change prepares the ground for unifying state updates for raft-bound subsystems. It introduces schema_applier, which in the future will become a generic interface for applying mutations in raft. Pulling `database::apply()` out of the schema merging code will allow batching changes to subsystems. Future generic code will first call `prepare()` on all implementations, then a single `database::apply()`, and then `update()` on all implementations; then, on each shard, it will call `commit()` for all implementations without preemption, so that the change is observed as atomic across all subsystems, and finally `post_commit()`. Backport: no, it's a new feature Fixes: https://github.com/scylladb/scylladb/issues/19649 Closes scylladb/scylladb#20853 * github.com:scylladb/scylladb: storage_service: always wake up load balancer on update tablet metadata db: schema_applier: call destroy also when exception occurs db: replica: simplify seeding ERM during shema change db: remove cleanup from add_column_family db: abort on exception during schema commit phase db: make user defined types changes atomic replica: db: make keyspace schema changes atomic db: atomically apply changes to tables and views replica: make truncate_table_on_all_shards get whole schema from table_shards service: split update_tablet_metadata into two phases service: pull out update_tablet_metadata from migration_listener db: service: add store_service dependency to schema_applier service: simplify load_tablet_metadata and update_tablet_metadata db: don't perform move on tablet_hint reference replica: split add_column_family_and_make_directory into steps replica: db: split drop_table into steps db: don't move map references in merge_tables_and_views() db: introduce commit_on_shard function db: access types during schema merge via special storage replica: make non-preemptive keyspace create/update/delete functions 
public replica: split update keyspace into two phases replica: split creating keyspace into two functions db: rename create_keyspace_from_schema_partition db: decouple functions and aggregates schema change notification from merging code db: store functions and aggregates change batch in schema_applier db: decouple tables and views schema change notifications from merging code db: store tables and views schema diff in schema_applier db: decouple user type schema change notifications from types merging code service: unify keyspace notification functions arguments db: replica: decouple keyspace schema change notifications to a separate function db: add class encapsulating schema merging	2025-06-10 13:45:32 +02:00
Marcin Maliszkiewicz	21a5a3c01f	service: pull out update_tablet_metadata from migration_listener This is not a good use of the listener, as there is only one non-empty implementation. Also, the following commit changes it further in a way that makes it incompatible with the listener code.	2025-06-06 08:50:33 +02:00
Andrzej Jackowski	086df24555	transport: implement SCYLLA_USE_METADATA_ID support Metadata id was introduced in CQLv5 to keep prepared statement metadata consistent between the driver and the database. This commit introduces a protocol extension that allows using the same mechanism in CQLv4. This change: - Introduce the SCYLLA_USE_METADATA_ID protocol extension for CQLv4 - Introduce the METADATA_CHANGED flag in RESULT. The flag comes directly from the CQLv5 binary protocol. In CQLv4, the bit was never used, so we assume it is safe to reuse it. - Implement handling of metadata_id and METADATA_CHANGED in RESULT rows - Implement returning metadata_id in RESULT prepared - Implement reading metadata_id from EXECUTE - Add a description of SCYLLA_USE_METADATA_ID to the documentation Metadata_id is wrapped in cql_metadata_id_wrapper because we need to distinguish the following situations: - Metadata_id is not supported by the protocol (e.g. CQLv4 without the extension is used) - Metadata_id is supported by the protocol but not set, e.g. when a PREPARE query is being handled: it doesn't contain metadata_id in the request, but the reply (RESULT prepared) must contain metadata_id - Metadata_id is supported by the protocol and set; any number of bytes >= 0 is allowed, according to the CQLv5 protocol specification Fixes scylladb/scylladb#20860	2025-05-14 09:59:16 +02:00
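The three states the wrapper must distinguish (unsupported, supported but unset, supported and set, where an empty id is still a valid id) map naturally onto a flag plus `std::optional`. The class `metadata_id_state` and the `metadata_changed` helper below are hypothetical illustrations, not the actual `cql_metadata_id_wrapper`; the helper models the CQLv5 rule that a RESULT Rows response sets METADATA_CHANGED when the id sent in EXECUTE differs from the statement's current one.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model of the three metadata_id states described above.
class metadata_id_state {
    bool _supported = false;
    std::optional<std::string> _id;  // raw bytes; an empty string is a valid id
public:
    static metadata_id_state unsupported() { return metadata_id_state{}; }
    static metadata_id_state supported_but_unset() {
        metadata_id_state s;
        s._supported = true;
        return s;
    }
    static metadata_id_state with_id(std::string id) {
        metadata_id_state s = supported_but_unset();
        s._id = std::move(id);
        return s;
    }
    bool supported() const { return _supported; }
    bool has_id() const { return _id.has_value(); }
    const std::string& id() const { return *_id; }
};

// Sketch of the RESULT Rows decision: if the client sent an id in EXECUTE and
// it differs from the current one, set METADATA_CHANGED and resend metadata.
inline bool metadata_changed(const metadata_id_state& from_execute,
                             const std::string& current_id) {
    return from_execute.supported() && from_execute.has_id() &&
           from_execute.id() != current_id;
}
```

Using `std::optional` for the id, rather than treating an empty string as "unset", is what keeps zero-length ids representable, as the CQLv5 specification allows.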
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Avi Kivity	5d68efe0bd	raft_group0_client: uninclude "db/system_keyspace.hh" It doesn't need it apart from a forward declaration. Files that lost necessary includes are adjusted, and some users of auth_version_t are redirected to the definition outside system_keyspace.	2024-09-28 16:31:53 +03:00
Avi Kivity	d69bf4f010	cql3: introduce dialect infrastructure A dialect is a different way to interpret the same CQL statement. Examples: - how duplicate bind variable names are handled (later in this series) - whether `column = NULL` in LWT can return true (as it does now) or always returns NULL (as in SQL) Currently, dialect is an empty structure and will be filled in later. It is passed to query_processor methods that also accept a CQL string, and from there to the parser. It is part of the prepared statement cache key, so that if the dialect is changed online, previous parses of the statement are ignored and the statement is prepared again. The patch is careful to pick up the dialect at the entry point (e.g. CQL protocol server) so that the dialect doesn't change while a statement is parsed, prepared, and cached.	2024-08-29 21:19:23 +03:00
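Making the dialect part of the prepared statement cache key means that the same CQL text prepared under two dialects yields two cache entries, so changing the dialect online forces a fresh parse. In the sketch below, `dialect`, its single flag, `prepared_cache_key`, and `prepared_cache` are all hypothetical; the commit notes that the real dialect structure starts out empty.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical dialect with one illustrative flag (the real struct started empty).
struct dialect {
    bool duplicate_bind_names_are_one_variable = false;
    bool operator==(const dialect& o) const {
        return duplicate_bind_names_are_one_variable == o.duplicate_bind_names_are_one_variable;
    }
};

struct prepared_cache_key {
    std::string query;
    dialect d;
    bool operator==(const prepared_cache_key& o) const { return query == o.query && d == o.d; }
};

struct prepared_cache_key_hash {
    std::size_t operator()(const prepared_cache_key& k) const {
        // Mix the dialect into the hash so both parts of the key matter.
        return std::hash<std::string>{}(k.query) ^
               (k.d.duplicate_bind_names_are_one_variable ? 0x9e3779b9u : 0u);
    }
};

class prepared_cache {
    std::unordered_map<prepared_cache_key, int, prepared_cache_key_hash> _entries;
    int _next_id = 0;
public:
    int parse_count = 0;  // how many times a statement was actually parsed

    int prepare(const std::string& query, const dialect& d) {
        prepared_cache_key key{query, d};
        if (auto it = _entries.find(key); it != _entries.end()) {
            return it->second;  // same text and same dialect: reuse
        }
        ++parse_count;  // parsing under dialect `d` would happen here
        const int id = _next_id++;
        _entries.emplace(std::move(key), id);
        return id;
    }
};
```

Capturing the dialect once at the entry point and passing it down, as the commit describes, ensures the key used for lookup matches the dialect actually used for parsing.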
Botond Dénes	2cec0d8dd1	service/migration_listener: update_tablet_metadata(): add hint parameter The hint contains information related to what exactly changed, allowing listeners to do partial updates, instead of reloading all metadata on each notification.	2024-08-11 09:53:19 -04:00
Avi Kivity	aa1270a00c	treewide: change assert() to SCYLLA_ASSERT() assert() is traditionally disabled in release builds, but not in scylladb. This hasn't caused problems so far, but the latest abseil release includes a commit [1] that causes a 1000 insn/op regression when NDEBUG is not defined. Clearly, we must move towards a build system where NDEBUG is defined in release builds. But we can't just define it blindly without vetting all the assert() calls, as some were written with the expectation that they are enabled in release mode. To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT() macro in utils/assert.hh. This macro is always defined and is not conditional on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release mode. [1] `66ef711d68` Closes scylladb/scylladb#20006	2024-08-05 08:23:35 +03:00
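The key property of the new macro is that it has no NDEBUG branch, so the checked expression is always evaluated. The sketch below uses a hypothetical `ALWAYS_ASSERT` macro and `always_assert_fail` handler to show the pattern; it is not the contents of utils/assert.hh, whose exact failure reporting may differ.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical failure handler: print the failed expression and abort,
// regardless of whether NDEBUG is defined.
[[noreturn]] inline void always_assert_fail(const char* expr, const char* file, int line) {
    std::fprintf(stderr, "%s:%d: assertion '%s' failed\n", file, line, expr);
    std::abort();
}

// Unlike assert(), this macro is not compiled out under NDEBUG, so code
// that relies on the expression's side effects keeps working in release builds.
#define ALWAYS_ASSERT(expr) \
    ((expr) ? static_cast<void>(0) : always_assert_fail(#expr, __FILE__, __LINE__))
```

This is why the conversion is safe to do first: once every call site uses an always-on macro, NDEBUG can be defined for release builds without silently dropping checks (or side effects) that some code depended on.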
Avi Kivity	3fc4e23a36	forward_service: rename to mapreduce_service forward_service is nondescriptive and misnamed, as it does more than forward requests. It's a classic map/reduce algorithm (and in fact one of its parameters is "reducer"), so name it accordingly. The name "forward" leaked into the wire protocol for the messaging service RPC isolation cookie, so it's kept there. It's also maintained in the name of the logger (for "nodetool setlogginglevel") for compatibility with tests. Closes scylladb/scylladb#19444	2024-07-03 19:29:47 +03:00
Botond Dénes	4e96e320b4	cql3/query_processor: for_each_cql_result(): move func to the coro frame Said method has a func parameter (called just f), which it receives as an rvalue reference and uses only as a reference. This means that if the caller doesn't keep the func alive, for_each_cql_result() will run into a use-after-free after the first suspension point. This is unexpected for callers, who don't expect to have to keep alive something they passed in with std::move(). Adjust the signature to take a value instead; value parameters are moved to the coro frame and survive suspension points. Adjust internal callers (query_internal()) the same way. There are no known vulnerable external callers.	2024-06-25 06:15:25 -04:00
Nadav Har'El	4faceeaa33	Merge 'treewide: drop thrift support' from Kefu Chai thrift support has been deprecated since ScyllaDB 5.2 > Thrift API - legacy ScyllaDB (and Apache Cassandra) API is > deprecated and will be removed in followup release. Thrift has > been disabled by default. so let's drop it. in this change, * thrift protocol support is dropped * all references to thrift support in the documentation are dropped * the "thrift_version" column in system.local table is preserved for backward compatibility, as we could load from an existing system.local table which still contains this column, so we need to write this column as well. * "/storage_service/rpc_server" is only preserved for backward compatibility with java-based nodetool. Fixes #3811 Fixes #18416 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> - [x] not a fix, no need to backport Closes scylladb/scylladb#18453 * github.com:scylladb/scylladb: config: expand on rpc_keepalive's description api: s/rpc/thrift/ db/system_keyspace: drop thrift_version from system.local table transport: do not return client_type from cql_server::connection::make_client_key() treewide: drop thrift support	2024-06-17 22:36:49 +03:00
Pavel Emelyanov	bebd121936	code: Enlighten wasm headers usage Now that function context creation is encapsulated in lang::manager, some .cc files can stop using wasm-specific headers and just go with the lang/manager.hh one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	882b2f4e9f	cql3, schema_tables: Generalize function creation When a function is created with the CREATE FUNCTION statement, the statement handler does all the necessary preparations on its own. The very same code exists in schema_tables, when the function is loaded on boot. This patch generalizes both and keeps function language-specific context creation inside lang/ code. The creation function returns context via argument reference. It would have been nicer if it was returned via future<>, but it's not suitable for future<T> type :( Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	f950469af5	lang: Move manager to lang namespace And, while at it, rename the local variables referring to it to "manager" instead of "wasm". Query processor and database also have getters named "wasm()"; these are not renamed yet to keep the patch smaller (and those getters are going to be reworked further anyway). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Kefu Chai	ad649be1bf	treewide: drop thrift support thrift support has been deprecated since ScyllaDB 5.2 > Thrift API - legacy ScyllaDB (and Apache Cassandra) API is > deprecated and will be removed in followup release. Thrift has > been disabled by default. so let's drop it. in this change, * thrift protocol support is dropped * all references to thrift support in the documentation are dropped * the "thrift_version" column in system.local table is preserved for backward compatibility, as we could load from an existing system.local table which still contains this column, so we need to write this column as well. * "/storage_service/rpc_server" is only preserved for backward compatibility with java-based nodetool. * `rpc_port` and `start_rpc` options are preserved, but they are marked as "Unused", so that the new release of scylladb can consume existing scylla.yaml configurations which might contain these settings. By making them deprecated, users will be warned and can update their configurations before we actually remove them in the next major release. Fixes #3811 Fixes #18416 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 06:44:59 +08:00
Marcin Maliszkiewicz	63e6334a64	raft: rename mutations_collector to group0_batch	2024-06-06 13:26:34 +02:00
Marcin Maliszkiewicz	0573fee2a9	cql3: auth: use mutation collector for grant and revoke permissions This is done to achieve single transaction semantics. The change includes auto-grant feature. In particular for schema related auto-grant we don't use normal mutation collector announce path but follow migration manager, this may be unified in the future.	2024-06-04 15:43:04 +02:00
Piotr Smaron	6fd0a49b63	Allow query_processor to check if global topo queue is empty With current implementation only 1 global topo req can be executed at a time, so when ALTER KS is executed, we'll have to check if any other global topo req is ongoing and fail the req if that's the case.	2024-05-28 13:55:11 +02:00
Piotr Smaron	cb40f13831	Add storage service to query processor Query processor needs to access storage service to check if global topology request is still ongoing and to be able to wait until it completes.	2024-05-27 12:48:44 +02:00
Marcin Maliszkiewicz	2ab143fb40	db: auth: move auth tables to system keyspace A separate keyspace which also behaves as a system keyspace brings little benefit while creating some compatibility problems, like a schema digest mismatch during rollback. So we decided to move the auth tables into the system keyspace. Fixes https://github.com/scylladb/scylladb/issues/18098 Closes scylladb/scylladb#18769	2024-05-26 22:30:42 +03:00
Marcin Maliszkiewicz	562caaf6c6	auth: keep auth version in scylla_local Before the patch, selection of the auth version depended on the consistent topology feature, but during the raft recovery procedure this feature is disabled, so we need to persist the version somewhere to avoid switching back to v1, which is not supported. During recovery, auth works in read-only mode and writes will fail.	2024-04-02 19:04:21 +02:00
Marcin Maliszkiewicz	bd444ed6f1	cql3: auth: add a way to create mutations without executing To make table modifications go via raft we need to publish mutations. Currently many system tables (especially auth) use CQL to generate table modifications. Added function is a missing link which will allow to do a seamless transition of certain system tables to raft.	2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz	ae2d8975b9	auth: parametrize keyspace name in default_authorizer When adding group0 replication for auth, we will change only the write path and plan to reuse the read path. To avoid copying the code or complicating the class hierarchy, default_authorizer's read code will remain unchanged except for this parametrization, which is needed because the group0 implementation uses a separate keyspace (replication is defined on a keyspace level). In subsequent commits, the legacy write path code will be separated and the new implementation placed in default_authorizer. For now we add the keyspace name as a class member because it's a static value anyway. But statics will be removed in future commits because migration can occur and auth needs to switch the keyspace name at runtime.	2024-03-01 16:22:17 +01:00
Kefu Chai	2dbf044b91	cql3: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16791	2024-01-16 16:43:17 +02:00
Benny Halevy	328ce23c78	types: add data_value_list data_value_list is a wrapper around std::initializer_list<data_value>. Use it for passing values to `cql3::query_processor::execute_internal` and friends. A following patch will add a std::variant for data_value_or_unset and extend data_value_list to support unset values. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-31 18:17:27 +02:00
Benny Halevy	0b310c471c	service_level_controller: use locator::topology rather than fb_utilities Expose cql3::query_processor in auth::service to get to the topology via storage_proxy.replica::database Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 10:17:47 +02:00
Nadav Har'El	548386a0bb	treewide: reduce include of cql_statement.hh ClangBuildAnalyzer reports cql3/cql_statement.hh as being one of the most expensive header files in the project - being included (mostly indirectly) in 129 source files, and costing a total of 844 CPU seconds of compilation. This patch is an attempt, only partially successful, to reduce the number of times that cql_statement.hh is included. It succeeds in lowering the number 129 to 99, but not less :-( One of the biggest difficulties in reducing it further is that query_processor.hh includes a lot of templated code, which needs stuff from cql_statement.hh. The solution should be to un-template the functions in query_processor.hh and move them from the header to a source file, but this is beyond the scope of this patch and query_processor.hh appears problematic in other respects as well. Unfortunately the compilation speedup by this patch is negligible (the `du -bc build/dev/*/.o` metric shows less than 0.01% reduction). Beyond the fact that this patch only removes 30% of the inclusions of this header, it appears that most of the source files that no longer include cql_statement.hh after this patch, included anyway many of the other headers that cql_statement.hh included, so the saving is minimal. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #15212	2023-09-08 13:23:50 +03:00
Gleb Natapov	4ffc39d885	cql3: Extend the scope of group0_guard during DDL statement execution Currently we hold group0_guard only during DDL statement's execute() function, but unfortunately some statements access underlying schema state also during check_access() and validate() calls which are called by the query_processor before it calls execute. We need to cover those calls with group0_guard as well and also move retry loop up. This patch does it by introducing new function to cql_statement class take_guard(). Schema altering statements return group0 guard while others do not return any guard. Query processor takes this guard at the beginning of a statement execution and retries if service::group0_concurrent_modification is thrown. The guard is passed to the execute in query_state structure. Fixes: #13942 Message-ID: <ZNsynXayKim2XAFr@scylladb.com>	2023-08-17 15:52:48 +03:00
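The control flow described in this commit can be sketched as follows. The steps are: take the guard first, run `check_access()`, `validate()` and `execute()` under it, and restart when `service::group0_concurrent_modification` is thrown. Every name below is a hypothetical stand-in. `sketch_guard` replaces the real group0 guard, `sketch_concurrent_modification` replaces the real exception, and the guard is passed directly rather than through `query_state`. The `max_attempts` bound is an addition that keeps the sketch finite; the commit does not say whether the real loop is bounded.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for service::group0_concurrent_modification.
struct sketch_concurrent_modification : std::runtime_error {
    sketch_concurrent_modification() : std::runtime_error("concurrent group0 modification") {}
};

// Hypothetical stand-in for the group0 guard; it only records which attempt took it.
struct sketch_guard {
    int attempt;
};

// Take a fresh guard per attempt, run every step that may read schema state
// under it, and restart the whole sequence if another group0 change won the race.
int run_statement(const std::function<void(const sketch_guard&)>& check_access,
                  const std::function<void(const sketch_guard&)>& validate,
                  const std::function<int(const sketch_guard&)>& execute,
                  int max_attempts) {
    for (int attempt = 1;; ++attempt) {
        sketch_guard guard{attempt};  // taken before any schema access
        try {
            check_access(guard);
            validate(guard);
            return execute(guard);
        } catch (const sketch_concurrent_modification&) {
            if (attempt >= max_attempts) {
                throw;  // give up and propagate (bound added for this sketch)
            }
            // Otherwise loop: check_access() and validate() run again too.
        }
    }
}
```

Moving the retry loop above `check_access()` and `validate()` is the point of the change: a retry must re-run those checks against the new schema state, not just re-run `execute()`.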


177 Commits