scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 14:15:46 +00:00

Author	SHA1	Message	Date
Michał Hudobski	ae4d4908ba	configure: increase SCHEDULING_GROUPS_COUNT to 20 We would like to have an additional service level available for users of the Vector Store service, which would allow us to de/prioritize vector operations as needed. To allow that, we increase the number of scheduling groups from 19 to 20 and adjust the related test accordingly. Closes scylladb/scylladb#26316	2025-09-30 12:41:28 +03:00
Nadav Har'El	38002718a9	cqlpy: improve testing for "duration" column type We had very rudimentary tests for the "duration" CQL type in the cqlpy framework - just for reproducing issue #8001. But we left two alternative formats, and a lot of corner cases, untested. So this patch aims to add the missing tests - to exhaustively cover the "duration" literal formats and their capabilities. Some of the examples tested in the new test are inspired by Cassandra's unit test test/unit/org/apache/cassandra/cql3/DurationTest.java and the corner cases that this file covers. However, the new tests are not direct translation of that file because DurationTest.java was not a CQL test - it was a unit test of Cassandra's internal "Duration" type, so could not be directly translated into a CQL-based test. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25092	2025-09-30 12:25:02 +03:00
Avi Kivity	4d9271df98	Merge 'sstables: introduce sstable version `ms`' from Michał Chojnowski This is yet another part in the BTI index project. Overarching issue: https://github.com/scylladb/scylladb/issues/19191 Previous part: https://github.com/scylladb/scylladb/pull/25626 Next parts: make `ms` the default. Then, general tweaks and improvements. Later, potentially a full `da` format implementation. This patch series introduces a new, Scylla-only sstable format version `ms`, which is like `me`, but with the index components (Summary.db and Index.db) replaced with BTI index components (Partitions.db and Rows.db), as they are in Cassandra 5.0's `da` format version. (Eventually we want to just implement `da`, but there are several other changes (unrelated to the index files) between `me` and `da`. By adding this `ms` as an intermediate step we can adapt the new index formats without dragging all the other changes into the mix (and raising the risk of regressions, which is already high)). The high-level structure of the PR is: 1. Introduce new component types — `Partitions` and `Rows`. 2. Teach `class sstable` to open them when they exist. 3. Teach the sstable writer how to write index data to them. 4. Teach `class sstable` and unit tests how to deal with sstables that have no `Index` or `Summary` (but have `Partitions` and `Rows` instead). 5. Introduce the new sstable version `ms`, specify that it has `Partitions` and `Rows` instead of `Index` and `Summary`. 6. Prepare unit tests for the appearance of `ms`. 7. Enable `ms` in unit tests. 8. Make `ms` enablable via db::config (with a silent fall back to `me` until the new `MS_SSTABLE_FORMAT` cluster feature is enabled). 9. Prepare integration tests for the appearance of `ms`. 10. Enable both `ms` and `me` in tests where we want both versions to be tested. This series doesn't make `ms` the default yet, because that requires teaching Scylla Manager and a few dtests about the new format first. 
It can be enabled by setting `sstable_format: ms` in the config. Per a review request, here is an example from `perf_fast_forward`, demonstrating some motivation for a new format. (Although not the main one. The main motivations are getting rid of restrictions on the RAM:disk ratio, and index read throughput for datasets with tiny partitions). The dataset was populated with `build/release/scylla perf-fast-forward --smp=1 --sstable-format=$VERSION --data-directory=data.$VERSION --column-index-size-in-kb=1 --populate --random-seed=0`. This test involves a partition with 1000000 clustering rows (with 32-bit keys and 100-byte values) and ~500 index blocks, and queries a few particular rows from the partition. Since the branching factor for the BIG promoted index is 2 (it's a binary search), the lookup involves ~11.2 sequential page reads per row. The BTI format has a more reasonable branching factor, so it involves ~2.3 page reads per row. `build/release/scylla perf-fast-forward --smp=1 --data-directory=perf_fast_forward_data/me --run-tests=large-partition-select-few-rows`: ``` offset stride rows iterations avg aio aio (KiB) 500000 1 1 70 18.0 18 128 500001 1 1 647 19.0 19 132 0 1000000 1 748 15.0 15 116 0 500000 2 372 29.0 29 284 0 250000 4 227 56.0 56 504 0 125000 8 116 106.0 106 928 0 62500 16 67 195.0 195 1732 ``` `build/release/scylla perf-fast-forward --smp=1 --data-directory=perf_fast_forward_data/ms --run-tests=large-partition-select-few-rows`: ``` offset stride rows iterations avg aio aio (KiB) 500000 1 1 51 5.1 5 20 500001 1 1 64 5.3 5 20 0 1000000 1 679 4.0 4 16 0 500000 2 492 8.0 8 88 0 250000 4 804 16.0 16 232 0 125000 8 409 31.0 31 516 0 62500 16 97 54.0 54 1056 ``` Index file size comparison for the default `perf_fast_forward` tables with `--random-seed=0`: Large partition table (dominated by intra-partition index): 2.4 MB with `me`, 732 kB with `ms`. For the small partitions table (dominated by inter-partition index): 11 MB with `me`, 8.4 MB with `ms`. 
External tests: I ran the SCT test `longevity-mv-si-4days-streaming-test` on 6 nodes with 30 shards each for 8 hours. No anomalies were observed. New functionality, no backport needed. Closes scylladb/scylladb#26215 * github.com:scylladb/scylladb: test/boost/bloom_filter_test: add test_rebuild_from_temporary_hashes test/cluster: add test_bti_index.py test: prepare bypass_cache_test.py for `ms` sstables sstables/trie/bti_index_reader: add a failure injection in advance_lower_and_check_if_present test/cqlpy/test_sstable_validation.py: prepare the test for `ms` sstables tools/scylla-sstable: add `--sstable-version=?` to `scylla sstable write` db/config: expose "ms" format to the users via database config test: in Python tests, prepare some sstable filename regexes for `ms` sstables: add `ms` to `all_sstable_versions` test/boost/sstable_3_x_test: add `ms` sstables to multi-version tests test/lib/index_reader_assertions: skip some row index checks for BTI indexes test/boost/sstable_inexact_index_test: explicitly use a `me` sstable test/boost/sstable_datafile_test: skip test_broken_promoted_index_is_skipped for `ms` sstables test/resource: add `ms` sample sstable files for relevant tests test/boost/sstable_compaction_test: prepare for `ms` sstables. test/boost/index_reader_test: prepare for `ms` sstables test/boost/bloom_filter_tests: prepare for `ms` sstables test/boost/sstable_datafile_test: prepare for `ms` sstables test/boost/sstable_test: prepare for `ms` sstables. 
sstables: introduce `ms` sstable format version tools/scylla-sstable: default to "preferred" sstable version, not "highest" sstables/mx/reader: use the same hashed_key for the bloom filter and the index reader sstables/trie/bti_index_reader: allow the caller to passing a precalculated murmur hash sstables/trie/bti_partition_index_writer: in add(), get the key hash from the caller sstables/mx: make Index and Summary components optional sstables: open Partitions.db early when it's needed to populate key range for sharding metadata sstables: adapt sstable::set_first_and_last_keys to sstables without Summary sstables: implement an alternative way to rebuild bloom filters for sstables without Index utils/bloom_filter: add `add(const hashed_key&)` sstables: adapt estimated_keys_for_range to sstables without Summary sstables: make `sstable::estimated_keys_for_range` asynchronous sstables/sstable: compute get_estimated_key_count() from Statistics instead of Summary replica/database: add table::estimated_partitions_in_range() sstables/mx: implement sstable::has_partition_key using a regular read sstables: use BTI index for queries, when present and enabled sstables/mx/writer: populate BTI index files sstables: create and open BTI index files, when enabled sstables: introduce Partition and Rows component types sstables/mx/writer: make `_pi_write_m.partition_tombstone` a `sstables::deletion_time`	2025-09-30 09:40:02 +03:00
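A rough way to see why the branching factor matters for the `perf_fast_forward` numbers quoted in the `ms` merge above: the sketch below counts how many narrowing steps a search needs to go from ~500 index blocks down to one. It is an idealized model, not Scylla code. `lookup_depth` is a hypothetical helper, and the branching factor of 16 is an arbitrary illustrative value, since the commit message does not state the BTI fan-out. It is not expected to reproduce the measured ~11.2 and ~2.3 page reads, which also depend on page layout and caching.

```python
def lookup_depth(num_blocks: int, branching_factor: int) -> int:
    """Count the steps needed to narrow num_blocks candidates down to one.

    Each step keeps one of `branching_factor` equal-sized groups, so the
    candidate count shrinks to ceil(remaining / branching_factor).
    """
    if num_blocks < 1 or branching_factor < 2:
        raise ValueError("need num_blocks >= 1 and branching_factor >= 2")
    depth = 0
    remaining = num_blocks
    while remaining > 1:
        # Integer ceiling division, avoiding floating point.
        remaining = -(-remaining // branching_factor)
        depth += 1
    return depth


if __name__ == "__main__":
    # Binary search (branching factor 2) over ~500 index blocks.
    print(lookup_depth(500, 2))   # 9
    # Hypothetical wider fan-out of 16 (assumed value, for illustration).
    print(lookup_depth(500, 16))  # 3
```

In this model, the step count grows with the logarithm of the block count, and the base of that logarithm is the branching factor. That is the qualitative effect the commit message attributes to the BTI format.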
Michał Chojnowski	182c8ce87b	test/cqlpy/test_sstable_validation.py: prepare the test for `ms` sstables BIG sstables and BTI sstables use different code paths for validating the Data file against the index. So we want to test both types of indexes, not just the default one. This patch changes the test so that it explicitly tests both `me` and `ms` instead of only testing the default format. Note that we disable some tests for BTI indexes: the tests which check that validation detects mismatches between the row index ("promoted index") and the Data file. This is because iteration over the row index in BTI isn't implemented at the moment, so for BTI the validation behaves as if there were no row indexes.	2025-09-29 22:15:25 +02:00
Michał Chojnowski	2ed2033224	test: in Python tests, prepare some sstable filename regexes for `ms`	2025-09-29 22:15:25 +02:00
Piotr Dulikowski	3abe6eadce	Merge 'Add CQL documentation for vector queries using SELECT ANN' from Szymon Wasik This PR adds the missing documentation for the SELECT ... ANN statement that allows performing vector queries. This is just the basic explanation of the grammar and how to use it. More comprehensive documentation about vector search will be added separately in Scylla Cloud documentation and features description. Links to this additional documentation will be added as part of VECTOR-244. Fixes: VECTOR-247. No backport is needed as this is a new feature. Closes scylladb/scylladb#26282 * github.com:scylladb/scylladb: cql3: Update error messages to be in line with documentation. docs: Add CQL documentation for vector queries using SELECT ANN	2025-09-29 12:46:55 +02:00
Botond Dénes	34cc7aafae	tools/scylla-sstable: introduce the upgrade command An offline, scylla-sstable variant of nodetool upgradesstables command. Applies latest (or selected) sstable version and latest schema. Closes scylladb/scylladb#26109	2025-09-27 16:53:14 +03:00
Szymon Wasik	ccfe80ab97	cql3: Update error messages to be in line with documentation. ANN (approximate nearest neighbor) is just the name of the type of algorithm used to perform vector search. For this reason the error messages should refer to vector queries rather than ANN queries.	2025-09-26 17:01:10 +02:00
Artsiom Mishuta	f23d19e248	test.py: fix dumping big logs to output 1. Remove dumping cluster logs and print only the link to the log. 2. Fail the test (to fail CI and not ignore the problem) and mark the cluster as dirty (to avoid affecting subsequent tests) in case setup/teardown fails. 3. Add 2 cqlpy tests that fail after applying step 2 to the dirties_cluster list so the cluster is discarded afterward. Closes scylladb/scylladb#26183	2025-09-25 22:36:46 +03:00
Botond Dénes	50038ef2cc	Merge 'alternator: update references to alternator streams issue' from Michael Litvak update all the references about the issue of tablets support for alternator streams to issue https://github.com/scylladb/scylladb/issues/23838 instead of https://github.com/scylladb/scylladb/issues/16317. The issue https://github.com/scylladb/scylladb/issues/16317 is about support of CDC with tablets, but it is now closed and it didn't address alternator streams. the remaining issues about alternator streams should be addressed as part of https://github.com/scylladb/scylladb/issues/23838, so fix the references in order for them not to be missed. backport is not needed Closes scylladb/scylladb#26178 * github.com:scylladb/scylladb: test/cqlpy/test_permissions: unskip test for tablets alternator: update references to alternator streams issue	2025-09-25 11:05:52 +03:00
Michael Litvak	beb11760e0	test/cqlpy/test_permissions: unskip test for tablets the test was skipped for tablets because CDC wasn't supported with tablets, but now it is supported and the issue is closed, so the test should be unskipped.	2025-09-22 10:03:32 +02:00
Avi Kivity	1258e7c165	Revert "Merge 'transport: service_level_controller: create and use `driver` service level' from Andrzej Jackowski" This reverts commit `fe7e63f109`, reversing changes made to `b5f3f2f4c5`. It is causing test.py failures around cqlpy. Fixes #26163 Closes scylladb/scylladb#26174	2025-09-22 09:32:46 +03:00
Evgeniy Naydanov	85cbe7a8d4	test: add test for creating table with CDC enabled if not exists Check if there are no errors on the second attempt of executing "create table if not exists" query if CDC is enabled.	2025-09-21 09:38:36 +02:00
Michał Hudobski	1690e5265a	vector search: correct column name formatting This patch corrects the column name formatting whenever an "Undefined column name" exception is thrown. Until now we used the `name()` function which returns a bytes object. This resulted in a message with a garbled ascii bytes column name instead of a proper string. We switch to the `text()` function that returns a sstring instead, making the message readable. Tests are adjusted to confirm this behavior. Fixes: VECTOR-228 Closes scylladb/scylladb#26120	2025-09-20 07:02:53 +02:00
Michał Chojnowski	9e70df83ab	db: get rid of sstables-format-selector Our sstable format selection logic is weird, and hard to follow. If I'm not misunderstanding, the pieces are: 1. There's the `sstable_format` config entry, which currently doesn't do anything, but in the past it used to disable cluster features for versions newer than the specified one. 2. There are deprecated and unused config entries for individual versions (`enable_sstables_mc_format`, `enable_sstables_md_format`, etc). 3. There is a cluster feature for each version: ME_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, etc. (Currently all sstable version features have been grandfathered, and aren't checked by the code anymore). 4. There's an entry in `system.scylla_local` which contains the latest enabled sstable version. (Why? Isn't this directly derived from cluster features anyway)? 5. There's `sstable_manager::_format` which contains the sstable version to be used for new writes. This field is updated by `sstables_format_selector` based on cluster features and the `system.scylla_local` entry. I don't see why those pieces are needed. Version selection has the following constraints: 1. New sstables must be written with a format that supports existing data. For example, range tombstones with an infinite bound are only supported by sstables since version "mc". So if a range tombstone with an infinite bound exists somewhere in the dataset, the format chosen for new sstables has to be at least as new as "mc". 2. A new format might only be used after a corresponding cluster feature is enabled. (Otherwise new sstables might become unreadable if they are sent to another node, or if a node is downgraded). 3. The user should have a way to inhibit format upgrades if he wishes. So far, constraint (1) has been fulfilled by never using formats older than the newest format ever enabled on the node. (With an exception for resharding and reshaping system tables). 
Constraint (2) has been fulfilled by calling `sstable_manager::set_format` only after the corresponding cluster feature is enabled. Constraint (3) has been fulfilled by the ability to inhibit cluster features by setting `sstable_format` to some fixed value. The main thing I don't like about this whole setup is that it doesn't let me downgrade the preferred sstable format. After a format is enabled, there is no way to go back to writing the old format again. That is no good -- after I make some performance-sensitive changes in a new format, it might turn out to be a pessimization for the particular workload, and I want to be able to go back. This patch aims to give a way to downgrade formats without violating the constraints. What it does is: 1. The entry in `system.scylla_local` becomes obsolete. After the patch we no longer update or read it. As far as I understand, the purpose of this entry is to prevent unwanted format downgrades (which is something cluster features are designed for) and it's updated if and only if relevant cluster features are updated. So there's no reason to have it, we can just directly use cluster features. 2. `sstables_format_selector` gets deleted. Without the `system.scylla_local` around, it's just a glorified feature listener. 3. The format selection logic is moved into `sstable_manager`. It already sees the `db::config` and the `gms::feature_service`. For the foreseeable future, the knowledge of enabled cluster features and current config should be enough information to pick the right formats. 4. The `sstable_format` entry in `db::config` is no longer intended to inhibit cluster features. Instead, it is intended to select the format for new sstables, and it becomes live-updatable. 5. 
Instead of writing new sstables with "highest supported" format, (which used to be set by `sstables_format_selector`) we write them with the "preferred" format, which is determined by `sstable_manager` based on the combination of enabled features and the current value of `sstable_format`. Closes scylladb/scylladb#26092 [avi: Pavel found the reason for the scylla_local entry - it predates stable storage for cluster features]	2025-09-19 16:17:56 +03:00
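The selection rule described in the `sstables-format-selector` removal above can be sketched as follows. This is a simplified model in Python, not the `sstable_manager` implementation. The function name `preferred_format`, the version list, and the representation of enabled cluster features as a set of version names are all assumptions made for illustration. Constraint (1), that the format must support data already on disk, is not modeled here.

```python
# Hypothetical ordering of sstable versions, oldest to newest.
SSTABLE_VERSIONS = ["mc", "md", "me", "ms"]


def preferred_format(configured: str, enabled_features: set[str]) -> str:
    """Pick the format for new sstables.

    Returns the newest version that is not newer than the configured
    `sstable_format` and whose cluster feature is enabled. If the configured
    version's feature is not enabled yet, this silently falls back to an
    older version.
    """
    if configured not in SSTABLE_VERSIONS:
        raise ValueError(f"unknown sstable format: {configured}")
    limit = SSTABLE_VERSIONS.index(configured)
    # Walk from the configured version down towards the oldest one.
    for version in reversed(SSTABLE_VERSIONS[: limit + 1]):
        if version in enabled_features:
            return version
    raise ValueError("no enabled sstable format at or below the configured one")


if __name__ == "__main__":
    older = {"mc", "md", "me"}
    # Configured "ms", but the cluster feature is not enabled yet: fall back.
    print(preferred_format("ms", older))           # me
    # Feature enabled: the configured format is used.
    print(preferred_format("ms", older | {"ms"}))  # ms
    # Changing the config back selects the older format again.
    print(preferred_format("me", older | {"ms"}))  # me
```

Because the result depends only on the current config value and the enabled features, changing `sstable_format` at runtime is enough to switch back to an older format. That is the downgrade path the commit message asks for.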
Avi Kivity	fe7e63f109	Merge 'transport: service_level_controller: create and use `driver` service level' from Andrzej Jackowski This patch series: - Increases the number of allowed scheduling groups to allow creation of `sl:driver` - Implements `create_driver_service_level` that creates `sl:driver` with shares=200 if it wasn't already created - Implements creation of `sl:driver` for new systems and tests in `raft_initialize_discovery_leader` - Modifies `topology_coordinator` to create `sl:driver` after upgrades. - Implements using `sl:driver` for new connections in `transport/server` - Adds to `transport/server` recognition of driver's control connections and forcing them to keep using `sl:driver`. - Adds tests to verify the new functionality - Modifies existing tests to let them pass after `sl:driver` is added - Modifies the documentation to contain new `sl:driver` The changes were evaluated by a test with the following scenario ([test_connections-sl-driver.py](https://github.com/user-attachments/files/22021273/test_connections-sl-driver.py)): - Start ScyllaDB with one node - Create 1000 keyspaces, 1 table in each keyspace - Start `cassandra-stress` (`-rate threads=50 -mode native cql3`) - Run connection storm with 1000 sessions (100 python processes, 10 sessions each) The maximum latency during connection storm dropped from 224.94ms to 41.43ms (those numbers are averages over 20 test executions, where max latency was in [140ms, 361ms] before the change and [31.4ms, 61.5ms] after). The snippet of cassandra-stress output from the moment of connection storm: Before: ``` type total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb ... 
total, 789206, 85887, 85887, 85887, 0.6, 0.3, 2.0, 2.0, 2.5, 5.0, 9.0, 0.09679, 0, 0, 0, 0, 0, 0 total, 909322, 120116, 120116, 120116, 0.4, 0.2, 1.9, 2.0, 2.1, 3.1, 10.0, 0.09053, 0, 0, 0, 0, 0, 0 total, 964392, 55070, 55070, 55070, 0.9, 0.4, 2.0, 4.5, 7.7, 18.9, 11.0, 0.09203, 0, 0, 0, 0, 0, 0 total, 975705, 11313, 11313, 11313, 4.4, 3.5, 6.5, 24.5, 82.7, 83.0, 12.0, 0.11713, 0, 0, 0, 0, 0, 0 total, 987548, 11843, 11843, 11843, 4.2, 3.5, 6.5, 33.7, 48.6, 51.5, 13.0, 0.13366, 0, 0, 0, 0, 0, 0 total, 995422, 7874, 7874, 7874, 6.3, 4.0, 7.7, 85.6, 112.9, 113.5, 14.0, 0.14753, 0, 0, 0, 0, 0, 0 total, 1007228, 11806, 11806, 11806, 4.3, 3.5, 6.5, 29.1, 43.8, 87.1, 15.0, 0.15598, 0, 0, 0, 0, 0, 0 total, 1012840, 5612, 5612, 5612, 8.2, 5.0, 11.5, 121.8, 166.6, 170.1, 16.0, 0.16535, 0, 0, 0, 0, 0, 0 total, 1016186, 3346, 3346, 3346, 13.4, 7.4, 20.1, 204.9, 207.6, 210.4, 17.0, 0.17405, 0, 0, 0, 0, 0, 0 total, 1025462, 9276, 9276, 9276, 6.3, 3.9, 9.6, 74.6, 206.8, 210.0, 18.0, 0.17800, 0, 0, 0, 0, 0, 0 total, 1035979, 10517, 10517, 10517, 4.8, 3.5, 6.7, 38.5, 82.6, 83.0, 19.0, 0.18120, 0, 0, 0, 0, 0, 0 total, 1047488, 11509, 11509, 11509, 4.3, 3.5, 6.0, 32.6, 72.3, 74.0, 20.0, 0.18334, 0, 0, 0, 0, 0, 0 total, 1077456, 29968, 29968, 29968, 1.7, 1.6, 2.9, 3.6, 7.0, 8.2, 21.0, 0.17943, 0, 0, 0, 0, 0, 0 total, 1105490, 28034, 28034, 28034, 1.8, 1.8, 3.5, 4.6, 5.3, 13.8, 22.0, 0.17609, 0, 0, 0, 0, 0, 0 total, 1132221, 26731, 26731, 26731, 1.9, 1.8, 3.8, 5.2, 8.4, 11.1, 23.0, 0.17314, 0, 0, 0, 0, 0, 0 total, 1162149, 29928, 29928, 29928, 1.7, 1.7, 3.0, 4.5, 8.0, 9.1, 24.0, 0.16950, 0, 0, 0, 0, 0, 0 ... ``` After: ``` type total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb ... 
total, 822863, 94379, 94379, 94379, 0.5, 0.3, 2.0, 2.0, 2.1, 3.7, 9.0, 0.06669, 0, 0, 0, 0, 0, 0 total, 937337, 114474, 114474, 114474, 0.4, 0.2, 2.0, 2.0, 2.1, 3.4, 10.0, 0.06301, 0, 0, 0, 0, 0, 0 total, 986630, 49293, 49293, 49293, 1.0, 1.0, 2.0, 2.1, 17.9, 19.0, 11.0, 0.07318, 0, 0, 0, 0, 0, 0 total, 1026734, 40104, 40104, 40104, 1.2, 1.0, 2.0, 2.2, 6.3, 7.1, 12.0, 0.08410, 0, 0, 0, 0, 0, 0 total, 1066124, 39390, 39390, 39390, 1.3, 1.0, 2.0, 2.2, 2.6, 3.4, 13.0, 0.09108, 0, 0, 0, 0, 0, 0 total, 1103082, 36958, 36958, 36958, 1.3, 1.1, 2.1, 2.5, 3.1, 4.2, 14.0, 0.09643, 0, 0, 0, 0, 0, 0 total, 1141987, 38905, 38905, 38905, 1.3, 1.0, 2.0, 2.4, 11.4, 12.7, 15.0, 0.09894, 0, 0, 0, 0, 0, 0 total, 1180023, 38036, 38036, 38036, 1.3, 1.0, 2.0, 3.7, 5.6, 7.1, 16.0, 0.10070, 0, 0, 0, 0, 0, 0 total, 1216481, 36458, 36458, 36458, 1.4, 1.0, 2.1, 3.6, 4.7, 5.0, 17.0, 0.10210, 0, 0, 0, 0, 0, 0 total, 1256819, 40338, 40338, 40338, 1.2, 1.0, 2.0, 2.2, 3.5, 5.4, 18.0, 0.10173, 0, 0, 0, 0, 0, 0 total, 1295122, 38303, 38303, 38303, 1.3, 1.0, 2.0, 2.4, 21.0, 21.1, 19.0, 0.10136, 0, 0, 0, 0, 0, 0 total, 1334743, 39621, 39621, 39621, 1.3, 1.0, 2.0, 2.3, 3.3, 4.0, 20.0, 0.10055, 0, 0, 0, 0, 0, 0 total, 1375579, 40836, 40836, 40836, 1.2, 1.0, 2.0, 2.1, 3.4, 5.7, 21.0, 0.09927, 0, 0, 0, 0, 0, 0 total, 1415576, 39997, 39997, 39997, 1.2, 1.0, 2.0, 2.3, 3.2, 4.1, 22.0, 0.09807, 0, 0, 0, 0, 0, 0 total, 1449268, 33692, 33692, 33692, 1.5, 1.4, 2.5, 3.2, 4.2, 5.6, 23.0, 0.09800, 0, 0, 0, 0, 0, 0 total, 1471873, 22605, 22605, 22605, 2.2, 2.0, 4.8, 5.9, 7.0, 7.9, 24.0, 0.10015, 0, 0, 0, 0, 0, 0 ... ``` Fixes: https://github.com/scylladb/scylladb/issues/24411 This is a new feature, so no backport needed. 
Closes scylladb/scylladb#25412 * github.com:scylladb/scylladb: docs: workload-prioritization: add driver service level test: add test to verify use of `sl:driver` transport: use `sl:driver` to handle driver's control connections transport: whitespace only change in update_scheduling_group transport: call update_scheduling_group for non-auth connections generic_server: transport: start using `sl:driver` for new connections test: add test_desc_* for driver service level test: service_levels: add tests for sl:driver creation and removal test: add reload_raft_topology_state() to ScyllaRESTAPIClient service_level_controller: automatically create `sl:driver` service_level_controller: methods to create driver service level service_level_controller: handle special sl:driver in DESC output topology_coordinator: add service_level_controller reference system_keyspace: add service_level_driver_created test: add MAX_USER_SERVICE_LEVELS	2025-09-18 19:45:17 +03:00
Piotr Dulikowski	5f55787e50	Merge 'CDC with tablets' from Michael Litvak initial implementation to support CDC in tablets-enabled keyspaces. The design is described in https://docs.google.com/document/d/1qO5f2q5QoN5z1-rYOQFu6tqVLD3Ha6pphXKEqbtSNiU/edit?usp=sharing It is followed closely for the most part except "Deciding when to change streams" - instead, streams are changed synchronously with tablet split / merge. Instead of the stream switching algorithm with the double writes, we use a scheme similar to the previous method for vnodes - we add the new streams with timestamp that is sufficiently far into the future. In this PR we: * add new group0-based internal system tables for tablet stream metadata and loading it into in-memory CDC metadata * add virtual tables for CDC consumers * the write coordinator chooses a stream by looking up the appropriate stream in the CDC metadata * enable creating tables with CDC enabled in tablets-enabled keyspaces. tablets are allocated for the CDC table, and a stream is created per each tablet. * on tablet resize (split / merge), the topology coordinator creates a new stream set with a new stream for each new tablet. 
* the cdc tablets are co-located with the base tablets Fixes https://github.com/scylladb/scylladb/issues/22576 backport not needed - new feature update dtests: https://github.com/scylladb/scylla-dtest/pull/5897 update java cdc library: https://github.com/scylladb/scylla-cdc-java/pull/102 update rust cdc library: https://github.com/scylladb/scylla-cdc-rust/pull/136 Closes scylladb/scylladb#23795 * github.com:scylladb/scylladb: docs/dev: update CDC dev docs for tablets doc: update CDC docs for tablets test: cluster_events: enable add_cdc and drop_cdc test/cql: enable cql cdc tests to run with tablets test: test_cdc_with_alter: adjust for cdc with tablets test/cqlpy: adjust cdc tests for tablets test/cluster/test_cdc_with_tablets: introduce cdc with tablets tests cdc: enable cdc with tablets topology coordinator: change streams on tablet split/merge cdc: virtual tables for cdc with tablets cdc: generate_stream_diff helper function cdc: choose stream in tablets enabled keyspaces cdc: rename get_stream to get_vnode_stream cdc: load tablet streams metadata from tables cdc: helper functions for reading metadata from tables cdc: colocate cdc table with base cdc: remove streams when dropping CDC table cdc: create streams when allocating tablets migration_listener: add on_before_allocate_tablet_map notification cdc: notify when creating or dropping cdc table cdc: move cdc table creation to pre_create cdc: add internal tables for cdc with tablets cdc: add cdc_with_tablets feature flag cdc: add is_log_schema helper	2025-09-18 13:39:37 +02:00
Andrzej Jackowski	e1b4a338ba	test: add test_desc_* for driver service level Driver service level is a special service level that is created automatically by the system. Therefore, it requires special handling in DESC SCHEMA WITH INTERNALS, and these tests verify the special behavior. Refs: scylladb/scylladb#24411	2025-09-18 09:28:32 +02:00
Andrzej Jackowski	6f678a2d1f	service_level_controller: automatically create `sl:driver` This commit: - Increases the number of allowed scheduling groups to allow the creation of `sl:driver`. - Adds the `DRIVER_SERVICE_LEVEL` feature, which prevents creating `sl:driver` until all nodes have increased the number of scheduling groups. - Starts using `get_create_driver_service_level_mutations` to unconditionally create `sl:driver` on `raft_initialize_discovery_leader`. The purpose of this code path is ensuring existence of `sl:driver` in new system and tests. - Starts using `migrate_to_driver_service_level` to create `sl:driver` if it is not already present. The creation of `sl:driver` is managed by `topology_coordinator`, similar to other system keyspace updates, such as the `view_builder` migration. The purpose of this code path is handling upgrades. - Modifies related tests to pass after `sl:driver` is added. Later in this patch series, `sl:driver` will be used by `transport/server` to handle selected traffic, such as the driver's schema and topology fetches. Refs: scylladb/scylladb#24411	2025-09-18 09:28:32 +02:00
Andrzej Jackowski	d30590c1d0	test: add MAX_USER_SERVICE_LEVELS Previously, tests used the hardcoded value 7 for the maximum number of user service levels. This commit introduces a named variable that can be shared across tests to avoid cases where this magic number goes out of sync.	2025-09-18 09:28:32 +02:00
Nadav Har'El	3c969e2122	cql: document and test permissions on materialized views and CDC We were recently surprised (in pull request #25797) to "discover" that Scylla does not allow granting SELECT permissions on individual materialized views. Instead, all materialized views of a base table are readable if the base table is readable. In this patch we document this fact, and also add a test to verify that it is indeed true. As usual for cqlpy tests, this test can also be run on Cassandra - and it passes showing that Cassandra also implemented it the same way (which isn't surprising, given that we probably copied our initial implementation from them). The test demonstrates that neither Scylla nor Cassandra prints an error when attempting to GRANT permissions on a specific materialized view - but this GRANT is simply ignored. This is not ideal, but it is the existing behavior in both and it's not important now to change it. Additionally, because pull request #25797 made CDC-log permissions behave the same as materialized views - i.e., you need to make the base table readable to allow reading from the CDC log, this patch also documents this fact and adds a test for it also. Fixes #25800 Closes scylladb/scylladb#25827	2025-09-18 07:41:35 +03:00
Nadav Har'El	d63fdd1e8b	test/cqlpy: fix run-cassandra to run with Java 21 The script test/cqlpy/run-cassandra aims to make it easy to run any version of Cassandra using whatever version of Java the user has installed. Sadly, the fact that Java keeps changing and the Cassandra developers are very slow to adapt to new Javas makes doing this non-trivial. This patch makes it possible for run-cassandra to run Cassandra 5 on the Java 21 that is now the default on Fedora 42. Fedora 42 no longer carries antique versions of Java (like Java 8 or 11), not even as an optional package. Sadly, even with this patch it is not possible to run older versions of Cassandra (4 and 3) with Java 21, because the new Java is missing features such as Netty that the older Cassandra versions require. But at least it restores the ability to run our cqlpy tests against Cassandra 5. Also, this patch adds to test/cqlpy/README.md simple instructions on how to install Java 11 (in addition to the system's default Java 21) on Fedora 42. Doing this is very easy and highly recommended because it restores the ability to run Cassandra 3 and 4, not just Cassandra 5. Fixes #25822. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25825	2025-09-17 17:24:47 +03:00
Piotr Smaron	bdb90ee15c	set ssl_* columns in system.clients Depends on https://github.com/scylladb/seastar/pull/2651 Missing columns have been present since probably forever - they were added to the schema but never assigned any value: ``` cqlsh> select * from system.clients; ------------------+------------------------ ... ssl_cipher_suite \| null ssl_enabled \| null ssl_protocol \| null ... ``` This patch sets values of these columns: - with a TLS connection, the 3 TLS-related fields are filled in, - without TLS, `ssl_enabled` is set to `false` and other columns are `null`, - if there's an error while inspecting TLS values, the connection is dropped. We want to save the TLS info of a connection just after accepting it, but without waiting for a TLS handshake to complete, so once the connection is accepted, we're inspecting it in the background for the server to be able to accept next connections immediately. Later, when we construct system.clients virtual table, the previously saved data can be instantaneously assigned to client_data, which is a struct representing a row in system.clients table. This way we don't slow down constructing this table by more than necessary, which is relevant for cases with plenty of connections. Fixes: #9216 Closes scylladb/scylladb#22961	2025-09-17 16:29:55 +03:00
Michael Litvak	778dec2630	test/cqlpy: adjust cdc tests for tablets update cdc-related tests in test/cqlpy for cdc with tablets. * test_cdc_log_entries_use_cdc_streams: this test depends on the implementation of the cdc tables, which is different for tablets, so it's changed to run for both vnodes and tablets keyspaces, and we add the implementation for tablets. * some cdc-related tests are unskipped for tablets so they will be run with both tablets and vnodes keyspaces. these are tests where the implementation may be different between tablets and vnodes and we want to have coverage of both. * other cdc-related tests do not depend on the implementation differences between tablets and vnodes, so we can just enable them to run with the default configuration. previously they were disabled for tablets keyspaces because it wasn't supported, so now we remove this.	2025-09-17 14:47:13 +02:00
Nadav Har'El	e322902506	Merge 'index, metrics: add per-index metrics' from Michał Hudobski This patch adds the possibility to track metrics per secondary index. Currently, only a histogram of query latencies is tracked, but more metrics can be added in the future. To add a new metric, it needs to be added to the index_metrics struct in index/secondary_index_manager.hh and then initialized in index/secondary_index_manager.cc in the constructor of the index_metrics struct. The metrics are created when the index is created and removed when the index is dropped. First lines of the new metric: # HELP scylla_index_query_latencies Index query latencies # TYPE scylla_index_query_latencies histogram scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640 scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1 scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1 scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1 Fixes: https://github.com/scylladb/scylladb/issues/25970 Closes scylladb/scylladb#25995 * github.com:scylladb/scylladb: test: verify that the index metric is added index, metrics: add per-index metrics	2025-09-17 14:54:12 +03:00
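The sample lines quoted in the per-index metrics message above follow the Prometheus text exposition format, so they can be picked apart with a few lines of Python. This is a minimal sketch that uses only the sample text from the message; `parse_samples` is a hypothetical helper, not part of Scylla or its test suite.

```python
import re

# Sample lines copied from the commit message above.
SAMPLE = """\
# HELP scylla_index_query_latencies Index query latencies
# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1
"""

LINE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Hypothetical helper: return (name, labels, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        match = LINE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels")))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


samples = parse_samples(SAMPLE)

# Number of queries recorded for index "test_i_idx" in keyspace "test".
count = next(
    value
    for name, labels, value in samples
    if name == "scylla_index_query_latencies_count"
    and labels.get("idx") == "test_i_idx"
    and labels.get("ks") == "test"
)
```

The `idx` and `ks` labels are what make the metric per-index: filtering on them selects the histogram of a single index.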
Botond Dénes	0cf6a648bb	Merge 'Default create keyspace syntax' from Dario Mirovic Allow for the following CQL syntax: ``` CREATE KEYSPACE [IF NOT EXISTS] <name>; ``` for example: ``` CREATE KEYSPACE test_keyspace; ``` With this syntax all the keyspace's parameters would be defaulted to: replication strategy = `NetworkTopologyStrategy`; replication factor = number of racks, excluding racks that only have arbiter nodes; storage options, durable writes = defaults we normally would use; tablets enabled if they are enabled in the db configuration, e.g. scylla.yaml or db/config.cc by default. Options besides `replication` already have defaults. `replication` had to be specified, but it could be an empty set, where defaults for sub-options (replication strategy and replication factor) would be used - `replication = {}`. Now there is no need for specifying an empty set - omitting `replication = {}` has the same effect as `replication = {}`. Since all the options now have defaults, `WITH` is optional for `CREATE KEYSPACE` statement. Fixes #25145 This is an improvement, no backport needed. Closes scylladb/scylladb#25872 * github.com:scylladb/scylladb: docs: cql: default create keyspace syntax test: cqlpy: add test for create keyspace with no options specified cql: default `CREATE KEYSPACE` syntax	2025-09-16 23:40:47 +03:00
Michał Hudobski	3364cc96f5	test: verify that the index metric is added This commit adds a test that performs a sanity check that the implemented metric is actually being added to Scylla's metrics and has the correct value.	2025-09-16 18:10:01 +02:00
Nadav Har'El	5307d1b9a8	Merge 'vector_index: add version to index options' from Dawid Pawlik Since creating the vector index does not lead to creation of a view table [#24438] (whose version info had been logged in `system_schema.scylla_tables`) we lacked the information about the version of the index. The solution we arrived at is to add the version as a field in options column of `system_schema.indexes`. It requires few changes and seems unintrusive for existing infrastructure. This patch implements the solution described above. Refs: VECTOR-142 Closes scylladb/scylladb#25614 * github.com:scylladb/scylladb: cqlpy/test_vector_index: add vector index version test vector_index, index_prop_defs: add version to index options create_index_statement: rename `validator` to `custom_index_factory` custom index: rename `custom_index_option_name` vector_index: rename `supported_options` to `vector_index_options`	2025-09-14 15:35:53 +03:00
Avi Kivity	c91b326d5a	Merge 'transport: replace throwing protocol_exception with returns' from Dario Mirovic Replace throwing `protocol_exception` with returning it as a result or an exceptional future in the transport server module. The goal is to improve performance. Most of the `protocol_exception` throws were made from `fragmented_temporary_buffer` module, by passing `exception_thrower()` to its `read` methods. `fragmented_temporary_buffer` is changed so that it now accepts an exception creator, not exception thrower. `fragmented_temporary_buffer_concepts::ExceptionCreator` concept replaced `fragmented_temporary_buffer_concepts::ExceptionThrower` and all methods that have been throwing now return failed result of type `utils::result_with_eptr`. This change is then propagated to the callers. The scope of this patch is `protocol_exception`, so commitlog just calls `.value()` method on the result. If the result failed, that will throw the exception from the result, as defined by `utils::result_with_eptr_throw_policy`. This means that the behavior of commitlog module stays the same. transport server module handles results gracefully. All the caller functions that return non-future value `T` now return `utils::result_with_eptr<T>`. When the caller is a function that returns a future, and it receives failed result, `make_exception_future(std::move(failed_result).value())` is returned. The rest of the callstack up to the transport server `handle_error` function is already working without throwing, and that's how zero throws is achieved. cql3 module changes do the same as transport server module. Benchmark that is not yet merged has commit `67fbe35833e2d23a8e9c2dcb5e04580231d8ec96`, [GitHub diff view](https://github.com/scylladb/scylladb/compare/master...nuivall:scylladb:perf_cql_raw). It uses either read or write query. 
Command line used: ``` ./build/release/scylla perf-cql-raw --workdir ~/tmp/scylladir --smp 1 --developer-mode 1 --workload write --duration 300 --concurrency 1000 --username cassandra --password cassandra 2>/dev/null ``` The only thing changed across runs is `--workload write`/`--workload read`. Built and run on `release` target. <details> ``` throughput: mean= 36946.04 standard-deviation=1831.28 median= 37515.49 median-absolute-deviation=1544.52 maximum=39748.41 minimum=28443.36 instructions_per_op: mean= 108105.70 standard-deviation=965.19 median= 108052.56 median-absolute-deviation=53.47 maximum=124735.92 minimum=107899.00 cpu_cycles_per_op: mean= 70065.73 standard-deviation=2328.50 median= 69755.89 median-absolute-deviation=1250.85 maximum=92631.48 minimum=66479.36 ⏱ real=5:11.08 user=2:00.20 sys=2:25.55 cpu=85% ``` ``` throughput: mean= 40718.30 standard-deviation=2237.16 median= 41194.39 median-absolute-deviation=1723.72 maximum=43974.56 minimum=34738.16 instructions_per_op: mean= 117083.62 standard-deviation=40.74 median= 117087.54 median-absolute-deviation=31.95 maximum=117215.34 minimum=116874.30 cpu_cycles_per_op: mean= 58777.43 standard-deviation=1225.70 median= 58724.65 median-absolute-deviation=776.03 maximum=64740.54 minimum=55922.58 ⏱ real=5:12.37 user=27.461 sys=3:54.53 cpu=83% ``` ``` throughput: mean= 37107.91 standard-deviation=1698.58 median= 37185.53 median-absolute-deviation=1300.99 maximum=40459.85 minimum=29224.83 instructions_per_op: mean= 108345.12 standard-deviation=931.33 median= 108289.82 median-absolute-deviation=55.97 maximum=124394.65 minimum=108188.37 cpu_cycles_per_op: mean= 70333.79 standard-deviation=2247.71 median= 69985.47 median-absolute-deviation=1212.65 maximum=92219.10 minimum=65881.72 ⏱ real=5:10.98 user=2:40.01 sys=1:45.84 cpu=85% ``` ``` throughput: mean= 38353.12 standard-deviation=1806.46 median= 38971.17 median-absolute-deviation=1365.79 maximum=41143.64 minimum=32967.57 instructions_per_op: mean= 117270.60 
standard-deviation=35.50 median= 117268.07 median-absolute-deviation=16.81 maximum=117475.89 minimum=117073.74 cpu_cycles_per_op: mean= 57256.00 standard-deviation=1039.17 median= 57341.93 median-absolute-deviation=634.50 maximum=61993.62 minimum=54670.77 ⏱ real=5:12.82 user=4:10.79 sys=11.530 cpu=83% ``` This shows ~240 instructions per op increase for reads and ~180 instructions per op increase for writes. Tests have been run multiple times, with almost identical results. Each run lasted 300 seconds. Number of operations executed is roughly 38k per second × 300 seconds = 11.4m ops. Update: I have repeated the benchmark with clean state - reboot computer, put in performance mode, rebuild, closed other apps that might affect CPU and disk usage. run count: 5 times before and 5 times after the patch duration: 300 seconds Average write throughput median before patch: 41155.99 Average write throughput median after patch: 42193.22 Median absolute deviation is also lower now, with values in range 350-550, while the previous runs' values were in range 750-1350. </details> Built and run on `release` target. 
<details> ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null ``` throughput: mean= 14910.90 standard-deviation=477.72 median= 14956.73 median-absolute-deviation=294.16 maximum=16061.18 minimum=13198.68 instructions_per_op: mean= 659591.63 standard-deviation=495.85 median= 659595.46 median-absolute-deviation=324.91 maximum=661184.94 minimum=658001.49 cpu_cycles_per_op: mean= 213301.49 standard-deviation=2724.27 median= 212768.64 median-absolute-deviation=1403.85 maximum=225837.15 minimum=208110.12 ⏱ real=5:19.26 user=5:00.22 sys=15.827 cpu=98% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null ``` throughput: mean= 93345.45 standard-deviation=4499.00 median= 93915.52 median-absolute-deviation=2764.41 maximum=104343.64 minimum=79816.66 instructions_per_op: mean= 65556.11 standard-deviation=97.42 median= 65545.11 median-absolute-deviation=71.51 maximum=65806.75 minimum=65346.25 cpu_cycles_per_op: mean= 34160.75 standard-deviation=803.02 median= 33927.16 median-absolute-deviation=453.08 maximum=39285.19 minimum=32547.13 ⏱ real=5:03.23 user=4:29.46 sys=29.255 cpu=98% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null ``` throughput: mean= 206982.18 standard-deviation=15894.64 median= 208893.79 median-absolute-deviation=9923.41 maximum=232630.14 minimum=127393.34 instructions_per_op: mean= 35983.27 standard-deviation=6.12 median= 35982.75 median-absolute-deviation=3.75 maximum=36008.24 minimum=35952.14 cpu_cycles_per_op: mean= 17374.87 standard-deviation=985.06 median= 17140.81 median-absolute-deviation=368.86 maximum=26125.38 minimum=16421.99 ⏱ real=5:01.23 user=4:57.88 sys=0.124 cpu=98% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null ``` throughput: mean= 16198.26 
standard-deviation=902.41 median= 16094.02 median-absolute-deviation=588.58 maximum=17890.10 minimum=13458.74 instructions_per_op: mean= 659752.73 standard-deviation=488.08 median= 659789.16 median-absolute-deviation=334.35 maximum=660881.69 minimum=658460.82 cpu_cycles_per_op: mean= 216070.70 standard-deviation=3491.26 median= 215320.37 median-absolute-deviation=1678.06 maximum=232396.48 minimum=209839.86 ⏱ real=5:17.33 user=4:55.87 sys=18.425 cpu=99% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null ``` throughput: mean= 97067.79 standard-deviation=2637.79 median= 97058.93 median-absolute-deviation=1477.30 maximum=106338.97 minimum=87457.60 instructions_per_op: mean= 65695.66 standard-deviation=58.43 median= 65695.93 median-absolute-deviation=37.67 maximum=65947.76 minimum=65547.05 cpu_cycles_per_op: mean= 34300.20 standard-deviation=704.66 median= 34143.92 median-absolute-deviation=321.72 maximum=38203.68 minimum=33427.46 ⏱ real=5:03.22 user=4:31.56 sys=29.164 cpu=99% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null ``` throughput: mean= 223495.91 standard-deviation=6134.95 median= 224825.90 median-absolute-deviation=3302.09 maximum=234859.90 minimum=193209.69 instructions_per_op: mean= 35981.41 standard-deviation=3.16 median= 35981.13 median-absolute-deviation=2.12 maximum=35991.46 minimum=35972.55 cpu_cycles_per_op: mean= 17482.26 standard-deviation=281.82 median= 17424.08 median-absolute-deviation=143.91 maximum=19120.68 minimum=16937.43 ⏱ real=5:01.23 user=4:58.54 sys=0.136 cpu=99% ``` </details> Fixes: #24567 This PR is a continuation of #24738 [transport: remove throwing protocol_exception on connection start](https://github.com/scylladb/scylladb/pull/24738). This PR does not solve a burning issue, but is rather an improvement in the same direction. As it is just an enhancement, it should not be backported. 
Closes scylladb/scylladb#25408 * github.com:scylladb/scylladb: test/cqlpy: add protocol exception tests test/cqlpy: `test_protocol_exceptions.py` refactor message frame building test/cqlpy: `test_protocol_exceptions.py` refactor duplicate code transport: replace `make_frame` throw with return result cql3: remove throwing `protocol_exception` transport: replace throw in validate_utf8 with result_with_exception_ptr return transport: replace throwing protocol_exception with returns utils: add result_with_exception_ptr test/cqlpy: add unknown compression algorithm test case	2025-09-10 21:54:15 +03:00
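The instruction-count figures quoted in the transport message above can be rechecked from the numbers it prints. The sketch below is only arithmetic on the four `perf-cql-raw` result blocks, taken in the order they appear. The message does not label which block belongs to which workload or to which side of the patch, so pairing the 1st block with the 3rd and the 2nd with the 4th is an assumption, chosen because it reproduces the quoted deltas.

```python
# Median instructions_per_op from the four perf-cql-raw result blocks,
# in the order they appear in the message.
medians = [108052.56, 117087.54, 108289.82, 117268.07]

# Assumed pairing: block 1 vs block 3, and block 2 vs block 4.
delta_a = medians[2] - medians[0]
delta_b = medians[3] - medians[1]

# Operation count: roughly 38k ops/s over a 300-second run.
total_ops = 38_000 * 300

print(f"{delta_a:.2f} {delta_b:.2f} {total_ops}")
```

Under that pairing the deltas come out close to the ~240 and ~180 instructions per op stated in the message, and the operation count matches its 11.4m estimate.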
Dawid Pawlik	1ce76a6ca2	cqlpy/test_vector_index: add vector index version test Test if the index version is the same as the base table version before the index was created. Test if recreating the index with the same parameters changes the version. Test if altering the base table does not change the version. Test if the user cannot specify the index version option by themself.	2025-09-10 15:19:36 +02:00
Botond Dénes	514f59d157	tools/scylla-sstable: write: move to UUID generation We are moving away from integer generations, so stop using them. Also drop the --generation command-line parameter, UUID generations don't have to be provided by the caller, because random UUIDs will not collide with each other. To help the caller still know what generation the output sstable has (previously they provided it via --generation), print the generation to stdout. Closes scylladb/scylladb#25166	2025-09-10 13:47:26 +03:00
Nadav Har'El	5e7251cd40	secondary index: fix xfailing test to pass on Cassandra We have an xfailing test test_secondary_index.py::test_limit_partition which reproduces a Scylla bug in LIMIT when scanning a secondary index (Refs #22158). The point of such a reproducer is to demonstrate the bug by passing on Cassandra but failing on Scylla - yet this specific test doesn't pass on Cassandra because it expects the wrong 3 out of 4 results to be returned: The test begins with LIMIT 1 and sees the first result is (2,1), so we expect when we increase the LIMIT to 3 to see more results from the same partition (2) - and yet the test mistakenly expected the next results to come from partition 1, which is not a reasonable expectation, and doesn't happen in Cassandra (I checked both Cassandra 5 and 4). After this patch, the test passes on Cassandra (I tried 4 and 5), and continues to fail on Scylla - which returns 4 rows despite the LIMIT 3. Note that it is debatable whether this test should insist at all on which 3 items are returned by "LIMIT 3" - In Cassandra the ordering of a SELECT with a secondary index is not well defined (see discussion in Refs #23392). So an alternative implementation of this test would be to just check that LIMIT 3 returns 3 items without insisting which: # In Cassandra the ordering of a SELECT with a secondary index is not # defined (see discussion in #23392), so we don't know which three # results to expect - just that it must be a 3-item subset. rows = list(rs) assert len(rows) == 3 assert set(rows).issubset({(1,1), (1,2), (2,1), (2,2)}) However, as of yet, I did not modify this test to do this. I still believe there is value in secondary index scans having the same order as a scan without a secondary index has - and not an undefined order, and if both Scylla and Cassandra implement that in practice, it's useful for tests to validate this so we'll know if this guarantee is ever broken. 
Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25676	2025-09-10 08:48:52 +03:00
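The order-insensitive alternative described in the `test_limit_partition` message above can be written as a small standalone check. This is a sketch only: `is_valid_limit_result` is a hypothetical helper, and the row values are the four `(partition, clustering)` pairs named in the message rather than the output of a real query.

```python
# The four rows the message says the table contains.
ALL_ROWS = {(1, 1), (1, 2), (2, 1), (2, 2)}


def is_valid_limit_result(rows, limit=3, all_rows=ALL_ROWS):
    """Hypothetical helper: accept any `limit`-item subset of the table's rows."""
    rows = list(rows)
    return len(rows) == limit and set(rows).issubset(all_rows)


# Any three of the four rows pass, whichever partition they come from.
ok_same_partition_first = is_valid_limit_result([(2, 1), (2, 2), (1, 1)])
ok_other_order = is_valid_limit_result([(1, 1), (1, 2), (2, 1)])

# Returning all four rows despite LIMIT 3 (the behavior the message
# reports for Scylla) is rejected.
rejects_four_rows = is_valid_limit_result([(1, 1), (1, 2), (2, 1), (2, 2)])
```

A check like this verifies the LIMIT itself but, as the message notes, gives up on validating the ordering that both implementations provide in practice.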
Dario Mirovic	d92ceed19a	test: cqlpy: add test for create keyspace with no options specified This patch introduces one new test case. It tests that a keyspace can be created without specifying replication options. Since other options already had defaults, this test assures a keyspace can be created with no options specified at all, with the following query: `CREATE KEYSPACE ks;` Refs #25145	2025-09-08 15:25:23 +02:00
Nadav Har'El	a1ed2c9d4b	Merge 'Allow users to SELECT from CDC log tables they created.' from Dawid Pawlik Before the patch, a user with CREATE access could create a table with CDC or alter a table to enable CDC, but could not run a SELECT on the CDC log table they created. This was because the SELECT permission was checked on the CDC log and then on its "parent", the keyspace, but not on the base table, on which the user had SELECT permission automatically granted on CREATE. This patch matches the behavior of querying the CDC log to the one implemented for Materialized Views: 1. No new permissions are granted on CREATE. 2. On SELECT, the SELECT permission on the base table is checked. Fixes: https://github.com/scylladb/scylladb/issues/19798 Fixes: VECTOR-151 Closes scylladb/scylladb#25797 * github.com:scylladb/scylladb: cqlpy/test_permissions: run the reproducer tests for #19798 select_statement: check for access to CDC base table	2025-09-04 16:56:52 +03:00
Dawid Mędrek	d2c5268196	cql3: Produce CREATE MATERIALIZED VIEW statement when describing MV of index Before this change, executing `DESCRIBE MATERIALIZED VIEW` on the underlying materialized view of a secondary index would produce a `CREATE INDEX` statement. It was not only confusing, but it also prevented the user from learning about the definition of the view. The only way to do so was to query system tables. We change that behavior and produce a `CREATE MATERIALIZED VIEW` statement instead. The statement is printed as a comment to implicitly convey that the user should not attempt to execute it to restore the view. A short comment is provided to make it clearer. Before this commit: ``` cqlsh> CREATE TABLE ks.t(p int PRIMARY KEY, v int); cqlsh> CREATE INDEX i ON ks.t(v); cqlsh> DESCRIBE MATERIALIZED VIEW ks.i; CREATE INDEX i ON ks.t(v); ``` After this commit: ``` cqlsh> CREATE TABLE ks.t(p int PRIMARY KEY, v int); cqlsh> CREATE INDEX i ON ks.t(v); cqlsh> DESCRIBE MATERIALIZED VIEW ks.i; /* Do NOT execute this statement! It's only for informational purposes. This materialized view is the underlying materialized view of a secondary index. It can be restored via restoring the index. 
CREATE MATERIALIZED VIEW ks.i_index [...]; */ ``` Note that describing the base table has not been affected and still works as follows: ``` cqlsh> CREATE TABLE ks.t(p int PRIMARY KEY, v int); cqlsh> CREATE INDEX i ON ks.t(v); cqlsh> DESCRIBE TABLE ks.t; CREATE TABLE ks.t ( p int, v int, PRIMARY KEY (p) ) WITH bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'} AND comment = '' AND compaction = {'class': 'IncrementalCompactionStrategy'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND speculative_retry = '99.0PERCENTILE' AND tombstone_gc = {'mode': 'timeout', 'propagation_delay_in_seconds': '3600'}; CREATE INDEX i ON ks.t(v); ``` We also provide two reproducers of scylladb/scylladb#24610. Fixes scylladb/scylladb#24610 Closes scylladb/scylladb#25697	2025-09-03 15:21:37 +02:00
Dawid Pawlik	5e72d71188	cqlpy/test_permissions: run the reproducer tests for #19798 Since the previous commit fixes the issue, we can remove the xfail mark. The tests should pass now.	2025-09-03 13:20:39 +02:00
Pavel Emelyanov	b0aa2d61d9	Merge 'cql3: add default replication factor to `create_keyspace_statement`' from Dario Mirovic When creating a new keyspace, the replication factor must be stated. For example: `CREATE KEYSPACE ks WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'replication_factor': 3 };` This patch changes it in the following way - if there is no replication factor specified, use the default replication factor. The default replication factor is equal to the number of racks that are not arbiter-only, i.e. racks that have at least one non-arbiter node. The following syntax is now valid: `CREATE KEYSPACE ks WITH REPLICATION = { 'class': 'NetworkTopologyStrategy' };` `CREATE KEYSPACE ks WITH REPLICATION = { };` Fixes #16028 Backport is not needed. This is an enhancement for future releases. Closes scylladb/scylladb#25570 * github.com:scylladb/scylladb: docs/cql: update documentation for default replication factor test/cqlpy: add keyspace creation default replication factor tests cql3: add default replication factor to `create_keyspace_statement`	2025-09-03 12:31:53 +03:00
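The default replication factor rule stated in the message above (count the racks that have at least one non-arbiter node) can be illustrated with a short sketch. The topology representation here, a dict from rack name to a list of node kinds, and the function name are assumptions made for the example; they do not mirror Scylla's internal data structures.

```python
def default_replication_factor(racks):
    """Hypothetical helper: count racks that are not arbiter-only.

    `racks` maps a rack name to the kinds of nodes in it, where each kind
    is either "normal" or "arbiter" (an assumed, simplified model).
    """
    return sum(
        1
        for nodes in racks.values()
        if any(kind != "arbiter" for kind in nodes)
    )


topology = {
    "rack1": ["normal", "normal"],
    "rack2": ["arbiter"],            # arbiter-only: not counted
    "rack3": ["normal", "arbiter"],  # mixed: counted
}

rf = default_replication_factor(topology)
```

With this topology the rule yields a replication factor of 2, since only `rack1` and `rack3` contain a non-arbiter node.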
Piotr Dulikowski	762d9ef68f	Merge 'cdc: Set tombstone_gc when creating log table' from Dawid Mędrek Normally, when we create a table, MV, etc., we apply `cf_prop_defs` to the schema builder via the function `cf_prop_defs::apply_to_builder`. Unfortunately, that didn't happen when creating CDC log tables, and so we might have missed some of the properties that would normally be set to some value, even if the default one. One particular example of that phenomenon was `tombstone_gc`. For better or worse, it's not a "standalone property" of a table, but rather part of `extensions`. [Somewhat related issue: scylladb/scylladb#9722] That may have and did cause trouble. Consider this scenario: 1. A CDC log table is created. 2. The table does NOT have any value of `tombstone_gc` set. 3. The user edits the table via `ALTER TABLE`. That statement treats the log table just like any other one (at least as far as the relevant portion of the logic is concerned). Among other things, it uses `cf_prop_defs::apply_to_builder`, and as a result, the `tombstone_gc` property is set to some value: * the default one if the user doesn't specify it in the statement, * a custom one if they do. Why is that a problem? First of all, it's confusing. When we perform a schema backup and a table uses CDC, we include an ALTER statement for its corresponding CDC log table (for more context, see issue scylladb/scylladb#18467 or commit scylladb/scylladb@f12edbdd95). There are two consequences for the user here: 1. If the log table had NOT been altered ever since it was created, the statement will miss the `tombstone_gc` property as if it couldn't be set for it at all. That's confusing! 2. If the log table HAD in fact been altered after its creation, the statement will include the `tombstone_gc` property. That's even more confusing (why was it not present the first time, but it is now?). 
The `tombstone_gc` property should always be set to avoid confusion and problematic edge cases in tests and to simply be consistent with how other schema entities work. The solution we employ is that we always set the property to the default value. That includes the case when we reattach the log table to the base; consider the following scenario: 1. Create a table with CDC enabled. 2. Detach the log table by performing `ALTER TABLE ... WITH cdc = {'enabled': false}`. 3. Change the `tombstone_gc` property of the log table. 4. Reattach the log table to the base in the same way as in step 2. The expected result would be that the new value of `tombstone_gc` would be preserved after reattaching the log table. However, that's not what will happen. We decide to stay consistent with how other properties of a log table behave, and we reset them after every reattachment. We might change that in the future: see issue scylladb/scylladb#25523. Two reproducer tests of scylladb/scylladb#25187 are included in the changes. Backport: The problem is not critical, so it may not be necessary to backport the changes. That's to be discussed. Closes scylladb/scylladb#25521 * github.com:scylladb/scylladb: cdc: Set tombstone_gc when creating log table tombstone_gc: Add overload of get_default_tombstone_gc_mode tombstone_gc: Rename get_default_tombstonesonte_gc_mode	2025-09-02 10:20:11 +02:00
Karol Nowacki	3086d15999	cql3: Fix crash on ANN OF query when TRACING ON is enabled Executing a vector search (SELECT with ANN OF ordering) query with `TRACING ON` enabled caused a node to crash due to a null pointer dereference. This occurred because a vector index does not have an associated view table, making its `_view_schema` member null. The implementation attempted to enable tracing on this null view schema, leading to the crash. The fix adds a null check for `_view_schema` before attempting to enable tracing on the view (index) table. A regression test is included to prevent this from happening again. Fixes: VECTOR-179 Closes scylladb/scylladb#25500	2025-09-01 17:26:54 +03:00
Dario Mirovic	8e994b3890	test/cqlpy: add protocol exception tests Add protocol exception tests that check errors and exceptions. `test_process_startup_invalid_string_map`: `STARTUP` (0x01) with declared map count, but missing entries - `read_string_map` out-of-range. `test_process_query_internal_malformed_query`: `QUERY` (0x07) long string declared larger than available bytes - `read_long_string_view`. `test_process_query_internal_fail_read_options`: `QUERY` (0x07) with `PAGE_SIZE` flag, but truncated page_size - `read_options` path. `test_process_prepare_malformed_query`: `PREPARE` (0x09) long string declared larger than available bytes - `read_long_string_view` in prepare. `test_process_execute_internal_malformed_cache_key`: `EXECUTE` (0x0A) cache key short bytes declared larger than provided bytes - `read_short_bytes`. `test_process_register_malformed_string_list`: `REGISTER` (0x0B) string list with truncated element - `read_string_list`/`read_string`. Each test asserts an `ERROR` frame is returned and `protocol_error` metrics increase, without causing C++ exceptions. Refs: #24567	2025-08-31 23:40:03 +02:00
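To make the first malformed case above concrete, here is a sketch of how a `STARTUP` frame that declares one map entry but carries none could be built. It assumes the CQL native protocol v4 header layout (version, flags, stream, opcode, body length, all big-endian); `build_frame` and `string_map` are hypothetical helpers written for this example and are not the `_build_frame` helper from `test_protocol_exceptions.py`.

```python
import struct

OPCODE_STARTUP = 0x01
PROTOCOL_VERSION = 0x04  # assumption: a native protocol v4 request frame


def build_frame(opcode, body, stream=0, flags=0, version=PROTOCOL_VERSION):
    """Hypothetical helper: 9-byte header followed by the body."""
    # version(1) flags(1) stream(2) opcode(1) length(4), big-endian
    header = struct.pack(">BBhBi", version, flags, stream, opcode, len(body))
    return header + body


def string_map(entries):
    """Encode a [string map]: a 2-byte count, then length-prefixed key/value strings."""
    out = struct.pack(">H", len(entries))
    for key, value in entries.items():
        for text in (key, value):
            raw = text.encode("utf-8")
            out += struct.pack(">H", len(raw)) + raw
    return out


# A well-formed STARTUP body for comparison.
valid = build_frame(OPCODE_STARTUP, string_map({"CQL_VERSION": "3.0.0"}))

# Malformed: the body declares one map entry but ends right after the count,
# so a reader of the string map runs out of bytes.
malformed = build_frame(OPCODE_STARTUP, struct.pack(">H", 1))
```

The other cases in the message follow the same pattern: a length or count field in the body promises more bytes than the frame actually contains.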
Dario Mirovic	84e6979adf	test/cqlpy: `test_protocol_exceptions.py` refactor message frame building Frame building is repetitive and increases verbosity, reducing code readability. This patch solves it by extracting common functionality of frame building into `_build_frame`. Also, helpers `_send_frame` and `_recv_frame` are introduced. While `_recv_frame` is not really useful, it goes well in pair with `_send_frame`. Refs: #24567	2025-08-31 23:40:01 +02:00
Dario Mirovic	19c610d9f7	test/cqlpy: `test_protocol_exceptions.py` refactor duplicate code The code that measures errors and exceptions in `test_protocol_exceptions.py` tests is repetitive. This patch refactors common functionality in a separate `_test_impl` function, improving readability. Refs: #24567	2025-08-31 23:39:58 +02:00
Piotr Dulikowski	7ccb50514d	Merge 'Introduce view building coordinator' from Michał Jadwiszczak This patch introduces `view_building_coordinator`, a single entity within the whole cluster responsible for building tablet-based views. The view building coordinator takes a slightly different approach than the existing node-local view builder. The whole process is split into smaller view building tasks, one per tablet replica of the base table. The coordinator builds one base table at a time and it can choose another when all views of the currently processed base table are built. The tasks are started by setting the `STARTED` state and they are executed by the node-local view building worker. The tasks are scheduled in such a way that each shard processes only one tablet at a time (multiple tasks can be started for a shard on a node because a table can have multiple views, but then all tasks have the same base table and tablet (last_token)). Once the coordinator starts the tasks, it sends the `work_on_view_building_tasks` RPC to start the tasks and receive their results. This RPC is resilient to RPC failure or raft leader change, meaning if one RPC call started a batch of tasks but then failed (for instance the raft leader was changed and the caller aborted waiting for the response), the next RPC call will attach itself to the already started batch. The coordinator plugs into handling tablet operations (migration/resize/RF change) and adjusts its tasks accordingly. At the start of each tablet operation, the coordinator aborts necessary view building tasks to prevent https://github.com/scylladb/scylladb/issues/21564. Then, new adjusted tasks are created at the end of the operation. If the operation fails at any moment, aborted tasks are rolled back. The view building coordinator can also handle staging sstables using process_staging view building tasks. 
We do this because we don't want to start generating view updates from a staging sstable prematurely, before the writes are directed to the new replica (https://github.com/scylladb/scylladb/issues/19149). For detailed description check: `docs/dev/view-building-coordinator.md` Fixes https://github.com/scylladb/scylladb/issues/22288 Fixes https://github.com/scylladb/scylladb/issues/19149 Fixes https://github.com/scylladb/scylladb/issues/21564 Fixes https://github.com/scylladb/scylladb/issues/17603 Fixes https://github.com/scylladb/scylladb/issues/22586 Fixes https://github.com/scylladb/scylladb/issues/18826 Fixes https://github.com/scylladb/scylladb/issues/23930 --- This PR is reimplementation of https://github.com/scylladb/scylladb/pull/21942 Closes scylladb/scylladb#23760 * github.com:scylladb/scylladb: test/cluster: add view build status tests test/cluster: add view building coordinator tests utils/error_injection: allow to abort `injection_handler::wait_for_message()` test: adjust existing tests utils/error_injection: add injection with `sleep_abortable()` db/view/view_builder: ignore `no_such_keyspace` exception docs/dev: add view building coordinator documentation db/view/view_building_worker: work on `process_staging` tasks db/view/view_building_worker: register staging sstable to view building coordinator when needed db/view/view_building_worker: discover staging sstables db/view/view_building_worker: add method to register staging sstable db/view/view_update_generator: add method to process staging sstables instantly db/view/view_update_generator: extract generating updates from staging sstables to a method db/view/view_update_generator: ignore tablet-based sstables db/view/view_building_coordinator: update view build status on node join/left db/view/view_building_coordinator: handle tablet operations db/view: add view building task mutation builder service/topology_coordinator: run view building coordinator db/view: introduce `view_building_coordinator` 
db/view/view_building_worker: update built views locally db/view: introduce `view_building_worker` db/view: extract common view building functionalities db/view: prepare to create abstract `view_consumer` message/messaging_service: add `work_on_view_building_tasks` RPC service/topology_coordinator: make `term_changed_error` public db/schema_tables: create/cleanup tasks when an index is created/dropped service/migration_manager: cleanup view building state on drop keyspace service/migration_manager: cleanup view building state on drop view service/migration_manager: create view building tasks on create view test/boost: enable proxy remote in some tests service/migration_manager: pass `storage_proxy` to `prepare_keyspace_drop_announcement()` service/migration_manager: coroutinize `prepare_new_view_announcement()` service/storage_proxy: expose references to `system_keyspace` and `view_building_state_machine` service: reload `view_building_state_machine` on group0 apply() service/vb_coordinator: add currently processing base db/system_keyspace: move `get_scylla_local_mutation()` up db/system_keyspace: add `view_building_tasks` table db/view: add view_building_state and views_state db/system_keyspace: add method to get view build status map db/view: extract `system.view_build_status_v2` cql statements to system_keyspace db/system_keyspace: move `internal_system_query_state()` function earlier db/view: ignore tablet-based views in `view_builder` gms/feature_service: add VIEW_BUILDING_COORDINATOR feature	2025-08-29 17:28:44 +02:00
Dario Mirovic	fd84da7a50	test/cqlpy: add keyspace creation default replication factor tests Add test cases for create keyspace default replication factor. It is expected that the default replication factor is equal to the number of racks containing at least some non-zero-token nodes in the test suite. Refs: #16028	2025-08-28 01:42:34 +02:00
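The expectation stated in the commit above (default replication factor equals the number of racks that contain at least one non-zero-token node) can be sketched as follows. This is only an illustration, not the test's code: the `Node` tuple and the `expected_default_rf` function are hypothetical names, and the sketch assumes each node is described just by its rack name and token count.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """Hypothetical description of a node: its rack and how many tokens it owns."""
    rack: str
    num_tokens: int


def expected_default_rf(nodes: list[Node]) -> int:
    """Count the racks that contain at least one non-zero-token node."""
    racks_with_tokens = {n.rack for n in nodes if n.num_tokens > 0}
    return len(racks_with_tokens)


nodes = [
    Node("rack1", 256),
    Node("rack1", 0),
    Node("rack2", 256),
    Node("rack3", 0),  # rack3 holds only a zero-token node, so it is not counted
]
print(expected_default_rf(nodes))
```

A set is used so that a rack with several token-owning nodes is still counted once.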
Dawid Mędrek	646f8bc4cd	cdc: Set tombstone_gc when creating log table Normally, when we create a table, MV, etc., we apply `cf_prop_defs` to the schema builder via the function `cf_prop_defs::apply_to_builder`. Unfortunately, that didn't happen when creating CDC log tables, and so we might have missed some of the properties that would normally be set to some value, even if the default one. One particular example of that phenomenon was `tombstone_gc`. For better or worse, it's not a "standalone property" of a table, but rather part of `extensions`. [Somewhat related issue: scylladb/scylladb#9722] That may have and did cause trouble. Consider this scenario: 1. A CDC log table is created. 2. The table does NOT have any value of `tombstone_gc` set. 3. The user edits the table via `ALTER TABLE`. That statement treats the log table just like any other one (at least as far as the relevant portion of the logic is concerned). Among other things, it uses `cf_prop_defs::apply_to_builder`, and as a result, the `tombstone_gc` property is set to some value: * the default one if the user doesn't specify it in the statement, * a custom one if they do. Why is that a problem? First of all, it's confusing. When we perform a schema backup and a table uses CDC, we include an ALTER statement for its corresponding CDC log table (for more context, see issue scylladb/scylladb#18467 or commit scylladb/scylladb@f12edbdd95). There are two consequences for the user here: 1. If the log table had NOT been altered ever since it was created, the statement will miss the `tombstone_gc` property as if it couldn't be set for it at all. That's confusing! 2. If the log table HAD in fact been altered after its creation, the statement will include the `tombstone_gc` property. That's even more confusing (why was it not present the first time, but it is now?). 
The `tombstone_gc` property should always be set to avoid confusion and problematic edge cases in tests and to simply be consistent with how other schema entities work. The solution we employ is that we always set the property to the default value. That includes the case when we reattach the log table to the base; consider the following scenario: 1. Create a table with CDC enabled. 2. Detach the log table by performing `ALTER TABLE ... WITH cdc = {'enabled': false}`. 3. Change the `tombstone_gc` property of the log table. 4. Reattach the log table to the base in the same way as in step 2. The expected result would be that the new value of `tombstone_gc` would be preserved after reattaching the log table. However, that's not what will happen. We decide to stay consistent with how other properties of a log table behave, and we reset them after every reattachment. We might change that in the future: see issue scylladb/scylladb#25523. Two reproducer tests of scylladb/scylladb#25187 are included in the changes. Fixes scylladb/scylladb#25187	2025-08-27 13:18:41 +02:00
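The detach/reattach scenario from the commit above can be modelled with a small toy class. This is a sketch of the described behaviour only, not Scylla code: `LogTable`, its methods, and the `DEFAULT_TOMBSTONE_GC` placeholder are hypothetical, and the real default value is not stated in the commit message, so a stand-in is used.

```python
# Placeholder for the default value; the commit message does not say what it is.
DEFAULT_TOMBSTONE_GC = {"mode": "<default>"}


class LogTable:
    """Toy model of a CDC log table's `tombstone_gc` property."""

    def __init__(self) -> None:
        # After the fix, the property is always set when the log table is created.
        self.tombstone_gc = dict(DEFAULT_TOMBSTONE_GC)
        self.attached = True

    def detach(self) -> None:
        # Models ALTER TABLE ... WITH cdc = {'enabled': false} on the base table.
        self.attached = False

    def alter_tombstone_gc(self, value: dict) -> None:
        # Models an explicit ALTER TABLE on the log table.
        self.tombstone_gc = dict(value)

    def reattach(self) -> None:
        # Log table properties are reset on every reattachment.
        self.attached = True
        self.tombstone_gc = dict(DEFAULT_TOMBSTONE_GC)


log = LogTable()                                  # step 1: table created with CDC
log.detach()                                      # step 2: log table detached
log.alter_tombstone_gc({"mode": "immediate"})     # step 3: property changed
custom = dict(log.tombstone_gc)
log.reattach()                                    # step 4: log table reattached
print(custom, "->", log.tombstone_gc)
```

The custom value set in step 3 does not survive step 4, which matches the commit's decision to stay consistent with how other log table properties behave.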
Michał Jadwiszczak	cf138da853	test: adjust existing tests - Disable tablets in `test_migration_on_existing_raft_topology`. Because views on tablets are experimental now, we can safely assume that the view building coordinator will start with view build status on raft. - Add error injection to pause view building on the worker. It is used to pause the view building process; there is an analogous error injection in view_builder. - Do a read barrier in `test_view_in_system_tables`. Increases test stability by making sure that the node sees up-to-date group0 state and `system.built_views` is synced. - Wait for the view to be built in some tests. Increases test stability by making sure that the view is built. - Remove xfail marker from `test_tablet_streaming_with_unbuilt_view`. This series fixes https://github.com/scylladb/scylladb/issues/21564 and this test should work now.	2025-08-27 10:23:04 +02:00
Nadav Har'El	e2c99436cf	Merge 'cdc, vector_search: enable CDC when the index is created' from Dawid Pawlik When a vector index is created in Scylla, it is initially built using a full scan of the database. After that, it stays up to date by tracking changes through CDC, which should be automatically enabled when the vector index is created. When a user attempts to enable Vector Search (VS), the system checks whether Change Data Capture (CDC) is enabled and properly configured: 1. CDC is not enabled - CDC is automatically enabled with the minimum required TTL (Time-to-Live) for VS (24 hours) and the delta mode set to 'full' or post-image is enabled. - If the user later tries to reduce the CDC TTL below 24 hours or set delta mode to 'keys' with post-image disabled, the action fails. - Error message: Clearly states that CDC TTL must be at least 24 hours and delta mode must be set to 'full' or post-image must be enabled for VS to function. 2. CDC is already enabled - If CDC TTL is ≥ 24 hours and delta mode is set to 'full' or post-image is enabled: VS is enabled successfully. - If CDC TTL is < 24 hours or delta mode is set to 'keys' with post-image disabled: The VS enabling process fails. - Error message: Informs the user that CDC TTL must be at least 24 hours, delta mode must be set to 'full' or post-image must be enabled, and provides a link to documentation on how to update the TTL, delta mode, and post-image. When a user attempts to disable CDC when VS is enabled, the action will fail and the user will be informed by error message that clearly states that VS needs to be disabled (vector indexes have to be dropped) first. Full setup requirements and steps will be detailed in the documentation of Vector Search. 
Co-authored-by: @smoczy123 Fixes: VECTOR-27 Fixes: VECTOR-25 Closes scylladb/scylladb#25179 * github.com:scylladb/scylladb: test/cqlpy: ensure Vector Search CDC options test/boost: adjust CDC boost tests for Vector Search test/cql: add Vector Search CDC enable/disable test cdc, vector_index: provide minimal option setup for Vector Search test/cqlpy: adjust describe table tests with CDC for Vector Search describe, cdc: adjust describe for cdc log tables cdc: enable CDC log when vector index is created test/cqlpy: run vector_index tests only on vnodes vector_index: check if vector index exists in schema	2025-08-26 23:01:32 +03:00
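The CDC rules listed in the merge message above (TTL of at least 24 hours, and delta mode 'full' or post-image enabled) can be sketched as a small check. This is an illustration under stated assumptions, not the actual implementation: `CdcOptions`, `satisfies_vector_search`, `cdc_for_new_vector_index`, and `disable_cdc` are hypothetical names, `None` stands for "CDC not enabled", the auto-enabled configuration is assumed to use delta 'full', and special TTL values are not modelled.

```python
from dataclasses import dataclass
from typing import Optional

MIN_TTL_SECONDS = 24 * 60 * 60  # 24 hours, the minimum stated in the merge message


@dataclass
class CdcOptions:
    """Hypothetical, simplified view of the CDC options of a table with CDC enabled."""
    ttl_seconds: int
    delta: str        # 'full' or 'keys'
    postimage: bool


def satisfies_vector_search(opts: CdcOptions) -> bool:
    """TTL >= 24 hours, and either delta 'full' or post-image enabled."""
    return opts.ttl_seconds >= MIN_TTL_SECONDS and (
        opts.delta == "full" or opts.postimage
    )


def cdc_for_new_vector_index(current: Optional[CdcOptions]) -> CdcOptions:
    """Return the CDC options in effect once a vector index is created."""
    if current is None:
        # Case 1: CDC is not enabled, so enable it with the minimal setup
        # (delta 'full' is an assumption of this sketch).
        return CdcOptions(ttl_seconds=MIN_TTL_SECONDS, delta="full", postimage=False)
    if not satisfies_vector_search(current):
        # Case 2, failing branch: existing options are too weak.
        raise ValueError(
            "CDC TTL must be at least 24 hours and delta mode must be "
            "'full' or post-image must be enabled"
        )
    return current  # Case 2, success branch: options are kept as they are.


def disable_cdc(has_vector_index: bool) -> None:
    """Disabling CDC is rejected while a vector index exists."""
    if has_vector_index:
        raise ValueError("drop the vector indexes before disabling CDC")


print(cdc_for_new_vector_index(None))
```

The same predicate would also cover later `ALTER` attempts: any change that makes `satisfies_vector_search` false while a vector index exists is rejected.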
Dario Mirovic	8b0a551177	test/cqlpy: add unknown compression algorithm test case Add `test_unknown_compression_algorithm` test case to `test_protocol_exceptions.py` test suite. This change improves test coverage for zero throws protocol exception handling. Refs: #24567	2025-08-25 13:31:40 +02:00
Dawid Pawlik	9463ac10e2	test/cqlpy: ensure Vector Search CDC options Add a test to check that the minimal CDC options for Vector Search are ensured and that configuring CDC in a way that violates the minimal requirements is disallowed.	2025-08-20 17:20:38 +02:00
Ran Regev	ebf1db5c5e	remove ./redis and dependencies Remove ./redis and all its usages. This is the second commit that removes ./redis from Scylla Signed-off-by: Ran Regev <ran.regev@scylladb.com>	2025-08-20 17:53:23 +03:00


194 Commits