In ALTER KEYSPACE, when a datacenter name is omitted, its replication
factor is implicitly set to zero with vnodes, while with tablets,
it remains unchanged.
ALTER KEYSPACE should behave the same way for tablets as it does
for vnodes. However, this can be dangerous as we may mistakenly
drop the whole datacenter.
Reject ALTER KEYSPACE if it changes replication factor, but omits
a datacenter that currently contains tablet replicas.
Fixes: https://github.com/scylladb/scylladb/issues/25549.
Closesscylladb/scylladb#25731
Primary issue with the old method is that each update is a separate
cross-shard call, and all later updates queue behind it. If one of the
shards has high latency for such calls, the queue may accumulate and
system will appear unresponsive for mapping changes on non-zero shards.
This happened in the field when one of the shards was overloaded with
sstables and compaction work, which caused frequent stalls which
delayed polling for ~100ms. A queue of 3k address updates
accumulated, because we update mapping on each change of gossip
states. This made bootstrap impossible because nodes couldn't
learn about the IP mapping for the bootstrapping node and streaming
failed.
To protect against that, use a more efficient method of replication
which requires a single cross-shard call to replicate all prior
updates.
It is also more reliable, if replication fails transiently for some
reason, we don't give up and fail all later updates.
Fixes#26865Closesscylladb/scylladb#26941
* github.com:scylladb/scylladb:
address_map: Use barrier() to wait for replication
address_map: Use more efficient and reliable replication method
utils: Introduce helper for replicated data structures
Trie-based sstable indexes are supposed to be (hopefully) a better default than the old BIG indexes.
Make them the new default.
If we change our mind, this change can be reverted later.
New functionality, and this is a drastic change. No backport needed.
Closesscylladb/scylladb#26377
* github.com:scylladb/scylladb:
db/config: enable `ms` sstable format by default
cluster/dtest/bypass_cache_test: switch from highest_supported_sstable_format to chosen_sstable_format
api/system: add /system/chosen_sstable_version
test/cluster/dtest: reduce num_tokens to 16
This commit introduces TLS encryption support for vector store connections.
A new configuration option is added:
- vector_store_encryption_options.truststore: path to the trust store file
To enable secure connections, use the https:// scheme in the
vector_store_primary_uri/vector_store_secondary_uri configuration options.
Fixes: VECTOR-327
This change adds tests for the correctness of tablet size migration
during the end_migrations stage. This size migration can happend for
tablet migrations and for tablet rebuild.
Add 'test_sstable_stream' to verify SSTable file streaming integrity check.
The new tests cover both compressed and uncompressed SSTables and includes:
- Checksum mismatch detection verification
- Digest mismatch detection verifivation
Trie-based sstable indexes are supposed to be (hopefully)
a better default than the old BIG indexes.
Make them the new default.
If we change our mind, this change can be reverted later.
Trie-based indexes and older indexes have a difference in metrics,
and the test uses the metrics to check for bypass cache.
To choose the right metrics, it uses highest_supported_sstable_format,
which is inappropriate, because the sstable format chosen for writes
by Scylla might be different than highest_supported_sstable_format.
Use chosen_sstable_format instead.
Returns the sstable version currently chosen for use in for new sstables.
We are adding it because some tests want to know what format they are
writing (tests using upgradesstable, tests which check stats that only
apply to one of the index types, etc).
(Currently they are using `highest_supported_sstable_format` for this
purpose, which is inappropriate, and will become invalid if a non-latest
format is the default).
Add a test which verifies that if two nodes have the same data, with
different timestamps, repair will detect and fix the diverging
timestamps.
All our repair tests focus on difference in data and I remember writing
this test multiple times in the past to quickly verify whether this
works. Time to upstream this test.
Closesscylladb/scylladb#26900
DynamoDB's documentation
https://docs.aws.amazon.com/sdkref/latest/guide/feature-compression.html
suggests that DynamoDB allows request bodies to be compressed (currently
only by gzip).
The purpose of patch is to have a test reproducing this feature. The
test shows us that indeed DynamoDB understands compressed requests using
the "gzip" encoding, but Alternator does *not*, so the new test is xfail.
As you can see in the test code, although the low-level SDK (botocore)
can send compress requests, this is not actually enabled for DynamoDB
and we need to resort to some trickery to send compressed requests.
But the point is that once we do manage to send compressed requests,
the test shows us that they work properly on AWS, but fail on
Alternator.
The failure of the compressed requests on Alternator is reported like:
An error occurred (ValidationException) when calling the PutItem
operation: Parsing JSON failed: Invalid value. at 70459088
This error message should probably be improved (what is that high
number?!) but of course even better would be to make it really work.
By enabling tracing on alternator-server (e.g., edit test/cqlpy/run.py
and add `'--logger-log-level', 'alternator-server=trace',`) we can
see exactly what request the SDK sends Alternator. What we can see in
the request is:
1. The request headers are uncompressed (this is expected in HTTP)
2. There is a header "Content-Encoding: gzip"
3. The request's body is binary, a full-fleged gzip output complete with
a gzip magic in the beginning.
Refs #5041
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#27049
Currently sstables_manager keeps a reference on global db::config to configure itself. Most of other services use their own specific configs with much less data on-board for the same purposes (e.g. #24841, #19051 and #23705 did same for other services) This PR applies this approach to sstables_manager as well.
Mostly it moves various values from db::config onto newly introduced struct sstables_manager::config, but it also adds specific tracking of sstable_file_io_extensions and patches tools/scylla-sstable not to use sstables_manager as "proxy" object to get db::config from along its calls.
Shuffling components dependencies, no need to backport
Closesscylladb/scylladb#27021
* github.com:scylladb/scylladb:
sstables_manager: Drop db::config from sstables_manager
tools/sstable: Make shard_of_with_tablets use db::config argument
tools/sstable: Add db::config& to all operations
tools/sstable: Get endpoints from storage manager
sstables_manager: Hold sstable IO extensions on it
sstables: Manager helper to grab file io extensions
sstables_manager: Move default format on config
sstables_manager: Move enable_sstable_data_integrity_check on config
sstables_manager: Move data_file_directories on config
sstables_manager: Move components_memory_reclaim_threshold on config
sstables_manager: Move column_index_auto_scale_threshold on config
sstables_manager: Move column_index_size on config
sstables_manager: Move sstable_summary_ratio on config
sstables_manager: Move enable_sstable_key_validation on config
sstables_manager: Move available_memory on config
code: Introduce sstables_manager::config
sstables: Patch get_local_directories() to work on vector of paths
code: Rename sstables_manager::config() into db_config()
When we enable CDC on a table, Scylla creates a log table for it.
It has default properties, but the user may change them later on.
Furthermore, it's possible to detach that log table by simply
disabling CDC on the base table:
```cql
/* Create a table with CDC enabled. The log table is created. */
CREATE TABLE ks.t (pk int PRIMARY KEY) WITH cdc = {'enabled': true};
/* Detach the log table. */
ALTER TABLE ks.t WITH cdc = {'enabled': false};
/* Modify a property of the log table. */
ALTER TABLE ks.t_scylla_cdc_log WITH bloom_filter_fp_chance = 0.13;
```
The log table can also be reattached by enabling CDC on the base table
again:
```cql
/* Reattach the log table */
ALTER TABLE ks.t WITH cdc = {'enabled': true};
```
However, because the process of reattachment goes through the same
code that created it in the first place, the properties of the log
table are rolled back to their default values. This may be confusing
to the user and, if unnoticed, also have other consequences, e.g.
affecting performance.
To prevent that, we ensure that the properties are preserved.
A reproducer test,
`test_log_table_preserves_properties_after_reattachment`, has been
provided to verify that the changes are correct.
Another test, `test_log_table_preserves_id_after_reattachment`, has
also been added because the current implementation sets properties
and the ID separately.
Fixesscylladb/scylladb#25523
Backport: not necessary. Although the behavior may be unexpected,
it's not a bug per se.
Closesscylladb/scylladb#26443
* github.com:scylladb/scylladb:
cdc: Preserve properties when reattaching log table
cdc: Extract creating columns in CDC log table to dedicated function
cdc: Extract default properties of CDC log tables to dedicated function
schema/schema_builder.hh: Add set_properties
schema: Add getter for schema::user_properties
schema: Remove underscores in fields of schema::user_properties
schema: Extract user properties out of raw_schema
Fixes#26874
Due to certain people (me) not being able to tell forward from backward,
the data alignment to ensure partial uploads adhere to the 256k-align
rule would potentially _reorder_ trailing buffers generated iff the
source buffers input into the sink are small enough. Which, as a fun fact,
they are in backup upload.
Change the unit test to use raw sink IO and add two unit tests (of which
the smaller size provokes the bug) that checks the same 64k buf segmented
upload backup uses.
Closesscylladb/scylladb#26938
Add a table name to Alternator's tracing output, as some clients would
like to consistently receive this information.
- add missing `tracing::add_table_name` in `executor::scan`
- add emiting tables' names in `trace_state::build_parameters_map`
- update tests, so when tracing is looked for it is filtered by table's
name, which confirms table is being outputed.
- change `struct one_session_records` declaration to `class one_session_records`,
as `one_session_records` is later defined as class.
Refs #26618Fixes#24031Closesscylladb/scylladb#26634
cluster.dtest_alternator_tests.test_slow_query_logging performs
a bootstrap with 768 token ranges.
It works with `me` sstables, which have 2 open file descriptors
per open sstable, but with `ms` sstables, which have 3 open
file descriptors per open sstable, it fails with EMFILE.
To avoid this problem, let's just decrease the number of vnodes
for in the test suite. It's appropriate anyway, because it avoids some
unneeded work without weakening the tests.
(Note: pylib-based have been setting `num_tokens` to 16 for a long time too).
This breaks `bypass_cache_test`, which is written in a way that expects
a certain number of token ranges. We adjust the relevant parameter
accordingly.
Consider the following:
1) single-key read starts, blocks on replica e.g. waiting for memory.
2) the same replica is migrated away
3) single-key read expires, coordinator abandons it, releases erm.
4) migration advances to cleanup stage, barrier doesn't wait on
timed-out read
5) compaction group of the replica is deallocated on cleanup
6) that single-key resumes, but doesn't find sstable set (post cleanup)
7) with abort-on-internal-error turned on, node crashes
It's fine for abandoned (= timed out) reads to fail, since the
coordinator is gone.
For active reads (non timed out), the barrier will wait for them
since their coordinator holds erm.
This solution consists of failing reads which underlying tablet
replica has been cleaned up, by just converting internal error
to plain exception.
Fixes#26229.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#27078
According to the IETF spec uuid variant bits should be set to '10'. All
others are either invalid or reserved. The patch change the code to
follow the spec.
Closesscylladb/scylladb#27073
One of the tests in test_describe.py used "USE {test_keyspace}" which
affects the CQL session shared by all tests in an unrecoverable way (there
is no "UNUSE" statement).
As an example of what might happen if the shared CQL session is "polluted"
by a USE, issue #26334 is about a bug we have in DESC KEYSPACES when USE
is active.
So in this patch, we:
1. Fix the test to not use USE on the shared CQL session - it's easy to
create a separate session to use the "USE" on. With this fix, the test
no longer leaves the shared CQL session in a "USE" state.
2. Add a new xfailing test to reproduce the DESC KEYSPACES bug.
Refs #26334
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#26345
The purpose of this two-patch series is to reproduce a previously unknown bug, Refs #26914.
Recently we saw a lot of patches that change how we create new schemas (keyspaces and tables), sometimes changing various long-time defaults. We started to worry that perhaps some of these defaults were applied only to CQL base tables and perhaps not to Alternator or to CQL's auxiliary tables (materialized views, secondary indexes, or CDC logs). For example, in Refs #26307 we wondered if perhaps the default "speculative_retry" option is different in Alternator than in CQL.
The first patch includes Alternator tests, and the second CQL tests. In both tests we discover that although recently (commit adf9c42, Refs #26610) we changed the default sstable compressor from LZ4Compressor to LZ4WithDictsCompressor, actually this change was **only** applied to CQL base tables. All Alternator tables and all CQL auxiliary tables (views, indexes, CDC) still use the old LZ4Compressor. This is issue #26914.
Closesscylladb/scylladb#26915
* github.com:scylladb/scylladb:
test/cqlpy: test compression setting for auxiliary table
test/alternator: tests for schema of Alternator table
This change adds support for secondary vector store clients, typically
located in different availability zones. Secondary clients serve as
fallback targets when all primary clients are unavailable.
New configuration option allows specifying secondary client addresses
and ports.
Fixes: VECTOR-187
Closesscylladb/scylladb#26484
When streaming SSTables by tablet range, the original implementation of tablet_sstable_streamer::stream may break out of the loop too early when encountering a non-overlaping SSTable. As a result, subsequent SSTables that should be classified as partially contained are skipped entirely.
Tablet range: [4, 5]
SSTable ranges:
[0,5]
[0, 3] <--- is considered exhausted, and causes skip to next tablet
[2, 5] <--- is missed for range [4, 5]
The loop uses if (!overlaps) break; semantics, which conflated “no overlap” with “done scanning.” This caused premature termination when an SSTable did not overlapped but the following one did.
Correct logic should be:
before(sst_last) → skip and continue.
after(sst_first) → break (no further SSTables can overlap).
Otherwise → `contains` to classify as full or partial.
Missing SSTables in streaming and potential data loss or incomplete streaming in repair/streaming operations.
1. Correct the loop termination logic that previously caused certain SSTables to be prematurely excluded, resulting in lost mutations. This change ensures all relevant SSTables are properly streamed and their mutations preserved.
2. Refactor the loop to use before() and after() checks explicitly, and only break when the SSTable is entirely after the tablet range
3. Add pytest to cover this case, full streaming flow by means of `restore`
4. Add boost tests to test the new refactored function
This data corruption fix should be ported back to 2024.2, 2025.1, 2025.2, 2025.3 and 2025.4
Fixes: https://github.com/scylladb/scylladb/issues/26979Closesscylladb/scylladb#26980
* github.com:scylladb/scylladb:
streaming: fix loop break condition in tablet_sstable_streamer::stream
streaming: add pytest case to reproduce mutation loss issue
vector_search: Fix error handling and status parsing
This change addresses two issues in the vector search client that caused
validator test failures: incorrect handling of 5xx server errors and
faulty status response parsing.
1. 5xx Error Handling:
Previously, a 5xx response (e.g., 503 Service Unavailable) from the
underlying vector store for an `/ann` search request was incorrectly
interpreted as a node failure. This would cause the node to be marked
as down, even for transient issues like an index scan being in progress.
This change ensures that 5xx errors are treated as transient search
failures, not node failures, preventing nodes from being incorrectly
marked as down.
2. Status Response Parsing:
The logic for parsing status responses from the vector store was
flawed. This has been corrected to ensure proper parsing.
Fixes: SCYLLADB-50
Backport to 2025.4 as this problem is present on this branch.
Closesscylladb/scylladb#27111
* github.com:scylladb/scylladb:
vector_search: Don't mark nodes as down on 5xx server errors
test: vector_search: Move unavailable_server to dedicated file
vector_search: Fix status response parsing
This patch adds a metric for pre-compression size of sstable files.
This patch adds a per-table metric
`scylla_column_family_total_disk_space_before_compression`,
which measures the hypothetical total size of sstables on disk,
if Data.db was replaced with an uncompressed equivalent.
As for the implementation:
Before the patch, tables and sstable sets are already tracking their total physical file size.
Whenever sstables are added or removed, the size delta is propagated from the sstable up through sstable sets into table_stats.
To implement the new metric, we turn the size delta that is getting passed around from a one-dimensional to a two-dimensional value, which includes both the physical and the pre-compression size.
New functionality, no backport needed.
Closesscylladb/scylladb#26996
* github.com:scylladb/scylladb:
replica/table: add a metric for hypothetical total file size without compression
replica/table: keep track of total pre-compression file size
For an `/ann` search request, a 5xx server response does not
indicate that the node is down. It can signify a transient state, such
as the index full scan being in progress.
Previously, treating a 503 error as a node fault would cause the node
to be incorrectly marked as down, for example, when a new index was
being created. This commit ensures that such errors are treated as
transient search failures, not node failures.
In pull request #26960 there was some discussion about what is the valid form of ExclusiveStartKey, and whether we need to allow some "non-standard" uses of it for scan over system tables (which aren't real Alternator tables and may have multiple key columns, something not possible in normal Altenrator tables). This made me realize our tests for what is allowed - and what is not allowed - in ExclusiveStartKey - are very sparse and don't cover all the cases that are possible in Scan and Query, in base tables and in GSIs.
So this small series attempts to increase the coverage of the tests for ExclusiveStartKey to make sure we are compatible with DynamoDB and also that we don't regress in #26960.
The new tests reproduce a previously unknown error-path issues, #26988, where in some cases DynamoDB considers ExclusiveStartKey to be invalid but Alternator erronously accepts. Fortunately, we didn't find any success-path (correctness) bugs.
Closesscylladb/scylladb#26994
* github.com:scylladb/scylladb:
test/alternator: tests for ExclusiveStartKey in GSI
test/alternator: more tests for ExclusiveStartKey in Scan
test/alternator: more tests for ExclusiveStartKey in Query
Increase the value of the max-networking-io-control-blocks option
for the cpp tests as it is too low and causes flakiness
as seen in vector_search.vector_store_client_test.vector_store_client_single_status_check_after_concurrent_failures:
```
seastar/src/core/reactor_backend.cc:342: void seastar::aio_general_context::queue(linux_abi::iocb *): Assertion `last < end` failed.
```
See also https://github.com/scylladb/seastar/issues/976Fixes#27056
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#27117
Correct the loop termination logic that previously caused
certain SSTables to be prematurely excluded, resulting in
lost mutations. This change ensures all relevant SSTables
are properly streamed and their mutations preserved.
More efficient than 100 pings.
There was one ping in test which was done "so this shard notices the
clock advance". It's not necessary, since obsering completed SMP
call implies that local shard sees the clock advancement done within in.
Primary issue with the old method is that each update is a separate
cross-shard call, and all later updated queue behind it. If one of the
shards has high latency for such calls, the queue may accumulate and
system will appear unresponsive for mapping changes on non-zero shards.
This happened in the field when one of the shards was overloaded with
sstables and compaction work, which caused frequent stalls which
delayed polling for ~100ms. A queue of 3k address updates
accumulated. This made bootstrap impossible, since nodes couldn't
learn about the IP mapping for the bootstrapping node and streaming
failed.
To protect against that, use a more efficient method of replication
which requires a single cross-shard call to replicate all prior
updates.
It is also more reliable, if replication fails transiently for some
reason, we don't give up and fail all later updates.
Fixes#26865Fixes#26835
Key goals:
- efficient (batching updates)
- reliable (no lost updates)
Will be used in data structures maintained on one designed owning
shard and replicated to other shards.
Somehow, the line of code responsible for freeing flushed nodes
in `trie_writer` is missing from the implementation.
This effectively means that `trie_writer` keeps the whole index in
memory until the index writer is closed, which for many dataset
is a guaranteed OOM.
Fix that, and add some test that catches this.
Fixesscylladb/scylladb#27082Closesscylladb/scylladb#27083
The response was incorrectly parsed as a plain string and compared
directly with C++ string. However, the body contains a JSON string,
which includes escaped quotes that caused comparison failures.
In the previous patch we noticed that although recently (commit adf9c42,
Refs #26610) we changed the default sstable compressor from LZ4Compressor
to LZ4WithDictsCompressor, this change was only applied to CQL, not to
Alternator.
In this patch we add tests that demonstrate that it's even worse - the
new compression only applies to CQL's *base* table - all the "auxiliary"
tables -
* Materialized views
* Secondary index's materialized views
* CDC log tables
all still have the old LZ4Compressor, different from the base table's
default compressor.
The new test fails on Scylla, reproducing #26914, and passes on
Cassandra (on Cassandra, we only compare the materialized view table,
because SI and CDC is implemented differently).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch introduces a new test that exposed a previously unknown bug,
Refs #26914:
Recently we saw a lot of patches that change how we create new schemas
(keyspaces and tables), sometimes changing various long-time defaults.
We started to worry that perhaps some of these defaults were applied only
to CQL and not to Alternator. For example, in Refs #26307 we wondered if
perhaps the default "speculative_retry" option is different in Alternator
than in CQL.
This patch includes a new test file test/alternator/test_cql_schema.py,
with tests for verifying how Alternator configures the underlying tables
it creates. This test shows that the "speculative_retry" doesn't have
this suspected bug - it defaults to "99.0PERCENTILE" in both CQL and
Alternator. But unfortunately, we do have this bug with the "compression"
option:
It turns out that recently (commit adf9c42, Refs #26610) we changed the
default sstable compressor from LZ4Compressor to LZ4WithDictsCompressor,
but the change was only applied to CQL, not Alternator. So the test that
"compression" is the same in both fails - and marked "xfails" and
I created a new issue to track it - #26914.
Another test verifies that Alternators "auxiliary" tables - holding
GSIs, LSIs and Streams - have the same default properties as the base
table. This currently seems to hold (there is no bug).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Not waiting for nodes to see each other as alive can cause the driver to
fail the request sent in `wait_for_upgrade_state()`.
scylladb/scylladb#19771 has already replaced concurrent restarts with
`ManagerClient.rolling_restart()`, but it has missed this single place,
probably because we do concurrent starts here.
Fixes#27055Closesscylladb/scylladb#27075
Introduce a test that demonstrates mutation loss caused by premature
loop termination in tablet_sstable_streamer::stream. The code broke
out of the SSTable iteration when encountering a non-overlapping range,
which skipped subsequent SSTables that should have been partially
contained. This test showcases the problem only.
Example:
Tablet range: [4, 5]
SSTable ranges:
[0,5]
[0, 3] <--- is considered exhausted, and causes skip to next tablet
[2, 5] <--- is missed for range [4, 5]
This reverts commit 43738298be.
This commit causes instability in dtests. Several non-gating dtests
started failing, as well as some gating ones, see #27047.
Closesscylladb/scylladb#27067Fixes#27047
This series allows an operator to reset 'cleanup needed' flag if he already cleaned up the node, so that automatic cleanup will not do it again. We also change 'nodetool cleanup' back to run cleanup on one node only (and reset 'cleanup needed' flag in the end), but the new '--global' option allows to run cleanup on all nodes that needed it simultaneously.
Fixes https://github.com/scylladb/scylladb/issues/26866
Backport to all supported version since automatic cleanup behaviour as it is now may create unexpected by the operator load during cluster resizing.
Closesscylladb/scylladb#26868
* https://github.com/scylladb/scylladb:
cleanup: introduce "nodetool cluster cleanup" command to run cleanup on all dirty nodes in the cluster
cleanup: Add RESTful API to allow reset cleanup needed flag
…played
Currently, if flushing hints falls within the repair cache timeout, then the flush_time is set to batchlog_manager::_last_replay. _last_replay is updated on each replay, even if some batches weren't replayed. Due to that, we risk the data resurrection.
Update _last_replay only if all batches were replayed.
Fixes: https://github.com/scylladb/scylladb/issues/24415.
Needs backport to all live versions.
Closesscylladb/scylladb#26793
* github.com:scylladb/scylladb:
test: extend test_batchlog_replay_failure_during_repair
db: batchlog_manager: update _last_replay only if all batches were replayed
After in the previous patches we added more exhaustive testing for
the ExclusiveStartKey feature of Query and Scan, in this patch we
add tests for this feature in the context of GSIs.
Most interestingly, the ExclusiveStartKey when querying a GSI
isn't just the key of the GSI, but also includes the key columns
of the base - in other words, it is the key that Scylla uses for
its materialized view.
The tests here confirm that paging on GSI works - this paging uses
ExclusiveStartKey of course - but also what is the specific structure
and meaning of the content of ExclusiveStartKey.
We also include two xfailing tests which again, like in the previous
patches, show we don't do enough validation (issue #26988) and
don't recognize wrong values or spurious columns in ExclusiveStartKey.
As usual, all new tests pass on DynamoDB, and all except the xfailing
ones pass on Alternator.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In the previous patch we added more tests for ExclusiveStartKey
in the context of the "Query" request. Here we do a similar thing
for "Scan". There are fewer error cases for Scan. In particular,
while it didn't make sense to use ExclusiveStartKey on a Query on a
table without a sort key (since a Query there always returns a single
item), for Scan it's needed - for paging. So we add in this patch
a test (that we didn't have before!) that Scan paging works correctly
also in the case of a table without a sort key.
This patch has one xfailing test reproducing #26988, that we don't
recognize and refuse spurious columns (columns not in the key) in
ExclusiveStartKey.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We already have in test/alternator/test_query.py a test -
test_query_exclusivestartkey - for one successful uses of
ExclusiveStartKey. But we missed testing quite a few edge cases of
this parameter, so this patch adds more tests for it - see the
comments on each individual test explaining its purpose.
With the new tests, we actually identified three cases where we got
the error handling wrong - cases of ExclusiveStartKey which DynamoDB
refuses, but Alternator allows. So three of the tests included here
pass on DynamoDB but fail on Alternator, so are marked with "xfail".
Refs #26988 - which is a new issue about these three cases of missing
validation.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Previously, `wait_until_driver_service_level_created` only waited for
the `driver` service level to appear in the output of
`LIST ALL SERVICE_LEVELS`. However, the fact that one node lists
`sl:driver` does not necessarily mean that all other nodes can see
it yet. This caused sporadic test failures, especially in DEBUG builds.
To prevent these failures, this change adds an extra wait for
a `raft/read_barrier` after the `driver` service level first appears.
This ensures the service level is globally visible across the cluster.
Fixes: scylladb/scylladb#27019
Pass a ManagerClient instead of a `cql` session to
`wait_until_driver_service_level_created`. This makes it easier
to add additional functionality to the helper later (e.g. waiting for
a Raft read barrier in a subsequent commit).
Refs: scylladb/scylladb#27019
This patch adds support for multiple audit log outputs.
If only one audit log output is enabled, the behavior does not change.
If multiple audit log outputs are enabled, then the `audit_composite_storage_helper` class is used. It has a collection
of `storage_helper` objects.
Performance testing shows that read query throughput and auth request throughput are consistent even at high reactor utilization. It can also be observed that read query latency increases a bit.
Read query ops = 60k/s
AUTH ops = 200/s
| Audit Mode | QUERY latency (p99) | Δ% vs none |
|------------|---------------------|------------|
| none | 777 | 0 |
|table| 801 | +3.09% |
|syslog | 803 | +3.35% |
|table,syslog | 818 | +5.28% |
Read query ops = 50k/s
AUTH ops = 200/s
| Audit Mode | QUERY latency (p99) | Δ% vs none |
|------------|---------------------|------------|
| none | 643 | 0 |
|table| 647 | +0.62% |
|syslog | 648 | +0.78% |
|table,syslog | 656 | +2.02% |
Detailed performance results are in the following Confluence document: [Audit performance impact test](https://scylladb.atlassian.net/wiki/spaces/RND/pages/148308005/Audit+performance+impact+test)
Fixes#26022
Backport:
The decision is to not backport for now. After making sure it works on the latest release, and if there is a need, we can do it.
Closesscylladb/scylladb#26613
* github.com:scylladb/scylladb:
test: dtest: audit_test.py: add AuditBackendComposite
test: dtest: audit_test.py: group logs in dict per audit mode
audit: write out to both table and syslog
audit: move storage helper creation from `audit::start` to `audit::audit`
audit: fix formatting in `audit::start_audit`
audit: unify `create_audit` and `start_audit`
97ab3f6622 changed "nodetool cleanup" (without arguments) to run
cleanup on all dirty nodes in the cluster. This was somewhat unexpected,
so this patch changes it back to run cleanup on the target node only (and
reset "cleanup needed" flag afterwards) and it adds "nodetool cluster
cleanup" command that runs the cleanup on all dirty nodes in the
cluster.
A Vector Store node is now considered down if it returns an HTTP 500
server error. This can happen, for example, if the node fails to
connect to the database or has not completed its initial full scan.
The logic for marking a node as 'up' is also enhanced. A node is now
only considered up when its status is explicitly 'SERVING'.
Fixes: VECTOR-187
Backport to 2025.4 as this feature is expected to be available in 2025.4.
Closesscylladb/scylladb#26413
* github.com:scylladb/scylladb:
vector_search: Improve vector-store health checking
vector_search: Move response_content_to_sstring to utils.hh
vector_search: Add unit tests for client error handling
vector_search: Enable mocking of status requests
vector_search: Extract abort_source_timeout and repeat_until
vector_search: Move vs_mock_server to dedicated files
When we enable CDC on a table, Scylla creates a log table for it.
It has default properties, but the user may change them later on.
Furthermore, it's possible to detach that log table by simply
disabling CDC on the base table:
```cql
/* Create a table with CDC enabled. The log table is created. */
CREATE TABLE ks.t (pk int PRIMARY KEY) WITH cdc = {'enabled': true};
/* Detach the log table. */
ALTER TABLE ks.t WITH cdc = {'enabled': false};
/* Modify a property of the log table. */
ALTER TABLE ks.t_scylla_cdc_log WITH bloom_filter_fp_chance = 0.13;
```
The log table can also be reattached by enabling CDC on the base table
again:
```cql
/* Reattach the log table */
ALTER TABLE ks.t WITH cdc = {'enabled': true};
```
However, because the process of reattachment goes through the same
code that created it in the first place, the properties of the log
table are rolled back to their default values. This may be confusing
to the user and, if unnoticed, also have other consequences, e.g.
affecting performance.
To prevent that, we ensure that the properties are preserved.
A reproducer test,
`test_log_table_preserves_properties_after_reattachment`, has been
provided to verify that the changes are correct. It fails before this
commit.
Another test, `test_log_table_preserves_id_after_reattachment`, has
also been added because the current implementation sets properties
and the ID separately.
Fixesscylladb/scylladb#25523