In issue #5320 we noticed that when we have a GSI with a hash key only
(no range key) but the implementation's MV needs to add a clustering
key for the original base key columns, the output of DescribeTable
wrongly lists that exta "range key" - which isn't a real range key of
the GSI.
It turns out that the fact that the extra attribute is not a real GSI
range key has another implication: It should not be allowed in
KeyConditions or KeyConditionExpression - which should allow only real
key columns of the GSI.
This patch adds two reproducing tests for this issue (issue #2601),
both pass on DynamoDB but xfail on Alternator.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
As noticed in issue #26079 the Alternator test test_gsi.py::test_17119a
fails on DynamoDB. The problem was that the test added to KeyConditions
reading from a GSI an unnecessary attribute - one which was accidentally
allowed by Alternator (Refs #26103) but not allowed by DynamoDB.
This is easy to fix - just remove the unnecessary attribute from
KeyConditions, and the test still works properly and passes on both
DynamoDB and Alternator.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
As noticed in issue #26079, the Alternator test
test_number.py::test_invalid_numbers failed on DynamoDB, because one of
the things it did, as a "sanity check", was to check that the number
0e1000 was a valid number. But it turns out it isn't allowed by
DynamoDB.
So this patch removes 0e1000 from the list of *valid* numbers in
test_invalid_numbers, and instead creates a whole new test for the
case of 0e1000.
It turns out that DynamoDB has a bug (it appears to be a regression,
because test_invalid_numbers used to pass on DynamoDB!) where it
allows 0.0e1000 (since it's just zero, really!) but forbids 0e1000
which is incorrectly considered to have a too-large magnitude.
So we introduce a test that confirms that Alternator correctly allows
both 0.0e1000 and 0e1000. DynamoDB fails this test (it allows the
first, forbidding the second), making it the first Alternator test
tagged as a "dynamodb_bug".
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Load balancer aims to preserve a balance in rack loads when generating
tablet migrations. However, this balance might get broken when dead nodes
are present. Currently, these nodes aren't include in rack load calculations,
even if they own tablet replicas. As a result, load balancer treats racks
with dead nodes as racks with a lower load, so I generates migrations to these
racks.
This is incorrect, because a dead node might come back alive, which would result
in having multiple tablet replicas on the same rack. It's also inefficient
even if we know that the node won't come back - when it's being replaced or removed.
In that case we know we are going to rebuild the lost tablet replicas
so migrating tablets to this rack just doubles the work. Allowing such migrations
to happen would also require adjustments in the materialized view pairing code
because we'd temporarily allow having multiple tablet replicas on the same rack.
So in this patch we include dead nodes when calculating rack loads in the load
balancer. The dead nodes still aren't treated as potential migration sources or
destinations.
We also add a test which verifies that no migrations are performed by doing a node
replace with a mv workload in parallel. Before the patch, we'd get pairing errors
and after the patch, no pairing errors are detected.
Fixes https://github.com/scylladb/scylladb/issues/24485Closesscylladb/scylladb#26028
tools/scylla-sstable.cc has 3.5k SLOC, out of which this class alone is 1K. Extract into own hh and cc. Since this class was already using pimpl, the header remains nice and small.
Code cleanup, no backport needed.
Closesscylladb/scylladb#26064
* github.com:scylladb/scylladb:
tools: extract json_mtuation_stream_parser to its own hh,cc files
tools/scylla-sstable: fix indentation
tools/scylla-sstable: prepare for extracting json_mutation_stream_parser
Currently, in repair_tablet we retrieve session_id from tablet
map (and throw if it isn't specified). In case of topology
coordinator failover, we may end up in a situation where a node
runs outdated repair, treating session of a different operation
as the repair's session:
- topology coordinator starts repair transition (A);
- topology coordinator sends tablet repair rpc to node1;
- topology coordinator is separated from the cluster;
- new topology coordinator is elected;
- new topology coordinator sees waiting repair request (A_2)
and executes it;
- new repair of the same tablet is requested (B);
- new topology coordinator starts repair transition (B);
- new topology coordinator sends tablet repair rpc to node2;
- node2 starts repair (B) as repair master;
- node1 starts repair (A), checks the current session (B), proceeds
with repair (B) as repair master.
Send current session_id in repair_tablet rpc. If this session_id
and session id got from tablet map don't match, an exception
is thrown.
Fixes: https://github.com/scylladb/scylladb/issues/23318.
No backport; changes in rpc signatures
Closesscylladb/scylladb#25369
* github.com:scylladb/scylladb:
test: check that repair with outdated session_id fails
service: pass current session_id to repair rpc
Currently, during cache invaldation we check if we need to preempt
only after the partition gets invaldaited. This may lead to stalls
if we have a chain of filtered out partitions.
Check for preemption even if the partition does not get invaldated.
Refs: https://github.com/scylladb/scylladb/issues/9136.
Optimization; no backport
Closesscylladb/scylladb#26053
* github.com:scylladb/scylladb:
db: fix indentation
db: cache: consider preempting after each partition
This PR introduces a major rewrite of the EaR document. The initial motivation for this PR was to fully cover all our supported key providers with working examples, and to add instructions for key rotation. However, many other improvements were made along the way.
Main changes in this PR:
* Add a high-level description for every key provider. Mention limitations.
* Better organize existing provider-specific instructions by placing them into clearly separated, tabbed sections.
* Add instructions for the replicated key provider. Mention explicitly that it cannot be used as default option for user or system encryption, and that it does not support key rotation.
* Add more examples for KMS and GCP to cover all credential types.
* Document missing configuration options.
* Add a new section for key rotation.
Notes:
* Some of the patches in this series have been cherry-picked from Laszlo's wip branch.
* This PR is expected to conflict with the Azure Key Vault PR, which should be merged first. (https://github.com/scylladb/scylladb/pull/23920/)
* Support for KMIP system keys in the Replicated Key Provider is currently broken. (https://github.com/scylladb/scylladb/issues/24443)
Fixesscylladb/scylla-enterprise#3535.
Refs scylladb/scylla-enterprise#3183.
Only doc changes. No backport is needed.
Closesscylladb/scylladb#24558
* github.com:scylladb/scylladb:
encryption-at-rest.rst: add "Rotate Encryption Keys" section
encryption-at-rest.rst: rewrite "Encrypt System Resources" section
encryption-at-rest.rst: rewrite "Update Encryption Properties of Existing Tables" section
encryption-at-rest.rst: rewrite "Encrypt a Single Table" section
encryption-at-rest.rst: rewrite "Encrypt Tables" section
encryption-at-rest.rst: update "Set the Azure Host" section
encryption-at-rest.rst: update "Set the GCP Host" section
encryption-at-rest.rst: update "Set the KMS Host" section
encryption-at-rest.rst: update "Set the KMIP Host" section
encryption-at-rest.rst: rewrite "Create Encryption Keys" section
encryption-at-rest.rst: rewrite "Key Providers" section
encryption-at-rest.rst: hoist and update "Cipher Algorithm Descriptors"
encryption-at-rest.rst: rewrite/replace section "Encryption Key Types"
encryption-at-rest.rst: About: describe high-level operation more precisely
encryption-at-rest.rst: improve wording / formatting in About intro
encryption-at-rest.rst: users (plural) typo fix
encryption-at-rest.rst: rewrap
encryption-at-rest.rst: strip trailing whitespace
The script test/cqpy/run-cassandra aims to make it easy to run
any version of Cassandra using whatever version of Java the user
has installed. Sadly, the fact that Java keeps changing and the
Cassandra developers are very slow to adapt to new Javas makes
doing this non-trivial.
This patch makes it possible for run-cassandra to run Cassandra 5
on the Java 21 that is now the default on Fedora 42. Fedora 42
no longer carries antique version of Java (like Java 8 or 11), not
even as an optional package.
Sadly, even with this patch it is not possible to run older
versions of Cassandra (4 and 3) with Java 21, because the new
Java is missing features such as Netty that the older Cassandra
require. But at least it restores the ability to run our cqlpy
tests against Cassandra 5.
Also, this patch adds to test/cqlpy/README.md simple instructions on
how to install Java 11 (in addition to the system's default Java 21)
on Fedora 42. Doing this is very easy and very recommended because
it restores the ability to run Cassandra 3 and 4, not just Cassandra 5.
Fixes#25822.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#25825
compaction/scrub: register sstables for compaction before validation
When `scrub --validate` runs, it collects all candidate sstables at the
start and validates them one by one in separate compaction tasks.
However, scrub in validate mode does not register these sstables for
compaction, which allows regular compaction to pick them up and
potentially compact them away before validation begins. This leads to
scrub failures because the sstables can no longer be found.
This patch fixes the issue by first disabling compaction, collecting the
sstables, and then registering them for compaction before starting
validation. This ensures that the enqueued sstables remain available for
the entire duration of the scrub validation task.
Fixes#23363
This reported scrub failure occurs on all versions that have the
checksum/digest validation feature for uncompressed sstables.
So, backport it to older versions.
Closesscylladb/scylladb#26034
* github.com:scylladb/scylladb:
compaction/scrub: register sstables for compaction before validation
compaction/scrub: handle exceptions when moving invalid sstables to quarantine
Depends on https://github.com/scylladb/seastar/pull/2651
Missing columns have been present since probably forever -
they were added to the schema but never assigned any value:
```
cqlsh> select * from system.clients;
------------------+------------------------
...
ssl_cipher_suite | null
ssl_enabled | null
ssl_protocol | null
...
```
This patch sets values of these columns:
- with a TLS connection, the 3 TLS-related fields are filled in,
- without TLS, `ssl_enabled` is set to `false` and other columns are
`null`,
- if there's an error while inspecting TLS values, the connection is
dropped.
We want to save the TLS info of a connection just after accepting it,
but without waiting for a TLS handshake to complete, so once the
connection is accepted, we're inspecting it in the background for the
server to be able to accept next connections immediately.
Later, when we construct system.clients virtual table, the previously
saved data can be instantaneously assigned to client_data, which is a
struct representing a row in system.clients table. This way we don't
slow down constructing this table by more than necessary, which is
relevant for cases with plenty of connections.
Fixes: #9216Closesscylladb/scylladb#22961
In test/alternator/test_returnvalues.py we had tests for the
ReturnValues feature on UpdateItem requests - but we only tested
UpdateItem requests with the "modern" UpdateExpression, and forgot to
test the combination of ReturnValues with the old AttributeUpdates API.
It turns out this combination is buggy: when both ReturnValues=ALL_OLD
and AttributeUpdates need the previous value of the item, we may wrongly
std::move() the value out, and the operation will fail with a strange
error:
An error occurred (ValidationException) when calling the UpdateItem
operation: JSON assert failed on condition 'IsObject()'
The fix in this patch is trivial - just move the std::move() to the
correct place, after both UpdateExpression and AttributeUpdates
handling is done.
This patch also includes a reproducing test, which fails before this
patch and passes with it - and of course passes on DynamoDB. This
test reproduces two cases where the bug happened, as well as one
case where it didn't (to make sure we don't regress in what already
worked).
Fixes#25894
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#25900
The view building batch lives on shard0 but it might be doing
work on shard which owns the tablet replica.
Until now the batch data was accessed from multiple shards (shard0 and
where the batch was executed).
This patch fixes this by splitting tasks execution into:
- preparation which is always happening on shard0
- actual execution of the tasks on relevant shard, but all necessary
data is copied to the shard and batch object isn't accessed.
Fixes https://github.com/scylladb/scylladb/issues/25804
View building coordinator hasn't been released yet, so no backport needed.
Closesscylladb/scylladb#26058
* github.com:scylladb/scylladb:
db/view/view_building_worker: move try-catch outside `invoke_on()`
db/view/view_building_worker: split batch's data preparation and execution
expected_total_workload methods of scrub compaction tasks create a vector of table_info based on table names. If any table was already dropped, then the exception is thrown. It leaves table_info in corrupted state and node crashes with `free(): invalid size`.
Return std::nullopt if an exception was thrown to indicate that total workload cannot be found.
Fixes: #25941.
No release branches affected
Closesscylladb/scylladb#25944
* github.com:scylladb/scylladb:
tasks: get progress of failed task based on children
compaction: handle exception in expected_total_workload
This patch adds the possibility to track metrics
per secondary index. Currently, only a histogram
of query latencies is tracked, but more metrics
can be added in the future. To add a new metric,
it needs to be added to the index_metrics struct
in index/secondary_index_manager.hh and then
initialized in index/secondary_index_manager.cc
in the constructor of the index_metrics struct.
The metrics are created when the index is created
and removed when the index is dropped.
First lines of the new metric:
\# HELP scylla_index_query_latencies Index query latencies
\# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1
Fixes: https://github.com/scylladb/scylladb/issues/25970Closesscylladb/scylladb#25995
* github.com:scylladb/scylladb:
test: verify that the index metric is added
index, metrics: add per-index metrics
This patch overrides the antlr3 function that allocates the missing
tokens that would eventually leak. The override stores these tokens in
a vector, ensuring memory is freed whenever the parser is destroyed.
Solution is copied from CQL implementation.
A unit test to reproduce the issue is added - leak would be reported
by ASAN, when running this test in debug mode - the test passed but
the leak is discovered when the test file exits.
Fixes#25878Closesscylladb/scylladb#25930
tools/scylla-sstable.cc has 3.5k SLOC, out of which this class alone is
1K. Extract into own hh and cc, just a copy-paste after the preparation
commit.
Make methods out-of-line, so class declaration stands on its own,
without definition of impl.
Move auxiliary structures, used only by impl, out of the class scope.
Move parser to tools namespace, and auxiliaries to anonymous namespace
within the tools one.
Pass down logger ref to parser impl and below, to prepare for sst_log
not being available in scope.
Add comment to parser class explaining what it does.
As discussed in https://github.com/scylladb/scylladb/pull/24606#discussion_r2281870939
clear_gently of shared pointers should release the wrapped
object reference and when the object's use_count reaches 1,
the object itself would be cleared_gently, before it's destroyed.
This behavior is similar to the way we clear gently containers
like arrays or vectors, and so it is extended in this patch
to smart pointers like unique_ptr and foreign_ptr.
The unit tests are adjusted respectively to expect the
smart pointers to be reset after clear_gently, plus
the use of `reset()` for `foreign_ptr<shared_ptr<>>` was
replaced by `clear_gently().get()` which now ensures the
reference to a shared object is released, and awaited for,
if it happens on a foreign owner shard, unlike reset of
a foreign_ptr that kicks off destroy of that shared object
in the background on the owner shard - causing flakiness.
Fixes#25723
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#25759
The Raft topology workflow was changed by the limited voters feature: nodes no longer request votership themselves. As a result, the "stop_before_becoming_raft_voter" error injection is now obsolete and has been removed.
Fixes: scylladb/scylladb#23418
No backport: This re-enables a test, only needed for master.
Closesscylladb/scylladb#26042
* https://github.com/scylladb/scylladb:
group0: remove obsolete "stop_before_becoming_raft_voter" error injection
test/random_failures: preserve test repeatability when removing error injections
The view building batch lives on shard0 but it might be doing
work on shard which owns the tablet replica.
Until now the batch data was accessed from multiple shards (shard0 and
where the batch was executed).
This patch fixes this by splitting tasks execution into:
- preparation which is always happening on shard0
- actual execution of the tasks on relevant shard, but all necessary
data is copied to the shard and batch object isn't accessed.
Fixesscylladb/scylladb#25804
This PR consists of three parts:
* Small refactoring of the fencing APIs in storage_proxy (renames + comments + some functions were extracted)
* Implement the fencing for LWT verbs itself. This includes checking the fencing token before and after local replica data accesses.
* Two new `test.py` tests in `test_fencing.py`, which check the fencing in some real-world scenarios.
Backport: no need -- fencing for LWT requests is needed primarily for LWT over tablets, which is not released yet.
Fixesscylladb/scylladb#22332Closesscylladb/scylladb#25550
* https://github.com/scylladb/scylladb:
test_tablets_lwt: eliminate redundant disable_tablet_balancing
test_fencing: add test_lwt_fencing_upgrade
pylib: extract upgrade helpers from test_sstable_compression_dictionaries_upgrade.py
test_fencing: add test_fenced_out_on_tablet_migration_while_handling_paxos_verb
test_fencing: test_fence_lwt_during_bootstap
pylib/rest_client.py: encode injection name
storage_proxy_stats: add fenced_out_requests metric
storage_proxy: add fencing to Paxos verbs
storage_proxy::apply_fence: add overload that throws on failure
storage_proxy: extract apply_fence_result
sp::apply_fence: rename to apply_fence_on_ready
sp::apply_fence: rename to check_fence
sp::apply_fence: make non-generic
As requested in #22120, moved the files and fixed other includes and build system.
Moved files:
- query.cc
- query-request.hh
- query-result.hh
- query-result-reader.hh
- query-result-set.cc
- query-result-set.hh
- query-result-writer.hh
- query_id.hh
- query_result_merger.hh
Fixes: #22120
This is a cleanup, no need to backport
Closesscylladb/scylladb#25105
The chunked download source sends large GET requests and then consumes data
as it arrives. Sometimes it can stop reading from socket early and drop the
in-flight data. The existing read-bytes metrics show only the number of
consumed bytes, we we also want to know the number of requested bytes
Refs #25770 (accounting of read-bytes)
Fixes#25876
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#25877
This change disables caching for raft log table due to the following reasons:
* Immediate reason is a deficiency in handling emerging range tombstones in the cache, which causes stalls.
* Long-term reason is that sequential reads from the raft log do not benefit from the cache, making it better to bypass it to free up space and avoid stalls.
Fixesscylladb/scylladb#26027Closesscylladb/scylladb#26031
The compaction_manager::stop_compaction() method internally walks the
list of tasks and compares each task's compacting_table (which is
compaction group view pointer) with the given one. In case this
stop_compaction() method is called via API for a specific table, the
method walks the list of tasks for every compaction group from the
table, thus resulting in nr_groups * nr_tasks complexity. Not terrible,
but not nice either.
The proposal is to pass filtering function into the inner
do_stop_ongoing_compactions() method. Some users will pass a simple
"return true" lambda, but those that need to stop compactions for a
specitif table (e.g. -- the API handler) will effectively walk the
list of tasks once comparing the given compaction group's schema with
the target table one (spoiler: eventually this place will also be
simplified not to mess with replica::table at all).
One ugliness with the change is the way "scope" for logging message is
collected. If all tasks belong to the same table, then "for table ..."
is printed in logs. With the change the scope is no longer known
instantly and is evaluated dynamically while walking the list of tasks.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#25846
Add a helper which prints all prepared statements currently
present in the query processor.
Example output:
```
(gdb) scylla prepared-statements
(cql3::cql_statement*)(0x600003d71050): SELECT * FROM ks.ks WHERE pk = ?
(cql3::cql_statement*)(0x600003972b50): SELECT pk FROM ks.ks WHERE pk = ?
```
Closesscylladb/scylladb#26007
Allow for the following CQL syntax:
```
CREATE KEYSPACE [IF NOT EXISTS] <name>;
```
for example:
```
CREATE KEYSPACE test_keyspace;
```
With this syntax all the keyspace's parameters would be defaulted to:
replication strategy = `NetworkTopologyStrategy`,
replication factor = number of racks , but excluding racks that only have arbiter nodes
storage options, durable writes = defaults we normally would use,
tablets enabled if they are enabled in the db configuration, e.g. scylla.yaml or db/config.cc by default.
Options besides `replication` already have defaults. `replication` had to be specified, but it could be an empty set, where defaults for sub-options (replication strategy and replication factor) would be used - `replication = {}`. Now there is no need for specifying an empty set - omitting `replication = {}` has the same effect as `replication = {}`.
Since all the options now have defaults, `WITH` is optional for `CREATE KEYSPACE` statement.
Fixes#25145
This is an improvement, no backport needed.
Closesscylladb/scylladb#25872
* github.com:scylladb/scylladb:
docs: cql: default create keyspace syntax
test: cqlpy: add test for create keyspace with no options specified
cql: default `CREATE KEYSPACE` syntax
Ensure all direct and global topology commands rethrow the
`raft::request_aborted` exception when aborted, typically due to
leadership changes. This makes abortion explicit to callers, enabling
proper handling such as retries or workflow termination.
This change completes the work started in PR scylladb/scylladb#23962,
covering all remaining cases where the exception was not rethrown.
Fixes: scylladb/scylladb#23589
No backport: No related issues observed in previous versions; backport
not required.
Closesscylladb/scylladb#26021
The Raft topology workflow was changed by the limited voters feature:
nodes no longer request votership themselves. As a result, the
"stop_before_becoming_raft_voter" error injection is now obsolete and
has been removed.
Fixes: scylladb/scylladb#23418
The order of entries in the ERROR_INJECTIONS list determines test
repeatability for a given random seed.
To allow removing error injections without affecting the order of the
remaining ones, removed injections are now renamed with a "REMOVED_"
prefix instead of being deleted.
This ensures they are ignored by the tests, while the sequence of active
injections—and thus test reproducibility—remains unchanged.
This commit adds a test that performs
a sanity check that the implemented metric
is actually being added to Scylla's metrics
and has the correct value.
Currently, for failed tasks task_manager::task::impl::get_progress
attempts to find expected_total_workload. However, if the task has
finished long time ago, the state might have totally changed, e.g.
some tables might have been dropped or have changed their sizes.
Due to that, the result of expected_total_workload might be irrelevant.
Count the progress of a finish task based on children only, regardless
whether the task has succeeded or failed.
Currently, during cache invaldation we check if we need to preempt
only after the partition gets invaldaited. This may lead to stalls
if we have a chain of filtered out partitions.
Check for preemption even if the partition does not get invaldated.
Refs: #9136.
This patch adds the possibility to track metrics
per secondary index. Currently, only a histogram
of query latencies is tracked, but more metrics
can be added in the future. To add a new metric,
it needs to be added to the index_metrics struct
in index/secondary_index_manager.hh and then
initialized in index/secondary_index_manager.cc
in the constructor of the index_metrics struct.
The metrics are created when the index is created
and removed when the index is dropped.
First lines of the new metric:
\# HELP scylla_index_query_latencies Index query latencies
\# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1
When `scrub --validate` runs, it collects all candidate sstables at the
start and validates them one by one in separate compaction tasks.
However, scrub in validate mode does not register these sstables for
compaction, which allows regular compaction to pick them up and
potentially compact them away before validation begins. This leads to
scrub failures because the sstables can no longer be found.
This patch fixes the issue by first disabling compaction, collecting the
sstables, and then registering them for compaction before starting
validation. This ensures that the enqueued sstables remain available for
the entire duration of the scrub validation task.
Fixes#23363
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Sometimes gossiper operations invoked from storage_service and other components run under a non-gossiper scheduling group. If these operations acquire gossiper locks, priority inversion can occur: higher-priority gossiper tasks may wait behind lower-priority tasks (e.g. streaming), which can cause gossiper slowness or even failures.
This patch ensures that gossiper operations requiring locks on gossiper structures are explicitly executed in the gossiper scheduling group. To help detect similar issues in the future, a warning is logged whenever a gossiper lock is acquired under a non-gossiper scheduling group.
Fixesscylladb/scylladb#25907
Refs: scylladb/scylladb#25702
Backport: this patch fixes an issue with gossiper operations scheduling group, that might affect topology operations, therefore backport is needed to 2025.1, 2025.2, 2025.3
Closesscylladb/scylladb#25981
* https://github.com/scylladb/scylladb:
gossiper: ensure gossiper operations are executed in gossiper scheduling group
gossiper: fix wrong gossiper instance used in `force_remove_endpoint`
The `--incremental-mode` option specifies the incremental repair mode.
Can be 'disabled', 'regular', or 'full'.
'regular': The incremental repair logic is enabled. Unrepaired sstables
will be included for repair. Repaired sstables will be skipped. The
incremental repair states will be updated after repair.
'full': The incremental repair logic is enabled. Both repaired and
unrepaired sstables will be included for repair. The incremental repair
states will be updated after repair.
'disabled': The incremental repair logic is disabled completely. The
incremental repair states, e.g., repaired_at in sstables and
sstables_repaired_at in the system.tablets table, will not be updated
after repair.
When the option is not provided, it defaults to regular.
Fixes#25931Closesscylladb/scylladb#25969
In validate mode, scrub moves invalid sstables into the quarantine
folder. If validation fails because the sstable files are missing from
disk, there is nothing to move, and the quarantine step will throw an
exception. Handle such exceptions so scrub can return a proper
compaction_result instead of propagating the exception to the caller.
This will help the testcase for #23363 to reliably determine if the
scrub has failed or not.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Sometimes gossiper operations invoked from storage_service and other components
run under a non-gossiper scheduling group. If these operations acquire gossiper
locks, priority inversion can occur: higher-priority gossiper tasks may wait
behind lower-priority tasks (e.g. streaming), which can cause gossiper slowness
or even failures.
This patch ensures that gossiper operations requiring locks on gossiper
structures are explicitly executed in the gossiper scheduling group.
To help detect similar issues in the future, a warning is logged whenever
a gossiper lock is acquired under a non-gossiper scheduling group.
Fixesscylladb/scylladb#25907
This test verifies that the fencing token is checked on replicas
after the local Paxos state is updated. This ensures that if we failed
to drain an LWT request during topology changes the replicas
where paxos verbs got stuck won't contributed to the target CLs.