Audit tests are vulnerable to noise from LOGIN queries (because AUTH
audit logs can appear at any time). Most tests already use the
`filter_out_noise` mechanism to remove this noise, but tests
focused on AUTH verification did not, leading to sporadic failures.
This change adds a filter to ignore AUTH logs generated by the default
"cassandra" user, so tests only verify logs from the user created
specifically for each test.
Fixes: scylladb/scylladb#25069
(cherry picked from commit aef6474537)
This is a refactoring commit that changes the names of the parameters
of the `filter_out_noise` function, as well as names of related
variables. The motiviation for the change is introduction of more
complex filtering logic in next commit of this patch series.
Refs: scylladb/scylladb#25069
(cherry picked from commit daf1c58e21)
Add `make_data_or_index_source` to the storages to utilize new S3 based data source which should improve restore performance
* Introduce the `encrypted_data_source` class that wraps an existing data source to read and decrypt data on the fly using block encryption. Also add unit tests to verify correct decryption behavior.
* Add `make_data_or_index_source` to the `storage` interface, implement it for `filesystem_storage` storage which just creates `data_source` from a file and for the `s3_storage` create a (maybe) decrypting source from s3 make_download_source. This change should solve performance improvement for reading large objects from S3 and should not affect anything for the `filesystem_storage`
Fixes: https://github.com/scylladb/scylladb/issues/22458
- (cherry picked from commit 211daeaa40)
- (cherry picked from commit 7e5e3c5569)
- (cherry picked from commit 0de61f56a2)
- (cherry picked from commit 8ac2978239)
- (cherry picked from commit dff9a229a7)
- (cherry picked from commit 8d49bb8af2)
Parent PR: #23695Closesscylladb/scylladb#25016
* github.com:scylladb/scylladb:
sstables: Start using `make_data_or_index_source` in `sstable`
sstables: refactor readers and sources to use coroutines
sstables: coroutinize futurized readers
sstables: add `make_data_or_index_source` to the `storage`
encryption: refactor key retrieval
encryption: add `encrypted_data_source` class
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.
In this commit, we're restricting those operations. We also provide two
validation tests.
One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.
Fixes scylladb/scylladb#24643
Backport: we should backport the change to all affected
branches to prevent the consequences that may affect the user.
- (cherry picked from commit 20d0050f4e)
- (cherry picked from commit 59800b1d66)
Parent PR: #25008Closesscylladb/scylladb#25108
* github.com:scylladb/scylladb:
cdc: Forbid altering columns of inactive CDC log table
cdc: Forbid altering columns of CDC log tables directly
When CDC becomes disabled on the base table, the CDC log table
still exsits (cf. scylladb/scylladb@adda43edc7).
If it continues to exist up to the point when CDC is re-enabled
on the base table, no new log table will be created -- instead,
the old olg table will be *re-attached*.
Since we want to avoid situations when the definition of the log
table has become misaligned with the definition of the base table
due to actions of the user, we forbid modifying the set of columns
or renaming them in CDC log tables, even when they're inactive.
Validation tests are provided.
(cherry picked from commit 59800b1d66)
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.
In this commit, we're restricting those operations. We also provide two
validation tests.
One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.
Fixesscylladb/scylladb#24643
(cherry picked from commit 20d0050f4e)
in the CDC log transformer, when creating a CDC mutation based on some
base table mutation, for each value of a base column we set the value in
the CDC column with the same name.
When looking up the column in the CDC schema by name, we may get a null
pointer if a column by that name is not found. This shouldn't happen
normally because the base schema and CDC schema should be compatible,
and for each base column there should be a CDC column with the same
name.
However, there are scenarios where the base schema and CDC schema are
incompatible for a short period of time when they are being altered.
When a base column is being added or dropped, we could get a base
mutation with this column set, and then the CDC transformer picks up the
latest CDC schema which doesn't have this column.
If such thing happens, we fix the code to throw an exception instead of
crashing on null pointer dereference. Currently we don't have a safer
approach to handle this, but this might be changed in the future. The
other alternative is dropping that data silently which we prefer not to
do.
Throwing an error is acceptable because this scenario most likely
indicates this behavior by the user:
* The user adds a new column, and start writing values to the column
before the ALTER is complete. or,
* The user drops a column, and continues writing values to the column
while it's being dropped.
Both cases might as well fail with an error because the column is not
found in the base table.
Fixes scylladb/scylladb#/24952
backport needed - simple fix for a node crash
- (cherry picked from commit b336f282ae)
- (cherry picked from commit 86dfa6324f)
Parent PR: #24986Closesscylladb/scylladb#25067
* github.com:scylladb/scylladb:
test: cdc: add test_cdc_with_alter
cdc: throw error if column doesn't exist
in the CDC log transformer, when creating a CDC mutation based on some
base table mutation, for each value of a base column we set the value in
the CDC column with the same name.
When looking up the column in the CDC schema by name, we may get a null
pointer if a column by that name is not found. This shouldn't happen
normally because the base schema and CDC schema should be compatible,
and for each base column there should be a CDC column with the same
name.
However, there are scenarios where the base schema and CDC schema are
incompatible for a short period of time when they are being altered.
When a base column is being added or dropped, we could get a base
mutation with this column set, and then the CDC transformer picks up the
latest CDC schema which doesn't have this column.
If such thing happens, we fix the code to throw an exception instead of
crashing on null pointer dereference. Currently we don't have a safer
approach to handle this, but this might be changed in the future. The
other alternative is dropping that data silently which we prefer not to
do.
Throwing an error is acceptable because this scenario most likely
indicates this behavior by the user:
* The user adds a new column, and start writing values to the column
before the ALTER is complete. or,
* The user drops a column, and continues writing values to the column
while it's being dropped.
Both cases might as well fail with an error because the column is not
found in the base table.
Fixesscylladb/scylladb#24952
(cherry picked from commit b336f282ae)
The functions password_authenticator::start and
standard_role_manager::start have a similar structure: they spawn a
fiber which invokes a callback that performs some migration until that
migration succeeds. Both handlers set a shared promise called
_superuser_created_promise (those are actually two promises, one for the
password authenticator and the other for the role manager).
The handlers are similar in both cases. They check if auth is in legacy
mode, and behave differently depending on that. If in legacy mode, the
promise is set (if it was not set before), and some legacy migration
actions follow. In auth-on-raft mode, the superuser is attempted to be
created, and if it succeeds then the promise is _unconditionally_ set.
While it makes sense at a glance to set the promise unconditionally,
there is a non-obvious corner case during upgrade to topology on raft.
During the upgrade, auth switches from the legacy mode to auth on raft
mode. Thus, if the callback didn't succeed in legacy mode and then tries
to run in auth-on-raft mode and succeds, it will unconditionally set a
promise that was already set - this is a bug and triggers an assertion
in seastar.
Fix the issue by surrounding the `shared_promise::set_value` call with
an `if` - like it is already done for the legacy case.
Fixes: scylladb/scylladb#24975Closesscylladb/scylladb#24976
(cherry picked from commit a14b7f71fe)
Closesscylladb/scylladb#25019
Convert all necessary methods to be awaitable. Start using `make_data_or_index_source`
when creating data_source for data and index components.
For proper working of compressed/checksummed input streams, start passing
stream creator functors to `make_(checksummed/compressed)_file_(k_l/m)_format_input_stream`.
(cherry picked from commit 8d49bb8af2)
Refactor readers and sources to support coroutine usage in
preparation for integration with `make_data_or_index_source`.
Move coroutine-based member initialization out of constructors
where applicable, and defer initialization until first use.
(cherry picked from commit dff9a229a7)
Add `make_data_or_index_source` to the `storage` interface, implement it
for `filesystem_storage` storage which just creates `data_source` from a
file and for the `s3_storage` create a (maybe) decrypting source from s3
make_download_source.
This change should solve performance improvement for reading large objects
from S3 and should not affect anything for the `filesystem_storage`.
(cherry picked from commit 0de61f56a2)
Introduce the `encrypted_data_source` class that wraps an existing data
source to read and decrypt data on the fly using block encryption. Also add
unit tests to verify correct decryption behavior.
NOTE: The wrapped source MUST read from offset 0, `encrypted_data_source` assumes it is
Co-authored-by: Calle Wilund <calle@scylladb.com>
(cherry picked from commit 211daeaa40)
The test could fail with RF={DC1: 2, DC2: 0} and CL=ONE when:
- both writes succeeded with the same replica responding first,
- one of the following reads succeeded with the other replica
responding before it applied mutations from any of the writes.
We fix the test by not expecting reads with CL=ONE to return a row.
We also harden the test by inserting different rows for every pair
(CL, coordinator), where one of the two coordinators is a normal
node from DC1, and the other one is a zero-token node from DC2.
This change makes sure that, for example, every write really
inserts a row.
Fixesscylladb/scylladb#22967
The fix addresses CI flakiness and only changes the test, so it
should be backported.
Closesscylladb/scylladb#23518
(cherry picked from commit 21edec1ace)
Closesscylladb/scylladb#24985
- Fix missing negation in the `if` in the background downloading fiber
- Add test to catch this case
- Improve the s3 proxy to inject errors if the same resource requested more than once
- Suppress client retry since retrying the same request when each produces multiple buffers may lead to the same data appear more than once in the buffer deque
- Inject exception from the test to simulate response callback failure in the middle
No need to backport anything since this class in not used yet
- (cherry picked from commit f1d0690194)
- (cherry picked from commit e73b83e039)
- (cherry picked from commit 6d9cec558a)
- (cherry picked from commit ec59fcd5e4)
- (cherry picked from commit c75acd274c)
- (cherry picked from commit d2d69cbc8c)
- (cherry picked from commit e50f247bf1)
- (cherry picked from commit 49e8c14a86)
- (cherry picked from commit a5246bbe53)
- (cherry picked from commit acf15eba8e)
Parent PR: #24657Closesscylladb/scylladb#24943
* github.com:scylladb/scylladb:
s3_test: Add s3_client test for non-retryable error handling
s3_test: Add trace logging for default_retry_strategy
s3_client: Fix edge case when the range is exhausted
s3_client: Fix indentation in try..catch block
s3_client: Stop retries in chunked download source
s3_client: Enhance test coverage for retry logic
s3_client: Add test for Content-Range fix
s3_client: Fix missing negation
s3_client: Refine logging
s3_client: Improve logging placement for current_range output
When a tablet transitions to a post-cleanup stage on the leaving replica
we deallocate its storage group. Before the storage can be deallocated
and destroyed, we must make sure it's cleaned up and stopped properly.
Normally this happens during the tablet cleanup stage, when
table::cleanup_table is called, so by the time we transition to the next
stage the storage group is already stopped.
However, it's possible that tablet cleanup did not run in some scenario:
1. The topology coordinator runs tablet cleanup on the leaving replica.
2. The leaving replica is restarted.
3. When the leaving replica starts, still in `cleanup` stage, it
allocates a storage group for the tablet.
4. The topology coordinator moves to the next stage.
5. The leaving replica deallocates the storage group, but it was not
stopped.
To address this scenario, we always stop the storage group when
deallocating it. Usually it will be already stopped and complete
immediately, and otherwise it will be stopped in the background.
Fixesscylladb/scylladb#24857Fixesscylladb/scylladb#24828Closesscylladb/scylladb#24896
(cherry picked from commit fa24fd7cc3)
Closesscylladb/scylladb#24909
If small_table_optimization is on, a repair works on a whole table
simultaneously. It may be distributed across the whole cluster and
all nodes might participate in repair.
On a repair master, row buffer is copied for each repair peer.
This means that the memory scales with the number of peers.
In large clusters, repair with small_table_optimization leads to OOM.
Divide the max_row_buf_size by the number of repair peers if
small_table_optimization is on.
Use max_row_buf_size to calculate number of units taken from mem_sem.
Fixes: https://github.com/scylladb/scylladb/issues/22244.
Closesscylladb/scylladb#24868
(cherry picked from commit 17272c2f3b)
Closesscylladb/scylladb#24907
The test has two major problems
1. Wrongly computed time windows. Data was not spread across two 1-minute
windows causing the test to generate even three sstables instead
of two
2. Timestamp was not propagated to the prepared CQL statements. So
in fact, a current time was used implicitly
3. Because of the incorrect timestamp issue, the remaining tests
testing purged tombstones were affected as well.
Fixes https://github.com/scylladb/scylladb/issues/24532Closesscylladb/scylladb#24609
(cherry picked from commit a22d1034af)
Closesscylladb/scylladb#24791
ScyllaDB container image doesn't have ps command installed, while this command is used by perftune.py script shipped within the same image. This breaks node and container tuning in Scylla Operator.
Fixes: #24827Closesscylladb/scylladb#24830
(cherry picked from commit 66ff6ab6f9)
Closesscylladb/scylladb#24956
Introduce a test that injects a non-retryable error and verifies
that the chunked download source throws an exception as expected.
(cherry picked from commit acf15eba8e)
Introduce trace-level logging for `default_retry_strategy` in
`s3_test` to improve visibility into retry logic during test
execution.
(cherry picked from commit a5246bbe53)
Handle case where the download loop exits after consuming all data,
but before receiving an empty buffer signaling EOF. Without this, the
next request is sent with a non-zero offset and zero length, resulting
in "Range request cannot be satisfied" errors. Now, an empty buffer is
pushed to indicate completion and exit the fiber properly.
(cherry picked from commit 49e8c14a86)
Disable retries for S3 requests in the chunked download source to
prevent duplicate chunks from corrupting the buffer queue. The
response handler now throws an exception to bypass the retry
strategy, allowing the next range to be attempted cleanly.
This exception is only triggered for retryable errors; unretryable
ones immediately halt further requests.
(cherry picked from commit d2d69cbc8c)
Extend the S3 proxy to support error injection when the client
makes multiple requests to the same resource—useful for testing
retry behavior and failure handling.
(cherry picked from commit c75acd274c)
Introduce a test that accurately verifies the Content-Range
behavior, ensuring the previous fix is properly validated.
(cherry picked from commit ec59fcd5e4)
Relocated logging to occur after determining the `current_range`,
ensuring more relevant output during S3 client operations.
(cherry picked from commit f1d0690194)
This reverts commit 04fb2c026d. 2025.3 got
the reduced threshold, but won't get many of the fixes the warning will
generate, leaving it very noisy. Better to avoid the noise for this release.
Fixes#24384.
make_repair_plan() allocates a temporary vector which can grow larger
than our 128k basic allocation unit. Use a chunked vector to avoid
stalls due to large allocations.
Fixes#24713.
Closesscylladb/scylladb#24801
(cherry picked from commit 0138afa63b)
Closesscylladb/scylladb#24902
The series adds more logging and provides new REST api around topology command rpc execution to allow easier debugging of stuck topology operations.
Backport since we want to have in the production as quick as possible.
Fixes#24860
- (cherry picked from commit c8ce9d1c60)
- (cherry picked from commit 4e6369f35b)
Parent PR: #24799Closesscylladb/scylladb#24881
* https://github.com/scylladb/scylladb:
topology coordinator: log a start and an end of topology coordinator command execution at info level
topology coordinator: add REST endpoint to query the status of ongoing topology cmd rpc
When replaying a failed batch and sending the mutation to all replicas, make the write response handler cancellable and abort it on shutdown or if some target is marked down. also set a reasonable timeout so it gets aborted if it's stuck for some other unexpected reason.
Previously, the write response handler is not cancellable and has no timeout. This can cause a scenario where some write operation by the batchlog manager is stuck indefinitely, and node shutdown gets stuck as well because it waits for the batchlog manager to complete, without aborting the operation.
backport to relevant versions since the issue can cause node shutdown to hang
Fixes scylladb/scylladb#24599
- (cherry picked from commit 8d48b27062)
- (cherry picked from commit fc5ba4a1ea)
- (cherry picked from commit 7150632cf2)
- (cherry picked from commit 74a3fa9671)
- (cherry picked from commit a9b476e057)
- (cherry picked from commit d7af26a437)
Parent PR: #24595Closesscylladb/scylladb#24882
* github.com:scylladb/scylladb:
test: test_batchlog_manager: batchlog replay includes cdc
test: test_batchlog_manager: test batch replay when a node is down
batchlog_manager: set timeout on writes
batchlog_manager: abort writes on shutdown
batchlog_manager: create cancellable write response handler
storage_proxy: add write type parameter to mutate_internal
Add a new test that verifies that when replaying batch mutations from
the batchlog, the mutations include cdc augmentation if needed.
This is done in order to verify that it works currently as expected and
doesn't break in the future.
(cherry picked from commit d7af26a437)
Add a test of the batchlog manager replay loop applying failed batches
while some replica is down.
The test reproduces an issue where the batchlog manager tries to replay
a failed batch, doesn't get a response from some replica, and becomes
stuck.
It verifies that the batchlog manager can eventually recover from this
situation and continue applying failed batches.
(cherry picked from commit a9b476e057)
Set a timeout on writes of replayed batches by the batchlog manager.
We want to avoid having infinite timeout for the writes in case it gets
stuck for some unexpected reason.
The timeout is set to be high enough to allow any reasonable write to
complete.
(cherry picked from commit 74a3fa9671)
On shutdown of batchlog manager, abort all writes of replayed batches
by the batchlog manager.
To achieve this we set the appropriate write_type to BATCH, and on
shutdown cancel all write handlers with this type.
(cherry picked from commit 7150632cf2)
When replaying a batch mutation from the batchlog manager and sending it
to all replicas, create the write response handler as cancellable.
To achieve this we define a new wrapper type for batchlog mutations -
batchlog_replay_mutation, and this allows us to overload
create_write_response_handler for this type. This is similar to how it's
done with hint_wrapper and read_repair_mutation.
(cherry picked from commit fc5ba4a1ea)
Currently mutate_internal has a boolean parameter `counter_write` that
indicates whether the write is of counter type or not.
We replace it with a more general parameter that allows to indicate the
write type.
It is compatible with the previous behavior - for a counter write, the
type COUNTER is passed, and otherwise a default value will be used
as before.
(cherry picked from commit 8d48b27062)