`kmip_test_helper()` is a utility function to spawn a dedicated PyKMIP
server for a particular Boost test case. The function runs the server as
an external process and uses a thread to parse the port from the
server's logs. The thread communicates the port to the main thread via
a promise.
The current implementation has a bug where the thread may set a value
to the promise after its destruction, causing a segfault. This happens
when the server does not start within 20 seconds, in which case the port
future throws and the stack unwinding machinery destroys the port
promise before the thread that writes to it.
Fix the bug by declaring the promise before the cleanup action.
The bug has been encountered in CI runs on slow machines, where the
PyKMIP server takes too long to create its internal tables (due to slow
fdatasync calls from SQLite). This patch does not improve CI stability -
it only ensures that the error condition is properly reflected in the
test output.
This patch is not a backport. The same bug has been fixed in master as
part of a larger rewrite of the `kmip_test_helper()` (see 722e2bce96).
Refs #24747, #24842.
Fixes#24574.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closesscylladb/scylladb#25028
Interval map is very susceptible to quadratic space behavior when it's flooded with many entries overlapping all (or most of) intervals, since each such entry will have presence on all intervals it overlaps with.
A trigger we observed was memtable flush storm, which creates many small "L0" sstables that spans roughly the entire token range.
Since we cannot rely on insertion order, solution will be about storing sstables with such wide ranges in a vector (unleveled).
There should be no consequence for single-key reads, since upper layer applies an additional filtering based on token of key being queried.
And for range scans, there can be an increase in memory usage, but not significant because the sstables span an wide range and would have been selected in the combined reader if the range of scan overlaps with them.
Anyway, this is a protection against storm of memtable flushes and shouldn't be the common scenario.
It works both with tablets and vnodes, by adjusting the token range spanned by compaction group accordingly.
Fixes#23634.
We can backport this into 2024.2, 2025.1, but we should let this cook in master for 1 month or so.
- (cherry picked from commit 494ed6b887)
- (cherry picked from commit 59dad2121f)
- (cherry picked from commit 21d1e78457)
- (cherry picked from commit c77f710a0c)
- (cherry picked from commit d5bee4c814)
Parent PR: #23806Closesscylladb/scylladb#24012
* github.com:scylladb/scylladb:
test: Verify partitioned set store split and unsplit correctly
sstables: Fix quadratic space complexity in partitioned_sstable_set
compaction: Wire table_state into make_sstable_set()
compaction: Introduce token_range() to table_state
dht: Add overlap_ratio() for token range
Currently the service levels cache is unnecessarily updated in every
call of `topology_state_load()`.
But it is enough to reload it only when a snapshot is loaded.
(The cache is also already updated when there is a change to one of
`service_levels_v2`, `role_members`, `role_attributes` tables.)
Fixesscylladb/scylladb#25114Fixesscylladb/scylladb#23065Closesscylladb/scylladb#25116
(cherry picked from commit 10214e13bd)
Closesscylladb/scylladb#25303
The PyKMIP server uses an SQLite database to store artifacts such as
encryption keys. By default, SQLite performs a full journal and data
flush to disk on every CREATE TABLE operation. Each operation triggers
three fdatasync(2) calls. If we multiply this by 16, that is the number
of tables created by the server, we get a significant number of file
syncs, which can last for several seconds on slow machines.
This behavior has led to CI stability issues from KMIP unit tests where
the server failed to complete its schema creation within the 20-second
timeout (observed on spider9 and spider11).
Fix this by configuring the server to use an in-memory SQLite.
Fixes#24842.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closesscylladb/scylladb#24995
(cherry picked from commit 2656fca504)
Closesscylladb/scylladb#25297
In this PR, we adjust tests in the cqlpy test suite so they
only use RF-rack-valid keyspaces. After that, we enable
the configuration option `rf_rack_valid_keyspaces` in the
suite by default.
Refs scylladb/scylladb#23428
Fixes scylladb/scylladb#25306
Backport: backporting to 2025.1 so we can test the option there too.
- (cherry picked from commit 6bde01bb59)
- (cherry picked from commit 958eaec056)
- (cherry picked from commit a59842257a)
- (cherry picked from commit be0877ce69)
Parent PR: #23489Closesscylladb/scylladb#25307
* github.com:scylladb/scylladb:
test/cqlpy: Enable rf_rack_valid_keyspaces by default
test: Move test_alter_tablet_keyspace_rf to cluster suite
test/cqlpy: Adjust tests to RF-rack-valid keyspaces
test/cqlpy/cassandra_tests: Adjust to RF-rack-valid keyspaces
Right now, service levels are migrated in one group0 command and auth is migrated in the next one. This has a bad effect on the group0 state reload logic - modifying service levels in group0 causes the effective service levels cache to be recalculated, and to do so we need to fetch information about all roles. If the reload happens after SL upgrade and before auth upgrade, the query for roles will be directed to the legacy auth tables in system_auth - and the query, being a potentially remote query, has a timeout. If the query times out, it will throw an exception which will break the group0 apply fiber and the node will need to be restarted to bring it back to work.
In order to solve this issue, make sure that the service level module does not start populating and using the service level cache until both service levels and auth are migrated to raft. This is achieved by adding the check both to the cache population logic and the effective service level getter - they now look at service level's accessor new method, `can_use_effective_service_level_cache` which takes a look at the auth version.
Fixes: scylladb/scylladb#24963
Should be backported to all versions which support upgrade to topology over raft - the issue described here may put the cluster into a state which is difficult to get out of (group0 apply fiber can break on multiple nodes, which necessitates their restart).
- (cherry picked from commit 2bb800c004)
- (cherry picked from commit 3a082d314c)
Parent PR: #25188Closesscylladb/scylladb#25283
* github.com:scylladb/scylladb:
test: sl: verify that legacy auth is not queried in sl to raft upgrade
qos: don't populate effective service level cache until auth is migrated to raft
All of the tests in the suite have been adjusted so they only
use RF-rack-valid keyspaces, so let's start enabling the option
by default.
(cherry picked from commit be0877ce69)
We move the test `test_alter_tablet_keyspace_rf` from the cqlpy to the
cluster test suite. The reason behind the change is that the test cannot
be run with `rf_rack_valid_keyspaces` turned on in the configuration.
During the test, we make the keyspace RF-rack-invalid multiple times.
Since RF-rack-validity is a very strong constraint, adjust the test
otherwise is impossible.
By moving it to the cluster test suite, we're able to change the
configuration of the node used in the test, and so the test can work
again.
(cherry picked from commit a59842257a)
We adjust three existing Cassandra tests so that they don't create
RF-rack-invalid keyspaces. We modify the replication factor used
in the problematic tests. The changes don't affect the tests as
the value of the RF is unrelated to what they verify. Thanks to
that, we can run them now even with enforced RF-rack-valid keyspaces.
The drawback is that the modified ALTER statements do not modify
the RF at all. However, since the tests seem to verify that the code
responsible for VALIDATING a request works as intended, that should
have little to no impact on them.
(cherry picked from commit 6bde01bb59)
Adjust `test_service_levels_upgrade`: right before upgrade to topology
on raft, enable an error injection which triggers when the standard role
manager is about to query the legacy auth tables in the
system_auth keyspace. The preceding commit which fixes
scylladb/scylladb#24963 makes sure that the legacy tables are not
queried during upgrade to topology on raft, so the error injection does
not trigger and does not cause a problem; without that commit, the test
fails.
(cherry picked from commit 3a082d314c)
Right now, service levels are migrated in one group0 command and auth
is migrated in the next one. This has a bad effect on the group0 state
reload logic - modifying service levels in group0 causes the effective
service levels cache to be recalculated, and to do so we need to fetch
information about all roles. If the reload happens after SL upgrade and
before auth upgrade, the query for roles will be directed to the legacy
auth tables in system_auth - and the query, being a potentially remote
query, has a timeout. If the query times out, it will throw
an exception which will break the group0 apply fiber and the node will
need to be restarted to bring it back to work.
In order to solve this issue, make sure that the service level module
does not start populating and using the service level cache until both
service levels and auth are migrated to raft. This is achieved by adding
the check both to the cache population logic and the effective service
level getter - they now look at service level's accessor new method,
`can_use_effective_service_level_cache` which takes a look at the auth
version.
Fixes: scylladb/scylladb#24963
(cherry picked from commit 2bb800c004)
Note: The simplest approach to resolving `process_request_one` merge issues, since it has been refactored, was to include the three commits from before, and then the commits that are actually being backported.
`protocol_exception` is thrown in several places. This has become a performance issue, especially when starting/restarting a server. To alleviate this issue, throwing the exception has to be replaced with returning it as a result or an exceptional future.
This PR replaces throws in the `transport/server` module. This is achieved by using result_with_exception, and in some places, where suitable, just by creating and returning an exceptional future.
There are four commits in this PR. The first commit introduces tests in `test/cqlpy`. The second commit refactors transport server `handle_error` to not rethrow exceptions. The third commit refactors reusable buffer writer callbacks. The fourth commit replaces throwing `protocol_exception` to returning it.
Based on the comments on an issue linked in https://github.com/scylladb/scylladb/issues/24567, the main culprit from the side of protocol exceptions is the invalid protocol version one, so I tested that exception for performance.
In order to see if there is a measurable difference, a modified version of `test_protocol_version_mismatch` Python is used, with 100'000 runs across 10 processes (not threads, to avoid Python GIL). One test run consisted of 1 warm-up run and 5 measured runs. First test run has been executed on the current code, with throwing protocol exceptions. Second test urn has been executed on the new code, with returning protocol exceptions. The performance report is in https://github.com/scylladb/scylladb/pull/24738#issuecomment-3051611069. It shows ~10% gains in real, user, and sys time for this test.
Testing
Build: `release`
Test file: `test/cqlpy/test_protocol_exceptions.py`
Test name: `test_protocol_version_mismatch` (modified for mass connection requests)
Test arguments:
```
max_attempts=100'000
num_parallel=10
```
Throwing `protocol_exception` results:
```
real=1:26.97 user=10:00.27 sys=2:34.55 cpu=867%
real=1:26.95 user=9:57.10 sys=2:32.50 cpu=862%
real=1:26.93 user=9:56.54 sys=2:35.59 cpu=865%
real=1:26.96 user=9:54.95 sys=2:32.33 cpu=859%
real=1:26.96 user=9:53.39 sys=2:33.58 cpu=859%
real=1:26.95 user=9:56.85 sys=2:34.11 cpu=862% # average
```
Returning `protocol_exception` as `result_with_exception` or an exceptional future:
```
real=1:18.46 user=9:12.21 sys=2:19.08 cpu=881%
real=1:18.44 user=9:04.03 sys=2:17.91 cpu=869%
real=1:18.47 user=9:12.94 sys=2:19.68 cpu=882%
real=1:18.49 user=9:13.60 sys=2:19.88 cpu=883%
real=1:18.48 user=9:11.76 sys=2:17.32 cpu=878%
real=1:18.47 user=9:10.91 sys=2:18.77 cpu=879% # average
```
This PR replaced `transport/server` throws of `protocol_exception` with returns. There are a few other places where protocol exceptions are thrown, and there are many places where `invalid_request_exception` is thrown. That is out of scope of this single PR, so the PR just refs, and does not resolve issue #24567.
Refs: #24567
This PR improves performance in cases when protocol exceptions happen, for example during connection storms. It will require backporting.
* (cherry picked from commit 7aaeed012e)
* (cherry picked from commit 30d424e0d3)
* (cherry picked from commit 9f4344a435)
* (cherry picked from commit 5390f92afc)
* (cherry picked from commit 4a6f71df68)
Parent PR: #24738Closesscylladb/scylladb#25240
* github.com:scylladb/scylladb:
test/cqlpy: add cpp exception metric test conditions
transport/server: replace protocol_exception throws with returns
utils/reusable_buffer: accept non-throwing writer callbacks via result_with_exception
transport/server: avoid exception-throw overhead in handle_error
test/cqlpy: add protocol_exception tests
transport: remove redundant references in process_request_one
transport: fix the indentation in process_request_one
transport: add futures in CQL server exception handling
This commit:
- Extends the Drivers support table with information on which driver supports tablets
and since which version.
- Adds the driver support policy to the Drivers page.
- Reorganizes the Drivers page to accommodate the updates.
In addition:
- The CPP-over-Rust driver is added to the table.
- The information about Serverless (which we don't support) is removed
and replaced with tablets to correctly describe the contents of the table.
Fixes https://github.com/scylladb/scylladb/issues/19471
Refs https://github.com/scylladb/scylladb-docs-homepage/issues/69Closesscylladb/scylladb#24635
(cherry picked from commit 18b4d4a77c)
Closesscylladb/scylladb#25247
Tested code paths should not throw exceptions. `scylla_reactor_cpp_exceptions`
metric is used. This is a global metric. To address potential test flakiness,
each test runs multiple times:
- `run_count = 100`
- `cpp_exception_threshold = 10`
If a change in the code introduced an exception, expectation is that the number
of registered exceptions will be > `cpp_exception_threshold` in `run_count` runs.
In which case the test fails.
Fixes: #25273
(cherry picked from commit 4a6f71df68)
Replace throwing protocol_exception with returning it as a result
or an exceptional future in the transport server module. This
improves performance, for example during connection storms and
server restarts, where protocol exceptions are more frequent.
In functions already returning a future, protocol exceptions are
propagated using an exceptional future. In functions not already
returning a future, result_with_exception is used.
Notable change is checking v.failed() before calling v.get() in
process_request function, to avoid throwing in case of an
exceptional future.
Refs: #24567Fixes: #25273
(cherry picked from commit 5390f92afc)
Make make_bytes_ostream and make_fragmented_temporary_buffer accept
writer callbacks that return utils::result_with_exception instead of
forcing them to throw on error. This lets callers propagate failures
by returning an error result rather than throwing an exception.
Introduce buffer_writer_for, bytes_ostream_writer, and fragmented_buffer_writer
concepts to simplify and document the template requirements on writer callbacks.
This patch does not modify the actual callbacks passed, except for the syntax
changes needed for successful compilation, without changing the logic.
Refs: #24567Fixes: #25273
(cherry picked from commit 9f4344a435)
Previously, connection::handle_error always called f.get() inside a try/catch,
forcing every failed future to throw and immediately catch an exception just to
classify it. This change eliminates that extra throw/catch cycle by first checking
f.failed(), getting the stored std::exception_ptr via f.get_exception(), and
then dispatching on its type via utils::try_catch<T>(eptr).
The error-response logic is not changed - cassandra_exception, std::exception,
and unknown exceptions are caught and processed, and any exceptions thrown by
write_response while handling those exceptions continues to escape handle_error.
Refs: #24567Fixes: #25273
(cherry picked from commit 30d424e0d3)
Add a helper to fetch scylla_transport_cql_errors_total{type="protocol_error"} counter
from Scylla's metrics endpoint. These metrics are used to track protocol error
count before and after each test.
Add cql_with_protocol context manager utility for session creation with parameterized
protocol_version value. This is used for testing connection establishment with
different protocol versions, and proper disposal of successfully established sessions.
The tests cover two failure scenarios:
- Protocol version mismatch in test_protocol_version_mismatch which tests both supported
and unsupported protocol version
- Malformed frames via raw socket in _protocol_error_impl, used by several test functions,
and also test_no_protocol_exceptions test to assert that the error counters never decrease
during test execution, catching unintended metric resets
Refs: #24567Fixes: #25273
(cherry picked from commit 7aaeed012e)
This issue happens with removenode, when RBNO is disabled, so range
streamer is used.
The deadlock happens in a scenario like this:
1. Start 3 nodes: {A, B, C}, RF=2
2. Node A is lost
3. removenode A
4. Both B and C gain ownership of ranges.
5. Streaming sessions are started with crossed directions: B->C, C->B
Readers created by sender side exhaust streaming semaphore on B and C.
Receiver side attempts to obtain a permit indirectly by calling
check_needs_view_update_path(), which reads local tables. That read is
blocked and times-out, causing streaming to fail. The streaming writer
is already using a tracking-only permit.
Even if we didn't deadlock, and the streaming semaphore was simply exhausted
by other receiving sessions (via tracking-only permit), the query may still time-out due to starvation.
To avoid that, run the query under a different scheduling group, which
translates to the system semaphore instead of the maintenance
semaphore, to break the dependency. The gossip group was chosen
because it shouldn't be contended and this change should not interfere
with it much.
Fixes#24807Fixes#24925
- (cherry picked from commit ee2fa58bd6)
- (cherry picked from commit dff2b01237)
Parent PR: #24929Closesscylladb/scylladb#25052
* github.com:scylladb/scylladb:
streaming: Avoid deadlock by running view checks in a separate scheduling group
service: migration_manager: Run group0 barrier in gossip scheduling group
This issue happens with removenode, when RBNO is disabled, so range
streamer is used.
The deadlock happens in a scenario like this:
1. Start 3 nodes: {A, B, C}, RF=2
2. Node A is lost
3. removenode A
4. Both B and C gain ownership of ranges.
5. Streaming sessions are started with crossed directions: B->C, C->B
Readers created by sender side exhaust streaming semaphore on B and C.
Receiver side attempts to obtain a permit indirectly by calling
check_needs_view_update_path(), which reads local tables. That read is
blocked and times-out, causing streaming to fail. The streaming writer
is already using a tracking-only permit.
To avoid that, run the query under a different scheduling group, which
translates to the system semaphore instead of the maintenance
semaphore, to break the dependency. The gossip group was chosen
because it shouldn't be contended and this change should not interfere
with it much.
Fixes: #24807
(cherry picked from commit dff2b01237)
The references were added and used in previous commits to
limit the number of line changes for a reviewer convenience.
This commit removes the redundant references to make the code
more clear and concise.
(cherry picked from commit 9b1f062827)
Fix the indentation after the previous commit that intentionally had
a wrong indent to limit the number of changed lines
(cherry picked from commit 9c0f369cf8)
Prepare for the next commit that will introduce a
seastar::sleep in handling of selected exception.
This commit:
- Rewrite cql_server::connection::process_request_one to use
seastar::futurize_invoke and try_catch<> instead of
utils::result_try.
- The intentation is intentionally incorrect to reduce the
number of changed lines. Next commits fix it.
(cherry picked from commit 8a7454cf3e)
Currently, progress of a parent task depends on expected_total_workload,
expected_children_number, and children progresses. Basically, if total
workload is known or all children have already been created, progresses
of children are summed up. Otherwise binary progress is returned.
As a result, two tasks of the same type may return progress in different
units. If they are children of the same task and this parent gathers the
progress - it becomes meaningless.
Drop expected_children_number as we can't assume that children are able
to show their progresses.
Modify get_progress method - progress is calculated based on children
progresses. If expected_total_workload isn't specified, the total
progress of a task may grow. If expected_total_workload isn't specified
and no children are created, empty progress (0/0) is returned.
Fixes: https://github.com/scylladb/scylladb/issues/24650.
Closesscylladb/scylladb#25113
(cherry picked from commit a7ee2bbbd8)
Closesscylladb/scylladb#25197
Currently, nodetool repair command repairs both vnode and tablet keyspaces
if no keyspace is specified. We should use this command to repair
only vnode keyspaces, but this isn't easily accessible - we have to
explicitly run repair only on vnode keyspaces.
nodetool repair skips tablet keyspaces unless a tablet keyspace
is explicitely passed as an argument.
Fixes: #24040.
Closesscylladb/scylladb#24042
(cherry picked from commit 6f8b378e80)
Closesscylladb/scylladb#25152
The functions password_authenticator::start and
standard_role_manager::start have a similar structure: they spawn a
fiber which invokes a callback that performs some migration until that
migration succeeds. Both handlers set a shared promise called
_superuser_created_promise (those are actually two promises, one for the
password authenticator and the other for the role manager).
The handlers are similar in both cases. They check if auth is in legacy
mode, and behave differently depending on that. If in legacy mode, the
promise is set (if it was not set before), and some legacy migration
actions follow. In auth-on-raft mode, the superuser is attempted to be
created, and if it succeeds then the promise is _unconditionally_ set.
While it makes sense at a glance to set the promise unconditionally,
there is a non-obvious corner case during upgrade to topology on raft.
During the upgrade, auth switches from the legacy mode to auth on raft
mode. Thus, if the callback didn't succeed in legacy mode and then tries
to run in auth-on-raft mode and succeds, it will unconditionally set a
promise that was already set - this is a bug and triggers an assertion
in seastar.
Fix the issue by surrounding the `shared_promise::set_value` call with
an `if` - like it is already done for the legacy case.
Backport note: the bugfix part for password_authenticator was removed
from the commit because 2025.1 does not have scylladb/scylladb#22532 and
thus does not contain the bug.
Fixes: scylladb/scylladb#24975Closesscylladb/scylladb#24976
(cherry picked from commit a14b7f71fe)
Closesscylladb/scylladb#25017
Fixes#24447
This factory type, which is really more a data holder/connection producer
per connection instance, creates, if using https, a new certificate_credentials
on every instance. Which when used by S3 client is per client and
scheduling groups.
Which eventually means that we will do a set_system_trust + "cold" handshake
for every tls connection created this way.
This will cause both IO and cold/expensive certificate checking -> possible
stalls/wasted CPU. Since the credentials object in question is literally a
"just trust system", it could very well be shared across the shard.
This PR adds a thread local static cached credentials object and uses this
instead. Could consider moving this to seastar, but maybe this is too much.
Closesscylladb/scylladb#24448
(cherry picked from commit 80feb8b676)
Closesscylladb/scylladb#24460
When a tablet transitions to a post-cleanup stage on the leaving replica
we deallocate its storage group. Before the storage can be deallocated
and destroyed, we must make sure it's cleaned up and stopped properly.
Normally this happens during the tablet cleanup stage, when
table::cleanup_table is called, so by the time we transition to the next
stage the storage group is already stopped.
However, it's possible that tablet cleanup did not run in some scenario:
1. The topology coordinator runs tablet cleanup on the leaving replica.
2. The leaving replica is restarted.
3. When the leaving replica starts, still in `cleanup` stage, it
allocates a storage group for the tablet.
4. The topology coordinator moves to the next stage.
5. The leaving replica deallocates the storage group, but it was not
stopped.
To address this scenario, we always stop the storage group when
deallocating it. Usually it will be already stopped and complete
immediately, and otherwise it will be stopped in the background.
Fixesscylladb/scylladb#24857Fixesscylladb/scylladb#24828Closesscylladb/scylladb#24896
(cherry picked from commit fa24fd7cc3)
Closesscylladb/scylladb#24906
As seen in #23284, when the tablet_metadata contains many tables, even empty ones,
we're seeing a long queue of seastar tasks coming from the individual destruction of
`tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>`.
This change improves `tablet_metadata::clear_gently` to destroy the `tablet_map_ptr` objects
on their owner shard by sorting them into vectors, per- owner shard.
Also, background call to clear_gently was added to `~token_metadata`, as it is destroyed
arbitrarily when automatic token_metadata_ptr variables go out of scope, so that the
contained tablet_metadata would be cleared gently.
Finally, a unit test was added to reproduce the `Too long queue accumulated for gossip` symptom
and verify that it is gone with this change.
Fixes#24814
Refs #23284
This change is not marked as fixing the issue since we still need to verify that there is no impact on query performance, reactor stalls, or large allocations, with a large number of tablet-based tables.
* Since the issue exists in 2025.1, requesting backport to 2025.1 and upwards
- (cherry picked from commit 3acca0aa63)
- (cherry picked from commit 493a2303da)
- (cherry picked from commit e0a19b981a)
- (cherry picked from commit 2b2cfaba6e)
- (cherry picked from commit 2c0bafb934)
- (cherry picked from commit 4a3d14a031)
- (cherry picked from commit 6e4803a750)
Parent PR: #24618Closesscylladb/scylladb#24862
* github.com:scylladb/scylladb:
token_metadata_impl: clear_gently: release version tracker early
test: topology_custom: test_tablets_merge: add test_tablet_split_merge_with_many_tables
token_metadata: clear_and_destroy_impl when destroyed
token_metadata: keep a reference to shared_token_metadata
token_metadata: move make_token_metadata_ptr into shared_token_metadata class
replica: database: get and expose a mutable locator::shared_token_metadata
locator: tablets: tablet_metadata: clear_gently: optimize foreign ptr destruction
Destructor of database_sstable_write_monitor, which is created
in table::try_flush_memtable_to_sstable, tries to get the compaction
state of the processed compaction group. If at this point
the compaction group is already stopped (and the compaction state
is removed), e.g. due to concurrent tablet merge, an exception is
thrown and a node coredumps.
Add flush gate to compaction group to wait for flushes in
compaction_group::stop. Hold the gate in seal function in
table::make_memtable_list. seal function is turned into
a coroutine to ensure it won't throw.
Wait until async_gate is closed before flushing, to ensure that
all data is written into sstables. Stop ongoing compactions
beforehand.
Remove unnecessary flush in tablet_storage_group_manager::merge_completion_fiber.
Stop method already flushes the compaction group.
Fixes: #23911.
Closesscylladb/scylladb#24582
(cherry picked from commit 2ec54d4f1a)
Closesscylladb/scylladb#24950
The test could fail with RF={DC1: 2, DC2: 0} and CL=ONE when:
- both writes succeeded with the same replica responding first,
- one of the following reads succeeded with the other replica
responding before it applied mutations from any of the writes.
We fix the test by not expecting reads with CL=ONE to return a row.
We also harden the test by inserting different rows for every pair
(CL, coordinator), where one of the two coordinators is a normal
node from DC1, and the other one is a zero-token node from DC2.
This change makes sure that, for example, every write really
inserts a row.
Fixesscylladb/scylladb#22967
The fix addresses CI flakiness and only changes the test, so it
should be backported.
Closesscylladb/scylladb#23518
(cherry picked from commit 21edec1ace)
Fixing conflicts required additionally backporting the log line
from scylladb/scylladb#22968.
Closesscylladb/scylladb#24983
in the CDC log transformer, when creating a CDC mutation based on some
base table mutation, for each value of a base column we set the value in
the CDC column with the same name.
When looking up the column in the CDC schema by name, we may get a null
pointer if a column by that name is not found. This shouldn't happen
normally because the base schema and CDC schema should be compatible,
and for each base column there should be a CDC column with the same
name.
However, there are scenarios where the base schema and CDC schema are
incompatible for a short period of time when they are being altered.
When a base column is being added or dropped, we could get a base
mutation with this column set, and then the CDC transformer picks up the
latest CDC schema which doesn't have this column.
If such thing happens, we fix the code to throw an exception instead of
crashing on null pointer dereference. Currently we don't have a safer
approach to handle this, but this might be changed in the future. The
other alternative is dropping that data silently which we prefer not to
do.
Throwing an error is acceptable because this scenario most likely
indicates this behavior by the user:
* The user adds a new column, and start writing values to the column
before the ALTER is complete. or,
* The user drops a column, and continues writing values to the column
while it's being dropped.
Both cases might as well fail with an error because the column is not
found in the base table.
Fixes https://github.com/scylladb/scylladb/issues/24952
backport needed - simple fix for a node crash
- (cherry picked from commit b336f282ae)
- (cherry picked from commit 86dfa6324f)
Parent PR: #24986Closesscylladb/scylladb#25065
* github.com:scylladb/scylladb:
test: cdc: add test_cdc_with_alter
cdc: throw error if column doesn't exist
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.
In this commit, we're restricting those operations. We also provide two
validation tests.
One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.
Fixes scylladb/scylladb#24643
Backport: we should backport the change to all affected
branches to prevent the consequences that may affect the user.
- (cherry picked from commit 20d0050f4e)
- (cherry picked from commit 59800b1d66)
Parent PR: #25008Closesscylladb/scylladb#25106
* github.com:scylladb/scylladb:
cdc: Forbid altering columns of inactive CDC log table
cdc: Forbid altering columns of CDC log tables directly
When CDC becomes disabled on the base table, the CDC log table
still exsits (cf. scylladb/scylladb@adda43edc7).
If it continues to exist up to the point when CDC is re-enabled
on the base table, no new log table will be created -- instead,
the old olg table will be *re-attached*.
Since we want to avoid situations when the definition of the log
table has become misaligned with the definition of the base table
due to actions of the user, we forbid modifying the set of columns
or renaming them in CDC log tables, even when they're inactive.
Validation tests are provided.
(cherry picked from commit 59800b1d66)
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.
In this commit, we're restricting those operations. We also provide two
validation tests.
One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.
Fixesscylladb/scylladb#24643
(cherry picked from commit 20d0050f4e)
No need to wait for all members to be cleared gently.
We can release the version earlier since the
held version may be awaited for in barriers.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 6e4803a750)
Reproduces #23284
Currently skipped in release mode since it requires
the `short_tablet_stats_refresh_interval` interval.
Ref #24641
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4a3d14a031)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We have a lot of places in the code where
a token_metadata_ptr is kept in an automatic
variable and destroyed when it leaves the scope.
since it's a referenced counted lw_shared_ptr,
the token_metadata object is rarely destroyed in
those cases, but when it is, it doesn't go through
clear_gently, and in particular its tablet_metadata
is not cleared gently, leading to inefficient destruction
of potentially many foreign_ptr:s.
This patch calls clear_and_destroy_impl that gently
clears and destroys the impl object in the background
using the shared_token_metadata.
Fixes#13381
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 2c0bafb934)
To be used by a following patch to gently clean and destroy
the token_data_impl in the background.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 2b2cfaba6e)
So we can use the local shared_token_metadata instance
for safe background destroy of token_metadata_impl:s.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit e0a19b981a)
Prepare for next patch, the will use this shared_token_metadata
to make mutable_token_metadata_ptr:s
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 493a2303da)
Sort all tablet_map_ptr:s by shard_id
and then destroy them on each shard to prevent
long cross-shard task queues for foreign_ptr destructions.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 3acca0aa63)