We want to move towards rack-list based replication factor for tablets being the default mode, and in the future the only supported mode. This PR is a step towards that. We auto-expand numeric RF to a rack list on keyspace creation and ALTER when the rf_rack_valid_keyspaces option is enabled.
The PR is mostly about adjusting tests. The main logic change is in the last patch, which modifies option post-processing in ks_prop_defs.
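A hedged sketch of what this looks like from a client, assuming a single-DC test cluster; the keyspace name and the expanded option format mentioned in the comments are illustrative assumptions, not the authoritative syntax:

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()

# The user writes a numeric replication factor per DC...
session.execute("""
    CREATE KEYSPACE demo_ks
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
    AND tablets = {'enabled': true}
""")

# ...and with rf_rack_valid_keyspaces enabled the stored options are expected
# to name concrete racks rather than a number (the exact format is not shown
# here; think of something along the lines of {'dc1': 'rack1,rack2,rack3'}).
row = session.execute(
    "SELECT replication FROM system_schema.keyspaces "
    "WHERE keyspace_name = 'demo_ks'").one()
print(row.replication)
```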
Fixes #26397
Closes scylladb/scylladb#26692
* github.com:scylladb/scylladb:
cql3: ks_prop_defs: Expand numeric RF to rack list
locator: Move rack_list to topology.hh
alternator: Do not set RF for zero-token DCs
alternator: Switch keyspace creation to use ks_prop_defs
test: alternator: Adjust for rack lists
cql3: Move validation of invalid ALTER KEYSPACE earlier, to ks_prop_defs
test: cqlpy: Mark tests using rack lists as scylla-only
test: Switch to rack-list based RF
test: Generalize tests to work with both numeric RF and rack lists
test: cluster: test_zero_token_nodes_multidc: Adjust to rack list RF
test: Prepare for handling errors specific to rack list path
test: cluster: dtest: alternator: Force RF=1 in test_putitem_contention
test: Create cluster with multiple racks in multi-dc setups
test: boost: network_topology_strategy_test: Adjust to rack-list RF
test: tablets: Adjust to rack list
test: cluster: test_group0_schema_versioning: Use smaller RF to respect rf-rack-validness
test: tablets_test: Convert test_per_shard_goal_mixed_dc_rf to be rack-valid
test: object_store: test_backup: Adjust for rack lists
test: cluster: tablets: Do not move tablet across racks in test_tablet_transition_sanity
test: cluster: mv: Do not move tablets across racks
test: cluster: util: Fix docstring for parse_replication_options()
tablets, topology_coordinator: Skip tablet draining on replace
When auto compaction is disabled, all ongoing compactions, including
major compactions, are stopped. However, major compactions should not be
stopped, since the disable request applies only to regular auto
compactions.
This PR fixes the issue by tagging major compaction tasks with a newly
introduced `compaction_type::Major` enum value. Since
`table::disable_auto_compaction()` already requests the compaction
manager to stop only tasks of type `compaction_type::Compaction`, major
compactions will no longer be stopped.
Fixes #24501
The PR improves how compactions are stopped when a disable-auto-compaction request is executed.
No need to backport
Closes scylladb/scylladb#26288
* github.com:scylladb/scylladb:
replica/table: do not stop major compaction when disabling auto compaction
compaction/compaction_descriptor: introduce compaction_type::Major
We adjust the test to RF-rack-validity and then re-enable
index random events, which requires the configuration option
`rf_rack_valid_keyspaces` to be enabled.
Fixes scylladb/scylladb#26422
Backport: I'd rather not backport these changes. They're almost a hack and pose too much risk for little gain.
Closes scylladb/scylladb#26591
* github.com:scylladb/scylladb:
test/cluster/random_failures: Re-enable index events
test/cluster/random_failures: Enable rf_rack_valid_keyspaces
test/cluster/random_failures: Adjust to RF-rack-validity
Auto-expands numeric RF in CREATE/ALTER KEYSPACE statements for
new DCs specified in the statement.
Doesn't auto-expand existing options, as the rack choice may not be in
line with current replica placement. This requires co-locating tablet
replicas, and tracking of co-location state, which is not implemented yet.
Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
To achieve RF=3 with tablets and rf_rack_valid_keyspaces, we need 3
racks. So change the test to create 3 racks. Alternator was bypassing the
standard keyspace creation path, so it escaped validation. But this
will change, and the test will stop working.
Also, after auto-expansion of RF to a rack list, not all of the 4 nodes
will host replicas, so we need to adjust expectations.
We have to do this before we enable auto-expansion of numeric RF to
rack lists, because those tests alter the replication factor, and
altering from a rack list to numeric RF will not be allowed.
Two changes here:
1) Allocate nodes in dc2 in separate racks to make the test stronger
- it invites bugs where RF==nr_racks succeeds despite there being
zero-token nodes, rather than simply failing due to rack count.
2) Due to auto-expansion to a rack list, Scylla throws during keyspace
creation rather than table creation.
With rf_rack_valid_keyspaces enabled, the RF of Alternator tables will be
equal to the number of racks (in this test: nodes). Prior to that, if
the number of nodes is smaller than 3, Alternator creates the keyspace
with RF=1. It turns out that with RF=2 the test fails with write timeouts due
to contention. Enforce RF=1 by creating the table with one node before
adding the second node.
test_decommission_rack_load_failure expects some tablets to land in
the rack which only has the decommissioning node. Since the table uses
RF=1, auto-expansion may choose the other rack and put all tablets
there, and the expected failure will not happen. Force placement by
using rack-list RF.
Choose old_replica and new_replica so that they're both in rack r1.
After later changes (rack list auto expansion), it's no longer
guaranteed that the first replica will be on r1.
In commit a3ec6c7d1d we supposedly
implemented the feature of distinguishing TTL expiration events from regular
user-sent deletions. However, that implementation did not actually work
at all... It had two bugs:
1. It created a null rjson::value() instead of an empty dictionary
rjson::empty_object(), so GetRecords failed every time such a
TTL expiration event was generated.
2. In the output, it used the lowercase field names "type" and "principalId"
instead of the uppercase "Type" and "PrincipalId". This is not the
correct capitalization, and when boto3 receives such incorrect
fields it silently deletes them and never passes them to the user's
get_records() call (see the sketch after this list).
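For illustration, a hedged sketch of how a stream consumer can tell the two kinds of deletions apart once the capitalization is correct; the endpoint and credentials are placeholders, and the userIdentity shape follows DynamoDB's documented format:

```python
import boto3

# Placeholder endpoint/credentials for a local Alternator; on AWS use the real ones.
streams = boto3.client("dynamodbstreams", endpoint_url="http://127.0.0.1:8000",
                       region_name="us-east-1",
                       aws_access_key_id="alternator",
                       aws_secret_access_key="secret_pass")

def is_ttl_expiration(record):
    # TTL-expired items carry a userIdentity with capitalized keys; records of
    # user-sent deletions have no userIdentity at all.
    identity = record.get("userIdentity")
    return (identity is not None
            and identity.get("Type") == "Service"
            and identity.get("PrincipalId") == "dynamodb.amazonaws.com")

def classify_records(shard_iterator):
    for record in streams.get_records(ShardIterator=shard_iterator)["Records"]:
        kind = "ttl-expiration" if is_ttl_expiration(record) else "user-delete"
        print(record["eventName"], kind)
```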
This patch fixes those two bugs, and, importantly, enables a test for
this feature. We already had such a test, but it was marked as
"veryslow", so it doesn't run in CI and apparently was never run even once to
check the new feature. This test is not actually very long on Alternator
when the TTL period is set very low (as we do in our tests), so I replaced
the "veryslow" marker by "waits_for_expiration". The latter marker means
that the test is still very slow - as much as half an hour - on DynamoDB -
but runs quickly on Scylla in our test setup, and is enabled in CI by
default.
The enabled test failed badly before this patch (a server error during
GetRecords), and passes with this patch.
Also, the aforementioned commit forgot to remove the paragraph in
Alternator's compatibility.md that claims we don't have that feature yet.
So we do it now.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#26633
Until this patch, CDC didn't fetch a preimage for mutations
containing only a partition tombstone. Therefore, single-row deletions
in a table without a clustering key didn't include a preimage, which was
inconsistent with single-row clustered deletions. This commit addresses
this inconsistency.
The second reason is compatibility with DynamoDB Streams, which doesn't
support entire-partition deletes. Alternator uses partition tombstones
for single-row deletions, though, and in these cases the 'OldImage' was
missing from REMOVE records.
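An illustrative sketch of the newly covered case (keyspace, table, and CDC options are placeholder assumptions): deleting a single row from a table with no clustering key is a partition tombstone, and its CDC log entry should now carry a preimage, just like a clustered single-row delete.

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()
session.execute("CREATE KEYSPACE ks WITH replication = "
                "{'class': 'NetworkTopologyStrategy', 'replication_factor': 1}")
session.execute("CREATE TABLE ks.t (pk int PRIMARY KEY, v int) "
                "WITH cdc = {'enabled': true, 'preimage': true}")
session.execute("INSERT INTO ks.t (pk, v) VALUES (1, 10)")
session.execute("DELETE FROM ks.t WHERE pk = 1")   # partition tombstone

# The CDC log is expected to contain a preimage row for pk=1 next to the
# delete entry (cdc$operation distinguishes the row kinds).
for row in session.execute("SELECT * FROM ks.t_scylla_cdc_log"):
    print(row)
```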
Fixes https://github.com/scylladb/scylladb/issues/26382
Closes scylladb/scylladb#26578
When auto compaction is disabled, all ongoing compactions, including
major compactions, are stopped. However, major compactions should not be
stopped, since the disable request applies only to regular auto
compactions.
This patch fixes the issue by tagging major compaction tasks with the
newly introduced `compaction_type::Major`. Since
`table::disable_auto_compaction()` already requests the compaction
manager to stop only tasks of type `compaction_type::Compaction`, major
compactions will no longer be stopped.
Fixes #24501
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Introduce a new compaction_type enum value: `Major`.
This type will be used by the next patches to differentiate between
major compaction and regular compaction (compaction_type::Compaction).
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
When loading CDC streams metadata for tablets from the tables, read only
new entries from the history table instead of reading all entries. This
improves the CDC metadata reloading, making it more efficient and
predictable.
The CDC metadata is loaded as part of group0 reload whenever the
internal CDC tables are modified. On tablet split / merge, we create a
new CDC timestamp and streams by writing them to the cdc_streams_history
table via a group0 operation, and when it's applied we reload the in-memory
CDC streams map by reading from the tables and constructing the updated map.
Previously, on every update, we would read the entire
cdc_streams_history entries for the changed table, constructing all its
streams and creating a new map from scratch.
We improve this now by reading only new entries from cdc_streams_history
and appending them to the existing map. We can do this because we only
append new entries to cdc_streams_history with a higher timestamp than all
previous entries.
This makes this reloading more efficient and predictable, because
previously we would read a number of entries that depends on the number
of tablets splits and merges, which increases over time and is
unbounded, whereas now we read only a single stream set on each update.
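A hedged Python sketch of the idea, purely for illustration; the real code is C++ in the CDC metadata reload path, and the table location, column names, and query shape used here are assumptions:

```python
# Keep the highest timestamp already loaded and fetch only newer history
# entries, appending them to the existing map.
def reload_streams_incrementally(session, table_id, streams_by_ts, last_loaded_ts):
    rows = session.execute(
        "SELECT timestamp, streams FROM cdc_streams_history "
        "WHERE table_id = %s AND timestamp > %s",
        (table_id, last_loaded_ts))
    for row in rows:
        # History entries are only ever appended with timestamps higher than
        # all previous ones, so appending keeps the map consistent.
        streams_by_ts[row.timestamp] = row.streams
        last_loaded_ts = max(last_loaded_ts, row.timestamp)
    return last_loaded_ts
```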
Fixes https://github.com/scylladb/scylladb/issues/26732
Backport to 2025.4, where CDC with tablets was introduced.
Closes scylladb/scylladb#26160
* github.com:scylladb/scylladb:
test: cdc: extend cdc with tablets tests
cdc: improve cdc metadata loading
We currently allow creating multiple vector indexes on one column.
This doesn't make much sense as we do not support picking one when
making ANN queries.
To make this less confusing and to make our behavior similar
to Cassandra we disallow the creation of multiple vector indexes
on one column.
We also add a test that checks this behavior.
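An illustrative sketch of the new restriction; the keyspace, table name, index syntax, and error type below are assumptions made for this example, not verified against the actual CQL grammar for vector indexes:

```python
from cassandra import InvalidRequest
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("ks")   # "ks" is a placeholder keyspace
session.execute("CREATE TABLE items (pk int PRIMARY KEY, embedding vector<float, 3>)")
session.execute("CREATE CUSTOM INDEX idx1 ON items(embedding) USING 'vector_index'")
try:
    # A second vector index on the same column is now rejected.
    session.execute("CREATE CUSTOM INDEX idx2 ON items(embedding) USING 'vector_index'")
except InvalidRequest as e:
    print("rejected as expected:", e)
```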
Fixes: VECTOR-254
Fixes: #26672
Closes scylladb/scylladb#26508
This is a follow-up for https://github.com/scylladb/scylladb/pull/26315. Fixes several review comments that were left unresolved in the original PR.
backport: not needed, this PR contains only renames and code comment fixes
Closes scylladb/scylladb#26745
* https://github.com/scylladb/scylladb:
test_automatic_cleanup: fix comment
storage_proxy: remove stale comment
storage_proxy: improve run_fenceable_write comment
topology_coordinator: rename start_cleanup_on_dirty_nodes -> start_vnodes_cleanup_on_dirty_nodes
storage_service: rename is_cleanup_allowed -> is_vnodes_cleanup_allowed
storage_service: rename do_cluster_cleanup -> do_clusterwide_vnodes_cleanup
We migrate `tablets_test.py::TestTablets::test_moving_tablets_replica_on_node`
from dtests to the Scylla repository. We divide the test into two
steps to make testing easier, and even possible, with RF-rack-valid
keyspaces being enforced.
Closes scylladb/scylladb#26285
- Workload: N workers perform CAS updates
UPDATE … SET s{i}=new WHERE pk=? IF (∀j≠i: s{j}>=guard_j) AND s{i}=prev
at CL=LOCAL_QUORUM / SERIAL=LOCAL_SERIAL. A non-apply without a timeout is treated
as contention; "uncertainty" timeouts are resolved via a LOCAL_SERIAL read
(a sketch of one worker's loop follows this list).
- Enable balancing and increase min_tablet_count to force split,
flush and lower min_tablet_count to merge.
- “Uncertainty” timeouts (write timeout due to uncertainty) are resolved via a
LOCAL_SERIAL read to determine whether the CAS actually applied.
- Invariants: after the run, for every pk and column s{i}, the stored value
equals the number of confirmed CAS by worker i (no lost or phantom updates)
despite ongoing tablet moves.
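A hedged sketch of one worker's loop; the schema, names, and the simplified IF condition are illustrative assumptions (the real test also checks the other workers' guard columns in the IF clause):

```python
from cassandra import ConsistencyLevel, WriteTimeout
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("ks")   # placeholder keyspace/table
PK, I = 0, 1          # this worker updates column s1 of partition pk=0
confirmed = 0         # number of CAS updates confirmed as applied

def serial_read():
    stmt = SimpleStatement(f"SELECT s{I} FROM t WHERE pk = %s",
                           consistency_level=ConsistencyLevel.LOCAL_SERIAL)
    return getattr(session.execute(stmt, (PK,)).one(), f"s{I}")

prev = serial_read()  # assumes the row was pre-populated, e.g. with s1 = 0
for _ in range(100):
    stmt = SimpleStatement(
        f"UPDATE t SET s{I} = %s WHERE pk = %s IF s{I} = %s",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
        serial_consistency_level=ConsistencyLevel.LOCAL_SERIAL)
    try:
        if session.execute(stmt, (prev + 1, PK, prev)).was_applied:
            confirmed += 1
            prev += 1
        # A clean non-apply is treated as contention and simply retried.
    except WriteTimeout:
        # "Uncertainty" timeout: a LOCAL_SERIAL read decides whether it applied.
        if serial_read() == prev + 1:
            confirmed += 1
            prev += 1
# Invariant checked by the test: the stored s1 equals `confirmed`.
```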
Closes scylladb/scylladb#26113
Extend and improve the tests of virtual tables for CDC with tablets.
Split the existing virtual tables test into one test that validates the
virtual tables against the internal CDC tables, triggering some
tablet splits in order to create entries in the cdc_streams_history
table, and another test with basic validation of the virtual tables
when there are multiple CDC tables.
Recently (#26231) a test was added to check that several API
endpoints that return tokens and corresponding replica nodes are
consistent with the tablet map. This patch adds one more API endpoint to the
validation -- the /storage_service/tokens_endpoint one.
The extension is pretty straightforward, but the new endpoint returns
a single (primary) replica for a token, so the test check is
slightly modified to account for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#26580
We've enabled the configuration option `rf_rack_valid_keyspaces`,
so we can finally re-enable the events creating and dropping secondary
indexes.
Fixes scylladb/scylladb#26422
We adjust the test to work with the configuration option
`rf_rack_valid_keyspaces` enabled. For that, we ensure that there is
always at least one node in each of the three racks. This way, all
keyspaces we create and manipulate will remain RF-rack-valid since they
all use RF=3.
------------------------------------------------------------------------
To achieve that, we only need to adjust the following events:
1. `init_tablet_transfer`
The event creates a new keyspace and table and manually migrates
a tablet belonging to it. As long as we make sure the migration occurs
within the same rack, there will be no problem.
Since RF == #racks, each rack will have exactly one tablet replica,
so we can migrate the tablet to an arbitrary node in the same rack.
Note that there must exist a node that's not a replica. If there weren't
such a node, the test wouldn't have worked before this commit because
it's not possible to migrate a tablet from one node being its replica to
another. In other words, we have a guarantee that there are at least 4 nodes
in the cluster when we try to migrate a tablet replica.
That said, we check it anyway. If there's no viable node to migrate the
tablet replica to, we log that information and do nothing. That should be
an acceptable solution.
2. `add_new_node`
As long as we add a node to an existing rack, there's no way to
violate the invariant imposed by the configuration option, so we pick
a random rack out of the existing three and create a node in it.
3. `decommission_node`
We need to ensure that the node we'll be trying to decommission is
not the only one in its rack.
Following pretty much the same reasoning as in `init_tablet_transfer`,
we conclude there must be a rack with at least two nodes in it. Otherwise
we'd end up having to migrate a tablet from one replica node to another,
which is not possible.
What's more, decommissioning a node is not possible if any node in
the cluster is dead, so we can assume that `manager.running_servers`
returns the whole cluster.
4. `remove_node`
The same as `decommission_node`. Just note that although the node we choose to
remove must first be stopped, no other node can be dead, so the whole
cluster must be returned by `manager.running_servers`.
------------------------------------------------------------------------
There's one more important thing to note. The test may sometimes trigger
a sequence of events where a new node is started, but, due to an error
injection, its initialization is not completed. Among other things, the
node may NOT have a host ID recognized by the rest of the nodes in the
cluster, and operations like tablet migration will fail if they target
it.
Thankfully, there seems to be a way to avoid problems stemming from
that. When a new node is added to the cluster, it should appear at the
end of the list returned by `manager.running_servers`. This most likely
stems from how dictionaries work in Python:
"Keys and values are iterated over in insertion order."
-- https://docs.python.org/3/library/stdtypes.html#dict-views
and the fact that we keep track of running servers using a dictionary.
Furthermore, we rely on the assumption that the test currently works
correctly.
Assume, to the contrary, that among the nodes taking part in the operations
listed above, there is at most one node per rack that has its host ID recognized
by the rest of the cluster. Note that only those nodes can store any tablets.
Let's refer to the set of those nodes as X.
Assume that we're dealing with tablet migration, decommissioning, or removing
a node. Since those operations involve tablet migration, at least one tablet
will need to be migrated from the node in question to another node in X.
However, since X consists of at most three nodes, and one of them is losing
its tablet, there is no viable target for the tablet, so the operation fails.
Since the test does work correctly, that situation cannot arise, so a rack with
at least two running nodes is a safe place to perform these operations.
Using those assumptions, an auxiliary function, `select_viable_rack`,
was designed to carefully choose a correct rack, which we'll then pick nodes
from to perform the topological operations. It's simple: we just find the first
rack in the list that has at least two nodes in it. That should ensure that we
perform an operation that doesn't lead to any unforeseen disaster.
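A hedged sketch of what `select_viable_rack` might look like; the attribute names on the server objects are assumptions, and the real helper lives in the test module:

```python
from collections import defaultdict

def select_viable_rack(running_servers):
    # Group running nodes by rack (the .rack attribute is an assumed field of
    # the server-info objects returned by manager.running_servers()).
    nodes_by_rack = defaultdict(list)
    for server in running_servers:
        nodes_by_rack[server.rack].append(server)
    # The first rack with at least two nodes can safely lose one of them:
    # another node in the same rack remains a viable tablet migration target.
    for rack, nodes in nodes_by_rack.items():
        if len(nodes) >= 2:
            return rack
    return None  # no safe rack; the caller logs this and skips the event
```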
------------------------------------------------------------------------
Since the test effectively becomes more complex due to more care for keeping
the topology of the cluster valid, we extend the log messages to make them
more helpful when debugging a failure.
This patch adds reproducing tests in test/alternator for issue #23438,
which is about missing checks for the length of headers and the URL
in Alternator requests. These should be limited, because Seastar's
HTTP server, which Scylla uses, reads them into memory so they can OOM
Scylla.
The tests demonstrate that DynamoDB enforces a 16 KB limit on the
headers and the URL of the request, but Scylla doesn't (a code
inspection suggests it does not in fact have any limit).
The two tests pass on DynamoDB and currently xfail on Alternator.
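A hedged sketch of such a reproducer; the endpoint, header name, and size are placeholders, and the actual tests live in test/alternator and use the normal request machinery:

```python
import requests

ALTERNATOR_URL = "http://127.0.0.1:8000/"   # placeholder endpoint
huge = "x" * (17 * 1024)                    # just over DynamoDB's 16 KB limit

# DynamoDB rejects a request whose headers exceed the limit; before the fix,
# Scylla would buffer it without complaint.
resp = requests.post(ALTERNATOR_URL, headers={"X-Padding": huge}, json={}, timeout=30)
print(resp.status_code, resp.text[:200])
```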
Refs #23438.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#23442
Sometimes file::list_directory() returns entries without the type set. In
that case the lister calls file_type() on the entry name to get it. If
the call returns a disengaged type, the code assumes that some error
occurred and resolves into an exception.
That's not correct. The file_type() method returns a disengaged type only
if the file being inspected is missing (i.e. on ENOENT errno). But this
can validly happen if a file is removed between readdir and stat. In
that case it's not "some error happened"; the entry should just be
skipped. If some error had happened, file_type() would resolve into an
exceptional future on its own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#26595
Currently, batchlog replay is considered successful even if all batches fail
to be sent (they are replayed later). However, repair requires all batches
to be sent successfully. Currently, if the batchlog isn't cleared, the repair
never learns about it and updates the repair_time anyway. If GC mode is set to
"repair", this means that the tombstones written before the repair_time
(minus propagation_delay) can be GC'd while not all batches have been replayed.
Consider a scenario:
- Table t has a row with (pk=1, v=0);
- There is an entry in the batchlog that sets (pk=1, v=1) in table t;
- The row with pk=1 is deleted from table t;
- Table t is repaired:
- batchlog replay fails;
- repair_time is updated;
- propagation_delay seconds passes and the tombstone of pk=1 is GC'd;
- batchlog is replayed and (pk=1, v=1) inserted - data resurrection!
Do not update repair_time if sending any batch fails. The data is still repaired.
For tablet repair, the repair runs, but at the end the exception is passed
to the topology coordinator. Thanks to that, the repair_time isn't updated.
The repair request isn't removed either, so the repair will need
to be rerun.
Apart from that, a batch is removed from the batchlog if its version is invalid
or unknown. The condition on which we consider a batch too fresh to replay
is updated to consider propagation_delay.
Fixes: https://github.com/scylladb/scylladb/issues/24415
Data resurrection fix; needs backport to all versions
Closes scylladb/scylladb#26319
* github.com:scylladb/scylladb:
db: fix indentation
test: add reproducer for data resurrection
repair: fail tablet repair if any batch wasn't sent successfully
db/batchlog_manager: fix making decision to skip batch replay
db: repair: throw if replay fails
db/batchlog_manager: delete batch with incorrect or unknown version
db/batchlog_manager: coroutinize replay_all_failed_batches
Before this commit, when the underlying materialized view was created,
it didn't have the property `tombstone_gc` set to any value. We fix the
bug in this PR.
Implementation strategy:
1. Move code responsible for producing the schema
of a secondary index to the file that handles
`CREATE INDEX`.
2. Set the property when creating the view.
3. Add reproducer tests.
Fixes scylladb/scylladb#26542
Backport: we can discuss it.
Closes scylladb/scylladb#26543
* github.com:scylladb/scylladb:
index: Set tombstone_gc when creating secondary index
index: Make `create_view_for_index` method of `create_index_statement`
index: Move code for creating MV of secondary index to cql3
db, cql3: Move creation of underlying MV for index
Following DynamoDB, Alternator also places a 16 MB limit on the size of
a request. Such a limit is necessary to avoid running out of memory -
because the AWS message authentication protocol requires reading the
entire request into memory before its signature can be verified.
Our implementation for this limit used Seastar's HTTP server's
content_length_limit feature. However, this Seastar feature is
incomplete - it only works when the request uses the Content-Length
header, and doesn't do anything if the request doesn't have a
Content-Length (it may use chunked encoding, or have no length at all).
So malicious users can cause Scylla to OOM by sending a huge request
without a Content-Length.
So in this patch we stop using the incomplete Seastar feature, and
implement the length limit in Scylla in a way that works correctly with
or without Content-Length: We read from the input stream and if we go
over 16MB, we generate an error.
Because we dropped Seastar's protection against a long Content-Length,
we also need to fix a piece of code which used Content-Length to reserve
some semaphore units to prevent reading many large requests in parallel.
We fix two problems in the code:
1. If Content-Length is over the limit, we shouldn't attempt to reserve
semaphore units - this should just be a Payload Too Large error.
2. If Content-Length is missing, the existing code did nothing and had
a TODO to handle it. In this patch we implement what was suggested
in that TODO: We temporarily reserve the whole 16 MB limit, and
after reading the actual request, we return part of the reservation
according to the real request size.
That last fix is important, because typically the largest requests will be
BatchWriteItem where a well-written client would want to use chunked
encoding, not Content-Length, to avoid materializing the entire request
up-front. For such clients, the memory use semaphore did nothing, and
now it does the right thing.
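Since the reservation scheme is easy to get subtly wrong, here is a hedged sketch of the intended behavior in Python; the real implementation is C++ in Alternator's request path, and the function, exception, and callable names below are assumptions:

```python
MAX_REQUEST_BYTES = 16 * 1024 * 1024  # Alternator's per-request limit

class PayloadTooLarge(Exception):
    pass

async def read_request_with_reservation(content_length, reserve, unreserve, read_body):
    """reserve(n)/unreserve(n) manage byte units of a shared memory semaphore;
    read_body() reads the request body and raises PayloadTooLarge if it grows
    past MAX_REQUEST_BYTES (the in-Scylla check described above)."""
    if content_length is not None and content_length > MAX_REQUEST_BYTES:
        # Fix 1: an over-limit Content-Length is rejected outright, with no
        # semaphore units reserved.
        raise PayloadTooLarge()
    # Fix 2: with no Content-Length, pessimistically reserve the whole limit.
    reserved = content_length if content_length is not None else MAX_REQUEST_BYTES
    await reserve(reserved)
    try:
        body = await read_body()
    except Exception:
        await unreserve(reserved)
        raise
    # Give back the part of the reservation the request did not actually use;
    # the remaining len(body) units stay reserved while the request is handled.
    await unreserve(reserved - len(body))
    return body
```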
Note that this patch does *not* solve the problem #12166 that existed
with Seastar's length-limiting implementation but still exists in the
new in-Scylla length-limiting implementation: The fact we send an
error response in the middle of the request and then close the
connection, while the client continues to send the request, can lead
to an RST being sent by the server kernel. Usually this will be fine -
well-written client libraries will be able to read the response before
the RST. But even with a well-written library in some rare timings
the client may get the RST before the response, and will miss the
response, and get an empty or partial response or "connection reset
by peer". This issue existed before this patch, and still exists, but
is probably of minor impact.
Fixes #8196
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#23434
In #24031 users complained that the trace message is truncated, namely that
it's no longer JSON-parsable and the table name might not be part of the output.
This patch enables users to configure the maximum size of the trace message.
In case a user wants the `table` name but doesn't care about message size,
#26634 will help.
- add configuration variable `alternator_max_users_query_size_in_trace_output`
with a default value of 4096 (4 times the old default value).
- modify the `truncated_content_view` function to use the new configuration
variable for the truncation limit
- update `truncated_content_view` to consistently truncate at the given
size; previously truncation would also happen when data arrived in
more than one chunk
- update `truncated_content_view` to better handle the truncated value
(limit the number of copies)
- fix the `scylla_config_read` call: a call to `query` for a configuration
name that does not exist returns an empty (but present) `Items` array,
which would raise an array access exception a few lines
below.
- add test
Refs #26634
Refs #24031
Closes scylladb/scylladb#26618
When streaming SSTables across tablets, a single SSTable may be streamed to multiple tablets. The previous implementation unlinked SSTables immediately after streaming them for the first tablet, potentially making them partially unavailable for subsequent tablets. This patch replaces unlink() with mark_for_deletion(), deferring actual unlinking until sstable::close_files().
test_tablets2::test_tablet_load_and_stream was enhanced to also verify that SSTables are removed after being streamed.
Fixes #26606
Backport is not required; although it is a bug fix, it isn't visible. This is more of a preparatory fix for https://github.com/scylladb/scylladb/pull/26444.
Closes scylladb/scylladb#26622
* github.com:scylladb/scylladb:
test_tablets2: verify SSTable cleanup after tablet load and stream
tablet_sstable_streamer: replace unlink() call with mark_for_deletion()
It turns out that #21477 wasn't sufficient to fix the issue. The driver
may still decide to reconnect the connection after `rolling_restart`
returns. One possible explanation is that the driver sometimes handles
the DOWN notification after all nodes consider each other UP.
Reconnecting the driver after restarting nodes seems to be a reliable
workaround that many tests use. We also use it here.
Fixes #19959
Closes scylladb/scylladb#26638
With the recent introduction of retry_strategy to Seastar, the pure virtual class previously defined in ScyllaDB is now redundant. This change allows us to streamline our codebase by directly inheriting from Seastar’s implementation, eliminating duplication in ScyllaDB.
Although this update is purely a refactoring effort and does not introduce functional changes, it should be ported back to 2025.3 and 2025.4; otherwise, future backports of bugfixes/improvements related to `s3_client` will be nearly impossible.
ref: https://github.com/scylladb/seastar/issues/2803
depends on: https://github.com/scylladb/seastar/pull/2960
Closes scylladb/scylladb#25801
* github.com:scylladb/scylladb:
s3_client: remove unnecessary `co_await` in `make_request`
s3 cleanup: remove obsolete retry-related classes
s3_client: remove unused `filler_exception`
s3_client: fix indentation
s3_client: simplify chunked download error handling using `make_request`
s3_client: reformat `make_request` functions for readability
s3_client: eliminate duplication in `make_request` by using overload
s3_client: reformat `make_request` function declarations for readability
s3_client: reorder `make_request` and helper declarations
s3_client: add `make_request` override with custom retry and error handler
s3_client: migrate s3_client to Seastar HTTP client
s3_client: fix crash in `copy_s3_object` due to dangling stream
s3_client: coroutinize `copy_s3_object` response callback
aws_error: handle missing `unexpected_status_error` case
s3_creds: use Seastar HTTP client with retry strategy
retry_strategy: add exponential backoff to `default_aws_retry_strategy`
retry_strategy: introduce Seastar-based retry strategy
retry_strategy: update CMake and configure.py for new strategy
retry_strategy: rename `default_retry_strategy` to `default_aws_retry_strategy`
retry_strategy: fix include
retry_strategy: Copied utils/s3/retry_strategy.hh to utils/s3/default_aws_retry_strategy.hh
retry_strategy: Copied utils/s3/retry_strategy.cc to utils/s3/default_aws_retry_strategy.cc
Introduce helper functions that can be used for garbage collecting old
CDC streams for tablets-based keyspaces.
Add a background fiber to the topology coordinator that runs
periodically and checks for old CDC streams for tablets keyspaces that
can be garbage collected.
The garbage collection works by finding the newest CDC timestamp that has been
closed for more than the configured CDC TTL, and removing all information from
the CDC internal tables about CDC timestamps and streams up to this timestamp.
In general it should be safe to remove information about these streams because
they have been closed for more than the TTL, therefore all rows that were written
to these streams with the configured TTL should be dead.
The exception is if the TTL is altered to a smaller value; then we may remove
information about streams that still have live rows that were written with the longer TTL.
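A hedged sketch of the timestamp-selection criterion in Python; the real logic is C++ in the topology coordinator, and the names and data shapes here are assumptions:

```python
from datetime import datetime, timedelta, timezone

def newest_collectable_timestamp(closed_at_by_timestamp, ttl_seconds, now=None):
    """closed_at_by_timestamp maps a CDC generation timestamp to the time its
    stream set was closed. Returns the newest timestamp whose streams have
    been closed for longer than the CDC TTL; everything up to and including
    it can be removed from the CDC internal tables."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=ttl_seconds)
    eligible = [ts for ts, closed_at in closed_at_by_timestamp.items()
                if closed_at <= cutoff]
    return max(eligible) if eligible else None
```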
Fixes https://github.com/scylladb/scylladb/issues/26669
Closes scylladb/scylladb#26410
* github.com:scylladb/scylladb:
cdc: garbage collect CDC streams periodically
cdc: helpers for garbage collecting old streams for tablets
Currently when a null vector is passed to an ANN query we fail with a
quite confusing error ("NoHostAvailable: ('Unable to complete the
operation against any hosts', {<Host: 127.0.0.1:9042 datacenter1>:
<Error from server: code=0000 [Server error] message="to_bytes() called
on raw value that is null">})").
This patch fixes that by throwing an InvalidRequestException with an
appropriate message instead.
We also add a test case that validates this behavior.
Fixes: VECTOR-257
Closes scylladb/scylladb#26510
Fixes #26641
* Adds a shared abstraction for dockerized mock services for our pytests (not using the python docker library, due to both the library and podman)
* Adds test fixtures for our key providers (except GCS KMS, for which we have no mock server) to do local testing
* Ports (and prunes and sharpens) the test cases from dtest::encryption_at_rest_test to our pytest.
* Shares the KMIP mock between boost test and pytest and speeds up boost test shutdown.
When merged, the dtest counterpart can be decommissioned.
Closes scylladb/scylladb#26642
* github.com:scylladb/scylladb:
test::cluster::object_store::conftest: Make GS proxy use shared docker mock server wrapper
test::cluster::test_encryption: Port dtest EAR tests
test::cluster::conftest: Add key_provider fixture
test::pylib::encryption_provider: Port dtest encryption provider classes
test::pylib::dockerized_service: Add helper for running docker/podman
test::pylib::kmip_wrapper: Modify to be usable by pytest fixtures
test::boost::kmip_wrapper: Move python script for PyKMIP to pylib