Commit Graph

49943 Commits

Author SHA1 Message Date
Nikos Dragazis
bafe2bbbbc db/config: Deprecate sstable_compression_dictionaries_allow_in_ddl
The option is a knob that allows rejecting dictionary-aware compressors
in the validation stage of CREATE/ALTER statements, and in the
validation of `sstable_compression_user_table_options`. It was
introduced in 7d26d3c7cb to allow the admins of Scylla Cloud to
selectively enable it in certain clusters. For more details, check:
https://github.com/scylladb/scylla-enterprise/issues/5435

As of this series, we want to start offering dictionary compression as
the default option in all clusters, i.e., treat it as a generally
available feature. This makes the knob redundant.

Additionally, making dictionary compression the default choice in
`sstable_compression_user_table_options` creates an awkward dependency
with the knob (disabling the knob should cause
`sstable_compression_user_table_options` to fall back to a non-dict
compressor as default). That may not be very clear to the end user.

For these reasons, mark the option as "Deprecated", remove all relevant
tests, and adjust the business logic as if dictionary compression is
always available.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
(cherry picked from commit 96e727d7b9)
2025-11-04 15:40:46 +02:00
Nikos Dragazis
260c9972b0 boost/cql_query_test: Get expected compressor from config
Since 5b6570be52, the default SSTable compression algorithm for user
tables is no longer hardcoded; it can be configured via the
`sstable_compression_user_table_options.sstable_compression` option in
scylla.yaml.

Modify the `test_table_compression` test to get the expected value from
the configuration.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
(cherry picked from commit d95ebe7058)
2025-10-31 23:50:20 +00:00
Pavel Emelyanov
9459a58116 Merge '[Backport 2025.4] cdc: improve cdc metadata loading' from Scylladb[bot]
When loading CDC streams metadata for tablets from the tables, read only
new entries from the history table instead of reading all entries. This
improves the CDC metadata reloading, making it more efficient and
predictable.

The CDC metadata is loaded as part of group0 reload whenever the
internal CDC tables are modified. On tablet split / merge, we create a
new CDC timestamp and streams by writing them to the cdc_streams_history
table by group0 operation, and when it's applied we reload the in-memory
CDC streams map by reading from the tables and constructing the updated map.

Previously, on every update, we would read all
cdc_streams_history entries for the changed table, constructing all its
streams and creating a new map from scratch.

We improve this now by reading only new entries from cdc_streams_history
and appending them to the existing map. We can do this because we only
append new entries to cdc_streams_history with a higher timestamp than all
previous entries.

This makes this reloading more efficient and predictable, because
previously we would read a number of entries that depends on the number
of tablet splits and merges, which increases over time and is
unbounded, whereas now we read only a single stream set on each update.

Fixes https://github.com/scylladb/scylladb/issues/26732

backport to 2025.4 where cdc with tablets is introduced

- (cherry picked from commit 8743422241)

- (cherry picked from commit 4cc0a80b79)

Parent PR: #26160

Closes scylladb/scylladb#26798

* github.com:scylladb/scylladb:
  test: cdc: extend cdc with tablets tests
  cdc: improve cdc metadata loading
2025-10-30 10:32:27 +03:00
Michael Litvak
59f97d0b71 test: cdc: extend cdc with tablets tests
Extend and improve the tests of virtual tables for cdc with tablets.
Split the existing virtual tables test into one test that validates the
virtual tables against the internal cdc tables while triggering some
tablet splits in order to create entries in the cdc_streams_history
table, and add another test with basic validation of the virtual tables
when there are multiple cdc tables.

(cherry picked from commit 4cc0a80b79)
2025-10-30 02:44:47 +00:00
Michael Litvak
0a07c2cb19 cdc: improve cdc metadata loading
When loading CDC streams metadata for tablets from the tables, read only
new entries from the history table instead of reading all entries. This
improves the CDC metadata reloading, making it more efficient and
predictable.

The CDC metadata is loaded as part of group0 reload whenever the
internal CDC tables are modified. On tablet split / merge, we create a
new CDC timestamp and streams by writing them to the cdc_streams_history
table by group0 operation, and when it's applied we reload the in-memory
CDC streams map by reading from the tables and constructing the updated map.

Previously, on every update, we would read all
cdc_streams_history entries for the changed table, constructing all its
streams and creating a new map from scratch.

We improve this now by reading only new entries from cdc_streams_history
and appending them to the existing map. We can do this because we only
append new entries to cdc_streams_history with a higher timestamp than all
previous entries.

This makes this reloading more efficient and predictable, because
previously we would read a number of entries that depends on the number
of tablet splits and merges, which increases over time and is
unbounded, whereas now we read only a single stream set on each update.

Fixes scylladb/scylladb#26732

(cherry picked from commit 8743422241)
2025-10-30 02:44:47 +00:00
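The incremental reload described in 0a07c2cb19 can be illustrated with a minimal Python sketch. It is not the ScyllaDB implementation: `StreamsMap`, `HistoryRow` and `reload_incremental` are hypothetical names, and the history table is modelled as an in-memory list. The only property taken from the commit message is that new history entries always carry a higher timestamp than all previous ones.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-in for a cdc_streams_history row: (timestamp, stream ids).
HistoryRow = Tuple[int, List[str]]


@dataclass
class StreamsMap:
    # timestamp -> stream set; entries are only ever appended.
    by_timestamp: Dict[int, List[str]] = field(default_factory=dict)

    def last_timestamp(self) -> int:
        # -1 means "nothing loaded yet" in this sketch.
        return max(self.by_timestamp, default=-1)

    def reload_incremental(self, history: List[HistoryRow]) -> int:
        """Append only rows newer than anything already loaded; return how many."""
        last = self.last_timestamp()
        added = 0
        # A real reader would push the "ts > last" filter into the query
        # instead of scanning the whole history as this sketch does.
        for ts, streams in history:
            if ts > last:
                self.by_timestamp[ts] = list(streams)
                added += 1
        return added
```

Because entries are append-only with increasing timestamps, the existing map never needs to be rebuilt; each reload adds only the entries written since the previous one.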
Pavel Emelyanov
080c55a115 lister: Fix race between readdir and stat
Sometimes file::list_directory() returns entries without the type set. In
that case the lister calls file_type() on the entry name to get it. If
the call returns a disengaged type, the code assumes that some error
occurred and resolves into an exception.

That's not correct. The file_type() method returns a disengaged type only
if the file being inspected is missing (i.e. on ENOENT errno). But this
can validly happen if a file is removed between readdir and stat. In
that case it's not "some error happened"; the entry should just be
skipped. If some error did happen, file_type() would resolve into an
exceptional future on its own.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#26595

(cherry picked from commit d9bfbeda9a)

Closes scylladb/scylladb#26767
2025-10-29 11:29:57 +02:00
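The readdir/stat race fixed in 080c55a115 is a general filesystem pattern. The following Python sketch shows the same idea with the standard library; it is an analogy, not the Seastar lister code, and `list_entries` is a hypothetical name.

```python
import os
import stat
from typing import Iterator, Tuple


def list_entries(path: str) -> Iterator[Tuple[str, bool]]:
    """Yield (name, is_dir) per entry, skipping entries removed mid-scan."""
    for name in os.listdir(path):
        try:
            st = os.lstat(os.path.join(path, name))
        except FileNotFoundError:
            # The entry vanished between the directory read and the stat.
            # That is a benign race, so skip it instead of failing the listing.
            continue
        # Any other OSError (e.g. EACCES) still propagates to the caller.
        yield name, stat.S_ISDIR(st.st_mode)
```

Only the "file is missing" case is swallowed; every other error still surfaces, which mirrors the distinction the commit message draws between ENOENT and a genuine failure.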
Anna Stuchlik
93de570e33 doc: add --list-active-releases to Web Installer
Fixes https://github.com/scylladb/scylladb/issues/26688

V2 of https://github.com/scylladb/scylladb/pull/26687

Closes scylladb/scylladb#26689

(cherry picked from commit bd5b966208)

Closes scylladb/scylladb#26765
2025-10-29 11:28:51 +02:00
Patryk Jędrzejczak
680bfa9ab7 test: test_raft_recovery_stuck: reconnect driver after rolling restarts
It turns out that #21477 wasn't sufficient to fix the issue. The driver
may still decide to reconnect the connection after `rolling_restart`
returns. One possible explanation is that the driver sometimes handles
the DOWN notification after all nodes consider each other UP.

Reconnecting the driver after restarting nodes seems to be a reliable
workaround that many tests use. We also use it here.

Fixes #19959

Closes scylladb/scylladb#26638

(cherry picked from commit 5321720853)

Closes scylladb/scylladb#26763
2025-10-29 11:27:49 +02:00
Anna Stuchlik
ed58815199 doc: add OS support for version 2025.4
Fixes https://github.com/scylladb/scylladb/issues/26450

Closes scylladb/scylladb#26616

(cherry picked from commit 6fa342fb18)

Closes scylladb/scylladb#26750
2025-10-29 11:27:08 +02:00
Botond Dénes
9c8812a154 Merge '[Backport 2025.4] LWT: use shards_ready_for_reads for replica locks' from Scylladb[bot]
When a tablet is migrated between shards on the same node, during the write_both_read_new state we begin switching reads to the new shard. Until the corresponding global barrier completes, some requests may still use write_both_read_old erm, while others already use the write_both_read_new erm. To ensure mutual exclusion between these two types of requests, we must acquire locks on both the old and new shards. Once the global barrier completes, no requests remain on the old shard, so we can safely switch to acquiring locks only on the new shard.

The idea came from the similar locking problem in the [counters for tablets PR](https://github.com/scylladb/scylladb/pull/26636#discussion_r2463932395).

Fixes scylladb/scylladb#26727

backport: need to backport to 2025.4

- (cherry picked from commit 5ab2db9613)

- (cherry picked from commit 478f7f545a)

Parent PR: #26719

Closes scylladb/scylladb#26748

* github.com:scylladb/scylladb:
  paxos_state: use shards_ready_for_reads
  paxos_state: inline shards_for_writes into get_replica_lock
2025-10-29 11:26:29 +02:00
Botond Dénes
aac49601c6 Merge '[Backport 2025.4] cdc: garbage collect CDC streams for tablets' from Scylladb[bot]
Introduce helper functions that can be used for garbage collecting old
cdc streams for tablets-based keyspaces.

Add a background fiber to the topology coordinator that runs
periodically and checks for old CDC streams for tablets keyspaces that
can be garbage collected.

The garbage collection works by finding the newest cdc timestamp that has been
closed for more than the configured cdc TTL, and removing all information from
the cdc internal tables about cdc timestamps and streams up to this timestamp.

In general it should be safe to remove information about these streams because
they have been closed for more than the TTL, therefore all rows that were written to these streams
with the configured TTL should be dead.
The exception is if the TTL is altered to a smaller value; then we may remove information
about streams that still have live rows that were written with the longer TTL.

Fixes https://github.com/scylladb/scylladb/issues/26669

- (cherry picked from commit 440caeabcb)

- (cherry picked from commit 6109cb66be)

Parent PR: #26410

Closes scylladb/scylladb#26728

* github.com:scylladb/scylladb:
  cdc: garbage collect CDC streams periodically
  cdc: helpers for garbage collecting old streams for tablets
2025-10-29 11:25:31 +02:00
Asias He
89364d3576 repair: Remove the regular mode name in the tablet repair api
The patch e34deb72f9 (repair: Rename incremental mode name)
missed one place that references the removed regular mode name.

Fixes #26503

Closes scylladb/scylladb#26660

(cherry picked from commit 5f1febf545)

Closes scylladb/scylladb#26684
2025-10-29 11:22:56 +02:00
Anna Stuchlik
68ea778b6b doc: add support for Debian 12
Fixes https://github.com/scylladb/scylladb/issues/26640

Closes scylladb/scylladb#26668

(cherry picked from commit 9c0ff7c46b)

Closes scylladb/scylladb#26681
2025-10-29 11:22:29 +02:00
Botond Dénes
087f739bf9 Merge '[Backport 2025.4] alternator/executor: instantly mark view as built when creating it with base table' from Scylladb[bot]
When a `CreateTable` request creates a GSI/LSI together with the base table,
the base table is empty and we don't need to actually build the view.

In tablet-based keyspaces we can simply skip creating view building tasks
and mark the view build status as SUCCESS on all nodes. Then, the view
building worker on each node will mark the view as built in
`system.built_views` (`view_building_worker::update_built_views()`).

Vnode-based keyspaces will use the "old" logic of view builder, which
will process the view and mark it as built.

Fixes scylladb/scylladb#26615

This fix should be backported to 2025.4.

- (cherry picked from commit 8fbf122277)

- (cherry picked from commit bdab455cbb)

- (cherry picked from commit 34503f43a1)

Parent PR: #26657

Closes scylladb/scylladb#26670

* github.com:scylladb/scylladb:
  test/alternator/test_tablets: add test for GSI backfill with tablets
  test/alternator/test_tablets: add reproducer for GSI with tablets
  alternator/executor: instantly mark view as built when creating it with base table
2025-10-29 11:21:27 +02:00
Petr Gusev
332b776e87 paxos_state: use shards_ready_for_reads
Acquiring locks on both shards for the entire tablet migration period
is redundant. In most cases, locking only the old shard or only the new
shard is sufficient. Using shards_ready_for_reads reduces the
situations in which we need to lock both shards to:
* intra-node migrations only
* only during the write_both_read_new state
Once the global barrier completes in the write_both_read_new state, no
requests remain on the old shard, so we can safely acquire locks
only on the new shard.

Fixes scylladb/scylladb#26727

(cherry picked from commit 478f7f545a)
2025-10-28 16:59:47 +00:00
Petr Gusev
ff0e7ac853 paxos_state: inline shards_for_writes into get_replica_lock
No need to have two functions since both callers of get_replica_lock()
use shards_for_writes() to compute the shards where the locks
must be acquired.

Also while at it, inline the acquire() lambda in get_replica_lock()
and replace it with a loop over shards. This makes the code
more straightforward.

(cherry picked from commit 5ab2db9613)
2025-10-28 16:59:47 +00:00
Michael Litvak
5319759bdb cdc: garbage collect CDC streams periodically
Add a background fiber to the topology coordinator that runs
periodically and checks for old CDC streams for tablets keyspaces that
can be garbage collected.

(cherry picked from commit 6109cb66be)
2025-10-27 19:53:04 +00:00
Michael Litvak
55d9d5e7c2 cdc: helpers for garbage collecting old streams for tablets
Introduce helper functions that can be used for garbage collecting old
cdc streams for tablets-based keyspaces.

- get_new_base_for_gc: finds a new base timestamp given a TTL, such that
  all older timestamps and streams can be removed.
- get_cdc_stream_gc_mutations: given new base timestamp and streams,
  builds mutations that update the internal cdc tables and remove the
  older streams.
- garbage_collect_cdc_streams_for_table: combines the two functions
  above to find a new base and build mutations to update it for a
  specific table
- garbage_collect_cdc_streams: builds gc mutations for all cdc tables

(cherry picked from commit 440caeabcb)
2025-10-27 19:53:04 +00:00
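The base-selection step described for 55d9d5e7c2 can be sketched as follows. This is an illustration under stated assumptions, not the actual `get_new_base_for_gc`: the function name `new_base_for_gc`, the integer timestamps, and the rule that the stream set starting at `timestamps[i]` is closed at `timestamps[i + 1]` are all assumptions made for the example.

```python
from typing import List, Optional


def new_base_for_gc(timestamps: List[int], ttl: int, now: int) -> Optional[int]:
    """Return the oldest timestamp to keep, or None if nothing can be collected.

    Assumptions: timestamps are sorted ascending, and the stream set that
    starts at timestamps[i] is closed at timestamps[i + 1]. The newest set
    is still open and is never collected.
    """
    base = None
    for i in range(len(timestamps) - 1):
        closed_at = timestamps[i + 1]
        if now - closed_at > ttl:
            # Set i has been closed for longer than the TTL, so rows written
            # to it with that TTL have expired; the next set becomes the base.
            base = closed_at
        else:
            # Later sets were closed even more recently, so stop here.
            break
    return base
```

For example, with timestamps `[0, 100, 200, 300]`, a TTL of 50 and `now = 260`, the sets starting at 0 and 100 have been closed for more than the TTL, so the new base is 200. As the merge message notes, this reasoning stops being safe if the TTL was lowered after rows were written with a longer one.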
Jenkins Promoter
7f08d0a6cf Update ScyllaDB version to: 2025.4.0-rc4 2025-10-27 14:57:11 +02:00
Patryk Jędrzejczak
c406e1dd17 Merge '[Backport 2025.4] raft topology: fix group0 tombstone GC in the Raft-based recovery procedure' from Scylladb[bot]
Group0 tombstone GC considers only the current group 0 members
while computing the group 0 tombstone GC time. It's not enough
because in the Raft-based recovery procedure, there can be nodes
that haven't joined the current group 0 yet, but they have belonged
to a different group 0 and thus have a non-empty group 0 state ID.
The current code can cause a data resurrection in group 0 tables.

We fix this issue in this PR and add a regression test.

This issue was uncovered by `test_raft_recovery_entry_loss`, which
became flaky recently. We skipped this test for now. We will unskip
it in a following PR because it's skipped only on master, while we
want to backport this PR.

Fixes #26534

This PR contains an important bugfix, so we should backport it
to all branches with the Raft-based recovery procedure (2025.2
and newer).

- (cherry picked from commit 1d09b9c8d0)

- (cherry picked from commit 6b2e003994)

- (cherry picked from commit c57f097630)

Parent PR: #26612

Closes scylladb/scylladb#26682

* https://github.com/scylladb/scylladb:
  test: test group0 tombstone GC in the Raft-based recovery procedure
  group0_state_id_handler: remove unused group0_server_accessor
  group0_state_id_handler: consider state IDs of all non-ignored topology members
2025-10-27 10:15:49 +01:00
Avi Kivity
e85ab70054 Merge '[Backport 2025.4] tablet_metadata_guard: fix split/merge handling' from Petr Gusev
The guard should stop refreshing the ERM when the number of tablets changes. Tablet splits or merges invalidate the tablet_id field (_tablet), which means the guard can no longer correctly protect ongoing operations from tablet migrations.

The problem is specific to LWT, since tablet_metadata_guard is used mostly for heavy topology operations, which are mutually exclusive with split and merge. The guard was used for LWT as an optimization -- we don't need to block topology operations or migrations of unrelated tablets. In the future, we could use the guard for regular reads/writes as well (via the token_metadata_guard wrapper).

Fixes https://github.com/scylladb/scylladb/issues/26437

backports: need to backport to 2025.4 since the bug is relevant to LWT over tablets.

(cherry picked from commit e1667afa50)

(cherry picked from commit 6f4558ed4b)

(cherry picked from commit 64ba427b85)

(cherry picked from commit ec6fba35aa)

(cherry picked from commit b23f2a2425)

(cherry picked from commit 33e9ea4a0f)

(cherry picked from commit 03d6829783)

Parent PR: https://github.com/scylladb/scylladb/pull/26619

Closes scylladb/scylladb#26700

* github.com:scylladb/scylladb:
  test_tablets_lwt: add test_tablets_merge_waits_for_lwt
  test.py: add universalasync_typed_wrap
  tablet_metadata_guard: fix split/merge handling
  tablet_metadata_guard: add debug logs
  paxos_state: shards_for_writes: improve the error message
  storage_service: barrier_and_drain – change log level to info
  topology_coordinator: fix log message
2025-10-24 21:22:49 +03:00
Petr Gusev
41f8f6b571 test_tablets_lwt: add test_tablets_merge_waits_for_lwt
(cherry picked from commit 03d6829783)
2025-10-24 12:22:20 +02:00
Petr Gusev
31e4bb1bc3 test.py: add universalasync_typed_wrap
The universalasync.wrap function doesn't preserve the
type information, which confuses the VS Code Pylance
plugin and makes code navigation hard.

In this commit we fix the problem by adding a typed
wrapper around universalasync.wrap.

Fixes: scylladb/scylladb#26639
(cherry picked from commit 33e9ea4a0f)
2025-10-24 12:21:21 +02:00
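The idea behind a typed wrapper (31e4bb1bc3) can be sketched generically. The example below does not import universalasync; `untyped_wrap` is a hypothetical stand-in for any decorator that returns an untyped value, and `typed_wrap` is an assumed name, not the helper added to test.py.

```python
from typing import Any, TypeVar, cast

T = TypeVar("T")


def untyped_wrap(obj: Any) -> Any:
    """Hypothetical stand-in for an untyped decorator; returns obj unchanged."""
    return obj


def typed_wrap(obj: T) -> T:
    """Apply the untyped wrapper but keep the static type of the argument."""
    # cast() has no runtime effect; it only tells type checkers that the
    # wrapped object should be treated as having the original type.
    return cast(T, untyped_wrap(obj))


class Client:
    async def ping(self) -> str:
        return "pong"


# Type checkers and editors still see this as the Client class.
TypedClient = typed_wrap(Client)
```

The trade-off is that the declared type describes the original object, so the sketch is only sound when the wrapper preserves the interface that callers rely on.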
Petr Gusev
be94aab207 tablet_metadata_guard: fix split/merge handling
The guard should stop refreshing the ERM when the number of tablets
changes. Tablet splits or merges invalidate the tablet_id field
(_tablet), which means the guard can no longer correctly protect
ongoing operations from tablet migrations.

Fixes scylladb/scylladb#26437

(cherry picked from commit b23f2a2425)
2025-10-24 12:21:21 +02:00
Petr Gusev
a5be65785c tablet_metadata_guard: add debug logs
(cherry picked from commit ec6fba35aa)
2025-10-24 12:21:21 +02:00
Petr Gusev
5720dd52b8 paxos_state: shards_for_writes: improve the error message
Add the current token and tablet info, remove 'this_shard_id'
since it's always written by the logging infrastructure.

(cherry picked from commit 64ba427b85)
2025-10-24 12:21:21 +02:00
Petr Gusev
aa2021888c storage_service: barrier_and_drain – change log level to info
Debugging global barrier issues is difficult without these logs.
Since barriers do not occur frequently, increasing the log level should not produce excessive output.

(cherry picked from commit 6f4558ed4b)
2025-10-24 12:21:21 +02:00
Petr Gusev
a09c1b355e topology_coordinator: fix log message
(cherry picked from commit e1667afa50)
2025-10-24 12:21:21 +02:00
Pawel Pery
67e0c8e4b0 vector_search: fix flaky dns_refresh_aborted test
The test proceeds as follows:
- run a long dns refresh process
- request resolution of a hostname with a short abort_source timer; the
  result should be an empty list because the request is aborted

The test sometimes finishes the long dns refresh before the abort_source fires,
and the result list is not empty.

There are two issues. First, as.reset() changes the abort_source timeout. The
patch adds a get() method to the abort_source_timeout class, so there is no
change in the abort_source timeout. Second, a sleep may not be reliable. The
patch replaces the long sleep inside the dns refresh lambda with
condition_variable handling, to properly signal the end of the dns refresh
process.

Fixes: #26561
Fixes: VECTOR-268

It needs to be backported to 2025.4

Closes scylladb/scylladb#26566

(cherry picked from commit 10208c83ca)

Closes scylladb/scylladb#26598
2025-10-23 11:24:32 +02:00
Piotr Dulikowski
03d57bae80 Merge '[Backport 2025.4] storage_proxy: wait for write handlers destruction' from Scylladb[bot]
`shared_ptr<abstract_write_response_handler>` instances are captured in the `lmutate` and `rmutate` lambdas of `send_to_live_endpoints()`. As a result, an `abstract_write_response_handler` object may outlive its removal from the `storage_proxy::_response_handlers` map -> `cancel_all_write_response_handlers()` doesn't actually wait for requests completion -> `sp::drain_on_shutdown()` doesn't guarantee all requests are drained -> `sp::stop_remote()` completes too early and `paxos_store` is destroyed while LWT local writes might still be in progress. In this PR we introduce a `write_handler_destroy_promise` to wait for such pending instances in `cancel_write_handlers()` and `cancel_all_write_response_handlers()` to prevent the `use-after-free`.

A better long-term solution might be to replace `shared_ptr` with `unique_ptr` for `abstract_write_response_handler` and use a separate gate to track the `lmutate/rmutate` lambdas. We do not actually need to wait for these lambdas to finish before sending a timeout or error response to the client, as we currently do in `~abstract_write_response_handler`.

Fixes scylladb/scylladb#26355

backport: need to be backported to 2025.4 since #26355 is reproduced on LWT over tablets

- (cherry picked from commit bf2ac7ee8b)

- (cherry picked from commit b269f78fa6)

- (cherry picked from commit bbcf3f6eff)

- (cherry picked from commit 8925f31596)

Parent PR: #26408

Closes scylladb/scylladb#26658

* github.com:scylladb/scylladb:
  test_tablets_lwt: add test_lwt_shutdown
  storage_proxy: wait for write handler destruction
  storage_proxy: coroutinize cancel_write_handlers
  storage_proxy: cancel_write_handlers: don't hold a strong pointer to handler
2025-10-23 10:49:52 +02:00
Patryk Jędrzejczak
76560ca095 test: test group0 tombstone GC in the Raft-based recovery procedure
We add a regression test for the bug fixed in the previous commits.

(cherry picked from commit c57f097630)
2025-10-22 17:13:34 +00:00
Patryk Jędrzejczak
8a11535a12 group0_state_id_handler: remove unused group0_server_accessor
It became unused in the previous commit.

(cherry picked from commit 6b2e003994)
2025-10-22 17:13:34 +00:00
Patryk Jędrzejczak
d727a086c5 group0_state_id_handler: consider state IDs of all non-ignored topology members
It's not enough to consider only the current group 0 members. In the
Raft-based recovery procedure, there can be nodes that haven't joined
the current group 0 yet, but they have belonged to a different group 0
and thus have a non-empty group 0 state ID.

We fix this issue in this commit by considering topology members
instead.

We don't consider ignored nodes as an optimization. When some nodes are
dead, the group 0 state ID handler won't have to wait until all these
nodes leave the cluster. It will only have to wait until all these nodes
are ignored, which happens at the beginning of the first
removenode/replace. As a result, tombstones of group 0 tables will be
purged much sooner.

We don't rename the `group0_members` variable to keep the change
minimal. There seems to be no precise and succinct name for the used set
of nodes anyway.

We use `std::ranges::join_view` in one place because:
- `std::ranges::concat` will become available in C++26,
- `boost::range::join` is not a good option, as there is an ongoing
  effort to minimize external dependencies in Scylla.

(cherry picked from commit 1d09b9c8d0)
2025-10-22 17:13:34 +00:00
Andrei Chekun
d1274f01aa test.py: rewrite the wait_for_first_completed
Rewrite wait_for_first_completed to return only the first completed task, with a
guarantee that all cancelled and finished tasks are awaited.
Use wait_for_first_completed to avoid falsely passing tests in the future and issues
like #26148.
Use gather_safely to await tasks and remove the warning that a coroutine was
not awaited.

Closes scylladb/scylladb#26435

(cherry picked from commit 24d17c3ce5)

Closes scylladb/scylladb#26663
scylla-2025.4.0-rc3-candidate-20251023065946 scylla-2025.4.0-rc3
2025-10-22 18:12:52 +02:00
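The behaviour described for d1274f01aa (return the first completed task, then cancel and await the rest) can be sketched with plain asyncio. This is an illustrative version with the hypothetical name `first_completed`, not the helper from test.py, and it does not use `gather_safely`.

```python
import asyncio
from typing import Any, Awaitable, Iterable


async def first_completed(aws: Iterable[Awaitable[Any]]) -> Any:
    """Return the result of the first task to finish; cancel and await the rest."""
    tasks = [asyncio.ensure_future(a) for a in aws]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    if pending:
        # Await the cancelled tasks so none is left dangling. With
        # return_exceptions=True their CancelledError is collected as a
        # result instead of being raised here.
        await asyncio.gather(*pending, return_exceptions=True)
    # Several tasks may finish in the same loop iteration; pick the one
    # that comes first in submission order and re-raise its exception, if any.
    winner = next(t for t in tasks if t in done)
    return winner.result()
```

Awaiting the cancelled tasks matters: a task that is cancelled but never awaited can still be running cleanup code when the caller moves on, which is one way a test can appear to pass while work is still in flight.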
Michael Litvak
aa2065fe2e storage_service: improve colocated repair error to show table names
When requesting repair for tablets of a colocated table, the request
fails with an error. Improve the error message to show the table names
instead of table IDs, because the table names are more useful for users.

Fixes scylladb/scylladb#26567

Closes scylladb/scylladb#26568

(cherry picked from commit b808d84d63)

Closes scylladb/scylladb#26624
2025-10-22 15:25:15 +02:00
Asias He
5c7eb2ac61 repair: Fix uuid and nodes_down order in the log
Fixes #26536

Closes scylladb/scylladb#26547

(cherry picked from commit 33bc1669c4)

Closes scylladb/scylladb#26630
2025-10-22 14:25:18 +02:00
Tomasz Grabiec
0621a8aee5 Merge '[Backport 2025.4] Synchronize tablet split and load-and-stream' from Scylladb[bot]
Load-and-stream is broken when running concurrently with the finalization step of tablet split.

Consider this:
1) split starts
2) split finalization executes barrier and succeeds
3) load-and-stream runs now, starts writing sstable (pre-split)
4) split finalization publishes changes to tablet metadata
5) load-and-stream finishes writing sstable
6) sstable cannot be loaded since it spans two tablets

Two possible fixes (maybe both):

1) load-and-stream waits for topology to quiesce
2) perform split compaction on sstable that spans both sibling tablets

This patch implements # 1. By waiting for topology to quiesce,
we guarantee that load-and-stream only starts when there's no
chance the coordinator is handling some topology operation like
split finalization.

Fixes https://github.com/scylladb/scylladb/issues/26455.

- (cherry picked from commit 3abc66da5a)

- (cherry picked from commit 4654cdc6fd)

Parent PR: #26456

Closes scylladb/scylladb#26651

* github.com:scylladb/scylladb:
  test: Add reproducer for l-a-s and split synchronization issue
  sstables_loader: Synchronize tablet split and load-and-stream
2025-10-22 14:23:04 +02:00
Jenkins Promoter
10db3f7c85 Update ScyllaDB version to: 2025.4.0-rc3
2025-10-22 14:11:52 +03:00
Michał Jadwiszczak
f6dde0aa4b test/alternator/test_tablets: add test for GSI backfill with tablets
The test should pass without the fix for scylladb/scylladb#26615,
because the `executor::update_table()` uses
`service::prepare_new_view_announcement()`, which creates view building
tasks for the view.

But it's better to add this test.

(cherry picked from commit 34503f43a1)
2025-10-22 10:51:55 +00:00
Michał Jadwiszczak
207c273b29 test/alternator/test_tablets: add reproducer for GSI with tablets
(cherry picked from commit bdab455cbb)
2025-10-22 10:51:54 +00:00
Michał Jadwiszczak
6df48aacd7 alternator/executor: instantly mark view as built when creating it with base table
When a `CreateTable` request creates a GSI/LSI together with the base table,
the base table is empty and we don't need to actually build the view.

In tablet-based keyspaces we can simply skip creating view building tasks
and mark the view build status as SUCCESS on all nodes. Then, the view
building worker on each node will mark the view as built in
`system.built_views` (`view_building_worker::update_built_views()`).

Vnode-based keyspaces will use the "old" logic of view builder, which
will process the view and mark it as built.

Fixes scylladb/scylladb#26615

(cherry picked from commit 8fbf122277)
2025-10-22 10:51:54 +00:00
Pavel Emelyanov
45341ca246 Merge '[Backport 2025.4] s3_client: handle failures which require http::request updating' from Scylladb[bot]
Apply two main changes to the s3_client error handling:
1. Add a loop to s3_client's `make_request` for the case when the retry strategy will not help because the request itself has to be updated, for example on authentication token expiration or a stale timestamp in the request header.
2. Refine the way we handle exceptions in the `chunked_download_source` background fiber: we now carry the original `exception_ptr`, and we wrap EVERY exception in `filler_exception` to prevent the retry strategy from retrying the request altogether.

Fixes: https://github.com/scylladb/scylladb/issues/26483

Should be ported back to 2025.3 and 2025.4 to prevent deadlocks and failures in these versions

- (cherry picked from commit 55fb2223b6)

- (cherry picked from commit db1ca8d011)

- (cherry picked from commit 185d5cd0c6)

- (cherry picked from commit 116823a6bc)

- (cherry picked from commit 43acc0d9b9)

- (cherry picked from commit 58a1cff3db)

- (cherry picked from commit 1d34657b14)

- (cherry picked from commit 4497325cd6)

- (cherry picked from commit fdd0d66f6e)

Parent PR: #26527

Closes scylladb/scylladb#26650

* github.com:scylladb/scylladb:
  s3_client: tune logging level
  s3_client: add logging
  s3_client: improve exception handling for chunked downloads
  s3_client: fix indentation
  s3_client: add max for client level retries
  s3_client: remove `s3_retry_strategy`
  s3_client: support high-level request retries
  s3_client: just reformat `make_request`
  s3_client: unify `make_request` implementation
2025-10-22 11:33:53 +03:00
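The first change in 45341ca246 (a client-level loop for failures that require rebuilding the request) can be sketched in Python. Everything here is hypothetical: `RequestExpired`, the `build` and `send` callables, and the default of three attempts are assumptions for illustration, not the s3_client API.

```python
from typing import Callable, Dict


class RequestExpired(Exception):
    """Hypothetical error: the request itself (token, timestamp) is stale."""


def make_request(
    build: Callable[[], Dict[str, int]],
    send: Callable[[Dict[str, int]], str],
    max_attempts: int = 3,
) -> str:
    """Send a request, rebuilding it from scratch when it has gone stale."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        # Rebuild on every attempt so credentials and timestamps are fresh;
        # resending the same bytes would fail in exactly the same way.
        request = build()
        try:
            return send(request)
        except RequestExpired:
            if attempt == max_attempts:
                raise
    raise AssertionError("unreachable")
```

The cap on attempts corresponds to the "add max for client level retries" commit in the series; without it, a request that can never be made valid would loop forever.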
Piotr Dulikowski
1efb2eb174 view_building_worker: access tablet map through erm on sstable discovery
Currently, the data returned by `database::get_tables_metadata()` and
`database::get_token_metadata()` may not be consistent. Specifically,
the tables metadata may contain some tablet-based tables before their
tablet maps appear in the token metadata. This is going to be fixed
after issue scylladb/scylladb#24414 is closed, but for the time being
work around it by accessing the token metadata via
`table`->effective_replication_map() - that token metadata is guaranteed
to have the tablet map of the `table`.

Fixes: scylladb/scylladb#26403

Closes scylladb/scylladb#26588

(cherry picked from commit f76917956c)

Closes scylladb/scylladb#26631
2025-10-22 11:33:22 +03:00
Pavel Emelyanov
320ef84367 Merge '[Backport 2025.4] compaction/twcs: fix use after free issues' from Scylladb[bot]
The `compaction_strategy_state` class holds strategy specific state via
a `std::variant` containing different state types. When a compaction
strategy performs compaction, it retrieves a reference to its state from
the `compaction_strategy_state` object. If the table's compaction
strategy is ALTERed while a compaction is in progress, the
`compaction_strategy_state` object gets replaced, destroying the old
state. This leaves the ongoing compaction holding a dangling reference,
resulting in a use after free.

Fix this by using `seastar::shared_ptr` for the state variant
alternatives (`leveled_compaction_strategy_state_ptr` and
`time_window_compaction_strategy_state_ptr`). The compaction strategies
now hold a copy of the shared_ptr, ensuring the state remains valid for
the duration of the compaction even if the strategy is altered.

The `compaction_strategy_state` itself is still passed by reference and
only the variant alternatives use shared_ptrs. This allows ongoing
compactions to retain ownership of the state independently of the
wrapper's lifetime.

The method `maybe_wait_for_sstable_count_reduction()`, when retrieving
the list of sstables for a possible compaction, holds a reference to the
compaction strategy. If the strategy is updated during execution, it can
cause a use after free issue. To prevent this, hold a copy of the
compaction strategy so it isn’t yanked away during the method’s
execution.

Fixes #25913

Issue probably started after 9d3755f276, so backport to 2025.4

- (cherry picked from commit 1cd43bce0e)

- (cherry picked from commit 35159e5b02)

- (cherry picked from commit 18c071c94b)

Parent PR: #26593

Closes scylladb/scylladb#26625

* github.com:scylladb/scylladb:
  compaction: fix use after free when strategy is altered during compaction
  compaction/twcs: pass compaction_strategy_state to internal methods
  compaction_manager: hold a copy to compaction strategy in maybe_wait_for_sstable_count_reduction
2025-10-22 11:32:28 +03:00
Petr Gusev
01658f9fcb test_tablets_lwt: add test_lwt_shutdown
(cherry picked from commit 8925f31596)
2025-10-22 00:10:59 +00:00
Petr Gusev
e56f14b9c5 storage_proxy: wait for write handler destruction
shared_ptr<abstract_write_response_handler> instances are captured in
the lmutate/rmutate lambdas of send_to_live_endpoints(). As a result,
an abstract_write_response_handler object may outlive its removal from
the _response_handlers map. We use write_handler_destroy_promise to
wait for such pending instances in cancel_write_handlers() and
cancel_all_write_response_handlers() to prevent use-after-free.

A better long-term solution might be to replace shared_ptr with
unique_ptr for abstract_write_response_handler and use a separate gate
to track the lmutate/rmutate lambdas. We do not actually need to wait
for these lambdas to finish before sending a timeout or error response
to the client, as we currently do in ~abstract_write_response_handler.

Fixes scylladb/scylladb#26355

(cherry picked from commit bbcf3f6eff)
2025-10-22 00:10:59 +00:00
Petr Gusev
5865dad0c9 storage_proxy: coroutinize cancel_write_handlers
The cancel_write_handlers() method was assumed to be called in a thread
context, likely because it was first used from gossiper events, where a
thread context already existed. Later, this method was reused in
abort_view_writes() and abort_batch_writes(), where threads are created
on the fly and appear redundant.

The drain_on_shutdown() method also used a thread, justified by some
"delicate lifetime issues", but it is unclear what that actually means.
It seems that a straightforward co_await should work just fine.

(cherry picked from commit b269f78fa6)
2025-10-22 00:10:59 +00:00
Petr Gusev
388dfbe3ee storage_proxy: cancel_write_handlers: don't hold a strong pointer to handler
A strong pointer was held for the duration of thread::yield(),
preventing abstract_write_response_handler destruction and possibly
delaying the sending of timeout or error responses to the client.

This commit removes the strong pointer. Instead, we compute the
next iterator before calling timeout_cb(), so if the handler is
destroyed inside timeout_cb(), we already have a valid next iterator.

(cherry picked from commit bf2ac7ee8b)
2025-10-22 00:10:59 +00:00
Raphael S. Carvalho
92a603699e test: Add reproducer for l-a-s and split synchronization issue
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 4654cdc6fd)
2025-10-21 12:26:55 +00:00
Raphael S. Carvalho
d998d9d418 sstables_loader: Synchronize tablet split and load-and-stream
Load-and-stream is broken when running concurrently with the
finalization step of tablet split.

Consider this:
1) split starts
2) split finalization executes barrier and succeeds
3) load-and-stream runs now, starts writing sstable (pre-split)
4) split finalization publishes changes to tablet metadata
5) load-and-stream finishes writing sstable
6) sstable cannot be loaded since it spans two tablets

Two possible fixes (maybe both):

1) load-and-stream waits for topology to quiesce
2) perform split compaction on sstable that spans both sibling tablets

This patch implements #1. By waiting for topology to quiesce,
we guarantee that load-and-stream only starts when there's no
chance the coordinator is handling some topology operation like
split finalization.

Fixes #26455.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 3abc66da5a)
2025-10-21 12:26:54 +00:00