Tests sometimes fail in ScyllaCluster.add_server on the
'replaced_srv.host_id' line because host_id is not resolved yet. In
this commit we introduce functions try_get_host_id and get_host_id
that resolve it when needed.
Closesscylladb/scylladb#25177
This PR implements solution proposed in scylladb/scylladb#24481
Instead of terminating connections immediately, the shutdown now proceeds in two stages: first closing the receive (input) side to stop new requests, then waiting for all active requests to complete before fully closing the connections.
The updated shutdown process is as follows:
1. Initial Shutdown Phase
* Close the accept gate to block new incoming connections.
* Abort all accept() calls.
* For all active connections:
* Close only the input side of the connection to prevent new requests.
* Keep the output side open to allow responses to be sent.
2. Drain Phase
* Wait for all in-progress requests to either complete or fail.
3. Final Shutdown Phase
* Fully close all connections.
Fixesscylladb/scylladb#24481Closesscylladb/scylladb#24499
* https://github.com/scylladb/scylladb:
test: Set `request_timeout_on_shutdown_in_seconds` to `request_timeout_in_ms`, decrease request timeout.
generic_server: Two-step connection shutdown.
transport: consmetic change, remove extra blanks.
transport: Handle sleep aborted exception in sleep_until_timeout_passes
generic_server: replace empty destructor with `= default`
generic_server: refactor connection::shutdown to use `shutdown_input` and `shutdown_output`
generic_server: add `shutdown_input` and `shutdown_output` functions to `connection` class.
test: Add test for query execution during CQL server shutdown
This patch adds an xfailing test reproducing a bug where when adding
an IF NOT EXISTS to a INSERT JSON statement, the IF NOT EXISTS is
ignored.
This bug has been known for 4 years (issue #8682) and even has a FIXME
referring to it in cql3/statements/update_statement.cc, but until now
we didn't have a reproducing test.
The tests in this patch also show that this bug is specific to
INSERT JSON - regular INSERT works correctly - and also that
Cassandra works correctly (and passes the test).
Refs #8682
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#25244
* seastar 60b2e7da...7c32d290 (14):
> posix: Replace static_assert with concept
> tls: Push iovec with the help of put(vector<temporary_buffer>)
> io_queue: Narrow down friendship with reactor
> util: drop concepts.hh
> reactor: Re-use posix::to_timespec() helper
> Fix incorrect defaults for io queue iops/bandwidth
> net: functions describing ssl connection
> Add label values to the duplicate metrics exception
> Merge 'Nested scheduling groups (CPU only)' from Pavel Emelyanov
test: Add unit test for cross-sched-groups wakeups
test: Add unit test for fair CPU scheduling
test: Add unit test for basic supergrops manipulations
test: Add perf test for context switch latency
scheduling: Add an internal method to get group's supergroup
reactor: Add supergroup get_shares() API
reactor: Add supergroup::set_shares() API
reactor: Create scheduling groups in supergroups
reactor: Supergroups destroying API
reactor: Supergroups creating API
reactor: Pass parent pointer to task_queue from caller
reactor: Wakeup queue group on child activation
reactor: Add pure virtual sched_entity::run_tasks() method
reactor: Make task_queue_group be sched_entity too
reactor: Split task_queue_group::run_some_tasks()
reactor: Count and limit supergroup children
reactor: Link sched entity to its parent
reactor: Switch activate(task_queue*) to work on sched_entity
reactor: Move set_shares() to sched_entity()
reactor: Make account_runtime() work with sched_entity
reactor: Make insert_activating_task_queue() work on sched_entity
reactor: Make pop_active_task_queue() work on sched_entity
reactor: Make insert_active_task_queue() work on sched_entity
reactor: Move timings to sched_entity
reactor: Move active bit to sched_entity
reactor: Move shares to sched_entity
reactor: Move vruntime to sched_entity
reactor: Introduce sched_entity
reactor: Rename _activating_task_queues -> _activating
reactor: Remove local atq* variable
reactor: Rename _active_task_queues -> _active
reactor: Move account_runtime() to task_queue_group
reactor: Move vruntime update from task_queue into _group
reactor: Simplify task_queue_group::run_some_tasks()
reactor: Move run_some_tasks() into task_queue_group
reactor: Move insert_activating_task_queues() into task_queue_group
reactor: Move pop_active_task_queue() into task_queue_group
reactor: Move insert_active_task_queue() into task_queue_group
reactor: Introduce and use task_queue_group::activate(task_queue)
reactor: Introduce task_queue_group::active()
reactor: Wrap scheduling fields into task_queue_group
reactor: Simplify task_queue::activate()
reactor: Rename task_queue::activate() -> wakeup()
reactor: Make activate() method of class task_queue
reactor: Make task_queue::run_tasks() return bool
reactor: Simplify task_queue::run_tasks()
reactor: Make run_tasks() method of class task_queue
> Fix hang in io_queue for big write ioproperties numbers
> split random io buffer size in 2 options
> reactor: document run_in_background
> Merge 'Add io_queue unit test for checking request rates' from Robert Bindar
Add unit test for validating computed params in io_queue
Move `disk_params` and `disk_config_params` to their own unit
Add an overload for `disk_config_params::generate_config`
Closesscylladb/scylladb#25254
This small series fixes two small bugs in the "--release" feature of test/cqlpy/run and test/alternator/run, which allows a developer to run signle-node functional tests against any past release of Scylla. The two patches fix:
1. Allow "run --release" to be used when Scylla has not even been built from source.
2. Fix a mistake in choosing the most recent release when only a ".0" and RC releases are available. This is currently the case for the 2025.2 branch, which is why I discovered the bug now.
Fixes#25223
This patch only affects developer's experience if using the test/cqlpy/run script manually (these scripts are not used by CI), so should not be backported.
Closesscylladb/scylladb#25227
* https://github.com/scylladb/scylladb:
test/cqlpy: fix fetch_scylla.py for .0 releases
test/cqlpy: fix "run --release" when Scylla hasn't been built
When repairing a partition with many rows, we can store many fragments in a repair_row_on_wire object which is sent as a rpc stream message.
This could cause reactor stalls when the rpc stream compression is turned on, because the compression compresses the whole message without any split and compression.
This patch solves the problem at the higher level by reducing the message size that is sent to the rpc stream.
Tests are added to make sure the message split works.
Fixes#24808Closesscylladb/scylladb#25002
* github.com:scylladb/scylladb:
repair: Avoid too many fragments in a single repair_row_on_wire
repair: Change partition_key_and_mutation_fragments to use chunked_vector
utils: Allow chunked_vector::erase to work with non-default-constructible type
When running compactions are aborted by the aforementioned helper, in logs there appear a line like
"Compaction for ks/cf was stopped due to: user-triggered operation". This message could've been better, since it may indicate several distinct reasons described with the same "user-triggered operation".
With this PR the message will help telling "truncate", "cleanup", "rewrite" and "split" from each other.
Closesscylladb/scylladb#25136
* https://github.com/scylladb/scylladb:
compaction: Pass "reason" to perform_task_on_all_files()
compaction: Pass "reason" to run_with_compaction_disabled()
compaction: Pass "reason" to stop_and_disable_compaction()
Instead of using lambda, pass pointer to struct member. The result is
the same, but the code is nicer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#25123
decrease request timeout.
In debug mode, queries may sometimes take longer than the default 30 seconds.
To address this, the timeout value `request_timeout_on_shutdown_in_seconds`
during tests is aligned with other request timeouts.
Change request timeout for tests from 180s to 90s since we must keep the request
timeout during shutdown significantly lower than the graceful shutdown timeout(2m),
or else a request timeout would cause a graceful shutdown timeout and fail a test.
Add in docs/alternator/compatibility.md a mention of the ShardFilter
option which we don't support in Alternator Streams. This option was
only introduced to DynamoDB a week ago, so it's not surprising we
don't yet support it :-)
Refs #25160Closesscylladb/scylladb#25161
Documentation had outdated information how to run C++ test.
Additionally, some information added about gathered test metrics.
Closesscylladb/scylladb#25180
We're providing additional information in error messages when throwing
an exception related to data corruption: when a segment is truncated
and when it's content is invalid. That might prove helpful when debugging.
Closesscylladb/scylladb#25190
This PR adds the upgrade guide from version 2025.2 to 2025.3.
Also, it removes the upgrade guide existing for the previous version
that is irrelevant in 2025.2 (upgrade from 2025.1 to 2025.2).
Note that the new guide does not include the "Enable Consistent Topology Updates" page and note,
as users upgrading to 2025.3 have consistent topology updates already enabled.
Fixes https://github.com/scylladb/scylladb/issues/24696Closesscylladb/scylladb#25219
std::enable_if is obsolete and was replaced with concepts
and constraint.
Replace the std::is_fundamental_v enable_if constraint with
std::integral. The latter is more accurate - std::ntoh()
is not defined for floats, for example. In any case, we only
read integrals in commitlog.
Closesscylladb/scylladb#25226
Unless the client uses the SKIP_METADATA flag,
Scylla attaches some metadata to query results returned to the CQL
client.
In particular, it attaches the spec (keyspace name, table
name, name, type) of the returned columns.
By default, the keyspace name and table name is present in each column
spec. However, since they are almost always the same for every column
(I can't think of any case when they aren't the same;
it would make sense if Cassandra supported joins, but it doesn't)
that's a waste.
So, as an optimization, the CQL protocol has the GLOBAL_TABLES_SPEC flag.
The flag can be set if all columns belong to the same table,
and if is set, then the keyspace and table name are only written
in the first column spec, and skipped in other column specs.
Scylla sets this flag, if appropriate, in responses to a PREPARE requests.
But it never sets the flag in responses to queries.
But it could. And this patch causes it to do that.
Fixes#17788Closesscylladb/scylladb#25205
A recent commit a0c29055e5 added
some trace printouts which print an std::reference_wrapper<>.
Apparently a formatter for this type was only added to fmt
in version 11.1.0, and it doesn't exist on earlier versions,
such as fmt 11.0.2 on Fedora 41.
Let's avoid requiring shiny-new versions of fmt. The workaround
is easy: just unwrap the reference_wrapper - print pr.get()
instead of just pr, and Scylla returns to building correctly on
Fedora 41.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#25228
Previously, `raft_group0::abort()` was called in `storage_service::do_drain` (introduced in #24418) to stop the group0 Raft server before destroying local storage. This was necessary because `raft::server` depends on storage (via `raft_sys_table_storage` and `group0_state_machine`).
However, this caused issues: services like `sstable_dict_autotrainer` and `auth::service`, which use `group0_client` but are not stopped by `storage_service`, could trigger use-after-free if `raft_group0` was destroyed too early. This can happen both during normal shutdown and when 'nodetool drain' is used.
This PR reworks the shutdown logic:
* Introduces `abort_and_drain()`, which aborts the server and waits for background tasks to finish, but keeps the server object alive. Clients will see `raft::stopped_error` if they try to access group0 after this method is called.
* Final destruction now happens in `abort_and_destroy()`, called later from `main.cc`, ensuring safe cleanup.
The `raft_server_for_group::aborted` is changed to a `shared_future`, as it is now awaited in both abort methods.
Node startup can fail before reaching `storage_service`, in which case `drain_on_shutdown()` and `abort_and_drain()` are never called. To ensure proper cleanup, `raft_group0` deinitialization logic must be included in both `abort_and_drain()` and `abort_and_destroy()`.
Refs #25115Fixes#24625
Backport: the changes are complicated and not safe to backport, we'll backport a revert of the original patch (#24418) in a separate PR.
Closesscylladb/scylladb#25151
* https://github.com/scylladb/scylladb:
raft_group0: split shutdown into abort_and_drain and destroy
Revert "main.cc: fix group0 shutdown order"
When repairing a partition with many rows, we can store many fragments
in a repair_row_on_wire object which is sent as a rpc stream message.
This could cause reactor stalls when the rpc stream compression is
turned on, because the compression compresses the whole message without
any split and compression.
This patch solves the problem at the higher level by reducing the
message size that is sent to the rpc stream.
Tests are added to make sure the message split works.
Fixes#24808
With the change in "repair: Avoid too many fragments in a single
repair_row_on_wire", the
std::list<frozen_mutation_fragment> _mfs;
in partition_key_and_mutation_fragments will not contain large number of
fragments any more. Switch to use chunked_vector.
The initial support for nested containers (2d2a2ef277) worked on
my machine (tm) and even laptop, but does not work on fresh installs.
This is likely due to changes in where persistent configuration is
stored on the host between various podman versions; even though my
podman is fully updated, it uses configuration created long ago.
Make nested containers work on fresh installs by also configuring
/etc/containers/storage.conf. The important piece is to set graphroot
to the same location as the host.
Verified both on my machine and on a fresh install.
Closesscylladb/scylladb#25156
Nowadays the way to configure an internal service is
1. service declares its config struct
2. caller (main/test/tool) fills the respective config with values it wants
3. the service is started with the config passed by value
The feature service code behaves likewise, but provides a helper method to create its config out of db::config. This PR moves this helper out of gms code, so that it doesn't mess with system-wide db::config and only needs its own small struct feature_config.
For the reference: similar changes with other services: #23705 , #20174 , #19166Closesscylladb/scylladb#25118
* github.com:scylladb/scylladb:
gms,init: Move get_disabled_features_from_db_config() from gms
code: Update callers generating feature service config
gms: Make feature_config a simple struct
gms: Split feature_config_from_db_config() into two
This commit:
- Extends the Drivers support table with information on which driver supports tablets
and since which version.
- Adds the driver support policy to the Drivers page.
- Reorganizes the Drivers page to accommodate the updates.
In addition:
- The CPP-over-Rust driver is added to the table.
- The information about Serverless (which we don't support) is removed
and replaced with tablets to correctly describe the contents of the table.
Fixes https://github.com/scylladb/scylladb/issues/19471
Refs https://github.com/scylladb/scylladb-docs-homepage/issues/69Closesscylladb/scylladb#24635
Compaction is routine and the log messages pollute the log files,
hiding important information.
All the data is available via `nodetool compactionhistory`.
Reduce noise by demoting those log messages to debug level.
One test is adjusted to use debug level for compaction, since it
listens for those messages.
Closesscylladb/scylladb#24949
The test/cqlpy/fetch_scylla.py script is used by test/cqlpy/run and
test/alternator/run to implement their "--release" option - which allows
you to run current tests against any official release of Scylla
downloaded from Scylla's S3 bucket.
When you ask to get release "2025.1", the idea is to fetch the latest
release available in the 2025.1 stream - currently it is 2025.1.5.
fetch_scylla.py does this by listing the available 2025.1 releases,
sorting them and fetching the last one.
We had a bug in the sort order - version 0 was sorted before version
0-rc1, which is incorrect (the version 2025.2.0 came after
2025.2.0~rc1).
For most releases this didn't cause any problem - 0~rc1 was sorted after
0, but 5 (for example) came after both, so 2025.1.5 got downloaded.
But when a release has **only** an rc and a .0 release, we incorrectly
used the rc instead of the .0.
This patch fixes the sort order by using the "/" character, which sorts
before "0", in rc version strings when sorting the release numbers.
Before this patch, we had this problem in "--release 2025.2" because
currently 2025.2 only has RC releases (rc0 and rc1) and a .0 release,
and we wrongly downloaded the rc1. After this patch, the .0 is chosen
as expected:
$ test/cqlpy/run --release 2025.2
Chosen download for ScyllaDB 2025.2: 2025.2.0
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The "--release" option of test/cqlpy/run can be used to run current
cqlpy tests against any official release of Scylla, which is
automatically downloaded from Scylla's S3 bucket. You should be
able to run tests like that even without having compiled Scylla
from source. But we had a bug, where test/cqlpy/run looked for
the built Scylla executable *before* parsing the "--release"
option, and this bug is fixed in this patch.
The Alternator version of the run script, test/alternator/run,
doesn't need to be fixed because it already did things in the
right order.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We're enabling the configuration option `rf_rack_valid_keyspaces`
in all Python test suites. All relevant tests have been adjusted
to work with it enabled.
That encompasses the following suites:
* alternator,
* broadcast_tables,
* cluster (already enabled in scylladb/scylladb@ee96f8dcfc),
* cql,
* cqlpy (already enabled in scylladb/scylladb@be0877ce69),
* nodetool,
* rest_api.
Two remaining suites that use tests written in Python, redis and scylla_gdb,
are not affected, at least not directly.
The redis suite requires creating an instance of Scylla manually, and the tests
don't do anything that could violate the restriction.
The scylla_gdb suite focuses on testing the capabilities of scylla-gdb.py, but
even then it reuses the `run` file from the cqlpy suite.
Fixesscylladb/scylladb#25126Closesscylladb/scylladb#24617
Commit ddc3b6dcf5 added a check of group0 state in
get_schema_for_write(), but group0 client can only be used on shard 0,
and get_schema_for_write() can be called on any shard, so we cannot use
_group0_client there directly. Move assert where we use another group0
function already where it is guarantied to run on shard 0.
Closesscylladb/scylladb#25204
This PR enables **LWT (Lightweight Transactions)** support for tablet-based tables by leveraging **colocated tables**.
Currently, storing Paxos state in system tables causes two major issues:
* **Loss of Paxos state during tablet migration or base table rebuilds**
* When a tablet is migrated or the base table is rebuilt, system tables don't retain Paxos state.
* This breaks LWT correctness in certain scenarios.
* Failing test cases demonstrating this:
* test_lwt_state_is_preserved_on_tablet_migration
* test_lwt_state_is_preserved_on_rebuild
* **Shard misalignment and performance overhead**
* Tablets may be placed on arbitrary shards by the tablet balancer.
* Accessing Paxos state in system tables could require a shard jump, degrading performance.
We move Paxos state into a dedicated Paxos table, colocated with the base table:
* Each base table gets its own Paxos state table.
* This table is lazily created on the first LWT operation.
* Its tablets are colocated with those of the base table, ensuring:
* Co-migration during tablet movement
* Co-rebuilding with the base table
* Shard alignment for local access to Paxos state
Some reasoning for why this is sufficient to preserve LWT correctness is discussed in [2].
This PR addresses two issues from the "Why doesn't it work for tablets" section in [1]:
* Tablet migration vs LWT correctness
* Paxos table sharding
Other issues ("bounce to shard" and "locking for intranode_migration") have already been resolved in previous PRs.
References
[1] - [LWT over tablets design](https://docs.google.com/document/d/1CPm0N9XFUcZ8zILpTkfP5O4EtlwGsXg_TU4-1m7dTuM/edit?tab=t.0#heading=h.goufx7gx24yu)
[2] - [LWT: Paxos state and tablet balancer](https://docs.google.com/document/d/1-xubDo612GGgguc0khCj5ukmMGgLGCLWLIeG6GtHTY4/edit?tab=t.0)
[3] - [Colocated tables PR](https://github.com/scylladb/scylladb/pull/22906#issuecomment-3027123886)
[4] - [Possible LWT consistency violations after a topology change](https://github.com/scylladb/scylladb/issues/5251)
Backport: not needed because this is a new feature.
Closesscylladb/scylladb#24819
* github.com:scylladb/scylladb:
create_keyspace: fix warning for tablets
docs: fix lwt.rst
docs: fix tablets.rst
alternator: enable LWT
random_failures: enable execute_lwt_transaction
test_tablets_lwt: add test_paxos_state_table_permissions
test_tablets_lwt: add test_lwt_for_tablets_is_not_supported_without_raft
test_tablets_lwt: test timeout creating paxos state table
test_tablets_lwt: add test_lwt_concurrent_base_table_recreation
test_tablets_lwt: add test_lwt_state_is_preserved_on_rebuild
test_tablets_lwt: migrate test_lwt_support_with_tablets
test_tablets_lwt: add test_lwt_state_is_preserved_on_tablet_migration
test_tablets_lwt: add simple test for LWT
check_internal_table_permissions: handle Paxos state tables
client_state: extract check_internal_table_permissions
paxos_store: handle base table removal
database: get_base_table_for_tablet_colocation: handle paxos state table
paxos_state: use node_local_only mode to access paxos state
query_options: add node_local_only mode
storage_proxy: handle node_local_only in query
storage_proxy: handle node_local_only in mutate
storage_proxy: introduce node_local_only flag
abstract_replication_strategy: remove unused using
storage_proxy: add coordinator_mutate_options
storage_proxy: rename create_write_response_handler -> make_write_response_handler
storage_proxy: simplify mutate_prepare
paxos_state: lazily create paxos state table
migration_manager: add timeout to start_group0_operation and announce
paxos_store: use non-internal queries
qp: make make_internal_options public
paxos_store: conditional cf_id filter
paxos_store: coroutinize
feature_service: add LWT_WITH_TABLETS feature
paxos_state: inline system_keyspace functions into paxos_store
paxos_state: extract state access functions into paxos_store
Otherwise, tablet rebuilt will be delayed for up to 60s, as the tablet
scheduler needs load stats for the new node (replacing) to make
decisisons.
Fixes#25163Closesscylladb/scylladb#25181
When shutting down in `generic_server`, connections are now closed in two steps.
First, only the RX (receive) side is shut down. Then, after all ongoing requests
are completed, or a timeout happened the connections are fully closed.
Fixesscylladb/scylladb#24481
In PR #23156, a new function `sleep_until_timeout_passes` was introduced
to wait until a read request times out or completes. However, the function
did not handle cases where the sleep is aborted via _abort_source, which
could result in WARN messages like "Exceptional future is ignored" during
shutdown.
This change adds proper handling for that exception, eliminating the warning.
This change improves logging and modifies the behavior to attempt closing
the output side of a connection even if an error occurs while closing the input side.
`connection` class.
The functions are just wrappers for _fd.shutdown_input() and _fd.shutdown_output(), with added error reporting.
Needed by later changes.
This test simulates a scenario where a query is being executed while
the query coordinator begins shutting down the CQL server and client
connections. The shutdown process should wait until the query execution
is either completed or timed out.
Test for scylladb/scylladb#24481
Whilst the coredump script checks for prerequisites, the user
experience is not ideal because you either have to go in the
script and get the list of deps and install them or wait for
the script to complain about lacking dependencies one by one.
This commit completes the list of dependencies in the
install script (some of them were already there for Fedora),
so you already have them installed by the time you
get to run the coredump script.
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
[avi:
- remove trailing whitespace
- regenerate frozen toolchain
Optimized clang binaries generated and stored in
https://devpkg.scylladb.com/clang/clang-20.1.8-Fedora-42-aarch64.tar.gzhttps://devpkg.scylladb.com/clang/clang-20.1.8-Fedora-42-x86_64.tar.gz
]
Closes#22369Closesscylladb/scylladb#25203
Unlike the currently-used sstable index files, BTI indexes don't store the entire partition keys. They only store prefixes of decorated keys, up to the minimum length needed to differentiate a key from its neighbours in the sstable. This saves space.
However, it means that a BTI index query might be off by one partition (on each end of the queried partition range) with respect to the optimal Data position.
For example, if the index stores prefixes `a`, `b`, `c`,
the index has no way to know if the first index entry after key `bb`
is `b` (which might correspond to `ba` as well as `bc`), or `c`.
So the index reader conservatively has to pick the wider Data range, and the Data reader must ignore the superfluous partitions. (And there's no way around that.)
Before this patch, the sstable reader expects the index query to return an exact (optimal) Data range. This patch adjusts the logic of the sstable reader to allow for inexact ranges.
Note: the patch is more complicated that it looks. The logic of the sstable reader was already fairly hard to follow and this adds even more flags, more weird special states and more edge cases. I think I managed to write a decent test and it did find three or four edge cases I wouldn't have noticed otherwise. I think it should cover all the added logic, but I didn't verify code coverage. (Do our scripts for that even work nowadays)? Simplification ideas are welcome.
Preparation for new functionality, no backporting needed.
Closesscylladb/scylladb#25093
* github.com:scylladb/scylladb:
sstables/index_reader: weaken some exactness guarantees in abstract_index_reader
test/boost: add a test for inexact index lookups
sstables/mx/reader: allow passing a custom index reader to the constructor
sstables/index_reader: remove advance_to
sstables/mx/reader: handle inexact lookups in `advance_context()`
sstables/mx/reader: handle inexact lookups in `advance_to_next_partition()`
sstables/index_reader: make the return value of `get_partition_key` optional
sstables/mx/reader: handle "backward jumps" in forward_to
sstables/mx/reader: filter out partitions outside the queried range
sstables/mx/reader: update _pr after `fast_forward_to`
As they are wasteful in many cases, it is better
to move the tablet_map if possible, or clone
it gently in an async fiber.
Add clone() and clone_gently() methods to
allow explicit copies.
* minor optimization, no backport needed
Closesscylladb/scylladb#24978
* github.com:scylladb/scylladb:
tablets: prevent accidental copy of tablets_map
locator: tablets: get rid of synchronous mutate_tablet_map
The `token_metadata_impl` stores the sorted tokens in an `std::vector`.
With a large number of nodes, the size of this vector can grow quickly,
and updating it might lead to oversized allocations.
This commit changes `_sorted_tokens` to a `chunked_vector` to avoid such
issues. It also updates all related code to use `chunked_vector` instead
of `std::vector`.
Fixes#24876
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#25027
A tablet repair started with /storage_service/repair_async/ API
bypasses tablet repair scheduler and repairs only the tablets
that are owned by the requested node. Due to that, to safely repair
the whole keyspace, we need to first disable tablet migrations
and then start repair on all nodes.
With the new API - /storage_service/tablets/repair -
tailored to tablet repair requirements, we do not need additional
preparation before repair. We may request it on one node in
a cluster only and, thanks to tablet repair scheduler,
a whole keyspace will be safely repaired.
Both nodetool and Scylla Manager have already started using
the new API to repair tablets.
Refuse repairing tablet keyspaces with /storage_service/repair_async -
403 Forbidden is returned. repair_async should still be used to repair
vnode keyspaces.
Fixes: https://github.com/scylladb/scylladb/issues/23008.
Breaking change; no backport.
Closesscylladb/scylladb#24678
* github.com:scylladb/scylladb:
repair: remove unused code
api: repair_async: forbid repairing tablet keyspaces
We improve logging in critical functions in hinted handoff
to capture more information about the behavior of the module.
That should help us in debugging sessions.
The logs should only be printed during more important events
and so they should not clog the log files.
Backport: not necessary.
Closesscylladb/scylladb#25031
* github.com:scylladb/scylladb:
db/hints/manager.cc: Add logs for changing host filter
db/hints: Increase log level in critical functions
The view builder uses group0 operations to coordinate view building, so
we should drain the view builder before stopping group0.
Fixesscylladb/scylladb#25096Closesscylladb/scylladb#25101
The init.hh contains some bits that only main.cc needs. Some of its
forward declarations are neede by neither the headers itself, nor the
main.cc that includes it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#25110