There are now four of these, and they all interpret the value parameter
the same way (though it is named differently).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Both values are in fact db::config named values. They are observed by,
respectively, the compaction manager and the stream manager: when a value
changes, the observer kicks the corresponding scheduling group's
update_io_bandwidth() method.
Although the managers reference these values, there is no way to update
them other than by updating the config's named values themselves.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
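As a rough illustration of the observer wiring described above, here is a minimal self-contained C++ sketch. It is not the real Scylla API: db::config, utils::observable, the scheduling group type and the update_io_bandwidth() signature all differ, so the plain-callback named_value and scheduling_group_stub below are illustrative stand-ins only.
```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <utility>
#include <vector>

// Stand-in for a config "named value" that notifies observers on change.
template <typename T>
class named_value {
    T _value{};
    std::vector<std::function<void(const T&)>> _observers;
public:
    void set(T v) {
        _value = std::move(v);
        for (auto& cb : _observers) {
            cb(_value);                       // notify every registered observer
        }
    }
    void observe(std::function<void(const T&)> cb) {
        _observers.push_back(std::move(cb));
    }
    const T& operator()() const { return _value; }
};

// Stand-in for the scheduling group whose bandwidth is updated.
struct scheduling_group_stub {
    void update_io_bandwidth(uint64_t bps) {
        std::cout << "io bandwidth set to " << bps << " bytes/s\n";
    }
};

int main() {
    named_value<uint64_t> compaction_throughput_mb_per_sec;
    scheduling_group_stub compaction_group;

    // The manager only observes; the value itself is changed via the config.
    compaction_throughput_mb_per_sec.observe([&] (const uint64_t& mb) {
        compaction_group.update_io_bandwidth(mb * 1024 * 1024);
    });

    compaction_throughput_mb_per_sec.set(50);  // e.g. updated through the config
}
```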
Some endpoints in the config block will need to actually _update_ values on
the config (see the next patches for why), and a const reference stands in the way.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In order to get the stream throughput, the API will need the stream_manager.
In order to set the stream throughput, the API will need db::config so it can
update the corresponding named value on it.
Given that, move the endpoints to the relevant blocks.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In order to update the compaction throughput, the API needs to update the
db::config value, so the endpoint in question should sit in the block
that has db::config at hand.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The operator() of named_value prints the allowed values on error, and those
can be a vector, so ranges formatting support needs to be available there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
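As a minimal sketch of why ranges formatting is needed at that call site, assuming the error message is produced with fmt (the option values below are made up for illustration):
```cpp
// Formatting a std::vector with fmt requires <fmt/ranges.h>;
// without that header the fmt::print call below does not compile.
#include <fmt/ranges.h>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> allowed{"none", "lz4", "zstd"};  // hypothetical allowed values
    fmt::print("invalid value; allowed values are {}\n", allowed);
}
```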
* seastar 72c7ac575...3133ecdd6 (12):
> util/backtrace: Optimize formatter to reduce memory allocation overhead
> scheduler: Report long queue stall
> log: drop specialization of boost::lexical_cast for log_level
> stall-detector: Remove unused _stall_detector_reports_per_minute
> Merge 'when_all: add Sentinel support to when_all_succeed() ' from Kefu Chai
> scripts/perftune.py: Implement AWS IMDSv2 call
> net/tls: Add a way to disable certificate validation
> tests: Improve websocket parser tests
> scripts/stall-analyser: improve error messages on invalid input
> reserve-memory: document that seastar just doesnt use the reserves
> Merge 'Minor metrics memory optimizations' from Stephan Dollberg
> json_formatter: Add support for standard range containers
Closes scylladb/scylladb#21869
Store the endpoint IP address together with the client (note that it may be
different from the address the client is connected to, in case the
preferred address differs). This allows us to drop the lookup in the
address map, which may eventually fail if the endpoint was already
deleted.
Fixes: scylladb/scylladb#21840
Message-ID: <Z1mpMMe-o0ggBU_F@scylladb.com>
"
The series moves node ops, repair and streaming verbs to IDL. It also
contains IDL-related cleanups.
In addition to CI, this was tested manually by bootstrapping a node with the
series into a cluster of old nodes, with repair and streaming both in
gossiper and raft mode. This exercises the repair, streaming and node_ops
paths.
"
* 'gleb/move-more-rpcs-to-idl-v3' of github.com:scylladb/scylla-dev:
repair: repair_flush_hints_batchlog_request::target_nodes is not used any more, so mark it as such
streaming: move streaming verbs to IDL
messaging_service: move repair verbs to IDL
node_ops: move node_ops_cmd to IDL
idl: rename partition_checksum.dist.hh to repair.dist.hh
idl: move node_ops related stuff from the repair related IDL
This change goes through locator::topology to use node&
instead of node* where nullptr is not possible. There are
places where the node object is used in an unordered_set; in
those cases the node is wrapped in std::reference_wrapper.
Fixes scylladb/scylladb#20357
Closes scylladb/scylladb#21863
Reads which need the sstable index were computing
column_values_fixed_lengths each time. This showed up in a perf profile
for an sstable-read-heavy workload, and amounted to about 1-2% of time.
Computing it involves type name parsing.
Avoid this by using a cached per-sstable mapping. There is already
sstable::_column_translation which can be used for this. It caches the
mapping for the most recently used schema. Since the cursor uses the
mapping only for primary key columns, which are stable, any schema
will do, so we can use the last _column_translation. We only need to
make sure that it's always armed, so sstable loading is augmented with
arming with sstable's schema.
Also, fixes a potential use-after-free on schema in column_translation.
Closes scylladb/scylladb#21347
* github.com:scylladb/scylladb:
sstables: Fix potential use-after-free on column_translation::column_info::name
sstables: Avoid computing column_values_fixed_lengths on each read
Replace manual subrange advancement with the more concise and readable
`subrange.advance()` method. This change:
- Eliminates unnecessary subrange instance creation
- Improves code readability
- Reduces potential for unnecessary object allocation
- Leverages the built-in `advance()` method for cleaner iterator handling
The modification simplifies the iteration logic while maintaining the
same functional behavior.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21865
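For reference, a small standalone example of the pattern this change applies; the vector and offsets are throwaway, but subrange::advance() is the standard member in question:
```cpp
#include <iostream>
#include <ranges>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5, 6};
    auto sr = std::ranges::subrange(v.begin(), v.end());

    // Before: skip consumed elements by constructing a fresh subrange:
    //   sr = std::ranges::subrange(sr.begin() + 2, sr.end());
    // After: advance the existing subrange in place.
    sr.advance(2);

    for (int x : sr) {
        std::cout << x << ' ';  // prints: 3 4 5 6
    }
    std::cout << '\n';
}
```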
This commit removes the information about the recommended way of upgrading
ScyllaDB images - by updating ScyllaDB and OS packages in one step. This upgrade
procedure is not supported (it was implemented, but then reverted).
Refs https://github.com/scylladb/scylladb/issues/15733
Closes scylladb/scylladb#21876
Replace `dht/sharder.hh` with a "smaller" header, which provides
just enough dependencies.
In f744007e, we traded `database.hh` for a smaller set of headers,
but it turns out `dht/sharder.hh` can be replaced with an even smaller
one, because `dht::sharder` is defined in `dht/token-sharding.hh`, and
what we need from `dht/sharder.hh` is just this class's declaration.
`clang-include-cleaner` identified this issue.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21881
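A schematic of the include-hygiene idea behind this change; the header paths follow the commit text, while route_token() is a hypothetical consumer added only for illustration:
```cpp
// Before: #include "dht/sharder.hh" pulled in the full header just to name the type.
// After: include dht/token-sharding.hh (which defines dht::sharder), or, when a
// declaration is all the translation unit needs, forward-declare the class.
namespace dht {
class sharder;  // a declaration is enough for references and pointers
}

// Hypothetical consumer: compiles against the declaration alone.
void route_token(const dht::sharder& s);

int main() {}  // nothing to run; the point is what the translation unit needs
```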
Update the service level cache in the node startup sequence, after the
service level and auth service are initialized.
The cache update depends on the service level data accessor being set
and the auth service being initialized. Before this commit, it could
happen that a cache update was not triggered after initialization. The
commit adds an explicit call to update the cache where it is guaranteed
to be ready.
Fixes scylladb/scylladb#21763
Closes scylladb/scylladb#21773
Test cases related to #21826:
1. test_remove_failure_with_no_normal_token_owners_in_dc: attempts to
remove a node with another node down in the datacenter, leaving
no normal token owners in that dc (reproducing #21826).
Removenode is expected to fail in this case since it
should have no place to rebuild the removed node replicas,
yet it currently succeeds unexpectedly.
2. test_remove_failure_then_replace: verify that removenode
fails as expected when there are not enough nodes to
rebuild its replicas on, with and without additional zero-token nodes.
3. test_replace_with_no_normal_token_owners_in_dc: verify that
nodes can be replaced in a datacenter that has no live
token owners, with and without additional zero-token nodes.
Tablet replace uses all replicas to rebuild the lost replicas
and therefore should succeed in the edge case.
The restored data is verified as well.
Refs #21826
* New tests, no backport needed
Closes scylladb/scylladb#21827
* github.com:scylladb/scylladb:
topology_custom/test_tablets: add remove/replace tests for edge cases
test: pylib: _cluster_remove_node: log message on successful paths
test: pylib: _cluster_remove_node: mark server as removed only when removenode succeeded
Test cases related to #21826:
1. test_remove_failure_with_no_normal_token_owners_in_dc: attempts to
remove a node with another node down in the datacenter, leaving
no normal token owners in that dc (reproducing #21826).
Removenode is expected to fail in this case since it
should have no place to rebuild the removed node replicas,
yet it currently succeeds unexpectedly.
2. test_remove_failure_then_replace: verify that removenode
fails as expected when there are not enough nodes to
rebuild its replicas on, with and without additional zero-token nodes.
3. test_replace_with_no_normal_token_owners_in_dc: verify that
nodes can be replaced in a datacenter that has no live
token owners, with and without additional zero-token nodes.
Tablet replace uses all replicas to rebuild the lost replicas
and therefore should succeed in the edge case.
The restored data is verified as well.
Refs #21826
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, truncating a table works by issuing an RPC to all the nodes, which call `database::truncate_table_on_all_shards()`; this makes sure that older writes are dropped.
It works with tablets, but is not safe: a concurrent replication process may bring back old data.
This change makes TRUNCATE TABLE a topology operation, so that it is mutually exclusive with other processes in the system which could interfere with it. More specifically, it makes TRUNCATE a global topology request.
Backporting is not needed.
Fixes #16411
Closes scylladb/scylladb#19789
* github.com:scylladb/scylladb:
docs: docs: topology-over-raft: Document truncate_table request
storage_proxy: fix indentation and remove empty catch/rethrow
test: add tests for truncate with tablets
storage_proxy: use new TRUNCATE for tablets
truncate: make TRUNCATE a global topology operation
storage_service: move logic of wait_for_topology_request_completion()
RPC: add truncate_with_tablets RPC with frozen_topology_guard
feature_service: added cluster feature for system.topology schema change
system.topology_requests: change schema
storage_proxy: propagate group0 client and TSM dependency
The everywhere replication strategy returns a zero host id in the replica set
instead of the real one if no tokens are configured yet in the token metadata.
This worked because the code that translates ids to ips knows that the zero
host id is a special one, so putting zero there was equivalent to allowing
local access. But now we use host ids directly, so we need to return the real
host id here to allow local access before the token metadata is populated.
Message-ID: <Z1hBHsEo4wYzzgvJ@scylladb.com>
New logs allow us to easily distinguish two cases in which
waiting for apply times out:
- the node didn't receive the entry it was waiting for,
- the node received the entry but didn't apply it in time.
Distinguishing these cases simplifies reasoning about failures.
The first case indicates that something went wrong on the leader.
The second case indicates that something went wrong on the node
on which waiting for apply timed out.
As it turns out, many different bugs result in the `read_barrier`
(which calls `wait_for_apply`) timeout. This change should help
us in debugging bugs like these.
We want to backport this change to all supported branches so that
it helps us in all tests.
Closes scylladb/scylladb#21855
When the tablet scheduler drains nodes, it chooses the target location based
on a "badness" metric: nodes with the lowest score are preferred. Before the
patch, the score used was the number of tablets on that node
post-movement. This way we populate the least-loaded node first. But this
works only if nodes have an equal number of shards. If nodes have different
capacities, then the number of tablets is not a good metric, because we don't
aim to equalize the per-node count, but the per-shard count. We assume that
each shard has equal capacity.
Because of this bug, during decommission, the nodes with fewer shards
would be preferred to receive replicas, which may lead to overloading
of those nodes. This imbalance would be later fixed by the normal load
balancing logic, but it's still problematic.
Fixes #21783
Closes scylladb/scylladb#21860
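A toy calculation to make the metric change concrete; the node sizes below are invented, not taken from the patch:
```cpp
#include <iostream>

struct node {
    const char* name;
    int shards;
    int tablets;  // tablets the node would hold after the candidate movement
};

int main() {
    node a{"A", 16, 160};
    node b{"B", 4, 60};
    // Old score (total tablets): prefers B (60 < 160) even though B is smaller.
    // New score (tablets per shard): prefers A (10 < 15), respecting capacity.
    for (auto n : {a, b}) {
        std::cout << n.name << ": total=" << n.tablets
                  << " per-shard=" << double(n.tablets) / n.shards << '\n';
    }
}
```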
Log a message when removenode succeeded as expected
or when it failed as expected with the `expected_error`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, we call server_mark_removed also when removenode
fails with the `expected_error`; in that case the function returns success,
but the server is not supposed to be in a removed state.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
TRUNCATE TABLE saves the current commit log replay positions in case there is a crash, so that replay knows where to begin replaying the mutations. These are collected and saved per shard into `system.truncated`. In case a shard received no mutations, its replay position will be an empty, default-constructed object of type `db::replay_position` with its members set to 0. Truncate will incorrectly interpret these empty replay positions as if they were coming from shard 0, and save them as such, potentially overwriting a valid replay position coming from the actual shard 0. In the case of a crash, this will cause the commit log on shard 0 to be replayed from the beginning, and result in data resurrection.
Fixes #21719
Closes scylladb/scylladb#21722
* github.com:scylladb/scylladb:
test: add test for truncate saving replay positions
database: correctly save replay position for truncate
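A simplified sketch of the failure mode and the fix, assuming a made-up replay_position with an explicit shard field; the real db::replay_position layout and the persistence path through system.truncated differ:
```cpp
#include <cstdint>
#include <iostream>
#include <map>

struct replay_position {
    uint32_t shard = 0;  // default-constructed: shard 0, pos 0
    uint64_t pos = 0;
    bool empty() const { return shard == 0 && pos == 0; }
};

int main() {
    std::map<uint32_t, replay_position> saved;  // per-shard positions to persist

    replay_position from_shard0{0, 12345};  // real position recorded on shard 0
    replay_position from_idle_shard{};      // a shard with no writes yields an empty position

    saved[from_shard0.shard] = from_shard0;
    // Bug: keying the empty position by its (default) shard field clobbers shard 0.
    //   saved[from_idle_shard.shard] = from_idle_shard;
    // Fix: skip empty positions instead of misattributing them to shard 0.
    if (!from_idle_shard.empty()) {
        saved[from_idle_shard.shard] = from_idle_shard;
    }

    std::cout << "shard 0 replay pos: " << saved[0].pos << '\n';  // stays 12345
}
```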
On startup, if a server reads an sstable that belongs to a tablet that
doesn't have any local replica, it throws an error in the following
format and refuses to start:
```
Storage wasn't found for tablet 1 of table test.test
```
This patch updates the code path to throw a nicer error that includes
the sstable name that caused the problem.
This patch also adds a testcase to verify the error being thrown.
Fixes https://github.com/scylladb/scylladb/issues/18038
PR improves an error message - no need to backport.
Closes scylladb/scylladb#21805
* github.com:scylladb/scylladb:
replica/table: fix indent in compaction_group_for_sstable
replica/table: improve error message when encountering orphaned sstables
Replace boost::make_iterator_range() with std::ranges::subrange.
This change modernizes the code and reduces external dependencies:
- Replace boost::make_iterator_range() with std::ranges::subrange
- Remove boost/range/iterator_range.hpp include
- Improve iterator type detection in interval.hh using std::ranges::const_iterator_t<Range>
This is part of ongoing efforts to modernize our codebase and minimize
external dependencies.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21787
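A before/after sketch of the substitution, with a throwaway vector standing in for the real ranges:
```cpp
#include <iostream>
#include <numeric>
#include <ranges>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5};

    // Before (Boost): auto r = boost::make_iterator_range(v.begin() + 1, v.end() - 1);
    // After (standard library):
    auto r = std::ranges::subrange(v.begin() + 1, v.end() - 1);

    std::cout << std::accumulate(r.begin(), r.end(), 0) << '\n';  // prints 9
}
```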
One of the run_when_memory_available() checks mirrors the one done by the
execution_permitted() helper, so it's worth re-using it. Since the former
helper is a header template, the latter is worth moving to a header too.
And, once re-used, the `bool blocking` variable becomes redundant, and
the `if (blocking)` check can also be expressed with fewer LOCs.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#21812
Since Fedora 38 is EOL and Fedora 39 comes with fmt v10.0.0, and since
we've switched to the build image based on Fedora 40, which ships
fmt-devel v10.2.1, there is no need to support fmt < 10.
In this change, we drop support for fmt < 10.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21847
Artifact tests have been failing since the switch to the native nodetool, because ScyllaDB doesn't leave any IOCBs for tools. On some setups it will consume all of them and then nodetool and any other native app will refuse to start because it will fail to allocate IOCBs.
This PR fixes this by making use of the freshly introduced `--reserve-io-control-blocks` seastar option, to reserve IOCBs for tool applications. Since the `linux-aio` and `epoll` reactor backends require quite a bit of these, we enable the `io_uring` reactor backend and switch tools to use this backend instead. The `io_uring` reactor backend needs just 2 IOCBs to function, so the reserve of 10 IOCBs set up in this PR is good for running 5 tool applications in parallel, which should be more than enough.
Fixes: https://github.com/scylladb/scylladb/issues/19185
The problem this PR fixes has a manual workaround (and is rare to begin with), no backport needed.
Closes scylladb/scylladb#21527
* github.com:scylladb/scylladb:
main: configure a reserve IOCB for scylla-nodetool and friends
configure: enable the io_uring backend
main: use configure seastar defaults via app_template::seastar_options
Use raw string literals to prevent syntax warnings when using regular
expressions with backslash-based patterns.
The original code triggered a SyntaxWarning in developer mode (`python3 -Xdev`)
due to unescaped backslash characters in regex patterns like '\s'. While
CPython typically interprets these silently, strict Python parsing modes
raise warnings about potentially unintended escape sequences.
This change adds the `r` prefix to string literals containing regex patterns,
ensuring consistent behavior across different Python runtime configurations
and eliminating unnecessary syntax warnings like:
```
/opt/scylladb/scripts/libexec/scylla_io_setup:41: SyntaxWarning: invalid escape sequence '\s'
pattern = re.compile(_nocomment + r"CPUSET=\s*\"" + _reopt(_cpuset) + _reopt(_smp) + "\s*\"")
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21839
In the cleanup commit a840949ea0
a regression was introduced that caused backward-incompatible changes
to the gossiper application state name strings.
In commit e486e0f759 the value
`application_state::CDC_STREAMS_TIMESTAMP` was changed to
`application_state::CDC_GENERATION_ID`, but the name string
"CDC_STREAMS_TIMESTAMP" was kept for backward compatibility.
The cleanup commit a840949ea0, however, changed the name string to
"CDC_GENERATION_ID" by oversight (not noticing the difference), which
caused a backward-incompatible change.
Another case was also found where "IGNOR_MSB_BITS" (which has a typo,
missing the "E" in "IGNORE") was changed to "IGNORE_MSB_BITS"; this also
needs to be reverted to keep backward compatibility.
Fixes: scylladb/scylladb#21811
Closes scylladb/scylladb#21813
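A compressed sketch of the compatibility point: the enumerator may be renamed, but the serialized name string must stay what older nodes expect. The enum and mapping function below are simplified stand-ins for the gossiper code, not its actual definitions:
```cpp
#include <iostream>
#include <string_view>

enum class application_state {
    CDC_GENERATION_ID,  // renamed from CDC_STREAMS_TIMESTAMP
};

constexpr std::string_view name_of(application_state s) {
    switch (s) {
    case application_state::CDC_GENERATION_ID:
        // Keep the old wire string so mixed-version clusters still understand it.
        return "CDC_STREAMS_TIMESTAMP";
    }
    return "UNKNOWN";
}

int main() {
    std::cout << name_of(application_state::CDC_GENERATION_ID) << '\n';
}
```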
This patch adds the unit tests for truncate with tablets.
test_truncate_while_migration() triggers a tablet migration, then runs
a TRUNCATE TABLE for the table containing the tablet being migrated.
test_truncate_with_concurrent_drop() starts a truncate, then attempts to
drop the table while it is being truncated.
test_truncate_while_node_restart() validates the case where a replica
node is restarted while truncate is running.
test_truncate_with_coordinator_crash() validates that truncate is
correctly completed in cases where the topology coordinator has crashed
or restarted after the truncate session is cleared, but before the
truncate request is finalized.
This commit adds the code needed to create a TRUNCATE global topology
request. It also adds the handler for this request to the topology
coordinator.
The execution of the truncate operation is not canceled on a timeout,
but the query coordinator side will return a timeout error.
column_translation::state stores pointers to column names, which
are stable only as long as the schema_ptr is alive. The sstable object
caches the last used column_translation, and reuses
column_translation::state if the schema version matches. But this doesn't
guarantee that the schema object was not destroyed and recreated in
between. This can happen if the schema version expired in the registry and
then was pulled again from a different node via get_schema_for_read().
Spotted by reading the code.
Fix by storing the schema_ptr in column_translation. This can pin the old
schema in memory until a newer schema is used to read the sstable, or
until the sstable is compacted away. I think this shouldn't be a problem
in practice.
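A toy model of the lifetime issue and the fix, using std::shared_ptr in place of the real schema_ptr and raw string pointers in place of column_translation's internals:
```cpp
#include <iostream>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct schema {
    std::vector<std::string> column_names;
};
using schema_ptr = std::shared_ptr<const schema>;

struct column_translation {
    // Pointers into the schema's strings are valid only while the schema lives...
    std::vector<const std::string*> names;
    // ...so keep the schema alive for as long as the translation is cached.
    schema_ptr pinned_schema;

    static column_translation make(schema_ptr s) {
        column_translation ct;
        for (const auto& n : s->column_names) {
            ct.names.push_back(&n);
        }
        ct.pinned_schema = std::move(s);  // the fix: pin the schema
        return ct;
    }
};

int main() {
    auto s = std::make_shared<const schema>(schema{{"pk", "ck", "v"}});
    auto ct = column_translation::make(s);
    s.reset();  // the caller drops its reference; the cached translation stays safe
    std::cout << *ct.names[1] << '\n';  // prints "ck"
}
```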
Reads which need the clustering index cursor were computing
column_values_fixed_lengths each time. This showed up in a perf profile
for an sstable-read-heavy workload, and amounted to about 1%.
Avoid this by using a cached per-sstable mapping. There is already
sstable::_column_translation which can be used for this. It caches the
mapping for the most recently used schema. Since the cursor uses the
mapping only for primary key columns, which are stable, any schema
will do, so we can use the last _column_translation. We only need to
make sure that it's always armed, so sstable loading is augmented with
arming with sstable's schema.
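A minimal memoization sketch of the arming/caching described above; sstable_stub and the computed value are placeholders, not the real sstable API:
```cpp
#include <iostream>
#include <optional>
#include <vector>

struct fixed_lengths {
    std::vector<int> lengths;
};

fixed_lengths compute_column_values_fixed_lengths() {
    std::cout << "expensive computation (type name parsing)\n";
    return {{4, 8, 16}};
}

class sstable_stub {
    mutable std::optional<fixed_lengths> _cached;  // armed once, reused by every read
public:
    const fixed_lengths& column_values_fixed_lengths() const {
        if (!_cached) {
            _cached = compute_column_values_fixed_lengths();
        }
        return *_cached;
    }
};

int main() {
    sstable_stub sst;
    sst.column_values_fixed_lengths();  // computed once, e.g. when the sstable is loaded
    sst.column_values_fixed_lengths();  // subsequent reads hit the cache
}
```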
The function get_service_levels is used to retrieve all service levels
and it is called from multiple different contexts.
Importantly, it is called internally from the context of group0 state reload,
where it should be executed with a long timeout, similarly to other
internal queries, because a failure of this function affects the entire
group0 client, and a longer timeout can be tolerated.
The function is also called in the context of the user command LIST
SERVICE LEVELS, and perhaps other contexts, where a shorter timeout is
preferred.
The commit introduces a function parameter to indicate whether the
context is internal or not. For internal context, a long timeout is
chosen for the query. Otherwise, the timeout is shorter, the same as
before. When the distinction is not important, a default value is
chosen which maintains the same behavior.
The main purpose is to fix the case where the timeout is too short and causes
a failure that propagates and fails the group0 client.
Fixes scylladb/scylladb#20483
Closes scylladb/scylladb#21748
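A hedged sketch of the internal-vs-user distinction; the parameter shape and the timeout values below are invented for illustration and are not the actual signature or settings:
```cpp
#include <chrono>
#include <iostream>

enum class query_context { user, internal };

std::chrono::seconds timeout_for(query_context ctx) {
    // Internal callers (e.g. group0 state reload) tolerate, and need, a longer timeout.
    return ctx == query_context::internal ? std::chrono::seconds(300)
                                          : std::chrono::seconds(10);
}

void get_service_levels(query_context ctx = query_context::user) {
    std::cout << "querying with timeout " << timeout_for(ctx).count() << "s\n";
}

int main() {
    get_service_levels();                         // e.g. LIST SERVICE LEVELS
    get_service_levels(query_context::internal);  // e.g. group0 state reload
}
```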
Replace boost::join() with std::ranges::join_view() as an interim solution
before C++26's std::views::concat becomes available. This change:
- Reduces dependencies on the Boost Ranges library
- Moves closer to standard library implementations
- Improves code maintainability and future compatibility
This is part of ongoing efforts to modernize our codebase and minimize
external dependencies.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21786
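An interim-concatenation sketch along the lines described above, flattening an array of views with std::views::join until std::views::concat (C++26) is available; the vectors are throwaway data:
```cpp
#include <array>
#include <iostream>
#include <ranges>
#include <span>
#include <vector>

int main() {
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{4, 5};

    // Before (Boost): for (int x : boost::join(a, b)) { ... }
    // After: wrap both ranges as views and flatten the pair.
    auto both = std::array{std::span<const int>(a), std::span<const int>(b)};
    for (int x : both | std::views::join) {
        std::cout << x << ' ';  // prints: 1 2 3 4 5
    }
    std::cout << '\n';
}
```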