The test file contains two test cases, both testing
materialized view tombstone GC settings. With tablets the default
mode is "repair", which differs from vnodes.
The tests verify that the GC settings are not inherited. With
tablets, the GC settings are forced. This is indistinguishable from
inheriting, so the tests fail when run with tablets.
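The expected mode per deployment type can be sketched as a tiny helper (a hypothetical sketch: the function name is invented, and the vnode default of "timeout" is an assumption of this sketch):

```python
def default_mv_tombstone_gc(uses_tablets: bool) -> str:
    # With tablets, tombstone GC is forced to "repair"; with vnodes the
    # default is assumed here to be "timeout" (assumption of this sketch).
    return "repair" if uses_tablets else "timeout"
```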
In this change, tablet_virtual_task starts supporting tablet
migration, in addition to tablet repair. Both tablet operations
reuse the same virtual_task because their task data is retrieved
similarly. However, this changes nothing from the task manager
API users' perspective: they can list running migrations or check
their statuses just as if migration had its own virtual_task.
Users can see running migration tasks; finished tasks are not
presented by the task manager API. However, the result
of the migration (whether it succeeded or failed) is
presented to users if they use the wait API.
If a migration was reverted, it appears to users as failed.
We assume that a migration was reverted when its destination
does not contain a tablet replica.
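The revert-detection rule above amounts to a simple decision function; a minimal sketch with hypothetical names (not the actual C++ implementation):

```python
def migration_task_state(replicas: set, destination) -> str:
    # A finished migration is "done" if the tablet's replica set contains
    # the migration destination; otherwise we assume the migration was
    # reverted and report the task as "failed".
    return "done" if destination in replicas else "failed"
```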
Fixes: https://github.com/scylladb/scylladb/issues/21365.
No backport, new feature
Closes scylladb/scylladb#21729
* github.com:scylladb/scylladb:
test: boost: check migration_task_info in tablet_test.cc
replica: add repair related fields to tablet_map_to_mutation
test: add tests to check the failed migration virtual tasks
test: add tests to check the list of migration virtual tasks
test: add tests to check migration virtual tasks status
test: topology_tasks: generalize repair task functions
service: extend tablet_virtual_task::abort
service: extend tablet_virtual_task::wait
service: extend tablet_virtual_task::get_status_helper
service: extend tablet_virtual_task::contains
service: extend tablet_virtual_task::get_stats
service: tasks: make get_table_id a method of virtual_task_hint
service: tasks: extend virtual_task_hint
replica: service: add migration_task_info column to system.tablets
locator: extend tablet_task_info to cover migration tasks
locator: rename tablet_task_info methods
In Scylla, PRs can be either `closed` or `merged`. Based on that state we decide when to start the backport process if the label was added after the PR is closed (or merged).
In https://github.com/scylladb/scylladb/pull/21876, adding the proper backport label didn't trigger the backport automation. https://github.com/scylladb/scylladb/pull/21809/ caused this; we should have kept `state=closed` (which includes both closed and merged PRs).
This change fixes it.
Closes scylladb/scylladb#21906
* seastar 72c7ac575...3133ecdd6 (12):
> util/backtrace: Optimize formatter to reduce memory allocation overhead
> scheduler: Report long queue stall
> log: drop specialization of boost::lexical_cast for log_level
> stall-detector: Remove unused _stall_detector_reports_per_minute
> Merge 'when_all: add Sentinel support to when_all_succeed() ' from Kefu Chai
> scripts/perftune.py: Implement AWS IMDSv2 call
> net/tls: Add a way to disable certificate validation
> tests: Improve websocket parser tests
> scripts/stall-analyser: improve error messages on invalid input
> reserve-memory: document that seastar just doesnt use the reserves
> Merge 'Minor metrics memory optimizations' from Stephan Dollberg
> json_formatter: Add support for standard range containers
Closes scylladb/scylladb#21869
Store the endpoint IP address together with the client (note it may
differ from the address the client is connected to, in case the
preferred address is different). This allows us to drop the lookup in
the address map, which may eventually fail if an endpoint was already
deleted.
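The idea can be sketched as capturing the address at registration time instead of resolving it later (hypothetical names, not the actual messaging code):

```python
class Client:
    def __init__(self, host_id, addr):
        self.host_id = host_id
        self.addr = addr  # captured at registration, kept with the client

_clients = {}

def register_client(host_id, addr):
    _clients[host_id] = Client(host_id, addr)

def client_endpoint_addr(host_id):
    # No address-map lookup: this works even if the endpoint was
    # already deleted from the address map.
    return _clients[host_id].addr
```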
Fixes: scylladb/scylladb#21840
Message-ID: <Z1mpMMe-o0ggBU_F@scylladb.com>
"
The series moves node ops, repair and streaming verbs to IDL. Also
contains IDL related cleanups.
In addition to the CI, the series was tested manually by bootstrapping
a node with the series into a cluster of old nodes, with repair and
streaming in both gossiper and raft mode. This exercises the repair,
streaming and node_ops paths.
"
* 'gleb/move-more-rpcs-to-idl-v3' of github.com:scylladb/scylla-dev:
repair: repair_flush_hints_batchlog_request::target_nodes is not used any more, so mark it as such
streaming: move streaming verbs to IDL
messaging_service: move repair verbs to IDL
node_ops: move node_ops_cmd to IDL
idl: rename partition_checksum.dist.hh to repair.dist.hh
idl: move node_ops related stuff from the repair related IDL
This change goes through locator::topology to use node&
instead of node* where nullptr is not possible. In places
where the node object is used in an unordered_set, the node
is wrapped in std::reference_wrapper.
Fixes scylladb/scylladb#20357
Closes scylladb/scylladb#21863
Reads which need the sstable index were computing
column_values_fixed_lengths each time. This showed up in a perf profile
of an sstable-read-heavy workload and amounted to about 1-2% of the time.
Computing it involves type-name parsing.
Avoid this by using a cached per-sstable mapping. There is already
sstable::_column_translation which can be used for this. It caches the
mapping for the least-recently used schema. Since the cursor uses the
mapping only for primary key columns, which are stable, any schema
will do, so we can use the last _column_translation. We only need to
make sure that it's always armed, so sstable loading is augmented with
arming it with the sstable's schema.
Also fixes a potential use-after-free on the schema in column_translation.
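The caching scheme can be sketched as arming the mapping once at load time (a hypothetical sketch; `parse_fixed_lengths` stands in for the expensive type-name parsing and all names are invented):

```python
def parse_fixed_lengths(pk_types):
    # Stand-in for the expensive type-name parsing; fixed-width types
    # get their byte size, variable-width types get None.
    return tuple({"int": 4, "bigint": 8, "text": None}[t] for t in pk_types)

class SSTable:
    def __init__(self, pk_types):
        # Armed once at load time with the sstable's schema, so readers
        # never recompute the mapping on each read.
        self._column_translation = parse_fixed_lengths(pk_types)

    def column_values_fixed_lengths(self):
        return self._column_translation
```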
Closes scylladb/scylladb#21347
* github.com:scylladb/scylladb:
sstables: Fix potential use-after-free on column_translation::column_info::name
sstables: Avoid computing column_values_fixed_lengths on each read
Replace manual subrange advancement with the more concise and readable
`subrange.advance()` method. This change:
- Eliminates unnecessary subrange instance creation
- Improves code readability
- Reduces potential for unnecessary object allocation
- Leverages the built-in `advance()` method for cleaner iterator handling
The modification simplifies the iteration logic while maintaining the
same functional behavior.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21865
Extend tablet_virtual_task::wait to support migration tasks.
To decide the state of a finished migration virtual task
(done or failed), the tablet replicas are checked. The task state
is set to done if the replicas contain the destination of the tablet
migration.
Extend tablet_virtual_task::get_status_helper to cover migration
tasks. get_status_helper is used by get_status and wait methods.
Waiting for a task in the latter will be modified in the following
patch.
Extend tablet_virtual_task::contains to check migration operations.
The returned virtual_task_hint also contains tablet_id (only for
migration tasks) and task_type.
Return immediately from methods that do not yet support migration
when the task type is not repair. The methods' support for migration
will be implemented in the following patches.
This commit removes the information about the recommended way of upgrading
ScyllaDB images - by updating ScyllaDB and OS packages in one step. This upgrade
procedure is not supported (it was implemented, but then reverted).
Refs https://github.com/scylladb/scylladb/issues/15733
Closes scylladb/scylladb#21876
Replace `dht/sharder.hh` with a "smaller" header, which provides
just enough dependencies.
In f744007e, we traded `database.hh` for a smaller set of headers,
but it turns out `dht/sharder.hh` can be replaced with an even smaller
one, because `dht::sharder` is defined by `dht/token-sharding.hh`, and
all we need from `dht/sharder.hh` is this class's declaration.
`clang-include-cleaner` identified this issue.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21881
Add migration_task_info column to system.tablets. Set migration_task_info
value on migration request if the feature is enabled in the cluster.
Reflect the column content in tablet_metadata.
Update the service level cache in the node startup sequence, after the
service level and auth service are initialized.
The cache update depends on the service level data accessor being set
and the auth service being initialized. Before this commit, a cache
update might not be triggered after initialization. This commit adds
an explicit call to update the cache at a point where it is guaranteed
to be ready.
Fixes scylladb/scylladb#21763
Closes scylladb/scylladb#21773
Test cases related to #21826:
1. test_remove_failure_with_no_normal_token_owners_in_dc: attempts to
remove a node with another node down in the datacenter, leaving
no normal token owners in that dc (reproducing #21826).
Removenode is expected to fail in this case since it
should have no place to rebuild the removed node's replicas,
yet it currently succeeds unexpectedly.
2. test_remove_failure_then_replace: verify that removenode
fails as expected when there are not enough nodes to
rebuild its replicas on, with and without additional zero-token nodes.
3. test_replace_with_no_normal_token_owners_in_dc: verify that
nodes can be replaced in a datacenter that has no live
token owners, with and without additional zero-token nodes.
Tablet replace uses all replicas to rebuild the lost replicas
and therefore should succeed in the edge case.
The restored data is verified as well.
Refs #21826
* New tests, no backport needed
Closes scylladb/scylladb#21827
* github.com:scylladb/scylladb:
topology_custom/test_tablets: add remove/replace tests for edge cases
test: pylib: _cluster_remove_node: log message on successful paths
test: pylib: _cluster_remove_node: mark server as removed only when removenode succeeded
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently truncating a table works by issuing an RPC to all the nodes which call `database::truncate_table_on_all_shards()`, which makes sure that older writes are dropped.
It works with tablets, but is not safe. A concurrent replication process may bring back old data.
This change makes TRUNCATE TABLE a topology operation, so that it is mutually exclusive with other processes in the system which could interfere with it. More specifically, it makes TRUNCATE a global topology request.
Backporting is not needed.
Fixes #16411
Closes scylladb/scylladb#19789
* github.com:scylladb/scylladb:
docs: docs: topology-over-raft: Document truncate_table request
storage_proxy: fix indentation and remove empty catch/rethrow
test: add tests for truncate with tablets
storage_proxy: use new TRUNCATE for tablets
truncate: make TRUNCATE a global topology operation
storage_service: move logic of wait_for_topology_request_completion()
RPC: add truncate_with_tablets RPC with frozen_topology_guard
feature_service: added cluster feature for system.topology schema change
system.topology_requests: change schema
storage_proxy: propagate group0 client and TSM dependency
The everywhere replication strategy returns a zero host id in the
replica set instead of the real one if no tokens are configured yet in
token metadata. This worked because the code that translates ids to IPs
knows that the zero host id is special, so putting zero there was
equivalent to allowing local access. But now we use host ids directly,
so we need to return the real host id here to allow local access before
token metadata is populated.
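The before/after behavior can be sketched like this (a hypothetical sketch; names and the zero-id representation are invented, not the actual locator code):

```python
ZERO_HOST_ID = "00000000-0000-0000-0000-000000000000"  # stand-in null UUID

def everywhere_replicas(hosts_with_tokens: set, local_host_id: str):
    # Before: an empty token metadata produced [ZERO_HOST_ID], which the
    # id-to-IP translation treated as "local". Now that host ids are used
    # directly, return the real local host id instead.
    if not hosts_with_tokens:
        return [local_host_id]
    return sorted(hosts_with_tokens)
```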
Message-ID: <Z1hBHsEo4wYzzgvJ@scylladb.com>
New logs allow us to easily distinguish two cases in which
waiting for apply times out:
- the node didn't receive the entry it was waiting for,
- the node received the entry but didn't apply it in time.
Distinguishing these cases simplifies reasoning about failures.
The first case indicates that something went wrong on the leader.
The second case indicates that something went wrong on the node
on which waiting for apply timed out.
As it turns out, many different bugs result in the `read_barrier`
(which calls `wait_for_apply`) timeout. This change should help
us in debugging bugs like these.
We want to backport this change to all supported branches so that
it helps us in all tests.
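The two timeout cases above map to two distinct log messages; a minimal sketch of the decision (hypothetical names, not the actual raft logging code):

```python
def apply_timeout_diagnosis(entry_received: bool, entry_applied: bool) -> str:
    # Distinguish the two ways waiting for apply can time out.
    if not entry_received:
        return "entry never received: suspect the leader"
    if not entry_applied:
        return "entry received but not applied in time: suspect this node"
    return "applied"
```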
Closes scylladb/scylladb#21855
When the tablet scheduler drains nodes, it chooses the target location
based on a "badness" metric; nodes with the lowest score are preferred.
Before the patch, the score used was the number of tablets on that node
post-movement. This way we populate the least-loaded node first, but it
works only if nodes have an equal number of shards. If nodes have
different capacities, the number of tablets is not a good metric,
because we don't aim to equalize the per-node count but the per-shard
count. We assume that each shard has equal capacity.
Because of this bug, during decommission, the nodes with fewer shards
would be preferred to receive replicas, which may lead to overloading
of those nodes. This imbalance would be later fixed by the normal load
balancing logic, but it's still problematic.
Fixes #21783
Closes scylladb/scylladb#21860
Log a message when removenode succeeded as expected
or when it failed as expected with the `expected_error`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, we call server_mark_removed even when removenode
failed with the `expected_error`; in that case the function returns
success, but the server is not supposed to be in a removed state.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
TRUNCATE TABLE saves the current commit log replay positions in case there is a crash, so that replay knows where to begin replaying the mutations. These are collected and saved per shard into `system.truncated`. If a shard received no mutations, its replay position will be an empty, default-constructed object of type `db::replay_position` with its members set to 0. Truncate would incorrectly interpret these empty replay positions as if they were coming from shard 0, and save them as such, potentially overwriting an actual valid replay position coming from the real shard 0. In the case of a crash, this would cause the commit log on shard 0 to be replayed from the beginning, resulting in data resurrection.
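The fix amounts to skipping default-constructed positions when saving; a minimal sketch where an `(id, pos)` tuple stands in for `db::replay_position` (hypothetical names):

```python
EMPTY = (0, 0)  # default-constructed (id, pos): shard received no mutations

def truncate_positions_to_save(per_shard: dict) -> dict:
    # Drop empty positions so a default-constructed replay position is
    # never saved as if it came from shard 0, which could overwrite
    # shard 0's real position.
    return {shard: rp for shard, rp in per_shard.items() if rp != EMPTY}
```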
Fixes #21719
Closes scylladb/scylladb#21722
* github.com:scylladb/scylladb:
test: add test for truncate saving replay positions
database: correctly save replay position for truncate
On startup, if a server reads an sstable that belongs to a tablet that
doesn't have any local replica, it throws an error in the following
format and refuses to start:
```
Storage wasn't found for tablet 1 of table test.test
```
This patch updates the code path to throw a nicer error that includes
the sstable name that caused the problem.
This patch also adds a testcase to verify the error being thrown.
Fixes https://github.com/scylladb/scylladb/issues/18038
PR improves an error message - no need to backport.
Closes scylladb/scylladb#21805
* github.com:scylladb/scylladb:
replica/table: fix indent in compaction_group_for_sstable
replica/table: improve error message when encountering orphaned sstables
Replace boost::make_iterator_range() with std::ranges::subrange.
This change improves code modernization and reduces external dependencies:
- Replace boost::make_iterator_range() with std::ranges::subrange
- Remove boost/range/iterator_range.hpp include
- Improve iterator type detection in interval.hh using std::ranges::const_iterator_t<Range>
This is part of ongoing efforts to modernize our codebase and minimize
external dependencies.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21787
One of the run_when_memory_available() checks mirrors the one done by
the execution_permitted() helper, so it's worth re-using it. Since the
former helper is a header template, the latter is worth moving to a
header too. Once re-used, the `bool blocking` variable becomes
redundant, and the `if (blocking)` check can be expressed with fewer LOCs.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#21812
Since Fedora 38 is EOL and Fedora 39 comes with fmt v10.0.0, and we've
also switched to the build image based on Fedora 40, which ships
fmt-devel v10.2.1, there is no need to support fmt < 10.
In this change, we drop support for fmt < 10.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21847
Artifact tests have been failing since the switch to the native nodetool, because ScyllaDB doesn't leave any IOCBs for tools. On some setups it will consume all of them and then nodetool and any other native app will refuse to start because it will fail to allocate IOCBs.
This PR fixes this by making use of the freshly introduced `--reserve-io-control-blocks` seastar option, to reserve IOCBs for tool applications. Since the `linux-aio` and `epoll` reactor backends require quite a bit of these, we enable the `io_uring` reactor backend and switch tools to use this backend instead. The `io_uring` reactor backend needs just 2 IOCBs to function, so the reserve of 10 IOCBs set up in this PR is good for running 5 tool applications in parallel, which should be more than enough.
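The sizing argument above is simple arithmetic; spelled out as a sketch (the constant names are invented, the numbers come from the description):

```python
IOCBS_PER_IO_URING_APP = 2  # the io_uring reactor backend needs just 2
RESERVED_IOCBS = 10         # reserved via --reserve-io-control-blocks

# How many tool applications can run in parallel on the reserve.
parallel_tools = RESERVED_IOCBS // IOCBS_PER_IO_URING_APP
```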
Fixes: https://github.com/scylladb/scylladb/issues/19185
The problem this PR fixes has a manual workaround (and is rare to begin with), no backport needed.
Closes scylladb/scylladb#21527
* github.com:scylladb/scylladb:
main: configure a reserve IOCB for scylla-nodetool and friends
configure: enable the io_uring backend
main: use configure seastar defaults via app_template::seastar_options
Use raw string literals to prevent syntax warnings when using regular
expressions with backslash-based patterns.
The original code triggered a SyntaxWarning in developer mode (`python3 -Xdev`)
due to unescaped backslash characters in regex patterns like '\s'. While
CPython typically interprets these silently, strict Python parsing modes
raise warnings about potentially unintended escape sequences.
This change adds the `r` prefix to string literals containing regex patterns,
ensuring consistent behavior across different Python runtime configurations
and eliminating unnecessary syntax warnings like:
```
/opt/scylladb/scripts/libexec/scylla_io_setup:41: SyntaxWarning: invalid escape sequence '\s'
pattern = re.compile(_nocomment + r"CPUSET=\s*\"" + _reopt(_cpuset) + _reopt(_smp) + "\s*\"")
```
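A minimal demonstration of the fix, using a pattern in the same shape as the one in the warning above:

```python
import re

# With the r prefix, backslashes reach the regex engine unmodified and
# `python3 -Xdev` emits no SyntaxWarning for sequences like \s.
pattern = re.compile(r"CPUSET=\s*\"")
```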
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21839
In cleanup commit a840949ea0,
a regression was introduced that caused backward-incompatible changes
in the gossiper application state name strings.
In commit e486e0f759, the value
`application_state::CDC_STREAMS_TIMESTAMP` was changed to
`application_state::CDC_GENERATION_ID`, but the name string
"CDC_STREAMS_TIMESTAMP" was kept for backward compatibility.
The cleanup commit a840949ea0, however,
changed the name string to "CDC_GENERATION_ID" by omission (not noticing
the difference), which was a backward-incompatible change.
There is also another case: "IGNOR_MSB_BITS" (which has a typo,
missing the "E" in "IGNORE") was changed to "IGNORE_MSB_BITS"; this also
needs to be reverted to keep backward compatibility.
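The compatibility requirement boils down to a stable enum-name-to-wire-name mapping; a minimal sketch (hypothetical names, using the two cases from this commit):

```python
# Wire names must stay stable across versions even when the enum is renamed.
LEGACY_WIRE_NAMES = {
    "CDC_GENERATION_ID": "CDC_STREAMS_TIMESTAMP",  # renamed enum, old wire name
    "IGNORE_MSB_BITS": "IGNOR_MSB_BITS",           # historical typo, kept on the wire
}

def wire_name(state: str) -> str:
    # Fall through to the enum name itself for states without a legacy name.
    return LEGACY_WIRE_NAMES.get(state, state)
```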
Fixes: scylladb/scylladb#21811
Closes scylladb/scylladb#21813