There used to be a problem with restarting a node after
decommissioning some other node - the restarting node
tries to apply the raft log, this log contains a record
about the decommissioned node, and we got stuck trying
to resolve its IP.
This was fixed in #16639 - we excluded IPs from
the RAFt log application code and moved it entirely
to host_id-s.
In this commit we add a regression test
for this case. We move the decommission_node
call before server_stop/server_start. We need
to add one more server to retain majority when
the node is decommissioned, otherwise the topology
coordinator won't migrate from the stopped node
before replacing it, and we'll get an error.
closes#14803
The replaced node transitions to LEFT state, and
we used to remove the IPs of such nodes from gossiper.
If we replace with same IP, this caused the IP of the
new node to be removed from gossiper.
This problem was fixed by #16820, this commit
adds a regression test for it.
closes#15967
Since `t.parallel_foreach_table_state` may yield,
we should access `type` by reference when calling
`stop_compaction` since it is captured by the calling
lambda and gets lost when it returns if
`parallel_foreach_table_state` returns an unavailable
future.
Instead change all captures to `[&]` so we can access
the `type` variable held by the coroutine frame.
Fixes#16975
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#17143
get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
Closesscylladb/scylladb#17130
* github.com:scylladb/scylladb:
treewide: replace seastar::future::get0() with seastar::future::get()
sstable: capture return value of get0() using auto
utils: result_loop: define result_type with decayed type
[avi: add another one that snuck in while this was cooking]
Commit e81fc1f095 accidentally broke the control
flow of row_cache::do_update().
Before that commit, the body of the loop was wrapped in a lambda.
Thus, to break out of the loop, `return` was used.
The bad commit removed the lambda, but didn't update the `return` accordingly.
Thus, since the commit, the statement doesn't just break out of the loop as
intended, but also skips the code after the loop, which updates `_prev_snapshot_pos`
to reflect the work done by the loop.
As a result, whenever `apply_to_incomplete()` (the `updater`) is preempted,
`do_update()` fails to update `_prev_snapshot_pos`. It remains in a
stale state, until `do_update()` runs again and either finishes or
is preempted outside of `updater`.
If we read a partition processed by `do_update()` but not covered by
`_prev_snapshot_pos`, we will read stale data (from the previous snapshot),
which will be remembered in the cache as the current data.
This results in outdated data being returned by the replica.
(And perhaps in something worse if range tombstones are involved.
I didn't investigate this possibility in depth).
Note: for queries with CL>1, occurences of this bug are likely to be hidden
by reconciliation, because the reconciled query will only see stale data if
the queried partition is affected by the bug on on *all* queried replicas
at the time of the query.
Fixes#16759Closesscylladb/scylladb#17138
The amount of standard Lua libraries loaded for the sstable-script was
limited, due to fears that some libraries (like the io library) could
expose methods, which if used from the script could interfere with
seastar's asynchronous arhitecture. So initially only the table and
string libraries were loaded.
This patch adds two more libraries to be loaded: match and os. The
former is self-explanatory and the latter contains methods to work with
date/time, obtain the values of environment variables as well as launch
external processes. None of these should interfere with seastar, on the
other hand the facilities they provide can come very handy for sstable
scripts.
Closesscylladb/scylladb#17126
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
instead of capturing the return value of `get0()` with a reference
type, use a plain type. as `get0()` returns a plain `T` while `get0()`
returns a `T&&`, to avoid the value referenced by `T&&` gets destroyed
after the expression, let's use a plain `auto` instead of `auto&&`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change prepares for replacing `seastar::future::get0()` with
`seastar::future::get()`. the former's return type is a plain `T`,
while the latter is `T&&`. in this case `T` is
`boost::outcome::result<..>`. in order to extract its `error_type`,
we need to get its decayed type. since `std::remove_reference_t<T>`
also returns `T`, let's use it so it works with both `get0()` and `get()`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Other than being fmt v10 compatible, it's also shorter and easier to
read, thanks to fmt::join() helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#17115
This kills three birds with one stone
1. fixes broken indentation
2. re-uses new_options local variable
3. stops using string literal to check storage type
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#17111
This is a follow up of "storage_service: Run stream_ranges cmd in streaming group" to fix indentation and drop a unnecessary co_return.
Refs: #17090Closesscylladb/scylladb#17114
* github.com:scylladb/scylladb:
storage_service: Drop unnecessary co_return in raft_topology_cmd_handler
storage_service: Fix indentation for stream_ranges
this comment has already served its purpose when rewriting
C* in C++. since we've re-implemented it, there is no need to keep it
around.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17120
The latter suite is now tablets-aware and tablets cases from the former one can happily work with single shared scylla instance
Closesscylladb/scylladb#17101
* github.com:scylladb/scylladb:
test/topology_custom: Remove test_tablets.py
test/topology: Move test_tablet_change_initial_tablets
test/topology: Move test_tablet_explicit_disabling
test/topology: Move test_tablet_default_initialization
test/topology: Move test_tablet_change_replication_strategy
test/topology: Move test_tablet_change_replication_vnode_to_tablets
cql-pytest: Add skip_without_tablets fixture
At the end of the test, we wait until a restarted node receives a
snapshot from the leader, and then verify that the log has been
truncated.
To check the snapshot, the test used the `system.raft_snapshots` table,
while the log is stored in `system.raft`.
Unfortunately, the two tables are not updated atomically when Raft
persists a snapshot (scylladb/scylladb#9603). We first update
`system.raft_snapshots`, then `system.raft` (see
`raft_sys_table_storage::store_snapshot_descriptor`). So after the wait
finishes, there's no guarantee the log has been truncated yet -- there's
a race between the test's last check and Scylla doing that last delete.
But we can check the snapshot using `system.raft` instead of
`system.raft_snapshots`, as `system.raft` has the latest ID. And since
1640f83fdc, storing that ID and truncating
the log in `system.raft` happens atomically.
Closesscylladb/scylladb#17106
`#warning` is a preprocessor macro in C/C++, while `#warn` is not. the
reason we haven't run into the build failure caused by this is likely
that we are only building on amd64/aarch64 with libstdc++ at the time
of writing.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17074
according to the document "nodetool cleanup"
> Triggers removal of data that the node no longer owns
currently, scylla performs cleanup by rewriting the sstables. but
commitlog segments may still contain the mutations to the tables
which are dropped during sstable rewriting. when scylla server
restarts, the dirty mutations are replayed to the memtable. if
any of these dirty mutations changes the tables cleaned up. the
stale data are reapplied. this would lead to data resurrection.
so, in this change we following the same model of major compaction
where we
1. forcing new active segment,
2. flushing tables being cleaned up
3. perform cleanup using compaction
Fixes#4734Closesscylladb/scylladb#16757
* github.com:scylladb/scylladb:
storage_service: fall back to local cleanup in cleanup_all
compaction: format flush_mode without the helper
compaction_manager: flush all tables before cleanup
replica: table: pass do_flush to table::perform_cleanup_compaction()
api, compaction: promote flush_mode
Otherwise it will inherit the rpc verb's scheduling group which is gossip. As a result, it causes the streaming runs in the wrong scheduling group.
Fixes#17090Closesscylladb/scylladb#17097
* github.com:scylladb/scylladb:
streaming: Verify stream consumer runs inside streaming group
storage_service: Run stream_ranges cmd in streaming group
During startup, the contents of the data directory are verified to ensure that they have the right owner and permissions. Verifying all the contents, which includes files that will be read and closed immediately, and files that will be held open for longer durations, together, can lead to memory fragementation in the dentry/inode cache.
Mitigate this by updating the verification in a such way that these two set of files will be verified separately ensuring their separation in the dentry/inode cache.
Fixes https://github.com/scylladb/scylladb/issues/14506Closesscylladb/scylladb#16952
* github.com:scylladb/scylladb:
directories: prevent inode cache fragmentation by orderly verifying data directory contents
directories: skip verifying data directory contents during startup
directories: co-routinize create_and_verify
This PR fixes the bug of certain calls to the `mintimeuuid()` CQL function which large negative timestamps could crash Scylla. It turns out we already had protections in place against very positive timestamps, but very negative timestamps could still cause bugs.
The actual fix in this series is just a few lines, but the bigger effort was improving the test coverage in this area. I added tests for the "date" type (the original reproducer for this bug used totimestamp() which takes a date parameter), and also reproducers for this bug directly, without totimestamp() function, and one with that function.
Finally this PR also replaces the assert() which made this molehill-of-a-bug into a mountain, by a throw.
Fixes#17035Closesscylladb/scylladb#17073
* github.com:scylladb/scylladb:
utils: replace assert() by on_internal_error()
utils: add on_internal_error with common logger
utils: add a timeuuid minimum, like we had maximum
test/cql-pytest: tests for "date" type
This change introduces a new metric called tablet_count
that is recalculated during construction of table object
and on each call to table::update_effective_replication_map().
To get the count of tablet per current shard, tablet map
is traversed and for each tablet_id tablet_map::get_shard()
is called. Its return value is compared with this_shard_id().
The new metric is maintained and exposed only for tables
that uses tablets.
Refs: scylladb#16131
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#17056
this series addresses couple `-Wsign-compare` warnings surfaced in the tree.
Closesscylladb/scylladb#17091
* github.com:scylladb/scylladb:
tablet_allocator: do not compare signed and unsigned
replica: table: do not compare signed with unsigned
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `db::write_type`, and drop
its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17093
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `gms::application_state`,
but its operator<< is preserved, as it is still used by the generic
homebrew formatter for `std::unordered_map<>`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17096
During startup, the contents of the data directory are verified to ensure
that they have the right owner and permissions. Verifying all the
contents, which includes files that will be read and closed immediately,
and files that will be held open for longer durations, together, can
lead to memory fragementation in the dentry/inode cache.
Prevent this by updating the verification in a such way that these two
set of files will be verified separately ensuring their separation in
the dentry/inode cache.
Fixes#14506
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
This is in preparation for a subsequent patch that will verify the
contents of the data directory in a specific order.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>