Any expired tombstone can be garbage collected if it doesn't shadow data in the commit log, memtable, or uncompacting SSTables.
This PR introduces a new mode to major compaction, enabled by the `consider_only_existing_data` flag that bypasses these checks. When enabled, memtables and old commitlog segments are cleared with a system-wide flush and all the sstables (after flush) are included in the compaction, so that it works with all data generated up to a given time point.
This new mode works with the assumption that newly written data will not be shadowed by expired tombstones. So it ignores new sstables (and new data written to memtable) created after compaction started. Since there was a system wide flush, commitlog checks can also be skipped when garbage collecting tombstones. Introducing data shadowed by a tombstone during compaction can lead to undefined behavior, even without this PR, as the tombstone may or may not have already been garbage collected.
Fixes#19728Closesscylladb/scylladb#20031
* github.com:scylladb/scylladb:
cql-pytest: add test to verify consider_only_existing_data compaction option
tools/scylla-nodetool: add consider-only-existing-data option to compact command
api: compaction: add `consider_only_existing_data` option
compaction: consider gc_check_only_compacting_sstables when deducing max purgeable timestamp
compaction: do not check commitlog if gc_check_only_compacting_sstables is enabled
tombstone_gc_state: introduce with_commitlog_check_disabled()
compaction: introduce new option to check only compacting sstables for gc
compaction: rename maybe_flush_all_tables to maybe_flush_commitlog
compaction: maybe_flush_all_tables: add new force_flush param
Currently, when attempting to send a hint, we might choose its recipients in one of two ways:
- If the original destination is a natural endpoint of the hint, we only send the hint to that node and none other,
- Otherwise, we send the hint to all current replicas of the mutation.
There is a problem when we decommission a node: while data is streamed away from that node, it is still considered to be a natural endpoint of the data that it used to own. Because of that, it might happen that a hint is sent directly to it but streaming will miss it, effectively resulting in the hint being discarded.
As sending the hint _only_ to the leaving replica is a rather bad idea, send the hint to all replicas also in the case when the original destination of the hint is leaving.
Note that this is a conservative fix written only with the decommission + vnode-based keyspaces combo in mind. In general, such "data loss" can occur in other situations where the replica set is changing and we go through a streaming phase, i.e. other topology operations in case of vnodes and tablet load balancing. However, the consistency guarantees of hinted handoff in the face of topology changes are not defined and it is not clear what they should be, if there should be any at all. The picture is further complicated by the fact that hints are used by materialized views, and sending view updates to more replicas than necessary can introduce inconsistencies in the form of "ghost rows". This fix was developed in response to a failing test which checked the hint replay + decommission scenario, and it makes it work again.
Fixesscylladb/scylla-dtest#4582
Refs scylladb/scylladb#19835
Should be backported to 6.0 and 6.1; the dtest started failing due to topology on raft, which sped up execution of the test and exposed the preexisting problem.
Closesscylladb/scylladb#20488
* github.com:scylladb/scylladb:
test: topology_custom/test_hints: consistency test for decommission
test: topology_custom/test_hints: move sync point helpers to top level
test: topology/util: extract find_server_by_host_id
hints: send hints with CL=ALL if target is leaving
hints: inline do_send_one_mutation
In sstable directory test there are two of those -- one that works on
path, state, env and callback, and the other one that just needs env and
callback, getting path from env and assuming state is normal.
Two test cases in this test can enjoy the shorter one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#20395
c70f321c6f added an extra check if KS
exists. This check can throw `data_dictionary::no_such_keyspace`
exception, which is supposed to be caught and a more user-friendly
exception should be thrown instead.
This commit fixes the above problem and adds a testcase to validate it
doesn't appear ever again.
Also, I moved the check for the keyspace outside of the `for` loop, as
it doesn't need to be checked repeatedly.
Fixes: scylladb/scylladb#20097Closesscylladb/scylladb#20404
To enhance the test reports UX:
1. switching off/on passed/failed/skipped test for better visibility
2. better searching in test results
3. understanding the trends of execution for each test
4. better configurability of the final report
Enable allure adapter for all python tests.
Add tags and parameters to the test to be able to distinguish them across modes and runs.
Related: https://github.com/scylladb/qa-tasks/issues/1665
Related: https://github.com/scylladb/scylladb/pull/19335
Related: https://github.com/scylladb/scylladb/pull/18169Closesscylladb/scylladb#19942
* github.com:scylladb/scylladb:
[test.py] Clean duplicated arg for test suite
[test.py] Enable allure for python test
Adds the test_hints_consistency_during_decommission test which
reproduces the failure observed in scylladb/scylla-dtest#4582. It uses
error injections, including the newly added
topology_coordinator_pause_after_streaming injection, to reliably
orchestrate the scenario observed there.
In a nutshell, the test makes sure to replay hints after streaming
during decommission has finished, but before the cluster switches to
reading from new replicas. Without the fix, hints would be replayed to
the decommissioned node and then would be lost forever after the cluster
start reading from new replicas.
Move create_sync_point and await_sync_point from the scope of the
test_sync_point test to the file scope. They will be used in a test that
will be introduced in the commit that follows.
The idea of the test is to have a cluster where one node is stressed with injections and failures and the rest of the cluster is used to make progress of the raft state machine.
To achieve this following two lists introduced in the PR:
- ERROR_INJECTIONS in error_injections.py
- CLUSTER_EVENTS in cluster_events.py
Each cluster event is an async generator which has 2 yields and should be used in the following way:
0. Start the generator:
```python
>>> cluster_event_steps = cluster_event(manager, random_tables, error_injection)
```
1. Run the prepare part (before the first yield)
```python
>>> await anext(cluster_event_steps)
```
2. Run the cluster event itself (between the yields)
```python
>>> await anext(cluster_event_steps)
```
3. Run the check part (after the second yield)
```python
>>> await anext(cluster_event, None)
```
Closesscylladb/scylladb#16223
* github.com:scylladb/scylladb:
test: randomized failure injection for Raft-based topology
test: error injections for Raft-based topology
[test.py] topology.util: add get_non_coordinator_host() function
[test.py] random_tables: add UDT methods
[test.py] random_tables: add CDC methods
[test.py] api: get scylla process status
[test.py] api: add expected_server_up_state argument to server_add()
The test cases in this file use an error injection to reduce raft group
0 timeouts (from the default 1 minute), in order to speed up the tests;
the scenarios expect these timeouts to happen, so we want them to happen
as quick as possible, but we don't want to reduce timeouts so much that
it will make other operations fail when we don't expect them to (e.g.
when the test wants to add a node to the cluster).
Unfortunately the selected 5 seconds in debug mode was not enough and
made the tests flaky: scylladb/scylladb#20111.
Increase it to 10 seconds. This unfortunately will slow down these tests
as they have to sometimes wait for 10 seconds for the timeout to happen.
But better to have this than a flaky test.
Fixes: scylladb/scylladb#20111Closesscylladb/scylladb#20320
The idea of the test is to have a small cluster, where one node
is stressed with injections and failures and the rest of
the cluster is used to make progress of the Raft state machine.
To achieve this following two lists introduced in the commit:
- ERROR_INJECTIONS in error_injections.py
- CLUSTER_EVENTS in cluster_events.py
Each cluster event is an async generator which has 2 yields and should be
used in the following way:
0. Start the generator:
>>> cluster_event_steps = cluster_event(manager, random_tables, error_injection)
1. Run the prepare part (before the first yield)
>>> await anext(cluster_event_steps)
2. Run the cluster event itself (between the yields)
>>> await anext(cluster_event_steps)
3. Run the check part (after the second yield)
>>> await anext(cluster_event, None)
Add get_non_coordinator_host() function which returns
ServerInfo for the first host which is not a coordinator
or None if there is no such host.
Also rework get_coordinator_host() to not fail if some
of the hosts don't have a host id.
Allow to return from server_add() when a server reaches specified state.
One of:
- PROCESS_STARTED
- HOST_ID_QUERIED (previously called NOT_CONNECTED)
- CQL_CONNECTED (renamed from CONNECTED)
- CQL_QUERIED (was just QUERIED)
Also, rename CqlUpState to ServerUpState and move to internal_types.
When creating sstables this test allocates temporary local options.
That works, because this test doesn't run on object storage, but it's
more correct to pick storage options from the table at hand.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#20440
When testing mv admission control, we perform a large view update
and check if the following view update can be admitted due to the
high view backlog usage. We rely on a delay which keeps the backlog
high for longer to make sure the backlog is still increased during
the second write. However, in some test runs the delay is not long
enough, causing the second write to miss the large backlog and not
hit admission control.
In this patch we keep the increased backlog high using another
injection instead of relying on a delay to make absolute sure
that the backlog is still high during the second write.
Fixesscylladb/scylladb#20382Closesscylladb/scylladb#20445
The with_sstable_dir() helper no longer needs one, it used to pass it as
argument to sstable_directory constructor, but now the directory doesn't
need it (takes semaphore via table object).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#20396
Tests that try to access sstables from test/resource/ typically sstable::load() it after object creation. There's reusable_sst() helper for that. This PR fixes one more caller that still goes longer route by doing sstable and loading it on its own.
Closesscylladb/scylladb#20420
* github.com:scylladb/scylladb:
test: Call reusable sst from ka_sst() helper
test: Move sstable_open_config to reusable_sst()'s argument
Run the reversed queries on a 2-node cluster with CL=ALL with and
without NATIVE_REVERSE_QUERIES feature flag. When the flag is enabled,
the native reversed format is used, otherwise the legacy format.
The NATIVE_REVERSE_QUERIES feature flag is suppressed with an error
injection that simulates cluster upgrade process.
Backport is not required. The patch adds additional upgrade tests
for https://github.com/scylladb/scylladb/pull/18864Closesscylladb/scylladb#20179
The sstable_mutation_test wants to load pre-existing sstables from
resouce/ subdir. For that there's reusable_sst() helper on env.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
this change contains two improvements to "backup" and "restore" commands:
- let them print task id
- let them return 1 as the exist status code upon operation failure
----
these changes are improvements to the newly introduced commands, which are not in any LTS branches yet, so no need to backport.
Closesscylladb/scylladb#20371
* github.com:scylladb/scylladb:
tools/scylla-nodetool: return failure with exit code in backup/restore
tools/scylla-nodetool: let backup/restore print task id
before this change, "backup" and "restore" commands always return 0 as
their exist code no matter if the performed operation fails or not.
inspired by the "task" commands of nodetool, let's return 1 with
exit code if the operation fails.
the tests are updated accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in 20fffcdc, we added the "task wait" subcommand, so user is allowed to
interact with a task with its task id. and in existing implementation of
"backup" and "restore" command, if user does not pass `--nowait`, the
command just exits without any output upon sending the request to
scylladb.
in this change, we print out the task_id if user does not pass
`--nowait` command line option to "backup" or "restore" command. this
allows user to follow up on the operation if necessary.
the tests are updated accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This patch adds functional testing for the role-based access control
(RBAC) "auto-grant" feature, where if a user that is allowed to create
a table, it also recieves full permissions over the table it just
created. We also test permissions over new materialized views created
by a user, and over CDC logs. The test for CDC logs reproduces an
already suspected bug, #19798: A user may be allowed to create a table
with CDC enabled, but then is not allowed to read the CDC log just
created. The tests show that the other cases (base tables and views)
do not have this bug, and the creating user does get appropriate
permissions over the new table and views.
In addition to testing auto-grant, the patch also includes tests for
the opposite feature, "auto-revoke" - that permissions are removed when
the table/view/cdc is deleted. If we forget to do that while implementing
auto-grant, we risk that users may be able to use tables created by
other users just because they used the same table _name_ earlier.
It's important to have these auto-revoke tests together with the
auto-grant tests that reproduce #19798 - so we don't forget this
part when finally fixing #19798.
Refs #19798.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#19845
Bind variables in CQL have two formats: positional (`?`) where a variable is referred to by its relative position in the statement, and named (`:var`), where the user is expected to supply a name->value mapping.
In 19a6e69001 we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to.
However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection.
Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the `dialect` and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables.
A unit test is added.
Fixes#15559
This may be useful to users transitioning from Cassandra, so merits a backport.
Closesscylladb/scylladb#19493
* github.com:scylladb/scylladb:
cql3: add option to not unify bind variables with the same name
cql3: introduce dialect infrastructure
cql3: prepared_statement_cache: drop cache key default constructor
for testing the load performance of load_and_stream operation.
Refs #19989
---
no need to backport. it adds two new tests to the existing `perf_sstable` tool for evaluating the load performance when performing the "load_and_streaming" operation. hence has no impact on the production.
Closesscylladb/scylladb#20186
* github.com:scylladb/scylladb:
perf/perf_sstable: add {crawling,partitioned}_streaming modes
test/perf/perf_sstable: use switch-case when appropriate
instead of evaluating the constants in-class, accessing them via
a cached class property.
it would be handy if we could source `scylla-gdb.py` in `.gdbinit`,
but this script accesses some symbols which are not available without
a file being debugged. what's why gdb fails to load the init script:
```
Traceback (most recent call last):
File "/home/kefu/dev/scylladb/scylla-gdb.py", line 167, in <module>
class intrusive_slist:
File "/home/kefu/dev/scylladb/scylla-gdb.py", line 168, in intrusive_slist
size_t = gdb.lookup_type('size_t')
^^^^^^^^^^^^^^^^^^^^^^^^^
gdb.error: No type named size_t.
```
so we have to `file path/to/scylla` and *then*
`source scylla-gdb.py` every time when we debug scylla or a seastar
application, instead of loading `scylla-gdb.py` in `.gdbinit`.
the reason is that the script accesses the debug symbols like
`gdb.lookup_type('size_t')` in-class. so when the python interpreter
reads the script, it evaluates this statement, but at that moment,
the debug symbols are not loaded, so `source scylla-gdb.py` fails
in `.gdbinit`.
in this change, we transform all these class variables to cached
properties, so that they
* are evaluated on-demand
* are evaluated only once at most
this addresses the pain at the expense of verbosity.
---
this change intends to improve the developer's user experience, and has no impacts on product, so no need to backport.
Closesscylladb/scylladb#20334
* github.com:scylladb/scylladb:
test/scylla_gdb: test the .gdb init use case
scylla-gdb.py: lazy-evaluate the constants
All users of it have sstable_test_env at hand (in fact -- they call env
method to get table_for_test). And since sstable_test_env already has a
bunch of methods to create sstable, the table_for_test wrapper doesn't
need to duplicate this code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#20360
before this change, when building the test of `view_build_test` with
clang-20, we can have following build failure:
```
FAILED: test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o
/home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o -MF test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o.d -o test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o -c /home/kefu/dev/scylladb/test/boost/view_build_test.cc
/home/kefu/dev/scylladb/test/boost/view_build_test.cc:998:5: error: unknown type name 'simple_schema'
998 | simple_schema ss;
| ^
```
apparently, `simple_schema`'s declaration is not available in this
translation unit.
in this change
* we include the header where `simple_schema` is defined, so that
the build passes with clang-20.
* also take this opportunity to reorder the header a little bit,
so the testing headers are grouped together.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#20367
in a recent seastar change (644bb662), we do not include
`seastar/testing/random.hh` in `seastar/testing/test_runner.hh` anymore,
as the latter is not a facade of the former, and neither does it use the
former. as a sequence, some tests which take the advantage of the
included `seastar/testing/random.hh` do not build with the latest
seastar:
```
FAILED: test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o
/usr/bin/clang++ -DBOOST_REGEX_DYN_LINK -DBOOST_REGEX_NO_LIB -DBOOST_UNIT_TEST_FRAMEWORK_DYN_LINK -DBOOST_UNIT_TEST_FRAMEWORK_NO_LIB -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSCYLLA_ENABLE_PREEMPTION_SOURCE -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/__w/scylladb/scylladb -I/__w/scylladb/scylladb/build/gen -I/__w/scylladb/scylladb/seastar/include -I/__w/scylladb/scylladb/build/seastar/gen/include -I/__w/scylladb/scylladb/build/seastar/gen/src -I/__w/scylladb/scylladb/build -isystem /__w/scylladb/scylladb/abseil -isystem /__w/scylladb/scylladb/build/rust -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/__w/scylladb/scylladb/build=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -Werror=unused-result -fstack-clash-protection -MD -MT test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o -MF test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o.d -o test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o -c /__w/scylladb/scylladb/test/lib/key_utils.cc
In file included from /__w/scylladb/scylladb/test/lib/key_utils.cc:11:
/__w/scylladb/scylladb/test/lib/random_utils.hh:25:30: error: no member named 'local_random_engine' in namespace 'seastar::testing'
25 | return seastar::testing::local_random_engine;
| ~~~~~~~~~~~~~~~~~~^
1 error generated.
```
in this change, we include `seastar/testing/random.hh` when the random
facility is used, so that they can be compiled with the latest seastar
library.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#20368
Bind variables in CQL have two formats: positional (`?`) where a
variable is referred to by its relative position in the statement,
and named (`:var`), where the user is expected to supply a
name->value mapping.
In 19a6e69001 we identified the case where a named bind variable
appears twice in a query, and collapsed it to a single entry in the
statement metadata. Without this, a driver using the named variable
syntax cannot disambiguate which variable is referred to.
However, it turns out that users can use the positional call form
even with the named variable syntax, by using the positional
API of the driver. To support this use case, we add a configuration
variable to disable the same-variable detection.
Because the detection has to happen when the entire statement is
visible, we have to supply the configuration to the parser. We
call it the `dialect` and pass it from all callers. The alternative
would be to add a pre-prepare call similar to fill_prepare_context that
rewrites all expressions in a statement to deduplicate variables.
A unit test is added.
Fixes#15559
All seeds hostname resolution errors will be ignored during a node
restart in case the node had already joined a cluster. This will
prevent restart errors if some seed names are not resolvable.
Fixesscylladb/scylladb#14945Closesscylladb/scylladb#20292
* github.com:scylladb/scylladb:
Ignore seed name resolution errors on restart.
Add a test for starting with a wrong seed.
Currently if a coordinator and a node being replaced are in the same DC
while inter-dc encryption is enabled (connections between nodes in the
same DC should not be encrypted) the replace operation will fail. It
fails because a coordinator uses non encrypted connection to push raft
data to the new node, but the new node will not accept such connection
until it knows which DC the coordinator belongs to and for that the raft
data needs to be transferred.
The series adds the test for this scenario and the fix for the
chicken&egg problem above.
The series (or at least the fix itself) needs to be backported because
this is a serious regression.
Fixes: scylladb/scylladb#19025Closesscylladb/scylladb#20290
* github.com:scylladb/scylladb:
topology coordinator: fix indentation after the last patch
topology coordinator: do not add replacing node without a ring to topology
test: add test for replace in clusters with encryption enabled
test.py: add server encryption support to cluster manager
.gitignore: fix pattern for resources to match only one specific directory
before this change, we run all the tests in a single pytest session,
with scylladb debug symbols loaded. but we want to test another use
case, where the scylladb debug symbols are missing.
in this change,
* we do not check for the existence of debug symbols until necessary
* add a mark named "without_scylla"
* run the tests in two pytest sessions
- one with "without_scylla" mark
- one with "not without_scylla" mark
* add a test which is marked with the "without_scylla" mark. the test
verify that the scylla-gdb.py script can be loaded even without
scylladb debug symbols.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
repair_service::repair_flush_hints_batchlog_handler may access batchlog
manager while it is uninitialized.
Throw if batchlog manager isn't initialized.
Fixes: #20236.
Needs backport to 6.0 and 6.1 as they suffer from the uninitialized bm access.
Closesscylladb/scylladb#20251
* github.com:scylladb/scylladb:
test: add test to ensure repair won't fail with uninitialized bm
repair: throw if batchlog manager isn't initialized
Add nodetool commands to manage task manager tasks:
- tasks abort - aborts the task
- tasks list - lists all tasks in the module
- tasks modules - lists all modules
- tasks set-ttl - sets task ttl
- tasks status - gets status of the task
- tasks tree - gets statuses of the task and all its desendent's
- tasks ttl - gets task ttl
- tasks wait - waits for the task and gets its status
Fixes: https://github.com/scylladb/scylladb/issues/19201.
Closesscylladb/scylladb#19614
* github.com:scylladb/scylladb:
test: nodetool: add tests for tasks commands
nodetool: tasks: add nodetool commands to track task manager tasks
api: task_manager: return status 403 if a task is not abortable
api: task_manager: return none instead of empty task id
api: task_manager: add timeout to wait_task
api: task_manager: add operation to get ttl
nodetool: add suboperations support
nodetool: change operations_with_func type
nodetool: prepare operation related classes for suboperations
A dialect is a different way to interpret the same CQL statement.
Examples:
- how duplicate bind variable names are handled (later in this series)
- whether `column = NULL` in LWT can return true (as is now) or
whether it always returns NULL (as in SQL)
Currently, dialect is an empty structure and will be filled in later.
It is passed to query_processor methods that also accept a CQL string,
and from there to the parser. It is part of the prepared statement cache
key, so that if the dialect is changed online, previous parses of the
statement are ignored and the statement is prepared again.
The patch is careful to pick up the dialect at the entry point (e.g.
CQL protocol server) so that the dialect doesn't change while a statement
is parsed, prepared, and cached.
~~~
generic_server: convert connection tracking to seastar::gate
If we call server::stop() right after "server" construction, it hangs:
With the server never listening (never accepting connections and never
serving connections), nothing ever calls server::maybe_stop().
Consequently,
co_await _all_connections_stopped.get_future();
at the end of server::stop() deadlocks.
Such a server::stop() call does occur in controller::do_start_server()
[transport/controller.cc], when
- cserver->start() (sharded<cql_server>::start()) constructs a
"server"-derived object,
- start_listening_on_tcp_sockets() throws an exception before reaching
listen_on_all_shards() (for example because it fails to set up client
encryption -- certificate file is inaccessible etc.),
- the "deferred_action"
cserver->stop().get();
is invoked during cleanup.
(The cserver->stop() call exposing the connection tracking problem dates
back to commit ae4d5a60ca ("transport::controller: Shut down distributed
object on startup exception", 2020-11-25), and it's been triggerable
through the above code path since commit 6b178f9a4a
("transport/controller: split configuring sockets into separate
functions", 2024-02-05).)
Tracking live connections and connection acceptances seems like a good fit
for "seastar::gate", so rewrite the tracking with that. "seastar::gate"
can be closed (and the returned future can be waited for) without anyone
ever having entered the gate.
NOTE: this change makes it quite clear that neither server::stop() nor
server::shutdown() must be called multiple times. The permitted sequences
are:
- server::shutdown() + server::stop()
- or just server::stop().
Fixes#10305
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
~~~
Fixes#10305.
I think we might want to backport this -- it fixes a hang-on-misconfiguration which affects `scylla-6.1.0-0.20240804.abbf0b24a60c.x86_64` minimally. Basically every release that contains commit ae4d5a60ca has a theoretical chance for the hang, and every release that contains commit 6b178f9a4a has a practical chance for the hang.
Focusing on the more practical symptom (i.e., releases containing commit 6b178f9a4a), `git tag --contains 6b178f9a4a90` gives us (ignoring candidates and release candidates):
- scylla-6.0.0
- scylla-6.0.1
- scylla-6.0.2
- scylla-6.1.0
Closesscylladb/scylladb#20212
* github.com:scylladb/scylladb:
generic_server: make server::stop() idempotent
generic_server: coroutinize server::shutdown()
generic_server: make server::shutdown() idempotent
test/generic_server: add test case
configure, cmake: sort the lists of boost unit tests
generic_server: convert connection tracking to seastar::gate
Handed over from https://github.com/scylladb/scylladb/pull/20149
This adds minimal implementation of the start-restore API call.
The method starts a task that runs load-and-stream functionality against sstables from S3 bucket. Arguments are:
```
endpoint -- the ID in object_store.yaml config file
bucket -- the target bucket to get objects from
keyspace -- the keyspace to work on
table -- the table to work on
snapshot -- the name of the snapshot from which the backup was taken
```
The task runs in the background, its task_id is returned from the method once it's spawned and it should be used via /task_manager API to track the task execution and completion.
Remote sstables components are scanned as if they were placed in local upload/ directory. Then colelcted sstables are fed into load-and-stream.
This branch has https://github.com/scylladb/scylladb/pull/19890 (Integrated backup), https://github.com/scylladb/scylladb/pull/20120 (S3 lister) and few more minor PRs merged in. The restore branch itself starts with [utils: Introduce abstract (directory) lister](29c867b54d) commit.
refs: https://github.com/scylladb/scylladb/issues/18392Closesscylladb/scylladb#20305
* github.com:scylladb/scylladb:
tools/scylla-nodetool: add restore integration
test/object_store: Add simple restore test
test/object_store: Generalize prepare_snapshot_for_backup()
code: Introduce restore API method
sstable_loader: Add sstables::storage_manager dependency
sstable_loader: Maintain task manager module
sstable_loader: Out-line constructor
distributed_loader: Split get_sstables_from_upload_dir()
sstables/storage: Compose uploaded sstable path simpler
sstable_directory: Prepare FS lister to scan files on S3
sstable_directory: Parse sstable component without full path
s3-client: Add support for lister::filter
utils: Introduce abstract (directory) lister
We revive the `join_ring` option. We support it only in the
Raft-based topology, as we plan to remove the gossip-based topology
when we fix the last blocker - the implementation of the manual
recovery tool. In the Raft-based topology, a node can be assigned
tokens only once when it joins the cluster. Hence, we disallow
joining the ring later, which is possible in Cassandra.
The main idea behind the solution is simple. We make the unsupported
special case of zero tokens a supported normal case. Nodes with zero
tokens assigned are called "zero-token nodes" from now on.
From the topology point of view, zero-token nodes are the same as
token-owning nodes. They can be in the same states, etc. From the
data point of view, they are different. They are not members of
the token ring, so they are not present in
`token_metadata::_normal_token_owners`. Hence, they are ignored in
all non-local replication strategies. The tablet load balancer also
ignores them.
Zero-token nodes can be used as coordinator-only nodes, just like in
Cassandra. They can handle requests just like token-owning nodes.
The main motivation behind zero-token nodes is that they can prevent
the Raft majority loss efficiently. Zero-token nodes are group 0
voters, but they can run on much weaker and cheaper machines because
they do not replicate data and handle client requests by default
(drivers ignore them). For example, if there are two DCs, one with 4
nodes and one with 5 nodes, if we add a DC with 2 zero-token nodes,
every DC will contain less than half of the nodes, so we won't lose
the majority when any DC dies.
Another way of preventing the Raft majority loss is changing the
voter set, which is tracked by scylladb/scylladb#18793. That approach
can be used together with zero-token nodes. In the example above, if
we choose equal numbers of voters in both DCs, then a DC with one
zero-token node will be sufficient. However, in the typical setup of
2 DCs with the same number of nodes it is enough to add a DC with
only one zero-token node without changing the voter set.
Zero-token nodes could also be used as load balancers in the
Alternator.
Additionally, this PR fixesscylladb/scylladb#11087, which turned out to
be a blocker.
This PR introduced a new feature. There is no need to backport it.
Fixesscylladb/scylladb#6527Fixesscylladb/scylladb#11087Fixesscylladb/scylladb#15360Closesscylladb/scylladb#19684
* github.com:scylladb/scylladb:
docs: raft: document using zero-token nodes to prevent majority loss
test: test recovery mode in the presence of zero-token nodes
test: topology: util.py: add cqls parameter to check_system_topology_and_cdc_generations_v3_consistency
test: topology: util.py: accept zero tokens in check_system_topology_and_cdc_generations_v3_consistency
treewide: support zero-token nodes in the recovery mode
storage_proxy: make TRUNCATE work locally for local tables
test: topology: util.py: document that check_token_ring_and_group0_consistency fails with zero-token nodes
test: test zero-token nodes
test: test_topology_ops: move helpers to topology/util.py
feature_service: introduce the ZERO_TOKEN_NODES feature
storage_service: rename join_token_ring to join_topology
storage_service: raft_topology_cmd_handler: improve warnings
topology_coordinator: fix indentation after the previous patch
treewide: introduce support for zero-token nodes in Raft topology
system_keyspace: load_topology_state: remove assertion impossible to hit
treewide: distinguish all nodes from all token owners
gossip topology: make a replacing node remove the replaced node from topology
locator: topology: add_or_update_endpoint: use none as the default node state
test: boost: tablets tests: ensure all nodes are normal token owners
token_metadata: rename get_all_endpoints and get_all_ips
network_topology_strategy: reallocate_tablets: remove unused dc_rack_nodes
virtual_tables: cluster_status_table: execute: set dc regardless of the token ownership
When only inter dc encryption is enabled a non encrypted connection
between two nodes is allowed only if both nodes are in the same dc.
If a nodes that initiates the connection knows that dst is in the same
dc and hence use non encrypted connection, but the dst not yet knows the
topology of the src such connection will not be allowed since dst cannot
guaranty that dst is in the same dc.
Currently, when topology coordinator is used, a replacing node will
appear in the coordinator's topology immediately after it is added to the
group0. The coordinator will try to send raft message to the new node
and (assuming only inter dc encryption is enabled and replacing node and
the coordinator are in the same dc) it will try to open regular, non encrypted,
connection to it. But the replacing node will not have the coordinator
in it's topology yet (it needs to sync the raft state for that). so it
will reject such connection.
To solve the problem the patch does not add a replacing node that was
just added to group0 to the topology. It will be added later, when
tokens will be assigned to it. At this point a replacing node will
already make sure that its topology state is up-to-date (since it will
execute a raft barrier in join_node_response_params handler) and it knows
coordinator's topology. This aligns replace behaviour with bootstrap
since bootstrap also does not add a node without a ring to the topology.
The patch effectively reverts b8ee8911caFixes: scylladb/scylladb#19025