The test is booting nodes, and then immediately starts shutting down
nodes and removing them from the cluster. The shutting down and
removing may happen before driver manages to connect to all nodes in the
cluster. In particular, the driver didn't yet connect to the last
bootstrapped node. Or it can even happen that the driver has connected,
but the control connection is established to the first node, and the
driver fetched topology from the first node when the first node didn't
yet consider the last node to be normal. So the driver decides to close
connection to the last node like this:
```
22:34:03.159 DEBUG> [control connection] Removing host not found in
peers metadata: <Host: 127.42.90.14:9042 datacenter1>
```
Eventually, at the end of the test, only the last node remains, all
other nodes have been removed or stopped. But the driver does not have a
connection to that last node.
Fix this problem by ensuring that:
- all nodes see each other as NORMAL,
- the driver has connected to all nodes
at the beginning of the test, before we start shutting down and removing
nodes.
Fixesscylladb/scylladb#16373Closesscylladb/scylladb#17676
The following incompatibilities were identified by `listsnapshots_test.py` in dtests:
* Command doesn't bail out when there are no snapshots, instead it prints meaningless empty report
* Formatting is incompatible
Both are fixed in this mini-series.
Closesscylladb/scylladb#17541
* github.com:scylladb/scylladb:
tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
tools/scylla-nodetool: listsnapshots: bail out if there are no snapshots
It might happen that multiple tablets co-habit the same shard, so we want load-and-stream to jump into a new streaming session for every tablet, such that the receiver will have the data properly segregated. That's a similar treatment we gave to repair. Today, load-and-stream fails due to sstables spanning more than 1 tablet in the receiver.
Synchronization with migration is done by taking replication map, so migrations cannot advance while streaming new data. A bug was fixed too, where data must be streamed to pending replicas too, to handle case where migration is ongoing and new data must reach both old and new replica set. A test was added stressing this synchronization path.
Another bug was fixed in sstable loading, which expected sharder to not be invalidated throughout the operation, but that breaks during migrations.
Fixes#17315.
Closesscylladb/scylladb#17449
* github.com:scylladb/scylladb:
test: test_tablets: Add load-and-stream test
sstables_loader: Stream to pending tablet replica if needed
sstables_loader: Implement tablet based load-and-stream
sstables_loader: Virtualize sstable_streamer for tablet
sstables_loader: Avoid reallocations in vector
sstable_loader: Decouple sstable streaming from selection
sstables_loader: Introduce sstable_streamer
Fix online SSTable loading with concurrent tablet migration
When no keyspace is provided, request all keyspaces from the server,
then scrub all of them. This is what the legacy nodetool does, for some
reason this was missed when re-implementing scrub.
Closesscylladb/scylladb#17495
There are 4 barrier-only stages when migrating a tablet and the test needs to fail pending/leaving replica that handles it in order to validate how coordinator handles dead node. Failing the barrier is done by suspending it with injection code and stopping the node without waking it up. The main difficulty here is how to tell one barrier RPC call from another, because they don't have anything onboard that could tell which stage the barrier is run for. This PR suggests that barrier injection code looks directly into the system.tablets table for the transition stage, the stage is already there by the time barrier is about to ack itself over RPC.
refs: #16527Closesscylladb/scylladb#17450
* github.com:scylladb/scylladb:
topology.tablets_migration: Handle failed use_new
topology.tablets_migration: Handle failed write_both_read_new
topology.tablets_migration: Handle failed write_both_read_old
topology.tablets_migration: Handle failed allow_write_both_read_old
test/tablets_migration: Add conditional break-point into barrier handler
replica: Add helper to read tablet transition stage
topology_coordinator: Add action_failed() helper
The author (me) tried to be clever and fix the formatting, but then he
realized this just means a lot of unnecessary fighting with tests. So
this patch makes the formatting compatible with that of the legacy
nodetool:
* Use compatible rounding and precision formatting
* Use incorrect unit (KB instead of KiB)
* Align numbers to the left
* Add trailing white-space to "Snapshot Details: "
These two parameters are not used by the native nodetool, because
ScyllaDB itself doesn't support them. These should be just ignored and
indeed there was a unit test checking that this is the case. However,
due to a mistake in the unit test, this was not actually tested and
nodetool complained when seeing these params.
This patch fixes both the test and the native nodetool.
Closesscylladb/scylladb#17477
key_view::explode() contains a blatant use-after-free:
unless the input is already linearized, it returns a view to a local temporary buffer.
This is rare, because partition keys are usually not large enough to be fragmented.
But for a sufficiently large key, this bug causes a corrupted partition_key down
the line.
Fixes#17625Closesscylladb/scylladb#17626
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* tablet_id
* tablet_replica
* tablet_metadata
* tablet_map
their operator<<:s are dropped
Refs scylladb/scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17504
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* position_range
* mutation_fragment
* range_tombstone_stream
* mutation_fragment_v2::printer
Refs #13245Closesscylladb/scylladb#17521
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for position_range
mutation: add fmt::formatter for mutation_fragment and range_tombstone_stream
mutation: add fmt::formatter for mutation_fragment_v2::printer
This stage doesn't need any special treatment, because we cannot revert
to old replicas and should proceed normally. The barrier itself won't
get stuck, because it already handles excluded/ignored nodes.
Just make the test validate it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Two options here -- go revert to old replicas by jumping into
cleanup_target stage or proceed noramlly. The choice depends on which
replica set has less number of dead nodes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
At this stage it can happen that target replica got some writes, so its
tablet needs to be cleaned up, so jump to cleanup_target stage.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are several transition stages that are executed by the topology
coordinator with the help of barrier-and-drain raft commands. For the
test to stop and remove a node while handling this stage it must inject
a break-point into barrier handler, wait for it to happen and then stop
the node without resuming the break-point. Then removenode from the
cluster.
The break-point suspends barrier handling when a specific tablet is in
specific transition stage. Tablet ID and desired stage are configured
via injector parameters.
With today's error-injection facilities the way to suspend code
execution is with injecting a lambda that waits for a message from the
injection engine.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch series makes all auth writes serialized via raft. Reads stay
eventually consistent for performance reasons. To make transition to new
code easier data is stored in a newly created keyspace: system_auth_v2.
Internally the difference is that instead of executing CQL directly for
writes we generate mutations and then announce them via raft group0. Per
commit descriptions provide more implementation details.
Refs https://github.com/scylladb/scylladb/issues/16970
Fixes https://github.com/scylladb/scylladb/issues/11157Closesscylladb/scylladb#16578
* github.com:scylladb/scylladb:
test: extend auth-v2 migration test to catch stale static
test: add auth-v2 migration test
test: add auth-v2 snapshot transfer test
test: auth: add tests for lost quorum and command splitting
test: pylib: disconnect driver before re-connection
test: adjust tests for auth-v2
auth: implement auth-v2 migration
auth: remove static from queries on auth-v2 path
auth: coroutinize functions in password_authenticator
auth: coroutinize functions in standard_role_manager
auth: coroutinize functions in default_authorizer
storage_service: add support for auth-v2 raft snapshots
storage_service: extract getting mutations in raft snapshot to a common function
auth: service: capture string_view by value
alternator: add support for auth-v2
auth: add auth-v2 write paths
auth: add raft_group0_client as dependency
cql3: auth: add a way to create mutations without executing
cql3: run auth DML writes on shard 0 and with raft guard
service: don't loose service_level_controller when bouncing client_state
auth: put system_auth and users consts in legacy namespace
cql3: parametrize keyspace name in auth related statements
auth: parametrize keyspace name in roles metadata helpers
auth: parametrize keyspace name in password_authenticator
auth: parametrize keyspace name in standard_role_manager
auth: remove redundant consts auth::meta::*::qualified_name
auth: parametrize keyspace name in default_authorizer
db: make all system_auth_v2 tables use schema commitlog
db: add system_auth_v2 tables
db: add system_auth_v2 keyspace
When a tool application is invoked with an unknown operation, an error
message is printed, which includes all the known operations, with all
their aliases. This is collected in `std::vector<std::string_view>`. The
problem is that the vector containing alias names, is returned as a
value, so the code ends up creating views to temporaries.
Fix this by returning alias vector with const&.
Fixes: #17584Closesscylladb/scylladb#17586
before this change, we failed to apply the filtering of tablestats
command in the right way:
1. `table_filter` failed to check if delimiter is npos before
extract the cf component from the specified table name.
2. the stats should not included the keyspace which are not
included by the filter.
3. the total number of tables in the stats report should contain
all tables no matter they are filtered or not.
in this change, all the problems above are addressed. and the tests
are updated to cover these use cases.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17468
For tables using tablet based replication strategies, the sstables should be reshaped only within the compaction groups they belong to. The shard_reshaping_compaction_task_impl now groups the sstables based on their compaction groups before reshaping them.
Fixes https://github.com/scylladb/scylladb/issues/16966Closesscylladb/scylladb#17395
* github.com:scylladb/scylladb:
test/topology_custom: add testcase to verify reshape with tablets
test/pylib/rest_client: add get_sstable_info, enable/disable_autocompaction
replica/distributed_loader: enable reshape for sstables
compaction: reshape sstables within compaction groups
replica/table : add method to get compaction group id for an sstable
compaction: reshape: update total reshaped size only on success
compaction: simplify exception handling in shard_reshaping_compaction_task_impl::run
- use API endpoint of /storage_service/toppartition/
- only print out the specified samplings.
- print "\n" separator between samplings
Closesscylladb/scylladb#17574
* github.com:scylladb/scylladb:
tools/scylla-nodetool: print separator between samplings
tools/scylla-nodetool: only print the specified sampling
tools/scylla-nodetool: use /storage_service/toppartition/
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* std::vector<data_type>
* column_identifier
* column_identifier_raw
* untyped_constant::type_class
and drop their operator<<:s
Refs #13245Closesscylladb/scylladb#17538
* github.com:scylladb/scylladb:
cql3: add fmt::formatter for expression::printer
cql3: add fmt::formatter for raw_value{,_view}
cql3: add fmt::formatter for std::vector<data_type>
cql3: add fmt::formatter for untyped_constant::type_class
cql3: add fmt::formatter for column_identifier{,_row}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* raw_value
* raw_value_view
`raw_value_view` 's operator<< is still being used by the generic
homebrew printer for vector<>, so it is preserved.
`raw_value` 's operator<< is still being used by the generic
homebrew printer for optional<>, so it's preserved as well.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.
Since scylladb/scylladb#15924 is the last issue mentioned in
scylladb/scylladb#15962, this PR also reenables background
writes in `test_topology_ops` with tablets disabled. The test
doesn't pass with tablets and background writes because of
scylladb/scylladb#17025. We will reenable background writes
with tablets after fixing that issue.
Fixesscylladb/scylladb#15924Fixesscylladb/scylladb#15962Closesscylladb/scylladb#17585
* github.com:scylladb/scylladb:
test: test_topology_ops: reenable background writes without tablets
test: test_topology_ops: run with and without tablets
test: topology: decrease the server's request timeouts
Tests that verify upgrading to the raft-based topology
(`test_topology_upgrade`, `test_topology_recovery_basic`,
`test_topology_recovery_majority_loss`) have flaky
`check_system_topology_and_cdc_generations_v3_consistency` calls.
`assert topo_results[0] == topo_res` can fail because of different
`unpublished_cdc_generations` on different nodes.
The upgrade procedure creates a new CDC generation, which is later
published by the CDC generation publisher. However, this can happen
after the upgrade procedure finishes. In tests, if publishing
happens just before querying `system.topology` in
`check_system_topology_and_cdc_generations_v3_consistency`, we can
observe different `unpublished_cdc_generations` on different nodes.
It is an expected and temporary inconsistency.
For the same reasons,
`check_system_topology_and_cdc_generations_v3_consistency` can
fail after adding a new node.
To make the tests not flaky, we wait until the CDC generation
publisher finishes its job. Then, all nodes should always have
equal (and empty) `unpublished_cdc_generations`.
Fixesscylladb/scylladb#17587Fixesscylladb/scylladb#17600Fixesscylladb/scylladb#17621Closesscylladb/scylladb#17622
With auth-v2 we can login even if quorum is lost. So test
which checks if error occurs in such situation is deleted
and the opposite test which checks if logging in works was
added.
Alternator doesn't do any writes to auth
tables so it's simply change of keyspace
name.
Docs will be updated later, when auth-v2
is enabled as default.
After fixing scylladb/scylladb#15924 in one of the previous
patches, we reenable background writes in `test_topology_ops`.
We also start background writes a bit later after adding all nodes.
Without this change and with tablets, the test fails with:
```
> await cql.run_async(f"CREATE TABLE tbl (pk int PRIMARY KEY, v int)")
E cassandra.protocol.ConfigurationException: <Error from server: code=2300
[Query invalid because of configuration issue] message="Datacenter
datacenter1 doesn't have enough nodes for replication_factor=3">
```
The change above makes the test a bit weaker, but we don't have to
worry about it. If adding nodes is bugged, other tests should
detect it.
Unfortunately, the test still doesn't pass with tablets and
background writes because of scylladb/scylladb#17025, so we keep
background writes disabled with tablets and leave FIXME.
Fixesscylladb/scylladb#15962
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.
A high server's request timeout can slow down the topology tests
(see the new comment in `make_scylla_conf`). We make the timeout
dependent on the testing mode to not slow down tests for no reason.
We don't touch the driver's request timeout. Decreasing it in some
modes would require too much effort for almost no improvement.
Fixesscylladb/scylladb#15924
Calling notify_left for old ip on topology change in raft mode
was a regression. In gossiper mode it didn't occur. In gossiper
mode the function handle_state_normal was responsible for spotting
IP addresses that weren't managing any parts of the data, and
it would then initiate their removal by calling remove_endpoint.
This removal process did not include calling notify_left.
Actually, notify_left was only supposed to be called (via excise) by
a 'real' removal procedures - removenode and decommission.
The redundant notify_left caused troubles in scylla python driver.
The driver could receive REMOVED_NODE and NEW_NODE notifications
in the same time and their handling routines could race with each other.
In this commit we fix the problem by not calling notify_left if
the remove_ip lambda was called from the ip change code path.
Also, we add a test which verifies that the driver log doesn't
mention the REMOVED_NODE notification.
Fixesscylladb/scylladb#17444Closesscylladb/scylladb#17561
before this change, we print all samplings returned by the API,
but this is not what cassandra nodetool's behavior, which only
prints out the specified one. and the toppartitions_test.py
in dtest actually expects that the number of sampling should
match with the one specified with command line.
so, in this change, we only print out the specified samplings.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead of using the endpoint of /storage_service/toppartition,
use /storage_service/toppartition/. otherwise API server refuses
to return the expected result. as it does match with any API endpoint.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This test needed a lot of data to ensure multiple pages when doing the read repair. This change two key configuration items, allowing for a drastic reduction of the data size and consequently a large reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa, reducing this to a too small value would start killing internal queries. Now, after said commit, this is no longer a concern, as this limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.
The latter configuration is a new one, added by the first patches of this series. It allows configuring the page-size in bytes, after which pages are cut. Previously this was a hard-coded constant: 1MB. This forced any tests which wanted to check paging, with pages cut on size, to work with large datasets. This was especially pronounced in the tests fixed in this PR, because this test works with tombstones which are tiny and a lot of them were needed to trigger paging based on the size.
With this two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8
The runtime of the test consequently drops from 62 seconds to 13.5 seconds (dev mode, on my build machine).
Fixes: https://github.com/scylladb/scylladb/issues/15425
Fixes: https://github.com/scylladb/scylladb/issues/16899Closesscylladb/scylladb#17529
* github.com:scylladb/scylladb:
test/topology_custom: test_read_repair.py: reduce run-time
replica/database: get_query_max_result_size(): use query_page_size_in_bytes
replica/database: use include page-size in max-result-size
query-request: max_result_size: add without_page_limit()
db/config: introduce query_page_size_in_bytes
Fix test_storage_service.py to work with tablets.
- test_describe_ring was failing because in storage_service/describe_ring
table must be specified for keyspaces with tablets.
Do not check the status if tablets are enabled. Add checks for
specified table;
- test_storage_service_keyspace_cleanup_with_no_owned_ranges
was failing because cleanup is disabled on keyspaces with tablets.
Use test_keyspace_vnodes fixture to use keyspace with tablet disabled;
- test_storage_service_get_natural_endpoints required
some minor type-related fixes.
Fix test_compaction_task.py to work with tablets.
Currently test fail because cleanup on keyspace with tablets is
disabled, and reshape and reshard of keyspace with tablets uses
load_and_stream which isn't covered by tasks.
Use test_keyspace_vnodes for these tests to have a keyspace with
tablets disabled.