Commit Graph

6493 Commits

Author SHA1 Message Date
Botond Dénes
be5a18c07d tools/scylla-nodetool: repair: set the jobThreads request parameter
Although ScyllaDB ignores this request parameter, the Java nodetool
sets it, so it is better to have the native one do the same for
symmetry. It makes testing easier.
Discovered with the stricter request matching introduced in the next
patches.
2024-03-14 03:26:13 -04:00
Avi Kivity
4db4b2279c Merge 'tools/scylla-nodetool: implement the last batch of commands' from Botond Dénes
This PR implements the following new nodetool commands:
* netstats
* tablehistograms/cfhistograms
* proxyhistograms

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#17651

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the proxyhistograms command
  tools/scylla-nodetool: implement the tableshistograms command
  tools/scylla-nodetool: introduce buffer_samples
  utils/estimated_histogram: estimated_histogram: add constructor taking buckets
  tools/scylla-nodetool: implement the netstats command
  tools/scylla-nodetool: add correct units to file_size_printer
2024-03-13 12:46:11 +02:00
Marcin Maliszkiewicz
7b60752e47 test: fix cql connection problem in test_auth_raft_command_split
This is a speculative fix as the problem is observed only on CI.
When run_async is called right after driver_connect and get_cql
it fails with ConnectionException('Host has been marked down or
removed').

If the approach proves to be successful, we can start to deprecate
the base get_cql in favor of get_ready_cql. It's better to have robust
testing helper libraries than to work around the problem in every test
case separately.

Fixes #17713

Closes scylladb/scylladb#17772
2024-03-13 10:36:51 +01:00
Pavel Emelyanov
2e982df898 test/tablets: Generalize repair history loading
Two repair test cases verify that repair generated enough rows in the
history table. Both use identical code for that, so it is worth generalizing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17761
2024-03-13 10:22:57 +02:00
Kefu Chai
fb4f48b4ed schema: add fmt::formatter for schema
before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* column_definition
* column_mapping
* ordinal_column_id
* raw_view_info
* schema
* view_ptr

their operator<<:s are dropped. but operator<< for schema is preserved,
as we are still printing `seastar::lw_shared_ptr<const schema>` with
our homebrew generic formatter for `seastar::lw_shared_ptr<>`, which
uses operator<< to print the pointee.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17768
2024-03-13 09:29:00 +02:00
Pavel Emelyanov
d90db016bf treewide: Use partition_slice::is_reversed()
Continuation of cc56a971e8; more such places were detected.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17763
2024-03-13 08:52:46 +02:00
Botond Dénes
a329cc34b7 tools/scylla-nodetool: implement the proxyhistograms command 2024-03-13 02:06:30 -04:00
Botond Dénes
a52eddc9c1 tools/scylla-nodetool: implement the tableshistograms command 2024-03-13 02:06:30 -04:00
Botond Dénes
006bc84761 tools/scylla-nodetool: implement the netstats command 2024-03-13 02:06:10 -04:00
Avi Kivity
f410038296 Merge 'Use do_with_cql_env_thread() helper in storage proxy test' from Pavel Emelyanov
Just a cleanup -- replace do_with_cql_env + async with do_with_cql_env_thread

Closes scylladb/scylladb#17758

* github.com:scylladb/scylladb:
  test/storage_proxy: Restore indentation after previous patch
  test/storage_proxy: Use do_with_cql_env_thread()
2024-03-12 20:23:40 +02:00
Pavel Emelyanov
34477ad98e test/storage_proxy: Restore indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-12 19:10:44 +03:00
Pavel Emelyanov
fd112446c2 test/storage_proxy: Use do_with_cql_env_thread()
One of the test cases explicitly wraps itself into async, but there's a
convenience helper for that already.

Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-12 19:10:33 +03:00
Pavel Emelyanov
a755914265 test/cql_query_test: Use string_view by value
The test carries const std::string_view& around, but the type is a
lightweight class that can be copied at the same cost as a
reference to it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17735
2024-03-12 13:44:04 +02:00
Botond Dénes
f3735dc8e0 Merge 'utils: add fmt::formatter for utils types' from Kefu Chai
before this change, we relied on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* utils::human_readable_value
* std::strong_ordering
* std::weak_ordering
* std::partial_ordering
* utils::exception_container

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#17710

* github.com:scylladb/scylladb:
  utils/exception_container: add fmt::formatter for exception_container
  utils/human_readable: add fmt::formatter for human_readable_value
  utils: add fmt::formatter for std::strong_ordering and friends
2024-03-12 13:27:37 +02:00
Botond Dénes
3a7364525f Merge 'test/alternator: improve metrics tests' from Nadav Har'El
This small series improves the Alternator tests for metrics:
1. Improves some comments in the test.
2. Restores a test that was previously hidden by two tests having the same name.
3. Adds tests for latency histogram metrics.

Closes scylladb/scylladb#17623

* github.com:scylladb/scylladb:
  test/alternator: tests for latency metrics
  test/alternator: improve comments and unhide hidden test
2024-03-12 09:13:17 +02:00
Kefu Chai
007d7f1355 utils: add fmt::formatter for std::strong_ordering and friends
before this change, we relied on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* std::strong_ordering
* std::weak_ordering
* std::partial_ordering

and their operator<<:s are moved to test/lib/test_utils.{hh,cc}, as they
are only used by Boost.Test.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-12 14:53:55 +08:00
Tomasz Grabiec
47a66d0150 Merge 'Handle tablet migration failure in wrapping-up stages' from Pavel Emelyanov
There are four stages left to handle: cleanup, cleanup_target, end_migration and revert_migration. All of them already handle removed nodes, so the PR just extends the test.

fixes: #16527

Closes scylladb/scylladb#17684

* github.com:scylladb/scylladb:
  test/tablets_migration: Test revert_migration failure handling
  test/tablets_migration: Test end_migration failure handling
  test/tablets_migration: Test cleanup_target failure handling
  test/tablets_migration: Test cleanup failure handling
  test/tablets_migration: Prepare for do_... stages
  test/tablets_migration: Add ability to removenode via any other node
  test/tablets_migration: Wrap migration stages failing code into a helper class
  storage_service: Add failure injection to crash cleanup_tablet
2024-03-12 00:20:56 +01:00
Asias He
ebc0ab94e5 repair: Add ranges option support for tablet repair
The management tool, e.g., scylla manager, needs the ranges option to
select which ranges to repair on a node to schedule repair jobs.

This patch adds ranges option support.

E.g.,

curl -X POST "http://127.0.0.1:10000/storage_service/repair_async/ks1?ranges=-4611686018427387905:-1,4611686018427387903:9223372036854775807"
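The `ranges` value is a comma-separated list of `start:end` token pairs. As a rough illustration only, a client-side tool might validate such a string before sending it with helpers like the ones below; `parse_ranges` and `format_ranges` are hypothetical names, and the bounds check assumes signed 64-bit (Murmur3) tokens:

```python
TOKEN_MIN = -(2**63)  # assumption: Murmur3 tokens are signed 64-bit integers
TOKEN_MAX = 2**63 - 1


def parse_ranges(value: str) -> list:
    """Parse "start:end,start:end" into a list of (start, end) int pairs."""
    ranges = []
    for item in value.split(","):
        # partition() splits on the first ':' only, so a leading '-' sign is kept
        start, sep, end = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in range {item!r}")
        pair = (int(start), int(end))
        for token in pair:
            if not TOKEN_MIN <= token <= TOKEN_MAX:
                raise ValueError(f"token {token} out of range")
        ranges.append(pair)
    return ranges


def format_ranges(ranges) -> str:
    """Inverse of parse_ranges(): render pairs back into the query-string form."""
    return ",".join(f"{start}:{end}" for start, end in ranges)
```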

Fixes: #17416
Tests: test_tablet_repair_ranges_selection

Closes scylladb/scylladb#17436
2024-03-11 20:03:12 +02:00
Nadav Har'El
d207962e40 test/alternator: tests for latency metrics
In test/alternator/test_metrics.py we had tests for the operation-count
metrics for different Alternator API operations, but not for the latency
histograms for these same operations. So this patch adds the missing
tests (and removes a TODO asking to do that).

Note that only a subset of the operations - PutItem, GetItem, DeleteItem,
UpdateItem, and GetRecords - currently have a latency histogram, and this
test verifies this. We have an issue (Refs #17616) about adding latency
histograms for more operations - at which point we will be able to expand
this test for the additional operations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-11 19:26:59 +02:00
Nadav Har'El
970c2dc7a6 test/alternator: improve comments and unhide hidden test
The original goal of this patch was to improve comments in
test/alternator/test_metrics.py, but while doing that I discovered
that one of the test functions was hidden by a second test with
the same name! So this patch also renames the second test.

The test continues to work after this patch - the hidden test
was successful.
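Why a test can be hidden this way: in Python, a second module-level `def` with the same name simply rebinds that name, so by the time pytest collects the module only the second function object exists. A self-contained demonstration (the function names are made up; pyflakes/flake8 reports this pattern as F811, "redefinition of unused name"):

```python
import types

SOURCE = '''
def test_latency():
    return "first"

def test_latency():
    return "second"
'''

# Execute the source as if it were a test module.
module = types.ModuleType("example_test_module")
exec(SOURCE, module.__dict__)

# pytest discovers tests from module attributes: only one test_latency
# remains, and it is the second definition.
test_names = [name for name in vars(module) if name.startswith("test_")]
```

Renaming one of the two functions, as this patch does, makes both visible again.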

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-11 19:26:59 +02:00
Botond Dénes
7d31093d4b Merge 'storage_service/ownership: handle requests when tablets are enabled' from Patryk Wróbel
Before this change, when a user tried to use the
'storage_service/ownership/{keyspace}' API with a
keyspace that uses tablets, an internal
error was thrown, because the code called
get_vnode_effective_replication_map(), a function intended only for vnodes.

This commit introduces graceful handling of this scenario and
extends the API to accept a 'cf' parameter that denotes the
table name.

Now, when a keyspace uses tablets and the cf parameter is not passed,
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for a keyspace that uses tablets,
but they can query ownership for a table in such a keyspace.

Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.
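For illustration, a client could build the per-table ownership URL as sketched below. The port 10000 is an assumption borrowed from the curl example for the repair ranges option; the helper name is hypothetical, and the sketch only builds the URL (it assumes nothing about the HTTP method or the response format):

```python
from typing import Optional
from urllib.parse import quote, urlencode


def ownership_url(host: str, keyspace: str, table: Optional[str] = None,
                  port: int = 10000) -> str:
    """Build a storage_service/ownership/{keyspace} URL (hypothetical helper).

    For a keyspace that uses tablets, the table must be passed as `cf`;
    without it the server answers with BAD_REQUEST.
    """
    url = f"http://{host}:{port}/storage_service/ownership/{quote(keyspace, safe='')}"
    if table is not None:
        url += "?" + urlencode({"cf": table})
    return url
```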

Fixes: https://github.com/scylladb/scylladb/issues/17342

Closes scylladb/scylladb#17405

* github.com:scylladb/scylladb:
  storage_service/ownership: discard get_ownership() requests when tablets enabled
  storage_service/ownership/{keyspace}: handle requests when tablets are enabled
  locator/effective_replication_map: make 'get_ranges(inet_address ep)' virtual
  locator/tablets: add tablet_map::get_sorted_tokens()
  pylib/rest_client.py: add ownership API to ScyllaRESTAPIClient
  rest_api/test_storage_service: add simplistic tests of ownership API for vnodes
2024-03-11 14:55:26 +02:00
Kefu Chai
1ab30fc306 clustering_bounds_comparator: add fmt::formtter for bound_{kind,view}
before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `bound_kind` and `bound_view`,
and drop the latter's operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17706
2024-03-11 11:37:48 +02:00
Patryk Wrobel
9eb91b5526 storage_service/ownership: discard get_ownership() requests when tablets enabled
This change introduces logic that checks
whether tablets are enabled for any
keyspace when get_ownership() is invoked.

Without it, the result would be calculated
based solely on sorted_tokens(), which is
invalid when tablets are in use.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:52:25 +01:00
Patryk Wrobel
51da80da7d storage_service/ownership/{keyspace}: handle requests when tablets are enabled
Before this change, when a user tried to use the
'storage_service/ownership/{keyspace}' API with a
keyspace that uses tablets, an internal
error was thrown, because the code called
get_vnode_effective_replication_map(), a function intended only for vnodes.

This commit introduces graceful handling of this scenario and
extends the API to accept a 'cf' parameter that denotes the
table name.

Now, when a keyspace uses tablets and the cf parameter is not passed,
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for a keyspace that uses tablets,
but they can query ownership for a table in such a keyspace.

Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:52:23 +01:00
Patryk Wrobel
a39a5b671e pylib/rest_client.py: add ownership API to ScyllaRESTAPIClient
This change adds a member function that can be used
to access 'storage_service/ownership' API.

It will be used by tests that need to access this API.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:50:20 +01:00
Patryk Wrobel
dea76c4763 rest_api/test_storage_service: add simplistic tests of ownership API for vnodes
This change introduces tests for vnodes for
the following API paths:
 - 'storage_service/ownership'
 - 'storage_service/ownership/{keyspace}'

In the next patches the logic under test will be adjusted
to work correctly when tablets are enabled. These tests are a safety
net that ensures that the logic is not broken.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:50:20 +01:00
Kefu Chai
38ae52d5cd add fmt::formatter for reader_permit::state and reader_resources
before this change, we relied on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* reader_permit::state
* reader_resources

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17707
2024-03-11 09:55:51 +02:00
Pavel Emelyanov
feae470475 test/tablets_migration: Test revert_migration failure handling
This stage is also the error path that starts from write_both_read_old,
so check this failure in two steps -- first fail the latter stage in one
of the nodes, then fail the former in another.

For that, one more node in the cluster is needed.

Also, to avoid name conflicts, the do_revert_migration pseudo stage name
is used.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 08:16:13 +03:00
Pavel Emelyanov
c3d96b1a86 test/tablets_migration: Test end_migration failure handling
This stage is a pure barrier. Barriers already take ignored nodes into
account, and so does the fail-injector, so just wire the stage name into
the test.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 08:16:13 +03:00
Pavel Emelyanov
180446e7b8 test/tablets_migration: Test cleanup_target failure handling
This stage is an error path, so in order to fail it we need to fail some
other stage prior to it. This leads to the following testing sequence:

1. fail streaming via the source node
2. stop and remove the source node to let the state machine proceed
3. fail cleanup_target on the destination node
4. stop and remove the destination node

The first thing to note here is that the test doesn't fail the source node for
the cleanup_target stage, symmetrically to how it does for the cleanup stage.

Next, since we're removing two nodes, the cluster is equipped with more
nodes to keep raft quorum.

Finally, since removal of the source node doesn't finish until tablet
migration finishes, it's impossible to remove the destination node via the
same node-0, so the 2nd removenode happens via node-3.
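The need for extra nodes follows from Raft's majority rule: a group of `n` voters needs `n // 2 + 1` of them alive to make progress. The sketch below only shows that generic arithmetic; it does not model the test's actual topology, where removenode also shrinks the voter set:

```python
def majority(voters: int) -> int:
    """Smallest number of voters that forms a Raft majority."""
    return voters // 2 + 1


def has_quorum(voters: int, down: int) -> bool:
    """True if the voters that are still alive can form a majority."""
    return voters - down >= majority(voters)


def min_voters(down: int) -> int:
    """Smallest group size that tolerates `down` simultaneously dead voters."""
    voters = 1
    while not has_quorum(voters, down):
        voters += 1
    return voters
```

For example, with two voters dead at the same time, a majority survives only in a group of at least five.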

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 08:16:13 +03:00
Pavel Emelyanov
724c79ecf6 test/tablets_migration: Test cleanup failure handling
The handling itself is already there -- if the leaving node is excluded,
the cleanup stage resolves immediately. So just add code that
validates that.

Also, skip testing of pending replica failure during the cleanup stage, as
it no longer really participates in it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 08:16:13 +03:00
Pavel Emelyanov
ccefb7f21f test/tablets_migration: Prepare for do_... stages
The tablets migration test is parametrized with stage name to inject
failure in. Internal class node_failer uses this parameter as is when
injecting a failure into scylla barrier handler.

The next patch will need to extend the test with the revert_migration value and
add handling of this name to the node_failer class. The node_failer class,
in turn, will want to instantiate two other instances of the same class
-- one to fail the write_both_read_old stage, and the other one to fail
the revert_migration barrier. So internally the class will need to tell
revert_migration used as a full test parameter apart from revert_migration
used as a barrier-only parameter.

This patch adds the ability to add a do_ prefix to the node_failer parameter to
tell a full test from a barrier-only one. When injecting a failure into scylla
the do_ prefix needs to be cut off, since scylla still needs to fail the
barrier named revert_migration, not do_revert_migration.
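A minimal sketch of the prefix convention, assuming the parameter is a plain string; `split_stage` is a hypothetical helper, not the actual node_failer code:

```python
DO_PREFIX = "do_"


def split_stage(param: str) -> tuple:
    """Split a node_failer-style parameter into (barrier_name, is_full_test).

    "do_revert_migration" selects the full revert_migration test case, while
    plain "revert_migration" only fails the barrier. In both cases the name
    injected into scylla is the bare barrier name.
    """
    if param.startswith(DO_PREFIX):
        return param[len(DO_PREFIX):], True
    return param, False
```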

Also split the long line while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 07:56:58 +03:00
Pavel Emelyanov
abbd22cb90 test/tablets_migration: Add ability to removenode via any other node
Currently the test calls removenode via node-0 in the cluster, which is
always alive. The next test case will need to call removenode on some other
node (more details in that patch later).

refs: #17681

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 07:56:55 +03:00
Pavel Emelyanov
5d3291f322 test/tablets_migration: Wrap migration stages failing code into a helper class
One of the next stages will need to use two of them at the same time and
it's going to be easier if the failing code is encapsulated.

No functional changes here; large portions of code and local
variables are just moved into the class and its methods.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 07:56:55 +03:00
Botond Dénes
9f97d21339 Merge 'Enhance perf-simple-query test' from Pavel Emelyanov
While measuring #17149 with this test, some changes were applied; here they are:

- keep initial_tablets number in output json's parameters section
- disable auto compaction
- add control over the amount of sstables generated for --bypass-cache case

Closes scylladb/scylladb#17473

* github.com:scylladb/scylladb:
  perf_simple_query: Add --memtable-partitions option
  perf_simple_query: Disable auto compaction
  perf_simple_query: Keep number of initial tablets in output json
2024-03-08 15:21:04 +02:00
Kefu Chai
079d70145e raft: add fmt::formatter for raft tracker types
before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* raft::election_tracker
* raft::votes
* raft::vote_result

and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17670
2024-03-08 15:19:37 +02:00
Botond Dénes
630be97d2f Merge 'tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"' from Kefu Chai
before this change, the "ring" subcommand had two issues:

1. the `--resolve-ip` option accepts a boolean argument, but this option
   should be a switch, which does not accept any argument at all
2. it always prints the endpoint, whether or not `--resolve-ip` is
   specified, but it should print the resolved name instead
   of an IP address if `--resolve-ip` is specified.

in this change, both issues are addressed. and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.
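scylla-nodetool is written in C++, so the following Python `argparse` snippet is only an analogy for the difference between an option that takes a boolean value and a switch. With `action="store_true"` the flag takes no argument, and its presence alone selects whether the resolved hostname or the raw IP is printed (`endpoint_label` and the injected `resolver` are hypothetical):

```python
import argparse

parser = argparse.ArgumentParser(prog="ring")
# A switch: present means True, absent means False, and it accepts no value.
parser.add_argument("--resolve-ip", action="store_true")


def endpoint_label(ip: str, resolve_ip: bool, resolver) -> str:
    """Return the resolved hostname if requested, otherwise the IP itself."""
    return resolver(ip) if resolve_ip else ip
```

In a real tool the resolver could be something like `lambda ip: socket.gethostbyaddr(ip)[0]`; it is injected here so the sketch runs without DNS.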

Closes scylladb/scylladb#17553

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"
  test/nodetool: calc max_width from all_hosts
  test/nodetool: keep tokens as Host's member
  test/nodetool: remove unused import
2024-03-08 15:15:19 +02:00
Kefu Chai
8ca672a02c test/pylib: return better error if self.create_server() raises
in `ScyllaServer::add_server()`, `self.create_server()` is called to
create a server, but if it raises, we reference the local variable
`server`, which is not bound to any value, as `server` has not been assigned
at that moment. if `ScyllaServer` is used by `ScyllaClusterManager`, we
would not be able to see the real exception, only an error like

```
cannot access local variable 'server' where it is not associated with a
value
```

which is merely an error raised by the Python runtime.

in this change, `server` is always initialized, and we check it for None
before dereferencing it.
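The failure mode and the fix can be reproduced in isolation. The sketch below uses hypothetical `add_server_buggy`/`add_server` functions with an injected `create_server` callable rather than the real pylib classes; on Python 3.11+ the first version fails with the "cannot access local variable" message quoted above, replacing the original exception:

```python
def add_server_buggy(create_server):
    try:
        server = create_server()
        server.start()
        return server
    except Exception:
        # If create_server() raised, `server` was never bound, so evaluating
        # it here raises UnboundLocalError instead of the real error.
        raise RuntimeError(f"failed to start {server}")


def add_server(create_server, cleanup=lambda server: None):
    server = None  # always bound, even if create_server() raises
    try:
        server = create_server()
        server.start()
        return server
    except Exception:
        if server is not None:
            # Only clean up a server object that was actually created.
            cleanup(server)
        raise  # propagate the original exception unchanged
```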

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17693
2024-03-08 15:10:27 +02:00
Botond Dénes
505f137cc9 Merge 'Make object_store suite use ManagerClient' from Pavel Emelyanov
The test cases in this suite need to start scylla with custom config options, restart it and call the API on it. When the suite was created, none of this was possible with any library facility, so the suite carries its own version of the managed_cluster class that piggy-backs on cql-pytest's scylla startup. Now test.py has a pretty flexible manager that provides all the scylla cluster management the object_store suite needs. This PR makes the suite use the manager client instead of the home-brew managed_cluster thing.

refs: #16006
fixes: #16268

Closes scylladb/scylladb#17292

* github.com:scylladb/scylladb:
  test/object_store: Remove unused managed_cluster (and other stuff)
  test/object_store: Use tmpdir fixture in flush-retry case
  test/object_store: Turn flush-retry case to use ManagerClient
  test/object_store: Turn "misconfigured" case to use ManagerClient
  test/object_store: Turn garbage-collect case to use ManagerClient
  test/object_store: Turn basic case to use ManagerClient
  test/object_store: Prepare to work with ManagerClient
2024-03-08 15:04:46 +02:00
Kamil Braun
ae954fb2ec test: unflake test_tablets_removenode
These tests insert data into RF=3 tables, but used the default
consistency level, which is taken from the default execution profile
and is set to LOCAL_QUORUM. The tests then read with CL=ONE. With RF=3,
a LOCAL_QUORUM write is only guaranteed to have reached 2 replicas and a
CL=ONE read contacts 1, and 2 + 1 is not greater than 3, so we cannot
guarantee that none of the data will be missed. Fix
this by inserting the data with CL=ALL. (Do it for all RF cases for
simplicity.)
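The reasoning above is the usual overlap rule: a read is guaranteed to see an acknowledged write only if the replicas written plus the replicas read exceed RF. A small sketch, assuming a single datacenter (so LOCAL_QUORUM means a majority of RF) and covering only the consistency levels mentioned here:

```python
# Replicas that must acknowledge an operation at each consistency level,
# assuming a single datacenter.
REPLICAS_FOR_CL = {
    "ONE": lambda rf: 1,
    "LOCAL_QUORUM": lambda rf: rf // 2 + 1,
    "ALL": lambda rf: rf,
}


def read_sees_write(rf: int, write_cl: str, read_cl: str) -> bool:
    """True if every read set must intersect every acknowledged write set."""
    return REPLICAS_FOR_CL[write_cl](rf) + REPLICAS_FOR_CL[read_cl](rf) > rf
```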

Fixes scylladb/scylladb#17695

Closes scylladb/scylladb#17700
2024-03-08 12:47:47 +01:00
Kamil Braun
76fb902858 test: unflake test_topology_remove_garbage_group0
The test boots nodes and then immediately starts shutting down
nodes and removing them from the cluster. The shutting down and
removing may happen before the driver manages to connect to all nodes in the
cluster. In particular, the driver had not yet connected to the last
bootstrapped node. It can even happen that the driver has connected,
but the control connection is established to the first node, and the
driver fetched the topology from the first node when the first node didn't
yet consider the last node to be normal. So the driver decides to close the
connection to the last node like this:
```
22:34:03.159 DEBUG> [control connection] Removing host not found in
   peers metadata: <Host: 127.42.90.14:9042 datacenter1>
```

Eventually, at the end of the test, only the last node remains, and all
other nodes have been removed or stopped. But the driver does not have a
connection to that last node.

Fix this problem by ensuring that:
- all nodes see each other as NORMAL,
- the driver has connected to all nodes
at the beginning of the test, before we start shutting down and removing
nodes.
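Both conditions are "wait until true" checks. The test framework has its own helpers for this; the generic polling loop below is only a sketch, and the predicates named in the usage note are hypothetical:

```python
import time


def wait_for(predicate, timeout: float, interval: float = 0.1,
             clock=time.monotonic, sleep=time.sleep) -> None:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    `clock` and `sleep` are injectable so the loop can be tested without
    real waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        sleep(interval)
```

A test could call, e.g., `wait_for(all_nodes_see_each_other_normal, 60)` and then `wait_for(driver_connected_to_all, 60)` before it starts stopping and removing nodes.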

Fixes scylladb/scylladb#16373

Closes scylladb/scylladb#17676
2024-03-08 10:08:09 +01:00
Nadav Har'El
ea53db379f Merge 'tools/scylla-nodetool: listsnapshot: make it compatible with origin' from Botond Dénes
The following incompatibilities were identified by `listsnapshots_test.py` in dtests:
* Command doesn't bail out when there are no snapshots; instead it prints a meaningless empty report
* Formatting is incompatible

Both are fixed in this mini-series.

Closes scylladb/scylladb#17541

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
  tools/scylla-nodetool: listsnapshots: bail out if there are no snapshots
2024-03-08 10:08:09 +01:00
Kefu Chai
de276901f2 tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"
before this change, the "ring" subcommand had two issues:

1. the `--resolve-ip` option accepts a boolean argument, but this option
   should be a switch, which does not accept any argument at all
2. it always prints the endpoint, whether or not `--resolve-ip` is
   specified, but it should print the resolved name instead
   of an IP address if `--resolve-ip` is specified.

in this change, both issues are addressed. and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:29:31 +08:00
Kefu Chai
d927ee8d8f test/nodetool: calc max_width from all_hosts
for better readability, as `token_to_endpoint` is merely a variable
derived from `all_hosts`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:28:54 +08:00
Kefu Chai
4a748c7fb0 test/nodetool: keep tokens as Host's member
to be more consistent with the test_status.py.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:28:54 +08:00
Kefu Chai
aefc385786 test/nodetool: remove unused import
and add two empty lines in between global functions

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:28:54 +08:00
Botond Dénes
b69ee6bc27 Merge 'Fix load-and-stream for tablets' from Raphael "Raph" Carvalho
It might happen that multiple tablets cohabit the same shard, so we want load-and-stream to start a new streaming session for every tablet, so that the receiver has the data properly segregated. This is similar to the treatment we gave repair. Today, load-and-stream fails due to sstables spanning more than 1 tablet in the receiver.

Synchronization with migration is done by taking the replication map, so migrations cannot advance while new data is being streamed. A bug was also fixed: data must be streamed to pending replicas too, to handle the case where migration is ongoing and new data must reach both the old and the new replica set. A test was added stressing this synchronization path.

Another bug was fixed in sstable loading, which expected the sharder not to be invalidated throughout the operation, but that assumption breaks during migrations.

Fixes #17315.

Closes scylladb/scylladb#17449

* github.com:scylladb/scylladb:
  test: test_tablets: Add load-and-stream test
  sstables_loader: Stream to pending tablet replica if needed
  sstables_loader: Implement tablet based load-and-stream
  sstables_loader: Virtualize sstable_streamer for tablet
  sstables_loader: Avoid reallocations in vector
  sstable_loader: Decouple sstable streaming from selection
  sstables_loader: Introduce sstable_streamer
  Fix online SSTable loading with concurrent tablet migration
2024-03-07 14:18:30 +02:00
Botond Dénes
09068d20ea tools/scylla-nodetool: scrub: make keyspace parameter optional
When no keyspace is provided, request all keyspaces from the server,
then scrub all of them. This is what the legacy nodetool does; for some
reason this was missed when re-implementing scrub.

Closes scylladb/scylladb#17495
2024-03-07 11:15:46 +02:00
Tomasz Grabiec
ec6ed18b5c Merge 'Handle tablet migration failure in barrier stages' from Pavel Emelyanov
There are 4 barrier-only stages when migrating a tablet, and the test needs to fail the pending/leaving replica that handles them in order to validate how the coordinator handles a dead node. Failing the barrier is done by suspending it with injection code and stopping the node without waking it up. The main difficulty is telling one barrier RPC call from another, because they carry nothing that identifies which stage the barrier is run for. This PR suggests that the barrier injection code look directly into the system.tablets table for the transition stage; the stage is already there by the time the barrier is about to ack itself over RPC.

refs: #16527

Closes scylladb/scylladb#17450

* github.com:scylladb/scylladb:
  topology.tablets_migration: Handle failed use_new
  topology.tablets_migration: Handle failed write_both_read_new
  topology.tablets_migration: Handle failed write_both_read_old
  topology.tablets_migration: Handle failed allow_write_both_read_old
  test/tablets_migration: Add conditional break-point into barrier handler
  replica: Add helper to read tablet transition stage
  topology_coordinator: Add action_failed() helper
2024-03-07 09:56:13 +01:00
Botond Dénes
5dfaa69bde tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
The author (me) tried to be clever and fix the formatting, but then he
realized this just means a lot of unnecessary fighting with tests. So
this patch makes the formatting compatible with that of the legacy
nodetool:
* Use compatible rounding and precision formatting
* Use incorrect unit (KB instead of KiB)
* Align numbers to the left
* Add trailing white-space to "Snapshot Details: "
2024-03-07 03:54:54 -05:00