Commit Graph

6459 Commits

Author SHA1 Message Date
Botond Dénes
9f97d21339 Merge 'Enhance perf-simple-query test' from Pavel Emelyanov
While measuring #17149 with this test some changes were applied, here they are

- keep initial_tablets number in output json's parameters section
- disable auto compaction
- add control over the amount of sstables generated for --bypass-cache case

Closes scylladb/scylladb#17473

* github.com:scylladb/scylladb:
  perf_simple_query: Add --memtable-partitions option
  perf_simple_query: Disable auto compaction
  perf_simple_query: Keep number of initial tablets in output json
2024-03-08 15:21:04 +02:00
Kefu Chai
079d70145e raft: add fmt::formatter for raft tracker types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* raft::election_tracker
* raft::votes
* raft::vote_result

and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17670
2024-03-08 15:19:37 +02:00
Botond Dénes
630be97d2f Merge 'tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"' from Kefu Chai
before this change, "ring" subcommand has two issues:

1. `--resolve-ip` option accepts a boolean argument, but this option
   should be a switch, which does not accept any argument at all
2. it always prints the endpoint no matter if `--resolve-ip` is
   specified or not. but it should print the resolved name, instead
   of an IP address if `--resolve-ip` is specified.

in this change, both issues are addressed. and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.

Closes scylladb/scylladb#17553

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"
  test/nodetool: calc max_width from all_hosts
  test/nodetool: keep tokens as Host's member
  test/nodetool: remove unused import
2024-03-08 15:15:19 +02:00
Kefu Chai
8ca672a02c test/pylib: return better error if self.create_server() raises
in `ScyllaServer::add_server()`, `self.create_server()` is called to
create a server, but if it raises, we would reference a local variable
of `server` which is not bound to any value, as `server` is not assigned
at that moment. if `ScyllaServer` is used by `ScyllaClusterManager`, we
would not be able to see the real exception apart from the error like

```
cannot access local variable 'server' where it is not associated with a
value
```

which is but the error from Python runtime.

in this change, `server` is always initialized, and we check for None,
before dereference it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17693
2024-03-08 15:10:27 +02:00
Botond Dénes
505f137cc9 Merge 'Make object_store suite use ManagerClient' from Pavel Emelyanov
The test cases in this suite need to start scylla with custom config options, restart it and call API on it. By the time the suite was created all this wasn't possible with any library facility, so the suite carries its version of managed_cluster class that piggy-backs cql-pytest scylla starting. Now test.py has pretty flexible manager that provides all the scylla cluster management object_store suite needs. This PR makes the suite use the manager client instead of the home-brew managed_cluster thing

refs: #16006
fixes: #16268

Closes scylladb/scylladb#17292

* github.com:scylladb/scylladb:
  test/object_store: Remove unused managed_cluster (and other stuff)
  test/object_store: Use tmpdir fixture in flush-retry case
  test/object_store: Turn flush-retry case to use ManagerClient
  test/object_store: Turn "misconfigured" case to use ManagerClient
  test/object_store: Turn garbage-collect case to use ManagerClient
  test/object_store: Turn basic case to use ManagerClient
  test/object_store: Prepare to work with ManagerClient
2024-03-08 15:04:46 +02:00
Kamil Braun
ae954fb2ec test: unflake test_tablets_removenode
These tests are inserting data into RF=3 tables, but used the default
consistency level which is taken from the default execution profile
which is set to LOCAL_QUORUM. The tests would then read with CL=ONE, so
we cannot give a guarantee that some of the data won't be missed. Fix
this by inserting the data with CL=ALL. (Do it for all RF cases for
simplicity.)

Fixes scylladb/scylladb#17695

Closes scylladb/scylladb#17700
2024-03-08 12:47:47 +01:00
Kamil Braun
76fb902858 test: unflake test_topology_remove_garbage_group0
The test is booting nodes, and then immediately starts shutting down
nodes and removing them from the cluster. The shutting down and
removing may happen before driver manages to connect to all nodes in the
cluster. In particular, the driver didn't yet connect to the last
bootstrapped node. Or it can even happen that the driver has connected,
but the control connection is established to the first node, and the
driver fetched topology from the first node when the first node didn't
yet consider the last node to be normal. So the driver decides to close
connection to the last node like this:
```
22:34:03.159 DEBUG> [control connection] Removing host not found in
   peers metadata: <Host: 127.42.90.14:9042 datacenter1>
```

Eventually, at the end of the test, only the last node remains, all
other nodes have been removed or stopped. But the driver does not have a
connection to that last node.

Fix this problem by ensuring that:
- all nodes see each other as NORMAL,
- the driver has connected to all nodes
at the beginning of the test, before we start shutting down and removing
nodes.

Fixes scylladb/scylladb#16373

Closes scylladb/scylladb#17676
2024-03-08 10:08:09 +01:00
Nadav Har'El
ea53db379f Merge 'tools/scylla-nodetool: listsnapshot: make it compatible with origin' from Botond Dénes
The following incompatibilities were identified by `listsnapshots_test.py` in dtests:
* Command doesn't bail out when there are no snapshots, instead it prints meaningless empty report
* Formatting is incompatible

Both are fixed in this mini-series.

Closes scylladb/scylladb#17541

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
  tools/scylla-nodetool: listsnapshots: bail out if there are no snapshots
2024-03-08 10:08:09 +01:00
Kefu Chai
de276901f2 tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"
before this change, "ring" subcommand has two issues:

1. `--resolve-ip` option accepts a boolean argument, but this option
   should be a switch, which does not accept any argument at all
2. it always prints the endpoint no matter if `--resolve-ip` is
   specified or not. but it should print the resolved name, instead
   of an IP address if `--resolve-ip` is specified.

in this change, both issues are addressed. and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:29:31 +08:00
Kefu Chai
d927ee8d8f test/nodetool: calc max_width from all_hosts
for better readability. as `token_to_endpoint` is but a derived
variable from `all_hosts`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:28:54 +08:00
Kefu Chai
4a748c7fb0 test/nodetool: keep tokens as Host's member
to be more consistent with the test_status.py.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:28:54 +08:00
Kefu Chai
aefc385786 test/nodetool: remove unused import
and add two empty lines in between global functions

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:28:54 +08:00
Botond Dénes
b69ee6bc27 Merge 'Fix load-and-stream for tablets' from Raphael "Raph" Carvalho
It might happen that multiple tablets co-habit the same shard, so we want load-and-stream to jump into a new streaming session for every tablet, such that the receiver will have the data properly segregated. That's a similar treatment we gave to repair. Today, load-and-stream fails due to sstables spanning more than 1 tablet in the receiver.

Synchronization with migration is done by taking replication map, so migrations cannot advance while streaming new data. A bug was fixed too, where data must be streamed to pending replicas too, to handle case where migration is ongoing and new data must reach both old and new replica set. A test was added stressing this synchronization path.

Another bug was fixed in sstable loading, which expected sharder to not be invalidated throughout the operation, but that breaks during migrations.

Fixes #17315.

Closes scylladb/scylladb#17449

* github.com:scylladb/scylladb:
  test: test_tablets: Add load-and-stream test
  sstables_loader: Stream to pending tablet replica if needed
  sstables_loader: Implement tablet based load-and-stream
  sstables_loader: Virtualize sstable_streamer for tablet
  sstables_loader: Avoid reallocations in vector
  sstable_loader: Decouple sstable streaming from selection
  sstables_loader: Introduce sstable_streamer
  Fix online SSTable loading with concurrent tablet migration
2024-03-07 14:18:30 +02:00
Botond Dénes
09068d20ea tools/scylla-nodetool: scrub: make keyspace parameter optional
When no keyspace is provided, request all keyspaces from the server,
then scrub all of them. This is what the legacy nodetool does, for some
reason this was missed when re-implementing scrub.

Closes scylladb/scylladb#17495
2024-03-07 11:15:46 +02:00
Tomasz Grabiec
ec6ed18b5c Merge 'Handle tablet migration failure in barrier stages' from Pavel Emelyanov
There are 4 barrier-only stages when migrating a tablet and the test needs to fail pending/leaving replica that handles it in order to validate how coordinator handles dead node. Failing the barrier is done by suspending it with injection code and stopping the node without waking it up. The main difficulty here is how to tell one barrier RPC call from another, because they don't have anything onboard that could tell which stage the barrier is run for. This PR suggests that barrier injection code looks directly into the system.tablets table for the transition stage, the stage is already there by the time barrier is about to ack itself over RPC.

refs: #16527

Closes scylladb/scylladb#17450

* github.com:scylladb/scylladb:
  topology.tablets_migration: Handle failed use_new
  topology.tablets_migration: Handle failed write_both_read_new
  topology.tablets_migration: Handle failed write_both_read_old
  topology.tablets_migration: Handle failed allow_write_both_read_old
  test/tablets_migration: Add conditional break-point into barrier handler
  replica: Add helper to read tablet transition stage
  topology_coordinator: Add action_failed() helper
2024-03-07 09:56:13 +01:00
Botond Dénes
5dfaa69bde tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
The author (me) tried to be clever and fix the formatting, but then he
realized this just means a lot of unnecessary fighting with tests. So
this patch makes the formatting compatible with that of the legacy
nodetool:
* Use compatible rounding and precision formatting
* Use incorrect unit (KB instead of KiB)
* Align numbers to the left
* Add trailing white-space to "Snapshot Details: "
2024-03-07 03:54:54 -05:00
Botond Dénes
80483ba732 tools/scylla-nodetool: listsnapshots: bail out if there are no snapshots
Print a message and exit, don't continue to output the snapshot table.
This is what the legacy nodetool does too.
2024-03-07 03:54:54 -05:00
Botond Dénes
ac15e4c109 tools/scylla-nodetool: repair: accept and ignore -full/--full and -j/--job-threads
These two parameters are not used by the native nodetool, because
ScyllaDB itself doesn't support them. These should be just ignored and
indeed there was a unit test checking that this is the case. However,
due to a mistake in the unit test, this was not actually tested and
nodetool complained when seeing these params.
This patch fixes both the test and the native nodetool.

Closes scylladb/scylladb#17477
2024-03-07 11:53:50 +03:00
Botond Dénes
75fe2f5c3a Merge 'test: rest_api: fix tests to work with tablets' from Aleksandra Martyniuk
Fix test_compaction_task.py, test_repair_task.py and
test_storage_service.py to work with tablets.

Fixes: #17338.

Closes scylladb/scylladb#17474

* github.com:scylladb/scylladb:
  test: rest_api: enable tablets by default
  test: fix indentation and delete unused this_dc param
  test: rest_api: fix test_storage_service.py
  test: rest_api: fix test_repair_task.py
  test: rest_api: fix test_compaction_task.py
  test: rest_api: use skip_without_tablets fixture
  test: rest_api: add some tablet related fixtures
2024-03-07 10:00:09 +02:00
Michał Chojnowski
f9e97fa632 sstables: fix a use-after-free in key_view::explode()
key_view::explode() contains a blatant use-after-free:
unless the input is already linearized, it returns a view to a local temporary buffer.

This is rare, because partition keys are usually not large enough to be fragmented.
But for a sufficiently large key, this bug causes a corrupted partition_key down
the line.

Fixes #17625

Closes scylladb/scylladb#17626
2024-03-07 09:07:07 +02:00
Kefu Chai
64e14d21db locator/tablets: add fmt::formatter for tablet_*
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* tablet_id
* tablet_replica
* tablet_metadata
* tablet_map

their operator<<:s are dropped

Refs scylladb/scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17504
2024-03-07 09:00:49 +03:00
Pavel Emelyanov
52a1b2c413 Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* position_range
* mutation_fragment
* range_tombstone_stream
* mutation_fragment_v2::printer

Refs #13245

Closes scylladb/scylladb#17521

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for position_range
  mutation: add fmt::formatter for mutation_fragment and range_tombstone_stream
  mutation: add fmt::formatter for mutation_fragment_v2::printer
2024-03-07 08:56:21 +03:00
Pavel Emelyanov
df6048adec topology.tablets_migration: Handle failed use_new
This stage doesn't need any special treatment, because we cannot revert
to old replicas and should proceed normally. The barrier itself won't
get stuck, because it already handles excluded/ignored nodes.

Just make the test validate it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
fb7428c560 topology.tablets_migration: Handle failed write_both_read_new
Two options here -- go revert to old replicas by jumping into
cleanup_target stage or proceed noramlly. The choice depends on which
replica set has less number of dead nodes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
324eaaf873 topology.tablets_migration: Handle failed write_both_read_old
At this stage it can happen that target replica got some writes, so its
tablet needs to be cleaned up, so jump to cleanup_target stage.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
f81e0b2e88 topology.tablets_migration: Handle failed allow_write_both_read_old
This is early stage, just proceed to existing revert_migration

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
5bb1597a30 test/tablets_migration: Add conditional break-point into barrier handler
There are several transition stages that are executed by the topology
coordinator with the help of barrier-and-drain raft commands. For the
test to stop and remove a node while handling this stage it must inject
a break-point into barrier handler, wait for it to happen and then stop
the node without resuming the break-point. Then removenode from the
cluster.

The break-point suspends barrier handling when a specific tablet is in
specific transition stage. Tablet ID and desired stage are configured
via injector parameters.

With today's error-injection facilities the way to suspend code
execution is with injecting a lambda that waits for a message from the
injection engine.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Botond Dénes
8dd6fe75e7 Merge 'tools/scylla-nodetool: implement info ' from Kefu Chai
Refs #15588

Closes scylladb/scylladb#17498

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement info
  test/nodetool: move format_size into utils.py
2024-03-07 07:14:51 +02:00
Kamil Braun
19b816bb68 Merge 'Migrate system_auth to raft group0' from Marcin Maliszkiewicz
This patch series makes all auth writes serialized via raft. Reads stay
eventually consistent for performance reasons. To make transition to new
code easier data is stored in a newly created keyspace: system_auth_v2.

Internally the difference is that instead of executing CQL directly for
writes we generate mutations and then announce them via raft group0. Per
commit descriptions provide more implementation details.

Refs https://github.com/scylladb/scylladb/issues/16970
Fixes https://github.com/scylladb/scylladb/issues/11157

Closes scylladb/scylladb#16578

* github.com:scylladb/scylladb:
  test: extend auth-v2 migration test to catch stale static
  test: add auth-v2 migration test
  test: add auth-v2 snapshot transfer test
  test: auth: add tests for lost quorum and command splitting
  test: pylib: disconnect driver before re-connection
  test: adjust tests for auth-v2
  auth: implement auth-v2 migration
  auth: remove static from queries on auth-v2 path
  auth: coroutinize functions in password_authenticator
  auth: coroutinize functions in standard_role_manager
  auth: coroutinize functions in default_authorizer
  storage_service: add support for auth-v2 raft snapshots
  storage_service: extract getting mutations in raft snapshot to a common function
  auth: service: capture string_view by value
  alternator: add support for auth-v2
  auth: add auth-v2 write paths
  auth: add raft_group0_client as dependency
  cql3: auth: add a way to create mutations without executing
  cql3: run auth DML writes on shard 0 and with raft guard
  service: don't loose service_level_controller when bouncing client_state
  auth: put system_auth and users consts in legacy namespace
  cql3: parametrize keyspace name in auth related statements
  auth: parametrize keyspace name in roles metadata helpers
  auth: parametrize keyspace name in password_authenticator
  auth: parametrize keyspace name in standard_role_manager
  auth: remove redundant consts auth::meta::*::qualified_name
  auth: parametrize keyspace name in default_authorizer
  db: make all system_auth_v2 tables use schema commitlog
  db: add system_auth_v2 tables
  db: add system_auth_v2 keyspace
2024-03-06 10:11:33 +01:00
Botond Dénes
58265a7dc1 tools/utils: fix use-after-free when printing error message for unknown operation
When a tool application is invoked with an unknown operation, an error
message is printed, which includes all the known operations, with all
their aliases. This is collected in `std::vector<std::string_view>`. The
problem is that the vector containing alias names, is returned as a
value, so the code ends up creating views to temporaries.
Fix this by returning alias vector with const&.

Fixes: #17584

Closes scylladb/scylladb#17586
2024-03-06 10:42:02 +02:00
Kefu Chai
e248ab48db tools/scylla-nodetool: correct tablestats filtering
before this change, we failed to apply the filtering of tablestats
command in the right way:

1. `table_filter` failed to check if delimiter is npos before
   extract the cf component from the specified table name.
2. the stats should not included the keyspace which are not
   included by the filter.
3. the total number of tables in the stats report should contain
   all tables no matter they are filtered or not.

in this change, all the problems above are addressed. and the tests
are updated to cover these use cases.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17468
2024-03-06 10:36:20 +02:00
Botond Dénes
41424231f1 Merge 'compaction: reshape sstables within compaction groups' from Lakshmi Narayanan Sreethar
For tables using tablet based replication strategies, the sstables should be reshaped only within the compaction groups they belong to. The shard_reshaping_compaction_task_impl now groups the sstables based on their compaction groups before reshaping them.

Fixes https://github.com/scylladb/scylladb/issues/16966

Closes scylladb/scylladb#17395

* github.com:scylladb/scylladb:
  test/topology_custom: add testcase to verify reshape with tablets
  test/pylib/rest_client: add get_sstable_info, enable/disable_autocompaction
  replica/distributed_loader: enable reshape for sstables
  compaction: reshape sstables within compaction groups
  replica/table : add method to get compaction group id for an sstable
  compaction: reshape: update total reshaped size only on success
  compaction: simplify exception handling in shard_reshaping_compaction_task_impl::run
2024-03-06 10:33:56 +02:00
Botond Dénes
dce42b2517 Merge 'tools/scylla-nodetool: fixes to address the test failure with dtest' from Kefu Chai
- use API endpoint of /storage_service/toppartition/
- only print out the specified samplings.
- print "\n" separator between samplings

Closes scylladb/scylladb#17574

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: print separator between samplings
  tools/scylla-nodetool: only print the specified sampling
  tools/scylla-nodetool: use /storage_service/toppartition/
2024-03-06 10:27:25 +02:00
Botond Dénes
c843f98769 Merge 'cql3: add fmt::formatter for cql3 types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* std::vector<data_type>
* column_identifier
* column_identifier_raw
* untyped_constant::type_class

and drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17538

* github.com:scylladb/scylladb:
  cql3: add fmt::formatter for expression::printer
  cql3: add fmt::formatter for raw_value{,_view}
  cql3: add fmt::formatter for std::vector<data_type>
  cql3: add fmt::formatter for untyped_constant::type_class
  cql3: add fmt::formatter for column_identifier{,_row}
2024-03-06 10:03:50 +02:00
Kefu Chai
fc774361e8 cql3: add fmt::formatter for raw_value{,_view}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* raw_value
* raw_value_view

`raw_value_view` 's operator<< is still being used by the generic
homebrew printer for vector<>, so it is preserved.

`raw_value` 's operator<< is still being used by the generic
homebrew printer for optional<>, so it's preserved as well.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-05 14:00:13 +08:00
Kamil Braun
0a7854ea4d Merge 'test: test_topology_ops: fix flakiness and reenable bg writes' from Patryk Jędrzejczak
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.

Since scylladb/scylladb#15924 is the last issue mentioned in
scylladb/scylladb#15962, this PR also reenables background
writes in `test_topology_ops` with tablets disabled. The test
doesn't pass with tablets and background writes because of
scylladb/scylladb#17025. We will reenable background writes
with tablets after fixing that issue.

Fixes scylladb/scylladb#15924
Fixes scylladb/scylladb#15962

Closes scylladb/scylladb#17585

* github.com:scylladb/scylladb:
  test: test_topology_ops: reenable background writes without tablets
  test: test_topology_ops: run with and without tablets
  test: topology: decrease the server's request timeouts
2024-03-04 20:57:24 +01:00
Patryk Jędrzejczak
f1d9248df9 test: wait for CDC generations publishing before checking CDC-topology consistency
Tests that verify upgrading to the raft-based topology
(`test_topology_upgrade`, `test_topology_recovery_basic`,
`test_topology_recovery_majority_loss`) have flaky
`check_system_topology_and_cdc_generations_v3_consistency` calls.
`assert topo_results[0] == topo_res` can fail because of different
`unpublished_cdc_generations` on different nodes.

The upgrade procedure creates a new CDC generation, which is later
published by the CDC generation publisher. However, this can happen
after the upgrade procedure finishes. In tests, if publishing
happens just before querying `system.topology` in
`check_system_topology_and_cdc_generations_v3_consistency`, we can
observe different `unpublished_cdc_generations` on different nodes.
It is an expected and temporary inconsistency.

For the same reasons,
`check_system_topology_and_cdc_generations_v3_consistency` can
fail after adding a new node.

To make the tests not flaky, we wait until the CDC generation
publisher finishes its job. Then, all nodes should always have
equal (and empty) `unpublished_cdc_generations`.

Fixes scylladb/scylladb#17587
Fixes scylladb/scylladb#17600
Fixes scylladb/scylladb#17621

Closes scylladb/scylladb#17622
2024-03-04 19:28:51 +02:00
Kamil Braun
ec1f574b3a test/pylib: util: silence exception from refresh_nodes
Driver's `refresh_nodes` function may throw an exception if we call it
in the middle of driver reconnecting. Silence it.

Fixes scylladb/scylladb#17616

Closes scylladb/scylladb#17620
2024-03-04 17:50:16 +02:00
Marcin Maliszkiewicz
eb56ae3bb9 test: extend auth-v2 migration test to catch stale static 2024-03-01 16:31:04 +01:00
Marcin Maliszkiewicz
6c30dc6351 test: add auth-v2 migration test 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
53996e2557 test: add auth-v2 snapshot transfer test 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
4f65e173cf test: auth: add tests for lost quorum and command splitting
With auth-v2 we can login even if quorum is lost. So test
which checks if error occurs in such situation is deleted
and the opposite test which checks if logging in works was
added.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
a5f81f0836 test: pylib: disconnect driver before re-connection 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
1badd09d45 test: adjust tests for auth-v2 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
9cb1f111d5 alternator: add support for auth-v2
Alternator doesn't do any writes to auth
tables so it's simply change of keyspace
name.

Docs will be updated later, when auth-v2
is enabled as default.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
7f204a6e80 auth: add raft_group0_client as dependency
Most auth classes need this to be able to announce
raft commands.

Usage added in subsequent commit.
2024-03-01 16:25:14 +01:00
Patryk Jędrzejczak
e7d4e080e9 test: test_topology_ops: reenable background writes without tablets
After fixing scylladb/scylladb#15924 in one of the previous
patches, we reenable background writes in `test_topology_ops`.

We also start background writes a bit later after adding all nodes.
Without this change and with tablets, the test fails with:
```
>       await cql.run_async(f"CREATE TABLE tbl (pk int PRIMARY KEY, v int)")
E       cassandra.protocol.ConfigurationException: <Error from server: code=2300
        [Query invalid because of configuration issue] message="Datacenter
        datacenter1 doesn't have enough nodes for replication_factor=3">
```

The change above makes the test a bit weaker, but we don't have to
worry about it. If adding nodes is bugged, other tests should
detect it.

Unfortunately, the test still doesn't pass with tablets and
background writes because of scylladb/scylladb#17025, so we keep
background writes disabled with tablets and leave FIXME.

Fixes scylladb/scylladb#15962
2024-02-29 18:37:41 +01:00
Patryk Jędrzejczak
90317c5ceb test: test_topology_ops: run with and without tablets
`test_topology_ops` is a valuable test that has uncovered many bugs.
It's worth running it with and without tablets.
2024-02-29 18:37:41 +01:00
Patryk Jędrzejczak
9dfb26428b test: topology: decrease the server's request timeouts
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.

A high server's request timeout can slow down the topology tests
(see the new comment in `make_scylla_conf`). We make the timeout
dependent on the testing mode to not slow down tests for no reason.

We don't touch the driver's request timeout. Decreasing it in some
modes would require too much effort for almost no improvement.

Fixes scylladb/scylladb#15924
2024-02-29 18:37:38 +01:00
Petr Gusev
6afa80a443 sync_raft_topology_nodes: do no emit REMOVED_NODE on IP change
Calling notify_left for old ip on topology change in raft mode
was a regression. In gossiper mode it didn't occur. In gossiper
mode the function handle_state_normal was responsible for spotting
IP addresses that weren't managing any parts of the data, and
it would then initiate their removal by calling remove_endpoint.
This removal process did not include calling notify_left.
Actually, notify_left was only supposed to be called (via excise) by
a 'real' removal procedures - removenode and decommission.

The redundant notify_left caused troubles in scylla python driver.
The driver could receive REMOVED_NODE and NEW_NODE notifications
in the same time and their handling routines could race with each other.

In this commit we fix the problem by not calling notify_left if
the remove_ip lambda was called from the ip change code path.
Also, we add a test which verifies that the driver log doesn't
mention the REMOVED_NODE notification.

Fixes scylladb/scylladb#17444

Closes scylladb/scylladb#17561
2024-02-29 10:18:20 +01:00