Commit Graph

6904 Commits

Author SHA1 Message Date
Wojciech Mitros
0de3a5f3ff test mv: remove injection delaying shutdown of a node
In the test_mv_topology_change case, we use an injection to
delay the view updates application, so that the ERMs have
a chance to change in the process. This injection was also
enabled on a new node in the test, which was later decommissioned.
During the shutdown, writes were still being performed, causing
view update generation and delays due to the injection which in
turn delayed the node shutdown, causing the test to timeout.
This patch removes the injection for the node being shut down.
At the same time, the force_gossip_topology_changes=True option
is also removed from its config, but for that option it's enough
to enable on the first node in the cluster and all nodes use it.

Fixes: https://github.com/scylladb/scylladb/issues/18941

Closes scylladb/scylladb#18958
2024-05-29 15:29:55 +02:00
Nadav Har'El
4b04ed1360 test/alternator: be more forgiving on authorizer configuration
The Alternator test suite usually runs on a specific configuration of
Scylla set up by test.py or test/alternator/run. However, we do consider
it an important design goal of this test suite that developers should be
able to run these tests against any DynamoDB-API implementation, including
any version Scylla manually run by the developer in *any way* he or she
pleases.

The recent commit dc80b5dafe changed the way
we retrieve the configured autentication key, which is needed if Scylla is
run with --alternator-enforce-authorization. However, the new code assumed
that Scylla was also run with
     --authenticator PasswordAuthenticator --authorizer CassandraAuthorizer
so that the default role of "cassandra" has a valid, non-null, password
(namely, "cassandra"). If the developer ran Scylla manually without
these options, the test initialization code broke, and all tests in the
suite failed.

This patch fixes this breakage. You can now run the Alternator test
suite against Scylla run manually without any of the aforementioned
options, and everything will work except some tests in test_authorization.py
will fail as expected.

This patch has no affect on the usual test.py or test/alternator/run
runs, as they already run Scylla with all the aforementioned options
and weren't exposed to the problem fixed here.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18957
2024-05-29 16:22:45 +03:00
Pavel Emelyanov
e74a4b038f Merge 'tablets: alter keyspace' from Piotr Smaron
This change supports changing replication factor in tablets-enabled keyspaces.
This covers both increasing and decreasing the number of tablets replicas through
first building topology mutations (`alter_keyspace_statement.cc`) and then
tablets/topology/schema mutations (`topology_coordinator.cc`).
For the limitations of the current solution, please see the docs changes attached to this PR.

Fixes: #16129

Closes scylladb/scylladb#16723

* github.com:scylladb/scylladb:
  test: Do not check tablets mutations on nodes that don't have them
  test: Fix the way tablets RF-change test parses mutation_fragments
  test/tablets: Unmark RF-changing test with xfail
  docs: document ALTER KEYSPACE with tablets
  Return response only when tablets are reallocated
  cql-pytest: Verify RF is changes by at most 1 when tablets on
  cql3/alter_keyspace_statement: Do not allow for change of RF by more than 1
  Reject ALTER with 'replication_factor' tag
  Implement ALTER tablets KEYSPACE statement support
  Parameterize migration_manager::announce by type to allow executing different raft commands
  Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks
  Extend system.topology with 3 new columns to store data required to process alter ks global topo req
  Allow query_processor to check if global topo queue is empty
  Introduce new global topo `keyspace_rf_change` req
  New raft cmd for both schema & topo changes
  Add storage service to query processor
  tablets: tests for adding/removing replicas
  tablet_allocator: make load_balancer_stats_manager configurable by name
2024-05-29 14:17:51 +03:00
Botond Dénes
d37eca0593 test/boost/mutation_reader_test: compacting_reader_next_partition: fix partition order
The test creates two partitions and passes them through the reader, but
the partitions are out-of-order. This is benign but best to fix it
anyway.
Found after bumping validation level inside the compactor.

Closes scylladb/scylladb#18848
2024-05-28 20:41:54 +03:00
Aleksandra Martyniuk
b7ae7e0b0e test: fix test_tombstone_gc.py
Tests in test_tombstone_gc.py are parametrized with string instead
of bool values. Fix that. Use the value to create a keyspace with
or without tablets.

Fixes: #18888.

Closes scylladb/scylladb#18893
2024-05-28 20:40:15 +03:00
Wojciech Mitros
519317dc58 mv: handle different ERMs for base and view table
When calculating the base-view mapping while the topology
is changing, we may encounter a situation where the base
table noticed the change in its effective replication map
while the view table hasn't, or vice-versa. This can happen
because the ERM update may be performed during the preemption
between taking the base ERM and view ERM, or, due to f2ff701,
the update may have just been performed partially when we are
taking the ERMs.

Until now, we assumed that the ERMs are synchronized while calling
finding the base-view endpoint mapping, so in particular, we were
using the topology from the base's ERM to check the datacenters of
all endpoints. Now that the ERMs are more likely to not be the same,
we may try to get the datacenter of a view endpoint that doesn't
exist in the base's topology, causing us to crash.

This is fixed in this patch by using the view table's topology for
endpoints coming from the view ERM. The mapping resulting from the
call might now be a temporary mapping between endpoints in different
topologies, but it still maps base and view replicas 1-to-1.

Fixes: #17786
Fixes: #18709

Closes scylladb/scylladb#18816
2024-05-28 16:01:39 +02:00
Pavel Emelyanov
66f6001c77 test: Do not check tablets mutations on nodes that don't have them
The check is performed by selecting from mutation_fragments(table), but
it's known that this query crashes Scylla when there's no tablet replica
on that node.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-28 13:56:46 +02:00
Pavel Emelyanov
6e0e2674f0 test: Fix the way tablets RF-change test parses mutation_fragments
When the test changes RF from 2 to 3, the extra node executes "rebuild"
transition which means that it streams tablets replicas from two other
peers. When doing it, the node receives two sets of sstables with
mutations from the given tablet. The test part that checks if the extra
node received the mutations notices two mutation fragments on the new
replica and errorneously fails by seeing, that RF=3 is not equal to the
number of mutations found, which is 4.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-28 13:56:46 +02:00
Pavel Emelyanov
2567e300d1 test/tablets: Unmark RF-changing test with xfail
Now the scailing works and test must check it does

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-28 13:56:46 +02:00
Dawid Medrek
ec5708bdee cql-pytest: Verify RF is changes by at most 1 when tablets on
This commit adds a test verifying that we can only
change the RF of a keyspace for any DC by at most 1
when using tablets.

Fixes #18029
2024-05-28 13:56:46 +02:00
Piotr Smaron
fbd75c5c06 Implement ALTER tablets KEYSPACE statement support
This commit adds support for executing ALTER KS for keyspaces with
tablets and utilizes all the previous commits.
The ALTER KS is handled in alter_keyspace_statement, where a global
topology request in generated with data attached to system.topology
table. Then, once topology state machine is ready, it starts to handle
this global topology event, which results in producing mutations
required to change the schema of the keyspace, delete the
system.topology's global req, produce tablets mutations and additional
mutations for a table tracking the lifetime of the whole req. Tracking
the lifetime is necessary to not return the control to the user too
early, so the query processor only returns the response while the
mutations are sent.
2024-05-28 13:56:42 +02:00
Piotr Smaron
80ed442be2 Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks 2024-05-28 13:55:11 +02:00
Kamil Braun
247eb9020b Merge 'cdc, raft topology: fix and test cdc in the recovery mode' from Patryk Jędrzejczak
This PR ensures that CDC keeps working correctly in the recovery
mode after leaving the raft-based topology.

We update `system.cdc_local` in `topology_state_load` to ensure
a node restarting in the recovery mode sees the last CDC generation
created by the topology coordinator.

Additionally, we extend the topology recovery test to verify
that the CDC keeps working correctly during the whole recovery
process. In particular, we test that after restarting nodes in the
recovery mode, they correctly use the active CDC generation created
by the topology coordinator.

Fixes scylladb/scylladb#17409
Fixes scylladb/scylladb#17819

Closes scylladb/scylladb#18820

* github.com:scylladb/scylladb:
  test: test_topology_recovery_basic: test CDC during recovery
  test: util: start_writes_to_cdc_table: add FIXME to increase CL
  test: util: start_writes_to_cdc_table: allow restarting with new cql
  storage_service: update system.cdc_local in topology_state_load
2024-05-28 11:53:28 +02:00
Patryk Jędrzejczak
c44d8eca15 test: test_topology_ops: run correctly without tablets
This patch fixes two bugs in `test_topology_ops`:
1. The values of `tablets_enabled` were nonempty strings, so they
always evaluated to `True` in the if statement responsible for
enabling writing workers only if tablets are disabled. Hence, the
writing workers were always disabled.
2. The `topology_experimental_raft suite` uses tablets by default,
so we need a config with empty `experimental_features` to disable
them.

Ensuring this test works with and without tablets is considered
a part of 6.0, so we should backport this patch.

Closes scylladb/scylladb#18900
2024-05-28 10:08:41 +02:00
Nadav Har'El
b7fa5261c8 Merge 'Fix parsing of initial tablets by ALTER' from Pavel Emelyanov
If the user wants to change the default initial tablets value, it uses ALTER KEYSPACE statement. However, specifying `WITH tablets = { initial: $value }`  will take no effect, because statement analyzer only applies `tablets` parameters together with the `replication` ones, so the working statement should be `WITH replication = $old_parameters AND tablets = ...` which is not very convenient.

This PR changes the analyzer so that altering `tablets` happens independently from `replication`. Test included.

fixes: #18801

Closes scylladb/scylladb#18899

* github.com:scylladb/scylladb:
  cql-pytest: Add validation of ALTER KEYSPACE WITH TABLETS
  cql3: Fix parsing of ALTER KEYSPACE's tablets parameters
  cql3: Remove unused ks_prop_defs/prepare_options() argument
2024-05-27 23:10:39 +03:00
Kefu Chai
e42d83dc46 treewide: include used headers
before this change, we rely on `seastar/util/std-compat.hh` to
include the used headers provided by stdandard library. this was
necessary before we moved to a C++20 compliant standard library
implementation. but since Seastar has dropped C++17 support. its
`seastar/util/std-compat.hh` is not responsible for providing these
headers anymore.

so, in this change, we include the used headers directly instead
of relying on `seastar/util/std-compat.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18883
2024-05-27 17:34:38 +03:00
Botond Dénes
2d79b0106c Merge 'storage_service: Fix race between tablet split and stats retrieval' from Raphael "Raph" Carvalho
Retrieval of tablet stats must be serialized with mutation to token metadata, as the former requires tablet id stability.
If tablet split is finalized while retrieving stats, the saved erm, used by all shards, can have a lower tablet count than the one in a particular shard, causing an abort as tablet map requires that any id feeded into it is lower than its current tablet count.

Fixes #18085.

Closes scylladb/scylladb#18287

* github.com:scylladb/scylladb:
  test: Fix flakiness in topology_experimental_raft/test_tablets
  service: Use tablet read selector to determine which replica to account table stats
  storage_service: Fix race between tablet split and stats retrieval
2024-05-27 16:32:54 +03:00
Pavel Emelyanov
1003391ed6 cql-pytest: Add validation of ALTER KEYSPACE WITH TABLETS
There's a test that checks how ALTER changes the initial tablets value,
but it equips the statement with `replication` parameters because of
limitations that parser used to impose. Now the `tablets` parameters can
come on their own, so add a new test. The old one is kept from
compatibility considerations.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-27 16:27:45 +03:00
Patryk Jędrzejczak
7c1e6ba8b3 test: test_topology_ops: stop a write worker after the first error
`test_topology_ops` is flaky, which has been uncovered by gating
in scylladb/scylladb#18707. However, debugging it is harder than it
should be because write workers can flood the logs. They may send
a lot of failed writes before the test fails. Then, the log file
can become huge, even up to 20 GB.

Fix this issue by stopping a write worker after the first error.

This test is important for 6.0, so we can backport this change.

Closes scylladb/scylladb#18851
2024-05-27 13:49:30 +02:00
Piotr Dulikowski
fa142a9ce7 Merge 'qos/raft_service_level_distributed_data_accessor: print correct error message when trying to modify a service level in recovery mode' from Michał Jadwiszczak
Raft service levels are read-only in recovery mode. This patch adds check and proper error message when a user tries to modify service levels in recovery mode.

Fixes https://github.com/scylladb/scylladb/issues/18827

Closes scylladb/scylladb#18841

* github.com:scylladb/scylladb:
  test/auth_cluster/test_raft_service_levels: try to create sl in recovery
  service/qos/raft_sl_dda: reject changes to service levels in recovery mode
  service/qos/raft_sl_dda: extract raft_sl_dda steps to common function
2024-05-27 13:26:06 +02:00
Kefu Chai
2d7545ade6 test/lib: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18884
2024-05-27 14:13:51 +03:00
Piotr Smaron
cb40f13831 Add storage service to query processor
Query processor needs to access storage service to check if global
topology request is still ongoing and to be able to wait until it
completes.
2024-05-27 12:48:44 +02:00
Paweł Zakrzewski
c888945354 tablets: tests for adding/removing replicas
Note we're suppressing a UBSanitizer overflow error in UTs. That's
because our linter complains about a possible overflow, which never
happens, but tests are still failing because of it.
2024-05-27 12:48:44 +02:00
Patryk Jędrzejczak
2111cb01df test: test_topology_recovery_basic: test CDC during recovery
In topology on raft, management of CDC generations is moved to the
topology coordinator. We extend the topology recovery test to verify
that the CDC keeps working correctly during the whole recovery
process. In particular, we test that after restarting nodes in the
recovery mode, they correctly use the active CDC generation created
by the topology coordinator. A node restarting in the recovery mode
should learn about the active generation from `system.cdc_local`
(or from gossip, but we don't want to rely on it). Then, it should
load its data from `system.cdc_generations_v3`.

Fixes scylladb/scylladb#17409
2024-05-27 10:39:04 +02:00
Patryk Jędrzejczak
388db33dec test: util: start_writes_to_cdc_table: add FIXME to increase CL 2024-05-27 10:39:04 +02:00
Patryk Jędrzejczak
68b6e8e13e test: util: start_writes_to_cdc_table: allow restarting with new cql
This patch allows us to restart writing (to the same table with
CDC enabled) with a new CQL session. It is useful when we want to
continue writing after closing the first CQL session, which
happens during the `reconnect_driver` call. We must stop writing
before calling `reconnect_driver`. If a write started just before
the first CQL session was closed, it would time out on the client.

We rename `finish_and_verify` - `stop_and_verify` is a better
name after introducing `restart`.
2024-05-27 10:39:04 +02:00
Botond Dénes
47dbf23773 Merge 'Rework view services and system-distributed-keyspace dependencies' from Pavel Emelyanov
The system-distributed-keyspace and view-update-generator often go in pair, because streaming, repair and sstables-loader (via distributed-loader) need them booth to check if sstable is staging and register it if it's such. The check is performed by messing directly with system_distributed.view_build_status table, and the registration happens via view-update-generator.

That's not nice, other services shouldn't know that view status is kept in system table. Also view-update-generator is a service to generae and push view updates, the fact that it keeps staging sstables list is the implementation detail.

This PR replaces dependencies on the mentioned pair of services with the single dependency on view-builder (repair, sstables-loader and stream-manager are enlightened) and hides the view building-vs-staging details inside the view_builder.

Along the way, some simplification of repair_writer_impl class is done.

Closes scylladb/scylladb#18706

* github.com:scylladb/scylladb:
  stream_manager: Remove system_distributed_keyspace and view_update_generator
  repair: Remove system_distributed_keyspace and view_update_generator
  streaming: Remove system_distributed_keyspace and view_update_generator
  sstables_loader: Remove system_distributed_keyspace and view_update_generator
  distributed_loader: Remove system_distributed_keyspace and view_update_generator
  view: Make register_staging_sstable() a method of view_builder
  view: Make check_view_build_ongoing() helper a method of view_builder
  streaming: Proparage view_builder& down to make_streaming_consumer()
  repair: Keep view_builder& on repair_writer_impl
  distributed_loader: Propagate view_builder& via process_upload_dir()
  stream_manager: Add view builder dependency
  repair_service: Add view builder dependency
  sstables_loader: Add view_bulder dependency
  main: Start sstables loader later
  repair: Remove unwanted local references from repair_meta
2024-05-27 10:51:11 +03:00
Botond Dénes
e0f4d79f3b Merge 'Do not export statement scheduling group from database' from Pavel Emelyanov
Database used to be (and still is in many ways) an object used to get configuration from. Part of the configuration is the set of pre-configured scheduling groups. That's not nice, services should use each other for some real need, not as proxies to configuration. This patch patches the places that explicitly switch to statement group _not_ to use database to get the group itself.

fixes: #17643

Closes scylladb/scylladb#18799

* github.com:scylladb/scylladb:
  database: Don't export statement scheduling group
  test: Use async attrs and cql-test-env scheduling groups
  test: Use get_scheduling_groups() to get scheduling groups
  api: Don't switch sched group to start/stop protocol servers
  main: Don't switch sched group to start protocol servers
  code: Switch to sched group in request_stop_server()
  code: Switch to server sched group in start()
  protocol_server: Keep scheduling group on board
  code: Add scheduling group to controllers
  redis: Coroutinize start() method
2024-05-27 10:48:33 +03:00
Kefu Chai
46d993a283 test: revert 4c1b6f04
in 4c1b6f04, we added a concept for fmt::is_formattable<>. but it
was not ncessary. the fmt::is_formattable<> trait was enough. the
reason 4c1b6f04 was actually a leftover of a bigger change which
tried to add trait for the cases where fmt::is_formattable<> was
not able to cover. but that was based on the wrong impression that
fmt::is_formattable<> should be able to work with container types
without including, for instance `fmt/ranges.h`. but in 222dbf2c,
we include `fmt/ranges.h` in tests, where the range-alike formatter
is used, that enables `fmt::is_formattable<>` to tell that container
types are formattable.

in short, 4c1b6f04 was created based on a misunderstanding, and
it was a reduced type trait, which is proved to be not necessary.

so, in this change, it is dropped. but the type constraints is
preserved to make the build failure more explicit, if the fallback
formatter does not match with the type to be formatted by Boost.test.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18879
2024-05-27 10:14:59 +03:00
Marcin Maliszkiewicz
2ab143fb40 db: auth: move auth tables to system keyspace
Separate keyspace which also behaves as system brings
little benefit while creating some compatibility problems
like schema digest mismatch during rollback. So we decided
to move auth tables into system keyspace.

Fixes https://github.com/scylladb/scylladb/issues/18098

Closes scylladb/scylladb#18769
2024-05-26 22:30:42 +03:00
Avi Kivity
56d523b071 Merge 'build, test: disable operator<< for vector and unordered_map' from Kefu Chai
this series disables operator<<:s for vector and unordered_map, and drop operator<< for mutation, because we don't have to keep it to work with these operator:s anymore. this change is a follow up of https://github.com/scylladb/seastar/issues/1544

this change is a cleanup. so no need to backport

Closes scylladb/scylladb#18866

* github.com:scylladb/scylladb:
  mutation,db: drop operator<< for mutation and seed_provider_type&
  build: disable operator<< for vector and unordered_map
  db/heat_load_balance: include used header
  test: define a more generic boost_test_print_type
  test/boost: define fmt::formatter for service_level_controller_test.cc
  test/boost: include test/lib/test_utils.hh
2024-05-26 19:19:20 +03:00
Pavel Emelyanov
9108952a52 test/cql-pytest: Add test for token() filter againts mutation_fragments()
When selecting from mutation_fragments(table) one may want to apply
token() filtering againts partition key. This doesn't work currently,
but used to crash. This patch adds a regression test for that

refs: #18637
refs: #18768

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18759
2024-05-26 15:31:20 +03:00
Kefu Chai
4c1b6f0476 test: define a more generic boost_test_print_type
fmt::is_formattable<T>::value is false, even if

* T is a container of U, and
* fmt::is_formattable<U>, and
* U can be formatted using fmt::formatter

so, we have to define a more generic boost_test_print_type()
for the all types supported by {fmt}. it will help us to ditch the
operator<< for vector and unordered_map in Seastar, and allow us
to use the fmt::formatter specialization of the element
types.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-26 12:32:43 +08:00
Kefu Chai
bfe918ac9e test/boost: define fmt::formatter for service_level_controller_test.cc
since we are moving away for operator<< based formatter, more and more
types now only have {fmt} based formatters. the same will apply to the
STL container types after ditching the generic homebrew formatter in
to_string.hh, so to be prepared for the change, let's add the
fmt::formatter for tests as well.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-26 12:32:43 +08:00
Kefu Chai
222dbf2ce4 test/boost: include test/lib/test_utils.hh
this change was created in the same spirit of 505900f18f. because
we are deprecating the operator<< for vector and unorderd_map in
Seastar, some tests do not compile anymore if we disable these
operators. so to be prepared for the change disabling them, let's
include test/lib/test_utils.hh for accessing the printer dedicated
for Boost.test. and also '#include <fmt/ranges.h>' when necessary,
because, in order to format the ranges using {fmt}, we need to
use fmt/ranges.h.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-26 12:32:43 +08:00
Michał Chojnowski
de798775fd test: test_coordinator_queue_management: wait for logs properly
The modified lines of code intend to await the first appearance of a log
on one of the nodes.

But due to misplaced parentheses, instead of creating a list of log-awaiting
tasks with a list comprehension, they pass a generator expression to
asyncio.create_task().

This is nonsense, and it fails immediately with a type error.
But since they don't actually check the result of the await,
the test just assumes that the search completed successfully.

This was uncovered by an upgrade to Python 3.12, because its typing is stronger
and asyncio.create_task() screams when it's passed a regular generator.

This patch fixes the bad list comprehension, and also adds an error check
on the completed awaitables (by calling `await` on them).

Fixes #18740

Closes scylladb/scylladb#18754
2024-05-25 10:54:44 +03:00
Pavel Emelyanov
ddc511872e test: Use async attrs and cql-test-env scheduling groups
Continuation of the prevuous patch, but with its own flavor. There's a
manual test that wants to run seastar thread in statement scheduling
group and gets one from database. This patch makes it get the group from
cql-test-env and, while at it, makes it switch to that group using
thread attributes passed to async() method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 18:00:01 +03:00
Pavel Emelyanov
2e3a057db1 test: Use get_scheduling_groups() to get scheduling groups
There's such a helper in cql-test-env that other tests use to get sched
groups from. Few other tests (ab)use databse for that, this patch fixes
those remnants.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 18:00:01 +03:00
Michał Jadwiszczak
af0b6bcc56 test/auth_cluster/test_raft_service_levels: try to create sl in recovery 2024-05-23 17:49:59 +02:00
Pavel Emelyanov
8906126a2c stream_manager: Remove system_distributed_keyspace and view_update_generator
Now all the code is happy with view_builder and can be shortened

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:56 +03:00
Pavel Emelyanov
d917b06857 stream_manager: Add view builder dependency
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:32:28 +03:00
Raphael S. Carvalho
e7246751b6 test: Fix flakiness in topology_experimental_raft/test_tablets
One source of flakiness is in test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode
due to gossiper being aborted prematurely, and causing reconnection
storm.

Another is test_tablet_missing_data_repair which is flaky due an issue
in python driver that session might not reconnect on rolling restart
(tracked by https://github.com/scylladb/python-driver/issues/230)

Refs #15356.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-22 17:02:29 -03:00
Raphael S. Carvalho
abcc68dbe7 storage_service: Fix race between tablet split and stats retrieval
If tablet split is finalized while retrieving stats, the saved erm, used by all
shards, will be invalidated. It can either cause incorrect behavior or
crash if id is not available.

It's worked by feeding local tablet map into the "coordinator"
collecting stats from all shards. We will also no longer have a snapshot
of erm shared between shards to help intra-node migration. This is
simplified by serializing token metadata changes and the retrieval of
the stats (latter should complete pretty fast, so it shouldn't block
the former for any significant time).

Fixes #18085.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-22 09:25:29 -03:00
Nadav Har'El
dc80b5dafe test/alternator: do not write to auth tables
As part of the Alternator test suite, we check Alternator's support for
authentication. Alternator maps Scylla's existing CQL roles to AWS's
authentication:
  * AWS's access_key_id     <- the name of the CQL role
  * AWS's secret_access_key <- the salted hash of the password of the CQL role

Before this patch, the Alternator test suite created a new role with a
preset salted hash (role "alternator", salted hash "secret_pass")
and than used that in the tests. However, with the advent of Raft-based
metadata it is wrong to write directly to the roles table, and starting
with #17952 such writes will be outright forbidden.

But we don't actually need to create a new CQL role! We already have
a perfectly good CQL role called "cassandra", and our tests already use
it. So what this patch does is to have the Alternator tests (conftest.py)
read from the roles system-table the salted hash of the "cassandra" role,
and then use that - instead of the hard-coded pair alternator/secret_pass -
in the tests.

A couple more tests assumed that the role name that was used was
"alternator", but now it was changed to "cassandra" so those tests
needed minor fixes as well.

After this patch, the Alternator tests no longer *write* to the roles
system table. Moreover, after this patch, test/alternator/run and
test/alternator/suite.yaml (used when testing with test.py) no longer
need to do extra ugly CQL setup before starting the Alternator tests.

Fixes #18744

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18771
2024-05-22 11:00:15 +03:00
Pavel Emelyanov
26eda88401 test/tablets: Check that after RF change data is replicated properly
There's a test that checks system.tablets contents to see that after
changing ks replication factor via ALTER KEYSPACE the tablet map is
updated properly. This patch extends this test that also validates that
mutations themselves are replicated according to the desired replication
factor.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18644
2024-05-22 11:00:15 +03:00
Botond Dénes
5e41dd28c7 Merge 'Sanitize sl controller draining' from Pavel Emelyanov
The sl-controller is stopped in three steps. The first (and instantly the second) is unsubscribing from lifecycle notification and draining. The third is stop itself. First two steps are "out of order" as compared to the desired start-stop sequence of any service, this patch fixes these steps.

After this PR the drain_on_shutdown() (the call that drains the node upon stop) finally becomes clean and tidy and is no longer accompanied by ad-hoc fellow drains/stops/aborts/whatever.

refs: #2737

Closes scylladb/scylladb#18731

* github.com:scylladb/scylladb:
  sl_controller: Remove drain() method
  sl_controller: Move abort kicking into do_abort()
  main,sl_controller: Subscribe for early abort
  main: Unsubscribe sl controller next to subscribing
2024-05-21 17:16:23 +03:00
Kefu Chai
86b988a70b test/lib: do not use variable which could be moved away
C++ standard does not define the order in which the parameters
passed to a function are evaluated. so in theory, in
```c++
reusable_sst(sst->get_schema(), std::move(sst));
```
`std::move(sst)` could be evaluated before `sst->get_schema`.
but please note, `std::move(sst)` does not move `sst`
away, it merely cast `sst` to a rvalue reference, it is
`reusable_sst()` which *could* move `sst` away by
consuming it. so following call is much more dangerous
than the above one:
```c++
reusable_sst(sst->get_schema(), modify_sst(std::move(sst)))
```
nevertheless, this usage is still confusing. so instead
of passing a copy of `sst` to `reusable_sst`.

this change is inspired by clang-tidy, it warns like:

```
Warning: /home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: warning: 'sst' used after it was moved [bugprone-use-after-move]
  397 |     return reusable_sst(sst->get_schema(), std::move(sst));
      |                         ^
/home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:44: note: move occurred here
  397 |     return reusable_sst(sst->get_schema(), std::move(sst));
      |                                            ^
/home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
  397 |     return reusable_sst(sst->get_schema(), std::move(sst));
      |
```

per the analysis above, this is a false alarm.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18775
2024-05-21 10:02:10 +03:00
Avi Kivity
33ec6ccea9 test: boost: chunked_vector_test: include <optional>
std::optional is used but not imported. This fails on libstdc++-14.

Closes scylladb/scylladb#18739
2024-05-21 07:37:11 +03:00
Pavel Emelyanov
8d4c8711fa main,sl_controller: Subscribe for early abort
There's stop-signal in main that fires an abort source on stop. Lots of
other services are subscribed in it, add the sl-controller too. For now
it's a no-op, but next patches will make use of it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-20 21:26:31 +03:00
Avi Kivity
61505d057e Merge 'Sort user-defined types in describe statements' from Michał Jadwiszczak
User-defined types can depend on each other, creating directed acyclic graph.

In order to support restoring schema from `DESC SCHEMA`, UDTs should be
ordered topologically, not alphabetically as it was till now.

This patch changes the way UDTs are ordered in `DESC SCHEMA`/`DESC KEYSPACE <ks>` statements, so the output can be safely copy-pasted to restore the schema.

Fixes #18539

Closes scylladb/scylladb#18302

* github.com:scylladb/scylladb:
  test/cql-pytest/test_describe: add test for UDTs ordering
  cql3/statements/describe_statement: UDTs topological sorting
  cql3/statements/describe_statement: allow to skip alphabetical sorting
  types: add a method to get all referenced user types
  db/cql_type_parser: use generic topological sorting
  db/cql_type_parses: futurize raw_builder::build()
  test/boost: add test for topological sorting
  utils: introduce generic topological sorting algorithm
2024-05-20 16:58:17 +03:00