Commit Graph

40667 Commits

Author SHA1 Message Date
Kamil Braun
787b24cd24 Merge 'raft topology: join: shut down a node on error in response handler' from Patryk Jędrzejczak
If the joining node fails while handling the response from the
topology coordinator, it hangs even though it knows the join
operation has failed. Therefore, we ensure it shuts down in
this patch.

Additionally, we ensure that if the first join request response
was a rejection or the node failed while handling it, the
following acceptances by the (possibly different) coordinator
don't succeed. The node considers the join operation as failed.
We shouldn't add it to the cluster.

Fixes scylladb/scylladb#16333

Closes scylladb/scylladb#16650

* github.com:scylladb/scylladb:
  topology_coordinator: clarify warnings
  raft topology: join: allow only the first response to be a succesful acceptance
  storage_service: join_node_response_handler: fix indentation
  raft topology: join: shut down a node on error in response handler
2024-01-17 14:55:26 +01:00
Botond Dénes
f22fc88a64 Merge 'Configure service levels interval' from Michał Jadwiszczak
Service level controller updates itself in interval. However the interval time is hardcoded in main to 10 seconds and it leads to long sleeps in some of the tests.

This patch moves this value to `service_levels_interval_ms` command line option and sets this value to 0.5s in cql-pytest.

Closes scylladb/scylladb#16394

* github.com:scylladb/scylladb:
  test:cql-pytest: change service levels intervals in tests
  configure service levels interval
2024-01-17 12:24:49 +02:00
David Garcia
f555a2cb05 docs: dynamic include based on flag
docs: extend include options

Closes scylladb/scylladb#16753
2024-01-17 09:33:40 +02:00
Calle Wilund
af0772d605 commitlog: Add wait_for_pending_deletes
Refs #16757

Allows waiting for all previous and pending segment deletes to finish.
Useful if a caller of `discard_completed_segments` (i.e. a memtable
flush target) not only wants to ensure segments are clean and released,
but thoroughly deleted/recycled, and hence no treat to resurrecting
data on crash+restart.

Test included.

Closes scylladb/scylladb#16801
2024-01-17 09:30:55 +02:00
Kefu Chai
84a9d2fa45 add formatter for auth::role_or_anonymous
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for auth::role_or_anonymous,
and remove their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16812
2024-01-17 09:28:13 +02:00
Kefu Chai
3f0fbdcd86 replica: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16810
2024-01-17 09:27:09 +02:00
Tomasz Grabiec
3d76aefb98 Merge "Enhance topology request status tracking" from Gleb
Currently to figure out if a topology request is complete a submitter
checks the topology state and tries to figure out from that the status
of the request. This is not exact. Lets look at rebuild handling for
instance. To figure out if request is completed the code waits for
request object to disappear from the topology, but if another rebuild
starts between the end of the previous one and the code noticing that
it completed the code will continue waiting for the next rebuild.
Another problem is that in case of operation failure there is no way to
pass an error back to the initiator.

This series solves those problems by assigning an id for each request and
tracking the status of each request in a separate table. The initiator
can query the request status from the table and see if the request was
completed successfully or if it failed with an error, which is also
evadable from the table.

The schema for the table is:

    CREATE TABLE system.topology_requests (
        id timeuuid PRIMARY KEY,

        initiating_host uuid,
        start_time timestamp,

        done boolean,
        error text,
        end_time timestamp,
    );

and all entries have TTL of one month.
2024-01-17 00:37:19 +01:00
Benny Halevy
d6071945c8 compaction, table: ignore foreign sstables replay_position
The sstables replay_position in stats_metadata is
valid only on the originating node and shard.

Therefore, validate the originating host and shard
before using it in compaction or table truncate.

Fixes #10080

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16550
2024-01-16 18:45:59 +02:00
Benny Halevy
7a7a1db86b sstables_loader: load_new_sstables: auto-enable load-and-stream for tablets
And call on_internal_error if process_upload_dir
is called for tablets-enabled keyspace as it isn't
supported at the moment (maybe it could be in the future
if we make sure that the sstables are confined to tablets
boundaries).

Refs #12775
Fixes #16743

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16788
2024-01-16 18:43:52 +02:00
Gleb Natapov
9a7243d71a storage_service: topology coordinator: Consolidate some mutation builder code 2024-01-16 17:02:54 +02:00
Gleb Natapov
a145a73136 storage_service: topology coordinator: make topology operation rollback error more informative
Include an error which caused the rollback.
2024-01-16 17:02:54 +02:00
Gleb Natapov
bf91eb37f2 storage_service: topology coordinator: make topology operation cancellation error more informative
Include the list of nodes that were down when cancellation happened.
2024-01-16 17:02:54 +02:00
Gleb Natapov
8beb399b72 storage_service: topology coordinator: consolidate some code in cancel_all_requests
There is a code duplication that can be avoided.
2024-01-16 17:02:54 +02:00
Gleb Natapov
fba6877b3e storage_service: topology coordinator: TTL topology request table
To prevent topology_request table growth TTL all writes to expire after
a month.
2024-01-16 17:02:54 +02:00
Gleb Natapov
d576ed31dc storage_service: topology request: drop explicit shutdown rpc
Now that we have explicit status for each request we may use it to
replace shutdown notification rpc. During a decommission, in
left_token_ring state, we set done to true after metadata barrier
that waits for all request to the decommissioning node to complete
and notify the decommissioning node with a regular barrier. At this
point the node will see that the request is complete and exit.
2024-01-16 17:02:54 +02:00
Gleb Natapov
84197ff735 storage_service: topology coordinator: check topology operation completion using status in topology_requests table
Instead of trying to guess if a request completed by looking into the
topology state (which is sometimes can be error prone) look at the
request status in the new topology_requests. If request failed report
a reason for the failure from the table.
2024-01-16 17:02:54 +02:00
Kefu Chai
0092700ad1 memtable: add formatter for replica::{memtable,memtable_entry}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for replica::memtable and
replica::memtable_entry, and remove their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16793
2024-01-16 16:46:52 +02:00
Kefu Chai
2dbf044b91 cql3: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16791
2024-01-16 16:43:17 +02:00
Avi Kivity
a9844ed69a Merge 'view: revert cleanup filter that doesn't work with tablets' from Nadav Har'El
The goal of this PR is fix Scylla so that the dtest test_mvs_populating_from_existing_data, which starts to fail when enabling tablets, will pass.

The main fix (the second patch) is reverting code which doesn't work with tablets, and I explain why I think this code was not necessary in the first place.

Fixes #16598

Closes scylladb/scylladb#16670

* github.com:scylladb/scylladb:
  view: revert cleanup filter that doesn't work with tablets
  mv: sleep a bit before view-update-generator restart
2024-01-16 16:42:20 +02:00
Gleb Natapov
1c18476385 storage_service: topology coordinator: update topology_requests table with requests progress
Make topology coordinator update request's status in topology_requests table as it changes.
2024-01-16 15:35:18 +02:00
Benny Halevy
e277ec6aef force_keyspace_cleanup: skip keyspaces that do not require or support cleanup
Local keyspaces do not need cleanup, and
keyspaces configured with tablets, where their
replication strategy is per-table do not support
cleanup.

In both cases, just skip their cleanup via the api.

Fixes #16738

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16785
2024-01-16 15:01:49 +03:00
Gleb Natapov
1ce1c5001d topology coordinator: add topology_requests table to group0 snapshot
Since the table is updated through raft's group0 state machine its
content needs to be part of the snapshot.
2024-01-16 13:57:27 +02:00
Gleb Natapov
584551f849 topology coordinator: add request_id to the topology state machine
Provide a unique ID for each topology request and store it the topology
state machine. It will be used to index new topology requests table in
order to retrieve request status.
2024-01-16 13:57:27 +02:00
Gleb Natapov
ecb8778950 system keyspace: introduce local table to store topology requests status
The table has the following schema and will be managed by raft:

CREATE TABLE system.topology_requests (
    id timeuuid PRIMARY KEY,

    initiating_host uuid,
    start_time timestamp,

    done boolean,
    error text,
    end_time timestamp,
);

In case of an request completing with an error the "error" filed will be non empty when "done" is set to true.
2024-01-16 13:57:16 +02:00
Tomasz Grabiec
49026dc319 Merge 'Turn on tablets on keyspace by default when the feature is enabled' from Pavel Emelyanov
To enable tablets replication one needs to turn on the (experimental) feature and specify the `initial_tablets: N` option when creating a keyspace. We want tablets to become default in the future and allow users to explicitly opt it out if they want to.

This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is

* `CREATE KEYSPACE ...` will turn tablets on or off based on cluster feature being enabled/disabled
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }` will turn tablets off regardless of what
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }` will try to enable tablets with default configuration
* `CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }` is now the replacement for `REPLICATION = { ... 'initial_tablets': <int> }` thing

fixes: #16319

Closes scylladb/scylladb#16364

* github.com:scylladb/scylladb:
  code: Enable tablets if cluster feature is enabled
  test: Turn off tablets feature by default
  test: Move test_tablet_drain_failure_during_decommission to another suite
  test/tablets: Enable tables for real on test keyspace
  test/tablets: Make timestamp local
  cql3: Add feature service to as_ks_metadata_update()
  cql3: Add feature service to ks_prop_defs::as_ks_metadata()
  cql3: Add feature service to get_keyspace_metadata()
  cql: Add tablets on/off switch to CREATE KEYSPACE
  cql: Move initial_tablets from REPLICATION to TABLETS in DDL
  network_topology_strategy: Estimate initial_tablets if 0 is set
2024-01-16 00:15:10 +01:00
Avi Kivity
5e70dd1dbe database: don't allow keyspace objects to be copied
keyspace objects are heavyweight and copies are immediately our-of-date,
so copying them is bad.

Fix by deleting the copy constructor and copy assignment operator. One
call site is fixed. This call site is safe since the it's only used
for accessing a few attributes (introduced in f70c4127c6).

Closes scylladb/scylladb#16782
2024-01-15 21:48:32 +01:00
Botond Dénes
204d3284fa readers/multishard: evictable_reader::fast_forward_to(): close reader on exception
When the reader is currently paused, it is resumed, fast-forwarded, then
paused again. The fast forwarding part can throw and this will lead to
destroying the reader without it being closed first.
Add a try-catch surrounding this part in the code. Also mark
`maybe_pause()` and `do_pause()` as noexcept, to make it clear why
that part doesn't need to be in the try-catch.

Fixes: #16606

Closes scylladb/scylladb#16630
2024-01-15 20:55:55 +01:00
Kefu Chai
e5300f3e21 topology_state_machine: add formatter for service::cleanup_status
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for service::cleanup_status,
and remove its operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16778
2024-01-15 21:31:42 +02:00
Anna Stuchlik
af1405e517 doc: remove support for CentOS 7
This commit removes support for CentOS 7
from the docs.

The change applies to version 5.4,so it
must be backported to branch-5.4.

Refs https://github.com/scylladb/scylla-enterprise/issues/3502

In addition, this commit removes the information
about Amazon Linux and Oracle Linux, unnecessarily added
without request, and there's no clarity over which versions
should be documented.

Closes scylladb/scylladb#16279
2024-01-15 15:37:29 +02:00
Anna Stuchlik
bca39b2a93 doc: remove Serverless from the Drivers page
This commit removes the information about ScyllaDB Cloud Serverless,
which is no longer valid.

Closes scylladb/scylladb#16700
2024-01-15 15:36:51 +02:00
Botond Dénes
66bef6e961 cql3: cluster_describe_statement: don't produce range ownership for tablet keyspaces
Tablet keyspaces have per/table range ownership, which cannot currently
be expressed in a DESC CLUSTER statement, which describes range
ownership in the current keyspace (if set). Until we figure out how to
represent range ownership (tablets) of all tables of a keyspace, we
disable range ownership for tablet keyspaces.

Fixes: #16483

Closes scylladb/scylladb#16713
2024-01-15 14:03:54 +01:00
Patryk Wrobel
aec0db1b96 cql_auth_query_test.cc: do not rely on templated operator<<
This change is intended to remove the dependency to
operator<<(std::ostream&, const std::unordered_set<seastar::sstring>&)
from test/boost/cql_auth_query_test.cc.

It prepares the test for removal of the templated helpers.
Such removal is one of goals of the referenced issue that is linked below.

Refs: #13245

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16758
2024-01-15 13:30:05 +02:00
Kefu Chai
ece2bd2f6e service: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16764
2024-01-15 13:29:33 +02:00
Kefu Chai
fc97d91f1a auth: add fmt::format for auth::resource and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we

* define a formatter for `auth::resource` and friends,
* update their callers of `operator<<` to use `fmt::print()`.
* drop `operator<<`, as they are not used anymore.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16765
2024-01-15 13:26:39 +02:00
Kefu Chai
f344e13066 types: add formatter for data_value
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for data_value, but its
its operator<<() is preserved as we are still using the generic
homebrew formatter for formatting std::vector, which in turn uses
operator<< of the element type.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16767
2024-01-15 13:18:23 +02:00
Kefu Chai
218334eaf5 test/nodetool: use build/$CMAKE_BUILD_TYPE when appropriate
because the CMake-generated build.ninja is located under build/,
and it puts the `scylla` executable at build/$CMAKE_BUILD_TYPE/scylla,
instead of at build/$scylla_build_mode/scylla, so let's adapt to this
change accordingly.

we will promote this change to a shared place if we have similar
needs in other tests as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16775
2024-01-15 12:52:35 +02:00
Pavel Emelyanov
dd892b0d8a code: Enable tablets if cluster feature is enabled
If the TABLETS map is missing in the CREATE KEYSPACE statement the
tablets are anyway enabled if the respective cluster feature is enabled.

To opt-out keyspaces one may use TABLETS = { 'enabled': false } syntax.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
4838eeb201 test: Turn off tablets feature by default
Next patches will make per-keyspace initial_tables option really
optional and turn tablets ON when the feature is ON. This will break all
other tests' assumptions, that they are testing vnodes replication. So
turn the feature off by default, tests that do need tables will need to
explicitly enable this feature on their own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
ae7da54f88 test: Move test_tablet_drain_failure_during_decommission to another suite
In its current location it will be started with 3 pre-created scylla
nodes with default features ON. Next patch will exclude `tablets` from
the default list, so the test needs to create servers on its own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
46b36d8c07 test/tablets: Enable tables for real on test keyspace
When started cql_test_env creates a test keyspace. Some tablets test
cases create a table in this keyspace, but misuse the whole feature. The
thing is that while tablets feature is ON in those test cases, the
keyspace itself doesn _not_ have the initial_tables option and thus
tablets are not enabled for the ks' table for real. Currently test cases
work just because this table is only used as a transparent table ID
placeholder. If turning on tablets for the keyspace, several test cases
would get broken for two reasons.

First, the tables map will no longer be empty on test start.

Second, applying changes to tablet metadata may not be visible, becase
test case uses "ranom" timestamp, that can be less that the initial
metadata mutations' timestamp.

This patch fixes all three places:

1. enables tables for the test keyspace
2. removes assumption that the initial metadata is empty
3. uses large enough timestamp for subsequent mutations

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
2376b699e0 test/tablets: Make timestamp local
Just to make next patching simpler

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
f3a69bfaca cql3: Add feature service to as_ks_metadata_update()
To call prepare_options() with tablets feature state later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
4dede19e4f cql3: Add feature service to ks_prop_defs::as_ks_metadata()
To call prepare_options() with tablets feature state later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
267770bf0f cql3: Add feature service to get_keyspace_metadata()
To be passed down to ks_prop_defs::as_ks_metadata()

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
6cb3055059 cql: Add tablets on/off switch to CREATE KEYSPACE
Now the user can do

  CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }

to turn tablets off. It will be useful in the future to opt-out keyspace
from tablets when they will be turned on by default based on cluster
features only.

Also one can do just

  CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }

and let Scylla select the initial tablets value by its own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:11 +03:00
Pavel Emelyanov
941f6d8fca cql: Move initial_tablets from REPLICATION to TABLETS in DDL
This patch changes the syntax of enabling tablets from

  CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }

to be

  CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }

and updates all tests accordingly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:04:48 +03:00
Pavel Emelyanov
4c4a9679d8 network_topology_strategy: Estimate initial_tablets if 0 is set
If user configured zero initial tablets (spoiler: or this value was set
automagically when enabling tablets begind the scenes) we still need
some value to start with and this patch calculates one.

The math is based on topology and RF so that all shards are covered:

initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters)

The estimation is done when a table is created, not when the keyspace is
created. For that, the keyspace is configured with zero initial tabled,
and table-creation time zero is converted into auto-estimated value.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:04:48 +03:00
Kamil Braun
423234841e Merge 'add automatic sstable cleanup to the topology coordinator' from Gleb
For correctness sstable cleanup has to run between (some) topology
changes.  Sometimes even a failed topology change may require running
the cleanup.  The series introduces automatic sstable cleanup step to the
topology change coordinator. Unlike other operations it is not represented
as a global transition state, but done by each node independently which
allows cleanup to run without locking the topology state machine so
tablet code can run in parallel with the cleanup.

It is done by having a cleanup state flag for each node in the
topology. The flag is a tri state: "clean" - the node is clean, "needed"
- cleanup is needed (but not running), "running" - cleanup is running. No
topology operation can proceed if there is a node in "running" state, but
some operation can proceed even if there are nodes in "needed" state. If
the coordinator needs to perform a topology operation that cannot run while
there are nodes that need cleanup the coordinator will start one
automatically and continue only after cleanup completes. There is also a
possibility to kick cleanup manually through the new RAFT API call.

* 'cleanup-needed-v8' of https://github.com/gleb-cloudius/scylla:
  test: add test for automatic cleanup procedure
  test: add test for topology requests queue management
  storage_service: topology coordinator: add error injection point to be able to pause the topology coordinator
  storage_service: topology coordinator: add logging to removenode and decommission
  storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator
  storage_service: topology coordinator: manage cluster cleanup as part of the topology management
  storage_service: topology coordinator: provide a version of get_excluded_nodes that does not need node_to_work_on as a parameter
  test: use servers_see_each_other when needed
  test: add servers_see_each_other helper
  storage_service: topology coordinator: make topology coordinator lifecycle subscriber
  system_keyspace: raft topology: load ignore nodes parameter together with removenode topology request
  storage_service: topology coordinator: introduce sstable cleanup fiber
  storage_proxy: allow to wait for all ongoing writes
  storage_service: topology coordinator: mark nodes as needing cleanup when required
  storage_service: add mark_nodes_as_cleanup_needed function
  vnode_effective_replication_map: add get_all_pending_nodes() function
  vnode_effective_replication_map: pre calculate dirty endpoints during topology change
  raft topology: add cleanup state to the topology state machine
2024-01-14 18:54:02 +01:00
Gleb Natapov
f8b90aeb14 test: add test for automatic cleanup procedure
The test runs two bootstraps and checks that there is no cleanup
in between.  Then it runs a decommission and checks that cleanup runs
automatically and then it runs one more decommission and checks that no
cleanup runs again.  Second part checks manual cleanup triggering. It
adds a node, triggers cleanup through the REST API, checks that is runs,
decommissions a node and check that the cleanup did not run again.
2024-01-14 15:45:53 +02:00
Gleb Natapov
5882855669 test: add test for topology requests queue management
This test creates a 5 node cluster with 2 down nodes (A and B). After
that it creates a queue of 3 topology operation: bootstrap, removenode
A and removenode B with ignore_nodes=A. Check that all operation
manage to complete.  Then it downs one node and creates a queue with
two requests: bootstrap and decommission. Since none can proceed both
should be canceled.
2024-01-14 15:45:53 +02:00