Commit Graph

37347 Commits

Kamil Braun
33c19baabc db: system_keyspace: take simpler service references in make
Take references to services which are initialized earlier. The
references to `gossiper`, `storage_service` and `raft_group0_registry`
are no longer needed.

This will allow us to move the `make` step right after starting
`system_keyspace`.
2023-06-18 13:39:27 +02:00
Kamil Braun
b34605d161 db: system_keyspace: call initialize_virtual_tables from main
`initialize_virtual_tables` was called from `system_keyspace::make`,
which caused this `make` function to take a bunch of references to
late-initialized services (`gossiper`, `storage_service`).

Call it from `main`/`cql_test_env` instead.

Note: `system_keyspace::make` is called from
`distributed_loader::init_system_keyspace`. The latter function contains
additional steps: populate the system keyspaces (with data from
sstables) and mark their tables ready for writes.

None of these steps apply to virtual tables.

There exists at least one writable virtual table, but writes into
virtual tables are special and the implementation of writes is
virtual-table specific. The existing writable virtual table
(`db_config_table`) only updates in-memory state when written to. If a
virtual table would like to create sstables, or populate itself with
sstable data on startup, it will have to handle this in its own
initialization function.

Separating `initialize_virtual_tables` like this will allow us to
simplify `system_keyspace` initialization, making it independent of
services used for distributed communication.
2023-06-18 13:39:27 +02:00
Kamil Braun
c931d9327d db: system_keyspace: refactor virtual tables creation
Split `system_keyspace::make` into two steps: creating regular
`system` and `system_schema` tables, then creating virtual tables.

This will allow us, in a later commit, to make `system_keyspace`
initialization independent of services used for distributed
communication, such as `gossiper`. See further commits for details.
2023-06-18 13:39:27 +02:00
Kamil Braun
035045c288 db: system_keyspace: remove system_keyspace_make
The code can now be inlined in `system_keyspace::make` as we no longer
access private members of `database`.
2023-06-18 13:39:27 +02:00
Kamil Braun
cf120e46b8 db: system_keyspace: refactor local system table creation code
`system_keyspace_make` would access private fields of `database` in
order to create local system tables (creating the `keyspace` and
`table` in-memory structures, creating directory for `system` and
`system_schema`).

Extract this part into `database::create_local_system_table`.

Make `database::add_column_family` private.
2023-06-18 13:39:27 +02:00
Kamil Braun
3f04a5956c replica: database: remove is_bootstrap argument from create_keyspace
Unused.
2023-06-18 13:39:27 +02:00
Kamil Braun
8848c3b809 replica: database: write a comment for parse_system_tables 2023-06-18 13:39:27 +02:00
Kamil Braun
4ca149c1f0 replica: database: remove redundant keyspace::get_erm_factory() getter
`keyspace` can simply access its private field.
2023-06-18 13:39:27 +02:00
Kamil Braun
53cf646103 db: system_keyspace: don't take sharded<> references
Take `query_processor` and `database` references directly, not through
`sharded<...>&`. This is now possible because we moved `query_processor`
and `database` construction early, so by the time `system_keyspace` is
started, the services it depends on were also already started.

Calls to `_qp.local()` and `_db.local()` inside `system_keyspace` member
functions can now be replaced with direct uses of `_qp` and `_db`.
Runtime assertions for dependent services being initialized are gone.
2023-06-18 13:39:26 +02:00
Avi Kivity
b7627085cb Revert "Revert "configure: Switch debug build from -O0 to -Og""
This reverts commit 7dadd38161.

The latest revert cited debuggability trumping performance, but the
performance loss is so huge here that debug builds are unusable and
next promotions time out.

In the interest of progress, pick the lesser of two evils.
2023-06-17 15:20:26 +03:00
Kamil Braun
9bfa63fe37 Merge 'test/topology/test_cluster_features: workaround for python driver not reconnecting after full cluster restart' from Piotr Dulikowski
The test `test_downgrade_after_successful_upgrade_fails` shuts down the whole cluster, reconfigures the nodes and then restarts. Apparently, the python driver sometimes does not handle this correctly; in one test run we observed that the driver did not manage to reconnect to any of the nodes, even though the nodes managed to start successfully.

More context can be found on the python driver issue.

This PR works around this issue by using the existing `reconnect_driver` function (which is a workaround for a _different_ python driver issue already) to help the driver reconnect after the full cluster restart.

Refs: scylladb/python-driver#230

Closes #14276

* github.com:scylladb/scylladb:
  tests/topology: work around python driver issue in cluster feature tests
  test/topology{_raft_disabled}: move reconnect_driver to topology utils
2023-06-16 16:54:58 +02:00
Pavel Emelyanov
900c609269 Merge 'Initialize query_processor early, without messaging_service or gossiper' from Kamil Braun
In https://github.com/scylladb/scylladb/pull/14231 we split `storage_proxy` initialization into two phases: for local and remote parts. Here we do the same with `query_processor`. This allows performing queries for local tables early in the Scylla startup procedure, before we initialize services used for cluster communication such as `messaging_service` or `gossiper`.

Fixes: #14202

As a follow-up we will simplify `system_keyspace` initialization, making it available earlier as well.

Closes #14256

* github.com:scylladb/scylladb:
  main, cql_test_env: start `query_processor` early
  cql3: query_processor: split `remote` initialization step
  cql3: query_processor: move `migration_manager&`, `forwarder&`, `group0_client&` to a `remote` object
  cql3: query_processor: make `forwarder()` private
  cql3: query_processor: make `get_group0_client()` private
  cql3: strongly_consistent_modification_statement: fix indentation
  cql3: query_processor: make `get_migration_manager` private
  tracing: remove `qp.get_migration_manager()` calls
  table_helper: remove `qp.get_migration_manager()` calls
  thrift: handler: move implementation of `execute_schema_command` to `query_processor`
  data_dictionary: add `get_version`
  cql3: statements: schema_altering_statement: move `execute0` to `query_processor`
  cql3: statements: pass `migration_manager&` explicitly to `prepare_schema_mutations`
  main: add missing `supervisor::notify` message
2023-06-16 17:41:08 +03:00
Kamil Braun
23d5ddbecb Merge 'storage_service: remove optimization in cleanup_group0_config_if_needed' from Piotr Dulikowski
The `topology_coordinator::cleanup_group0_config_if_needed` function first checks whether the number of group 0 members is larger than the number of non-left entries in the topology table, then attempts to remove nodes in left state from group 0 and prints a warning if no such nodes are found. There are some problems with this check:

- Currently, a node is added to group 0 before it inserts its entry to the topology table. Such a node may cause the check to succeed but no nodes will be removed, which will cause the warning to be printed needlessly.
- Cluster features on raft will reverse the situation and it will be possible for an entry in system.topology to exist without the corresponding node being a part of group 0. This, in turn, may cause the check not to pass when it should and nodes could be removed later than necessary.

This commit gets rid of the optimization and the warning, and the topology coordinator will always compute the set of nodes that should be removed. Additionally, the set of nodes to remove is now computed differently: instead of iterating over left nodes and including only those that are in group 0, we now iterate over group 0 members and include those that are in `left` state. As the number of left nodes can potentially grow unbounded and the number of group 0 members is more likely to be bounded, this should give better performance in long-running clusters.

Closes #14238

* github.com:scylladb/scylladb:
  storage_service: fix indentation after previous commit
  storage_service: remove optimization in cleanup_group0_config_if_needed
2023-06-16 15:59:32 +02:00
Piotr Dulikowski
fadb1351bd tests/topology: work around python driver issue in cluster feature tests
The test `test_downgrade_after_successful_upgrade_fails` stops all
nodes, reconfigures them to support the test-only feature and restarts
them. Unfortunately, it looks like python driver sometimes does not
handle this properly and might not reconnect after all nodes are shut
down.

This commit adds a workaround for scylladb/python-driver#230 - the test
re-creates python driver session right after nodes are restarted.
2023-06-16 15:25:02 +02:00
Piotr Dulikowski
b3771e6011 test/topology{_raft_disabled}: move reconnect_driver to topology utils
The `reconnect_driver` function will be useful outside the
`topology_raft_disabled` test suite - namely, for cluster feature tests
in `topology`. The best course of action for this function would be to
put it into pylib utils; however, the function depends on ManagerClient
which is defined in `test.pylib.manager_client` that depends on
`test.pylib.utils` - therefore we cannot put it there as it would cause
an import cycle. The `topology.utils` module sounds like the next best
thing.

In addition, the docstring comment is updated to reflect that this
function will now be used to work around another issue as well.
2023-06-16 15:25:02 +02:00
Kamil Braun
9f9f4c224b main, cql_test_env: start query_processor early
Start it right after `storage_proxy`.

We also need to start `cql_config` earlier
because `query_processor` uses it.
2023-06-16 14:29:59 +02:00
Kamil Braun
c212370cf1 cql3: query_processor: split remote initialization step
Pass `migration_manager&`, `forward_service&` and `raft_group0_client&`
in the remote init step which happens after the constructor.

Add a corresponding uninit remote step.
Make sure that any use of the `remote` services is finished before we
destroy the `remote` object by using a gate.

Thanks to this in a later commit we'll be able to move the construction
of `query_processor` earlier in the Scylla initialization procedure.
2023-06-16 14:29:59 +02:00
Kamil Braun
ec5b831c13 cql3: query_processor: move migration_manager&, forwarder&, group0_client& to a remote object
These services are used for performing distributed queries, which
require remote calls. As a preparation for 2-phase initialization of
`query_processor` (for local queries vs for distributed queries), move
them to a separate `remote` object which will be constructed in the
second phase.

Replace the getters for the different services with a single `remote()`
getter. Once we split the initialization into two phases, `remote()`
will include a safety protection.
2023-06-16 14:08:21 +02:00
Kamil Braun
c2fa6406ad cql3: query_processor: make forwarder() private 2023-06-16 13:45:59 +02:00
Kamil Braun
f616408a87 cql3: query_processor: make get_group0_client() private 2023-06-16 13:45:19 +02:00
Kamil Braun
db769c8eb3 cql3: strongly_consistent_modification_statement: fix indentation 2023-06-16 13:44:59 +02:00
Kamil Braun
2e441e17cf cql3: query_processor: make get_migration_manager private
After previous commits it's no longer used outside `query_processor`.
Also remove the `const` version - not needed for anything.

Use the getter instead of directly accessing `_mm` in `query_processor`
methods. Later we will put `_mm` in a separate object.
2023-06-16 13:44:14 +02:00
Tomasz Grabiec
e41ff4604d Merge 'raft_topology: fencing and global_token_metadata_barrier' from Gusev Petr
This is the initial implementation of [this spec](https://docs.google.com/document/d/1X6pARlxOy6KRQ32JN8yiGsnWA9Dwqnhtk7kMDo8m9pI/edit).

* the topology version (int64) was introduced; it's stored in the topology table and updated through Raft at the relevant stages of the topology change algorithm;
* when the version is incremented, a `barrier_and_drain` command is sent to all the nodes in the cluster; if some node is unavailable, we fail and retry indefinitely;
* the `barrier_and_drain` handler first issues a `raft_read_barrier()` to obtain the latest topology, and then waits until all requests using previous versions are finished; once this round of RPCs finishes, the topology change coordinator can be sure that there are no in-flight requests using previous versions and that such requests can't appear in the future;
* after `barrier_and_drain`, the topology change coordinator issues the `fence` command; it stores the current version in a local table as `fence_version` and blocks requests with older versions by throwing `stale_topology_exception`; if a request with an older version was started before the fence, its reply will also be fenced;
* the fencing part of the PR is for the future, when we relax the requirement that all nodes are available during a topology change; it should protect the cluster from requests with stale topology from nodes which were unavailable during the topology change and which were not reached by the `barrier_and_drain()` command;
* currently, fencing is implemented for the `mutation` and `read` RPCs; other RPCs will be handled in follow-ups; since currently all nodes are supposed to be alive, the missing parts of the fencing don't break correctness;
* along with fencing, the spec above also describes error handling, isolation, and `--ignore_dead_nodes` parameter handling; these will also be added later; [this ticket](https://github.com/scylladb/scylladb/issues/14070) contains all that remains to be done;
* we don't worry about compatibility when we change the topology table schema or the `raft_topology_cmd_handler` RPC method signature, since the raft topology code is currently hidden behind the `--experimental raft` flag and is not accessible to users. Compatibility is maintained for the other affected RPCs (mutation, read): the new `fencing_token` parameter is `rpc::optional`, and we skip the fencing check if it's not present.

Closes #13884

* github.com:scylladb/scylladb:
  storage_service: warn if can't find ip for server
  storage_proxy.cc: add and use global_token_metadata_barrier
  storage_service: exec_global_command: bool result -> exceptions
  raft_topology: add cmd_index to raft commands
  storage_proxy.cc: add fencing to read RPCs
  storage_proxy.cc: extract handle_read
  storage_proxy.cc: refactor encode_replica_exception_for_rpc
  storage_proxy: fix indentation
  storage_proxy: add fencing for mutation
  storage_service: fix indentation
  storage_proxy: add fencing_token and related infrastructure
  raft topology: add fence_version
  raft_topology: add barrier_and_drain cmd
  token_metadata: add topology version
2023-06-16 12:07:31 +02:00
Pavel Emelyanov
5412c7947a backlog_controller: Unwrap scheduling_group
Some time ago (997a34bf8c) the backlog
controller was generalized to maintain some scheduling group. Back then
the group was the pair of seastar::scheduling_group and
seastar::io_priority_class. Now the latter is gone, so the controller's
notion of what a sched group is can be relaxed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #14266
2023-06-16 12:02:14 +03:00
Michał Chojnowski
3cf15e6ad7 test: perf: memory_footprint_test: don't use obsolete sstable versions
memory_footprint_test fails with:
`sstable - writing sstables with too old format`
because it attempts to write obsolete sstable formats,
for which the writer code has long been removed.

Fix that.

Closes #14265
2023-06-16 11:58:26 +03:00
Kefu Chai
f6c24c9b70 repair: set repair state correctly
repair_node_state::state is only for debugging purposes; see
ab57cea783, which introduced it.
So this change does not impact the behavior of Scylla, but it can
improve the debugging experience by reflecting a more accurate state
of repair when we are actually inspecting it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14255
2023-06-16 11:16:59 +03:00
Nadav Har'El
e1513f1199 Merge 'cql3: prepare selectors' from Avi Kivity
CQL statements carry expressions in many contexts: the SELECT, WHERE, SET, and IF clauses, plus various attributes. Previously, each of these contexts had its own representation for an expression, and another one for the same expression but before preparation. We have been gradually moving towards a uniform representation of expressions.

This series tackles SELECT clause elements (selectors), in their unprepared phase. It's relatively simple since there are only five types of expression components (column references, writetime/ttl modifiers, function calls, casts, and field selections). Nevertheless, there isn't much commonality with previously converted expression elements so quite a lot of code is involved.

After the series, we are still left with a custom post-prepare representation of expressions. It's quite complicated since it deals with two passes, for aggregation, so it will be left for another series.

Closes #14219

* github.com:scylladb/scylladb:
  cql3: selector: drop inheritance from assignment_testable
  cql3: selection: rely on prepared expressions
  cql3: selection: prepare selector expressions
  cql3: expr: match counter arguments to function parameters expecting bigint
  cql3: expr: avoid function constant-folding if a thread is needed
  cql3: add optional type annotation to assignment_testable
  cql3: expr: wire unresolved_identifier to test_assignment()
  cql3: expr: support preparing column_mutation_attribute
  cql3: expr: support preparing SQL-style casts
  cql3: expr: support preparing field_selection expressions
  cql3: expr: make the two styles of cast expressions explicit
  cql3: error injection functions: mark enabled_injections() as impure
  cql3: eliminate dynamic_cast<selector> from functions::get()
  cql3: test_assignment: pass optional schema everywhere
  cql3: expr: prepare_expr(): allow aggregate functions
  cql3: add checks for aggregation functions after prepare
  cql3: expr: add verify_no_aggregate_functions() helper
  test: add regression test for rejection of aggregates in the WHERE clause
  cql3: expr: extract column_mutation_attribute_type
  cql3: expr: add fmt formatter for column_mutation_attribute_kind
  cql3: statements: select_statement: reuse to_selectable() computation in SELECT JSON
2023-06-15 15:59:41 +03:00
Kefu Chai
befc78274b install.sh: pass -version to java executable
Currently, despite moving from Java 8 to Java 11, we still
support both Java versions, and the docker image used for
testing the Datastax driver has not been updated to install
java-11.

The "java" executable provided by openjdk-java-8 does not
support the "--version" command line argument, while java-11
accepts both "-version" and "--version". So, to cater to the
needs of the outdated docker image, we pass "-version" to the
selected java, and the test passes even if java-8 is found. A
better fix would be to update the docker image to install
java-11, though.

the output of "java -version" and "java --version" is
attached here as a reference:

```console
$ /usr/lib/jvm/java-1.8.0/bin/java --version
Unrecognized option: --version
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
```

```console
$ /usr/lib/jvm/java-1.8.0/bin/java -version
openjdk version "1.8.0_362"
OpenJDK Runtime Environment (build 1.8.0_362-b09)
OpenJDK 64-Bit Server VM (build 25.362-b09, mixed mode)
```

```console
$ /usr/lib/jvm/jre-11/bin/java --version
openjdk 11.0.19 2023-04-18
OpenJDK Runtime Environment (Red_Hat-11.0.19.0.7-2.fc38) (build 11.0.19+7)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.19.0.7-2.fc38) (build 11.0.19+7, mixed mode, sharing)
```

```console
$ /usr/lib/jvm/jre-11/bin/java -version
openjdk version "11.0.19" 2023-04-18
OpenJDK Runtime Environment (Red_Hat-11.0.19.0.7-2.fc38) (build 11.0.19+7)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.19.0.7-2.fc38) (build 11.0.19+7, mixed mode, sharing)
```

Fixes #14253
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14254
2023-06-15 15:42:09 +03:00
Botond Dénes
d1dc579062 Merge 'treewide: add uuid_sstable_identifier_enabled support' from Kefu Chai
This series adds an option named "uuid_sstable_identifiers_enabled" and the related cluster feature bit, which is set once all nodes in the cluster set this option to "true"; the sstable subsystem will then start using timeuuid instead of a plain integer for the identifier of sstables. timeuuid should be a better choice for identifiers, as we no longer need to worry about id conflicts. We still have quite a few tests using static sstables with integers in their names; these tests are not changed in this series. We will create some tests to exercise the sstable subsystem with this option set.

A very simple inter-op test with Cassandra 4.1.1 was also performed to verify that the generated sstables can be read by Cassandra:

1. Start Scylla, connect to it with cqlsh, run the following commands, and stop it:
    ```
    cqlsh> CREATE  KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy','replication_factor':1} ;
    cqlsh> CREATE TABLE ks.cf ( name text primary key, value text );
    cqlsh> INSERT INTO ks.cf (name, value) VALUES ('1', 'one');
    cqlsh> SELECT * FROM ks.cf;
    ```
2. Enable Cassandra's `uuid_sstable_identifiers_enabled`, start Cassandra 4.1.1, connect to it with cqlsh, run the following commands, and stop it:
    ```
    cqlsh> CREATE  KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy','replication_factor':1} ;
    cqlsh> CREATE TABLE ks.cf ( name text primary key, value text );
    cqlsh> INSERT INTO ks.cf (name, value) VALUES ('1', 'one');
    cqlsh> SELECT * FROM ks.cf;
    ```
3. Move away the sstables generated by Cassandra, and replace them with the sstables generated by Scylla:
    ```console
    $ mv ~/cassandra/data/data/ks/cf-b29d23a009d911eeb5fed163c4d0af49 /tmp
    $ mv ~/scylla/ks/cf-db47a12009d611eea6b8b179df3a2d5d ~/cassandra/data/data/ks/cf-b29d23a009d911eeb5fed163c4d0af49
    ```
4. Start Cassandra 4.1.1 again, connect to it with cqlsh, and run the following commands:
    ```
    cqlsh> SELECT * FROM ks.cf;
     name | value
    ------+-------
        1 |   one
    ```

Fixes https://github.com/scylladb/scylladb/issues/10459

Closes #13932

* github.com:scylladb/scylladb:
  replica,sstable: introduce invalid generation id
  sstables, replica: pass uuid_sstable_identifiers to generation generator
  gms/feature_service: introduce UUID_SSTABLE_IDENTIFIERS cluster feature
  db: config: add uuid_sstable_identifiers_enabled option
  sstables, replica: support UUID in generation_type
2023-06-15 15:23:24 +03:00
Petr Gusev
fe5e1a5462 storage_service: warn if can't find ip for server
This shouldn't happen during normal operation.
2023-06-15 15:52:50 +04:00
Petr Gusev
5a3384f495 storage_proxy.cc: add and use global_token_metadata_barrier
fence_old_reads is removed since it's replaced by this barrier.
2023-06-15 15:52:50 +04:00
Petr Gusev
d9d29ec293 storage_service: exec_global_command: bool result -> exceptions
This allows us to reflect cause-and-effect
relationships in the logs: if some command
failed, we write to the log at the top level
of the topology state machine. The log message
includes the current state of the state
machine and a description of what
exactly went wrong.

Note that in the exec_global_command overload
returning node_to_work_on we don't call retake_node()
if the nested exec_global_command failed.
This is fine, since all the callers
just log/break in this case.
2023-06-15 15:52:50 +04:00
Petr Gusev
96a1c661bd raft_topology: add cmd_index to raft commands
In this commit we add logic to protect against
raft commands reordering. This way we can be
sure that the topology state
(_topology_state_machine._topology) on all the
nodes processing the command is consistent
with the topology state on the topology change
coordinator. In particular, this allows
us to simply use _topology.version as the current
version in barrier_and_drain instead of passing it
along with the command as a parameter.

The topology coordinator maintains an index of the last
command it has sent to the cluster. This index is
incremented for each command and sent along with it.
The receiving node compares it with the last index
it received in the same term and returns an error
if it's not greater. We are protected
against the topology change coordinator migrating
to another node by the already existing
term check: if the term from the command
doesn't match the current term, we return an error.
2023-06-15 15:52:50 +04:00
Petr Gusev
94605e4839 storage_proxy.cc: add fencing to read RPCs
At the call site we use the version captured in
read_executor/erm/token_metadata. In the handlers
we use apply_fence twice, just like in the mutation RPC.

Fencing was also added to local query calls, such as
query_result_local in make_data_request. This is for
the case when the query coordinator was isolated from
the topology change coordinator and didn't receive
barrier_and_drain.
2023-06-15 15:52:50 +04:00
Petr Gusev
4004ce1f44 storage_proxy.cc: extract handle_read
We continue the refactoring by introducing
the common implementation for all read methods.
2023-06-15 15:52:50 +04:00
Petr Gusev
2d791a5ed4 storage_proxy.cc: refactor encode_replica_exception_for_rpc
We are going to add fencing to read RPCs; it would be easier
to do it once for all three of them. This refactoring
enables that, since it allows us to use
encode_replica_exception_for_rpc for handle_read_digest.
2023-06-15 15:52:50 +04:00
Petr Gusev
6b115e902b storage_proxy: fix indentation 2023-06-15 15:52:50 +04:00
Petr Gusev
46f73fcaa6 storage_proxy: add fencing for mutation
At the call site, we use the version captured
in erm/token_metadata. In the handler, we use
double checking: apply_fence after the local
write guarantees that no mutation
succeeds on the coordinator if the fence version
has been updated on the replica during the write.

Fencing was also added to mutate_locally calls
on the request coordinator, for the case
where this coordinator was isolated from the
topology change coordinator and missed the
barrier_and_drain command.
2023-06-15 15:52:49 +04:00
Petr Gusev
7fe707570a storage_service: fix indentation 2023-06-15 15:48:00 +04:00
Petr Gusev
d34da12240 storage_proxy: add fencing_token and related infrastructure
A new stale_topology_exception was introduced;
it's raised in apply_fence when an RPC comes
with a stale fencing_token.

An overload of apply_fence taking a future will be
used to wrap the storage_proxy methods which
need to be fenced.
2023-06-15 15:48:00 +04:00
Petr Gusev
f6b019c229 raft topology: add fence_version
It's stored outside of the topology table,
since it's updated not through Raft state,
but with a new 'fence' raft command.
The current value is cached in shared_token_metadata.
An initial fence version is loaded in main
during storage_service initialisation.
2023-06-15 15:48:00 +04:00
Petr Gusev
4f99302c2b raft_topology: add barrier_and_drain cmd
We use utils::phased_barrier. The new phase
is started each time the version is updated.
We track all instances of token_metadata;
when an instance is destroyed, the
corresponding phased_barrier::operation is
released.
2023-06-15 15:48:00 +04:00
Petr Gusev
253d8a8c65 token_metadata: add topology version
It's stored as a static column in the topology table, and
will be updated at various steps of the topology
change state machine.

The initial value is 1; zero means that topology
versions are not yet supported, which will be
used in RPC handling.
2023-06-15 15:48:00 +04:00
Kefu Chai
2d265e860d replica,sstable: introduce invalid generation id
The invalid sstable id is the NULL of an sstable identifier. With
this concept, it is a lot simpler to find/track the greatest
generation. The complexity is hidden in generation_type, which
compares a) integer-based identifiers, b) UUID-based identifiers, and
c) the invalid identifier in different ways.

So, in this change:

* the default constructor of generation_type is
  now public;
* we don't check for empty generations anymore when loading
  SSTables or enumerating them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-15 17:54:59 +08:00
Kefu Chai
939fa087cc sstables, replica: pass uuid_sstable_identifiers to generation generator
Before this change, we assumed that the generation is always
integer-based. In order to enable the UUID-based generation identifier
when the related option is set, we need to propagate this option down
to the generation generator.

Because we don't have access to the cluster features in some places
where a new generation is created, a new accessor exposing
feature_service from the sstable manager is added.

Fixes #10459
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-15 17:54:59 +08:00
Kefu Chai
49071e48ae gms/feature_service: introduce UUID_SSTABLE_IDENTIFIERS cluster feature
UUID_SSTABLE_IDENTIFIERS is a new cluster-wide feature. If it is
enabled, all nodes will generate new sstables with UUIDs as their
generation identifiers. This feature is configured using the
"uuid_sstable_identifiers_enabled" config option.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-15 17:54:59 +08:00
Kefu Chai
4c2df04449 db: config: add uuid_sstable_identifiers_enabled option
Unlike in Cassandra 4.1, this option is true by default; it will be
used for enabling the "UUID_SSTABLE_IDENTIFIERS" cluster feature. Not
wired yet.

Please note: because we are still using sstableloader and
sstabledump based on the 3.x branch, while Cassandra upstream
introduced the uuid sstable identifier in its 4.x branch, these tools
fail to work with sstables with uuid identifiers, so this option
is disabled when performing these tests. We will enable it once
these tools are updated to support the uuid-based sstable identifiers.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-15 17:54:59 +08:00
Kefu Chai
15543464ce sstables, replica: support UUID in generation_type
This change generalizes the value of generation_type so that it also
supports UUID-based identifiers.

* sstables/generation_type.h:
  - add a formatter and parser for UUIDs. Please note, Cassandra uses
    a different format for the stringified SSTable identifier, and
    that format suits our needs as it uses underscore "_" as the
    delimiter, while the file names of components use dash "-" as the
    delimiter. Instead of reinventing the formatting or just using
    another delimiter in the stringified UUID, we choose to use
    Cassandra's formatting.
  - add accessors for accessing the type and value of generation_type
  - add constructors for constructing generation_type from a UUID and
    from a string.
  - use a hash for placing sstables with UUID identifiers into shards,
    for a more uniform distribution of sstables across shards.
* replica/table.cc:
  - only update the generator if the given generation contains an
    integer
* test/boost:
  - add a simple test to verify the generation_type is able to
    parse and format

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-15 17:54:59 +08:00
Kamil Braun
59d4bb3787 tracing: remove qp.get_migration_manager() calls
Pass `migration_manager&` from top-level instead.
2023-06-15 09:48:54 +02:00
Kamil Braun
1b68e8582b table_helper: remove qp.get_migration_manager() calls
Push those calls up the call stack, to `trace_keyspace_helper` module.
Pass `migration_manager` reference around together with
`query_processor` reference.
2023-06-15 09:48:54 +02:00