Commit Graph

75 Commits

Author SHA1 Message Date
Avi Kivity
3bead8cea0 feature: grandfather PER_TABLE_PARTITIONERS
The PER_TABLE_PARTITIONERS feature was added in 90df9a44ce (2020; 4.0)
and can now be assumed to be always present. We also remove the associated
schema_feature.
2024-05-18 00:15:07 +03:00
Avi Kivity
6b532fd40b test: schema_change_test: regenerate digest for PER_TABLE_PARTITIONERS
The first digest tested was generated without the PER_TABLE_PARTITIONERS
schema feature. We're about to make that feature mandatory, so we won't
be able (and won't need) to generate a digest without it.

Update the digest to include the feature. Note it wasn't untested before,
we have a test with schema_features::full().
2024-05-18 00:14:43 +03:00
Avi Kivity
c4d8b17f4c test: test_schema_change_digest: drop unneeded reference digests
digests[0] was used by the VIEW_VIRTUAL_COLUMNS feature, which
no longer exists.

digests[1] is the same as digests[2], so drop it.
2024-05-17 20:41:20 +03:00
Avi Kivity
b5f6021a6b feature: grandfather VIEW_VIRTUAL_COLUMNS
The VIEW_VIRTUAL_COLUMNS feature was added in a108df09f9 (2019; 3.1)
and can now be assumed to be always present.

The corresponding schema_feature is removed. Note schema_features are not sent
over the wire. A digest calculation without VIEW_VIRTUAL_COLUMNS is no longer tested.
2024-05-17 20:41:19 +03:00
Patryk Jędrzejczak
77342ffb34 test: lib: single_node_cql_env: restart a node in noninitial run_in_thread calls
In the following commit, we make the `consistent-topology-changes`
experimental feature unused. Then, all unit tests in the boost suite
will start using the raft-based topology by default. Unfortunately,
tests with multiple `single_node_cql_env::run_in_thread` calls
(usually coming from the `do_with_cql_env_thread` calls) would fail.

In a noninitial `run_in_thread` call, a node is started as if it
booted for the first time. On the other hand, it has its persistent
state from previous boots. Hence, the node can behave strangely and
unexpectedly. In particular, `SYSTEM.TOPOLOGY` is not empty and the
assertion that expects it to be empty when we boot for the first
time fails.

We fix this issue by making noninitial `run_in_thread` calls
behave as normal restarts.

After this change,
`test_schema_digest_does_not_change_with_disabled_features` starts
failing. This test copies the data directory before booting for the
first time, so the new
`_sys_ks.local().build_bootstrap_info().get();` makes the node
incorrectly think it restarts. Then, after noticing it is not a part
of group 0, the node would start the raft upgrade procedure if we
didn't run it in the raft RECOVERY mode. This procedure would get
stuck because it depends on messaging being enabled even if the node
communicates only with itself and messaging is disabled in boost tests.
2024-04-25 14:33:21 +02:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Kefu Chai
97587a2ea4 test/boost: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17139
2024-02-06 13:22:16 +02:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Kefu Chai
6c06751640 cdc: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16725
2024-01-11 09:13:37 +02:00
Patryk Jędrzejczak
5ebfbf42bc db: config: make consistent_cluster_management mandatory
Code that executed only when consistent_cluster_management=false is
removed. In particular, after this patch:
- raft_group0 and raft_group_registry are always enabled,
- raft_group0::status_for_monitoring::disabled becomes unused,
- topology tests can only run with consistent_cluster_management.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
7dd7ec8996 test: boost: schema_change_test: replace disable_raft_schema_config
In the following commits, we make consistent cluster management
mandatory. This will make disable_raft_schema_config unusable,
so we need to get rid of it. However, we don't want to remove
tests that use it.

The idea is to use the Raft RECOVERY mode instead of disabling
consistent cluster management directly.
2023-12-14 16:54:04 +01:00
Kamil Braun
7dad31c78f feature_service: enable GROUP0_SCHEMA_VERSIONING in Raft mode
As promised in earlier commits:
Fixes: #7620
Fixes: #13957

Also modify two test cases in `schema_change_test` which depend on
the digest calculation method in their checks. Details are explained in
the comments.
2023-12-08 17:46:31 +01:00
Pavel Emelyanov
210b01a5ce config: Make object storage config updateable_value_source
Now its plain updateable_value, but without the ..._source object the
updateable_value is just a no-op value holder. In order for the
observers to operate there must be the value source, updating it would
update the attached updateable values _and_ notify the observers.

In order for the config to be the u.v._source, config entries should be
comparable to each other, thus the <=> operator for it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Avi Kivity
35849fc901 Revert "Merge 'Don't calculate hashes for schema versions in Raft mode' from Kamil Braun"
This reverts commit 3d4398d1b2, reversing
changes made to 45dfce6632. The commit
causes some schema changes to be lost due to incorrect timestamps
in some mutations. More information is available in [1].

Reopens: scylladb/scylladb#7620
Reopens: scylladb/scylladb#13957

Fixes scylladb/scylladb#15530.

[1] https://github.com/scylladb/scylladb/pull/15687
2023-10-11 00:32:05 +03:00
Botond Dénes
f6575344df Merge 'Collect dangling object-store sstables' from Pavel Emelyanov
Sstables in transitional states are marked with the respective 'status' in the registry. Currently there are two of such -- 'creating' and 'removing'. And the 'sealed' status for sstables in use.

On boot the distributed loader tries to garbage collect the dangling sstables. For filesystem storage it's done with the help of temorary sstables' dirs and pending deletion logs. For s3-backed sstables, the garbage collection means fetching all non-sealed entries and removing the corresponding objects from the storage.

Test included (last patch)

fixes #13024

Closes scylladb/scylladb#15318

* github.com:scylladb/scylladb:
  test: Extend object_store test to validate GC works
  sstable_directory: Garbage collect S3 sstables on reboot
  sstable_directory: Pass storage to garbage_collect()
  sstable_directory: Create storage instance too
2023-09-21 09:15:00 +03:00
Kamil Braun
c2beee348a feature_service: enable GROUP0_SCHEMA_VERSIONING in Raft mode
As promised in earlier commits:
Fixes: #7620
Fixes: #13957

Also modify two test cases in `schema_change_test` which depend on
the digest calculation method in their checks. Details are explained in
the comments.
2023-09-15 17:54:36 +02:00
Pavel Emelyanov
a957e97ab4 sstable_directory: Create storage instance too
Right now the directory instance only creates lister, but lister is
unaware on exact objects manipulations. The storage is, so create it
too, it's going to be used by garbage collector in next patches

This change also needs fixing the way cql_test_env is configured for
schema_change_test. There are cases that try to pick up keyspace with S3
storage option from the pre-created sstables, and populating those would
need to provide some (even empty) object storage endpoint

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-12 09:29:34 +03:00
Patryk Jędrzejczak
866c9a904d test: always pass empty description to migration_manager::announce
In the next commit, we remove the default value for the
description parameter of migration_manager::announce to avoid
using it in the future. However, many calls to announce in tests
use the default value. We have to change it, but we don't really
care about descriptions in the tests, so we pass the empty string
everywhere.
2023-08-07 14:38:11 +02:00
Kamil Braun
84bb75ea0a Merge 'service: migration_manager: change the prepare_ methods to functions' from Patryk Jędrzejczak
The `migration_manager` service is responsible for schema convergence in
the cluster - pushing schema changes to other nodes and pulling schema
when a version mismatch is observed. However, there is also a part of
`migration_manager` that doesn't really belong there - creating
mutations for schema updates. These are the functions with `prepare_`
prefix. They don't modify any state and don't exchange any messages.
They only need to read the local database.

We take these functions out of `migration_manager` and make them
separate functions to reduce the dependency of other modules (especially
`query_processor` and CQL statements) on `migration_manager`. Since all
of these functions only need access to `storage_proxy` (or even only
`replica::database`), doing such a refactor is not complicated. We just
have to add one parameter, either `storage_proxy` or `database` and both
of them are easily accessible in the places where these functions are
called.

This refactor makes `migration_manager` unneeded in a few functions:
- `alternator::executor::create_keyspace`,
- `cql3::statements::alter_type_statement::prepare_announcement_mutations`,
- `cql3::statements::schema_altering_statement::prepare_schema_mutations`,
- `cql3::query_processor::execute_thrift_schema_command:`,
- `thrift::handler::execute_schema_command`.

We remove the `migration_manager&` parameter from all these functions.

Fixes #14339

Closes #14875

* github.com:scylladb/scylladb:
  cql3: query_processor::execute_thrift_schema_command: remove an unused parameter
  cql3: schema_altering_statement::prepare_schema_mutations: remove an unused parameter
  cql3: alter_type_statement::prepare_announcement_mutations: change parameters
  alternator: executor::create_keyspace: remove an unused parameter
  service: migration_manager: change the prepare_ methods to functions
2023-08-01 11:56:56 +02:00
Patryk Jędrzejczak
3468cbd66b service: migration_manager: change the prepare_ methods to functions
The migration_manager service is responsible for schema convergence
in the cluster - pushing schema changes to other nodes and pulling
schema when a version mismatch is observed. However, there is also
a part of migration_manager that doesn't really belong there -
creating mutations for schema updates. These are the functions with
prepare_ prefix. They don't modify any state and don't exchange any
messages. They only need to read the local database.

We take these functions out of migration_manager and make them
separate functions to reduce the dependency of other modules
(especially query_processor and CQL statements) on
migration_manager. Since all of these functions only need access
to storage_proxy (or even only replica::database), doing such a
refactor is not complicated. We just have to add one parameter,
either storage_proxy or database and both of them are easily
accessible in the places where these functions are called.
2023-07-28 13:55:27 +02:00
Tomasz Grabiec
1ecd3c1a9a test: schema_change_test: Verify digests also with TABLE_DIGEST_INSENSITIVE_TO_EXPIRY enabled
The new test cases are a mirror of old test cases, but with updated digests.
2023-07-12 21:21:55 +02:00
Tomasz Grabiec
f2ed9fcd7e schema_mutations, migration_manager: Ignore empty partitions in per-table digest
Schema digest is calculated by querying for mutations of all schema
tables, then compacting them so that all tombstones in them are
dropped. However, even if the mutation becomes empty after compaction,
we still feed its partition key. If the same mutations were compacted
prior to the query, because the tombstones expire, we won't get any
mutation at all and won't feed the partition key. So schema digest
will change once an empty partition of some schema table is compacted
away.

Tombstones expire 7 days after schema change which introduces them. If
one of the nodes is restarted after that, it will compute a different
table schema digest on boot. This may cause performance problems. When
sending a request from coordinator to replica, the replica needs
schema_ptr of exact schema version request by the coordinator. If it
doesn't know that version, it will request it from the coordinator and
perform a full schema merge. This adds latency to every such request.
Schema versions which are not referenced are currently kept in cache
for only 1 second, so if request flow has low-enough rate, this
situation results in perpetual schema pulls.

After ae8d2a550d, it is more liekly to
run into this situation, because table creation generates tombstones
for all schema tables relevant to the table, even the ones which
will be otherwise empty for the new table (e.g. computed_columns).

This change inroduces a cluster feature which when enabled will change
digest calculation to be insensitive to expiry by ignoring empty
partitions in digest calculation. When the feature is enabled,
schema_ptrs are reloaded so that the window of discrepancy during
transition is short and no rolling restart is required.

A similar problem was fixed for per-node digest calculation in
18f484cc753d17d1e3658bcb5c73ed8f319d32e8. Per-table digest calculation
was not fixed at that time because we didn't persist enabled features
and they were not enabled early-enough on boot for us to depend on
them in digest calculation. Now they are enabled before non-system
tables are loaded so digest calculation can rely on cluster features.

Fixes #4485.
2023-07-03 23:06:55 +02:00
Jan Ciolek
d2ef55b12c test: use NetworkTopologyStrategy in all unit tests
As described in https://github.com/scylladb/scylladb/issues/8638,
we're moving away from `SimpleStrategy`, in the future
it will become deprecated.

We should remove all uses of it and replace them
with `NetworkTopologyStrategy`.

This change replaces `SimpleStrategy` with
`NetworkTopologyStrategy` in all unit tests,
or at least in the ones where it was reasonable to do so.
Some of the tests were written explicitly to test the
`SimpleStrategy` strategy, or changing the keyspace from
`SimpleStrategy` to `NetworkTopologyStrategy`.
These tests were left intact.
It's still a feature that is supported,
even if it's slowly getting deprecated.

The typical way to use `NetworkTopologyStrategy` is
to specify a replication factor for each datacenter.
This could be a bit cumbersome, we would have to fetch
the list of datacenters, set the repfactors, etc.

Luckily there is another way - we can just specify
a replication factor to use for or each existing
datacenter, like this:
```cql
CREATE KEYSPACE {} WITH REPLICATION =
{'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1};
```

This makes the change rather straightforward - just replace all
instances of `'SimpleStrategy'', with `'NetworkTopologyStrategy'`.

Refs: https://github.com/scylladb/scylladb/issues/8638

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #13990
2023-05-23 08:52:56 +03:00
Kamil Braun
30cc07b40d Merge 'Introduce tablets' from Tomasz Grabiec
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. Vnode-based replication strategy tries
to evenly distribute the global token space shared by all tables among
nodes and shards. With tablets, the aim is to start from a different
side. Divide resources of replica-shard into tablets, with a goal of
having a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike with vnode ranges, tablet replicas live on a
particular shard on a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the ground for this.

Things achieved in this PR:

  - You can start a cluster and create a keyspace whose tables will use
    tablet-based replication. This is done by setting `initial_tablets`
    option:

    ```
        CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
                        'replication_factor': 3,
                        'initial_tablets': 8};
    ```

    All tables created in such a keyspace will be tablet-based.

    Tablet-based replication is a trait, not a separate replication
    strategy. Tablets don't change the spirit of replication strategy, it
    just alters the way in which data ownership is managed. In theory, we
    could use it for other strategies as well like
    EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy
    is augmented to support tablets.

  - You can create and drop tablet-based tables (no DDL language changes)

  - DML / DQL work with tablet-based tables

    Replicas for tablet-based tables are chosen from tablet metadata
    instead of token metadata

Things which are not yet implemented:

  - handling of views, indexes, CDC created on tablet-based tables
  - sharding is done using the old method, it ignores the shard allocated in tablet metadata
  - node operations (topology changes, repair, rebuild) are not handling tablet-based tables
  - not integrated with compaction groups
  - tablet allocator piggy-backs on tokens to choose replicas.
    Eventually we want to allocate based on current load, not statically

Closes #13387

* github.com:scylladb/scylladb:
  test: topology: Introduce test_tablets.py
  raft: Introduce 'raft_server_force_snapshot' error injection
  locator: network_topology_strategy: Support tablet replication
  service: Introduce tablet_allocator
  locator: Introduce tablet_aware_replication_strategy
  locator: Extract maybe_remove_node_being_replaced()
  dht: token_metadata: Introduce get_my_id()
  migration_manager: Send tablet metadata as part of schema pull
  storage_service: Load tablet metadata when reloading topology state
  storage_service: Load tablet metadata on boot and from group0 changes
  db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
  migration_notifier: Introduce before_drop_keyspace()
  migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
  test: perf: Introduce perf-tablets
  test: Introduce tablets_test
  test: lib: Do not override table id in create_table()
  utils, tablets: Introduce external_memory_usage()
  db: tablets: Add printers
  db: tablets: Add persistence layer
  dht: Use last_token_of_compaction_group() in split_token_range_msb()
  locator: Introduce tablet_metadata
  dht: Introduce first_token()
  dht: Introduce next_token()
  storage_proxy: Improve trace-level logging
  locator: token_metadata: Fix confusing comment on ring_range()
  dht, storage_proxy: Abstract token space splitting
  Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
  db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
  db: Introduce get_non_local_vnode_based_strategy_keyspaces()
  service: storage_proxy: Avoid copying keyspace name in write handler
  locator: Introduce per-table replication strategy
  treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
  locator: Introduce effective_replication_map
  locator: Rename effective_replication_map to vnode_effective_replication_map
  locator: effective_replication_map: Abstract get_pending_endpoints()
  db: Propagate feature_service to abstract_replication_strategy::validate_options()
  db: config: Introduce experimental "TABLETS" feature
  db: Log replication strategy for debugging purposes
  db: Log full exception on error in do_parse_schema_tables()
  db: keyspace: Remove non-const replication strategy getter
  config: Reformat
2023-04-27 09:40:18 +02:00
Tomasz Grabiec
41e69836fd db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata() 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
5b046043ea migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
It will be extended with listener notification firing, which is an
async operation.
2023-04-24 10:49:37 +02:00
Pavel Emelyanov
f953fb2f52 schema_change_test: Use proxy from cql_test_env
There's one place where test case calls for storage proxy and currently
does it via global refernece. Time to switch it to cql_test_env's one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-21 14:18:00 +03:00
Kefu Chai
596ea6d439 test: drop unused captured variables
this should silence the warning like:
```
test/boost/multishard_mutation_query_test.cc:493:29: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
    do_with_cql_env_thread([this] (cql_test_env& env) -> future<> {
                            ^~~~
test/boost/multishard_mutation_query_test.cc:577:29: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
    do_with_cql_env_thread([this] (cql_test_env& env) -> future<> {
                            ^~~~
2 errors generated.
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-22 21:21:04 +08:00
Nadav Har'El
843a5dfc15 Merge 'Allow setting permissions for user-defined functions' from Wojciech Mitros
This series aims to allow users to set permissions on user-defined functions.

The implementation is based on Cassandra's documentation and should be fully compatible: https://cassandra.apache.org/doc/latest/cassandra/cql/security.html#cql-permissions

Fixes: #5572
Fixes: #10633

Closes #12869

* github.com:scylladb/scylladb:
  cql3: allow UDTs in permissions on UDFs
  cql3: add type_parser::parse() method taking user_types_metadata
  schema_change_test: stop using non-existent keyspace
  cql3: fix parameter names in function resource constructors
  cql3: handle complex types as when decoding function permissions
  cql3: enforce permissions for ALTER FUNCTION
  cql-pytest: add a (failing) test case for UDT in UDF
  cql-pytest: add a test case for user-defined aggregate permissions
  cql-pytest: add tests for function permissions
  cql3: enforce permissions on function calls
  selection: add a getter for used functions
  abstract_function_selector: expose underlying function
  cql3: enforce permissions on DROP FUNCTION
  cql3: enforce permissions for CREATE FUNCTION
  client_state: add functions for checking function permissions
  cql-pytest: add a case for serializing function permissions
  cql3: allow specifying function permissions in CQL
  auth: add functions_resource to resources
2023-03-12 14:04:34 +02:00
Wojciech Mitros
4182a221d6 schema_change_test: stop using non-existent keyspace
The current implementation of CQL type parsing worked even
when given a string representing a non-existent keyspace, as
long as the parsed type was one of the "native" types. This
implementation is going to change, so that we won't parse
types given an incorrect keyspace name.
When using `do_with_cql_env`, a "ks" keyspace is created by
default, and "tests" keyspace is not. The tests for reverse
schemas in `schema_change_test` were using the "tests"
keyspace, so in order to make the tests work after the future
changes, they now use the existing "ks" keyspace.
2023-03-10 11:02:32 +01:00
Kefu Chai
3ae11de204 treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:56:53 +08:00
Tomasz Grabiec
c8e2bf1596 db: schema_tables: Optimize schema merge
Currently, applying a schema change on a replica works like this:

  Collect all affected keyspaces from incoming mutations
  Read current state of schema
  Apply the mutations
  Read new state of schema

The "Read ... state of schema" step reads all kinds of schema
objects. In particular, to read the "table" objects, it does the
following:

  for every affected keyspace k:
       read all mutations from system_schema.tables for k
       extract all existing table names from those mutations
       for every existing table:
             read mutations from {tables, columns, indexes, view_virtual_columns, ...} for that table

As you can see, the number of reads performed is O(nr tables in a
keyspace), not O(nr tables in a change). This means that making a
sequence of schema changes, like adding a table, is quadratic.

Another aspect which magnifies this is that we don't read those tables
using a single scan, but issue individual queries for each table
separately.

This patch optimizes this by considering only affected tables when
reading schema for the purpose of diff calculation.

When mutations contain multi-table deletions, we still read the
set of tables, like before. This could be optimized by looking
at the database to get the list, but it's not part of the patch.

I tested this using a test case provided by Kamil (kbr-scylla@53fe154)

  ./test.py --mode debug test_many_schema_changes -s

The test bootstraps a cluster and then creates about 40 schema
changes. Then a new node is bootstrapped and replays those changes via
group0.

In debug mode, each change takes roughly 2s to process before the
patch, and 0.5s after the patch.

The whole replay is reduced to 56% of what was before:

Before (1m19s) :

INFO  2023-01-20 19:44:35,848 [shard 0] raft_group0 - setup_group0: ensuring that the cluster has fully upgraded to use Raft...
INFO  2023-01-20 19:45:54,844 [shard 0] raft_group0 - setup_group0: waiting for peers to synchronize state...

After (45s):

INFO  2023-01-20 22:02:51,869 [shard 0] raft_group0 - setup_group0: ensuring that the cluster has fully upgraded to use Raft...
INFO  2023-01-20 22:03:36,834 [shard 0] raft_group0 - setup_group0: waiting for peers to synchronize state...

Closes #12592

Closes #12592
2023-02-21 17:26:57 +02:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Raphael S. Carvalho
3c5afb2d5c test: Enable Scylla test command line options for boost tests
We have enabled the command line options without changing a
single line of code, we only had to replace old include
with scylla_test_case.hh.

Next step is to add x-log-compaction-groups options, which will
determine the number of compaction groups to be used by all
instantiations of replica::table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:14:51 -03:00
Kamil Braun
bed555d1e5 db: system_keyspace: rename 'raft_config' to 'raft_snapshot_config'
Make it clear that the table stores the snapshot configuration, which is
not necessarily the currently operating configuration (the last one
appended to the log).

In the future we plan to have a separate virtual table for showing the
currently operating configuration, perhaps we will call it
`system.raft_config`.
2023-01-12 16:21:26 +01:00
Tomasz Grabiec
ae8d2a550d db: schema_tables: Make table creation shadow earlier concurrent changes
Issuing two CREATE TABLE statements with a different name for one of
the partition key columns leads to the following assertion failure on
all replicas:

scylla: schema.cc:363: schema::schema(const schema::raw_schema&, std::optional<raw_view_info>): Assertion `!def.id || def.id == id - column_offset(def.kind)' failed.

The reason is that once the create table mutations are merged, the
columns table contains two entries for the same position in the
partition key tuple.

If the schemas were the same, or not conflicting in a way which leads
to abort, the current behavior would be to drop the older table as if
the last CREATE TABLE was preceded by a DROP TABLE.

The proposed fix is to make CREATE TABLE mutation include a tombstone
for all older schema changes of this table, effectively overriding
them. The behavior will be the same as if the schemas were not
different, older table will be dropped.

Fixes #11396
2022-08-29 12:06:02 +02:00
Benny Halevy
2b017ce285 schema, everywhere: define and use table_schema_version as a strong type
Define table_schema_version as a distinct tagged_uuid class,
So it can be differentiated from other uuid-class types,
in particular table_id.

Added reversed(table_schema_version) for convenience
and uniformity since the same logic is currently open coded
in several places.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:45 +03:00
Nadav Har'El
cc69177dcc config: fix printing of experimental feature list
Recently we noticed a regression where with certain versions of the fmt
library,

   SELECT value FROM system.config WHERE name = 'experimental_features'

returns string numbers, like "5", instead of feature names like "raft".

It turns out that the fmt library keep changing their overload resolution
order when there are several ways to print something. For enum_option<T> we
happen to have to conflicting ways to print it:
  1. We have an explicit operator<<.
  2. We have an *implicit* convertor to the type held by T.

We were hoping that the operator<< always wins. But in fmt 8.1, there is
special logic that if the type is convertable to an int, this is used
before operator<<()! For experimental_features_t, the type held in it was
an old-style enum, so it is indeed convertible to int.

The solution I used in this patch is to replace the old-style enum
in experimental_features_t by the newer and more recommended "enum class",
which does not have an implicit conversion to int.

I could have fixed it in other ways, but it wouldn't have been much
prettier. For example, dropping the implicit convertor would require
us to change a bunch of switch() statements over enum_option (and
not just experimental_features_t, but other types of enum_option).

Going forward, all uses of enum_option should use "enum class", not
"enum". tri_mode_restriction_t was already using an enum class, and
now so does experimental_features_t. I changed the examples in the
comments to also use "enum class" instead of enum.

This patch also adds to the existing experimental_features test a
check that the feature names are words that are not numbers.

Fixes #11003.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11004
2022-07-11 09:17:30 +02:00
Piotr Sarna
151d8f7c58 test: regenerate schema_change_test for storage options case
Keyspace storage options series adds a new schema table:
system_schema.scylla_keyspaces. The regenerated cases ensure
that this new table is taken into account when the schema feature
is available.
2022-04-08 09:17:01 +02:00
Piotr Sarna
4705a5fa42 test: improve output of schema_change_test regeneration
Schema change test operates on pre-generated sstables, and sometimes
this set of sstables needs to be regenerated. In order to make the
regeneration process more ergonomic, the output is now directly
copyable as valid C++ representation of UUIDs.
2022-04-08 09:17:01 +02:00
Nadav Har'El
7be3129458 cdc: don't need current keyspace to create the log table
CDC registers to the table-creation hook (before_create_column_family)
to add a second table - the CDC log table - to the same keyspace.
The handler function (on_before_update_column_family() in cdc/log.cc)
wants to retrieve the keyspace's definition, but that does NOT WORK if
we create the keyspace and table in one operation (which is exactly what
we intend to do in Alternator to solve issue #9868) - because at the
time of the hook, the keyspace does not yet exist in the schema.

It turns out that on_before_update_column_family() does not REALLY need
the keyspace. It needed it to pass it on to make_create_table_mutations()
but that function doesn't use the keyspace parameter passed to it! All
it needs is the keyspace's name - which is in the schema anyway and
doesn't need to be looked up.

So in this patch we fix make_create_table_mutations() to not require the
unused keyspace parameter - and fix the CDC code not to look for the
keyspace that is no longer needed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220215162342.622509-1-nyh@scylladb.com>
2022-02-16 08:38:56 +02:00
Kamil Braun
a664ac7ba5 treewide: require group0_guard when performing schema changes
`announce` now takes a `group0_guard` by value. `group0_guard` can only
be obtained through `migration_manager::start_group0_operation` and
moved, it cannot be constructed outside `migration_manager`.

The guard will be a method of ensuring linearizability for group 0
operations.
2022-01-24 15:20:35 +01:00
Kamil Braun
283ac7fefe treewide: pass mutation timestamp from call sites into migration_manager::prepare_* functions
The functions which prepare schema change mutations (such as
`prepare_new_column_family_announcement`) would use internally
generated timestamps for these mutations. When schema changes are
managed by group 0 we want to ensure that timestamps of mutations
applied through Raft are monotonic. We will generate these timestamps at
call sites and pass them into the `prepare_` functions. This commit
prepares the APIs.
2022-01-24 15:12:50 +01:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Gleb Natapov
f0a41c102a test: move schema_change_test.cc to new schema announcement api 2022-01-13 23:10:18 +02:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Gleb Natapov
38e1f85959 migration_manager: drop view_ptr array from announce_column_family_update()
No users pass it any longer.
2021-12-11 12:31:07 +02:00
Botond Dénes
3f4f408bcf schema: add get_reversed()
A variant of make_reversed() which goes through the schema registry,
teaching the schema to the registry if necessary. This effectively
caches the result of the reversing and as an added bonus double
reversing yields the very same schema C++ object that was the starting
point.

Closes #9365
2021-09-22 18:55:25 +03:00
Tomasz Grabiec
83113d8661 Merge "raft: new schema for storing raft snapshots" from Pavel Solodovnikov
Previously, the layout for storing raft snapshot
descriptors contained a `config` field, which had `blob`
data type.

That means `raft::configuration` for the snapshot was serialized
as a whole in binary form. It's convenient to implement and
is the most compact form of representing the data, but:

1. Hard to debug due to the need to de-serialize the data.
2. Plants a time bomb wrt. changing data layout and also the
   documentation in the future.

Remove the `config` field from `system.raft_snapshots` and
extract it to a separate `system.raft_config` table to store
the data in exploded form.

Also, modify the schema of `system.raft_snapshots` table in
the following way: add a `server_id` field as a part of
composite partition key ((group_id, server_id)) to
be able to start multiple raft servers belonging to one raft
group on the same scylla node.

Rename `id` field in `raft_snapshots` to `snapshot_id` so
it's self-documenting.

Rename `snapshot_id` from clustering key since a given server
can have only one snapshot installed at a time.

Note that the `raft::server_address` stucture contains an opaque
`info` member, which is `bytes`, but in the `raft_config` table
we use `ip_addr inet` field, instead. We always know that the
corresponding member field is going to contain an IP address (either v4
or v6) of a given raft server.

So, now the snapshots schema looks like this:

    CREATE TABLE raft_snapshots (
        group_id timeuuid,
        server_id uuid,
        snapshot_id uuid,
        idx int,
        term int,
        -- no `config` field here, moved to `raft_config` table
        PRIMARY KEY ((group_id, server_id))
    )

    CREATE TABLE raft_config (
         group_id timeuuid,
         my_server_id uuid,
         server_id uuid,
         disposition text, -- can be either 'CURRENT` or `PREVIOUS'
         can_vote bool,
         ip_addr inet,
         PRIMARY KEY ((group_id, my_server_id), server_id, disposition)
    );

This way it's much easier to extend the schema with new fields,
very easy to debug and inspect via CQL, and it's much more descriptive
in terms of self-documentation.

Tests: unit(dev)

* manmanson/raft_snapshots_new_schema_v2:
  test: adjust `schema_change_test` to include new `system.raft_config` table
  raft: new schema for storing raft snapshots
  raft: pass server id to `raft_sys_table_storage` instance
2021-09-10 20:41:59 +02:00
Botond Dénes
f200c8104a schema: introduce make_reversed()
`make_revered()` creates a schema identical to the schema instance it is
called on, with clustering order reversed. To distinguish the reverse
schema from the original one, the node-id part of its version UUID is
bit-flipped. This ensures that reversing a schema twice will result in
the identical schema to the original one (although a different C++
object).

This reversed schema will be used in reversed reads, so intermediate
layers can be ignorant of the fact that the read happens in reverse.
2021-09-09 11:49:05 +03:00