Commit Graph

286 Commits

Author SHA1 Message Date
Kamil Braun
0eee196a2e migration_manager: disable schema pulls when schema is Raft-managed
We want to disable `migration_manager` schema pulls and make schema
managed only by Raft group 0 if Raft is enabled. This will be important
with Raft-based topology, when schema will depend on topology (e.g. for
tablets).

We solved the problem partially in PR #13695. However, it's still
possible for a bootstrapping node to pull schema in the early part of
bootstrap procedure, before it setups group 0, because of how the
currently used `_raft_gr.using_raft()` check is implemented.

Here's the list of cases:
- If a node is bootstrapping in non-Raft mode, schema pulls must remain
  enabled.
- If a node is bootstrapping in Raft mode, it should never perform a
  schema pull.
- If a bootstrapped node is restarting in non-Raft mode but with Raft
  feature enabled (which means we should start upgrading to use Raft),
  or restarting in the middle of Raft upgrade procedure, schema pulls must
  remain enabled until the Raft upgrade procedure finishes.
  This is also the case of restarting after RECOVERY.
- If a bootstrapped node is restarting in Raft mode, it should never
  perform a schema pull.

The `raft_group0` service is responsible for setting up Raft during boot
and for the Raft upgrade procedure. So this is the most natural place to
make the decision that schema pulls should be disabled. Instead of
trying to come up with a correct condition that fully covers the above
list of cases, store a `bool` inside `migration_manager` and set it from
`raft_group0` function at the right moment - when we decide that we
should boot in Raft mode, or restart with Raft, or upgrade. Most of the
conditions are already checked in `setup_group0_if_exist`, we just need
to set the bool. Also print a log message when schema pulls are
disabled.

Fix a small bug in `migration_manager::get_schema_for_write` - it was
possible for the function to mark schema as synced without actually
syncing it if it was running concurrently to the Raft upgrade procedure.

Correct some typos in comments and update the comments.

Fixes #12870
2023-06-30 11:06:02 +02:00
Tomasz Grabiec
a9282103ba Merge 'Call storage_service notifications only after keyspace schema changes are applied on all shards' from Benny Halevy
This series aims at hardening schema merges and preventing inconsistencies across shards by
updating the database shards before calling the notification callback.

As seen in #13137, we don't want to call the notifications on all shards in parallel while the database shards are in flux.

In addition, any error to update the keyspace will cause abort so not to leave the database shards in an inconsistent state .

Other changes optimize this path by:
- updating shard 0 first, to seed the effective_replication_map.
- executing `storage_service::keyspace_changed` only once, on shard 0 to prevent quadratic update of the token_metadata and e_r_m on every keyspace change.

Fixes #13137

Closes #14158

* github.com:scylladb/scylladb:
  migration_manager: propagate listener notification exceptions
  storage_service: keyspace_changed: execute only on shard 0
  database: modify_keyspace_on_all_shards: execute func first on shard 0
  database: modify_keyspace_on_all_shards: call notifiers only after applying func on all shards
  database: add modify_keyspace_on_all_shards
  schema_tables: merge_keyspaces: extract_scylla_specific_keyspace_info for update_keyspace
  database: create_keyspace_on_all_shards
  database: update_keyspace_on_all_shards
  database: drop_keyspace_on_all_shards
2023-06-29 12:17:53 +02:00
Gleb Natapov
945f476363 test: add test for group0 raft command merging
Add a test that submits 3 large commands each one a little bit larger
than 1/3 of maximum mutation size. Check that in the end 2 command were
executed (first 2 were merged and third was executed separately).
2023-06-27 14:59:55 +03:00
Benny Halevy
825d617a53 migration_manager: propagate listener notification exceptions
1e29b07e40 claimed
to make event notification exception safe,
but swallawing the exceptions isn't safe at all,
as this might leave the node in an inconsistent state
if e.g. storage_service::keyspace_changed fails on any of the
shards.  Propagating the exception here will cause abort,
but it is better than leaving the node up, but in an
inconsistent state.

We keep notifying other listeners even if any of them failed
Based on 1e29b07e40:
```
If one of the listeners throws an exception, we must ensure that other
listeners are still notified.
```

The decision about swallowing exceptions can't be
made in such a generic layer.
Specific notification listeners that may ignore exceptions,
like in transport/evenet_notifier, may decide to swallow their
local exceptions on their own (as done in this patch).

Refs #3389

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-06-26 21:08:09 +03:00
Kefu Chai
4f85839be3 migration_manager: use try_emplace() when appropriate
try_emplace() is

- simpler than the lookup-and-insert dance, and
- presumably, it is more efficient.
- also, most importantly, it is simpler to read.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14237
2023-06-14 15:00:53 +03:00
Kefu Chai
25587b679d migration_manager: coroutineize migration_manager::do_merge_schema_from()
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-14 15:31:46 +08:00
Kefu Chai
befc9096d1 migration_manager: coroutineize migration_manager::sync_schema()
to reduce the indentation level, and to improve the readability.
also, take this opportunity to name some variables for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-06-14 15:19:56 +08:00
Pavel Emelyanov
5861d15912 Merge 'Small gossiper and migration_manager cleanups' from Gleb
Some assorted cleanups here: consolidation of schema agreement waiting
into a single place and removing unused code from the gossiper.

CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/1458/

Reviewed-by: Konstantin Osipov <kostja@scylladb.com>

* gleb/gossiper-cleanups of github.com:scylladb/scylla-dev:
  storage_service: avoid unneeded copies in on_change
  storage_service: remove check that is always true
  storage_service: rename handle_state_removing to handle_state_removed
  storage_service: avoid string copy
  storage_service: delete code that handled REMOVING_TOKENS state
  gossiper: remove code related to advertising REMOVING_TOKEN state
  migration_manager: add wait_for_schema_agreement() function
2023-05-27 10:49:54 +03:00
Gleb Natapov
a429018a8a migration_manager: add wait_for_schema_agreement() function
Several subsystems re-implement the same logic for waiting for schema
agreement. Provide the function in the migration_manager and use it
instead.
2023-05-25 14:44:53 +03:00
Alejo Sanchez
91f609d065 migration_manager: do not pull schema if raft is on
After consistent schema changes, remove schema pulls from gossiper
events if Raft is enabled, and considering Raft upgrade state.

Only disable pull if Raft is fully enabled.

Fixes #12870

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13695
2023-05-24 10:39:45 +02:00
Tomasz Grabiec
05be5e969b migration_manager: Fix snapshot transfer failing if TABLETS feature is not enabled
Without the feature, the system schema doesn't have the table, and the
read will fail with:

   Transferring snapshot to ... failed with: seastar::rpc::remote_verb_error (Can't find a column family tablets in keyspace system)

We should not attempt to read tablet metadata in the experimental
feature is not enabled.

Fixes #13946
Closes #13947
2023-05-19 09:58:56 +02:00
Pavel Emelyanov
5216dcb1b3 Merge 'db/system_keyspace: remove the dependency on storage_proxy' from Botond Dénes
The `system_keyspace` has several methods to query the tables in it. These currently require a storage proxy parameter, because the read has to go through storage-proxy. This PR uses the observation that all these reads are really local-replica reads and they only actually need a relatively small code snippet from storage proxy. These small code snippets are exported into standalone function in a new header (`replica/query.hh`). Then the system keyspace code is patched to use these new standalone functions instead of their equivalent in storage proxy. This allows us to replace the storage proxy dependency with a much more reasonable dependency on `replica::database`.

This PR patches the system keyspace code and the signatures of the affected methods as well as their immediate callers. Indirect callers are only patched to the extent it was needed to avoid introducing new includes (some had only a forward-declaration of storage proxy and so couldn't get database from it). There are a lot of opportunities left to free other methods or maybe even entire subsystems from storage proxy dependency, but this is not pursued in this PR, instead being left for follow-ups.

This PR was conceived to help us break the storage proxy -> storage service -> system tables -> storage proxy dependency loop, which become a major roadblock in migrating from IP -> host_id. After this PR, system keyspace still indirectly depends on storage proxy, because it still uses `cql3::query_processor` in some places. This will be addressed in another PR.

Refs: #11870

Closes #13869

* github.com:scylladb/scylladb:
  db/system_keyspace: remove dependency on storage_proxy
  db/system_keyspace: replace storage_proxy::query*() with  replica:: equivalent
  replica: add query.hh
2023-05-18 10:53:27 +03:00
Tomasz Grabiec
a91e83fad6 Merge "issue raft read barrier before pulling schema" from Gleb
Schema pull may fail because the pull does not contain everything that
is needed to instantiate a schema pointer. For instance it does not
contain a keyspace. This series changes the code to issue raft read
barrier before the pull which will guaranty that the keyspace is created
before the actual schema pull is performed.
2023-05-14 14:14:24 +03:00
Botond Dénes
157fdb2f6d db/system_keyspace: remove dependency on storage_proxy
The methods that take storage_proxy as argument can now accept a
replica::database instead. So update their signatures and update all
callers. With that, system_keyspace.* no longer depends on storage_proxy
directly.
2023-05-12 07:27:55 -04:00
Gleb Natapov
7caf1d26fb migration manager: Make schema pull abortable.
Now which schema pull may issues raft read barrier it may stuck if
majority is not available. Make the operation abortable and abort it
during queries if timeout is reached.
2023-05-11 16:31:23 +03:00
Gleb Natapov
70189b60de migration manager: if raft is enables sync with group0 leader before pulling a schema which is not available locally
Schema pull may fail because the pull does not contain everything that
is needed to instantiate a schema pointer. For instance it does not
contain a keyspace. This patch changes the code to issue raft read
barrier before the pull which will guaranty that the keyspace is created
before the actual schema pull is performed.

Refs: #3760
Fixes: #13211
2023-05-11 13:28:54 +03:00
Wojciech Mitros
9ae1b02144 service: revoke permissions on functions when a function/keyspace is dropped
Currently, when a user has permissions on a function/all functions in
keyspace, and the function/keyspace is dropped, the user keeps the
permissions. As a result, when a new function/keyspace is created
with the same name (and signature), they will be able to use it even
if no permissions on it are granted to them.

Simliarly to regular UDFs, the same applies to UDAs.

After this patch, the corresponding permissions on functions are dropped
when a function/keyspace is dropped.

Fixes #13820

Closes #13823
2023-05-10 14:39:42 +03:00
Kamil Braun
30cc07b40d Merge 'Introduce tablets' from Tomasz Grabiec
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. Vnode-based replication strategy tries
to evenly distribute the global token space shared by all tables among
nodes and shards. With tablets, the aim is to start from a different
side. Divide resources of replica-shard into tablets, with a goal of
having a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike with vnode ranges, tablet replicas live on a
particular shard on a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the ground for this.

Things achieved in this PR:

  - You can start a cluster and create a keyspace whose tables will use
    tablet-based replication. This is done by setting `initial_tablets`
    option:

    ```
        CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
                        'replication_factor': 3,
                        'initial_tablets': 8};
    ```

    All tables created in such a keyspace will be tablet-based.

    Tablet-based replication is a trait, not a separate replication
    strategy. Tablets don't change the spirit of replication strategy, it
    just alters the way in which data ownership is managed. In theory, we
    could use it for other strategies as well like
    EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy
    is augmented to support tablets.

  - You can create and drop tablet-based tables (no DDL language changes)

  - DML / DQL work with tablet-based tables

    Replicas for tablet-based tables are chosen from tablet metadata
    instead of token metadata

Things which are not yet implemented:

  - handling of views, indexes, CDC created on tablet-based tables
  - sharding is done using the old method, it ignores the shard allocated in tablet metadata
  - node operations (topology changes, repair, rebuild) are not handling tablet-based tables
  - not integrated with compaction groups
  - tablet allocator piggy-backs on tokens to choose replicas.
    Eventually we want to allocate based on current load, not statically

Closes #13387

* github.com:scylladb/scylladb:
  test: topology: Introduce test_tablets.py
  raft: Introduce 'raft_server_force_snapshot' error injection
  locator: network_topology_strategy: Support tablet replication
  service: Introduce tablet_allocator
  locator: Introduce tablet_aware_replication_strategy
  locator: Extract maybe_remove_node_being_replaced()
  dht: token_metadata: Introduce get_my_id()
  migration_manager: Send tablet metadata as part of schema pull
  storage_service: Load tablet metadata when reloading topology state
  storage_service: Load tablet metadata on boot and from group0 changes
  db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
  migration_notifier: Introduce before_drop_keyspace()
  migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
  test: perf: Introduce perf-tablets
  test: Introduce tablets_test
  test: lib: Do not override table id in create_table()
  utils, tablets: Introduce external_memory_usage()
  db: tablets: Add printers
  db: tablets: Add persistence layer
  dht: Use last_token_of_compaction_group() in split_token_range_msb()
  locator: Introduce tablet_metadata
  dht: Introduce first_token()
  dht: Introduce next_token()
  storage_proxy: Improve trace-level logging
  locator: token_metadata: Fix confusing comment on ring_range()
  dht, storage_proxy: Abstract token space splitting
  Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
  db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
  db: Introduce get_non_local_vnode_based_strategy_keyspaces()
  service: storage_proxy: Avoid copying keyspace name in write handler
  locator: Introduce per-table replication strategy
  treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
  locator: Introduce effective_replication_map
  locator: Rename effective_replication_map to vnode_effective_replication_map
  locator: effective_replication_map: Abstract get_pending_endpoints()
  db: Propagate feature_service to abstract_replication_strategy::validate_options()
  db: config: Introduce experimental "TABLETS" feature
  db: Log replication strategy for debugging purposes
  db: Log full exception on error in do_parse_schema_tables()
  db: keyspace: Remove non-const replication strategy getter
  config: Reformat
2023-04-27 09:40:18 +02:00
Tomasz Grabiec
46eae545ad migration_manager: Send tablet metadata as part of schema pull
This is currently used by group0 to transfer snapshot of the raft
state machine.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
41e69836fd db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata() 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
b754433ac1 migration_notifier: Introduce before_drop_keyspace()
Tablet allocator will need to inject mutations on keyspace drop.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
5b046043ea migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
It will be extended with listener notification firing, which is an
async operation.
2023-04-24 10:49:37 +02:00
Benny Halevy
c5d819ce60 gms: versioned_value: make members private
and provide accessor functions to get them.

1. So they can't be modified by mistake, as the versioned value is
   immutable. A new value must have a higher version.
2. Before making the version a strong gms::version_type.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:37:32 +03:00
Kefu Chai
c37f4e5252 treewide: use fmt::join() when appropriate
now that fmtlib provides fmt::join(). see
https://fmt.dev/latest/api.html#_CPPv4I0EN3fmt4joinE9join_viewIN6detail10iterator_tI5RangeEEN6detail10sentinel_tI5RangeEEERR5Range11string_view
there is not need to revent the wheel. so in this change, the homebrew
join() is replaced with fmt::join().

as fmt::join() returns an join_view(), this could improve the
performance under certain circumstances where the fully materialized
string is not needed.

please note, the goal of this change is to use fmt::join(), and this
change does not intend to improve the performance of existing
implementation based on "operator<<" unless the new implementation is
much more complicated. we will address the unnecessarily materialized
strings in a follow-up commit.

some noteworthy things related to this change:

* unlike the existing `join()`, `fmt::join()` returns a view. so we
  have to materialize the view if what we expect is a `sstring`
* `fmt::format()` does not accept a view, so we cannot pass the
  return value of `fmt::join()` to `fmt::format()`
* fmtlib does not format a typed pointer, i.e., it does not format,
  for instance, a `const std::string*`. but operator<<() always print
  a typed pointer. so if we want to format a typed pointer, we either
  need to cast the pointer to `void*` or use `fmt::ptr()`.
* fmtlib is not able to pick up the overload of
  `operator<<(std::ostream& os, const column_definition* cd)`, so we
  have to use a wrapper class of `maybe_column_definition` for printing
  a pointer to `column_definition`. since the overload is only used
  by the two overloads of
  `statement_restrictions::add_single_column_parition_key_restriction()`,
  the operator<< for `const column_definition*` is dropped.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-16 20:34:18 +08:00
Kefu Chai
3ae11de204 treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:56:53 +08:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Asias He
4571fcf9e7 token_metadata: Rename is_member to is_normal_token_owner
The name is_normal_token_owner is more clear than is_member.
The is_normal_token_owner reflects what it really checks.
2022-11-18 09:29:20 +08:00
Kamil Braun
43687be1f1 service/raft: raft_group0_client: prepare for upgrade procedure
Now, whether an 'group 0 operation' (today it means schema change) is
performed using the old or new methods, doesn't depend on the local RAFT
fature being enabled, but on the state of the upgrade procedure.

In this commit the state of the upgrade is always
`use_pre_raft_procedures` because the upgrade procedure is not
implemented yet. But stay tuned.

The upgrade procedure will need certain guarantees: at some point it
switches from `use_pre_raft_procedures` to `synchronize` state. During
`synchronize` schema changes must be disabled, so the procedure can
ensure that schema is in sync across the entire cluster before
establishing group 0. Thus, when the switch happens, no schema change
can be in progress.

To handle all this weirdness we introduce `_upgrade_lock` and
`get_group0_upgrade_state` which takes this lock whenever it returns
`use_pre_raft_procedures`. Creating a `group0_guard` - which happens at
the start of every group 0 operation - will take this lock, and the lock
holder shall be stored inside the guard (note: the holder only holds the
lock if `use_pre_raft_procedures` was returned, no need to hold it for
other cases). Because `group0_guard` is held for the entire duration of
a group 0 operation, and because the upgrade procedure will also have to
take this lock whenever it wants to change the upgrade state (it's an
rwlock), this ensures that no group 0 operation that uses the old ways
is happening when we change the state.

We also implement `wait_until_group0_upgraded` using a condition
variable. It will be used by certain methods during upgrade (later
commits; stay tuned).

Some additional comments were written.
2022-08-19 19:15:19 +02:00
Benny Halevy
2b017ce285 schema, everywhere: define and use table_schema_version as a strong type
Define table_schema_version as a distinct tagged_uuid class,
So it can be differentiated from other uuid-class types,
in particular table_id.

Added reversed(table_schema_version) for convenience
and uniformity since the same logic is currently open coded
in several places.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:45 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
Benny Halevy
1fda686f96 idl: make idl headers self-sufficient
Add include statements to satisfy dependencies.

Delete, now unneeded, include directives from the upper level
source files.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Jadw1
d13f347621 DB: Add scylla_aggregates system table
Saving information about UDA's reduce function to `scylla_aggregates`
table and distributing it across cluster.
2022-07-18 15:25:37 +02:00
Benny Halevy
acae3cc223 treewide: stop use of deprecated coroutine::make_exception
Convert most use sites from `co_return coroutine::make_exception`
to `co_await coroutine::return_exception{,_ptr}` where possible.

In cases this is done in a catch clause, convert to
`co_return coroutine::exception`, generating an exception_ptr
if needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10972
2022-07-07 15:02:16 +03:00
Kamil Braun
4d439a16b3 treewide: remove unnecessary migration_manager::is_raft_enabled() calls
In schema_altering_statement: we will bounce statements to shard 0
whether Raft is enabled or not.

In migration_manager, when we're sending a group 0 snapshot: well, if
we're sending a group 0 snapshot, Raft must be enabled; the check is
redundant.
2022-06-23 16:14:41 +02:00
Gleb Natapov
2fa0519991 migration manager: remove unused code 2022-06-09 09:40:55 +03:00
Avi Kivity
4b53af0bd5 treewide: replace parallel_for_each with coroutine::parallel_for_each in coroutines
coroutine::parallel_for_each avoids an allocation and is therefore preferred. The lifetime
of the function object is less ambiguous, and so it is safer. Replace all eligible
occurences (i.e. caller is a coroutine).

One case (storage_service::node_ops_cmd_heartbeat_updater()) needed a little extra
attention since there was a handle_exception() continuation attached. It is converted
to a try/catch.

Closes #10699
2022-05-31 09:06:24 +03:00
Gleb Natapov
c2ef390a52 service: raft: move group0 write path into a separate file
Writing into the group0 raft group on a client side involves locking
the state machine, choosing a state id and checking for its presence
after operation completes. The code that does it resides now in the
migration manager since the currently it is the only user of group0. In
the near future we will have more client for group0 and they all will
have to have the same logic, so the patch moves it to a separate class
raft_group0_client that any future user of group0 can use to write
into it.

Message-Id: <YoYAJwdTdbX+iCUn@scylladb.com>
2022-05-19 17:21:35 +03:00
Avi Kivity
5937b1fa23 treewide: remove empty comments in top-of-files
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.

Closes #10562
2022-05-13 07:11:58 +02:00
Pavel Emelyanov
0aea43a245 gossiper: Make state and locks maps private
Locks are not needed outside gossiper, state map is sometimes read from,
but there a const getter for such cases. Both methods now desrve the
underbar prefix, but it doesn't come with this short patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-06 10:34:48 +03:00
Avi Kivity
19ab3edd77 gms: feature_service: remove variable/helper function duplication
Each feature has a private variable and a public accessor. Since the
accessor effectively makes the variable public, avoid the intermediary
and make the variable public directly.

To ease mechanical translation, the variable name is chosen as
the function name (without the cluster_supports_ prefix).

References throughout the codebase are adjusted.
2022-05-04 18:59:56 +03:00
Avi Kivity
de0ee13f45 schema_tables: forward-declare user_function and user_aggerates
These bring in wasm.hh (though they really shouldn't) and make
everyone suffer. Forward declare instead and add missing includes
where needed.

Closes #10444
2022-04-28 07:22:02 +03:00
Piotr Sarna
58529591a9 database,cql3: add STORAGE option to keyspaces
The STORAGE option is designed to hold a map of options
used for customizing storage for given keyspace.
The option is kept in a system_schema.scylla_keyspaces table.
The option is only available if the whole cluster is aware
of it - guarded by a cluster feature.

Example of the table contents:
```
cassandra@cqlsh> select * from system_schema.scylla_keyspaces;

 keyspace_name | storage_options                                | storage_type
---------------+------------------------------------------------+--------------
           ksx | {'bucket': '/tmp/xx', 'endpoint': 'localhost'} |           S3
```
2022-04-08 09:17:01 +02:00
Gleb Natapov
c17a03727c migration_manager: drain migration manager before stopping protocol servers on shutdown
When protocol servers are stopping they wait for all active queries to
complete, but DDL queries use migration manager internally, so if they
hang there protocol servers will not be able to stop since migration
manager is drained afterwords. The patch moves the migration manager
draining before protocol servers stoppage.

Since after the patch migration managers is drained before messaging
service is stopped we need to make sure that no rpc request triggers new
migration manager requests. We do it by making sure that any attempt to
issue such a request after aborted will return abort_requested_exception.
2022-03-31 10:00:29 +03:00
Gleb Natapov
e52abdca30 migration_manager: pass abort source to raft primitives
We want to be able to abort raft operations on migration manager drain.
MM already has an abort source that is signaled on drain, so all that is
left is to pass it to raft calls.
2022-03-31 10:00:29 +03:00
Pavel Emelyanov
b80d5f8900 schema_tables: Add sharded<system_keyspace> argument to update_schema_version_and_announce
All its (indirect) callers had been patched to have it, now it's
possible to have the argument in it. Next patch will make use of it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Pavel Emelyanov
42e733bdf7 migration_manager: Keep sharded<system_keyspace> reference
The main target here is system_keyspace::update_schema_version() which
is now static, but needs to have system_keyspace at "this". Migration
manager is one of the places that calls that method indirectly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Nadav Har'El
7be3129458 cdc: don't need current keyspace to create the log table
CDC registers to the table-creation hook (before_create_column_family)
to add a second table - the CDC log table - to the same keyspace.
The handler function (on_before_update_column_family() in cdc/log.cc)
wants to retrieve the keyspace's definition, but that does NOT WORK if
we create the keyspace and table in one operation (which is exactly what
we intend to do in Alternator to solve issue #9868) - because at the
time of the hook, the keyspace does not yet exist in the schema.

It turns out that on_before_update_column_family() does not REALLY need
the keyspace. It needed it to pass it on to make_create_table_mutations()
but that function doesn't use the keyspace parameter passed to it! All
it needs is the keyspace's name - which is in the schema anyway and
doesn't need to be looked up.

So in this patch we fix make_create_table_mutations() to not require the
unused keyspace parameter - and fix the CDC code not to look for the
keyspace that is no longer needed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220215162342.622509-1-nyh@scylladb.com>
2022-02-16 08:38:56 +02:00
Benny Halevy
b94c9ed3e6 migration_manager: passive_announce(version): handle exception
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-02 14:54:19 +02:00
Kamil Braun
97ff98f3a7 service: migration_manager: retry schema change command on transient failures
The call to `raft::server::add_entry` in `announce_with_raft` may fail
e.g. due to a leader change happening when we try to commit the entry.
In cases like this it makes sense to retry the command so we don't
prematurely report an error to the client.

This may result in double application of the command. Fortunately, the schema
change command is idempotent thanks to the group 0 state ID mechanism
(originally used to prevent conflicting concurrent changes from happening).
Indeed, once a command passes the state ID check, it changes the group 0
history last state ID, causing all later applications of that same
command to fail the check. Similarly, once a command fails the state ID
check, it means that the last state ID is different than the one
observed when the command was being constructed, so all further
applications of the command will also fail the check (it is not possible
for the last state ID to change from X to Y then back to X).

Note that this reasoning only works for commands with `prev_state_id`
engaged, such as the ones which we're using in
`migration_manager::announce_with_raft`. It would not work with
"unconditional commands" where `prev_state_id` is `nullopt` - for those
commands no state ID check is performed. It could still be safe to retry
those commands if they are idempotent for a different reason.

(Note: actually, our schema commands are already idempotent even without
the state ID check, because they simply apply a set of mutations, and
applying the same mutations twice is the same as applying them once.)
Message-Id: <20220131152926.18087-1-kbraun@scylladb.com>
2022-01-31 19:49:31 +01:00
Kamil Braun
4a52b802ac test: unit test for group 0 concurrent change protection and CQL DDL retries
Check that group 0 history grows iff a schema change does not throw
`group0_concurrent_modification`. Check that the CQL DDL statement retry
mechanism works as expected.
2022-01-27 11:26:15 +01:00