Commit Graph

1034 Commits

Author SHA1 Message Date
Pavel Emelyanov
512465288f main, cql_test_env: Start-stop batchlog manager in one "block"
Currently, starting and stopping of the b.m. are spread over main(). Keep them
close to each other.

Another trickery here is that calling b.m.::start() can only be done
after joining the cluster, because this start() spawns replay loop
which, in turn, calls token_metadata::count_normal_token_owners() and if
the latter returns zero, the b.m. code uses it as a fraction denominator
and crashes.

With the above in mind, cql_test_env should start batchlog manager after
it "joins the ring" too. For now it doesn't make any difference, but
next patch will make use of it.
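
The crash described above is an integer division by zero. A minimal standalone sketch of the hazard follows. The name `per_owner_share` is hypothetical and is not a ScyllaDB function. The commit itself avoids the problem by ordering start() after joining the cluster, not by adding a guard like this one.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical helper: splits a budget evenly across token owners.
// Returns std::nullopt instead of dividing when no owners are known yet,
// which is the situation before the node has joined the ring.
std::optional<std::size_t> per_owner_share(std::size_t budget,
                                           std::size_t normal_token_owners) {
    if (normal_token_owners == 0) {
        return std::nullopt; // dividing here would crash
    }
    return budget / normal_token_owners;
}
```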

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-12 16:33:31 +03:00
Avi Kivity
89ba4e4a5e Merge 'Stop using anonymous minio bucket for tests' from Pavel Emelyanov
Currently minio starts with a bucket that has public anonymous access. Accordingly, all tests use unsigned S3 requests. That was done for simplicity, and it's better to apply some policy to the bucket and, consequently, make tests sign their requests.

Other than the obvious benefit that we test requests signing in unit tests, another goal of this PR is to make it possible to simulate and test various error paths locally, e.g. #13745 and #13022

Closes #14525

* github.com:scylladb/scylladb:
  test/s3: Remove AWS_S3_EXTRA usage
  test/s3: Run tests over non-anonymous bucket
  test/minio: Create random temp user on start
  code: Rename S3_PUBLIC_BUCKET_FOR_TEST
2023-09-11 23:12:56 +03:00
Botond Dénes
b062b245ad Merge 'Don't cache dc:rack on system keyspace local cache' from Pavel Emelyanov
The local node's dc:rack pair is cached on system keyspace on start. However, most of the other code doesn't need it, as it gets dc:rack from topology or directly from snitch. There are a few places left that still mess with the sysks cache, but they are easy to patch. So after this patch all the core code uses two sources of dc:rack -- topology / snitch -- instead of three.

Closes #15280

* github.com:scylladb/scylladb:
  system_keyspace: Don't require snitch argument on start
  system_keyspace: Don't cache local dc:rack pair
  system_keyspace: Save local info with explicit location
  storage_service: Get endpoint location from snitch, not system keyspace
  snitch: Introduce and use get_location() method
  repair: Local location variables instead of system keyspace's one
  repair: Use full endpoint location instead of datacenter part
2023-09-11 10:26:26 +03:00
Pavel Emelyanov
1d00cc5baa test/s3: Run tests over non-anonymous bucket
Currently minio applies anonymous public policy for the test bucket and
all tests just use unsigned S3 requests. This patch generates a policy
for the temporary minio user and removes the anon public one. All tests
are updated respectively to use the provided key:secret pair.

The use-https bit is off by default as minio still starts with plain
http. That's OK for now, all tests are local and have no secret data
anyway

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 11:16:13 +03:00
Pavel Emelyanov
e8e8539c7c code: Rename S3_PUBLIC_BUCKET_FOR_TEST
The bucket is going to stop being public, rename the env variable in
advance to make the essential patch smaller

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 10:25:53 +03:00
Pavel Emelyanov
5d52a35e05 system_keyspace: Don't require snitch argument on start
Now system keyspace is finally independent from snitch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-05 12:57:09 +03:00
Pavel Emelyanov
9926917bf5 system_keyspace: Save local info with explicit location
On boot system keyspace is kicked to insert local info into system.local
table. Among other things there's dc:rack pair which sys.ks. gets from
its cache which, in turn, should have been previously initialized from
snitch on sys.ks. start. This patch makes the local info updating method
get the dc:rack from caller via argument. Callers, in turn, call snitch
directly, because these are main and cql_test_env startup routines.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-05 12:54:46 +03:00
Pavel Emelyanov
13a0c29618 storage_service: Remove query processor arg from join_cluster()
Since d42685d0cb the s.service has an on-board query processor ref^w
pointer and can use it to join the cluster

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #15236
2023-09-05 07:30:37 +03:00
Botond Dénes
3e7ec6cc83 Merge 'Move cell assertion from cql_test_env to cql_assertions' from Pavel Emelyanov
The cql_test_env has a virtual require_column_has_value() helper that better fits cql_assertions crowd. Also, the helper in question duplicates some existing code, so it can also be made shorter (and one class table helper gets removed afterwards)

Closes #15208

* github.com:scylladb/scylladb:
  cql_assertions: Make permit from env
  table: Remove find_partition_slow() helper
  sstable_compaction_test: Do not re-decorate key
  cql_test_env: Move .require_column_has_value
  cql_test_env: Use table.find_row() shortcut
2023-08-30 08:34:05 +03:00
Pavel Emelyanov
137c7116dc cql_assertions: Make permit from env
To call table::find_row() one needs to provide a permit. Tests have
short and neat helper to create one from cql_test_env

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 16:01:29 +03:00
Pavel Emelyanov
4e9f380608 cql_test_env: Move .require_column_has_value
This env helper is only used by tests (from cql_query_test)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 15:38:33 +03:00
Pavel Emelyanov
7597663ef5 cql_test_env: Use table.find_row() shortcut
The require_column_has_value() finds the cell in three steps -- finds
partition, then row, then cell. The class table already has a method to
facilitate row finding by partition and clustering key

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 15:37:27 +03:00
Pavel Emelyanov
a61454be00 storage_service: Use local cdc gen. service in join_cluster()
The method in question accepts cdc_generation_service ref argument from
main and cql_test_env, but storage service now has local cdc gen.
service reference, so this argument and its propagation down the stack
can be removed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 09:36:58 +03:00
Pavel Emelyanov
933ea0afe6 storage_service: Bring cdc_generation_service dependency back
It sort of reverts the 5a97ba7121 commit, because storage service now
uses the cdc generation service to serve raft topo updates which, in
turn, takes the cdc gen. service all over the raft code _just_ to make
it as an argument to storage service topo calls.

Also there's API carrying cdc gen. service for the single call and also
there's an implicit need to kick cdc gen. service on decommission which
also needs storage service to reference cdc gen. after boot is complete

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 09:36:58 +03:00
Botond Dénes
139ba553b8 Merge 'sstable, test: log sstable name and pk when capping local_deletion_time ' from Kefu Chai
in this series, we also print the sstable name and pk when writing a tombstone whose local_deletion_time (ldt for short) is greater than INT32_MAX which cannot be represented by an uint32_t.

Fixes #15015

Closes #15107

* github.com:scylladb/scylladb:
  sstable/writer: log sstable name and pk when capping ldt
  test: sstable_compaction_test: add a test for capped tombstone ldt
2023-08-23 09:29:54 +03:00
Kefu Chai
0bc99c7f49 test: sstable_compaction_test: add a test for capped tombstone ldt
local_deletion_time (ldt for short) is a timestamp used for the
purpose of purging the tombstone after gc_grace_seconds. if its value
is greater than INT32_MAX, it is capped when being written to sstable.
this is very likely a signal of bad configuration or even a bug in
scylla. so we keep track of it with a metric named
"scylla_sstables_capped_tombstone_deletion_time".

in this change, a test is added to verify that the metric is updated
upon seeing a tombstone with this abnormal ldt.
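
The capping-plus-counting behavior described above can be sketched in isolation as follows. `writer_stats` and `cap_local_deletion_time` are hypothetical names chosen for illustration. The real writer and metric registration in ScyllaDB are not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the counter behind the metric.
struct writer_stats {
    uint64_t capped_tombstone_deletion_time = 0;
};

// Caps a 64-bit deletion time (in seconds, assumed non-negative) to
// INT32_MAX and counts every time the cap is applied.
int32_t cap_local_deletion_time(int64_t ldt, writer_stats& stats) {
    constexpr int64_t max_ldt = std::numeric_limits<int32_t>::max();
    if (ldt > max_ldt) {
        ++stats.capped_tombstone_deletion_time;
        return static_cast<int32_t>(max_ldt);
    }
    return static_cast<int32_t>(ldt);
}
```

Because the capped value differs from the input, a before/after comparison of the mutation would no longer match, which is why the test needs to disable that check.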

because we validate the consistency before and after compaction in
tests, this change adds a parameter to disable this check, otherwise,
because capping the ldt changes the mutation, the validation would
fail the test.

Refs #15015
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-08-21 19:25:32 +08:00
Pavel Emelyanov
6bc30f1944 system_keyspace: De-bloat .setup() from messing with system.local
On boot several manipulations with system.local are performed.

1. The host_id value is selected from it with key = local

   If not found, system_keyspace generates a new host_id, inserts the
   new value into the table and returns back

2. The cluster_name is selected from it with key = local

   Then it's system_keyspace that either checks that the name matches
   the one from db::config, or inserts the db::config value into the
   table

3. The row with key = local is updated with various info like versions,
   listen, rpc and bcast addresses, dc, rack, etc. Unconditionally

All three steps are scattered over main, p.1 is called directly, p.2 and
p.3 are executed via system_keyspace::setup() that happens rather late.
Also there's some touch of this table from the cql_test_env startup code.

The proposal is to collect this setup into one place and execute it
early -- as soon as the system.local table is populated. This frees the
system_keyspace code from the logic of selecting host id and cluster
name leaving it to main and keeps it with only select/insert work.
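
Steps 1 and 2 above follow a "select, then insert if missing" pattern. A rough standalone sketch, using a hypothetical in-memory map in place of the system.local row (none of these names are ScyllaDB APIs):

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical in-memory stand-in for the row with key = local.
using local_row = std::map<std::string, std::string>;

// Step 1: return the stored host_id, or store the freshly generated
// candidate and return it.
std::string load_or_store_host_id(local_row& row, const std::string& candidate) {
    return row.try_emplace(std::string("host_id"), candidate).first->second;
}

// Step 2: store the configured cluster name if none is saved yet,
// otherwise require that the saved name matches the configured one.
void check_or_store_cluster_name(local_row& row, const std::string& configured) {
    auto [it, inserted] = row.try_emplace(std::string("cluster_name"), configured);
    if (!inserted && it->second != configured) {
        throw std::runtime_error("saved cluster name " + it->second +
                                 " != configured name " + configured);
    }
}
```

In this shape the decision logic lives with the caller, and the storage layer only performs the select and the insert.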

refs: #2795

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #15082
2023-08-20 21:24:31 +03:00
Tomasz Grabiec
bd8bb5d4b1 Merge 'Wire tablet into compaction group' from Raphael "Raph" Carvalho
Compaction group is the data plane for tablets, so this integration
allows each tablet to have its own storage (memtable + sstables).
A crucial step for dynamic tablets, where each tablet can be worked
on independently.

There are still some inefficiencies to be worked on, but as it is,
it already unlocks further development.

```
INFO  2023-07-27 22:43:38,331 [shard 0] init - loading tablet metadata
INFO  2023-07-27 22:43:38,333 [shard 0] init - loading non-system sstables
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 0 present for ks.cf
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 2 present for ks.cf
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 4 present for ks.cf
INFO  2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 6 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 1 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 3 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 5 present for ks.cf
INFO  2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 7 present for ks.cf
```

Closes #14863

* github.com:scylladb/scylladb:
  Kill scylla option to configure number of compaction groups
  replica: Wire tablet into compaction group
  token_metadata: Add this_host_id to topology config
  replica: Switch to chunked_vector for storing compaction groups
  replica: Generate group_id for compaction_group on demand
2023-08-18 15:17:17 +02:00
Gleb Natapov
4ffc39d885 cql3: Extend the scope of group0_guard during DDL statement execution
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.
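
The control flow described above can be sketched without Seastar as a plain retry loop. All types here are hypothetical stand-ins, and the `max_attempts` bound is an assumption added for the sketch. The commit message does not say whether retries are bounded.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>

// Hypothetical stand-ins for the guard and the exception.
struct group0_guard {};
struct concurrent_modification : std::runtime_error {
    concurrent_modification() : std::runtime_error("concurrent group0 modification") {}
};

struct statement {
    // Schema-altering statements return a guard, others return std::nullopt.
    std::function<std::optional<group0_guard>()> take_guard;
    // Stands for check_access(), validate() and execute(), all run
    // while the guard is held.
    std::function<void(std::optional<group0_guard>&)> run;
};

// Takes the guard first, runs the statement under it, and starts over
// with a fresh guard on a concurrent modification.
// Returns the number of attempts used.
int execute_with_retries(const statement& stmt, int max_attempts) {
    for (int attempt = 1;; ++attempt) {
        auto guard = stmt.take_guard();
        try {
            stmt.run(guard);
            return attempt;
        } catch (const concurrent_modification&) {
            if (attempt == max_attempts) {
                throw; // assumed bound, see lead-in
            }
        }
    }
}
```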

Fixes: #13942

Message-ID: <ZNsynXayKim2XAFr@scylladb.com>
2023-08-17 15:52:48 +03:00
Raphael S. Carvalho
b578d6643f Kill scylla option to configure number of compaction groups
The option was introduced to bootstrap the project. It's still
useful for testing, but that translates into maintaining an
additional option and code that will not be really used
outside of testing. A possible option is to later map the
option in boost tests to initial_tablets, which may yield
the same effect for testing.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-08-16 18:23:53 -03:00
Raphael S. Carvalho
5d1f60439a token_metadata: Add this_host_id to topology config
The motivation is that token_metadata::get_my_id() is not available
early in the bootstrap process, as raft topology is pulled later
than new tables are registered and created, and this node is added
to topology even later.

To allow creation of compaction groups to retrieve "my id" from
token metadata early, initialization will now feed local id
into topology config which is immutable for each node anyway.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-08-16 18:23:44 -03:00
Avi Kivity
e8f3b073c3 Merge 'Maintain sstable state explicitly' from Pavel Emelyanov
An sstable can be in one of several states -- normal, quarantined, staging, uploading. Right now this "state" is hard-wired into sstable's path, e.g. quarantined sstable would sit in e.g. /var/lib/data/ks-cf-012345/quarantine/ directory. Respectively, there's a bunch of directory names constexprs in sstables.hh defining each "state". Other than being confusing, this approach doesn't work well with S3 backend. Additionally, there's snapshot subdir that adds to the confusion, because snapshot is not quite a state.

This PR converts "state" from constexpr char* directories names into a enum class and patches the sstable creation, opening and state-changing API to use that enum instead of parsing the path.
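
As an illustration of the idea, a state enum with a single place that maps it back to a subdirectory might look like the sketch below. The enumerator names and the `state_to_dir` helper are assumptions made for the example. Only the `quarantine` directory name is taken from the text above.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical enum replacing the per-state directory-name constants.
enum class sstable_state { normal, staging, quarantine, upload };

// Filesystem storage can still derive a subdirectory from the state,
// while other backends (e.g. S3) can interpret the state differently.
constexpr std::string_view state_to_dir(sstable_state s) {
    switch (s) {
    case sstable_state::normal:     return "";
    case sstable_state::staging:    return "staging";    // assumed name
    case sstable_state::quarantine: return "quarantine";
    case sstable_state::upload:     return "upload";     // assumed name
    }
    return "";
}
```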

refs: #13017
refs: #12707

Closes #14152

* github.com:scylladb/scylladb:
  sstable/storage: Make filesystem storage with initial state
  sstable: Maintain state
  sstable: Make .change_state() accept state, not directory string
  sstable: Construct it with state
  sstables_manager: Remove state-less make_sstable()
  table: Make sstables with required state
  test: Make sstables with upload state in some cases
  tools: Make sstables with normal state
  table: Open-code sstables making streaming helpers
  tests: Make sstables with normal state by default
  sstable_directory: Make sstable with required state
  sstable_directory: Construct with state
  distributed_loader: Make sstable with desired state when populating
  distributed_loader: Make sstable with upload state when uploading
  sstable: Introduce state enum
  sstable_directory: Merge verify and g.c. calls
  distributed_loader: Merge verify and gc invocations
  sstable/filesystem: Put underscores to dir members
  sstable/s3: Mark make_s3_object_name() const
  sstable: Remove filename(dir, ...) method
2023-08-15 17:44:06 +03:00
Raphael S. Carvalho
2590eec352 replica: Generate group_id for compaction_group on demand
There are a few good reasons for this change.
1) compaction_group doesn't have to be aware of # of groups
2) thinking forward to dynamic tablets, # of groups cannot be
statically embedded in group id, otherwise it gets stale.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-08-15 09:04:05 -03:00
Avi Kivity
d57a951d48 Revert "cql3: Extend the scope of group0_guard during DDL statement execution"
This reverts commit 70b5360a73. It generates
a failure in group0_test .test_concurrent_group0_modifications in debug
mode with about 4% probability.

Fixes #15050
2023-08-15 00:26:45 +03:00
Pavel Emelyanov
734c0820df tests: Make sstables with normal state by default
It's assumed that tests are not very specific about which
subdirectory an sstable is in, so they can use normal state. Places that
need to move sstables between states will use sstable manager API
explicitly

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-14 14:56:02 +03:00
Avi Kivity
b120d35c58 Merge 'Relax cql_test_env services maintenance' from Pavel Emelyanov
To add a sharded service to the cql_test_env one needs to patch it in 5 or 6 places

- add cql_test_env reference
- add cql_test_env constructor argument
- initialize the reference in initializer list
- add service variable to do_with method
- pass the variable to cql_test_env constructor
- (optionally) export it via cql_test_env public method

Steps 1 through 5 are annoying; things get much simpler if the procedure looks like

- add cql_test_env variable
- (optionally) export it via cql_test_env public method

This is what this PR does

refs: #2795

Closes #15028

* github.com:scylladb/scylladb:
  cql_test_env: Drop local *this reference
  cql_test_env: Drop local references
  cql_test_env: Move most of the stuff in run_in_thread()
  cql_test_env: Open-code env start/stop and remove both
  cql_test_env: Keep other services as class variables
  cql_test_env: Keep services as class variables
  cql_test_env: Construct env early
  cql_test_env: De-static fdpinger variable
  cql_test_env: Define all services' variables early
  cql_test_env: Keep group0_client pointer
2023-08-13 20:24:52 +03:00
Gleb Natapov
70b5360a73 cql3: Extend the scope of group0_guard during DDL statement execution
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.

Fixes: #13942

Message-ID: <ZNSWF/cHuvcd+g1t@scylladb.com>
2023-08-13 14:19:39 +03:00
Pavel Emelyanov
64ddc9e4b4 cql_test_env: Drop local *this reference
The auto& env = *this is also now excessive, so drop it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:30:34 +03:00
Pavel Emelyanov
de679d7c36 cql_test_env: Drop local references
The local auto& foo = env._foo references in run_in_thread() are no longer
needed, the code that uses foo can be switched to use _foo (this->_foo)
instead

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:29:42 +03:00
Pavel Emelyanov
487ecae517 cql_test_env: Move most of the stuff in run_in_thread()
The do_with() method is static and cannot just access cql_test_env
variable's fields, using local references instead. To simplify this,
most of the method's content is moved to non-static run_in_thread()
method

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:28:40 +03:00
Pavel Emelyanov
2c175660f2 cql_test_env: Open-code env start/stop and remove both
These two just make more churn in next patch, so drop both

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:28:03 +03:00
Pavel Emelyanov
10f9292fe8 cql_test_env: Keep other services as class variables
There are more services on do_with() stack that are not referenced from
the cql_test_env. Move them to be class variables too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:27:19 +03:00
Pavel Emelyanov
08a3be3b17 cql_test_env: Keep services as class variables
Now they are duplicated -- variables exist on do_with() stack and the
class references some of them. This patch makes it vice-versa -- all the
variables are on the cql_test_env and do_with() references them. The
latter will change soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:26:21 +03:00
Pavel Emelyanov
b31d2097b8 cql_test_env: Construct env early
Its constructor is _just_ assigning references and setting up rlimits.
Both can happen early

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:25:49 +03:00
Pavel Emelyanov
49d4760655 cql_test_env: De-static fdpinger variable
So that it could be moved onto cql_test_env as a class member

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:25:25 +03:00
Pavel Emelyanov
749c5baf21 cql_test_env: Define all services' variables early
Nowadays they are all scattered along the .do_with() function. Keeping
them in one early place makes it possible to relocate them onto the
cql_test_env later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:23:54 +03:00
Pavel Emelyanov
d36737f094 cql_test_env: Keep group0_client pointer
It's now a reference, but some time later it won't be possible to
initialize it at construction time, so turn it into a pointer

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 15:23:16 +03:00
Pavel Emelyanov
da98355bc8 test: Remove require_..._exists from cql_test_env
Not used by any code anymore. This makes cql_test_env shorter and nicer

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:46:36 +03:00
Pavel Emelyanov
6ead9a5255 test: Don't use require_table_exists() in test/lib/random_schema
This check is pointless. The subsequent call to find_column_family()
would call on_internal_error() in case schema is not found, and since
cql_test_env sets abort-on-internal-error to true, this would fail just
like that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-12 11:46:36 +03:00
Kamil Braun
59c410fb97 Merge 'migration_manager: announce: provide descriptions for all calls' from Patryk Jędrzejczak
The `system.group0_history` table provides useful descriptions for each
command committed to Raft group 0. One way of applying a command to
group 0 is by calling `migration_manager::announce`. This function has
the `description` parameter set to empty string by default. Some calls
to `announce` use this default value which causes `null` values in
`system.group0_history`. We want `system.group0_history` to have an
actual description for every command, so we change all default
descriptions to reasonable ones.

Going further, we remove the default value for the `description`
parameter of `migration_manager::announce` to avoid using it in the
future. Thanks to this, all commands in `system.group0_history` will
have a non-null description.

Fixes #13370

Closes #14979

* github.com:scylladb/scylladb:
  migration_manager: announce: remove the default value of description
  test: always pass empty description to migration_manager::announce
  migration_manager: announce: provide descriptions for all calls
2023-08-09 16:58:41 +02:00
Pavel Emelyanov
f1515c610e code: Remove query-context.hh
The whole thing is unused now, so the header is no longer needed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:11:07 +03:00
Pavel Emelyanov
413d81ac16 code: Remove qctx
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-08 11:10:56 +03:00
Patryk Jędrzejczak
866c9a904d test: always pass empty description to migration_manager::announce
In the next commit, we remove the default value for the
description parameter of migration_manager::announce to avoid
using it in the future. However, many calls to announce in tests
use the default value. We have to change it, but we don't really
care about descriptions in the tests, so we pass the empty string
everywhere.
2023-08-07 14:38:11 +02:00
Avi Kivity
6c1e44e237 Merge 'Make replica::database and cql3::query_processor share wasm manager' from Pavel Emelyanov
This makes it possible to remove remaining users of the global qctx.

The thing is that db::schema_tables code needs to get wasm's engine, alien runner and instance cache to build wasm context for the merged function or to drop it from cache in the opposite case. To get the wasm stuff, this code uses global qctx -> query_processor -> wasm chain. However, the functions (un)merging code already has the database reference at hand, and it's natural to get wasm stuff from it, not from the q.p. which is not available.

So this PR packs the wasm engine, runner and cache on sharded<wasm::manager> instance, makes the manager be referenced by both q.p. and database and removes the qctx from schema tables code

Closes #14933

* github.com:scylladb/scylladb:
  schema_tables: Stop using qctx
  database: Add wasm::manager& dependency
  main, cql_test_env, wasm: Start wasm::manager earlier
  wasm: Shuffle context::context()
  wasm: Add manager::remove()
  wasm: Add manager::precompile()
  wasm: Move stop() out of query_processor
  wasm: Make wasm sharded<manager>
  query_processor: Wrap wasm stuff in a struct
2023-08-06 17:00:28 +03:00
Tomasz Grabiec
67c7aadded service, raft: Move balance_tablets() to tablet_allocator
The implementation will access metrics registered from tablet_allocator.
2023-08-05 21:48:08 +02:00
Tomasz Grabiec
5bfc8b0445 main, storage_service: Pass tablet allocator to storage_service
Tablet balancing will be done through tablet_allocator later.
2023-08-05 03:10:26 +02:00
Pavel Emelyanov
fa93ac9bfd database: Add wasm::manager& dependency
The dependency is needed by db::schema_tables to get wasm manager for
its needs. This patch prepares the ground. Now the wasm::manager is
shared between replica::database and cql3::query_processor

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Pavel Emelyanov
f4e7ffa0fc main, cql_test_env, wasm: Start wasm::manager earlier
It will be needed by replica::database and should be available that
early. It doesn't depend on anything and can be moved in the starting
order safely

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Pavel Emelyanov
243f2217dd wasm: Make wasm sharded<manager>
The wasm::manager is just cql3::wasm_context renamed. It now sits in
lang/wasm* and is started as a sharded service in main (and cql test
env). This move also needs some headers shuffling, but it's not severe

This change is required to make it possible for the wasm::manager to be
shared (by reference) between q.p. and replica::database further

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-04 19:47:50 +03:00
Kamil Braun
b835acf853 Merge 'Cluster features on raft: topology coordinator + check on boot' from Piotr Dulikowski
This PR implements the functionality of the raft-based cluster features
needed to safely manage and enable cluster features, according to the
cluster features on raft design doc.

Enabling features is a two phase process, performed by the topology
coordinator when it notices that there are no topology changes in
progress and there are some not-yet enabled features that are declared
to be supported by all nodes:

1. First, a global barrier is performed to make sure that all nodes saw
   and persisted the same state of the `system.topology` table as the
   coordinator and see the same supported features of all nodes. When
   booting, nodes are now forbidden to revoke support for a feature if all
   nodes declare support for it, so a successful barrier makes sure that
   no node will restart and disable the features.
2. After a successful barrier, the features are marked as enabled in the
   `system.topology` table.
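
The precondition for this procedure, "features declared supported by all nodes but not yet enabled", is a set computation. A minimal standalone sketch follows. The function name and the map-of-sets representation are assumptions for illustration, not the actual topology_state_machine code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using feature_set = std::set<std::string>;

// Returns the features that every node declares as supported and that
// are not in the enabled set yet. Nodes are keyed by a hypothetical id.
feature_set not_yet_enabled_features(
        const std::map<std::string, feature_set>& supported_by_node,
        const feature_set& enabled) {
    feature_set result;
    if (supported_by_node.empty()) {
        return result;
    }
    // Any feature supported by all nodes must be supported by the first one.
    for (const auto& feature : supported_by_node.begin()->second) {
        bool everywhere = true;
        for (const auto& [node, features] : supported_by_node) {
            if (features.count(feature) == 0) {
                everywhere = false;
                break;
            }
        }
        if (everywhere && enabled.count(feature) == 0) {
            result.insert(feature);
        }
    }
    return result;
}
```

If the result is non-empty and no topology change is in progress, the coordinator would proceed with the barrier and then mark these features as enabled.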

The whole procedure is a group 0 operation and fails if the topology
table is modified in the meantime (e.g. some node changes its supported
features set).

For now, the implementation relies on gossip shadow round check to
protect from nodes without all features joining the cluster. In a
followup, a new joining procedure will be implemented which involves the
topology coordinator and lets it verify joining node's cluster features
before the new node is added to group 0 and to the cluster.

A set of tests for the new implementation is introduced, containing the
same tests as for the non-raft-based cluster feature implementation plus
one additional test, specific to this implementation.

Closes #14722

* github.com:scylladb/scylladb:
  test: topology_experimental_raft: cluster feature tests
  test: topology: fix a skipped test
  storage_service: add injection to prevent enabling features
  storage_service: initialize enabled features from first node
  topology_state_machine: add size(), is_empty()
  group0_state_machine: enable features when applying cmds/snapshots
  persistent_feature_enabler: attach to gossip only if not using raft
  feature_service: enable and check raft cluster features on startup
  storage_service: provide raft_topology_change_enabled flag from outside
  storage_service: enable features in topology coordinator
  storage_service: add barrier_after_feature_update
  topology_coordinator: exec_global_command: make it optional to retake the guard
  topology_state_machine: add calculate_not_yet_enabled_features
2023-08-02 12:32:27 +02:00