First implementation of strongly consistent everywhere tables operates on simple table
representing string to string map.
Add hard-coded schema for broadcast_kv_store table (key text primary key,
value text). This table is under system keyspace and is created if and only if
BROADCAST_TABLES feature is enabled.
There's a cache of endpoint:{dc,rack} on system keyspace cache, but the
local node is not there, because this data is populated from the peers
table, while local node's dc/rack is in snitch (or system.local table).
At the same time, storage_service::join_cluster() and whoever it calls
(e.g. -- the repair) will need this info on start and it's convenient
to have this data on sys-ks cache.
It's not on the peers part of the cache because next branch removes this
map and it's going to be very clumsy to have a whole container with just
one enty in it.
There's a peer code in system_keyspace::setup() that gets the local node
dc/rack and committs it into the system.local table. However, putting
the data into cache is done on .start(). This is because cql-test-env
needs this data cached too, but it doesn't call sys_ks.setup(). Will be
cleaned some other day.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Define an enum class, `group0_upgrade_state`, describing the state of
the upgrade procedure (implemented in later commits).
Provide IDL definitions for (de)serialization.
The node will have its current upgrade state stored on disk in
`system.scylla_local` under the `group0_upgrade_state` key. If the key
is not present we assume `use_pre_raft_procedures` (meaning we haven't
started upgrading yet or we're at the beginning of upgrade).
Introduce `system_keyspace` accessor methods for storing and retrieving
the on-disk state.
Define table_schema_version as a distinct tagged_uuid class,
So it can be differentiated from other uuid-class types,
in particular table_id.
Added reversed(table_schema_version) for convenience
and uniformity since the same logic is currently open coded
in several places.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.
Fixes#11207
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently the INTERNAL_IP state is updated using reconnectable helper
by subscribing on on_join/on_change events from gossiper. The same
subscription exists in storage service (it's a bit more elaborated by
checking if the node is the part of the ring which is OK).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.
Closes#10562
* 'raft_group0_early_startup_v3' of https://github.com/ManManson/scylla:
main: allow joining raft group0 before waiting for gossiper to settle
service: raft_group0: make `join_group0` re-entrant
service: storage_service: add `join_group0` method
raft_group_registry: update gossiper state only on shard 0
raft: don't update gossiper state if raft is enabled early or not enabled at all
gms: feature_service: add `cluster_uses_raft_mgmt` accessor method
db: system_keyspace: add `bootstrap_needed()` method
db: system_keyspace: mark getter methods for bootstrap state as "const"
Querying the table is now done with the help of qctx directly. This
patch replaces it with a querying helper that calls the consumer
function with the entry struct as the argument.
After this change repair code can stop including query_context and
mess with untyped_result_set.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Current code works directly on the qctx which is not nice. Instead,
make it use the system keyspace reference. To make it work, the patch
adds a helper method and introduces a helper struct for the table
entry. This struct will also be used to query the table (next patch).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method checks that bootstrap state is equal to
`NEEDS_BOOTSTRAP`. This will be used later to check
if we are in the state of "fresh" start (i.e. starting
a node from scratch).
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The `bootstrap_complete()`, `bootstrap_in_progress()`,
`was_decommissioned()` and `get_bootstrap_state()` don't
modify internal state, so eligible to be marked as `const`.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The callers are system_keyspace.load_local_host_id and storage service.
The former is non-static since previous patch, the latter has its own
sys.ks. reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
No users of this variable left, all the code relies on system_keyspace
"this" to get it. Respectively, the cache can be a unique_ptr<> on the
system_keyspace instance and the global sharded variable can be removed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
And remove a bunch of (_local)?_cache.invoke_on_all() calls. This
is the preparation for removing the global cache instance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's snitch code that needs it. It now takes messaging service
from gossiper, so it can do the same with system keyspace. This
change removes one user of the global sys.ks. cache instance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The users of get_/set_bootstrap_sate and aux helpers are CDC and
storage service. Both have local system_keyspace references and can
just use them. This removes some users of global system ks. cache
and the qctx thing.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The host id is cached on db::config object that's available in
all the places that need it. This allows removing the method in
question from the system_keyspace and not caring that anyone that
needs host_id would have to depend on system_keyspace instance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The update_table() helper template too. And the update_peer_info as
well. It can stop using global qctx and cache after that
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's called from two places -- .setup() and schema_tables code. Both
have the instance hanging around, so the method can be de-marked
static and set free from global qctx
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This one is a bit more tricky that its four preceeders. The qctx's
qp().execute_cql() is replaced with qp().execute_internal() for
symmetry with the rest. Without data args it's the same.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Yet another same step. Drop static keyword and patch out globals.
Get config.cluster_name from _db while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Just remove static mark and stop using global qctx.
Grab config from _db instead of argument while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Before patching system_keyspace methods to use query processor from
its instance, the respective call is needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's called only on start and actively uses both qctx and local
cache. Next patches will fix the whole setup code to stop using
global qctx/cache.
For now setup invocation is left in its place, but it must really
happen in start() method. More patching is needed to make it work.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
For now it's a reference, but all users of the cache will be
eventually switched into using system_keyspace.
In cql-test-env cache starting happens earlier than it was
before, but that's OK, it just initializes empty instances.
In main cache starts at the same time as before patching.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Start happens at exactly the same place. One thing to take care
of is that it happens on all shards.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The db::system_keyspace was made a class some time ago, time to create
a standard sharded<> object out of it. It needs query processor and
database. None of those depensencies is started early enough, so the
object for now starts in two steps -- early instances creation and
late start.
The instances will carry qctx and local_cache on board and all the
services that need those two will depend on system-keyspace. Its start
happens at exactly the same place where system_keyspace::setup happens
thus any service that will use system_keyspace will be on the same
safe side as it is now.
In the further future the system_keyspace will be equpped with its
own query processor backed by local replica database instance, instead
of the whole storage proxy as it is now.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We add a `peers()` method to `discovery` which returns the peers
discovered until now (including seeds). The caller of functions which
return an output -- `tick` or `request` -- is responsible for persisting
`peers()` before returning the output of `tick`/`request` (e.g. before
sending the response produced by `request` back). The user of
`discovery` is also responsible for restoring previously persisted peers
when constructing `discovery` again after a restart (e.g. if we
previously crashed in the middle of the algorithm).
The `persistent_discovery` class is a wrapper around `discovery` which
does exactly that.
For storage we use a simple local table.
A simple bugfix is also included in the first patch.
* kbr/discovery-persist-v3:
service: raft: raft_group0: persist discovered peers and restore on restart
db: system_keyspace: introduce discovery table
service: raft: discovery: rename `get_output` to `tick`
service: raft: discovery: stop returning peer_list from `request` after becoming leader
This includes most of the connection_notifier stuff as well as
the auxiliary code from system_keyspace.cc and a bunch of
updating calls from the client state changing.
Other than less code and less disk updates on clients connection
paths, this removes one usage of the nasty global qctx thing.
Since the system.clients goes away rename the system.clients_v
here too so the table is always present out there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This table will be used to persist the list of peers discovered by the
`discovery` algorithm that is used for creating Raft group 0 when
bootstrapping a fresh cluster.
`system.raft`, `system.raft_snapshots` and `system.raft_config`
were missing from the `extra_durable_tables` list, so that
`set_wait_for_sync_to_commitlog(true)` was not enabled when
the tables were re-created via `create_table_from_mutations`.
Tests: unit(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20220210073418.484843-1-pa.solodovnikov@scylladb.com>
When performing a change through group 0 (which right now only covers
schema changes), clear entries from group 0 history table which are older
than one week.
This is done by including an appropriate range tombstone in the group 0
history table mutation.
The description parameter is used for the group 0 history mutation.
The default is empty, in which case the mutation will leave
the description column as `null`.
I filled the parameter in some easy places as an example and left the
rest for a follow-up.
This is how it looks now in a fresh cluster with a single statement
performed by the user:
cqlsh> select * from system.group0_history ;
key | state_id | description
---------+--------------------------------------+------------------------------------------------------
history | 9ec29cac-7547-11ec-cfd6-77bb9e31c952 | CQL DDL statement
history | 9beb2526-7547-11ec-7b3e-3b198c757ef2 | null
history | 9be937b6-7547-11ec-3b19-97e88bd1ca6f | null
history | 9be784ca-7547-11ec-f297-f40f0073038e | null
history | 9be52e14-7547-11ec-f7c5-af15a1a2de8c | null
history | 9be335dc-7547-11ec-0b6d-f9798d005fb0 | null
history | 9be160c2-7547-11ec-e0ea-29f4272345de | null
history | 9bdf300e-7547-11ec-3d3f-e577a2e31ffd | null
history | 9bdd2ea8-7547-11ec-c25d-8e297b77380e | null
history | 9bdb925a-7547-11ec-d754-aa2cc394a22c | null
history | 9bd8d830-7547-11ec-1550-5fd155e6cd86 | null
history | 9bd36666-7547-11ec-230c-8702bc785cb9 | Add new columns to system_distributed.service_levels
history | 9bd0a156-7547-11ec-a834-85eac94fd3b8 | Create system_distributed(_everywhere) tables
history | 9bcfef18-7547-11ec-76d9-c23dfa1b3e6a | Create system_distributed_everywhere keyspace
history | 9bcec89a-7547-11ec-e1b4-34e0010b4183 | Create system_distributed keyspace
This table will contain a history of all group 0 changes applied through
Raft. With each change is an associated unique ID, which also identifies
the state of all group 0 tables (including schema tables) after this
change is applied, assuming that all such changes are serialized through
Raft (they will be eventually).
We will use these state IDs to check if a given change is still
valid at the moment it is applied (in `group0_state_machine::apply`),
i.e. that there wasn't a concurrent change that happened between
creating this change and applying it (which may invalidate it).
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937