to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.
we generate a rule for each .hh file to create a corresponding
.cc and then compile it, in order to verify the self-containedness of
that header. so the number of rules is quite large. to avoid the
unnecessary overhead, the check-header target is enabled only if
the `Scylla_CHECK_HEADERS` option is enabled.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15913
After adding the new prepare_new_column_family_announcement that
doesn't assume the existence of a keyspace, we also need to get
rid of the same assumption in all on_before_create_column_family
calls. After all, they may be initiated before creating the
keyspace. However, some listeners require keyspace_metadata, so we
pass it as a new parameter.
This reverts commit 4b80130b0b, reversing
changes made to a5519c7c1f. It's suspected
of causing dtest failures due to a bug in coroutine::parallel_for_each.
After adding the new prepare_new_column_family_announcement that
doesn't assume the existence of a keyspace, we also need to get
rid of the same assumption in all on_before_create_column_family
calls. After all, they may be initiated before creating the
keyspace. However, some listeners require keyspace_metadata, so we
pass it as a new parameter.
We change the type of IDs in CDC_GENERATIONS_V3 to timeuuid to
give them a time-based order. We also change how we initialize
them so that the new CDC generation always has the highest ID.
This is the last step to enabling the efficient clearing of
obsolete CDC generation data.
Additionally, we change the types of current_cdc_generation_uuid,
new_cdc_generation_data_uuid and the second values of the elements
in unpublished_cdc_generations to timeuuid, so that they match id
in CDC_GENERATIONS_V3.
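Why a time-based ID lets the newest generation always have the highest ID can be shown with a toy type. This is a minimal sketch, not Scylla's timeuuid: `time_ordered_id` and `make_id` are hypothetical names, and the only property modeled is that IDs compare by timestamp first, with the remaining bits acting as a tie-breaker. Random (non-time-based) UUIDs would give no such ordering.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for a timeuuid: ordered by timestamp first; the
// node/random bits only break ties between equal timestamps.
struct time_ordered_id {
    std::int64_t timestamp;   // monotonically increasing clock reading
    std::uint64_t node_bits;  // tie-breaker, carries no ordering meaning
};

inline bool operator<(const time_ordered_id& a, const time_ordered_id& b) {
    return std::tie(a.timestamp, a.node_bits) < std::tie(b.timestamp, b.node_bits);
}

// Creating the new generation's ID from a timestamp later than all previous
// ones makes it the highest ID in the table.
inline time_ordered_id make_id(std::int64_t now, std::uint64_t random_bits) {
    return time_ordered_id{now, random_bits};
}
```

Because the timestamp dominates the comparison, an older generation sorts first even if its random bits happen to be larger.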
After moving the creation of the uuid out of
make_new_generation_description, this function only calls the
topology_description_generator's constructor and its generate
method. We could remove this function, but we instead simplify
the code by removing the topology_description_generator class.
We can do this refactor because make_new_generation_description
is the only place using it. We inline its generate method into
make_new_generation_description and turn its private methods into
static functions.
In a future commit, we change how we initialize the uuid of the
new CDC generation in the Raft-based topology. It forces us to
move this initialization out of the make_new_generation_data
function shared between Raft-based and gossiper-based topologies.
We also rename make_new_generation_data to
make_new_generation_description since it only returns
cdc::topology_description now.
We make CDC_GENERATIONS_V3 single-partition by adding the key
column and changing the clustering key from range_end to
(id, range_end). This is the first step to enabling the efficient
clearing of obsolete CDC generation data, which we need to prevent
Raft-topology snapshots from endlessly growing as we introduce new
generations over time. The next step is to change the type of the id
column to timeuuid. We do it in the following commits.
After making CDC_GENERATIONS_V3 single-partition, there is no easy
way of preserving the num_ranges column. As it is used only for
sanity checking, we remove it to simplify the implementation.
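The benefit of a single partition clustered by (id, range_end) can be sketched with a sorted map standing in for the partition. This is a toy model under stated assumptions: `clear_obsolete` is a hypothetical helper, ids are assumed to grow with each new generation, and `std::map::erase` over a range plays the role that a range deletion would play in the real table.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <utility>

// Toy model of one partition: rows sorted by clustering key (id, range_end).
using generation_id = std::int64_t;  // assumed to increase with each generation
using range_end = std::int64_t;
using clustering_key = std::pair<generation_id, range_end>;
using partition = std::map<clustering_key, std::string>;

// All generations older than `oldest_live` form a contiguous prefix of the
// partition, so they can be dropped as one range instead of row by row.
inline void clear_obsolete(partition& p, generation_id oldest_live) {
    // First row of the oldest generation that must be kept.
    auto first_live = p.lower_bound({oldest_live, std::numeric_limits<range_end>::min()});
    p.erase(p.begin(), first_live);
}
```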
In the following commit, we implement the
get_cdc_generation_mutations_v3 function in a way very similar to
get_cdc_generation_mutations_v2. The only differences in creating
mutations between CDC_GENERATIONS_V2 and CDC_GENERATIONS_V3 are:
- a need to set the num_ranges cell for CDC_GENERATIONS_V2,
- different partition keys,
- different clustering keys.
To avoid code duplication, we introduce
get_common_cdc_generation_mutations, which does most of the work
shared by both functions.
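The shape of such a shared helper can be sketched as follows. Everything here is hypothetical and heavily simplified: `toy_mutation` replaces real mutations, `make_common_mutations` stands in for get_common_cdc_generation_mutations, and the fixed partition key `"cdc_generations"` is an invented placeholder. The common part emits one row per range; callers supply the keys, and only the V2 variant sets num_ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified "mutation": keys plus a payload.
struct toy_mutation {
    std::string partition_key;
    std::string clustering_key;
    std::string data;
    int num_ranges = -1;  // only set for the V2-style table
};

using key_fn = std::function<std::string(std::size_t)>;  // range index -> key

// Shared part: one row per token range; callers decide how keys look.
inline std::vector<toy_mutation> make_common_mutations(
        const std::vector<std::string>& ranges, const std::string& partition_key,
        const key_fn& clustering_key_of) {
    std::vector<toy_mutation> out;
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        out.push_back({partition_key, clustering_key_of(i), ranges[i]});
    }
    return out;
}

// V2-style: partitioned by generation id, plus the num_ranges cell.
inline std::vector<toy_mutation> make_mutations_v2(const std::vector<std::string>& ranges,
                                                   const std::string& gen_id) {
    auto muts = make_common_mutations(ranges, gen_id,
        [](std::size_t i) { return "end" + std::to_string(i); });
    for (auto& m : muts) {
        m.num_ranges = static_cast<int>(ranges.size());
    }
    return muts;
}

// V3-style: one fixed partition, clustered by (id, range end).
inline std::vector<toy_mutation> make_mutations_v3(const std::vector<std::string>& ranges,
                                                   const std::string& gen_id) {
    return make_common_mutations(ranges, "cdc_generations",
        [&gen_id](std::size_t i) { return gen_id + ":end" + std::to_string(i); });
}
```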
In the following commits, we modify the CDC_GENERATIONS_V3 schema
to enable efficient clearing of obsolete CDC generation data.
These modifications make the current get_cdc_generation_mutations
work only for the CDC_GENERATIONS_V2 schema, and we need a new
function for CDC_GENERATIONS_V3, so we add the "_v2" suffix.
Now that the endpoint_state isn't changed in place
we do not need to copy it to each subscriber.
We can rather just pass the lw_shared_ptr holding
a snapshot of it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
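Sharing one immutable snapshot with all subscribers can be sketched like this. It is a toy, not gossiper code: `std::shared_ptr<const endpoint_state>` stands in for `seastar::lw_shared_ptr`, and `toy_gossiper`, `update` and `subscriber` are hypothetical names. An update copies the state once, modifies the copy, swaps the pointer, and hands the same pointer to every subscriber.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy endpoint state; never mutated once published.
struct endpoint_state {
    std::map<std::string, std::string> app_states;
};
using endpoint_state_ptr = std::shared_ptr<const endpoint_state>;
using subscriber = std::function<void(const endpoint_state_ptr&)>;

struct toy_gossiper {
    endpoint_state_ptr current = std::make_shared<endpoint_state>();
    std::vector<subscriber> subscribers;

    void update(const std::string& key, const std::string& value) {
        auto next = std::make_shared<endpoint_state>(*current);  // copy once
        next->app_states[key] = value;
        current = std::move(next);  // publish the new snapshot
        for (auto& s : subscribers) {
            s(current);  // every subscriber shares it, no per-subscriber copy
        }
    }
};
```

Holders of the previous snapshot keep seeing a consistent, unchanged state.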
No need to look up the application_state again using the
endpoint, as both callers already have a reference to
the endpoint_state handy.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Before changing _endpoint_state_map to hold a
lw_shared_ptr<endpoint_state>, provide synchronous helpers
for users to traverse all endpoint_states with no need
to copy them (as long as the called func does not yield).
With that, gossiper::get_endpoint_states() can be made private.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
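A synchronous traversal helper of this kind might look like the sketch below. The class and the `for_each_endpoint_state` signature are assumptions for illustration, not the gossiper's actual API. The callback receives references into the map rather than a copy, which is only safe because it runs synchronously; in Seastar code it must not yield while holding those references.

```cpp
#include <cassert>
#include <map>
#include <string>

struct endpoint_state {
    int heartbeat = 0;
};

// Hypothetical gossiper with a synchronous, copy-free traversal helper.
class toy_gossiper {
    std::map<std::string, endpoint_state> _endpoint_state_map;

public:
    void set(const std::string& ep, endpoint_state st) { _endpoint_state_map[ep] = st; }

    // `func` sees each (endpoint, state) pair by const reference.
    // It must not defer work that outlives this call.
    template <typename Func>
    void for_each_endpoint_state(Func&& func) const {
        for (const auto& [ep, st] : _endpoint_state_map) {
            func(ep, st);
        }
    }
};
```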
As was described in the previous patch, this method is explicitly called
by storage service after updating the bootstrap state, so it's unneeded.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
legacy_handle_cdc_generation() checks whether the node has bootstrapped
with the help of a system_keyspace method. legacy_handle_cdc_generation()
itself is called in two cases -- on boot via cdc_generation_service::after_join()
and via gossiper on_...() notifications. The notifications, in turn, are set up
in the very same after_join().
The after_join(), in turn, is called from storage_service explicitly
after the bootstrap state is updated to be "complete", so the check for
the state in legacy_handle_...() seems unnecessary. However, there's
still one case in which the check takes effect -- decommission. When performed,
it calls storage_service::leave_ring(), which updates the bootstrap state
to be "needed", thus preventing the cdc gen. service from doing anything
inside gossiper's on_...() notifications.
It's more correct to stop the cdc gen. service from handling gossiper
notifications by unsubscribing it, rather than by adding fragile implicit
dependencies on the bootstrap state.
Checks for sys.dist.ks in the legacy_handle_...() are kept in the form
of on_internal_error() calls. The system distributed keyspace is activated by
storage service even before the bootstrap state is updated and is
never deactivated, but it's good to have this assertion anyway.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This series cleans up and hardens the endpoint locking design and
implementation in the gossiper and endpoint-state subscribers.
We make sure that all notifications (except for `before_change`, which
apparently can be dropped) are called under lock_endpoint, as well as
all calls to gossiper::replicate, to serialize endpoint_state changes
across all shards.
An endpoint lock gets a unique permit_id that is passed to the
notifications and passed back by them if the notification functions call
the gossiper back for the same endpoint on paths that modify the
endpoint_state and may acquire the same endpoint lock - to prevent a
deadlock.
Fixes scylladb/scylladb#14838
Refs scylladb/scylladb#14471
Closes #14845
* github.com:scylladb/scylladb:
gossiper: replicate: ensure non-null permit
gossiper: add_saved_endpoint: lock_endpoint
gossiper: mark_as_shutdown: lock_endpoint
gossiper: real_mark_alive: lock_endpoint
gossiper: advertise_token_removed: lock_endpoint
gossiper: do_status_check: lock_endpoint
gossiper: remove_endpoint: lock_endpoint if needed
gossiper: force_remove_endpoint: lock_endpoint if needed
storage_service: lock_endpoint when removing node
gossiper: use permit_id to serialize state changes while preventing deadlocks
gossiper: lock_endpoint: add debug messages
utils: UUID: make default tagged_uuid ctor constexpr
gossiper: lock_endpoint must be called on shard 0
gossiper: replicate: simplify interface
gossiper: mark_as_shutdown: make private
gossiper: convict: make private
gossiper: mark_as_shutdown: do not call convict
Pass permit_id to subscribers when we acquire one
via lock_endpoint. The subscribers then pass it back to
gossiper for paths that acquire lock_endpoint for
the same endpoint, to detect nested locks when the endpoint
is locked with the same permit_id.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
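The permit_id idea can be illustrated with a single-threaded model. This is a hypothetical sketch, not gossiper::lock_endpoint: the `endpoint_locks` class, its `lock`/`unlock` signatures, and the exceptions are invented for illustration. The real code is asynchronous and would wait for a lock held by another permit; here that case throws to keep the sketch small. The key point is that a nested call presenting the permit that already holds the lock succeeds without re-locking, instead of deadlocking.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

using permit_id = std::uint64_t;
constexpr permit_id null_permit = 0;

// Hypothetical single-threaded model of per-endpoint locks with permits.
class endpoint_locks {
    std::map<std::string, permit_id> _holders;  // endpoint -> holding permit
    permit_id _next = 1;

public:
    // Returns {permit, acquired}; `acquired` is false for a nested call made
    // with the permit that already holds the lock (e.g. from a notification).
    std::pair<permit_id, bool> lock(const std::string& ep, permit_id pid = null_permit) {
        auto it = _holders.find(ep);
        if (it == _holders.end()) {
            if (pid != null_permit) {
                throw std::logic_error("permit does not hold this endpoint's lock");
            }
            permit_id fresh = _next++;  // unique permit for this acquisition
            _holders.emplace(ep, fresh);
            return {fresh, true};
        }
        if (pid != null_permit && it->second == pid) {
            return {pid, false};  // nested: same permit, no deadlock
        }
        // A real implementation would wait here; the sketch just reports it.
        throw std::runtime_error("endpoint is locked by another permit");
    }

    void unlock(const std::string& ep, permit_id pid) {
        auto it = _holders.find(ep);
        if (it != _holders.end() && it->second == pid) {
            _holders.erase(it);
        }
    }
};
```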
We add the CDC generation optimality check in
`storage_service::raft_check_and_repair_cdc_streams` so that it doesn't
create new generations when unnecessary. Since
`generation_service::check_and_repair_cdc_streams` already has this
check, we extract it to the new `is_cdc_generation_optimal` function to
not duplicate the code.
After this change, multiple tasks could wait for a single generation
change. Calling `signal` on `topology_state_machine.event` wouldn't wake
them all. Moreover, we must ensure the topology coordinator wakes up when
its logic expects it. Therefore, we change all `signal` calls on
`topology_state_machine.event` to `broadcast`.
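The difference between `signal` and `broadcast` for an event with several waiters can be modeled single-threaded, in the spirit of a Seastar condition variable where waiters are queued continuations. `toy_event` is a hypothetical name and this is only a model of the wake-one vs. wake-all semantics, not the real type behind `topology_state_machine.event`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Single-threaded toy: waiters are queued continuations.
class toy_event {
    std::deque<std::function<void()>> _waiters;

public:
    void wait(std::function<void()> resume) { _waiters.push_back(std::move(resume)); }

    // Resumes only the oldest waiter; the others keep waiting.
    void signal() {
        if (!_waiters.empty()) {
            auto w = std::move(_waiters.front());
            _waiters.pop_front();
            w();
        }
    }

    // Resumes every waiter queued at the time of the call.
    void broadcast() {
        auto waiters = std::move(_waiters);
        _waiters.clear();
        for (auto& w : waiters) {
            w();
        }
    }

    std::size_t waiting() const { return _waiters.size(); }
};
```

With three tasks waiting for the same generation change, one `signal()` leaves two of them stuck, while `broadcast()` resumes all of them.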
We delay the deletion of the `new_cdc_generation` request to the moment
when the topology transition reaches the `publish_cdc_generation` state.
We need this change to ensure the added CDC generation optimality check
in the next commit has the intended effect. If we didn't make it, it
would be possible that a task makes the `new_cdc_generation` request,
and then, after this request was removed but before committing the new
generation, another task also makes the `new_cdc_generation` request. In
such a scenario, two generations would be created, but only one should be. After
delaying the deletion of `new_cdc_generation` requests, the second
request would have no effect.
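Why keeping the request until `publish_cdc_generation` makes a concurrent second request a no-op can be shown with a tiny state model. This is a hypothetical sketch: `toy_topology` and its members are invented names, and the coordinator's work is reduced to two steps (create the generation, then reach the publish step).

```cpp
#include <cassert>

// Hypothetical model: at most one pending new_cdc_generation request, which
// is deleted only when the transition reaches the publish step.
struct toy_topology {
    bool new_cdc_generation_requested = false;
    bool generation_in_progress = false;
    int generations_created = 0;

    // Setting an already-set flag has no effect while a request is pending.
    void request_new_generation() { new_cdc_generation_requested = true; }

    // Coordinator creates the generation but keeps the request in place.
    void coordinator_create() {
        if (new_cdc_generation_requested && !generation_in_progress) {
            ++generations_created;
            generation_in_progress = true;
        }
    }

    // Reaching publish_cdc_generation: only now is the request removed.
    void coordinator_publish() {
        if (generation_in_progress) {
            generation_in_progress = false;
            new_cdc_generation_requested = false;
        }
    }
};
```

If the request were removed in `coordinator_create()` instead, the second request would set the flag again and a second generation would be created after publishing.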
Additionally, we modify the `test_topology_ops.py` test in a way that
verifies the new changes. We call
`storage_service::raft_check_and_repair_cdc_streams` multiple times
concurrently and verify that exactly one generation has been created.
Fixes #14055
Closes #14789
* github.com:scylladb/scylladb:
storage_service: raft_check_and_repair_cdc_streams: don't create a new generation if current one is optimal
storage_service: delay deletion of the new_cdc_generation request
raft topology: broadcast on topology_state_machine.event instead of signal
cdc: implement the is_cdc_generation_optimal function
In the following commits, we add the CDC generation optimality
check to storage_service::raft_check_and_repair_cdc_streams so
that it doesn't create new CDC generations when unnecessary. Since
generation_service::check_and_repair_cdc_streams already has
this check, we extract it to the new is_cdc_generation_optimal
function to not duplicate the code.
In preparation for ensuring safe access to the column-family-related
maps, add tables_metadata, whose members will be protected
by an rwlock.
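Grouping the maps behind one reader-writer lock might look roughly like this. It is a sketch under assumptions: Scylla would use a Seastar rwlock, whereas `std::shared_mutex` stands in here, and `add_table`/`find_table` as well as the `"ks.cf"` keying are hypothetical. Writers take the lock exclusively; readers share it.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

struct table {
    std::string name;
};

// Hypothetical tables_metadata: all access to the map goes through the lock.
class tables_metadata {
    mutable std::shared_mutex _lock;
    std::map<std::string, std::shared_ptr<table>> _tables;  // keyed by "ks.cf"

public:
    void add_table(const std::string& ks_cf, std::shared_ptr<table> t) {
        std::unique_lock guard(_lock);  // exclusive: mutates the map
        _tables[ks_cf] = std::move(t);
    }

    std::shared_ptr<table> find_table(const std::string& ks_cf) const {
        std::shared_lock guard(_lock);  // shared: concurrent readers allowed
        auto it = _tables.find(ks_cf);
        return it == _tables.end() ? nullptr : it->second;
    }
};
```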
This test limits `commitlog_segment_size_in_mb` to 2, thus `max_command_size`
is limited to less than 1 MB. It adds an injection which copies mutations
generated by `get_cdc_generation_mutations` n times, where n is picked so that
the memory size of all mutations exceeds `max_command_size`.
This test passes if cdc generation data is committed by raft in multiple commands.
If all the data is committed in a single command, the leader node will loop, trying
to send the raft command and getting the error:
```
storage_service - raft topology: topology change coordinator fiber got error raft::command_is_too_big_error (Command size {} is greater than the configured limit {})
```
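Committing the data in several commands amounts to packing mutations into batches that each stay under `max_command_size`. The sketch below is a hypothetical greedy packer working on mutation sizes in bytes; `split_into_commands` is an invented name, not the code under test, and each returned batch would correspond to one Raft command.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Greedily pack mutation sizes into batches no larger than max_command_size.
inline std::vector<std::vector<std::size_t>> split_into_commands(
        const std::vector<std::size_t>& mutation_sizes, std::size_t max_command_size) {
    std::vector<std::vector<std::size_t>> commands;
    std::size_t current_size = 0;
    for (std::size_t sz : mutation_sizes) {
        if (sz > max_command_size) {
            // A single oversized mutation cannot be fixed by batching.
            throw std::invalid_argument("single mutation exceeds max_command_size");
        }
        if (commands.empty() || current_size + sz > max_command_size) {
            commands.emplace_back();  // start a new command
            current_size = 0;
        }
        commands.back().push_back(sz);
        current_size += sz;
    }
    return commands;
}
```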
this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `cdc::generation_id` and `db_clock::time_point` without the help of `operator<<`.
the formatter of `cdc::generation_id` uses that of `db_clock::time_point`, so these two commits are posted together in a single pull request.
the corresponding `operator<<()` is removed in this change, as all its callers now use fmtlib for formatting.
Refs #13245
Closes #13703
* github.com:scylladb/scylladb:
db_clock: specialize fmt::formatter<db_clock::time_point>
cdc: generation: specialize fmt::formatter<generation_id>
as this syntax is not supported by the standard, it seems clang
just silently constructs the value with the initializer list and
calls operator=, but GCC complains:
```
/home/kefu/dev/scylladb/cdc/split.cc:392:54: error: converting to ‘std::optional<partition_deletion>’ from initializer list would use explicit constructor ‘constexpr std::optional<_Tp>::optional(_Up&&) [with _Up = const tombstone&; typename std::enable_if<__and_v<std::__not_<std::is_same<std::optional<_Tp>, typename std::remove_cv<typename std::remove_reference<_Iter>::type>::type> >, std::__not_<std::is_same<std::in_place_t, typename std::remove_cv<typename std::remove_reference<_Iter>::type>::type> >, std::is_constructible<_Tp, _Up>, std::__not_<std::is_convertible<_Iter, _Iterator> > >, bool>::type <anonymous> = false; _Tp = partition_deletion]’
392 | _result[t.timestamp].partition_deletions = {t};
| ^
```
to silence the error, and to be more standard-compliant,
let's use emplace() instead.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
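The difference between braced assignment and `emplace()` can be reproduced in a few lines. This is a simplified stand-in, not the code in cdc/split.cc: here the conversion from `tombstone` is made non-implicit through an explicit constructor (an assumption; the real `partition_deletion` may differ), which is enough to make `std::optional`'s converting constructor explicit and the braced assignment ill-formed, as in the GCC error above. `record` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct tombstone {
    std::int64_t timestamp = 0;
};

// Simplified stand-in: not implicitly convertible from tombstone.
struct partition_deletion {
    tombstone t;
    explicit partition_deletion(const tombstone& tomb) : t(tomb) {}
};

inline void record(std::optional<partition_deletion>& slot, const tombstone& t) {
    // slot = {t};  // rejected by GCC: copy-list-initialization would need an
    //              // explicit constructor
    slot.emplace(t);  // direct-initializes partition_deletion(t) in place
}
```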
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `generation_id` without the help of `operator<<`.
the corresponding `operator<<()` is removed in this change, as all its
callers now use fmtlib for formatting.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
if the visitor clauses are the same, we can just use the generic version
of it by specifying the parameter with `auto&`. simpler this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13626
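Replacing per-alternative visitor clauses that share one body with a generic lambda looks roughly like this. The variant type and `element_count` are hypothetical examples chosen so that every alternative supports the same expression; the parameter is `const auto&` here because the visitor only reads.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

using container = std::variant<std::vector<int>, std::string>;

// Before: one operator() overload per alternative, each returning v.size().
// After: a single generic lambda; `auto` deduces the active alternative's type.
inline std::size_t element_count(const container& c) {
    return std::visit([](const auto& alt) { return alt.size(); }, c);
}
```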
in C++20, the compiler rewrites `a != b` as `!(a == b)` if the
corresponding operator==() is already defined; the language now
understands that the comparison is symmetric in the new standard.
fortunately, our operator!=() is always equivalent to
`! operator==()`, which matches the behavior of the rewritten
operator!=(). so, in this change, all `operator!=`
are removed.
in addition to the rewritten operator!=, C++20 also brings us
the defaulted operator==() -- the compiler is able to generate an
operator==() that performs a member-wise comparison.
under some circumstances, this is exactly what we need. so,
in this change, if the operator==() is also implemented as
a member-wise comparison of all member variables of the
class/struct in question, it is replaced with the compiler-generated
one by removing its body and marking the function as
`= default`. moreover, if the class happens to have other comparison
operators which are implemented using lexicographical comparison,
the defaulted `operator<=>` is used in place of
the defaulted `operator==`.
sometimes we failed to mark the operator== with the `const`
specifier; in this change, to fulfill the requirements of the C++
standard, and to be more correct, the `const` specifier is added.
also, to generate the defaulted operator==, the operand should
be `const class_name&`, but this is not always the case: in the
class `version`, we use `version` as the parameter type. to
fulfill the requirements of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantics of the comparison operator, and is a more idiomatic
way to pass non-trivial structs as function parameters.
please note, because in C++20 both operator== and operator<=> are
symmetric, some of the operators in `multiprecision` are removed.
they are the symmetric form of another variant. if they were
not removed, the compiler would, for instance, report an ambiguous
overload for operator '=='.
this change is a cleanup to modernize the code base with C++20
features.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13687
now that C++20 generates operator== for us, there is no need to
handcraft it. also, since C++17, the standard library offers a
default implementation of operator== for `std::variant<>`, so there is no need
to implement it ourselves.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13625
and provide accessor functions to get them.
1. So they can't be modified by mistake, as the versioned value is
immutable. A new value must have a higher version.
2. In preparation for making the version a strong gms::version_type.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
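Making the members private with read-only accessors can be sketched as follows. This is a toy, not `gms::versioned_value`: the accessor names `value()`/`version()`, the plain `std::int64_t` version, and the `with_value` helper are assumptions. A changed value is a new object, and the helper refuses a version that is not higher.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Toy versioned value: immutable once constructed.
class versioned_value {
    std::string _value;
    std::int64_t _version;

public:
    versioned_value(std::string value, std::int64_t version)
        : _value(std::move(value)), _version(version) {}

    const std::string& value() const noexcept { return _value; }
    std::int64_t version() const noexcept { return _version; }

    // Produce a successor instead of mutating in place.
    versioned_value with_value(std::string new_value, std::int64_t new_version) const {
        if (new_version <= _version) {
            throw std::invalid_argument("new version must be higher");
        }
        return versioned_value(std::move(new_value), new_version);
    }
};
```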
When a node notices that a new CDC generation was introduced in
`storage_service::topology_state_load`, it updates its internal data
structures that are used when coordinating writes to CDC log tables.
`cdc::generation_service::make_new_cdc_generation` would create a new
CDC generation and insert it into the `CDC_GENERATIONS_V2` table these
days. For Raft-based topology changes we'll do the data insertion
somewhere else - in topology coordinator code. So extract the parts for
calculating the CDC generation to free-standing functions (these are
almost pure calculations, modulo accessing RNG).
The `CDC_GENERATIONS_V3` table schema is a copy-paste of the
`CDC_GENERATIONS_V2` schema. The difference is that V2 lives in
`system_distributed_keyspace` and writes to it are distributed using
regular `storage_proxy` replication mechanisms based on the token ring.
The V3 table lives in `system_keyspace` and any mutations written to it
will go through group 0.
Also extend the `TOPOLOGY` schema with new columns:
- `new_cdc_generation_data_uuid` will be stored as part of a bootstrapping
node's `ring_slice`. It stores the UUID of a newly introduced CDC
generation, which is used as the partition key for the `CDC_GENERATIONS_V3`
table to access this new generation's data. It's a regular column,
meaning that every row (corresponding to a node) will have its own.
- `current_cdc_generation_uuid` and `current_cdc_generation_timestamp`
together form the ID of the newest CDC generation in the cluster.
(the uuid is the data key for `CDC_GENERATIONS_V3`, the timestamp is
when the CDC generation starts operating). Those are static columns
since there's a single newest CDC generation.
The function used to generate a mutation timestamp for itself; take it as a
parameter instead. We'll use timestamps provided by Group 0 APIs when
creating CDC generations during Group 0-based topology changes.
The function used to obtain the sharding info for a given node (its
number of shards and ignore_msb_bits) was using gossiper application
states.
We want to reuse `topology_description_generator` to build CDC
generations when doing Raft Group 0-based topology changes, so make
`get_sharding_info` a parameter.
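Turning the sharding-info lookup into a parameter decouples the calculation from where the data comes from (gossiper application states or Raft topology). The sketch below is hypothetical: `get_sharding_info_fn` and `total_shards` are invented, and the "calculation" merely sums shard counts as a stand-in for the real generation logic.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct sharding_info {
    unsigned shard_count;
    unsigned ignore_msb_bits;
};

// The caller decides how a node's sharding info is obtained.
using get_sharding_info_fn = std::function<sharding_info(const std::string& node)>;

// Toy calculation standing in for building a generation: it only reads
// sharding info through the injected function.
inline unsigned total_shards(const std::vector<std::string>& nodes,
                             const get_sharding_info_fn& get_sharding_info) {
    unsigned total = 0;
    for (const auto& n : nodes) {
        total += get_sharding_info(n).shard_count;
    }
    return total;
}
```

A Raft-based caller could pass a lookup into its topology state, while the gossiper-based path keeps passing a function that reads application states.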
It was a `static` function inside system_distributed_keyspace. Later it
will be used for another table living in system_keyspace, so move it
outside, to the CDC generations module, and make it accessible from
other places.
the default generated operator<=> is exactly the same as the
handcrafted one. so let the compiler do its job. also, since
operator<=> is defaulted, there is no need to define operator==
anymore, so drop it as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
they are part of the CQL type system, and are "closer" to types.
let's move them into "types" directory.
the build systems are updated accordingly.
the source files referencing `types.hh` were updated using following
command:
```
find . \( -name "*.cc" -o -name "*.hh" \) -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} +
```
the source files under sstables include "types.hh", which is
indeed the one located under "sstables", so include "sstables/types.hh"
instead, to make it more explicit.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #12926
these warnings were found by Clang-17 after removing
`-Wno-unused-lambda-capture` and `-Wno-unused-variable` from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.
Closes #12858
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.
mutation_reader remains in the readers/ module.
mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.
This is a step forward towards librarization or modularization of the
source base.
Closes #12788
When we start allowing NULL in lists in some contexts, the exact
location where an error is raised (when it's disallowed) will
change. To prepare for that, relax the exception check to just
ensure the word NULL is there, without caring about the exact
wording.
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization_format everywhere, except in the IDL
(since it's part of the inter-node protocol).
A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.
Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization
format when receiving. The IDL is unchanged.
One test checking the 16-bit serialization format is removed.