Commit Graph

123 Commits

Author SHA1 Message Date
Tomasz Grabiec
9da3bd84c7 dht: Extract dht::static_sharder
Before the patch, dht::sharder could be instantiated and it would
behave like a static sharder. This is not safe with regards to
extensions of the API because if a derived implementation forgets to
override some method, it would incorrectly default to the
implementation from static sharder. Better to fail the compilation in
this case, so extract static sharder logic to dht::static_sharder
class and make all methods in dht::sharder pure virtual.

This also allows us to have algorithms indicate that they only work
with static sharder by accepting the type, and have compile-time
safety for this requirement.

schema::get_sharder() is changed to return the static_sharder&.
2024-05-16 00:28:47 +02:00
Patryk Jędrzejczak
628d7e709e cdc: generation: fix retrieve_generation_data_v2
`system_keyspace::read_cdc_generation_opt` queries
`system.cdc_generations_v3`, which stores ids of CDC generations
as timeuuids. This function shouldn't be called with a normal uuid
(used by `system.cdc_generations_v2` to store generation ids).
Such a call would end with a marshaling error.

Before this patch,`retrieve_generation_data_v2` could call
`system_keyspace::read_cdc_generation_opt` with a normal uuid if
the generation wasn't present in `system.cdc_generations_v2`.
This logic caused a marshaling error while handling the
`check_and_repair_cdc_streams` request in the
`cdc_test.TestCdc.test_check_and_repair_cdc_streams_liveness` dtest.

This patch fixes the code being added in 6.0, no need to backport it.

Fixes scylladb/scylladb#18473

Closes scylladb/scylladb#18483
2024-05-06 09:12:47 +02:00
Patryk Jędrzejczak
3a34bb18cd db: config: make consistent-topology-changes unused
We make the `consistent-topology-changes` experimental feature
unused and assumed to be true in 6.0. We remove code branches that
executed if `consistent-topology-changes` was disabled.
2024-04-25 14:33:21 +02:00
Kefu Chai
372a4d1b79 treewide: do not define FMT_DEPRECATED_OSTREAM
since we do not rely on FMT_DEPRECATED_OSTREAM to define the
fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`.

in this change,

* utils: drop the range formatters in to_string.hh and to_string.c, as
  we don't use them anymore. and the tests for them in
  test/boost/string_format_test.cc are removed accordingly.
* utils: use fmt to print chunk_vector and small_vector. as
  we are not able to print the elements using operator<< anymore
  after switching to {fmt} formatters.
* test/boost: specialize fmt::details::is_std_string_like<bytes>
  due to a bug in {fmt} v9, {fmt} fails to format a range whose
  element type is `basic_sstring<uint8_t>`, as it considers it
  as a string-like type, but `basic_sstring<uint8_t>`'s char type
  is signed char, not char. this issue does not exist in {fmt} v10,
  so, in this change, we add a workaround to explicitly specialize
  the type trait to assure that {fmt} format this type using its
  `fmt::formatter` specialization instead of trying to format it
  as a string. also, {fmt}'s generic ranges formatter calls the
  pair formatter's `set_brackets()` and `set_separator()` methods
  when printing the range, but operator<< based formatter does not
  provide these method, we have to include this change in the change
  switching to {fmt}, otherwise the change specializing
  `fmt::details::is_std_string_like<bytes>` won't compile.
* test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends
  for comparing values. but without the operator<< based formatters,
  Boost.Test would not be able to print them. after removing
  the homebrew formatters, we need to use the generic
  `boost_test_print_type()` helper to do this job. so we are
  including `test_utils.hh` in tests so that we can print
  the formattable types.
* treewide: add "#include "utils/to_string.hh" where
  `fmt::formatter<optional<>>` is used.
* configure.py: do not define FMT_DEPRECATED_OSTREAM
* cmake: do not define FMT_DEPRECATED_OSTREAM

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:57:36 +08:00
Benny Halevy
fceb1183d3 cdc: should_propose_first_generation: get my_host_id from caller
There is no need to map this node's inet_address to host_id.
The storage_service can easily just pass the local host_id.
While at it, get the other node's host_id directly
from their endpoint_state instead of looking it up
yet again in the gossiper, using the nodes' address.

Refs #12283

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-20 12:53:49 +02:00
Patryk Jędrzejczak
0470b721c2 cdc: generation: allow increasing generation_leeway through error injection
The increased `generation_leeway` is used in the next patch to
write a test. Since it's no longer a constant, we create a new
getter for it.
2024-02-12 10:14:00 +01:00
Piotr Dulikowski
d04b3338ce cdc/generation_service: in legacy mode, fall back to raft tables
When a node enters recovery after being in raft topology mode, topology
operations switch back to legacy mode. We want CDC to keep working when
that happens, so we need for the legacy code to be able to access
generations created back in raft mode - so that the node can still
properly serve writes to CDC log tables.

In order to make this possible, modify the legacy logic to also look for
a cdc generation in raft tables, if it is not found in legacy tables.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
77a8f5e3d6 cdc/generation_service: turn off gossip notifications in raft topo mode
In raft topology mode CDC information is propagated through group 0.
Prevent the generation service from reacting to gossiper notifications
after we made the switch to raft mode.
2024-02-08 19:12:28 +01:00
Kefu Chai
6c06751640 cdc: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16725
2024-01-11 09:13:37 +02:00
Benny Halevy
ad8a9104d8 endpoint_state subscriptions: batch on_change notification
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.

In particular, thise allows storage_service::on_change
to update_peer_info once for all changed states.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Petr Gusev
7b55ccbd8e token_metadata: drop the template
Replace token_metadata2 ->token_metadata,
make token_metadata back non-template.

No behavior changes, just compilation fixes.
2023-12-12 23:19:54 +04:00
Petr Gusev
799f747c8f shared_token_metadata: switch to the new token_metadata 2023-12-12 23:19:54 +04:00
Petr Gusev
7eb7863635 cdc: switch to token_metadata2
Change the token_metadata type to token_metadata2 in
the signatures of CDC-related methods in
storage_service and cdc/generation. Use
get_new_strong to get a pointer to the new host_id-based
token_metadata from the inet_address-based one,
living in the shared_token_metadata.

The starting point of the patch is in
storage_service::handle_global_request. We change the
tmptr type to token_metadata2 and propagate the change
down the call chains. This includes token-related methods
of the boot_strapper class.
2023-12-12 23:19:53 +04:00
Benny Halevy
25754f843b gossiper: add get_this_endpoint_state_ptr
Returns this node's endpoint_state_ptr.
With this entry point, the caller doesn't need to
get_broadcast_address.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Michael Huang
62a8a31be7 cdc: use chunked_vector for topology_description entries
Lists can grow very big. Let's use a chunked vector to prevent large contiguous
allocations.
Fixes: #15302.

Closes scylladb/scylladb#15428
2023-09-18 23:17:01 +03:00
Patryk Jędrzejczak
1c58c6336a system_keyspace: change id to timeuuid in CDC_GENERATIONS_V3
We change the type of IDs in CDC_GENERATIONS_V3 to timeuuid to
give them a time-based order. We also change how we initialize
them so that the new CDC generation always has the highest ID.
This is the last step to enabling the efficient clearing of
obsolete CDC generation data.

Additionally, we change the types of current_cdc_generation_uuid,
new_cdc_generation_data_uuid and the second values of the elements
in unpublished_cdc_generations to timeuuid, so that they match id
in CDC_GENERATIONS_V3.
2023-09-12 11:43:34 +02:00
Patryk Jędrzejczak
fab066cffe cdc: generation: remove topology_description_generator
After moving the creation of uuid out of
make_new_generation_description, this function only calls the
topology_description_generator's constructor and its generate
method. We could remove this function, but we instead simplify
the code by removing the topology_description_generator class.
We can do this refactor because make_new_generation_description
is the only place using it. We inline its generate method into
make_new_generation_description and turn its private methods into
static functions.
2023-09-12 11:18:54 +02:00
Patryk Jędrzejczak
3bf4cac72e cdc: do not create uuid in make_new_generation_data
In the future commit, we change how we initialize uuid of the
new CDC generation in the Raft-based topology. It forces us to
move this initialization out of the make_new_generation_data
function shared between Raft-based and gossiper-based topologies.

We also rename make_new_generation_data to
make_new_generation_description since it only returns
cdc::topology_description now.
2023-09-12 11:18:38 +02:00
Patryk Jędrzejczak
2cd430ac80 system_kayspace: make CDC_GENERATIONS_V3 single-partition
We make CDC_GENERATIONS_V3 single-partition by adding the key
column and changing the clustering key from range_end to
(id, range_end). This is the first step to enabling the efficient
clearing of obsolete CDC generation data, which we need to prevent
Raft-topology snapshots from endlessly growing as we introduce new
generations over time. The next step is to change the type of the id
column to timeuuid. We do it in the following commits.

After making CDC_GENERATIONS_V3 single-partition, there is no easy
way of preserving the num_ranges column. As it is used only for
sanity checking, we remove it to simplify the implementation.
2023-09-12 09:51:45 +02:00
Patryk Jędrzejczak
29f54836d0 cdc: generation: introduce get_common_cdc_generation_mutations
In the following commit, we implement the
get_cdc_generation_mutations_v3 function very similar to
get_cdc_generation_mutations_v2. The only differences in creating
mutations between CDC_GENERATIONS_V2 and CDC_GENERATIONS_V3 are:
- a need to set the num_ranges cell for CDC_GENERATIONS_V2,
- different partition keys,
- different clustering keys.

To avoid code duplication, we introduce
get_common_cdc_generation_mutations, which does most of the work
shared by both functions.
2023-09-12 09:37:21 +02:00
Patryk Jędrzejczak
ed1c1369d9 cdc: generation: rename get_cdc_generation_mutations
In the following commits, we modify the CDC_GENERATIONS_V3 schema
to enable efficient clearing of obsolete CDC generation data.
These modifications make the current get_cdc_generation_mutations
work only for the CDC_GENERATIONS_V2 schema, and we need a new
function for CDC_GENERATIONS_V3, so we add the "_v2" suffix.
2023-09-11 12:30:21 +02:00
Benny Halevy
c16ec870da gms: pass endpoint_state_ptr to endpoint_state change subscribers
Now that the endpoint_state isn't change in place
we do not need to copy it to each subscriber.
We can rather just pass the lw_shared_ptr holding
a snapshot of it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 09:35:15 +03:00
Benny Halevy
1e0d19b89d cdc/generation: get_generation_id_for: get endpoint_state&
No need to lookup the application_state again using the
endpoint, as both callers already have a reference to
the endpoint_state handy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 08:34:23 +03:00
Benny Halevy
4f5ffc7719 gossiper: add for_each_endpoint_state helpers
Before changing _endpoint_state_map to hold a
lw_shared_ptr<endpoint_state>, provide synchronous helpers
for users to traverse all endpoint_states with no need
to copy them (as long as the called func does not yield).

With that, gossiper::get_endpoint_states() can be made private.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-31 08:32:31 +03:00
Pavel Emelyanov
4bf8f693ee cdc: Remove bootstrap state assertion from after_join()
As was described in the previous patch, this method is explicitly called
by storage service after updating the bootstrap state, so it's unneeded

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 09:47:35 +03:00
Pavel Emelyanov
566f57b683 cdc: Rework gen. service check for bootstrap state
The legacy_handle_cdc_generation() checks if the node had bootstrapped
with the help of system_keyspace method. The former is called in two
cases -- on boot via cdc_generation_service::after_join() and via
gossiper on_...() notifications. The notifications, in turn, are set up
in the very same after_join().

The after_join(), in turn, is called from storage_service explicitly
after the bootstrap state is updated to be "complete", so the check for
the state in legacy_handle_...() seems unnecessary. However, there's
still the case when it may be stepped on -- decommission. When performed
it calls storage_service::leave_ring() which udpates the bootstrap state
to be "needed", thus preventing the cdc gen. service from doing anything
inside gossiper's on_...() notifications.

It's more correct to stop cdc gen. service handling gossiper
notifications by unsubscribing it, but by adding fragile implicit
dependencies on the bootstrap state.

Checks for sys.dist.ks in the legacy_handle_...() are kept in a form
of on-internal-error. The system distributed keyspace is activated by
storage service even before the bootstrap state is updated and is
never deactivated, but it's anyway good to have this assertion.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 09:46:13 +03:00
Kamil Braun
39ca07c49b Merge 'Gossiper endpoint locking' from Benny Halevy
This series cleans up and hardens the endpoint locking design and
implementation in the gossiper and endpoint-state subscribers.

We make sure that all notifications (expect for `before_change`, that
apparently can be dropped) are called under lock_endpoint, as well as
all calls to gossiper::replicate, to serialize endpoint_state changes
across all shards.

An endpoint lock gets a unique permit_id that is passed to the
notifications and passed back by them if the notification functions call
the gossiper back for the same endpoint on paths that modify the
endpoint_state and may acquire the same endpoint lock - to prevent a
deadlock.

Fixes scylladb/scylladb#14838
Refs scylladb/scylladb#14471

Closes #14845

* github.com:scylladb/scylladb:
  gossiper: replicate: ensure non-null permit
  gossiper: add_saved_endpoint: lock_endpoint
  gossiper: mark_as_shutdown: lock_endpoint
  gossiper: real_mark_alive: lock_endpoint
  gossiper: advertise_token_removed: lock_endpoint
  gossiper: do_status_check: lock_endpoint
  gossiper: remove_endpoint: lock_endpoint if needed
  gossiper: force_remove_endpoint: lock_endpoint if needed
  storage_service: lock_endpoint when removing node
  gossiper: use permit_id to serialize state changes while preventing deadlocks
  gossiper: lock_endpoint: add debug messages
  utils: UUID: make default tagged_uuid ctor constexpr
  gossiper: lock_endpoint must be called on shard 0
  gossiper: replicate: simplify interface
  gossiper: mark_as_shutdown: make private
  gossiper: convict: make private
  gossiper: mark_as_shutdown: do not call convict
2023-08-02 13:50:08 +02:00
Benny Halevy
f74d154fe3 gossiper: use permit_id to serialize state changes while preventing deadlocks
Pass permit_id to subscribers when we acquire one
via lock_endpoint.  The subscribers then pass it back to
gossiper for paths that acquire lock_endpoint for
the same endpoint, to detect nested locks when the endpoint
is locked with the same permit_id.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-01 17:41:57 +03:00
Kamil Braun
8bb3732d66 Merge 'storage_service: raft_check_and_repair_cdc_streams: don't create a new generation if current one is optimal' from Patryk Jędrzejczak
We add the CDC generation optimality check in
`storage_service::raft_check_and_repair_cdc_streams` so that it doesn't
create new generations when unnecessary. Since
`generation_service::check_and_repair_cdc_streams` already has this
check, we extract it to the new `is_cdc_generation_optimal` function to
not duplicate the code.

After this change, multiple tasks could wait for a single generation
change. Calling `signal` on `topology_state_machine.event` would't wake
them all. Moreover, we must ensure the topology coordinator wakes when
his logic expects it. Therefore, we change all `signal` calls on
`topology_state_machine.event` to `broadcast`.

We delay the deletion of the `new_cdc_generation` request to the moment
when the topology transition reaches the `publish_cdc_generation` state.
We need this change to ensure the added CDC generation optimality check
in the next commit has an intended effect. If we didn't make it, it
would be possible that a task makes the `new_cdc_generation` request,
and then, after this request was removed but before committing the new
generation, another task also makes the `new_cdc_generation` request. In
such a scenario, two generations are created, but only one should. After
delaying the deletion of `new_cdc_generation` requests, the second
request would have no effect.

Additionally, we modify the `test_topology_ops.py` test in a way that
verifies the new changes. We call
`storage_service::raft_check_and_repair_cdc_streams` multiple times
concurrently and verify that exactly one generation has been created.

Fixes #14055

Closes #14789

* github.com:scylladb/scylladb:
  storage_service: raft_check_and_repair_cdc_streams: don't create a new generation if current one is optimal
  storage_service: delay deletion of the new_cdc_generation request
  raft topology: broadcast on topology_state_machine.event instead of signal
  cdc: implement the is_cdc_generation_optimal function
2023-08-01 12:10:00 +02:00
Patryk Jędrzejczak
b05b4a352a cdc: implement the is_cdc_generation_optimal function
In the following commits, we add the CDC generation optimality
check to storage_service::raft_check_and_repair_cdc_streams so
that it doesn't create new CDC generations when unnecessary. Since
generation_service::check_and_repair_cdc_streams already has
this check, we extract it to the new is_cdc_generation_optimal
function to not duplicate the code.
2023-07-28 11:04:17 +02:00
Aleksandra Martyniuk
cdbfa0b2f5 replica: iterate safely over tables related maps
Loops over _column_families and _ks_cf_to_uuid which may preempt
are protected by reader mode of rwlock so that iterators won't
get invalid.
2023-07-25 17:13:04 +02:00
Aleksandra Martyniuk
52afd9d42d replica: wrap column families related maps into tables_metadata
As a preparation for ensuring access safety for column families
related maps, add tables_metadata, access to members of which
would be protected by rwlock.
2023-07-25 16:13:00 +02:00
Mikołaj Grzebieluch
4e3c97d8d4 test: raft topology: test prepare_and_broadcast_cdc_generation_data
This test limits `commitlog_segment_size_in_mb` to 2, thus `max_command_size`
is limited to less than 1 MB. It adds an injection which copies mutations
generated by `get_cdc_generation_mutations` n times, where n is picked that
the memory size of all mutations exceeds `max_command_size`.

This test passes if cdc generation data is committed by raft in multiple commands.
If all the data is committed in a single command, the leader node will loop trying
to send raft command and getting the error:
```
storage_service - raft topology: topology change coordinator fiber got error raft::command_is_too_big_error (Command size {} is greater than the configured limit {})
```
2023-07-07 13:56:35 +02:00
Kefu Chai
7863ef53ad cdc: generation: specialize fmt::formatter<generation_id>
this is a part of a series to migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `generation_id` without the help of `operator<<`.

the corresponding `operator<<()` is removed in this change, as all its
callers are now using fmtlib for formatting now.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-28 15:47:44 +08:00
Kefu Chai
87e9686f61 cdc: generation: simpify std::visit() call
if the visitor clauses are the same, we can just use the generic version
of it by specifying the parameter with `auto&`. simpler this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13626
2023-04-27 14:43:20 +02:00
Kefu Chai
c06b20431e cdc: generation: use default-generated operator==
now that C++20 generates operator== for us, these is no need to
handcraft it manually. also, in C++17, the standard library offers
default implementation of operator== for `std::variant<>`, so no need
to implement it by ourselves.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13625
2023-04-24 10:13:28 +03:00
Benny Halevy
c5d819ce60 gms: versioned_value: make members private
and provide accessor functions to get them.

1. So they can't be modified by mistake, as the versioned value is
   immutable. A new value must have a higher version.
2. Before making the version a strong gms::version_type.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 08:37:32 +03:00
Kamil Braun
1688001585 cdc: generation_service: add a FIXME 2023-04-20 16:36:41 +02:00
Kamil Braun
d13a0b1930 cdc: generation_service: add legacy_ prefix for gossiper-based functions
Most of the code in the service exists to handle gossiper-based topology
changes. Name the functions appropriately and add a note in the
comments.
2023-04-20 16:36:41 +02:00
Kamil Braun
4c99b4004b storage_service: use CDC generations introduced by Raft topology
When a node notices that a new CDC generation was introduced in
`storage_service::topology_state_load`, it updates its internal data
structures that are used when coordinating writes to CDC log tables.
2023-04-20 16:36:41 +02:00
Kamil Braun
3abe0f0ad6 cdc: generation: extract pure parts of make_new_generation outside
`cdc::generation_service::make_new_cdc_generation` would create a new
CDC generation and insert it into the `CDC_GENERATIONS_V2` table these
days. For Raft-based topology chnages we'll do the data insertion
somewhere else - in topology coordinator code. So extract the parts for
calculating the CDC generation to free-standing functions (these are
almost pure calculations, modulo accessing RNG).
2023-04-20 15:38:59 +02:00
Kamil Braun
1e9cf3badd cdc: generation: get_cdc_generation_mutations: take timestamp as parameter
The function would generate a mutation timestamp for itself, take it as
parameter instead. We'll use timestamps provided by Group 0 APIs when
creating CDC generations during Group 0- based topology changes.
2023-04-20 15:38:37 +02:00
Kamil Braun
85f4f1830b cdc: generation: make topology_description_generator::get_sharding_info a parameter
The function used to obtain the sharding info for a given node (its
number of shards and ignore_msb_bits) was using gossiper application
states.

We want to reuse `topology_description_generator` to build CDC
generations when doing Raft Group 0-based topology changes, so make
`get_sharding_info` a parameter.
2023-04-20 15:38:37 +02:00
Kamil Braun
3e863d0e58 sys_dist_ks: make get_cdc_generation_mutations public
It was a `static` function inside system_distributed_keyspace. Later it
will be used for another table living in system_keyspace, so move it
outside, to the CDC generations module, and make it accessible from
other places.
2023-04-20 15:38:37 +02:00
Kefu Chai
aed681fa3c cdc: generation: schema_tables: use defaulted operator<=>
the default generated operator<=> is exactly the same as the
handcrafted one. so let compiler do its job. also, since
operator<=> is defaulted, there is no need to define operator==
anymore, so drop it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 17:25:30 +08:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Pavel Emelyanov
4f67898e7b system_keyspace: De-static cdc_is_rewritten()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-15 18:44:59 +03:00
Pavel Emelyanov
736021ee98 system_keyspace: De-static cdc_set_rewritten()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-15 18:44:53 +03:00
Pavel Emelyanov
b3d139bbdb system_keyspace: De-static update_cdc_generation_id()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-15 18:44:40 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00