Change the token_metadata type to token_metadata2 in
the signatures of CDC-related methods in
storage_service and cdc/generation. Use
get_new_strong to get a pointer to the new host_id-based
token_metadata from the inet_address-based one,
living in the shared_token_metadata.
The starting point of the patch is in
storage_service::handle_global_request. We change the
tmptr type to token_metadata2 and propagate the change
down the call chains. This includes token-related methods
of the boot_strapper class.
Returns this node's endpoint_state_ptr.
With this entry point, the caller doesn't need to
call get_broadcast_address.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We change the type of IDs in CDC_GENERATIONS_V3 to timeuuid to
give them a time-based order. We also change how we initialize
them so that the new CDC generation always has the highest ID.
This is the last step to enabling the efficient clearing of
obsolete CDC generation data.
Additionally, we change the types of current_cdc_generation_uuid,
new_cdc_generation_data_uuid and the second values of the elements
in unpublished_cdc_generations to timeuuid, so that they match the id
column in CDC_GENERATIONS_V3.
After moving the creation of uuid out of
make_new_generation_description, this function only calls the
topology_description_generator's constructor and its generate
method. We could remove this function, but we instead simplify
the code by removing the topology_description_generator class.
We can do this refactor because make_new_generation_description
is the only place using it. We inline its generate method into
make_new_generation_description and turn its private methods into
static functions.
In a future commit, we change how we initialize the uuid of the
new CDC generation in the Raft-based topology. This forces us to
move this initialization out of the make_new_generation_data
function shared between Raft-based and gossiper-based topologies.
We also rename make_new_generation_data to
make_new_generation_description since it only returns
cdc::topology_description now.
We make CDC_GENERATIONS_V3 single-partition by adding the key
column and changing the clustering key from range_end to
(id, range_end). This is the first step to enabling the efficient
clearing of obsolete CDC generation data, which we need to prevent
Raft-topology snapshots from endlessly growing as we introduce new
generations over time. The next step is to change the type of the id
column to timeuuid. We do it in the following commits.
After making CDC_GENERATIONS_V3 single-partition, there is no easy
way of preserving the num_ranges column. As it is used only for
sanity checking, we remove it to simplify the implementation.
In the following commit, we implement the
get_cdc_generation_mutations_v3 function, which is very similar to
get_cdc_generation_mutations_v2. The only differences in creating
mutations between CDC_GENERATIONS_V2 and CDC_GENERATIONS_V3 are:
- a need to set the num_ranges cell for CDC_GENERATIONS_V2,
- different partition keys,
- different clustering keys.
To avoid code duplication, we introduce
get_common_cdc_generation_mutations, which does most of the work
shared by both functions.
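A minimal sketch of this factoring, with hypothetical, heavily simplified signatures (the real functions build schema-bound mutations and take a schema and timestamp):
```cpp
// Illustration only: the shared-helper pattern described above. Types and
// names are simplified stand-ins, not the actual Scylla declarations.
#include <string>
#include <vector>

struct row_sketch {
    std::string partition_key;
    std::string clustering_key;
};

// Work common to both layouts: emit one row per token range end.
static std::vector<row_sketch> get_common_cdc_generation_mutations_sketch(
        const std::string& partition_key, const std::vector<std::string>& range_ends) {
    std::vector<row_sketch> rows;
    for (const auto& range_end : range_ends) {
        rows.push_back({partition_key, range_end});
    }
    return rows;
}

std::vector<row_sketch> get_cdc_generation_mutations_v2_sketch(
        const std::string& generation_uuid, const std::vector<std::string>& range_ends) {
    // v2: partition key is the generation uuid, clustering key is range_end;
    // it additionally sets the num_ranges cell (elided here).
    return get_common_cdc_generation_mutations_sketch(generation_uuid, range_ends);
}

std::vector<row_sketch> get_cdc_generation_mutations_v3_sketch(
        const std::string& generation_id, const std::vector<std::string>& range_ends) {
    // v3: a single partition (illustrative key below) with clustering key (id, range_end).
    auto rows = get_common_cdc_generation_mutations_sketch("cdc_generations", range_ends);
    for (auto& r : rows) {
        r.clustering_key = generation_id + "," + r.clustering_key;
    }
    return rows;
}
```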
In the following commits, we modify the CDC_GENERATIONS_V3 schema
to enable efficient clearing of obsolete CDC generation data.
These modifications make the current get_cdc_generation_mutations
work only for the CDC_GENERATIONS_V2 schema, and we need a new
function for CDC_GENERATIONS_V3, so we add the "_v2" suffix.
Now that the endpoint_state isn't changed in place,
we do not need to copy it to each subscriber.
We can rather just pass the lw_shared_ptr holding
a snapshot of it.
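A rough sketch of the idea, with illustrative types (the real code passes gms::endpoint_state via its endpoint_state_ptr alias):
```cpp
// Sketch only: subscribers share one immutable snapshot through a
// seastar::lw_shared_ptr instead of each receiving a copy of the state.
#include <seastar/core/shared_ptr.hh>
#include <functional>
#include <vector>

struct endpoint_state_sketch {
    // heartbeat state, application states, ...
};

using endpoint_state_ptr_sketch = seastar::lw_shared_ptr<const endpoint_state_sketch>;

void notify_subscribers_sketch(
        const std::vector<std::function<void(endpoint_state_ptr_sketch)>>& subscribers,
        endpoint_state_ptr_sketch snapshot) {
    for (const auto& subscriber : subscribers) {
        subscriber(snapshot);   // no per-subscriber copy; all see the same snapshot
    }
}
```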
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
No need to look up the application_state again using the
endpoint, as both callers already have a reference to
the endpoint_state handy.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Before changing _endpoint_state_map to hold a
lw_shared_ptr<endpoint_state>, provide synchronous helpers
for users to traverse all endpoint_states with no need
to copy them (as long as the called func does not yield).
With that, gossiper::get_endpoint_states() can be made private.
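A possible shape for such a helper, sketched with stand-in types (the actual gossiper signatures may differ):
```cpp
// Sketch: synchronous, copy-free traversal over the endpoint state map.
// The callback receives const references and must not yield, so the map
// cannot be mutated underneath the loop.
#include <functional>
#include <map>
#include <string>

struct endpoint_state_sketch { /* ... */ };

class gossiper_sketch {
    // keyed by endpoint address (simplified to a string here)
    std::map<std::string, endpoint_state_sketch> _endpoint_state_map;
public:
    void for_each_endpoint_state(
            const std::function<void(const std::string&, const endpoint_state_sketch&)>& func) const {
        for (const auto& [endpoint, state] : _endpoint_state_map) {
            func(endpoint, state);  // no copies made
        }
    }
};
```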
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
As was described in the previous patch, this method is explicitly called
by the storage service after updating the bootstrap state, so it's unneeded.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The legacy_handle_cdc_generation() checks whether the node has bootstrapped
with the help of a system_keyspace method. It is called in two
cases -- on boot via cdc_generation_service::after_join() and via
gossiper on_...() notifications. The notifications, in turn, are set up
in the very same after_join().
The after_join(), in turn, is called from storage_service explicitly
after the bootstrap state is updated to be "complete", so the check for
the state in legacy_handle_...() seems unnecessary. However, there's
still one case where it may be hit -- decommission. When performed,
it calls storage_service::leave_ring(), which updates the bootstrap state
to be "needed", thus preventing the cdc gen. service from doing anything
inside gossiper's on_...() notifications.
It would be more correct to stop the cdc gen. service from handling gossiper
notifications by unsubscribing it, rather than by adding fragile implicit
dependencies on the bootstrap state.
Checks for sys.dist.ks in the legacy_handle_...() are kept in the form
of an on-internal-error. The system distributed keyspace is activated by
the storage service even before the bootstrap state is updated and is
never deactivated, but it's good to have this assertion anyway.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This series cleans up and hardens the endpoint locking design and
implementation in the gossiper and endpoint-state subscribers.
We make sure that all notifications (except for `before_change`, which
apparently can be dropped) are called under lock_endpoint, as well as
all calls to gossiper::replicate, to serialize endpoint_state changes
across all shards.
An endpoint lock gets a unique permit_id that is passed to the
notifications. The subscribers pass it back when they call the gossiper
again for the same endpoint on paths that modify the endpoint_state and
may acquire the same endpoint lock, preventing a deadlock.
Fixes scylladb/scylladb#14838
Refs scylladb/scylladb#14471
Closes #14845
* github.com:scylladb/scylladb:
gossiper: replicate: ensure non-null permit
gossiper: add_saved_endpoint: lock_endpoint
gossiper: mark_as_shutdown: lock_endpoint
gossiper: real_mark_alive: lock_endpoint
gossiper: advertise_token_removed: lock_endpoint
gossiper: do_status_check: lock_endpoint
gossiper: remove_endpoint: lock_endpoint if needed
gossiper: force_remove_endpoint: lock_endpoint if needed
storage_service: lock_endpoint when removing node
gossiper: use permit_id to serialize state changes while preventing deadlocks
gossiper: lock_endpoint: add debug messages
utils: UUID: make default tagged_uuid ctor constexpr
gossiper: lock_endpoint must be called on shard 0
gossiper: replicate: simplify interface
gossiper: mark_as_shutdown: make private
gossiper: convict: make private
gossiper: mark_as_shutdown: do not call convict
Pass permit_id to subscribers when we acquire one
via lock_endpoint. The subscribers then pass it back to
gossiper for paths that acquire lock_endpoint for
the same endpoint, to detect nested locks when the endpoint
is locked with the same permit_id.
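A simplified sketch of the permit idea, with illustrative types (the real lock_endpoint is asynchronous, per-endpoint, and must run on shard 0):
```cpp
// Illustration only: an endpoint lock that remembers the permit_id of its
// holder, so a nested acquisition with the same permit is recognized
// instead of deadlocking. Assumes single-threaded (per-shard) execution.
#include <optional>
#include <stdexcept>

using permit_id_sketch = unsigned long;

struct endpoint_lock_sketch {
    std::optional<permit_id_sketch> holder;

    void lock(permit_id_sketch permit) {
        if (holder == permit) {
            // The caller already holds this endpoint's lock under the same
            // permit (a notification called back into the gossiper): no-op.
            return;
        }
        if (holder) {
            // Held under a different permit: a real implementation waits here.
            throw std::runtime_error("would wait for the current holder");
        }
        holder = permit;
    }

    void unlock(permit_id_sketch permit) {
        if (holder == permit) {
            holder.reset();
        }
    }
};
```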
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We add the CDC generation optimality check in
`storage_service::raft_check_and_repair_cdc_streams` so that it doesn't
create new generations when unnecessary. Since
`generation_service::check_and_repair_cdc_streams` already has this
check, we extract it into the new `is_cdc_generation_optimal` function to
avoid duplicating the code.
After this change, multiple tasks could wait for a single generation
change. Calling `signal` on `topology_state_machine.event` wouldn't wake
them all. Moreover, we must ensure the topology coordinator wakes when
its logic expects it. Therefore, we change all `signal` calls on
`topology_state_machine.event` to `broadcast`.
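For reference, a short sketch of the difference, assuming the event is backed by a seastar::condition_variable:
```cpp
// Sketch: with several fibers waiting on the same event, signal() wakes
// only one of them, while broadcast() wakes them all.
#include <seastar/core/condition-variable.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>

seastar::condition_variable event;

seastar::future<> wait_for_topology_event() {
    co_await event.wait();   // multiple tasks may be parked here
    // re-check the observed state and proceed
}

void on_topology_change() {
    // event.signal();       // would wake a single waiter only
    event.broadcast();       // wakes every waiting task
}
```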
We delay the deletion of the `new_cdc_generation` request to the moment
when the topology transition reaches the `publish_cdc_generation` state.
We need this change to ensure the added CDC generation optimality check
in the next commit has the intended effect. Without this change, it
would be possible that a task makes the `new_cdc_generation` request,
and then, after this request was removed but before committing the new
generation, another task also makes the `new_cdc_generation` request. In
such a scenario, two generations are created, but only one should be. After
delaying the deletion of `new_cdc_generation` requests, the second
request would have no effect.
Additionally, we modify the `test_topology_ops.py` test in a way that
verifies the new changes. We call
`storage_service::raft_check_and_repair_cdc_streams` multiple times
concurrently and verify that exactly one generation has been created.
Fixes #14055
Closes #14789
* github.com:scylladb/scylladb:
storage_service: raft_check_and_repair_cdc_streams: don't create a new generation if current one is optimal
storage_service: delay deletion of the new_cdc_generation request
raft topology: broadcast on topology_state_machine.event instead of signal
cdc: implement the is_cdc_generation_optimal function
In the following commits, we add the CDC generation optimality
check to storage_service::raft_check_and_repair_cdc_streams so
that it doesn't create new CDC generations when unnecessary. Since
generation_service::check_and_repair_cdc_streams already has
this check, we extract it into the new is_cdc_generation_optimal
function to avoid duplicating the code.
In preparation for ensuring safe access to the column-family-related
maps, add tables_metadata, whose members will be protected by an rwlock.
This test limits `commitlog_segment_size_in_mb` to 2, thus `max_command_size`
is limited to less than 1 MB. It adds an injection which copies mutations
generated by `get_cdc_generation_mutations` n times, where n is picked so that
the memory size of all mutations exceeds `max_command_size`.
This test passes if cdc generation data is committed by raft in multiple commands.
If all the data is committed in a single command, the leader node will loop trying
to send the raft command and getting the error:
```
storage_service - raft topology: topology change coordinator fiber got error raft::command_is_too_big_error (Command size {} is greater than the configured limit {})
```
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `generation_id` without the help of `operator<<`.
the corresponding `operator<<()` is removed in this change, as all its
callers are now using fmtlib for formatting.
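a typical shape for this (sketch; the actual formatter in the tree may differ) is a fmt::formatter specialization replacing the stream operator:
```cpp
// sketch only: generation_id is represented here by a plain struct
// wrapping an integer timestamp, not the real cdc::generation_id.
#include <fmt/core.h>
#include <cstdint>

namespace cdc {
struct generation_id_sketch {
    int64_t ts;
};
}

template <>
struct fmt::formatter<cdc::generation_id_sketch> {
    constexpr auto parse(fmt::format_parse_context& ctx) { return ctx.begin(); }
    auto format(const cdc::generation_id_sketch& id, fmt::format_context& ctx) const {
        return fmt::format_to(ctx.out(), "{}", id.ts);
    }
};

// usage: fmt::print("generation {}\n", cdc::generation_id_sketch{42});
```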
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
if the visitor clauses are the same, we can just use a generic visitor
by specifying the parameter as `auto&`. simpler this way.
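for example (a sketch, not the actual visitor touched by this change):
```cpp
// sketch: identical per-alternative clauses collapsed into one generic lambda.
#include <iostream>
#include <variant>

int main() {
    std::variant<int, double> v = 1.5;

    // before: one lambda per alternative, all with the same body
    // after: a single clause taking `auto&`
    std::visit([](auto& x) { std::cout << x << '\n'; }, v);
}
```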
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13626
now that C++20 generates operator== for us, there is no need to
handcraft it manually. also, since C++17, the standard library offers
a default implementation of operator== for `std::variant<>`, so no need
to implement it by ourselves.
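a minimal sketch of the pattern (not the actual classes changed here):
```cpp
// sketch: a defaulted operator== replaces the handcrafted member-wise
// comparison; the std::variant member compares via the variant's own
// operator==, available since C++17.
#include <cstdint>
#include <string>
#include <variant>

struct generation_id_sketch {
    std::variant<int64_t, std::string> id;

    // before: bool operator==(const generation_id_sketch& o) const { return id == o.id; }
    bool operator==(const generation_id_sketch&) const = default;
};
```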
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13625
and provide accessor functions to get them.
1. So they can't be modified by mistake, as the versioned value is
immutable. A new value must have a higher version.
2. In preparation for making the version a strong gms::version_type.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When a node notices that a new CDC generation was introduced in
`storage_service::topology_state_load`, it updates its internal data
structures that are used when coordinating writes to CDC log tables.
`cdc::generation_service::make_new_cdc_generation` currently creates a new
CDC generation and inserts it into the `CDC_GENERATIONS_V2` table. For
Raft-based topology changes we'll do the data insertion somewhere else - in
the topology coordinator code. So extract the parts that calculate the CDC
generation into free-standing functions (these are almost pure calculations,
modulo accessing the RNG).
The function used to generate a mutation timestamp by itself; take it as
a parameter instead. We'll use timestamps provided by Group 0 APIs when
creating CDC generations during Group 0-based topology changes.
The function used to obtain the sharding info for a given node (its
number of shards and ignore_msb_bits) was using gossiper application
states.
We want to reuse `topology_description_generator` to build CDC
generations when doing Raft Group 0-based topology changes, so make
`get_sharding_info` a parameter.
It was a `static` function inside system_distributed_keyspace. Later it
will be used for another table living in system_keyspace, so move it
outside, to the CDC generations module, and make it accessible from
other places.
the default-generated operator<=> is exactly the same as the
handcrafted one. so let the compiler do its job. also, since
operator<=> is defaulted, there is no need to define operator==
anymore, so drop it as well.
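roughly (a sketch, not the struct in this change):
```cpp
// sketch: a defaulted operator<=> replaces the handcrafted comparison, and
// it also implicitly gives us a defaulted operator==, so both can go.
#include <compare>

struct version_sketch {
    int major_;
    int minor_;

    // before: hand-written operator< and operator== over (major_, minor_)
    auto operator<=>(const version_sketch&) const = default;
};
```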
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.
Closes #12858
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.
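The idea, sketched with standalone stand-ins (the real utils::tagged_uuid differs in detail):
```cpp
// Sketch: identical underlying UUID storage, but distinct, non-convertible
// types per tag, so a table_id cannot be passed where a
// table_schema_version is expected.
#include <compare>
#include <cstdint>

struct uuid_sketch {
    uint64_t msb = 0, lsb = 0;
    auto operator<=>(const uuid_sketch&) const = default;
};

template <typename Tag>
struct tagged_uuid_sketch {
    uuid_sketch id;
    auto operator<=>(const tagged_uuid_sketch&) const = default;
};

struct table_id_tag {};
struct table_schema_version_tag {};

using table_id_sketch = tagged_uuid_sketch<table_id_tag>;
using table_schema_version_sketch = tagged_uuid_sketch<table_schema_version_tag>;

// The two aliases are unrelated types: mixing them up is a compile error.
```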
Fixes #11207
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When update_streams_description() fails, it spawns a fiber and retries
the update in the background once every 60s. If the sleep between
attempts is aborted, the resulting exceptional future ends up being
ignored, which is warned about in the logs.
fixes: #11192
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220802132148.20688-1-xemul@scylladb.com>
Each feature has a private variable and a public accessor. Since the
accessor effectively makes the variable public, avoid the intermediary
and make the variable public directly.
To ease mechanical translation, the variable name is chosen as
the function name (without the cluster_supports_ prefix).
References throughout the codebase are adjusted.
The users of get_/set_bootstrap_state and aux helpers are CDC and
the storage service. Both have local system_keyspace references and can
just use them. This removes some users of the global system ks cache
and the qctx thing.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The service uses system keyspace to, e.g., manage the generation id,
thus it depends on the system_keyspace instance and deserves the
explicit reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except to
licenses/README.md.
Closes #9937
If the number of streams exceeds the number of token ranges
it indicates that some spurious streams from decommissioned
nodes are present.
In such a situation - simply regenerate.
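Conceptually (a sketch; the real check operates on the streams description and the current set of token ranges):
```cpp
// Sketch: more streams than token ranges means some streams are left over
// from decommissioned nodes, so the description should be regenerated.
#include <cstddef>

bool should_regenerate_streams(std::size_t num_streams, std::size_t num_token_ranges) {
    return num_streams > num_token_ranges;
}
```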
Fixes #9772
Closes #9780
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.