scylladb

Author	SHA1	Message	Date
Calle Wilund	d478896d46	commitlog: kill non-recycled segment management It has been default for a while now. Makes no sense to not do it. Even hints can use it (even if it makes no difference there)	2022-04-11 16:34:00 +00:00
Benny Halevy	ebbbf1e687	lister: move to utils There's nothing specific to scylla in the lister classes, they could (and maybe should) be part of the seastar library. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-02-28 12:36:03 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Avi Kivity	ae3a360725	database: Move database, keyspace, table classes to replica/ directory The database, keyspace, and table classes represent the replica-only part of the objects after which they are named. Reading from a table doesn't give you the full data, just the replica's view, and it is not consistent since reconciliation is applied on the coordinator. As a first step in acknowledging this, move the related files to a replica/ subdirectory.	2022-01-06 17:07:30 +02:00
Benny Halevy	96aa6161d8	db: hints manager: use effective_replication_map to get_natural_endpoints Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 13:54:52 +03:00
Piotr Dulikowski	6093c2378b	hints: assign _last_written_rp in ep manager's move constructor The end_point_hints_manager's field _last_written_rp is initialized in its regular constructor, but is not copied in the move constructor. Because the move constructor is always involved when creating a new endpoint manager, the _last_written_rp field is effectively always initialized with the zero-argument constructor, and is set to the zero value. This can cause the following erroneous situation to occur: - Node A accumulates hints towards B. - Sync point is created at A. It will be used later to wait for currently accumulated hints. - Node A is restarted. The endpoint manager A->B is created which has bogus value in the _last_written_rp (it is set to zero). - Node A replays its hints but does not write any new ones. - A hint flush occurs. If there are no hint segments on disk after flush, the endpoint manager sets its last sent position to the last written position, which is by design. However, the last written position has incorrect value, so the last sent position also becomes incorrect and too low. - Try to wait for the sync point created earlier. The sync point waiting mechanism waits until last sent hint position reaches or goes past the position encoded in the sync point, but it will not happen because the last sent position is incorrect. The above bug can be (sometimes) reproduced in hintedhandoff_sync_point_api_test dtest. Now, the _last_written_rp field is properly initialized in the move constructor, which prevents the bug described above. Fixes: #9320 Closes #9426	2021-10-04 13:21:34 +02:00
Pavel Emelyanov	598841a5dd	code: Expel gossiper.hh from other headers This needs to add forward declarations of the gossiper class and re-include some other headers here and there. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-09-22 13:13:06 +03:00
Piotr Dulikowski	91163fcfa5	commitlog: make it possible to provide base segment ID Adds a configuration option to the commitlog: base_segment_id. When provided, the commitlog uses this ID as a base of its segment IDs instead of calculating it based on the number of milliseconds between the epoch and boot time. This is needed for the feature that waits for hints to be replayed - it relies on the replay positions monotonically increasing. An endpoint manager periodically re-creates its commitlog instance - if it is re-created when there are no segments on disk, currently it will choose the number of milliseconds between the epoch and boot time, which might result in segments being generated with the same IDs as some segments previously created and deleted during the same runtime.	2021-09-15 11:04:34 +02:00
Piotr Dulikowski	77f2448b2c	hints: propagate abort signal correctly in wait_for_sync_point When `manager::wait_for_sync_point` is called, the abort source from the arguments (`as`) might have already been triggered. In such case, the subscription which was supposed to trigger the `local_as` abort source won't be run, and the code will wait indefinitely for hints to be replayed instead of checking the replay status and returning immediately. This commit fixes the problem by manually triggering `local_as` if `as` has been triggered.	2021-09-14 14:27:01 +02:00
Piotr Dulikowski	8e29ebc5d5	hints: fix use-after-free when dismissing replay waiters When the promise waited on in the `wait_until_hints_are_replayed_up_to` function is resolved, a continuation runs which prints a log line with information about this event. The continuation captures a pointer to the hints sender and uses it to get information about the endpoint whose hints are waited for. However, at this point the sender might have been deleted - for example, when the node is being stopped and everybody waiting for hints is dismissed. This commit fixes the use-after-free by getting all necessary information while the sender is guaranteed to be alive and captures it in the continuation's capture list.	2021-09-14 13:46:16 +02:00
Nadav Har'El	49ca1f86b2	Merge 'hints: error injection for pausing hint replay' from Piotr Dulikowski Adds a `hinted_handoff_pause_hint_replay` error injection point. When enabled, hint replay logic behaves as if it is run, but it gets stuck in a loop and no hints are actually sent until the point is disabled again. This injection point will be useful in dtests - it will simulate infinitely slow hint replay and will make it possible to test how some operations behave while hint replay logic is running. The first intended use case of this injection point is testing the HTTP API for waiting for hints (#8728). Refs: #6649 Closes #8801 * github.com:scylladb/scylla: hints: fix indentation after previous patch hints: error injection for pausing hint replay hints: coroutinize lambda inside send_one_file	2021-08-11 11:42:29 +03:00
Piotr Dulikowski	f2e1339f38	hints: use an abort_source with sleep_abortable in flush+send loop Each hint sender runs an asynchronous loop which tries to flush and then send hints. Between each attempt, it sleeps at most 10 seconds using sleep_abortable. However, an overload of sleep_abortable is used which does not take an abort_source - it should abort the sleep in case Seastar handles a SIGINT or SIGTERM signal. However, in order for that to work, the application must not prevent default handling of those signals in Seastar - but Scylla explicitly does it by disabling the `auto_handle_sigint_sigterm` option in reactor config. As a result, those sleeps are never aborted, and - because we wait for the async loops to stop - they can delay shutdown by at most 10 seconds. To fix that, an abort_source is added to the hints sender, and the abort_source is triggered when the corresponding sender is requested to stop. Fixes: #9176 Closes #9177	2021-08-11 10:32:53 +02:00
Piotr Dulikowski	68cac2eab7	hints: fix indentation after previous patch	2021-08-09 16:16:14 +02:00
Piotr Dulikowski	20cbe7fa2f	hints: error injection for pausing hint replay Adds a `hinted_handoff_pause_hint_replay` error injection point. When enabled, hint replay logic behaves as if it is run, but it gets stuck in a loop and no hints are actually sent until the point is disabled again. This injection point will be useful in dtests - it will simulate infinitely slow hint replay and will make it possible to test how some operations behave while hint replay logic is running. The first intended use case of this injection point is testing the HTTP API for waiting for hints (#8728). Refs: #6649	2021-08-09 16:16:14 +02:00
Piotr Dulikowski	29993f7745	hints: coroutinize lambda inside send_one_file Converts the lambda invoked for every commitlog entry in a hints file into a coroutine.	2021-08-09 16:16:14 +02:00
Piotr Dulikowski	d41d39bbcd	hints: add functions for creating and waiting for sync points Adds functions which allow to create per-shard sync points and wait for them.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	70df9973f3	hints: make it possible to wait until hints are replayed Adds necessary infrastructure which allows, for a given endpoint manager, to wait until hints are replayed up to a specified position. An abort source must be specified which, if triggered, cancels waiting for hint replay. If the endpoint manager is stopped, current waiters are dismissed with an exception.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	93f244426d	hints: track the RP of the last replayed position Keeps track of a position which serves as an upper bound for positions of already replayed hints - i.e. all hints with replay positions strictly lower than it are considered replayed. In order to accurately track this bound during hint replay, a std::map is introduced which contains positions of hints which are currently being sent.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	03e2e671cd	hints: track the RP of the last written hint The position of the last written hint is now tracked by the endpoint hints manager. When manager is constructed and no hints are replayed yet, the last written hint position is initialized to the beginning of a fake segment with ID corresponding to the current number of milliseconds since the epoch. This choice makes sure that, in case a new hint sync point is created before any hints are written, the position recorded for that hint queue will be larger than all replay positions in segments currently stored on disk.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	27d0d598fd	hints: change last_attempted_rp to last_succeeded_rp Instead of tracking the last position for which hint sending is attempted, the last successfully replayed position is tracked. The previous variable was used to calculate the position from which hint replay should restart in case of an error, in the following way: _last_not_complete_rp = ctx_ptr->first_failed_rp.value_or( ctx_ptr->last_attempted_rp.value_or(_last_not_complete_rp)); Now, this formula uses the last_succeeded_rp in place of last_attempted_rp. This change does not have an effect on the choice of the starting position of the next retry: - If the hint at `last_attempted_rp` has succeeded, in the new algorithm the same position will be recorded in `last_succeeded_rp`, and the formula will yield the same result. - If the hint at `last_attempted_rp` has failed, it will be accounted into `first_failed_rp`, so the formula will yield the same result. The motivation for this change is that in the next commits of this PR we will start tracking the position of the last replayed hint per hint queue, and the meaning of the new variable makes it more useful - when there are no failed hints in the hint sending attempt, last_succeeded_rp gives us information that hints _up to this position_ were replayed; the last_attempted_rp variable can only tell us that hints _before that position_ were replayed successfully.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	08a7d79ffc	hints: rearrange error handling logic for hint sending Instead of calling the `on_hint_send_failure` method inside the hint sending task in places where an error occurs, we now let the exceptions be returned and handle them inside a single `then_wrapped` attached to the hint sending task. Apart from the `then_wrapped`, there is one more place which calls `on_hint_send_failure` - in the exception handler for the future which spawns the asynchronous hint sending task. It needs to be kept separate because it is a part of a separate task.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	45b04c94e0	hints: sort segments by ID, divide into foreign and local Endpoint hints manager keeps a commitlog instance which is used to write hints into new segments. This instance is re-created every 10 seconds, which causes the previous instance to leave its segments on disk. On the other hand, hints sender keeps a list of segments to replay which is updated only after it becomes empty. The list is repopulated with segments returned by the commitlog::get_segments_to_replay() method which does not specify the order of the segments returned. As a preparation for the upcoming hint sync points feature, this commit changes the order in which segments are replayed: - First, segments written by other shards are replayed. Such segments may appear in the queue because of segment rebalancing which is done at startup. The purpose of replaying "foreign" segments first is that they are problematic for hint sync points. For each hint queue, a hint sync point encodes a replay position of the last written hint on the local shard. Accounting foreign segments precisely would make the implementation more complicated. To make things simpler, waiting for sync points will always make sure that all foreign segments are replayed. This might sometimes cause more hints to be waited on than necessary if a restart occurs in the meantime. - Segments written by the local shard are replayed later, in order of their IDs. This makes sure that local hints are replayed in the order they were written to segments, and will make it possible to use replay positions to track progress of hint replay.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	f83699bb7c	Revert "db/hints: allow to forcefully update segment list on flush" This reverts commit `e48739a6da`. This commit removes the functionality from endpoint hints manager which allowed to flush hints immediately and forcefully update the list of segments to replay. The new implementation of waiting for hints will be based on replay positions returned by the commitlog API and it won't be necessary to forcefully update the segment list when creating a sync point.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	9c1d4e7e6c	Revert "db/hints: add a metric for counting processed files" This reverts commit `5a49fe74bb`. This commit removes a metric which tracks how many segments were replayed during current runtime. It was necessary for current "wait for hints" mechanism which is being replaced with a different one - therefore we can remove the metric.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	3b851a5ebd	Revert "db/hints: make it possible to wait until current hints are sent" This reverts commit `427bbf6d86`. This commit removes the infrastructure which allows to wait until current hints are replayed in a given hint queue. It will be replaced with a different mechanism in later commits.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	4a35d138f6	Revert "storage_proxy: add functions for syncing with hints queue" This reverts commit `244738b0d5`. This commit removes create_hint_queue_sync_point and check_hint_queue_sync_point functions from storage_proxy, which were used to wait until local hints are sent out to particular nodes. Similar methods will be reintroduced later in this PR, with a completely different implementation.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	e3c32c897a	Revert "hints: dismiss segment waiters when hint queue can't send" This reverts commit `9d68824327`. First, we are reverting existing infrastructure for waiting for hints in order to replace it with a different one, therefore this commit needs to be reverted as well. Second, errors during hint replay can occur naturally and don't necessarily indicate that no progress can be made - for example, the target node is heavily loaded and some hints time out. The "waiting for hints" operation becomes a user-issued command, so it's not as vital to ensure liveness.	2021-08-09 09:06:23 +02:00
Pavel Emelyanov	92a4278cd1	hints: Drop storage service from managers The storage service pointer is only used to (un)subscribe to (from) lifecycle events. Now the subscription is gone, so can the storage service pointer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-17 15:09:36 +03:00
Pavel Emelyanov	acdc568ecf	hints: Do not subscribe managers on lifecycle events directly Managers sit on storage proxy which is already subscribed on lifecycle events, so it can "notify" hints managers directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-17 15:06:26 +03:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Pavel Solodovnikov	e0749d6264	treewide: some random header cleanups Eliminate not used includes and replace some more includes with forward declarations where appropriate. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-06 19:18:49 +03:00
Avi Kivity	cea5493cb7	storage_proxy, treewide: introduce names for vectors of inet_address storage_proxy works with vectors of inet_addresses for replica sets and for topology changes (pending endpoints, dead nodes). This patch introduces new names for these (without changing the underlying type - it's still std::vector<gms::inet_address>). This is so that the following patch, that changes those types to utils::small_vector, will be less noisy and highlight the real changes that take place.	2021-05-05 18:36:48 +03:00
Piotr Dulikowski	9d68824327	hints: dismiss segment waiters when hint queue can't send When a hint queue becomes stuck due to not being able to send to its destination (e.g. destination node is no longer UP, or we failed to send some hints from a file), then it's better to immediately dismiss anybody who waits for hint replay instead of letting them wait until timeout.	2021-04-27 15:58:15 +02:00
Piotr Dulikowski	244738b0d5	storage_proxy: add functions for syncing with hints queue Adds two methods to `storage_proxy`: - `create_hint_queue_sync_point` - creates a "hint sync point" which is kept present in storage_proxy until all hint queues on the local node reach their current end. It will also disappear if given deadline is reached first. - `check_hint_queue_sync_point` - checks if given hint sync point still exists. The created sync point waits for hint queues in all hint managers, on all shards.	2021-04-27 15:06:39 +02:00
Piotr Dulikowski	427bbf6d86	db/hints: make it possible to wait until current hints are sent Implements `wait_until_hints_are_replayed` method returning a future which blocks until either all current hint segments are replayed (returns success in this case), or when provided timeout is reached (returns a timeout exception in this case).	2021-04-26 13:57:03 +02:00
Piotr Dulikowski	5a49fe74bb	db/hints: add a metric for counting processed files Adds a field to `end_point_hints_manager::sender`: `_total_replayed_segments_count` which keeps track of how many segments were replayed so far. This metric will be used to calculate the sequence number of the last current hint segments in the queue - so that we can implement waiting for current segments to be replayed.	2021-04-22 18:45:34 +02:00
Piotr Dulikowski	e48739a6da	db/hints: allow to forcefully update segment list on flush Endpoint hints manager keeps a list of segments to replay. New segments are appended to it lazily - only when a hint flush occurs (hints commitlog instance is re-created) and the list is empty. Because of that, this list cannot be currently used to tell how many segments are on disk. This commit allows to trigger hints flush and forcefully update the list of segments to replay. In later commits, a mechanism will be implemented which will allow to wait until a given number of hint segments is replayed. Triggering a hints flush with segment list update will allow us to properly synchronize and determine up to which segment we need to wait.	2021-04-22 17:34:04 +02:00
Piotr Dulikowski	4f90514247	hints: use token_metadata to tell if node is in the ring Now, instead of looking at the gossiper state to check if the destination node is still in the ring, we are using token_metadata as a source of truth. This results in much simpler code in can_send() as token_metadata has an is_member method which does exactly what we want.	2021-04-01 03:58:29 +02:00
Piotr Dulikowski	e7d9057d0c	hints: slightly reorganize "if" statement in can_send This commit reverses the order of if-else blocks in can_send, which makes it - in my opinion, at least - slightly easier to read.	2021-04-01 03:58:29 +02:00
Piotr Sarna	added53b7d	Merge 'hints: use a soft disk space limit in hints commitlog' from Piotr Dulikowski A recent change to the commitlog (`4082f57`) caused its configurable size limit to be strictly enforced - after reaching the limit, new segments wouldn't be allocated until some of the previous segments are freed. This flow can work for the regular commitlog, however the hints commitlog does not delete the segments itself - instead, hints manager recreates its commitlog every 10 seconds, picks up segments left by the previous instance and deletes each segment manually only after all hints are sent out from a segment. Because of the non-standard flow, it is possible that the hints commitlog fills up and stops accepting more hints. Hints manager uses a relatively low limit for each commitlog instance (128MB divided by shard count), so it's not hard to fill it up. What's worse, hints manager tries to acquire file_update_mutex in exclusive mode before re-creating the commitlog, while hints waiting to be written acquire this lock in shared mode - which causes hints flushing to completely deadlock and no more hints to be admitted to the commitlog. The queue of hints waiting to be admitted grows very quickly and soon all writes which could result in a hint being generated are rejected with OverloadedException. To solve this problem, it is now possible to bring back the soft disk space limit by setting a flag in commitlog's configuration. Tests: - unit(dev) - wrote hints for 15 minutes in order to see if it gets stuck again Fixes #8137 Closes #8206 * github.com:scylladb/scylla: hints_manager: don't use commitlog hard space limit commitlog: add an option to allow going over size limit	2021-03-04 12:24:05 +01:00
Piotr Dulikowski	376da49cf4	hints_manager: don't use commitlog hard space limit This commit disables the hard space limit applied by commitlogs created to store hints. The hard limit causes problems for hints because they use small-sized commitlogs to store hints (128MB, currently). Instead of letting the commitlog delete the segments itself, it recreates the commitlog every 10 seconds and manually deletes old segments after all hints are sent out from them. If the 128MB limit is reached, the hints manager will get stuck. A future which puts hint into commitlog holds a shared lock, and commitlog recreation needs to get an exclusive lock, which results in a deadlock. No more hints will be admitted, and eventually we will start rejecting writes with OverloadedException due to too many hints waiting to be admitted to the commitlog. By disabling the hard limit for hints commitlog, the old behavior is brought back - commitlog becomes more conservative with the space used after going over its size limit, but does not block until some of its segments are deleted.	2021-03-02 16:53:50 +01:00
Benny Halevy	baf5d05631	storage_service: use atomic_vector for lifecycle_subscribers So it can be modified while walked to dispatch subscribed event notifications. In #8143, there is a race between scylla shutdown and notify_down(), causing use-after-free of cql_server. Using an atomic vector instead and futurizing unregister_subscriber allows deleting from _lifecycle_subscribers while walked using atomic_vector::for_each. Fixes #8143 Test: unit(release) DTest: update_cluster_layout_tests:TestUpdateClusterLayout.add_node_with_large_partition4_test(release) materialized_views_test.py:TestMaterializedViews.double_node_failure_during_mv_insert_4_nodes_test(release) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210224164647.561493-2-bhalevy@scylladb.com>	2021-03-01 20:34:42 +02:00
Piotr Dulikowski	220a2ca800	hints_manager: implement change_host_filter Implements a function which is responsible for changing hints manager configuration while it is running. It first starts new endpoint managers for endpoints which weren't allowed by previous filter but are now, and then stops endpoint managers which are rejected by the new filter. The function is blocking and waits until all relevant ep managers are started or stopped.	2020-11-17 10:24:43 +01:00
Piotr Dulikowski	cefe5214ff	config: plug in hints::host_filter object into configuration Uses db::hints::host_filter as the type of hinted_handoff_enabled configuration option. Previously, hinted_handoff_enabled used to be a string option, and it was parsed later in a separate function during startup. The function returned a std::optional<std::unordered_set<sstring>>, whose meaning in the context of hints is rather enigmatic for an observer not familiar with hints. Now, hinted_handoff_enabled has type of db::hints::host_filter, and it is plugged into the config parsing framework, so there is no need for later post-processing.	2020-11-17 10:24:42 +01:00
Piotr Dulikowski	40710677d0	hints: introduce db::hints::directory_initializer Introduces a db::hints::directory_initializer object, which encapsulates the logic of initializing directories for hints (creating/validating directories, segment rebalancing). It will be useful for lazy initialization of hints manager.	2020-11-17 10:15:47 +01:00
Benny Halevy	8bcdf39a18	hints/manager: scan_for_hints_dirs: fix use-after-move This use-after-move was apparently exposed after switching to clang in commit `eb861e68e9`. The directory_entry is required for std::stoi(de.name.c_str()) and later in the catch{} clause. This shows in the node logs as a "Ignore invalid directory" debug log message with an empty name, and caused the hintedhandoff_rebalance_test to fail when hints files aren't rebalanced. Test: unit(dev) DTest: hintedhandoff_additional_test.py:TestHintedHandoff.hintedhandoff_rebalance_test (dev, debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201106172017.823577-1-bhalevy@scylladb.com>	2020-11-09 16:32:54 +01:00
Avi Kivity	cb9a9584ac	db: hints/manager: don't capture structured bindings in lambdas Clang does not yet implement p1091r3, which allows lambdas to capture structured bindings. To accommodate it, don't use structured bindings for variables that are later captured.	2020-10-16 15:24:09 +03:00
Piotr Dulikowski	77a0f1a153	hints: don't read hint files when it's not allowed to send When there are hint files to be sent and the target endpoint is DOWN, end_point_hints_manager works in the following loop: - It reads the first hint file in the queue, - For each hint in the file it decides that it won't be sent because the target endpoint is DOWN, - After realizing that there are some unsent hints, it decides to retry this operation after sleeping 1 second. This causes the first segment to be wholly read over and over again, with 1 second pauses, until the target endpoint becomes UP or leaves the cluster. This causes unnecessary I/O load in the streaming scheduling group. This patch adds a check which prevents end_point_hints_manager from reading the first hint file at all when it is not allowed to send hints. First observed in #6964 Tests: - unit(dev) - hinted handoff dtests Closes #7407	2020-10-12 19:09:57 +03:00
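
Note on commit 27d0d598fd: the restart-position formula quoted in that message can be sketched in isolation as below. This is only an illustration under assumptions, not the code from the repository: `replay_position` is reduced to a plain integer, and `send_context` and `next_restart_position` are hypothetical names introduced here. Only the nested `value_or` expression is taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumption: a replay position is modeled as a plain integer for illustration.
using replay_position = std::uint64_t;

// Hypothetical, trimmed-down stand-in for the per-attempt sending context.
struct send_context {
    std::optional<replay_position> first_failed_rp;   // set if any hint failed
    std::optional<replay_position> last_succeeded_rp; // set if any hint succeeded
};

// Position from which the next replay attempt should restart:
// the first failed position if there is one, otherwise the last succeeded
// position, otherwise the previously recorded value (nothing was attempted).
replay_position next_restart_position(const send_context& ctx,
                                      replay_position last_not_complete_rp) {
    return ctx.first_failed_rp.value_or(
        ctx.last_succeeded_rp.value_or(last_not_complete_rp));
}
```

In this sketch a failed hint always takes precedence over a succeeded one, which matches the message's argument that swapping last_attempted_rp for last_succeeded_rp does not change where a retry starts.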

141 Commits