scylladb

Author	SHA1	Message	Date
Pavel Solodovnikov	c0854a0f62	raft: create system tables only when `raft` experimental feature is set Also introduce a tiny function to return raft-enabled db config for cql testing. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210826091432.279532-1-pa.solodovnikov@scylladb.com>	2021-08-26 12:21:12 +03:00
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	fe479aca1d	reader_permit: add timeout member To replace the timeout parameter passed to flat_mutation_reader methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 14:29:44 +03:00
Pavel Solodovnikov	22794efc22	db: add experimental option for raft Introduce `raft` experimental option. Adjust the tests accordingly to accommodate the new option. It's not enabled by default when providing `--experimental=true` config option and should be requested explicitly via `--experimental-options=raft` config option. Hide the code related to `raft_group_registry` behind the switch. The service object is still constructed but no initialization is performed (`init()` is not called) if the flag is not set. Later, other raft-related things, such as raft schema changes, will also use this flag. Also, don't introduce a corresponding gossiper feature just yet, because again, it should be done after the raft schema changes API contract is stabilized. This will be done in a separate series, probably related to implementing the feature itself. Tests: unit(dev) Ref #9239. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210823121956.167682-1-pa.solodovnikov@scylladb.com>	2021-08-23 17:45:58 +03:00
Benny Halevy	e9aff2426e	everywhere: make deferred actions noexcept Prepare for updating seastar submodule to a change that requires deferred actions to be noexcept (and return void). Test: unit(dev, debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:52 +03:00
Benny Halevy	ef8ec54970	commitlog: segment, segment_manager: mark methods noexcept Prepare for marking deferred_actions noexcept. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:40 +03:00
Benny Halevy	4439e5c132	everywhere: cleanup defer.hh includes Get rid of unused includes of seastar/util/{defer,closeable}.hh and add a few that are missing from source files. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:39 +03:00
Calle Wilund	3633c077be	commitlog/config: Make hard size enforcement false by default + add config opt Refs #9053 Flips default for commitlog disk footprint hard limit enforcement to off due to observed latency stalls with stress runs. Instead adds an optional flag "commitlog_use_hard_size_limit" which can be turned on to in fact do enforce it. Sort of tape and string fix until we can properly tweak the balance between cl & sstable flush rate. Closes #9195	2021-08-15 15:10:27 +03:00
Asias He	97bb2e47ff	storage_service: Enable Repair Based Node Operations (RBNO) by default for replace We decided to enable repair based node operations by default for replace node operations. To do that, a new option --allowed-repair-based-node-ops is added. It lists the node operations that are allowed to enable repair based node operations. The operations can be bootstrap, replace, removenode, decommission and rebuild. By default, --allowed-repair-based-node-ops is set to contain "replace". Note, the existing option --enable-repair-based-node-ops is still in play. It is the global switch to enable or disable the feature. Examples: - To enable bootstrap and replace node ops: ``` scylla --enable-repair-based-node-ops true --allowed-repair-based-node-ops replace,bootstrap ``` - To disable any repair based node ops: ``` scylla --enable-repair-based-node-ops false ``` Closes #9197	2021-08-15 13:30:46 +03:00
Piotr Sarna	84876a165b	db,schema_tables: add handling user-defined aggregates Aggregates are propagated, created and dropped very similarly to user-defined functions - a set of helper functions for aggregates are added based on the UDF implementation.	2021-08-13 11:14:11 +02:00
Piotr Sarna	58196e8ea6	db,view: avoid ignoring failed future in background view updates The code for handling background view updates used to propagate exceptions unconditionally, which leads to "exceptional future ignored" warnings if the update was put to background. From now on, the exception is only propagated if its future is actually waited on. Fixes #6187 Tested manually, the warning was not observed after the patch Closes #9179	2021-08-12 17:32:35 +03:00
Nadav Har'El	49ca1f86b2	Merge 'hints: error injection for pausing hint replay' from Piotr Dulikowski Adds a `hinted_handoff_pause_hint_replay` error injection point. When enabled, hint replay logic behaves as if it is run, but it gets stuck in a loop and no hints are actually sent until the point is disabled again. This injection point will be useful in dtests - it will simulate infinitely slow hint replay and will make it possible to test how some operations behave while hint replay logic is running. The first intended use case of this injection point is testing the HTTP API for waiting for hints (#8728). Refs: #6649 Closes #8801 * github.com:scylladb/scylla: hints: fix indentation after previous patch hints: error injection for pausing hint replay hints: coroutinize lambda inside send_one_file	2021-08-11 11:42:29 +03:00
Piotr Dulikowski	f2e1339f38	hints: use an abort_source with sleep_abortable in flush+send loop Each hint sender runs an asynchronous loop which tries to flush and then send hints. Between each attempt, it sleeps at most 10 seconds using sleep_abortable. However, an overload of sleep_abortable is used which does not take an abort_source - it should abort the sleep in case Seastar handles a SIGINT or SIGTERM signal. However, in order for that to work, the application must not prevent default handling of those signals in Seastar - but Scylla explicitly does it by disabling the `auto_handle_sigint_sigterm` option in reactor config. As a result, those sleeps are never aborted, and - because we wait for the async loops to stop - they can delay shutdown by at most 10 seconds. To fix that, an abort_source is added to the hints sender, and the abort_source is triggered when the corresponding sender is requested to stop. Fixes: #9176 Closes #9177	2021-08-11 10:32:53 +02:00
Piotr Dulikowski	68cac2eab7	hints: fix indentation after previous patch	2021-08-09 16:16:14 +02:00
Piotr Dulikowski	20cbe7fa2f	hints: error injection for pausing hint replay Adds a `hinted_handoff_pause_hint_replay` error injection point. When enabled, hint replay logic behaves as if it is run, but it gets stuck in a loop and no hints are actually sent until the point is disabled again. This injection point will be useful in dtests - it will simulate infinitely slow hint replay and will make it possible to test how some operations behave while hint replay logic is running. The first intended use case of this injection point is testing the HTTP API for waiting for hints (#8728). Refs: #6649	2021-08-09 16:16:14 +02:00
Piotr Dulikowski	29993f7745	hints: coroutinize lambda inside send_one_file Converts the lambda invoked for every commitlog entry in a hints file into a coroutine.	2021-08-09 16:16:14 +02:00
Piotr Dulikowski	d41d39bbcd	hints: add functions for creating and waiting for sync points Adds functions which allow to create per-shard sync points and wait for them.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	e18b29765a	hints: add hint sync point structure Adds a sync_point structure. A sync point is a (possibly incomplete) mapping from hint queues to a replay position in it. Users will be able to create sync points consisting of the last written positions of some hint queues, so then they can wait until hint replay in all of the queues reach that point. The sync point supports serialization - first it is serialized with the help of IDL to a binary form, and then converted to a hexadecimal string. Deserialization is also possible.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	70df9973f3	hints: make it possible to wait until hints are replayed Adds necessary infrastructure which allows, for a given endpoint manager, to wait until hints are replayed up to a specified position. An abort source must be specified which, if triggered, cancels waiting for hint replay. If the endpoint manager is stopped, current waiters are dismissed with an exception.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	93f244426d	hints: track the RP of the last replayed position Keeps track of a position which serves as an upper bound for positions of already replayed hints - i.e. all hints with replay positions strictly lower than it are considered replayed. In order to accurately track this bound during hint replay, a std::map is introduced which contains positions of hints which are currently being sent.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	03e2e671cd	hints: track the RP of the last written hint The position of the last written hint is now tracked by the endpoint hints manager. When manager is constructed and no hints are replayed yet, the last written hint position is initialized to the beginning of a fake segment with ID corresponding to the current number of milliseconds since the epoch. This choice makes sure that, in case a new hint sync point is created before any hints are written, the position recorded for that hint queue will be larger than all replay positions in segments currently stored on disk.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	27d0d598fd	hints: change last_attempted_rp to last_succeeded_rp Instead of tracking the last position for which hint sending is attempted, the last successfully replayed position is tracked. The previous variable was used to calculate the position from which hint replay should restart in case of an error, in the following way: _last_not_complete_rp = ctx_ptr->first_failed_rp.value_or( ctx_ptr->last_attempted_rp.value_or(_last_not_complete_rp)); Now, this formula uses the last_succeeded_rp in place of last_attempted_rp. This change does not have an effect on the choice of the starting position of the next retry: - If the hint at `last_attempted_rp` has succeeded, in the new algorithm the same position will be recorded in `last_succeeded_rp`, and the formula will yield the same result. - If the hint at `last_attempted_rp` has failed, it will be accounted into `first_failed_rp`, so the formula will yield the same result. The motivation for this change is that in the next commits of this PR we will start tracking the position of the last replayed hint per hint queue, and the meaning of the new variable makes it more useful - when there are no failed hints in the hint sending attempt, last_succeeded_rp gives us information that hints _up to this position_ were replayed; the last_attempted_rp variable can only tell us that hints _before that position_ were replayed successfully.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	08a7d79ffc	hints: rearrange error handling logic for hint sending Instead of calling the `on_hint_send_failure` method inside the hint sending task in places where an error occurs, we now let the exceptions be returned and handle them inside a single `then_wrapped` attached to the hint sending task. Apart from the `then_wrapped`, there is one more place which calls `on_hint_send_failure` - in the exception handler for the future which spawns the asynchronous hint sending task. It needs to be kept separate because it is a part of a separate task.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	45b04c94e0	hints: sort segments by ID, divide into foreign and local Endpoint hints manager keeps a commitlog instance which is used to write hints into new segments. This instance is re-created every 10 seconds, which causes the previous instance to leave its segments on disk. On the other hand, hints sender keeps a list of segments to replay which is updated only after it becomes empty. The list is repopulated with segments returned by the commitlog::get_segments_to_replay() method which does not specify the order of the segments returned. As a preparation for the upcoming hint sync points feature, this commit changes the order in which segments are replayed: - First, segments written by other shards are replayed. Such segments may appear in the queue because of segment rebalancing which is done at startup. The purpose of replaying "foreign" segments first is that they are problematic for hint sync points. For each hint queue, a hint sync point encodes a replay position of the last written hint on the local shard. Accounting foreign segments precisely would make the implementation more complicated. To make things simpler, waiting for sync points will always make sure that all foreign segments are replayed. This might sometimes cause more hints to be waited on than necessary if a restart occurs in the meantime. - Segments written by the local shard are replayed later, in order of their IDs. This makes sure that local hints are replayed in the order they were written to segments, and will make it possible to use replay positions to track progress of hint replay.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	f83699bb7c	Revert "db/hints: allow to forcefully update segment list on flush" This reverts commit `e48739a6da`. This commit removes the functionality from endpoint hints manager which allowed to flush hints immediately and forcefully update the list of segments to replay. The new implementation of waiting for hints will be based on replay positions returned by the commitlog API and it won't be necessary to forcefully update the segment list when creating a sync point.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	9c1d4e7e6c	Revert "db/hints: add a metric for counting processed files" This reverts commit `5a49fe74bb`. This commit removes a metric which tracks how many segments were replayed during current runtime. It was necessary for current "wait for hints" mechanism which is being replaced with a different one - therefore we can remove the metric.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	3b851a5ebd	Revert "db/hints: make it possible to wait until current hints are sent" This reverts commit `427bbf6d86`. This commit removes the infrastructure which allows to wait until current hints are replayed in a given hint queue. It will be replaced with a different mechanism in later commits.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	4a35d138f6	Revert "storage_proxy: add functions for syncing with hints queue" This reverts commit `244738b0d5`. This commit removes create_hint_queue_sync_point and check_hint_queue_sync_point functions from storage_proxy, which were used to wait until local hints are sent out to particular nodes. Similar methods will be reintroduced later in this PR, with a completely different implementation.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	0d74dee683	Revert "messaging_service: add verbs for hint sync points" This reverts commit `82c419870a`. This commit removes the HINT_SYNC_POINT_CREATE and HINT_SYNC_POINT_CHECK rpc verbs. The upcoming HTTP API for waiting for hint replay will be restricted to waiting for hints on the node handling the request, so there is no need for new verbs.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	ff453d80ff	Revert "config: add wait_for_hint_replay_before_repair option" This reverts commit `86d831b319`. This commit removes the wait_for_hints_before_repair option. Because a previous commit in this series removes the logic from repair which caused it to wait for hints to be replayed, this option is now useless. We can safely remove this option because it is not present in any release yet.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	e3c32c897a	Revert "hints: dismiss segment waiters when hint queue can't send" This reverts commit `9d68824327`. First, we are reverting existing infrastructure for waiting for hints in order to replace it with a different one, therefore this commit needs to be reverted as well. Second, errors during hint replay can occur naturally and don't necessarily indicate that no progress can be made - for example, the target node is heavily loaded and some hints time out. The "waiting for hints" operation becomes a user-issued command, so it's not as vital to ensure liveness.	2021-08-09 09:06:23 +02:00
Avi Kivity	3b5e312800	db: schema_tables: clean up read_schema_partition_for_keyspace() coroutine captures read_schema_partition_for_keyspace() copies some parameters to capture them in a coroutine, but the same can be achieved more cleanly by changing the reference parameters to value parameters, so do that. Test: unit (dev) Closes #9154	2021-08-08 12:55:10 +03:00
Asias He	6350a19f73	compaction: Move compaction_strategy.hh to compaction dir The top dir is a mess. Move compaction_strategy.hh and compaction_strategy_type.hh to the new home.	2021-08-07 08:06:37 +08:00
Avi Kivity	885ca2158e	db: schema_tables: reindent Following conversion to coroutines in `fc91e90c59`, remove extra indents and braces left to make the change clearer. One variable had to be renamed since without the braces it duplicated another variable in the same block. Test: unit (dev) Closes #9125	2021-08-02 22:36:57 +02:00
Nadav Har'El	fc91e90c59	Merge 'db: schema_tables: coroutinize' from Avi Kivity schema_tables is quite hairy, but can be easily simplified with coroutines. In addition to switching future-returning functions to coroutines, we also switch Seastar threads to coroutines. This is less of a clear-cut win; the motivation is to reduce the chances of someone calling a function that expects to run in a thread from a non-thread context. This sometimes works by accident, but when it doesn't, it's pretty bad. So a uniform calling convention has some benefit. I left the extra indents in, since the indent-fixing patch is hard to rebase in case a rebase is needed. I will follow up with an indent fix post merge. Test: unit (dev, debug, release) Closes #9118 * github.com:scylladb/scylla: db: schema_tables: drop now redundant #includes db: schema_tables: coroutinize drop_column_mapping() db: schema_tables: coroutinize column_mapping_exists() db: schema_tables: coroutinize get_column_mapping() db: schema_tables: coroutinize read_table_mutations() db: schema_tables: coroutinize create_views_from_schema_partition() db: schema_tables: coroutinize create_views_from_table_row() db: schema_tables: unpeel lw_shared_ptr in create_Tables_from_tables_partition() db: schema_tables: coroutinize create_tables_from_tables_partition() db: schema_tables: coroutinize create_table_from_name() db: schema_tables: coroutinize read_table_mutations() db: schema_tables: coroutinize merge_keyspaces() db: schema_tables: coroutinize do_merge_schema() db: schema_tables: futurize and coroutinize merge_functions() db: schema_tables: futurize and coroutinize user_types_to_drop::drop db: schema_tables: futurize and coroutinize merge_types() db: schema_tables: futurize and coroutinize merge_tables_and_views() db: schema_tables: coroutinize store_column_mapping() db: schema_tables: futurize and coroutinize read_tables_for_keyspaces() db: schema_tables: coroutinize read_table_names_of_keyspace() db: schema_tables: coroutinize recalculate_schema_version() db: schema_tables: coroutinize merge_schema() db: schema_tables: introduce and use with_merge_lock() db: schema_tables: coroutinize update_schema_version_and_announce() db: schema_tables: coroutinize read_keyspace_mutation() db: schema_tables: coroutinize read_schema_partition_for_table() db: schema_tables: coroutinize read_schema_partition_for_keyspace() db: schema_tables: coroutinize query_partition_mutation() db: schema_tables: coroutinize read_schema_for_keyspaces() db: schema_tables: coroutinize convert_schema_to_mutations() db: schema_tables: coroutinize calculate_schema_digest() db: schema_tables: coroutinize save_system_schema()	2021-08-02 13:43:53 +03:00
Tomasz Grabiec	c3ada1a145	Merge "count row (sstables/row cache/memtables) and range (memtables) tombstone reads" from Michael Fixes #7749.	2021-08-01 23:13:18 +02:00
Avi Kivity	ca59754e68	db: schema_tables: drop now redundant #includes	2021-08-01 20:13:15 +03:00
Avi Kivity	40fdbf9558	db: schema_tables: coroutinize drop_column_mapping()	2021-08-01 20:13:15 +03:00
Avi Kivity	7d46300af2	db: schema_tables: coroutinize column_mapping_exists()	2021-08-01 20:13:15 +03:00
Avi Kivity	74b2200f4d	db: schema_tables: coroutinize get_column_mapping()	2021-08-01 20:13:15 +03:00
Avi Kivity	f19ca7aaaa	db: schema_tables: coroutinize read_table_mutations()	2021-08-01 20:13:15 +03:00
Avi Kivity	81a2be17b6	db: schema_tables: coroutinize create_views_from_schema_partition()	2021-08-01 20:13:15 +03:00
Avi Kivity	15f2fd2a23	db: schema_tables: coroutinize create_views_from_table_row()	2021-08-01 20:13:15 +03:00
Avi Kivity	0843d441ff	db: schema_tables: unpeel lw_shared_ptr in create_Tables_from_tables_partition() The tables local is a lw_shared_ptr which is created and then dereferenced before returning. It can be unpeeled to the pointed-to type, resulting in one less allocation.	2021-08-01 20:13:15 +03:00
Avi Kivity	66054d24c4	db: schema_tables: coroutinize create_tables_from_tables_partition()	2021-08-01 20:13:15 +03:00
Avi Kivity	82ba3c5f4a	db: schema_tables: coroutinize create_table_from_name()	2021-08-01 20:13:15 +03:00
Avi Kivity	862f491605	db: schema_tables: coroutinize read_table_mutations()	2021-08-01 20:13:15 +03:00
Avi Kivity	91c1a29808	db: schema_tables: coroutinize merge_keyspaces()	2021-08-01 20:13:15 +03:00
Avi Kivity	78fc05922b	db: schema_tables: coroutinize do_merge_schema() It is now using an internal thread, so unpeel it and replace future::get() with co_await.	2021-08-01 20:13:15 +03:00
Avi Kivity	9680d9e76c	db: schema_tables: futurize and coroutinize merge_functions() Right now, merge_functions() expects to be called in a thread. Remove that requirement by converting it into a coroutine and returning a future. De-threading helps reduce errors where something expects to be called in a thread, but isn't.	2021-08-01 20:13:15 +03:00
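
The replay order described in commit 45b04c94e0 (foreign segments first, then local segments by ID) can be illustrated with a small standalone sketch. This is not Scylla's code: the `segment` struct, its fields, and `replay_order` are hypothetical names chosen for illustration, and the sketch assumes each segment records the shard that wrote it and a numeric ID.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a hint segment descriptor; not Scylla's type.
struct segment {
    unsigned shard;    // shard that wrote the segment
    std::uint64_t id;  // segment ID, assumed to grow in write order on one shard
};

// Returns the segments in replay order: all foreign segments first
// (keeping their original relative order), then local ones sorted by ID.
std::vector<segment> replay_order(std::vector<segment> segs, unsigned local_shard) {
    // Move segments written by other shards to the front.
    auto first_local = std::stable_partition(segs.begin(), segs.end(),
        [local_shard](const segment& s) { return s.shard != local_shard; });
    // Replay local segments in the order they were written.
    std::sort(first_local, segs.end(),
        [](const segment& a, const segment& b) { return a.id < b.id; });
    return segs;
}
```

Sorting only the local tail reflects the commit message: the order among foreign segments is left unspecified there, so the sketch simply preserves it.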


2245 Commits
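
The restart-position formula quoted in commit 27d0d598fd can be read as a chain of `std::optional::value_or` fallbacks. The sketch below is a simplified model under stated assumptions, not the actual hints manager code: `replay_position` is reduced to an integer, and `send_context` and `next_restart_position` are hypothetical names.

```cpp
#include <cstdint>
#include <optional>

// Simplified, hypothetical replay position; the real type is richer.
using replay_position = std::uint64_t;

// Hypothetical per-attempt state, mirroring the fields named in the commit message.
struct send_context {
    std::optional<replay_position> first_failed_rp;
    std::optional<replay_position> last_succeeded_rp;
};

// Picks where the next replay attempt starts:
//  1. the first failed position, if any hint failed;
//  2. otherwise the last successfully replayed position, if any;
//  3. otherwise the previous restart position, unchanged.
replay_position next_restart_position(const send_context& ctx,
                                      replay_position last_not_complete_rp) {
    return ctx.first_failed_rp.value_or(
        ctx.last_succeeded_rp.value_or(last_not_complete_rp));
}
```

For example, with `first_failed_rp` unset and `last_succeeded_rp` set to 42, the function returns 42; once `first_failed_rp` is set, that value takes precedence regardless of `last_succeeded_rp`.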