scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-04 14:03:06 +00:00

Author	SHA1	Message	Date
Avi Kivity	cc8fc73761	Merge 'hints: fix bugs in HTTP API for waiting for hints found by running dtest in debug mode' from Piotr Dulikowski This series of commits fixes a small number of bugs in the current implementation of the HTTP API which allows waiting until hints are replayed, found by running the `hintedhandoff_sync_point_api_test` dtest in debug mode. Refs: #9320 Closes #9346 * github.com:scylladb/scylla: commitlog: make it possible to provide base segment ID hints: fill up missing shards with zeros in decoded sync points hints: propagate abort signal correctly in wait_for_sync_point hints: fix use-after-free when dismissing replay waiters	2021-09-15 12:55:54 +03:00
Avi Kivity	daf028210b	build: enable -Winconsistent-missing-override warning This warning can catch a virtual function that thinks it overrides another, but doesn't, because the two functions have different signatures. This isn't very likely since most of our virtual functions override pure virtuals, but it's still worth having. Enable the warning and fix numerous violations. Closes #9347	2021-09-15 12:55:54 +03:00
Piotr Dulikowski	91163fcfa5	commitlog: make it possible to provide base segment ID Adds a configuration option to the commitlog: base_segment_id. When provided, the commitlog uses this ID as a base of its segment IDs instead of calculating it based on the number of milliseconds between the epoch and boot time. This is needed in order for the feature which allows waiting for hints to be replayed to work - it relies on the replay positions monotonically increasing. Each endpoint manager periodically re-creates its commitlog instance - if it is re-created when there are no segments on disk, currently it will choose the number of milliseconds between the epoch and boot time, which might result in segments being generated with the same IDs as some segments previously created and deleted during the same runtime.	2021-09-15 11:04:34 +02:00
Piotr Dulikowski	486421c58c	hints: fill up missing shards with zeros in decoded sync points Between encoding and decoding of a sync point, the node might have been restarted and resharded with increased shard count. During resharding, existing hints segments might have been moved to new shards. Because of that, we need to make sure that we wait for foreign segments to be replayed on the new shards too. This commit modifies the sync point decoding logic so that it places a zero replay position for new shards. Additionally, an (incorrect) shard count check is removed from `storage_proxy::wait_for_hint_sync_point` because now the shard count in decoded sync point is guaranteed to be not less than the node's current shard count.	2021-09-15 11:04:34 +02:00
Piotr Dulikowski	77f2448b2c	hints: propagate abort signal correctly in wait_for_sync_point When `manager::wait_for_sync_point` is called, the abort source from the arguments (`as`) might have already been triggered. In such case, the subscription which was supposed to trigger the `local_as` abort source won't be run, and the code will wait indefinitely for hints to be replayed instead of checking the replay status and returning immediately. This commit fixes the problem by manually triggering `local_as` if `as` has been triggered.	2021-09-14 14:27:01 +02:00
Piotr Dulikowski	8e29ebc5d5	hints: fix use-after-free when dismissing replay waiters When the promise waited on in the `wait_until_hints_are_replayed_up_to` function is resolved, a continuation runs which prints a log line with information about this event. The continuation captures a pointer to the hints sender and uses it to get information about the endpoint whose hints are waited for. However, at this point the sender might have been deleted - for example, when the node is being stopped and everybody waiting for hints is dismissed. This commit fixes the use-after-free by getting all necessary information while the sender is guaranteed to be alive and captures it in the continuation's capture list.	2021-09-14 13:46:16 +02:00
Avi Kivity	3f2c680b70	Merge 'Add initial support for WebAssembly in user-defined functions (UDF)' from Piotr Sarna This series adds very basic support for WebAssembly-based user-defined functions. This series comes with a basic set of tests which were used to designate a minimal goal for this initial implementation. Example usage: ```cql CREATE FUNCTION ks.fibonacci (str text) RETURNS NULL ON NULL INPUT RETURNS boolean LANGUAGE xwasm AS ' (module (func $fibonacci (param $n i32) (result i32) (if (i32.lt_s (local.get $n) (i32.const 2)) (return (local.get $n)) ) (i32.add (call $fibonacci (i32.sub (local.get $n) (i32.const 1))) (call $fibonacci (i32.sub (local.get $n) (i32.const 2))) ) ) (export "fibonacci" (func $fibonacci)) ) ' ``` Note that the language is currently called "xwasm" as in "experimental wasm", because its interface is still subject to change in the future. Closes #9108 * github.com:scylladb/scylla: docs: add a WebAssembly entry cql-pytest: add wasm-based tests for user-defined functions main: add wasm engine instantiation treewide: add initial WebAssembly support to UDF wasm: add initial WebAssembly runtime implementation db: add wasm_engine pointer to database lang: add wasm_engine service import wasmtime.hh lua: move to lang/ directory cql3: generalize user-defined functions for more languages	2021-09-14 11:34:20 +03:00
Avi Kivity	e9ae9279e8	system_keyspace: reindent after conversion to class Conversion to class left indentation in ruins, but that can be easily fixed. 'git diff -w' reports no changes. Closes #9339	2021-09-14 08:49:24 +03:00
Piotr Sarna	62e8c89a9c	treewide: add initial WebAssembly support to UDF This commit adds very basic support for user-defined functions coded in wasm. The support is very limited (only a few types work) and was not tested against reactor stalls and performance in general.	2021-09-13 19:03:58 +02:00
Avi Kivity	e70b9d4835	system_keyspace: convert from namespace to class All the namespace scope functions in system_keyspace have no place to store context, so they must store their context in global variables. This prevents conversion of those global variables to constructor-provided dependencies. Take the first step towards providing a place to store the context by converting system_keyspace to a class. All the functions are static, so no context is yet available, but we can de-static-ify them incrementally in the future and store the context in class members. Indentation is a mess, but can be easily fixed later.	2021-09-13 15:14:14 +03:00
Avi Kivity	115d6d8d4c	system_keyspace: prepare forward-declared members In anticipation of making system_keyspace a class instead of a namespace, rename any member that is currently forward-declared, since one can't forward-declare a class member. Each member is taken out of the system_keyspace namespace and gains a system_keyspace prefix. Aliases are added to reduce code churn. The result isn't lovely, but can be adjusted later.	2021-09-13 15:11:26 +03:00
Avi Kivity	c6ce81d6a0	system_keyspace: rearrange legacy subnamespace Merge two fragments together, in anticipation of making 'legacy' a struct instead of a namespace (when system_keyspace is a class, we can't nest a namespace inside it).	2021-09-13 15:10:15 +03:00
Avi Kivity	6d379ae6f9	system_keyspace: remove outdated java code This code has been rewritten and not removed, or is not needed. Remove it to reduce clutter.	2021-09-13 15:08:57 +03:00
Piotr Sarna	4e952df470	lua: move to lang/ directory Support for more languages is coming, so let's group them in a separate directory.	2021-09-13 11:01:33 +02:00
Piotr Sarna	46c6603fe0	cql3: generalize user-defined functions for more languages In order to support more languages than just Lua in the future, Lua-specific configuration is now extracted to a separate structure.	2021-09-13 11:01:33 +02:00
Avi Kivity	c5f52f9d97	schema_tables: don't flush in tests Flushing schema tables is important for crash recovery (without a flush, we might have sstables using a new schema before the commitlog entry noting the schema change has been replayed), but not important for tests that do not test crash recovery. Avoiding those flushes reduces system, user, and real time on tests running on a consumer-level SSD. before: real 8m51.347s user 7m5.743s sys 5m11.185s after: real 7m4.249s user 5m14.085s sys 2m11.197s Note real time is higher than user+sys time divided by the number of hardware threads, indicating that there is still idle time due to the disk flushing, so more work is needed. Closes #9319	2021-09-12 11:32:13 +03:00
Tomasz Grabiec	83113d8661	Merge "raft: new schema for storing raft snapshots" from Pavel Solodovnikov Previously, the layout for storing raft snapshot descriptors contained a `config` field, which had `blob` data type. That means `raft::configuration` for the snapshot was serialized as a whole in binary form. It's convenient to implement and is the most compact form of representing the data, but: 1. Hard to debug due to the need to de-serialize the data. 2. Plants a time bomb wrt. changing data layout and also the documentation in the future. Remove the `config` field from `system.raft_snapshots` and extract it to a separate `system.raft_config` table to store the data in exploded form. Also, modify the schema of `system.raft_snapshots` table in the following way: add a `server_id` field as a part of composite partition key ((group_id, server_id)) to be able to start multiple raft servers belonging to one raft group on the same scylla node. Rename `id` field in `raft_snapshots` to `snapshot_id` so it's self-documenting. Remove `snapshot_id` from clustering key since a given server can have only one snapshot installed at a time. Note that the `raft::server_address` structure contains an opaque `info` member, which is `bytes`, but in the `raft_config` table we use `ip_addr inet` field, instead. We always know that the corresponding member field is going to contain an IP address (either v4 or v6) of a given raft server.
So, now the snapshots schema looks like this: CREATE TABLE raft_snapshots ( group_id timeuuid, server_id uuid, snapshot_id uuid, idx int, term int, -- no `config` field here, moved to `raft_config` table PRIMARY KEY ((group_id, server_id)) ) CREATE TABLE raft_config ( group_id timeuuid, my_server_id uuid, server_id uuid, disposition text, -- can be either 'CURRENT' or 'PREVIOUS' can_vote bool, ip_addr inet, PRIMARY KEY ((group_id, my_server_id), server_id, disposition) ); This way it's much easier to extend the schema with new fields, very easy to debug and inspect via CQL, and it's much more descriptive in terms of self-documentation. Tests: unit(dev) * manmanson/raft_snapshots_new_schema_v2: test: adjust `schema_change_test` to include new `system.raft_config` table raft: new schema for storing raft snapshots raft: pass server id to `raft_sys_table_storage` instance	2021-09-10 20:41:59 +02:00
Avi Kivity	16116ac631	interval: constrain comparator parameters The interval template member functions mostly accept tri-comparators but a few functions accept less-comparators. To reduce the chance of error, and to provide better error messages, constrain comparator parameters to the expected signature. In one case (db/size_estimates_virtual_reader.cc) the caller had to be adjusted. The comparator supported comparisons of the interval value type against other types, but not against itself. To simplify things, we add that signature too, even though it will never be called. Closes #9291	2021-09-10 16:43:16 +02:00
Avi Kivity	c1028de22a	Merge 'Introduce native reversed format' from Botond Dénes We define the native reverse format as a reversed mutation fragment stream that is identical to one that would be emitted by a table with the same schema but with reversed clustering order. The main difference to the current format is how range tombstones are handled: instead of looking at their start or end bound depending on the order, we always use them as-usual and the reversing reader swaps their bounds to facilitate this. This allows us to treat reversed streams completely transparently: just pass a reversed schema along with them and all the reader, compacting and result building code is happily ignorant about the fact that it is a reversed stream. This series is the first step towards implementing efficient reverse reads. It allows us to remove all the special casing we have in various places for reverse reads and thus treat reverse streams transparently in all the middle layers. The only layers that have to know about the actual reversing are mutation sources proper. The plan is that when reading in reverse we create a reversed schema in the top layer then pass this down as the schema for the read. There are two layers that will need to act on this reversed schema: * The layer sitting on top of the first layer which still can't handle reversed streams, this layer will create a reversed reader to handle the transition. * The mutation source proper: which will obtain the underlying schema and will emit the data in reverse order. Once all the mutation sources are able to handle reverse reads, we can get rid of the reverse reader entirely.
Refs: #1413 Tests: unit(dev) TODO: * v2 * more testing Also on: https://github.com/denesb/scylla.git reverse-reads/v3 Changelog v3: * Drop the entire schema transformation mechanism; * Drop reversing from `schema_builder()`; * Don't keep any information about whether the schema is reversed or not in the schema itself, instead make reversing deterministic w.r.t. schema version, such that: `s.version() == s.make_reversed().make_reversed().version()`; * Re-reverse range tombstones in `streaming_mutation_freezer`, so `reconcilable_results` sent to the coordinator during read repair still use the old reverse format; v2: * Add `data_type reversed(data_type)`; * Add `bound_kind reverse_kind(bound_kind)`; * Make new API safer to use: - `schema::underlying_type()`: return this when unengaged; - `schema::make_transformed()`: noop when applying the same transformation again; * Generalize reversed into transformation. Add support to transferring to remote nodes and shards by way of making `schema_tables` aware of the transformation; * Use reverse schema everywhere in reverse reader; Closes #9184 * github.com:scylladb/scylla: range_tombstone_accumulator: drop _reversed flag test/boost/mutation_test: add test for mutation::consume() monotonicity test/boost/flat_mutation_reader_test: more reversed reader tests flat_mutation_reader: make_reversing_reader(): implement fast_forward_to(partition_range) flat_mutation_reader: make_reversing_reader(): take ownership of the reader test/lib/mutation_source_test: add consistent log to all methods mutation: introduce reverse() mutation_rebuilder: make it standalone mutation: make copy constructor compatible with mutation_opt treewide: switch to native reversed format for reverse reads mutation: consume(): add native reverse order mutation: consume(): don't include dummy rows query: add slice reversing functions partition_slice_builder: add range mutating methods partition_slice_builder: add constructor with slice query: specific_ranges: add 
non-const ranges accessor range_tombstone: add reverse() clustering_bounds_comparator: add reverse_kind() schema: introduce make_reversed() schema: add a transforming copy constructor utils: UUID_gen: introduce negate() types: add reversed(data_type) docs: design-notes: add reverse-reads.md	2021-09-09 15:50:22 +03:00
Botond Dénes	f02632aeb0	range_tombstone_accumulator: drop _reversed flag	2021-09-09 15:42:15 +03:00
Piotr Sarna	5d7c765422	db,view: split stopping view builder to drain+stop In order to be able to avoid a deadlock when CQL server cannot be started, the view builder shutdown procedure is now split to two parts: drain and stop. Drain is performed before storage proxy shutdown, but stop() will be called even before drain is scheduled. The deadlock is as follows: - view builder creates a reader permit in order to be able to read from system tables - CQL server fails to start, shutdown procedure begins - view builder stop() is not called (because it was not scheduled yet), so it holds onto its reader permit - database shutdown procedure waits for all permits to be destroyed, and it hangs indefinitely because view builder keeps holding its permit.	2021-09-08 10:52:40 +02:00
Avi Kivity	705f957425	Merge "Generalize TLS creds builder configuration" from Pavel E " There are 4 places out there that do the same steps parsing "client_\|server_encryption_options" and configuring the seastar::tls::creds_builder with the values (messaging, redis, alternator and transport). Also to make redis and transport look slimmer main() cleans the client_encryption_options by ... parsing it too. This set introduces a (coroutinized) helper to configure the creds_builder with map<string, string> and removes the options beautification from main. tests: unit(dev), dtest.internode_ssl_test(dev) " * 'br-generalize-tls-creds-builder-configuration' of https://github.com/xemul/scylla: code: Generalize tls::credentials_builder configuration transport, redis: Do not assume fixed encryption options messaging: Move encryption options parsing to ms main: Open-code internode encryption misconfig warning main, config: Move options parsing helpers	2021-09-01 14:19:19 +03:00
Avi Kivity	8b59e3a0b1	Merge ' cql3: Demand ALLOW FILTERING for unlimited, sliced partitions ' from Dejan Mircevski Return the pre- `6773563d3` behavior of demanding ALLOW FILTERING when partition slice is requested but on potentially unlimited number of partitions. Put it on a flag defaulting to "off" for now. Fixes #7608; see comments there for justification. Tests: unit (debug, dev), dtest (cql_additional_test, paging_test) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #9126 * github.com:scylladb/scylla: cql3: Demand ALLOW FILTERING for unlimited, sliced partitions cql3: Track warnings in prepared_statement test: Use ALLOW FILTERING more strictly cql3: Add statement_restrictions::to_string	2021-08-31 18:05:26 +03:00
Dejan Mircevski	2f28f68e84	cql3: Demand ALLOW FILTERING for unlimited, sliced partitions When a query requests a partition slice but doesn't limit the number of partitions, require that it also says ALLOW FILTERING. Although do_filter() isn't invoked for such queries, the performance can still be unexpectedly slow, and we want to signal that to the user by demanding they explicitly say ALLOW FILTERING. Because we now reject queries that worked fine before, existing applications can break. Therefore, the behavior is controlled by a flag currently defaulting to off. We will default to "on" in the next Scylla version. Fixes #7608; see comments there for justification. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2021-08-31 10:45:41 -04:00
Pavel Solodovnikov	8d3c0ee9b6	raft: new schema for storing raft snapshots Previously, the layout for storing raft snapshot descriptors contained a `config` field, which had `blob` data type. That means `raft::configuration` for the snapshot was serialized as a whole in binary form. It's convenient to implement and is the most compact form of representing the data, but: 1. Hard to debug due to the need to de-serialize the data. 2. Plants a time bomb wrt. changing data layout and also the documentation in the future. Remove the `config` field from `system.raft_snapshots` and extract it to a separate `system.raft_config` table to store the data in exploded form. Also, modify the schema of `system.raft_snapshots` table in the following way: add a `server_id` field as a part of composite partition key ((group_id, server_id)) to be able to start multiple raft servers belonging to one raft group on the same scylla node. Rename `id` field in `raft_snapshots` to `snapshot_id` so it's self-documenting. Remove `snapshot_id` from clustering key since a given server can have only one snapshot installed at a time. Note that the `raft::server_address` structure contains an opaque `info` member, which is `bytes`, but in the `raft_config` table we use `ip_addr inet` field, instead. We always know that the corresponding member field is going to contain an IP address (either v4 or v6) of a given raft server.
So, now the snapshots schema looks like this: CREATE TABLE raft_snapshots ( group_id timeuuid, server_id uuid, snapshot_id uuid, idx int, term int, -- no `config` field here, moved to `raft_config` table PRIMARY KEY ((group_id, server_id)) ) CREATE TABLE raft_config ( group_id timeuuid, my_server_id uuid, server_id uuid, disposition text, -- can be either 'CURRENT' or 'PREVIOUS' can_vote bool, ip_addr inet, PRIMARY KEY ((group_id, my_server_id), server_id, disposition) ); This way it's much easier to extend the schema with new fields, very easy to debug and inspect via CQL, and it's much more descriptive in terms of self-documentation. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-08-27 09:24:46 +03:00
Pavel Solodovnikov	c0854a0f62	raft: create system tables only when `raft` experimental feature is set Also introduce a tiny function to return raft-enabled db config for cql testing. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210826091432.279532-1-pa.solodovnikov@scylladb.com>	2021-08-26 12:21:12 +03:00
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	fe479aca1d	reader_permit: add timeout member To replace the timeout parameter passed to flat_mutation_reader methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 14:29:44 +03:00
Pavel Solodovnikov	22794efc22	db: add experimental option for raft Introduce `raft` experimental option. Adjust the tests accordingly to accommodate the new option. It's not enabled by default when providing `--experimental=true` config option and should be requested explicitly via `--experimental-options=raft` config option. Hide the code related to `raft_group_registry` behind the switch. The service object is still constructed but no initialization is performed (`init()` is not called) if the flag is not set. Later, other raft-related things, such as raft schema changes, will also use this flag. Also, don't introduce a corresponding gossiper feature just yet, because again, it should be done after the raft schema changes API contract is stabilized. This will be done in a separate series, probably related to implementing the feature itself. Tests: unit(dev) Ref #9239. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210823121956.167682-1-pa.solodovnikov@scylladb.com>	2021-08-23 17:45:58 +03:00
Benny Halevy	e9aff2426e	everywhere: make deferred actions noexcept Prepare for updating seastar submodule to a change that requires deferred actions to be noexcept (and return void). Test: unit(dev, debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:52 +03:00
Benny Halevy	ef8ec54970	commitlog: segment, segment_manager: mark methods noexcept Prepare for marking deferred_actions noexcept. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:40 +03:00
Benny Halevy	4439e5c132	everywhere: cleanup defer.hh includes Get rid of unused includes of seastar/util/{defer,closeable}.hh and add a few that are missing from source files. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:39 +03:00
Pavel Emelyanov	e02b39ca3d	code: Generalize tls::credentials_builder configuration All the places in code that configure the mentioned creds builder from client_\|server_encryption_options now do it the same way. This patch generalizes it all in the utils:: helper. The alternator code "ignores" require_client_auth and truststore keys, but it's easy to make the generalized helper be compatible. Also make the new helper coroutinized from the beginning. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-08-20 18:05:41 +03:00
Pavel Emelyanov	aa88527375	main, config: Move options parsing helpers The get_or_default and is_true are two aux bits that are used to parse the config options. The former is duplicated in the alternator code as well. Put both in utils namespace for future. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-08-20 17:53:41 +03:00
Calle Wilund	3633c077be	commitlog/config: Make hard size enforcement false by default + add config opt Refs #9053 Flips default for commitlog disk footprint hard limit enforcement to off due to observed latency stalls with stress runs. Instead adds an optional flag "commitlog_use_hard_size_limit" which can be turned on to actually enforce it. Sort of tape and string fix until we can properly tweak the balance between cl & sstable flush rate. Closes #9195	2021-08-15 15:10:27 +03:00
Asias He	97bb2e47ff	storage_service: Enable Repair Based Node Operations (RBNO) by default for replace We decided to enable repair based node operations by default for replace node operations. To do that, a new option --allowed-repair-based-node-ops is added. It lists the node operations that are allowed to enable repair based node operations. The operations can be bootstrap, replace, removenode, decommission and rebuild. By default, --allowed-repair-based-node-ops is set to contain "replace". Note, the existing option --enable-repair-based-node-ops is still in play. It is the global switch to enable or disable the feature. Examples: - To enable bootstrap and replace node ops: ``` scylla --enable-repair-based-node-ops true --allowed-repair-based-node-ops replace,bootstrap ``` - To disable any repair based node ops: ``` scylla --enable-repair-based-node-ops false ``` Closes #9197	2021-08-15 13:30:46 +03:00
Piotr Sarna	84876a165b	db,schema_tables: add handling user-defined aggregates Aggregates are propagated, created and dropped very similarly to user-defined functions - a set of helper functions for aggregates are added based on the UDF implementation.	2021-08-13 11:14:11 +02:00
Piotr Sarna	58196e8ea6	db,view: avoid ignoring failed future in background view updates The code for handling background view updates used to propagate exceptions unconditionally, which leads to "exceptional future ignored" warnings if the update was put to background. From now on, the exception is only propagated if its future is actually waited on. Fixes #6187 Tested manually, the warning was not observed after the patch Closes #9179	2021-08-12 17:32:35 +03:00
Nadav Har'El	49ca1f86b2	Merge 'hints: error injection for pausing hint replay' from Piotr Dulikowski Adds a `hinted_handoff_pause_hint_replay` error injection point. When enabled, hint replay logic behaves as if it is run, but it gets stuck in a loop and no hints are actually sent until the point is disabled again. This injection point will be useful in dtests - it will simulate infinitely slow hint replay and will make it possible to test how some operations behave while hint replay logic is running. The first intended use case of this injection point is testing the HTTP API for waiting for hints (#8728). Refs: #6649 Closes #8801 * github.com:scylladb/scylla: hints: fix indentation after previous patch hints: error injection for pausing hint replay hints: coroutinize lambda inside send_one_file	2021-08-11 11:42:29 +03:00
Piotr Dulikowski	f2e1339f38	hints: use an abort_source with sleep_abortable in flush+send loop Each hint sender runs an asynchronous loop which tries to flush and then send hints. Between each attempt, it sleeps at most 10 seconds using sleep_abortable. However, an overload of sleep_abortable is used which does not take an abort_source - it should abort the sleep in case Seastar handles a SIGINT or SIGTERM signal. However, in order for that to work, the application must not prevent default handling of those signals in Seastar - but Scylla explicitly does it by disabling the `auto_handle_sigint_sigterm` option in reactor config. As a result, those sleeps are never aborted, and - because we wait for the async loops to stop - they can delay shutdown by at most 10 seconds. To fix that, an abort_source is added to the hints sender, and the abort_source is triggered when the corresponding sender is requested to stop. Fixes: #9176 Closes #9177	2021-08-11 10:32:53 +02:00
Piotr Dulikowski	68cac2eab7	hints: fix indentation after previous patch	2021-08-09 16:16:14 +02:00
Piotr Dulikowski	20cbe7fa2f	hints: error injection for pausing hint replay Adds a `hinted_handoff_pause_hint_replay` error injection point. When enabled, hint replay logic behaves as if it is run, but it gets stuck in a loop and no hints are actually sent until the point is disabled again. This injection point will be useful in dtests - it will simulate infinitely slow hint replay and will make it possible to test how some operations behave while hint replay logic is running. The first intended use case of this injection point is testing the HTTP API for waiting for hints (#8728). Refs: #6649	2021-08-09 16:16:14 +02:00
Piotr Dulikowski	29993f7745	hints: coroutinize lambda inside send_one_file Converts the lambda invoked for every commitlog entry in a hints file into a coroutine.	2021-08-09 16:16:14 +02:00
Piotr Dulikowski	d41d39bbcd	hints: add functions for creating and waiting for sync points Adds functions which allow creating per-shard sync points and waiting for them.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	e18b29765a	hints: add hint sync point structure Adds a sync_point structure. A sync point is a (possibly incomplete) mapping from hint queues to a replay position in it. Users will be able to create sync points consisting of the last written positions of some hint queues, so then they can wait until hint replay in all of the queues reaches that point. The sync point supports serialization - first it is serialized with the help of IDL to a binary form, and then converted to a hexadecimal string. Deserialization is also possible.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	70df9973f3	hints: make it possible to wait until hints are replayed Adds necessary infrastructure which allows, for a given endpoint manager, to wait until hints are replayed up to a specified position. An abort source must be specified which, if triggered, cancels waiting for hint replay. If the endpoint manager is stopped, current waiters are dismissed with an exception.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	93f244426d	hints: track the RP of the last replayed position Keeps track of a position which serves as an upper bound for positions of already replayed hints - i.e. all hints with replay positions strictly lower than it are considered replayed. In order to accurately track this bound during hint replay, a std::map is introduced which contains positions of hints which are currently being sent.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	03e2e671cd	hints: track the RP of the last written hint The position of the last written hint is now tracked by the endpoint hints manager. When the manager is constructed and no hints are replayed yet, the last written hint position is initialized to the beginning of a fake segment with ID corresponding to the current number of milliseconds since the epoch. This choice makes sure that, in case a new hint sync point is created before any hints are written, the position recorded for that hint queue will be larger than all replay positions in segments currently stored on disk.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	27d0d598fd	hints: change last_attempted_rp to last_succeeded_rp Instead of tracking the last position for which hint sending is attempted, the last successfully replayed position is tracked. The previous variable was used to calculate the position from which hint replay should restart in case of an error, in the following way: _last_not_complete_rp = ctx_ptr->first_failed_rp.value_or( ctx_ptr->last_attempted_rp.value_or(_last_not_complete_rp)); Now, this formula uses the last_succeeded_rp in place of last_attempted_rp. This change does not have an effect on the choice of the starting position of the next retry: - If the hint at `last_attempted_rp` has succeeded, in the new algorithm the same position will be recorded in `last_succeeded_rp`, and the formula will yield the same result. - If the hint at `last_attempted_rp` has failed, it will be accounted into `first_failed_rp`, so the formula will yield the same result. The motivation for this change is that in the next commits of this PR we will start tracking the position of the last replayed hint per hint queue, and the meaning of the new variable makes it more useful - when there are no failed hints in the hint sending attempt, last_succeeded_rp gives us information that hints _up to this position_ were replayed; the last_attempted_rp variable can only tell us that hints _before that position_ were replayed successfully.	2021-08-09 09:24:36 +02:00
Piotr Dulikowski	08a7d79ffc	hints: rearrange error handling logic for hint sending Instead of calling the `on_hint_send_failure` method inside the hint sending task in places where an error occurs, we now let the exceptions be returned and handle them inside a single `then_wrapped` attached to the hint sending task. Apart from the `then_wrapped`, there is one more place which calls `on_hint_send_failure` - in the exception handler for the future which spawns the asynchronous hint sending task. It needs to be kept separate because it is a part of a separate task.	2021-08-09 09:24:36 +02:00
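
The restart-position expression quoted in the 27d0d598fd commit message above can be sketched in isolation as follows. This is only an illustration of how the nested `value_or` calls pick a position; `replay_position`, `send_context` and `next_restart_position` are hypothetical stand-ins written for this sketch, not the types or functions used in the Scylla source.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a commitlog replay position (assumption: a
// segment ID plus an offset is enough for the illustration).
struct replay_position {
    uint64_t segment_id = 0;
    uint32_t offset = 0;

    friend bool operator==(const replay_position& a, const replay_position& b) {
        return a.segment_id == b.segment_id && a.offset == b.offset;
    }
};

// Hypothetical per-attempt context holding the two optionals named in the
// commit message.
struct send_context {
    std::optional<replay_position> first_failed_rp;
    std::optional<replay_position> last_succeeded_rp;
};

// Mirrors the quoted expression: prefer the first failed position, then the
// last succeeded one, and otherwise keep the previous restart position.
inline replay_position next_restart_position(const send_context& ctx,
                                             replay_position last_not_complete_rp) {
    return ctx.first_failed_rp.value_or(
            ctx.last_succeeded_rp.value_or(last_not_complete_rp));
}

// Usage example covering the three cases the expression distinguishes.
inline void example() {
    const replay_position prev{1, 0};
    send_context ctx;

    // Nothing recorded during the attempt: the previous position is kept.
    assert(next_restart_position(ctx, prev) == prev);

    // Only successes recorded: the last succeeded position is chosen.
    ctx.last_succeeded_rp = replay_position{1, 40};
    assert(next_restart_position(ctx, prev) == (replay_position{1, 40}));

    // A failure was recorded: it takes precedence over any success.
    ctx.first_failed_rp = replay_position{1, 16};
    assert(next_restart_position(ctx, prev) == (replay_position{1, 16}));
}
```

Calling `example()` exercises all three branches. The point of the sketch is only that a recorded failure always wins, which is why swapping `last_attempted_rp` for `last_succeeded_rp` leaves the chosen restart position unchanged in the cases the commit message describes.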


4972 Commits