Commit Graph

27858 Commits

Author SHA1 Message Date
Asias He
cc44edb4e2 database: Detemplate run_async
I initially tried to use a noncopyable_function to avoid the unnecessary
template usage.

However, since database::apply_in_memory is a hot function, it is better
to use with_gate directly. The run_async function does nothing but call
with_gate anyway.

Closes #9160
2021-08-12 07:53:10 +03:00
Takuya ASADA
e5bb88b69a scylla_cpuscaling_setup: change scaling_governor path
On some environments, /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
does not exist even though CPU scaling is supported.
Instead, /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is
available in both kinds of environment, so we should switch to it.
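
A minimal sketch of reading the governor from the per-CPU path named above. The function name and the `root` parameter are hypothetical (they are not taken from scylla_cpuscaling_setup); `root` exists only so the lookup can be pointed at a fake sysfs tree.

```python
from pathlib import Path

# Relative form of the per-CPU path named in the commit message.
GOVERNOR_PATH = "sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"


def read_scaling_governor(root="/"):
    """Return the cpu0 governor, or None if the file is not present.

    `root` is a hypothetical parameter used to redirect the lookup
    to a temporary directory in tests.
    """
    path = Path(root) / GOVERNOR_PATH
    if not path.exists():
        return None
    return path.read_text().strip()
```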

Fixes #9191

Closes #9193
2021-08-11 15:31:14 +03:00
Nadav Har'El
89724533f8 test/cql-pytest: CREATE INDEX IF NOT EXISTS vs. Cassandra
What should the following pair of statements do?

    CREATE INDEX xyz ON tbl(a)
    CREATE INDEX IF NOT EXISTS xyz ON tbl(b)

There are two reasonable choices:
1. An index with the name xyz already exists, so the second command should
   do nothing, because of the "IF NOT EXISTS".
2. The index on tbl(b) does *not* yet exist, so the command should try to
   create it. And when it can't (because the name xyz is already taken),
   it should produce an error message.

Currently, Cassandra went with choice 1, and Scylla went with choice 2.

After some discussions on the mailing list, we agreed that Scylla's
choice is the better one and Cassandra's choice could be considered a
bug: The "IF NOT EXISTS" feature is meant to allow idempotent creation of
an index - and not to make it easy to make mistakes without noticing.
The second command listed above is most likely a mistake by the user,
not anything intentional: The command intended to ensure that an index
on column b exists, but after the silent success of the command, no such
index exists.

So this patch doesn't change any Scylla code (it just adds a comment),
and rather it adds a test which "enshrines" the current behavior.
The test passes on Scylla and fails on Cassandra so we tag it
"cassandra_bug", meaning that we consider this difference to be
intentional and we consider Cassandra's behavior in this case to be wrong.
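
The difference between the two choices can be illustrated with a toy model of choice 2. This is only a sketch: the dictionary of indexes, the function, and the exception are hypothetical and do not reflect how Scylla stores or validates indexes.

```python
class IndexNameTaken(Exception):
    """Raised when an index name is already used for a different column."""


def create_index(indexes, name, column, if_not_exists=False):
    """Toy model of choice 2. `indexes` maps index name -> indexed column.

    Returns True if an index was created, False if the statement was a no-op.
    """
    # The very same index already exists: IF NOT EXISTS makes this a no-op.
    if if_not_exists and indexes.get(name) == column:
        return False
    # The name is taken, so the index cannot be created.
    if name in indexes:
        raise IndexNameTaken(f"index {name} already exists")
    indexes[name] = column
    return True
```

Under this model, repeating `CREATE INDEX IF NOT EXISTS xyz ON tbl(a)` is idempotent, while reusing the name `xyz` for `tbl(b)` raises an error instead of silently succeeding.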

Fixes #9182.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210811113906.2105644-1-nyh@scylladb.com>
2021-08-11 13:41:58 +02:00
Asias He
ce8fd051c9 storage_service: Fix argument in send_meta_data::do_receive
The extra status print is not needed in the log.

Fixes the following error:

ERROR 2021-08-10 10:54:21,088 [shard 0] storage_service -
service/storage_service.cc:3150 @do_receive: failed to log message:
fmt='send_meta_data: got error code={}, from node={}, status={}':
fmt::v7::format_error (argument not found)
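
The failure mode is a format string with more placeholders than arguments. As an analogy only (this uses Python's `str.format`, not the fmt library, and the helper name is hypothetical), the same mismatch looks like this:

```python
TEMPLATE = "send_meta_data: got error code={}, from node={}, status={}"


def try_format(template, *args):
    """Format `template`, reporting a placeholder/argument mismatch as text."""
    try:
        return template.format(*args)
    except IndexError as exc:
        # str.format raises IndexError when a positional placeholder
        # has no matching argument.
        return f"format error: {exc}"
```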

Fixes #9183

Closes #9189
2021-08-11 11:35:30 +02:00
Asias He
040b626235 table: Fix is_shared assert for load and stream
The reader is used by load and stream to read sstables from the upload
directory which are not guaranteed to belong to the local shard.

Use make_range_sstable_reader instead of
make_local_shard_sstable_reader.

Tests:

backup_restore_tests.py:TestBackupRestore.load_and_stream_using_snapshot_test
backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_2_test
backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_1_test
migration_test.py:TestLoadAndStream.load_and_stream_asymmetric_cluster_test
migration_test.py:TestLoadAndStream.load_and_stream_decrease_cluster_test
migration_test.py:TestLoadAndStream.load_and_stream_frozen_pk_test
migration_test.py:TestLoadAndStream.load_and_stream_increase_cluster_test
migration_test.py:TestLoadAndStream.load_and_stream_primary_replica_only_test

Fixes #9173

Closes #9185
2021-08-11 12:18:40 +03:00
Piotr Jastrzebski
db4c9199f5 sstables: remove unused uppermost_bound from clustering_ranges_walker and mutation_fragment_filter
Those methods are never used, so it's better not to keep dead code
around.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>

Closes #9188
2021-08-11 10:54:59 +02:00
Nadav Har'El
49ca1f86b2 Merge 'hints: error injection for pausing hint replay' from Piotr Dulikowski
Adds a `hinted_handoff_pause_hint_replay` error injection point. When
enabled, hint replay logic behaves as if it is run, but it gets stuck in
a loop and no hints are actually sent until the point is disabled again.

This injection point will be useful in dtests - it will simulate
infinitely slow hint replay and will make it possible to test how some
operations behave while hint replay logic is running.

The first intended use case of this injection point is testing the HTTP
API for waiting for hints (#8728).

Refs: #6649

Closes #8801

* github.com:scylladb/scylla:
  hints: fix indentation after previous patch
  hints: error injection for pausing hint replay
  hints: coroutinize lambda inside send_one_file
2021-08-11 11:42:29 +03:00
Piotr Dulikowski
f2e1339f38 hints: use an abort_source with sleep_abortable in flush+send loop
Each hint sender runs an asynchronous loop which tries to flush and then
send hints. Between attempts, it sleeps for at most 10 seconds using
sleep_abortable. However, the overload of sleep_abortable in use does
not take an abort_source; it is only expected to abort the sleep when
Seastar handles a SIGINT or SIGTERM signal. For that to work, the
application must not prevent default handling of those signals in
Seastar, but Scylla explicitly does so by disabling the
`auto_handle_sigint_sigterm` option in the reactor config. As a result,
those sleeps are never aborted and, because we wait for the async
loops to stop, they can delay shutdown by up to 10 seconds.

To fix that, an abort_source is added to the hints sender, and the
abort_source is triggered when the corresponding sender is requested to
stop.
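
The idea can be sketched outside Seastar with a thread and an event. This is an analogy, not Scylla code: `threading.Event` stands in for the abort_source, and the class and its members are hypothetical.

```python
import threading
import time


class HintSender:
    """Toy stand-in for a hint sender's flush+send loop."""

    def __init__(self, interval=10.0):
        self._interval = interval
        self._abort = threading.Event()  # plays the role of the abort_source
        self.iterations = 0
        self._thread = threading.Thread(target=self._loop)

    def _loop(self):
        while not self._abort.is_set():
            self.iterations += 1  # placeholder for "flush, then send"
            # Sleeps for up to `interval` seconds but returns as soon as
            # the event is set, so stop() need not wait out the sleep.
            self._abort.wait(self._interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._abort.set()    # request abort of the pending sleep
        self._thread.join()  # wait for the loop to finish
```

With a plain `time.sleep(interval)` in the loop, `stop()` could block for up to the full interval, which mirrors the shutdown delay described above.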

Fixes: #9176

Closes #9177
2021-08-11 10:32:53 +02:00
Tomasz Grabiec
e177cd382b db: Remove superfluous } from read_command printout
Message-Id: <20210810131429.407903-1-tgrabiec@scylladb.com>
2021-08-10 17:32:34 +03:00
Michał Chojnowski
2aa0a2e6a1 test: perf: perf_collection: use the optimized version of bptree
Since key_compare does not conform to SimpleLessCompare, the benchmark
tests the non-optimized version of bptree (without SIMD key search).
We want to test the optimized version.

Closes #9180
2021-08-10 17:04:34 +03:00
Nadav Har'El
65381bd155 test/alternator: add tests for expression length limits
The DynamoDB documentation
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
describes several hard limits on the size of expressions
(ProjectionExpression, ConditionExpression, UpdateExpression,
FilterExpression) and various elements they contain.

In this patch we begin testing those limits with a comprehensive test for
the *length* of each of these four expressions: we test that lengths up to
(and including) 4096 bytes are allowed but longer expressions are rejected.
We also add TODOs for additional documented limits that should be tested
in the future.
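
A rough sketch of the boundary being tested. The constant comes from the text above; the function is hypothetical, and measuring the length in UTF-8 bytes is an assumption of this sketch.

```python
MAX_EXPRESSION_BYTES = 4096  # limit stated in the commit message


def check_expression_length(expression):
    """Raise ValueError if `expression` exceeds the documented length limit.

    Assumption: the length is measured in UTF-8 encoded bytes.
    """
    size = len(expression.encode("utf-8"))
    if size > MAX_EXPRESSION_BYTES:
        raise ValueError(
            f"expression is {size} bytes, limit is {MAX_EXPRESSION_BYTES}"
        )
```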

Currently, this test passes on DynamoDB but xfails on Alternator because
Alternator does *not* enforce any limits on the expression length. I don't
think this is a real problem, and we may consider keeping it this way,
but we should at least be aware that this difference exists and an
xfailing test will remind us.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210810081948.2012120-2-nyh@scylladb.com>
2021-08-10 12:06:21 +02:00
Nadav Har'El
9d49a32486 test/alternator: add tests for attribute name limits
DynamoDB limits attribute names in items to lengths of up to 65535 bytes,
but in some cases (such as key attributes) the limit is lower - 255.
This patch adds tests for many of these cases.
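
The two limits can be written down as a small predicate. The constants come from the text above; the function and its `is_key` flag are hypothetical, and only the key-attribute case of the lower limit is modeled.

```python
MAX_ATTR_NAME_BYTES = 65535      # general limit for attribute names
MAX_KEY_ATTR_NAME_BYTES = 255    # lower limit, e.g. for key attributes


def attribute_name_allowed(name, is_key=False):
    """Return True if `name` fits the limit that applies to it.

    Assumption: the length is measured in UTF-8 encoded bytes.
    """
    limit = MAX_KEY_ATTR_NAME_BYTES if is_key else MAX_ATTR_NAME_BYTES
    return len(name.encode("utf-8")) <= limit
```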

All the new tests pass on DynamoDB, but some still xfail on Alternator
because Alternator is too lenient - sometimes allowing longer attribute
names than DynamoDB allows. While this may sound great, it also has
downsides: The oversized attribute names perform badly, and as they
grow, Alternator's internal limits will be reached as well, and result
in an unsightly "internal server error" being reported instead of the
expected user-friendly error.

Refs #9169.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210810081948.2012120-1-nyh@scylladb.com>
2021-08-10 12:06:13 +02:00
Avi Kivity
112cee4960 Merge "make sstable::make_reader() return flat_mutation_reader_v2" from Michael
"
* Make `sstable::make_reader()` return `flat_mutation_reader_v2`,
  retain the old one as `sstable::make_reader_v1()`

* Start weaning tests off `sstable::make_reader_v1()` (done all the
  easy ones, i.e. those not involving range tombstones)
"

* tag 'sstable-make-reader-v2-v1' of github.com:cmm/scylla:
  tests: use flat_mutation_reader_v2 in the easier part of sstable_3_x_test
  tests: upgrade the "buffer_overflow" test to flat_mutation_reader_v2
  tests: get rid of sstable::make_reader_v1() in broken_sstable_test
  tests: get rid of sstable::make_reader_v1() in the trivial cases
  sstables: make sstable::make_reader() return flat_mutation_reader_v2
2021-08-10 12:57:10 +03:00
Avi Kivity
a7ef826c2b Merge "Fold validation compaction into scrub" from Botond
"
Validation compaction -- although I still maintain that it is a good
descriptive name -- was an unfortunate choice for the underlying
functionality because Origin has burned the name already as it uses it
for a compaction type used during repair. This opens the door for
confusion for users coming from Cassandra who will associate Validation
compaction with the purpose it is used for in Origin.
Additionally, since Origin's validation compaction was not user
initiated, it didn't have a corresponding `nodetool` command to start
it. Adding such a command would create an operational difference between
us and Origin.

To avoid all this we fold validation compaction into scrub compaction,
under a new "validation" mode. I decided against using the also
suggested `--dry-mode` flag as I feel that a new mode is a more natural
choice: we don't have to define how it interacts with all the other
modes, unlike with a `--dry-mode` flag.

Fixes: #7736

Tests: unit(dev), manual(REST API)
"

* 'scrub-validation-mode/v2' of https://github.com/denesb/scylla:
  compaction/compaction_descriptor: add comment to Validation compaction type
  compaction/compaction_descriptor: compaction_options: remove validate
  api: storage_service: validate_keyspace -> scrub_keyspace (validate mode)
  compaction/compaction_manager: hide perform_sstable_validation()
  compaction: validation compaction -> scrub compaction (validate mode)
  compaction/compaction_descriptor: compaction_options: add options() accessor
  compaction/compaction_descriptor: compaction_options::scrub::mode: add validate
2021-08-10 12:18:35 +03:00
Michael Livshin
c0ba657a86 tests: use flat_mutation_reader_v2 in the easier part of sstable_3_x_test
That is, anything not involving range tombstones.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-09 19:20:48 +03:00
Michael Livshin
7c2854a094 tests: upgrade the "buffer_overflow" test to flat_mutation_reader_v2
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-09 19:20:48 +03:00
Michael Livshin
a4c43eda3a tests: get rid of sstable::make_reader_v1() in broken_sstable_test
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-09 19:20:48 +03:00
Michael Livshin
37c9f8f137 tests: get rid of sstable::make_reader_v1() in the trivial cases
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-09 19:20:48 +03:00
Michael Livshin
f07306d75c sstables: make sstable::make_reader() return flat_mutation_reader_v2
Rename the old version to `sstable::make_reader_v1()`, to have a
nicely searchable eradication target.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-09 19:20:48 +03:00
Piotr Dulikowski
68cac2eab7 hints: fix indentation after previous patch
2021-08-09 16:16:14 +02:00
Piotr Dulikowski
20cbe7fa2f hints: error injection for pausing hint replay
Adds a `hinted_handoff_pause_hint_replay` error injection point. When
enabled, hint replay logic behaves as if it is run, but it gets stuck in
a loop and no hints are actually sent until the point is disabled again.

This injection point will be useful in dtests - it will simulate
infinitely slow hint replay and will make it possible to test how some
operations behave while hint replay logic is running.

The first intended use case of this injection point is testing the HTTP
API for waiting for hints (#8728).

Refs: #6649
2021-08-09 16:16:14 +02:00
Piotr Dulikowski
29993f7745 hints: coroutinize lambda inside send_one_file
Converts the lambda invoked for every commitlog entry in a hints file
into a coroutine.
2021-08-09 16:16:14 +02:00
Asias He
4ae6eae00a table: Get rid of table::run_compaction helper
The table::run_compaction is a trivial wrapper for
table::compact_sstables.

We have lots of similar {start, trigger, run}_compaction functions.
Dropping the run_compaction wrapper to reduce confusion.

Closes #9161
2021-08-09 14:02:54 +03:00
Tomasz Grabiec
e115fce8f7 Merge "raft: sometimes become a candidate even if outside the configuration" from Kamil
There are situations where a node outside the current configuration is
the only node that can become a leader. We become candidates in such
cases. But there is an easy check for when we don't need to; a comment was
added explaining that.

* kbr/candidate-outside-config-v3:
  raft: sometimes become a candidate even if outside the configuration
  raft: fsm: update _commit_idx when applying snapshot
2021-08-09 12:29:03 +02:00
Avi Kivity
1b618921be Merge 'hinted handoff: introduce HTTP API for waiting for hint replay (stateless version)' from Piotr Dulikowski
This PR introduces a new feature to hinted handoff: ability to wait until hints from a given node are replayed towards a chosen set of nodes.

It replaces the old mechanism which waits for hints to be replayed before repair and exposes it through an HTTP API. The implementation is completely different, so this PR begins with a revert of the old functionality and then introduces the new implementation.

Waiting for hints is made possible with the help of "hint sync points". A sync point is a collection of positions in some hint queues from one node - those positions are encoded into the sync point's description as a hexadecimal string. The sync point consists only of its hexadecimal description - there is no state kept on any of the nodes.

Two operations are available through the HTTP API:

- `/hints_manager/waiting_point` (POST) - _Create a sync point_. Given a set of `target_hosts`, creates a sync point which encodes positions currently at the end of all queues pointing to any of the `target_hosts`.
- `/hints_manager/waiting_point` (GET) - _Wait or check the sync point_. Given a description of a sync point, checks if the sync point was already reached. If you provide a non-zero `timeout` parameter and the sync point is not reached yet, this endpoint will wait until the point is reached or the timeout expires.

Hinted handoff uses the commitlog framework in order to store and replay hints. Each entry (here, a serialized hint) can be identified by a "replay position", which contains the ID of the segment containing the hint, and its position in the file. Replay positions are ordered with respect to segment ID and then position in the file; because segment IDs are assigned sequentially and entries are also written sequentially, this order corresponds to the chronological order in which hints were written. This order also corresponds to the order in which hints are replayed, provided that hint segments are processed starting with the one with the smallest ID first.

The main idea is to track the positions of both the most recently written hint, and the most recently replayed hint. When creating a hint sync point, the position of the last written hint is encoded; when the sync point is waited on, the hints manager waits until the last replayed position reaches the position encoded in the sync point. The description of the sync point encodes positions on a per-hint queue basis - separately for each shard, destination endpoint and hint type (regular or MV).
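
The ordering and the "reached" check described above can be sketched as follows. All names here are hypothetical, and treating a queue with no recorded replay progress as being at position (0, 0) is an assumption of this sketch.

```python
from typing import Dict, NamedTuple, Tuple


class ReplayPosition(NamedTuple):
    segment_id: int  # segment IDs are assigned sequentially
    offset: int      # position of the entry inside the segment file


# Hypothetical queue key: (shard, destination endpoint, hint type).
QueueKey = Tuple[int, str, str]


def create_sync_point(last_written: Dict[QueueKey, ReplayPosition]):
    """Snapshot the last written position of every hint queue."""
    return dict(last_written)


def sync_point_reached(sync_point, last_replayed) -> bool:
    """True once every queue has replayed up to its recorded position."""
    start = ReplayPosition(0, 0)
    return all(
        last_replayed.get(key, start) >= target
        for key, target in sync_point.items()
    )
```

Because `ReplayPosition` is a tuple, comparison is by segment ID first and offset second, which matches the ordering described in the previous paragraph.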

Note: although hints manager destroys and re-creates commitlog instances, the ordering described above still works - the ID of the first segment assigned by the commitlog instance corresponds to the number of milliseconds since the epoch, so segments created by newer commitlog instances will have larger IDs.

Before the hints manager is enabled, it performs segment _rebalancing_: for a given endpoint, it makes sure that each shard gets roughly the same number of hint segments. For example, if there are 3 shards and shard 1 has 7 segments, then shard 0 will get 2 segments, shard 1: 3 segments, and shard 2: 2 segments. Apart from distributing the work evenly between shards on startup, it also handles the case when the node is resharded - if the number of shards is reduced, segments from removed shards will be redistributed to lower shards.
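
The arithmetic of the example can be sketched as an even split with the remainder spread one segment at a time. The function is hypothetical, and which shards receive the extra segments is an assumption of this sketch (the example above gives the extra segment to shard 1, while this sketch gives it to the lowest-numbered shards).

```python
def rebalance_counts(total_segments, shard_count):
    """Return the number of segments each shard holds after an even split."""
    base, extra = divmod(total_segments, shard_count)
    # The first `extra` shards get one more segment than the rest.
    return [base + (1 if shard < extra else 0) for shard in range(shard_count)]
```

For 7 segments and 3 shards the counts are two shards with 2 segments and one shard with 3, as in the example.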

The possibility of segments being moved between shards on restart makes accurate tracking of hint replay harder. In order to simplify the problem, this PR changes the order in which hint segments are replayed - segments from other shards (here called "foreign" segments) are replayed first, before any "local" segment from this shard. Foreign segments are treated as if they were placed before the 0 replay position - when waiting for a hint sync point, we will __always__ wait for foreign segments to be replayed.

This behavior makes sure that hints generated before the sync point was created will be replayed - and, if segment rebalancing happened in the meantime, we will potentially replay some more segments which were moved across shards.

This PR starts with a revert of the "hints: delay repair until hints are replayed" (#8452) functionality. Some infrastructure introduced in the original PR started to be used by other places in the code, so this is not a simple revert of the merge commit - instead, commits of the old PR are reverted separately and modified in order to make the code compile.

The following commits from the original PR were omitted from the revert because the code introduced by them became used by other logic in repair:

- 0db45d1df5 (repair: introduce abort_source for repair abort)
- 3a2d09b644 (repair: introduce abort_source for shutdown)
- 49f4a2f968 (repair: plug in waiting for hints to be sent before repair)

Refs: #8102
Fixes: #8727

Tests: unit(dev)

Closes #8982

* github.com:scylladb/scylla:
  api: add HTTP API for hint sync points
  api: register hints HTTP API outside set_server_done
  storage_proxy: add functions for creating and waiting for hint sync pts
  hints: add functions for creating and waiting for sync points
  hints: add hint sync point structure
  utils,alternator: move base64 code from alternator to utils
  hints: make it possible to wait until hints are replayed
  hints: track the RP of the last replayed position
  hints: track the RP of the last written hint
  hints: change last_attempted_rp to last_succeeded_rp
  hints: rearrange error handling logic for hint sending
  hints: sort segments by ID, divide into foreign and local
  Revert "db/hints: allow to forcefully update segment list on flush"
  Revert "db/hints: add a metric for counting processed files"
  Revert "db/hints: make it possible to wait until current hints are sent"
  Revert "storage_proxy: add functions for syncing with hints queue"
  Revert "messaging_service: add verbs for hint sync points"
  Revert "storage_proxy: implement verbs for hint sync points"
  Revert "config: add wait_for_hint_replay_before_repair option"
  Revert "storage_proxy: coordinate waiting for hints to be sent"
  Revert "repair: plug in waiting for hints to be sent before repair"
  Revert "hints: dismiss segment waiters when hint queue can't send"
  Revert "storage_proxy: stop waiting for hints replay when node goes down"
  Revert "storage_proxy: add abort_source to wait_for_hints_to_be_replayed"
2021-08-09 10:59:07 +03:00
Piotr Dulikowski
7e3966c03e api: add HTTP API for hint sync points
Adds HTTP endpoints for manipulating hint sync points:

- /hinted_handoff/sync_point (POST) - creates a new sync point for
  hints towards nodes listed in the `target_hosts` parameter
- /hinted_handoff/sync_point (GET) - checks the status of the sync
  point. If a non-zero `timeout` parameter is given, it waits until the
  sync point is reached or the timeout expires.
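
A sketch of how a client might build requests for these endpoints. Only the path, the HTTP methods, and the `target_hosts` and `timeout` parameter names come from the text above. The base URL, passing parameters in the query string, the comma-separated host list, and the `id` parameter name for the sync point description are all assumptions of this sketch.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:10000"      # assumed REST API address
ENDPOINT = "/hinted_handoff/sync_point"  # path from the commit message


def create_sync_point_request(target_hosts):
    """Return (method, url) for creating a sync point."""
    query = urlencode({"target_hosts": ",".join(target_hosts)})
    return "POST", f"{BASE_URL}{ENDPOINT}?{query}"


def wait_sync_point_request(sync_point, timeout=0):
    """Return (method, url) for checking or waiting on a sync point.

    `id` is a hypothetical name for the parameter carrying the description.
    """
    query = urlencode({"id": sync_point, "timeout": timeout})
    return "GET", f"{BASE_URL}{ENDPOINT}?{query}"
```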
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
9091ce5977 api: register hints HTTP API outside set_server_done
Registration of the currently unused hinted handoff endpoints is moved
out from the set_server_done function. They are now explicitly
registered in main.cc by calling api::set_hinted_handoff and also
uninitialized by calling api::unset_hinted_handoff.

Setting/unsetting the HTTP API separately will allow passing a reference to
the sync_point_service without polluting the set_server_done function.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
14b00610b2 storage_proxy: add functions for creating and waiting for hint sync pts
Adds functions in storage_proxy which allow creating sync points and
waiting for them.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
d41d39bbcd hints: add functions for creating and waiting for sync points
Adds functions which allow creating per-shard sync points and waiting for
them.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
e18b29765a hints: add hint sync point structure
Adds a sync_point structure. A sync point is a (possibly incomplete)
mapping from hint queues to replay positions in them. Users will be able
to create sync points consisting of the last written positions of some
hint queues, and then wait until hint replay in all of those
queues reaches that point.

The sync point supports serialization - first it is serialized with the
help of IDL to a binary form, and then converted to a hexadecimal
string. Deserialization is also possible.
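
The two-step encoding (binary form, then text) can be sketched as below. The binary layout here is hypothetical and uses `struct` in place of the IDL serializer; only the overall "serialize to bytes, then to a hexadecimal string" shape follows the description above.

```python
import struct

# Hypothetical fixed-size entry: shard (u32), segment ID (u64), offset (u32).
_ENTRY = struct.Struct("<IQI")


def encode_sync_point(positions):
    """Encode {shard: (segment_id, offset)} as a hexadecimal string."""
    payload = b"".join(
        _ENTRY.pack(shard, segment_id, offset)
        for shard, (segment_id, offset) in sorted(positions.items())
    )
    return payload.hex()


def decode_sync_point(text):
    """Inverse of encode_sync_point."""
    payload = bytes.fromhex(text)
    return {
        shard: (segment_id, offset)
        for shard, segment_id, offset in _ENTRY.iter_unpack(payload)
    }
```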
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
5a0942a0f8 utils,alternator: move base64 code from alternator to utils
The base64 encoding/decoding functions will be used for serialization of
hint sync point descriptions. Base64 format is not specific to
Alternator, so it can be moved to utils.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
70df9973f3 hints: make it possible to wait until hints are replayed
Adds necessary infrastructure which allows, for a given endpoint
manager, to wait until hints are replayed up to a specified position. An
abort source must be specified which, if triggered, cancels waiting for
hint replay.

If the endpoint manager is stopped, current waiters are dismissed with
an exception.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
93f244426d hints: track the RP of the last replayed position
Keeps track of a position which serves as an upper bound for positions
of already replayed hints - i.e. all hints with replay positions
strictly lower than it are considered replayed.

In order to accurately track this bound during hint replay, a std::map
is introduced which contains positions of hints which are currently
being sent.
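
A sketch of how such a bound can be derived from the set of in-flight hints. The class is hypothetical, positions are plain integers for brevity, and a Python set replaces the ordered std::map (so finding the minimum is linear here). It also assumes hints are started in increasing position order and that completed hints succeeded.

```python
class ReplayBoundTracker:
    """Tracks an upper bound below which every hint has been replayed."""

    def __init__(self, start=0):
        self._in_flight = set()  # positions of hints currently being sent
        self._next = start       # one past the highest position started

    def on_send_start(self, rp):
        self._in_flight.add(rp)
        self._next = max(self._next, rp + 1)

    def on_send_done(self, rp):
        self._in_flight.discard(rp)

    def bound(self):
        # The oldest in-flight hint caps the bound; with nothing in flight,
        # everything that was started has been replayed.
        return min(self._in_flight) if self._in_flight else self._next
```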
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
03e2e671cd hints: track the RP of the last written hint
The position of the last written hint is now tracked by the endpoint
hints manager.

When the manager is constructed and no hints are replayed yet, the last
written hint position is initialized to the beginning of a fake segment
with ID corresponding to the current number of milliseconds since the
epoch. This choice makes sure that, in case a new hint sync point is
created before any hints are written, the position recorded for that
hint queue will be larger than all replay positions in segments
currently stored on disk.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
27d0d598fd hints: change last_attempted_rp to last_succeeded_rp
Instead of tracking the last position for which hint sending is
attempted, the last successfully replayed position is tracked.

The previous variable was used to calculate the position from which hint
replay should restart in case of an error, in the following way:

    _last_not_complete_rp = ctx_ptr->first_failed_rp.value_or(
        ctx_ptr->last_attempted_rp.value_or(_last_not_complete_rp));

Now, this formula uses the last_succeeded_rp in place of
last_attempted_rp. This change does not have an effect on the choice of
the starting position of the next retry:

- If the hint at `last_attempted_rp` has succeeded, in the new algorithm
  the same position will be recorded in `last_succeeded_rp`, and the
  formula will yield the same result.
- If the hint at `last_attempted_rp` has failed, it will be accounted
  into `first_failed_rp`, so the formula will yield the same result.

The motivation for this change is that in the next commits of this PR we
will start tracking the position of the last replayed hint per hint
queue, and the meaning of the new variable makes it more useful - when
there are no failed hints in the hint sending attempt, last_succeeded_rp
gives us information that hints _up to this position_ were replayed; the
last_attempted_rp variable can only tell us that hints _before that
position_ were replayed successfully.
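
The equivalence argument can be checked with a small model of the formula. This is a Python rendering for illustration only: `None` stands in for an empty optional, and the function names are hypothetical.

```python
def value_or(opt, default):
    """Mimics std::optional::value_or with None as the empty state."""
    return default if opt is None else opt


def next_restart_position(first_failed_rp, last_rp, current):
    """The formula above, with `last_rp` being either the last attempted
    or the last succeeded position, depending on the variant."""
    return value_or(first_failed_rp, value_or(last_rp, current))
```

When no hint fails, the last attempted and last succeeded positions coincide, so both variants agree. When a hint fails, `first_failed_rp` is set and takes precedence in both variants.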
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
08a7d79ffc hints: rearrange error handling logic for hint sending
Instead of calling the `on_hint_send_failure` method inside the hint
sending task in places where an error occurs, we now let the exceptions
be returned and handle them inside a single `then_wrapped` attached to
the hint sending task.

Apart from the `then_wrapped`, there is one more place which calls
`on_hint_send_failure` - in the exception handler for the future which
spawns the asynchronous hint sending task. It needs to be kept separate
because it is a part of a separate task.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
45b04c94e0 hints: sort segments by ID, divide into foreign and local
Endpoint hints manager keeps a commitlog instance which is used to write
hints into new segments. This instance is re-created every 10 seconds,
which causes the previous instance to leave its segments on disk.

On the other hand, hints sender keeps a list of segments to replay which
is updated only after it becomes empty. The list is repopulated with
segments returned by the commitlog::get_segments_to_replay() method
which does not specify the order of the segments returned.

As a preparation for the upcoming hint sync points feature, this commit
changes the order in which segments are replayed:

- First, segments written by other shards are replayed. Such segments
  may appear in the queue because of segment rebalancing which is done
  at startup.
  The purpose of replaying "foreign" segments first is that they are
  problematic for hint sync points. For each hint queue, a hint sync
  point encodes a replay position of the last written hint on the local
  shard. Accounting foreign segments precisely would make the
  implementation more complicated. To make things simpler, waiting for
  sync points will always make sure that all foreign segments are
  replayed. This might sometimes cause more hints to be waited on than
  necessary if a restart occurs in the meantime.
- Segments written by the local shard are replayed later, in order of
  their IDs. This makes sure that local hints are replayed in the order
  they were written to segments, and will make it possible to use replay
  positions to track progress of hint replay.
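
The resulting replay order can be sketched as follows. The function and the `(shard, segment_id)` representation are hypothetical, and sorting the foreign segments by ID as well is an assumption of this sketch.

```python
def replay_order(segments, local_shard):
    """Order (shard, segment_id) pairs: foreign segments first, then local.

    Local segments are ordered by ID so that hints are replayed in the
    order they were written.
    """
    segments = list(segments)
    foreign = sorted(
        (s for s in segments if s[0] != local_shard), key=lambda s: s[1]
    )
    local = sorted(
        (s for s in segments if s[0] == local_shard), key=lambda s: s[1]
    )
    return foreign + local
```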
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
f83699bb7c Revert "db/hints: allow to forcefully update segment list on flush"
This reverts commit e48739a6da.

This commit removes the functionality from endpoint hints manager which
allowed flushing hints immediately and forcefully updating the list of
segments to replay.

The new implementation of waiting for hints will be based on replay
positions returned by the commitlog API and it won't be necessary to
forcefully update the segment list when creating a sync point.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
9c1d4e7e6c Revert "db/hints: add a metric for counting processed files"
This reverts commit 5a49fe74bb.

This commit removes a metric which tracks how many segments were
replayed during the current runtime. It was necessary for the current
"wait for hints" mechanism, which is being replaced with a different
one; therefore we can remove the metric.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
3b851a5ebd Revert "db/hints: make it possible to wait until current hints are sent"
This reverts commit 427bbf6d86.

This commit removes the infrastructure which allows waiting until
current hints are replayed in a given hint queue.

It will be replaced with a different mechanism in later commits.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
4a35d138f6 Revert "storage_proxy: add functions for syncing with hints queue"
This reverts commit 244738b0d5.

This commit removes create_hint_queue_sync_point and
check_hint_queue_sync_point functions from storage_proxy, which were
used to wait until local hints are sent out to particular nodes.

Similar methods will be reintroduced later in this PR, with a completely
different implementation.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
0d74dee683 Revert "messaging_service: add verbs for hint sync points"
This reverts commit 82c419870a.

This commit removes the HINT_SYNC_POINT_CREATE and HINT_SYNC_POINT_CHECK
rpc verbs.

The upcoming HTTP API for waiting for hint replay will be restricted
to waiting for hints on the node handling the request, so there is no
need for new verbs.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
4604bb21c3 Revert "storage_proxy: implement verbs for hint sync points"
This reverts commit 485036ac33.

This commit removes the handlers for HINT_SYNC_POINT_CREATE and
HINT_SYNC_POINT_CHECK verbs.

The upcoming HTTP API for waiting for hint replay will be restricted
to waiting for hints on the node handling the request, so there is no
need for new verbs.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
ff453d80ff Revert "config: add wait_for_hint_replay_before_repair option"
This reverts commit 86d831b319.

This commit removes the wait_for_hint_replay_before_repair option. Because a
previous commit in this series removes the logic from repair which
caused it to wait for hints to be replayed, this option is now useless.

We can safely remove this option because it is not present in any
release yet.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
6c5d2fe0bf Revert "storage_proxy: coordinate waiting for hints to be sent"
This reverts commit 46075af7c4.

This commit removes the logic responsible for waiting for other nodes to
replay their hints. The upcoming HTTP API for waiting for hint replay
will be restricted to waiting for hints on the node handling the
request, so there is no need for coordinating multiple nodes.
2021-08-09 09:24:36 +02:00
Piotr Dulikowski
ecf854affc Revert "repair: plug in waiting for hints to be sent before repair"
This reverts commit 49f4a2f968.

The idea to wait for hints to be replayed before repair is not always a
good one. For example, someone might want to repair a small token range
or just one table - but hinted handoff cannot selectively replay hints
like this.

The fact that we are waiting for hints before repair caused a small
number of regressions (#8612, #8831).

This commit removes the logic in repair which caused it to wait for
hints. Additionally, the `storage_proxy.hh` include, which was
introduced in the commit being reverted, is also removed and smaller
header files are included instead (gossiper.hh and fb_utilities.hh).
2021-08-09 09:22:26 +02:00
Piotr Dulikowski
e3c32c897a Revert "hints: dismiss segment waiters when hint queue can't send"
This reverts commit 9d68824327.

First, we are reverting existing infrastructure for waiting for hints in
order to replace it with a different one, therefore this commit needs to
be reverted as well.

Second, errors during hint replay can occur naturally and don't
necessarily indicate that no progress can be made - for example, the
target node is heavily loaded and some hints time out. The "waiting for
hints" operation becomes a user-issued command, so it's not as vital to
ensure liveness.
2021-08-09 09:06:23 +02:00
Piotr Dulikowski
afb4c85662 Revert "storage_proxy: stop waiting for hints replay when node goes down"
This reverts commit 22e06ace2c.

The upcoming HTTP API for waiting for hint replay will be restricted
to waiting for hints on the node handling the request, so we are
removing all infrastructure related to coordinating hint waiting -
therefore this commit needs to be reverted.
2021-08-09 09:06:23 +02:00
Piotr Dulikowski
035da96161 Revert "storage_proxy: add abort_source to wait_for_hints_to_be_replayed"
This reverts commit 958a13577c.

The `wait_for_hints_to_be_replayed` function is going to be completely
removed in this PR, so this commit needs to be reverted, too.
2021-08-09 09:06:23 +02:00
Takuya ASADA
b822c642e5 docker: fix housekeeping --repo-files to apt repository
Even though we switched to an Ubuntu-based container image, housekeeping is
still using the yum repository.
It should be switched to the apt repository.

Fixes #9144

Closes #9147
2021-08-09 07:47:03 +03:00