Commit Graph

31077 Commits

Author SHA1 Message Date
Kamil Braun
38f65e5a2e main: start direct failure detector service
We add the new direct failure detector to the list of services started
in the Scylla process.

To start the service, we need an implementation of `pinger` and `clock`.

`pinger` is implemented using existing GOSSIP_ECHO verb. The gossip echo
message requires the node's gossip generation number. We handle this by
embedding the pinger implementation inside `gossiper`, and making
`gossiper` update the generation number (cached inside the pinger class)
periodically.

`clock` is a simple implementation which uses `std::chrono::steady_clock`
and `seastar::sleep_until` underneath. Translating `steady_clock`
durations to `direct_failure_detector::clock` durations happens by taking
the number of ticks.

The service is currently not used, just initialized; no endpoints are
added and no listeners are registered yet, but the following commits
change that.
2022-05-09 13:14:42 +02:00
Kamil Braun
9551256e81 messaging_service: abortable version of send_gossip_echo
Use the new `send_message_abortable` function to implement an abortable
version of `send_gossip_echo`.

These echo messages will be used for direct failure detection.
2022-05-09 13:14:41 +02:00
Kamil Braun
f2548fc3fa message: abortable version of send_message
I want to be able to timeout `send_message`, but not through the
existing `send_message_timeout` API which forces me to use a particular
clock/duration/timepoint type. Introduce a more general
`send_message_abortable` API which gets an `abort_source&`, subscribes
to it, and uses the `rpc::cancellable` interface to cancel the RPC on
abort.

The function is 90% copy-pasta from `send_message{_timeout}`, only the
abort part is new.
2022-05-09 13:14:41 +02:00
Kamil Braun
c15f3a9698 test: raft: randomized_nemesis_test: remove old failure_detector
No longer used.
Split from the previous commit for a better diff.
2022-05-09 13:14:41 +02:00
Kamil Braun
915d329f1f test: raft: randomized_nemesis_test: use direct_failure_detector::failure_detector
Until now the nemesis test used its own failure detector implementation
which used one-way heartbeats.

Switch it to use the new direct failure detection service, which will
also be used in production code. Integrating it does require some work
however as we need to implement the `pinger` and `clock` interfaces
for the failure detector.

The service is sharded, but for simplicity of implementation we
implement rpcs and sleeps by routing the requests to shard 0, where
logical timers and network live.
2022-05-09 13:14:41 +02:00
Kamil Braun
e5fc0681d9 test: raft: randomized_nemesis_test: ping all shards on each tick
Right now the test is running entirely on shard 0, but we want to
introduce a sharded service to the test. The initial naive attempt of
doing that failed because the test would time out (reach the tick limit)
before any work distributed to other shards could even start. The
solution in this commit solves that by synchronizing the shards on each
tick.

When the test is ran with smp=1, the behavior is as before.
2022-05-09 13:14:41 +02:00
Kamil Braun
e4f85cf425 test: unit test for new failure detector service 2022-05-09 13:14:41 +02:00
Kamil Braun
666e5a414d direct_failure_detector: introduce new failure detector service
The new service performs failure detection by periodically pinging
endpoints. The set of pinged endpoints can be dynamically extended and
shrinked. To learn about liveness of endpoints, user of the service
registers a listener and chooses a threshold - a duration of time which
has to pass since the last successful ping in order to mark an endpoint
as dead. When an endpoint responds it's immediately marked as alive.

Endpoints are identified using abstract integer identifiers.
The method of performing a ping is a dependency of the service provided
by the user through the `pinger` interface. The implementation of `pinger`
is responsible for translating the abstract endpoint IDs to 'real'
addresses. For example, production implementation may map endpoint IDs
to IP addresses and use TCP/IP to perform the ping, while a test/simulation
implementation may use a simulated network that also operates on
abstract identifiers.

Similarly, the method of measuring time is a dependency provided by the
user using the `clock` interface. The service operates on abstract time
intervals and timepoints. So, for example, in a production
implementation time can be measured using a stopwatch, while in
test/simulation we can use a logical clock.

The service distributes work across different shards. When an endpoint
is added to the set of detected endpoints, the service will choose a
shard with the smallest amount of workers and create a worker that is
responsible for periodically pinging this endpoint on that shard and
sending notifications to listeners.

Endpoints can be added or removed only through the shard 0 instance of
the service and shard 0 is responsible for coordinating the endpoint
workers. Listeners can be registered on any shard.
2022-05-09 13:14:40 +02:00
Botond Dénes
9623589c77 Merge 'Futurize data_read_resolver::resolve and to_data_query_result' from Benny Halevy
This series futurizes two synchronous functions used for data reconciliation:
`data_read_resolver::resolve` and `to_data_query_result` and does so
by introducing lower-level asynchronous infrastructure:
`mutation_partition_view::accept_gently`,
`frozen_mutation::unfreeze_gently` and `frozen_mutation::consume_gently`,
and `mutation::consume_gently`.

This trades some cycles on this cold path to prevent known reactor stalls.

Fixes #2361
Fixes #10038

Closes #10482

* github.com:scylladb/scylla:
  mutation: add consume_gently
  frozen_mutation: add consume_gently
  query: coroutinize to_data_query_result
  frozen_mutation: add unfreeze_gently
  mutation_partition_view: add accept_gently methods
  storage_proxy: futurize data_read_resolver::resolve
2022-05-06 10:23:02 +03:00
Piotr Sarna
eeec502aee Merge 'gms: feature_service: reduce boilerplate to add a cluster feature' from Avi Kivity
Currently, adding a cluster feature requires editing several files and
repeating the new feature name several times. This series reduces
the boilerplate to a single line (for non-experimental features), and
perhaps three for experimental features.

Closes #10488

* github.com:scylladb/scylla:
  gms: feature_service: remove variable/helper function duplication
  gms: feature: make `operator bool` implicit
  gms: feature_service: remove feature variable duplication in enable()
  gms: feature_service: remove feature variable declaration/definition duplication
  gms: features: de-quadruplicate active feature names
  gms: features: de-quadruplicate deprecated feature names
  gms: feature_service: avoid duplicating feature names when listing known features
2022-05-05 12:43:15 +02:00
Benny Halevy
ca1b616092 mutation: add consume_gently
Allow yielding when consuming a mutation,
and use in to_data_query_result.

Fixes #10038

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-05 13:32:25 +03:00
Benny Halevy
09fb2c983a frozen_mutation: add consume_gently
Allow yielding when consuming a frozen_mutation,
and use in to_data_query_result.

Refs #10038

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-05 13:32:25 +03:00
Benny Halevy
c9612855c7 query: coroutinize to_data_query_result
Reduce stalls by maybe yielding in-between partitions,
and by awaiting unfreeze_gently where possible.

Refs #10038

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-05 13:32:25 +03:00
Benny Halevy
e12454f175 frozen_mutation: add unfreeze_gently
And use in data_read_resolver::resolve

Fixes #2361

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-05 13:32:25 +03:00
Benny Halevy
4963eb73b5 mutation_partition_view: add accept_gently methods
Allow yielding when consuming mutation_partition_view.
To be used in later patches by a new unfreeze_gently function
and frozen_mutation::consume.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-05 13:32:25 +03:00
Benny Halevy
f02c25f2c3 storage_proxy: futurize data_read_resolver::resolve
Allow yielding in data_read_resolver::resolve to
prevent reactor stalls.

TODO: unfreeze_gently, to prevent stalls due
to large partitions.

Refs #2361

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-05-05 13:32:25 +03:00
Avi Kivity
19ab3edd77 gms: feature_service: remove variable/helper function duplication
Each feature has a private variable and a public accessor. Since the
accessor effectively makes the variable public, avoid the intermediary
and make the variable public directly.

To ease mechanical translation, the variable name is chosen as
the function name (without the cluster_supports_ prefix).

References throughout the codebase are adjusted.
2022-05-04 18:59:56 +03:00
Avi Kivity
435b46cd52 gms: feature: make operator bool implicit
Features are usually used as booleans, so forcing allowing them
to implicitly decay to bool is not a mistake. In fact a bunch
of helper functions exist to cast feature variables to bool.

Prepare to reduce this boilerplate by allowing automatic conversion
to bool.
2022-05-04 18:58:24 +03:00
Avi Kivity
81ad595f61 gms: feature_service: remove feature variable duplication in enable()
We have a list of all feature variables in enable(), but the list
is also available programatically in _registered_features, so use
that instead.
2022-05-04 18:44:28 +03:00
Avi Kivity
f0f4759163 gms: feature_service: remove feature variable declaration/definition duplication
Feature variables are both declared and defined. Make that happen in one
place, reducing boilerplate.
2022-05-04 18:24:56 +03:00
Avi Kivity
0f95258577 gms: features: de-quadruplicate active feature names
Active feature names are present four or five times in the code:
a delaration in feature.hh, a definition and initialization (two copies)
in feature_service.cc, a use in feature_service.cc, and a possible
reference in feature_service.cc if the feature is conditionally enabled.

Switch to just one copy or two, using the "foo"sv operator (and "foo"s)
to generate a string_view (string) as before.

Note that a few features had different external and C++ names; we
preserve the external name.

This patch does cause literal strings to be present in two places,
making them vulnerable to misspellings. But since feature names
are immutable, there is little risk that one will change without
the other.
2022-05-04 18:12:53 +03:00
Avi Kivity
980b109adb gms: features: de-quadruplicate deprecated feature names
Deprecated features are unused, but are present four times in the code:
a delaration in feature.hh, a definition and initialization (two copies)
in feature_service.cc, and a use in feature_service.cc. Switch to just
one copy, using the "foo"sv operator to generate a string_view as before.

Note that a few features had different external and C++ names; we
preserve the external name.
2022-05-04 17:54:05 +03:00
Avi Kivity
ebe5ce2870 gms: feature_service: avoid duplicating feature names when listing known features
We already have the registered features in a data structure, collect them
from there instead of repeating.
2022-05-04 16:19:42 +03:00
Calle Wilund
78350a7e1b cdc: Ensure columns removed from log table are registered as dropped
If we are redefining the log table, we need to ensure any dropped
columns are registered in "dropped_columns" table, otherwise clients will not
be able to read data older than now.
Includes unit test.

Should probably be backported to all CDC enabled versions.

Fixes #10473
Closes #10474
2022-05-04 14:19:39 +02:00
Michał Radwański
29e09a3292 db/config: command line arguments logger_stdout_timestamps and logger_ostream_type are no longer ignored
Closes #10452
2022-05-04 14:40:52 +03:00
Pavel Emelyanov
063d26bc9e system_keyspace/config: Swallow string->value cast exception
When updating an updateable value via CQL the new value comes as a
string that's then boost::lexical_cast-ed to the desired value. If the
cast throws the respective exception is printed in logs which is very
likely uncalled for.

fixes: #10394
tests: manual

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220503142942.8145-1-xemul@scylladb.com>
2022-05-04 08:35:12 +03:00
Pavel Emelyanov
b26a3da584 gossiper: Coroutinize wait_for_gossip_to_settle()
Looks notably shorter this way

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220422093000.24407-1-xemul@scylladb.com>
2022-05-03 15:58:04 +03:00
Botond Dénes
4440d4b41a Merge "De-globalize gossiper" from Pavel Emelyanov
"
- Alternator gets gossiper for its proxy dependency

- Forward service method that takes global gossiper can re-use
  proxy method (forward -> proxy reference is already there)

- Table code is patched to require gossiper argument

- Snitch gets a dependency reference on snitch_ptr and some extra
  care for snitch driver vs snitch-ptr interaction and gossip test

- Cql test env should carry gossiper reference on-board

- Few places can re-use the existing local gossiper reference

- Scylla-gdb needs to get gossiper from debug namespace and needs
  _not_ to get feature service from gossiper
"

* 'br-gossiper-deglobal-2' of https://github.com/xemul/scylla:
  code: De-globalize gossiper
  scylla-gdb, main: Get feature service without gossiper help
  test: Use cql-test-env gossiper
  cql test env: Keep gossiper reference on board
  code: Use gossiper reference where possible
  snitch: Use local gossiper in drivers
  snitch: Keep gossiper reference
  test: Remove snitch from manual gossip test
  gossiper: Use container() instead of the global pointer
  main, cql_test_env: Start snitch later
  snitch: Move snitch_base::get_endpoint_info()
  forward service: Re-use proxy's helper with duplicated code
  table: Don't use global gossiper
  alternator: Don't use global gossiper
2022-05-03 15:56:07 +03:00
Nadav Har'El
6fb762630b cql-pytest: translate Cassandra's tests for SELECT operations
This is a translation of Cassandra's CQL unit test source file
    validation/operations/SelectTest.java into our our cql-pytest framework.

    This large test file includes 78 tests for various types of SELECT
    operations. Four additional tests require UDF in Java syntax,
    and were skipped.

    All 78 tests pass on Cassandra. 25 of the tests fail on Scylla
    reproducing 3 already known Scylla issues and 8 previously-unknown
    issues:

    Previously known issues:

      Refs #2962:  Collection column indexing
      Refs #4244:  Add support for mixing token, multi- and single-column
                   restrictions
      Refs #8627:  Cleanly reject updates with indexed values where
                   value > 64k

    Newly-discovered issues:

      Refs #10354: SELECT DISTINCT should allow filter on static columns,
                   not just partition keys
      Refs #10357: Spurious static row returned from query with filtering,
                   despite not matching filter
      Refs #10358: Comparison with UNSET_VALUE should produce an error
      Refs #10359: "CONTAINS NULL" and "CONTAINS KEY NULL" restrictions
                   should match nothing
      Refs #10361: Null or UNSET_VALUE subscript should generate an
                   invalid request error
      Refs #10366: Enforce Key-length limits during SELECT
      Refs #10443: SELECT with IN and ORDER BY orders rows per partition
                   instead of for the entire response
      Refs #10448: The CQL token() function should validate its parameters

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #10449
2022-05-03 11:45:05 +03:00
Pavel Emelyanov
e80adbade3 code: De-globalize gossiper
No code uses global gossiper instance, it can be removed. The main and
cql-test-env code now have their own real local instances.

This change also requires adding the debug:: pointer and fixing the
scylle-gdb.py to find the correct global location.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
89ee15b05b scylla-gdb, main: Get feature service without gossiper help
This is needed not to mess with removed global gossiper in the next
patch. Other than this, it's better to access services by their own
debug:: pointers, not via under-the-good dependencies chains.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
b25fc29801 test: Use cql-test-env gossiper
There's yet another -test-env -- the alternator- one -- which needs
gossiper. It now uses global reference, but can grab gossiper reference
from the cql-test-env which partitipates in initialization.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
b0544ba7bd cql test env: Keep gossiper reference on board
The reference is already available at the env initialization, but it's
not kept on the env instance itself. Will be used by the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
4bea0b7491 code: Use gossiper reference where possible
Some places in the code has function-local gossiper reference but
continue to use global instance. Re-use the local reference (it's going
to become sharded<> instance soon).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
e502047c74 snitch: Use local gossiper in drivers
Each driver has a pointer to this shard snitch_ptr which, in turn, has
the reference on gossiper. This lets drivers stop using the global
gossiper instance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
38c77d0d85 snitch: Keep gossiper reference
The reference is put on the snitch_ptr because this is the sharded<>
thing and because gossiper reference is the same for different snitch
drivers. Also, getting gossiper from snitch_ptr by driver will look
simpler than getting it from any base class.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
52fc4d6b22 test: Remove snitch from manual gossip test
It's not in use out there

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
7a0ca3fedc gossiper: Use container() instead of the global pointer
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:40 +03:00
Pavel Emelyanov
2d32c47d0d main, cql_test_env: Start snitch later
Snitch depends on gossiper and system keyspace, so it needs to be
started after those two do.

fixes #10402

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:57:32 +03:00
Pavel Emelyanov
f85e12ffa5 snitch: Move snitch_base::get_endpoint_info()
This method is only needed by production_snitch_base inheritants

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:34:52 +03:00
Pavel Emelyanov
282a1880a5 forward service: Re-use proxy's helper with duplicated code
The get_live_endpoints matches the same method on the proxy side. Since
the forward service carries proxy reference, it can use its method
(which needs to be made public for that sake).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:34:51 +03:00
Pavel Emelyanov
11c99fc41b table: Don't use global gossiper
The table::get_hit_rate needs gossiper to get hitrates state from.
There's no way to carry gossiper reference on the table itself, so it's
up to the callers of that method to provide it. Fortunately, there's
only one caller -- the proxy -- but the call chain to carry the
reference it not very short ... oh, well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:33:08 +03:00
Pavel Emelyanov
7a5c2cdbe6 alternator: Don't use global gossiper
There's proxy at hand which can provide local gossiper reference

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:33:07 +03:00
Avi Kivity
a2901a376d Merge 'Coroutinize some storage_service member functions' from Pavel Solodovnikov
These trivial changes are mostly intended to reduce the use of `seastar::async`.

Closes #10416

* github.com:scylladb/scylla:
  service: storage_service: coroutinize `start_gossiping()`
  service: storage_service: coroutinize `node_ops_cmd_heartbeat_updater()`
  service: storage_service: coroutinize `node_ops_abort_thread()`
  service: storage_service: coroutinize `node_ops_abort()`
  service: storage_service: coroutinize `node_ops_done()`
  service: storage_service: coroutinize `node_ops_update_heartbeat()`
  service: storage_service: coroutinize `force_remove_completion()`
  service: storage_service: coroutinize `start_leaving()`
  service: storage_service: coroutinize `start_sys_dist_ks()`
  service: storage_service: coroutinize `prepare_to_join()`
  service: storage_service: coroutinize `removenode_add_ranges()`
  service: storage_service: coroutinize `unbootstrap()`
  service: storage_service: coroutinize `get_changed_ranges_for_leaving()`
2022-05-02 12:59:36 +03:00
Botond Dénes
53c66fe24a Merge "Make LCS reshape and major more efficient by picking the ideal output level" from Raphael S. Carvalho
"
Today, both operations are picking the highest level as the ideal level for
placing the output, but the size of input should be used instead.

The formula for calculating the ideal level is:
    ceil(log base(fan_out) of (total_input_size / max_fragment_size))

        where fan_out = 10 by default,
        total_input_size = total size of input data and
        max_fragment_size = maximum size for fragment (160M by default)

    such that 20 fragments will be placed at level 2, as level 1
    capacity is 10 fragments only.

By placing the output in the incorrect level, tons of backlog will be generated
for LCS because it will either have to promote or demote fragments until the
levels are properly balanced.
"

* 'optimize_lcs_major_and_reshape/v2' of https://github.com/raphaelsc/scylla:
  compaction: LCS: avoid needless work post major compaction completion
  compaction: LCS: avoid needless work post reshape completion
  compaction: LCS: extract calculation of ideal level for input
  compaction: LCS: Fix off-by-one in formula used to calculate ideal level
2022-05-02 10:16:09 +03:00
Pavel Solodovnikov
47834313d8 repair: avoid infinite recursion on stringifying unknown node_ops_cmd
Cast the cmd representation to underlying type and
avoid infinite recursion in the `operator <<(node_ops_cmd)`.

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20220430180701.1012190-1-pa.solodovnikov@scylladb.com>
2022-05-02 10:08:34 +03:00
Avi Kivity
5169ce40ef Merge 'loading_cache: force minimum size of unprivileged ' from Piotr Grabowski
This series enforces a minimum size of the unprivileged section when
performing `shrink()` operation.

When the cache is shrunk, we still drop entries first from unprivileged
section (as before this commit), however, if this section is already small
(smaller than `max_size / 2`), we will drop entries from the privileged
section.

This is necessary, as before this change the unprivileged section could
be starved. For example if the cache could store at most 50 entries and
there are 49 entries in privileged section, after adding 5 entries (that would
go to unprivileged section) 4 of them would get evicted and only the 5th one
would stay. This caused problems with BATCH statements where all
prepared statements in the batch have to stay in cache at the same time
for the batch to correctly execute.

To correctly check if the unprivileged section might get too small after
dropping an entry, `_current_size` variable, which tracked the overall size
of cache, is changed to two variables: `_unprivileged_section_size` and
`_privileged_section_size`, tracking section sizes separately.

New tests are added to check this new behavior and bookkeeping of the section
sizes. A test is added, that sets up a CQL environment with a very small
prepared statement cache, reproduces issue in #10440 and stresses the cache.

Fixes #10440.

Closes #10456

* github.com:scylladb/scylla:
  loading_cache_test: test prepared stmts cache
  loading_cache: force minimum size of unprivileged
  loading_cache: extract dropping entries to lambdas
  loading_cache: separately track size of sections
  loading_cache: fix typo in 'privileged'
2022-05-01 19:36:35 +03:00
Avi Kivity
325eb9b4d2 Merge 'compaction: get rid of reader v1' from Benny Halevy
This series gets rid of the remaining usage of flat_mutation_reader v1 in compaction

Test: sstable_compaction_test

Closes #10454

* github.com:scylladb/scylla:
  compaction: sanitize headers from flat_mutation_reader v1
  flat_mutation_reader: get rid of class filter
  compaction: cleanup_compaction: make_partition_filter: return flat_mutation_reader_v2::filter
2022-05-01 19:29:10 +03:00
Avi Kivity
ab72dbc93e Merge 'Make internal statements caching more coherent' from Eliran Sinvani
There was some doubts about which internal prepared statements are cached and which aren't.
In addition some queries that should have been cached (IMO), weren't. This PR adds some verbosity
to the caching enabling parameter as well as adding caching to some queries.
As a followup I would suggest to have internal queries as a compile time strings that have a compile time hash, this
will make the cache lookup not be dependent on the query textual length as it is today, this makes sense given that the
queries are static even today.

Closes #10465

Fixes #10335.

* github.com:scylladb/scylla:
  internal queries: add caching to some queries
  query_processor: remove default internal query caching behavior
  query_processor: make execute_internal caching parameter more verbose
2022-05-01 19:27:06 +03:00
Eliran Sinvani
a16b4e407d internal queries: add caching to some queries
Some of the internal queries didn't have caching enabled even though
there are chances of the query executing in large bursts or relatively
often, example of the former is `default_authorized::authorize` and for
the later is `system_distributed_keyspace::get_service_levels`.

Fixes #10335

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2022-05-01 13:30:02 +03:00