The new service performs failure detection by periodically pinging
endpoints. The set of pinged endpoints can be dynamically extended and
shrinked. To learn about liveness of endpoints, user of the service
registers a listener and chooses a threshold - a duration of time which
has to pass since the last successful ping in order to mark an endpoint
as dead. When an endpoint responds it's immediately marked as alive.
Endpoints are identified using abstract integer identifiers.
The method of performing a ping is a dependency of the service provided
by the user through the `pinger` interface. The implementation of `pinger`
is responsible for translating the abstract endpoint IDs to 'real'
addresses. For example, production implementation may map endpoint IDs
to IP addresses and use TCP/IP to perform the ping, while a test/simulation
implementation may use a simulated network that also operates on
abstract identifiers.
Similarly, the method of measuring time is a dependency provided by the
user using the `clock` interface. The service operates on abstract time
intervals and timepoints. So, for example, in a production
implementation time can be measured using a stopwatch, while in
test/simulation we can use a logical clock.
The service distributes work across different shards. When an endpoint
is added to the set of detected endpoints, the service will choose a
shard with the smallest amount of workers and create a worker that is
responsible for periodically pinging this endpoint on that shard and
sending notifications to listeners.
Endpoints can be added or removed only through the shard 0 instance of
the service and shard 0 is responsible for coordinating the endpoint
workers. Listeners can be registered on any shard.
This series futurizes two synchronous functions used for data reconciliation:
`data_read_resolver::resolve` and `to_data_query_result` and does so
by introducing lower-level asynchronous infrastructure:
`mutation_partition_view::accept_gently`,
`frozen_mutation::unfreeze_gently` and `frozen_mutation::consume_gently`,
and `mutation::consume_gently`.
This trades some cycles on this cold path to prevent known reactor stalls.
Fixes#2361Fixes#10038Closes#10482
* github.com:scylladb/scylla:
mutation: add consume_gently
frozen_mutation: add consume_gently
query: coroutinize to_data_query_result
frozen_mutation: add unfreeze_gently
mutation_partition_view: add accept_gently methods
storage_proxy: futurize data_read_resolver::resolve
Currently, adding a cluster feature requires editing several files and
repeating the new feature name several times. This series reduces
the boilerplate to a single line (for non-experimental features), and
perhaps three for experimental features.
Closes#10488
* github.com:scylladb/scylla:
gms: feature_service: remove variable/helper function duplication
gms: feature: make `operator bool` implicit
gms: feature_service: remove feature variable duplication in enable()
gms: feature_service: remove feature variable declaration/definition duplication
gms: features: de-quadruplicate active feature names
gms: features: de-quadruplicate deprecated feature names
gms: feature_service: avoid duplicating feature names when listing known features
Reduce stalls by maybe yielding in-between partitions,
and by awaiting unfreeze_gently where possible.
Refs #10038
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Allow yielding when consuming mutation_partition_view.
To be used in later patches by a new unfreeze_gently function
and frozen_mutation::consume.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Allow yielding in data_read_resolver::resolve to
prevent reactor stalls.
TODO: unfreeze_gently, to prevent stalls due
to large partitions.
Refs #2361
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Each feature has a private variable and a public accessor. Since the
accessor effectively makes the variable public, avoid the intermediary
and make the variable public directly.
To ease mechanical translation, the variable name is chosen as
the function name (without the cluster_supports_ prefix).
References throughout the codebase are adjusted.
Features are usually used as booleans, so forcing allowing them
to implicitly decay to bool is not a mistake. In fact a bunch
of helper functions exist to cast feature variables to bool.
Prepare to reduce this boilerplate by allowing automatic conversion
to bool.
Active feature names are present four or five times in the code:
a delaration in feature.hh, a definition and initialization (two copies)
in feature_service.cc, a use in feature_service.cc, and a possible
reference in feature_service.cc if the feature is conditionally enabled.
Switch to just one copy or two, using the "foo"sv operator (and "foo"s)
to generate a string_view (string) as before.
Note that a few features had different external and C++ names; we
preserve the external name.
This patch does cause literal strings to be present in two places,
making them vulnerable to misspellings. But since feature names
are immutable, there is little risk that one will change without
the other.
Deprecated features are unused, but are present four times in the code:
a delaration in feature.hh, a definition and initialization (two copies)
in feature_service.cc, and a use in feature_service.cc. Switch to just
one copy, using the "foo"sv operator to generate a string_view as before.
Note that a few features had different external and C++ names; we
preserve the external name.
If we are redefining the log table, we need to ensure any dropped
columns are registered in "dropped_columns" table, otherwise clients will not
be able to read data older than now.
Includes unit test.
Should probably be backported to all CDC enabled versions.
Fixes#10473Closes#10474
When updating an updateable value via CQL the new value comes as a
string that's then boost::lexical_cast-ed to the desired value. If the
cast throws the respective exception is printed in logs which is very
likely uncalled for.
fixes: #10394
tests: manual
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220503142942.8145-1-xemul@scylladb.com>
"
- Alternator gets gossiper for its proxy dependency
- Forward service method that takes global gossiper can re-use
proxy method (forward -> proxy reference is already there)
- Table code is patched to require gossiper argument
- Snitch gets a dependency reference on snitch_ptr and some extra
care for snitch driver vs snitch-ptr interaction and gossip test
- Cql test env should carry gossiper reference on-board
- Few places can re-use the existing local gossiper reference
- Scylla-gdb needs to get gossiper from debug namespace and needs
_not_ to get feature service from gossiper
"
* 'br-gossiper-deglobal-2' of https://github.com/xemul/scylla:
code: De-globalize gossiper
scylla-gdb, main: Get feature service without gossiper help
test: Use cql-test-env gossiper
cql test env: Keep gossiper reference on board
code: Use gossiper reference where possible
snitch: Use local gossiper in drivers
snitch: Keep gossiper reference
test: Remove snitch from manual gossip test
gossiper: Use container() instead of the global pointer
main, cql_test_env: Start snitch later
snitch: Move snitch_base::get_endpoint_info()
forward service: Re-use proxy's helper with duplicated code
table: Don't use global gossiper
alternator: Don't use global gossiper
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectTest.java into our our cql-pytest framework.
This large test file includes 78 tests for various types of SELECT
operations. Four additional tests require UDF in Java syntax,
and were skipped.
All 78 tests pass on Cassandra. 25 of the tests fail on Scylla
reproducing 3 already known Scylla issues and 8 previously-unknown
issues:
Previously known issues:
Refs #2962: Collection column indexing
Refs #4244: Add support for mixing token, multi- and single-column
restrictions
Refs #8627: Cleanly reject updates with indexed values where
value > 64k
Newly-discovered issues:
Refs #10354: SELECT DISTINCT should allow filter on static columns,
not just partition keys
Refs #10357: Spurious static row returned from query with filtering,
despite not matching filter
Refs #10358: Comparison with UNSET_VALUE should produce an error
Refs #10359: "CONTAINS NULL" and "CONTAINS KEY NULL" restrictions
should match nothing
Refs #10361: Null or UNSET_VALUE subscript should generate an
invalid request error
Refs #10366: Enforce Key-length limits during SELECT
Refs #10443: SELECT with IN and ORDER BY orders rows per partition
instead of for the entire response
Refs #10448: The CQL token() function should validate its parameters
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#10449
No code uses global gossiper instance, it can be removed. The main and
cql-test-env code now have their own real local instances.
This change also requires adding the debug:: pointer and fixing the
scylle-gdb.py to find the correct global location.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is needed not to mess with removed global gossiper in the next
patch. Other than this, it's better to access services by their own
debug:: pointers, not via under-the-good dependencies chains.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's yet another -test-env -- the alternator- one -- which needs
gossiper. It now uses global reference, but can grab gossiper reference
from the cql-test-env which partitipates in initialization.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The reference is already available at the env initialization, but it's
not kept on the env instance itself. Will be used by the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some places in the code has function-local gossiper reference but
continue to use global instance. Re-use the local reference (it's going
to become sharded<> instance soon).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Each driver has a pointer to this shard snitch_ptr which, in turn, has
the reference on gossiper. This lets drivers stop using the global
gossiper instance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The reference is put on the snitch_ptr because this is the sharded<>
thing and because gossiper reference is the same for different snitch
drivers. Also, getting gossiper from snitch_ptr by driver will look
simpler than getting it from any base class.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Snitch depends on gossiper and system keyspace, so it needs to be
started after those two do.
fixes#10402
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The get_live_endpoints matches the same method on the proxy side. Since
the forward service carries proxy reference, it can use its method
(which needs to be made public for that sake).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The table::get_hit_rate needs gossiper to get hitrates state from.
There's no way to carry gossiper reference on the table itself, so it's
up to the callers of that method to provide it. Fortunately, there's
only one caller -- the proxy -- but the call chain to carry the
reference it not very short ... oh, well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
Today, both operations are picking the highest level as the ideal level for
placing the output, but the size of input should be used instead.
The formula for calculating the ideal level is:
ceil(log base(fan_out) of (total_input_size / max_fragment_size))
where fan_out = 10 by default,
total_input_size = total size of input data and
max_fragment_size = maximum size for fragment (160M by default)
such that 20 fragments will be placed at level 2, as level 1
capacity is 10 fragments only.
By placing the output in the incorrect level, tons of backlog will be generated
for LCS because it will either have to promote or demote fragments until the
levels are properly balanced.
"
* 'optimize_lcs_major_and_reshape/v2' of https://github.com/raphaelsc/scylla:
compaction: LCS: avoid needless work post major compaction completion
compaction: LCS: avoid needless work post reshape completion
compaction: LCS: extract calculation of ideal level for input
compaction: LCS: Fix off-by-one in formula used to calculate ideal level
This series enforces a minimum size of the unprivileged section when
performing `shrink()` operation.
When the cache is shrunk, we still drop entries first from unprivileged
section (as before this commit), however, if this section is already small
(smaller than `max_size / 2`), we will drop entries from the privileged
section.
This is necessary, as before this change the unprivileged section could
be starved. For example if the cache could store at most 50 entries and
there are 49 entries in privileged section, after adding 5 entries (that would
go to unprivileged section) 4 of them would get evicted and only the 5th one
would stay. This caused problems with BATCH statements where all
prepared statements in the batch have to stay in cache at the same time
for the batch to correctly execute.
To correctly check if the unprivileged section might get too small after
dropping an entry, `_current_size` variable, which tracked the overall size
of cache, is changed to two variables: `_unprivileged_section_size` and
`_privileged_section_size`, tracking section sizes separately.
New tests are added to check this new behavior and bookkeeping of the section
sizes. A test is added, that sets up a CQL environment with a very small
prepared statement cache, reproduces issue in #10440 and stresses the cache.
Fixes#10440.
Closes#10456
* github.com:scylladb/scylla:
loading_cache_test: test prepared stmts cache
loading_cache: force minimum size of unprivileged
loading_cache: extract dropping entries to lambdas
loading_cache: separately track size of sections
loading_cache: fix typo in 'privileged'
This series gets rid of the remaining usage of flat_mutation_reader v1 in compaction
Test: sstable_compaction_test
Closes#10454
* github.com:scylladb/scylla:
compaction: sanitize headers from flat_mutation_reader v1
flat_mutation_reader: get rid of class filter
compaction: cleanup_compaction: make_partition_filter: return flat_mutation_reader_v2::filter
There was some doubts about which internal prepared statements are cached and which aren't.
In addition some queries that should have been cached (IMO), weren't. This PR adds some verbosity
to the caching enabling parameter as well as adding caching to some queries.
As a followup I would suggest to have internal queries as a compile time strings that have a compile time hash, this
will make the cache lookup not be dependent on the query textual length as it is today, this makes sense given that the
queries are static even today.
Closes#10465Fixes#10335.
* github.com:scylladb/scylla:
internal queries: add caching to some queries
query_processor: remove default internal query caching behavior
query_processor: make execute_internal caching parameter more verbose
Some of the internal queries didn't have caching enabled even though
there are chances of the query executing in large bursts or relatively
often, example of the former is `default_authorized::authorize` and for
the later is `system_distributed_keyspace::get_service_levels`.
Fixes#10335
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Also, pass `node_ops_cmd` by value to get rid of lifetime issues
when converting to coroutine.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
When executing internal queries, it is important that the developer
will decide if to cache the query internally or not since internal
queries are cached indefinitely. Also important is that the programmer
will be aware if caching is going to happen or not.
The code contained two "groups" of `query_processor::execute_internal`,
one group has caching by default and the other doesn't.
Here we add overloads to eliminate default values for caching behaviour,
forcing an explicit parameter for the caching values.
All the call sites were changed to reflect the original caching default
that was there.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
`execute_internal` has a parameter to indicate if caching a prepared
statement is needed for a specific call. However this parameter was a
boolean so it was easy to miss it's meaning in the various call sites.
This replaces the parameter type to a more verbose one so it is clear
from the call site what decision was made.
Said method has to evict all querier cache entries, belonging to the to-be-dropped table. This is already the case, but there was a window where new entries could sneak in, causing a stale reference to the table to be de-referenced later when they are evicted due to TTL. This window is now closed, the entries are evicted after the method has waited for all ongoing operations on said table to stop.
Fixes: #10450Closes#10451
* github.com:scylladb/scylla:
replica/database: drop_column_family(): drop querier cache entries after waiting for ops
replica/database: finish coroutinizing drop_column_family()
replica/database: make remove(const column_family&) private
Fix hangs on Scylla node startup with Raft enabled that were caused by:
- a deadlock when enabling the USES_RAFT feature,
- a non-voter server forgetting who the leader is and not being able to forward a `modify_config` entry to become a voter.
Read the commit messages for details.
Fixes: #10379
Refs: #10355Closes#10380
* github.com:scylladb/scylla:
raft: actively search for a leader if it is not known for a tick duration
raft: server: return immediately from `wait_for_leader` if leader is known
service: raft: don't support/advertise USES_RAFT feature