in this series, we use different table names in simple_backlog_controller_test. this test exercises sstable compaction strategies, and it creates and keeps multiple tables in a single test session. but we are going to add metrics on a per-table basis, using the table's ks and cf as the counter's labels. since the metrics subsystem does not allow multiple counters to share the same label, the test would fail when the metrics are added.
to address this problem, in this change
1. a new ctor is added for `simple_schema`, so we can create `simple_schema` with different names
2. use the new ctor in simple_backlog_controller_test
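To make the constraint concrete, here is a minimal stand-alone C++ model (not Scylla's actual metrics API; `metric_registry`, `add_counter` and the label format are invented for illustration) of a registry that, like the real metrics subsystem, rejects two counters registered with the same (name, label) pair:

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

class metric_registry {
    // key: (metric name, label), e.g. ("compaction_backlog", "ks.cf")
    std::map<std::pair<std::string, std::string>, long> counters_;
public:
    // label is assumed to be the "ks.cf" table name for per-table metrics
    void add_counter(const std::string& name, const std::string& label) {
        auto [it, inserted] = counters_.try_emplace({name, label}, 0);
        if (!inserted) {
            // two tables with the same ks.cf would collide here
            throw std::runtime_error("duplicate metric: " + name + "{" + label + "}");
        }
    }
    bool has(const std::string& name, const std::string& label) const {
        return counters_.count({name, label}) > 0;
    }
};
```

With distinct table names per test table, each registration uses a unique label and the collision never happens.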
Fixes #14767
Closes #14783
* github.com:scylladb/scylladb:
test: use different table names in simple_backlog_controller_test
test/lib/simple_schema: add ctor for customizing ks.cf
test/lib/simple_schema: do not hardwire ks.cf
This commit adds the information on how to install ScyllaDB
without root privileges (with "unified installer", but we've
decided to drop that name - see the page title).
The content taken from the website
https://www.scylladb.com/download/?platform=tar&version=scylla-5.2#open-source
is divided into two sections: "Download and Install" and
"Configure and Run ScyllaDB".
In addition, the "Next Steps" section is also copied from
the website, and adjusted to be in sync with other installation
pages in the docs.
Refs https://github.com/scylladb/scylla-docs/issues/4091
Closes #14781
In this commit we just pass a fencing_token
through the hint_mutation RPC verb.
The hints manager uses either
storage_proxy::send_hint_to_all_replicas or
storage_proxy::send_hint_to_endpoint to send a hint.
Both methods capture the current erm and use the
corresponding fencing token from it in the
mutation or hint_mutation RPC verb. If these
verbs are fenced out, the server stale_topology_exception
is translated to a mutation_write_failure_exception
on the client with an appropriate error message.
The hint manager will attempt to resend the failed
hint from the commitlog segment after a delay.
However, if delivery is unsuccessful, the hint will
be discarded after gc_grace_seconds.
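A rough sketch of the server-side fencing check (all types and names here are simplified stand-ins, not Scylla's real API): the token carries the topology version the sender observed, and the receiver rejects a request sent under a stale topology.

```cpp
#include <cstdint>
#include <stdexcept>

// stand-in for the token captured from the sender's erm
struct fencing_token {
    uint64_t topology_version;
};

// stand-in for the error that gets translated into
// mutation_write_failure_exception on the client
struct stale_topology_exception : std::runtime_error {
    stale_topology_exception() : std::runtime_error("stale topology") {}
};

// Returns normally if the token is current; throws if the verb is fenced out.
void check_fence(fencing_token token, uint64_t local_fence_version) {
    if (token.topology_version < local_fence_version) {
        throw stale_topology_exception();
    }
}
```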
Closes#14580
We don't load gossiper endpoint states in `storage_service::join_cluster` if `_raft_topology_change_enabled`, but gossiper is still needed even in case of `_raft_topology_change_enabled` mode, since it still contains part of the cluster state. To work correctly, the gossiper needs to know the current endpoints. We cannot rely on seeds alone, since it is not guaranteed that seeds will be up to date and reachable at the time of restart.
The problem was demonstrated by the test `test_joining_old_node_fails`, which fails occasionally with `experimental_features: [consistent-topology-changes]` on the line where it waits for `TEST_ONLY_FEATURE` to become enabled on all nodes. This never happens, since the `SUPPORTED_FEATURES` gossiper state is not disseminated, and feature_service still relies on gossiper to disseminate information around the cluster.
The series also contains a fix for a problem in `gossiper::do_send_ack2_msg`, see commit message for details.
Fixes #14675
Closes #14775
* github.com:scylladb/scylladb:
storage_service: restore gossiper endpoints on topology_state_load fix
gossiper: do_send_ack2_msg fix
If a semaphore mismatch occurs, check whether both semaphores are
user semaphores. If so, log a warning, bump the `querier_cache_scheduling_group_mismatches` stat, and drop the cached reader instead of throwing an error.
Until now, the semaphore mismatch was only checked in multi-partition queries. This PR pushes the check down to `querier_cache` and performs it in all `lookup_*_querier` methods.
The mismatch can happen if the user's scheduling group changed during
a query. We don't want to throw an error then, but drop and reset
the cached reader.
This patch doesn't solve the problem of mismatched semaphores caused by changes in service levels/scheduling groups, but only mitigates it.
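The decision logic can be sketched roughly like this (simplified, hypothetical names; the real check lives in `querier_cache` and uses `replica::database::is_user_semaphore()`):

```cpp
#include <stdexcept>
#include <string>

// stand-in for a reader concurrency semaphore
struct semaphore {
    std::string name;
    bool is_user;  // belongs to a user (non-internal) scheduling group
};

enum class lookup_action { use_cached, drop_cached };

// Decide what to do when looking up a cached querier.
lookup_action on_querier_lookup(const semaphore& cached, const semaphore& current) {
    if (cached.name == current.name) {
        return lookup_action::use_cached;          // no mismatch
    }
    if (cached.is_user && current.is_user) {
        // expected: the user's scheduling group changed mid-query;
        // log a warning, bump the mismatch stat, drop the cached reader
        return lookup_action::drop_cached;
    }
    // a mismatch involving an internal semaphore still indicates a bug
    throw std::runtime_error("semaphore mismatch between non-user semaphores");
}
```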
Refers: https://github.com/scylladb/scylla-enterprise/issues/3182
Refers: https://github.com/scylladb/scylla-enterprise/issues/3050
Closes: #14770
Closes #14736
* github.com:scylladb/scylladb:
querier_cache: add stats of scheduling group mismatches
querier_cache: check semaphore mismatch during querier lookup
querier_cache: add reference to `replica::database::is_user_semaphore()`
replica:database: add method to determine if semaphore is user one
We don't load gossiper endpoint states in
storage_service::join_cluster if
_raft_topology_change_enabled, but gossiper
is still needed even in case of
_raft_topology_change_enabled mode, since it
still contains part of the cluster state.
To work correctly, the gossiper needs to know
the current endpoints. We cannot rely on seeds alone,
since it is not guaranteed that seeds will be
up to date and reachable at the time of restart.
The specific scenario of the problem: cluster with
three nodes, the second has the first in seeds,
the third has the first and second. We restart all
the nodes simultaneously, the third node uses its
seeds as _endpoints_to_talk_with in the first gossiper round
and sends SYN to the first and second. The first node
hasn't started its gossiper yet, so handle_syn_msg
returns immediately at the `if (!this->is_enabled())` check.
The third node receives ack from the second node and
no communication from the first node, so it fills
its _live_endpoints collection with the second node
and will never communicate with the first node again.
The problem was demonstrated by the test
test_joining_old_node_fails, which fails occasionally with
experimental_features: [consistent-topology-changes]
on the line where it waits for TEST_ONLY_FEATURE
to become enabled on all nodes. This doesn't happen
since SUPPORTED_FEATURES gossiper state is not
disseminated because of the problem described above.
The first commit is needed since add_saved_endpoint
adds the endpoint with some default app states with locally
incrementing versions; without that fix, gossiper
refuses to fill in the real app states for this endpoint later.
Fixes: #14675
Fixes#14668
In #14668, we decided to introduce a new `scylla.yaml` variable for the schema commitlog segment size and set it to 128MB. The reason is that the segment size puts a limit on the mutation size that can be written at once, and some schema mutation writes are much larger than average, as shown in #13864. This `schema_commitlog_segment_size_in_mb` variable is now added to `scylla.yaml` and `db/config`.
Additionally, we no longer derive the commitlog sync period for schema commitlog, because schema commitlog runs in batch mode and doesn't need this parameter. This was also discussed in #14668.
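In `scylla.yaml`, the new setting would look like this (the variable name and default are the ones introduced by this change):

```yaml
# segment size for the schema commitlog; larger than the data commitlog's
# default because some schema mutation writes are much larger than average
schema_commitlog_segment_size_in_mb: 128
```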
Closes#14704
* github.com:scylladb/scylladb:
replica: do not derive the commitlog sync period for schema commitlog
config: set schema_commitlog_segment_size_in_mb to 128
config: add schema_commitlog_segment_size_in_mb variable
This commit is a first part of the fix for #14675.
The issue is about the test test_joining_old_node_fails
failing occasionally with
experimental_features: [consistent-topology-changes].
The next commit contains a fix for it, here we
solve the pre-existing gossiper problem
which we stumble upon after the fix.
Local generation for addr may have been
increased since the current node sent
an initial SYN. Comparing versions across different
generations in get_state_for_version_bigger_than
could result in losing some app states with
smaller versions.
More specifically, consider a cluster with nodes
.1, .2, .3, .3 has .1 and .2 as seeds, .2 has .1
as a seed. Suppose .2 receives a SYN from .3 before
its gossiper starts, and it has a
version 0.24 for .1 in endpoint_states.
The digest from .3 contains 0.25 as a version for .1,
so examine_gossiper produces .1->0.24 as a digest,
and this digest is sent to .3 as part of the ack.
Before processing this ack, .3 processed an ack from
.1 (scylla sends SYN to many nodes) and updated
its endpoint_states accordingly, so now it
has .1->100500.32 for .1. Then
we get to do_send_ack2_msg and call
get_state_for_version_bigger_than(.1, 24).
This returns properties which have version > 24,
ignoring a lot of them with smaller versions
which have been received from .1. Also,
get_state_for_version_bigger_than updates
generation (it copies get_heart_beat_state from
.3), so when we apply the ack in handle_ack2_msg
at .2 we update the generation and now the
skipped app states will only be updated on .2
if somebody changes them and increments their version.
Cassandra behaviour is the same in this case
(see https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/gms/GossipDigestAckVerbHandler.java#L86). This is probably less
of a problem for them since most of the time they
send only one SYN in one gossiper round
(save for unreachable nodes), so there is less
room for conflicts.
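A toy model of the pitfall (names invented for illustration; the real logic lives in gossiper's get_state_for_version_bigger_than): a filter that compares bare versions across generations drops app states that arrived under a newer generation but carry small version numbers.

```cpp
#include <map>
#include <string>

// an app state value stamped with the generation/version it was seen at
struct versioned_value {
    long generation;
    long version;
};

using endpoint_state = std::map<std::string, versioned_value>;

// The problematic filter: keeps only states with version > min_version,
// regardless of generation. A state stamped (generation=100500, version=10)
// is newer than anything from an old generation, but gets skipped here.
endpoint_state states_bigger_than_version(const endpoint_state& st, long min_version) {
    endpoint_state out;
    for (const auto& [name, v] : st) {
        if (v.version > min_version) {
            out.emplace(name, v);
        }
    }
    return out;
}
```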
before this change, add_version_library() is a single function
which accomplishes two tasks:
1. build the scylla-version target using `SCYLLA-VERSION-GEN`
2. add an object library
but this has two problems:
1. we should run `SCYLLA-VERSION-GEN` at configure time, instead
of at build time. otherwise the targets which read from the
SCYLLA-{VERSION,RELEASE,PRODUCT}-FILE files cannot access them,
unless they read them in their build rules. but
they always use `file(STRINGS ..)` to read them, and the
`file()` command is executed at configure time. so, this
is a dead end.
2. we repeat the `file(STRINGS ..)` call in multiple places. this is
not ideal if we want to minimize the repetition.
so, to address this problem, in this change:
1. use `execute_process()` instead of `add_custom_command()`
for generating these *-FILE files. so they are always ready
at build time. this partially reverts bb7d99ad37.
2. extract `generate_scylla_version()` out of `add_version_library()`.
so we can call the former much earlier than the latter.
this would allow us to reference the variables defined by
the `generate_scylla_version()` much earlier.
3. define cached strings in the extracted function, so that
they can be consumed by other places.
4. reference the cached variables in `build_submodule.cmake`.
also, take this opportunity to fix the version string
used in build_submodule.cmake: we should have used
`scylla_version_tilde`.
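the first two points could be sketched roughly like this (the paths, output location, and variable names below are assumptions, not the exact change):

```cmake
# run the generator at configure time, so its output files exist
# before any file(STRINGS ..) below runs
execute_process(
  COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/SCYLLA-VERSION-GEN
  WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
# file(STRINGS ..) also runs at configure time, so the file is ready by now
file(STRINGS "${CMAKE_BINARY_DIR}/SCYLLA-VERSION-FILE" scylla_version)
# cache the result so other listfiles (e.g. build_submodule.cmake) can read it
set(Scylla_VERSION "${scylla_version}" CACHE STRING "scylla version" FORCE)
```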
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14769
Previously, the semaphore mismatch was checked only in multi-partition
queries, and if it happened, an internal error was thrown.
This commit pushes the check down to `querier_cache`, so each
`lookup_*_querier` method will check for the mismatch.
What's more, if a semaphore mismatch occurs, we check whether both
semaphores are user semaphores. If so, we log a warning and drop the
cached reader instead of throwing an error.
The mismatch can happen if the user's scheduling group changed during
a query. We don't want to throw an error then, but drop and reset
the cached reader.
Before choosing a function, we prepare the arguments that can be
prepared without a receiver. Preparing an argument makes
its type known, which allows choosing the best overload
among many possible functions.
The function that prepared the argument passes the unprepared
argument by mistake. Let's fix it so that it actually uses
the prepared argument.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes#14786
We allow inserting column values using a JSON value, eg:
```cql
INSERT INTO mytable JSON '{ "\"myKey\"": 0, "value": 0}';
```
When no JSON value is specified, the query should be rejected.
Scylla used to crash in such cases. A recent change fixed the crash
(https://github.com/scylladb/scylladb/pull/14706); it now fails
on unwrapping an uninitialized value, but really it should
be rejected at the parsing stage, so let's fix the grammar so that
it doesn't allow JSON queries without JSON values.
A unit test is added to prevent regressions.
Refs: https://github.com/scylladb/scylladb/pull/14707
Fixes: https://github.com/scylladb/scylladb/issues/14709
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes#14785
in `simple_backlog_controller_test`, we need to have multiple tables
at the same time, but the default constructor of `simple_schema` always
creates a schema with the table name "ks.cf". we are going to add
per-table metrics, and the new metric group will use the table name
as its counter labels, so we need to either disable the per-table
metrics or use a different table name for each table.
as, in the real world, we don't have multiple tables with the same
name at the same time, it would be better to stop reusing the same
table name in a single test session. so, in this change, we use a
random cf_name for each of the created tables.
Fixes#14767
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
some low level tests, like the ones exercising sstables, create
multiple tables. and we are going to add per-table metrics, and
the new metrics use the ks.cf as part of their unique id. so,
once the per-table metrics are enabled, the sstable tests would fail,
as the metrics subsystem does not allow registering multiple
metric groups with the same name.
so, in this change, we add a new constructor for `simple_schema`,
so that we can customize the schema's ks and cf when creating
the `simple_schema`. in the next commit, we will use this new
constructor in an sstable test which creates multiple tables.
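a sketch of what such a constructor might look like (heavily simplified and hypothetical; the real `simple_schema` builds a full schema object rather than holding plain strings):

```cpp
#include <string>
#include <utility>

class simple_schema {
    std::string _ks_name;
    std::string _cf_name;
public:
    // default: the historical hardwired names
    simple_schema() : simple_schema("ks", "cf") {}
    // new: let tests pick distinct names so per-table metrics don't clash
    simple_schema(std::string ks_name, std::string cf_name)
        : _ks_name(std::move(ks_name)), _cf_name(std::move(cf_name)) {}
    const std::string& ks_name() const { return _ks_name; }
    const std::string& cf_name() const { return _cf_name; }
};
```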
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead, query the names of ks and cf from the schema. this change
prepares us for a simple_schema whose ks and cf can be customized
via its constructor.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in 46616712, we tried to keep the tmpdir only if the test failed,
keeping up to 1 of them using the recently introduced
`tmp_path_retention_count` option. but it turns out this option
is not supported by the pytest used on our jenkins nodes, where we
have pytest 6.2.5, the one shipped with fedora 36.
so, in this change, the tempdir is removed if the test completes
without failures. the tempdir contains a huge number of files,
and jenkins is quite slow scanning them, so after nuking the tempdir,
jenkins will be much faster when scanning for the artifacts.
Fixes#14690
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14772
The mutation compactor has a validator which it uses to validate the stream of mutation fragments that passes through it. This validator is supposed to validate the stream as it enters the compactor, as opposed to its compacted form (output). This was true for most fragment kinds except range tombstones, as purged range tombstones were not visible to
the validator for the most part.
This mistake was introduced by https://github.com/scylladb/scylladb/commit/e2c9cdb576, which itself was a flawed attempt at fixing an error seen because purged tombstones were not terminated by the compactor.
This patch corrects this mistake by fixing the above problem properly: on page-cut, if the validator has an active tombstone, a closing tombstone is generated for it, to avoid the false-positive error. With this, range tombstones can be validated again as they come in.
The existing unit test checking the validation in the compactor is greatly expanded to check all (I hope) different validation scenarios.
Closes#13817
* github.com:scylladb/scylladb:
test/mutation_test: test_compactor_validator_sanity_test
mutation/mutation_compactor: fix indentation
mutation/mutation_compactor: validate the input stream
mutation: mutation_fragment_stream_validating_filter: add accessor to underlying validator
readers: reader-from-fragment: don't modify stream when created without range
The grammar mistakenly allows nothing to be parsed as an
intValue (itself accepted in LIMIT and similar clauses).
Easily fixed by removing the empty alternative. A unit test is
added.
Fixes#14705.
Closes#14707
let's use RAII to tear down the client and the input file, so we can
always perform the cleanups even if the test throws.
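the pattern in miniature, as plain C++ (the actual change uses seastar's deferred()/deferred_close() rather than this hand-rolled guard):

```cpp
#include <utility>

// minimal RAII scope guard: runs the cleanup when the scope exits,
// including when an exception unwinds the stack
template <typename Func>
class scope_guard {
    Func _cleanup;
public:
    explicit scope_guard(Func f) : _cleanup(std::move(f)) {}
    ~scope_guard() { _cleanup(); }
    scope_guard(const scope_guard&) = delete;
    scope_guard& operator=(const scope_guard&) = delete;
};
```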
Closes#14765
* github.com:scylladb/scylladb:
s3/test: use seastar::deferred() to perform cleanup
s3/test: close using deferred_close()
The set in question is read-and-delete-only and thus always empty. Originally it was removed by commit c9993f020d (storage_service: get rid of handle_state_replacing), but some dangling ends were left. Consequently, the on_alive() callback can get rid of a few dead if-else branches.
Closes#14762
* github.com:scylladb/scylladb:
storage_service: Relax on_alive()
storage_service: Remove _replacing_nodes_pending_ranges_updater
The "fix" is straightforward -- callers of system_keyspace::*paxos* methods need to get system keyspace from somewhere. This time the only caller is storage_proxy::remote that can have system keyspace via direct dependency reference.
Closes#14758
* github.com:scylladb/scylladb:
db/system_keyspace: Move and use qctx::execute_cql_with_timeout()
db/system_keyspace: Make paxos methods non-static
service/paxos: Add db::system_keyspace& argument to some methods
test: Optionally initialize proxy remote for cql_test_env
proxy/remote: Keep sharded<db::system_keyspace>& dependency
Task manager tasks covering reshard compaction.
Reattempt on https://github.com/scylladb/scylladb/pull/14044. Bugfix for https://github.com/scylladb/scylladb/issues/14618 is squashed with 95191f4.
Regression test added.
Closes#14739
* github.com:scylladb/scylladb:
test: add test for resharding with non-empty owned_ranges_ptr
test: extend test_compaction_task.py to test resharding compaction
compaction: add shard_reshard_sstables_compaction_task_impl
compaction: invoke resharding on sharded database
compaction: move run_resharding_jobs into reshard_sstables_compaction_task_impl::run()
compaction: add reshard_sstables_compaction_task_impl
compaction: create resharding_compaction_task_impl
Greatly expand this test to check that the compactor validates the input
stream properly.
The test is renamed (the _sanity_test suffix is removed) to reflect the
expanded scope.
The mutation compactor has a validator which it uses to validate the
stream of mutation fragments that passes through it. This validator is
supposed to validate the stream as it enters the compactor, as opposed
to its compacted form (output). This was true for most fragment kinds
except range tombstones, as purged range tombstones were not visible to
the validator for the most part.
This mistake was introduced by e2c9cdb576, which itself was a flawed
attempt at fixing an error seen because purged tombstones were not
terminated by the compactor.
This patch corrects this mistake by fixing the above problem properly:
on page-cut, if the validator has an active tombstone, a closing
tombstone is generated for it, to avoid the false-positive error. With
this, range tombstones can be validated again as they come in.
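A toy model of the page-cut fix (all names invented for illustration): if the validator still has an active range tombstone when the page is cut, a closing tombstone is synthesized so the stream it saw remains well-formed.

```cpp
#include <optional>
#include <string>

// stand-in for a mutation fragment: "rt_open", "rt_close", "row", ...
struct fragment {
    std::string kind;
};

class stream_validator {
    bool _active_tombstone = false;
public:
    void on_fragment(const fragment& f) {
        if (f.kind == "rt_open") { _active_tombstone = true; }
        if (f.kind == "rt_close") { _active_tombstone = false; }
    }
    // Called on page-cut: returns the closing fragment to inject, if any.
    std::optional<fragment> on_page_cut() {
        if (_active_tombstone) {
            _active_tombstone = false;
            return fragment{"rt_close"};  // avoid the false-positive error
        }
        return std::nullopt;
    }
};
```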
The fragment reader currently unconditionally forwards its buffer to the
passed-in partition range. Even if this range is
`query::full_partition_range`, this will involve dropping any fragments
up to the first partition start. This causes problems for test users who
intentionally create invalid fragment streams that don't start with a
partition-start.
Refactor the reader to not do any modifications on the stream, when
neither slice, nor partition-range was passed by the user.
before this change, there are chances that the temporary sstables
created for collecting the GC-able data created by a certain
compaction can be picked up by another compaction job. this
wastes CPU cycles, adds write amplification, and causes
inefficiency.
in general, these GC-only SSTables are created with the same run id
as the non-GC SSTables, but when a new sstable exhausts its input
sstable(s), we proactively replace the old main set with a new one
so that we can free up the space as soon as possible. so the
GC-only SSTables are added to the new main set along with
the non-GC SSTables, but since the former have a good chance of
overlapping with the latter, these GC-only SSTables are assigned
different run ids. however, we failed to register them with the
`compaction_manager` when replacing the main sstable set.
that's why future compactions pick them up
while the compaction which created them is not yet completed.
so, in this change,
* to prevent sstables in the transient stage from being picked
up by regular compactions, a new interface class is introduced
so that the sstable is always added to the registration before
it is added to the sstable set, and removed from the registration after
it is removed from the sstable set. the struct helps to consolidate
the registration related logic in a single place, and helps to
make it more obvious that the timespan of an sstable in
the registration should cover that in the sstable set.
* use a different run_id for the gc sstable run, as it can
overlap with the output sstable run. the run_id for the
gc sstable run is created only when the gc sstable writer
is created, because the gc sstables are not always created
for all compactions.
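the second bullet can be sketched as follows (simplified, invented names; the real code deals with sstable writers rather than bare ids):

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

using run_id = uint64_t;

inline run_id new_run_id() {
    static std::atomic<run_id> next{1};
    return next.fetch_add(1);
}

class compaction_writers {
    run_id _output_run = new_run_id();
    std::optional<run_id> _gc_run;  // not every compaction produces GC sstables
public:
    run_id output_run() const { return _output_run; }
    // lazily create a run id distinct from the output run,
    // since the gc sstable run can overlap the output sstable run
    run_id gc_run() {
        if (!_gc_run) {
            _gc_run = new_run_id();
        }
        return *_gc_run;
    }
    bool gc_run_created() const { return _gc_run.has_value(); }
};
```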
please note, all (indirect) callers of
`compaction_task_executor::compact_sstables()` pass a non-empty
`std::function` to this function, so there is no need to check for
emptiness before calling it. so in this change, the check is dropped.
Fixes#14560
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14725
Since repair is performed on all nodes, each node can just repair the
primary ranges instead of all owned ranges. This avoids repairing
ranges more than once.
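Why primary ranges suffice when every node runs repair: each range has exactly one primary owner, so the per-node primary ranges partition the ring with no overlap. A toy model (tokens as integers, ownership by modulo; purely illustrative, not Scylla's ring logic):

```cpp
#include <set>

// node i is primary for tokens t with t % node_count == i
std::set<int> primary_tokens(int node, int node_count, int ring_size) {
    std::set<int> out;
    for (int t = 0; t < ring_size; ++t) {
        if (t % node_count == node) {
            out.insert(t);
        }
    }
    return out;
}
```

Summing the per-node primary ranges covers the full ring exactly once, which is why all-owned-ranges repair on every node did redundant work.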
Closes#14766
Commit f5e3b8df6d introduced an optimization for
as_mutation_source_excluding_staging() and added a test that
verifies correctness of single key and range reads based
on supplied predicates. This new test aims to improve the
coverage by testing directly both table::as_mutation_source()
and as_mutation_source_excluding_staging(), therefore
guaranteeing that both supply the correct predicate to
sstable set.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#14763
let's use RAII to remove the object used as a fixture, so we don't
leave objects behind in the bucket. a leftover object might interfere
with other tests sharing the same minio server, if the test that
created it fails to clean up because an exception is thrown.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
let's use RAII to tear down the client and the input file, so we can
always perform the cleanups even if the test throws.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Now that the local variable is always false, it can also be dropped,
and all the associated if-else branches can be simplified
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The set in question is always empty, so it can be removed and the only
check for its contents can be constified
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
`expression`'s default constructor is dangerous, as it can leak
into computations and generate surprising results. Fix that by
removing the default constructor.
This is made somewhat difficult by the parser generator's reliance
on default construction, and we need to expand the capabilities
of our workaround (`uninitialized<>`) to do so.
We also remove some incidental uses of default-constructed expressions.
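The idea in miniature (simplified, hypothetical stand-ins for `cql3::expr::expression` and the parser's `uninitialized<>` wrapper): delete the default constructor so an empty value can't silently leak into computations, and give the parser a thin optional-based holder for the slots it must default-construct.

```cpp
#include <optional>
#include <utility>

struct expression {
    int node;                  // stand-in for the real variant payload
    expression() = delete;     // no accidental empty expressions
    explicit expression(int n) : node(n) {}
};

// parser-only escape hatch: a default-constructible holder
template <typename T>
struct uninitialized {
    std::optional<T> _value;
    uninitialized() = default;  // what the parser generator needs
    uninitialized& operator=(T v) {
        _value = std::move(v);
        return *this;
    }
    bool has_value() const { return _value.has_value(); }
    T& get() { return *_value; }
};
```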
Closes#14706
* github.com:scylladb/scylladb:
cql3: expr: make expression non-default-constructible
cql3: grammar: don't default-construct expressions
cql3: grammar: improve uninitialized<> flexibility
cql3: grammar: adjust uninitialized<> wrapper
test: expr_test: don't invoke expression's default constructor
cql3: statement_restrictions: explicitly initialize expressions in index match code
cql3: statement_restrictions: explicitly initialize some expression fields
cql3: statement_restrictions: avoid expression's default constructor when classifying restrictions
cql3: expr: prepare_expression: avoid default-constructed expression
cql3: broadcast_tables: prepare new_value without relying on expression default constructor
This template call is only used by system keyspace paxos methods. All
those methods are no longer static and can use system_keyspace::_qp
reference to real query processor instead of global qctx. The
execute_cql_with_timeout() wrapper is moved to system_keyspace to make
it work
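The refactoring pattern, reduced to a sketch (all names below are simplified stand-ins): a static method that reached for a global context becomes an instance method using an injected dependency.

```cpp
#include <string>

// stand-in for the real query processor
struct query_processor {
    int executed = 0;
};

class system_keyspace {
    query_processor& _qp;  // direct dependency instead of the global qctx
public:
    explicit system_keyspace(query_processor& qp) : _qp(qp) {}
    // was: a static method calling qctx->execute_cql_with_timeout(...)
    void save_paxos_state(const std::string& /*key*/) {
        ++_qp.executed;
    }
};
```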
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The service::paxos_state methods that call those already have system
keyspace reference at hand and can call method on an object
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The paxos_state's .prepare(), .accept(), .learn() and .prune() methods
access system keyspace via its static methods. The only caller of those
(storage_proxy::remote) already has the sharded system k.s. reference
and can pass its .local() one as argument
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some test cases that use cql_test_env involve paxos state updates. Since
these updates now go via proxy->remote->system_keyspace, those test
cases need cql_test_env to initialize the remote part of the proxy too
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>