We add two test cases that test the new CDC generation publisher
to detect potential bugs like incorrect order of publications or
not publishing some generations at all.
The purpose of the second test case --
test_multiple_unpublished_cdc_generations -- is to enforce and test
a scenario when there are multiple unpublished CDC generations at
the same time. We expect that this is a rare case. The main fiber
of the topology coordinator would have to make much more progress
(like finishing two bootstraps) than the CDC generation publisher
fiber. Since multiple unpublished CDC generations might never
appear in other tests but could be handled incorrectly, having
such a test is valuable.
Currently, the topology coordinator has the
topology::transition_state::publish_cdc_generation state
responsible for publishing the already created CDC generations
to the user-facing description tables. This process cannot fail
as it would cause some CDC updates to be missed. On the other
hand, we would like to abort the publish_cdc_generation state when
bootstrap aborts. Of course, we could also wait until handling this
state finishes, even in the case of the bootstrap abort, but that
would be inefficient. We don't want to unnecessarily block topology
operations by publishing CDC generations.
The solution is to remove the publish_cdc_generation state
completely and introduce a new background fiber of the topology
coordinator -- cdc_generation_publisher -- that continually
publishes committed CDC generations.
The implementation of the CDC generation publisher is very similar
to the main fiber of the topology coordinator. One noticeable
difference is that we don't catch raft::commit_status_unknown,
which is handled raft_group0_client::add_entry.
Note that this modification changes the Raft-based topology a bit.
Previously, the publish_cdc_generation state had to end before
entering the next state -- write_both_read_old. Now, committed
CDC generations can theoretically be published at any time.
Although it is correct because the following states don't depend on
publish_cdc_generation, it can cause problems in tests. For example,
we can't assume now that a CDC generation is published just because
the bootstrap operation has finished.
We extend service::topology with the list of unpublished CDC
generations and load its contents from system.topology. This step
is the last one in making unpublished CDC generations accessible
to the topology coordinator.
Note that when we load unpublished_cdc_generations, we don't
perform any sanity checks contrary to current_cdc_generation_uuid.
Every unpublished CDC generation was a current generation once,
and we checked it at that moment.
We add committed CDC generations to unpublished_cdc_generations
so that we can load them to topology and properly handle them
in the following commits.
In the following commits, we replace the
topology::transition_state::publish_cdc_generation state with
a background fiber that continually publishes committed CDC
generations. To make these generations accessible to the
topology coordinator, we store them in the new column of
system.topology -- unpublished_cdc_generations.
If a test isn't going to use task manager or isn't interested in
statuses of finished tasks, then keeping them in the memory
for some time (currently 10s by default) after they are finished
is a memory waste.
Set default task_ttl value to zero. It can be changed by setting
--task-ttl-in-seconds or through rest api (/task_manager/ttl).
In conf/scylla.yaml set task-ttl-in-seconds to 10.
Closes#15239
Some tests use non-threaded do_with_cql_env() and wrap the inner lambda with seastar::async(). The cql env already provides a helper for that
Closes#15305
* github.com:scylladb/scylladb:
cql_query_test: Fix indentation after previous patch
cql_query_test: Use do_with_cql_env_thread() explicitly
This code was supposed to be moved into
`mutate_live_and_unreachable_endpoints`
in 2c27297dbd
but it looks like the original statements were left
in place outside the mutate function.
This patch just removes the stale code since the required
logic is already done inside `mutate_live_and_unreachable_endpoints`.
Fixesscylladb/scylladb#15296
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#15304
The Alternator tests can run against HTTPS - namely when using
test/alternator/run with the "--https" option (local Alternator
configured with HTTPS) or "--aws" option (DynamoDB, using HTTPS).
In some cases we make these HTTPS requests with verify=False, to avoid
checking the SSL certificates. E.g., this is necessary for Alternator
with a self-signed certificate. Unfortunately, the urllib3 library adds
an ugly warning message when SSL certificate verification is disabled.
In the past we tried to disable these warnings, using the documented
urllib3.disable_warnings() function, but it didn't help. It turns out
that pytest has its own warning handling, so to disable warnings in
pytest we must say so in a special configuration parameter in pytest.ini.
So in this patch, we drop the disable_warnings call from conftest.py
(where it didn't help), and instead put a similar declaration in
pytest.ini. The disable_warnings call in the test/alternator/run
script needs to remain - it is run outside pytest, so pytest.ini
doesn't affect it.
After this patch, running test/alternator/run with --https or --aws
finishes without warnings, as desired.
Fixes#15287
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#15292
Since 5d1f60439a we have
this node's host_id in topology config, so it can be used
to determine this node when adding it.
Prepare for extending the token_metadata interface
to provide host_id in update_topology.
We would like to compare the host_id first to be able to distinguish
this node from a node we're replacing that may have the same ip address
(but different host_id).
Closes#15297
* github.com:scylladb/scylladb:
locator: topology: is_configured_this_node: delete spurious semicolumn
locator: topology: is_configured_this_node: compare host_id first
Some tests use non-threaded do_with_cql_env() and wrap the inner lambda
with seastar::async(). The cql env already provides a helper for that
Indentation is deliberately left broken until next patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, when running into a zero chunk_len, scylla
crashes with `assert(chunk_size != 0)`. but we can do better than
printing a backtrace like:
```
scylla: sstables/compress.cc:158: void
sstables::compression::segmented_offsets::init(uint32_t): Assertion `chunk_size != 0' failed.
```
so, in this change, a `malformed_sstable_exception` is throw in place
of an `assert()`, which is supposed to verify the programming
invariants, not for identifying corrupted data file.
Fixes#15265
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#15264
Passing the gate_closed_exception to the task promise
ends up with abandoned exception since no-one is waiting
for it.
Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.
Fixes scylladb/scylladb#15211
In addition, this series adds a private abort_source for each task_manager module
(chained to the main task_manager::abort_source) and abort is requested on task_manager::module::stop().
gate holding in compaction_manager is hardened
and makes sure to stop compaction_manager and task_manager in sstable_compaction_test cases.
Closes#15213
* github.com:scylladb/scylladb:
compaction_manager: stop: close compaction_state:s gates
compaction_manager: gracefully handle gate close
task_manager: task: start: fixup indentation
task_manager: module: make_task: enter gate when the task is created
task_manaer: module: stop: request abort
task_manager: task::impl: subscribe to module about_source
test: compaction_manager_stop_and_drain_race_test: stop compaction and task managers
test: simple_backlog_controller_test: stop compaction and task managers
Since 5d1f60439a we have
this node's host_id in topology config, so it can be used
to determine this node when adding it.
Prepare for extending the token_metadata interface
to provide host_id in update_topology.
We would like to compare the host_id first to be able to distinguish
this node from a node we're replacing that may have the same ip address
(but different host_id).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Improved the coverage of the tests for the list_append() function
in UpdateExpression - test that if one of its arguments is not a list,
including a missing attribute or item, it is reported as an error as
expected.
The new tests pass on both Alternator and DynamoDB.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#15291
instead of flattening the functions into the script, let's structure them into functions. so they can be reused. and more maintainable this way.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#15242
* github.com:scylladb/scylladb:
build: early return when appropriate
build: extract generate_compdb() out
Right now, the function allows for passing the path to a file as a seastar::sstring,
which is then converted to std::filesystem::path -- implicitly to the caller.
However, the function performs I/O, and there is no reason to accept any other type
than std::filesystem::path, especially because the conversion is straightforward.
Callers can perform it on their own.
This commit introduces the more constrained API.
Closes#15266
This series cleans up gossiper.
Methods that do not change the gossiper object are marked as const.
Dead code is removed.
Closes#15272
* github.com:scylladb/scylladb:
gossiper: get_current* methods: mark as const
gossiper: get_generation_for_nodes: mark as const
gossiper: examine_gossiper: mark as const
gossiper: request_all, send_all: mark as const
gossiper: do_on_*notifications: mark as const
utils: atomic_vector: mark for_each functions as const
gossiper: compare_endpoint_startup: mark as const
gossiper: get_state_for_version_bigger_than: mark as const
gossiper: make_random_gossip_digest: delete dead legacy code
gossiper: make_random_gossip_digest: mark as const
gossiper: do_sort: mark as const
gossiper: is* methods: mark as const
gossiper: wait_for_gossip and friends: mark as const
gossiper: drop unused dump_endpoint_state_map
gossiper: remove unused shadow version members
"experimental" option was marked "Unused" in 64bc8d2f7d. but we
chose to keep it in hope that the upgrade test does not fail.
despite that the upgrade tests per-se survived the "upgrade",
after the upgrade, the tests exercising the experimental features
are still failing hard. they have not been updated to set the
"experimental-features" option, and are still relying on
"experimental" to enable all the experimental features under
test.
so, in this change, let's just drop the option so that
scylla can fail early at seeing this "experimental" option.
this should help us to identify the tests relying on it
quicker. as the "experimental" features should only be used
in development environment, this change should have no impact
to production.
Refs #15214
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#15233
Make sure the compaction_state:s are idle before
they are destroyed. Although all tasks are stopped
in stop_ongoing_compactions, make sure there is
fiber holding the compaction_state gate.
compaction_manager::remove now needs to close the
compaction_state gate and to stop_ongoing_compactions
only if the gate is not closed yet.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Check if the compaction_state gate is closed
along with _state != state::enabled and return early
in this case.
At this point entering the gate is guaranteed to succeed.
So enter the gate before calling `perform_compaction`
keeping the std::optional<gate_holder> throughout
the compaction task.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Passing the gate_closed_exception to the task promise in start()
ends up with abandoned exception since no-one is waiting
for it.
Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.
Fixesscylladb/scylladb#15211
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Have a private about_source for every module
and request abort on stop() to signal all outstanding
tasks to abort (especially when they are sleeping
for the task_ttl).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather to the top-level task_manager about_source,
to provide separation between task_manager modules
so each one can be aborted and stopped independentally
of the others (in the next patch).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The s.service since d42685d0cb is having on-board query processor ref^w
pointer and can use it to join cluster
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#15236
instead of flattening the functions into the script, let's structure
them into functions. so they can be reused. and more maintainable
this way.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Replacing `minimum_keyspace_rf` config option with 4 config options:
`{minimum,maximum}_replication_factor_{warn,fail}_threshold`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on RF when creating/altering a keyspace.
The reason to rather replace than extend `minimum_keyspace_rf` config
option is to be aligned with Cassandra, which did the same, and has the
same parameters' names.
Only min soft limit is enabled by default and it is set to 3, which means
that we'll generate a CQL warning whenever RF is set to either 1 or 2.
RF's value of 0 is always allowed and means that there will not be any
replicas on a given DC. This was agreed with PM.
Because we don't allow to change guardrails' values when scylla is
running (per PM), there're no tests provided with this PR, and dtests will be
provided separately.
Exceeding guardrails' thresholds will be tracked by metrics.
Resolves#8619
Refs #8892 (the RF part, not the replication-strategy part)
Closes#14262
We need to const_cast `this` since the const
container() has no const invoke_on override.
Trying to fix this in seastar sharded.hh breaks
many other call sites in scylla.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
They only need to access the _vec_lock rwlock
so mark it as mutable, but otherwise they provide a const
interface to the calls, as the called func receives
the entries by value and it cannot modify them.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>