Commit Graph

38666 Commits

Author SHA1 Message Date
Patryk Jędrzejczak
4ee68a47bb test: add test_cdc_generation_publishing
We add two test cases that test the new CDC generation publisher
to detect potential bugs like incorrect order of publications or
not publishing some generations at all.

The purpose of the second test case --
test_multiple_unpublished_cdc_generations -- is to enforce and test
a scenario when there are multiple unpublished CDC generations at
the same time. We expect that this is a rare case. The main fiber
of the topology coordinator would have to make much more progress
(like finishing two bootstraps) than the CDC generation publisher
fiber. Since multiple unpublished CDC generations might never
appear in other tests but could be handled incorrectly, having
such a test is valuable.
2023-09-08 09:05:01 +02:00
Patryk Jędrzejczak
2643ccc70e docs: remove information about publish_cdc_generation
We update documentation after replacing the
topology::transition_state::publish_cdc_generation state with
the CDC generation publisher fiber.
2023-09-08 09:05:01 +02:00
Patryk Jędrzejczak
fc1ee2cc14 raft topology: introduce the CDC generation publisher
Currently, the topology coordinator has the
topology::transition_state::publish_cdc_generation state
responsible for publishing the already created CDC generations
to the user-facing description tables. This process cannot fail
as it would cause some CDC updates to be missed. On the other
hand, we would like to abort the publish_cdc_generation state when
bootstrap aborts. Of course, we could also wait until handling this
state finishes, even in the case of the bootstrap abort, but that
would be inefficient. We don't want to unnecessarily block topology
operations by publishing CDC generations.

The solution is to remove the publish_cdc_generation state
completely and introduce a new background fiber of the topology
coordinator -- cdc_generation_publisher -- that continually
publishes committed CDC generations.

The implementation of the CDC generation publisher is very similar
to the main fiber of the topology coordinator. One noticeable
difference is that we don't catch raft::commit_status_unknown,
which is handled in raft_group0_client::add_entry.

Note that this modification changes the Raft-based topology a bit.
Previously, the publish_cdc_generation state had to end before
entering the next state -- write_both_read_old. Now, committed
CDC generations can theoretically be published at any time.
Although it is correct because the following states don't depend on
publish_cdc_generation, it can cause problems in tests. For example,
we can't assume now that a CDC generation is published just because
the bootstrap operation has finished.
2023-09-08 09:05:01 +02:00
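
The idea behind fc1ee2cc14 can be illustrated with a small asyncio model. This is only a sketch under stated assumptions, not the Scylla implementation: `Topology`, `commit_generation`, `stop` and the `gen-N` identifiers are hypothetical names, and `asyncio.sleep(0)` stands in for the write to the description tables. It shows a background publisher draining a list of unpublished generations in commit order, including the case where the main fiber commits several generations before the publisher gets to run.

```python
import asyncio
from collections import deque


class Topology:
    """Toy stand-in for the coordinator's view of system.topology (hypothetical)."""

    def __init__(self):
        self.unpublished_cdc_generations = deque()
        self.published = []
        self.stopping = False
        self.event = asyncio.Event()

    def commit_generation(self, gen_id):
        # The main fiber marks every committed generation as unpublished.
        self.unpublished_cdc_generations.append(gen_id)
        self.event.set()

    def stop(self):
        self.stopping = True
        self.event.set()


async def cdc_generation_publisher(topo):
    """Background fiber: publish unpublished generations, oldest first."""
    while True:
        while topo.unpublished_cdc_generations:
            gen_id = topo.unpublished_cdc_generations[0]
            await asyncio.sleep(0)  # stands in for the actual publication
            topo.published.append(gen_id)
            # Remove the generation only after it has been published.
            topo.unpublished_cdc_generations.popleft()
        if topo.stopping:
            return
        topo.event.clear()
        await topo.event.wait()


async def main():
    topo = Topology()
    publisher = asyncio.create_task(cdc_generation_publisher(topo))
    # The main fiber gets ahead: two generations are pending at once.
    topo.commit_generation("gen-1")
    topo.commit_generation("gen-2")
    await asyncio.sleep(0)
    topo.commit_generation("gen-3")
    topo.stop()
    await publisher
    return topo.published
```

Because the publisher runs independently of `main`, nothing in `main` can assume a generation is already published when a later step completes. This mirrors the caveat about tests in the commit message.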
Patryk Jędrzejczak
d404443b54 system_keyspace: load unpublished_cdc_generations to topology
We extend service::topology with the list of unpublished CDC
generations and load its contents from system.topology. This step
is the last one in making unpublished CDC generations accessible
to the topology coordinator.

Note that when we load unpublished_cdc_generations, we don't
perform any sanity checks contrary to current_cdc_generation_uuid.
Every unpublished CDC generation was a current generation once,
and we checked it at that moment.
2023-09-08 09:05:01 +02:00
Patryk Jędrzejczak
bc726a066f raft topology: mark committed CDC generations as unpublished
We add committed CDC generations to unpublished_cdc_generations
so that we can load them to topology and properly handle them
in the following commits.
2023-09-08 09:05:01 +02:00
Patryk Jędrzejczak
5ed9d4db6d raft topology: add unpublished_cdc_generations to system.topology
In the following commits, we replace the
topology::transition_state::publish_cdc_generation state with
a background fiber that continually publishes committed CDC
generations. To make these generations accessible to the
topology coordinator, we store them in the new column of
system.topology -- unpublished_cdc_generations.
2023-09-08 09:05:01 +02:00
Aleksandra Martyniuk
8a65477202 tasks: db: change default task_ttl value
If a test isn't going to use task manager or isn't interested in
statuses of finished tasks, then keeping them in the memory
for some time (currently 10s by default) after they are finished
is a memory waste.

Set default task_ttl value to zero. It can be changed by setting
--task-ttl-in-seconds or through the REST API (/task_manager/ttl).

In conf/scylla.yaml set task-ttl-in-seconds to 10.

Closes #15239
2023-09-07 12:42:29 +03:00
Nadav Har'El
42e26ab13b Merge 'Explicitly use do_with_cql_env_thread in query test' from Pavel Emelyanov
Some tests use non-threaded do_with_cql_env() and wrap the inner lambda with seastar::async(). The cql env already provides a helper for that.

Closes #15305

* github.com:scylladb/scylladb:
  cql_query_test: Fix indentation after previous patch
  cql_query_test: Use do_with_cql_env_thread() explicitly
2023-09-07 11:54:54 +03:00
Benny Halevy
c5e4dace8e gossiper: real_mark_alive: do not erase from unreachable_endpoints without holding lock
This code was supposed to be moved into
`mutate_live_and_unreachable_endpoints`
in 2c27297dbd
but it looks like the original statements were left
in place outside the mutate function.

This patch just removes the stale code since the required
logic is already done inside `mutate_live_and_unreachable_endpoints`.

Fixes scylladb/scylladb#15296

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #15304
2023-09-07 10:02:49 +02:00
Nadav Har'El
c52e0fd333 test/alternator: avoid warnings about unverified HTTPS
The Alternator tests can run against HTTPS - namely when using
test/alternator/run with the "--https" option (local Alternator
configured with HTTPS) or "--aws" option (DynamoDB, using HTTPS).

In some cases we make these HTTPS requests with verify=False, to avoid
checking the SSL certificates. E.g., this is necessary for Alternator
with a self-signed certificate. Unfortunately, the urllib3 library adds
an ugly warning message when SSL certificate verification is disabled.

In the past we tried to disable these warnings, using the documented
urllib3.disable_warnings() function, but it didn't help. It turns out
that pytest has its own warning handling, so to disable warnings in
pytest we must say so in a special configuration parameter in pytest.ini.

So in this patch, we drop the disable_warnings call from conftest.py
(where it didn't help), and instead put a similar declaration in
pytest.ini. The disable_warnings call in the test/alternator/run
script needs to remain - it is run outside pytest, so pytest.ini
doesn't affect it.

After this patch, running test/alternator/run with --https or --aws
finishes without warnings, as desired.

Fixes #15287

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15292
2023-09-07 07:23:57 +03:00
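
The effect of a category-based warning filter, as described in c52e0fd333, can be sketched with Python's standard `warnings` module. This is an illustration only: `InsecureRequestWarning` below is a local stand-in for the urllib3 class of the same name, `unverified_request` is a hypothetical helper, and the code does not model pytest's own warning handling. In pytest.ini the analogous declaration is presumably a `filterwarnings` entry.

```python
import warnings


class InsecureRequestWarning(Warning):
    """Local stand-in for urllib3.exceptions.InsecureRequestWarning."""


def unverified_request():
    # Hypothetical helper: mimics a library warning about verify=False.
    warnings.warn("Unverified HTTPS request is being made", InsecureRequestWarning)
    return "ok"


def captured_warnings(ignore_insecure):
    """Return the names of warning categories that reach the recorder."""
    with warnings.catch_warnings(record=True) as captured:
        warnings.simplefilter("always")
        if ignore_insecure:
            # Filter by category, as an ini-level filter would.
            warnings.filterwarnings("ignore", category=InsecureRequestWarning)
        unverified_request()
    return [w.category.__name__ for w in captured]
```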
Tomasz Grabiec
dd57c53328 Merge 'Topology: use this host_id in is_configured_this_node' from Benny Halevy
Since 5d1f60439a we have
this node's host_id in topology config, so it can be used
to determine this node when adding it.

Prepare for extending the token_metadata interface
to provide host_id in update_topology.

We would like to compare the host_id first to be able to distinguish
this node from a node we're replacing that may have the same ip address
(but different host_id).

Closes #15297

* github.com:scylladb/scylladb:
  locator: topology: is_configured_this_node: delete spurious semicolumn
  locator: topology: is_configured_this_node: compare host_id first
2023-09-06 22:13:29 +02:00
Pavel Emelyanov
9da4668c71 cql_query_test: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-06 16:54:25 +03:00
Pavel Emelyanov
84e30ab56c cql_query_test: Use do_with_cql_env_thread() explicitly
Some tests use non-threaded do_with_cql_env() and wrap the inner lambda
with seastar::async(). The cql env already provides a helper for that

Indentation is deliberately left broken until next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-06 16:54:14 +03:00
Kefu Chai
1ed894170c sstables: throw at seeing invalid chunk_len
before this change, when running into a zero chunk_len, scylla
crashes with `assert(chunk_size != 0)`. but we can do better than
printing a backtrace like:
```
scylla: sstables/compress.cc:158: void
sstables::compression::segmented_offsets::init(uint32_t): Assertion `chunk_size != 0' failed.
```
so, in this change, a `malformed_sstable_exception` is thrown in place
of an `assert()`, which is supposed to verify programming
invariants, not to identify a corrupted data file.

Fixes #15265
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15264
2023-09-06 14:20:38 +03:00
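
The distinction drawn in 1ed894170c, between assertions for programming invariants and exceptions for bad input data, can be sketched as follows. The names `MalformedSSTableError` and `chunk_count` are hypothetical and only loosely mirror the C++ code. The point is that a zero chunk length read from a file is reported as a data error the caller can handle.

```python
class MalformedSSTableError(Exception):
    """Hypothetical analogue of malformed_sstable_exception."""


def chunk_count(data_len: int, chunk_len: int) -> int:
    """Number of chunks needed to cover data_len bytes."""
    if chunk_len == 0:
        # The value comes from the file, so a zero means corrupted
        # data, not a programming error: raise instead of asserting.
        raise MalformedSSTableError("invalid chunk_len: 0")
    return -(-data_len // chunk_len)  # ceiling division
```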
Nadav Har'El
5930637ad8 Merge 'task_manager: module: make_task: enter gate when the task is created' from Benny Halevy
Passing the gate_closed_exception to the task promise
ends up with an abandoned exception since no one is waiting
for it.

Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.

Fixes scylladb/scylladb#15211

In addition, this series adds a private abort_source for each task_manager module
(chained to the main task_manager::abort_source) and abort is requested on task_manager::module::stop().

Gate holding in compaction_manager is hardened, and the sstable_compaction_test cases
now make sure to stop compaction_manager and task_manager.

Closes #15213

* github.com:scylladb/scylladb:
  compaction_manager: stop: close compaction_state:s gates
  compaction_manager: gracefully handle gate close
  task_manager: task: start: fixup indentation
  task_manager: module: make_task: enter gate when the task is created
  task_manaer: module: stop: request abort
  task_manager: task::impl: subscribe to module about_source
  test: compaction_manager_stop_and_drain_race_test: stop compaction and task managers
  test: simple_backlog_controller_test: stop compaction and task managers
2023-09-06 13:29:26 +03:00
Benny Halevy
574c7e349a locator: topology: is_configured_this_node: delete spurious semicolumn
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-06 12:24:09 +03:00
Benny Halevy
115462be17 locator: topology: is_configured_this_node: compare host_id first
Since 5d1f60439a we have
this node's host_id in topology config, so it can be used
to determine this node when adding it.

Prepare for extending the token_metadata interface
to provide host_id in update_topology.

We would like to compare the host_id first to be able to distinguish
this node from a node we're replacing that may have the same ip address
(but different host_id).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-06 12:24:09 +03:00
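
The reasoning in 115462be17 can be illustrated with a short sketch. The `Node` type and `is_this_node` function are hypothetical, and falling back to the IP address when a host_id is unknown is an assumption made for the example. The sketch shows why comparing host_id first distinguishes this node from a node being replaced that has the same IP address.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Node:
    host_id: Optional[UUID]
    endpoint: str


def is_this_node(candidate: Node, this_node: Node) -> bool:
    # Compare host_id first: it stays unique even when a replacing
    # node reuses the IP address of the node it replaces.
    if candidate.host_id is not None and this_node.host_id is not None:
        return candidate.host_id == this_node.host_id
    # Assumed fallback for the example: compare addresses when a
    # host_id is not known.
    return candidate.endpoint == this_node.endpoint
```

With `me = Node(uuid4(), "10.0.0.1")`, a replaced node `Node(uuid4(), "10.0.0.1")` is not mistaken for this node, even though the addresses match.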
Nadav Har'El
cfc70810d3 test/alternator: more error-path tests for list_append() function
Improved the coverage of the tests for the list_append() function
in UpdateExpression - test that if one of its arguments is not a list,
including a missing attribute or item, it is reported as an error as
expected.

The new tests pass on both Alternator and DynamoDB.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15291
2023-09-06 11:59:54 +03:00
Avi Kivity
f594175042 Merge 'build: extract generate_compdb() out' from Kefu Chai
instead of flattening the logic into the script, let's structure it into functions, so they can be reused and are easier to maintain.

Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15242

* github.com:scylladb/scylladb:
  build: early return when appropriate
  build: extract generate_compdb() out
2023-09-05 20:54:06 +03:00
Dawid Medrek
c7fe5d7f94 utils/lister: Limit the API of scan_dir() to fs::path
Right now, the function allows for passing the path to a file as a seastar::sstring,
which is then converted to std::filesystem::path -- implicitly to the caller.
However, the function performs I/O, and there is no reason to accept any other type
than std::filesystem::path, especially because the conversion is straightforward.
Callers can perform it on their own.

This commit introduces the more constrained API.

Closes #15266
2023-09-05 20:50:42 +03:00
Nadav Har'El
1cbe60a7e3 Update seastar submodule
* seastar 6e80e84a...576ee47d (9):
  > http/client: Add "total new connections" metrics
  > semaphore: initialize wait_list in move ctor

Fixes #15253
Fixes #15263

  > tutorial: Add a missing argument in code example
  > sstring: format sstring without implicitly conversion
  > coroutine: Add a necessary include in generator.hh
  > tls: Move server name into tls_options
  > net/arp|ip: fix unused param warning in forward virtual method
  > net/ethernet: fix unused param ethernet_address::adjust_endianness
  > tls: Optionally skip client EOF wait

Closes #15273
2023-09-05 17:07:08 +03:00
Pavel Emelyanov
1ef4ba196b Merge 'Gossiper: mark const methods and remove dead code' from Benny Halevy
This series cleans up gossiper.
Methods that do not change the gossiper object are marked as const.
Dead code is removed.

Closes #15272

* github.com:scylladb/scylladb:
  gossiper: get_current* methods: mark as const
  gossiper: get_generation_for_nodes: mark as const
  gossiper: examine_gossiper: mark as const
  gossiper: request_all, send_all: mark as const
  gossiper: do_on_*notifications: mark as const
  utils: atomic_vector: mark for_each functions as const
  gossiper: compare_endpoint_startup: mark as const
  gossiper: get_state_for_version_bigger_than: mark as const
  gossiper: make_random_gossip_digest: delete dead legacy code
  gossiper: make_random_gossip_digest: mark as const
  gossiper: do_sort: mark as const
  gossiper: is* methods: mark as const
  gossiper: wait_for_gossip and friends: mark as const
  gossiper: drop unused dump_endpoint_state_map
  gossiper: remove unused shadow version members
2023-09-05 13:47:29 +03:00
Kefu Chai
f6cca741ea config: remove "experimental" option
"experimental" option was marked "Unused" in 64bc8d2f7d. but we
chose to keep it in hope that the upgrade test does not fail.
despite that the upgrade tests per-se survived the "upgrade",
after the upgrade, the tests exercising the experimental features
are still failing hard. they have not been updated to set the
"experimental-features" option, and are still relying on
"experimental" to enable all the experimental features under
test.

so, in this change, let's just drop the option so that
scylla can fail early at seeing this "experimental" option.
this should help us to identify the tests relying on it
quicker. as the "experimental" features should only be used
in development environments, this change should have no impact
on production.

Refs #15214
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15233
2023-09-05 10:09:04 +03:00
Benny Halevy
cfecb68245 compaction_manager: stop: close compaction_state:s gates
Make sure the compaction_state:s are idle before
they are destroyed. Although all tasks are stopped
in stop_ongoing_compactions, make sure there is
no fiber holding the compaction_state gate.

compaction_manager::remove now needs to close the
compaction_state gate and to stop_ongoing_compactions
only if the gate is not closed yet.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
96055414c7 compaction_manager: gracefully handle gate close
Check if the compaction_state gate is closed
along with _state != state::enabled and return early
in this case.

At this point entering the gate is guaranteed to succeed.
So enter the gate before calling `perform_compaction`
keeping the std::optional<gate_holder> throughout
the compaction task.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
a5b7f1a275 task_manager: task: start: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
f9a7635390 task_manager: module: make_task: enter gate when the task is created
Passing the gate_closed_exception to the task promise in start()
ends up with an abandoned exception since no one is waiting
for it.

Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.

Fixes scylladb/scylladb#15211

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
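
The change in f9a7635390 can be modeled with a toy gate. This is a simplified sketch, not Seastar's gate: `Gate`, `Task`, `Module` and `GateClosedError` are hypothetical names, and `close()` here does not wait for holders to drain. It shows that entering the gate inside `make_task` makes a closed gate fail the call itself, so the error reaches a caller that can handle it.

```python
class GateClosedError(Exception):
    """Toy counterpart of gate_closed_exception."""


class Gate:
    def __init__(self):
        self.closed = False
        self.holders = 0

    def enter(self):
        if self.closed:
            raise GateClosedError("gate closed")
        self.holders += 1

    def leave(self):
        self.holders -= 1

    def close(self):
        # A real gate would also wait for holders to leave; omitted.
        self.closed = True


class Task:
    def __init__(self, name, gate):
        self.name = name
        self._gate = gate

    def finish(self):
        self._gate.leave()


class Module:
    def __init__(self):
        self.gate = Gate()

    def make_task(self, name):
        # Enter the gate at creation time: if the module is already
        # stopping, the caller sees the error right here.
        self.gate.enter()
        return Task(name, self.gate)
```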
Benny Halevy
51792d2292 task_manaer: module: stop: request abort
Have a private abort_source for every module
and request abort on stop() to signal all outstanding
tasks to abort (especially when they are sleeping
for the task_ttl).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
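
Why an abort request matters for tasks sleeping for the task_ttl (51792d2292) can be sketched with asyncio. This is an illustration under assumptions: `keep_finished_task` and `demo` are hypothetical names, and an `asyncio.Event` stands in for the module's abort source. The sleeper wakes as soon as abort is requested instead of waiting out the whole TTL.

```python
import asyncio


async def keep_finished_task(ttl_seconds, abort):
    """Keep a finished task around for ttl_seconds unless aborted."""
    try:
        # Sleep for the TTL, but wake up early if abort is requested.
        await asyncio.wait_for(abort.wait(), timeout=ttl_seconds)
        return "aborted"
    except asyncio.TimeoutError:
        return "expired"


async def demo(ttl_seconds, request_abort):
    abort = asyncio.Event()  # stand-in for the module's abort source
    sleeper = asyncio.create_task(keep_finished_task(ttl_seconds, abort))
    await asyncio.sleep(0)  # let the sleeper start waiting
    if request_abort:
        abort.set()  # what stop() would do for every outstanding task
    return await sleeper
```

Calling `asyncio.run(demo(10, True))` returns without waiting ten seconds, which is the behavior stop() relies on.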
Benny Halevy
d7205db863 task_manager: task::impl: subscribe to module about_source
Rather than to the top-level task_manager abort_source,
to provide separation between task_manager modules
so each one can be aborted and stopped independently
of the others (in the next patch).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
062684eb1f test: compaction_manager_stop_and_drain_race_test: stop compaction and task managers
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
b9127f55ac test: simple_backlog_controller_test: stop compaction and task managers
The compaction_manager and task_manager should
be orderly stopped before they are destroyed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Pavel Emelyanov
13a0c29618 storage_service: Remove query processor arg from join_cluster()
The s.service since d42685d0cb is having on-board query processor ref^w
pointer and can use it to join cluster

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #15236
2023-09-05 07:30:37 +03:00
Kefu Chai
ea91342d4b build: early return when appropriate
less indentation for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-05 12:14:02 +08:00
Kefu Chai
ce5f7d36cd build: extract generate_compdb() out
instead of flattening the logic into the script, let's structure
it into functions, so they can be reused and are easier to
maintain.

Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-05 12:14:02 +08:00
Piotr Smaroń
eb46f1bd17 guardrails: restrict replication factor (RF)
Replacing `minimum_keyspace_rf` config option with 4 config options:
`{minimum,maximum}_replication_factor_{warn,fail}_threshold`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on RF when creating/altering a keyspace.
The reason to rather replace than extend `minimum_keyspace_rf` config
option is to be aligned with Cassandra, which did the same, and has the
same parameters' names.
Only min soft limit is enabled by default and it is set to 3, which means
that we'll generate a CQL warning whenever RF is set to either 1 or 2.
RF's value of 0 is always allowed and means that there will not be any
replicas on a given DC. This was agreed with PM.
Because we don't allow changing guardrails' values while scylla is
running (per PM), no tests are provided with this PR, and dtests will be
provided separately.
Exceeding guardrails' thresholds will be tracked by metrics.

Resolves #8619
Refs #8892 (the RF part, not the replication-strategy part)

Closes #14262
2023-09-04 19:22:17 +03:00
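
The soft and hard limits described in eb46f1bd17 can be sketched as a small check. The option names and the default of 3 for the minimum warn threshold come from the commit message. The `DISABLED = -1` sentinel, the `RfGuardrails` type and the `check_rf` function are assumptions made for the example, not Scylla's implementation.

```python
from dataclasses import dataclass
from typing import Optional

DISABLED = -1  # assumed sentinel for a threshold that is turned off


@dataclass(frozen=True)
class RfGuardrails:
    minimum_replication_factor_warn_threshold: int = 3
    minimum_replication_factor_fail_threshold: int = DISABLED
    maximum_replication_factor_warn_threshold: int = DISABLED
    maximum_replication_factor_fail_threshold: int = DISABLED


def check_rf(rf: int, g: Optional[RfGuardrails] = None) -> str:
    """Return "ok", "warn" (soft limit) or "fail" (hard limit)."""
    g = g or RfGuardrails()
    if rf == 0:
        return "ok"  # RF 0 is always allowed: no replicas in that DC
    min_fail = g.minimum_replication_factor_fail_threshold
    max_fail = g.maximum_replication_factor_fail_threshold
    min_warn = g.minimum_replication_factor_warn_threshold
    max_warn = g.maximum_replication_factor_warn_threshold
    # Hard limits take precedence over soft limits.
    if min_fail != DISABLED and rf < min_fail:
        return "fail"
    if max_fail != DISABLED and rf > max_fail:
        return "fail"
    if min_warn != DISABLED and rf < min_warn:
        return "warn"
    if max_warn != DISABLED and rf > max_warn:
        return "warn"
    return "ok"
```

With the defaults, RF values of 1 and 2 produce a warning, while 0 and 3 pass, matching the behavior stated in the commit message.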
Benny Halevy
04ba560b8d gossiper: get_current* methods: mark as const
We need to const_cast `this` since the const
container() has no const invoke_on override.
Trying to fix this in seastar sharded.hh breaks
many other call sites in scylla.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:18:04 +03:00
Benny Halevy
43d883c5aa gossiper: get_generation_for_nodes: mark as const
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:17:38 +03:00
Benny Halevy
cfe0ec2203 gossiper: examine_gossiper: mark as const
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:17:25 +03:00
Benny Halevy
ce05bbe32f gossiper: request_all, send_all: mark as const
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:16:19 +03:00
Benny Halevy
cc1d5771e5 gossiper: do_on_*notifications: mark as const
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:16:10 +03:00
Benny Halevy
eb51b70e6d utils: atomic_vector: mark for_each functions as const
They only need to access the _vec_lock rwlock
so mark it as mutable, but otherwise they provide a const
interface to the calls, as the called func receives
the entries by value and it cannot modify them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:14:38 +03:00
Benny Halevy
963d6fb009 gossiper: compare_endpoint_startup: mark as const
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:14:22 +03:00
Benny Halevy
2899e07572 gossiper: get_state_for_version_bigger_than: mark as const
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:13:02 +03:00
Benny Halevy
87ac1a26f2 gossiper: make_random_gossip_digest: delete dead legacy code
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:12:51 +03:00
Benny Halevy
33f004587e gossiper: make_random_gossip_digest: mark as const
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:12:43 +03:00
Benny Halevy
02e8fdc4b8 gossiper: do_sort: mark as const
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:11:56 +03:00
Benny Halevy
482963b2c4 gossiper: is* methods: mark as const
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:11:00 +03:00
Benny Halevy
f7eddf0322 gossiper: wait_for_gossip and friends: mark as const
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:09:15 +03:00
Benny Halevy
044a696aca gossiper: drop unused dump_endpoint_state_map
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:09:04 +03:00
Benny Halevy
083506d479 gossiper: remove unused shadow version members
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-04 16:08:25 +03:00