Commit Graph

5519 Commits

Author SHA1 Message Date
Botond Dénes
b062b245ad Merge 'Don't cache dc:rack on system keyspace local cache' from Pavel Emelyanov
The local node's dc:rack pair is cached on system keyspace on start. However, most of other code don't need it as they get dc:rack from topology or directly from snitch. There are few places left that still mess with sysks cache, but they are easy to patch. So after this patch all the core code uses two sources of dc:rack -- topology / snitch -- instead of three.

Closes #15280

* github.com:scylladb/scylladb:
  system_keyspace: Don't require snitch argument on start
  system_keyspace: Don't cache local dc:rack pair
  system_keyspace: Save local info with explicit location
  storage_service: Get endpoint location from snitch, not system keyspace
  snitch: Introduce and use get_location() method
  repair: Local location variables instead of system keyspace's one
  repair: Use full endpoint location instead of datacenter part
2023-09-11 10:26:26 +03:00
Nadav Har'El
ea56c8efcd test/alternator: reduce code duplication in test for list_append()
A reviewer noted that test_update_expression_list_append_non_list_arguments
has too much code duplication - the same long API call to run
"SET a = list_append(...)" was repeated many times.

So in this patch we add a short inner function "try_list_append" to
avoid this duplication.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes: #15298
2023-09-11 10:09:35 +03:00
Botond Dénes
7385f93816 Merge 'Task manager repair tasks progress' from Aleksandra Martyniuk
Find progress of repair tasks based on the number of ranges
that have been repaired.

Fixes: [#1156](https://github.com/scylladb/scylla-enterprise/issues/1156).

Closes #14698

* github.com:scylladb/scylladb:
  test: repair tasks test
  repair: add methods making repair progress more precise
  tasks: make progress related methods virtual
  repair: add get_progress method to shard_repair_task_impl
  repair: add const noexcept qualifiers to shard_repair_task_impl::ranges_size()
  repair: log a name of a particular table repair is working on
  tasks: delete move and copy constructors from task_manager::task::impl
2023-09-11 09:32:23 +03:00
Kamil Braun
26d9a82636 Merge 'raft topology: replace publish_cdc_generation with a bg fiber' from Patryk Jędrzejczak
Currently, the topology coordinator has the
`topology::transition_state::publish_cdc_generation` state responsible
for publishing the already created CDC generations to the user-facing
description tables. This process cannot fail as it would cause some CDC
updates to be missed. On the other hand, we would like to abort the
`publish_cdc_generation` state when bootstrap aborts. Of course, we
could also wait until handling this state finishes, even in the case of
the bootstrap abort, but that would be inefficient. We don't want to
unnecessarily block topology operations by publishing CDC generations.

The solution proposed by this PR is to remove the
`publish_cdc_generation` state completely and introduce a new background
fiber of the topology coordinator -- `cdc_generation_publisher` -- that
continually publishes committed CDC generations.

Apart from introducing the CDC generation publisher, we add
`test_cdc_generation_publishing.py` that verifies its correctness and we
adapt other CDC tests to the new changes.

Fixes #15194

Closes #15281

* github.com:scylladb/scylladb:
  test: test_cdc: introduce wait_for_first_cdc_generation
  test: move cdc_streams_check_and_repair check
  test: add test_cdc_generation_publishing
  docs: remove information about publish_cdc_generation
  raft topology: introduce the CDC generation publisher
  system_keyspace: load unpublished_cdc_generations to topology
  raft topology: mark committed CDC generations as unpublished
  raft topology: add unpublished_cdc_generations to system.topology
2023-09-08 15:08:41 +02:00
Kamil Braun
8bff5843b5 Merge 'test: topology: add tests for gossiper/endpoint/live and gossiper/endpoint/down' from Aleksandra Martyniuk
Add tests for gossiper/endpoint/live and gossiper/endpoint/down
which run only in release mode.

Enable test_remove_node_with_concurrent_ddl and fix types and
variables names used by it, so that they can be reused in gossiper
test.

Fixes: #15223.

Closes #15244

* github.com:scylladb/scylladb:
  test: topology: add gossiper test
  test: fix types and variable names in wait_for_host_down
2023-09-08 12:43:11 +02:00
Patryk Jędrzejczak
23a4557662 test: test_cdc: introduce wait_for_first_cdc_generation
After introducing the CDC generation publisher,
test_cdc_log_entries_use_cdc_streams could (at least in theory)
fail by accessing system_distributed.cdc_streams_descriptions_v2
before the first CDC generation has been published.

To avoid flakiness, we simply wait until the first CDC generation
is published in a new function -- wait_for_first_cdc_generation.
2023-09-08 09:05:01 +02:00
Patryk Jędrzejczak
3a2c080cbe test: move cdc_streams_check_and_repair check
The part of test_topology_ops that tests the
cdc_streams_check_and_repair request could (at least in theory)
fail on
`assert(len(gen_timestamps) + 1 == len(new_gen_timestamps))`
after introducing the CDC generation publisher because we can
no longer assume that all previously committed CDC generations
have been published before sending the request.

To prevent flakiness, we move this part of the test to
test_cdc_generations_are_published. This test allows for ensuring
that all previous CDC generations have been published.
Additionally, checking cdc_streams_check_and_repair there is
simpler and arguably fits the test better.
2023-09-08 09:05:01 +02:00
Patryk Jędrzejczak
4ee68a47bb test: add test_cdc_generation_publishing
We add two test cases that test the new CDC generation publisher
to detect potential bugs like incorrect order of publications or
not publishing some generations at all.

The purpose of the second test case --
test_multiple_unpublished_cdc_generations -- is to enforce and test
a scenario when there are multiple unpublished CDC generations at
the same time. We expect that this is a rare case. The main fiber
of the topology coordinator would have to make much more progress
(like finishing two bootstraps) than the CDC generation publisher
fiber. Since multiple unpublished CDC generations might never
appear in other tests but could be handled incorrectly, having
such a test is valuable.
2023-09-08 09:05:01 +02:00
Nadav Har'El
42e26ab13b Merge 'Explicitly use do_with_cql_env_thread in query test' from Pavel Emelyanov
Some tests use non-threaded do_with_cql_env() and wrap the inner lambda with seastar::async(). The cql env already provides a helper for that

Closes #15305

* github.com:scylladb/scylladb:
  cql_query_test: Fix indentation after previous patch
  cql_query_test: Use do_with_cql_env_thread() explicitly
2023-09-07 11:54:54 +03:00
Nadav Har'El
c52e0fd333 test/alternator: avoid warnings about unverified HTTPS
The Alternator tests can run against HTTPS - namely when using
test/alternator/run with the "--https" option (local Alternator
configured with HTTPS) or "--aws" option (DynamoDB, using HTTPS).

In some cases we make these HTTPS requests with verify=False, to avoid
checking the SSL certificates. E.g., this is necessary for Alternator
with a self-signed certificate. Unfortunately, the urllib3 library adds
an ugly warning message when SSL certificate verification is disabled.

In the past we tried to disable these warnings, using the documented
urllib3.disable_warnings() function, but it didn't help. It turns out
that pytest has its own warning handling, so to disable warnings in
pytest we must say so in a special configuration parameter in pytest.ini.

So in this patch, we drop the disable_warnings call from conftest.py
(where it didn't help), and instead put a similar declaration in
pytest.ini. The disable_warnings call in the test/alternator/run
script needs to remain - it is run outside pytest, so pytest.ini
doesn't affect it.

After this patch, running test/alternator/run with --https or --aws
finishes without warnings, as desired.

Fixes #15287

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15292
2023-09-07 07:23:57 +03:00
Pavel Emelyanov
9da4668c71 cql_query_test: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-06 16:54:25 +03:00
Pavel Emelyanov
84e30ab56c cql_query_test: Use do_with_cql_env_thread() explicitly
Some tests use non-threaded do_with_cql_env() and wrap the inner lambda
with seastar::async(). The cql env already provides a helper for that

Indentation is deliberately left broken until next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-06 16:54:14 +03:00
Nadav Har'El
5930637ad8 Merge 'task_manager: module: make_task: enter gate when the task is created' from Benny Halevy
Passing the gate_closed_exception to the task promise
ends up with abandoned exception since no-one is waiting
for it.

Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.

Fixes scylladb/scylladb#15211

In addition, this series adds a private abort_source for each task_manager module
(chained to the main task_manager::abort_source) and abort is requested on task_manager::module::stop().

gate holding in compaction_manager is hardened
and makes sure to stop compaction_manager and task_manager in sstable_compaction_test cases.

Closes #15213

* github.com:scylladb/scylladb:
  compaction_manager: stop: close compaction_state:s gates
  compaction_manager: gracefully handle gate close
  task_manager: task: start: fixup indentation
  task_manager: module: make_task: enter gate when the task is created
  task_manaer: module: stop: request abort
  task_manager: task::impl: subscribe to module about_source
  test: compaction_manager_stop_and_drain_race_test: stop compaction and task managers
  test: simple_backlog_controller_test: stop compaction and task managers
2023-09-06 13:29:26 +03:00
Nadav Har'El
cfc70810d3 test/alternator: more error-path tests for list_append() function
Improved the coverage of the tests for the list_append() function
in UpdateExpression - test that if one of its arguments is not a list,
including a missing attribute or item, it is reported as an error as
expected.

The new tests pass on both Alternator and DynamoDB.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15291
2023-09-06 11:59:54 +03:00
Aleksandra Martyniuk
c96224e97d test: topology: add gossiper test
Add tests for gossiper/endpoint/live and gossiper/endpoint/down
which run only in release mode.
2023-09-05 15:04:26 +02:00
Aleksandra Martyniuk
ede8182dd4 test: fix types and variable names in wait_for_host_down
Fix types and variable names in ManagerClient::wait_for_host_down
and related methods.
2023-09-05 15:01:59 +02:00
Pavel Emelyanov
5d52a35e05 system_keyspace: Don't require snitch argument on start
Now system keyspace is finally independent from snitch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-05 12:57:09 +03:00
Pavel Emelyanov
9926917bf5 system_keyspace: Save local info with explicit location
On boot system keyspace is kicked to insert local info into system.local
table. Among other things there's dc:rack pair which sys.ks. gets from
its cache which, in turn, should have been previously initialized from
snitch on sys.ks. start. This patch makes the local info updating method
get the dc:rack from caller via argument. Callers, in turn, call snitch
directly, because these are main and cql_test_env startup routines.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-05 12:54:46 +03:00
Kefu Chai
f6cca741ea config: remove "experimental" option
"experimental" option was marked "Unused" in 64bc8d2f7d. but we
chose to keep it in hope that the upgrade test does not fail.
despite that the upgrade tests per-se survived the "upgrade",
after the upgrade, the tests exercising the experimental features
are still failing hard. they have not been updated to set the
"experimental-features" option, and are still relying on
"experimental" to enable all the experimental features under
test.

so, in this change, let's just drop the option so that
scylla can fail early at seeing this "experimental" option.
this should help us to identify the tests relying on it
quicker. as the "experimental" features should only be used
in development environment, this change should have no impact
to production.

Refs #15214
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15233
2023-09-05 10:09:04 +03:00
Benny Halevy
062684eb1f test: compaction_manager_stop_and_drain_race_test: stop compaction and task managers
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Benny Halevy
b9127f55ac test: simple_backlog_controller_test: stop compaction and task managers
The compaction_manager and task_manager should
be orderly stopped before they are destroyed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-05 09:17:25 +03:00
Pavel Emelyanov
13a0c29618 storage_service: Remove query processor arg from join_cluster()
The s.service since d42685d0cb is having on-board query processor ref^w
pointer and can use it to join cluster

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #15236
2023-09-05 07:30:37 +03:00
Avi Kivity
9a3d57256a Merge 'config: add index_cache_fraction' from Michał Chojnowski
Index caching was disabled by default because it caused performance regressions
for some small-partition workloads. See https://github.com/scylladb/scylladb/issues/11202.

However, it also means that there are workloads which could benefit from the
index cache, but (by default) don't.

As a compromise, we can set a default limit on the memory usage of index cache,
which should be small enough to avoid catastrophic regressions in
small-partition workloads, but big enough to accommodate workloads where
index cache is obviously beneficial.

This series adds such a configurable limit, sets it to to 0.2 of total cache memory by default,
and re-enables index caching by default.

Fixes #15118

Closes #14994

* github.com:scylladb/scylladb:
  test: boost/cache_algorithm_test: add cache_algorithm_test
  sstables: partition_index_cache: deglobalize stats
  utils: cached_file: deglobalize cached_file metrics
  db: config: enable index caching by default
  config: add index_cache_fraction
  utils: lru: add move semantics to list links
2023-09-03 19:39:31 +03:00
Michał Chojnowski
bcc235ad5f test: boost/cache_algorithm_test: add cache_algorithm_test
The tests added in this patch validate that index_cache_fraction
does what it's supposed to do.
2023-09-01 22:34:41 +02:00
Michał Chojnowski
f00bed9429 sstables: partition_index_cache: deglobalize stats
Move partition_index_cache stats from a thread_local variable
to cache_tracker. After the change, partition_index_cache
receives a reference to the stats via constructor, instead of
referencing a global.

This is needed so that cache_tracker can know the memory usage
of index caches (for cache eviction purposes) without relying on
globals.

But it also makes sense even without that motive.
2023-09-01 22:34:41 +02:00
Michał Chojnowski
c7d9d35030 utils: cached_file: deglobalize cached_file metrics
Move cached_file metrics from a thread_local variable
to cache_tracker.

This is needed so that cache_tracker can know
the memory usage of index caches (for purposes
of cache eviction) without relying on globals.

But it also makes sense even without that motive.
2023-09-01 22:34:41 +02:00
Kamil Braun
117dedab19 Merge 'Cluster features on raft: topology coordinator + check on boot followups' from Piotr Dulikowski
This PR collects followups described in #14972:

- The `system.topology` table is now flushed every time feature-related
  columns are modified. This is done because of the feature check that
  happens before the schema commitlog is replayed.
- The implementation now guarantees that, if all nodes support some
  feature as described by the `supported_features` column, then support
  for that feature will not be revoked by any node.  Previously, in an
  edge case where a node is the last one to add support for some feature
  `X` in `supported_features` column, crashes before applying/persisting
  it and then restarts without supporting `X`, it would be allowed to boot
  anyway and would revoke support for the `X` in `system.topology`.
  The existing behavior, although counterintuitive, was safe - the
  topology coordinator is responsible for explicitly marking features as
  enabled, and in order to enable a feature it needs to perform a special
  kind of a global barrier (`barrier_after_feature_update`) which only
  succeeds after the node has updated its features column - so there is no
  risk of enabling an unsupported feature.  In order to make the behavior
  less confusing, the node now will perform a second check when it tries
  to update its `supported_features` column in `system.topology`.
- The `barrier_after_feature_update` is removed and the regular global
  `barrier` topology command is used instead. The `barrier` handler now
  performs a feature check if the node did not have a chance to verify and
  update its cluster features for the second time.

JOIN_NODE rpc will be sent separately as it is a big item on its own.

Fixes: #14972

Closes #15168

* github.com:scylladb/scylladb:
  test: topology{_experimental_raft}: don't stop gracefully in feature tests
  storage_service: remove _topology_updated_with_local_metadata
  topology_coordinator: remove barrier_after_feature_update
  topology_coordinator: perform feature check during barrier
  storage_service: repeat the feature check after read barrier
  feature_service: introduce unsupported_feature_exception
  feature_service: move startup feature check to a separate function
  topology_coordinator: account for features to enable in should_preempt_balancing
  group0_state_machine: flush system.topology when updating features columns
2023-09-01 11:52:26 +02:00
Botond Dénes
34d94fb549 test/cql-pytest/test_tools.py: improve tempdir usage for scrub tests
Scrub tests use a lot of temporary directories. This is suspected to
cause problems in some cases. To improve the situation, this patch:
* Creates a single root temporary directory for all scrub tests
* All further fixtures create their files/directories inside this root
  dir.
* All scrub tests create their temporary directories within this root
  dir.
* All temporary directories now use an appropriate "prefix", so we can
  tell which temporary directory is part of the problem if a test fails.

Refs: #14309

Closes #15117
2023-09-01 07:17:49 +03:00
Alexey Novikov
87fa7d0381 compact and remove expired range tombstones from cache on read
during read from cache compact and expire range tombstones
remove expired empty rows from cache

Refs #2252
Fixes #6033

Closes #14463
2023-09-01 07:17:49 +03:00
Piotr Dulikowski
5471330ee7 test: topology{_experimental_raft}: don't stop gracefully in feature tests
The current cluster feature tests are stopping nodes in a graceful way.
Doing it gracefully isn't strictly necessary for the test scenarios and
we can switch `server_stop_gracefully` calls to `server_stop`. This only
became possible after a previous commit which causes `system.topology`
table to be flushed when cluster feature columns are modified, and will
server as a good test for it.
2023-08-31 16:46:11 +02:00
Aleksandra Martyniuk
92fad5769a test: repair tasks test
Add tests checking whether repair tasks are properly structured and
their progress is gathered correctly.
2023-08-30 15:34:25 +02:00
Kamil Braun
0ee23b260e Merge 'raft topology: add and deprecate support for --ignore-dead-nodes with IPs' from Patryk Jędrzejczak
We want to stop supporting IPs for `--ignore-dead-nodes` in
`raft_removenode` and `--ignore-dead-nodes-for-replace` for
`raft_replace`. However, we shouldn't remove these features without the
deprecation period because the original `removenode` and `replace`
operations still support them. So, we add them for now.

The `IP -> Raft ID` translation is done through the new
`raft_address_map::find_by_addr` member function.

We update the documentation to inform about the deprecation of the IP
support for `--ignore-dead-nodes`.

Fixes #15126

Closes #15156

* github.com:scylladb/scylladb:
  docs: inform about deprecating IP support for --ignore-dead-nodes
  raft topology: support IPs for --ignore-dead-nodes
  raft_address_map: introduce find_by_addr
2023-08-30 10:41:23 +02:00
Botond Dénes
3e7ec6cc83 Merge 'Move cell assertion from cql_test_env to cql_assertions' from Pavel Emelyanov
The cql_test_env has a virtual require_column_has_value() helper that better fits cql_assertions crowd. Also, the helper in question duplicates some existing code, so it can also be made shorter (and one class table helper gets removed afterwards)

Closes #15208

* github.com:scylladb/scylladb:
  cql_assertions: Make permit from env
  table: Remove find_partition_slow() helper
  sstable_compaction_test: Do not re-decorate key
  cql_test_env: Move .require_column_has_value
  cql_test_env: Use table.find_row() shortcut
2023-08-30 08:34:05 +03:00
Kamil Braun
0bff96a611 Merge 'gossip: add group0_id attribute to gossip_digest_syn' from Mikołaj Grzebieluch
Motivation:

The user can bootstrap 3 different clusters and then connect them
(#14448). When these clusters start gossiping, their token rings will be
merged, but there will be 3 different group 0s in there. It results in a
corrupted cluster.

We need to prevent such situations from happening in clusters which
don't use Raft-based topology.

-------

Gossiper service sets its group0 id on startup if it is stored in
`scylla_local` or sets it during joining group0.

Send group0_id (if it is set) when the node tries to initiate the gossip
round. When a node gets gossip_digest_syn it checks if its group0 id
equals the local one and if not, the message is discarded.

Fixes #14448

Performed manual tests with the following scenario:
1. setup a cluster of two nodes (one compiled with and one without this patch)
2. setup a new node
3. create a basic keyspace and table
4. execute simple select and insert queries

Tested 4 scenarios: the seed node was with or without this patch, and the third node was with or without this patch.
These tests didn't detect any errors.

Closes #15004

* github.com:scylladb/scylladb:
  tests: raft: cluster of nodes with different group0 ids
  gossip: add group0_id attribute to gossip_digest_syn
2023-08-29 16:41:29 +02:00
Pavel Emelyanov
137c7116dc cql_assertions: Make permit from env
To call table::find_row() one needs to provide a permit. Tests have
short and neat helper to create one from cql_test_env

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 16:01:29 +03:00
Pavel Emelyanov
0a727a9b2e sstable_compaction_test: Do not re-decorate key
The is_partition_dead() local helper accepts partition key argument and
decorates it. Howerver, its caller gets partition key from decorated key
itself, and can just pass it along

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 15:38:41 +03:00
Pavel Emelyanov
4e9f380608 cql_test_env: Move .require_column_has_value
This env helper is only used by tests (from cql_query_test)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 15:38:33 +03:00
Pavel Emelyanov
7597663ef5 cql_test_env: Use table.find_row() shortcut
The require_column_has_value() finds the cell in three steps -- finds
partition, then row, then cell. The class table already has a method to
facilitate row finding by partition and clustering key

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 15:37:27 +03:00
Kamil Braun
ebc9056237 Merge 'Restore storage_service -> cdc_generation_service dependency' from Pavel Emelyanov
The main goal of this PR is to stop cdc_generation_service from calling
system_keyspace::bootstrap_complete(). The reason why it's there is that
gen. service doesn't want to handle generation before node joined the
ring or after it was decommissioned. The cleanup is done with the help
of storage_service->cdc_generation_service explicit dependency brought
back and this, in turn, suddenly freed the raft and API code from the
need to carry cdc gen. service reference around.

Closes #15047

* github.com:scylladb/scylladb:
  cdc: Remove bootstrap state assertion from after_join()
  cdc: Rework gen. service check for bootstrap state
  api: Don't carry cdc gen. service over
  storage_service: Use local cdc gen. service in join_cluster()
  storage_service: Remove cdc gen. service from raft_state_monitor_fiber()
  raft: Do not carry cdc gen. service over
  storage_service: Use local cdc gen. service in topo calls
  storage_service: Bring cdc_generation_service dependency back
2023-08-29 14:10:06 +02:00
Mikołaj Grzebieluch
bac8aa38d9 tests: raft: cluster of nodes with different group0 ids
The reproducer for #14448.

The test starts two nodes with different group0_ids. The second node
is restarted and tries to join the cluster consisting of the first node.
gossip_digest_syn message should be rejected by the first node, so
the second node will not be able to join the cluster.

This test uses repair-based node operations to make this test easier.
If the second node successfully joins the cluster, their tokens metadata
will be merged and the repair service will allow to decommission the second node.
If not - decommissioning the second node will fail with an exception
"zero replica after the removal" thrown by the repair service.
2023-08-29 11:09:15 +02:00
Mikołaj Grzebieluch
2230abc9b2 gossip: add group0_id attribute to gossip_digest_syn
Gossiper service sets its group0 id on startup if it is stored in `scylla_local`
or sets it during joining group0.

Send group0_id (if it is set) when the node tries to initiate the gossip round.
When a node gets gossip_digest_syn it checks if its group0 id equals the local
one and if not, the message is discarded.

Fixes #14448.
2023-08-29 11:09:15 +02:00
Botond Dénes
57deeb5d39 Merge 'gossiper: add get_unreachable_members_synchronized and use over api' from Benny Halevy
Modeled after get_live_members_synchronized,
get_unreachable_members_synchronized calls
replicate_live_endpoints_on_change to synchronize
the state of unreachable_members on all shards.

Fixes #12261
Fixes #15088

Also, add rest_api unit test for those apis

Closes #15093

* github.com:scylladb/scylladb:
  test: rest_api: add test_gossiper
  gossiper: add get_unreachable_members_synchronized
2023-08-29 10:43:22 +03:00
Pavel Emelyanov
a61454be00 storage_service: Use local cdc gen. service in join_cluster()
The method in question accepts cdc_generation_service ref argument from
main and cql_test_env, but storage service now has local cdcv gen.
service reference, so this argument and its propagation down the stack
can be removed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 09:36:58 +03:00
Pavel Emelyanov
933ea0afe6 storage_service: Bring cdc_generation_service dependency back
It sort of reverts the 5a97ba7121 commit, because storage service now
uses the cdc generation service to serve raft topo updates which, in
turn, takes the cdc gen. service all over the raft code _just_ to make
it as an argument to storage service topo calls.

Also there's API carrying cdc gen. service for the single call and also
there's an implicit need to kick cdc gen. service on decommission which
also needs storage service to reference cdc gen. after boot is complete

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-08-29 09:36:58 +03:00
Mikołaj Grzebieluch
a031a14249 tests: add asynchronous log browsing functionality
Add a class that handles log file browsing with the following features:
* mark: returns "a mark" to the current position of the log.
* wait_for: asynchronously checks if the log contains the given message.
* grep: returns a list of lines matching the regular expression in the log.

Add a new endpoint in `ManagerClient` to obtain the scylla logfile path.

Fixes #14782

Closes #14834
2023-08-25 14:19:09 +02:00
Patryk Jędrzejczak
b2755755f4 raft topology: support IPs for --ignore-dead-nodes
We want to stop supporting IPs for --ignore-dead-nodes in
raft_removenode and --ignore-dead-nodes-for-replace for
raft_replace. However, we shouldn't remove these features without
the deprecation period because the original removenode and
replace operations still support them. So, we add them for now.

Additionally, we modify test_raft_ignore_nodes.py so that it
verifies the added IP support.
2023-08-25 12:33:45 +02:00
Patryk Jędrzejczak
9806bddf75 test: fix a test case in raft_address_map_test
The test didn't test what it was supposed to test. It would pass
even if set_nonexpiring() didn't insert a new entry.

Closes #15157
2023-08-25 12:11:33 +02:00
Patryk Jędrzejczak
59df5ce7e4 raft_address_map: introduce find_by_addr
In the following commit, we add IP support for --ignore-dead-nodes
in raft_removenode and raft_replace. To implement it, we need
a way to translate IPs to Raft IDs. The solution is to add a new
member function -- find_by_addr -- to raft_address_map that
does the IP->ID translation.

The IP support for --ignore-dead-nodes will be deprecated and
find_by_addr shouldn't be called for other reasons, so it always
logs a warning.

We also add some unit tests for find_by_addr.
2023-08-24 15:10:43 +02:00
Raphael S. Carvalho
d6cc752718 test: Fix flakiness in sstable_compaction_test.autocompaction_control_test
It's possible that compaction task is preempted after completion and
before reevaluation, causing pending_tasks to be > 1.

Let's only exit the loop if there are no pending tasks, and also
reduce 100ms sleep which is an eternity for this test.

Fixes #14809.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #15059
2023-08-24 13:37:06 +03:00
Benny Halevy
672ec66769 test: rest_api: add test_gossiper
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-24 11:37:12 +03:00