Commit Graph

6652 Commits

Author SHA1 Message Date
Botond Dénes
4c0dadee7c Merge 'test: changes to prepare for dropping FMT_DEPRECATED_OSTREAM' from Kefu Chai
this series includes test related changes to enable us to drop `FMT_DEPRECATED_OSTREAM` deprecated in {fmt} v10.

Refs #13245

Closes scylladb/scylladb#18054

* github.com:scylladb/scylladb:
  test: unit: add fmt::formatter for test_data in tests
  test/lib: do not print with fmt::to_string()
  test/boost: print runtime_error using e.what()
2024-03-28 15:33:56 +02:00
Kamil Braun
33751f8f4e Merge 'raft topology: drop RAFT_PULL_TOPOLOGY_SNAPSHOT RPC' from Gleb
* 'gleb/raft_snapshot_rpc-v3' of github.com:scylladb/scylla-dev:
  raft topology: drop RAFT_PULL_TOPOLOGY_SNAPSHOT RPC
  Use correct limit for raft commands throughout the code.
2024-03-28 14:25:58 +01:00
Nadav Har'El
566223c34a Merge ' tools/scylla-nodetool: repair: abort on first failed repair' from Botond Dénes
When repairing multiple keyspaces, bail out on the first failed keyspace repair, instead of continuing and reporting all failures at the end. This is what Origin does as well.

To be able to test this, a bit of refactoring was needed, to be able to assert that `scylla-nodetool` doesn't make repair requests, beyond the expected ones.

Refs: https://github.com/scylladb/scylla-cluster-tests/issues/7226

Closes scylladb/scylladb#17678

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: repair: abort on first failed repair
  test/nodetool: nodetool(): add check_return_code param
  test/nodetool: nodetool(): return res object instead of just stdout
  test/nodetool: count unexpected requests
2024-03-28 14:02:29 +02:00
Botond Dénes
81bbfae77a tools/scylla-nodetool: implement the checkAndRepairCdcStreams command
Closes scylladb/scylladb#18076
2024-03-28 13:54:37 +02:00
Pavel Emelyanov
1adf16ce73 Merge 'network_topology_strategy: reallocate_tablets: support for rf changes' from Benny Halevy
This series provides a reallocate_tablets function, that's initially called by allocate_tablets_for_new_table.
The new allocation implementation is independent of vnodes/token ownership.
Rather than using the natural_endpoints_tracker, it implements its own tracking
based on dc/rack load (== number of replicas in rack), with the additional benefit
that tablet allocation will balance the allocation across racks, using a heap structure,
similar to the one we use to balance tablet allocation across shards in each node.

reallocate_tablets may also be called with an optional parameter pointing the the current tablet_map.
In this case the function either allocates more tablet replicas in datacenters for which the replication factor was increased,
or it will deallocate tablet replicas from datacenters for which replication factor was decreased.

The NetworkTopologyStrategy_tablets_test unit test was extended to cover replication factor changes.

Closes scylladb/scylladb#17846

* github.com:scylladb/scylladb:
  network_topology_strategy: reallocate_tablets: consider new_racks before existing racks
  network_topology_startegy_test: add NetworkTopologyStrategy_tablet_allocation_balancing_test
  network_topology_strategy: reallocate_tablets: support deallocation via rf change
  network_topology_startegy_test: tablets_test: randomize cases
  network_topology_strategy: allocate_tablets_for_new_table: do not rely on token ownership
  network_topology_startegy_test: add NetworkTopologyStrategy_tablets_negative_test
  network_topology_strategy_test: endpoints_check: use particular BOOST_CHECK_* functions
  network_topology_strategy_test: endpoints_check: verify that replicas are placed on unique nodes
  network_topology_strategy_test: endpoints_check: strictly check rf for tablets
  network_topology_strategy_test: full_ring_check for tablets: drop unused options param
2024-03-28 11:19:11 +03:00
Kefu Chai
99e743de9d test: nodetool: match with vector printed by {fmt}
our homebrew formatter for std::vector<string> formats like

```
{hello, world}
```

while {fmt}'s formatter for sequence-like container formats like

```
["hello", "world"]
```

since we are moving to {fmt} formatters. and in this context,
quoting the verbatim text makes more sense to user. let's
support the format used by {fmt} as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18057
2024-03-28 09:35:37 +02:00
Marcin Maliszkiewicz
50e0032bca test: auth: remove if not exists from auth cql statement
They were added due to https://github.com/scylladb/python-driver/issues/296
but looks like it no longer reproduces.

Change was tested with ./test.py -vv --repeat=100 test_auth
to minimize chance of introducing flakiness.

Closes scylladb/scylladb#18043
2024-03-28 06:06:45 +01:00
Gleb Natapov
c1dcf0fae7 Use correct limit for raft commands throughout the code.
Raft uses schema commitlog, so all its limits should be derived from
this commitlog segment size, but many places used regular commitlog size
to calculate the limits and did not do what they really suppose to be
doing.
2024-03-27 19:16:09 +02:00
Avi Kivity
96a3544739 Merge 'alternator: reduce stall for Query and Scan with large pages' from Nadav Har'El
Before this series, Alternator's Query and Scan operations convert an
entire result page to JSON without yielding. For a page of maximum
size (1MB) and tiny rows, this can cause a significant stall - the
test included in this PR reported stalls of 14-26ms on my laptop.

The problem is the describe_items() function, which does this conversion
immediately, without yielding. This patch changes this function to
return a future, and use a new result_set::visit_gently() method
that does what visit() does, but with yields when needed.

This PR improves #17995, but does not completely fix is as the stalls in the
are not completely eliminated. But on my laptop it usually reduces the stalls
to around 5ms. It appears that the remaining stalls some from other places
not fixed in this PR, such as perhaps query_page::handle_result(), and will need
to be fixed by additional patches.

Closes scylladb/scylladb#18036

* github.com:scylladb/scylladb:
  alternator: reduce stall for Query and Scan with large pages
  result_set: introduce visit_gently()
  alternator: coroutinize do_query() function
2024-03-27 15:06:32 +02:00
Kamil Braun
404406e6a1 Merge ' test/cql-pytest: test_select_from_mutation_fragments.py: move away from memtables' from Botond Dénes
Memtables are fickle, they can be flushed when there is memory pressure,
if there is too much commitlog or if there is too much data in them. The
tests in test_select_from_mutation_fragments.py currently assume data
written is in the memtable. This is tru most of the time but we have
seen some odd test failures that couldn't be understood.  To make the
tests more robust, flush the data to the disk and read it from the
sstables. This means that some range scans need to filter to read from
just a single mutation source, but this does not influence the tests.
Also fix a use-after-return found when modifying the tests.

This PR tentatively fixes the below issues, based on our best guesses on why they failed (each was seen just once):
Fixes: scylladb/scylladb#16795
Fixes: scylladb/scylladb#17031

Closes scylladb/scylladb#17562

* github.com:scylladb/scylladb:
  test/cql-pytest: test_select_from_mutation_fragments.py: move away from memtables
  cql3: select_statement: mutation_fragments_select_statement: fix use-after-return
2024-03-27 13:21:19 +01:00
Botond Dénes
fdd5367974 Merge 'compaction: implement unchecked_tombstone_compaction' from Ferenc Szili
This change adds the missing Cassandra compaction option unchecked_tombstone_compaction.
Setting this option to true causes the compaction to ignore tombstone_threshold, and decide whether to do a compaction only based on the value of tombstone_compaction_interval

Fixes #1487

Closes scylladb/scylladb#17976

* github.com:scylladb/scylladb:
  removed forward declaration of resharding_descriptor
  compaction options and troubleshooting docs
  cql-pytest/test_compaction_strategy_validation.py
  test/boost/sstable_compaction_test.cc
  compaction: implement unchecked_tombstone_compaction
2024-03-27 13:56:02 +02:00
Andrei Chekun
0752ef1481 test: remove skip annotation for multi-DC test with 5 DCs with one node in each
As a follow-up of the https://github.com/scylladb/scylladb/pull/17503 remove skip annotation for the multi-DC test with a reduced amount of the DC used in it: from 30 DCs to 5 DCs

Closes scylladb/scylladb#17898
2024-03-27 13:13:13 +02:00
Michał Chojnowski
295b27a07b cache_flat_mutation_reader: only call get_iterator_in_latest() when pointing at a row
Calling `_next_row.get_iterator_in_latest()` is illegal when `_next_row` is not
pointing at a row. In particular, the iterator returned by such call might be
dangling.

We have observed this to cause a use-after-free in the field, when a reverse
read called `maybe_add_to_cache` after `_latest_it` was left dangling after
a dead row removal in `copy_from_cache_to_buffer`.

To fix this, we should ensure that we only call `_next_row.get_iterator_in_latest`
is pointing at a row.

Only the occurrences of this problem in `maybe_add_to_cache` are truly dangerous.
As far as I can see, other occurrences can't break anything as of now.
But we apply fixes to them anyway.

Closes scylladb/scylladb#18046
2024-03-27 11:48:42 +01:00
Kamil Braun
d274f63d89 Merge 'Add support for "initial-token" parameter in raft mode' from Gleb
Fixes scylladb/scylladb#17893

* 'gleb/initial-token-v1' of github.com:scylladb/scylla-dev:
  dht: drop unused parameter from get_random_bootstrap_tokens() function
  test: add test for initial_token parameter
  topology coordinator: use provided initial_token parameter to choose bootstrap tokens
  topology cooordinator: propagate initial_token option to the coordinator
2024-03-27 11:41:06 +01:00
Kefu Chai
71a519dee8 test: unit: add fmt::formatter for test_data in tests
this change is created in same spirit of d1c35f943d.

before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for test_data in
radix_tree_stress_test.cc, and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 18:18:32 +08:00
Kefu Chai
4f8c1a4729 test/lib: do not print with fmt::to_string()
we should not format a variable unless we want to print it. in this
case, we format `first_row` using `fmt::to_string()` to a string,
and then insert the string to another string, despite that this is
in a cold path, this is still a anti pattern -- both convoluted,
and not performant.

so let's just pass `first_row` to `format()`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 18:18:32 +08:00
Kefu Chai
d0ceb35e7e test/boost: print runtime_error using e.what()
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter. but fortunately, fmt v10 brings the builtin
formatter for classes derived from `std::exception`. but before
switching to {fmt} v10, and after dropping `FMT_DEPRECATED_OSTREAM`
macro, we need to print out `std::runtime_error`. so far, we don't
have a shared place for formatter for `std::runtime_error`. so we
are addressing the needs on a case-by-case basis.

in this change, we just print it using `e.what()`. it's behavior
is identical to what we have now.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 18:18:32 +08:00
Benny Halevy
c5ff060dee network_topology_startegy_test: add NetworkTopologyStrategy_tablet_allocation_balancing_test
Test that tablet allocation is balanced across
racks, nodes, and shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Benny Halevy
4a7d57525e network_topology_strategy: reallocate_tablets: support deallocation via rf change
Add support for deallocating tablet replicas when the
datacenter replication factor is decreased.

We deallocate replicas back-to-front order to maintain
replica pairing between the base table and
its materialized views.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Benny Halevy
1e8f8db5b8 network_topology_startegy_test: tablets_test: randomize cases
Instead of deterministically testing a very small set of cases,
randomize the the shard_count per node, the cluster topology
and the NetworkTopologyStrategy options.

The next patch will extend the test to also test
`reallocate_tablets` with randomized options.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Botond Dénes
f70f04c240 tools/scylla-nodetool: repair: abort on first failed repair
When repairing multiple keyspaces, bail out on the first failed keyspace
repair, instead of continuing and reporting all failures at the end.
This is what Origin does as well.
2024-03-27 05:46:18 -04:00
Benny Halevy
40a4b349bd network_topology_startegy_test: add NetworkTopologyStrategy_tablets_negative_test
Test that we attempting to allocate tablets
throws an error when there are not enough nodes
for the configured replication factor.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Benny Halevy
f19dbb4ae5 network_topology_strategy_test: endpoints_check: use particular BOOST_CHECK_* functions
Using e.g. `BOOST_CHECK_EQUAL(endpoints.size(), total_rf)`
rather than `BOOST_CHECK(endpoints.size() == total_rf)`
prints a more detailed error message that includes the
runtime valies, if it fails.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Benny Halevy
93b6573a90 network_topology_strategy_test: endpoints_check: verify that replicas are placed on unique nodes
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Benny Halevy
c11ffd14cc network_topology_strategy_test: endpoints_check: strictly check rf for tablets
With tablet we want to verify that the number of
replicas allocated per tablet per dc exactly matches
the replication strategy per-dc replication factor options.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Benny Halevy
ffa5870758 network_topology_strategy_test: full_ring_check for tablets: drop unused options param
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Botond Dénes
764e9a344d test/nodetool: nodetool(): add check_return_code param
When set to false, the returncode is not checked, this is left to the
caller. This in turn allows for checking the expected and unexpected
requests which is not checked when the nodetool process fails.
This is used by utils._do_check_nodetool_fails_with(), so that expected
and unexpected requests are checked even for failed invocations.

Some test need adjustment to the stricter checks.
2024-03-27 04:18:19 -04:00
Botond Dénes
8f3b1db37f test/nodetool: nodetool(): return res object instead of just stdout
So callers have access to stderr, return code and more.
This causes some churn in the test, but the changes are mechanical.
2024-03-27 04:18:19 -04:00
Botond Dénes
4d98b7d532 test/nodetool: count unexpected requests
We currently check at the end of each test, that all expected requests
set by the test were consumed. This patch adds a mechanism to count
unexpected requests -- requests which didn't match any of the expected
ones set by the test. This can be used to asser that nodetool didn't
make any request to the server, beyond what the test expected it to do.
Before this patch, requests like this would only be noticed by the test,
if the response of 404/500 caused nodetool to fail, which is not always
the case.
2024-03-27 02:39:28 -04:00
Tomasz Grabiec
042a4b7627 Merge 'tablets: add warning on CREATE KEYSPACE' from Nadav Har'El
The CDC feature is not supported on a table that uses tablets
(Refs https://github.com/scylladb/scylladb/issues/16317), so if a user creates a keyspace with tablets enabled
they may be surprised later (perhaps much later) when they try to enable
CDC on the table and can't.

The LWT feature always had issue Refs https://github.com/scylladb/scylladb/issues/5251, but it has become potentially
more common with tablets.

So it was proposed that as long as we have missing features (like CDC or
LWT), every time a keyspace is created with tablets it should output a
warning (a bona-fide CQL warning, not a log message) that some features
are missing, and if you need them you should consider re-creating the
keyspace without tablets.

This PR does this.

The warning text which will be produced is the following (obviously, it can
be improved later, as we perhaps find more missing features):

>   "Tables in this keyspace will be replicated using tablets, and will
>    not support the CDC feature (issue https://github.com/scylladb/scylladb/issues/16317) and LWT may suffer from
>    issue https://github.com/scylladb/scylladb/issues/5251 more often. If you want to use CDC or LWT, please drop
>    this keyspace and re-create it without tablets, by adding AND TABLETS
>    = {'enabled': false} to the CREATE KEYSPACE statement."

This PR also includes a test - that checks that this warning is is
indeed generated when a keyspace is created with tablets (either by default
or explicitly), and not generated if the keyspace is created without
tablets. It also fixes existing tests which didn't like the new warning.

Fixes https://github.com/scylladb/scylladb/issues/16807

Closes scylladb/scylladb#17318

* github.com:scylladb/scylladb:
  tablets: add warning on CREATE KEYSPACE
  test/cql-pytest: fix guadrail tests to not be sensitive to more warnings
2024-03-26 20:04:07 +01:00
Gleb Natapov
ed534fde8f test: add test for initial_token parameter
Test that configured tokens are used and tokens collision is detected.
2024-03-26 18:43:31 +02:00
Nadav Har'El
ba97fd98a3 alternator: reduce stall for Query and Scan with large pages
Before this patch, Alternator's Query and Scan operations convert an
entire result page to JSON without yielding. For a page of maximum
size (1MB) and tiny rows, this can cause a significant stall - the
test included in this patch reported stalls of 14-26ms on my laptop.

The problem is the describe_items() function, which does this conversion
immediately, without yielding. This patch changes this function to
return a future, and use the result_set::visit_gently() method
instead of visit() that yields when needed.

This patch does not completely eliminate stalls in the test, but
on my laptop usually reduces them to around 5ms. It appears that
the remaining stalls some from other places not fixed in this PR,
such as perhaps query_page::handle_result(), and will need to be
fixed by additional patches.

The test included in this patch is useful for manually reproducing
the stall, but not useful as a regression test: It is slow (requiring
a couple of seconds to set up the large partition) and doesn't
check anything, and can't even report the stall without modifying the
test runner. So the test is skipped by default (using the "veryslow"
marker) and can be enabled and run manually by developers who want
to continue working on #17995.

Refs #17995.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-26 18:32:45 +02:00
Avi Kivity
4ddf82e58b treewide: don't #include "gms/feature_service.hh" from other headers
feature_service.hh is a high-level header that integrates much
of the system functionality, so including it in lower-level headers
causes unnecessary rebuilds. Specifically, when retiring features.

Fix by removing feature_service.hh from headers, and supply forward
declarations and includes in .cc where needed.

Closes scylladb/scylladb#18005
2024-03-26 15:31:18 +02:00
Avi Kivity
22b8065a89 Merge 'tools/scylla-nodetool: implement the getsstables and sstableinfo commands' from Botond Dénes
These commands manage to avoid detection because they are not documented on https://opensource.docs.scylladb.com/stable/operating-scylla/nodetool.html.

They were discovered when running dtests, with ccm tuned to use the native nodetool directly. See https://github.com/scylladb/scylla-ccm/pull/565.

The commands come with tests, which pass with both the native and Java nodetools. I also checked that the relevant dtests pass with the native implementation.

Closes scylladb/scylladb#17979

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the sstableinfo command
  tools/scylla-nodetool: implement the getsstables command
  tools/scylla-nodetool: move get_ks_cfs() to the top of the file
  test/nodetool: rest_api_mock.py: add expected_requests context manager
2024-03-26 14:38:00 +02:00
Kefu Chai
101fdfc33a test: randomized_nemesis_test: add fmt::formatter for stop_crash::result_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

also, it's impossible to partial specialize a nested type of a
template class, we cannot specialize the `fmt::formatter` for
`stop_crash<M>::result_type`, as a workaround, a new type is
added.

in this change,

* define a new type named `stop_crash_result`
* add fmt::formatter for `stop_crash_result`
* define stop_crash::result_type as an alias of `stop_crash_result`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18018
2024-03-26 12:18:55 +02:00
Botond Dénes
7edbf189e6 Merge 'treewide: use fmt::to_string() to transform a UUID to std::string and drop UUID::to_sstring()' from Kefu Chai
`UUID::to_sstring()` relies on `FMT_DEPRECATED_OSTREAM` to generated `fmt::formatter` for `UUID`, and this feature is deprecated in {fmt} v9, and dropped in {fmt} v10.

in this series, all callers of `UUID::to_sstring()` are switched to `fmt::to_string()`, and this function is dropped.

Closes scylladb/scylladb#18020

* github.com:scylladb/scylladb:
  utils: UUID: drop UUID::to_sstring()
  treewide: use fmt::to_string() to transform a UUID to std::string
2024-03-26 12:14:56 +02:00
Botond Dénes
f0ff23492f Merge 'Sanitize topology suites' skiplists' from Pavel Emelyanov
There are skip_in_<mode> lists in suite yaml that tells test.py not to run the test from it. This PR sanitizes these lists in two ways.

First, to skip pytests the skip-decorators are much more convenient, e.g. because they show the reason why the test is skipped.

Also, if a test wants to be opt-in-ed for some mode only, it's opt-out-ed in all other lists instead. There's run_in_<mode> list in suite for that.

Closes scylladb/scylladb#17964

* github.com:scylladb/scylladb:
  test: Do not duplicate test name in several skip-lists
  test: Mark tests with skip_mode instead of suite skip-list
2024-03-26 08:24:57 +02:00
Kefu Chai
1b859e484f treewide: use fmt::to_string() to transform a UUID to std::string
without `FMT_DEPRECATED_OSTREAM` macro, `UUID::to_sstring()` is
implemented using its `fmt::formatter`, which is not available
at the end of this header file where `UUID` is defined. at this moment,
we still use `FMT_DEPRECATED_OSTREAM` and {fmt} v9, so we can
still use `UUID::to_sstring()`, but in {fmt} v10, we cannot.

so, in this change, we change all callers of `UUID::to_sstring()`
to `fmt::to_string()`, so that we don't depend on
`FMT_DEPRECATED_OSTREAM` and {fmt} v9 anymore.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-26 13:38:37 +08:00
Botond Dénes
1ea7b408db tools/scylla-nodetool: implement the sstableinfo command 2024-03-25 11:29:30 -04:00
Botond Dénes
50da93b9c8 tools/scylla-nodetool: implement the getsstables command 2024-03-25 11:29:30 -04:00
Botond Dénes
4ff88b848c test/nodetool: rest_api_mock.py: add expected_requests context manager
So tests and fixtures can use `with expected_requests():` and have
cleanup be taken care for them. I just discovered that some tests do not
clean up after themselves and when running all tests in a certain order,
this causes unrelated tests to fail.
Fix by using the context everywhere, getting guaranteed cleanup after
each test.
2024-03-25 11:29:30 -04:00
Petr Gusev
7c84fc527b test_invalid_user_type_statements: increase raft timeout
The test creates ut4 with a lot of fields,
this may take a while in debug builds,
to avoid raft operation timeout set the threshold
to some big value.

The error injector is disabled in release builds,
so this settings won't be applied to them.
This shouldn't be a problem since release builds
are fast enough, even on arm.

Fixes scylladb/scylladb#17987

Closes scylladb/scylladb#17997
2024-03-25 14:52:16 +01:00
Ferenc Szili
8bb7a18de2 test/cql-pytest: add --omit-scylla-output to Cassandra test runs
Currently, the tests in test/cql-pytest can be run against both ScyllaDB and Cassandra.
Running the test for either will first output the test results, and subsequently
print the stdout output of the process under test. Using the command line
option --omit-scylla-output it is possible to disable this print for Scylla,
but it is not possible for tests run against Cassandra.

This change adds the option to suppress output for Cassandra tests, too. By default,
the stdout of the Cassandra run will still be printed after the test results, but
this can now be disabled with --omit-scylla-output

Closes scylladb/scylladb#17996
2024-03-25 15:14:45 +02:00
Pavel Emelyanov
16343b3edc test: Do not duplicate test name in several skip-lists
Some tests are only run in dev mode for some reason. For such tests
there's run_in_dev list, no need in putting it in all the non-dev
skip_in_... ones.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-25 14:56:37 +03:00
Pavel Emelyanov
90dfcec86b test: Mark tests with skip_mode instead of suite skip-list
There are many tests that are skipped in release mode becuase they rely
on error-injection machinery which doesn't work in release mode. Most of
those tests are listed in suite's skip_in_release, but it's not very
handy, mainly because it's not clear why the test is there. The
skip_mode decoration is much more convenient.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-25 14:56:37 +03:00
Kamil Braun
69bf962522 Merge 'allow changing snitch with topology over raft' from Gleb
Fixes scylladb/scylladb#17513

* 'gleb/raft-snitch-change-v3' of github.com:scylladb/scylla-dev:
  doc: amend snitch changing procedure to work with raft
  test: add test to check that snitch change takes effect.
  raft topology: update rack/dc info in topology state on reboot if changed
2024-03-25 10:41:39 +01:00
Gleb Natapov
d7adf26a56 test: add test to check that snitch change takes effect.
The test creates two node cluster with default snitch (SimpleSnitch) and
checks that dc and rack names are as expected. Then it changes the
config to use GossipingPropertyFileSnitch with different names, restart
nodes and check that now peers table has new names.
2024-03-25 10:41:49 +02:00
Raphael S. Carvalho
6bdb456fad sstables_loader: Fix loader when write selector is previous during tablet migration
The loader is writing to pending replica even when write selector is set
to previous. If migration is reverted, then the writes won't be rolled
back as it assumes pending replicas weren't written to yet. That can
cause data resurrection if tablet is later migrated back into the same
replica.

NOTE: write selector is handled correctly when set to next, because
get_natural_endpoints() will return the next replica set, and none
of the replicas will be considered leaving. And of course, selector
set to both is also handled correctly.

Fixes #17892.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17902
2024-03-24 01:20:50 +01:00
Kamil Braun
230f23004b Revert "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
This reverts commit b4144d14c6.

The test is flaky and blocks next promotions.
2024-03-22 17:25:04 +01:00
Petr Gusev
2a5f5d1948 test_fencing: fix flakiness
To cause the stale topology exception the test reads
the version from the last bootstrapped host and assigns its decremented
value to version and fence_version fields of system.topology.
The test assumes that version == fence_version here, if version
is greater than fence_version we won't get state topology
exception in this setup. Tablet balancer can break
this -- it may increment the version after the last node is
bootstrapped.

Fix this by disabling the tablet balancer earlier.

fixes scylladb/scylladb#17807

Closes scylladb/scylladb#17940
2024-03-22 12:49:13 +01:00