Commit Graph

5235 Commits

Author SHA1 Message Date
Petr Gusev
3737bf8fa2 topology.cc: unindex_node: _dc_racks removal fix
The eps reference was reused to manipulate
the racks dictionary. This resulted in
assigning a set of nodes from the racks
dictionary to an element of the _dc_endpoints dictionary.

The problem was demonstrated by the dtest
test_decommission_last_node_in_rack
(scylladb/scylla-dtest#3299).
The test set up four nodes, three on one rack
and one on another, all within a single data
center (dc). It then switched to a
'network_topology_strategy' for one keyspace
and tried to decommission the single node
on the second rack. This decommission command failed
with the error message 'zero replica after the removal.'
This happened because unindex_node assigned
the empty list from the second rack
as a value for the single dc in
_dc_endpoints dictionary. As a result,
we got an empty node list for the single dc in
natural_endpoints_tracker::_all_endpoints,
node_count == 0 in data_center_endpoints,
_rf_left == 0, so
network_topology_strategy::calculate_natural_endpoints
rejected all the endpoints and returned an empty
endpoint_set. In
repair_service::do_decommission_removenode_with_repair
this caused the 'zero replica after the removal' error.

With this fix the test passes both with
--consistent-cluster-management option and
without it.

The specific unit test for this problem was added.

Fixes: #14184

Closes #14673
2023-07-13 11:16:01 +03:00
Kamil Braun
9d4b3c6036 test: use correct timestamp resolution in test_group0_history_clearing_old_entries
In 10c1f1dc80 I fixed
`make_group0_history_state_id_mutation` to use correct timestamp
resolution (microseconds instead of milliseconds) which was supposed to
fix the flakiness of `test_group0_history_clearing_old_entries`.

Unfortunately, the test is still flaky, although now it's failing at a
later step -- this is because I was sloppy and I didn't adjust this
second part of the test to also use microsecond resolution. The test is
counting the number of entries in the `system.group0_history` table that
are older than a certain timestamp, but it's doing the counting using
millisecond resolution, causing it to give results that are off by one
sometimes.

Fix it by using microseconds everywhere.
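
The off-by-one can be reproduced with a minimal sketch. This is an illustration only, not the test's code: the timestamps and the helper names `count_older_us` and `count_older_ms` are made up for the example.

```python
# Hypothetical data: entry timestamps stored with microsecond resolution.
entries_us = [1_000_100, 1_000_900, 1_001_500]
cutoff_us = 1_000_500


def count_older_us(entries, cutoff):
    """Count entries strictly older than the cutoff, comparing microseconds."""
    return sum(1 for ts in entries if ts < cutoff)


def count_older_ms(entries, cutoff):
    """Same count, but after truncating both sides to milliseconds."""
    return sum(1 for ts in entries if ts // 1000 < cutoff // 1000)


print(count_older_us(entries_us, cutoff_us))  # 1
print(count_older_ms(entries_us, cutoff_us))  # 0
```

The first entry is older than the cutoff, but both fall into the same millisecond, so the truncated comparison misses it.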

Fixes #14653

Closes #14670
2023-07-13 10:33:52 +03:00
Tomasz Grabiec
b7bc991aa1 Merge 'Fix test_node_isolation flakiness' from Kamil Braun
The test isolates a node and then connects to it through CQL.
The `connect()` step would often time out on ARM debug builds. This was
already dealt with in the past in the context of other tests: #11289.

The `ManagerClient.con_gen` function creates a connection in a way that
avoids the problem -- connection timeout settings are adjusted to
account for the slowness. Use it in this test to fix the flakiness.

At the same time, reduce the timeout used for the actual CQL request
(after the driver has already connected), because the test expects this
request to time out and waiting for 200 seconds here is just a waste of
time.

Closes #14663

* github.com:scylladb/scylladb:
  test: test_node_isolation: use `ManagerClient.con_gen` to create CQL connection
  test: manager_client: make `con_gen` for `ManagerClient.__init__` nonoptional
2023-07-12 16:36:54 +02:00
Asias He
d3034e0fab view_update_generator: Increase the registration_queue_size
When repair writes a sstable to disk, we check if the sstable needs view
update processing. If yes, the sstable will be placed into the staging
dir for processing, with the _registration_sem semaphore to prevent too
many pending unprocessed sstables.

We have seen multiple cases in the field where view update processing is
inefficient and way too slow, which blocks the base table repair from
finishing on time.

This patch increases the registration_queue_size to a bigger number to
mitigate the problem that slow view update processing blocks repair.

It is better to have a consistent base table + inconsistent view table
than inconsistent base table + inconsistent view table.

Currently, sstables in the staging dir are not compacted, so we cannot
raise the _registration_sem limit too far, to avoid accumulating
too many sstables.

The view_build_test.cc is updated to make the test pass.

Closes #14241
2023-07-12 15:51:35 +03:00
Kefu Chai
8f31f28446 build: cmake: add test/raft tests
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14656
2023-07-12 15:06:59 +03:00
Kamil Braun
820d7e9520 test: test_node_isolation: use ManagerClient.con_gen to create CQL connection
The test isolates a node and then connects to it through CQL.
The `connect()` step would often time out on ARM debug builds. This was
already dealt with in the past in the context of other tests: #11289.

The `ManagerClient.con_gen` function creates a connection in a way that
avoids the problem -- connection timeout settings are adjusted to
account for the slowness. Use it in this test to fix the flakiness.

At the same time, reduce the timeout used for the actual CQL request
(after the driver has already connected), because the test expects this
request to time out and waiting for 200 seconds here is just a waste of
time.
2023-07-12 12:34:02 +02:00
Kefu Chai
20c7b6057b test: silence the deprecation warning.
because `lw_shared_ptr::operator=(T&&)` was deprecated, we started to
get the following warning:

```
/home/kefu/dev/scylladb/test/boost/statement_restrictions_test.cc:394:41: warning: 'operator=' is deprecated: call make_lw_shared<> and assign the result instead [-Wdeprecated-declarations]
  394 |         definition.column_specification = std::move(specification);
      |                                         ^
/home/kefu/dev/scylladb/seastar/include/seastar/core/shared_ptr.hh:346:7: note: 'operator=' has been explicitly marked deprecated here
  346 |     [[deprecated("call make_lw_shared<> and assign the result instead")]]
      |       ^
1 warning generated.
```

so, in this change, we use the recommended way to update a lw_shared_ptr.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14648
2023-07-12 13:10:33 +03:00
Kamil Braun
3464877276 test: manager_client: make con_gen for ManagerClient.__init__ nonoptional
`ManagerClient` is given a function that is used to create CQL
connections to the Scylla cluster. For some reason it was typed as
`Optional` even though it was never passed `None`. Fix it.
2023-07-12 11:44:15 +02:00
Nadav Har'El
a4087f58df alternator: fix error path for size() function on constants
The DynamoDB documentation for the size() function claims that it only
works on paths (attribute names or references), but it actually works on
constants from the query (e.g., ":val") as well.

It turns out that Alternator supports this undocumented case already, but
gets the error path wrong: Usually, when size() is calculated on the data,
if the data has the wrong type for size() (e.g., an integer), the condition
simply doesn't match. But if the value comes from the query - it should
generate an error that the query is wrong - ValidationException.

This patch fixes this case, and also adds tests for it that pass on both
DynamoDB and Alternator (after this patch).
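
The two error paths described above can be sketched as follows. This is a simplified illustration, not Alternator's implementation: `size_of`, the `from_query` flag, the local `ValidationException` class and the set of types treated as having a size are all assumptions made for the example.

```python
class ValidationException(Exception):
    """Stand-in for the ValidationException error returned to the client."""


def size_of(value, from_query):
    """Return the size of a value, or None if the condition cannot match.

    `from_query` is True when the value is a constant from the query,
    such as ":val", rather than an attribute read from stored data.
    """
    if isinstance(value, (str, bytes, list, set, dict)):
        return len(value)
    if from_query:
        # The query itself is wrong, so the caller must be told.
        raise ValidationException("size() applied to a constant of unsupported type")
    # Stored data of the wrong type: the condition simply does not match.
    return None
```

With this shape, `size_of(7, from_query=False)` quietly yields no match, while `size_of(7, from_query=True)` raises.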

Fixes #14592

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14593
2023-07-12 12:29:05 +03:00
Nadav Har'El
f08bc83cb2 cql-pytest: translate Cassandra's tests for CAST operations
This is a translation of Cassandra's CQL unit test source file
functions/CastFctsTest.java into our cql-pytest framework.

There are 13 tests, 9 of them currently xfail.

The failures are caused by one recently-discovered issue:

Refs #14501: Cannot Cast Counter To Double

and by three previously unknown or undocumented issues:

Refs #14508: SELECT CAST column names should match Cassandra's
Refs #14518: CAST from timestamp to string not same as Cassandra on zero
             milliseconds
Refs #14522: Support CAST function not only in SELECT

Curiously, the careful translation of this test also caused me to
find a bug in Cassandra https://issues.apache.org/jira/browse/CASSANDRA-18647
which the test in Java missed because it made the same mistake as the
implementation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14528
2023-07-12 11:42:04 +03:00
Nadav Har'El
599636b307 test/alternator: fix flaky test test_ttl_expiration_gsi_lsi
The Alternator test test_ttl.py::test_ttl_expiration_gsi_lsi was flaky.
The test incorrectly assumes that when we write an already expired item,
it will be visible for a short time until being deleted by the TTL thread.
But this doesn't need to be true - if the test is slow enough, it may go
look for the item after it was already expired!

So we fix this test by splitting it into two parts - in the first part
we write a non-expiring item, and notice it eventually appears in the
GSI, LSI, and base-table. Then we write the same item again, with an
expiration time - and now it should eventually disappear from the GSI,
LSI and base-table.

This patch also fixes a small bug which prevented this test from running
on DynamoDB.

Fixes #14495

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14496
2023-07-12 11:23:12 +03:00
Botond Dénes
968421a3e0 Merge 'Stop task manager compaction module properly' from Aleksandra Martyniuk
Due to wrong order of stopping of compaction services, shutdown needs
to wait until all compactions are complete, which may take really long.

Moreover, test version of compaction manager does not abort task manager,
which is strictly bounded to it, but stops its compaction module. This results
in tests waiting for compaction task manager's tasks to be unregistered,
which never happens.

Stopping and aborting of compaction manager and task manager's compaction
module are performed in a proper order.

Closes #14461

* github.com:scylladb/scylladb:
  tasks: test: abort task manager when wrapped_compaction_manager is destructed
  compaction: swap compaction manager stopping order
  compaction: modify compaction_manager::stop()
2023-07-12 09:54:00 +03:00
Botond Dénes
c5cb23a825 Merge 'Add scylla table to scylla-gdb' from Pavel Emelyanov
The command is to print interesting and/or hard-to-get-by-hand info about individual tables

Closes #14635

* github.com:scylladb/scylladb:
  test: Add 'scylla table' cmd test
  scylla-gdb: Print table phased barriers
  scylla-gdb: Add 'table' command
2023-07-12 09:26:59 +03:00
Kamil Braun
dc6f6cb6b0 cql_test_env: load host ID from sstables after restart
Performance tests such as `perf-fast-forward` are executed in our CI
environments in two steps (two invocations of the `scylla` process):
first by populating data directories (with `--populate` option), then by
running the actual test.

These tests are using `cql_test_env`, which did not load the previously
saved (in the populate step) Host ID of this node, but generated a new
one randomly instead.

In b39ca97919 we enabled
`consistent_cluster_management` by default. This caused the perf tests
to hang in `setup_group0` at `read_barrier` step. That's because Raft
group 0 was initialized with old configuration -- the one created during
the populate step -- but the Raft server was started with a newly
generated Host ID (which is used as the server's Raft ID), so the server
considered itself as being outside the configuration.

Fix this by reloading the Host ID from disk, simulating more closely the
behavior of main.cc initialization.

Fixes #14599

Closes #14640
2023-07-11 23:30:44 +03:00
Avi Kivity
1545ae2d3b Merge 'Make SSTable cleanup more efficient by fast forwarding to next owned range' from Raphael "Raph" Carvalho
Today, SSTable cleanup skips to the next partition, one at a time, when it finds that the current partition is no longer owned by this node.

That's very inefficient because when a cluster is growing in size, existing nodes lose multiple sequential tokens in their owned ranges. Another inefficiency comes from fetching index pages spanning all unowned tokens, which was described in https://github.com/scylladb/scylladb/issues/14317.

To solve both problems, cleanup will now use a multi-range reader to guarantee that it only processes the owned data and, as a result, skips unowned data. Cleanup therefore scans an owned range and then fast forwards to the next one, until it is done with them all. This significantly reduces the amount of index data that gets cached, as the index is now only consulted at each range boundary.
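
The difference between the two approaches can be sketched with a toy model. It assumes an sstable is just a sorted list of integer tokens and that owned ranges are sorted, non-overlapping, inclusive pairs; the function names are hypothetical and do not correspond to the actual reader code.

```python
from bisect import bisect_left


def cleanup_partition_by_partition(tokens, owned_ranges):
    """Old approach: visit every partition and test ownership one at a time."""
    kept, visited = [], 0
    for t in tokens:
        visited += 1
        if any(lo <= t <= hi for lo, hi in owned_ranges):
            kept.append(t)
    return kept, visited


def cleanup_by_owned_range(tokens, owned_ranges):
    """New approach: seek to each owned range and read only until it ends."""
    kept, visited = [], 0
    for lo, hi in owned_ranges:
        i = bisect_left(tokens, lo)  # "fast forward" to the start of the range
        while i < len(tokens) and tokens[i] <= hi:
            kept.append(tokens[i])
            visited += 1
            i += 1
    return kept, visited


tokens = list(range(100))       # sorted partition tokens in one sstable
owned = [(10, 19), (60, 64)]    # ranges this node still owns

kept_old, visited_old = cleanup_partition_by_partition(tokens, owned)
kept_new, visited_new = cleanup_by_owned_range(tokens, owned)
print(kept_old == kept_new, visited_old, visited_new)  # True 100 15
```

Both functions keep the same partitions, but the second one only touches the data it is going to keep.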

Without further ado,

before:

`INFO  2023-07-01 07:10:26,281 [shard 0] compaction - [Cleanup keyspace2.standard1 701af580-17f7-11ee-8b85-a479a1a77573] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s8o_06uww24drzrroaodpv-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.`

after:

`INFO  2023-07-01 07:07:52,354 [shard 0] compaction - [Cleanup keyspace2.standard1 199dff90-17f7-11ee-b592-b4f5d81717b9] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s4m_5hehd2rejj8w15d2nt-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.`

Fixes #12998.
Fixes #14317.

Closes #14469

* github.com:scylladb/scylladb:
  test: Extend cleanup correctness test to cover more cases
  compaction: Make SSTable cleanup more efficient by fast forwarding to next owned range
  sstables: Close SSTable reader if index exhaustion is detected in fast forward call
  sstables: Simplify sstable reader initialization
  compaction: Extend make_sstable_reader() interface to work with mutation_source
  test: Extend sstable partition skipping test to cover fast forward using token
2023-07-11 23:28:15 +03:00
Avi Kivity
9cdae78d04 test: expr_test: add copyright/license
Closes #14613
2023-07-11 21:45:27 +03:00
Raphael S. Carvalho
60ba1d8b47 test: Extend cleanup correctness test to cover more cases
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-07-11 13:56:24 -03:00
Michał Chojnowski
b511d57fc8 Revert "Merge 'Compaction resharding tasks' from Aleksandra Martyniuk"
This reverts commit 2a58b4a39a, reversing
changes made to dd63169077.

After patch 87c8d63b7a,
table_resharding_compaction_task_impl::run() performs the forbidden
action of copying a lw_shared_ptr (_owned_ranges_ptr) on a remote shard,
which is a data race that can cause a use-after-free, typically manifesting
as allocator corruption.

Note: before the bad patch, this was avoided by copying the _contents_ of the
lw_shared_ptr into a new, local lw_shared_ptr.

Fixes #14475
Fixes #14618

Closes #14641
2023-07-11 19:11:37 +03:00
Pavel Emelyanov
5518502085 test: Add 'scylla table' cmd test
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-07-11 15:12:43 +03:00
Kefu Chai
ef78b31b43 s3/client: add tagging ops
with tagging ops, we will be able to attach kv pairs to an object.
this will allow us to mark sstable components with taggings, and
filter them based on them.

* test/pylib/minio_server.py: enable anonymous user to perform
  more actions. because the tagging related ops are not enabled by
  "mc anonymous set public", we have to enable them using "set-json"
  subcommand.
* utils/s3/client: add methods to manipulate taggings.
* test/boost/s3_test: add a simple test accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14486
2023-07-11 09:30:46 +03:00
Kefu Chai
3b6e37051b build: cmake: add more tests to CMake
to be in-sync with configure.py

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14479
2023-07-11 09:21:26 +03:00
Avi Kivity
0cabf4eeb9 build: disable implicit fallthrough
Prevent switch case statements from falling through without annotation
([[fallthrough]]) proving that this was intended.

Existing intended cases were annotated.

Closes #14607
2023-07-10 19:36:06 +02:00
Tomasz Grabiec
65a5942ec0 Merge 'Fix bootstrap "wait for UP/NORMAL nodes" to handle ignored nodes, recently replaced nodes, and recently changed IPs' from Kamil Braun
Before this PR, the `wait_for_normal_state_handled_on_boot` would
wait for a static set of nodes (`sync_nodes`), calculated using the
`get_nodes_to_sync_with` function and `parse_node_list`; the latter was
used to obtain a list of "nodes to ignore" (for replace operation) and
translate them, using `token_metadata`, from IP addresses to Host IDs
and vice versa. `sync_nodes` was also used in `_gossiper.wait_alive` call
which we do after `wait_for_normal_state_handled_on_boot`.

Recently we started doing these calculations and this wait very early in
the boot procedure - immediately after we start gossiping
(50e8ec77c6).

Unfortunately, as always with gossiper, there are complications.
In #14468 and #14487 two problems were detected:
- Gossiper may contain obsolete entries for nodes which were recently
  replaced or changed their IPs. These entries are still using status
  `NORMAL` or `shutdown` (which is treated like `NORMAL`, e.g.
  `handle_state_normal` is also called for it). The
  `_gossiper.wait_alive` call would wait for those entries too and
  eventually time out.
- Furthermore, by the time we call `parse_node_list`, `token_metadata`
  may not be populated yet, which is required to do the IP<->Host ID
  translations -- and populating `token_metadata` happens inside
  `handle_state_normal`, so we have a chicken-and-egg problem here.

It turns out that we don't need to calculate `sync_nodes` (and
hence `ignore_nodes`) in order to wait for NORMAL state handlers. We
can wait for handlers to finish for *any* `NORMAL`/`shutdown` entries
appearing in gossiper, even those that correspond to dead/ignored
nodes and obsolete IPs.  `handle_state_normal` is called, and
eventually finishes, for all of them.
`wait_for_normal_state_handled_on_boot` no longer receives a set of
nodes as parameter and is modified appropriately, it's now calculating
the necessary set of nodes on each retry (the set may shrink while
we're waiting, e.g. because an entry corresponding to a node that was
replaced is garbage-collected from gossiper state).

Thanks to this, we can now put the `sync_nodes` calculation (which is
still necessary for `_gossiper.wait_alive`), and hence the
`parse_node_list` call, *after* we wait for NORMAL state handlers,
solving the chicken-and-egg problem.

This addresses the immediate failure described in #14487, but the test
would still fail. That's because `_gossiper.wait_alive` may still receive
a too large set of nodes -- we may still include obsolete IPs or entries
corresponding to replaced nodes in the `sync_nodes` set.

We need a better way to calculate `sync_nodes`, one which detects and ignores
obsolete IPs and nodes that are already gone but just weren't
garbage-collected from gossiper state yet.

In fact such a method was already introduced in the past:
ca61d88764
but it wasn't used everywhere. There, we use `token_metadata` in which
collisions between Host IDs and tokens are resolved, so it contains only
entries that correspond to the "real" current set of NORMAL nodes.

We use this method to calculate the set of nodes passed to
`_gossiper.wait_alive`.

We also introduce regression tests with necessary extensions
to the test framework.

Fixes #14468
Fixes #14487

Closes #14507

* github.com:scylladb/scylladb:
  test: rename `test_topology_ip.py` to `test_replace.py`
  test: test bootstrap after IP change
  test: scylla_cluster: return the new IP from `change_ip` API
  test: node replace with `ignore_dead_nodes` test
  test: scylla_cluster: accept `ignore_dead_nodes` in `ReplaceConfig`
  storage_service: remove `get_nodes_to_sync_with`
  storage_service: use `token_metadata` to calculate nodes waited for to be UP
  storage_service: don't calculate `ignore_nodes` before waiting for normal handlers
2023-07-10 00:28:20 +02:00
Avi Kivity
850d759fd9 Merge 'repair: optimise repair reader with different shard count' from Gusev Petr
Consider a cluster with no data, e.g. in tests. When a new node is bootstrapped with repair we iterate over all (shard, table, range), read data from all the peer nodes for the range, look for any discrepancies and heal them. Even for small num_tokens (16 in the tests) the number of affected ranges (those we need to consider) amounts to the total number of tokens in the cluster, which is 32 for the second node and 48 for the third. Multiplying this by the number of shards and the number of tables in each keyspace gives thousands of ranges. For each of them we need to follow some row level repair protocol, which includes several RPC exchanges between the peer nodes and creating some data structures on them. These exchanges are processed sequentially for each shard; there are `parallel_for_each` calls in the code, but they are throttled by the chosen memory constraints and in fact execute sequentially.

When the bootstrapping node (master) reaches a peer node and asks for data in the specific range and master shard, two options exist. If sharder parameters (primarily, `--smp`) are the same on the master and on the peer, we can just read one local shard, this is fast. If, on the other hand, `--smp` is different, we need to do a multishard query. The given range from the master can contain data from different peer shards, so we split this range into a number of subranges such that each of them contains data only from the given master shard (`dht::selective_token_range_sharder`). The number of these subranges can be quite big (300 in the tests). For each of these subranges we do `fast_forward_to` on the `multishard_reader`, and this incurs a lot of overhead, mainly because of `smp::submit_to`.

In this series we optimize this case. Instead of splitting the master range and reading only what's needed, we read all the data in the range and then apply the filter by the master shard. We do this if the estimated number of partitions is small (<=100).
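
The read-then-filter idea can be sketched as below. This is a toy model under stated assumptions: `shard_of` is a made-up modulo sharder, not the real token-to-shard mapping, and `choose_strategy` only mirrors the "<=100 estimated partitions" rule quoted above.

```python
def shard_of(token, shard_count):
    """Toy sharder: a stand-in for the real token-to-shard mapping."""
    return token % shard_count


def read_with_filter(partitions, master_shard, master_shard_count):
    """Read the whole range once, then keep only what the master shard owns."""
    return [t for t in partitions if shard_of(t, master_shard_count) == master_shard]


def choose_strategy(estimated_partitions, threshold=100):
    """Filter when the range is small, otherwise split it into subranges."""
    return "filter" if estimated_partitions <= threshold else "split"


print(read_with_filter(range(10), master_shard=1, master_shard_count=4))  # [1, 5, 9]
print(choose_strategy(100), choose_strategy(101))  # filter split
```

Filtering reads some data that is then thrown away, which is why it only pays off when the range holds few partitions.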

These are the logs of starting a second node with `--smp 4`; the first node was started with `--smp 3`:

```
with this patch
    20:58:49.644 INFO> [debug/topology_custom.test_topology_smp.1] starting server at host 127.222.46.3 in scylla-2...
    20:59:22.713 INFO> [debug/topology_custom.test_topology_smp.1] started server at host 127.222.46.3 in scylla-2, pid 1132859

without this patch
    21:04:06.424 INFO> [debug/topology_custom.test_topology_smp.1] starting server at host 127.181.31.3 in scylla-2...
    21:06:01.287 INFO> [debug/topology_custom.test_topology_smp.1] started server at host 127.181.31.3 in scylla-2, pid 1134140
```

Fixes: #14093

Closes #14178

* github.com:scylladb/scylladb:
  repair_test: add test_reader_with_different_strategies
  repair: extract repair_reader declaration into reader.hh
  repair_meta: get_estimated_partitions fix
  repair_meta: use multishard_filter reader if the number of partitions is small
  repair_meta: delay _repair_reader creation
  database.hh: make_multishard_streaming_reader with range parameter
  database.cc: extract streaming_reader_lifecycle_policy
2023-07-09 23:21:06 +03:00
Aleksandra Martyniuk
23e3251fc3 tasks: test: abort task manager when wrapped_compaction_manager is destructed
When task manager is not aborted, the tasks are stored in the memory,
not allowing the tasks' gate to be closed.

When wrapped_compaction_manager is destructed, task manager gets
aborted, so that the system can shut down.
2023-07-09 12:08:32 +02:00
Michał Chojnowski
c41f0ebd2a test: mutation_test: unflake test_external_memory_usage
The test has about 1/2500000 chance to fail due to a conflict of random
values. And it recently did, just to spite us.

Fight back.

Fixes #14563

Closes #14576
2023-07-08 15:20:25 +03:00
Kamil Braun
de7f668441 Merge 'raft topology: send cdc generation data in parts' from Mikołaj Grzebieluch
The CDC generation data can be large and not fit in a single command.
This PR splits it into multiple mutations by smartly picking a
`mutation_size_threshold` and sending each mutation as a separate group
0 command.

Commands are sent sequentially to avoid concurrency problems.
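
Splitting by a size threshold can be sketched with a simple greedy batcher. This is an assumption-laden illustration rather than the code in this series: `split_by_size` is a hypothetical helper, sizes are plain byte lengths, and how the real `mutation_size_threshold` is derived is not modelled here.

```python
def split_by_size(items, size_of, threshold):
    """Greedily pack items into batches whose total size stays within threshold.

    An item that is larger than the threshold on its own still gets a batch,
    so a real implementation would need to handle that case separately.
    """
    batches, current, current_size = [], [], 0
    for item in items:
        size = size_of(item)
        if current and current_size + size > threshold:
            batches.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += size
    if current:
        batches.append(current)
    return batches


batches = split_by_size([b"aaaa", b"bb", b"ccc", b"d"], len, threshold=5)
print(batches)  # [[b'aaaa'], [b'bb', b'ccc'], [b'd']]
```

Each batch would then be sent as its own command, one after another.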

Topology snapshots contain only mutation of current CDC generation data
but don't contain any previous or future generations. If a new
generation of data is being broadcasted but hasn't been entirely applied
yet, the applied part won't be sent in a snapshot. New or delayed nodes
can never get the applied part in this scenario.

Send the entire cdc_generations_v3 table in the snapshot to resolve this
problem.

A mechanism to remove old CDC generations will be introduced as a
follow-up.

Closes #13962

* github.com:scylladb/scylladb:
  test: raft topology: test `prepare_and_broadcast_cdc_generation_data`
  service: raft topology: print warning in case of `raft::commit_status_unknown` exception in topology coordinator loop
  raft topology: introduce `prepare_and_broadcast_cdc_generation_data`
  raft: add release_guard
  raft: group0_state_machine::merger take state_id as the maximal value from all merged commands
  raft topology: include entire cdc_generations_v3 table in cdc_generation_mutations snapshot
  raft topology: make `mutation_size_threshold` depends on `max_command_size`
  raft: reduce max batch size of raft commands and raft entries
  raft: add description argument to add_entry_unguarded
  raft: introduce `write_mutations` command
  raft: refactor `topology_change` applying
2023-07-07 16:31:29 +02:00
Kamil Braun
f9cfd7e4f5 Merge 'raft: do not ping self in direct failure detector' from Konstantin Osipov
Avoid pinging self in the direct failure detector; doing so adds confusing noise and constant overhead.
Fixes #14388

Closes #14558

* github.com:scylladb/scylladb:
  direct_fd: do not ping self
  raft: initialize raft_group_registry with host id early
  raft: code cleanup
2023-07-07 14:26:17 +02:00
Mikołaj Grzebieluch
4e3c97d8d4 test: raft topology: test prepare_and_broadcast_cdc_generation_data
This test limits `commitlog_segment_size_in_mb` to 2, thus `max_command_size`
is limited to less than 1 MB. It adds an injection which copies mutations
generated by `get_cdc_generation_mutations` n times, where n is picked so that
the memory size of all mutations exceeds `max_command_size`.

This test passes if cdc generation data is committed by raft in multiple commands.
If all the data is committed in a single command, the leader node will loop trying
to send raft command and getting the error:
```
storage_service - raft topology: topology change coordinator fiber got error raft::command_is_too_big_error (Command size {} is greater than the configured limit {})
```
2023-07-07 13:56:35 +02:00
Mikołaj Grzebieluch
6dad582796 raft: reduce max batch size of raft commands and raft entries
For now, `raft_sys_table_storage::_max_mutation_size` equals `max_mutation_size`
(half of the commitlog segment size), so with some additional information, it
can exceed this threshold resulting in throwing an exception when writing
mutation to the commitlog.

A batch of raft commands has the size at most `group0_state_machine::merger::max_command_size`
(half of the commitlog segment size). It doesn't have additional metadata, but
it may have a size of exactly `max_mutation_size`. It shouldn't cause any trouble,
but it is preferable to be careful.

Make `raft_sys_table_storage::_max_mutation_size` and
`group0_state_machine::merger::max_command_size` more strict to leave space
for metadata.

Fixed typo "1204" => "1024".
2023-07-07 13:11:52 +02:00
Gleb Natapov
4f23eec44f Rename experimental raft feature to consistent-topology-changes
Make the name more descriptive

Fixes #14145

Message-Id: <ZKQ2wR3qiVqJpZOW@scylladb.com>
2023-07-07 11:08:10 +02:00
Konstantin Osipov
50140980ac raft: initialize raft_group_registry with host id early
Earlier, when local query processor wasn't available at
the beginning of system start, we couldn't query our own
host id when initializing the raft group registry. The local
host id is needed by the registry since it is responsible
for routing RPC messages to specific raft groups, and needs
to reject messages destined for a different host.

Now that the host id is known early at boot, remove the optional
and pass host id in the constructor. Resolves an earlier fixme.
2023-07-06 20:54:05 +03:00
Kamil Braun
0d437a7d63 Merge 'utils: error injection: add inject_with_handler for interactions with injected code' from Mikołaj Grzebieluch
Currently, it is hard for injected code to wait for some events, for example, requests on some REST endpoint.

This PR adds the `inject_with_handler` method that executes injected function and passes `injection_handler` as its argument.
The `injection_handler` class is used to wait for events inside the injected code.
The `error_injection` class can notify the injection's handler or handlers associated with the injection on all shards about the received message.

Closes #14357.

Closes #14460

* github.com:scylladb/scylladb:
  tests: introduce InjectionHandler class for communicating with injected code
  api/error_injection: add message_injection endpoint
  tests: utils: error injections: add test for inject_with_handler
  utils: error injection: add inject_with_handler for interactions with injected code
  utils: error injection: create structure for error injections data
2023-07-06 18:16:51 +02:00
Mikołaj Grzebieluch
907c0e8900 tests: introduce InjectionHandler class for communicating with injected code
Add a client for sending empty messages to the injected code from tests.
2023-07-06 12:34:53 +02:00
Mikołaj Grzebieluch
7e5c42af0a tests: utils: error injections: add test for inject_with_handler
Add a test checking the correctness of the `inject_with_handler` method
in presence of concurrency.
2023-07-06 12:34:53 +02:00
Kamil Braun
431a8f8591 test: rename test_topology_ip.py to test_replace.py
No idea why it was named like that before.
2023-07-06 10:24:46 +02:00
Kamil Braun
452d9a3c77 test: test bootstrap after IP change
Regression test for #14468.
2023-07-06 10:24:46 +02:00
Kamil Braun
2032d7dbe4 test: scylla_cluster: return the new IP from change_ip API
Also simplify the API by getting rid of `ActionReturn` and returning
errors through exceptions (which are correctly forwarded to the client
for some time already).
2023-07-06 10:24:46 +02:00
Kamil Braun
00f51ea753 test: node replace with ignore_dead_nodes test
Regression test for #14487 on steroids. It performs 3 consecutive node
replace operations, starting with 3 dead nodes.

In order to have a Raft majority, we have to boot a 7-node cluster, so
we enable this test only in one mode; the choice was between `dev` and
`release`, I picked `dev` because it compiles faster and I develop on
it.
2023-07-06 10:24:46 +02:00
Kamil Braun
9b136ee574 test: scylla_cluster: accept ignore_dead_nodes in ReplaceConfig
2023-07-06 10:24:46 +02:00
Tomasz Grabiec
c25201c1a3 Merge 'view: fix range tombstone handling on flushes in view_updating_consumer' from Michał Chojnowski
View update routines accept `mutation` objects.
But what comes out of staging sstable readers is a stream of mutation_fragment_v2 objects.
To build view updates after a repair/streaming, we have to convert the fragment stream into `mutation`s. This is done by piping the stream to mutation_rebuilder_v2.

To keep memory usage limited, the stream for a single partition might have to be split into multiple partial `mutation` objects. view_update_consumer does that, but in an improper way -- when the split/flush happens inside an active range tombstone, the range tombstone isn't closed properly. This is illegal, and triggers an internal error.

This patch fixes the problem by closing the active range tombstone (and reopening in the same position in the next `mutation` object).

The tombstone is closed just after the last seen clustered position. This is not necessary for correctness -- for example we could delay all processing of the range tombstone until we see its end bound -- but it seems like the most natural semantic.
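
The close-and-reopen behaviour can be sketched on a toy fragment stream. Everything here is a simplification for illustration: fragments are `(kind, position)` tuples, the buffer limit counts fragments instead of bytes, the tombstone is closed at the last seen position rather than just after it, and `split_with_tombstones` is a hypothetical name.

```python
def split_with_tombstones(fragments, limit):
    """Split a fragment stream into chunks of at most `limit` input fragments.

    If a range tombstone is open when a chunk is flushed, it is closed at the
    last seen position and reopened at that position in the next chunk.
    """
    chunks, current, count = [], [], 0
    active = False    # is a range tombstone currently open?
    last_pos = None   # last position seen in the stream
    for kind, pos in fragments:
        if count == limit:
            if active:
                current.append(("rt_close", last_pos))
            chunks.append(current)
            current, count = [], 0
            if active:
                current.append(("rt_open", last_pos))
        current.append((kind, pos))
        count += 1
        last_pos = pos
        if kind == "rt_open":
            active = True
        elif kind == "rt_close":
            active = False
    if current:
        chunks.append(current)
    return chunks


stream = [("rt_open", 0), ("row", 1), ("row", 2), ("row", 3), ("rt_close", 4)]
for chunk in split_with_tombstones(stream, limit=2):
    print(chunk)
```

Every chunk produced this way has matching open and close markers, so each partial object is well-formed on its own.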

Fixes https://github.com/scylladb/scylladb/issues/14503

Closes #14502

* github.com:scylladb/scylladb:
  test: view_build_test: add range tombstones to test_view_update_generator_buffering
  test: view_build_test: add test_view_udate_generator_buffering_with_random_mutations
  view_updating_consumer: make buffer limit a variable
  view: fix range tombstone handling on flushes in view_updating_consumer
2023-07-05 21:21:43 +02:00
Michał Chojnowski
f6203f2bd4 test: view_build_test: add range tombstones to test_view_update_generator_buffering
This patch adds a full-range tombstone to the compacted mutation.
This raises the coverage of the test. In particular, it reproduces
issue #14503, which should have been caught by this test, but wasn't.
2023-07-05 17:33:49 +02:00
Michał Chojnowski
aab10402ce test: view_build_test: add test_view_udate_generator_buffering_with_random_mutations
A random mutation test for view_updating_consumer's buffering logic.
Reproduces #14503.
2023-07-05 17:33:49 +02:00
Michał Chojnowski
ac29b6f198 view_updating_consumer: make buffer limit a variable
The limit doesn't change at runtime, but this patch makes it a variable for
unit testing purposes.
2023-07-05 17:33:47 +02:00
Raphael S. Carvalho
5d34db2532 test: Extend sstable partition skipping test to cover fast forward using token
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-07-05 11:38:58 -03:00
Pavel Emelyanov
e91f95a629 Merge 's3/test: restructure object_store test into a pytest based test suite' from Kefu Chai
in this series, test/object_storage is restructured into a pytest based test. this paves the road to a test suite that covers more use cases, so we can add some more lower-level tests for tiered/caching-store.

Closes #14165

* github.com:scylladb/scylladb:
  s3/test: do not return ip in managed_cluster()
  s3/test: verify the behavior with asserts
  s3/test: restructure object_store/run into a pytest
  s3/test: extract get_scylla_with_s3_cmd() out
  s3/test: s/restart_with_dir/kill_with_dir/
  s3/test: vendor run_with_dir() and friends
  s3/test: remove get_tempdir()
  s3/test: extract managed_cluster() out
2023-07-05 15:40:43 +03:00
Kefu Chai
9080f8842b s3/test: do not return ip in managed_cluster()
let's just use cluster.contact_points for retrieving the IP address
of the scylla node in this single-node cluster. so the name of
managed_cluster() is less weird.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 17:07:39 +08:00
Kefu Chai
ec6410653f s3/test: verify the behavior with asserts
instead of assigning to "success", let's use assert for this purpose.
simpler this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 17:07:21 +08:00
Kefu Chai
471d75c6c6 s3/test: restructure object_store/run into a pytest
instead of using a single run to perform the test, restructure
it into a pytest based test suite with a single test case.
this should allow us to add more tests exercising the object-storage
and cached/tiered storage in the future.

* add fixtures so they can be reused by tests
* use tmpdir fixture for managing the tmpdir, see
  https://docs.pytest.org/en/6.2.x/tmpdir.html#the-tmpdir-fixture
* perform part of the teardown in the "test_tempdir()" fixture
* change the type of test from "Run" to "Python"
* rename "run" to "test_basic.py"
* optionally start the minio server if the settings are not
  found in command line or env variables, so that the tests are
  self-contained without the fixture setup by test.py.
* instead of sys.exit(), use assert statement, as this is
  what pytest uses.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-05 17:05:13 +08:00
Petr Gusev
b69bc97673 repair_test: add test_reader_with_different_strategies
2023-07-05 13:02:17 +04:00