
7773 Commits

Author SHA1 Message Date
Botond Dénes
19a43b5859 Merge 'repair: Reduce hints and batchlog flush' from Asias He
The hints and batchlog flush requests are issued to all nodes for each repair request when tombstone_gc repair mode is used.

The number of such flush requests is high when all nodes in the cluster run repair. A repair request has been observed to take a long time, up to 15s, to finish such a flush request.

To reduce overhead of the flush, each node caches the flush and only executes the real flush when some time has passed. It is safe to do so before the real flush_time is returned. Repair uses the smallest flush_time from peers as the repair time.

The nice thing about the cache on the receiver side is that all senders can hit the cache. It is better than cache on the sender side.

A flush_time slightly smaller than the real flush time will be used, with the benefit of significantly fewer hints and batchlog flushes. The tradeoff is reasonable.
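
For illustration only, a minimal sketch of the receiver-side caching idea described above. The `flush_cache` class, its `max_age` parameter and the callback are hypothetical names chosen for this example; they are not the code added by this series, and the real flush is asynchronous.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical receiver-side cache; illustrative names, not the repair code.
class flush_cache {
public:
    using clock = std::chrono::steady_clock;
    using time_point = clock::time_point;

    flush_cache(clock::duration max_age, std::function<void()> do_flush)
        : _max_age(max_age), _do_flush(std::move(do_flush)) {
        assert(_do_flush != nullptr);  // a flush callback is required
    }

    // Returns the time of the flush that covers this request. The real
    // flush runs only when the cached one is at least max_age old.
    time_point flush(time_point now) {
        if (!_last_flush || now - *_last_flush >= _max_age) {
            _do_flush();
            _last_flush = now;
        }
        return *_last_flush;
    }

private:
    clock::duration _max_age;
    std::function<void()> _do_flush;
    std::optional<time_point> _last_flush;
};
```

Every sender that arrives within `max_age` of the last real flush gets the cached (slightly older) flush time back, which is why the cache sits on the receiver rather than on each sender.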

Fixes #20259

Performance improvement. No backports.

Closes scylladb/scylladb#20260

* github.com:scylladb/scylladb:
  test/test_repair.py: Add test_batchlog_flush_in_repair
  repair: Reduce hints and batchlog flush
  db/batchlog_manager: Add add_delay_to_batch_replay
  db/batchlog_manager: Add get_last_replay
  db/batchlog_manager: wire in batchlog_replay_cleanup_after_replays
  db/config: introduce batchlog_replay_cleanup_after_replays
  db/batchlog_manager: do_batch_log_replay(): add cleanup flag
2024-11-01 14:23:27 +02:00
Pavel Emelyanov
292fd52a60 Merge 'utils: chunked_vector: various constructor improvements' from Avi Kivity
Optimize the various constructors a little, and add an std::from_range_t
constructor.

Minor improvement, so no backports.

Closes scylladb/scylladb#21399

* github.com:scylladb/scylladb:
  utils: chunked_vector: add from_range_t constructor
  utils: chunked_vector: optimize initializer_list constructor
  utils: chunked_vector: iterator constructor: copy spanwise
  utils: chunked_vector: reserve for forward iterators, not just random access iterators, on construction
2024-11-01 15:02:56 +03:00
Botond Dénes
0ee0dd3ef4 Merge 'Collect and report backup progress' from Pavel Emelyanov
The task manager GET /status method returns two counters that reflect task progress -- total and completed. To let the caller reason about their meaning, there is also a progress_units field next to those counters.

This patch implements this progress report for backup task. The units are bytes, the total counter is total size of files that are being uploaded, and the completed counter is total amount of bytes successfully sent with PUT requests. To get the counters, the client::upload_file() is extended to calculate those.
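
As a rough sketch of the accounting described above (the struct and function names here are assumptions made for this example, not the actual s3 client or task manager types):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical byte counters shared by all file uploads of one backup task.
struct upload_counters {
    std::size_t total = 0;     // sum of the sizes of the files being uploaded
    std::size_t uploaded = 0;  // bytes successfully sent with PUT requests
};

// Hypothetical shape of what a status query would report.
struct task_progress {
    double total;
    double completed;
    std::string units;
};

inline void file_started(upload_counters& c, std::size_t file_size) {
    c.total += file_size;
}

inline void part_uploaded(upload_counters& c, std::size_t bytes) {
    assert(c.uploaded + bytes <= c.total);  // cannot send more than scheduled
    c.uploaded += bytes;
}

inline task_progress get_progress(const upload_counters& c) {
    return {static_cast<double>(c.total), static_cast<double>(c.uploaded), "bytes"};
}
```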

fixes #20653

Closes scylladb/scylladb#21144

* github.com:scylladb/scylladb:
  backup_task: Report uploading progress
  s3/client: Account upload progress for real
  s3/client: Introduce upload_progress
  s3: Extract client_fwd.hh
2024-11-01 10:57:12 +02:00
Kefu Chai
64122b3df3 treewide: s/boost::transform/std::ranges::transform/
now that we are allowed to use C++23, we have the luxury of using
`std::ranges::transform`.

in this change, we:

- replace `boost::transform` with `std::ranges::transform`
- update affected code to work with `std::ranges::transform`

to reduce the dependency on boost for better maintainability, and
leverage standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21318
2024-11-01 08:15:14 +02:00
Nadav Har'El
ee2d75b088 Merge 'Generalize "breakpoint" type of error injection' from Pavel Emelyanov
The pattern is: if requested by a test, suspend code execution until the requestor (the test) explicitly wakes it up. For that, the injection point should inject a lambda that is called with a so-called "handler" and tries to read a message from the handler. In many cases the inner lambda additionally prints a message to the logs, which the test waits for to make sure the injection was hit. At the end of the day this "breakpoint" is injected like

```
    co_await inject("foo", [] (auto& handler) -> future<> {
        log.info("foo waiting");
        co_await handler.wait_for_message(timeout);
    });
```

This PR makes breakpoints shorter and more unified, like this

```
    co_await inject("foo", wait_for_message(timeout));
```

where `wait_for_message` is a wrapper structure used to pick new `inject()` overload.

Closes scylladb/scylladb#21342

* github.com:scylladb/scylladb:
  sstables: Use inject(wait_for_message_overload)
  treewide,error_injection: Use inject(wait_for_message) and fix tests
  treewide,error_injection: Use inject(wait_for_message) overload
  error_injection: Add inject() overload with wait_for_message wrapper
2024-10-31 21:56:27 +02:00
Avi Kivity
6a9852d47b utils: chunked_vector: add from_range_t constructor
std::ranges::to<> has a little protocol with containers. Implement it
to get optimized construction.

Similar to the iterator pair constructor, if the range's size can be
obtained (even with an O(N) algorithm), favor that to avoid reallocations.
Copy elements spanwise to promote optimization to memcpy when possible.
2024-10-31 19:32:16 +02:00
Kefu Chai
f8221b960f test: route S3 mock server messages through logger
The S3 mock server (introduced in 5a96549c) currently prints its status
messages directly to stdout, which can be distracting when reviewing test
results. For example:

```console
$ ./test.py --verbose --mode debug object_store/test_backup::test_simple_backup
Found 1 tests.
Starting S3 mock server on ('127.226.51.1', 2012)
================================================================================
[N/TOTAL]   SUITE    MODE   RESULT   TEST
------------------------------------------------------------------------------
[1/1]      object_store  debug  [ PASS ] object_store.test_backup.1 5.99s
Stopping S3 mock server
-------------------------
CPU utilization: 6.5%
```

Move these messages to use proper logging to give developers more control
over their visibility:

- Make logger parameter mandatory in MockS3Server constructor
- Route "Stopping S3 mock server" message through the provided logger
- Add --log-level option to the standalone mock server launcher

The message is now hidden:

```console
$ ./test.py --verbose --mode debug --save-log-on-success object_store/test_backup::test_simple_backup
Found 1 tests.
================================================================================
[N/TOTAL]   SUITE    MODE   RESULT   TEST
------------------------------------------------------------------------------

[1/1]      object_store  debug  [ PASS ] object_store.test_backup.1 6.25s
------------------------------------------------------------------------------
CPU utilization: 5.5%
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21384
2024-10-31 18:21:29 +03:00
Wojciech Mitros
88ab8db944 mv: run view building in streaming scheduling group
View building is an expensive process that takes a long time to complete.
During the build, its impact on other work should be minimized, even at
the expense of slightly slowing it down.

Instead, view building is currently performed in the same scheduling
group (gossip) as other high-priority tasks, in particular raft processing,
which slows it down, making races more likely and increasing the number
of retries that need to be done.

While view building is still initiated in the gossip group (as it's the
result of adding a view, which is a schema change), in this patch the bulk
of the view building work is moved to a low-priority, maintenance scheduling
group (named "streaming" after its main use case).

Additionally, a test is added, where we make sure that the scheduling
group is the one most used when building a view.

Fixes https://github.com/scylladb/scylladb/issues/21232

Closes scylladb/scylladb#21326
2024-10-31 10:13:20 +01:00
Nadav Har'El
7572c483b1 test/topology_experimental_raft: fix flaky test
Today, each test function in test/topology_experimental_raft creates a
cluster in the beginning of the test and drops it at the end of the
function. This is very inefficient if you hope (like I do) to write many
small and pinpointed test functions instead of large test functions that
test 20 unrelated things.

Trying to propose a way to change this sad state of affairs, in
test_alternator.py I created a fixture "alternator3" which I hoped could
be used in multiple tests that need a 3-node Alternator cluster.
Currently only one test uses this fixture.

Unfortunately, it turns out the alternator3 fixture is broken, and
led to flaky test runs (sometimes the test using alternator3 picked
up an existing cluster instead of starting with an empty cluster,
and failed). These problems cannot be *completely* fixed at the current
state of the framework. The framework does not currently allow keeping
a 3-node cluster between test functions, while also allowing other test
functions to create different clusters. The specific flakiness we saw
could be fixed by adding a missing before_test() call, but in the
future we would need to ensure that all the test functions that
use it are contiguous in the test file, and I don't see how we can (or
want to) ensure this. So at this point I am giving up and withdrawing
this proposal until the developers of the topology test framework
make this one of their design goals.

Since there was only one test using this fixture, removing it should
make no performance or correctness difference - it should just fix
the flakiness.

Fixes scylladb/scylladb#21322.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#21370
2024-10-31 10:12:26 +01:00
Calle Wilund
c4361037f7 cql_test_env/gossip: Prevent double shutdown call crash
Fixes scylladb/scylladb#21159

When an exception is thrown in sstable write etc such that
storage_manager::isolate is initiated, we start a shutdown chain
for message service, gossip etc. These are synced (properly) in
storage_manager::stop, but if we somehow call gossiper::shutdown
outside the normal service::stop cycle, we can end up running the
method simultaneously, intertwined (missing the guard because of
the state change between check and set). We then end up co_awaiting
an invalid future (_failure_detector_loop_done) - a second wait.

Fixed by
a.) Remove superfluous gossiper::shutdown in cql_test_env. This was added
    in 20496ed, ages ago. However, it should not be needed nowadays.
b.) Ensure _failure_detector_loop_done is always waitable. Just to be sure.

Closes scylladb/scylladb#21379
2024-10-31 10:11:20 +01:00
Nadav Har'El
d3f09638f0 Merge 'compound_compat: replace use of boost ranges with std ranges' from Avi Kivity
Replace use of boost::ranges::join() with another construct, as it
has no std replacement, and replace other uses with their std
equivalent, in order to reduce dependency load.

Code cleanup - no backport.

Closes scylladb/scylladb#21382

* github.com:scylladb/scylladb:
  compound_compat: replace use of boost ranges with std ranges
  compound_compat: simplify serialization of ka/la sstables static cell names
2024-10-31 10:16:41 +02:00
Avi Kivity
907da210b6 compound_compat: replace use of boost ranges with std ranges
To reduce the dependency load, replace use of boost ranges
with the std equivalent.

Files that lost the indirect boost dependency have it added as a
direct dependency.
2024-10-30 19:58:07 +02:00
Pavel Emelyanov
39cb93be3c treewide,error_injection: Use inject(wait_for_message) and fix tests
This is continuation of previous patch, this time also update tests that
wait for specific message in logs (to make sure injection handler was
called and paused the code execution).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-30 16:53:33 +03:00
Dawid Mędrek
b984488552 cql3: Rename SALTED HASH to HASHED PASSWORD
Cassandra 4.1 announced a new option to create a role with:
`HASHED PASSWORD`. Example:

```
CREATE ROLE bob WITH HASHED PASSWORD = 'hashed_password';
```

We've already introduced another option following the same
semantics: `SALTED HASH`; example:

```
CREATE ROLE bob WITH SALTED HASH = 'salted_hash';
```

The change hasn't made it to any release yet, so in this commit
we rename it to `HASHED PASSWORD` to be compatible with Cassandra.

Additionally, we adjust existing tests to work against Cassandra too.

Fixes scylladb/scylladb#21350

Closes scylladb/scylladb#21352
2024-10-30 14:07:58 +02:00
Tomasz Grabiec
f3869dadc6 Merge 'compound: replace boost ranges with std ranges' from Avi Kivity
Continue standardization on std::ranges. Since compound contains a custom
iterator, we first have to upgrade it to C++20 iterator concepts.

Cleanup / minor refactoring, so no backport.

Closes scylladb/scylladb#21320

* github.com:scylladb/scylladb:
  compound: replace boost ranges with std ranges
  compound: upgrade iterator to be an std::forward_iterator
2024-10-30 11:02:51 +01:00
Asias He
73806f66a5 test/test_repair.py: Add test_batchlog_flush_in_repair
It checks batchlog flush request cache in repair.
2024-10-30 11:10:39 +08:00
Botond Dénes
169c74346d db/batchlog_manager: do_batch_log_replay(): add cleanup flag
Add a flag controlling whether cleanup (memtable flush) will be done
after the replay. This is to allow repair to opt out from cleanup --
when many concurrent repairs are running, there can be storms of calls
to do_batch_log_replay(), which will be mostly no-op, but they will all
attempt to flush the memtable to clean-up after themselves. This is
unnecessary and introduces latency to repairs, best to leave the cleanup
to the periodic batch-log replay.
2024-10-30 11:07:57 +08:00
Avi Kivity
73b1f66b70 Revert "Merge 'Allow explicitly enabling or disabling tablets when creating a new keyspace' from Benny Halevy"
This reverts commit c286434e4c, reversing
changes made to 6712fcc316.

The commit causes memtable_test to be very flaky in debug mode.
Specifically, the subtests test_exceptions_in_flush_on_sstable_open
and test_exceptions_in_flush_on_sstable_write.
2024-10-30 00:55:29 +02:00
Kamil Braun
36cc3bcc90 test: test_crash_coordinator_before_streaming: enable TRACE for raft_topology logger
Issue scylladb/scylladb#21114 reported that sometimes during the test we
time out when waiting for a node to restart after it was killed.
Preliminary investigation showed that the node appears to be hanging
inside `topology_state_load`, while holding `token_metadata` lock, which
prevents `join_topology` from progressing.

Enable TRACE level logging for `raft_topology` so we get more accurate
info where inside `topology_state_load` the hang happens, once the
problem reproduces again in CI.

Closes scylladb/scylladb#21247
2024-10-29 12:46:47 +02:00
Kefu Chai
54d438168a build: cmake: explicitly mark convenience libraries as STATIC
before this change, these
[convenience libraries](https://www.gnu.org/software/automake/manual/html_node/Libtool-Convenience-Libraries.html)
were implicitly built as static libraries by default,
but weren't explicitly marked as STATIC in CMake. While this worked
with default settings, it could cause issues if `BUILD_SHARED_LIBS` is
enabled.

So before we are ready for building these components as shared
libraries, let's mark all convenience libraries as STATIC for
consistency and to prevent potential issues before we properly support
shared library builds.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21274
2024-10-29 10:22:19 +01:00
Pavel Emelyanov
25ae3d0aed backup_task: Report uploading progress
Do it by passing reference to s3::upload_progress_monitor object that
sits on task impl itself. Different files' uploads would then update the
monitor with their sizes and uploaded counters. The structure is
reported by get_progress() method. Unit size is set to be bytes. Test is
updated.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-29 08:40:35 +03:00
Avi Kivity
c286434e4c Merge 'Allow explicitly enabling or disabling tablets when creating a new keyspace' from Benny Halevy
Separate the configuration for enabling the tablets feature from the enablement of tablets when creating new keyspaces.

This change always enables the TABLETS cluster feature and the tablets logic respectively.

The `enable_tablets` config option just controls whether tablets are enabled or disabled by default for new keyspaces.

If `enable_tablets` is set to `true`, tablets can be disabled using `CREATE KEYSPACE WITH tablets = { 'enabled': false }` as it is today.

If `enable_tablets` is set to `false`, tablets can be enabled using `CREATE KEYSPACE WITH tablets = { 'enabled': true }`.

The motivation for this change is to simplify the user experience of using tablets by setting the default for new keyspaces to false and allowing the user to simply opt in by using `tablets = { 'enabled': true }`.
This is not possible today.
The user has to enable tablets by default for all new keyspaces (that use the NetworkTopologyStrategy) and then actively opt-out to use vnodes.
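
The resulting decision rule can be summarized in a few lines. This is a sketch with a hypothetical function name, not the code of this series:

```cpp
#include <optional>

// enable_tablets: the node configuration default.
// per_keyspace:   the optional `tablets = { 'enabled': ... }` value given in
//                 CREATE KEYSPACE, empty when the option is omitted.
inline bool keyspace_uses_tablets(bool enable_tablets, std::optional<bool> per_keyspace) {
    // An explicit per-keyspace choice always wins over the configured default.
    return per_keyspace.value_or(enable_tablets);
}
```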

* Not required to be backported to OSS versions.  May be backported to specific enterprise versions

Closes scylladb/scylladb#20729

* github.com:scylladb/scylladb:
  data_dictionary: keyspace_metadata::describe: print tablets enabled also when defaulted
  tablets_test: test enable/disable tablets when creating a new keyspace
  treewide: always allow tablets keyspaces
  feature_service: prevent enabling both tablets and gossip topology changes
  alternator: create_keyspace_metadata: enable tablets using feature_service
2024-10-28 21:33:17 +02:00
Nadav Har'El
6712fcc316 test/cql-pytest: add option to run cql-pytest tests against specific release
This patch adds the option "--release <version>" to test/cql-pytest/run,
which downloads the pre-compiled Scylla release with the given version
number and runs the tests against that version. For example, it can be used
to demonstrate that #15559 was indeed a regression between 2022.1 and 2022.2,
by running a recently-added test against these two old versions:

test/cql-pytest/run --release 2022.1 --runxfail \
        test_prepare.py::test_duplicate_named_bind_marker_prepared

test/cql-pytest/run --release 2022.2 --runxfail \
        test_prepare.py::test_duplicate_named_bind_marker_prepared

The first run passes, the second fails - showing the regression.

The Scylla releases are downloaded from ScyllaDB's S3 bucket
(downloads.scylladb.com). They are saved in the build/ directory
(e.g., build/2022.2.9), and if that directory is not removed, when
"run --release" requests the same version again, the previous download
is reused.

Release numbers can look like:

    * 5.4.7
    * 5.4 (will get the latest in the 5.4 branch, e.g., 5.4.7)
    * 5.4.0~rc2 (a prerelease)
    * 2021.1.9 (Enterprise release)
    * 2023.1 (latest in this branch, Enterprise release)

Fixes #13189

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19228
2024-10-28 21:29:44 +02:00
Avi Kivity
94c21e5c05 Merge 'sstables: Reduce amount of I/O for clustering-key-bounded reads from large partitions' from Tomasz Grabiec
Single-row reads from large partition issue 64 KiB reads to the data file,
which is equal to the default span of the promoted index block in the data file.
If users would want to increase selectivity of the index to speed up single-row reads,
this won't be effective. The reason is that the reader uses promoted index
to look up the start position in the data file of the read, but end position
will in practice extend to the next partition, and amount of I/O will be
determined by the underlying file input stream implementation and its
read-ahead heuristics. By default, that results in at least 2 IOs 32KB each.

There is already infrastructure to look up the end position based on the upper
bound of the read, in anticipation of sharing the promoted index cache,
but it's not effective because it's a non-populating lookup and the upper
bound cursor has its own private cached_promoted_index, which is cold
when positions are computed. It's non-populating on purpose, to avoid
extra index file IO to read upper bound. In case upper bound is far-enough
from the lower bound, this will only increase the cost of the read.

The solution employed here is to warm up the lower bound cursor's
cache before positions are computed, and use that cursor for
non-populating lookup of the upper bound.

We use the lower bound cursor and the slice's lower bound so that we
read the same blocks as later lower-bound slicing would, so that we
don't incur extra IO for cases where looking up upper bound is not
worth it, that is when upper bound is far from the lower bound. If
upper bound is near lower bound, then warming up using lower bound
will populate cached_promoted_index with blocks which will allow us to
locate the upper bound block accurately.  This is especially important
for single-row reads, where the bounds are around the same key.  In
this case we want to read the data file range which belongs to a
single promoted index block.  It doesn't matter that the upper bound
is not exactly the same. They both will likely lie in the same block,
and if not, binary search will bring adjacent blocks into cache.  Even
if upper bound is not near, the binary search will populate the cache
with blocks which can be used to narrow down the data file range
somewhat.
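
A heavily simplified model of why knowing the upper bound's block matters. The `index_block` layout, the integer keys and `data_range()` are assumptions made for this sketch; the real promoted index, its cursors and its caching are considerably more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical simplified index: each block covers keys from `first_key` up
// to the next block's first_key and occupies [offset, offset + width).
struct index_block {
    int first_key;
    std::uint64_t offset;
    std::uint64_t width;
};

// Returns the data-file byte range to read for keys in [lo, hi].
// Precondition: `blocks` is non-empty and sorted by first_key.
inline std::pair<std::uint64_t, std::uint64_t>
data_range(const std::vector<index_block>& blocks, int lo, int hi) {
    assert(!blocks.empty());
    // Finds the first block whose first_key is greater than `key`.
    auto first_after = [&] (int key) {
        return std::upper_bound(blocks.begin(), blocks.end(), key,
            [] (int k, const index_block& b) { return k < b.first_key; });
    };
    auto first = first_after(lo);
    if (first != blocks.begin()) {
        --first;                      // the block that contains lo
    }
    auto last = first_after(hi);      // one past the block that contains hi
    const std::uint64_t start = first->offset;
    const std::uint64_t end = (last == blocks.end())
        ? blocks.back().offset + blocks.back().width
        : last->offset;
    return {start, end};
}
```

With 1 KiB blocks, a read of two adjacent keys resolves to a single block's byte range; if the upper bound could not be located, the range would extend to the end of the partition instead.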

Fixes #10030.

The change was tested with perf-fast-forward.

I populated the data set with `column_index_size_in_kb` set to 1

  scylla perf-fast-forward --populate --run-tests=large-partition-slicing --column-index-size-in-kb=1

Test run:

  build/release/scylla perf-fast-forward --run-tests=large-partition-select-few-rows -c1 --keep-cache-across-test-cases --test-case-duration=0

This test issues two reads of subsequent keys from the middle of a large partition (1M rows in total). The first read will miss in the index file page cache, the second read will hit.

Notice that before the change, the second read issued 2 aio requests worth of 64KiB in total.
After the change, the second read issued 1 aio worth of 2 KiB. That's because promoted index block is larger than 1 KiB.
I verified using logging that the data file range matches a single promoted index block.

Also, the first read which misses in cache is still faster after the change.

Before:

```
running: large-partition-select-few-rows on dataset large-part-ds1
Testing selecting few rows from a large partition:
stride  rows      time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    avg aio    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    allocs   tasks insns/f    cpu
500000  1         0.009802            1         1        102          0        102        102       21.0     21        196       2       1        0        1        1        0        0        0       568     269 4716050  53.4%
500001  1         0.000321            1         1       3113          0       3113       3113        2.0      2         64       1       0        1        0        0        0        0        0       116      26  555110  45.0%
```

After:

```
running: large-partition-select-few-rows on dataset large-part-ds1
Testing selecting few rows from a large partition:
stride  rows      time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    avg aio    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    allocs   tasks insns/f    cpu
500000  1         0.009609            1         1        104          0        104        104       20.0     20        137       2       1        0        1        1        0        0        0       561     268 4633407  43.1%
500001  1         0.000217            1         1       4602          0       4602       4602        1.0      1          2       1       0        1        0        0        0        0        0       110      26  313882  64.1%
```

Backports: none, not a regression

Closes scylladb/scylladb#20522

* github.com:scylladb/scylladb:
  perf: perf_fast_forward: Add test case for querying missing rows
  perf-fast-forward: Allow overriding promoted index block size
  perf-fast-forward: Test subsequent key reads from the middle in test_large_partition_select_few_rows
  perf-fast-forward: Allow adding key offset in test_large_partition_select_few_rows
  perf-fast-forward: Use single-partition reads in test_large_partition_select_few_rows
  sstables: bsearch_clustered_cursor: Add more tracing points
  sstables: reader: Log data file range
  sstables: bsearch_clustered_cursor: Unify skip_info logging
  sstables: bsearch_clustered_cursor: Narrow down range using "end" position of the block
  sstables: bsearch_clustered_cursor: Skip even to the first block
  test: sstables: sstable_3_x_test: Improve failure message
  sstables: mx: writer: Never include partition_end marker in promoted index block width
  sstables: Reduce amount of I/O for clustering-key-bounded reads from large partitions
  sstables: clustered_cursor: Track current block
2024-10-28 21:13:23 +02:00
Avi Kivity
d3dae09316 compound: replace boost ranges with std ranges
Standardize on the standard range library.

The serialize_value(initializer_list) overload is disambiguated
not to call itself. Apparently it wasn't called before.

Since std::ranges::subrange does not provide operator==, replace
it with std::ranges::equal().
2024-10-28 18:35:41 +02:00
Avi Kivity
61d7f1f6a5 compound: upgrade iterator to be an std::forward_iterator
compound::iterator isn't far from a forward_iterator, and if we want
to use it with std::ranges, we have to upgrade it. This is because
std::ranges::subrange() only provides front() for forward ranges, and
we do use this front(). Boost apparently isn't as strict.

To make it a forward_range, we have to drop operator-> and make
operator* return a value (similar to std::views::transform), since
forward iterators require that pointers and references be stable,
and this iterator returns a pointer to one of its members.

We also add an iterator_concept member to declare the compatibility
to std::ranges.
2024-10-28 17:16:36 +02:00
Kamil Braun
101c1d50f0 Merge 'fix nodetool status to show zero-token nodes' from Abhinav Kumar Jha
In the current scenario, the nodetool status doesn’t display information regarding zero token nodes. For example, if 5 nodes are spun by the administrator, out of which, 2 nodes are zero token nodes, then nodetool status only shows information regarding the 3 non-zero token nodes.

This commit intends to fix this issue by leveraging the “/storage_service/host_id” API and adding appropriate logic in scylla-nodetool.cc to support zero token nodes.

A test is also added in nodetool/test_status.py to verify this logic. This test fails without this commit’s zero token node support logic, hence verifying the behavior.

This PR fixes a bug. Hence we need to backport it. Backporting needs to be done only
to 6.2 version, since earlier versions don't support zero token nodes.

Fixes: scylladb/scylladb#19849
Fixes: scylladb/scylladb#17857

Closes scylladb/scylladb#20909

* github.com:scylladb/scylladb:
  fix nodetool status to show zero-token nodes
  test: move `wait_for_first_completed` to pylib/util.py
  token_metadata: rename endpoint_to_host_id_map getter and add support for joining nodes
2024-10-28 12:19:36 +01:00
Kefu Chai
9f8adcd207 backup_task: track the first failure uploading sstables
before this change, we only record the exception returned
by `upload_file()`, and rethrow the exception. but the exception
thrown by `upload_file()` is not propagated to its caller. instead, the
exceptional future is ignored on purpose -- we need to perform
the uploads in parallel. this is why the task is not marked as failed
even if some of the uploads performed by it fail.

in this change, we

- coroutinize `backup_task_impl::do_backup()`. strictly speaking,
  this is not necessary to propagate the exception. but, in order
  to ensure that the possible exception is captured before the
  gate is closed, and to reduce the indentation, the teardown
  steps are performed explicitly.
- in addition to noting down the exception in the logging message,
  we also store it in a local variable, which is rethrown
  before this function returns.
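
The "remember the first failure, finish everything, then rethrow" idea can be sketched as follows. This version runs the uploads sequentially and uses a hypothetical `run_all` helper; the actual task runs uploads in parallel as futures.

```cpp
#include <exception>
#include <functional>
#include <vector>

// Runs every upload even if some fail, then rethrows the first failure so
// that the caller (the task) is marked as failed.
inline void run_all(const std::vector<std::function<void()>>& uploads) {
    std::exception_ptr first_error;
    for (const auto& upload : uploads) {
        try {
            upload();
        } catch (...) {
            // A real implementation would also log every failure here.
            if (!first_error) {
                first_error = std::current_exception();
            }
        }
    }
    // Teardown steps would go here, before the error is surfaced.
    if (first_error) {
        std::rethrow_exception(first_error);
    }
}
```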

Fixes scylladb/scylladb#21248
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21254
2024-10-28 12:54:27 +03:00
Aleksandra Martyniuk
85d9565158 test: repair: drop log checks from test_repair_succeeds_with_unitialized_bm
Currently, test_repair_succeeds_with_unitialized_bm checks whether
repair finishes successfully and the error is properly handled
if batchlog_manager isn't initialized. Error handling depends on
logs, making the test fragile to external conditions and flaky.

Drop the error handling check; a successful repair is a sufficient
passing condition.

Fixes: #21167.

Closes scylladb/scylladb#21208
2024-10-28 08:39:16 +02:00
Botond Dénes
be70755f47 Merge 'repair: Fix finished ranges metrics for removenode' from Asias He
The skipped ranges should be multiplied by the number of tables

Otherwise the finished ranges ratio will not reach 100%.
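
A worked example of the arithmetic, under the assumption that progress is counted as one unit per (range, table) pair. The `ranges_progress` struct is hypothetical and only models the counting:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical accounting: total work is total_ranges * tables units.
struct ranges_progress {
    std::size_t total_ranges = 0;
    std::size_t tables = 0;
    std::size_t finished = 0;

    // One range repaired for one table.
    void range_repaired_for_table() { finished += 1; }
    // A skipped range is skipped for every table, so it counts `tables` units.
    void range_skipped() { finished += tables; }

    double ratio() const {
        assert(total_ranges > 0 && tables > 0);
        return static_cast<double>(finished) /
               static_cast<double>(total_ranges * tables);
    }
};
```

With 10 ranges, 3 tables and 4 skipped ranges, counting each skipped range once would give (4 + 6 * 3) / 30 = 22/30; counting it once per table gives (4 * 3 + 6 * 3) / 30 = 100%.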

Fixes #21174

Closes scylladb/scylladb#21252

* github.com:scylladb/scylladb:
  test: Add test_node_ops_metrics.py
  repair: Make the ranges more consistent in the log
  repair: Fix finished ranges metrics for removenode
2024-10-28 08:09:32 +02:00
Asias He
9868ccbac0 test: Add test_node_ops_metrics.py
It tests that the node_ops_metrics_done metric reaches 100% when a node
operation is done.

Refs: #21174
2024-10-28 08:45:37 +08:00
Kefu Chai
24d14b601b treewide: s/boost::adaptors::map_values/std::views::values/
now that we are allowed to use C++23, we have the luxury of using
`std::views::values`.

in this change, we:

- replace `boost::adaptors::map_values` with `std::views::values`
- update affected code to work with `std::views::values`
- the places where we use `boost::join()` are not changed, because
  we cannot use `std::views::concat` yet. this helper is only
  available in C++26.

to reduce the dependency on boost for better maintainability, and
leverage standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21265
2024-10-27 21:32:45 +02:00
Avi Kivity
3124711fc4 Merge 'Report rows_merged in compaction_history rest api and nodetool' from Łukasz Paszkowski
Currently, running the `nodetool compactionhistory` command or using the rest api `curl -X GET --header "Accept: application/json" "http://localhost:10000/compaction_manager/compaction_history"` returns compaction history without the `rows_merged` field.

The series computes rows merged during compaction and provides this information to users via both the nodetool command and the rest api. The `rows_merged` field contains information on merged clustering keys across multiple sstable files. For instance, when compacting two sstables of a table consisting of 7 rows, where two rows are part of both sstables, the output would have the following format: {1: 5, 2: 2}.
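
To make the {1: 5, 2: 2} example concrete, here is a sketch that derives such a histogram from plain key lists. The `rows_merged` function and the string keys are assumptions for illustration; the series collects the histogram inside the mutation fragment merger instead.

```cpp
#include <map>
#include <string>
#include <vector>

// Maps "number of sstables a row appeared in" to "number of such rows".
inline std::map<int, int> rows_merged(const std::vector<std::vector<std::string>>& sstables) {
    std::map<std::string, int> copies;  // row key -> number of sstables containing it
    for (const auto& sst : sstables) {
        for (const auto& key : sst) {
            ++copies[key];
        }
    }
    std::map<int, int> histogram;
    for (const auto& entry : copies) {
        ++histogram[entry.second];
    }
    return histogram;
}
```

For sstables holding rows {a, b, c, d} and {c, d, e, f, g}, there are 7 distinct rows: 5 appear in one sstable and 2 (c and d) in both.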

No backport is required. It extends the existing compaction history output.

Fixes https://github.com/scylladb/scylladb/issues/666

Closes scylladb/scylladb#20481

* github.com:scylladb/scylladb:
  test/rest_api: Add tests for compactionhistory
  nodetool: Add rows merged stats into compactionhistory output
  compaction: Update compaction history with collected histogram
  compaction: Remove const qualifier from methods creating sstable readers
  sstable_set: Add optional statistics to make_local_shard_sstable_reader
  make_combined_reader: Add optional parameter, combined_reader_statistics
  reader_selector: Extend with maximum reader count
  mutation_fragment_merger: Create histogram while consuming mutation fragment batches
2024-10-27 21:26:11 +02:00
Nadav Har'El
6fdd0ebd3b RBAC: confirm that unprivileged users can't read the roles table
A worry was raised that an unprivileged user might be able to read the
system.roles table - which contains the Alternator secret keys (and also
CQL's hashed passwords). This patch adds tests that show that this worry
is unjustified - and acts as a regression test to ensure it never
becomes justified. The tests show that an unprivileged user cannot read
the system.roles table using either CQL or Alternator APIs.

More specifically, the two tests in this patch demonstrate that:

* The Alternator API does not allow an unprivileged user to read ANY system
  table, unless explicitly granted permissions for that table.

* The CQL API whitelists (see service::client_state::has_access) specific
  system tables - e.g., system_schema.tables - that are made readable to any
  unprivileged user. But the system.roles table is NOT whitelisted in this
  way - and is unreadable to unprivileged users unless explicitly granted
  permissions on that table.

The new tests pass on both Scylla and Cassandra.

Refs #5206 (that issue is about removing the Alternator secret keys from
the roles table - but stealing CQL salted hashes is still pretty bad, so
it's good to know that unprivileged users can't read them).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#21215
2024-10-27 21:09:38 +02:00
Nadav Har'El
1634a64ffd cql-pytest: test a few small materialized views CQL issues
While documenting materialized views in a new document (Refs #16569)
I encountered a few questions on how various CQL operations work on
a table that has views, and this patch contains tests that clarify their
answer - and can later guarantee that the answer doesn't unintentionally
change in the future. The questions that these tests answer are:

1. That TRUNCATE on a base table also TRUNCATEs its views. This is just
   a basic test, with no attempt to reproduce issue #17635 (which is
   about the truncation of the base and views not being atomic).

2. That DROP TABLE is *not allowed* on a base table that has views.

3. That DROP KEYSPACE is allowed, even if there are tables with views.

4. That ALTER TABLE tbl DROP is never allowed in Cassandra, but is
   allowed in some cases by Scylla.

5. That ALTER TABLE tbl ADD is allowed, and "SELECT *" expands to
   select the new column into the materialized view as well.

All the new tests pass on both Scylla and Cassandra.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#21142
2024-10-27 21:08:28 +02:00
Avi Kivity
7ffbfe8bb3 Merge 'Squash some sstables::test helpers' from Pavel Emelyanov
There's a `missing_summary_first_last_sane` test case that uses a very specific way of modifying an sstable -- it loads one from resources, then tries to "write" the loaded stuff elsewhere. For that it uses a special-purpose test::store() helper and a bunch of auxiliary ones from the same class. Those aux helpers are not used anywhere else and are also very specific to this test case, so it makes sense to keep this whole functionality in a single helper.

Closes scylladb/scylladb#21255

* github.com:scylladb/scylladb:
  test: Squash test::change_generation_number() into test::store()
  test: Squash test::change_dir() into test::store()
  test: Coroutinize sstables::test::store()
2024-10-27 19:59:59 +02:00
Paweł Zakrzewski
b077685fec test/cql-pytest: GROUP BY with static columns
This commit adds a new test case 'test_group_by_static_column_and_tombstones'
to verify the behavior of GROUP BY queries with static columns. The test is
adapted from Cassandra's test suite and aims to reproduce issue #21267.

Original, larger test:
cassandra_tests/validation/operations/select_group_by_test.py::testGroupByWithPaging()

Closes scylladb/scylladb#21270
2024-10-27 14:45:53 +02:00
Abhinav
c00d40b239 fix nodetool status to show zero-token nodes
In the current scenario, nodetool status doesn’t display information
regarding zero-token nodes. For example, if 5 nodes are spun up by the
administrator, out of which 2 nodes are zero-token nodes, then nodetool
status only shows information regarding the 3 non-zero-token nodes.

This commit intends to fix this issue by leveraging the "/storage_service/host_id"
API and adding appropriate logic in scylla-nodetool.cc to support zero-token nodes.

Robust topology tests are added, which spin up Scylla nodes and confirm nodetool
status output for various cases, providing good coverage.
A test is also added in nodetool/test_status.py to verify this logic. These tests fail
without this commit’s zero-token node support logic, hence verifying the behavior.

The test `test_status_keyspace_joining_node` has been removed. This test is
based on a case where host_id=None, which is impossible. Since we now use
host_id_map for node discovery in nodetool, nodes with "host_id=None"
go undetected. Since this case is impossible anyway, we can get rid of the test.

This PR fixes a bug, hence we need to backport it. Backporting needs to be done only
to version 6.2, since earlier versions don't support zero-token nodes.

Fixes: scylladb/scylladb#19849
2024-10-25 13:28:09 +05:30
Abhinav
39dfd2d7ac test: move wait_for_first_completed to pylib/util.py
This function is needed in a new test added in the next commit and this
refactoring avoids code duplication.
2024-10-25 13:26:42 +05:30
Pavel Emelyanov
7595ef7303 test: Squash test::change_generation_number() into test::store()
There are no usages of the former helper other than ones immediately
followed by the latter, so there is no point in keeping it around.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-24 11:29:17 +03:00
Pavel Emelyanov
e885b0e6cd test: Squash test::change_dir() into test::store()
There are no usages of the former helper other than ones immediately
followed by the latter, so there is no point in keeping it around.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-24 11:28:39 +03:00
Pavel Emelyanov
874cf2ea6f test: Coroutinize sstables::test::store()
Ahead of future changes

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-24 11:28:07 +03:00
Benny Halevy
63cbb6e071 tablets_test: test enable/disable tablets when creating a new keyspace
Test both configuration values for `enable_tablets`
and the possibility to explicitly enable or disable
tablets, respectively, when creating a keyspace using the
`tablets = {'enabled': true|false}` CREATE KEYSPACE option.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-10-24 10:18:42 +03:00
Benny Halevy
b0e12cb40d treewide: always allow tablets keyspaces
With the tablets feature always enabled (unless gossip topology
changes are forced), the enable_tablets option now controls only
the default for newly created keyspaces.

Even when set to `false`, tablets are still enabled as a
feature and the user may explicitly enable tablets
using `CREATE KEYSPACE <name> WITH tablets = {'enabled': true}`

Note: best viewed with `git show -w`

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-10-24 10:18:42 +03:00
Benny Halevy
bc62407421 feature_service: prevent enabling both tablets and gossip topology changes
Tablets require raft consistent topology changes.
Therefore, document that they are incompatible in
the config help and prevent their usage in
`feature_config_from_db_config`

Fixes scylladb/scylladb#21075

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-10-24 10:18:42 +03:00
Michał Jadwiszczak
68d0c9a18a test/auth_cluster/test_raft_service_levels: match enterprise SL limit
Although OSS doesn't limit the number of created service levels, match the
enterprise limit to decrease divergence in the test between OSS and
enterprise.

Fixes scylladb/scylladb#21044

Closes scylladb/scylladb#21045
2024-10-23 17:44:19 +02:00
Dawid Mędrek
298cafff35 cql-pytest/test_describe: Introduce auxiliary type for service levels
We introduce an auxiliary type representing a service level for making
it easier to adjust the tests in Enterprise. We move the responsibility
of producing create statements for service levels to the class, so we
only need to modify the code in one place when necessary.

All existing relevant tests have been adjusted to this change.

Closes scylladb/scylladb#21230
2024-10-23 10:15:25 +02:00
Kamil Braun
f5c60e538d Merge 'cql/tablets: fix retrying ALTER tablets KEYSPACE' from Piotr Smaron
ALTER tablets-enabled KEYSPACES (KS) may fail due to
`group0_concurrent_modification`, in which case it's repeated by a `for`
loop surrounding the code. But because raft's `add_entry` consumes the
raft's guard (by `std::move`'ing the guard object), retries of ALTER KS
will use a moved-from guard object, which is UB, potentially a crash.
The fix is to remove the aforementioned `for` loop altogether and rethrow the exception, as the `rf_change` event
will be repeated by the topology state machine if it receives the
concurrent modification exception, because the event will remain present
in the global requests queue, hence it's going to be executed as the
very next event.
Note: refactor is implemented in the follow-up commit.
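
The retry scheme described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual C++ code: Python has no moved-from objects, so a consumed guard is modelled with a flag, and `Guard`, `add_entry`, `ConcurrentModification` and `process_queue` are hypothetical stand-ins for the real raft and topology state machine pieces.

```python
from collections import deque


class ConcurrentModification(Exception):
    """Stand-in for group0_concurrent_modification."""


class Guard:
    """Stand-in for the raft guard; it may be handed to add_entry only once."""

    def __init__(self):
        self.consumed = False


def add_entry(guard, fail):
    # Reusing a consumed guard models the use of a moved-from object.
    if guard.consumed:
        raise RuntimeError("guard was already consumed")
    guard.consumed = True
    if fail:
        raise ConcurrentModification()


def process_queue(queue, failures):
    """Handle queued requests; the first `failures` attempts hit a conflict.

    Returns the total number of attempts made.
    """
    attempts = 0
    while queue:
        guard = Guard()  # a fresh guard for every attempt
        attempts += 1
        try:
            add_entry(guard, fail=attempts <= failures)
        except ConcurrentModification:
            # The request stays at the head of the queue, so it is simply
            # picked up again as the very next event.
            continue
        queue.popleft()
    return attempts


print(process_queue(deque(["rf_change"]), failures=2))  # 3
```

The point of the sketch is that no inner retry loop is needed: because the request is only dequeued on success, a retry always starts from a new guard instead of reusing a consumed one.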

Fixes: scylladb/scylladb#21102

Should be backported to every 6.x branch, as it may lead to a crash.

Closes scylladb/scylladb#21121

* github.com:scylladb/scylladb:
  test: add UT to test retrying ALTER tablets KEYSPACE
  cql/tablets: fix indentation in `rf_change` event handler
  cql/tablets: fix retrying ALTER tablets KEYSPACE
2024-10-23 10:01:21 +02:00
Botond Dénes
519e167611 Merge 'replica/table: check memtable before discarding tombstone during read' from Lakshmi Narayanan Sreethar
On the read path, the compacting reader is applied only to the sstable
reader. This can cause an expired tombstone from an sstable to be purged
from the request before it has a chance to merge with deleted data in
the memtable leading to data resurrection.

Fix this by checking the memtables before deciding to purge tombstones
from the request on the read path. A tombstone will not be purged if a
key exists in any of the table's memtables with a minimum live timestamp
that is lower than the maximum purgeable timestamp.
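
The rule in the previous paragraph can be sketched as follows. This is a simplified model and not the code from the series: a memtable is represented as a dict from key to its minimum live timestamp, `may_purge_tombstone` is a hypothetical name, and the other conditions for purging a tombstone (such as its expiry) are assumed to have been checked already.

```python
def may_purge_tombstone(key, max_purgeable_ts, memtables):
    """Return False if any memtable vetoes purging tombstones for `key`.

    `memtables` is a list of dicts mapping key -> minimum live timestamp
    (an assumption made for this sketch).
    """
    for memtable in memtables:
        min_live_ts = memtable.get(key)
        # The key has live data in a memtable that is older than the
        # maximum purgeable timestamp: keep the tombstone so it can
        # still shadow that data.
        if min_live_ts is not None and min_live_ts < max_purgeable_ts:
            return False
    return True


memtables = [{"pk1": 10}, {"pk3": 50}]
print(may_purge_tombstone("pk1", 20, memtables))  # False
print(may_purge_tombstone("pk2", 20, memtables))  # True
```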

Fixes #20916

`perf-simple-query` stats before and after this fix :

`build/Dev/scylla perf-simple-query --smp=1 --flush` :
```
// Before this Fix
// ---------------
94941.79 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59393 insns/op,   24029 cycles/op,        0 errors)
97551.14 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59376 insns/op,   23966 cycles/op,        0 errors)
96599.92 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59367 insns/op,   23998 cycles/op,        0 errors)
97774.91 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59370 insns/op,   23968 cycles/op,        0 errors)
97796.13 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59368 insns/op,   23947 cycles/op,        0 errors)

         throughput: mean=96932.78 standard-deviation=1215.71 median=97551.14 median-absolute-deviation=842.13 maximum=97796.13 minimum=94941.79
instructions_per_op: mean=59374.78 standard-deviation=10.78 median=59369.59 median-absolute-deviation=6.36 maximum=59393.12 minimum=59367.02
  cpu_cycles_per_op: mean=23981.67 standard-deviation=32.29 median=23967.76 median-absolute-deviation=16.33 maximum=24029.38 minimum=23947.19

// After this Fix
// --------------
95313.53 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59392 insns/op,   24058 cycles/op,        0 errors)
97311.48 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59375 insns/op,   24005 cycles/op,        0 errors)
98043.10 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59381 insns/op,   23941 cycles/op,        0 errors)
96750.31 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59396 insns/op,   24025 cycles/op,        0 errors)
93381.21 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59390 insns/op,   24097 cycles/op,        0 errors)

         throughput: mean=96159.93 standard-deviation=1847.88 median=96750.31 median-absolute-deviation=1151.55 maximum=98043.10 minimum=93381.21
instructions_per_op: mean=59386.60 standard-deviation=8.78 median=59389.55 median-absolute-deviation=6.02 maximum=59396.40 minimum=59374.73
  cpu_cycles_per_op: mean=24025.13 standard-deviation=58.39 median=24025.17 median-absolute-deviation=32.67 maximum=24096.66 minimum=23941.22
```

This PR fixes a regression introduced in ce96b472d3 and should be backported to older versions.

Closes scylladb/scylladb#20985

* github.com:scylladb/scylladb:
  topology-custom: add test to verify tombstone gc in read path
  replica/table: check memtable before discarding tombstone during read
  compaction_group: track maximum timestamp across all sstables
2024-10-23 10:28:00 +03:00
Botond Dénes
b9b778054a Merge 'test.py: Add option to fail after number of failures' from Petr Hála
* Add `--max-failures` flag to test.py, which will stop the execution after the given number of failures
   * Helps with the "fail-fast" approach and can be used to improve CI speed, especially the 100times run
   * Adds the number of cancelled tests to both the summary and the junit xml. I did not include them in boost, since it does not contain any statistics.
* Removes unnecessary list creation in test.py
   * Completely unrelated change, but it is small enough that I feel it can be included as part of this one. If this is an issue I can create a separate PR for it

*  Add `Test.started` property
   * Helps with determining the current status of the Test and differentiating cancelled/not started tests.
* Add `Test.failed` and `Test.did_not_run` read-only computed properties
   * Helper methods to determine status, instead of using `Test.success`, which does not tell the entire story
* Fix `ScyllaClusterManager.stop()` method, so it doesn't fail when run multiple times
   * This happens when tasks are cancelled. I am not sure yet why; it is almost certainly unwanted behaviour, but this behaviour was already there and with this fix it no longer causes errors
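
A minimal sketch of how a failure limit and the new status properties could interact is shown below. It is not the test.py implementation (which is asynchronous): the `Test` dataclass, the `run_tests` function and the convention that `max_failures=0` means "no limit" are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Test:
    """Hypothetical stand-in for the Test class in test.py."""

    name: str
    run: Callable[[], bool]
    started: bool = False
    success: Optional[bool] = None

    @property
    def failed(self) -> bool:
        # Only a test that actually ran can have failed.
        return self.started and not self.success

    @property
    def did_not_run(self) -> bool:
        return not self.started


def run_tests(tests, max_failures=0):
    """Run tests in order; stop once max_failures is reached (0 = no limit).

    Returns (number of failures, number of cancelled tests).
    """
    failures = 0
    for test in tests:
        if max_failures and failures >= max_failures:
            break
        test.started = True
        test.success = test.run()
        if test.failed:
            failures += 1
    cancelled = sum(1 for test in tests if test.did_not_run)
    return failures, cancelled


outcomes = [True, False, False, True, True]
tests = [Test(f"t{i}", lambda ok=ok: ok) for i, ok in enumerate(outcomes)]
print(run_tests(tests, max_failures=2))  # (2, 2)
```

With a limit of 2, the run stops after the third test, and the two tests that never started are reported as cancelled rather than as failed.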

I will use backport/None for now as it is a new feature.

Fixes https://github.com/scylladb/qa-tasks/issues/1714

Closes scylladb/scylladb#21098

* github.com:scylladb/scylladb:
  test.py: Add option to fail after number of failures
  test.py: Add started, failed and did_not_run properties to Test
  test.py: Remove unnecessary list creation
  test: lib: Fix ScyllaClusterManager.stop()
2024-10-23 09:11:52 +03:00