We split some of the test cases so it's clearer what's going on in the
test. Also, if a bug happens in the future, it should be easier to
reason about it when it corresponds to exactly one CQL statement
instead of possibly two.
Read and Write Consumed Capacity units are an abstract way of measuring Alternator actions. In general, they correspond to the read or write data.
In the long run, the RCU/WCU adds a way of charging an operation and limiting usage.
This series addresses two issues: consume capacity request API and metering.
The Alternator (and DynmoDB) API has an optional parameter allowing users to check the number of units an operation consumes. When a user adds that parameter, the response will contain the number of units used for the operation.
This series adds the consume capacity support to the get_item and put_item, adds a metric to collect the overall RCU and WCU used, and adds a test for the new functionality.
Follow-up PRs will add support for more operations and GSI.
Replaces #19811
Partially implement: #5027Closesscylladb/scylladb#21543
* github.com:scylladb/scylladb:
alternator/test_metrics: Add tests for table consumption units
test_returnconsumedcapacity.py: Add putItem tests
Alternator: add WCU support
Add test/alternator/test_returnconsumedcapacity.py
alternator/executor: Add consume capacity for get_item
alsternator/stats: Add rcu and wcu metrics to stats
alternator/executor.hh: white-space cleanup
Add the consume_capacity helper class
The semantics of Scylla's materialized views may vary depending on how their
primary keys correspond to the base table's one. One of the differences is
how we handle writes to columns in the base table that are not selected by
a view:
* Case 1: The view's PK is a permutation of the base table's PK:
Since the view's primary key cannot be changed in an update, a row in
the view remains alive as long as the corresponding row in the base table
is alive.
The tricky part comes when the base table has columns that are NOT selected
by the view. CQL3 used to not allow for defining a table that didn't have
any other columns besides its primary key. Also, when inserting a row into
a table, it was mandatory to provide at least one value aside from the
primary key. At some point it changed [1] and the implementation of the
solution relied on the notion of the row marker.
Putting the details aside, consider the following scenario:
(i) the base table has a primary key consisting of columns
c_1, ..., c_k, and it has regular columns rc_1, ..., rc_n,
(ii) the primary key of an MV defined on that table consists of
a permutation of c_1, ..., c_k. The MV doesn't select at least
one of the regular columns of the base table. Without loss of
generality, let that unselected column be rc_1.
(iii) the base table has a row R whose only non-null value is the one
in the regular column rc_1.
Now, what will R correspond to in the MV? The base table doesn't have a row
marker, but all of its regular columns in the MV will be NULLs. That's NOT
allowed.
To solve that problem, all unselected columns have corresponding virtual
columns in the MV; the only information they provide is whether there is
a value in the base table or not. This way, the MV knows if a row is still
alive or not.
For that reason, we send view updates to virtual columns in the following
cases:
(i) the value in the column changes from NULL to a value, i.e. it's
created,
(ii) the value in the column exists, but its TTL has been updated.
* Case 2: The view's PK has one more column that the base table's one:
Since the primary key of the view has a regular column C from the base
table, it is guaranteed that if there's a row in the MV, the corresponding
row in the base table can remain alive: since C is part of the view's PK,
it must have a value, so the row in the base table has a value in C too.
The problem with virtual columns from the previous case doesn't manifest
in this one. The liveness of the cell in C determines the liveness of
the whole row in the view.
The semantics gets more complex, but the conclusion is this: in case 1,
virtual columns exist and we may need to generate view updates for them,
while in case 2 virtual columns do NOT exist and so we don't generate
view updates for them.
What changes in this patch is we adjust the code to it. If a view has
a regular column from the base table as part of its primary key, we
no longer emit view updates when we change a column unselected by that
view. It is purely an OPTIMIZATION change.
[1]: https://issues.apache.org/jira/browse/CASSANDRA-4361Fixesscylladb/scylladb#21652Closesscylladb/scylladb#21653
Schema of system tables is defined statically and table_schema_version needs to be explicitly set in code like this:
```
builder.with_version(system_keyspace::generate_schema_version(table_id, version_offset));
```
Whenever schema is changed, the schema version needs to change, otherwise we hit undefined behavior when trying to interpret mutation data created with the old schema using the new schema.
It's not obvious that one needs to do that and developers often forget to do that. There were several instances of mistakes of omission, some caught during review, some not, e.g.: 31ea74b96e.
This patch changes definitions to call the new `schema_builder::with_hash_version()`, which will make the schema builder compute version from schema definition so that changes of the schema will automatically change the version. This way we no longer rely on the developer to remember to bump the version offset.
All nodes should arrive at the same version, which is verified by existing `test_group0_schema_versioning` and a new unit test: `test_system_schema_version_is_stable`.
Closesscylladb/scylladb#21602
* github.com:scylladb/scylladb:
system_tables: Compute schema version automatically
schema_builder: Introduce with_hash_version()
schema: Store raw_view_info in schema::raw_schema
schema: Remove dead comment
hashing: Add hasher for unordered_map
hashing: Add hasher for unique_ptr
hashing: Add hasher for double
[avi: add missing include <memory> to hashing.hh]
This adds a new tablet migration kind: repair. It allows tablet repair
scheduler to use this migration kind to schedule repair jobs.
The current repair scheduler implementation does the following:
- A tablet is picked to be repaired when is requested by user
- The tablet repair can be scheduled along with tablet migration and
rebuild. It runs in the tablet_migration track.
- Repair jobs are scheduled in a smart way so that at any point in time,
there are no more than configured jobs per shard, which is similar to
scylla manager's control.
New feature. No backport is needed.
Closesscylladb/scylladb#21088
* github.com:scylladb/scylladb:
test: Add tests for tablet repair scheduler
repair: Add restful API for tablet repair
repair: Add tablet repair scheduler internal API support
docs: Update system_keyspace.md for tablet repair related info
docs: Add docs for tablet repair migration
repair: Add core tablet repair scheduler support
messaging_service: Introduce TABLET_REPAIR verb
tablet_allocator: Introduce stream_weight for tablet_migration_streaming_info
network_topology_strategy: Preserve fields of task_info in reallocate_tablets
Adding tests to verify the RCU and WCU metrics.
A new helper function check_increases_metric_exact check that a given
metrics increased by a given number.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch adds testing for putItem consume capacity.
There is an additional test for number support. Numbers are encoded
differently with alternator and dynamoDB, the test adds some flexibility
in the result so it would pass both DynamoDB and Alternator.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
now that we are allowed to use C++23. we now have the luxury of using `std::ranges::find_if`.
in this change, we:
- replace `boost::find_if` with `std::ranges::find_if`
- remove all `#include <boost/range/algorithm/find_if.hpp>`
to reduce the dependency to boost for better maintainability, and leverage standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible.
---
it's a cleanup, hence no need to backport.
Closesscylladb/scylladb#21495
* github.com:scylladb/scylladb:
treewide: replace boost::find_if with std::ranges::find_if
counters: replace boost::find_if with std::ranges::find_if
combine.hh: use std::iter_const_reference_t when appropriate
Currently, PER PARTITION LIMIT is not implemented for aggregates and queries can result in more rows than expected from the same partition.
Instrument the result_set_builder class so that it can enforce PER PARTITION LIMIT for aggregate queries, specifically:
- add per_partition_limit to the result_set_builder
- expose the number of input rows in the selector
result_set_builder gets two new functions handling partition start and end:
- accept_partition_end for notifying that a partition has been finished. This is also called when a page ends, so we cannot simply flush here, as a naive implementation could do.
- accept_new_partition, where we flush_selectors() if it's indeed a new partition (and not a continuation of the previous) and the query has a grouping: we don't want to flush on new partition in a query like SELECT COUNT(*) FROM foo;
Fixes#5363Closesscylladb/scylladb#21125
* github.com:scylladb/scylladb:
test: enable PER PARTIION LIMIT + GROUP BY tests
cql3: respect PER PARTITION LIMIT for aggregates
cql3: selection: count input rows in the selector
cql3: selection: pass per partition limit to the result_set_builder
cql3: show different messages for LIMIT and PER PARTITION LIMIT in get_limit
Currently, task_manager_module::abort_all_repairs marks top-level repairs as aborted (but does not abort them) and aborts all existing shard tasks.
A running repair checks whether its id isn't contained in _aborted_pending_repairs and then proceeds to create shard tasks. If abort_all_repairs is executed after _aborted_pending_repairs is checked but before shard tasks are created, then those new tasks won't be aborted. The issue is the most severe for tablet_repair_task_impl that checks the _aborted_pending_repairs content from different shards, that do not see the top-level task. Hence the repair isn't stopped but it creates shard repair tasks on all shards but the one that initialized repair.
Abort top-level tasks in abort_all_repairs. Fix the shard on which the task abort is checked.
Fixes: #21612.
Needs backport to 6.1 and 6.2 as they contain the bug.
Closesscylladb/scylladb#21616
* github.com:scylladb/scylladb:
test: add test to check if repair is properly aborted
repair: add shard param to task_manager_module::is_aborted
repair: use task abort source to abort repair
repair: drop _aborted_pending_repairs and utilize tasks abort mechanism
repair: fix task_manager_module::abort_all_repairs
This patch adds testing for the consumedCapacity header.
It's currently only test get_item
The test works with both AWS and alternator.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Before these changes, we didn't wait for the materialized views to
finish building before writing to the base table. That led to generating
an additional view update, which, in turn, led to test failures.
The scenario corresponding to the summary above looked like this:
1. The test creates an empty table and MVs on it.
2. The view builder starts, but it doesn't finish immediately.
3. The test performs mutations to the base table. Since the views
already exist, view updates are generated.
4. Finally, the view builder finishes. It notices that the base
table has a row, so it generates a view update for it because
it doesn't notice that we already have data in the view.
We solve it by explicitly waiting for both views to finish building
and only then start writing to the base table.
Additionally, we also fix a lifetime issue of the row the test revolves
around, further stabilizing CI.
Fixes https://github.com/scylladb/scylladb/issues/20889
Backport: These changes have no semantic effect on the codebase,
but they stabilize CI, so we want to backport them to the maintained
versions of Scylla.
Closesscylladb/scylladb#21632
* github.com:scylladb/scylladb:
test/boost/view_schema_test.cc: Increase TTL in test_view_update_generating_writetime
test/boost/view_schema_test.cc: Wait for views to build in test_view_update_generating_writetime
The auxiliary function `eventually()` (defined in `test/lib/eventually.hh`)
tries to execute a passed function. If it throws, `eventually()` sleeps
for `2^#previous_attempts` milliseconds and tries to perform it again.
The default limit of attempts is 17.
In `test_view_update_generating_writetime`, right before the last test
case, we perform:
```cql
UPDATE t USING TTL 10 AND TIMESTAMP 8 SET g=40 WHERE k=1 AND c=1;
```
The test case itself executes:
```cql
SELECT WRITETIME(g) FROM t;
```
and asserts that the result of the query is equal to 8, i.e. it
corresponds to the timestamp of the last write to the table `t`.
However, if the test case keeps failing, then during its 14th attempt
(so affter sleeping for at least `2^14 - 1` milliseconds, which amounts
to about 16 seconds), we'll observe the following error:
```
[Exception] - std::runtime_error: Expected row not found: [0000000000000008] not in {result_message::rows {row: null}}
```
The reason behind it is the specified TTL is too short. 10 seconds will
have already passed before the 14th attempt, so the value in the column
`g` will be `NULL` again. In particular, the `WRITETIME(g)` will no
longer be equal to `8`.
To solve that issue, we change the TTL in the CQL statement to 300.
The time spent on 17 loops of `eventually()` amounts to about
`2^18 - 1` milliseconds, which is about 263 seconds. That's why
setting the TTL to 300 seconds should be enough to prevent the error
from occurring.
Before these changes, we didn't wait for the materialized views to
finish building before writing to the base table. That led to generating
an additional view update, which, in turn, led to test failures.
The scenario corresponding to the summary above looked like this:
1. The test creates an empty table and MVs on it.
2. The view builder starts, but it doesn't finish immediately.
3. The test performs mutations to the base table. Since the views
already exist, view updates are generated.
4. Finally, the view builder finishes. It notices that the base
table has a row, so it generates a view update for it because
it doesn't notice that we already have data in the view.
We solve it by explicitly waiting for both views to finish building
and only then start writing to the base table.
Fixesscylladb/scylladb#20889
Alternator's "/localnodes" HTTP requests is supposed to return the list
of nodes in the local DC to which the user can send requests.
Before commit bac7c33313 we used the
gossiper is_alive() method to determine if a node should be returned.
That commit changed the check to is_normal() - because a node can be
alive but in non-normal (e.g., joining) state and not ready for
requests.
However, it turns out that checking is_normal() is not enough, because
if node is stopped abruptly, other nodes will still consider it "normal",
but down (this is so-called "DN" state). So we need to check **both**
is_alive() and is_normal().
This patch also adds a test reproducing this case, where a node is
shut down abruptly. Before this patch, the test failed ("/localnodes"
continued to return the dead node), and after it it passes.
Fixes#21538
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#21540
This change was created in the same spirit of f8221b960f.
The S3ProxyServer (introduced in 8919e0abab) currently prints its
status directly to stdout, which can be distracting when reviewing test
results. For example:
```console
$ ./test.py --mode release object_store/test_backup::test_simple_backup_and_restore
Found 1 tests.
Setting minio proxy random seed to 1731924995
Starting S3 proxy server on ('127.193.179.2', 9002)
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[1/1] object_store release [ PASS ] object_store.test_backup.1
Stopping S3 proxy server
------------------------------------------------------------------------------
CPU utilization: 3.1%
```
Move these messages to use proper logging to give developers more control
over their visibility:
- Make logger parameter mandatory in S3ProxyServer constructor
- Route "Stopping S3 proxy" message through the provided logger
- Add --log-level option to the standalone proxy server launcher
The message is now hidden:
```console
$ ./test.py --mode release object_store/test_backup::test_simple_backup_and_restore
Found 1 tests.
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[1/1] object_store release [ PASS ] object_store.test_backup.1
------------------------------------------------------------------------------
CPU utilization: 4.1%
```
---
this change improves the developer experience, hence no need to backport.
Closesscylladb/scylladb#21610
* github.com:scylladb/scylladb:
test: route S3 Proxy server messages through logger
test: s3_proxy: remove unused method
now that we are allowed to use C++23. we now have the luxury of using
`std::ranges::find_if`.
in this change, we:
- replace `boost::find_if` with `std::ranges::find_if`
- remove all `#include <boost/range/algorithm/find_if.hpp>`
to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
For historic reasons, we have (in bytes.hh) a type sstring_view which is an alias for std::string_view - since the same standard type can hold a pointer into both a seastar::sstring and std::string.
This alias in unnecessary and misleading to new developers, who might be misled to believe it is assume it is somehow different from std::string_view - when it isn't.
This series removes all uses of sstring_view (changing them to use std::string_view), and in the last patch removes the alias itself. A few functions whose name referred to "sstring" but take a std::string_view were renamed.
The patches are fairly mechanical and trivial, with no functional changes intended. To ease the review the series was split to a few smaller patches that modify specific areas of the code.
Fixes#4062.
Closesscylladb/scylladb#21617
* github.com:scylladb/scylladb:
bytes: remove unused alias sstring_view
change remaining sstring_view to std::string_view
test: change sstring_view to std::string_view
cql3: change sstring_view to std::string_view
alternator: change sstring_view to std::string_view
type: change from_sstring() to from_string_view()
cross-tree: change to_sstring_view() to to_string_view()
Wean the mutation code (at least the headers) from boost ranges to std ranges,
in order to reduce the dependency load.
Cleanup, so no backport.
Closesscylladb/scylladb#21601
* github.com:scylladb/scylladb:
partition_snapshot_row_cursor.hh: switch from boost ranges to std ranges
mutation: mutation_partition_v2.hh: switch from boost ranges to std ranges
mutation: mutation_partition.hh: switch from boost ranges to std ranges
partition_snapshot_reader.hh: drop unused include boost/range/algorithm/heap_algorithm.hpp
Our "sstring_view" is an historic alias for the standard std::string_view.
The test/ directory used this old alias in a few of random places, let's
change them to use the standard type name.
Refs #4062.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
For historic reasons, we have (in bytes.hh) a type sstring_view which
is an alias for std::string_view - since the same standard type can hold
a pointer into both a seastar::sstring and std::string.
This alias in unnecessary and misleading to new developers (who might
assume it is somehow different from std::string_view). This patch doesn't
yet remove all occurances of sstring_view (the request in #4062), but
begins to do it by renaming one commonly-used function, to_sstring_view(bytes)
to to_string_view() and of course changes all its uses to the new name.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This change was created in the same spirit of f8221b960f.
The S3ProxyServer (introduced in 8919e0abab) currently prints its
status directly to stdout, which can be distracting when reviewing test
results. For example:
```console
$ ./test.py --mode release object_store/test_backup::test_simple_backup_and_restore
Found 1 tests.
Setting minio proxy random seed to 1731924995
Starting S3 proxy server on ('127.193.179.2', 9002)
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[1/1] object_store release [ PASS ] object_store.test_backup.1
Stopping S3 proxy server
------------------------------------------------------------------------------
CPU utilization: 3.1%
```
Move these messages to use proper logging to give developers more control
over their visibility:
- Make logger parameter mandatory in S3ProxyServer constructor
- Route "Stopping S3 proxy" message through the provided logger
- Add --log-level option to the standalone proxy server launcher
The message is now hidden:
```console
$ ./test.py --mode release object_store/test_backup::test_simple_backup_and_restore
Found 1 tests.
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[1/1] object_store release [ PASS ] object_store.test_backup.1
------------------------------------------------------------------------------
CPU utilization: 4.1%
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
neither `InjectingHandler.log_error`, nor `InjectingHandler.log_message`
is used. so let's drop them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
tablet_repair_task_impl keeps a vector of tablet_repair_task_meta,
each of which keeps an effective_replication_map_ptr. So, after
the task completes, the token metadata version will not change for
task_ttl seconds.
Implement tablet_repair_task_impl::release_resources method that clears
tablet_repair_task_meta vector when the task finishes.
Set task_ttl to 1h in test_tablet_repair to check whether the test
won't time out.
Fixes: #21503.
Closesscylladb/scylladb#21504
This PR enables compaction tasks to verify the integrity of the input data through checksum and digest checks. The mechanism for integrity checking was introduced in previous PRs (#20207, #20720) as a built-in functionality of the input streams. This PR integrates this mechanism with compaction. The change applies to all compaction types and covers both compressed and uncompressed SSTables adhering to the 3.x format. If a compaction task reads only part of an SSTable, then only the per-chunk checksums are verified, not the digest.
The PR consists of:
* Changes to mx readers to support integrity checking. The kl readers, considered as compatibility-only, were left unchanged. Also, integrity checking on single-partition reversed reads (`data_consume_reversed_partition()`) remains unsupported by mx readers as this is not used in compaction.
* Changes to `sstable` and `sstable_set` APIs to allow toggling integrity checks for mx readers.
* Activation of integrity checking for all compaction types.
* Tests for all compaction types with corrupted SSTables.
Integrity checks come at a cost. For uncompressed SSTables, the cost is the loading of the CRC and Digest components from disk, and the calculation of checksums and digest from the actual data. For compressed SSTables, checksums are stored in-place and they are being checked already on all reads, so the only extra cost is the loading and calculation of the digest. The measurements show a ~5% regression in compaction performance for uncompressed SSTables, and a negligible regression for compressed SSTables.
Command: `perf-sstable --smp=1 --cpuset=1 --poll-mode --mode=compaction --iterations=1000 --partitions 10000 --sstables=1 --key_size=4096 --num_columns=15 --column_size={32, 1024, 3500, 7000, 14500}`
Uncompressed SSTables:
```
+--------------+-----------------------+----------------------+------------+
| SSTable Size | No Integrity (p/sec) | Integrity (p/sec) | Regression |
+--------------+-----------------------+----------------------+------------+
| 50 MiB | 65175.59 +- 80.82 | 61814.63 +- 72.88 | 5.16% |
| 200 MiB | 41795.10 +- 60.39 | 39686.28 +- 45.05 | 5.05% |
| 500 MiB | 21087.41 +- 30.72 | 20092.93 +- 25.05 | 4.72% |
| 1 GiB | 12781.64 +- 21.77 | 12233.94 +- 21.71 | 4.29% |
| 2 GiB | 6629.99 +- 9.40 | 6377.13 +- 8.28 | 3.81% |
+--------------+-----------------------+----------------------+------------+
```
Compressed SSTables:
```
+--------------+-----------------------+----------------------+------------+
| SSTable Size | No Integrity (p/sec) | Integrity (p/sec) | Regression |
+--------------+-----------------------+----------------------+------------+
| 50 MiB | 53975.05 +- 63.18 | 53825.93 +- 62.28 | 0.28% |
| 200 MiB | 28687.94 +- 26.58 | 28689.41 +- 26.91 | 0% |
| 500 MiB | 13865.35 +- 15.50 | 13790.41 +- 14.88 | 0.54% |
| 1 GiB | 7858.10 +- 7.71 | 7829.75 +- 9.66 | 0.36% |
| 2 GiB | 4023.11 +- 2.43 | 4010.54 +- 2.55 | 0.31% |
+--------------+-----------------------+----------------------+------------+
(p/sec = partitions/sec)
```
Refs #19071.
New feature, no backport is needed.
Closesscylladb/scylladb#21153
* github.com:scylladb/scylladb:
test: Add test for compaction with corrupted SSTables
compaction: Enable integrity checks for all compaction types
sstables: Add integrity option to factories for sstable_set readers
sstables: Add integrity option to sstable::make_reader()
sstables: Add integrity option to mx::make_reader()
sstables: Load checksums and digests in mx full-scan reader
sstables: Add integrity option to data_consume_single_partition()
sstables: Disengage integrity_check from sstable class
sstables: Allow data sources to disable digest check
This depends on the previous change to the schema_builder
which makes version computation depend on definition only
instead of being new time uuid.
This way we avoid the possibility for a common mistake
when schema of a system table is extended but we forget
to bump up its version passed to .with_version().
data_consume_rows_context_m has a _column_value buffer it uses to read
key and column values into, preparing for parsing and consuming them.
This buffer is reset (released) in a few different cases:
* When using it for key - after consuming its content
* When using it for column value - when a colum has no value
However, the buffer is not released when used for a column value and the
column is consumed. This means that if a large column is read from the
sstable, this buffer can potentially linger and keep consuming memory
until either one of the other release scenarios is hit, or the reader is
destroyed.
Add a third release scenario, releasing the buffer after the row end was
consumed. This allows the buffer to be re-used between columns of the
same row, at the same time ensuring that a large buffer will not linger.
This patch can almost halve the memory consumption of reads in certain
circumstances. Point in case: the test
test_reader_concurrency_semaphore_memory_limit_engages starts to fail
after this fix, because the read doesn't trigger the OOM limit anymore
and needs doubling of the concurrency to keep passing.
This issue was found in a dtest
(`test_ics_refresh_with_big_sstable_files`), which writes some large
cells of up to 7MiB. After reading the row containing this large cell,
the reader holds on to the 7MiB buffer causing the semaphore's OOM
protection to kick in down the line.
Fixes: https://github.com/scylladb/scylladb/issues/21160Closesscylladb/scylladb#21132
The later includes the former and in addition to `seastar::format()`,
`print.hh` also provides helpers like `seastar::fprint()` and
`seastar::print()`, which are deprecated and not used by scylladb.
Previously, we include `seastar/core/print.hh` for using
`seastar::format()`. and in seastar 5b04939e, we extracted
`seastar::format()` into `seastar/core/format.hh`. this allows us
to include a much smaller header.
In this change, we just include `seastar/core/format.hh` in place of
`seastar/core/print.hh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21574
just for better readability:
* chain comparison statement when appropriate
* do not use f-string when there are no place holders
* use list comprehension when initializing a set
* remove unused import statement
* move import statement of the standard library before
those which import the 3rd-party modules
* put two empty lines in-between top-level functions.
this is recommended by PEP8.
* remove the extraneous spaces around `=` in parameter
list.
* remove the extraneous spaces in a list like `[ 1, 2, 3 ]`
so it looks like `[1, 2, 3]`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21561
This patch moves (after straightforward translation) the test
"test_views_with_future_tombstone", a regression test for #5793,
from the C++ boost framework to the Python cqlpy framework.
The main motivation this move is the ease of debugging failures:
During the work on a patch for #20679 (eliminating read-before-write)
this test began to fail, and understanding where the C++ failed was
near impossible: the Boost test framework reports that the test failed,
but not in which line or why, and adding printouts to this huge source
file require a ridiculous amount of time for recompilation every time.
In contrast, the new pytest-based version shows exactly where the
error is, beautifully:
```
> assert [] == list(cql.execute(f'select * from {mv}'))
E assert [] == [Row(b=2, a=1, c=3, d=4, e=5)]
test_materialized_view.py:1614: AssertionError
```
It shows exactly which assertion failed, and exactly what were the
values that were compared. Beautiful and super helpful for debugging.
Beyond the ease of debugging, moving this (and later, other) test to
the cql-pytest framework has additional advantages:
1. The test was misplaced, in the cql_test source file, and it belongs
with materialized views tests so let's use this opportunity to move
it to the right place.
2. Can easily run the same test on multiple versions of Scylla, and
also on Cassandra. It's a good way to confirm the test is correct.
3. No need to recompile the test after every attempt to fix the bug.
The cql_query_test.cc is huge - over 6,000 lines - and takes over
a minute to compile after every attempt to fix a bug.
Refs #16134 (the issue asks to move all MV tests to cql-pytest)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#21552
Added test cases to reproduce issues with tablet migration involving views.
Refs #19149
Refs #21564
No backport needed as the PR adds only testcases.
Closesscylladb/scylladb#21566
* github.com:scylladb/scylladb:
topology_custom/test_tablets.py: add testcase for tablet migration of staged sstables
topology_custom/test_tablets.py: add testcase for tablet migration with unbuilt views
Tablet migration mixes staged and non staged sstables causing base view
inconsistencies in the pending replica. Added a testcase to reproduce
this issue.
Refs #19149.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
When a tablet gets migrated right after view was created but before the
view builder registered the new view, the pending replica will not
register the sstables in the tablet for view building causing base view
inconsistencies. This commit adds a testcase to reproduce the issue.
Refs #21564
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
We recently added a "--release <version>" option to test/cql-pytest/run
to run a cql-pytest test against a released version of Scylla, downloaded
automatically from ScyllaDB's precompiled binary repository. This patch
adds the same capability also to test/alternator/run - allowing to run
a current test/alternator test on older releases of Scylla. The
implementation in this patch reuses the same implementation from the
cql-pytest patch.
Here is an example use case: the pull request #19941 claimed that
a certain bug fix was backported to release 6.0. Was it? Let's run
the test reproducing that bug on two releases:
test/alternator/run --release 6.0 test_streams.py::test_stream_list_tables
test/alternator/run --release 6.1 test_streams.py::test_stream_list_tables
It shows that the test passes on 6.1 (so the bug is fixed there) but the
test fails 6.0. It turns out that although the fix was backported to
branch-6.0, this happened shortly after 6.0.4 was released and no later
6.0 minor release came afterwards! So the bug wasn't actually fixed
on any official release of 6.0.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#21343
The test is only sending a subset of the running servers for the rolling
restart. The rolling restart is checking the visibility of the restarted
node agains the other nodes, but if that set is incomplete some of the
running servers might not have seen the restarted node yet.
Improved the manager client rolling restart method to consider all the
running nodes for checking the restarted node visibility.
Fixes: scylladb/scylladb#19959Closesscylladb/scylladb#21477
this change was created in the same spirit of aebb5329, which
included the fmt/iostream.h and iostream when appropriate so that
the tree can build with seastar submodule including e96932b0.
in the seastar change, we stopped including unused `fmt/ostream.h`
in a public header in seastar, so the parent projects relying on
the header to indirectly include fmt/ostream.h and iostream would
have to include these headers explicitly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21525
In the previous patch we enabled integrity checking on all compaction
types. This means that compaction jobs should now fail if they encounter
an SSTable with an invalid checksum or digest.
Add a test to verify this behavior. Test every compaction type with:
* compressed/uncompressed SSTables with invalid checksums
* compressed/uncompressed SSTables with invalid digests
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Split compaction divides the partitions in an existing sstable into two groups and writes them into two new sstables, which replace the original one. The partition count from the original sstable is used as an estimate when writing the new ones, but this estimate is not accurate as the partitions are split between the two new sstables and each will contain only a portion of the original partition count. This also causes the bloom filters to be rebuilt at the end of compaction, as they were initially built with inaccurate estimates.
Fix this by using a better estimate for the output sstables, which is half the original partition count.
Fixes#20253
Improvement; No need to backport.
Closesscylladb/scylladb#20908
* github.com:scylladb/scylladb:
compaction: use better partition estimate for split compaction
compaction::table_state: implement `get_token_range_after_split()` wrapper
replica/table: implement `get_token_range_after_split()` wrappers
tablet_map: introduce `get_token_range_after_split()`
tablet_map: implement existing get_token_range() using the new variant
tablet_map: introduce `get_token_range()` variant
tablet_map: introduce `get_last_token()` variant
Add test to ensure backup tasks properly handle non-existent snapshots
by:
- Verifying backup task reports failure status
- Ensuring error is propagated through task status API
Previously untested edge case when backing up a snapshot that doesn't
exist in the test_backup.py tests.
Refs scylladb/scylladb#21381
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21385
1. Add `retry_strategy` interface and default implementation for exponential back-off retry strategy.
2. Add new S3 related errors, also introduce additional errors to describe pure http errors that has no additional information in the body.
3. Add retries to the s3 client, all retries are coordinated by an instance of `retry_strategy`. In a case of error also parse response body in attempt to retrieve additional and more focused error information as suggested by AWS. See https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html. Introduce `aws_exception` to carry the original `aws_error`.
4. Discard whatever exception is thrown in `abort_upload` when aborting multipart upload since we don't care about cleanly aborting it since there are other means to clean up dangling parts, for example `rclone cleanup` or S3 bucket's Lifecycle Management Policy.
5. Add tests to cover retries, and retry exhaustion. Also add tests for jumbo upload.
6. Add the S3 proxy which is used to randomly inject retryable S3 errors to test the "retry" part of the S3 client. Switch the `s3_test` to use the S3 proxy. `s3_tests` set afloat `put_object` problem that was causing segmentation when retrying, fixed.
7. Extend the `s3_test` to use both `minio` and `proxy` configurations.
8. Add parameter to the proxy to seed the error injection randomization to make it replayable.
fixes: #20611fixes: #20613Closesscylladb/scylladb#21054
* github.com:scylladb/scylladb:
aws_errors: Make error messages more verbose.
test: Make the minio proxy randomization re-playable
test/boost/s3_test: add error injection scenarios to existing test suite
test: Switch `s3_test` to use proxy
test: Add more tests
client: Stop returning error on `DELETE` in multipart upload abortion
client: Fix sigsegv when retrying
client: Add retries
client: Adjust `map_s3_client_exception` to return exception instance
aws_errors: Change aws_error::parse to return std::optional<>
aws_errors: Add http errors mapping into aws_error
client: Add aws_exception mapping
aws_error: Add `aws_exeption` to carry original `aws_error`
aws_errors: Add new error codes
client: Introduce retry strategy
Add a buffer hint to the multishard reader. This is an internal hint, used by the multishard reader to provide a hint to the shard reader, on how much data exactly is needed by the multishard reader from the respective shard. This hint allows eliminating extraneous cross-shard round-trips and possible shard reader evict-recreate cycles. Building on this, repair sets its own row buffer size as the max buffer size on the multishard reader, ensuring that the row buffer is filled with the minimum amount of cross-shard round trips and minimal reader recreation.
To further eliminate unnecessary evictions, this PR also disables the multishard reader's read-ahead which is a mechanism that was designed to reduce latency for user-reads but it can be too aggressive for repair, causing unnecessary extra congestion on the already struggling streaming semaphores.
Refs: https://github.com/scylladb/scylladb/issues/18269
Fixes: https://github.com/scylladb/scylladb/issues/21113
The performance impact was measured with an SCT test, which creates a cluster of 3 nodes with 16 shards, then adds a 4th one with 12 shards.
Currently, it is the bootstrap time which is the worse in the case of mixed shard clusters, see below for the improvement measured during bootstrap:
| | master | buffer-hint | metric |
| ------------ | ------------- | ------------- | --------------------------------------------------- |
| evictions | 0.9M | 93.0K | scylla_database_paused_reads_permit_based_evictions |
| read (bytes) | 9.0T | 3.9T | scylla_reactor_aio_bytes_read |
| read (ops) | 88.0M | 33.5M | scylla_reactor_aio_reads |
| time | 56min | 20min | N/A |
This is a performance improvement, no backport required.
Closesscylladb/scylladb#20815
* github.com:scylladb/scylladb:
test/boost/mutation_reader_test: add test for multishard reader buffer hint
repair/row_level: disable read-ahead
db/config: introduce repair_multishard_reader_enable_read_ahead
readers/multishard: implement the read_ahead flag
replica/database: make_multishard_streaming_reader(): expose the read_ahead parameter
readers/multishard: add read_ahead parameter
repair/row_level: set max buffer size on multishard reader
replica/database: make_multishard_streaming_reader(): expose buffer_hint parameter
db/config: introduce enable_repair_multishard_reader_buffer_hint
readers/multishard: multishard_reader: pass hint to shard_reader
readers/multishard: shard_reader_v2::fill_reader_buffer(): respect the hint
readers/multishard: propagate fill_buffer_hint to shard_reader:fill_reader_buffer()
readers/multishard: shard_reader: extract buffer-fill into its own method
In scylladb/scylladb#19745, view_builder was migrated to group0 and since then it is dependant on group0_service.
Because of this, group0_service should be initialized/destroyed before/after view_builder.
This patch also adds error injection to `raft_server_with_timeouts::read_barrier`, which does 1s sleep before doing the read barrier. There is a new test which reproduces the use after free bug using the error injection.
Fixesscylladb/scylladb#20772scylladb/scylladb#19745 is present in 6.2, so this fix should be backported to it.
Closesscylladb/scylladb#21471
* github.com:scylladb/scylladb:
test/boost/secondary_index_test: add test for use after free
api/raft: use `get_server_with_timeouts().read_barrier()` in coroutines
main,cql_test_env: start group0_service before view_builder
Reproduces scylladb/scylladb#20772.
Add error injection to `raft_server_with_timeouts::read_barrier`,
which does 1s sleep before doing the read barrier.