Tested code paths should not throw exceptions. `scylla_reactor_cpp_exceptions`
metric is used. This is a global metric. To address potential test flakiness,
each test runs multiple times:
- `run_count = 100`
- `cpp_exception_threshold = 10`
If a change in the code introduced an exception, expectation is that the number
of registered exceptions will be > `cpp_exception_threshold` in `run_count` runs.
In which case the test fails.
Replace throwing protocol_exception with returning it as a result
or an exceptional future in the transport server module. This
improves performance, for example during connection storms and
server restarts, where protocol exceptions are more frequent.
In functions already returning a future, protocol exceptions are
propagated using an exceptional future. In functions not already
returning a future, result_with_exception is used.
Notable change is checking v.failed() before calling v.get() in
process_request function, to avoid throwing in case of an
exceptional future.
Refs: #24567
Make make_bytes_ostream and make_fragmented_temporary_buffer accept
writer callbacks that return utils::result_with_exception instead of
forcing them to throw on error. This lets callers propagate failures
by returning an error result rather than throwing an exception.
Introduce buffer_writer_for, bytes_ostream_writer, and fragmented_buffer_writer
concepts to simplify and document the template requirements on writer callbacks.
This patch does not modify the actual callbacks passed, except for the syntax
changes needed for successful compilation, without changing the logic.
Refs: #24567
Previously, connection::handle_error always called f.get() inside a try/catch,
forcing every failed future to throw and immediately catch an exception just to
classify it. This change eliminates that extra throw/catch cycle by first checking
f.failed(), getting the stored std::exception_ptr via f.get_exception(), and
then dispatching on its type via utils::try_catch<T>(eptr).
The error-response logic is not changed - cassandra_exception, std::exception,
and unknown exceptions are caught and processed, and any exceptions thrown by
write_response while handling those exceptions continues to escape handle_error.
Refs: #24567
Add a helper to fetch scylla_transport_cql_errors_total{type="protocol_error"} counter
from Scylla's metrics endpoint. These metrics are used to track protocol error
count before and after each test.
Add cql_with_protocol context manager utility for session creation with parameterized
protocol_version value. This is used for testing connection establishment with
different protocol versions, and proper disposal of successfully established sessions.
The tests cover two failure scenarios:
- Protocol version mismatch in test_protocol_version_mismatch which tests both supported
and unsupported protocol version
- Malformed frames via raw socket in _protocol_error_impl, used by several test functions,
and also test_no_protocol_exceptions test to assert that the error counters never decrease
during test execution, catching unintended metric resets
Refs: #24567
cql, schema: Extend name length limit from 48 to 192 bytes
This commit increases the maximum length of names for keyspaces, tables, materialized views, and indexes from 48 to 192 bytes.
The previous 48-bytes limit was inherited from Cassandra 3 for compatibility. However, this validation was removed in Cassandra 4 and 5 (see CASSANDRA-20389)
and some usage scenarios (such as some feature store workflows generating long table names) now depend on this relaxed constraint.
This change brings ScyllaDB's behavior in line with modern Cassandra versions and better supports these use cases.
The new limit of 192 bytes is derived from underlying filesystem limitations to prevent runtime errors when creating directories for table data.
When a new table is created, ScyllaDB generates a directory for its SSTables. The directory name is constructed from the table name, a dash, and a 32-character UUID.
For a CDC-enabled table, an associated log table is also created, which has the suffix `_scylla_cdc_log` appended to its name.
The directory name for this log table becomes the longest possible representation.
Additionally we reserve 15 bytes for future use, allowing for potential future extensions without breaking existing schemas.
To guarantee that directory creation never fails due to exceeding filesystem name limits, the maximum name length is calculated as follows:
255 bytes (common filesystem limit for a path component)
- 32 bytes (for the 32-character UUID string)
- 1 byte (for the '-' separator)
- 15 bytes (for the '_scylla_cdc_log' suffix)
- 15 bytes (reserved for future use)
----------
= 192 bytes (Maximum allowed name length)
This calculation is similar in principle to the one proposed for Cassandra to fix related directory creation failures (see apache/cassandra/pull/4038).
This patch also updates/adds all associated tests to validate the new 192-byte limit.
The documentation has been updated accordingly.
Fixes#4480
Backport 2025.2: The significantly shorter maximum table name length in Scylla compared to Cassandra is becoming a more common issue for users in the latest release.
Closesscylladb/scylladb#24500
* github.com:scylladb/scylladb:
cql, schema: Extend name length limit from 48 to 192 bytes
replica: Remove unused keyspace::init_storage()
`dirty_memory_manager` tracks two quantities about memtable memory usage:
"real" and "unspooled" memory usage.
"real" is the total memory usage (sum of `occupancy().total_space()`)
by all memtable LSA regions, plus a upper-bound estimate of the size of
memtable data which has already moved to the cache region but isn't
evictable (merged into the cache) yet.
"unspooled" is the difference between total memory usage by all memtable
LSA regions, and the total flushed memory (sum of `_flushed_memory`)
of memtables.
`dirty_memory_manager` controls the shares of compaction and/or blocks
writes when these quantities cross various thresholds.
"Total flushed memory" isn't a well defined notion,
since the actual consumption of memory by the same data can vary over
time due to LSA compactions, and even the data present in memtable can
change over the course of the flush due to removals of outdated MVCC versions.
So `_flushed_memory` is merely an approximation computed by `flush_reader`
based on the data passing through it.
This approximation is supposed to be a conservative lower bound.
In particular, `_flushed_memory` should be not greater than
`occupancy().total_space()`. Otherwise, for example, "unspooled" memory
could become negative (and/or wrap around) and weird things could happen.
There is an assertion in `~flush_memory_accounter` which checks that
`_flushed_memory < occupancy().total_space()` at the end of flush.
But it can fail. Without additional treatment, the memtable reader sometimes emits
data which is already deleted. (In particular, it emites rows covered by
a partition tombstone in a newer MVCC version.)
This data is seen by `flush_reader` and accounted in `_flushed_memory`.
But this data can be garbage-collected by the `mutation_cleaner` later during the
flush and decrease `total_memory` below `_flushed_memory`.
There is a piece of code in `mutation_cleaner` intended to prevent that.
If `total_memory` decreases during a `mutation_cleaner` run,
`_flushed_memory` is lowered by the same amount, just to preserve the
asserted property. (This could also make `_flushed_memory` quite inaccurate,
but that's considered acceptable).
But that only works if `total_memory` is decreased during that run. It doesn't
work if the `total_memory` decrease (enabled by the new allocator holes made
by `mutation_cleaner`'s garbage collection work) happens asynchronously
(due to memory reclaim for whatever reason) after the run.
This patch fixes that by tracking the decreases of `total_memory` closer to the
source. Instead of relying on `mutation_cleaner` to notify the memtable if it
lowers `total_memory`, the memtable itself listens for notifications about
LSA segment deallocations. It keeps `_flushed_memory` equal to the reader's
estimate of flushed memory decreased by the change in `total_memory` since the
beginning of flush (if it was positive), and it keeps the amount of "spooled"
memory reported to the `dirty_memory_manager` at `max(0, _flushed_memory)`.
Fixesscylladb/scylladb#21413
Backport candidate because it fixes a crash that can happen in existing stable branches.
Closesscylladb/scylladb#21638
* github.com:scylladb/scylladb:
memtable: ensure _flushed_memory doesn't grow above total memory usage
replica/memtable: move region_listener handlers from dirty_memory_manager to memtable
dirty_memory_manager tracks two quantities about memtable memory usage:
"real" and "unspooled" memory usage.
"real" is the total memory usage (sum of `occupancy().total_space()`)
by all memtable LSA regions, plus a upper-bound estimate of the size of
memtable data which has already moved to the cache region but isn't
evictable (merged into the cache) yet.
"unspooled" is the difference between total memory usage by all memtable
LSA regions, and the total flushed memory (sum of `_flushed_memory`)
of memtables.
dirty_memory_manager controls the shares of compaction and/or blocks
writes when these quantities cross various thresholds.
"Total flushed memory" isn't a well defined notion,
since the actual consumption of memory by the same data can vary over
time due to LSA compactions, and even the data present in memtable can
change over the course of the flush due to removals of outdated MVCC versions.
So `_flushed_memory` is merely an approximation computed by `flush_reader`
based on the data passing through it.
This approximation is supposed to be a conservative lower bound.
In particular, `_flushed_memory` should be not greater than
`occupancy().total_space()`. Otherwise, for example, "unspooled" memory
could become negative (and/or wrap around) and weird things could happen.
There is an assertion in ~flush_memory_accounter which checks that
`_flushed_memory < occupancy().total_space()` at the end of flush.
But it can fail. Without additional treatment, the memtable reader sometimes emits
data which is already deleted. (In particular, it emites rows covered by
a partition tombstone in a newer MVCC version.)
This data is seen `flush_reader` and accounted in `_flushed_memory`.
But this data can be garbage-collected by the mutation_cleaner later during the
flush and decrease `total_memory` below `_flushed_memory`.
There is a piece of code in mutation_cleaner intended to prevent that.
If `total_memory` decreases during a `mutation_cleaner` run,
`_flushed_memory` is lowered by the same amount, just to preserve the
asserted property. (This could also make `_flushed_memory` quite inaccurate,
but that's considered acceptable).
But that only works if `total_memory` is decreased during that run. It doesn't
work if the `total_memory` decrease (enabled by the new allocator holes made
by `mutation_cleaner`'s garbage collection work) happens asynchronously
(due to memory reclaim for whatever reason) after the run.
This patch fixes that by tracking the decreases of `total_memory` closer to the
source. Instead of relying on `mutation_cleaner` to notify the memtable if it
lowers `total_memory`, the memtable itself listens for notifications about
LSA segment deallocations. It keeps `_flushed_memory` equal to the reader's
estimate of flushed memory decreased by the change in `total_memory` since the
beginning of flush (if it was positive), and it keeps the amount of "spooled"
memory reported to the `dirty_memory_manager` at `max(0, _flushed_memory)`.
The memtable wants to listen for changes in its `total_memory` in order
to decrease its `_flushed_memory` in case some of the freed memory has already
been accounted as flushed. (This can happen because the flush reader sees
and accounts even outdated MVCC versions, which can be deleted and freed
during the flush).
Today, the memtable doesn't listen to those changes directly. Instead,
some calls which can affect `total_memory` (in particular, the mutation cleaner)
manually check the value of `total_memory` before and after they run, and they
pass the difference to the memtable.
But that's not good enough, because `total_memory` can also change outside
of those manually-checked calls -- for example, during LSA compaction, which
can occur anytime. This makes memtable's accounting inaccurate and can lead
to unexpected states.
But we already have an interface for listening to `total_memory` changes
actively, and `dirty_memory_manager`, which also needs to know it,
does just that. So what happens e.g. when `mutation_cleaner` runs
is that `mutation_cleaner` checks the value of `total_memory` before it runs,
then it runs, causing several changes to `total_memory` which are picked up
by `dirty_memory_manager`, then `mutation_cleaner` checks the end value of
`total_memory` and passes the difference to `memtable`, which corrects
whatever was observed by `dirty_memory_manager`.
To allow memtable to modify its `_flushed_memory` correctly, we need
to make `memtable` itself a `region_listener`. Also, instead of
the situation where `dirty_memory_manager` receives `total_memory`
change notifications from `logalloc` directly, and `memtable` fixes
the manager's state later, we want to only the memtable listen
for the notifications, and pass them already modified accordingl
to the manager, so there is no intermediate wrong states.
This patch moves the `region_listener` callbacks from the
`dirty_memory_manager` to the `memtable`. It's not intended to be
a functional change, just a source code refactoring.
The next patch will be a functional change enabled by this.
The `drain` method, cancels all running compactions and moves the
compaction manager into the disabled state. To move it back to
the enabled state, the `enable` method shall be called.
This, however, throws an assertion error as the submission time is
not cancelled and re-enabling the manager tries to arm the armed timer.
Thus, cancel the timer, when calling the drain method to disable
the compaction manager.
Fixes https://github.com/scylladb/scylladb/issues/24504
All versions are affected. So it's a good candidate for a backport.
Closesscylladb/scylladb#24505
As test/cqlpy/README.md explains, the way to tell the run-cassandra
script which version of Cassandra should be run is through the
"CASSANDRA" variable, for example:
CASSANDRA=$HOME/apache-cassandra-4.1.6/bin/cassandra \
test/cqlpy/run-cassandra test_file.py::test_function
But all the Cassandra scripts, of all versions, have one strange
feature: If you set CASSANDRA_HOME, then instead of running the
actual Cassandra script you tried to run (in this case, 4.1.6), the
Cassandra script goes to run the other Cassandra from CASSANDRA_HOME!
This means that if a user happens to have, for some reason, set
CASSANDRA_HOME, then the documented "CASSANDRA" variable doesn't work.
The simple fix is to clear CASSANDRA_HOME in the environment that
run-cassandra passes to Cassandra.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#24546
Token metadata api is initialized before gossiper is started.
get_host_id_map REST endpoint cannot function without the fully
initialized gossiper though. The gossiper is started deep in
the join_cluster call chain, but if we move token_metadata api
initialization after the call it means that no api will be available
during bootstrap. This is not what we want.
Make a simple fix by returning an error from the api if the gossiper is
not initialized yet.
Fixes: #24479Closesscylladb/scylladb#24575
File name for the boost test do not use run_id, so each consequent run will
overwrite the logs from the previous one. If the first repeat fails, and the
second will pass, it overwrites the failed log. This PR allows saving the
failed one.
Closesscylladb/scylladb#24580
The following was seen:
```
!WARNING | scylla[6057]: [shard 12:strm] seastar_memory - oversized allocation: 212992 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at
[Backtrace #0]
void seastar::backtrace<seastar::current_backtrace_tasklocal()::$_0>(seastar::current_backtrace_tasklocal()::$_0&&, bool) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:89
(inlined by) seastar::current_backtrace_tasklocal() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:99
seastar::current_tasktrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:136
seastar::current_backtrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:169
seastar::memory::cpu_pages::warn_large_allocation(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:848
seastar::memory::allocate_slowpath(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:911
operator new(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:1706
std::allocator<dht::token_range_endpoints>::allocate(unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/allocator.h:196
(inlined by) std::allocator_traits<std::allocator<dht::token_range_endpoints> >::allocate(std::allocator<dht::token_range_endpoints>&, unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/alloc_traits.h:515
(inlined by) std::_Vector_base<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> >::_M_allocate(unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_vector.h:380
(inlined by) void std::vector<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> >::_M_realloc_append<dht::token_range_endpoints const&>(dht::token_range_endpoints const&) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/vector.tcc:596
locator::describe_ring(replica::database const&, gms::gossiper const&, seastar::basic_sstring<char, unsigned int, 15u, true> const&, bool) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_vector.h:1294
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<std::vector<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> > >::promise_type>::resume() const at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/coroutine:242
(inlined by) seastar::internal::coroutine_traits_base<std::vector<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> > >::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:80
seastar::reactor::do_run() at ./build/release/seastar/./build/release/seastar/./seastar/src/core/reactor.cc:2635
std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0>::_M_invoke(std::_Any_data const&) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/reactor.cc:4684
```
Fix by using chunked_vector.
Fixes#24158Closesscylladb/scylladb#24561
Currently, CI uses several nodes to execute the different modes to
reduce overall time for execution. During copying the results from nodes
to the main job test reports will be overwritten, since they are using
the same directory and the same name. This patch allows to
distinguishing these results and not overwrite them.
Closesscylladb/scylladb#24559
It just std::move-s a buffer and a semaphore_units objects, both moves
are noexcept, so is the constructor itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#24552
In f96d30c2b5
we introduced the maintenance service, which is an additional
instance of auth::service. But this service has a somewhat
confusing 2-level startup mechanism: it's initialized with
sharded<Service>::start and then auth::service::start
(different method with the same name to confuse even more).
When maintenance_socket was disabled (default setting), the code
did only the first part of the startup. This registered a config
observer but didn't create a permission_cache instance.
As a result, a crash on SIGHUP when config is reloaded can occur.
Fixes: https://github.com/scylladb/scylladb/issues/24528
Backport: all not eol versions since 6.0 and 2025.1
Closesscylladb/scylladb#24527
* github.com:scylladb/scylladb:
test: add test for live updates of permissions cache config
main: don't start maintenance auth service if not enabled
This commit increases the maximum length of names for keyspaces, tables, materialized views, and indexes from 48 to 192 bytes.
The previous 48-bytes limit was inherited from Cassandra 3 for compatibility. However, this validation was removed in Cassandra 4 and 5 (see CASSANDRA-20389)
and some usage scenarios (such as some feature store workflows generating long table names) now depend on this relaxed constraint.
This change brings ScyllaDB's behavior in line with modern Cassandra versions and better supports these use cases.
The new limit of 192 bytes is derived from underlying filesystem limitations to prevent runtime errors when creating directories for table data.
When a new table is created, ScyllaDB generates a directory for its SSTables. The directory name is constructed from the table name, a dash, and a 32-character UUID.
For a CDC-enabled table, an associated log table is also created, which has the suffix `_scylla_cdc_log` appended to its name.
The directory name for this log table becomes the longest possible representation.
Additionally we reserve 15 bytes for future use, allowing for potential future extensions without breaking existing schemas.
To guarantee that directory creation never fails due to exceeding filesystem name limits, the maximum name length is calculated as follows:
255 bytes (common filesystem limit for a path component)
- 32 bytes (for the 32-character UUID string)
- 1 byte (for the '-' separator)
- 15 bytes (for the '_scylla_cdc_log' suffix)
- 15 bytes (reserved for future use)
----------
= 192 bytes (Maximum allowed name length)
This calculation is similar in principle to the one proposed for Cassandra to fix related directory creation failures (see apache/cassandra/pull/4038).
This patch also updates/adds all associated tests to validate the new 192-byte limit.
The documentation has been updated accordingly.
In f96d30c2b5
we introduced the maintenance service, which is an additional
instance of auth::service. But this service has a somewhat
confusing 2-level startup mechanism: it's initialized with
sharded<Service>::start and then auth::service::start
(different method with the same name to confuse even more).
When maintenance_socket was disabled (default setting), the code
did only the first part of the startup. This registered a config
observer but didn't create a permission_cache instance.
As a result, a crash on SIGHUP when config is reloaded can occur.
This PR adds an upgrade test for SSTable compression with shared dictionaries, and adds some bits to pylib and test.py to support that.
In the series, we:
1. Mount `$XDG_CACHE_DIR` into dbuild.
2. Add a pylib function which downloads and installs a released ScyllaDB package into a subdirectory of `$XDG_CACHE_DIR/scylladb/test.py`, and returns the path to `bin/scylla`.
3. Add new methods and params to the cluster manager, which let the test start nodes with historical Scylla executables, and switch executables during the test.
4. Add a test which uses the above to run an upgrade test between the released package and the current build.
5. Add `--run-internet-dependent-tests` to `test.py` which lets the user of `test.py` skip this test (and potentially other internet-dependent tests in the future).
(The patch modifying `wait_for_cql_and_get_hosts` is a part of the new test — the new test needs it to test how particular nodes in a mixed-version cluster react to some CQL queries.)
This is a follow-up to #23025, split into a separate PR because the potential addition of upgrade tests to `test.py` deserved a separate thread.
Needs backport to 2025.2, because that's where the tested feature is introduced.
Fixes#24110Closesscylladb/scylladb#23538
* github.com:scylladb/scylladb:
test: add test_sstable_compression_dictionaries_upgrade.py
test.py: add --run-internet-dependent-tests
pylib/manager_client: add server_switch_executable
test/pylib: in add_server, give a way to specify the executable and version-specific config
pylib: pass scylla_env environment variables to the topology suite
test/pylib: add get_scylla_2025_1_executable()
pylib/scylla_cluster: give a way to pass executable-specific options to nodes
dbuild: mount "$XDG_CACHE_HOME/scylladb"
In libstdc++15, the internal structure of an unordered container
hashtable node changed from _M_storage._M_storage.__data to just
_M_storage._M_storage (though the layout is the same). Adjust
the code to work with both variants.
Closesscylladb/scylladb#24549
The contract in mutation_reader.hh says:
```
// pr needs to be valid until the reader is destroyed or fast_forward_to()
// is called again.
future<> fast_forward_to(const dht::partition_range& pr) {
```
`test_fast_forwarding_combined_reader_is_consistent_with_slicing` violates
this by passing a temporary to `fast_forward_to`.
Fix that.
Fixesscylladb/scylladb#24542Closesscylladb/scylladb#24543
This patch intends to give an overview of where, when and how we store
data in S3 and provide a quick set of commands
which help gain local access to the data in case there is a need for
manual intervention.
The patch also collects in the same place links/descriptions for all
formats we use in S3.
Fixes#22438
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Closesscylladb/scylladb#24323
Revamped the `range` class to actively manage its state by enforcing validation on all modifications. This prevents overflow, invalid states, and ensures the object size does not exceed the 5TiB limit in S3. This should address and prevent future problems related to this issue https://github.com/minio/minio/issues/21333
No backport needed since this problem related only to this change https://github.com/scylladb/scylladb/pull/23880Closesscylladb/scylladb#24312
* github.com:scylladb/scylladb:
s3_client: headers cleanup
s3_client: Refactor `range` class for state validation
Refs #24447
Patch adding this somehow managed to leave out the thread_local
specifier. While gnutls cert object can be shared across shards
just fine, the actual shared_ptr here cannot, thus we could
cause memory errors.
Closesscylladb/scylladb#24514
This reverts commit 0b516da95b, reversing
changes made to 30199552ac. It breaks
cluster.random_failures.test_random_failures.test_random_failures
in debug mode (at least).
Fixes#24513
Revamped the `range` class to actively manage its state by enforcing validation on all modifications. This prevents overflow, invalid states, and ensures the object size does not exceed the 5TiB limit in S3.
With current changes, pytest executes boost tests. Gathering metrics added to the pytest BoostFacade and UnitFacade to have the possibility to get them for C++ test as previously.
Since boost, raft, unit, and ldap directories aren't executed by test.py, suite.yaml files are renamed to test_config.yaml to preserve the old way of test configuration and removing them from execution by test.py
Pytest executes all modes by itself, JUnit report for the C++ test will be one for the run. That means that there is no possibility to output them in testlog in different folders. So testlog/report directory is used to store all kinds of reports generated during tests. JUnit reports should be testlog/report/junit, Allure reports should be in testlog/report/allure.
**Breaking changes:**
1. Terminal output changed. test.py will run pytest for the next directories: `test/boost`, `test/ldap`, `test/raft`, `test/unit`. `test.py` will blindly translate the output of the pytest to the terminal. Then when all these tests are finished, `test.py` will continue to show previous output for the rest of the test.
2. The format of execution of C++ test directories mentioned above has been changed. Now it will be a simple path to the file with extension. For example, instead of `boost/aggregate_fcts_test` now you need to use `test/boost/aggregate_fcts_test.cc`
3. This PR creates a spike in test amount. The previous logic was to consolidate the boost results from different runs and different modes to one report. So for the three repeats and three modes (nine test results) in CI was shown one result. Now it shows nine results, with differentiating them by mode and run.
**Note:**
Pytest uses pytest-xdist module to run tests in parallel. The Frozen toolchain has this dependency installed, for the local use, please install it manually.
Changes for CI https://github.com/scylladb/scylla-pkg/pull/4949. It will be merged after the current PR will be in master. Short disruption is expected, while PR in scylla-pkg will not be merged.
Fixes: https://github.com/scylladb/qa-tasks/issues/1777Closesscylladb/scylladb#22894
* github.com:scylladb/scylladb:
test.py: clean code that isn't used anymore
test.py: switch off C++ tests from test.py discovery
test.py: Integrate pytest c++ test execution to test.py
The one collects map<ip, state> then converts it to a jsonable vector of
helper objects with key and value members. This patch removes the
intermediate map and creates the vector instantly. With that change the
handler makes less data manipulations and behaves like the
get_all_endpoint_states one.
Very similar change was done in 12420dc644 with get_host_to_id_map
handler.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#24456
Our documentation docs/alternator/new-apis.md claims that Alternator TTL
does not work with tablets, due to issue #16567. However, we fixed that
issue in commit de96c28625. So let's drop
the outdated statement that it doesn't work.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#24427
Revise how we report statistics for `chunked_download_source`. Ensure
metrics for downloaded but unconsumed data are visible, as they do not
contribute to read amplification, which is tracked separately.
Closesscylladb/scylladb#24491
Applier fiber needs local storage, so before shutting down local storage we need to make sure that group0 is stopped.
We also improve the logs for the case when `gate_closed_exception` is thrown while a mutation is being written.
Fixes [scylladb/scylladb#24401](https://github.com/scylladb/scylladb/issues/24401)
Backport: no backport -- not safe and the problem is minor.
Closesscylladb/scylladb#24418
* github.com:scylladb/scylladb:
storage_service: test_group0_apply_while_node_is_being_shutdown
main.cc: fix group0 shutdown order
storage_proxy: log gate_closed_exception
This patch updates alternator/stats.cc and the get_description.py
configuration (metrics-config.yml) to restore compatibility with
per-table alternator metrics in the documentation generation process.
Previously, the group name for metrics was selected using an inline
expression like (has_table)? "alternator_table" : "alternator", which
made it difficult to maintain a straightforward mapping in the
configuration file. With this change, the group name is now assigned to
a variable in alternator/stats.cc, allowing metrics-config.yml to map
group names directly. This makes the configuration easier to maintain
and enables get_description.py to document both global and per-table
metrics correctly.
This is a minimal, targeted fix to get the documentation working again
with the new per-table metrics format.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closesscylladb/scylladb#24509
An interval object stores five booleans: start()->is_inclusive(),
a boolean since start() itself is an std::optional, two more for
end(), and is_singular(). Due to bad packing, these five booleans
occupy 8 bytes each, for a total of 40 bytes.
Re-pack the interval class by storing those booleans explicitly
close by. Since we lose std::optional's ability to store
a maybe-constructed object, we re-implement it using anonymous
unions and therefore have to implement the 5 special methods.
This helps saves space when vectors of intervals are used, as
seen in #3335 for example.
We'd like to change the data layout of `interval` to save space.
As a result, start() and end() which return references to data
members must return objects (not references). Since we'd like
to maintain zero-copy for these functions, we change them to
return objects containing references (rather than references
to objects), avoiding copying of potentially expensive objects.
We repurpose the interval_bound class to hold references (by
instantiating it with `const T&` instead of `T`) and provide
converting constructors. To make transform_bounds() retain
zero-copy, we add start() and end() that take *this by
rvalue reference.
We are about to change start() to return a proxy object rather
than a `const interval_bound<T>&`. This is generally transparent,
except in one case: `auto x = i.start()`. With the current implementation,
we'll copy object referred to and assign it to x. With the planned
implementation, the proxy object will be assigned to `x`, but it
will keep referring to `i`.
To prevent such problems, rename start() to start_ref() and end()
to end_ref(). This forces us to audit all calls, and redirect calls
that will break to new start_copy() and end_copy() methods.
In this series, we will make interval manage its memory directly,
specifically it will directly construct and destroy T values that
it contains rather than let std::optional<T> manage those values
itself.
Add tests that expose bugs encountered during development (actually,
review) of this series. The tests pass before the series, fail
with series as it was before fixing, and pass with the series as
it is now.
The tests use a class maybe_throwing_interval_payload that can
be set to throw at strategic locations and exercise all the interesting
interval shapes.
Refactor the voter handler logic to only pass around node IDs (`raft::server_id`), instead of pairs of IDs and node descriptor references. Node descriptors can always be efficiently retrieved from the original nodes map, which remains valid throughout the calculation.
This change reduces unnecessary reference passing and simplifies the code. All node detail lookups are now performed via the central nodes map as needed.
Additional cleanup has been done:
* removing redundant comments (that just repeat what the code does)
* use explicit comparators for the datacenter and rack information priorities (instead of the comparison operator) to be more explicit about the prioritization
Fixes: scylladb/scylladb#24035
No backport: This change does not fix any bug and doesn't change the behavior, just cleans up the code in master, therefore no backport is needed.
Closesscylladb/scylladb#24452
* https://github.com/scylladb/scylladb:
raft: simplify voter handler code to not pass node references around
raft: reformat voter handler for consistent indentation
raft: use explicit priority comparators for datacenters and racks
raft: clean up voter handler by removing redundant comments