Refactor resource_gather.py to not create the initial cgroup when the process it's already in it. This will allow not going deeper, creating again and again the same cgroup with each test.py execution when the terminal isn't closed.
Add creation of own event loop in case it's not exists. This needed to be able to work with
test.py that creates loop and with pytest that not create loop.
Due to the changes in creating schemas with base info the
test_schema_is_recovered_after_dying seems to be flaky when checking
that the schema is actually lost after 'grace_period'. We don't
actually guarantee that the the schema will be lost at that exact
moment so there's no reason to test this. To remove the flakiness,
we remove the check and the related sleep, which should also slightly
improve the speed of this test.
Dropping an index is a schema change of its base table and
a schema drop of the index's materialized view. This combination
of schema changes used to cause issues during view building, because
when a view schema was dropped, it wasn't getting updated with the
new version of the base schema, and while the view building was
in progress, we would update the base schema for the base table
mutation reader and try generating updates with a view schema that
wasn't compatible with the base schema, failing on an `on_internal_error`.
In this patch we add a test for this scenario. We create an index,
halt its view building process using an injection, and drop it.
If no errors are thrown, the test succeeds.
The test was failing before https://github.com/scylladb/scylladb/pull/23337
and is passing afterwards.
In the previous commits we made sure that the base info is not dependent
on the base schema version, and the info dependent on the base schema
version is calculated when it's needed. In this patch we remove the
unnecessary re-setting of the base_info.
The set_base_info method isn't removed completely, because it also has
a secondary function - zeroing the view_info fields other than base_info.
Because of this, in this patch we rename it accordingly and limit its
use to the updates caused by a base schema change.
In the following patch we plan to remove the base schema from the base_info
to make the base_info immutable. To do that, we first prepare the schema
registry for the change; we need to be able to create view schemas from
frozen schemas there and frozen schemas have no information about the base
table. Unless we do this change, after base schemas are removed from the
base info, we'll no longer be able to load a view schema to the schema registry
without looking up the base schema in the database.
This change also required some updates to schema building:
* we add a method for unfreezing a view schema with base info instead of
a base schema
* we make it possible to use schema_builder with a base info instead of
a base schema
* we add a method for creating a view schema from mutations with a base info
instead of a base schema
* we add a view_info constructor withat base info instead of a base schema
* we update the naming in schema_registry to reflect the usage of base info
instead of base schema
Currently, the base_info may or may not be set in view schemas.
Even when it's set, it may be modified. This necessitates extra
checks when handling view schemas, as well as potentially causing
errors when we forget to set it at some point.
Instead, we want to make the base info an immutable member of view
schemas (inside view_info). The first step towards that is making
sure that all newly created schemas have the base info set.
We achieve that by requiring a base schema when constructing a view
schema. Unfortunately, this adds complexity each time we're making
a view schema - we need to get the base schema as well.
In most cases, the base schema is already available. The most
problematic scenario is when we create a schema from mutations:
- when parsing system tables we can get the schema from the
database, as regular tables are parsed before views
- when loading a view schema using the schema loader tool, we need
to load the base additionally to the view schema, effectively
doubling the work
- when pulling the schema from another node - in this case we can
only get the current version of the base schema from the local
database
Additionally, we need to consider the base schema version - when
we generate view updates the version of the base schema used for
reads should match the version of the base schema in view's base
info.
This is achieved by selecting the correct (old or new) schema in
`db::schema_tables::merge_tables_and_views` and using the stored
base schema in the schema_registry.
The query may fail also on a no_such_keyspace
exception, which generates the following cql error:
```
Error from server: code=2200 [Invalid query] message="Can\'t find a keyspace test_1745198244144_qoohq"
```
Extend the pytest.raises match expression to include
this error as well.
Fixes#23812
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#23875
Both test.py and test/cqlpy/run run many test functions against the same
Scylla process. In the resulting log file, it is hard to understand which
log messages are related to which test. In this patch, we log a message
(using the "/system/log" REST API) every time a test is started or ends.
The messages look like this:
INFO 2025-04-22 15:10:44,625 [shard 1:strm] api - /system/log:
test/cqlpy: Starting test_lwt.py::test_lwt_missing_row_with_static
...
INFO 2025-04-22 15:10:44,631 [shard 0:strm] api - /system/log:
test/cqlpy: Ended test_lwt.py::test_lwt_missing_row_with_static
We already had a similar feature in test/alternator, added three years
ago in commit b0371b6bf8. The implementation
is similar but not identical due to different available utility functions,
and in any case it's very simple.
While at it, this patch also fixes the has_rest_api() to timeout after
one second. Without this, if the REST API is blocked in a way that
a connection attempt just hangs, the tests can hang. With the new
timeout, the test will hang for a second, realize the REST API is
not available, and remember this decision (the next tests will not
wait one second again). We had the same bug in Alternator, and fixed
it in 758f8f01d7. This one second "pause"
will only happen if the REST API port is blocked - in the more typical
case the REST API port is just not listening but not blocked, and the
failure will be noticed immediately and won't wait a whole second.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#23857
The test populates a table with 50k rows, creates a view on that table
and then compares the time spent in streaming vs. gossip scheduling
groups. It only takes 10s in dev mode on my machine, but is much slower
in debug mode in CI - building the view doesn't finish within 2 minutes.
The bigger the view to build, the more accurrate the measurement;
moreover, the test scenario isn't interesting enough to be worth running
it in debug mode as this should be covered by other tests. Therefore,
just skip this test in debug mode.
Fixes: scylladb/scylladb#23862Closesscylladb/scylladb#23866
This PR contains changes that do not add new functionality, and have small refactoring of the existing code.
The most significant change though is switching the SQLite writer from a singleton to a thread locking mechanism that will be needed later on.
This PR is an extraction of several commits from https://github.com/scylladb/scylladb/pull/22894 as reviewer [request](https://github.com/scylladb/scylladb/pull/22894?notification_referrer_id=NT_kwDOACiLR7MxNDg0ODk2MDU1MjoyNjU3MDk1¬ifications_query=reason%3Aparticipating#pullrequestreview-2778582278).
Closesscylladb/scylladb#23867
* github.com:scylladb/scylladb:
test.py: move the readme file for LDAP tests to the correct location
test.py: eliminate deprecation warning for xml.etree.ElementTree.Element
test.py: align the behavior of max-failures parameter with pytest maxfail
test.py: fix typo in toxiproxy name parameter
test.py: add locking to the sqlite writer for resource gather
test.py: add sqlite datetime adapter for resource gather
test.py: change the parameter for get_modes_to_run()
This commit adds a test to verify that a query with long timeout
doesn't block ERM on failure. The motivation for the test is
fixing scylladb#21831.
This commit:
- add test_long_query_timeout_erm
This commit adds to ManagerClient a get_cql_exclusive function that
allows creating a cql connection with WhiteListRoundRobinPolicy for
a single server. Such connection is useful in tests that kill nodes to
make sure that the live node handles the queries. Before this commit,
some tests used cluster_con from test/cluster/conftest.py, and after
this commit test can start to use a method from MangerClient.
This change:
- Extend ManagerClient con_gen type to allow LoadBalancingPolicy arg
- Implement get_cql_exclusive()
SQLite blocking the DB during writes, so it's not possible to make writes from
several thread. To be able to gather metrics in several threads, we need a
locking mechanism for threads during writes. So thread will not try to
write metrics while another thread is performing writes.
Change the parameter for get_modes_to_run() from session to config to
narrow the scope, and prepare it to later use in method that do not have
access to the session, but have access to the config object
This PR introduces a cleanup mechanism in s3_tests to remove uploaded objects after the test completes, ensuring a clean testing environment. Additionally, the recently added test has been refactored and split into smaller, more maintainable parts, improving readability and extending its coverage to include the "proxied" case.
As these changes primarily improve code aesthetics and maintainability, backporting is not necessary.
Refs: https://github.com/scylladb/scylladb/issues/23830Closesscylladb/scylladb#23828
* github.com:scylladb/scylladb:
s3_tests: Improve and extend copy object test coverage
s3_tests: Implement post-test cleanup for uploaded objects
The test/scylla_gdb suite needs Scylla to have been built with debug
symbols - which is NOT the case for the dev build. So the script
test/scylla_gdb/run attempts to recognize when a developer runs it
on an executable with the debug symbols missing - and prints a clear error.
Unfortunately, as we noticed in #10863, and again in #23832, because
wasmtime is compiled with debug symbols and linked with Scylla,
build/dev/scylla "pretends" to have debug symbols, foiling the check
in test/scylla_gdb/run. Reviewers rejected two solutions to this problem
(pull requests #10865 and #10923), so in pull request #10937 I added
a cosmetic solution just for test/scylla_gdb: in test/scylla_gdb/conftest.py
we check that there are **really** debug symbols that interest us,
and if not, exit immediately instead of failing each test separately.
For some reason, the sys.exit() we used is no longer effective - it
no longer exits pytest, so in this patch we use pytest.exit() instead.
Fixes#23832 (sort of, we leave build/dev/scylla with the fake claim
that it has debug symbols, but test/scylla_gdb will handle this
situation more gracefully).
Closesscylladb/scylladb#23834
The test `test_mv_write_to_dead_node` currently uses a timeout of 60
seconds for remove_node, after it was increased from 30 seconds to fix
scylladb/scylladb#22953. Apparently it is still too low, and it was
observed to fail in debug mode.
Normally remove_node uses a default timeout of TOPOLOGY_TIMEOUT = 1000
seconds, but the test requires a timeout which is shorter than 5
minutes, because it is a regression test for an issue where MV updates
hold topology changes for more than 5 minutes, and we want to verify in
the test that the topology change completes in less than 5 minutes.
To resolve the issue, we set the test to skip in debug mode, because the
remove node operation is unpredictably slow, and we increase the timeout
to 180 seconds which is hopefully enough time for remove_node in
non-debug modes, and still sufficient to satisfy the test requirements.
Fixesscylladb/scylladb#22530Closesscylladb/scylladb#23833
Refactored the copy object test to enhance readability and maintainability.
The test was simplified and split into smaller, more focused parts.
Additionally, a "proxied" variant of the test was introduced to expand
coverage.
Fixes the following scenario:
1. Scale out adds new nodes to each rack
2. Table is created - all tablets are allocated to new nodes because they have low load
3. Rebalancing moves tablets from old nodes to new nodes - table balance for the new table is not fixed
We're wrong to try to equalize global load when allocating tablets,
and we should equalize per-table load instead, and let background load
balancing fix it in a fair way. It will add to the allocated storage
imbalance, but:
1. The table is initially empty, so doesn't impact actual storage imbalance.
2. It's more important to avoid overloading CPU on the nodes - imbalance hurts this aspect immediately.
3. If the table was created before imbalance was formed, we would end up in the same situation as in the problematic scenario after the patch.
4. It's the job of the load balancing to keep up with storage growing, and if it's not, scale out should kick in.
Before we have CPU-aware tablet allocation, and thus can prove we have
CPU capacity on the small nodes, we should respect per-table balance
as this is the way in which we achieve full CPU utilization.
Fixes#23631
Backport to 2025.1 because load imbalance is a serious problem in production.
Closesscylladb/scylladb#23708
* github.com:scylladb/scylladb:
tablets: Equalize per-table balance when allocating tablets for a new table
load_sketch: Tolerate missing tablet_map when selecting for a given table
tests: tablets: Simplify tests by moving common code to topology_builder
Changing DC or rack on a node which was already bootstrapped is, in
case of vnodes, very unsafe (almost guaranteed to cause data loss or
unavailability), and is outright not supported if the cluster has
a tablet-backed keyspaces. Moreover, the possibility of doing that
makes it impossible to uphold some of the invariants promised by
the RF-rack-valid flag, which is eventually going to become
unconditionally enabled.
Get rid of the above problems by removing the possibility of changing
the DC / rack of a node. A node will now fail to start if its snitch
reports a different DC or rack than the one that was reported during the
first boot.
Fixes: scylladb/scylladb#23278Fixes: scylladb/scylladb#22869
Marking for backport to 2025.1, as this is a necessary part of the RF-rack-valid saga
Closesscylladb/scylladb#23800
* github.com:scylladb/scylladb:
doc: changing topology when changing snitches is no longer supported
test: cluster: introduce test_no_dc_rack_change
storage_service: don't update DC/rack in update_topology_with_local_metadata
main: make dc and rack immutable after bootstrap
test: cluster: remove test_snitch_change
The scylla_gdb tests verify, as a sanity check, that the executable
was built with debug information. They do so via file(1).
In Fedora 42, file(1) crashes on ELF files that have interpreter pathnames
larger than 128 characters[1]. This was later fixed[2], but the fix is not
in any release.
Work around the problem by using objdump instead of file.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2354970
[2] b3384a1fbfClosesscylladb/scylladb#23823
Currently log files have information about run_id twice:
cluster.object_store_test_backup.10.test_abort_restore_with_rpc_error.dev.10_cluster.log
However, sometimes the first run_id can be incorrect:
cluster.object_store_test_backup.1.test_abort_restore_with_rpc_error.dev.10_cluster.log
Removing first run_id in the name to not face this issue and because
it's actually redundant.
Removing creation empty file for scylla manager log, since it redundant
and was done as incorrect assumption on the root cause of the fail.
Add extension to the stacktrace file, so it will be opened in the
browser in Jenkins in the new tab instead of downloading it.
Fixes: https://github.com/scylladb/scylladb/issues/23731Closesscylladb/scylladb#23797
It mirrors the sstable method of the same name, which is public. With ->
operator, it's just as convenient to call it directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
... and replace get_sstable() with it. It's more natural (despite having
the only user) to consider the class to be yet another "pointer" to an
sstable.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Update the regular expression in `check_node_log_for_failed_mutations` to avoid
false test failures when DEBUG-level logging is enabled.
Fixesscylladb/scylladb#23688Closesscylladb/scylladb#23658
Fixes#23774
Test code falls into same when_all issue as http client did.
Avoid passing exceptions through this, and instead catch and
report in worker lambda.
Closesscylladb/scylladb#23778
Implement the CopyObject API to directly copy S3 object from one location to another. This implementation consumes zero networking overhead on the client side since the object is copied internally by S3 machinery
Usage example: Backup of tiered SSTables - you already have SSTables on S3, CopyObject is the ideal way to go
No need to backport since we are adding new functionality for a future use
Closesscylladb/scylladb#23779
* github.com:scylladb/scylladb:
s3_client: implement S3 copy object
s3_client: improve exception message
s3_client: reposition local function for future use
This PR enhances S3 throughput by leveraging every available shard to upload backup files concurrently. By distributing the load across multiple shards, we significantly improve the upload performance. Each shard retrieves an SSTable and processes its files sequentially, ensuring efficient, file-by-file uploads.
To prevent uncontrolled fiber creation and potential resource exhaustion, the backup task employs a directory semaphore from the sstables_manager. This mechanism helps regulate concurrency at the directory level, ensuring stable and predictable performance during large-scale backup operations.
Refs #22460fixes: #22520
```
===========================================
Release build, master, smp-16, mem-32GiB
Bytes: 2342880184, backup time: 9.51 s
===========================================
Release build, this PR, smp-16, mem-32GiB
Bytes: 2342891015, backup time: 1.23 s
===========================================
```
Looks like it is faster at least x7.7
No backport needed since it (native backup) is still unused functionality
Closesscylladb/scylladb#23727
* github.com:scylladb/scylladb:
backup: Add test for invalid endpoint
backup_task: upload on all shards
backup_task: integrate sharded storage manager for upload
Commit 14bf09f447 added a single-chunk layout to `managed_bytes`, which makes the overhead of `managed_bytes` smaller in the common case of a small buffer.
But there was a bug in it. In the copy constructor of `managed_bytes`, a copy of a single-chunk `managed_bytes` is made single-chunk too.
But this is wrong, because the source of the copy and the target of the copy might have different preferred max contiguous allocation sizes.
In particular, if a `managed_bytes` of size between 13 kiB and 128 kiB is copied from the standard allocator into LSA, the resulting `managed_bytes` is a single chunk which violates LSA's preferred allocation size. (And therefore is placed by LSA in the standard allocator).
In other words, since Scylla 6.0, cache and memtable cells between 13 kiB and 128 kiB are getting allocated in the standard allocator rather than inside LSA segments.
Consequences of the bug:
1. Effective memory consumption of an affected cell is rounded up to the nearest power of 2.
2. With a pathological-enough allocation pattern (for example, one which somehow ends up placing a single 16 kiB memtable-owned allocation in every aligned 128 kiB span), memtable flushing could theoretically deadlock, because the allocator might be too fragmented to let the memtable grow by another 128 kiB segment, while keeping the sum of all allocations small enough to avoid triggering a flush. (Such an allocation pattern probably wouldn't happen in practice though).
3. It triggers a bug in reclaim which results in spurious allocation failures despite ample evictable memory.
There is a path in the reclaimer procedure where we check whether reclamation succeeded by checking that the number of free LSA segments grew.
But in the presence of evictable non-LSA allocations, this is wrong because the reclaim might have met its target by evicting the non-LSA allocations, in which case memory is returned directly to the standard allocator, rather than to the pool of free segments.
If that happens, the reclaimer wrongly returns `reclaimed_nothing` to Seastar, which fails the allocation.
Refs (possibly fixes) https://github.com/scylladb/scylladb/issues/21072
Fixes https://github.com/scylladb/scylladb/issues/22941
Fixes https://github.com/scylladb/scylladb/issues/22389
Fixes https://github.com/scylladb/scylladb/issues/23781
This is a regression fix, should be backported to all affected releases.
Closes scylladb/scylladb#23782
* github.com:scylladb/scylladb:
managed_bytes_test: add a reproducer for #23781
managed_bytes: in the copy constructor, respect the target preferred allocation size
There are two tests which test incremental read repair: one with row the other with partition tombstones. The tests currently force vnodes, by creating the test keyspace with {'enabled': false}. Even so, the tests were found to be flaky so one of them are marked for skip. This commit does the following changes:
* Make the tests use tablets by creating the test keyspace with tablets.
* Change the way the tests write data so it works with tablets: currently the tests use scylla-sstable write + upload but this won't work with tablets since upload with tablets implies --load-and-stream which means data is streamed to all replicas (no difference created between nodes). Switch to the classic stop-node + write to other replica with CL=ONE.
* Remove the skip added to the partition-tombstone test variant.
Fixes: #21179
Test improvement, no backport required.
Closesscylladb/scylladb#23167
* github.com:scylladb/scylladb:
wip
test/cluster/test_read_repair: make incremental test work with tablets
Changing DC or rack on a node which was already bootstrapped is, in
case of vnodes, very unsafe (almost guaranteed to cause data loss or
unavailability), and is outright not supported if the cluster has
a tablet-backed keyspaces. Moreover, the possibility of doing that
makes it impossible to uphold some of the invariants promised by
the RF-rack-valid flag, which is eventually going to become
unconditionally enabled.
Get rid of the above problems by removing the possibility of changing
the DC / rack of a node. A node will now fail to start if its snitch
reports a different DC or rack than the one that was reported during the
first boot.
Fixes: scylladb/scylladb#23278
Fixes the following scenario:
1. Scale out adds new nodes to each rack
2. Table is created - all tablets are allocated to new nodes because they have low load
3. Rebalancing moves tablets from old nodes to new nodes - table balance for the new table is not fixed
We're wrong to try to equalize global load when allocating tablets,
and we should equalize per-table load instead, and let background load
balancing fix it in a fair way. It will add to the allocated storage
imbalance, but:
1. The table is initially empty, so doesn't impact actual storage imbalance.
2. It's more important to avoid overloading CPU on the nodes - imbalance hurts this aspect immediately.
3. If the table was created before imbalance was formed, we would end up in the same situation in the problematic scenario after the patch.
4. It's the job of the load balancing to keep up with storage growing, and if it's not, scale out should kick in.
Before we have CPU-aware tablet allocation, and thus can prove we have
CPU capacity on the small nodes, we should respect per-table balance
as this is the way in which we achieve full CPU utilization.
Fixes#23631
* During the development phase, the backup functionality broke because we lacked a test that runs backup with an invalid endpoint. This commit adds a test to cover that scenario.
* Add checking for the expected error to be propagated from failing/aborted backup
This series adds support for reporting consumed capacity in BatchGetItem operations in Alternator.
It includes changes to the RCU accounting logic, exposing internal functionality to support batch-specific behavior, and adds corresponding tests for both simple and complex use cases involving multiple tables and consistency modes.
Need backporting to 2025.1, as RCU and WCU are not fully supported
Fixes#23690Closesscylladb/scylladb#23691
* github.com:scylladb/scylladb:
test_returnconsumedcapacity.py: test RCU for batch get item
alternator/executor: Add RCU support for batch get items
alternator/consumed_capacity: make functionality public
Add support for the CopyObject API to enable direct copying of S3
objects between locations. This approach eliminates networking
overhead on the client side, as the operation is handled internally
by S3.
There are two tests which test incremental read repair: one with row the
other with partition tombstones. The tests currently force vnodes, by
creating the test keyspace with {'enabled': false}. Even so, the tests
were found to be flaky so one of them are marked for skip.
This commit does the following changes:
* Make the tests use tablets by creating the test keyspace with tablets.
* Change the way the tests write data so it works with tablets:
currently the tests use scylla-sstable write + upload but this won't
work with tablets since upload with tablets implies --load-and-stream
which means data is streamed to all replicas (no difference created
between nodes). Switch to the classic stop-node + write to other
replica with CL=ONE.
* Remove the skip added to the partition-tombstone test variant.
Also add tracing to the read-repair query, to make debugging the test
easier if it fails.
Fixes: #21179
Continue the effort of normalizing reader names, stripping legacy qualifying terms like "flat" and "v2".
Flat and v2 readers are the default now, we only need to add qualifying terms to readers which are different than the normal.
One such reader remains: `make_generating_reader_v1()`.
This PR contains mostly mechanical changes, done with a sed script. Commits which only contain such mechanical renames are marked as such in the commitlog.
Code cleanup, no backport needed.
Closesscylladb/scylladb#23767
* github.com:scylladb/scylladb:
readers: mv reversing_v2.hh reversing.hh
readers: mv generating_v2.hh generating.hh
tree: s/make_generating_reader_v2/make_generating_reader/
readers: mv from_mutations_v2.hh from_mutations.hh
tree: s/make_mutation_reader_from_mutations_v2/make_mutation_reader_from_mutations/s
readers: mv from_fragments_v2.hh from_fragments.hh
readers: mv forwardable_v2.hh forwardable.hh
readers: mv empty_v2.hh empty.hh
tree: s/make_empty_flat_reader_v2/make_empty_mutation_reader/
readers/empty_v2.hh: replace forward declarations with include of fwd header
readers/mutation_reader_fwd.hh: forward declare reader_permit
readers: mv delegating_v2.hh delegating.hh
readers/delegating_v2.hh: move reader definition to _impl.hh file
This patch adds tests for consumed capacity in batch get item. It tests
both the simple case and the multi-item, multi-table case that combines
consistent and non-consistent reads.