Currently Alternator supports compressed requests in the gzip format
with "Content-Encoding: gzip". We did not support any other compression
formats.
It turns out that DynamoDB also supports the "deflate" encoding.
The "deflate" format is just a small variant of gzip and also supported
by the same zlib library that we already use, so it is very easy
to add support for it as well. So this patch adds it.
Beyond compatibility with DynamoDB, another benefit of this patch is
symmetry with our response compression support (PR #27454), where
we supported both gzip and deflate compression of responses - so
we should support the same for requests.
This patch also adds tests for Content-Encoding: deflate, which pass
on DynamoDB (proving that "deflate" is indeed supported there).
On Alternator the new tests failed before this patch and pass with
this patch.
Refs #27243 (which asks to support more compression formats).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#27917
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertInvalidateSizedRecordsTest.java into our
cqlpy framework.
This is one of the tests added to Cassandra as part of the vector
search work, but actually has nothing to do with vector search -
it checks what happens when key columns of different types exceeed
their maximum size (64KB).
Unfortunately, each one of the tests added here *fail* on ScyllaDB,
providing more reproducers for two already known issues (which
already had plenty of reproducers...):
Refs #8627 Cleanly reject updates with indexed values where value > 64k
Refs #12247 Better error reporting for oversized keys during INSERT
One of the tests also fails on Cassandra, due to CASSANDRA-19270.
It is not clear to me how this unit test actually passed on Cassandra,
I can only guess that the Python driver somehow makes the request
differently than what the Java unit tests use to make requests to
Cassandra.
One of the tests in the original Cassandra source file I did not
translate, readingEmptyStringsForDifferentTypes, because it tests
cqlsh, not pure CQL.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#27944
It is considered a dangerous practice as it creates a side-effect for
later calls to the same function.
Create a new variable instead and mutate that. Also remove the unused
update_known_ids parameter, which defaults to True and no caller changes
it. Passing False to this param also seem to have no effect. Instead of
trying to guess what the desired effect of passing False is and fixing
it, just remove this unused param.
Found by CodeQL "Modification of parameter with default".
To hopefully shut up CodeQL "Iterable can be either a string or a sequence".
This change makes the code more readable anyway, so it is more than just
a gratuitous change to make some code-scanner happy.
Replace manual init of parent fields.
Found by CodeQL: "Missing call to superclass `__init__` during object
initialization".
The secret_key is not initialized to server.secret_key, instead of
server.access_key. This probably fixes a (benign) bug.
Write the boost logs into stdout in HRF format and in XML to the file. The XML file will be used for parsing and providing the error information in the summary section of the fail.
Fixes: https://github.com/scylladb/scylladb/issues/28045
Framework enhancements, no need to backport.
Closesscylladb/scylladb#28107
* github.com:scylladb/scylladb:
test.py: remove XML log from fail summary
test.py: fix truncated boost output to stdout file
If a test tries to move a tablet, it assumes the tablets are stable.
This fixes flakiness exposed by size-based load-balancing and a later
change to refresh stats sooner.
It's a global operation, so we can use any server.
It's not only convenient. The call via api.disable_tablet_balancing()
confuse people to think that it's a per-server operation. This leads
to proliferation of code which does it needlessly on all servers.
Change the behavior of the catching the boost log output. With this
change boost will output it's logging to stdour with HRF format and to
the tempfile in XML format. This will help for easier debuggint when all
messages will be in the output file and still in the fail summary.
This will move responsibility for running tests with pytest in the same manner as it was done with boost tests. From this commit, test.py is not responsible anymore for running python tests and relies completely on pytest.
This is another step for unification of test execution.
Convert skip_mode function to `pytest.mark` to be able to use to annotate the whole module instead of each test explicitly.
NOTE: this is a breaking change. From this commit, several directories with tests will require a path to the file to launch the test. Affected directories
test/alternator
test/broadcast_tables
test/cql
test/cqlpy
test/rest_api
Changes only in framework, so no backport.
This PR will increase the amount of the tests by 30 test, due to the fact that how test.py and pytest discover tests. test.py count a file as a test, and when skip used in suite.yaml it will exclude the tests from discovery completely.
While the pytest count test funstion as a test and uses skip_mode mark and will discover the tests, but it will skip them during execution, hence the difference
test.py output before PR:
```bash
> ./test.py --mode=release rest_api/test_compaction_task rest_api/test_task_manager --list --no-gather-metrics
```
test.py output in this PR:
```bash
> ./test.py --mode=release test/rest_api/test_compaction_task.py test/rest_api/test_task_manager.py --list
rest_api/test_compaction_task.py::test_global_major_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_major_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_cleanup_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_offstrategy_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_rewrite_sstables_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_reshaping_compaction_task.release.1
rest_api/test_compaction_task.py::test_resharding_compaction_task.release.1
rest_api/test_compaction_task.py::test_regular_compaction_task.release.1
rest_api/test_compaction_task.py::test_compaction_task_abort.release.1
rest_api/test_compaction_task.py::test_major_keyspace_compaction_task_async.release.1
rest_api/test_compaction_task.py::test_cleanup_keyspace_compaction_task_async.release.1
rest_api/test_compaction_task.py::test_offstrategy_keyspace_compaction_task_async.release.1
rest_api/test_compaction_task.py::test_rewrite_sstables_keyspace_compaction_task_async.release.1
rest_api/test_compaction_task.py::test_compaction_progress[major_keyspace_compaction_task_impl_run_fail].release.1
rest_api/test_compaction_task.py::test_compaction_progress[shard_major_keyspace_compaction_task_impl_run_fail].release.1
rest_api/test_compaction_task.py::test_compaction_progress[table_major_keyspace_compaction_task_impl_run_fail].release.1
rest_api/test_task_manager.py::test_task_manager_modules.release.1
rest_api/test_task_manager.py::test_task_manager_tasks.release.1
rest_api/test_task_manager.py::test_task_manager_status_running.release.1
rest_api/test_task_manager.py::test_task_manager_status_done.release.1
rest_api/test_task_manager.py::test_task_manager_status_failed.release.1
rest_api/test_task_manager.py::test_task_manager_not_abortable.release.1
rest_api/test_task_manager.py::test_task_manager_wait.release.1
rest_api/test_task_manager.py::test_task_manager_ttl.release.1
rest_api/test_task_manager.py::test_task_manager_user_ttl.release.1
rest_api/test_task_manager.py::test_task_manager_sequence_number.release.1
rest_api/test_task_manager.py::test_task_manager_recursive_status.release.1
rest_api/test_task_manager.py::test_module_not_exists.release.1
rest_api/test_task_manager.py::test_task_folding.release.1
rest_api/test_task_manager.py::test_abort_on_unregistered_task.release.1
```
Fixes: https://github.com/scylladb/scylladb/issues/27716Closesscylladb/scylladb#26395
* github.com:scylladb/scylladb:
test.py: fix test_vector_similarity.py
docs: add directories excluded from test.py
test.py: prevent file descriptors leaking
test.py: capture print inside the test
test.py: do not print header for collection with test.py
test.py: remove not supported functionality
test.py: switch of execution of several test directories by test.py runner
test.py: integrate python tests to be executed with pytest runner
test.py: fix test/vector_search_validator to be able to run with pytest
test.py: prepare base class for migration
test.py: move environment preparation to one method
test.py: introduce new environment variable TESTPY_PREPARED_ENVIRONMENT
repair: Implement auto repair for tablet repair
This patch implements the basic auto repair support for tablet repair.
It was decided to add no per table configuration for the initial
implementation, so two scylla yaml config options are introduced to set
the default auto repair configs for all the tablet tables.
- auto_repair_enabled_default
Set true to enable auto repair for tablet tables by default. The value
will be overridden by the per keyspace or per table configuration which
is not implemented yet.
- auto_repair_threshold_default_in_seconds
Set the default time in seconds for the auto repair threshold for tablet
tables. If the time since last repair is bigger than the configured
time, the tablet is eligible for auto repair. The value will be
overridden by the per keyspace or per table configuration which is not
implemented yet.
The following metrcis are added:
- auto_repair_needs_repair_nr
The number of tablets with auto repair enabled that needs repair
- auto_repair_enabled_nr
The number of tablets with auto repair enabled
The metrics are useful to tell if auto repair is falling behind.
In the future, more auto repair scheduling will be added, e.g.,
scheduling based on the repaired and unrepaired sstable set size,
tombstone ratio and so on, in addition to the time based scheduling.
Fixes SCYLLADB-99
New feature. No backport.
Closesscylladb/scylladb#27534
* github.com:scylladb/scylladb:
topology_coordinator: Add metrics for tablet repair
repair: Implement auto repair for tablet repair
Create and drop view operations are currently performed on all shards, and their execution is not fully serialized. On slower processors this can lead to interleavings that leave stale entries in `system.scylla_views_build`
A problematic sequence looks like this:
* `on_create_view()` runs on shard 0 → entries for shard 0 and shard 1 are created
* `on_drop_view()` runs on shard 0 → entry for shard 0 is removed
* `on_create_view()` runs on shard 1 → entries for shard 0 and shard 1 are created again
* `on_drop_view()` runs on shard 1 → entry for shard 1 is removed, while the shard 0 entry remains
This results in a leftover row in `system.scylla_views_builds_in_progress`, causing `view_build_test.cc` to get stuck indefinitely in an eventual state and eventually be terminated by CI.
This patch fixes the issue by fully serializing all view create and drop operations through shard 0. Shard 0 becomes the single execution point and notifies other shards to perform their work in order. Requests originating.
new process:
- view_builder::on_create_view(...) runs only on shard 0 and kicks off dispatch_create_view(...) in the background.
- dispatch_create_view(...) (shard 0) first checks should_ignore_tablet_keyspace(...) and returns early if needed.
- dispatch_create_view(...) calls handle_seed_view_build_progress(...) on shard 0. That:
- writes the global “build progress” row across all shards via _sys_ks.register_view_for_building_for_all_shards(...).
- After seeding, dispatch_create_view(...) broadcasts to all shards with container().invoke_on_all(...).
- Each shard runs handle_create_view_local(...), which:
- waits for pending base writes/streams, flushes the base,
- resets the reader to the current token and adds the new view,
- handles errors and triggers _build_step to continue processing.
Drop view
- view_builder::on_drop_view(...) runs only on shard 0 and kicks off dispatch_drop_view(...) in the background.
- dispatch_drop_view(...) (shard 0) first checks should_ignore_tablet_keyspace(...) and returns early if needed.
- It broadcasts handle_drop_view_local(...) to all shards with invoke_on_all(...).
- Each shard runs handle_drop_view_local(...), which:
- removes the view from local build state (_base_to_build_step and _built_views) by scanning existing steps,
- ignores missing keyspace cases.
- After all shards finish local cleanup, shard 0 runs handle_drop_view_global_cleanup(...), which:
- removes global build progress, built‑view state, and view build status in system tables,
Shutdown
- drain() waits on _view_notification_sem before _sem so in‑flight dispatches finish before bookkeeping is halted.
In addition, the test is adjusted to remove the long eventual wait (596.52s / 30 iterations) and instead rely on the default wait of 17 iterations (~4.37 minutes), eliminating unnecessary delays while preserving correctness.
Fixes: https://github.com/scylladb/scylladb/issues/27898
Backport: not required as the problem happens on master
Closesscylladb/scylladb#27929
To prepare for implementation of filtering we skip validation
of where clauses in vector search queries. All queries that would
be blocked by the lack of ALLOW FILTERING now will pass through.
Fixes: VECTOR-410
Closesscylladb/scylladb#27758
We can run Alternator's tests against DynamoDB with `test/alternator/run --aws`, and our intention is that all except a few specially marked should pass on DynamoDB - indicating that the test itself is correct and checks compatibility with DynamoDB and not with some misunderstood spec.
Before this patch series, almost two dozen Alternator's tests failed on DynamoDB. This series fixes most of them.
Refs #26079 (it fixes almost all the problems but probably not all of them so let's keep the issue open for a while longer)
Closesscylladb/scylladb#27995
* github.com:scylladb/scylladb:
test/alternator: fix some expected error messages to fit DynamoDB
test/alternator: fix compressed request test on non-us-east1
test/alternator: fix test's expected error message on DynamoDB
test/alternator: mark Alternator-only test scylla_only
test/alternator: fix test on DynamoDB
test/alternator: increase wait_for_gsi() timeout
test/alternator: fix test passing a spurious parameter
There is a known limitation of the xdist.
Since it makes discovery in each thread, then compare it with master thread. The discovered lists of test should be the same. Sets are not order guaranteed, so they should not be used for parametrized testing, because discovery of the tests with using xdist will fail.
This PR just converts set to dist, to eliminate issue mentioned above.
We currently do it only for a bootstrapping node, which is a bug. The
missing IP can cause an internal error, for example, in the following
scenario:
- replace fails during streaming,
- all live nodes are shut down before the rollback of replace completes,
- all live nodes are restarted,
- live nodes start hitting internal error in all operations that
require IP of the replacing node (like client requests or REST API
requests coming from nodetool).
We fix the bug here, but we do it separately for replace with different
IP and replace with the same IP.
For replace with different IP, we persist the IP -> host ID mapping
in `system.peers` just like for bootstrap. That's necessary, since there
is no other way to determine IP of the replacing node on restart.
For replace with the same IP, we can't do the same. This would require
deleting the row corresponding to the node being replaced from
`system.peers`. That's fine in theory, as that node is permanently
banned, so its IP shouldn't be needed. Unfortunately, we have many
places in the code where we assume that IP of a topology member is always
present in the address map or that a topology member is always present in
the gossiper endpoint set. Examples of such places:
- nodetool operations,
- REST API endpoints,
- `db::hints::manager::store_hint`,
- `group0_voter_handler::update_nodes`.
We could fix all those places and verify that drivers work properly when
they see a node in the token metadata, but not in `system.peers`.
However, that would be too risky to backport.
We take a different approach. We recover IP of the replacing node on
restart based on the state of the topology state machine and
`system.peers` just after loading `system.peers`.
We rely on the fact that group 0 is set up at this point. The only case
where this assumption is incorrect is a restart in the Raft-based
recovery procedure. However, hitting this problem then seems improbable,
and even if it happens, we can restart the node again after ensuring
that no client and REST API requests come before replace is rolled back
on the new topology coordinator. Hence, it's not worth to complicate the
fix (by e.g. looking at the persistent topology state instead of the
in-memory state machine).
Fixes#28057
Backport this PR to all branches as it fixes a problematic bug.
Closesscylladb/scylladb#27435
* github.com:scylladb/scylladb:
gossiper: add_saved_endpoint: make generations of excluded nodes negative
test: introduce test_full_shutdown_during_replace
utils: error_injection: allow aborting wait_for_message
raft topology: preserve IP -> ID mapping of a replacing node on restart
Fixes#27992
When doing a commit log oversized allocation, we lock out all other writers by grabbing
the _request_controller semaphore fully (max capacity).
We thereafter assert that the semaphore is in fact zero. However, due to how things work
with the bookkeep here, the semaphore can in fact become negative (some paths will not
actually wait for the semaphore, because this could deadlock).
Thus, if, after we grab the semaphore and execution actually returns to us (task schedule),
new_buffer via segment::allocate is called (due to a non-fully-full segment), we might
in fact grab the segment overhead from zero, resulting in a negative semaphore.
The same problem applies later when we try to sanity check the return of our permits.
Fix is trivial, just accept less-than-zero values, and take same possible ltz-value
into account in exit check (returning units)
Added whitebox (special callback interface for sync) unit test that provokes/creates
the race condition explicitly (and reliably).
Closesscylladb/scylladb#27998
The current code:
```
try:
cql.execute(f"INSERT INTO {cf} (pk, t) VALUES (-1, 'x')", host=host[0], execution_profile=cl_one_profile).result()
except Exception:
pass
```
contains a typo: `host=host[0]` which throws an exception becase Host
object is not subscriptable. The test does not fail because the except
block is too broad and suppresses all exceptions.
Fixing the typo alone is insufficient. The write still succeeds because
the remaining nodes are UP and the query uses CL=ONE, so no failure
should be expected.
Another source of flakiness is data verification:
```
SELECT * FROM {cf} WHERE pk = 0;
```
Even when a coordinator is explicitly provided, using CL=ONE does not
guarantee a local read. The coordinator may forward the read request to
another replica, causing the verification to fail nondeterministically.
This patch rewrites the tests to address these issues:
- Fix the typo: `host[0]` to `hosts[0]`
- Verify data using `MUTATION_FRAGMENTS({cf})` which guarantees a local
read on the coordinator node
- Reconnect the driver after node restart
Fixes https://github.com/scylladb/scylladb/issues/27933Closesscylladb/scylladb#27934
With migration to the pytest, file descriptors will be hanged during the whole life of the process. Previously it was not an issue, because test.py was executing only one file with Popen, so descriptors will be freed with process done. With new approach they are blocked. This will allow to eliminate this.
Fix issue when we had issue with getting cluster and then trying to set it dirty while it None.
Put cluster to the pool only if it was created
In the current state pytest do not support the order of execution, so this parameter is removed. There is no big need in this due to the differences what pytest and test.py counted test. pytest run test functions in the threads, while test.py executed test files in the threads. That's why pytest's way is more granular and allows to fill threads better.
Remove skip node, since it already added as a pytest mark for each test in the file.
Remove pool_size, since this is not used by pytest at all. Pytest uses
xdist to set the amount of threads instead of pool_size used by test.py
With this commit test.py will lose ability to run tests by itself always bypassing execution to the pytest.
NOTE: this is a breaking change. From this commit, several directories
with tests will require a path to the file to launch the test.
Affected directories
test/alternator
test/broadcast_tables
test/cql
test/cqlpy
test/rest_api
build_mode fixture have dynamic scope. It depends how the pytest is
executed. When it executed through test.py scope will be session and
since it's broader that package everything work fine. While with pure
pytest it will fail because build_mode will have module scope.
This fix allows to run tests with pure pytest, this needed for migration
test to be executed by pytest runner instead test.py.
Since all tests share the same base class and some of the tests executed by test.py and some with pytest, we need to handle two cases where configuration is located: suite.yaml and test_config.yaml
After full migration suite.yaml case will be removed
Since anyway these two methods should be called one by one in two different cases: when test.py executes test and pytest executes test, merging them into one. Additionally, set environment variable to show the underneath pytest process that environment was already prepared and there is no need to clean directories or start additional services.
Introduce the new environment variable that will be used to signalize to the pytest runner that environment war already prepared by test.py. This needed to be able to run the test with pytest and test.py(that actually will run pytest underneath).
1. With this change the test really waits 10s, previously (in case
something went wrong), the timeout could take way more than that.
2. Added `else` to above `if` to increase clarity of execution flow -
it doesn't change logic, but makes it more clear.
This patch implements the basic auto repair support for tablet repair.
It was decided to add no per table configuration for the initial
implementation, so two scylla yaml config options are introduced to set
the default auto repair configs for all the tablet tables.
- auto_repair_enabled_default
Set true to enable auto repair for tablet tables by default. The value
will be overridden by the per keyspace or per table configuration which
is not implemented yet.
- auto_repair_threshold_default_in_seconds
Set the default time in seconds for the auto repair threshold for tablet
tables. If the time since last repair is bigger than the configured
time, the tablet is eligible for auto repair. The value will be
overridden by the per keyspace or per table configuration which is not
implemented yet.
The following metrcis are added:
- auto_repair_needs_repair_nr
The number of tablets with auto repair enabled that needs repair
- auto_repair_enabled_nr
The number of tablets with auto repair enabled
The metrics are useful to tell if auto repair is falling behind.
In the future, more auto repair scheduling will be added, e.g.,
scheduling based on the repaired and unrepaired sstable set size,
tombstone ratio and so on, in addition to the time based scheduling.
Fixes SCYLLADB-99
Allow creating materialized views and secondary indexes in a tablets keyspace only if it's RF-rack-valid, and enforce RF-rack-validity while the keyspace has views by restricting some operations:
* Altering a keyspace's RF if it would make the keyspace RF-rack-invalid
* Adding a node in a new rack
* Removing / Decommissioning the last node in a rack
Previously the config option `rf_rack_valid_keyspaces` was required for creating views. We now remove this restriction - it's not needed because we always maintain RF-rack-validity for keyspaces with views.
The restrictions are relevant only for keyspaces with numerical RF. Keyspace with rack-list-based RF are always RF-rack-valid.
Fixesscylladb/scylladb#23345
Fixes https://github.com/scylladb/scylladb/issues/26820
backport to relevant versions for materialized views with tablets since it depends on rf-rack validity
Closesscylladb/scylladb#26354
* github.com:scylladb/scylladb:
docs: update RF-rack restrictions
cql3: don't apply RF-rack restrictions on vector indexes
cql3: add warning when creating mv/index with tablets about rf-rack
service/tablet_allocator: always allow tablet merge of tables with views
locator: extend rf-rack validation for rack lists
test: test rf-rack validity when creating keyspace during node ops
locator: fix rf-rack validation during node join/remove
test: test topology restrictions for views with tablets
test: add test_topology_ops_with_rf_rack_valid
topology coordinator: restrict node join/remove to preserve RF-rack validity
topology coordinator: add validation to node remove
locator: extend rf-rack validation functions
view: change validate_view_keyspace to allow MVs if RF=Racks
db: enforce rf-rack-validity for keyspaces with views
replica/db: add enforce_rf_rack_validity_for_keyspace helper
db: remove enforce parameter from check_rf_rack_validity
test: adjust test to not break rf-rack validity
Disabling of balancing waits for topology state machine to become idle, to guarantee that no migrations are happening or will happen after the call returns. But it doesn't interrupt the scheduler, which means the call can take arbitrary amount of time. It may wait for tablet repair to be finished, which can take many hours.
We should do it via topology request, which will interrupt the tablet scheduler.
Enabling of balancing can be immediate.
Fixes https://github.com/scylladb/scylladb/issues/27647Fixes#27210Closesscylladb/scylladb#27736
* https://github.com/scylladb/scylladb:
test: Verify that repair doesn't block disabling of tablet load balancing
tablets: Make balancing disabling call preempt tablet transitions
VECTOR_SEARCH_INDEXING permission didn't work on cdc tables as we mistakenly checked for vector indexes on the cdc table insted of the base.
This patch fixes that and adds a test that validates this behavior.
Fixes: VECTOR-476
Closesscylladb/scylladb#28050
Call discover_staging_sstables in view_update_generator::start() instead
of in the constructor, because the constructor is called during
initialization before sstables are loaded.
The initialization order was changed in 5d1f74b86a and caused this
regression. It means the view update generator won't discover staging
sstables on startup and view updates won't be generated for them. It
also causes issues in sstable cleanup.
view_update_generator::start() is called in a later stage of the
initialization, after sstable loading, so do the discovery of staging
sstables there.
Fixesscylladb/scylladb#27956Closesscylladb/scylladb#27970
Currently, database::truncate_table_on_all_shards calls the table::can_flush only on the coordinator shard
and therefore it may miss shards with dirty data if the coordinator shard happens to have empty memtables, leading to clearing the memtables with dirty data rather than flushing them.
This change fixes that by making flush safe to be called, even if the memtable list is empty, and calling it on every shard that can flush (i.e. seal_immediate_fn is engaged).
Also, change database_test::do_with_some_data is use random keys instead of hard-coded key names, to reproduce this issue with `snapshot_list_contains_dropped_tables`.
Fixes#27639
* The issue exists since forever and might cause data loss due to wrongly clearing the memtable, so it needs backport to all live versions
Closesscylladb/scylladb#27643
* github.com:scylladb/scylladb:
test: database_test: do_with_some_data: randomize keys
database: truncate_table_on_all_shards: drop outdated TODO comment
database: truncate_table_on_all_shards: consider can_flush on all shards
memtable_list: unify can_flush and may_flush
test: database_test: add test_flush_empty_table_waits_on_outstanding_flush
replica: table, storage_group, compaction_group: add needs_flush
test: database_test: do_with_some_data_in_thread: accept void callback function
2-DC cluster parallel non-RBNO rebuild failure when expanding RF in DC2.
Steps to reproduce:
1. Provision a cluster with 2 datacenters and at least 2 nodes in the second datacenter.
2. Let’s assume datacenter names are "dc1" and "dc2".
3. Create a keyspace ("keyspace1") with RF=0 in dc2.
4. Populate some data into dc1.
5. Change keyspace1 replication in dc2 to 2.
6. On 2 nodes in dc2 run the following command in parallel:
nodetool rebuild --source-dc dc1
Parallel execution of rebuilds is not possible with RBNO enabled.
This test is the repro for #27804Closesscylladb/scylladb#27747
Currently the function uses a regular expression
to check the system log for a specific message.
This is tangential to the ability to cleanly abort the restore task, plus the regular expression has a syntax error:
```
test/cluster/object_store/test_backup.py:534
/home/bhalevy/dev/scylla/test/cluster/object_store/test_backup.py:534: SyntaxWarning: "\(" is an invalid escape sequence. Such sequences will not work in the future. Did you mean "\\("? A raw string is also an option.
await wait_for_first_completed([l.wait_for("Failed to handle STREAM_MUTATION_FRAGMENTS \(receive and distribute phase\) for .+: Streaming aborted", timeout=10) for l in logs])
```
Thsi change modernizes the implementation by:
- using auto_dc_rack for manager.servers_add
- using new_test_keyspace to generate and auto delete the keyspace
- using async gatherio and a prepared statement to insert the data
- simplifing the keys and values by NOT using os.urandom (that is notoriously slow)
- inserting fewer keys in debug mode
- removing the log check
With that, the test can be reenabled in all modes.
* No backport needed since the test was disabled
Closesscylladb/scylladb#27892
* github.com:scylladb/scylladb:
test_backup: do_abort_restore: reduce data footprint
test_backup: do_abort_restore: use error injection
test_backup: do_abort_restore: use asyncio for cql
test_backup: do_abort_restore: use new_test_keyspace
test_backup: do_abort_restore: use logger rather than print
test_backup: do_abort_restore: pass auto_rack_dc to servers_add
This patch adds tablet repair progress report support so that the user
could use the /task_manager/task_status API to query the progress.
In order to support this, a new system table is introduced to record the
user request related info, i.e, start of the request and end of the
request.
The progress is accurate when tablet split or merge happens in the
middle of the request, since the tokens of the tablet are recorded when
the request is started and when repair of each tablet is finished. The
original tablet repair is considered as finished when the finished
ranges cover the original tablet token ranges.
After this patch, the /task_manager/task_status API will report correct
progress_total and progress_completed.
Fixes#22564Fixes#26896Closesscylladb/scylladb#27679
It was obseved:
```
test_repair_disjoint_row_2nodes_diff_shard_count was spuriously failing due to
segfault.
backtrace pointed to a failure when allocating an object from the chain of
freed objects, which indicates memory corruption.
(gdb) bt
at ./seastar/include/seastar/core/shared_ptr.hh:275
at ./seastar/include/seastar/core/shared_ptr.hh:430
Usual suspect is use-after-free, so ran the reproducer in the sanitize mode,
which indicated shared ptr was being copied into another cpu through the
multi shard writer:
seastar - shared_ptr accessed on non-owner cpu, at: ...
--------
seastar::smp_message_queue::async_work_item<mutation_writer::multishard_writer::make_shard_writer...
```
The multishard writer itself was fine, the problem was in the streaming consumer
for repair copying a shared ptr. It could work fine with same smp setting, since
there will be only 1 shard in the consumer path, from rpc handler all the way
to the consumer. But with mixed smp setting, the ptr would be copied into the
cpus involved, and since the shared ptr is not cpu safe, the refcount change
can go wrong, causing double free, use-after-free.
To fix, we pass a generic incremental repair handler to the streaming
consumer. The handler is safe to be copied to different shards. It will
be a no op if incremental repair is not enabled or on a different shard.
A reproducer test is added. The test could reproduce the crash
consistently before the fix and work well after the fix.
Fixes#27666Closesscylladb/scylladb#27870
Function skip_mode works only on function and only in cluster test. This if OK
when we need to skip one test, but it's not possible to use it with pytestmark
to automatically mark all tests in the file. The goal of this PR is to migrate
skip_mode to be dynamic pytest.mark that can be used as ordinary mark.
Closesscylladb/scylladb#27853
[avi: apply to test/cluster/test_tablets.py::test_table_creation_wakes_up_balancer]
The test if flaky, with failures in:
for server in servers:
> await check_node_log_for_failed_mutations(manager, server)
test/cluster/test_topology_ops_encrypted.py:84:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
manager = <test.pylib.manager_client.ManagerClient object at 0xffff602e8590>
server = ServerInfo(server_id=1769, ip_addr='127.82.127.43', rpc_address='127.82.127.43', datacenter='DEFAULT_DC', rack='DEFAULT_RACK', pid=186578)
async def check_node_log_for_failed_mutations(manager: ManagerClient, server: ServerInfo):
logging.info(f"Checking that node {server} had no failed mutations")
log = await manager.server_open_log(server.server_id)
occurrences = await log.grep(expr="Failed to apply mutation from", filter_expr="(TRACE|DEBUG|INFO)")
> assert len(occurrences) == 0
E AssertionError
test/cluster/util.py:319: AssertionError
As diagnosed by Gleb in https://github.com/scylladb/scylladb/issues/27942#issuecomment-3710013625:
"The fencing errors here look legit given that we do not wait for all
requests to complete while shutting down the storage proxy. The
scenario is this:
Test does writes to rf=3 keyspace with cl=one. One node is shutting
down while there is a tablet migration. Tablet migration executes
barrier and drain which fails on a node that is been shutdown. The
topology coordinator proceeds fencing the old topology, but there
still can be un-handled mutation requests from the shutting down node
on other nodes and they will generate fencing errors like they should.
They way to avoid it (though it is benign) is to wait for all outgoing
storage proxy requests to complete during shutdown, but even then the
error may still happen since a request may timeout before it is
processed by the other side, so it may be completed by a storage proxy
coordinator side, but still not handled by replica side. This what we
have fencing for in the first place."
Fix by diabling background tablet migrations, so that we have no
topology barriers concurrent with node shutdown.
Fixes#27942Closesscylladb/scylladb#28034
The driver must see server_c before we stop server_a, otherwise
there will be no live host in the pool when we attempt to drop
the keyspace:
```
@pytest.mark.asyncio
async def test_not_enough_token_owners(manager: ManagerClient):
"""
Test that:
- the first node in the cluster cannot be a zero-token node
- removenode and decommission of the only token owner fail in the presence of zero-token nodes
- removenode and decommission of a token owner fail in the presence of zero-token nodes if the number of token
owners would fall below the RF of some keyspace using tablets
"""
logging.info('Trying to add a zero-token server as the first server in the cluster')
await manager.server_add(config={'join_ring': False},
property_file={"dc": "dc1", "rack": "rz"},
expected_error='Cannot start the first node in the cluster as zero-token')
logging.info('Adding the first server')
server_a = await manager.server_add(property_file={"dc": "dc1", "rack": "r1"})
logging.info('Adding two zero-token servers')
# The second server is needed only to preserve the Raft majority.
server_b = (await manager.servers_add(2, config={'join_ring': False}, property_file={"dc": "dc1", "rack": "rz"}))[0]
logging.info(f'Trying to decommission the only token owner {server_a}')
await manager.decommission_node(server_a.server_id,
expected_error='Cannot decommission the last token-owning node in the cluster')
logging.info(f'Stopping {server_a}')
await manager.server_stop_gracefully(server_a.server_id)
logging.info(f'Trying to remove the only token owner {server_a} by {server_b}')
await manager.remove_node(server_b.server_id, server_a.server_id,
expected_error='cannot be removed because it is the last token-owning node in the cluster')
logging.info(f'Starting {server_a}')
await manager.server_start(server_a.server_id)
logging.info('Adding a normal server')
await manager.server_add(property_file={"dc": "dc1", "rack": "r2"})
cql = manager.get_cql()
await wait_for_cql_and_get_hosts(cql, [server_a], time.time() + 60)
> async with new_test_keyspace(manager, "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 2} AND tablets = { 'enabled': true }") as ks_name:
test/cluster/test_not_enough_token_owners.py:57:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.14/contextlib.py:221: in __aexit__
await anext(self.gen)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
manager = <test.pylib.manager_client.ManagerClient object at 0x7f37efe00830>
opts = "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 2} AND tablets = { 'enabled': true }"
host = None
@asynccontextmanager
async def new_test_keyspace(manager: ManagerClient, opts, host=None):
"""
A utility function for creating a new temporary keyspace with given
options. It can be used in a "async with", as:
async with new_test_keyspace(ManagerClient, '...') as keyspace:
"""
keyspace = await create_new_test_keyspace(manager.get_cql(), opts, host)
try:
yield keyspace
except:
logger.info(f"Error happened while using keyspace '{keyspace}', the keyspace is left in place for investigation")
raise
else:
> await manager.get_cql().run_async("DROP KEYSPACE " + keyspace, host=host)
E cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.69.108.39:9042 dc1>: ConnectionException('Pool for 127.69.108.39:9042 is shutdown')})
test/cluster/util.py:544: NoHostAvailable
```
Fixes#28011Closesscylladb/scylladb#28040
With randomized keys, and since we're inserting only 2 keys,
it is possible that they would end up owned only by a single shard,
reproducing #27639 in snapshot_list_contains_dropped_tables.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Many test cases already assume `func` is being called a seastar
thread and although the function they pass returns a (ready) future,
it serves no purpose other than to conform to the interface.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
All tests I am fixing in this patch do pass for me on DynamoDB, but
other developers report that they fail because some DynamoDB servers
apparently use slightly different error messages, with less detail about
the cause of an error. For example, some of our tests currently expect
an error message that looks like:
An error occurred (ValidationException) when calling the Query
operation: Invalid operator used in KeyConditionExpression:
attribute_exists
But some servers don't report the ": attribute_exists" at the end, so
we can't use the word "attribute_exists" it in the test to recognize
the correct error, and needs to use a different word (which both
versions of DynamoDB and Alternator all print).
As another example, the good old DynamoDB error:
An error occurred (ValidationException) when calling the Query
operation: 1 validation error detected: Value 'DOG' at
'conditionalOperator' failed to satisfy constraint: Member must
satisfy enum value set: [OR, AND]
Got replaced by the following less informative message:
An error occurred (ValidationException) when calling the Query
operation: Failed to satisfy constraint: Member must satisfy enum
value set: [ALL, OR]'
So we need to fix the test to allow it too.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>