Add parameters to server_start() method to provide ability to
change Scylla' CLI and env options on a node start.
Also, add `expected_server_up_state` parameter as we have for
server_add() method.
Rework `ScyllaLogFile.wait_for()` method to make it easier
to add required methods to ScyllaNode class of ccm-like shim.
Also, added `ScyllaLogFile.grep_for_errors()` method and
reworked `ScyllaLogFile.grep()`
Currently, stream_manager is initialized after storage_service and
so it is stopped before the storage_service is. In its stop method
storage_service accesses stream_manager which is uninitialized
at a time.
Move stream_manager initialization over the storage_service initialization.
Fixes: #23207.
Closesscylladb/scylladb#24008
This patch fixes "test/cqlpy/run --release 2025.1" which fails as
follows on all tests with indexes or views:
Secondary indexes are not supported on base tables with tablets
test/cqlpy/run can run cqlpy (and alternator) tests on various official
releases of Scylla which it knows how to download. When running old
versions of Scylla, we need to change the configuration options to those
that were needed on specific versions.
On new versions of Scylla we need to pass
--experimental-features=views-with-tablets
to be able to test materialized views, but in older versions we need to
remove that parameter because it didn't exist. We incorrectly removed it
for any versions 2025.1 or earlier, but that's incorrect - it just needs
to be removed for versions strictly earlier than 2025.1 - it is needed
for 2025.1 (I tested it is indeed needed even in the earliers RCs).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#24144
chunked_vector is a replacement for std::vector that avoids large contiguous
allocations.
In this series, we add some missing modifiers and improve quality-of-life for
chunked_vector users (the static_assert patch).
Those modifiers were generally unused since they have O(n) complexity
and therefore not useful for hot paths, but they are used in some
control plane code on vectors which we'd like to replace with chunked_vectors.
A candidate for such a replacement is token_range_vector (see #3335).
This is a prerequisite for fixing some minor stalls; I don't expect we'll backport
fixes to those stalls.
Closesscylladb/scylladb#24162
* github.com:scylladb/scylladb:
utils: chunked_vector: add swap() method
utils: chunked_vector: add range insert() overloads
utils: chunked_vector: relax static_assert
utils: chunked_vector: implement erase() for single elements and ranges
utils: chunked_vector: implement insert() for single-element inserts
Negative load sizes don't make sense, but we've seen a case in
production, where a negative number was returned by ScyllaDB REST API,
so be prepared to handle these too.
Fixes: scylladb/scylladb#24134Closesscylladb/scylladb#24135
Lots of code from this test can be reused in PR #23861. I'm splitting it now in this change so we can merge it cleanly as a separate patch.
Refs #23564Closesscylladb/scylladb#24105
* github.com:scylladb/scylladb:
Refactor out code from test_restore_with_streaming_scopes
Refactor out code from test_restore_with_streaming_scopes
Refactor out code from test_restore_with_streaming_scopes
Refactor out code from test_restore_with_streaming_scopes
Refactor out code from test_restore_with_streaming_scopes
The non-streaming loading of sstables performs cleanup since recently [1]. For vnodes, unfortunately, cleanup is almost unavoidable, because of the nature of vnodes sharding, even if sstable is already clean. This leads to waste of IO and CPU for nothing. Skipping the cleanup in a smart way is possible, but requires too many changes in the code and in the on-disk data. However, the effort will not help existing SSTables and it's going to be obsoleted by tablets some time soon.
Said that, the easiest way to skip cleanup is the explicit --skip-cleanup option for nodetool and respective skip_cleanup parameter for API handler.
New feature, no backport
fixes#24136
refs #12422 [1]
Closesscylladb/scylladb#24139
* github.com:scylladb/scylladb:
nodetool: Add refresh --skip-cleanup option
api: Introduce skip_cleanup query parameter
distributed_loader: Don't create owned ranges if skip-cleanup is true
code: Push bool skip_cleanup flag around
Inserts an iterator range at some position.
Again we insert the range at the end and use std::rotate() to
move the newly inserted elements into place, forgoing possible
optimizations.
Unit tests are added.
Implement using std::rotate() and resize(). The elements to be erased
are rotated to the end, then resized out of existence.
Again we defer optimization for trivially copyable types.
Unit tests are added.
Needed for range_streamer with token_ranges using chunked_vector.
partition_range_compat's unwrap() needs insert if we are to
use it for chunked_vector (which we do).
Implement using push_back() and std::rotate().
emplace(iterator, args) is also implemented, though the benefit
is diluted (it will be moved after construction).
The implementation isn't optimal - if T is trivially copyable
then using std::memmove() will be much faster that std::rotate(),
but this complex optimization is left for later.
Unit tests are added.
One of pytest parameters in test_long_query_timeout_erm.py was
a CQL query containing spaces and special chars such as '*', '(', ')',
'{', '}'. After upgrading to Fedora 42, the test started to
fail with the error "test.pylib.rest_client.HTTPError: HTTP error 404"
with uri=`http://...[SELECT * FROM {}-True-False].dev.1`.
To prevent from such errors, this commit changes the parameter to
a string without spaces and such special characters.
Fixes: scylladb/scylladb#24124Closesscylladb/scylladb#24130
Currently, test.py will delete recursively all .log files under the
testlog directory instead of cleaning only on testlog directory. With
this change it will not go deeper to delete log files. We still have a
method for cleaning the log files in modes directories.
The downside of this solution, that we will need to explicitly tell all
directories that we want to clean.
Fixes: https://github.com/scylladb/scylladb/issues/24001Closesscylladb/scylladb#24004
Apparently `test_kms_network_error` will succeed at any circumstances since most of our exceptions derive from `std::exception`, so whatever happens to the test, for whatever reason it will throw, the test will be marked as passed.
Start catching the exact exception that we expect to be thrown.
Closesscylladb/scylladb#24065
In the test test_tablet_mv_replica_pairing_during_replace we stop 2 out of 4 servers while using RF=2.
Even though in the test we use exactly 4 tablets (1 for each replica of a base table and view), intially,
the tablets may not be split evenly between all nodes. Because of this, even when we chose a server that
hosts the view and a different server that hosts the base table, we sometimes stoped all replicas of the
base or the view table because the node with the base table replica may also be a view replica.
After some time, the tablets should be distributed across all nodes. When that happens, there will be
no common nodes with a base and view replica, so the test scenario will continue as planned.
In this patch, we add this waiting period after creating the base and view, and continue the test only
when all 4 tablets are on distinct nodes.
Fixes https://github.com/scylladb/scylladb/issues/23982
Fixes https://github.com/scylladb/scylladb/issues/23997Closesscylladb/scylladb#24111
compress: fix an internal error when a specific debug log is enabled
While iterating over the recent 69684e16d8,
series I shot myself in the foot by defining `algorithm_to_name(algorithm::none)`
to be an internal error, and later calling that anyway in a debug log.
(Tests didn't catch it because there's no test which simultaneously
enables the debug log and configures some table to have no compression).
This proves that `algorithm_to_name` is too much of a footgun.
Fix it so that calling `algorithm_to_name(algorithm::none)` is legal.
In hindsight, I should have done that immediately.
Fixes#23624
Fix for recently-added code, no backporting needed.
Closesscylladb/scylladb#23625
* github.com:scylladb/scylladb:
test_sstable_compression_dictionaries: reproduce an internal error in debug logging
compress: fix an internal error when a specific debug log is enabled
Refs scylladb/scylla-enterprise#5321
Adds two small test cases, for slight variations on KMIP host config
being missing when rebooting a node, and table/sstable resolution
failing due to this.
Mainly to verify that we fail as expected, without crashing.
Closesscylladb/scylladb#23544
This patch adds a few tests for Alternator over HTTPS (encrypted HTTP,
a.k.a. TLS or SSL). The tests are skipped unless run with "--https", so
they will not be run in CI. Nevertheless, they are useful to improve
our understanding on how DynamoDB works over HTTPS and can be a basis
for adding more tests for HTTPS support. The included tests pass on both
Alternator and AWS DynamoDB.
One test checks that both TLS 1.2 and TLS 1.3 are properly supported,
and if chosen by the client, are actually honored. The same test also
checks that TLS 1.1 is not supported, and results with a proper error
if attempted. Both AWS DynamoDB and Alterator support the same protocols.
Another test verifies that HTTP (unencrypted) requests cannot be sent
over an HTTPS port. This is important for security - an installation
that chooses to allow only HTTPS wants users to only use encrypted
connections, and would not want users to continue sending unencrypted
requests to the HTTPS port.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#23493
Currently send_gossip_echo has a 22 seconds timeout
during which _abort_source is ignored.
Use a function-local abort_source to abort
send_gossip_echo either on timeout or if
_abort_source requested abort, and co_return in
the latter case.
Closesscylladb/scylladb#12296
* github.com:scylladb/scylladb:
gossiper: make send_gossip_echo cancellable
gossiper: add send_echo helper
idl, message: make with_timeout and cancellable verb attributes composable
gossiper: failure_detector_loop_for_node: ignore abort_requested_exception
gossiper: failure_detector_loop_for_node: check if abort_requested in loop condition
In this PR, we're adjusting most of the cluster tests so that they pass
with the `rf_rack_valid_keyspaces` configuration option enabled. In most
cases, the changes are straightforward and require little to no additional
insight into what the tests are doing or verifying. In some, however, doing
that does require a deeper understanding of the tests we're modifying.
The justification for those changes and their correctness is included in
the commit messages corresponding to them.
Note that this PR does not cover all of the cluster tests. There are few
remaining ones, but they require a bit more effort, so we delegate that
work to a separate PR.
I tested all of the modified tests locally with `rf_rack_valid_keyspaces`
set to true, and they all passed.
Fixesscylladb/scylladb#23959
Backport: we want to backport these changes to 2025.1 since that's the version where we introduced RF-rack-valid keyspaces in. Although the tests are not, by default, run with `rf_rack_valid_keyspaces` enabled yet, that will most likely change in the near future and we'll also want to backport those changes too. The reason for this is that we want to verify that Scylla works correctly even with that constraint.
Closesscylladb/scylladb#23661
* https://github.com/scylladb/scylladb:
test/cluster/suite.yaml: Enable rf_rack_valid_keyspaces in suite
test/cluster: Disable rf_rack_valid_keyspaces in problematic tests
test/cluster/test_tablets: Divide rack into two to adjust tests to RF-rack-validity
test/cluster/test_tablets: Adjust test_tablet_rf_change to RF-rack-validity
test/cluster/test_tablet_repair_scheduler.py: Adjust to RF-rack-validity
test/pylib/repair.py: Assign nodes to multiple racks in create_table_insert_data_for_repair
test/cluster/test_zero_token_nodes_topology_ops: Adjust to RF-rack-validity
test/cluster/test_zero_token_nodes_no_replication.py: Adjust to RF-rack-validity
test/cluster/test_zero_token_nodes_multidc.py: Adjust to RF-rack-validity
test/cluster/test_not_enough_token_owners.py: Adjust to RF-rack-validity
test/cluster/test_multidc.py: Adjust to RF-rack-validity
test/cluster/object_store/test_backup.py: Adjust to RF-rack-validity
test/cluster: Adjust simple tests to RF-rack-validity
Any empty object of the json::json_list type has its internal
_set variable assigned to false which results in such objects
being skipped by the json::json_builder.
Hence, the json returned by the api GET//compaction_manager/compaction_history
does not contain the field `rows_merged` if a cell in the
system.compaction_history table is null or an empty list.
In such cases, executing the command `nodetool compactionhistory`
will result in a crash with the following error message:
`error running operation: rjson::error (JSON assert failed on condition 'false'`
The patch fixes it by checking if the json object contains the
`rows_merged` element before processing. If the element does
not exist, the nodetool will now produce an empty list.
Fixes https://github.com/scylladb/scylladb/issues/23540Closesscylladb/scylladb#23514
Following a number of similar code cleanup PR, this one aims to be the last one, definitely dropping flat from all reader and related names.
Similarly, v2 is also dropped from reader names, although it still persists in mutation_fragment_v2, mutation_v2 and related names. This won't change in the foreseeable future, as we don't have plans to drop mutation (the v1 variant).
The changes in this PR are entirely mechanical, mostly just search-and-replace.
Code cleanup, no backport required.
Closesscylladb/scylladb#24087
* github.com:scylladb/scylladb:
test/boost/mutation_reader_another_test: drop v2 from reader and related names
test/boost/mutation_reader: s/puppet_reader_v2/puppet_reader/
test/boost/sstable_datafile_test: s/sstable_reader_v2/sstable_mutation_reader/
test/boost/mutation_test: s/consumer_v2/consumer/
test/lib/mutation_reader_assertions: s/flat_reader_assertions_v2/mutation_reader_assertions/
readers/mutation_readers: s/generating_reader_v2/generating_reader/
readers/mutation_readers: s/delegating_reader_v2/delegating_reader/
readers/mutation_readers: s/empty_flat_reader_v2/empty_mutation_reader/
readers/mutation_source: s/make_reader_v2/make_mutation_reader/
readers/mutation_source: s/flat_reader_v2_factory_type/mutation_reader_factory/
readers/mutation_reader: s/reader_consumer_v2/mutation_reader_consumer/
mutation/mutation_compactor: drop v2 from compactor and related names
replica/table: s/make_reader_v2/make_mutation_reader/
mutation_writer: s/bucket_writer_v2/bucket_writer/
readers/queue: drop v2 from reader and related names
readers/multishard: drop v2 from reader and related names
readers/evictable: drop v2 from reader and related names
readers/multi_range: remove flat from name
Almost all of the tests have been adjusted to be able to be run with
the `rf_rack_valid_keyspaces` configuration option enabled, while
the rest, a minority, create nodes with it disabled. Thanks to that,
we can enable it by default, so let's do that.
Some of the tests in the test suite have proven to be more problematic
in adjusting to RF-rack-validity. Since we'd like to run as many tests
as possible with the `rf_rack_valid_keyspaces` configuration option
enabled, let's disable it in those. In the following commit, we'll enable
it by default.
Three tests in the file use a multi-DC cluster. Unfortunately, they put
all of the nodes in a DC in the same rack and because of that, they fail
when run with the `rf_rack_valid_keyspaces` configuration option enabled.
Since the tests revolve mostly around zero-token nodes and how they
affect replication in a keyspace, this change should have zero impact on
them.
We reduce the number of nodes and the RF values used in the test
to make sure that the test can be run with the `rf_rack_valid_keyspaces`
configuration option. The test doesn't seem to be reliant on the
exact number of nodes, so the reduction should not make any difference.
The change boils down to matching the number of created racks to the number
of created nodes in each DC in the auxiliary function `prepare_multi_dc_repair`.
This way, we ensure that the created keyspace will be RF-rack-valid and so
we can run the test file even with the `rf_rack_valid_keyspaces` configuration
option enabled.
The change has no impact on the tests that use the function; the distribution
of nodes across racks does not affect how repair is performed or what the
tests do and verify. Because of that, the change is correct.
We assign the newly created nodes to multiple racks. If RF <= 3,
we create as many racks as the provided RF. We disallow the case
of RF > 3 to avoid trying to create an RF-rack-invalid keyspace;
note that no existing test calls `create_table_insert_data_for_repair`
providing a higher RF. The rationale for doing this is we want to ensure
that the tests calling the function can be run with the
`rf_rack_valid_keyspaces` configuration option enabled.
We assign the nodes to the same DC, but multiple racks to ensure that
the created keyspace is RF-rack-valid and we can run the test with
the `rf_rack_valid_keyspaces` configuration option enabled. The changes
do not affect what the test does and verifies.
We simply assign the nodes used in the test to seprate racks to
ensure that the created keyspace is RF-rack-valid to be able
to run the test with the `rf_rack_valid_keyspaces` configuration
option set to true. The change does not affect what the test
does and verifies -- it only depends on the type of nodes,
whether they are normal token owners or not -- and so the changes
are correct in that sense.
We parameterize the test so it's run with and without enforced
RF-rack-valid keyspaces. In the test itself, we introduce a branch
to make sure that we won't run into a situation where we're
attempting to create an RF-rack-invalid keyspace.
Since the `rf_rack_valid_keyspaces` option is not commonly used yet
and because its semantics will most likely change in the future, we
decide to parameterize the test rather than try to get rid of some
of the test cases that are problematic with the option enabled.
We simply assign DC/rack properties to every node used in the test.
We put all of them in the same DC to make sure that the cluster behaves
as closely to how it would before these changes. However, we distribute
them over multiple racks to ensure that the keyspace used in the test
is RF-rack-valid, so we can also run it with the `rf_rack_valid_keyspaces`
configuration option set to true. The distribution of nodes between racks
has no effect on what the test does and verifies, so the changes are
correct in that sense.
Instead of putting all of the nodes in a DC in the same rack
in `test_putget_2dc_with_rf`, we assign them to different racks.
The distribution of nodes in racks is orthogonal to what the test
is doing and verifying, so the change is correct in that sense.
At the same time, it ensures that the test never violates the
invariant of RF-rack-valid keyspaces, so we can also run it
with `rf_rack_valid_keyspaces` set to true.
We modify the parameters of `test_restore_with_streaming_scopes`
so that it now represents a pair of values: topology layout and
the value `rf_rack_valid_keyspaces` should be set to.
Two of the already existing parameters violate RF-rack-validity
and so the test would fail when run with `rf_rack_valid_keyspaces: true`.
However, since the option isn't commonly used yet and since the
semantics of RF-rack-valid keyspaces will most likely change in
the future, let's keep those cases and just run them with the
option disabled. This way, we still test everything we can
without running into undesired failures that don't indicate anything.
We adjust all of the simple cases of cluster tests so they work
with `rf_rack_valid_keyspaces: true`. It boils down to assigning
nodes to multiple racks. For most of the changes, we do that by:
* Using `pytest.mark.prepare_3_racks_cluster` instead of
`pytest.mark.prepare_3_nodes_cluster`.
* Using an additional argument -- `auto_rack_dc` -- when calling
`ManagerClient::servers_add()`.
In some cases, we need to assign the racks manually, which may be
less obvious, but in every such situation, the tests didn't rely
on that assignment, so that doesn't affect them or what they verify.