Currently, the tests in test/cql-pytest can be run against both ScyllaDB and Cassandra.
Running the test for either will first output the test results, and subsequently
print the stdout output of the process under test. Using the command line
option --omit-scylla-output it is possible to disable this print for Scylla,
but it is not possible for tests run against Cassandra.
This change adds the option to suppress output for Cassandra tests, too. By default,
the stdout of the Cassandra run will still be printed after the test results, but
this can now be disabled with --omit-scylla-output
Closesscylladb/scylladb#17996
Fixesscylladb/scylladb#17513
* 'gleb/raft-snitch-change-v3' of github.com:scylladb/scylla-dev:
doc: amend snitch changing procedure to work with raft
test: add test to check that snitch change takes effect.
raft topology: update rack/dc info in topology state on reboot if changed
To change snitch with raft all nodes need to be started simultaneously
since each node will try to update its state in the raft and for that
quorum is required.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, instead of printing the `unique_ptr` instance, we
print the pointee of it. since `server_impl` uses pimpl paradigm,
`_fsm` is always valid after `server_impl::start()`, we can always
deference it without checking for null.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17953
The test creates two node cluster with default snitch (SimpleSnitch) and
checks that dc and rack names are as expected. Then it changes the
config to use GossipingPropertyFileSnitch with different names, restart
nodes and check that now peers table has new names.
before this change, we rely on the default-generated fmt::formatter
created from operator<<. but this depends on the
`FMT_DEPRECATED_OSTREAM` macro which is not respected in {fmt} v10.
this change addresses the formatting with fmtlib < 10, and without
`FMT_DEPRECATED_OSTREAM` defined. please note, in {fmt} v10 and up,
it defines formatter for classes derived from `std::exception`, so
our formatter is only added when compiled with {fmt} < 10.
in this change, `fmt::formatter<service::wait_for_ip_timeout>` is
added for backward compatibility with {fmt} < 10.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17955
before this change, SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE is generated
at the first run of `configure.py`, once these files are around, they
are not updated despite that `SCYLLA_VERSION_GEN` does not generate
them as long as the release string retrieved from git sha1 is identical
the one stored in `SCYLLA-RELEASE-FILE`, because we don't rerun
`SCYLLA_VERSION_GEN` at all.
but the pain is, when performing incremental build, like other built
artifacts, these generated files stay with the build directory, so
even if the sha1 of the workspace changes, the SCYLLA-RELEASE-FILE
keeps the same -- it still contains the original git sha1 when it
was created. this could leads to confusion if developer or even our
CI perform incremental build using the same workspace and build
directory, as the built scylla executables always report the same
version number.
in this change, we always rebuilt the said
SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE files, and instruct ninja
to re-stat the output files, see
https://ninja-build.org/manual.html#ref_rule, in order to avoid
unnecessary rebuild. so the downside is that `SCYLLA_VERSION_GEN`
is executed every time we run `ninja` even if all targets are updated.
but the upside is that the release number reported by scylla is
accurate even if we perform incremental build.
also, since we encode the product, version and release stored
in the above files in the generated `build.ninja` file, in this change,
these three files are added as dependencies of `build.ninja`,
so that this file is regenerated if any of them is newer than
`build.ninja`.
Fixes#8255
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17974
before this change, SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE is generated
when CMake generate `build.ninja` for the first time, once these files
are around, they are not updated anymore. despite that
`SCYLLA_VERSION_GEN` does not generate them as long as the release
string retrieved from git sha1 is identical the one stored in
`SCYLLA-RELEASE-FILE`, because we don't rerun `SCYLLA_VERSION_GEN` at
all.
but the pain is, when performing incremental build, like other built
artifacts, these generated files stay with the build directory, so
even if the sha1 of the workspace changes, the SCYLLA-RELEASE-FILE
keeps the same -- it still contains the original git sha1 when it
was created. this could leads to confusion if developer or even our
CI perform incremental build using the same workspace and build
directory, as the built scylla executables always report the same
version number.
in this change, we always rebuilt the said
SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE files, and instruct CMake
to regenerate `build.ninja` if any of these files is updated.
Fixes#17975
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17983
codespell workflow checks for misspellings to identify common
mispellings. it considers "raison" in "raison d'etre" (the accent
mark over "e" is removed , so the commit message can be encoded in
ASCII), to the misspelling of "reason" or "raisin". apparently, the
dictionary it uses does not include les mots francais les plus
utilises.
so, in this change, let's ignore "raison" for this very use case,
before we start the l10n support of the document.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17985
since we already have `format_as()` for `cql3_type::raw`, there is no
need to provide `cql3_type::raw` if the tree is compiled with {fmt} >= 10,
otherwise compiler is not able to figure out which one to match, see the
errror at the end of this commit message. so, in this change, we only
provide the specialized `fmt::formatter` for `cql3_type::raw` when
{fmt} < 10. this should address the FTBFS with {fmt} >= 10.
```
/usr/lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/type_traits:1040:25: error: ambiguous partial specializations of 'formatter<cql3::cql3_type::raw>'
1040 | = __bool_constant<__is_constructible(_Tp, _Args...)>;
| ^
/usr/lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/type_traits:1046:16: note: in instantiation of template type alias '__is_constructible_impl' requested here
1046 | : public __is_constructible_impl<_Tp, _Args...>
| ^
/usr/include/fmt/core.h:1420:13: note: in instantiation of template class 'std::is_constructible<fmt::formatter<cql3::cql3_type::raw>>' requested here
1420 | !has_formatter<T, Context>::value))>
| ^
/usr/include/fmt/core.h:1421:22: note: while substituting prior template arguments into non-type template parameter [with T = cql3::cql3_type::raw]
1421 | FMT_CONSTEXPR auto map(const T&) -> unformattable_pointer {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1422 | return {};
| ~~~~~~~~~~
1423 | }
| ~
```
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17986
When openning a backport PR, adding a reference to the original PR.
This will be used later for updating the original PR/issue once the
backport is done (with different label)
Closesscylladb/scylladb#17973
The loader is writing to pending replica even when write selector is set
to previous. If migration is reverted, then the writes won't be rolled
back as it assumes pending replicas weren't written to yet. That can
cause data resurrection if tablet is later migrated back into the same
replica.
NOTE: write selector is handled correctly when set to next, because
get_natural_endpoints() will return the next replica set, and none
of the replicas will be considered leaving. And of course, selector
set to both is also handled correctly.
Fixes#17892.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#17902
To cause the stale topology exception the test reads
the version from the last bootstrapped host and assigns its decremented
value to version and fence_version fields of system.topology.
The test assumes that version == fence_version here, if version
is greater than fence_version we won't get state topology
exception in this setup. Tablet balancer can break
this -- it may increment the version after the last node is
bootstrapped.
Fix this by disabling the tablet balancer earlier.
fixesscylladb/scylladb#17807Closesscylladb/scylladb#17940
This patch introduces raft-based service levels.
The difference to the current method of working is:
- service levels are stored in `system.service_levels_v2`
- reads are executed with `LOCAL_ONE`
- writes are done via raft group0 operation
Service levels are migrated to v2 in topology upgrade.
After the service levels are migrated, `key: service_level_v2_status; value: data_migrated` is written to `system.scylla_local` table. If this row is present, raft data accessor is created from the beginning and it handles recovery mode procedure (service levels will be read from v2 table even if consistent topology is disabled then)
Fixes#17926Closesscylladb/scylladb#16585
* github.com:scylladb/scylladb:
test: test service levels v2 works in recovery mode
test: add test for service levels migration
test: add test for service levels snapshot
test:topology: extract `trigger_snapshot` to utils
main: create raft dda if sl data was migrated
service:qos: store information about sl data migration
service:qos: service levels migration
main: assign standard service level DDA before starting group0
service:qos: fix `is_v2()` method
service:qos: add a method to upgrade data accessor
test: add unit_test_raft_service_levels_accessor
service:storage_service: add support for service levels raft snapshot
service:qos: add abort_source for group0 operations
service:qos: raft service level distributed data accessor
service:qos: use group0_guard in data accessor
cql3:statements: run service level statements on shard0 with raft guard
test: fix overrides in unit_test_service_levels_accessor
service:qos: fix indentation
service:qos: coroutinize some of the methods
db:system_keyspace: add `SERVICE_LEVELS_V2` table
service:qos: extract common service levels' table functions
In this PR, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.
We also add a test that passes only after the fix. However, this test
needs to block execution of the CDC generation publisher's loop
twice. Currently, error injections with handlers do not allow it
because handlers always share received messages. Apart from the
first created handler, all handlers would be instantly unblocked by
a message from the past that has already unblocked the first
handler. This seems like a general limitation that could cause
problems in the future, so in this PR, we extend injections with
handlers to solve it once and for all. We add the `share_messages`
parameter to the `inject` (with handler) function. Depending on its
value, handlers will share messages (as before) or not.
Fixesscylladb/scylladb#17497Closesscylladb/scylladb#17934
* github.com:scylladb/scylladb:
topology_coordinator: clean_obsolete_cdc_generations: fix log
topology_coordinator: do not clear unpublished CDC generation's data
topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages
error_injection: allow injection handlers to not share messages
In this PR we add timeouts support to raft groups registry. We introduce
the `raft_server_with_timeouts` class, which wraps the `raft::server`
add exposes its interface with additional `raft_timeout` parameter. If
it's set, the wrapper cancels the `abort_source` after certain amount of
time. The value of the timeout can be specified either in the
`raft_timeout` parameter, or the default value can be set in `the
raft_server_with_timeouts` class constructor.
The `raft_group_registry` interface is extended with
`group0_with_timeouts()` method. It returns an instance of
`raft_server_with_timeouts` for group0 raft server. The timeout value
for it is configured in `create_server_for_group0`. It's one minute by
default and can be overridden for tests with
`group0-raft-op-timeout-in-ms` parameter.
The new api allows the client to decide whether to use timeouts or not.
In this PR we are reviewing all the group0 call sites and add
`raft_timeout` if that makes sense. The general principle is that if the
code is handling a client request and the client expects a potential
error, we use timeouts. We don't use timeouts for background fibers
(such as topology coordinator), since they wouldn't add much value. The
only thing the background fiber can do with a timeout is to retry, and
this will have the same end effect as not having a timeout at all.
Fixesscylladb/scylladb#16604Closesscylladb/scylladb#17590
* github.com:scylladb/scylladb:
migration_manager: use raft_timeout{}
storage_service::join_node_response_handler: use raft_timeout{}
storage_service::start_upgrade_to_raft_topology: use raft_timeout{}
storage_service::set_tablet_balancing_enabled: use raft_timeout{}
storage_service::move_tablet: use raft_timeout{}
raft_check_and_repair_cdc_streams: use raft_timeout{}
raft_timeout: test that node operations fail properly
raft_rebuild: use raft_timeout{}
do_cluster_cleanup: use raft_timeout{}
raft_initialize_discovery_leader: use raft_timeout{}
update_topology_with_local_metadata: use with_timeout{}
raft_decommission: use raft_timeout{}
raft_removenode: use raft_timeout{}
join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
raft_group0: make_raft_config_nonvoter: add raft_timeout parameter
raft_group0: make_raft_config_nonvoter: add abort_source parameter
manager_client: server_add with start=false shouldn't call driver_connect
scylla_cluster: add seeds parameter to the add_server and servers_add
raft_server_with_timeouts: report the lost quorum
join_node_request_handler: add raft_timeout{} for start_operation
skip_mode: add platform_key
auth: use raft_timeout{}
raft_group0_client: add raft_timeout parameter
raft_group_registry: add group0_with_timeouts
utils: add composite_abort_source.hh
error_injection: move api registration to set_server_init
error_injection: add inject_parameter method
error_injection: move injection_name string into injection_shared_data
error_injection: pass injection parameters at startup
Reduce the sprawl of sstables::test_env in .cc and .hh files, to ease
maintenance and reduce recompilations.
Closesscylladb/scylladb#17965
* github.com:scylladb/scylladb:
test: sstables::test_env: complete pimplification
test/lib: test_env: move test_env::reusable_sst() to test_services.cc
* rename `sync_labels.yaml` to `sync-labels.yaml`
* use more descrptive name for workflow
Closesscylladb/scylladb#17971
* github.com:scylladb/scylladb:
github: sync-labels: use more descriptive name for workflow
github: sync_labels: rename sync_labels to sync-labels
When a keyspace uses tablets, then effective ownership
can be obtained per table. If the user passes only a
keyspace, then /storage_service/ownership/{keyspace}
returns an error.
This change:
- adds an additional positional parameter to 'status'
command that allows a user to query status for table
in a keyspace
- makes usage of /storage_service/ownership/{keyspace}
optional to avoid errors when user tries to obtain
effective ownership of a keyspace that uses tablets
- implements new frontend tests in 'test_status.py'
that verify the new logic
Refs: scylladb#17405
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#17827
"label-sync" is not very helpful for developers to understand what
this workflow is for.
the "name" field of a job shows in the webpage on github of the
pull request against which the job is performed, so if the author
or reviewer checks the status of the pull request, he/she would
notice these names aside of the workflow's name. for this very
job, what we have now is:
```
Sync labels / label-sync
```
after this change it will be:
```
Sync labels / Synchronize labels between PR and the issue(s) fixed by it
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Create `raft_service_levels_distributed_data_accessor` if service levels
were migrated to v2 table.
This supports raft recovery mode, as service levels will be read from v2
table in the mode.
Save information whether service levels data was migrated to v2 table.
The information is stored in `system.scylla_local` table. It's
written with raft command and included in raft snapshot.
Migrate data from `system_distributes.service_levels` to
`system.service_levels_v2` during raft topology upgrade.
Migration process reads data from old table with CL ALL
and inserts the data to the new table via raft.
`raft_service_level_distributed_data_accessor` works this way:
- on read path it reads service levels from `SYSTEM.SERVICE_LEVELS_V2`
table with CL = LOCAL_ONE
- on write path it starts group0 operation and it makes the change
using raft command
Adjust service_level_controller and
service_level_controller::service_level_distributed_data_accessor
interfaces to take `group0_guard` while adding/altering/dropping a
service level.
To migrate service levels to be raft managed, obtain `group0_guard` to
be able to pass it to service_level_controller's methods.
Using this mechanism also automatically provides retries in case of
concurrent group0 operation.
The table has the same schema as `system_distributed.service_levels`.
However it's created entirely at once (unlike old table which creates
base table first and then it adds other columns) because `system` tables
are local to the node.
Getting a service level(s) will be done the same way in raft-based
service levels as it's done in standard service levels, so those
funtions are extracted to reused it.
sstables::test_env uses the pimpl idiom, but incompletely. This
prevents reaping some of the benefits.
Complete the pimplification:
- the `impl` nested struct is moved out-of-line
- all non-template member functions are moved out-of-line
- a destructor is declared and defined out-of-line
- the move constructor is also defined (necessary after the destructor is
defined)
After this, we can forward-declare more components.
test_env implementation is scattered around two .cc, concentrate it
in test_services.cc, which happens to be the file that doesn't cause
link errors.
Move toc_filename with it, as it is its only caller and it is static.