When a tool application is invoked with an unknown operation, an error
message is printed, which includes all the known operations, with all
their aliases. These are collected in a `std::vector<std::string_view>`. The
problem is that the vector containing the alias names is returned by
value, so the code ends up creating views to temporaries.
Fix this by returning the alias vector by const&.
Fixes: #17584
Closes scylladb/scylladb#17586
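The lifetime issue described above can be sketched as follows. This is a minimal illustration, not the actual tool code: the `operation` class and the `known_names()` helper are hypothetical names introduced here.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tool operation; not the actual ScyllaDB class.
class operation {
    std::string _name;
    std::vector<std::string> _aliases;
public:
    operation(std::string name, std::vector<std::string> aliases)
        : _name(std::move(name)), _aliases(std::move(aliases)) {}

    const std::string& name() const { return _name; }

    // If this returned std::vector<std::string> by value, callers would get
    // a temporary, and any string_view taken from it would dangle once the
    // temporary is destroyed. Returning const& keeps the views pointing at
    // storage owned by the operation itself.
    const std::vector<std::string>& aliases() const { return _aliases; }
};

// Collects views over every known name and alias. The views are valid only
// while `ops` is alive and unmodified.
std::vector<std::string_view> known_names(const std::vector<operation>& ops) {
    std::vector<std::string_view> names;
    for (const auto& op : ops) {
        names.push_back(op.name());
        for (const auto& alias : op.aliases()) {
            names.push_back(alias);
        }
    }
    return names;
}
```

With the by-value variant, the inner loop would iterate over a temporary vector and every stored view would outlive the string it refers to.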
The helper in question is supposed to spawn a background fiber with a
tablet migration stage action and repeat it if the action fails (until
operator intervention, but that's another story). If the action fails,
a message with ERROR level is logged about the failure.
This error confuses some tests that scan scylla log messages for
ERROR-s at the end, treat most of them (if not all) as critical and
fail. But this particular message is not in fact an error -- the topology
coordinator would re-execute this action anyway, so let's demote the
message to WARN instead.
refs: #17027
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17568
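The retry-and-warn behaviour can be sketched roughly as below. This is a synchronous approximation under stated assumptions: `run_with_retries`, `log_entry` and the attempt cap are hypothetical and exist only for illustration, whereas the real helper runs as a background fiber and keeps retrying.

```cpp
#include <functional>
#include <string>
#include <vector>

enum class log_level { warn, error };

struct log_entry {
    log_level level;
    std::string message;
};

// Hypothetical, synchronous stand-in for the background fiber: runs `action`
// until it succeeds or `max_attempts` is exhausted. The attempt cap is an
// assumption added to keep the sketch finite.
// Returns the number of attempts made.
int run_with_retries(const std::function<bool()>& action,
                     int max_attempts,
                     std::vector<log_entry>& log) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (action()) {
            return attempt;
        }
        // The failed action will be retried, so the failure is recorded as
        // a warning rather than an error.
        log.push_back({log_level::warn,
                       "action failed on attempt " + std::to_string(attempt) +
                       ", will retry"});
    }
    return max_attempts;
}
```

Tests that scan the log for ERROR-level entries would then no longer trip over a failure that is about to be retried.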
This series adds a Python script that searches the code for metric definitions and their descriptions.
Because part of the code defines metrics in a nonstandard way, the script uses a configuration file to resolve parameter values.
The script supports code that uses string formatting and string concatenation with variables.
The documentation team will use the results both to document the existing metrics and to track metrics changes between releases.
Replaces #16328
Closes scylladb/scylladb#17479
* github.com:scylladb/scylladb:
Adding scripts/metrics-config.yml
Adding scripts/get_description.py to fetch metrics description
before this change, we failed to apply the filtering of the tablestats
command in the right way:
1. `table_filter` failed to check if the delimiter is npos before
extracting the cf component from the specified table name.
2. the stats should not include the keyspaces which are not
included by the filter.
3. the total number of tables in the stats report should contain
all tables no matter whether they are filtered or not.
in this change, all the problems above are addressed. and the tests
are updated to cover these use cases.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17468
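The first item, checking for npos before extracting the cf component, can be sketched as follows. This is only an illustration: `table_spec` and `parse_table_spec` are hypothetical names, and the `.` delimiter is an assumption, not taken from the actual `table_filter` code.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type: a keyspace plus an optional table (cf) name.
struct table_spec {
    std::string keyspace;
    std::optional<std::string> table;
};

// Splits "ks.cf" into its components. The '.' delimiter is an assumption.
table_spec parse_table_spec(std::string_view name) {
    const auto delim = name.find('.');
    if (delim == std::string_view::npos) {
        // No delimiter: the whole argument names a keyspace. Calling
        // substr(delim + 1) here would wrap around to offset 0 and yield
        // the keyspace name as a bogus cf component.
        return {std::string(name), std::nullopt};
    }
    return {std::string(name.substr(0, delim)),
            std::string(name.substr(delim + 1))};
}
```

A filter built on top of this would treat a spec without a table component as matching every table in that keyspace.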
Currently in shard_repair_task_impl::repair_range table name is
retrieved with database::find_column_family and in case of exception,
we return from the function.
But the table name is already kept in table_info passed to repair_range
as an argument. Let's reuse it. If a table is dropped, we will find it
out almost immediately after calling repair_cf_range_row_level and
handle it more adequately.
Closes scylladb/scylladb#17245
For tables using tablet based replication strategies, the sstables should be reshaped only within the compaction groups they belong to. The shard_reshaping_compaction_task_impl now groups the sstables based on their compaction groups before reshaping them.
Fixes https://github.com/scylladb/scylladb/issues/16966
Closes scylladb/scylladb#17395
* github.com:scylladb/scylladb:
test/topology_custom: add testcase to verify reshape with tablets
test/pylib/rest_client: add get_sstable_info, enable/disable_autocompaction
replica/distributed_loader: enable reshape for sstables
compaction: reshape sstables within compaction groups
replica/table : add method to get compaction group id for an sstable
compaction: reshape: update total reshaped size only on success
compaction: simplify exception handling in shard_reshaping_compaction_task_impl::run
couple minor formatting fixes.
Closes scylladb/scylladb#17518
* github.com:scylladb/scylladb:
docs: remove leading space in table element
docs: remove space in words
Introduces collapsible dropdowns for images reference docs. With this update, only the latest version's details will be displayed open by default. Information about previous versions will be hidden under dropdowns, which users can expand as needed. This enhancement aims to make pages shorter and easier to navigate.
Closes scylladb/scylladb#17492
- use API endpoint of /storage_service/toppartition/
- only print out the specified samplings.
- print "\n" separator between samplings
Closes scylladb/scylladb#17574
* github.com:scylladb/scylladb:
tools/scylla-nodetool: print separator between samplings
tools/scylla-nodetool: only print the specified sampling
tools/scylla-nodetool: use /storage_service/toppartition/
This pull request adds dynamic substitutions for the following variables:
* `.. |CURRENT_VERSION| replace:: {current_version}`
* `.. |UBUNTU_SCYLLADB_LIST| replace:: scylla-{current_version}.list`
* `.. |CENTOS_SCYLLADB_REPO| replace:: scylla-{current_version}.repo`
As a result, it is no longer necessary to update the "Installation on Linux" page manually after every new release.
Closes scylladb/scylladb#17544
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* node_ops_cmd
* node_ops_cmd_request
their operator<<:s are dropped
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17505
before this change, we assume that debian packaging directory is
always located under `build/debian/debian`. which is hardwired by
`configure.py`. but this might not hold anymore, if we want to
have a self-contained build, in the sense that different builds do
not share the same build directory. this could be a waste for the
non-multi-config build, but `configure.py` uses the multi-config generator
when building with CMake. so in that case, all builds still share the
same $build_dir/debian/ directory.
in order to work with the out-of-source build, where the build
directory is not necessarily "build", a new option is added to
`create-relocatable-package.py`, this allows us to specify the directory
where "debian" artifacts are located.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17558
in 02de9f1833, we enable building Seastar testing for using the
testing facilities in scylla's own tests. but this brings in
Seastar's tests.
since scylladb's CI builds the "all" targets, and we are not
interested in running Seastar's tests when building scylladb,
let's exclude Seastar's tests from the "all" target.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17554
Print process id to the log at start.
It aids debugging/administering the instance if you have multiple
instances running on the same machine.
Closes scylladb/scylladb#17582
CMake generates debian packages under build/$<CONFIG>/debian instead of
build/$mode/debian. so let's translate $mode to $<CONFIG> if
build.ninja is found under build/ directory, as configure.py puts
build.ninja under $top_srcdir, while CMake puts it under build/ .
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17592
- changes to use build/$<CONFIG> for build directory
- add ${CMAKE_BINARY_DIR}/debian as a dep
- generate deb packages under build/$<CONFIG>/debian
Closes scylladb/scylladb#17560
* github.com:scylladb/scylladb:
build: cmake: generate deb packages under build/$<CONFIG>/debian
build: cmake: add ${CMAKE_BINARY_DIR}/debian as a dep
build: cmake: use build/$<CONFIG>/ instead of build
build: cmake: always pass absolute path for add_stripped()
This commit removes the redundant
"Cluster membership changes and LWT consistency" page.
The page is no longer useful because the Raft algorithm
serializes topology operations, which results in
consistent topology updates.
Closes scylladb/scylladb#17523
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for cache_entry, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17594
This PR updates the procedures that changed as a result of introducing Raft-based topology.
Refs https://github.com/scylladb/scylladb/issues/15934
Applied the updates from https://docs.google.com/document/d/1BgZaYtKHs2GZKAxudBZv4G7uwaXcRt2jM6TK9dctRQg/edit
In addition, it adds a placeholder for the 5.4-to-6.0 upgrade guide, as a file included in that guide, Enable Raft topology, is referenced from other places in the docs.
Closes scylladb/scylladb#17500
* github.com:scylladb/scylladb:
doc: replace "Raft Topology" with "Consistent Topology"
doc: (Raft topology) update Removenode
doc: (Raft topology) update Upscale a Cluster
doc:(Raft topology)update Membership Change Failures
doc: doc: (Raft topology) update Replace Dead Node
doc: (Raft topology) update Remove a Node
doc: (Raft topology) update Add a New DC
doc: (Raft topology) update Add a New Node
doc: (Raft topology) update Create Cluster (EC2)
doc: (Raft topology) update Create Cluster (n-DC)
doc: (Raft topology) update Create Cluster (1DC)
doc: include the quorum requirement file
doc: add the quorum requirement file
doc: add placeholder for Enable Raft topology page
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* std::vector<data_type>
* column_identifier
* column_identifier_raw
* untyped_constant::type_class
and drop their operator<<:s
Refs #13245
Closes scylladb/scylladb#17538
* github.com:scylladb/scylladb:
cql3: add fmt::formatter for expression::printer
cql3: add fmt::formatter for raw_value{,_view}
cql3: add fmt::formatter for std::vector<data_type>
cql3: add fmt::formatter for untyped_constant::type_class
cql3: add fmt::formatter for column_identifier{,_row}
this "misspelling" was identified by codespell. actually, it's not
quite a misspelling, as "UPDATE" and "INSERT" are keywords in CQL.
we intended to emphasize them. so, to make codespell more useful
and to preserve the intention, let's quote the keywords with backticks.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17391
This PR includes 3 commits:
- **[actions] Add a check for backport labels**: As part of the Automation of ScyllaDB backports project, each PR should get either a `backport/none` or `backport/X.Y` label. Based on this label we will automatically open a backport PR for the relevant OSS release.
In this commit, I am adding a GitHub action to verify if such a label was added. This only applies to PRs with a base branch of `master` or `next`. For releases, we don't need this check.
- **Add Mergify (https://mergify.com/) configuration file**: In this PR we introduce the `.mergify.yml` configuration file, which
includes a set of rules that we will use for automating our backport
process.
For each supported OSS release (currently 5.2 and 5.4) we have an almost
identical configuration section which includes the four conditions that must hold before
we open a backport PR:
* PR should be closed
* PR should have the proper label. for example: backport/5.4 (we can
have multiple labels)
* Base branch should be `master`
* PR should be set with a `promoted` label - this condition will be set
automatically once the commits are promoted to the `master` branch (passed
gating)
Once all conditions are met, the verify bot will open a backport PR and
will assign it to the author of the original PR, then CI will start
running, and only after it passes do we merge.
- **[action] Add promoted label when commits are in master**: In Scylla, we don't merge our PRs but use `./script/pull_github_pr.sh` to close the pull request, adding a `closes scylladb/scylladb <PR number>` remark and pushing the changes to the `next` branch.
One of the conditions for opening a backport PR is that all relevant commits are in `master` (passed gating). In this GitHub action, we go through the list of commits once a push is made to `master`, identify the relevant PR, and add the `promoted` label to it. This allows Mergify to start the backporting process.
Closes scylladb/scylladb#17365
* github.com:scylladb/scylladb:
[action] Add promoted label when commits are in master
Add mergify (https://mergify.com/) configuration file
[actions] Add a check for backport labels
The semantics of the function was accidentally
modified in 6e79d64. The consequence of the change
was that we didn't limit memory consumption:
the function always returned false for any node
different from the local node. The returned value
is used by storage_proxy to decide whether it
is able to store a hint or not.
This commit fixes the problem by taking other
nodes into consideration again.
Fixes #17636
Closes scylladb/scylladb#17639
We need MAIN_BRANCH calculated earlier so we can use it
to checkout the right branch when cloning the src repo
(either `master` or `enterprise`, based on the detected `PRODUCT`)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#17647
The `buildah commit` command doesn't remove the working container. These
accumulate in ~/.local/container/storage until something bad happens.
Fix by adding the `--rm` flag to remove the container and volume.
Closes scylladb/scylladb#17546
before this change, we already have a `fmt::formatter` specialized for
`expression::printer`. but the formatter was implemented by
1. formatting the `printer` instance to an `ostringstream`, and
2. extracting a `std::string` from this `ostringstream`
3. formatting the `std::string` instance to the fmt context
this is convoluted and is not an optimal implementation. so,
in this change, it is reimplemented by formatting directly to
the context. its operator<< is also dropped in this change.
please note, to avoid adding the large chunk of code into the
.hh file, the implementation is put in the .cc file. but in order
to preserve the usage of `transformed(fmt::to_string<expression::printer>)`,
the `format()` function is defined as a template, and instantiated
explicitly for two use cases:
1. to format to `fmt::context`
2. to format using `fmt::to_string()`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* raw_value
* raw_value_view
`raw_value_view` 's operator<< is still being used by the generic
homebrew printer for vector<>, so it is preserved.
`raw_value` 's operator<< is still being used by the generic
homebrew printer for optional<>, so it's preserved as well.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.
Since scylladb/scylladb#15924 is the last issue mentioned in
scylladb/scylladb#15962, this PR also reenables background
writes in `test_topology_ops` with tablets disabled. The test
doesn't pass with tablets and background writes because of
scylladb/scylladb#17025. We will reenable background writes
with tablets after fixing that issue.
Fixes scylladb/scylladb#15924
Fixes scylladb/scylladb#15962
Closes scylladb/scylladb#17585
* github.com:scylladb/scylladb:
test: test_topology_ops: reenable background writes without tablets
test: test_topology_ops: run with and without tablets
test: topology: decrease the server's request timeouts
Tests that verify upgrading to the raft-based topology
(`test_topology_upgrade`, `test_topology_recovery_basic`,
`test_topology_recovery_majority_loss`) have flaky
`check_system_topology_and_cdc_generations_v3_consistency` calls.
`assert topo_results[0] == topo_res` can fail because of different
`unpublished_cdc_generations` on different nodes.
The upgrade procedure creates a new CDC generation, which is later
published by the CDC generation publisher. However, this can happen
after the upgrade procedure finishes. In tests, if publishing
happens just before querying `system.topology` in
`check_system_topology_and_cdc_generations_v3_consistency`, we can
observe different `unpublished_cdc_generations` on different nodes.
It is an expected and temporary inconsistency.
For the same reasons,
`check_system_topology_and_cdc_generations_v3_consistency` can
fail after adding a new node.
To make the tests not flaky, we wait until the CDC generation
publisher finishes its job. Then, all nodes should always have
equal (and empty) `unpublished_cdc_generations`.
Fixes scylladb/scylladb#17587
Fixes scylladb/scylladb#17600
Fixes scylladb/scylladb#17621
Closes scylladb/scylladb#17622
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for std::vector<data_type>,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for untyped_constant::type_class,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* column_identifier
* column_identifier_raw
and their operator<<:s are dropped.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Changing config under the guard can cause a deadlock.
The guard holds _read_apply_mutex. The same lock is held by the group0
apply() function. It means that no entry can be applied while the guard
is held and raft apply fiber may be even sleeping waiting for this lock
to be released. Configuration change OTOH waits for the config change
command to be committed before returning, but the way raft is implemented,
commit notifications are triggered from the apply fiber, which may
be stuck. Deadlock.
Drop and re-take guard around configuration changes.
Fixes scylladb/scylladb#17186
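The "drop and re-take" pattern can be sketched with standard-library primitives. This is an analogy only: the real code uses Seastar fibers and the group0 guard, not `std::mutex`, and `change_config_without_guard` is a hypothetical name.

```cpp
#include <functional>
#include <mutex>

// Hypothetical helper: releases the guard for the duration of the
// configuration change so that whoever needs the same lock (the apply
// fiber in the commit message) can make progress, then re-takes it.
void change_config_without_guard(std::unique_lock<std::mutex>& guard,
                                 const std::function<void()>& change_config) {
    guard.unlock();
    try {
        change_config();
    } catch (...) {
        // Restore the caller's expectation that the guard is held,
        // then propagate the failure.
        guard.lock();
        throw;
    }
    guard.lock();
}
```

One consequence of this design is that any state read under the guard before the call may be stale once the guard is re-taken, so the caller has to re-read it.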
After fixing scylladb/scylladb#15924 in one of the previous
patches, we reenable background writes in `test_topology_ops`.
We also start background writes a bit later after adding all nodes.
Without this change and with tablets, the test fails with:
```
> await cql.run_async(f"CREATE TABLE tbl (pk int PRIMARY KEY, v int)")
E cassandra.protocol.ConfigurationException: <Error from server: code=2300
[Query invalid because of configuration issue] message="Datacenter
datacenter1 doesn't have enough nodes for replication_factor=3">
```
The change above makes the test a bit weaker, but we don't have to
worry about it. If adding nodes is bugged, other tests should
detect it.
Unfortunately, the test still doesn't pass with tablets and
background writes because of scylladb/scylladb#17025, so we keep
background writes disabled with tablets and leave FIXME.
Fixes scylladb/scylladb#15962
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.
A high server's request timeout can slow down the topology tests
(see the new comment in `make_scylla_conf`). We make the timeout
dependent on the testing mode to not slow down tests for no reason.
We don't touch the driver's request timeout. Decreasing it in some
modes would require too much effort for almost no improvement.
Fixes scylladb/scylladb#15924
node.rs pointer can be freed while guard is released, so it cannot be
accessed during error processing. Save state locally.
Fixes #17577
Message-ID: <Zd9keSwiIC4v_EiF@scylladb.com>
The RPC is used by group0 now which is available only on shard0
Fixes scylladb/scylladb#17565
* 'gleb/migration-request-shard0' of github.com:scylladb/scylla-dev:
raft_group0_client: assert that hold_read_apply_mutex is called on shard 0
migration_manager: fix indentation after the previous patch.
messaging_service: process migration_request rpc on shard 0