In topology on raft, management of CDC generations is moved to the
topology coordinator. We extend the topology recovery test to verify
that the CDC keeps working correctly during the whole recovery
process. In particular, we test that after restarting nodes in the
recovery mode, they correctly use the active CDC generation created
by the topology coordinator. A node restarting in the recovery mode
should learn about the active generation from `system.cdc_local`
(or from gossip, but we don't want to rely on it). Then, it should
load its data from `system.cdc_generations_v3`.
Fixesscylladb/scylladb#17409
This patch allows us to restart writing (to the same table with
CDC enabled) with a new CQL session. It is useful when we want to
continue writing after closing the first CQL session, which
happens during the `reconnect_driver` call. We must stop writing
before calling `reconnect_driver`. If a write started just before
the first CQL session was closed, it would time out on the client.
We rename `finish_and_verify` - `stop_and_verify` is a better
name after introducing `restart`.
When the node with CDC enabled and with the topology on raft
disabled bootstraps, it reads system.cdc_local for the last
generation. Nodes with both enabled use group0 to get the last
generation.
In the following scenario with a cluster of one node:
1. the node is created with CDC and the topology on raft enabled
2. the user creates table T
3. the node is restarted in the recovery mode
4. the CDC log of T is extended with new entries
5. the node restarts in normal mode
The generation created in the step 3 is seen in
system_distributed.cdc_generation_timestamps but not in
system.cdc_generations_v3, thus there are used streams that the CDC
based on raft doesn't know about. Instead of creating a new
generation, the node should use the generation already committed
to group0.
Save the last CDC generation in the system.cdc_local during loading
the topology state so that it is visible for CDC not based on raft.
Fixesscylladb/scylladb#17819
Separate keyspace which also behaves as system brings
little benefit while creating some compatibility problems
like schema digest mismatch during rollback. So we decided
to move auth tables into system keyspace.
Fixes https://github.com/scylladb/scylladb/issues/18098Closesscylladb/scylladb#18769
this series disables operator<<:s for vector and unordered_map, and drop operator<< for mutation, because we don't have to keep it to work with these operator:s anymore. this change is a follow up of https://github.com/scylladb/seastar/issues/1544
this change is a cleanup. so no need to backport
Closesscylladb/scylladb#18866
* github.com:scylladb/scylladb:
mutation,db: drop operator<< for mutation and seed_provider_type&
build: disable operator<< for vector and unordered_map
db/heat_load_balance: include used header
test: define a more generic boost_test_print_type
test/boost: define fmt::formatter for service_level_controller_test.cc
test/boost: include test/lib/test_utils.hh
in theory, std::result_of_t should have been removed in C++20. and
std::invoke_result_t is available since C++17. thanks to libstdc++,
the tree is compiling. but we should not rely on this.
so, in this change, we replace all `std::result_of_t` with
`std::invoke_result_t`. actually, clang + libstdc++ is already warning
us like:
```
In file included from /home/runner/work/scylladb/scylladb/multishard_mutation_query.cc:9:
In file included from /home/runner/work/scylladb/scylladb/schema/schema_registry.hh:11:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/unordered_map:38:
Warning: /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/type_traits:2624:5: warning: 'result_of<void (noop_compacted_fragments_consumer::*(noop_compacted_fragments_consumer &))()>' is deprecated: use 'std::invoke_result' instead [-Wdeprecated-declarations]
2624 | using result_of_t = typename result_of<_Tp>::type;
| ^
/home/runner/work/scylladb/scylladb/mutation/mutation_compactor.hh:518:43: note: in instantiation of template type alias 'result_of_t' requested here
518 | if constexpr (std::is_same_v<std::result_of_t<decltype(&GCConsumer::consume_end_of_stream)(GCConsumer&)>, void>) {
|
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18835
When selecting from mutation_fragments(table) one may want to apply
token() filtering againts partition key. This doesn't work currently,
but used to crash. This patch adds a regression test for that
refs: #18637
refs: #18768
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18759
this change is inspired by clang-tidy. it warns like:
```
[752/852] Building CXX object service/CMakeFiles/service.dir/migration_manager.cc.o
Warning: /home/runner/work/scylladb/scylladb/service/migration_manager.cc:891:71: warning: 'view' used after it was moved [bugprone-use-after-move]
891 | db.get_notifier().before_create_column_family(*keyspace, *view, mutations, ts);
| ^
/home/runner/work/scylladb/scylladb/service/migration_manager.cc:886:86: note: move occurred here
886 | auto mutations = db::schema_tables::make_create_view_mutations(keyspace, std::move(view), ts);
| ^
```
in which, `view` is an instance of view_ptr which is a type with the
semantics of shared pointer, it's backed by a member variable of
`seastar::lw_shared_ptr<const schema>`, whose move-ctor actually resets
the original instance. so we are actually accessing the moved-away
pointer in
```c++
db.get_notifier().before_create_column_family(*keyspace, *view, mutations, ts)
```
so, in this change, instead of moving away from `view`, we create
a copy, and pass the copy to
`db::schema_tables::make_create_view_mutations()`. this should be fine,
as the behavior of `db::schema_tables::make_create_view_mutations()`
does not rely on if the `view` passed to it is a moved away from it or not.
the change which introduced this use-after-move was 88a5ddabce
Refs 88a5ddabceFixes#18837
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18838
since we've migrated away from the generic homebrew formatters
for range-alike containers, there is no need to keep there operator<<
around -- they were preserved in order to work with the container
formatters which expect operator<< of the elements.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
seastar provides an option named `Seastar_DEPRECATED_OSTREAM_FORMATTERS`
to enable the operator<< for `std::vector` and `std::unordered_map`,
and this option is enabled by default. but we intent to avoid using
them, so that we can use the fmt::formatter specializations when
Boost.test prints variables. if we keep these two operator<< enabled,
Boost.test would use them when printing variables to be compaired
then the check fails, but if elements in the vector or unordered_map
to be compaired does do not provide operator<<, compiling would fail.
so, in this change, let's disable these operator<< implementations.
this allows us to ditch the operator<< implementations which are
preserved only for testing.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in this header, we use `hr_logger.trace("returned _pp={}", p)` to
print a `vector<float>`, so we we need to include `fmt/ranges.h`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
fmt::is_formattable<T>::value is false, even if
* T is a container of U, and
* fmt::is_formattable<U>, and
* U can be formatted using fmt::formatter
so, we have to define a more generic boost_test_print_type()
for the all types supported by {fmt}. it will help us to ditch the
operator<< for vector and unordered_map in Seastar, and allow us
to use the fmt::formatter specialization of the element
types.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since we are moving away for operator<< based formatter, more and more
types now only have {fmt} based formatters. the same will apply to the
STL container types after ditching the generic homebrew formatter in
to_string.hh, so to be prepared for the change, let's add the
fmt::formatter for tests as well.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change was created in the same spirit of 505900f18f. because
we are deprecating the operator<< for vector and unorderd_map in
Seastar, some tests do not compile anymore if we disable these
operators. so to be prepared for the change disabling them, let's
include test/lib/test_utils.hh for accessing the printer dedicated
for Boost.test. and also '#include <fmt/ranges.h>' when necessary,
because, in order to format the ranges using {fmt}, we need to
use fmt/ranges.h.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The modified lines of code intend to await the first appearance of a log
on one of the nodes.
But due to misplaced parentheses, instead of creating a list of log-awaiting
tasks with a list comprehension, they pass a generator expression to
asyncio.create_task().
This is nonsense, and it fails immediately with a type error.
But since they don't actually check the result of the await,
the test just assumes that the search completed successfully.
This was uncovered by an upgrade to Python 3.12, because its typing is stronger
and asyncio.create_task() screams when it's passed a regular generator.
This patch fixes the bad list comprehension, and also adds an error check
on the completed awaitables (by calling `await` on them).
Fixes#18740Closesscylladb/scylladb#18754
This doesn't apply for auth-v2 as we improved data placement and
removed cassandra quirk which was setting different CL for some
default superuser involved operations.
Fixes#18773Closesscylladb/scylladb#18785
This commit enables publishing documentation
from branch-6.0. The docs will be published
as UNSTABLE (the warning about version 6.0
being unstable will be displayed).
Closesscylladb/scylladb#18832
File-based tablet streaming calls every shard to return data of every
group that intersects with a given range.
After dynamic group allocation, that breaks as the tablet range will
only be present in a single shard, so an exception is thrown causing
migration to halt during streaming phase.
Ideally, only one shard is invoked, but that's out of the scope of this
fix and compaction_groups_for_token_range() should return empty result
if none of the local groups intersect with the range.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18798
This commit fixes the incorrect Raft-related information on the Handling Cluster Membership Change Failures page
introduced with https://github.com/scylladb/scylladb/pull/17500.
The page describes the procedure for when Raft is disabled. Since 6.0, Raft for consistent schema management
is enabled and mandatory (cannot be disabled), this commit adds the procedure for Raft-enabled setups.
Closesscylladb/scylladb#18803
Since commit 952dfc6157 "repair: Introduce
repair_partition_count_estimation_ratio config option", get_config() is
used. We need to include db/config.hh for that.
Spotted when backporting to 5.4 branch.
Refs #18615Closesscylladb/scylladb#18780
As part of the Alternator test suite, we check Alternator's support for
authentication. Alternator maps Scylla's existing CQL roles to AWS's
authentication:
* AWS's access_key_id <- the name of the CQL role
* AWS's secret_access_key <- the salted hash of the password of the CQL role
Before this patch, the Alternator test suite created a new role with a
preset salted hash (role "alternator", salted hash "secret_pass")
and than used that in the tests. However, with the advent of Raft-based
metadata it is wrong to write directly to the roles table, and starting
with #17952 such writes will be outright forbidden.
But we don't actually need to create a new CQL role! We already have
a perfectly good CQL role called "cassandra", and our tests already use
it. So what this patch does is to have the Alternator tests (conftest.py)
read from the roles system-table the salted hash of the "cassandra" role,
and then use that - instead of the hard-coded pair alternator/secret_pass -
in the tests.
A couple more tests assumed that the role name that was used was
"alternator", but now it was changed to "cassandra" so those tests
needed minor fixes as well.
After this patch, the Alternator tests no longer *write* to the roles
system table. Moreover, after this patch, test/alternator/run and
test/alternator/suite.yaml (used when testing with test.py) no longer
need to do extra ugly CQL setup before starting the Alternator tests.
Fixes#18744
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#18771
* seastar 42f15a5f...914a4241 (33):
> sstring: deprecate formatters for vector and unordered_map
> github: use fedora:40 image for testing
> github: add 2 testing combinations back to the matrix
> github: extract test.yaml into a resusable workflow
> build: use initial-exec TLS when building seastar as shared library
> coroutine: preserve this->container before calling dtor
> smp: allocate hugepages eagerly when kernel support is available
> shared_mutex: Add tests for std::shared_lock and std::unique_lock
> shared_mutex: Add RAII locks
> README.md: replace C++17 with C++23
> treewide: do not check for SEASTAR_COROUTINES_ENABLED
> build: support enabled options when building seastar-module
> treewide: include required header files
> build: move add_subdirectory(src) down
> README.md: replace CircleCI badge with GitHub badge
> weak_ptr: Make it possible to convert to "compatible" pointers
> circleci: remove circleci CI tests
> build: use DPDK_MACHINE=haswell when testing dpdk build on github-hosted runner
> build: add --dpdk-machine option to configure.py
> build: stop translating -march option to names recognized by DPDK
> github: encode matrix.enables in cache key
> doc/prometheus.md: add metrics? in URL exporter URI
> tests/unit/metrics_tester: use deferred_stop() when appropriate
> httpd: mark http_server_control::stop() noexcept
> reactor: print scheduling group along with backtrace
> reactor: update lowres_clock when max_task_backlog is exceeded
> tests: add test for prometheus exporter
> tests: move apps/metrics_tester to tests/unit
> apps/metrics_tester: keep metrics with "private" labels
> apps/metrics_tester: support "labels" in conf.yaml
> apps/metrics_tester: stop server properly
> apps/metrics_tester: always start exporter
> apps/metrics_tester: fix typo in conf-example.yaml
Closesscylladb/scylladb#18800
There's a test that checks system.tablets contents to see that after
changing ks replication factor via ALTER KEYSPACE the tablet map is
updated properly. This patch extends this test that also validates that
mutations themselves are replicated according to the desired replication
factor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18644
Coroutinization will help improve readability and allow easier changes planned for this code.
This work was separated from https://github.com/scylladb/scylladb/pull/17910 to make it smoother to review and merge.
Closesscylladb/scylladb#18788
* github.com:scylladb/scylladb:
cql3: coroutinize create/alter/drop service levels
auth: coroutinize alter_role and drop_role
auth: coroutinize grant_permissions and revoke_permissions
auth: coroutinize create_role
cql3: statements: co-routinize auth related statements
cql3: statements: release unused guard explicitly in auth related statements
The sl-controller is stopped in three steps. The first (and instantly the second) is unsubscribing from lifecycle notification and draining. The third is stop itself. First two steps are "out of order" as compared to the desired start-stop sequence of any service, this patch fixes these steps.
After this PR the drain_on_shutdown() (the call that drains the node upon stop) finally becomes clean and tidy and is no longer accompanied by ad-hoc fellow drains/stops/aborts/whatever.
refs: #2737Closesscylladb/scylladb#18731
* github.com:scylladb/scylladb:
sl_controller: Remove drain() method
sl_controller: Move abort kicking into do_abort()
main,sl_controller: Subscribe for early abort
main: Unsubscribe sl controller next to subscribing
This commit updates the documentation about Raft in version 6.0.
- "Introduction": The outdated information about consistent topology updates not being supported
is removed and replaced with the correct information.
- "Enabling Raft": The relevant information is moved to other sections. The irrelevant information
is removed. The section no longer exists.
- "Verifying that the Raft upgrade procedure finished successfully" - moved under Schema
(in the same document). I additionally removed the include saying that after you verify
that schema on Raft is enabled, you MUST enable topology changes on Raft (it is not mandatory;
also, it should be part of the upgrade guide, not the Raft document).
- Unnecessary or incorrect references to versions are removed.
Refs https://github.com/scylladb/scylladb/issues/18580Closesscylladb/scylladb#18689
This commit replaces the 5.4-to-5.5 upgrade guide with the 5.4-to-6.0 upgrade guide,
including the metrics update information.
The guide references the "Enable Consistent Topology Updates" document,
as enabling consistent topology updates is a new step when upgrading to version 6.0.
Also, a procedure for image upgrades has been added (as verified by @yaronkaikov).
Fixesscylladb/scylladb#18254Fixesscylladb/scylladb#17896
Refs scylladb/scylladb#18580Closesscylladb/scylladb#18728
Currently, we do not explicitly set a scheduling group for the schema
commitlog which causes it to run in the default scheduling group (called
"main"). However:
- It is important and significant enough that it should run in a
scheduling group that is separate from the main one,
- It should not run in the existing "commitlog" group as user writes may
sometimes need to wait for schema commitlog writes (e.g. read barrier
done to learn the schema necessary to interpret the user write) and we
want to avoid priority inversion issues.
Therefore, introduce a new scheduling group dedicated to the schema
commitlog.
Fixes: scylladb/scylladb#15566Closesscylladb/scylladb#18715
Update `docs/dev/isolation.d`:
* Update the list of scheduling groups
* Remove IO priority groups (they were folded into scheduling groups)
* Add section on RPC isolation
Closesscylladb/scylladb#18749
* github.com:scylladb/scylladb:
docs: isolation.md: add section on RPC call isolation
docs: isolation.md: remove mention of IO priority groups
docs: isolation.md: update scheduling group list, add aliases
we are not interseted in the code coverage of abseil library, so no need
to apply the compiling options enabling the coverage instrumentation
when building the abseil library.
moreover, since the path of the file passed to `-fprofile-list` is a relative
path. when building with coverage enabled, the build fails when building
abseil, like:
```
/usr/lib64/ccache/clang++ -I/jenkins/workspace/scylla-master/scylla-ci/scylla/abseil -std=c++20 -I/jenkins/workspace/scylla-master/scylla-ci/scylla/seastar/include -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/debug/seastar/gen/include -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -DSEASTAR_API_LEVEL=7 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_SSTRING -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_DEBUG -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEBUG_PROMISE -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_TYPE_ERASE_MORE -DBOOST_NO_CXX98_FUNCTION_BASE -DFMT_SHARED -I/usr/include/p11-kit-1 -fprofile-instr-generate -fcoverage-mapping -fprofile-list=./coverage_sources.list -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/strings/CMakeFiles/strings.dir/str_cat.cc.o -MF absl/strings/CMakeFiles/strings.dir/str_cat.cc.o.d -o absl/strings/CMakeFiles/strings.dir/str_cat.cc.o -c /jenkins/workspace/scylla-master/scylla-ci/scylla/abseil/absl/strings/str_cat.cc
clang-16: error: no such file or directory: './coverage_sources.list'`
```
in this change, we just remove the compiling options enabling the
coverage instrumentation from the cflags when building abseil.
Fixes#18686
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18748
Currently guard is released immediately because those functions are
based on continuations and guard lifetime is not extended. In the following
commit we rewrite those functions to coroutines and lifetime will be
automatically extended. This would deadlock the client because we'd
try to take second guard inside auth code without releasing this unused
one.
In the future commits auth guard will be removed and the one from
statement will be used but this needs some more code re-arrangements.
C++ standard does not define the order in which the parameters
passed to a function are evaluated. so in theory, in
```c++
reusable_sst(sst->get_schema(), std::move(sst));
```
`std::move(sst)` could be evaluated before `sst->get_schema`.
but please note, `std::move(sst)` does not move `sst`
away, it merely cast `sst` to a rvalue reference, it is
`reusable_sst()` which *could* move `sst` away by
consuming it. so following call is much more dangerous
than the above one:
```c++
reusable_sst(sst->get_schema(), modify_sst(std::move(sst)))
```
nevertheless, this usage is still confusing. so instead
of passing a copy of `sst` to `reusable_sst`.
this change is inspired by clang-tidy, it warns like:
```
Warning: /home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: warning: 'sst' used after it was moved [bugprone-use-after-move]
397 | return reusable_sst(sst->get_schema(), std::move(sst));
| ^
/home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:44: note: move occurred here
397 | return reusable_sst(sst->get_schema(), std::move(sst));
| ^
/home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
397 | return reusable_sst(sst->get_schema(), std::move(sst));
|
```
per the analysis above, this is a false alarm.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18775
The existing inet_address::to_string() calls fmt::format("{}", *this)
anyway. However, the to_string() method is declared in .cc file, while
form formatter is in the header and is equipeed with constexprs so
that converting an address to string is done as much as possible
compile-time.
Also, though minor, fmt::to_string(foo) is believed to be even faster
than fmt::format("{}", foo).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18712