It needs some local naming cleanup, but otherwise it's pretty simple
Closesscylladb/scylladb#18510
* github.com:scylladb/scylladb:
generic_server: Fix indentation after previous patch
generic_server: Coroutinize listen() method
generic_server: Rename creds argument to builder
std::optional formatting changed while moving from the home-grown formatter to
the fmt provided formatter; don't rely on it for user visible messages.
Here, the optional formatted is known to be engaged, so just print it.
Closesscylladb/scylladb#18533
base timestamps are feeded into the sstable writer for calculating
delta, used by varints. given that expired ssts are bypassed, we
don't have to account them. so if we compacting fully expired and
new sstable together, we can save a bit by having a base ts closer
to the data actually written into output. also I wanted to move
the calculation into the loop in setup(), to avoid two iterations
over input set that can have even more than 1k elements.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18504
Until https://github.com/scylladb/scylladb/issues/15356 is fixed, this
will be handled by explicitly closing the connection, so if scylla fails
to update gossiper state due to premature abort on shutdown, then we
won't be stuck in an endless reconnection attempt (later through
heartbeats (30s interval)), causing the test to timeout.
Manifests in scylla logs like this:
gossip - failure_detector_loop: Got error in the loop, live_nodes={127.147.5.10, 127.147.5.16}: seastar::sleep_aborted (Sleep is aborted)
gossip - failure_detector_loop: Finished main loop
migration_manager - stopping migration service
storage_service - Shutting down native transport server
gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
cql_server_controller - CQL server stopped
...
gossip - My status = NORMAL
gossip - Announcing shutdown
gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
gossip - Sending a GossipShutdown to 127.147.5.10 with generation 1714449924
gossip - Sending a GossipShutdown to 127.147.5.16 with generation 1714449924
gossip - === Gossip round FAIL: seastar::abort_requested_exception (abort requested)
Refs #14746.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18484
We won't run:
- old pre auth-v1 migration code
- code creating auth-v1 tables
We will keep running:
- code creating default rows
- code creating auth-v1 keyspace (needed due to cqlsh legacy hack,
it errors when executing `list roles` or `list users` if
there is no system_auth keyspace, it does support case when
there is no expected tables)
Fixes https://github.com/scylladb/scylladb/issues/17737Closesscylladb/scylladb#17939
* github.com:scylladb/scylladb:
auth: don't run legacy migrations on auth-v2 startup
auth: fix indent in password_authenticator::start
auth: remove unused service::has_existing_legacy_users func
To avoid conflicts arising from the discrepancy between different
versions of the repository, use coroutines instead of continuations
in service_level_controller::notify_service_level_removed().
Closesscylladb/scylladb#18525
We want to clear CDC generations that are no longer needed
(because all writes are already using a new generation) so they
don't take space and are not sent during snapshot transfers
(see e.g. https://github.com/scylladb/scylladb/issues/17545).
The condition used previously was that we clear generations which
were closed (i.e., a new generation started at this time) more than
24h ago. This is a safe choice, but too conservative: we could
easily end up with a large number of obsolete generations if we
boot multiple nodes during 24h (which is especially easy to do
with tablets.)
Change this bound from 24h to `5s + ring_delay`. The choice is
explained in a comment in the code.
Additionally, improve `test_raft_snapshot_request` that would
become flaky after the change so it's not sensitive to changes
anymore.
The raft-based topology was experimental before 6.0, no need
to backport.
Ref: scylladb/scylladb#17545Closesscylladb/scylladb#18497
* github.com:scylladb/scylladb:
topology_coordinator: clear obsolete generations earlier
test: test_raft_snapshot_request: improve the last assertion
test: test_raft_snapshot_request: find raft leader after restart
test: test_raft_shanpshot_request: simplify appended_command
During upgrade to raft topology, information about service levels is copied from the legacy tables in system_distributed to the raft-managed tables of group 0. system_distributed has RF=3, so if the cluster has only one or two nodes we should use lower consistency level than ALL - and the current procedure does exactly that, it selects QUORUM in case of two nodes and ONE in case of only one node. The cluster size is determined based on the call to _gossiper.num_endpoints().
Despite its name, gossiper::num_endpoints() does not necessarily return the number of nodes in the cluster but rather the number of endpoint states in gossiper (this behavior is documented in a comment near the declaration of this function). In some cases, e.g. after gossiper-based nodetool remove, the state might be kept for some time after removal (3 days in this case).
The consequence of this is that gossiper::num_endpoints() might return more than the current number of nodes during upgrade, and that in turn might cause migration of data from one table to another to fail - causing the upgrade procedure to get stuck if there is only 1 or two nodes in the cluster.
In order to fix this, use token_metadata::get_all_endpoints() as a measure of the cluster size.
Fixes: scylladb/scylladb#18198Closesscylladb/scylladb#18261
* github.com:scylladb/scylladb:
test: topology: test that upgrade succeeds after recent removal
topology_coordinator: compute cluster size correctly during upgrade
This pull request introduces host ID in the Hinted Handoff module. Nodes are now identified by their host IDs instead of their IPs. The conversion occurs on the boundary between the module and `storage_proxy.hh`, but aside from that, IPs have been erased.
The changes take into considerations that there might still be old hints, still identified by IPs, on disk – at start-up, we map them to host IDs if it's possible so that they're not lost.
Refs scylladb/scylladb#6403Fixesscylladb/scylladb#12278Closesscylladb/scylladb#15567
* github.com:scylladb/scylladb:
docs: Update Hinted Handoff documentation
db/hints: Add endpoint_downtime_not_bigger_than()
db/hints: Migrate hinted handoff when cluster feature is enabled
db/hints: Handle arbitrary directories in resource manager
db/hints: Start using hint_directory_manager
db/hints: Enforce providing IP in get_ep_manager()
db/hints: Introduce hint_directory_manager
db/hints/resource_manager: Update function description
db/hints: Coroutinize space_watchdog::scan_one_ep_dir()
db/hints: Expose update lock of space watchdog
db/hints: Add function for migrating hint directories to host ID
db/hints: Take both IP and host ID when storing hints
db/hints: Prepare initializing endpoint managers for migrating from IP to host ID
db/hints: Migrate to locator::host_id
db/hints: Remove noexcept in do_send_one_mutation()
service: Add locator::host_id to on_leave_cluster
service: Fix indentation
db/hints: Fix indentation
`system_keyspace::read_cdc_generation_opt` queries
`system.cdc_generations_v3`, which stores ids of CDC generations
as timeuuids. This function shouldn't be called with a normal uuid
(used by `system.cdc_generations_v2` to store generation ids).
Such a call would end with a marshaling error.
Before this patch,`retrieve_generation_data_v2` could call
`system_keyspace::read_cdc_generation_opt` with a normal uuid if
the generation wasn't present in `system.cdc_generations_v2`.
This logic caused a marshaling error while handling the
`check_and_repair_cdc_streams` request in the
`cdc_test.TestCdc.test_check_and_repair_cdc_streams_liveness` dtest.
This patch fixes the code being added in 6.0, no need to backport it.
Fixesscylladb/scylladb#18473Closesscylladb/scylladb#18483
With topology over raft all operation are already serialized by the
coordinator anyway, so no need to synchronize removenode using api lock.
All others are still synchronized since there cannot be executed in
parallel for the same node anyway.
* 'gleb/17681-fix' of github.com:scylladb/scylla-dev:
storage_service: do not take API lock for removenode operation if topology coordinator is enabled
test: return file mark from wait_for that points after the found string
because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689,
the rebuilt abseil package provided by fedora has different settings
than the ones if the tree is built with the sanitizer enabled. this
inconsistency leads to a crash.
to address this problem, we have to reinstate the abseil submodule, so
we can built it with the same compiler options with which we build the
tree.
in this change
* Revert "build: drop abseil submodule, replace with distribution abseil"
* update CMake building system with abseil header include settings
* bump up the abseil submodule to the latest LTS branch of abseil:
lts_2024_01_16
* update scylla-gdb.py to adapt to the new structure of
flat_hash_map
This reverts commit 8635d24424.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18511
More than three years ago, in issue #7949, we noticed that trying to
set a `map<ascii, int>` from JSON input (i.e., using INSERT JSON or the
fromJson() function) fails - the ascii key is incorrectly parsed.
We fixed that issue in commit 75109e9519
but unfortunately, did not do our due diligence: We did not write enough
tests inspired by this bug, and failed to discover that actually we have
the same bug for many other key types, not just for "ascii". Specifically,
the following key types have exactly the same bug:
* blob
* date
* inet
* time
* timestamp
* timeuuid
* uuid
Other types, like numbers or boolean worked "by accident" - instead of
parsing them as a normal string, we asked the JSON parser to parse them
again after removing the quotes, and because unquoted numbers and
unquoted true/false happwn to work in JSON, this didn't fail.
The fix here is very simple - for all *native* types (i.e., not
collections or tuples), the encoding of the key in JSON is simply a
quoted string - and removing the quotes is all we need to do and there's
no need to run the JSON parser a second time. Only for more elaborate
types - collections and tuples - we need to run the JSON parser a
second time on the key string to build the more elaborate object.
This patch also includes tests for fromJson() reading a map with all
native key types, confirming that all the aforementioned key types
were broken before this patch, and all key types (including the numbers
and booleans which worked even befoe this patch) work with this patch.
Fixes#18477.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#18482
`boost::range::random_shuffle()` uses the deprecated
`std::random_shuffle()` under the hood, so let's use
`std::ranges::shuffle()` which is available since C++20.
this change should address the warning like:
```
[312/753] CXX build/debug/test/boost/counter_test.o In file included from test/boost/counter_test.cc:17:
/usr/include/boost/range/algorithm/random_shuffle.hpp:106:13: warning: 'random_shuffle<__gnu_cxx::__normal_iterator<counter_shard *, std::vector<counter_shard>>>' is deprecated: use 'std::shuffle' instead [-Wdepr
ecated-declarations]
106 | detail::random_shuffle(boost::begin(rng), boost::end(rng));
| ^
test/boost/counter_test.cc:507:27: note: in instantiation of function template specialization 'boost::range::random_shuffle<std::vector<counter_shard>>' requested here
507 | boost::range::random_shuffle(shards);
| ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_algo.h:4489:5: note: 'random_shuffle<__gnu_cxx::__normal_iterator<counter_shard *, std::vector<counter_shard>>>' has been explicitly marked
deprecated here
4489 | _GLIBCXX14_DEPRECATED_SUGGEST("std::shuffle")
| ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:1957:45: note: expanded from macro '_GLIBCXX14_DEPRECATED_SUGGEST'
1957 | # define _GLIBCXX14_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT)
| ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:1941:19: note: expanded from macro '_GLIBCXX_DEPRECATED_SUGGEST'
1941 | __attribute__ ((__deprecated__ ("use '" ALT "' instead")))
| ^
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18517
It's completely unused, likely in favor of recently added formatter
for the type in question.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18502
So that it doesn't clash with local creds variable that will appear in
this method after its coroutinization.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
{fmt} v10.0.0 introduces formatter for `std::optional`, so there
is no need to test it. furthermore the behavior of this formatter
is different from our homebrew one. so let's skip this test if
{fmt} v10.0.0 or up is used.
Refs #18508
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18509
in {fmt} version 10.0.0, it has a regression, which dropped the
formatter for `char *`, even it does format `const char*`, as the
latter is convertible to
`fmt::stirng_view`.
and this issue was addressed in 10.1.0 using 616a4937, which adds
the formatter for `Char *` back, where `Char` is a template parameter.
but we do need to print `vector<char*>`, so, to address the build
failure with {fmt} version 10.0.0, which is shipped along with
fedora 39. let's backport this formatter.
Fixes#18503
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18505
This series adds facilities to gently convert canonical mutations back to mutations
and to gently make canonical mutations or freeze mutations in a seastar thread.
Those are used in storage_service::merge_topology_snapshot to prevent reactor stalls
due to large mutation, as seed in the test_add_many_nodes_under_load dtest.
Also, migration_manager migration_request was converted to use a seastar thread
to use the above facilities to prevent reactor stalls with large schema mutations,
e,g, with a large number of tables, and/or when reading tablets mutations with
a large number of tablets in a table.
perf-simple-query --write results:
Before:
```
median 79151.53 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53289 insns/op, 0 errors)
```
After:
```
median 79716.73 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53314 insns/op, 0 errors)
```
Closesscylladb/scylladb#18290
* github.com:scylladb/scylladb:
storage_proxy: add mutate_locally(vector<frozen_mutation_and_schema>) method
raft: group0_state_machine: write_mutations_to_database: freeze mutations gently
database: apply_in_memory: unfreeze_gently large mutations
storage_service: get_system_mutations: make_canonical_mutation_gently
tablets: read_tablet_mutations: make_canonical_mutation_gently
schema_tables: convert_schema_to_mutations: make_canonical_mutation_gently
schema_tables: redact_columns_for_missing_features: get input mutation using rvalue reference
storage_service: merge_topology_snapshot: freeze_gently
canonical_mutation: add make_canonical_mutation_gently
frozen_mutation: move unfreeze_gently to async_utils
mutation: add freeze_gently
idl-compiler: generate async serialization functions for stub members
raft: group0_state_machine: write_mutations_to_database: use to_mutation_gently
storage_service: merge_topology_snapshot: co_await to_mutation_gently
canonical_mutation: add to_mutation_gently
idl-compiler: emit include directive in generated impl header file
mutation_partition: add apply_gently
collection_mutation: improve collection_mutation_view formatting
mutation_partition: apply_monotonically: do not support schema upgrade
test/perf: report also log_allocations/op
When a table has secondary indexes on *multiple* columns, and several
such columns are used for filtering in a query, Scylla chooses one
of these indexes as the main driver of the query, and the second
column's restriction is implemented as filtering.
Before this patch, the index to use was chosen fairly randomly, based on
the order of the indexes in the schema. This order may be different in
different coordinators, and may even change across restarts on the same
coordinators. This is not only inconsistent, it can cause outright wrong
results when using *paging* and switching (or restarting) coordinates
in the middle of a paged scan... One coordinator saves one index's key
in the paging state, and then the other coordinator gets this paging
state and wrongly believes it is supposed to be a key of a *different*
index.
The fix in this patch is to pick the index suitable for the first
indexed column mentioned in the query. This has two benefits over
the situation before the patch:
1. The decision of which index to use no longer changes between
coordinators or across restarts - it just depends on the schema
and the specific query.
2. Different indexes can have different "specificity" so using one
or the other can change the query's performance. After this patch,
the user is in control over which index is used by changing the
order of terms in the query. A curious user can use tracing to
check which index was used to implement a particular query.
An xfailing test we had for this issue no longer fails, so the "xfail"
marker is removed.
Fixes#7969
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#14450
write_mutations_to_database might need to handle
large mutations from system tables, so to prevent
reactor stalls, freeze the mutations gently
and call proxy.mutate_locally in parallel on
the individual frozen mutations, rather than
calling the vector<mutation> based entry point
that eventually freezes each mutation synchronously.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To prevent stalls due to large schema mutations.
While at it, reserve the result canonical_mutation vector.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The function upgrades the input mutation
only in certain cases. Currently it accepts
the input mutation by value, which may cause
and extraneous copy if the caller doesn't move
the mutation, as done in
`adjust_schema_for_schema_features`.
Getting an rvalue reference instead makes the
interface clearer.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Make a canonical mutation gently using an
async serialization function.
Similar to freeze_gently, yielding is considered
only in-between range tombstones and rows.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Unfreeze_gently doesn't have to be a method of
frozen_mutation. It might as well be implemented as
a free function reading from a frozen_mutation
and preparing a mutation gently.
The logic will be used in a later patch
to make a canonical mutation directly from
a frozen_mutation instead of unfreezing it
and then converting it to a canonical_mutation.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Allow yielding in between serializing of
range tombstones and rows to prevent reactor
stalls due to large mutations with many
rows or range tombstones.
mutations that have many cells might still
stall but those are considered infrequent enough
to ignore for now.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The generated implementation header file depends
on the generated header file for the types it uses.
Generate a respective #include directive to make it self-sufficient.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, if the input mutation_partition requires
schema upgrade, apply_monotonically always silently reverts to
being non-preemptible, even if the caller passed is_preemptible::yes.
To prevent that from happening, put the burden of upgrading
the mutation_partition schem on the caller, which is
today the apply() methods, which are synchronous anyhow.
With that, we reduce the proliferation of the
`apply_monotonically` overloads and keep only the
low level one (which could potentially be private as well,
as it's called only from within the mutation/ source files
and from tests)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Different sstable storage backends use slightly different notion of what sstable location is. Filesystem storage knows it's `/var/lib/data/ks/cf-uuid/state` path, while s3 storage keeps only this path's part without state (and even that's not very accurate, because bucket prefix is missing as well as "/var/lib/data" prefix is not needed and eventually should be omitted). Nonetheless, the sstable_directory still keeps the filsystem-like path, while it's really only needed by the filesystem lister. This PR removes it.
Closesscylladb/scylladb#18496
* github.com:scylladb/scylladb:
sstable_directory: Remove _sstable_dir member
sstable_directory: Create sstable path with make_path() when logging
sstable_directory: Use make_path to construct filesystem lister
sstable_directory: Move some logging around
We want to clear CDC generations that are no longer needed
(because all writes are already using a new generation) so they
don't take space and are not sent during snapshot transfers
(see e.g. scylladb/scylladb#17545).
The condition used previously was that we clear generations which
were closed (i.e., a new generation started at this time) more than
24h ago. This is a safe choice, but too conservative: we could
easily end up with a large number of obsolete generations if we
boot multiple nodes during 24h (which is especially easy to do
with tablets.)
Change this bound from 24h to `5s + ring_delay`. The choice is
explained in a comment in the code.
Also, prevent `test_cdc_generation_clearing` from being flaky by
firing the `increase_cdc_generation_leeway` error injection on
the server being the topology coordinator.
Ref: scylladb/scylladb#17545
The last assertion in the test is very sensitive to changes. The
constant has already been increased from 0 to 1 due to flakiness.
The old comment explains it.
In the following patch, we change the CDC generation publisher so
that it clears the obsolete CDC generations earlier. This change
would make this assertion flaky again. After restarting the servers,
the new topology coordinator could remove the first generation if it
became obsolete. This operation appends a new entry to the log. If
it happened after triggering snapshot, the assertion could fail
with `2 <= 1`.
We could increase the constant again to unflake the test, but we
better improve it once and for all. We change the assertion so
that it's not sensitive to changes in the code based on Raft. The
explanation is in the new comment.