Compare commits

5496 Commits

Author SHA1 Message Date
Łukasz Paszkowski
98a6002a1a compaction_manager: cancel submission timer on drain
The `drain` method cancels all running compactions and moves the
compaction manager into the disabled state. To move it back to
the enabled state, the `enable` method shall be called.

This, however, throws an assertion error as the submission timer is
not cancelled and re-enabling the manager tries to arm the armed timer.

Thus, cancel the timer when calling the drain method to disable
the compaction manager.
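
A minimal sketch of the failure mode and the fix, assuming a timer that refuses to be armed twice. `one_shot_timer` and `manager_sketch` are hypothetical stand-ins, not the real Seastar timer or `compaction_manager`, and an exception replaces the assertion so the behavior can be checked:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a timer that must not be armed while armed.
class one_shot_timer {
    bool _armed = false;
public:
    void arm() {
        if (_armed) {
            // The real code hits an assertion here; the sketch throws instead.
            throw std::logic_error("timer already armed");
        }
        _armed = true;
    }
    void cancel() { _armed = false; }
    bool armed() const { return _armed; }
};

// Hypothetical, heavily simplified manager (not the real compaction_manager).
class manager_sketch {
    one_shot_timer _submission_timer;
    bool _enabled = false;
public:
    void enable() {
        _submission_timer.arm();     // fails if drain() left the timer armed
        _enabled = true;
    }
    void drain() {
        _enabled = false;
        _submission_timer.cancel();  // the fix: disarm so enable() can re-arm
    }
    bool enabled() const { return _enabled; }
    bool timer_armed() const { return _submission_timer.armed(); }
};
```

Without the `cancel()` call in `drain()`, the second `enable()` in an enable, drain, enable sequence would try to arm an armed timer.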

Fixes https://github.com/scylladb/scylladb/issues/24504

All versions are affected. So it's a good candidate for a backport.

Closes scylladb/scylladb#24505

(cherry picked from commit a9a53d9178)

Closes scylladb/scylladb#24585
2025-06-29 14:40:43 +03:00
Avi Kivity
9f1ed14f9d Merge '[Backport 6.2] cql: create default superuser if it doesn't exist' from Marcin Maliszkiewicz
Backport of https://github.com/scylladb/scylladb/pull/20137 is needed for another backport https://github.com/scylladb/scylladb/pull/24690 to work correctly.

Fixes https://github.com/scylladb/scylladb/issues/24712

Closes scylladb/scylladb#24711

* github.com:scylladb/scylladb:
  test: test_restart_cluster: create the test
  auth: standard_role_manager allows awaiting superuser creation
  auth: coroutinize the standard_role_manager start() function
  auth: don't start server until the superuser is created
2025-06-29 14:34:55 +03:00
Aleksandra Martyniuk
8321747451 test: rest_api: fix test_repair_task_progress
test_repair_task_progress checks the progress of children of root
repair task. However, nothing ensures that the children are
already created.

Wait until at least one child of a root repair task is created.

Fixes: #24556.

Closes scylladb/scylladb#24560

(cherry picked from commit 0deb9209a0)

Closes scylladb/scylladb#24652
2025-06-28 09:40:37 +03:00
Paweł Zakrzewski
3bd3d720e0 test: test_restart_cluster: create the test
The purpose of this test is to verify that the cluster is able to boot up again after
a full cluster shutdown, thus exhibiting no issues when connecting to
raft group 0 that is larger than one.

(cherry picked from commit 900a6706b8)
2025-06-27 17:50:15 +02:00
Paweł Zakrzewski
3d6d3484b5 auth: standard_role_manager allows awaiting superuser creation
This change implements the ability to await superuser creation in the
function ensure_superuser_is_created(). This means that Scylla will not
be serving CQL connections until the superuser is created.

Fixes #10481

(cherry picked from commit 7008b71acc)
2025-06-27 17:50:08 +02:00
Paweł Zakrzewski
db1d3cc342 auth: coroutinize the standard_role_manager start() function
This change is a preparation for the next change. Moving to coroutines
makes the code more readable and easier to process.

(cherry picked from commit 04fc82620b)
2025-06-27 17:50:01 +02:00
Paweł Zakrzewski
66301eb2b6 auth: don't start server until the superuser is created
This change reorganizes the way standard_role_manager startup is
handled: now the future returned by its start() function can be used to
determine when startup has finished. We use this future to ensure the
startup is finished prior to starting the CQL server.

Some clusters are created without auth, and auth is added later. The
first node to recognize that auth is needed must create the superuser.
Currently this is always on restart, but if we were to ever make it
LiveUpdate then it would not be on restart.

This suggests that we don't really need to wait during restart.

This is a preparatory commit, laying ground for implementation of a
start() function that waits for the superuser to be created. The default
implementation returns a ready future, which makes no change in the code
behavior.

(cherry picked from commit f525d4b0c1)
2025-06-27 17:49:33 +02:00
Pavel Emelyanov
e5ac2285c0 Merge '[Backport 6.2] memtable: ensure _flushed_memory doesn't grow above total_memory' from Scylladb[bot]
`dirty_memory_manager` tracks two quantities about memtable memory usage:
"real" and "unspooled" memory usage.

"real" is the total memory usage (sum of `occupancy().total_space()`)
by all memtable LSA regions, plus an upper-bound estimate of the size of
memtable data which has already moved to the cache region but isn't
evictable (merged into the cache) yet.

"unspooled" is the difference between total memory usage by all memtable
LSA regions, and the total flushed memory (sum of `_flushed_memory`)
of memtables.

`dirty_memory_manager` controls the shares of compaction and/or blocks
writes when these quantities cross various thresholds.

"Total flushed memory" isn't a well defined notion,
since the actual consumption of memory by the same data can vary over
time due to LSA compactions, and even the data present in memtable can
change over the course of the flush due to removals of outdated MVCC versions.
So `_flushed_memory` is merely an approximation computed by `flush_reader`
based on the data passing through it.

This approximation is supposed to be a conservative lower bound.
In particular, `_flushed_memory` should not be greater than
`occupancy().total_space()`. Otherwise, for example, "unspooled" memory
could become negative (and/or wrap around) and weird things could happen.
There is an assertion in `~flush_memory_accounter` which checks that
`_flushed_memory < occupancy().total_space()` at the end of flush.

But it can fail. Without additional treatment, the memtable reader sometimes emits
data which is already deleted. (In particular, it emits rows covered by
a partition tombstone in a newer MVCC version.)
This data is seen by `flush_reader` and accounted in `_flushed_memory`.
But this data can be garbage-collected by the `mutation_cleaner` later during the
flush and decrease `total_memory` below `_flushed_memory`.

There is a piece of code in `mutation_cleaner` intended to prevent that.
If `total_memory` decreases during a `mutation_cleaner` run,
`_flushed_memory` is lowered by the same amount, just to preserve the
asserted property. (This could also make `_flushed_memory` quite inaccurate,
but that's considered acceptable).

But that only works if `total_memory` is decreased during that run. It doesn't
work if the `total_memory` decrease (enabled by the new allocator holes made
by `mutation_cleaner`'s garbage collection work) happens asynchronously
(due to memory reclaim for whatever reason) after the run.

This patch fixes that by tracking the decreases of `total_memory` closer to the
source. Instead of relying on `mutation_cleaner` to notify the memtable if it
lowers `total_memory`, the memtable itself listens for notifications about
LSA segment deallocations. It keeps `_flushed_memory` equal to the reader's
estimate of flushed memory decreased by the change in `total_memory` since the
beginning of flush (if it was positive), and it keeps the amount of "spooled"
memory reported to the `dirty_memory_manager` at `max(0, _flushed_memory)`.

Fixes scylladb/scylladb#21413

Backport candidate because it fixes a crash that can happen in existing stable branches.

- (cherry picked from commit 7d551f99be)

- (cherry picked from commit 975e7e405a)

Parent PR: #21638

Closes scylladb/scylladb#24601

* github.com:scylladb/scylladb:
  memtable: ensure _flushed_memory doesn't grow above total memory usage
  replica/memtable: move region_listener handlers from dirty_memory_manager to memtable
2025-06-24 10:12:41 +03:00
Michał Chojnowski
1c23edad22 memtable: ensure _flushed_memory doesn't grow above total memory usage
dirty_memory_manager tracks two quantities about memtable memory usage:
"real" and "unspooled" memory usage.

"real" is the total memory usage (sum of `occupancy().total_space()`)
by all memtable LSA regions, plus an upper-bound estimate of the size of
memtable data which has already moved to the cache region but isn't
evictable (merged into the cache) yet.

"unspooled" is the difference between total memory usage by all memtable
LSA regions, and the total flushed memory (sum of `_flushed_memory`)
of memtables.

dirty_memory_manager controls the shares of compaction and/or blocks
writes when these quantities cross various thresholds.

"Total flushed memory" isn't a well defined notion,
since the actual consumption of memory by the same data can vary over
time due to LSA compactions, and even the data present in memtable can
change over the course of the flush due to removals of outdated MVCC versions.
So `_flushed_memory` is merely an approximation computed by `flush_reader`
based on the data passing through it.

This approximation is supposed to be a conservative lower bound.
In particular, `_flushed_memory` should not be greater than
`occupancy().total_space()`. Otherwise, for example, "unspooled" memory
could become negative (and/or wrap around) and weird things could happen.
There is an assertion in ~flush_memory_accounter which checks that
`_flushed_memory < occupancy().total_space()` at the end of flush.

But it can fail. Without additional treatment, the memtable reader sometimes emits
data which is already deleted. (In particular, it emits rows covered by
a partition tombstone in a newer MVCC version.)
This data is seen by `flush_reader` and accounted in `_flushed_memory`.
But this data can be garbage-collected by the mutation_cleaner later during the
flush and decrease `total_memory` below `_flushed_memory`.

There is a piece of code in mutation_cleaner intended to prevent that.
If `total_memory` decreases during a `mutation_cleaner` run,
`_flushed_memory` is lowered by the same amount, just to preserve the
asserted property. (This could also make `_flushed_memory` quite inaccurate,
but that's considered acceptable).

But that only works if `total_memory` is decreased during that run. It doesn't
work if the `total_memory` decrease (enabled by the new allocator holes made
by `mutation_cleaner`'s garbage collection work) happens asynchronously
(due to memory reclaim for whatever reason) after the run.

This patch fixes that by tracking the decreases of `total_memory` closer to the
source. Instead of relying on `mutation_cleaner` to notify the memtable if it
lowers `total_memory`, the memtable itself listens for notifications about
LSA segment deallocations. It keeps `_flushed_memory` equal to the reader's
estimate of flushed memory decreased by the change in `total_memory` since the
beginning of flush (if it was positive), and it keeps the amount of "spooled"
memory reported to the `dirty_memory_manager` at `max(0, _flushed_memory)`.
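
The accounting rule described above can be sketched as follows. This is an illustrative model, not the actual `memtable` code: the struct and its member names are hypothetical, and it assumes that "the change in `total_memory` since the beginning of flush (if it was positive)" means the net decrease of `total_memory`, clamped at zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of the flush accounting; all names are illustrative.
struct flush_accounting_sketch {
    int64_t total_at_flush_start;  // total_memory when the flush began
    int64_t total_now;             // current total_memory
    int64_t reader_estimate = 0;   // bytes accounted by the flush reader

    // Stand-in for a notification about an LSA segment (de)allocation.
    void on_total_memory_change(int64_t delta) { total_now += delta; }

    // Stand-in for the flush reader accounting data that passed through it.
    void on_flushed(int64_t bytes) { reader_estimate += bytes; }

    // Reader's estimate minus the net decrease of total_memory since flush start.
    int64_t flushed_memory() const {
        int64_t decrease = std::max<int64_t>(0, total_at_flush_start - total_now);
        return reader_estimate - decrease;
    }

    // Amount reported to the manager: never negative.
    int64_t spooled() const { return std::max<int64_t>(0, flushed_memory()); }

    // Stays non-negative as long as reader_estimate <= total_at_flush_start.
    int64_t unspooled() const { return total_now - spooled(); }
};
```

With this rule, memory freed at any time during the flush, including asynchronous reclaim after a `mutation_cleaner` run, lowers the flushed figure by the same amount.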

(cherry picked from commit 975e7e405a)
2025-06-22 17:37:27 +00:00
Michał Chojnowski
40d1186218 replica/memtable: move region_listener handlers from dirty_memory_manager to memtable
The memtable wants to listen for changes in its `total_memory` in order
to decrease its `_flushed_memory` in case some of the freed memory has already
been accounted as flushed. (This can happen because the flush reader sees
and accounts even outdated MVCC versions, which can be deleted and freed
during the flush).

Today, the memtable doesn't listen to those changes directly. Instead,
some calls which can affect `total_memory` (in particular, the mutation cleaner)
manually check the value of `total_memory` before and after they run, and they
pass the difference to the memtable.

But that's not good enough, because `total_memory` can also change outside
of those manually-checked calls -- for example, during LSA compaction, which
can occur anytime. This makes memtable's accounting inaccurate and can lead
to unexpected states.

But we already have an interface for listening to `total_memory` changes
actively, and `dirty_memory_manager`, which also needs to know it,
does just that. So what happens e.g. when `mutation_cleaner` runs
is that `mutation_cleaner` checks the value of `total_memory` before it runs,
then it runs, causing several changes to `total_memory` which are picked up
by `dirty_memory_manager`, then `mutation_cleaner` checks the end value of
`total_memory` and passes the difference to `memtable`, which corrects
whatever was observed by `dirty_memory_manager`.

To allow memtable to modify its `_flushed_memory` correctly, we need
to make `memtable` itself a `region_listener`. Also, instead of
the situation where `dirty_memory_manager` receives `total_memory`
change notifications from `logalloc` directly, and `memtable` fixes
the manager's state later, we want only the memtable to listen
for the notifications and pass them, already adjusted accordingly,
to the manager, so that there are no intermediate wrong states.

This patch moves the `region_listener` callbacks from the
`dirty_memory_manager` to the `memtable`. It's not intended to be
a functional change, just a source code refactoring.
The next patch will be a functional change enabled by this.

(cherry picked from commit 7d551f99be)
2025-06-22 17:37:27 +00:00
Pavel Emelyanov
60cb1c0b2f sstable_directory: Print ks.cf when moving unshared remote sstables
When an sstable is identified by sstable_directory as remote-unshared,
it will at some point be moved to the target shard. When that happens, a
log message appears:

    sstable_directory - Moving 1 unshared SSTables to shard 1

Processing of tables by sstable_directory often happens in parallel, and
messages from sstable_directory are intermixed. Having a message like
above is not very informative, as it tells nothing about sstables that
are being moved.

Equip the message with ks:cf pair to make it more informative.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23912

(cherry picked from commit d40d6801b0)

Closes scylladb/scylladb#24014
2025-06-17 18:24:13 +03:00
Pavel Emelyanov
a00b4a027e Update seastar submodule (no nested stall backtraces)
* seastar e40388c4...b383afb0 (1):
  > stall_detector: no backtrace if exception

Fixes #24464

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24536
2025-06-17 18:23:06 +03:00
Nikos Dragazis
8ef55444f5 sstables: Fix race when loading checksum component
`read_checksum()` loads the checksum component from disk and stores a
non-owning reference in the shareable components. To avoid loading the
same component twice, the function has an early return statement.
However, this does not guarantee atomicity - two fibers or threads may
load the component and update the shareable components concurrently.
This can lead to use-after-free situations when accessing the component
through the shareable components, since the reference stored there is
non-owning. This can happen when multiple compaction tasks run on the
same SSTable (e.g., regular compaction and scrub-validate).

Fix this by not updating the reference in shareable components, if a
reference is already in place. Instead, create an owning reference to
the existing component for the current fiber. This is less efficient
than using a mutex, since the component may be loaded multiple times
from disk before noticing the race, but no locks are used for any other
SSTable component either. Also, this affects uncompressed SSTables,
which are not that common.
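
A sketch of the "do not overwrite an existing reference" idea, under stated assumptions: `std::weak_ptr`/`std::shared_ptr` stand in for the non-owning and owning references, the `*_sketch` types are hypothetical, and `load_from_disk` is a placeholder for the real (yielding) disk read.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the checksum component.
struct checksum_sketch {
    std::vector<uint32_t> values;
};

// Hypothetical stand-in for the shareable components.
struct shareable_components_sketch {
    std::weak_ptr<checksum_sketch> checksum;  // non-owning reference
};

// Loader is any callable returning std::shared_ptr<checksum_sketch>.
template <typename Loader>
std::shared_ptr<checksum_sketch>
read_checksum_sketch(shareable_components_sketch& shared, Loader load_from_disk) {
    if (auto existing = shared.checksum.lock()) {
        return existing;  // early return: component already loaded
    }
    // Another fiber may publish its own copy while this load is in flight.
    auto loaded = load_from_disk();
    if (auto existing = shared.checksum.lock()) {
        // Lost the race: keep the published component and return an owning
        // reference to it; the redundant copy is dropped on return.
        return existing;
    }
    shared.checksum = loaded;
    return loaded;
}
```

The component may still be read from disk more than once, as the message notes, but the published reference is never replaced while someone holds it.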

Fixes #23728.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#23872

(cherry picked from commit eaa2ce1bb5)

Closes scylladb/scylladb#24267
2025-06-17 13:20:30 +03:00
Szymon Malewski
8eebc97ae1 mapreduce_service: Prevent race condition
In parallelized aggregation functions, the super-coordinator (the node performing the final merging step) receives and merges each partial result in parallel coroutines (`parallel_for_each`).
Usually responses are spread over time and the actual merging is atomic.
However, sometimes partial results are received at a similar time, and if an aggregate function (e.g. a Lua script) yields, two coroutines can try to overwrite the same accumulator one after another,
which leads to losing some of the results.
To prevent this, in this patch each coroutine stores its merging results in its own context and overwrites the accumulator atomically, only after the result has been fully merged.
Compared to the previous implementation, the order of operands in the merging function is swapped, but the order of aggregation is not guaranteed anyway.

Fixes #20662

Closes scylladb/scylladb#24106

(cherry picked from commit 5969809607)

Closes scylladb/scylladb#24387
2025-06-17 13:20:12 +03:00
Pavel Emelyanov
884293a382 Merge '[Backport 6.2] tablets: deallocate storage state on end_migration' from Scylladb[bot]
When a tablet is migrated and cleaned up, deallocate the tablet storage
group state on `end_migration` stage, instead of `cleanup` stage:

* When the stage is updated from `cleanup` to `end_migration`, the
  storage group is removed on the leaving replica.
* When the table is initialized, if the tablet stage is `end_migration`
  then we don't allocate a storage group for it. This happens for
  example if the leaving replica is restarted during tablet migration.
  If it's initialized in `cleanup` stage then we allocate a storage
  group, and it will be deallocated when transitioning to
  `end_migration`.

This guarantees that the storage group is always deallocated on the
leaving replica by `end_migration`, and that it is always allocated if
the tablet wasn't cleaned up fully yet.

It is a similar case also for the pending replica when the migration is
aborted. We deallocate the state on `revert_migration` which is the
stage following `cleanup_target`.

Previously the storage group would be allocated when the tablet is
initialized on any of the tablet replicas - also on the leaving replica,
and when the tablet stage is `cleanup` or `end_migration`, and
deallocated during `cleanup`.

This fixes the following issue:

1. A migrating tablet enters cleanup stage
2. the tablet is cleaned up successfully
3. The leaving replica is restarted, and allocates storage group
4. tablet cleanup is not called because it's already cleaned up
5. the storage group remains allocated on the leaving replica after the
   migration is completed - it's not cleaned up properly.

Fixes https://github.com/scylladb/scylladb/issues/23481

backport to all relevant releases since it's a bug that results in a crash

- (cherry picked from commit 34f15ca871)

- (cherry picked from commit fb18fc0505)

- (cherry picked from commit bd88ca92c8)

Parent PR: #24393

Closes scylladb/scylladb#24486

* github.com:scylladb/scylladb:
  test/cluster/test_tablets: test restart during tablet cleanup
  test: tablets: add get_tablet_info helper
  tablets: deallocate storage state on end_migration
2025-06-17 13:19:49 +03:00
Michael Litvak
06e5a48d17 test/cluster/test_tablets: test restart during tablet cleanup
Add a test that reproduces issue scylladb/scylladb#23481.

The test migrates a tablet from one node to another, and while the
tablet is in some stage of cleanup - either before or right after,
depending on the parameter - the leaving replica, on which the tablet is
cleaned, is restarted.

This is interesting because when the leaving replica starts and loads
its state, the tablet could be in different stages of cleanup - the
SSTables may still exist or they may have been cleaned up already, and
we want to make sure the state is loaded correctly.

(cherry picked from commit bd88ca92c8)
2025-06-12 14:33:07 +03:00
Michael Litvak
c5d11205c1 test: tablets: add get_tablet_info helper
Add a helper for tests to get the tablet info from system.tablets for a
tablet owning a given token.

(cherry picked from commit fb18fc0505)
2025-06-12 03:24:00 +00:00
Michael Litvak
0b46cfb60d tablets: deallocate storage state on end_migration
When a tablet is migrated and cleaned up, deallocate the tablet storage
group state on `end_migration` stage, instead of `cleanup` stage:

* When the stage is updated from `cleanup` to `end_migration`, the
  storage group is removed on the leaving replica.
* When the table is initialized, if the tablet stage is `end_migration`
  then we don't allocate a storage group for it. This happens for
  example if the leaving replica is restarted during tablet migration.
  If it's initialized in `cleanup` stage then we allocate a storage
  group, and it will be deallocated when transitioning to
  `end_migration`.

This guarantees that the storage group is always deallocated on the
leaving replica by `end_migration`, and that it is always allocated if
the tablet wasn't cleaned up fully yet.

It is a similar case also for the pending replica when the migration is
aborted. We deallocate the state on `revert_migration` which is the
stage following `cleanup_target`.

Previously the storage group would be allocated when the tablet is
initialized on any of the tablet replicas - also on the leaving replica,
and when the tablet stage is `cleanup` or `end_migration`, and
deallocated during `cleanup`.

This fixes the following issue:

1. A migrating tablet enters cleanup stage
2. the tablet is cleaned up successfully
3. The leaving replica is restarted, and allocates storage group
4. tablet cleanup is not called because it was already cleaned up
5. the storage group remains allocated on the leaving replica after the
   migration is completed - it's not cleaned up properly.
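
The allocation rules above can be sketched as two predicates. The enum and function names are hypothetical and only the four stages named in this message are modeled; the real tablet transition has more stages and the real code is structured differently.

```cpp
#include <cassert>

// Only the stages mentioned in the message; names are illustrative.
enum class stage_sketch { cleanup, end_migration, cleanup_target, revert_migration };
enum class role_sketch { leaving, pending };

// On table initialization, the leaving replica allocates a storage group
// unless the tablet has already reached end_migration.
bool allocate_on_init_leaving(stage_sketch s) {
    return s != stage_sketch::end_migration;
}

// The storage group is deallocated on the stage transition, not during cleanup:
// cleanup -> end_migration on the leaving replica, and
// cleanup_target -> revert_migration on the pending replica of an aborted migration.
bool deallocate_on_transition(role_sketch r, stage_sketch from, stage_sketch to) {
    if (r == role_sketch::leaving) {
        return from == stage_sketch::cleanup && to == stage_sketch::end_migration;
    }
    return from == stage_sketch::cleanup_target && to == stage_sketch::revert_migration;
}
```

In the failing scenario, a leaving replica restarted in `cleanup` allocates the group again, and the later transition to `end_migration` is what removes it.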

Fixes scylladb/scylladb#23481

(cherry picked from commit 34f15ca871)
2025-06-12 03:24:00 +00:00
Michał Chojnowski
4a18a08284 utils/lsa/chunked_managed_vector: fix the calculation of max_chunk_capacity()
`chunked_managed_vector` is a vector-like container which splits
its contents into multiple contiguous allocations if necessary,
in order to fit within LSA's max preferred contiguous allocation
limits.

Each limited-size chunk is stored in a `managed_vector`.
`managed_vector` is unaware of LSA's size limits.
It's up to the user of `managed_vector` to pick a size which
is small enough.

This happens in `chunked_managed_vector::max_chunk_capacity()`.
But the calculation is wrong, because it doesn't account for
the fact that `managed_vector` has to place some metadata
(the backreference pointer) inside the allocation.
In effect, the chunks allocated by `chunked_managed_vector`
are just a tiny bit larger than the limit, and the limit is violated.

Fix this by accounting for the metadata.

Also, before the patch, `chunked_managed_vector::max_contiguous_allocation`
repeats the definition of `logalloc::max_managed_object_size`.
This is begging for a bug if `logalloc::max_managed_object_size`
changes one day. Adjust it so that `chunked_managed_vector` looks
directly at `logalloc::max_managed_object_size`, as it means to.
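
A sketch of the arithmetic, with assumed numbers: the 128 KiB limit and the pointer-sized metadata below are illustrative values, and the function names are hypothetical rather than the real `chunked_managed_vector` members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed values for illustration only.
constexpr std::size_t max_managed_object_size_sketch = 128 * 1024;
constexpr std::size_t backref_size_sketch = sizeof(void*);  // per-allocation metadata

// Before the fix: the metadata is ignored, so a full chunk exceeds the limit.
template <typename T>
constexpr std::size_t max_chunk_capacity_buggy() {
    return max_managed_object_size_sketch / sizeof(T);
}

// After the fix: reserve room for the metadata first.
template <typename T>
constexpr std::size_t max_chunk_capacity_fixed() {
    return (max_managed_object_size_sketch - backref_size_sketch) / sizeof(T);
}

// Size of the allocation backing a chunk holding `capacity` elements.
template <typename T>
constexpr std::size_t allocation_size(std::size_t capacity) {
    return backref_size_sketch + capacity * sizeof(T);
}
```

For 8-byte elements, the uncorrected capacity yields an allocation one metadata word above the limit, while the corrected one fits.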

Fixes scylladb/scylladb#23854

(cherry picked from commit 7f9152babc)

Closes scylladb/scylladb#24369
2025-06-03 18:10:36 +03:00
Piotr Dulikowski
1571948cd7 topology_coordinator: silence ERROR messages on abort
When the topology coordinator is shut down while doing a long-running
operation, the current operation might throw a raft::request_aborted
exception. This is not a critical issue and should not be logged with
ERROR verbosity level.

Make sure that all the try..catch blocks in the topology coordinator
which:

- May try to acquire a new group0 guard in the `try` part
- Have a `catch (...)` block that prints an ERROR-level message

...have a pass-through `catch (raft::request_aborted&)` block which does
not log the exception.
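
The shape of such a block can be sketched as follows. `request_aborted_sketch` is a hypothetical stand-in for `raft::request_aborted`, `run_step` is an illustrative helper, and a vector of strings replaces the logger so the effect is observable:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for raft::request_aborted.
struct request_aborted_sketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs one coordinator step; error_log stands in for ERROR-level logging.
void run_step(const std::function<void()>& step, std::vector<std::string>& error_log) {
    try {
        step();
    } catch (const request_aborted_sketch&) {
        throw;  // pass-through: an abort during shutdown is expected, so no ERROR log
    } catch (...) {
        error_log.push_back("ERROR: topology step failed");
    }
}
```

The more specific handler must come before `catch (...)`; otherwise the generic handler would still log the abort.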

Fixes: scylladb/scylladb#22649

Closes scylladb/scylladb#23962

(cherry picked from commit 156ff8798b)

Closes scylladb/scylladb#24074
2025-05-13 20:41:41 +03:00
Pavel Emelyanov
5a82f0d217 Merge '[Backport 6.2] replica: skip flush of dropped table' from Scylladb[bot]
Currently, flush throws no_such_column_family if a table is dropped. Skip the flush of a dropped table instead.

Fixes: #16095.

Needs backport to 2025.1 and 6.2 as they contain the bug

- (cherry picked from commit 91b57e79f3)

- (cherry picked from commit c1618c7de5)

Parent PR: #23876

Closes scylladb/scylladb#23904

* github.com:scylladb/scylladb:
  test: test table drop during flush
  replica: skip flush of dropped table
2025-05-13 14:00:38 +03:00
Aleksandra Martyniuk
874b4f8d9c streaming: skip dropped tables
Currently, stream_session::prepare throws when a table in requests
or summaries is dropped. However, we do not want to fail streaming
if the table is dropped.

Delete table checks from stream_session::prepare. Further streaming
steps can handle the dropped table and finish the streaming successfully.

Fixes: #15257.

Closes scylladb/scylladb#23915

(cherry picked from commit 20c2d6210e)

Closes scylladb/scylladb#24050
2025-05-13 13:56:46 +03:00
Aleksandra Martyniuk
91a1acc314 test: test table drop during flush
(cherry picked from commit c1618c7de5)
2025-05-09 09:27:08 +02:00
Aleksandra Martyniuk
720bd681f0 replica: skip flush of dropped table
(cherry picked from commit 91b57e79f3)
2025-05-09 09:26:44 +02:00
Piotr Dulikowski
b3bc3489dd utils::loading_cache: gracefully skip timer if gate closed
The loading_cache has a periodic timer which acquires the
_timer_reads_gate. The stop() method first closes the gate and then
cancels the timer - this order is necessary because the timer is
re-armed under the gate. However, the timer callback does not check
whether the gate was closed but tries to acquire it, which might result
in unhandled exception which is logged with ERROR severity.

Fix the timer callback by acquiring access to the gate at the beginning
and gracefully returning if the gate is closed. Even though the gate
used to be entered in the middle of the callback, it does not make sense
to execute the timer's logic at all if the cache is being stopped.
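
A sketch of the callback ordering, assuming a minimal gate with a non-throwing entry check. `gate_sketch` and `cache_sketch` are hypothetical and much simpler than the Seastar gate and `loading_cache`:

```cpp
#include <cassert>

// Hypothetical minimal gate: entry fails once the gate is closed.
class gate_sketch {
    bool _closed = false;
    int _holders = 0;
public:
    bool try_enter() {
        if (_closed) {
            return false;
        }
        ++_holders;
        return true;
    }
    void leave() { --_holders; }
    void close() { _closed = true; }
};

// Hypothetical cache with a periodic timer callback.
struct cache_sketch {
    gate_sketch timer_gate;
    int reloads = 0;
    bool timer_armed = false;

    void on_timer() {
        // Acquire the gate first; if the cache is stopping, do nothing at all.
        if (!timer_gate.try_enter()) {
            return;
        }
        ++reloads;           // the periodic work
        timer_armed = true;  // re-arm while still holding the gate
        timer_gate.leave();
    }

    void stop() {
        timer_gate.close();   // first close the gate ...
        timer_armed = false;  // ... then cancel the timer
    }
};
```

A callback that fires after `stop()` returns early instead of raising an error from the closed gate.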

Fixes: scylladb/scylladb#23951

Closes scylladb/scylladb#23952

(cherry picked from commit 8ffe4b0308)

Closes scylladb/scylladb#23980
2025-05-06 10:19:32 +02:00
Botond Dénes
e4c6a3c068 Merge '[Backport 6.2] topology coordinator: do not proceed further on invalid bootstrap tokens' from Scylladb[bot]
When dht::boot_strapper::get_bootstrap_tokens fails to parse the
tokens, the topology coordinator handles the exception and schedules a
rollback. However, the current code tries to continue with the topology
coordinator logic even if an exception occurs, leaving bootstrap_tokens
empty. This does not make sense and can actually cause issues,
specifically in prepare_and_broadcast_cdc_generation_data, which
implicitly expects that the bootstrap_tokens of the first node in the
cluster will not be empty.

Fix this by adding the missing break.

Fixes: scylladb/scylladb#23897

From the code inspection alone it looks like 2025.1 and 6.2 have this problem, so marking for backport to both of them.

- (cherry picked from commit 66acaa1bf8)

- (cherry picked from commit 845cedea7f)

- (cherry picked from commit 670a69007e)

Parent PR: #23914

Closes scylladb/scylladb#23948

* github.com:scylladb/scylladb:
  test: cluster: add test_bad_initial_token
  topology coordinator: do not proceed further on invalid bootstrap tokens
  cdc: add sanity check for generating an empty generation
2025-05-01 08:32:13 +03:00
Botond Dénes
254f535f63 Merge '[Backport 6.2] tasks: check whether a node is alive before rpc' from Scylladb[bot]
Check whether a node is alive before making an rpc that gathers children
infos from the whole cluster in virtual_task::impl::get_children.

Fixes: https://github.com/scylladb/scylladb/issues/22514.

Needs backport to 2025.1 and 6.2 as they contain the bug.

- (cherry picked from commit 53e0f79947)

- (cherry picked from commit e178bd7847)

Parent PR: #23787

Closes scylladb/scylladb#23942

* github.com:scylladb/scylladb:
  test: add test for getting tasks children
  tasks: check whether a node is alive before rpc
2025-05-01 08:30:09 +03:00
Avi Kivity
c20b2ea2af Merge '[Backport 6.2] Ensure raft group0 RPCs use the gossip scheduling group.' from Scylladb[bot]
Scylla operations use concurrency semaphores to limit the number of concurrent operations and prevent resource exhaustion. The semaphore is selected based on the current scheduling group.

For RAFT group operations, it is essential to use a system semaphore to avoid queuing behind user operations. This patch ensures that RAFT operations use the `gossip` scheduling group to leverage the system semaphore.

Fixes scylladb/scylladb#21637

Backport: 6.2 and 6.1

- (cherry picked from commit 60f1053087)

- (cherry picked from commit e05c082002)

Parent PR: #22779

Closes scylladb/scylladb#23769

* github.com:scylladb/scylladb:
  ensure raft group0 RPCs use the gossip scheduling group
  Move RAFT operations verbs to GOSSIP group.
2025-04-30 16:45:20 +03:00
Aleksandra Martyniuk
cc37c64467 test: add test for getting tasks children
Add test that checks whether the children of a virtual task will be
properly gathered if a node is down.

(cherry picked from commit e178bd7847)
2025-04-30 10:40:32 +02:00
Aleksandra Martyniuk
47377cd5d4 tasks: check whether a node is alive before rpc
Check whether a node is alive before making an rpc that gathers children
infos from the whole cluster in virtual_task::impl::get_children.

(cherry picked from commit 53e0f79947)
2025-04-30 10:35:45 +02:00
Sergey Zolotukhin
5c90107c14 ensure raft group0 RPCs use the gossip scheduling group
Scylla operations use concurrency semaphores to limit the number
of concurrent operations and prevent resource exhaustion. The
semaphore is selected based on the current scheduling group.
For Raft group operations, it is essential to use a system semaphore to
avoid queuing behind user operations.
This commit adds a check to ensure that the raft group0 RPCs are
executed with the `gossip` scheduling group.

(cherry picked from commit e05c082002)
2025-04-30 08:49:07 +02:00
Sergey Zolotukhin
612f184638 Move RAFT operations verbs to GOSSIP group.
To let RAFT operations use the gossip system semaphore, move the RAFT
verbs to the gossip group in `do_get_rpc_client_idx` (messaging_service).

Fixes scylladb/scylladb#21637

(cherry picked from commit 60f1053087)
2025-04-29 19:25:55 +00:00
Piotr Dulikowski
d881f3f14d test: cluster: add test_bad_initial_token
Adds a test which checks that rollback works properly in case when a bad
value of the initial_token function is provided.

(cherry picked from commit 670a69007e)
2025-04-28 17:07:23 +00:00
Piotr Dulikowski
7eda572129 topology coordinator: do not proceed further on invalid bootstrap tokens
When dht::boot_strapper::get_bootstrap_tokens fails to parse the
tokens, the topology coordinator handles the exception and schedules a
rollback. However, the current code tries to continue with the topology
coordinator logic even if an exception occurs, leaving bootstrap_tokens
empty. This does not make sense and can actually cause issues,
specifically in prepare_and_broadcast_cdc_generation_data, which
implicitly expects that the bootstrap_tokens of the first node in the
cluster will not be empty.

Fix this by adding the missing break.

Fixes: scylladb/scylladb#23897
(cherry picked from commit 845cedea7f)
2025-04-28 17:07:23 +00:00
Piotr Dulikowski
41e6df7407 cdc: add sanity check for generating an empty generation
It doesn't make sense to create an empty CDC generation because it does
not make sense to have a cluster with no tokens. Add a sanity check to
cdc::make_new_generation_description which fails if somebody attempts to
do that (i.e. when the set of current tokens + optionally bootstrapping
node's tokens is empty).
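
A sketch of the check, with hypothetical names and plain 64-bit integers standing in for tokens; the real `cdc::make_new_generation_description` takes different arguments and builds a full generation description:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <stdexcept>

// Collects the tokens a new generation would be built from and rejects
// the empty case up front instead of failing later in an obscure way.
std::set<int64_t> tokens_for_generation_sketch(std::set<int64_t> current_tokens,
                                               const std::set<int64_t>& bootstrap_tokens) {
    // bootstrap_tokens is empty when no node is bootstrapping.
    current_tokens.insert(bootstrap_tokens.begin(), bootstrap_tokens.end());
    if (current_tokens.empty()) {
        throw std::invalid_argument("attempted to create a CDC generation with no tokens");
    }
    return current_tokens;
}
```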

The function does not work correctly if it is misused, as we saw in
scylladb/scylladb#23897. While the function should not be misused in the
first place, it's better to throw an exception rather than crash -
especially since this crash could happen on the topology coordinator.

(cherry picked from commit 66acaa1bf8)
2025-04-28 17:07:22 +00:00
Tomasz Grabiec
b48d1abade Merge '[Backport 6.2] Cache base info for view schemas in the schema registry' from Scylladb[bot]
Currently, when we load a frozen schema into the registry, we lose
the base info if the schema was of a view. Because of that, in various
places we need to set the base info again, and in some codepaths we
may miss it completely, which may make us unable to process some
requests (for example, when executing reverse queries on views).
Even after setting the base info, we may still lose it if the schema
entry gets deactivated due to all `schema_ptr`s temporarily dying.

To fix this, this patch adds the base schema to the registry, alongside
the view schema. We store just the frozen base schema, so that we can
transfer it across shards. With the base schema, we can now set the base
info when returning the schema from the registry. As a result, we can now
assume that all view schemas returned by the registry have base_info set.

In this series we also make sure that the view schemas in the registry are
kept up-to-date with regard to base schema changes.

Fixes https://github.com/scylladb/scylladb/issues/21354

This issue is a bug, so adding backport labels 6.1 and 6.2

- (cherry picked from commit 6f11edbf3f)

- (cherry picked from commit dfe3810f64)

- (cherry picked from commit 82f2e1b44c)

- (cherry picked from commit 3094ff7cbe)

- (cherry picked from commit 74cbc77f50)

Parent PR: #21862

Closes scylladb/scylladb#23046

* github.com:scylladb/scylladb:
  test: add test for schema registry maintaining base info for views
  schema_registry: avoid setting base info when getting the schema from registry
  schema_registry: update cached base schemas when updating a view
  schema_registry: cache base schemas for views
  db: set base info before adding schema to registry
2025-04-25 18:42:57 +02:00
Tomasz Grabiec
9190779ee3 Merge '[Backport 6.2] storage_service: preserve state of busy topology when transiting tablet' from Scylladb[bot]
Commit 876478b84f ("storage_service: allow concurrent tablet migration in tablets/move API", 2024-02-08) introduced a code path on which the topology state machine would be busy -- in "tablet_draining" or "tablet_migration" state -- at the time of starting tablet migration. The pre-commit code would unconditionally transition the topology to "tablet_migration" state, assuming the topology had been idle previously. On the new code path, this state change would be idempotent if the topology state machine had been busy in "tablet_migration", but the state change would incorrectly overwrite the "tablet_draining" state otherwise.

Restrict the state change to when the topology state machine is idle.

In addition, add the topology update to the "updates" vector with plain push_back(). emplace_back() is not helpful here, as topology_mutation_builder::build() cannot construct in-place, and so we invoke the "canonical_mutation" move constructor once, either way.

Unit test:

Start a two node cluster. Create a single tablet on one of the nodes. Start decommissioning that node, but block decommissioning at once. In that state (i.e., in "tablet_draining"), move the tablet manually to the other node. Check that transit_tablet() leaves the topology transition state alone.

Fixes https://github.com/scylladb/scylladb/issues/20073.

Commit 876478b84f was first released in scylla-6.0.0, so we might want to backport this patch accordingly.

- (cherry picked from commit e1186f0ae6)

- (cherry picked from commit 841ca652a0)

Parent PR: #23751

Closes scylladb/scylladb#23768

* github.com:scylladb/scylladb:
  storage_service: add unit test for mid-decommission transit_tablet()
  storage_service: preserve state of busy topology when transiting tablet
2025-04-20 20:13:11 +02:00
Avi Kivity
8db020e782 Merge '[Backport 6.2] managed_bytes: in the copy constructor, respect the target preferred allocation size' from Scylladb[bot]
Commit 14bf09f447 added a single-chunk layout to `managed_bytes`, which makes the overhead of `managed_bytes` smaller in the common case of a small buffer.

But there was a bug in it. In the copy constructor of `managed_bytes`, a copy of a single-chunk `managed_bytes` is made single-chunk too.

But this is wrong, because the source of the copy and the target of the copy might have different preferred max contiguous allocation sizes.

In particular, if a `managed_bytes` of size between 13 kiB and 128 kiB is copied from the standard allocator into LSA, the resulting `managed_bytes` is a single chunk which violates LSA's preferred allocation size. (And therefore is placed by LSA in the standard allocator).

In other words, since Scylla 6.0, cache and memtable cells between 13 kiB and 128 kiB are getting allocated in the standard allocator rather than inside LSA segments.

Consequences of the bug:

1. Effective memory consumption of an affected cell is rounded up to the nearest power of 2.

2. With a pathological-enough allocation pattern (for example, one which somehow ends up placing a single 16 kiB memtable-owned allocation in every aligned 128 kiB span), memtable flushing could theoretically deadlock, because the allocator might be too fragmented to let the memtable grow by another 128 kiB segment, while keeping the sum of all allocations small enough to avoid triggering a flush. (Such an allocation pattern probably wouldn't happen in practice though).

3. It triggers a bug in reclaim which results in spurious allocation failures despite ample evictable memory.

   There is a path in the reclaimer procedure where we check whether reclamation succeeded by checking that the number of free LSA segments grew.

   But in the presence of evictable non-LSA allocations, this is wrong because the reclaim might have met its target by evicting the non-LSA allocations, in which case memory is returned directly to the standard allocator, rather than to the pool of free segments.

   If that happens, the reclaimer wrongly returns `reclaimed_nothing` to Seastar, which fails the allocation.

Refs (possibly fixes) https://github.com/scylladb/scylladb/issues/21072
Fixes https://github.com/scylladb/scylladb/issues/22941
Fixes https://github.com/scylladb/scylladb/issues/22389
Fixes https://github.com/scylladb/scylladb/issues/23781

This is a regression fix, should be backported to all affected releases.

- (cherry picked from commit 4e2f62143b)

- (cherry picked from commit 6c1889f65c)

Parent PR: #23782

Closes scylladb/scylladb#23809

* github.com:scylladb/scylladb:
  managed_bytes_test: add a reproducer for #23781
  managed_bytes: in the copy constructor, respect the target preferred allocation size
2025-04-19 18:43:31 +03:00
Calle Wilund
d566010412 network_topology_strategy/alter ks: Remove dc:s from options once rf=0
Fixes #22688

If we set a dc rf to zero, the options map will still retain a dc=0 entry.
If this dc is decommissioned, any further alters of keyspace will fail,
because the union of new/old options will now contain an unknown keyword.

Change alter ks options processing to simply remove any dc with rf=0 on
alter, and treat this as an implicit dc=0 in nw-topo strategy.
This means we change the reallocate_tablets routine to rely not on
the strategy object's dc mapping, but on the full replica topology info
for the dc:s to consider for reallocation. Since we verify the input
on attribute processing, the amount of rf/tablets moved should still
be legal.

v2:
* Update docs as well.
v3:
* Simplify dc processing
* Reintroduce options empty check, but do early in ks_prop_defs
* Clean up unit test some

Closes scylladb/scylladb#22693

(cherry picked from commit 342df0b1a8)

(Update: workaround python test objects not having dc info)

Closes scylladb/scylladb#22876
2025-04-18 14:09:52 +03:00
Michał Chojnowski
ba3975858b managed_bytes_test: add a reproducer for #23781
(cherry picked from commit 6c1889f65c)
2025-04-18 07:55:23 +00:00
Michał Chojnowski
f75b0d49a0 managed_bytes: in the copy constructor, respect the target preferred allocation size
Commit 14bf09f447 added a single-chunk
layout to `managed_bytes`, which makes the overhead of `managed_bytes`
smaller in the common case of a small buffer.

But there was a bug in it. In the copy constructor of `managed_bytes`,
a copy of a single-chunk `managed_bytes` is made single-chunk too.

But this is wrong, because the source of the copy and the target
of the copy might have different preferred max contiguous allocation
sizes.

In particular, if a `managed_bytes` of size between 13 kiB and 128 kiB
is copied from the standard allocator into LSA, the resulting
`managed_bytes` is a single chunk which violates LSA's preferred
allocation size. (And therefore is placed by LSA in the standard
allocator).

In other words, since Scylla 6.0, cache and memtable cells
between 13 kiB and 128 kiB are getting allocated in the standard allocator
rather than inside LSA segments.

Consequences of the bug:

1. Effective memory consumption of an affected cell is rounded up to the nearest
   power of 2.

2. With a pathological-enough allocation pattern
   (for example, one which somehow ends up placing a single 16 kiB
   memtable-owned allocation in every aligned 128 kiB span),
   memtable flushing could theoretically deadlock,
   because the allocator might be too fragmented to let the memtable
   grow by another 128 kiB segment, while keeping the sum of all
   allocations small enough to avoid triggering a flush.
   (Such an allocation pattern probably wouldn't happen in practice though).

3. It triggers a bug in reclaim which results in spurious
   allocation failures despite ample evictable memory.

   There is a path in the reclaimer procedure where we check whether
   reclamation succeeded by checking that the number of free LSA
   segments grew.

   But in the presence of evictable non-LSA allocations, this is wrong
   because the reclaim might have met its target by evicting the non-LSA
   allocations, in which case memory is returned directly to the
   standard allocator, rather than to the pool of free segments.

   If that happens, the reclaimer wrongly returns `reclaimed_nothing`
   to Seastar, which fails the allocation.
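
As a worked example of consequence 1, assuming the affected sizes are served from power-of-two size classes as stated above. `round_up_pow2` and `wasted_bytes` are hypothetical helpers for the arithmetic, not allocator code.

```cpp
#include <cstddef>

constexpr std::size_t KiB = 1024;

// Smallest power of two that is >= n (for n > 0).
constexpr std::size_t round_up_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Bytes lost to rounding for a single allocation of n bytes.
constexpr std::size_t wasted_bytes(std::size_t n) {
    return round_up_pow2(n) - n;
}
```

Under this assumption a 17 kiB cell occupies 32 kiB, so 15 kiB are lost to rounding, while a cell of exactly 16 kiB loses nothing.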

Refs (possibly fixes) https://github.com/scylladb/scylladb/issues/21072
Fixes https://github.com/scylladb/scylladb/issues/22941
Fixes https://github.com/scylladb/scylladb/issues/22389
Fixes https://github.com/scylladb/scylladb/issues/23781

(cherry picked from commit 4e2f62143b)
2025-04-18 07:55:23 +00:00
Botond Dénes
e1ce7c36a7 test/cluster/test_read_repair.py: increase read request timeout
This test enables trace-level logging for the mutation_data logger,
which seems to be too much in debug mode and the test read times out.
Increase the timeout to 1 minute to avoid this.

Fixes: #23513
Fixes: #23512

Closes scylladb/scylladb#23558

(cherry picked from commit 7bbfa5293f)

Closes scylladb/scylladb#23793
2025-04-18 06:34:25 +03:00
Laszlo Ersek
597030a821 storage_service: add unit test for mid-decommission transit_tablet()
Start a two node cluster. Create a single tablet on one of the nodes.
Start decommissioning that node, but block decommissioning at once. In
that state (i.e., in "tablet_draining"), move the tablet manually to the
other node. Check that transit_tablet() leaves the topology transition
state alone.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
(cherry picked from commit 841ca652a0)
2025-04-17 09:09:43 +02:00
Laszlo Ersek
add4022d32 storage_service: preserve state of busy topology when transiting tablet
Commit 876478b84f ("storage_service: allow concurrent tablet migration
in tablets/move API", 2024-02-08) introduced a code path on which the
topology state machine would be busy -- in "tablet_draining" or
"tablet_migration" state -- at the time of starting tablet migration. The
pre-commit code would unconditionally transition the topology to
"tablet_migration" state, assuming the topology had been idle previously.
On the new code path, this state change would be idempotent if the
topology state machine had been busy in "tablet_migration", but the state
change would incorrectly overwrite the "tablet_draining" state otherwise.

Restrict the state change to when the topology state machine is idle.

In addition, add the topology update to the "updates" vector with plain
push_back(). emplace_back() is not helpful here, as
topology_mutation_builder::build() cannot construct in-place, and so we
invoke the "canonical_mutation" move constructor once, either way.
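
The push_back()/emplace_back() point can be checked with a small standalone sketch. `payload` and `builder` are hypothetical stand-ins for canonical_mutation and topology_mutation_builder; the sketch only assumes that build() returns its result by value.

```cpp
#include <vector>

int move_count = 0;  // counts move constructions of `payload`

// Hypothetical stand-in for a movable value type.
struct payload {
    payload() = default;
    payload(const payload&) = default;
    payload(payload&&) noexcept { ++move_count; }
};

// Hypothetical builder whose build() returns by value, so the result
// cannot be constructed directly inside the vector's storage.
struct builder {
    payload build() const { return payload{}; }
};

int moves_with_push_back() {
    std::vector<payload> updates;
    updates.reserve(1);  // rule out moves caused by reallocation
    move_count = 0;
    updates.push_back(builder{}.build());
    return move_count;
}

int moves_with_emplace_back() {
    std::vector<payload> updates;
    updates.reserve(1);
    move_count = 0;
    updates.emplace_back(builder{}.build());
    return move_count;
}
```

Both functions return 1: in either case the temporary returned by build() is moved into the vector exactly once.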

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
(cherry picked from commit e1186f0ae6)
2025-04-17 08:32:56 +02:00
Avi Kivity
b6f1cacfea scylla-gdb: small-objects: fix for very small objects
Because of rounding and alignment, there are multiple pools for small
sizes (e.g. 4 for size 32). Because the pool selection algorithm
ignores alignment, different pools can be chosen for different object
sizes. For example, an object size of 29 will choose the first pool
of size 32, while an object size of 32 will choose the fourth pool of
size 32.

The small-objects command doesn't know about this and always considers
just the first pool for a given size. This causes it to miss out on
sister pools.

While it's possible to adjust pool selection to always choose one of the
pools, it may eat a precious cycle. So instead let's compensate in the
small-objects command. Instead of finding one pool for a given size,
find all of them, and iterate over all those pools.

Fixes #23603

Closes scylladb/scylladb#23604

(cherry picked from commit b4d4e48381)

Closes scylladb/scylladb#23748
2025-04-16 14:36:28 +03:00
Botond Dénes
c1defce1de Merge '[Backport 6.2] transport/server.cc: set default timestamp info in EXECUTE and BATCH tracing' from Scylladb[bot]
A default timestamp (not to be confused with the timestamp passed via the 'USING TIMESTAMP' query clause) can be set using the 0x20 flag and the <timestamp> field in the binary CQL frame payload of QUERY, EXECUTE and BATCH ops. The Java CQL Driver also sets it by default.

However, we were only setting the corresponding info in the CQL Tracing context of a QUERY operation. For an unknown reason we were not setting it for EXECUTE and BATCH traces (I guess I simply forgot to set it back then).

This patch fixes this.

Fixes #23173

The issue fixed by this PR is not critical but the fix is simple and safe enough so we should backport it to all live releases.

- (cherry picked from commit ca6bddef35)

- (cherry picked from commit f7e1695068)

Parent PR: #23174

Closes scylladb/scylladb#23523

* github.com:scylladb/scylladb:
  CQL Tracing: set common query parameters in a single function
  transport/server.cc: set default timestamp info in EXECUTE and BATCH tracing
2025-04-11 17:11:06 +03:00
Botond Dénes
61d1262674 Merge '[Backport 6.2] reader_concurrency_semaphore: register_inactive_read(): handle aborted permit' from Scylladb[bot]
It is possible that the permit handed in to register_inactive_read() is already aborted (currently only possible if the permit timed out). If the permit also happens to be waiting for memory, the current code will attempt to call promise<>::set_exception() on the permit's promise to abort its waiters. But if the permit was already aborted via timeout, this promise will already have an exception and this will trigger an assert. Add a separate case for checking if the permit is aborted already. If so, treat it as immediate eviction: close the reader and clean up.

Fixes: scylladb/scylladb#22919

Bug is present in all live versions, backports are required.

- (cherry picked from commit 4d8eb02b8d)

- (cherry picked from commit 7ba29ec46c)

Parent PR: #23044

Closes scylladb/scylladb#23144

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: register_inactive_read(): handle aborted permit
  test/boost/reader_concurrency_semaphore_test: move away from db::timeout_clock::now()
2025-04-11 17:10:39 +03:00
Botond Dénes
ae54d4e886 Merge '[Backport 6.2] streaming: fix the way a reason of streaming failure is determined' from Scylladb[bot]
During streaming, the receiving node gets and processes mutation fragments.
If this operation fails, the receiver responds with a -1 status code, unless
it failed due to no_such_column_family, in which case streaming of this
table should be skipped.

However, when the table was dropped, an exception handler on the receiver
side may get not only data_dictionary::no_such_column_family, but also
a seastar::nested_exception of two no_such_column_family.

Encountered example:
```
ERROR 2025-02-12 15:20:51,508 [shard 0:strm] stream_session - [Stream #f1cd6830-e954-11ef-afd9-b022e40bf72d] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=ks, cf=cf, peer=756dd3fe-2bf0-4dcd-afbc-cfd5202669a0: seastar::nested_exception: data_dictionary::no_such_column_family (Can't find a column family with UUID ef9b1ee0-e954-11ef-ba4a-faf17acf4e14) (while cleaning up after data_dictionary::no_such_column_family (Can't find a column family with UUID ef9b1ee0-e954-11ef-ba4a-faf17acf4e14))
```

In this case, the exception does not match the try_catch<data_dictionary::no_such_column_family>
clause and gets handled the same as any other exception type.

Replace the try_catch clause with table_sync_and_check, which synchronizes
the schema and checks if the table exists.

Fixes: https://github.com/scylladb/scylladb/issues/22834.

Needs backport to all live versions, as they all contain the bug.

- (cherry picked from commit 876cf32e9d)

- (cherry picked from commit faf3aa13db)

- (cherry picked from commit 44748d624d)

- (cherry picked from commit 35bc1fe276)

Parent PR: #22868

Closes scylladb/scylladb#23289

* github.com:scylladb/scylladb:
  streaming: fix the way a reason of streaming failure is determined
  streaming: save a continuation lambda
  streaming: use streaming namespace in table_check.{cc,hh}
  repair: streaming: move table_check.{cc,hh} to streaming
2025-04-11 15:13:14 +03:00
Alexander Turetskiy
e97e729973 Improve compaction on read of expired tombstones
Compact expired tombstones in cache even if they are blocked by
commitlog

fixes #16781

Closes: #23033
2025-04-11 15:09:43 +03:00
Botond Dénes
20a1edd763 tools/scylla-nodetool: s/GetInt()/GetInt64()/
GetInt() was observed to fail when the integer JSON value overflows the
int32_t type, which `GetInt()` uses for storage. When this happens,
rapidjson will assign a distinct 64 bit integer type to the value, and
attempting to access it as 32 bit integer triggers the wrong-type error,
resulting in assert failure. This was hit in the field where invoking
nodetool netstats resulted in nodetool crashing when the streamed bytes
amounts were higher than maxint.

To avoid such bugs in the future, replace all usage of GetInt() in
nodetool with GetInt64(), just to be sure.

A reproducer is added to the nodetool netstats crash.

Fixes: scylladb/scylladb#23394

Closes scylladb/scylladb#23395

(cherry picked from commit bd8973a025)

Closes scylladb/scylladb#23475
2025-04-11 15:00:47 +03:00
Aleksandra Martyniuk
07236341ab streaming: fix the way a reason of streaming failure is determined
During streaming, the receiving node gets and processes mutation fragments.
If this operation fails, the receiver responds with a -1 status code, unless
it failed due to no_such_column_family, in which case streaming of this
table should be skipped.

However, when the table was dropped, an exception handler on the receiver
side may get not only data_dictionary::no_such_column_family, but also
a seastar::nested_exception of two no_such_column_family.

Encountered example:
```
ERROR 2025-02-12 15:20:51,508 [shard 0:strm] stream_session - [Stream #f1cd6830-e954-11ef-afd9-b022e40bf72d] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=ks, cf=cf, peer=756dd3fe-2bf0-4dcd-afbc-cfd5202669a0: seastar::nested_exception: data_dictionary::no_such_column_family (Can't find a column family with UUID ef9b1ee0-e954-11ef-ba4a-faf17acf4e14) (while cleaning up after data_dictionary::no_such_column_family (Can't find a column family with UUID ef9b1ee0-e954-11ef-ba4a-faf17acf4e14))
```

In this case, the exception does not match the try_catch<data_dictionary::no_such_column_family>
clause and gets handled the same as any other exception type.

Replace the try_catch clause with table_sync_and_check, which synchronizes
the schema and checks if the table exists.
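
The mismatch can be illustrated with a standalone sketch. `no_such_table` and `nested_failure` are hypothetical analogues of data_dictionary::no_such_column_family and seastar::nested_exception, and `should_skip_table` merely stands in for the idea of asking whether the table still exists; none of this is the actual streaming code.

```cpp
#include <exception>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical analogue of a "table not found" error.
struct no_such_table : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical wrapper that carries two exceptions but is NOT derived
// from no_such_table, so a type-based handler will not match it.
struct nested_failure : std::exception {
    std::exception_ptr inner;
    std::exception_ptr outer;
    nested_failure(std::exception_ptr i, std::exception_ptr o)
        : inner(std::move(i)), outer(std::move(o)) {}
    const char* what() const noexcept override { return "nested_failure"; }
};

// Type-based classification: only a plain no_such_table matches.
bool matches_no_such_table(const std::exception_ptr& ep) {
    try {
        std::rethrow_exception(ep);
    } catch (const no_such_table&) {
        return true;
    } catch (...) {
        return false;
    }
}

// State-based classification: skip the table if it no longer exists,
// regardless of how the failure was wrapped.
bool should_skip_table(const std::set<std::string>& known_tables, const std::string& table) {
    return known_tables.count(table) == 0;
}
```

The type-based check misses the wrapped error, while the state-based check depends only on whether the table is still known.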

Fixes: https://github.com/scylladb/scylladb/issues/22834.
(cherry picked from commit 35bc1fe276)
2025-04-11 11:55:38 +02:00
Aleksandra Martyniuk
2027b9a21b streaming: save a continuation lambda
In the following patches, an additional preemption point will be
added to the coroutine lambda in register_stream_mutation_fragments.

Assign a lambda to a variable to prolong the captures' lifetime.

(cherry picked from commit 44748d624d)
2025-04-11 11:55:38 +02:00
Aleksandra Martyniuk
c1a211101d streaming: use streaming namespace in table_check.{cc,hh}
(cherry picked from commit faf3aa13db)
2025-04-11 11:55:38 +02:00
Aleksandra Martyniuk
e52fd85cf8 repair: streaming: move table_check.{cc,hh} to streaming
(cherry picked from commit 876cf32e9d)
2025-04-11 11:55:36 +02:00
Botond Dénes
6091e81e18 reader_concurrency_semaphore: register_inactive_read(): handle aborted permit
It is possible that the permit handed in to register_inactive_read() is
already aborted (currently only possible if permit timed out).
If the permit also happens to be waiting for memory, the current code
will attempt to call promise<>::set_exception() on the permit's promise
to abort its waiters. But if the permit was already aborted via timeout,
this promise will already have an exception and this will trigger an
assert. Add a separate case for checking if the permit is aborted
already. If so, treat it as immediate eviction: close the reader and
clean up.

Fixes: scylladb/scylladb#22919
(cherry picked from commit 7ba29ec46c)
2025-04-11 04:04:42 -04:00
Botond Dénes
742ff76d4d test/boost/reader_concurrency_semaphore_test: move away from db::timeout_clock::now()
Unless the test in question actually wants to test timeouts. Timeouts
will have more pronounced consequences soon and thus using
db::timeout_clock::now() becomes a sure way to make tests flaky.
To avoid this, use db::no_timeout in the tests that don't care about
timeouts.

(cherry picked from commit 4d8eb02b8d)
2025-04-11 04:04:42 -04:00
Botond Dénes
76f5816e43 Merge '[Backport 6.2] scylla sstable: Add standard extensions and propagate to schema load ' from Scylladb[bot]
Fixes #22314

Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them.

Bundles together the setup of "always on" schema extensions into a single call, and uses this from the three (3) init points.
Could have opted for static reg via `configurables`, but since we are moving to a single code base, the need for this is going away, hence explicit init seems more in line.

- (cherry picked from commit e6aa09e319)

- (cherry picked from commit 4aaf3df45e)

- (cherry picked from commit 00b40eada3)

- (cherry picked from commit 48fda00f12)

Parent PR: #22327

Closes scylladb/scylladb#23089

* github.com:scylladb/scylladb:
  tools: Add standard extensions and propagate to schema load
  cql_test_env: Use add all extensions instead of individually
  main: Move extensions adding to function
  tombstone_gc: Make validate work for tools
2025-04-11 11:00:41 +03:00
Lakshmi Narayanan Sreethar
1f272eade5 topology_coordinator: handle_table_migration: do not continue after executing metadata barrier
Return after executing the global metadata barrier to allow the topology
handler to handle any transitions that might have been started by a
concurrent transaction.

Fixes #22792

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#22793

(cherry picked from commit 0f7d08d41d)

Closes scylladb/scylladb#23020
2025-04-11 10:59:43 +03:00
yangpeiyu2_yewu
f9cb9b905f mutation_writer/multishard_writer.cc: wrap writer into futurize_invoke
Wrap the writer in seastar::futurize_invoke to make sure that close() for the mutation_reader is executed before destruction.

Fixes scylladb/scylladb#22790

Closes scylladb/scylladb#22812

(cherry picked from commit 0de232934a)

Closes scylladb/scylladb#22944
2025-04-11 10:59:17 +03:00
Avi Kivity
11993efc8c Merge '[Backport 6.2] row_cache: don't garbage-collect tombstones which cover data in memtables' from Scylladb[bot]
The row cache can garbage-collect tombstones in two places:
1) When populating the cache - the underlying reader pipeline has a `compacting_reader` in it;
2) During reads - reads now compact data including garbage collection;

In both cases, garbage collection has to do overlap checks against memtables, to avoid collecting tombstones which cover data in the memtables.
This PR includes a fix for (2), which is currently not handled at all.
(1) was already supposed to be fixed, see https://github.com/scylladb/scylladb/issues/20916. But the test added in this PR showed that the fix is incomplete: https://github.com/scylladb/scylladb/issues/23291. A fix for this issue is also included.

Fixes: https://github.com/scylladb/scylladb/issues/23291
Fixes: https://github.com/scylladb/scylladb/issues/23252

The fix will need backport to all live releases.

- (cherry picked from commit c2518cdf1a)

- (cherry picked from commit 6b5b563ef7)

- (cherry picked from commit 7e600a0747)

- (cherry picked from commit d126ea09ba)

- (cherry picked from commit cb76cafb60)

- (cherry picked from commit df09b3f970)

- (cherry picked from commit e5afd9b5fb)

- (cherry picked from commit 34b18d7ef4)

- (cherry picked from commit f7938e3f8b)

- (cherry picked from commit 6c1f6427b3)

- (cherry picked from commit 0d39091df2)

Parent PR: #23255

Closes scylladb/scylladb#23671

* github.com:scylladb/scylladb:
  test/boost/row_cache_test: add memtable overlap check tests
  replica/table: add error injection to memtable post-flush phase
  utils/error_injection: add a way to set parameters from error injection points
  test/cluster: add test_data_resurrection_in_memtable.py
  test/pylib/utils: wait_for_cql_and_get_hosts(): sort hosts
  replica/mutation_dump: don't assume cells are live
  replica/database: do_apply() add error injection point
  replica: improve memtable overlap checks for the cache
  replica/memtable: add is_merging_to_cache()
  db/row_cache: add overlap-check for cache tombstone garbage collection
  mutation/mutation_compactor: copy key passed-in to consume_new_partition()
2025-04-10 21:41:52 +03:00
Botond Dénes
5fb8a6dae2 mutation/frozen_mutation: frozen_mutation_consumer_adaptor: fix end-of-partition handling
This adaptor adapts a mutation reader pausable consumer to the frozen
mutation visitor interface. The pausable consumer protocol allows the
consumer to skip the remaining parts of the partition and resume the
consumption with the next one. To do this, the consumer just has to
return stop_iteration::yes from one of the consume() overloads for
clustering elements, then return stop_iteration::no from
consume_end_of_partition(). Due to a bug in the adaptor, this sequence
leads to terminating the consumption completely -- so any remaining
partitions are also skipped.

This protocol implementation bug has user-visible effects when the
only user of the adaptor -- read repair -- happens during a query which
limits the amount of content in each partition.
There are two such queries: select distinct ... and select ... with
partition limit. When converting the repaired mutation to a query
result, these queries will trigger the skip sequence in the consumer and,
due to the above-described bug, will skip the remaining partitions in
the results, omitting them from the final query result.

This patch fixes the protocol bug, the return value of the underlying
consumer's consume_end_of_partition() is now respected.
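
The skip sequence can be sketched with a toy driver. Everything below is a simplified, hypothetical model (rows are plain integers, `feed` plays the role of the adaptor); the `respect_end_of_partition` flag switches between the behaviour described as buggy and the fixed one.

```cpp
#include <cstddef>
#include <vector>

enum class stop_iteration { no, yes };

using partition = std::vector<int>;  // the rows of one partition

// Consumer that wants at most `per_partition_limit` rows from each partition.
struct limiting_consumer {
    std::size_t per_partition_limit;
    std::size_t seen_in_partition = 0;
    std::vector<int> result;

    void consume_new_partition() { seen_in_partition = 0; }

    // Returning yes means: skip the rest of this partition.
    stop_iteration consume(int row) {
        result.push_back(row);
        ++seen_in_partition;
        return seen_in_partition >= per_partition_limit ? stop_iteration::yes
                                                        : stop_iteration::no;
    }

    // Returning no means: carry on with the next partition.
    stop_iteration consume_end_of_partition() { return stop_iteration::no; }
};

std::vector<int> feed(const std::vector<partition>& partitions, std::size_t limit,
                      bool respect_end_of_partition) {
    limiting_consumer consumer{limit};
    for (const partition& p : partitions) {
        consumer.consume_new_partition();
        stop_iteration stop = stop_iteration::no;
        for (int row : p) {
            stop = consumer.consume(row);
            if (stop == stop_iteration::yes) {
                break;  // skip the remaining rows of this partition
            }
        }
        const stop_iteration end = consumer.consume_end_of_partition();
        if (respect_end_of_partition) {
            stop = end;  // fixed behaviour: the end-of-partition answer decides
        }
        if (stop == stop_iteration::yes) {
            break;  // stop consuming altogether
        }
    }
    return consumer.result;
}
```

For two partitions {1, 2, 3} and {4, 5, 6} with a limit of 1, the fixed variant yields {1, 4}, while the variant that ignores consume_end_of_partition() yields only {1}.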

A unit test is also added which reproduces the problem both with select
distinct ... and select ... per partition limit.

Follow-up work:
* frozen_mutation_consumer_adaptor::on_end_of_partition() calls the
  underlying consumer's on_end_of_stream(), so when consuming multiple
  frozen mutations, the underlying's on_end_of_stream() is called for
  each partition. This is incorrect but benign.
* Improve documentation of mutation_reader::consume_pausable().

Fixes: #20084

Closes scylladb/scylladb#23657

(cherry picked from commit d67202972a)

Closes scylladb/scylladb#23693
2025-04-10 21:38:13 +03:00
Botond Dénes
a157b3e62f test/boost/row_cache_test: add memtable overlap check tests
Similar to test/cluster/test_data_resurrection_in_memtable.py but works
on a single node and uses a more low-level mechanism. These tests can also
reproduce more advanced scenarios, like concurrent reads, with some
reading from flushed memtables.

(cherry picked from commit 0d39091df2)
2025-04-10 07:33:09 -04:00
Botond Dénes
ce1d990dd6 replica/table: add error injection to memtable post-flush phase
The injection point fires after the memtable is flushed to disk, but
before it is merged to cache. It is only active for the table specified in
the "table_name" injection parameter.

(cherry picked from commit 6c1f6427b3)
2025-04-10 07:33:09 -04:00
Botond Dénes
37b51871ec utils/error_injection: add a way to set parameters from error injection points
With this, it is now possible to have two-way communication between
the error injection point and its enabler. The test can enable the error
injection point, then wait until it is hit, before proceeding.

(cherry picked from commit f7938e3f8b)
2025-04-10 07:33:09 -04:00
Botond Dénes
ac18570069 test/cluster: add test_data_resurrection_in_memtable.py
Reproducers for #23252 and #23291 -- cache garbage
collecting tombstones resurrecting data in the memtable.

(cherry picked from commit 34b18d7ef4)
2025-04-10 07:33:09 -04:00
Botond Dénes
990e92d7cf test/pylib/utils: wait_for_cql_and_get_hosts(): sort hosts
Such that a given index in the returned hosts refers to the same
underlying Scylla instance as the same index in the passed-in nodes
list. This is what users of this method intuitively expect, but
currently the returned hosts list is unordered (has random order).

(cherry picked from commit e5afd9b5fb)
2025-04-10 07:33:09 -04:00
Botond Dénes
67a56ae192 replica/mutation_dump: don't assume cells are live
Currently the dumper unconditionally extracts the value of atomic cells,
assuming they are live. This doesn't always hold of course and
attempting to get the value of a dead cell will lead to marshalling
errors. Fix by checking is_live() before attempting to get the cell
value. Fix for both regular and collection cells.

(cherry picked from commit df09b3f970)
2025-04-10 07:33:09 -04:00
Botond Dénes
85a7a9cb05 replica/database: do_apply() add error injection point
So writes (to user tables) can be failed on a replica, via error
injection. Should simplify tests which want to create differences in
what writes different replicas receive.

(cherry picked from commit cb76cafb60)
2025-04-10 07:33:09 -04:00
Botond Dénes
95205a1b29 replica: improve memtable overlap checks for the cache
The current memtable overlap check that is used by the cache
-- table::get_max_purgeable_fn_for_cache_underlying_reader() -- only
checks the active memtable, so memtables which are either being flushed
or are already flushed and also have active reads against them do not
participate in the overlap check.
This can result in temporary data resurrection, where a cache read can
garbage-collect a tombstone which still covers data in a flushing or
flushed memtable which still has active reads against it.

To prevent this, extend the overlap check to also consider all of the
memtable list. Furthermore, memtable_list::erase() now places the removed
(flushed) memtable in an intrusive list. These entries are alive only as
long as there are readers still keeping an `lw_shared_ptr<memtable>`
alive. This list is now also consulted on overlap checks.
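
A highly simplified sketch of the extended overlap check, assuming each memtable can be summarised by its minimum write timestamp and that a tombstone may be purged only if it is older than every such timestamp. The names are hypothetical, and the real check also involves keys and other conditions that are left out here.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

using timestamp = std::int64_t;

// Hypothetical per-memtable summary: the oldest write it contains.
struct memtable_summary {
    timestamp min_timestamp;
};

// Upper bound for purgeable tombstones, taken over the memtable list and
// over flushed memtables that are kept alive only by active readers.
timestamp max_purgeable(const std::vector<memtable_summary>& memtable_list,
                        const std::vector<memtable_summary>& flushed_with_readers) {
    timestamp bound = std::numeric_limits<timestamp>::max();
    for (const memtable_summary& m : memtable_list) {
        bound = std::min(bound, m.min_timestamp);
    }
    for (const memtable_summary& m : flushed_with_readers) {
        bound = std::min(bound, m.min_timestamp);
    }
    return bound;
}

// A tombstone may be collected only if it cannot cover data in any memtable.
bool can_purge(timestamp tombstone_ts, timestamp bound) {
    return tombstone_ts < bound;
}
```

With one listed memtable whose oldest write is at 100 and one flushed memtable still being read whose oldest write is at 50, a tombstone at 70 would be purgeable if only the first were consulted, but not once both participate.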

(cherry picked from commit d126ea09ba)
2025-04-10 07:33:09 -04:00
Botond Dénes
ef423eb4c7 replica/memtable: add is_merging_to_cache()
And set it when the memtable is merged to cache.

(cherry picked from commit 7e600a0747)
2025-04-10 07:33:08 -04:00
Botond Dénes
d10a2688b1 db/row_cache: add overlap-check for cache tombstone garbage collection
The cache should not garbage-collect tombstones which cover data in the
memtable. Add overlap checks (get_max_purgeable) to garbage collection
to detect tombstones which cover data in the memtable and to prevent
their garbage collection.

(cherry picked from commit 6b5b563ef7)
2025-04-10 07:33:08 -04:00
Botond Dénes
4647aa0366 mutation/mutation_compactor: copy key passed-in to consume_new_partition()
This doesn't introduce additional work for single-partition queries: the
key is copied anyway on consume_end_of_stream().
Multi-partition reads and compaction are not that sensitive to
the additional copy.

This change fixes a bug in the compacting_reader: currently the reader
passes _last_uncompacted_partition_start.key() to the compactor's
consume_new_partition(). When the compactor emits enough content for this
partition, _last_uncompacted_partition_start is moved from to emit the
partition start, which makes the key reference passed to the compactor
corrupt (it refers to a moved-from value). This in turn means that subsequent
GC checks done by the compactor will be done with a corrupt key and
therefore can result in tombstones being garbage-collected while they
still cover data elsewhere (data resurrection).
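
The idea of copying the key can be sketched as follows. This is an
illustration only: `partition_key`, `compactor` and `feed_and_move()` are
hypothetical, simplified stand-ins, not the real mutation compactor API:

```cpp
#include <utility>
#include <vector>

using partition_key = std::vector<char>;  // hypothetical key type

// Hypothetical compactor that owns a copy of the key instead of a reference.
class compactor {
    partition_key _key;  // owned copy, unaffected by what the caller does later
public:
    void consume_new_partition(const partition_key& key) { _key = key; }
    const partition_key& current_key() const { return _key; }
};

// Hypothetical caller that moves its key away after handing it to the
// compactor, mimicking a reader that emits its stored partition start.
partition_key feed_and_move(compactor& c, partition_key key) {
    c.consume_new_partition(key);
    partition_key emitted = std::move(key);  // `key` is now moved-from
    return emitted;
}
```

Because the compactor holds its own copy, later GC checks still see the
intact key regardless of what the caller does with the original.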

The compacting reader is violating the API contract and normally the bug
should be fixed there. We make an exception here because doing the fix
in the mutation compactor better aligns with our future plans:
* The fix simplifies the compactor (gets rid of _last_dk).
* Prepares the way to get rid of the consume API used by the compactor.

(cherry picked from commit c2518cdf1a)
2025-04-09 14:02:30 +00:00
Michał Chojnowski
664d36c737 table: fix a race in table::take_storage_snapshot()
`safe_foreach_sstable` doesn't do its job correctly.

It iterates over an sstable set under the sstable deletion
lock in an attempt to ensure that SSTables aren't deleted during the iteration.

The thing is, it takes the deletion lock after the SSTable set is
already obtained, so SSTables might get unlinked *before* we take the lock.

Remove this function and fix its usages to obtain the set and iterate
over it under the lock.
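
A simplified sketch of the corrected ordering: take the lock first, then
obtain the set and iterate over it while the lock is still held. It
assumes a plain std::mutex and a vector of names; the real code uses
Seastar primitives and the `table` class below is hypothetical:

```cpp
#include <algorithm>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical table; names are illustrative, not the real replica::table API.
class table {
    std::mutex _deletion_lock;           // stand-in for the sstable deletion lock
    std::vector<std::string> _sstables;  // stand-in for the sstable set
public:
    void add(std::string name) {
        std::lock_guard<std::mutex> guard(_deletion_lock);
        _sstables.push_back(std::move(name));
    }
    void unlink(const std::string& name) {
        std::lock_guard<std::mutex> guard(_deletion_lock);
        _sstables.erase(std::remove(_sstables.begin(), _sstables.end(), name),
                        _sstables.end());
    }
    // Lock first, then read the set and iterate it under the lock, so nothing
    // can be unlinked between obtaining the set and walking it.
    std::vector<std::string> take_snapshot() {
        std::lock_guard<std::mutex> guard(_deletion_lock);
        std::vector<std::string> snapshot;
        for (const auto& sst : _sstables) {
            snapshot.push_back(sst);
        }
        return snapshot;
    }
};
```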

Closes scylladb/scylladb#23397

(cherry picked from commit e23fdc0799)

Closes scylladb/scylladb#23627
2025-04-08 19:06:42 +03:00
Lakshmi Narayanan Sreethar
90add328ad replica/table::do_apply : do not check for async gate's closure
The `table::do_apply()` method verifies if the compaction group's async
gate is open to determine if the compaction group is active. Closing
this async gate prevents any new operations but waits for existing
holders to exit, allowing their operations to complete. When holding a
gate, holders will observe the gate as closed when it is being closed,
but this is irrelevant as they are already inside the gate and are
allowed to complete. All the callers of `table::do_apply()` already
enter the gate before calling the method. So, the async gate check
inside `table::do_apply()` will erroneously throw an exception when the
compaction group is closing despite holding the gate. This commit
removes the check to prevent this from happening.

Fixes #23348

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#23579

(cherry picked from commit 750f4baf44)

Closes scylladb/scylladb#23644
2025-04-08 18:58:35 +03:00
Yaron Kaikov
f4909aafc7 .github: Make "make-pr-ready-for-review" workflow run in base repo
In 57683c1a50 we fixed the `token` error,
but removed the checkout part, which now causes the following error:
```
failed to run git: fatal: not a git repository (or any of the parent directories): .git
```
Add the repo checkout stage to avoid this error.

Fixes: https://github.com/scylladb/scylladb/issues/22765

Closes scylladb/scylladb#23641

(cherry picked from commit 2dc7ea366b)

Closes scylladb/scylladb#23653
2025-04-08 13:47:50 +03:00
Kefu Chai
9e3eb4329c .github: Make "make-pr-ready-for-review" workflow run in base repo
The "make-pr-ready-for-review" workflow was failing with an "Input
required and not supplied: token" error.  This was due to GitHub Actions
security restrictions preventing access to the token when the workflow
is triggered in a fork:
```
    Error: Input required and not supplied: token
```

This commit addresses the issue by:

- Running the workflow in the base repository instead of the fork. This
  grants the workflow access to the required token with write permissions.
- Simplifying the workflow by using a job-level `if` condition to
  control execution, as recommended in the GitHub Actions documentation
  (https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/using-conditions-to-control-job-execution).
  This is cleaner than conditional steps.
- Removing the repository checkout step, as the source code is not required for this workflow.

This change resolves the token error and ensures the
"make-pr-ready-for-review" workflow functions correctly.

Fixes scylladb/scylladb#22765

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22766

(cherry picked from commit ca832dc4fb)

Closes scylladb/scylladb#23617
2025-04-07 08:11:09 +03:00
Kefu Chai
4ac3f82df9 dist: systemd: use default KillMode
before this change, we specify the KillMode of the scylla-service
service unit explicitly to "process". according to
https://www.freedesktop.org/software/systemd/man/latest/systemd.kill.html,

> If set to process, only the main process itself is killed (not recommended!).

and the document suggests using "control-group" over "process".
but scylla server is not a multi-process server, it is a multi-threaded
server. so it should not make any difference even if we switch to
the recommended "control-group".

in light of the "defunct" scylla processes we've been seeing after
stopping the scylla service using systemd, we are wondering if we should
try to change the `KillMode` to "control-group", which is the default
value of this setting.

in this change, we just drop the setting so that systemd stops the
service by stopping all processes in the control group of this unit.

Fixes scylladb/scylladb#21507

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

(cherry picked from commit 961a53f716)

Closes scylladb/scylladb#23176
2025-04-04 17:55:04 +03:00
Vlad Zolotarov
e3d063a070 CQL Tracing: set common query parameters in a single function
Each query-type (QUERY, EXECUTE, BATCH) CQL opcode has a number of parameters
in their payload which we always want to record in the Tracing object.
Today it's a Consistency Level, Serial Consistency Level and a Default Timestamp.

Setting each of them individually can lead to a human error when one (or more) of
them would not be set. Let's eliminate such a possibility by defining
a single function that sets them all.

This also allows an easy addition of such parameters to this function in
the future.
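
A rough sketch of the "single function" idea. All type and function names
here (`trace_info`, `query_options`, `set_common_query_parameters`) are
hypothetical and only mirror the three parameters listed above:

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical tracing record; field names are illustrative.
struct trace_info {
    std::optional<std::string> consistency_level;
    std::optional<std::string> serial_consistency_level;
    std::optional<int64_t> default_timestamp;
};

// Hypothetical per-request options parsed from the CQL frame.
struct query_options {
    std::string consistency_level;
    std::optional<std::string> serial_consistency_level;
    std::optional<int64_t> default_timestamp;
};

// One place that records every common parameter, meant to be shared by the
// QUERY, EXECUTE and BATCH handlers so none of them can forget a field.
void set_common_query_parameters(trace_info& trace, const query_options& opts) {
    trace.consistency_level = opts.consistency_level;
    trace.serial_consistency_level = opts.serial_consistency_level;
    trace.default_timestamp = opts.default_timestamp;
}
```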
2025-04-02 14:10:03 -04:00
Vlad Zolotarov
1b2ab34647 transport/server.cc: set default timestamp info in EXECUTE and BATCH tracing
A default timestamp (not to be confused with the timestamp passed via 'USING TIMESTAMP' query clause)
can be set using 0x20 flag and the <timestamp> field in the binary CQL frame payload of
QUERY, EXECUTE and BATCH ops. It also happens to be a default of a Java CQL Driver.

However, we were only setting the corresponding info in the CQL Tracing context of a QUERY operation.
For an unknown reason we were not setting this for an EXECUTE and for a BATCH traces (I guess I simply forgot to
set it back then).

This patch fixes this.

Fixes #23173

(cherry picked from commit ca6bddef35)
2025-04-01 11:45:33 +00:00
Yaron Kaikov
b67329b34e .github: add action to make PR ready for review when conflicts label was removed
Moving a PR out of draft is only allowed for users with write access.
Add a GitHub action to switch the PR to `ready for review` once the
`conflicts` label is removed.

Closes scylladb/scylladb#22446

(cherry picked from commit ed4bfad5c3)

Closes scylladb/scylladb#23006
2025-03-30 11:59:40 +03:00
Kefu Chai
48ff7cf61c gms: Fix fmt formatter for gossip_digest_sync
In commit 4812a57f, the fmt-based formatter for gossip_digest_syn had
formatting code for cluster_id, partitioner, and group0_id
accidentally commented out, preventing these fields from being included
in the output. This commit restores the formatting by uncommenting the
code, ensuring full visibility of all fields in the gossip_digest_syn
message when logging permits.

This fixes a regression introduced in 4812a57f, which obscured these
fields and reduced debugging insight. Backporting is recommended for
improved observability.

Fixes #23142
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23155

(cherry picked from commit 2a9966a20e)

Closes scylladb/scylladb#23198
2025-03-30 11:57:15 +03:00
Kefu Chai
18d5af1cd3 storage_proxy: Prevent integer overflow in abstract_read_executor::execute
Fix UBSan abort caused by integer overflow when calculating time difference
between read and write operations. The issue occurs when:
1. The queried partition on replicas is not purgeable (has no recorded
   modified time)
2. Digests don't match across replicas
3. The system attempts to calculate timespan using missing/negative
   last_modified timestamps

This change skips cross-DC repair optimization when write timestamp is
negative or missing, as this optimization is only relevant for reads
occurring within write_timeout of a write.
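
A minimal sketch of the guard, assuming that a negative value stands for
a missing timestamp. `time_since_write()` is a hypothetical helper, not
the actual code in abstract_read_executor::execute:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: time elapsed between the last write and the read.
// A negative timestamp is treated as "missing" (an assumption of this sketch).
std::optional<int64_t> time_since_write(int64_t read_timestamp, int64_t last_modified) {
    if (read_timestamp < 0 || last_modified < 0) {
        return std::nullopt;  // skip the optimization instead of subtracting
    }
    // Both operands are non-negative here, so the subtraction cannot overflow.
    return read_timestamp - last_modified;
}
```

A caller would apply the cross-DC repair optimization only when a value
is returned.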

Error details:
```
service/storage_proxy.cc:5532:80: runtime error: signed integer overflow: -9223372036854775808 - 1741940132787203 cannot be represented in type 'int64_t' (aka 'long')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior service/storage_proxy.cc:5532:80
Aborting on shard 1, in scheduling group sl:default
```

Related to previous fix 39325cf which handled negative read_timestamp cases.

Fixes #23314
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23359

(cherry picked from commit ebf9125728)

Closes scylladb/scylladb#23386
2025-03-30 11:54:59 +03:00
Tomasz Grabiec
6cdd1cccdc test: tablets: Fix flakiness due to ungraceful shutdown
The test fails sporadically with:

cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test3.test2 - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1}

That's because a server is stopped in the middle of the workload.

The server is stopped ungracefully which will cause some requests to
time out. We should stop it gracefully to allow in-flight requests to
finish.

Fixes #20492

Closes scylladb/scylladb#23451

(cherry picked from commit 8e506c5a8f)

Closes scylladb/scylladb#23468
2025-03-28 14:56:39 +01:00
Anna Stuchlik
3f0e52a5ee doc: zero-token nodes and Arbiter DC
This commit adds documentation for zero-token nodes and an explanation
of how to use them to set up an arbiter DC to prevent a quorum loss
in multi-DC deployments.

The commit adds two documents:
- The one in Architecture describes zero-token nodes.
- The other in Cluster Management explains how to use them.

We need separate documents because zero-token nodes may be used
for other purposes in the future.

In addition, the documents are cross-linked, and the link is added
to the Create a ScyllaDB Cluster - Multi Data Centers (DC) document.

Refs https://github.com/scylladb/scylladb/pull/19684

Fixes https://github.com/scylladb/scylladb/issues/20294

Closes scylladb/scylladb#21348

(cherry picked from commit 9ac0aa7bba)

Closes scylladb/scylladb#23200
2025-03-10 10:52:13 +01:00
Piotr Dulikowski
9fc27b734f test: test_mv_topology_change: increase timeout for removenode
The test `test_mv_topology_change` is a regression test for
scylladb/scylladb#19529. The problem was that CL=ANY writes issued when
all replicas were down would be kept in memory until the timeout. In
particular, MV updates are CL=ANY writes and have a 5 minute timeout.
When doing topology operations for vnodes or when migrating tablet
replicas, the cluster goes through stages where the replica sets for
writes undergo changes, and the writes started with the old replica set
need to be drained first.

Because of the aforementioned MV updates, the removenode operation could
be delayed by 5 minutes or more. Therefore, the
`test_mv_topology_change` test uses a short timeout for the removenode
operation, i.e. 30s. Apparently, this is too low for the debug mode and
the test has been observed to time out even though the removenode
operation is progressing fine.

Increase the timeout to 60s. This is the lowest timeout for the
removenode operation that we currently use among the in-repo tests, and
is lower than 5 minutes so the test will still serve its purpose.

Fixes: scylladb/scylladb#22953

Closes scylladb/scylladb#22958

(cherry picked from commit 43ae3ab703)

Closes scylladb/scylladb#23052
2025-03-04 16:04:00 +01:00
Wojciech Mitros
8d392229cc test: add test for schema registry maintaining base info for views
In this patch we test the behavior of schema registry in a few
scenarios where it was identified it could misbehave.

The first one is reverse schemas for views. Previously, SELECT
queries with reverse order on views could fail because we didn't
have base info in the registry for such schemas.

The second one is schemas that temporarily died in the registry.
This can happen when, while processing a query for a given schema
version, all related schema_ptrs were destroyed, but this schema
was requested before schema_registry::grace_period() has passed.
In this scenario, the base info would not be recovered, causing
errors.

(cherry picked from commit 74cbc77f50)
2025-03-03 13:57:45 +01:00
Wojciech Mitros
3a1d2cbeb6 schema_registry: avoid setting base info when getting the schema from registry
After the previous patches, the view schemas returned by schema registry
always have their base info set. As such, we no longer need to set it after
getting the view schema from the registry. This patch removes these
unnecessary updates.

(cherry picked from commit 3094ff7cbe)
2025-03-03 13:49:11 +01:00
Calle Wilund
ecbb765b3f tools: Add standard extensions and propagate to schema load
Fixes #22314

Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them.

(cherry picked from commit 48fda00f12)
2025-02-27 17:46:28 +00:00
Calle Wilund
3bf553efa1 cql_test_env: Use add all extensions instead of inidividually
(cherry picked from commit 00b40eada3)
2025-02-27 17:46:28 +00:00
Calle Wilund
9500c8c9cb main: Move extensions adding to function
Easily called from elsewhere. The extensions we should always include (oxymoron?)

(cherry picked from commit 4aaf3df45e)
2025-02-27 17:46:28 +00:00
Calle Wilund
4c735abb62 tomstone_gc: Make validate work for tools
Don't crash if validation is done as part of loading a schema from file (schema.cql)

(cherry picked from commit e6aa09e319)
2025-02-27 17:46:28 +00:00
Wojciech Mitros
b0ab86a8ad schema_registry: update cached base schemas when updating a view
The schema registry now holds base schemas for view schemas.
The base schema may change without changing the view schema, so to
preserve the change in the schema registry, we also update the
base schema in the registry when updating the base info in the
view schema.

(cherry picked from commit 82f2e1b44c)
2025-02-25 16:55:03 +00:00
Wojciech Mitros
1cf288dd97 schema_registry: cache base schemas for views
Currently, when we load a frozen schema into the registry, we lose
the base info if the schema was of a view. Because of that, in various
places we need to set the base info again, and in some codepaths we
may miss it completely, which may make us unable to process some
requests (for example, when executing reverse queries on views).
Even after setting the base info, we may still lose it if the schema
entry gets deactivated.

To fix this, this patch adds the base schema to the registry, alongside
the view schema. With the base schema, we can now set the base
info when returning the schema from the registry. As a result, we can now
assume that all view schemas returned by the registry have base_info set.

To store the base schema, the loader methods now have to return the base
schema alongside the view schema. At the same time, when loading into
the registry, we need to check whether we're loading a view schema, and if
so, we need to also provide the base schema. When inserting a regular table
schema, the base schema should be a disengaged optional.

(cherry picked from commit dfe3810f64)
2025-02-25 16:55:03 +00:00
Wojciech Mitros
40312f482e db: set base info before adding schema to registry
In the following patches, we'll assure that view schemas returned by the
schema registry always have base info set. To prepare for that, make sure
that the base info is always set before inserting it into schema registry.

(cherry picked from commit 6f11edbf3f)
2025-02-25 16:55:03 +00:00
Benny Halevy
9fd5909a5e token_group_based_splitting_mutation_writer: maybe_switch_to_new_writer: prevent double close
Currently, maybe_switch_to_new_writer resets _current_writer
only in a continuation after closing the current writer.
This leaves a window of vulnerability if close() yields,
and token_group_based_splitting_mutation_writer::close()
is called. Seeing the engaged _current_writer, close()
will call _current_writer->close() - which must be called
exactly once.

Solve this when switching to a new writer by resetting
_current_writer before closing it and potentially yielding.
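
The ordering can be sketched as follows. This is a synchronous
illustration with hypothetical `writer` and `splitting_writer` types; it
does not model futures or yielding, only the "disengage first, close
afterwards" rule:

```cpp
#include <optional>
#include <utility>

// Hypothetical writer that counts how many times close() is called.
struct writer {
    int* close_count;
    void close() { ++*close_count; }
};

// Hypothetical splitter: the member is reset before the old writer is closed.
class splitting_writer {
    std::optional<writer> _current_writer;
public:
    void switch_to(writer w) {
        // Take the old writer out and disengage the member before closing,
        // so a close() running in between cannot see it as still engaged.
        std::optional<writer> old = std::exchange(_current_writer, std::nullopt);
        if (old) {
            old->close();
        }
        _current_writer = w;
    }
    void close() {
        std::optional<writer> old = std::exchange(_current_writer, std::nullopt);
        if (old) {
            old->close();
        }
    }
};
```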

Fixes #22715

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#22922

(cherry picked from commit 29b795709b)

Closes scylladb/scylladb#22964
2025-02-23 14:27:11 +02:00
Botond Dénes
103a986eca Merge '[Backport 6.2] reader_concurrency_semaphore: set_notify_handler(): disable timeout ' from Scylladb[bot]
`set_notify_handler()` is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. The latter was not disabling the pre-existing timeout of the permit (if any) and this would lead to premature eviction of the cache entry if the timeout was shorter than TTL (which is typical).
Disable the timeout before setting the TTL to prevent premature eviction.

Fixes: https://github.com/scylladb/scylladb/issues/22629

Backport required to all active releases, they are all affected.

- (cherry picked from commit a3ae0c7cee)

- (cherry picked from commit 9174f27cc8)

Parent PR: #22701

Closes scylladb/scylladb#22751

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: set_notify_handler(): disable timeout
  reader_permit: mark check_abort() as const
2025-02-19 09:59:47 +02:00
Botond Dénes
1d4ea169e3 reader_concurrency_semaphore: set_notify_handler(): disable timeout
set_notify_handler() is called after a querier was inserted into the
querier cache. It has two purposes: set a callback for eviction and set
a TTL for the cache entry. The latter was not disabling the
pre-existing timeout of the permit (if any) and this would lead to
premature eviction of the cache entry if the timeout was shorter than
TTL (which is typical).
Disable the timeout before setting the TTL to prevent premature
eviction.

Fixes: scylladb/scylladb#22629
(cherry picked from commit 9174f27cc8)
2025-02-18 04:48:22 -05:00
Botond Dénes
86c9bc778a tools/scylla-nodetool: netstats: don't assume both senders and receivers
The code currently assumes that a session has both sender and receiver
streams, but it is possible to have just one or the other.
Change the test to include this scenario and remove this assumption from
the code.

Fixes: #22770

Closes scylladb/scylladb#22771

(cherry picked from commit 87e8e00de6)

Closes scylladb/scylladb#22873
2025-02-18 10:35:10 +02:00
Botond Dénes
4acb366a28 service/storage_proxy: schedule_repair(): materialize the range into a vector
Said method passes down its `diff` input to `mutate_internal()`, after
some std::ranges massaging. Said massaging is destructive -- it moves
items from the diff. If the output range is iterated-over multiple
times, only the first time will see the actual output, further
iterations will get an empty range.
When trace-level logging is enabled, this is exactly what happens:
`mutate_internal()` iterates over the range multiple times, first to log
its content, then to pass it down the stack. This ends up resulting in
a range with moved-from elements being passed down and consequently write
handlers being created with nullopt mutations.

Make the range re-entrant by materializing it into a vector before
passing it to `mutate_internal()`.
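
A simplified illustration of why materializing helps. The
`destructive_range` type below is a hypothetical stand-in for the
moving std::ranges pipeline, and `mutation_ptr` stands in for a mutation:

```cpp
#include <memory>
#include <utility>
#include <vector>

using mutation_ptr = std::unique_ptr<int>;  // hypothetical stand-in for a mutation

// Hypothetical single-pass "range": every visit moves the elements out of
// the source, so a second visit would only see moved-from (null) items.
struct destructive_range {
    std::vector<mutation_ptr>& source;

    template <typename Func>
    void for_each(Func f) {
        for (auto& m : source) {
            f(std::move(m));
        }
    }
};

// Consume the range exactly once into a vector, which can then be iterated
// as many times as needed (for logging and for passing down the stack).
std::vector<mutation_ptr> materialize(destructive_range r) {
    std::vector<mutation_ptr> out;
    r.for_each([&out](mutation_ptr m) { out.push_back(std::move(m)); });
    return out;
}
```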

Fixes: scylladb/scylladb#21907
Fixes: scylladb/scylladb#21714

Closes scylladb/scylladb#21910

(cherry picked from commit 7150442f6a)

Closes scylladb/scylladb#22853
2025-02-18 10:34:42 +02:00
Botond Dénes
24eb8f49ba service: query_pager: fix last-position for filtering queries
On short-pages, cut short because of a tombstone prefix.
When page-results are filtered and the filter drops some rows, the
last-position is taken from the page visitor, which does the filtering.
This means that last partition and row position will be that of the last
row the filter saw. This will not match the last position of the
replica, when the replica cut the page due to tombstones.
When fetching the next page, this means that all the tombstone suffix of
the last page, will be re-fetched. Worse still: the last position of the
next page will not match that of the saved reader left on the replica, so
the saved reader will be dropped and a new one created from scratch.
This wasted work will show up as elevated tail latencies.
Fix by always taking the last position from raw query results.

Fixes: #22620

Closes scylladb/scylladb#22622

(cherry picked from commit 7ce932ce01)

Closes scylladb/scylladb#22718
2025-02-13 15:15:24 +02:00
Botond Dénes
8e6648870b reader_concurrency_semaphore: with_permit(): proper clean-up after queue overload
with_permit() creates a permit, with a self-reference, to avoid
attaching a continuation to the permit's run function. This
self-reference is used to keep the permit alive, until the execution
loop processes it. This self reference has to be carefully cleared on
error-paths, otherwise the permit will become a zombie, effectively
leaking memory.
Instead of trying to handle all loose ends, get rid of this
self-reference altogether: ask caller to provide a place to save the
permit, where it will survive until the end of the call. This makes the
call-site a little bit less nice, but it gets rid of a whole class of
possible bugs.

Fixes: #22588

Closes scylladb/scylladb#22624

(cherry picked from commit f2d5819645)

Closes scylladb/scylladb#22703
2025-02-13 15:03:57 +02:00
Botond Dénes
77696b1e43 reader_concurrency_semaphore: foreach_permit(): include _inactive_reads
So inactive reads show up in semaphore diagnostics dumps (currently the
only non-test user of this method).

Fixes: #22574

Closes scylladb/scylladb#22575

(cherry picked from commit e1b1a2068a)

Closes scylladb/scylladb#22610
2025-02-13 15:03:23 +02:00
Aleksandra Martyniuk
3497ba7f60 replica: mark registry entry as synch after the table is added
When a replica gets a write request it performs get_schema_for_write,
which waits until the schema is synced. However, database::add_column_family
marks a schema as synced before the table is added. Hence, the write may
see the schema as synced, but hit no_such_column_family as the table
hasn't been added yet.

Mark schema as synced after the table is added to database::_tables_metadata.

Fixes: #22347.

Closes scylladb/scylladb#22348

(cherry picked from commit 328818a50f)

Closes scylladb/scylladb#22603
2025-02-13 15:02:59 +02:00
Aleksandra Martyniuk
ade0fe2d7a nodetool: tasks: print empty string for start_time/end_time if unspecified
If start_time/end_time is unspecified for a task, task_manager API
returns epoch. Nodetool prints the value in task status.

Fix nodetool tasks commands to print empty string for start_time/end_time
if it isn't specified.
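
A small sketch of the formatting rule, assuming the epoch value (0)
means "unspecified". `format_task_time()` and the timestamp format are
hypothetical and not taken from the nodetool source:

```cpp
#include <ctime>
#include <string>

// Hypothetical formatter: epoch (0) is treated as "unspecified" and printed
// as an empty string; anything else is printed as a UTC timestamp.
std::string format_task_time(std::time_t t) {
    if (t == 0) {
        return "";
    }
    const std::tm* tm_utc = std::gmtime(&t);
    if (tm_utc == nullptr) {
        return "";
    }
    char buf[32];
    std::strftime(buf, sizeof(buf), "%Y-%m-%dT%H:%M:%SZ", tm_utc);
    return buf;
}
```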

Modify nodetool tasks status docs to show empty end_time.

Fixes: #22373.

Closes scylladb/scylladb#22370

(cherry picked from commit 477ad98b72)

Closes scylladb/scylladb#22600
2025-02-13 13:26:54 +02:00
Jenkins Promoter
72cf5ef576 Update ScyllaDB version to: 6.2.4 2025-02-09 16:52:35 +02:00
Botond Dénes
2978ed58a2 reader_permit: mark check_abort() as const
All it does is read one field, making it const makes using it easier.

(cherry picked from commit a3ae0c7cee)
2025-02-09 00:32:13 +00:00
Tomasz Grabiec
6922acb69f Merge '[Backport 6.2] split: run set_split_mode() on all storage groups during all_storage_groups_split()' from Scylladb[bot]
`tablet_storage_group_manager::all_storage_groups_split()` calls `set_split_mode()` for each of its storage groups to create split ready compaction groups. It does this by iterating through storage groups using `std::ranges::all_of()` which is not guaranteed to iterate through the entire range, and will stop iterating on the first occurrence of the predicate (`set_split_mode()`) returning false. `set_split_mode()` creates the split compaction groups and returns false if the storage group's main compaction group or merging groups are not empty. This means that in cases where the tablet storage group manager has non-empty storage groups, we could have a situation where split compaction groups are not created for all storage groups.

The missing split compaction groups are later created in `tablet_storage_group_manager::split_all_storage_groups()` which also calls `set_split_mode()`, and that is the reason why split completes successfully. The problem is that
`tablet_storage_group_manager::all_storage_groups_split()` runs under a group0 guard, but
`tablet_storage_group_manager::split_all_storage_groups()` does not. This can cause problems with operations which should be mutually exclusive with compaction group creation, i.e. DROP TABLE/DROP KEYSPACE.
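
The difference from a short-circuiting `all_of` can be sketched with a
plain loop that visits every group before reporting the combined result.
The `storage_group` type and `split_all()` below are hypothetical
simplifications, not the real tablet_storage_group_manager code:

```cpp
#include <vector>

// Hypothetical storage group: set_split_mode() has a side effect and
// returns false when the group still holds data.
struct storage_group {
    bool empty = true;
    bool split_mode = false;

    bool set_split_mode() {
        split_mode = true;
        return empty;
    }
};

// Calls set_split_mode() on every group and only then reports whether all
// of them were ready, instead of stopping at the first false.
bool split_all(std::vector<storage_group>& groups) {
    bool all_ready = true;
    for (auto& g : groups) {
        // Call first, combine afterwards, so && cannot skip the call.
        all_ready = g.set_split_mode() && all_ready;
    }
    return all_ready;
}
```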

Fixes #22431

This is a bugfix and should be back ported to versions with tablets: 6.1 6.2 and 2025.1

- (cherry picked from commit 24e8d2a55c)

- (cherry picked from commit 8bff7786a8)

Parent PR: #22330

Closes scylladb/scylladb#22559

* github.com:scylladb/scylladb:
  test: add reproducer and test for fix to split ready CG creation
  table: run set_split_mode() on all storage groups during all_storage_groups_split()
2025-02-07 14:22:57 +01:00
Tomasz Grabiec
61e303a3e3 locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes
In that case, new_racks will be used, but when we discover no
candidates, we try to pop from existing_racks.

Fixes #22625

Closes scylladb/scylladb#22652

(cherry picked from commit e22e3b21b1)

Closes scylladb/scylladb#22721
2025-02-06 16:47:14 +01:00
Avi Kivity
8ede62d288 Update seastar submodule (hwloc failures on some AWS instances)
* seastar ec5da7a606...e40388c4c7 (1):
  > resource: fallback to sysconf when failed to detect memory size from hwloc

Fixes #22382
2025-02-04 16:29:45 +02:00
Avi Kivity
4ac9c710fc Merge '[Backport 6.2] api: task_manager: do not unregister finish task when its status is queried' from Scylladb[bot]
Currently, when the status of a task is queried and the task is already finished,
it gets unregistered. Getting the status shouldn't be a one-time operation.

Stop removing the task after its status is queried. Adjust tests not to rely
on this behavior. Add task_manager/drain API and nodetool tasks drain
command to remove finished tasks in the module.

Fixes: https://github.com/scylladb/scylladb/issues/21388.

It's a fix to task_manager API, should be backported to all branches

- (cherry picked from commit e37d1bcb98)

- (cherry picked from commit 18cc79176a)

Parent PR: #22310

Closes scylladb/scylladb#22597

* github.com:scylladb/scylladb:
  api: task_manager: do not unregister tasks on get_status
  api: task_manager: add /task_manager/drain
2025-02-03 23:04:31 +02:00
Avi Kivity
34fa9bd586 Merge '[Backport 6.2] Simplify loading_cache_test and use manual_clock' from Scylladb[bot]
This series exposes a Clock template parameter for loading_cache so that the test could use
the manual_clock rather than the lowres_clock, since relying on the latter is flaky.

In addition, the test load function is simplified to sleep some small random time and co_return the expected string,
rather than reading it from a real file, since the latter's timing might also be flaky, and it is out of scope for this test.

Fixes #20322

* The test was flaky forever, so backport is required for all live versions.

- (cherry picked from commit b509644972)

- (cherry picked from commit 934a9d3fd6)

- (cherry picked from commit d68829243f)

- (cherry picked from commit b258f8cc69)

- (cherry picked from commit 0841483d68)

- (cherry picked from commit 32b7cab917)

Parent PR: #22064

Closes scylladb/scylladb#22640

* github.com:scylladb/scylladb:
  tests: loading_cache_test: use manual_clock
  utils: loading_cache: make clock_type a template parameter
  test: loading_cache_test: use function-scope loader
  test: loading_cache_test: simlute loader using sleep
  test: lib: eventually: add sleep function param
  test: lib: eventually: make *EVENTUALLY_EQUAL inline functions
2025-02-03 22:56:31 +02:00
Benny Halevy
79bff0885c tests: loading_cache_test: use manual_clock
Relying on a real-time clock like lowres_clock
can be flaky (in particular in debug mode).
Use manual_clock instead to harden the test against
timing issues.

Fixes #20322

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 32b7cab917)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-03 16:15:46 +02:00
Benny Halevy
abf8f44e03 utils: loading_cache: make clock_type a template parameter
So the unit test can use manual_clock rather than lowres_clock
which can be flaky (in particular in debug mode).
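
A minimal sketch of a clock template parameter. `manual_test_clock` and
`expiring_entry` are hypothetical and much simpler than Seastar's
manual_clock and the real loading_cache:

```cpp
#include <chrono>

// Hypothetical manually advanced clock: time only moves when the test says so.
struct manual_test_clock {
    using duration = std::chrono::milliseconds;
    using time_point = std::chrono::time_point<manual_test_clock, duration>;

    static time_point& current() {
        static time_point now_value{};
        return now_value;
    }
    static time_point now() { return current(); }
    static void advance(duration d) { current() += d; }
};

// Hypothetical cache entry whose expiry is measured on whichever clock
// type it is instantiated with.
template <typename Clock>
class expiring_entry {
    typename Clock::time_point _loaded_at;
    typename Clock::duration _ttl;
public:
    explicit expiring_entry(typename Clock::duration ttl)
        : _loaded_at(Clock::now()), _ttl(ttl) {}

    bool expired() const { return Clock::now() - _loaded_at >= _ttl; }
};
```

With the manual clock, a test decides exactly when an entry expires, so
it no longer depends on how long the machine takes to run it.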

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0841483d68)
2025-02-03 16:02:37 +02:00
Benny Halevy
00f1dcfd09 test: loading_cache_test: use function-scope loader
Rather than a global function, accessing a thread-local `load_count`.
The thread-local load_count cannot be used when multiple test
cases run in parallel.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit b258f8cc69)
2025-02-03 16:01:53 +02:00
Benny Halevy
b0166a3a9c test: loading_cache_test: simlute loader using sleep
This test isn't about reading values from file,
but rather it's about the loading_cache.
Reading from the file can sometimes take longer than
the expected refresh times, causing flakiness (see #20322).

Rather than reading a string from a real file, just
sleep a random, short time, and co_return the string.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit d68829243f)
2025-02-03 16:00:51 +02:00
Benny Halevy
7addc3454d test: lib: eventually: add sleep function param
To allow support for manual_clock instead of seastar::sleep.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 934a9d3fd6)
2025-02-03 16:00:47 +02:00
Benny Halevy
9d5e3f050e test: lib: eventually: make *EVENTUALLY_EQUAL inline functions
rather than macros.

This is a first cleanup step before adding a sleep function
parameter to support also manual_clock.

Also, add a call to BOOST_REQUIRE_EQUAL/BOOST_CHECK_EQUAL,
respectively, to make an error more visible in the test log
since those entry points print the offending values
when not equal.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit b509644972)
2025-02-03 15:50:22 +02:00
Michael Litvak
c7421a8804 view_builder: fix loop in view builder when tokens are moved
The view builder builds a view by going over the entire token ring,
consuming the base table partitions, and generating view updates for
each partition.

A view is considered as built when we complete a full cycle of the
token ring. Suppose we start to build a view at a token F. We will
consume all partitions with tokens starting at F until the maximum
token, then go back to the minimum token and consume all partitions
until F, and then we detect that we pass F and complete building the
view. This happens in the view builder consumer in
`check_for_built_views`.

The problem is that we check if we pass the first token F with the
condition `_step.current_token() >= it->first_token` whenever we consume
a new partition or the current_token goes back to the minimum token.
But suppose that we don't have any partitions with a token greater than
or equal to the first token (this could happen if the partition with
token F was moved to another node for example), then this condition will never be
satisfied, and we don't detect correctly when we pass F. Instead, we
go back to the minimum token, building the same token ranges again,
in a possibly infinite loop.

To fix this we add another step when reaching the end of the reader's
stream. When this happens it means we don't have any more fragments to
consume until the end of the range, so we advance the current_token to
the end of the range, simulating a partition, and check for built views
in that range.
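
The loop and the fix can be sketched as a simplified model. This is an illustration under stated assumptions, not the view builder's code: `passes_until_built` is a hypothetical function, tokens are plain integers, and a "pass" is one scan of the local partitions up to the maximum token.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using token = std::int64_t;

// Returns the pass in which the view is detected as built,
// or -1 if that does not happen within max_passes.
int passes_until_built(const std::vector<token>& sorted_tokens, token first_token,
                       bool advance_at_end_of_stream, int max_passes) {
    const token min_token = std::numeric_limits<token>::min();
    const token max_token = std::numeric_limits<token>::max();
    bool wrapped = false;  // true once we went back to the minimum token
    for (int pass = 1; pass <= max_passes; ++pass) {
        const token range_start = wrapped ? min_token : first_token;
        for (token current : sorted_tokens) {
            if (current < range_start) {
                continue;  // outside the range scanned in this pass
            }
            // Original check: only evaluated when a partition is consumed.
            if (wrapped && current >= first_token) {
                return pass;
            }
        }
        // Fix: at end of stream, advance current_token to the end of the
        // range, as if a partition had been consumed there, and check again.
        if (advance_at_end_of_stream && wrapped) {
            const token current = max_token;
            if (current >= first_token) {
                return pass;
            }
        }
        wrapped = true;
    }
    return -1;
}
```

With tokens `{10, 20}` and `first_token == 25` (the partition at F is gone), the model without the extra step never reports the view as built, while the model with it finishes in the second pass.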

Fixes scylladb/scylladb#21829

Closes scylladb/scylladb#22493

(cherry picked from commit 6d34125eb7)

Closes scylladb/scylladb#22606
2025-02-03 13:27:28 +01:00
Avi Kivity
700402a7bb seastar: point submodule at scylla-seastar.git
This allows backporting commits to seastar.
2025-01-31 19:49:15 +02:00
Aleksandra Martyniuk
424fab77d2 api: task_manager: do not unregister tasks on get_status
Currently, /task_manager/task_status_recursive/{task_id} and
/task_manager/task_status/{task_id} unregister the queried task if it
has already finished.

The status should not disappear after being queried. Do not unregister
a finished task when its status or recursive status is queried.

(cherry picked from commit 18cc79176a)
2025-01-31 10:12:46 +01:00
Aleksandra Martyniuk
5d16157936 api: task_manager: add /task_manager/drain
In the following patches, get_status won't be unregistering finished
tasks. However, tests need a way to drop tasks, so that they operate
only on the tasks for operations that these tests themselves
invoked.

Add /task_manager/drain/{module} to unregister all finished tasks
from the module. Add the respective nodetool command.

(cherry picked from commit e37d1bcb98)
2025-01-31 10:11:57 +01:00
Aleksandra Martyniuk
e7d891b629 repair: add repair_service gate
In main.cc storage_service is started before and stopped after
repair_service. storage_service keeps a reference to sharded
repair_service and calls its methods, but nothing ensures that
repair_service's local instance would be alive for the whole
execution of the method.

Add a gate to repair_service and enter it in storage_service
before executing methods on local instances of repair_service.
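
The gate pattern can be sketched as follows. This is a synchronous simplification with hypothetical names (`gate_sketch`, `gate_holder_sketch`, `call_repair_method`); the real `seastar::gate::close()` returns a future that resolves once every holder has left, which is modelled here by `drained()`.

```cpp
#include <cassert>
#include <stdexcept>

// Simplified, synchronous stand-in for a gate.
class gate_sketch {
    int _entered = 0;
    bool _closed = false;
public:
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        ++_entered;
    }
    void leave() { --_entered; }
    void close() { _closed = true; }
    // The service may be destroyed only once this returns true.
    bool drained() const { return _closed && _entered == 0; }
};

// RAII holder: keeps the gate entered for the duration of a call.
class gate_holder_sketch {
    gate_sketch& _g;
public:
    explicit gate_holder_sketch(gate_sketch& g) : _g(g) { _g.enter(); }
    ~gate_holder_sketch() { _g.leave(); }
    gate_holder_sketch(const gate_holder_sketch&) = delete;
    gate_holder_sketch& operator=(const gate_holder_sketch&) = delete;
};

// Caller side: enter the gate before touching the local service instance.
bool call_repair_method(gate_sketch& g) {
    try {
        gate_holder_sketch hold(g);
        // ... call a method on the local repair_service instance here ...
        return true;
    } catch (const std::runtime_error&) {
        return false;  // the service is stopping; the call is refused
    }
}
```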

Fixes: #21964.

Closes scylladb/scylladb#22145

(cherry picked from commit 32ab58cdea)

Closes scylladb/scylladb#22318
2025-01-30 11:39:46 +02:00
Anna Stuchlik
e57e3b4039 doc: add troubleshooting removal with --autoremove-ubuntu
This commit adds a troubleshooting article on removing ScyllaDB
with the --autoremove option.

Fixes https://github.com/scylladb/scylladb/issues/21408

Closes scylladb/scylladb#21697

(cherry picked from commit 8d824a564f)

Closes scylladb/scylladb#22232
2025-01-29 20:24:24 +02:00
Kefu Chai
2514f50f7f docs: fix monospace formatting for rm command
Add missing space before `rm` to ensure proper rendering
in monospace font within documentation.

Fixes scylladb/scylladb#22255
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21576

(cherry picked from commit 6955b8238e)

Closes scylladb/scylladb#22257
2025-01-29 20:23:22 +02:00
Botond Dénes
e74c8372a9 tools/scylla-sstable: dump-statistics: fix handling of {min,max}_column_names
Said fields in statistics are of type
`disk_array<uint32_t, disk_string<uint16_t>>` and currently are handled
as an array of regular strings. However, these fields store exploded
clustering keys, so the elements store binary data and converting to
string can yield invalid UTF-8 characters that certain JSON parsers (jq,
or Python's json) can choke on. Fix this by treating them as binary and
using `to_hex()` to convert them to string. This requires some massaging
of the json_dumper: passing field offset to all visit() methods and
using a caller-provided disk-string to sstring converter to convert disk
strings to sstring, so in the case of statistics, these fields can be
intercepted and properly handled.
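
For illustration, a minimal sketch of the hex conversion. `bytes_to_hex` is a hypothetical stand-in and not the `to_hex()` used by the tool; it emits two lowercase hex digits per byte, so the output is always plain ASCII and safe for any JSON parser:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for to_hex(): two lowercase hex digits per byte.
std::string bytes_to_hex(const std::vector<std::uint8_t>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0f]);  // low nibble
    }
    return out;
}
```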

While at it, the type of these fields is also fixed in the
documentation.

Before:

    "min_column_names": [
      "��Z���\u0011�\u0012ŷ4^��<",
      "�2y\u0000�}\u007f"
    ],
    "max_column_names": [
      "��Z���\u0011�\u0012ŷ4^��<",
      "}��B\u0019l%^"
    ],

After:

    "min_column_names": [
      "9dd55a92bc8811ef12c5b7345eadf73c",
      "80327900e2827d7f"
    ],
    "max_column_names": [
      "9dd55a92bc8811ef12c5b7345eadf73c",
      "7df79242196c255e"
    ],

Fixes: #22078

Closes scylladb/scylladb#22225

(cherry picked from commit f899f0e411)

Closes scylladb/scylladb#22296
2025-01-29 20:22:52 +02:00
Botond Dénes
aa16d736dc Merge '[Backport 6.2] sstable_directory: do not load remote unshared sstables in process_descriptor()' from Lakshmi Narayanan
The sstable loader relied on the generation id to provide an efficient
hint about the shard that owns an sstable. But, this hint was rendered
ineffective with the introduction of UUID generation, as the shard id
was no longer embedded in the generation id. This also became suboptimal
with the introduction of tablets. Commit 0c77f77 addressed this issue by
reading the minimum from disk to determine sstable ownership but this
improvement was lost with commit 63f1969, which optimistically assumed
that hints would work most of the time, which isn't true.

This commit restores that change - the shard id of an sstable is deduced by
reading minimally from disk and then the sstable is fully loaded only if
it belongs to the local shard. This patch also adds a testcase to verify
that sstables are loaded only in their respective shards.

Fixes #21015

This fixes a regression and should be backported.

- (cherry picked from commit d2ba45a01f)

- (cherry picked from commit 6e3ecc70a6)

- (cherry picked from commit 63100b34da)

Parent PR: #22263

Closes scylladb/scylladb#22376

* github.com:scylladb/scylladb:
  sstable_directory: do not load remote sstables in process_descriptor
  sstable_directory: reintroduce `get_shards_for_this_sstable()`
2025-01-29 20:20:50 +02:00
Botond Dénes
fa73a8da34 replica: remove noexcept from token -> tablet resolution path
The methods to resolve a key/token/range to a table are all noexcept.
Yet the method below all of these, `storage_group_for_id()` can throw.
This means that if, due to any mistake, a lookup is attempted for a
tablet without a local replica, it will result in a crash, as the
exception bubbles up into the noexcept methods.
There is no value in pretending that looking up the tablet replica is
noexcept. Remove the noexcept specifiers so that any bad lookup only
fails the operation at hand and doesn't crash the node. This is
especially relevant to replace, which still has a window where writes
can arrive for tablets that don't (yet) have a local replica. Currently,
this results in a crash. After this patch, this will only fail the
writes and the replace can move on.
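
A minimal sketch of the behaviour after the change, using hypothetical names and a `std::map` in place of the real storage groups. Because the lookup path is not `noexcept`, the exception reaches the caller and only the single operation fails; if `storage_group_for_token_sketch` were marked `noexcept`, the same throw would call `std::terminate()`.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model: tablet id -> name of the local replica.
using replica_map_sketch = std::map<unsigned, std::string>;

// Lowest level: throws std::out_of_range if there is no local replica.
const std::string& storage_group_for_id_sketch(const replica_map_sketch& local,
                                               unsigned tablet_id) {
    return local.at(tablet_id);
}

// Deliberately NOT noexcept, so the exception can propagate.
const std::string& storage_group_for_token_sketch(const replica_map_sketch& local,
                                                  unsigned token,
                                                  unsigned tablet_count) {
    return storage_group_for_id_sketch(local, token % tablet_count);
}

// A write that fails cleanly when the tablet has no local replica.
bool try_write_sketch(const replica_map_sketch& local, unsigned token,
                      unsigned tablet_count) {
    try {
        (void)storage_group_for_token_sketch(local, token, tablet_count);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```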

Fixes: #21480

Closes scylladb/scylladb#22251

(cherry picked from commit 55963f8f79)

Closes scylladb/scylladb#22379
2025-01-29 20:19:52 +02:00
Avi Kivity
10abba4c64 Merge '[Backport 6.2] repair: handle no_such_keyspace in repair preparation phase' from null
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.

Skip the keyspace repair if no_such_keyspace is thrown during preparations.

Fixes: #22073.

Requires backport to 6.1 and 6.2 as they contain the bug

- (cherry picked from commit bfb1704afa)

- (cherry picked from commit 54e7f2819c)

Parent PR: #22473

Closes scylladb/scylladb#22541

* github.com:scylladb/scylladb:
  test: add test to check if repair handles no_such_keyspace
  repair: handle keyspace dropped
2025-01-29 19:52:38 +02:00
Michael Litvak
36b1a486de cdc: fix handling of new generation during raft upgrade
During raft upgrade, a node may gossip about a new CDC generation that
was propagated through raft. The node that receives the generation by
gossip may not have applied the raft update yet, and it will not find
the generation in the system tables. We should consider this error
non-fatal and retry the read until it succeeds or becomes obsolete.

Another issue is that when we fail with a "fatal" exception and do not
retry the read, the cdc metadata is left in an inconsistent state that
causes further attempts to insert this CDC generation to fail.

What happens is we complete preparing the new generation by calling `prepare`,
we insert an empty entry for the generation's timestamp, and then we fail. The
next time we try to insert the generation, we skip inserting it because we see
that it already has an entry in the metadata and we determine that
there's nothing to do. But this is wrong, because the entry is empty,
and we should continue to insert the generation.

To fix it, we change `prepare` to return `true` when the entry already
exists but it's empty, indicating we should continue to insert the
generation.
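
The changed contract of `prepare` can be sketched like this. `generation_metadata_sketch` is a hypothetical class with a plain integer timestamp and a string payload, not the real cdc metadata type; it only models the "entry exists but is empty" case described above.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

class generation_metadata_sketch {
    // timestamp -> generation data; nullopt means "prepared but not inserted"
    std::map<long, std::optional<std::string>> _gens;
public:
    // Returns true if the caller should go on to insert the generation:
    // either the entry is new, or it exists but is still empty.
    bool prepare(long timestamp) {
        auto [it, inserted] = _gens.try_emplace(timestamp, std::nullopt);
        return inserted || !it->second.has_value();
    }

    void insert(long timestamp, std::string data) {
        _gens[timestamp] = std::move(data);
    }
};
```

A failed attempt that stops between `prepare` and `insert` therefore no longer blocks the next attempt.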

Fixes scylladb/scylladb#21227

Closes scylladb/scylladb#22093

(cherry picked from commit 4f5550d7f2)

Closes scylladb/scylladb#22545
2025-01-29 19:52:18 +02:00
Ferenc Szili
5a74ded582 test: add reproducer and test for fix to split ready CG creation
This adds a reproducer for #22431

In cases where a tablet storage group manager had more than one storage
group, it was possible to create compaction groups outside the group0
guard, which could create problems with operations that must be
mutually exclusive with compaction group creation.

(cherry picked from commit 8bff7786a8)
2025-01-29 14:46:37 +01:00
Ferenc Szili
0ea5a1fc48 table: run set_split_mode() on all storage groups during all_storage_groups_split()
tablet_storage_group_manager::all_storage_groups_split() calls set_split_mode()
for each of its storage groups to create split ready compaction groups. It does
this by iterating through storage groups using std::ranges::all_of() which is
not guaranteed to iterate through the entire range, and will stop iterating on
the first occurrence of the predicate (set_split_mode()) returning false.
set_split_mode() creates the split compaction groups and returns false if the
storage group's main compaction group or merging groups are not empty. This
means that in cases where the tablet storage group manager has non-empty
storage groups, we could have a situation where split compaction groups are not
created for all storage groups.

The missing split compaction groups are later created in
tablet_storage_group_manager::split_all_storage_groups() which also calls
set_split_mode(), and that is the reason why split completes successfully. The
problem is that tablet_storage_group_manager::all_storage_groups_split() runs
under a group0 guard, and tablet_storage_group_manager::split_all_storage_groups()
does not. This can cause problems with operations that must be mutually
exclusive with compaction group creation, e.g. DROP TABLE/DROP KEYSPACE.
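
The short-circuit can be reproduced in a few lines. The sketch below uses `std::all_of` from `<algorithm>`, which stops at the first false result in the same way, so that it also builds without C++20 ranges. `storage_group_sketch` and the helper functions are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct storage_group_sketch {
    bool empty;                // main compaction group is empty
    bool split_ready = false;  // split compaction groups were created

    // Side effect first, then report whether the group is already split.
    bool set_split_mode() {
        split_ready = true;
        return empty;
    }
};

// Mirrors the problematic form: the predicate has a side effect, but
// all_of stops calling it after the first group that returns false.
bool all_split_short_circuit(std::vector<storage_group_sketch>& groups) {
    return std::all_of(groups.begin(), groups.end(),
                       [](storage_group_sketch& g) { return g.set_split_mode(); });
}

// Visits every group, then combines the results.
bool all_split_visit_all(std::vector<storage_group_sketch>& groups) {
    bool all = true;
    for (auto& g : groups) {
        const bool ready = g.set_split_mode();  // always runs
        all = all && ready;
    }
    return all;
}

std::size_t count_split_ready(const std::vector<storage_group_sketch>& groups) {
    return static_cast<std::size_t>(
        std::count_if(groups.begin(), groups.end(),
                      [](const storage_group_sketch& g) { return g.split_ready; }));
}
```

With one non-empty group followed by two empty ones, the first variant prepares only the first group, while the second prepares all three.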

(cherry picked from commit 24e8d2a55c)
2025-01-29 14:42:17 +01:00
Aleksandra Martyniuk
be67b4d634 test: add test to check if repair handles no_such_keyspace
(cherry picked from commit 54e7f2819c)
2025-01-28 21:50:10 +00:00
Aleksandra Martyniuk
2143f8ccb6 repair: handle keyspace dropped
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.

Skip the keyspace repair if no_such_keyspace is thrown during preparations.

(cherry picked from commit bfb1704afa)
2025-01-28 21:50:10 +00:00
Kefu Chai
7da4223411 compress: fix compressor initialization order by making namespace_prefix a function
Fixes a race condition where COMPRESSOR_NAME in zstd.cc could be
initialized before compressor::namespace_prefix due to undefined
global variable initialization order across translation units. This
was causing ZstdCompressor to be unregistered in release builds,
making it impossible to create tables with Zstd compression.

Replace the global namespace_prefix variable with a function that
returns the fully qualified compressor name. This ensures proper
initialization order and fixes the registration of the ZstdCompressor.
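
A minimal sketch of the pattern, with hypothetical names and an illustrative prefix string. A function-local static is initialized on first use, so a global in another translation unit that calls the function during its own initialization always sees a fully constructed value:

```cpp
#include <cassert>
#include <string>

// Function-local static: constructed on the first call, regardless of the
// order in which translation units initialize their globals.
// The prefix value is illustrative only.
const std::string& namespace_prefix_sketch() {
    static const std::string prefix = "org.apache.cassandra.io.compress.";
    return prefix;
}

// Returns the fully qualified name for a short compressor name.
std::string make_qualified_name(const std::string& short_name) {
    return namespace_prefix_sketch() + short_name;
}

// A global initialized through the function is safe wherever it is defined.
const std::string zstd_compressor_name_sketch = make_qualified_name("ZstdCompressor");
```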

Fixes scylladb/scylladb#22444
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22451

(cherry picked from commit 4a268362b9)

Closes scylladb/scylladb#22510
2025-01-27 19:50:23 +02:00
Calle Wilund
2b40f405f0 commitlog: Fix assertion in oversized_alloc
Fixes #20633

Cannot assert on actual request_controller when releasing permit, as the
release, if we have waiters in queue, will subtract some units to hand to them.
Instead assert on permit size + waiter status (and if zero, also controller value)

* v2 - use SCYLLA_ASSERT

(cherry picked from commit 58087ef427)

Closes scylladb/scylladb#22455
2025-01-26 16:46:35 +02:00
Kamil Braun
399325f0f0 Merge '[Backport 6.2] raft: Handle non-critical config update errors in when changing voter status.' from Sergey Z
When a node is bootstrapped, joins a cluster as a non-voter, and changes its role to a voter, errors can occur while committing a new Raft record, for instance, if the Raft leader changes during this time. These errors are not critical and should not cause a node crash, as the action can be retried.

Fixes scylladb/scylladb#20814

Backport: This issue occurs frequently and disrupts the CI workflow to some extent. Backports are needed for versions 6.1 and 6.2.

- (cherry picked from commit 775411ac56)

- (cherry picked from commit 16053a86f0)

- (cherry picked from commit 8c48f7ad62)

- (cherry picked from commit 3da4848810)

- (cherry picked from commit 228a66d030)

Parent PR: #22253

Closes scylladb/scylladb#22358

* github.com:scylladb/scylladb:
  raft: refactor `remove_from_raft_config` to use a timed `modify_config` call.
  raft: Refactor functions using `modify_config` to use a common wrapper for retrying.
  raft: Handle non-critical config update errors in when changing status to voter.
  test: Add test to check that a node does not fail on unknown commit status error when starting up.
  raft: Add run_op_with_retry in raft_group0.
2025-01-24 17:05:50 +01:00
Sergey Zolotukhin
9730f98d34 raft: refactor remove_from_raft_config to use a timed modify_config call.
To avoid potential hangs during the `remove_from_raft_config` operation, use a timed `modify_config` call.
This ensures the operation doesn't get stuck indefinitely.

(cherry picked from commit 228a66d030)
2025-01-22 09:54:28 +01:00
Sergey Zolotukhin
d419fb4a0c raft: Refactor functions using modify_config to use a common wrapper
for retrying.

There are several places in `raft_group0` where almost identical code is
used for retrying `modify_config` in case of `commit_status_unknown`
error. To avoid code duplication all these places were changed to
use a new wrapper `run_op_with_retry`.

(cherry picked from commit 3da4848810)
2025-01-22 09:54:26 +01:00
Lakshmi Narayanan Sreethar
e0189ccac5 sstable_directory: do not load remote sstables in process_descriptor
The sstable loader relied on the generation id to provide an efficient
hint about the shard that owns an sstable. But, this hint was rendered
ineffective with the introduction of UUID generation, as the shard id
was no longer embedded in the generation id. This also became suboptimal
with the introduction of tablets. Commit 0c77f77 addressed this issue by
reading the minimum from disk to determine sstable ownership but this
improvement was lost with commit 63f1969, which optimistically assumed
that hints would work most of the time, which isn't true.

This commit restores that change - the shard id of an sstable is deduced by
reading minimally from disk and then the sstable is fully loaded only if
it belongs to the local shard. This patch also adds a testcase to verify
that sstables are loaded only in their respective shards.

Fixes #21015

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 63100b34da)
2025-01-21 00:48:34 +05:30
Lakshmi Narayanan Sreethar
8191f5d0f4 sstable_directory: reintroduce get_shards_for_this_sstable()
Reintroduce `get_shards_for_this_sstable()` that was removed in commit
ad375fbb. This will be used in the following patch to ensure that an
sstable is loaded only once.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit d2ba45a01f)
2025-01-21 00:48:34 +05:30
Nadav Har'El
bff9ddde12 Merge '[Backport 6.2] view_builder: write status to tables before starting to build' from null
When adding a new view for building, first write the status to the
system tables and then add the view building step that will start
building it.

Otherwise, if we start building it before the status is written to the
table, it may happen that we complete building the view, write the
SUCCESS status, and then overwrite it with the STARTED status. The
view_build_status table will remain in an incorrect state, indicating that
view building is not complete.

Fixes #20638

The PR contains a few additional small fixes in separate commits related to the view build status table.

It addresses flakiness issues in tests that use the view build status table to determine when view building is complete. The table may be in an incorrect state due to these issues, having a row with status STARTED when the view has actually finished building, which causes us to wait in `wait_for_view` until it times out.

For testing I used a test similar to `test_view_build_status_with_replace_node`, but it only creates the views and calls `wait_for_view`. Without these commits it failed in 4/1024 runs, and with the commits it passed 2048/2048.

Backport to fix bugs that affect previous versions and to improve CI stability.

- (cherry picked from commit b1be2d3c41)

- (cherry picked from commit 1104411f83)

- (cherry picked from commit 7a6aec1a6c)

Parent PR: #22307

Closes scylladb/scylladb#22356

* github.com:scylladb/scylladb:
  view_builder: hold semaphore during entire startup
  view_builder: pass view name by value to write_view_build_status
  view_builder: write status to tables before starting to build
2025-01-19 15:36:44 +02:00
Michael Litvak
e8fde3b0a3 view_builder: hold semaphore during entire startup
Guard the whole view builder startup routine by holding the semaphore
until it's done instead of releasing it early, so that it's not
intercepted by migration notifications.

(cherry picked from commit 7a6aec1a6c)
2025-01-19 11:03:53 +02:00
Michael Litvak
a7f8842776 view_builder: pass view name by value to write_view_build_status
The function write_view_build_status takes two lambda functions and
chooses which of them to run depending on the upgrade state. It might
run both of them.

The parameters ks_name and view_name should be passed by value instead
of by reference because they are moved inside each lambda function.
Otherwise, if both lambdas are run, the second call operates on invalid
values that were moved.
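
A minimal sketch of the by-value form, with hypothetical function names. Each writer takes its own copy of the names, so moving them inside one writer cannot affect the other. Had the writers taken `std::string&` and moved from it, the second call would see moved-from strings, whose contents are unspecified.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Each writer consumes its arguments by moving them, like a lambda that
// moves the names into a query. The parameters are taken by value.
void write_legacy_sketch(std::vector<std::string>& out,
                         std::string ks_name, std::string view_name) {
    out.push_back(std::move(ks_name) + "." + std::move(view_name));
}

void write_group0_sketch(std::vector<std::string>& out,
                         std::string ks_name, std::string view_name) {
    out.push_back(std::move(ks_name) + "." + std::move(view_name));
}

// Runs one or both writers, depending on the (modelled) upgrade state.
std::vector<std::string> write_view_build_status_sketch(const std::string& ks_name,
                                                        const std::string& view_name,
                                                        bool run_both) {
    std::vector<std::string> out;
    write_legacy_sketch(out, ks_name, view_name);  // a copy is made per call
    if (run_both) {
        write_group0_sketch(out, ks_name, view_name);
    }
    return out;
}
```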

(cherry picked from commit 1104411f83)
2025-01-19 11:03:53 +02:00
Piotr Dulikowski
b88e0f8f74 Merge '[Backport 6.2] main, view: Pair view builder drain with its start' from null
In this PR, we pair draining the view builder with its start.
To better understand what was done and why, let's first look at the
situation before this commit and the context of it:

(a) The following things happened in order:

    1. The view builder would be constructed.
    2. Right after that, a deferred lambda would be created to stop the
       view builder during shutdown.
    3. group0_service would be started.
    4. A deferred lambda stopping group0_service would be created right
       after that.
    5. The view builder would be started.

(b) Because the view builder depends on group0_client, it couldn't be
    started before starting group0_service. On the other hand, other
    services depend on the view builder, e.g. the stream manager. That
    makes changing the order of initialization a difficult problem,
    so we want to avoid doing that unless we're sure it's the right
    choice.

(c) Since the view builder uses group0_client, there was a possibility
    of running into a segmentation fault issue in the following
    scenario:

    1. A call to `view_builder::mark_view_build_success()` is issued.
    2. We stop group0_service.
    3. `view_builder::mark_view_build_success()` calls
       `announce_with_raft()`, which leads to a use-after-free because
       group0_service has already been destroyed.

      This very scenario took place in scylladb/scylladb#20772.

Initially, we decided to solve the issue by initializing
group0_service a bit earlier (scylladb/scylladb@7bad8378c7).
Unfortunately, it led to other issues described in scylladb/scylladb#21534,
so we revert that patch. These changes are the second attempt at the
problem, this time solving it in a safer manner.

The solution we came up with is to pair the start of the view builder
with a deferred lambda that deinitializes it by calling
`view_builder::drain()`. No other component of the system should be
able to use the view builder anymore, so it's safe to do that.
Furthermore, that pairing makes the analysis of
initialization/deinitialization order much easier. We also solve the
aforementioned use-after-free issue because the view builder itself
will no longer attempt to use group0_client.

Note that we still pair a deferred lambda calling `view_builder::stop()`
with the construction of the view builder; that function will also call
`view_builder::drain()`. Another notable thing is `view_builder::drain()`
may be called earlier by `storage_service::do_drain()`. In other words,
these changes cover the situation when Scylla runs into a problem when
starting up.

Backport: The patch I'm reverting made it to 6.2, so we want to backport this one there too.

Fixes scylladb/scylladb#20772
Fixes scylladb/scylladb#21534

- (cherry picked from commit a5715086a4)

- (cherry picked from commit 06ce976370)

- (cherry picked from commit d1f960eee2)

Parent PR: #21909

Closes scylladb/scylladb#22331

* github.com:scylladb/scylladb:
  test/topology_custom: Add test for Scylla with disabled view building
  main, view: Pair view builder drain with its start
  Revert "main,cql_test_env: start group0_service before view_builder"
2025-01-17 09:57:36 +01:00
Sergey Zolotukhin
2ea97d8c19 raft: Handle non-critical config update errors in when changing status
to voter.

When a node is bootstrapped and joins a cluster as a non-voter, errors can occur while committing
a new Raft record, for instance, if the Raft leader changes during this time. These errors are not
critical and should not cause a node crash, as the action can be retried.

Fixes scylladb/scylladb#20814

(cherry picked from commit 8c48f7ad62)
2025-01-16 20:09:31 +00:00
Sergey Zolotukhin
36ff3e8f5f test: Add test to check that a node does not fail on unknown commit status
error when starting up.

Test that a node starts successfully if, while joining a cluster and becoming a voter, it
receives an unknown commit status error.

Test for scylladb/scylladb#20814

(cherry picked from commit 16053a86f0)
2025-01-16 20:09:31 +00:00
Sergey Zolotukhin
325cdd3ebc raft: Add run_op_with_retry in raft_group0.
Calls to `modify_config` often need to be retried. To avoid code
duplication, add a function wrapper that calls a function with
automatic retries in case of failures.
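
The idea of such a wrapper can be sketched as follows. This is a synchronous simplification with hypothetical names (`commit_status_unknown_sketch`, `run_op_with_retry_sketch`) and an assumed attempt limit; the wrapper in `raft_group0` works with futures and may differ in signature and retry policy.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the retryable error.
struct commit_status_unknown_sketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Calls op() until it succeeds, retrying only on the retryable error.
// After max_attempts failures the last error is rethrown.
template <typename Op>
auto run_op_with_retry_sketch(Op&& op, int max_attempts) -> decltype(op()) {
    for (int attempt = 1;; ++attempt) {
        try {
            return op();
        } catch (const commit_status_unknown_sketch&) {
            if (attempt >= max_attempts) {
                throw;
            }
            // otherwise fall through and try again
        }
    }
}
```

Any other exception type propagates immediately, so only the error known to be safe to retry is retried.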

(cherry picked from commit 775411ac56)
2025-01-16 20:09:30 +00:00
Michael Litvak
c5bdc9e58f view_builder: write status to tables before starting to build
When adding a new view for building, first write the status to the
system tables and then add the view building step that will start
building it.

Otherwise, if we start building it before the status is written to the
table, it may happen that we complete building the view, write the
SUCCESS status, and then overwrite it with the STARTED status. The
view_build_status table will remain in an incorrect state, indicating that
view building is not complete.

Fixes scylladb/scylladb#20638

(cherry picked from commit b1be2d3c41)
2025-01-16 20:08:36 +00:00
Kamil Braun
cbef20e977 Merge '[Backport 6.2] Fix possible data corruption due to token keys clashing in read repair.' from Sergey
This update addresses an issue in the mutation diff calculation algorithm used during read repair. Previously, the algorithm used `token` as the hashmap key. Since `token` is calculated based on the Murmur3 hash function, it could generate duplicate values for different partition keys, causing corruption in the affected rows' values.

Fixes scylladb/scylladb#19101

Since the issue affects all the relevant scylla versions, backport to: 6.1, 6.2

- (cherry picked from commit e577f1d141)

- (cherry picked from commit 39785c6f4e)

- (cherry picked from commit 155480595f)

Parent PR: #21996

Closes scylladb/scylladb#22298

* github.com:scylladb/scylladb:
  storage_proxy/read_repair: Remove redundant 'schema' parameter from `data_read_resolver::resolve` function.
  storage_proxy/read_repair: Use `partition_key` instead of `token` key for mutation diff calculation hashmap.
  test: Add test case for checking read repair diff calculation when having conflicting keys.
2025-01-16 17:13:12 +01:00
Dawid Mędrek
833ea91940 test/topology_custom: Add test for Scylla with disabled view building
Before this commit, there doesn't seem to have been a test verifying that
starting and shutting down Scylla behave correctly when the configuration
option `view_building` is set to false. In these changes, we add one.

(cherry picked from commit d1f960eee2)
2025-01-16 14:12:31 +01:00
Dawid Mędrek
1200d3b735 main, view: Pair view builder drain with its start
In these changes, we pair draining the view builder with its start.
To better understand what was done and why, let's first look at the
situation before this commit and the context of it:

(a) The following things happened in order:

    1. The view builder would be constructed.
    2. Right after that, a deferred lambda would be created to stop the
       view builder during shutdown.
    3. group0_service would be started.
    4. A deferred lambda stopping group0_service would be created right
       after that.
    5. The view builder would be started.

(b) Because the view builder depends on group0_client, it couldn't be
    started before starting group0_service. On the other hand, other
    services depend on the view builder, e.g. the stream manager. That
    makes changing the order of initialization a difficult problem,
    so we want to avoid doing that unless we're sure it's the right
    choice.

(c) Since the view builder uses group0_client, there was a possibility
    of running into a segmentation fault issue in the following
    scenario:

    1. A call to `view_builder::mark_view_build_success()` is issued.
    2. We stop group0_service.
    3. `view_builder::mark_view_build_success()` calls
       `announce_with_raft()`, which leads to a use-after-free because
       group0_service has already been destroyed.

      This very scenario took place in scylladb/scylladb#20772.

Initially, we decided to solve the issue by initializing
group0_service a bit earlier (scylladb/scylladb@7bad8378c7).
Unfortunately, it led to other issues described in scylladb/scylladb#21534.
We reverted that change in the previous commit. These changes are the
second attempt at the problem, this time solving it in a safer manner.

The solution we came up with is to pair the start of the view builder
with a deferred lambda that deinitializes it by calling
`view_builder::drain()`. No other component of the system should be
able to use the view builder anymore, so it's safe to do that.
Furthermore, that pairing makes the analysis of
initialization/deinitialization order much easier. We also solve the
aforementioned use-after-free issue because the view builder itself
will no longer attempt to use group0_client.

Note that we still pair a deferred lambda calling `view_builder::stop()`
with the construction of the view builder; that function will also call
`view_builder::drain()`. Another notable thing is `view_builder::drain()`
may be called earlier by `storage_service::do_drain()`. In other words,
these changes cover the situation when Scylla runs into a problem when
starting up.
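
The resulting shutdown order can be illustrated with a small model. `deferred_action_sketch` is a hypothetical scope guard and the event strings are illustrative; the point is only that deferred actions run in reverse order of creation, so a drain paired with the start runs before group0_service is stopped.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scope guard: runs the stored function on destruction.
class deferred_action_sketch {
    std::function<void()> _fn;
public:
    explicit deferred_action_sketch(std::function<void()> fn) : _fn(std::move(fn)) {}
    ~deferred_action_sketch() { _fn(); }
    deferred_action_sketch(const deferred_action_sketch&) = delete;
    deferred_action_sketch& operator=(const deferred_action_sketch&) = delete;
};

// Models the startup sequence and records what happens at shutdown.
std::vector<std::string> startup_then_shutdown() {
    std::vector<std::string> events;
    {
        events.push_back("construct view_builder");
        deferred_action_sketch stop_vb([&] { events.push_back("stop view_builder"); });

        events.push_back("start group0_service");
        deferred_action_sketch stop_g0([&] { events.push_back("stop group0_service"); });

        events.push_back("start view_builder");
        // Paired with the start: runs first at shutdown.
        deferred_action_sketch drain_vb([&] { events.push_back("drain view_builder"); });
    }  // guards are destroyed here, in reverse order of creation
    return events;
}
```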

Fixes scylladb/scylladb#20772

(cherry picked from commit 06ce976370)
2025-01-16 12:37:04 +01:00
Dawid Mędrek
84b774515b Revert "main,cql_test_env: start group0_service before view_builder"
The patch solved a problem related to an initialization order
(scylladb/scylladb#20772), but we ran into another one: scylladb/scylladb#21534.
After moving the initialization of group0_service, it ended up being destroyed
AFTER the CDC generation service would. Since CDC generations are accessed
in `storage_service::topology_state_load()`:

```
for (const auto& gen_id : _topology_state_machine._topology.committed_cdc_generations) {
    rtlogger.trace("topology_state_load: process committed cdc generation {}", gen_id);
    co_await _cdc_gens.local().handle_cdc_generation(gen_id);
```

we started getting the following failure:

```
Service &seastar::sharded<cdc::generation_service>::local() [Service = cdc::generation_service]: Assertion `local_is_initialized()' failed.
```

We're reverting the patch to go back to a more stable version of Scylla
and in the following commit, we'll solve the original issue in a more
systematic way.

This reverts commit 7bad8378c7.

(cherry picked from commit a5715086a4)
2025-01-16 12:36:41 +01:00
Sergey Zolotukhin
06a8956174 test: Include parent test name in ScyllaClusterManager log file names.
Add the test file name to `ScyllaClusterManager` log file names alongside the test function name.
This avoids race conditions when tests with the same function names are executed simultaneously.

Fixes scylladb/scylladb#21807

Backport: not needed since this is a fix in the testing scripts.

Closes scylladb/scylladb#22192

(cherry picked from commit 2f1731c551)

Closes scylladb/scylladb#22249
2025-01-14 16:33:27 +01:00
Sergey Zolotukhin
f0f833e8ab storage_proxy/read_repair: Remove redundant 'schema' parameter from data_read_resolver::resolve
function.

The `data_read_resolver` class inherits from `abstract_read_resolver`, which already includes the
`schema_ptr _schema` member. Therefore, using a separate function parameter in `data_read_resolver::resolve`
initialized with the same variable in `abstract_read_executor` is redundant.

(cherry picked from commit 155480595f)
2025-01-14 14:43:52 +01:00
Sergey Zolotukhin
b04b6aad9e storage_proxy/read_repair: Use partition_key instead of token key for mutation
diff calculation hashmap.

This update addresses an issue in the mutation diff calculation algorithm used during read repair.
Previously, the algorithm used `token` as the hashmap key. Since `token` is calculated based on
the Murmur3 hash function, it could generate duplicate values for different partition keys, causing
corruption in the affected rows' values.
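
The effect of the key choice can be shown with a small model. `row_sketch` and the two index functions are hypothetical, and the colliding token is supplied by hand rather than computed with Murmur3:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical row: partition key, its token, and a value.
struct row_sketch {
    std::string partition_key;
    std::int64_t token;
    std::string value;
};

// Keyed by token: two partitions with the same token share one slot,
// so the later row silently replaces the earlier one.
std::map<std::int64_t, std::string> index_by_token(const std::vector<row_sketch>& rows) {
    std::map<std::int64_t, std::string> index;
    for (const auto& r : rows) {
        index[r.token] = r.value;
    }
    return index;
}

// Keyed by partition key: every partition keeps its own slot.
std::map<std::string, std::string> index_by_partition_key(const std::vector<row_sketch>& rows) {
    std::map<std::string, std::string> index;
    for (const auto& r : rows) {
        index[r.partition_key] = r.value;
    }
    return index;
}
```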

Fixes scylladb/scylladb#19101

(cherry picked from commit 39785c6f4e)
2025-01-14 14:37:47 +01:00
Sergey Zolotukhin
12ee41869a test: Add test case for checking read repair diff calculation when having
conflicting keys.

The test updates two rows with keys that result in a Murmur3 hash collision, which
is used to generate Scylla tokens. These tokens are involved in read repair diff
calculations. Due to the identical token values, a hash map key collision occurs.
Consequently, an incorrect value from the second row (with a different primary key)
is then sent for writing as 'repaired', causing data corruption.

(cherry picked from commit e577f1d141)
2025-01-13 22:05:32 +00:00
Kamil Braun
ef93c3a8d7 Merge '[Backport 6.2] cache_algorithm_test: fix flaky failures' from Michał Chojnowski
This series attempts to get rid of flakiness in cache_algorithm_test by solving two problems.

Problem 1:

The test needs to create some arbitrary partition keys of a given size. It intends to create keys of the form:
0x0000000000000000000000000000000000000000...
0x0100000000000000000000000000000000000000...
0x0200000000000000000000000000000000000000...
But instead, unintentionally, it creates partially initialized keys of the form:
0x0000000000000000garbagegarbagegarbagegar...
0x0100000000000000garbagegarbagegarbagegar...
0x0200000000000000garbagegarbagegarbagegar...

Each of these keys is created several times and -- for the test to pass -- the result must be the same each time.
By coincidence, this is usually the case, since the same allocator slots are used. But if some background task happens to overwrite the allocator slot during a preemption, the keys used during "SELECT" will be different than the keys used during "INSERT", and the test will fail due to extra cache misses.

Problem 2:

Cache stats are global, so there's no good way to reliably
verify that e.g. a given read causes 0 cache misses,
because something done by Scylla in the background can trigger a cache miss.

This can cause the test to fail spuriously.

With how the test framework and the cache are designed, there's probably
no good way to test this properly. It would require ensuring that cache
stats are per-read, or at least per-table, and that Scylla's background
activity doesn't cause enough memory pressure to evict the tested rows.

This patch tries to deal with the flakiness without deleting the test
altogether by letting it retry after a failure if it notices that it
can be explained by a read which wasn't done by the test.
(Though, if the test can't be written well, maybe it just shouldn't be written...)

Fixes scylladb/scylladb#21536

(cherry picked from commit 1fffd976a4)
(cherry picked from commit 6caaead4ac)

Parent PR: scylladb/scylladb#21948

Closes scylladb/scylladb#22228

* github.com:scylladb/scylladb:
  cache_algorithm_test: harden against stats being confused by background activity
  cache_algorithm_test: fix a use of an uninitialized variable
2025-01-09 14:30:31 +01:00
Aleksandra Martyniuk
a59c4653fe repair: check tasks local to given shard
Currently, task_manager_module::is_aborted, when run on a given shard,
checks the tasks local to the caller's shard.

Fix the method to check the task map local to the given shard.

Fixes: #22156.

Closes scylladb/scylladb#22161

(cherry picked from commit a91e03710a)

Closes scylladb/scylladb#22197
2025-01-08 13:07:38 +02:00
Yaron Kaikov
2e87e317d9 .github/scripts/auto-backport.py: Add comment to PR when conflicts apply
When we open a PR with conflicts, the PR owner gets a notification about the assignment but has no idea whether the PR has conflicts (in Scylla this is important, since CI will not start on a draft PR).

Let's add a comment to notify the user that there are conflicts.

Closes scylladb/scylladb#21939

(cherry picked from commit 2e6755ecca)

Closes scylladb/scylladb#22190
2025-01-08 13:07:08 +02:00
Botond Dénes
f73f7c17ec Merge 'sstables_manager: do not reclaim unlinked sstables' from Lakshmi Narayanan Sreethar
When an sstable is unlinked, it remains in the _active list of the
sstable manager. Its memory might be reclaimed and later reloaded,
causing issues since the sstable is already unlinked. This patch updates
the on_unlink method to reclaim memory from the sstable upon unlinking,
remove it from memory tracking, and thereby prevent the issues described
above.

Added a testcase to verify the fix.

Fixes #21887

This is a bug fix in the bloom filter reload/reclaim mechanism and should be backported to older versions.

Closes scylladb/scylladb#21895

* github.com:scylladb/scylladb:
  sstables_manager: reclaim memory from sstables on unlink
  sstables_manager: introduce reclaim_memory_and_stop_tracking_sstable()
  sstables: introduce disable_component_memory_reload()
  sstables_manager: log sstable name when reclaiming components

(cherry picked from commit d4129ddaa6)

Closes scylladb/scylladb#21998
2025-01-08 13:06:24 +02:00
Michał Chojnowski
379f23d854 cache_algorithm_test: harden against stats being confused by background activity
Cache stats are global, so there's no good way to reliably
verify that e.g. a given read causes 0 cache misses,
because something done by Scylla in the background can trigger a cache miss.

This can cause the test to fail spuriously.

With how the test framework and the cache are designed, there's probably
no good way to test this properly. It would require ensuring that cache
stats are per-read, or at least per-table, and that Scylla's background
activity doesn't cause enough memory pressure to evict the tested rows.

This patch tries to deal with the flakiness without deleting the test
altogether by letting it retry after a failure if it notices that it
can be explained by a read which wasn't done by the test.
(Though, if the test can't be written well, maybe it just shouldn't be written...)
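
The retry idea can be sketched roughly as follows. This is an illustration only: `run_once` is a hypothetical callable that returns whether the check passed, how many reads the global stats recorded, and how many reads the test itself issued.

```python
def run_with_retries(run_once, max_attempts=5):
    """Retry the check only when the failure is explained by foreign reads.

    `run_once` is a hypothetical callable returning a tuple
    (passed, reads_seen, reads_issued_by_test).
    """
    for attempt in range(1, max_attempts + 1):
        passed, reads_seen, reads_issued = run_once()
        if passed:
            return attempt
        if reads_seen == reads_issued:
            # No background read to blame: this is a genuine failure.
            raise AssertionError("unexpected cache misses without background reads")
    raise AssertionError("cache stats stayed noisy for every attempt")
```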

(cherry picked from commit 6caaead4ac)
2025-01-08 11:48:23 +01:00
Michał Chojnowski
10815d2599 cache_algorithm_test: fix a use of an uninitialized variable
The test needs to create some arbitrary partition keys of a given size.
It intends to create keys of the form:
0x0000000000000000000000000000000000000000...
0x0100000000000000000000000000000000000000...
0x0200000000000000000000000000000000000000...
But instead, unintentionally, it creates partially initialized keys of the form:
0x0000000000000000garbagegarbagegarbagegar...
0x0100000000000000garbagegarbagegarbagegar...
0x0200000000000000garbagegarbagegarbagegar...

Each of these keys is created several times and -- for the test to pass --
the result must be the same each time.
By coincidence, this is usually the case, since the same allocator slots are used.
But if some background task happens to overwrite the allocator slot during a
preemption, the keys used during "SELECT" will be different than the keys used
during "INSERT", and the test will fail due to extra cache misses.
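
A sketch of the intended key layout, assuming (from the hex dumps above) that a key is an 8-byte little-endian counter followed by zero padding. `make_key` is a hypothetical helper, not the function used in the test:

```python
import struct


def make_key(index: int, size: int) -> bytes:
    """Hypothetical helper: 8-byte little-endian counter, zero-padded to `size`."""
    if size < 8:
        raise ValueError("size must hold the 8-byte counter")
    # bytes(n) yields n zero bytes, so the tail is fully initialized.
    return struct.pack("<Q", index) + bytes(size - 8)
```

Because every byte is written explicitly, building the same key twice always gives the same value.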

(cherry picked from commit 1fffd976a4)
2025-01-08 11:48:17 +01:00
Patryk Jędrzejczak
a63a0eac1e [Backport 6.2] raft: improve logs for abort while waiting for apply
New logs allow us to easily distinguish two cases in which
waiting for apply times out:
- the node didn't receive the entry it was waiting for,
- the node received the entry but didn't apply it in time.

Distinguishing these cases simplifies reasoning about failures.
The first case indicates that something went wrong on the leader.
The second case indicates that something went wrong on the node
on which waiting for apply timed out.
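
The distinction between the two cases could be expressed like this. It is a sketch only; the function and the index names are illustrative and do not come from the raft code:

```python
def describe_apply_timeout(awaited_index: int, last_received_index: int) -> str:
    """Hypothetical classifier for a wait-for-apply timeout."""
    if last_received_index < awaited_index:
        # The entry never reached this node.
        return "entry not received: look at the leader"
    # The entry arrived but was not applied before the timeout.
    return "entry received but not applied in time: look at this node"
```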

As it turns out, many different bugs result in the `read_barrier`
(which calls `wait_for_apply`) timeout. This change should help
us in debugging bugs like these.

We want to backport this change to all supported branches so that
it helps us in all tests.

Fixes scylladb/scylladb#22160

Closes scylladb/scylladb#22159
2025-01-07 17:01:22 +01:00
Kamil Braun
afd588d4c7 Merge '[Backport 6.2] Do not reset quarantine list in non raft mode' from Gleb
The series contains small fixes to the gossiper, one of which fixes #21930. I noticed the others while debugging the issue.

Fixes: #21930

(cherry picked from commit 91cddcc17f)

Parent PR: #21956

Closes scylladb/scylladb#21991

* github.com:scylladb/scylladb:
  gossiper: do not reset _just_removed_endpoints in non raft mode
  gossiper: do not call apply for the node's old state
2025-01-03 16:28:08 +01:00
Abhinav
f5bce45399 Fix gossiper orphan node floating problem by adding a remover fiber
In the current scenario, if a node crashes during startup after initiating gossip and before joining group0,
it keeps floating in the gossiper forever, because the raft-based gossiper purging logic is only effective
once the node joins group0. This orphan node prevents a successor node with the same IP from joining the cluster,
since the two collide during the gossiper shadow round.

This commit intends to fix this issue by adding a background thread which periodically checks for such orphan entries in
the gossiper and removes them.

A test is also added to verify this logic. The test fails without this background thread enabled, hence
verifying the behavior.

Fixes: scylladb/scylladb#20082

Closes scylladb/scylladb#21600

(cherry picked from commit 6c90a25014)

Closes scylladb/scylladb#21822
2025-01-02 14:57:46 +01:00
Gleb Natapov
cda997fe59 gossiper: do not reset _just_removed_endpoints in non raft mode
By the time the function is called during start it may already be
populated.

Fixes: scylladb/scylladb#21930
(cherry picked from commit e318dfb83a)
2024-12-25 12:01:16 +02:00
Gleb Natapov
155a0462d5 gossiper: do not call apply for the node's old state
If a node changed its address, an old state may still be in the gossiper,
so ignore it.

(cherry picked from commit e80355d3a1)
2024-12-23 11:47:12 +02:00
Piotr Dulikowski
76b1173546 Merge 'service/topology_coordinator: migrate view builder only if all nodes are up' from Michał Jadwiszczak
The migration process does a read with consistency level ALL,
requiring all nodes to be alive.

Fixes scylladb/scylladb#20754

The PR should be backported to 6.2, this version has view builder on group0.

Closes scylladb/scylladb#21708

* github.com:scylladb/scylladb:
  test/topology_custom/test_view_build_status: add reproducer
  service/topology_coordinator: migrate view builder only if all nodes are up

(cherry picked from commit def51e252d)

Closes scylladb/scylladb#21850
2024-12-19 14:10:55 +01:00
Piotr Dulikowski
e5a37d63c0 Merge 'transport/server: revert using async function in for_each_gently()' from Michał Jadwiszczak
This patch reverts 324b3c43c0 and adds synchronous versions of `service_level_controller::find_effective_service_level()` and `client_state::maybe_update_per_service_level_params()`.

It isn't safe to do asynchronous calls in `for_each_gently`, as the
connection may be disconnected while a call in callback preempts.

Fixes scylladb/scylladb#21801

Closes scylladb/scylladb#21761

* github.com:scylladb/scylladb:
  Revert "generic_server: use async function in `for_each_gently()`"
  transport/server: use synchronous calls in `for_each_gently` callback
  service/client_state: add synchronous method to update service level params
  qos/service_level_controller: add `find_cached_effective_service_level`

(cherry picked from commit c601f7a359)

Closes scylladb/scylladb#21849
2024-12-19 14:10:31 +01:00
Tomasz Grabiec
0851e3fba7 Merge '[Backport 6.2] utils: cached_file: Mark permit as awaiting on page miss' from ScyllaDB
Otherwise, the read will be considered as on-cpu during promoted index
search, which will severely underutilize the disk because by default
on-cpu concurrency is 1.

I verified this patch on the worst case scenario, where the workload
reads missing rows from a large partition. So partition index is
cached (no IO) and there is no data file IO (relies on https://github.com/scylladb/scylladb/pull/20522).
But there is IO during promoted index search (via cached_file).

Before the patch this workload was doing 4k req/s, after the patch it does 30k req/s.

The problem is much less pronounced if there is data file or partition index IO involved
because that IO will signal read concurrency semaphore to invite more concurrency.

Fixes #21325

(cherry picked from commit 868f5b59c4)

(cherry picked from commit 0f2101b055)

Refs #21323

Closes scylladb/scylladb#21358

* github.com:scylladb/scylladb:
  utils: cached_file: Mark permit as awaiting on page miss
  utils: cached_file: Push resource_unit management down to cached_file
2024-12-16 19:55:00 +01:00
Michael Litvak
99f190f699 service/qos/service_level_controller: update cache on startup
Update the service level cache in the node startup sequence, after the
service level and auth service are initialized.

The cache update depends on the service level data accessor being set
and the auth service being initialized. Before the commit, it may happen that a
cache update is not triggered after the initialization. The commit adds
an explicit call to update the cache where it is guaranteed to be ready.

Fixes scylladb/scylladb#21763

Closes scylladb/scylladb#21773

(cherry picked from commit 373855b493)

Closes scylladb/scylladb#21893
2024-12-16 14:19:06 +01:00
Michael Litvak
04e8506cbb service/qos: increase timeout of internal get_service_levels queries
The function get_service_levels is used to retrieve all service levels
and it is called from multiple different contexts.
Importantly, it is called internally from the context of group0 state reload,
where it should be executed with a long timeout, similarly to other
internal queries, because a failure of this function affects the entire
group0 client, and a longer timeout can be tolerated.
The function is also called in the context of the user command LIST
SERVICE LEVELS, and perhaps other contexts, where a shorter timeout is
preferred.

The commit introduces a function parameter to indicate whether the
context is internal or not. For internal context, a long timeout is
chosen for the query. Otherwise, the timeout is shorter, the same as
before. When the distinction is not important, a default value is
chosen which maintains the same behavior.
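
The parameter can be pictured as below. This is a sketch under stated assumptions: the helper name and both durations are placeholders, since the commit message does not give the actual values.

```python
from datetime import timedelta

# Placeholder durations; the commit does not state the actual values.
INTERNAL_TIMEOUT = timedelta(minutes=5)
USER_TIMEOUT = timedelta(seconds=10)


def get_service_levels_timeout(internal: bool = False) -> timedelta:
    """Pick the query timeout from the calling context (hypothetical helper)."""
    # The default keeps the previous, shorter timeout.
    return INTERNAL_TIMEOUT if internal else USER_TIMEOUT
```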

The main purpose is to fix the case where the timeout is too short and causes
a failure that propagates and fails the group0 client.

Fixes scylladb/scylladb#20483

Closes scylladb/scylladb#21748

(cherry picked from commit 53224d90be)

Closes scylladb/scylladb#21890
2024-12-16 14:15:26 +01:00
Yaron Kaikov
8e606a239f github: check if PR is closed instead of merge
In Scylla, a PR can be either `closed` or `merged`. Based on that, we decide when to start the backport process if the label was added after the PR was closed (or merged).

In https://github.com/scylladb/scylladb/pull/21876, adding the proper backport label didn't trigger the backport automation. https://github.com/scylladb/scylladb/pull/21809/ caused this; we should have kept `state=closed` (which includes both closed and merged PRs).

Fix it.

Closes scylladb/scylladb#21906

(cherry picked from commit b4b7617554)

Closes scylladb/scylladb#21922
2024-12-16 14:07:32 +02:00
Jenkins Promoter
8cdff8f52f Update ScyllaDB version to: 6.2.3 2024-12-15 15:55:40 +02:00
Kamil Braun
3ec741dbac Merge 'topology_coordinator: introduce reload_count in topology state and use it to prevent race' from Gleb Natapov
The topology request table may change between the code reading it and
calling cv::when(), since reading is a preemption point. In this
case cv::signal can be missed. Detect that there was no signal between
reading and waiting by introducing reload_count, which is increased each
time the state is reloaded and signaled. If the counter differs
before and after reading, the state may have changed, so re-check it
instead of sleeping.
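
The pattern can be sketched with asyncio as a rough analogue. All class and function names here are hypothetical, and `when_signaled` only imitates a condition variable whose signals are lost when nobody is waiting:

```python
import asyncio


class TopologyState:
    """Toy stand-in for the topology state; all names are hypothetical."""

    def __init__(self):
        self.requests = []
        self.reload_count = 0
        self._waiters = []

    def reload(self, requests):
        self.requests = list(requests)
        self.reload_count += 1
        # Wake current waiters only; a signal with no waiters is lost.
        waiters, self._waiters = self._waiters, []
        for fut in waiters:
            if not fut.done():
                fut.set_result(None)

    async def when_signaled(self):
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        await fut


async def read_requests(state):
    snapshot = list(state.requests)
    await asyncio.sleep(0)  # preemption point while reading
    return snapshot


async def wait_for_requests(state):
    while True:
        seen = state.reload_count
        requests = await read_requests(state)
        if requests:
            return requests
        if state.reload_count != seen:
            continue  # reloaded while we were reading: re-check, don't sleep
        await state.when_signaled()


async def demo():
    state = TopologyState()
    waiter = asyncio.create_task(wait_for_requests(state))
    await asyncio.sleep(0)  # let the waiter take its snapshot and yield
    state.reload(["join"])  # the signal arrives while the waiter is mid-read
    return await asyncio.wait_for(waiter, timeout=1)
```

Without the `reload_count` comparison, the waiter in `demo` would go to sleep after the only signal had already been sent.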

Closes scylladb/scylladb#21713

* github.com:scylladb/scylladb:
  topology_coordinator: introduce reload_count in topology state and use it to prevent race
  storage_service: use conditional_variable::when in co-routines consistently

(cherry picked from commit 8f858325b6)

Closes scylladb/scylladb#21803
2024-12-12 15:45:31 +01:00
Anna Stuchlik
e3e7ac16e9 doc: remove wrong image upgrade info (5.2-to-2023.1)
This commit removes the information about the recommended way of upgrading
ScyllaDB images - by updating ScyllaDB and OS packages in one step. This upgrade
procedure is not supported (it was implemented, but then reverted).

Refs https://github.com/scylladb/scylladb/issues/15733

Closes scylladb/scylladb#21876
Fixes https://github.com/scylladb/scylla-enterprise/issues/5041
Fixes https://github.com/scylladb/scylladb/issues/21898

(cherry picked from commit 98860905d8)
2024-12-12 15:22:26 +02:00
Tomasz Grabiec
81d6d88016 utils: cached_file: Mark permit as awaiting on page miss
Otherwise, the read will be considered as on-cpu during promoted index
search, which will severely underutilize the disk because by default
on-cpu concurrency is 1.

I verified this patch on the worst case scenario, where the workload
reads missing rows from a large partition. So partition index is
cached (no IO) and there is no data file IO. But there is IO during
promoted index search (via cached_file). Before the patch this
workload was doing 4k req/s, after the patch it does 30k req/s.

The problem is much less pronounced if there is data file or index
file IO involved because that IO will signal read concurrency
semaphore to invite more concurrency.

(cherry picked from commit 0f2101b055)
2024-12-09 23:18:00 +01:00
Tomasz Grabiec
56f93dd434 utils: cached_file: Push resource_unit management down to cached_file
It saves us permit operations on the hot path when we hit in cache.

Also, it will lay the ground for marking the permit as awaiting later.

(cherry picked from commit 868f5b59c4)
2024-12-09 23:17:56 +01:00
Kefu Chai
28a32f9c50 github: do not nest ${{}} inside condition
In commit 2596d157, we added a condition to run auto-backport.py only
when the GitHub Action is triggered by a push to the default branch.
However, this introduced an unexpected error due to incorrect condition
handling.

Problem:
- `github.event.before` evaluates to an empty string
- GitHub Actions' single-pass expression evaluation system causes
  the step to always execute, regardless of `github.event_name`

Despite GitHub's documentation suggesting that ${{ }} can be omitted,
it recommends using explicit ${{}} expressions for compound conditions.

Changes:
- Use explicit ${{}} expression for compound conditions
- Avoid string interpolation in conditional statements

Root Cause:
The previous implementation failed because of how GitHub Actions
evaluates conditional expressions, leading to an unintended script
execution and a 404 error when attempting to compare commits.

Example Error:

```
  python .github/scripts/auto-backport.py --repo scylladb/scylladb --base-branch refs/heads/master --commits ..2b07d93beac7bc83d955dadc20ccc307f13f20b6
  shell: /usr/bin/bash -e {0}
  env:
    DEFAULT_BRANCH: master
    GITHUB_TOKEN: ***
Traceback (most recent call last):
  File "/home/runner/work/scylladb/scylladb/.github/scripts/auto-backport.py", line 201, in <module>
    main()
  File "/home/runner/work/scylladb/scylladb/.github/scripts/auto-backport.py", line 162, in main
    commits = repo.compare(start_commit, end_commit).commits
  File "/usr/lib/python3/dist-packages/github/Repository.py", line 888, in compare
    headers, data = self._requester.requestJsonAndCheck(
  File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck
    return self.__check(
  File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/commits/commits#compare-two-commits", "status": "404"}
```

Fixes scylladb/scylladb#21808
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21809

(cherry picked from commit e04aca7efe)

Closes scylladb/scylladb#21820
2024-12-06 16:34:59 +02:00
Avi Kivity
20b2d5b7c9 Merge 'compaction: update maintenance sstable set on scrub compaction completion' from Lakshmi Narayanan Sreethar
Scrub compaction can pick up input sstables from the maintenance sstable set,
but on compaction completion it doesn't update the maintenance set,
leaving the original sstable in the set after it has been scrubbed. To fix
this, compaction completion has to update the maintenance sstable set if
the input originated from there. This PR solves the issue by updating the
correct sstable_sets on compaction completion.

Fixes #20030

This issue has existed since the introduction of main and maintenance sstable sets into scrub compaction. It would be good to have the fix backported to versions 6.1 and 6.2.

Closes scylladb/scylladb#21582

* github.com:scylladb/scylladb:
  compaction: remove unused `update_sstable_lists_on_off_strategy_completion`
  compaction_group: replace `update_sstable_lists_on_off_strategy_completion`
  compaction_group: rename `update_main_sstable_list_on_compaction_completion`
  compaction_group: update maintenance sstable set on scrub compaction completion
  compaction_group: store table::sstable_list_builder::result in replacement_desc
  table::sstable_list_builder: remove old sstables only from current list
  table::sstable_list_builder: return removed sstables from build_new_list

(cherry picked from commit 58baeac0ad)

Closes scylladb/scylladb#21790
2024-12-06 10:36:46 +02:00
Michael Pedersen
f37deb7e98 docs: correct the storage size for n2-highmem-32 to 9000GB
Update the storage size for n2-highmem-32 to 9000GB, as this is the default in SC

Fixes scylladb/scylladb#21785
Closes scylladb/scylladb#21537

(cherry picked from commit 309f1606ae)

Closes scylladb/scylladb#21595
2024-12-05 09:51:11 +02:00
Tomasz Grabiec
933ec7c6ab utils: UUID: Make get_time_UUID() respect the clock offset
schema_change_test currently fails due to failure to start a cql test
env in unit tests after the point where this is called (in one of the
test cases):

   forward_jump_clocks(std::chrono::seconds(60*60*24*31));

The problem manifests with a failure to join the cluster due to
missing_column exception ("missing_column: done") being thrown from
system_keyspace::get_topology_request_state(). It's a symptom of
join request being missing in system.topology_requests. It's missing
because the row is expired.

When the request is created, we insert the
mutations with an intended TTL of 1 month. The actual TTL value is
computed like this:

  ttl_opt topology_request_tracking_mutation_builder::ttl() const {
      return std::chrono::duration_cast<std::chrono::seconds>(std::chrono::microseconds(_ts)) + std::chrono::months(1)
          - std::chrono::duration_cast<std::chrono::seconds>(gc_clock::now().time_since_epoch());
  }

_ts comes from the request_id, which is supposed to be a timeuuid set
from current time when request starts. It's set using
utils::UUID_gen::get_time_UUID(). It reads the system clock without
adding the clock offset, so after forward_jump_clocks(), _ts and
gc_clock::now() may be far off. In some cases the accumulated offset
is larger than 1 month and the ttl becomes negative, causing the
request row to expire immediately and failing the boot sequence.
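
A worked version of the arithmetic, in whole seconds. The wall-clock value is arbitrary and chosen only for illustration; `MONTH` is the length of std::chrono::months(1), and `JUMP` is the forward_jump_clocks() argument quoted above:

```python
MONTH = 2_629_746          # seconds in std::chrono::months(1)
JUMP = 60 * 60 * 24 * 31   # the forward_jump_clocks() argument from the test


def ttl_seconds(request_ts: int, gc_now: int) -> int:
    """Same arithmetic as ttl() above, in whole seconds."""
    return request_ts + MONTH - gc_now


system_now = 1_700_000_000   # arbitrary wall-clock second, for illustration
gc_now = system_now + JUMP   # gc_clock includes the accumulated offset

ttl_before_fix = ttl_seconds(system_now, gc_now)         # timestamp ignores the offset
ttl_after_fix = ttl_seconds(system_now + JUMP, gc_now)   # timestamp respects the offset
```

With a 31-day jump, `ttl_before_fix` is -48654 seconds, while `ttl_after_fix` is the full 2629746 seconds.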

The fix is to use db_clock, which respects offsets and is consistent
with gc_clock.

The test doesn't fail in CI because there each test case runs in a
separate process, so there is no bootstrap attempt (by new cql test
env) after forward_jump_clocks().

Closes scylladb/scylladb#21558

(cherry picked from commit 1d0c6aa26f)

Closes scylladb/scylladb#21584

Fixes #21581
2024-12-04 14:18:16 +01:00
Kefu Chai
2b5cd10b66 docs: explain task status retention and one-time query behavior
Task status information from nodetool commands is not retained permanently:

- Status of completed tasks is only kept for `task_ttl_in_seconds`
- Status is removed after being queried, making it a one-time operation

This behavior is important for users to understand since subsequent
queries for the same completed task will not return any information.
Add documentation to make this clear to users.
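
The documented behavior can be modeled as below. This is a toy model, not Scylla's task manager; only `task_ttl_in_seconds` is a name taken from the text, everything else is hypothetical:

```python
import time


class TaskStatusStore:
    """Toy model of the documented behavior; not Scylla's task manager."""

    def __init__(self, task_ttl_in_seconds, clock=time.monotonic):
        self._ttl = task_ttl_in_seconds
        self._clock = clock
        self._finished = {}  # task_id -> (status, completion time)

    def complete(self, task_id, status):
        self._finished[task_id] = (status, self._clock())

    def query(self, task_id):
        entry = self._finished.pop(task_id, None)  # removed once queried
        if entry is None:
            return None
        status, done_at = entry
        if self._clock() - done_at > self._ttl:
            return None  # expired before anyone asked
        return status
```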

Fixes scylladb/scylladb#21757
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21386

(cherry picked from commit afeff0a792)

Closes scylladb/scylladb#21759
2024-12-04 13:49:24 +02:00
Kefu Chai
bf47de9f7f test: topology_custom: ensure node visibility before keyspace creation
Building upon commit 69b47694, this change addresses a subtle synchronization
weakness in node visibility checks during recovery mode testing.

Previous Approach:
- Waited only for the first node to see its peers
- Insufficient to guarantee full cluster consistency

Current Solution:
1. Implement comprehensive node visibility verification
2. Ensure all nodes mutually recognize each other
3. Prevent potential schema propagation race conditions
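
The stronger check can be sketched as follows. The `seen_by` mapping is a hypothetical shape used only for illustration, not the test framework's API:

```python
def all_nodes_see_each_other(seen_by):
    """`seen_by` maps each node to the set of peers it currently sees."""
    nodes = set(seen_by)
    # Every node must see every other node, not just the first one.
    return all(nodes - {node} <= peers for node, peers in seen_by.items())
```

A cluster where only the first node sees its peers passes the old check but fails this one.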

Key Improvements:
- Robust cluster state validation before keyspace creation
- Eliminate partial visibility scenarios

Fixes scylladb/scylladb#21724

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21726

(cherry picked from commit 65949ce607)

Closes scylladb/scylladb#21734
2024-12-04 13:46:26 +02:00
Nadav Har'El
dc71a6a75e Merge 'test/boost/view_schema_test.cc: Wait for views to build in test_view_update_generating_writetime' from Dawid Mędrek
Before these changes, we didn't wait for the materialized views to
finish building before writing to the base table. That led to generating
an additional view update, which, in turn, led to test failures.

The scenario corresponding to the summary above looked like this:

1. The test creates an empty table and MVs on it.
2. The view builder starts, but it doesn't finish immediately.
3. The test performs mutations to the base table. Since the views
   already exist, view updates are generated.
4. Finally, the view builder finishes. It notices that the base
   table has a row, so it generates a view update for it because
   it doesn't notice that we already have data in the view.

We solve it by explicitly waiting for both views to finish building
and only then start writing to the base table.

Additionally, we also fix a lifetime issue of the row the test revolves
around, further stabilizing CI.

Fixes https://github.com/scylladb/scylladb/issues/20889

Backport: These changes have no semantic effect on the codebase,
but they stabilize CI, so we want to backport them to the maintained
versions of Scylla.

Closes scylladb/scylladb#21632

* github.com:scylladb/scylladb:
  test/boost/view_schema_test.cc: Increase TTL in test_view_update_generating_writetime
  test/boost/view_schema_test.cc: Wait for views to build in test_view_update_generating_writetime

(cherry picked from commit 733a4f94c7)

Closes scylladb/scylladb#21640
2024-12-04 13:44:33 +02:00
Aleksandra Martyniuk
f13f821b31 repair: implement tablet_repair_task_impl::release_resources
tablet_repair_task_impl keeps a vector of tablet_repair_task_meta,
each of which keeps an effective_replication_map_ptr. So, after
the task completes, the token metadata version will not change for
task_ttl seconds.

Implement tablet_repair_task_impl::release_resources method that clears
tablet_repair_task_meta vector when the task finishes.

Set task_ttl to 1h in test_tablet_repair to check whether the test
won't time out.

Fixes: #21503.

Closes scylladb/scylladb#21504

(cherry picked from commit 572b005774)

Closes scylladb/scylladb#21622
2024-12-04 13:43:40 +02:00
André LFA
74ad6f2fa3 Update report-scylla-problem.rst removing references to old Health Check Report
Closes scylladb/scylladb#21467

(cherry picked from commit 703e6f3b1f)

Closes scylladb/scylladb#21591
2024-12-04 13:41:00 +02:00
Abhinav
fc42571591 test: Parametrize 'replacement with inter-dc encryption' test to confirm behavior in zero token node cases.
In the current scenario, 'test_replace_with_encryption' only confirms the replacement with inter-dc encryption
for normal nodes. This commit increases the coverage of the test by parametrizing it to confirm the behavior
for zero token node replacement as well. This test also implicitly provides
coverage for bootstrap with encryption of zero token nodes.

This PR increases coverage for existing code, hence we need to backport it. Since only version 6.2 has zero
token node support, we only backport it to 6.2.

Fixes: scylladb/scylladb#21096

Closes scylladb/scylladb#21609

(cherry picked from commit acd643bd75)

Closes scylladb/scylladb#21764
2024-12-04 11:22:39 +01:00
Botond Dénes
c6ef055e9c Merge 'repair: fix task_manager_module::abort_all_repairs' from Aleksandra Martyniuk
Currently, task_manager_module::abort_all_repairs marks top-level repairs as aborted (but does not abort them) and aborts all existing shard tasks.

A running repair checks whether its id isn't contained in _aborted_pending_repairs and then proceeds to create shard tasks. If abort_all_repairs is executed after _aborted_pending_repairs is checked but before shard tasks are created, then those new tasks won't be aborted. The issue is the most severe for tablet_repair_task_impl that checks the _aborted_pending_repairs content from different shards, that do not see the top-level task. Hence the repair isn't stopped but it creates shard repair tasks on all shards but the one that initialized repair.

Abort top-level tasks in abort_all_repairs. Fix the shard on which the task abort is checked.

Fixes: #21612.

Needs backport to 6.1 and 6.2 as they contain the bug.

Closes scylladb/scylladb#21616

* github.com:scylladb/scylladb:
  test: add test to check if repair is properly aborted
  repair: add shard param to task_manager_module::is_aborted
  repair: use task abort source to abort repair
  repair: drop _aborted_pending_repairs and utilize tasks abort mechanism
  repair: fix task_manager_module::abort_all_repairs

(cherry picked from commit 5ccbd500e0)

Closes scylladb/scylladb#21642
2024-11-21 06:33:31 +02:00
Nadav Har'El
6ba0253dd3 alternator: fix "/localnodes" to not return down nodes
Alternator's "/localnodes" HTTP requests is supposed to return the list
of nodes in the local DC to which the user can send requests.

Before commit bac7c33313 we used the
gossiper is_alive() method to determine if a node should be returned.
That commit changed the check to is_normal() - because a node can be
alive but in non-normal (e.g., joining) state and not ready for
requests.

However, it turns out that checking is_normal() is not enough, because
if a node is stopped abruptly, other nodes will still consider it "normal",
but down (this is so-called "DN" state). So we need to check **both**
is_alive() and is_normal().
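
The combined check amounts to the filter below. It is a sketch; `NodeState` and `local_nodes` are hypothetical names, and the two booleans stand for what the gossiper's is_alive() and is_normal() would report:

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    address: str
    alive: bool    # what the gossiper's is_alive() would report
    normal: bool   # what is_normal() would report


def local_nodes(nodes):
    """Addresses that are safe to hand to clients: alive and normal."""
    return [n.address for n in nodes if n.alive and n.normal]
```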

This patch also adds a test reproducing this case, where a node is
shut down abruptly. Before this patch, the test failed ("/localnodes"
continued to return the dead node), and after it it passes.

Fixes #21538

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#21540

(cherry picked from commit 7607f5e33e)

Closes scylladb/scylladb#21634
2024-11-20 09:22:59 +02:00
Anna Stuchlik
d9eb502841 doc: add the 6.0-to-2024.2 upgrade guide-from-6
This commit adds an upgrade guide from ScyllaDB 6.0
to ScyllaDB Enterprise 2024.2.

Fixes https://github.com/scylladb/scylladb/issues/20063
Fixes https://github.com/scylladb/scylladb/issues/20062
Refs https://github.com/scylladb/scylla-enterprise/issues/4544

(cherry picked from commit 3d4b7e41ef)

Closes scylladb/scylladb#21620
2024-11-18 17:28:44 +02:00
Emil Maskovsky
0c7c6f85e0 test/topology_custom: fix the flaky test_raft_recovery_stuck
The test is only sending a subset of the running servers for the rolling
restart. The rolling restart is checking the visibility of the restarted
node against the other nodes, but if that set is incomplete some of the
running servers might not have seen the restarted node yet.

Improved the manager client rolling restart method to consider all the
running nodes for checking the restarted node visibility.

Fixes: scylladb/scylladb#19959

Closes scylladb/scylladb#21477

(cherry picked from commit 92db2eca0b)

Closes scylladb/scylladb#21556
2024-11-15 10:37:20 +02:00
Kefu Chai
2480decbc7 doc: import the new pub keys used to sign the package
before this change, when users follow the instructions, they'd get

```console
$ sudo apt-get update
Hit:1 http://us-east-1.ec2.archive.ubuntu.com/ubuntu noble InRelease
Hit:2 http://us-east-1.ec2.archive.ubuntu.com/ubuntu noble-updates InRelease
Hit:3 http://us-east-1.ec2.archive.ubuntu.com/ubuntu noble-backports InRelease
Hit:4 http://security.ubuntu.com/ubuntu noble-security InRelease
Get:5 https://downloads.scylladb.com/downloads/scylla/deb/debian-ubuntu/scylladb-6.2 stable InRelease [7550 B]
Err:5 https://downloads.scylladb.com/downloads/scylla/deb/debian-ubuntu/scylladb-6.2 stable InRelease
 The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A43E06657BAC99E3
Reading package lists... Done
W: GPG error: https://downloads.scylladb.com/downloads/scylla/deb/debian-ubuntu/scylladb-6.2 stable InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A43E06657BAC99E3
E: The repository 'https://downloads.scylladb.com/downloads/scylla/deb/debian-ubuntu/scylladb-6.2 stable InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
```

because the packages were signed with a different keyring.

in this change, we import the new pubkey, so that the package manager can
verify the new packages (2024.2+ and 6.2+) signed with the new key.

see also https://github.com/scylladb/scylla-ansible-roles/issues/399
and https://forum.scylladb.com/t/release-scylla-manager-3-3-1/2516
for the announcement on using the new key.

Fixes scylladb/scylladb#21557
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21524

(cherry picked from commit 1cedc45c35)

Closes scylladb/scylladb#21588
2024-11-15 10:36:44 +02:00
Botond Dénes
687a18db38 Merge 'scylla_raid_setup: fix failure on SELinux package installation' from Takuya ASADA
After merging 5a470b2bfb, we found that scylla_raid_setup fails on offline mode
installation.
This is because pkg_install() just prints an error and exits the script in offline mode
instead of installing packages, since offline mode is not supposed to be able to connect
to the internet.
It seems to occur because of the missing "policycoreutils-python-utils"
package, which is the package for the "semanage" command.
So we need to implement the relabeling patch without using the command.

Fixes https://github.com/scylladb/scylladb/issues/21441

Also, since Amazon Linux 2 has a different package name for semanage, we need to
adjust the package name.

Fixes https://github.com/scylladb/scylladb/issues/21351

Closes scylladb/scylladb#21474

* github.com:scylladb/scylladb:
  scylla_raid_setup: support installing semanage on Amazon Linux 2
  scylla_raid_setup: fix failure on SELinux package installation

(cherry picked from commit 1c212df62d)

Closes scylladb/scylladb#21547
2024-11-14 15:51:06 +02:00
Botond Dénes
548170fb68 Merge '[Backport 6.2] compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors' from ScyllaDB
stop() methods, like destructors, must always succeed,
and returning errors from them is futile as there is
nothing else we can do with them but continue with shutdown.

stop_ongoing_compactions, in particular, currently returns the status
of stopped compaction tasks from `stop_tasks`, but still all tasks
must be stopped after it, even if they failed, so assert that
and ignore the errors.

Fixes scylladb/scylladb#21159

* Needs backport to 6.2 and 6.1, as commit 8cc99973eb causes handles storage that might cause compaction tasks to fail and eventually terminate on shutdown when the exceptions are thrown in noexcept context in the deferred stop destructor body

(cherry picked from commit e942c074f2)

(cherry picked from commit d8500472b3)

(cherry picked from commit c08ba8af68)

(cherry picked from commit a7a55298ea)

(cherry picked from commit 6cce67bec8)

Refs #21299

Closes scylladb/scylladb#21434

* github.com:scylladb/scylladb:
  compaction_manager: stop: await _stop_future if engaged
  compaction_manager: really_do_stop:  assert that no tasks are left behind
  compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors
  compaction/compaction_manager: stop_tasks(): unlink stopped tasks
  compaction/compaction_manager: make _tasks an intrusive list
2024-11-14 06:59:52 +02:00
Jenkins Promoter
75b79a30da Update ScyllaDB version to: 6.2.2 2024-11-13 23:22:52 +02:00
Benny Halevy
bdf31d7f54 compaction_manager: stop: await _stop_future if engaged
The current condition that consults the compaction manager
state for awaiting `_stop_future` works since _stop_future
is assigned after the state is set to `stopped`, but it is
incidental.  What matters is that `_stop_future` is engaged.

While at it, exchange _stop_future with a ready future
so that stop() can be safely called multiple times.
And dropped the superfluous co_return.
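
The exchange idea can be sketched in Python with asyncio. This is only an analogy under stated assumptions: `Manager`, `do_stop`, `_really_do_stop` and `_ready_future` are hypothetical names, and unlike a Seastar future an asyncio future may be awaited more than once, so the swap is shown purely to mirror the pattern.

```python
import asyncio


def _ready_future():
    """Return an already-completed future (hypothetical helper)."""
    fut = asyncio.get_running_loop().create_future()
    fut.set_result(None)
    return fut


class Manager:
    def __init__(self):
        self._stop_future = None  # engaged once a stop has been initiated
        self.shutdown_runs = 0

    async def _really_do_stop(self):
        await asyncio.sleep(0)
        self.shutdown_runs += 1

    def do_stop(self):
        # Start the shutdown work only once.
        if self._stop_future is None:
            self._stop_future = asyncio.ensure_future(self._really_do_stop())

    async def stop(self):
        self.do_stop()
        # What matters is that the future is engaged, not the manager state.
        if self._stop_future is not None:
            pending, self._stop_future = self._stop_future, _ready_future()
            await pending


async def demo():
    m = Manager()
    await m.stop()
    await m.stop()  # second call awaits the ready future left behind
    return m.shutdown_runs
```

Here `asyncio.run(demo())` runs the shutdown work a single time even though stop() is called twice.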

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 6cce67bec8)
2024-11-13 10:00:47 +02:00
Benny Halevy
3d915cd091 compaction_manager: really_do_stop: assert that no tasks are left behind
stop_ongoing_compactions now ignores any errors returned
by tasks, and it should leave no task left behind.
Assert that here, before the compaction_manager is destroyed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit a7a55298ea)
2024-11-13 09:59:57 +02:00
Benny Halevy
abb26ff913 compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors
stop() methods, like destructors must always succeed,
and returning errors from them is futile as there is
nothing else we can do with them but continue with shutdown.

Leaked errors on the stop path may cause termination
on shutdown, when called in a deferred action destructor.
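
A minimal Python sketch of the stop path described in this series, assuming a hypothetical `Task` class and a plain list in place of the real `_tasks` container: errors raised while stopping are logged and swallowed, every task is unlinked, and the list is asserted empty afterwards.

```python
import logging

log = logging.getLogger("compaction_sketch")


class Task:
    """Hypothetical stand-in for a compaction task executor."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.stopped = False

    def stop(self):
        self.stopped = True
        if self.fail:
            raise RuntimeError(f"{self.name}: failed while stopping")


def stop_tasks(tasks):
    """Stop every task; errors are logged and swallowed, never propagated."""
    for task in list(tasks):
        try:
            task.stop()
        except Exception as exc:
            log.debug("ignoring error from stopped task %s: %s", task.name, exc)
        finally:
            tasks.remove(task)  # unlink right away so the list ends up empty
    assert not tasks, "no task may be left behind after stop"
```

Because nothing escapes `stop_tasks`, a caller running it from a cleanup handler cannot be terminated by a failing task.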

Fixes scylladb/scylladb#21298

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit c08ba8af68)
2024-11-13 09:56:42 +02:00
Botond Dénes
3f821b7f4f compaction/compaction_manager: stop_tasks(): unlink stopped tasks
Stopped tasks currently linger in _tasks until the fiber that created
the task is scheduled again and unlinks the task. This window between
stop and remove prevents reliable checks for empty _tasks list after all
tasks are stopped.
Unlink the task early so really_do_stop() can safely check for an empty
_tasks list (next patch).

(cherry picked from commit d8500472b3)
2024-11-13 09:56:21 +02:00
Botond Dénes
cab3b86240 compaction/compaction_manager: make _tasks an intrusive list
_tasks is currently std::list<shared_ptr<compaction_task_executor>>, but
it has no role in keeping the instances alive, this is done by the
fibers which create the task (and pin a shared ptr instance).
This lends itself to an intrusive list, avoiding that extra
allocation upon push_back().
Using an intrusive list also makes it simpler and much cheaper (O(1) vs.
O(N)) to remove tasks from the _tasks list. This will be made use of in
the next patch.
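
To illustrate why an intrusive list makes removal O(1), here is a small Python sketch. It is an analogy only: `Task` and `TaskList` are hypothetical, and the real code uses a C++ intrusive list rather than hand-written links.

```python
class Task:
    """Carries its own list links, so the list allocates no node per entry."""

    def __init__(self, name):
        self.name = name
        self.prev = None
        self.next = None


class TaskList:
    def __init__(self):
        # Sentinel node: an empty list points at itself.
        self.head = Task("<sentinel>")
        self.head.prev = self.head.next = self.head

    def push_back(self, task):
        last = self.head.prev
        last.next = task
        task.prev = last
        task.next = self.head
        self.head.prev = task

    @staticmethod
    def unlink(task):
        # O(1): the task knows its neighbours, no search through the list.
        task.prev.next = task.next
        task.next.prev = task.prev
        task.prev = task.next = None

    def __iter__(self):
        node = self.head.next
        while node is not self.head:
            yield node
            node = node.next
```

Note that iterating yields the task objects themselves, which corresponds to the value_type change from a shared pointer to a reference.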

Code using _tasks has to be updated because the value_type changes from
shared_ptr<compaction_task_executor> to compaction_task_executor&.

(cherry picked from commit e942c074f2)
2024-11-13 09:48:00 +02:00
Piotr Dulikowski
2fa4f3a9fc Merge 'main,cql_test_env: start group0_service before view_builder' from Michał Jadwiszczak
In scylladb/scylladb#19745, view_builder was migrated to group0 and since then it is dependent on group0_service.
Because of this, group0_service should be initialized/destroyed before/after view_builder.

This patch also adds error injection to `raft_server_with_timeouts::read_barrier`, which does 1s sleep before doing the read barrier. There is a new test which reproduces the use after free bug using the error injection.

Fixes scylladb/scylladb#20772

scylladb/scylladb#19745 is present in 6.2, so this fix should be backported to it.

Closes scylladb/scylladb#21471

* github.com:scylladb/scylladb:
  test/boost/secondary_index_test: add test for use after free
  api/raft: use `get_server_with_timeouts().read_barrier()` in coroutines
  main,cql_test_env: start group0_service before view_builder

(cherry picked from commit 7021efd6b0)

Closes scylladb/scylladb#21506
2024-11-12 14:36:06 +01:00
Yaron Kaikov
a3e69cc8fb ./github/workflows/add-label-when-promoted.yaml: Run auto-backport only on default branch
In https://github.com/scylladb/scylladb/pull/21496#event-15221789614
```
scylladbbot force-pushed the backport/21459/to-6.1 branch from 414691c to 59a4ccd Compare 2 days ago
```

Backport automation is triggered by `push`, but it should only start from the `master` branch (or the `enterprise` branch in Enterprise), so we need to verify that by also checking the default branch.

Fixes: https://github.com/scylladb/scylladb/issues/21514

Closes scylladb/scylladb#21515

(cherry picked from commit 2596d1577b)

Closes scylladb/scylladb#21531
2024-11-11 17:43:54 +02:00
Michał Chojnowski
876017efee mvcc_test: fix a benign failure of test_apply_to_incomplete_respects_continuity
For performance reasons, mutation_partition_v2::maybe_drop(), and by extension
also mutation_partition_v2::apply_monotonically(mutation_partition_v2&&)
can evict empty row entries, and hence change the continuity of the merged
entry.

For checking that apply_to_incomplete respects continuity,
test_apply_to_incomplete_respects_continuity obtains the continuity of
the partition entry before and after apply_to_incomplete by calling
e.squashed().get_continuity(). But squashed() uses apply_monotonically(),
so in some circumstances the result of squashed() can have smaller
continuity than the argument of squashed(), which messes with the thing
that the test is trying to check, and causes spurious failures.

This patch changes the method of calculating the continuity set,
so that it matches the entry exactly, fixing the test failures.

Fixes scylladb/scylladb#13757

Closes scylladb/scylladb#21459

(cherry picked from commit 35921eb67e)

Closes scylladb/scylladb#21497
2024-11-08 15:32:24 +01:00
Yaron Kaikov
9eed1d1cbd .github/scripts/auto-backport.py: update method to get closed prs
`commit.get_pulls()` in PyGithub returns pull requests that are directly associated with the given commit

Since in a closed PR the relevant commit is an event type, the backport
automation didn't get the PR info for backporting.

Ref: https://github.com/scylladb/scylladb/issues/18973

Closes scylladb/scylladb#21468

(cherry picked from commit ef104b7b96)

Closes scylladb/scylladb#21483
2024-11-08 10:26:10 +02:00
Yaron Kaikov
d33538bdd4 .github/script/auto-backport.py: push backport PR to scylladbbot fork
Since Scylla is a public repo, when we create a fork, it doesn't fork the team and permissions (unlike private repos where it does).

When we have a backport PR with conflicts, the developers need to be able to update the branch to fix the conflicts. To do so, we modified the logic of the backport automation as follows:

- Every backport PR (with and without conflicts) will be open directly on the `scylladbbot` fork repo
- When there are conflicts, an email will be sent to the original PR author with an invitation to become a contributor in the `scylladbbot` fork with `push` permissions. This will happen only once, if the author is not yet a contributor.
- Together with sending the invite, all backport labels will be removed and a comment will be added to the original PR with instructions
- The PR author must add the backport labels after the invitation is accepted

Fixes: https://github.com/scylladb/scylladb/issues/18973

Closes scylladb/scylladb#21401

(cherry picked from commit 77604b4ac7)

Closes scylladb/scylladb#21466
2024-11-07 12:38:56 +02:00
Yaron Kaikov
073c9cbaa1 github: add script for backports automation instead of Mergify
Adding an auto-backport.py script to handle backport automation instead of Mergify.

The rules of backport are as follows:

* Merged or Closed PRs with any backport/x.y label (one or more) and promoted-to-master label
* Backport PR will be automatically assigned to the original PR author
* In case of conflicts the backport PR will be opened in the original author's fork in draft mode. This will give the PR owner the option to resolve conflicts and push those changes to the PR branch (Today in Scylla when we have conflicts, the developers are forced to open another PR and manually close the backport PR opened by Mergify)
* Fix cherry-picking of the wrong commit SHA. With the new script, we always take the SHA from the stable branch
* Support backport for enterprise releases (from Enterprise branch)

Fixes: https://github.com/scylladb/scylladb/issues/18973
(cherry picked from commit f9e171c7af)

Closes scylladb/scylladb#21469
2024-11-07 06:57:05 +02:00
Tomasz Grabiec
a3a0ffbcd0 Merge 'tablet: Fix single-sstable split when attaching new unsplit sstables' from Raphael "Raph" Carvalho
To fix a race between split and repair here c1de4859d8, a new sstable
  generated during streaming can be split before being attached to the sstable
  set. That's to prevent an unsplit sstable from reaching the set after the
  tablet map is resized.

  So we can think this split is an extension of the sstable writer. A failure
  during split means the new sstable won't be added. Also, the duration of split
  is also adding to the time erm is held. For example, repair writer will only
  release its erm once the split sstable is added into the set.

  This single-sstable split is going through run_custom_job(), which serializes
  with other maintenance tasks. That was a terrible decision, since the split may
  have to wait for ongoing maintenance task to finish, which means holding erm
  for longer. Additionally, if split monitor decides to run split on the entire
  compaction group, it can cause single-sstable split to be aborted since the
  former wants to select all sstables, propagating a failure to the streaming
  writer.
  That results in new sstable being leaked and may cause problems on restart,
  since the underlying tablet may have moved elsewhere or multiple splits may
  have happened. We have some fragility today in cleaning up leaked sstables on
  streaming failure, but this single-sstable split made it worse since the
  failure can happen during normal operation, when there's e.g. no I/O error.

  It makes sense to kill run_custom_job() usage, since the single-sstable split
  is offline and an extension of sstable writing, therefore it makes no sense to
  serialize with maintenance tasks. It must also inherit the sched group of the
  process writing the new sstable. The inheritance happens today, but is fragile.

  Fixes #20626.

Closes scylladb/scylladb#20737

* github.com:scylladb/scylladb:
  tablet: Fix single-sstable split when attaching new unsplit sstables
  replica: Fix tablet split execute after restart

(cherry picked from commit bca8258150)

Ref scylladb/scylladb#21415
2024-11-06 15:01:35 +02:00
Botond Dénes
8bf76d6be7 Merge '[Backport 6.2] replica: Fix tombstone GC during tablet split preparation' from Raphael Raph Carvalho
During split prepare phase, there will be more than 1 compaction group with
overlapping token range for a given replica.

Assume tablet 1 has sstable A containing deleted data, and sstable B containing
a tombstone that shadows data in A.

Then split starts:

1) sstable B is split first, and moved from main (unsplit) group to a
split-ready group
2) now compaction runs in split-ready group before sstable A is split

tombstone GC logic today only looks at underlying group, so compaction in step
2 will disregard the deleted data in A, since it belongs to another group (the
unsplit one), and so the tombstone can be purged incorrectly.

To fix it, compaction will now work with all uncompacting sstables that belong
to the same replica, since tombstone GC requires all sstables that possibly
contain shadowed data to be available for correct decision to be made.

Fixes https://github.com/scylladb/scylladb/issues/20044.

Branches 6.0, 6.1 and 6.2 are vulnerable, so backport is needed.

(cherry picked from commit bcd358595f)

(cherry picked from commit 93815e0649)

Refs https://github.com/scylladb/scylladb/pull/20939

Closes scylladb/scylladb#21206

* github.com:scylladb/scylladb:
  replica: Fix tombstone GC during tablet split preparation
  service: Improve error handling for split
2024-11-06 09:55:47 +02:00
Raphael S. Carvalho
1e51ed88c6 replica: Fix tombstone GC during tablet split preparation
During split prepare phase, there will be more than 1 compaction group with
overlapping token range for a given replica.

Assume tablet 1 has sstable A containing deleted data, and sstable B containing
a tombstone that shadows data in A.

Then split starts:
1) sstable B is split first, and moved from main (unsplit) group to a
split-ready group
2) now compaction runs in split-ready group before sstable A is split

tombstone GC logic today only looks at underlying group, so compaction in step
2 will disregard the deleted data in A, since it belongs to another group (the
unsplit one), and so the tombstone can be purged incorrectly.

To fix it, compaction will now work with all uncompacting sstables that belong
to the same replica, since tombstone GC requires all sstables that possibly
contain shadowed data to be available for correct decision to be made.
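
The scenario can be sketched in Python as follows. This is a heavily simplified model, not the compaction code: `SSTable`, `can_purge_tombstone` and the timestamps are hypothetical, and gc grace periods are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    name: str
    data: dict = field(default_factory=dict)  # key -> write timestamp of live data


def can_purge_tombstone(key, tombstone_ts, other_sstables):
    """A tombstone may be dropped only if no sstable outside the compaction
    still holds data for the key that the tombstone shadows."""
    return not any(
        key in s.data and s.data[key] <= tombstone_ts
        for s in other_sstables
    )


# sstable A (still in the unsplit group) holds data written at ts=5;
# the tombstone in sstable B (already in a split-ready group) has ts=10.
sstable_a = SSTable("A", {"k": 5})

split_ready_group_only = []        # what the old logic looked at
whole_replica = [sstable_a]        # what the fix looks at
```

With `split_ready_group_only` the tombstone looks purgeable and the data in A would be resurrected; with `whole_replica` the purge is refused.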

Fixes #20044.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 93815e0649)
2024-11-04 14:24:18 -03:00
Raphael S. Carvalho
ca5f938ed4 service: Improve error handling for split
Retry wasn't really happening since the loop was broken and sleep
part was skipped on error. Also, we were treating abort of split
during shutdown as if it were an actual error and that confused
longevity tests that parse for logs with error level. The fix is
about demoting the level of logs when we know the exception comes
from shutdown.
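
A minimal Python sketch of the intended retry behaviour, under stated assumptions: `ShutdownError`, `run_with_retry`, the bounded attempt count and the delay are all hypothetical and only illustrate sleeping before the next attempt and logging shutdown aborts below error level.

```python
import logging
import time

log = logging.getLogger("split_sketch")


class ShutdownError(Exception):
    """Hypothetical marker for an abort caused by shutdown."""


def run_with_retry(op, retry_delay=1.0, max_attempts=5, sleep=time.sleep):
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except ShutdownError as exc:
            # Expected during shutdown: log quietly and stop retrying.
            log.debug("split aborted by shutdown: %s", exc)
            return None
        except Exception as exc:
            log.error("split attempt %d failed: %s", attempt, exc)
            sleep(retry_delay)  # back off, then continue with the next attempt
    raise RuntimeError(f"giving up after {max_attempts} attempts")
```

The `sleep` parameter exists only so the back-off can be observed or replaced in a test.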

Fixes #20890.

(cherry picked from commit bcd358595f)
2024-11-04 14:22:08 -03:00
Botond Dénes
fb20ea7de1 Merge '[Backport 6.2] tasks: fix virtual tasks children' from ScyllaDB
Fix how regular tasks that have a virtual parent are created
in task_manager::module::make_task: set sequence number
of a task and subscribe to module's abort source.

Fixes: #21278.

Needs backport to 6.2

(cherry picked from commit 1eb47b0bbf)

(cherry picked from commit 910a6fc032)

Refs #21280

Closes scylladb/scylladb#21332

* github.com:scylladb/scylladb:
  tasks: fix sequence number assignment
  tasks: fix abort source subscription of virtual task's child
2024-11-04 18:18:35 +02:00
Tzach Livyatan
d5eb12c25d Update os-support-info.rst - add CentOS
ScyllaDB supports RHEL 9 and derivatives, including CentOS 9.

Fix https://github.com/scylladb/scylladb/issues/21309

(cherry picked from commit 1878af9399)

Closes scylladb/scylladb#21331
2024-11-04 18:17:46 +02:00
Aleksandra Martyniuk
291f568585 test: repair: drop log checks from test_repair_succeeds_with_unitialized_bm
Currently, test_repair_succeeds_with_unitialized_bm checks whether
repair finishes successfully and the error is properly handled
if batchlog_manager isn't initialized. Error handling depends on
logs, making the test fragile to external conditions and flaky.

Drop the error handling check, successful repair is a sufficient
passing condition.

Fixes: #21167.
(cherry picked from commit 85d9565158)

Closes scylladb/scylladb#21330
2024-11-04 18:16:55 +02:00
Botond Dénes
d5475fbc07 Merge '[Backport 6.2] repair: Fix finished ranges metrics for removenode' from ScyllaDB
The skipped ranges should be multiplied by the number of tables

Otherwise the finished ranges ratio will not reach 100%.

Fixes #21174

(cherry picked from commit cffe3dc49f)

(cherry picked from commit 1392a6068d)

(cherry picked from commit 9868ccbac0)

Refs #21252

Closes scylladb/scylladb#21313

* github.com:scylladb/scylladb:
  test: Add test_node_ops_metrics.py
  repair: Make the ranges more consistent in the log
  repair: Fix finished ranges metrics for removenode
2024-11-04 18:16:21 +02:00
Anna Stuchlik
6916dbe822 doc: remove the Cassandra references from nodetool
This PR removes the reference to Cassandra from the nodetool index,
as the native nodetool is no longer a fork.

In addition, it removes the Apache copyright.

Fixes https://github.com/scylladb/scylladb/issues/21238

(cherry picked from commit ef4bcf8b3f)

Closes scylladb/scylladb#21307
2024-11-04 18:15:36 +02:00
Michał Jadwiszczak
f51a8ed541 test/auth_cluster/test_raft_service_levels: match enterprise SL limit
Although OSS doesn't limit the number of created service levels, match the
enterprise limit to decrease divergence in the test between OSS and
enterprise.

Fixes scylladb/scylladb#21044

(cherry picked from commit 846d94134f)

Closes scylladb/scylladb#21282
2024-11-04 18:14:38 +02:00
Calle Wilund
127606f788 cql_test_env/gossip: Prevent double shutdown call crash
Fixes #21159

When an exception is thrown in sstable write etc such that
storage_manager::isolate is initiated, we start a shutdown chain
for message service, gossip etc. These are synced (properly) in
storage_manager::stop, but if we somehow call gossiper::shutdown
outside the normal service::stop cycle, we can end up running the
method simultaneously, intertwined (missing the guard because of
the state change between check and set). We then end up co_awaiting
an invalid future (_failure_detector_loop_done) - a second wait.

Fixed by
a.) Remove superfluous gossiper::shutdown in cql_test_env. This was added
    in 20496ed, ages ago. However, it should not be needed nowadays.
b.) Ensure _failure_detector_loop_done is always waitable. Just to be sure.

(cherry picked from commit c28a5173d9)

Closes scylladb/scylladb#21393
2024-11-04 16:52:42 +01:00
Benny Halevy
56a0fa922d storage_service: on_change: update_peer_info only if peer info changed
Return an optional peer_info from get_peer_info_for_update
when the `app_state_map` arg does not change peer_info,
so that we can skip calling update_peer_info, if it didn't
change.
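
The skip-if-unchanged idea can be sketched in Python. The names are hypothetical stand-ins: `TRACKED` is an invented mapping of application states to peer columns, and the functions only mimic the shape of get_peer_info_for_update and on_change.

```python
from typing import Callable, Optional

# Hypothetical mapping from application state name to peer-info column.
TRACKED = {"DC": "data_center", "RACK": "rack", "RELEASE_VERSION": "release_version"}


def get_peer_info_for_update(app_state_map: dict) -> Optional[dict]:
    """Return the peer-info fields touched by the change, or None if there are none."""
    info = {col: app_state_map[state] for state, col in TRACKED.items() if state in app_state_map}
    return info or None


def on_change(app_state_map: dict, update_peer_info: Callable[[dict], None]) -> None:
    info = get_peer_info_for_update(app_state_map)
    if info is not None:  # skip the update when nothing relevant changed
        update_peer_info(info)
```

A change that carries only untracked states therefore never reaches `update_peer_info`.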

Fixes scylladb/scylladb#20991
Refs scylladb/scylladb#16376

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#21152

(cherry picked from commit 04d741bcbb)
2024-11-04 11:20:32 +02:00
Benny Halevy
c841a4a851 compaction_manager: compaction_disabled: return true if not in compaction_state
When a compaction_group is removed via `compaction_manager::remove`,
it is erased from `_compaction_state`, and therefore compaction
is definitely not enabled on it.

This triggers an internal error if tablets are cleaned up
during drop/truncate, which checks that compaction is disabled
in all compaction groups.

Note that the callers of `compaction_disabled` aren't really
interested in compaction being actively disabled on the
compaction_group, but rather if it's enabled or not.
A follow-up patch can be considered to reverse the logic
and expose `compaction_enabled` rather than `compaction_disabled`.
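
A short Python sketch of the changed predicate, assuming a hypothetical `CompactionManager` whose `_compaction_state` maps each group to a count of active disable requests (the real state object is richer):

```python
class CompactionManager:
    def __init__(self):
        # group -> number of active "disable compaction" requests (assumed model)
        self._compaction_state = {}

    def add(self, group):
        self._compaction_state[group] = 0

    def remove(self, group):
        self._compaction_state.pop(group, None)

    def disable(self, group):
        self._compaction_state[group] += 1

    def compaction_disabled(self, group):
        state = self._compaction_state.get(group)
        # A removed (unknown) group cannot be compacted, so report it as disabled.
        return state is None or state > 0
```

With this definition, a check that compaction is disabled on every group also passes for groups that were already removed.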

Fixes scylladb/scylladb#20060

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 1c55747637)

Closes scylladb/scylladb#21404
2024-11-03 16:05:05 +02:00
Gleb Natapov
1a9721e93e topology coordinator: take a copy of a replication state in raft_topology_cmd_handler
Current code takes a reference and holds it past preemption points. And
while the state itself is not supposed to change the reference may
become stale because the state is re-created on each raft topology
command.

Fix it by taking a copy instead. This is a slow path anyway.

Fixes: scylladb/scylladb#21220
(cherry picked from commit fb38bfa35d)

Closes scylladb/scylladb#21361
2024-10-30 14:11:17 +01:00
Kamil Braun
1dded7e52f Merge '[Backport 6.2] fix nodetool status to show zero-token nodes' from ScyllaDB
In the current scenario, the nodetool status doesn’t display information regarding zero token nodes. For example, if 5 nodes are spun by the administrator, out of which, 2 nodes are zero token nodes, then nodetool status only shows information regarding the 3 non-zero token nodes.

This commit intends to fix this issue by leveraging the “/storage_service/host_id” API and adding appropriate logic in scylla-nodetool.cc to support zero token nodes.

A test is also added in nodetool/test_status.py to verify this logic. This test fails without this commit’s zero token node support logic, hence verifying the behavior.

This PR fixes a bug. Hence we need to backport it. Backporting needs to be done only
to 6.2 version, since earlier versions don't support zero token nodes.

Fixes: scylladb/scylladb#19849
Fixes: scylladb/scylladb#17857

(cherry picked from commit 72f3c95a63)

(cherry picked from commit 39dfd2d7ac)

(cherry picked from commit c00d40b239)

Refs scylladb/scylladb#20909

Closes scylladb/scylladb#21334

* github.com:scylladb/scylladb:
  fix nodetool status to show zero-token nodes
  test: move `wait_for_first_completed` to pylib/util.py
  token_metadata: rename endpoint_to_host_id_map getter and add support for joining nodes
2024-10-29 10:50:35 +01:00
Abhinav
9082d66d8a fix nodetool status to show zero-token nodes
In the current scenario, the nodetool status doesn’t display information
regarding zero token nodes. For example, if 5 nodes are spun by the
administrator, out of which, 2 nodes are zero token nodes, then nodetool
status only shows information regarding the 3 non-zero token nodes.

This commit intends to fix this issue by leveraging the “/storage_service/host_id”
API and adding appropriate logic in scylla-nodetool.cc to support zero token nodes.

Robust topology tests are added, which spin up scylla nodes and confirm nodetool
status output for various cases, providing good coverage.
A test is also added in nodetool/test_status.py to verify this logic. These tests fail
without this commit’s zero token node support logic, hence verifying the behavior.

The test `test_status_keyspace_joining_node` has been removed. This test is
based on case where host_id=None, which is impossible. Since we now use
host_id_map for node discovery in nodetool, the nodes with "host_id=None"
go undetected. Since this case is anyway impossible, we can get rid of this.

This PR fixes a bug. Hence we need to backport it. Backporting needs to be done only
to 6.2 version, since earlier versions don't support zero token nodes.

Fixes: scylladb/scylladb#19849
(cherry picked from commit c00d40b239)
2024-10-28 21:33:55 +00:00
Abhinav
c7a0876a73 test: move wait_for_first_completed to pylib/util.py
This function is needed in a new test added in the next commit and this
refactoring avoids code duplication.

(cherry picked from commit 39dfd2d7ac)
2024-10-28 21:33:55 +00:00
Abhinav
917d40e600 token_metadata: rename endpoint_to_host_id_map getter and add support for joining nodes
Rename host_id map getter, 'get_endpoint_to_host_id_map_for_reading' to 'get_endpoint_to_host_id_map_'
Also modify the getter to return information regarding joining nodes as well.

This getter will later be used for retrieving the nodes in nodetool status, hence it needs to show all nodes,
including joining ones.

The function name suffix `_for_reading` suggests that the function was used
in some other places in the past, and indeed if we need endpoints
"for reading" then we cannot show joining endpoints. But it was confirmed
that this function is currently only used by "/storage_service/host_id" endpoint,
hence it can be modified as required.

Fixes: scylladb/scylladb#17857
(cherry picked from commit 72f3c95a63)
2024-10-28 21:33:54 +00:00
Aleksandra Martyniuk
1fd60424d9 tasks: fix sequence number assignment
Currently, children of virtual tasks do not have sequence number
assigned. Fix it.

(cherry picked from commit 910a6fc032)
2024-10-28 21:32:49 +00:00
Aleksandra Martyniuk
af6ddebc7f tasks: fix abort source subscription of virtual task's child
Currently, if a regular task does not have a parent or its parent
is a virtual task then it subscribes to module's abort source
in task_manager::task::impl constructor. However, at this point
the kind of the task's parent isn't set. Due to that, children
of virtual tasks aren't aborted on shutdown.

Subscribe to module's abort source in task::impl::set_virtual_parent.

(cherry picked from commit 1eb47b0bbf)
2024-10-28 21:32:49 +00:00
Tomasz Grabiec
fa71b82da4 node-exporter: Disable hwmon collector
This collector reads nvme temperature sensor, which was observed to
cause bad performance on Azure cloud following the reading of the
sensor for ~6 seconds. During the event, we can see elevated system
time (up to 30%) and softirq time. CPU utilization is high, with
nvme_queue_rq taking several orders of magnitude more time than
normally. There are signs of contention, we can see
__pv_queued_spin_lock_slowpath in the perf profile, called. This
manifests as latency spikes and potentially also throughput drop due
to reduced CPU capacity.

By default, the monitoring stack queries it once every 60s.

(cherry picked from commit 93777fa907)

Closes scylladb/scylladb#21304
2024-10-28 15:05:06 +01:00
Asias He
1a5a6a0758 test: Add test_node_ops_metrics.py
It tests the node_ops_metrics_done metric reaches 100% when a node ops
is done.

Refs: #21174
(cherry picked from commit 9868ccbac0)
2024-10-28 09:54:30 +00:00
Asias He
6ae5481de4 repair: Make the ranges more consistent in the log
Consider the number of tables for the number of ranges logging. Make it
more consistent with the log when the ops starts.

(cherry picked from commit 1392a6068d)
2024-10-28 09:54:30 +00:00
Asias He
0bc22db3a9 repair: Fix finished ranges metrics for removenode
The skipped ranges should be multiplied by the number of tables.

Otherwise the finished ranges ratio will not reach 100%.
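
A worked example of the arithmetic, as a Python sketch. It assumes the metric counts one unit per (range, table) pair, which is a reading of this commit message rather than of the repair code; `finished_ratio` and the numbers are hypothetical.

```python
def finished_ratio(finished, skipped_ranges, nr_tables, total_ranges):
    """Share of (range, table) units accounted for, between 0 and 1."""
    total_units = total_ranges * nr_tables
    skipped_units = skipped_ranges * nr_tables  # a skipped range is skipped for every table
    return (finished + skipped_units) / total_units


# 10 ranges, 3 tables: 4 ranges are skipped, the other 6 are repaired for each table.
with_fix = finished_ratio(finished=18, skipped_ranges=4, nr_tables=3, total_ranges=10)
without_fix = (18 + 4) / (10 * 3)  # skipped ranges counted once instead of per table
```

Under these assumptions `with_fix` reaches 1.0, while `without_fix` stalls at about 0.73.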

Fixes #21174

(cherry picked from commit cffe3dc49f)
2024-10-28 09:54:30 +00:00
Botond Dénes
b78675270e streaming: stream-session: switch to tracking permit
The stream-session is the receiving end of streaming, it reads the
mutation fragment stream from an RPC stream and writes it onto the disk.
As such, this part does no disk IO and therefore, using a permit with
count resources is superfluous. Furthermore, after
d98708013c, the count resources on this
permit can cause a deadlock on the receiver end, via the
`db::view::check_view_update_path()`, which wants to read the content of
a system table and therefore has to obtain a permit of its own.

Switch to a tracking-only permit, primarily to resolve the deadlock, but
also because admission is not necessary for a read which does no IO.

Refs: scylladb/scylladb#20885 (partial fix, solves only one of the deadlocks)
Fixes: scylladb/scylladb#21264
(cherry picked from commit dbb26da2aa)

Closes scylladb/scylladb#21303
2024-10-28 08:07:05 +02:00
Jenkins Promoter
ea6fe4bfa1 Update ScyllaDB version to: 6.2.1 2024-10-27 12:06:35 +02:00
Botond Dénes
30a2ed7488 Merge '[Backport 6.2] cql/tablets: fix retrying ALTER tablets KEYSPACE' from Marcin Maliszkiewicz
ALTER tablets-enabled KEYSPACES (KS) may fail due to
group0_concurrent_modification, in which case it's repeated by a for
loop surrounding the code. But because raft's add_entry consumes the
raft's guard (by std::move'ing the guard object), retries of ALTER KS
will use a moved-from guard object, which is UB, potentially a crash.
The fix is to remove the aforementioned for loop altogether and rethrow the exception, as the rf_change event
will be repeated by the topology state machine if it receives the
concurrent modification exception, because the event will remain present
in the global requests queue, hence it's going to be executed as the
very next event.
Note: refactor is implemented in the follow-up commit.

Fixes: https://github.com/scylladb/scylladb/issues/21102

Should be backported to every 6.x branch, as it may lead to a crash.

(cherry picked from commit de511f56ac)

(cherry picked from commit 3f4c8a30e3)

(cherry picked from commit 522bede8ec)

Refs https://github.com/scylladb/scylladb/pull/21121

Closes scylladb/scylladb#21256

* github.com:scylladb/scylladb:
  test: topology: add disable_schema_agreement_wait utility function
  test: add UT to test retrying ALTER tablets KEYSPACE
  cql/tablets: fix indentation in `rf_change` event handler
  cql/tablets: fix retrying ALTER tablets KEYSPACE
2024-10-25 10:57:36 +03:00
Botond Dénes
dcddb1ff4a Merge '[Backport 6.2] multishard reader: make it safe to create with admitted permits' from ScyllaDB
Passing an admitted permit -- i.e. one with count resources on it -- to the multishard reader, will possibly result in a deadlock, because the permit of the multishard reader is destroyed after the permits of its child readers. Therefore its semaphore resources won't be automatically released until children acquire their own resources. This creates a dependency (an edge in the "resource allocation graph"), where the semaphore used by the multishard reader depends on the semaphores used by children. When such dependencies create a cycle, and permits are acquired by different reads in just the right order, a deadlock will happen.

Users of the multishard reader have to be aware of this gotcha -- and of course they aren't. This is small wonder, considering that not even the documentation on the multishard reader mentions this problem. To work around this, the user has to call `reader_permit::release_base_resources()` on the permit, before passing it to the multishard reader. On multiple occasions, developers (including the very author of the multishard reader), forgot or didn't know about this and this resulted in deadlocks down the line. This is a design-flaw of the multishard reader, which is addressed in this PR, after which, it is safe to pass admitted or not admitted permits to the multishard reader, it will handle the call to `release_base_resources()` if needed.

After fixing the problem in the multishard reader, the existing calls to `release_base_resources()` on permits passed to multishard readers are removed. A test is added which reproduces the problem and ensures we don't regress.

Refs: https://github.com/scylladb/scylladb/issues/20885 (partial fix, there is another deadlock in that issue, which this PR doesn't fix)
Fixes: https://github.com/scylladb/scylladb/issues/21263

This fixes (indirectly) a regression introduced by d98708013c so it has to be backported to 6.2

(cherry picked from commit e1d8cddd09)

Refs scylladb/scylladb#21058

Closes scylladb/scylladb#21178

* github.com:scylladb/scylladb:
  test/boost/mutation_test: add test for multishard permit safety
  test/lib/reader_lifecycle_policy: add semaphore factory to constructor
  test/lib/reader_lifecycle_policy: rename factory_function
  repair/row_level: drop now unneeded release_base_resource() calls
  readers/multishard: make multishard reader safe to create with admitted permits
2024-10-25 09:32:03 +03:00
Piotr Dulikowski
4ca0e31415 test/test_view_build_status: properly wait for v2 in migration test
The test_view_build_status_migration_to_v2 test case creates a new view
(vt2) after performing the view_build_status -> view_build_status_v2
migration and waits until it is built by `wait_for_view_v2` function. It
works by waiting until a SELECT from view_build_status_v2 will return
the expected number of rows for a given view.

However, if the host parameter is unspecified, it will query only one
node on each attempt. Because `view_build_status_v2` is managed via
raft, queries always return data from the queried node only. It might
happen that `wait_for_view_v2` fetches expected results from one node
while a different node might be lagging behind the group0 coordinator
and might not have all data yet.

In case of test_view_build_status_migration_to_v2 this is a problem - it
first uses `wait_for_view_v2` to wait for view, later it queries
`view_build_status_v2` on a random node and asserts its state - and
might fail because that node didn't have the newest state yet.

Fix the issue by issuing `wait_for_view_v2` in parallel for all nodes in
the cluster and waiting until all nodes have the most recent state.
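
A minimal sketch of the "wait on every node in parallel" idea, using
asyncio. `FakeNode`, `wait_for_view_on` and `wait_for_view_on_all` are
hypothetical names made up for illustration; they are not the helpers
used by the real test.

```python
import asyncio


class FakeNode:
    """Hypothetical node that only sees the view rows after `lag` polls."""

    def __init__(self, lag):
        self.lag = lag
        self.polls = 0

    async def count_view_rows(self):
        self.polls += 1
        return 1 if self.polls > self.lag else 0


async def wait_for_view_on(node, expected_rows, attempts=100, period=0.001):
    # Poll one specific node until it reports the expected number of rows.
    for _ in range(attempts):
        if await node.count_view_rows() >= expected_rows:
            return
        await asyncio.sleep(period)
    raise TimeoutError("view not built on node in time")


async def wait_for_view_on_all(nodes, expected_rows):
    # One waiter per node, run concurrently; returns only once every node
    # has caught up, so a later query to any node sees the final state.
    await asyncio.gather(*(wait_for_view_on(n, expected_rows) for n in nodes))
```

For example, `asyncio.run(wait_for_view_on_all([FakeNode(0), FakeNode(5)], 1))`
returns only after the lagging node has also reported the row.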

Fixes: scylladb/scylladb#21060
(cherry picked from commit a380a2efd9)

Closes scylladb/scylladb#21129
2024-10-24 16:42:53 +03:00
Raphael S. Carvalho
363bc7424e locator: Always preserve balancing_enabled in tablet_metadata::copy()
When there are zero tablets, tablet_metadata::_balancing_enabled
is ignored in the copy.

The property not being preserved can result in balancer not
respecting user's wish to disable balancing when a replica is
created later on.
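
One plausible shape of such a bug, sketched in Python. This is an
assumption for illustration only: the real `tablet_metadata::copy()` is
C++ and may look different, and `TabletMetadata`, `copy_buggy` and
`copy_fixed` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TabletMetadata:
    tablets: dict = field(default_factory=dict)  # table name -> tablet map
    balancing_enabled: bool = True


def copy_buggy(src):
    dst = TabletMetadata()
    for table, tablet_map in src.tablets.items():
        dst.tablets[table] = dict(tablet_map)
        # Only reached when there is at least one tablet map to copy.
        dst.balancing_enabled = src.balancing_enabled
    return dst


def copy_fixed(src):
    # The flag is copied unconditionally, before looking at the tablets.
    dst = TabletMetadata(balancing_enabled=src.balancing_enabled)
    for table, tablet_map in src.tablets.items():
        dst.tablets[table] = dict(tablet_map)
    return dst
```

With an empty `TabletMetadata(balancing_enabled=False)`, `copy_buggy`
silently falls back to the default, whereas `copy_fixed` keeps balancing
disabled.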

Fixes #21175.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit dfc217f99a)

Closes scylladb/scylladb#21190
2024-10-24 16:37:41 +03:00
Botond Dénes
a5b11a3189 test/boost/mutation_test: add test for multishard permit safety
Add a test checking that the multishard reader will not deadlock, when
created with an admitted permit, on a semaphore with a single count
resource.

(cherry picked from commit e1d8cddd09)
2024-10-24 09:18:11 -04:00
Botond Dénes
c0eba659f6 test/lib/reader_lifecycle_policy: add semaphore factory to constructor
Allow callers to specify how the semaphore is created and stopped,
instead of doing so via boolean flags as is done currently. The
flag-based approach doesn't scale, so use a factory instead.

(cherry picked from commit 5a3fd69374)
2024-10-24 09:18:11 -04:00
Botond Dénes
dbb1dc872d test/lib/reader_lifecycle_policy: rename factory_function
To reader_factory_function. We are about to add a new factory function
parameter, so the current factory_function has to be renamed to
something more specific.

(cherry picked from commit c8598e21e8)
2024-10-24 09:18:11 -04:00
Botond Dénes
07b288b7d7 repair/row_level: drop now unneeded release_base_resource() calls
The multishard reader now does this itself, no need to do it here.

(cherry picked from commit 76a5ba2342)
2024-10-24 09:18:11 -04:00
Botond Dénes
41a44ddc12 readers/multishard: make multishard reader safe to create with admitted permits
Passing an admitted permit -- i.e. one with count resources on it -- to
the multishard reader, will possibly result in a deadlock, because the
permit of the multishard reader is destroyed after the permits of its
child readers. Therefore its semaphore resources won't be automatically
released until children acquire their own resources.
This creates a dependency (an edge in the "resource allocation graph"),
where the semaphore used by the multishard reader depends on the
semaphores used by children. When such dependencies create a cycle, and
permits are acquired by different reads in just the right order, a
deadlock will happen.
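
The cycle condition can be illustrated with a small dependency-graph
check. This is only a sketch: the edge list and the semaphore names in
the usage note are hypothetical, and `has_cycle` is not part of the
codebase.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = dict.fromkeys(graph, UNVISITED)

    def visit(node):
        state[node] = IN_PROGRESS
        for nxt in graph[node]:
            if state[nxt] == IN_PROGRESS:
                return True  # back edge: we returned to a node on the path
            if state[nxt] == UNVISITED and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNVISITED and visit(n) for n in graph)
```

An edge `("a", "b")` here means "a read holding resources of semaphore a
waits for semaphore b". A single edge is harmless, but
`has_cycle([("a", "b"), ("b", "a")])` is true, which is the situation in
which reads can block each other forever.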

Users of the multishard reader have to be aware of this gotcha -- and of
course they aren't. This is small wonder, considering that not even the
documentation on the multishard reader mentions this problem.
To work around this, the user has to call
`reader_permit::release_base_resources()` on the permit, before passing
it to the multishard reader.
On multiple occasions, developers (including the very author of the
multishard reader), forgot or didn't know about this and this resulted
in deadlocks down the line.
This is a design-flaw of the multishard reader, which is addressed in
this patch, after which, it is safe to pass admitted or not admitted
permits to the multishard reader, it will handle the call to
`release_base_resources()` if needed.

(cherry picked from commit 218ea449a5)
2024-10-24 09:18:11 -04:00
Lakshmi Narayanan Sreethar
3f04df55eb [Backport 6.2] replica/table: check memtable before discarding tombstone during read
On the read path, the compacting reader is applied only to the sstable
reader. This can cause an expired tombstone from an sstable to be purged
from the request before it has a chance to merge with deleted data in
the memtable leading to data resurrection.

Fix this by checking the memtables before deciding to purge tombstones
from the request on the read path. A tombstone will not be purged if a
key exists in any of the table's memtables with a minimum live timestamp
that is lower than the maximum purgeable timestamp.

Fixes #20916

`perf-simple-query` stats before and after this fix :

`build/Dev/scylla perf-simple-query --smp=1 --flush` :
```
// Before this Fix
// ---------------
94941.79 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59393 insns/op,   24029 cycles/op,        0 errors)
97551.14 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59376 insns/op,   23966 cycles/op,        0 errors)
96599.92 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59367 insns/op,   23998 cycles/op,        0 errors)
97774.91 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59370 insns/op,   23968 cycles/op,        0 errors)
97796.13 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59368 insns/op,   23947 cycles/op,        0 errors)

         throughput: mean=96932.78 standard-deviation=1215.71 median=97551.14 median-absolute-deviation=842.13 maximum=97796.13 minimum=94941.79
instructions_per_op: mean=59374.78 standard-deviation=10.78 median=59369.59 median-absolute-deviation=6.36 maximum=59393.12 minimum=59367.02
  cpu_cycles_per_op: mean=23981.67 standard-deviation=32.29 median=23967.76 median-absolute-deviation=16.33 maximum=24029.38 minimum=23947.19

// After this Fix
// --------------
95313.53 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59392 insns/op,   24058 cycles/op,        0 errors)
97311.48 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59375 insns/op,   24005 cycles/op,        0 errors)
98043.10 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59381 insns/op,   23941 cycles/op,        0 errors)
96750.31 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59396 insns/op,   24025 cycles/op,        0 errors)
93381.21 tps ( 71.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   59390 insns/op,   24097 cycles/op,        0 errors)

         throughput: mean=96159.93 standard-deviation=1847.88 median=96750.31 median-absolute-deviation=1151.55 maximum=98043.10 minimum=93381.21
instructions_per_op: mean=59386.60 standard-deviation=8.78 median=59389.55 median-absolute-deviation=6.02 maximum=59396.40 minimum=59374.73
  cpu_cycles_per_op: mean=24025.13 standard-deviation=58.39 median=24025.17 median-absolute-deviation=32.67 maximum=24096.66 minimum=23941.22
```

This PR fixes a regression introduced in ce96b472d3 and should be backported to older versions.

Closes scylladb/scylladb#20985

* github.com:scylladb/scylladb:
  topology-custom: add test to verify tombstone gc in read path
  replica/table: check memtable before discarding tombstone during read
  compaction_group: track maximum timestamp across all sstables

(cherry picked from commit 519e167611)

Backported from #20985 to 6.2.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#21251
2024-10-24 15:33:39 +03:00
Marcin Maliszkiewicz
8c8f97c280 test: topology: add disable_schema_agreement_wait utility function
Code extracted from fa45fdf5f7 as it's being used by
test_alter_tablets_keyspace_concurrent_modification and we're
backporting it.
2024-10-24 11:24:56 +02:00
Piotr Smaron
a61ab7d02e test: add UT to test retrying ALTER tablets KEYSPACE
The newly added testcase is based on the already existing
`test_alter_dropped_tablets_keyspace`.
A new error injection is created, which stops the ALTER execution just
before the changes are submitted to RAFT. In the meantime, a new schema
change is performed using the 2nd node in the cluster, thus causing the
1st node to retry the ALTER statement.

(cherry picked from commit 522bede8ec)
2024-10-23 13:35:26 +00:00
Piotr Smaron
775578af59 cql/tablets: fix indentation in rf_change event handler
Just moved the code that previously was under a `for` loop by 1 tab, i.e. 4 spaces, to the left.

(cherry picked from commit 3f4c8a30e3)
2024-10-23 13:35:26 +00:00
Piotr Smaron
97f22f426f cql/tablets: fix retrying ALTER tablets KEYSPACE
ALTER tablets-enabled KEYSPACES (KS) may fail due to
`group0_concurrent_modification`, in which case it's repeated by a `for`
loop surrounding the code. But because raft's `add_entry` consumes the
raft's guard (by `std::move`'ing the guard object), retries of ALTER KS
will use a moved-from guard object, which is UB, potentially a crash.
The fix is to remove the aforementioned `for` loop altogether and rethrow the exception, as the `rf_change` event
will be repeated by the topology state machine if it receives the
concurrent modification exception, because the event will remain present
in the global requests queue, hence it's going to be executed as the
very next event.
`topology_coordinator::handle_topology_coordinator_error` handling the
case of `group0_concurrent_modification` has been extended with logging
in order not to write catch-log-throw boilerplate.
Note: refactor is implemented in the follow-up commit.

Fixes: scylladb/scylladb#21102
(cherry picked from commit de511f56ac)
2024-10-23 13:35:26 +00:00
Botond Dénes
55a9605687 Merge '[Backport 6.2] Check system.tablets update before putting it into the table' from ScyllaDB
Having tablet metadata with more than 1 pending replica will prevent this metadata from being (re)loaded due to sanity check on load. This patch fails the operation which tries to save the wrong metadata with a similar sanity check. For that, changes submitted to raft are validated, and if it's topology_change that affects system.tablets, the new "replicas" and "new_replicas" values are checked similarly to how they will be on (re)load.

Fixes #20043

(cherry picked from commit f09fe4f351)

(cherry picked from commit e5bf376cbc)

(cherry picked from commit 1863ccd900)

Refs #21020

Closes scylladb/scylladb#21111

* github.com:scylladb/scylladb:
  tablets: Validate system.tablets update
  group0_client: Introduce change validation
  group0_client: Add shared_token_metadata dependency
2024-10-23 10:00:39 +03:00
Pavel Emelyanov
83cc3e4791 tablets: Validate system.tablets update
Implement change validation for raft topology_change command. For now
the only check is that the "pending replicas" contains at most one
entry. The check mirrors similar one in `process_one_row` function.

If not passed, this prevents system.tablets from being updated with the
mutation(s) that will not be loaded later.
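
A rough sketch of the check, assuming that the pending replicas are the
entries of "new_replicas" that are not already in "replicas". The
function names and the `(host, shard)` tuple representation are
hypothetical.

```python
def pending_replicas(replicas, new_replicas):
    # Replicas that exist only in the new set are the ones being added.
    return set(new_replicas) - set(replicas)


def validate_tablet_update(replicas, new_replicas):
    pending = pending_replicas(replicas, new_replicas)
    if len(pending) > 1:
        # Reject the change before it is written, instead of failing on load.
        raise ValueError(f"too many pending replicas: {sorted(pending)}")
```

For instance, going from `[("h1", 0), ("h2", 0)]` to
`[("h1", 0), ("h3", 0)]` passes, while a change that introduces two new
replicas at once is rejected.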

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-22 14:45:51 +03:00
Pavel Emelyanov
aef7e7db0b group0_client: Introduce change validation
Add validate_change() methods (well, a template and an overload) that
are called by prepare_command() and are supposed to validate the
proposed change before it hits persistent storage

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-22 14:45:22 +03:00
Pavel Emelyanov
282cdfcfcc group0_client: Add shared_token_metadata dependency
It will be needed later to get tablet_metadata from.
The dependency is "OK", shared_token_metadata is low-level sharded
service. Client already references db::system_keyspace, which in turn
references replica::database which, finally, references token_metadata

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-22 14:45:12 +03:00
Daniel Reis
b661bc39df docs: fix redirect from cert-based auth to security/enable-auth page
(cherry picked from commit 28a265ccd8)

Closes scylladb/scylladb#21124
2024-10-22 09:10:04 +03:00
Botond Dénes
0805780064 Merge '[Backport 6.2] scylla_raid_setup: configure SELinux file context' from ScyllaDB
On RHEL9, systemd-coredump fails to coredump on /var/lib/scylla/coredump because the service only has write access with systemd_coredump_var_lib_t. To make it writable, we need to add a file context rule for /var/lib/scylla/coredump, and run restorecon on /var/lib/scylla.

Fixes #19325

(cherry picked from commit 56c971373c)

(cherry picked from commit 0ac450de05)

Refs #20528

Closes scylladb/scylladb#21211

* github.com:scylladb/scylladb:
  scylla_raid_setup: configure SELinux file context
  scylla_coredump_setup: fix SELinux configuration for RHEL9
2024-10-21 16:03:08 +03:00
Takuya ASADA
3de8885161 scylla_raid_setup: configure SELinux file context
On RHEL9, systemd-coredump fails to coredump on /var/lib/scylla/coredump
because the service only has write access with systemd_coredump_var_lib_t.
To make it writable, we need to add a file context rule for
/var/lib/scylla/coredump, and run restorecon on /var/lib/scylla.

Fixes #20573

(cherry picked from commit 0ac450de05)
2024-10-21 11:15:06 +00:00
Takuya ASADA
29a0ce3b0a scylla_coredump_setup: fix SELinux configuration for RHEL9
It seems that a specific version of the systemd package on RHEL9 has a
bug in its SELinux configuration: it introduced the
"systemd-container-coredump" module to provide rules for
systemd-coredump, but the module is not enabled by default.
We have to manually load it, otherwise it causes a permission error.

Fixes #19325

(cherry picked from commit 56c971373c)
2024-10-21 11:15:06 +00:00
Benny Halevy
eebf97c545 view: check_needs_view_update_path: get token_metadata_ptr
check_needs_view_update_path is async and might yield
so the token_metadata reference passed to it must be kept
alive throughout the call.

Fixes scylladb/scylladb#20979

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit eaa3b774a6)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#21038
2024-10-21 10:27:52 +02:00
Artsiom Mishuta
a728695d10 test.py: deselect remove_data_dir_of_dead_node event
deselect remove_data_dir_of_dead_node event from test_random_failures
due to issue #20751

(cherry picked from commit 9b0e15678e)

Closes scylladb/scylladb#21138
2024-10-17 11:38:35 +02:00
Piotr Smaron
82a34aa837 test: fix flaky test_multidc_alter_tablets_rf
The testcase is flaky due to a known python driver issue:
https://github.com/scylladb/python-driver/issues/317.
This issue causes the `CREATE KEYSPACE` statement to be sometimes
executed twice in a row, and the 2nd CREATE statement causes the test to
fail.
In order to work around it, it's enough to add `if not exists` when
creating a ks.

Fixes: #21034

Needs to be backported to all 6.x branches, as the PR introducing this flakiness is backported to every 6.x branch.

(cherry picked from commit f8475915fb)

Closes scylladb/scylladb#21107
2024-10-15 09:26:28 +03:00
Piotr Dulikowski
d10c6a86cc SCYLLA-VERSION-GEN: correct the logic for skipping SCYLLA-*-FILE
The SCYLLA-VERSION-GEN file skips updating the SCYLLA-*-FILE files if
the commit hash from SCYLLA-RELEASE-FILE is the same. The original
reason for this was to prevent the date in the version string from
changing if multiple modes are built across midnight
(scylladb/scylla-pkg#826). However - intentionally or not - it serves
another purpose: it prevents an infinite loop in the build process.

If the build.ninja file needs to be rebuilt, the configure.py script
unconditionally calls ./SCYLLA-VERSION-GEN. On the other hand, if one
of the SCYLLA-*-FILE files is updated then this triggers rebuild
of build.ninja. Apparently, this is sufficient for ninja to enter an
infinite loop.

However, the check assumes that the RELEASE is in the format

  <build identifier>.<date>.<commit hash>

and assumes that none of the components have a dot inside - otherwise it
breaks and just works incorrectly. Specifically, when building a private
version, it is recommended to set the build identifier to
`count.yourname`.

Previously, before 85219e9, this problem wasn't noticed most likely
because reconfigure process was broken and stopped overwriting
the build.ninja file after the first iteration.

Fix the problem by fixing the logic that extracts the commit hash -
instead of looking at the third dot-separated field counting from the
left side, look at the last field.
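
The difference between the two extraction strategies, sketched in Python
(the real logic lives in the SCYLLA-VERSION-GEN shell script; the
function names and the sample release strings below are made up for
illustration):

```python
def commit_hash_third_field(release):
    # Old logic: assume exactly <build identifier>.<date>.<commit hash>.
    return release.split(".")[2]


def commit_hash_last_field(release):
    # New logic: the hash is always the last dot-separated field.
    return release.rsplit(".", 1)[-1]
```

Both functions agree on a release such as `0.20241015.d10c6a86cc`, but
with a dotted build identifier, e.g. `1.yourname.20241015.d10c6a86cc`,
the old logic picks up the date while the new one still finds the hash.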

Fixes: scylladb/scylladb#21027
(cherry picked from commit 64ca58125e)

Closes scylladb/scylladb#21103
2024-10-15 09:26:00 +03:00
Botond Dénes
554838691b Merge '[Backport 6.2] compaction: fix potential data resurrection with file-based migration' from Ferenc Szili
This is a manual backport of #20788

When tablets are migrated with file-based streaming, we can have a situation where a tombstone is garbage collected before the data it shadows lands. For instance, if we have a tablet replica with 3 sstables:

1. sstable containing an expired tombstone
2. sstable with additional data
3. sstable containing data which is shadowed by the expired tombstone in sstable 1

If this tablet is migrated, and the sstables are streamed in the order listed above, the first two sstables can be compacted before the third sstable arrives. In that case, the expired tombstone will be garbage collected, and data in the third sstable will be resurrected after it arrives to the pending replica.

This change fixes this problem by disabling tombstone garbage collection for pending replicas.
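
The scenario can be reproduced with a toy model. This is a deliberately
simplified sketch, not how sstables or compaction are implemented: an
sstable is a dict of `key -> (timestamp, value)`, every tombstone is
treated as already expired, and all names are hypothetical.

```python
TOMBSTONE = object()  # marker value for a deletion


def compact(sstables, gc_enabled):
    # Keep the newest cell per key; optionally drop (expired) tombstones.
    merged = {}
    for sst in sstables:
        for key, cell in sst.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    if gc_enabled:
        merged = {k: c for k, c in merged.items() if c[1] is not TOMBSTONE}
    return merged


def read(sstables, key):
    # The newest cell wins; a tombstone means the key reads as deleted.
    newest = max(
        (sst[key] for sst in sstables if key in sst),
        key=lambda cell: cell[0],
        default=None,
    )
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]
```

With `sst1 = {"k": (10, TOMBSTONE)}`, `sst2 = {"other": (5, "x")}` and
`sst3 = {"k": (3, "old")}`, compacting the first two with GC enabled and
then adding `sst3` makes `read(..., "k")` return the deleted value again.
With GC disabled the tombstone survives and the key stays deleted.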

This fixes a problem in Enterprise, but the change is in OSS in order to have as few differences between OSS and Enterprise and to have a common infrastructure for disabling tombstone GC on pending replicas.

Fixes #21090

Closes scylladb/scylladb#21061

* github.com:scylladb/scylladb:
  test: test tombstone GC disabled on pending replica
  tablet_storage_group_manager: update tombstone_gc_enabled in compaction group
  database::table: add tombstone_gc_enabled(locator::tablet_id)
2024-10-15 09:25:22 +03:00
Kefu Chai
b691dddf6b install.sh: install seastar/scripts/addr2line.py as well
seastar extracted `addr2line` python module out back in
e078d7877273e4a6698071dc10902945f175e8bc. but `install.sh` was
not updated accordingly. it still installs `seastar-addr2line`
without installing its new dependency. this leaves us with a
broken `seastar-addr2line` in the relocatable tarball.
```console
$ /opt/scylladb/scripts/seastar-addr2line
Traceback (most recent call last):
  File "/opt/scylladb/scripts/libexec/seastar-addr2line", line 26, in <module>
    from addr2line import BacktraceResolver
ModuleNotFoundError: No module named 'addr2line'
```

in this change, we redistribute `addr2line.py` as well. this
should address the issue above.

Fixes scylladb/scylladb#21077

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit da433aad9d)

Closes scylladb/scylladb#21085
2024-10-14 09:52:21 +03:00
Botond Dénes
85b1c64a33 Merge '[Backport 6.2] storage_proxy: Add conditions checking to avoid UB in speculating read executors.' from ScyllaDB
During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking:

- Condition checking in speculating read executors for the number of replicas.
- Checking the consistency of the Effective Replication Map in filter_for_query(): the map is considered incorrect if the list of replicas contains a node from a data center whose replication factor is 0.

Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs.

Refs scylladb/scylladb#20625

As this issue applies to the released versions and can affect clients, we need backports to 6.0, 6.1, 6.2.

(cherry picked from commit 132358dc92)

(cherry picked from commit ae23d42889)

(cherry picked from commit ad93cf5753)

(cherry picked from commit 8db6d6bd57)

(cherry picked from commit c373edab2d)

Refs #20851

Closes scylladb/scylladb#21067

* github.com:scylladb/scylladb:
  Add conditions checking for get_read_executor
  Avoid an extra call to block_for in db::filter_for_query.
  Improve code readability in consistency_level.cc and storage_proxy.cc
  tools: Add build_info header with functions providing build type information
  tests: Add tests for alter table with RF=1 to RF=0
2024-10-14 09:51:50 +03:00
Benny Halevy
6e67a993ba storage_service: rebuild: warn about tablets-enabled keyspaces
Until we automatically support rebuild for tablets-enabled
keyspaces, warn the user about them.

The reason this is not an error, is that after
increasing RF in a new datacenter, the current procedure
is to run `nodetool rebuild` on all nodes in that dc
to rebuild the new vnode replicas.
This is not required for tablets, since the additional
replicas are rebuilt automatically as part of ALTER KS.

However, `nodetool rebuild` is also run after local
data loss (e.g. due to corruption and removal of sstables).
In this case, rebuild is not supported for tablets-enabled
keyspaces, as tablet replicas that had lost data may have
already been migrated to other nodes, and rebuilding the
requested node will not know about it.
It is advised to repair all nodes in the datacenter instead.

Refs scylladb/scylladb#17575

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit ed1e9a1543)

Closes scylladb/scylladb#20722
2024-10-14 09:47:35 +03:00
Michał Chojnowski
b8a9fd4e49 reader_concurrency_semaphore: in stats, fix swapped count_resources and memory_resources
can_admit_read() returns reason::memory_resources when the permit is queued due
to lack of count resources, and it returns reason::count_resources when the
permit is queued due to lack of memory resources. It's supposed to be the other
way around.

This bug is causing the two counts to be swapped in the stat dumps printed to
the logs when semaphores time out.
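
The intended mapping, as a simplified sketch. The real
`can_admit_read()` is C++ and checks more conditions than shown here; the
parameter names and the `Reason` enum are assumptions for illustration.

```python
from enum import Enum


class Reason(Enum):
    ALL_OK = "all_ok"
    COUNT_RESOURCES = "count_resources"
    MEMORY_RESOURCES = "memory_resources"


def can_admit_read(available_count, available_memory, needed_memory):
    # Each shortage must be reported under its own name, not the other's.
    if available_count < 1:
        return Reason.COUNT_RESOURCES
    if available_memory < needed_memory:
        return Reason.MEMORY_RESOURCES
    return Reason.ALL_OK
```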

(cherry picked from commit 6cf3747c5f)

Closes scylladb/scylladb#21030
2024-10-13 18:34:18 +03:00
Jenkins Promoter
363cf881d4 Update ScyllaDB version to: 6.2.0 2024-10-13 14:15:40 +03:00
Sergey Zolotukhin
68a55facdf Add conditions checking for get_read_executor
During the investigation of scylladb/scylladb#20282, it was discovered that
implementations of speculating read executors have undefined behavior
when called with an incorrect number of read replicas. This PR
introduces two levels of condition checking:

- Condition checking in speculating read executors for the number of replicas.
- Checking the consistency of the Effective Replication Map in
  get_endpoints_for_reading(): the map is considered incorrect if the number of
  read replica nodes is higher than the replication factor. The check is
  applied only when built in non-release mode.

Please note: This PR does not fix the issue found in scylladb/scylladb#20282;
it only adds condition checks to prevent undefined behavior in cases of
inconsistent inputs.

Refs scylladb/scylladb#20625

(cherry picked from commit c373edab2d)
2024-10-11 18:20:43 +00:00
Sergey Zolotukhin
9010d0a22f Avoid an extra call to block_for in db::filter_for_query.
(cherry picked from commit 8db6d6bd57)
2024-10-11 18:20:43 +00:00
Sergey Zolotukhin
3c0f43b6eb Improve code readability in consistency_level.cc and storage_proxy.cc
Add const correctness and rename some variables to improve code readability.

(cherry picked from commit ad93cf5753)
2024-10-11 18:20:43 +00:00
Sergey Zolotukhin
a22e4476ac tools: Add build_info header with functions providing build type information
A new header provides `constexpr` functions to retrieve build
type information: `get_build_type()`, `is_release_build()`,
and `is_debug_build()`. These functions are useful when adding
changes that should be enabled at compile time only for
specific build types.

(cherry picked from commit ae23d42889)
2024-10-11 18:20:42 +00:00
Sergey Zolotukhin
14650257c0 tests: Add tests for alter table with RF=1 to RF=0
Adding Vnodes and Tablets tests for alter keyspace operation that decreases replication factor
from 1 to 0 for one of two data centers. Tablet version fails due to issue described in
scylladb/scylladb#20625.

Test for scylladb/scylladb#20625

(cherry picked from commit 132358dc92)
2024-10-11 18:20:42 +00:00
Ferenc Szili
2a318817ba test: test tombstone GC disabled on pending replica
This tests if tombstone GC is disabled on pending replicas
2024-10-11 14:10:30 +02:00
Ferenc Szili
5f052a2b52 tablet_storage_group_manager: update tombstone_gc_enabled in compaction group
In order to avoid cases during tablet migrations where we garbage
collect tombstones before the data they shadow arrives, we will
disable tombstone GC on pending replicas.

To achieve this we added a tombstone_gc_enabled flag to compaction_group.
This flag is updated from the update_effective_replication_map method of the
tablet_storage_group_manager class.
2024-10-11 14:09:30 +02:00
David Garcia
e018b38a54 docs: Fix confgroup links
It was not possible to link to configuration parameters groups in docs/reference/configuration-parameters.rst if they contained a space.

(cherry picked from commit 2247bdbc8c)

Closes scylladb/scylladb#21037
2024-10-11 14:31:28 +03:00
Ferenc Szili
14ce5e14d0 database::table: add tombstone_gc_enabled(locator::tablet_id)
This change adds the flag tombstone_gc_enabled to compaction_group.
The value of this flag will be set in
tablet_storage_group_manager::update_effective_replication_map().
2024-10-11 13:29:30 +02:00
Piotr Smaron
d1a31460a0 cql/tablets: handle MVs in ALTER tablets KEYSPACE
ALTERing tablets-enabled KEYSPACES (KS) didn't account for materialized
views (MV), and only produced tablets mutations changing tables.
With this patch we're producing tablets mutations for both tables and
MVs, hence when e.g. we change the replication factor (RF) of a KS, both the
tables' RFs and MVs' RFs are updated along with tablets replicas.
The `test_tablet_rf_change` testcase has been extended to also verify
that MVs' tablets replicas are updated when RF changes.

Fixes: #20240
(cherry picked from commit e0c1a51642)

Closes scylladb/scylladb#21022
2024-10-11 14:14:09 +03:00
Botond Dénes
9175cc528b Merge '[Backport 6.2] cql: improve validating RF's change in ALTER tablets KS' from ScyllaDB
This patch series fixes a couple of bugs around validating if RF is not changed by too much when performing ALTER tablets KS.
RF cannot change by more than 1 in total, because tablets load balancer cannot handle more work at once.
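
A sketch of the rule, under the assumption that both arguments are
complete per-datacenter RF maps (i.e. old and new options have already
been joined) and that a DC missing from a map counts as RF 0.
`rf_change_allowed` is a hypothetical name.

```python
def rf_change_allowed(old_rf, new_rf):
    # Sum the absolute per-DC differences over every DC seen in either map.
    dcs = set(old_rf) | set(new_rf)
    total = sum(abs(new_rf.get(dc, 0) - old_rf.get(dc, 0)) for dc in dcs)
    return total <= 1
```

Under these assumptions, `{"dc1": 3} -> {"dc1": 4}` is accepted, while
`{"dc1": 3, "dc2": 3} -> {"dc1": 4, "dc2": 4}` is rejected because the
differences add up to 2.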

Fixes: #20039

Should be backported to 6.0 & 6.1 (wherever tablets feature is present), as this bug may break the cluster.

(cherry picked from commit 042825247f)

(cherry picked from commit adf453af3f)

(cherry picked from commit 9c5950533f)

(cherry picked from commit 47acdc1f98)

(cherry picked from commit 93d61d7031)

(cherry picked from commit 6676e47371)

(cherry picked from commit 2aabe7f09c)

(cherry picked from commit ee56bbfe61)

Refs #20208

Closes scylladb/scylladb#21009

* github.com:scylladb/scylladb:
  cql: sum of abs RFs diffs cannot exceed 1 in ALTER tablets KS
  cql: join new and old KS options in ALTER tablets KS
  cql: fix validation of ALTERing RFs in tablets KS
  cql: harden `alter_keyspace_statement.cc::validate_rf_difference`
  cql: validate RF change for new DCs in ALTER tablets KS
  cql: extend test_alter_tablet_keyspace_rf
  cql: refactor test_tablets::test_alter_tablet_keyspace
  cql: remove unused helper function from test_tablets
2024-10-11 14:13:43 +03:00
Botond Dénes
18be4f454e Merge '[Backport 6.2] Node replace and remove operations: Add deprecate IP addresses usage warning.' from ScyllaDB
- As part of the deprecation of IP address usage, warning messages were added for when IP addresses are specified in the `ignore-dead-nodes` and `--ignore-dead-nodes-for-replace` options of scylla and nodetool.
- Slight optimizations for `utils::split_comma_separated_list`, `host_id_or_endpoint` lists and `storage_service` remove node operations, replacing `std::list` usage with `std::vector`.

Fixes scylladb/scylladb#19218

Backport: 6.2 as it's not yet released.

(cherry picked from commit 3b9033423d)

(cherry picked from commit a871321ecf)

(cherry picked from commit 9c692438e9)

(cherry picked from commit 6398b7548c)

Refs #20756

Closes scylladb/scylladb#20958

* github.com:scylladb/scylladb:
  config: Add a warning about use of IP address for join topology and replace operations.
  nodetool: Add IP address usage warning for 'ignore-dead-nodes'.
  tests: Fix incorrect UUIDs in test_nodeops
  utils: Optimizations for utils::split_comma_separated_list and usage of host_id_or_endpoint lists
2024-10-11 14:12:51 +03:00
Botond Dénes
f35a083abe repair/row_level: remove reader timeout
This timeout was added to catch reader-related deadlocks. We have not
seen such deadlocks for a long time, but we did see false timeouts
caused by it (see the explanation below). Since the cost now outweighs the
benefit, remove the timeout altogether.

The false timeout happens during mixed-shard repair. The
`reader_permit::set_timeout()` call is made on the top-level permit
which repair has a handle on. In the case of the mixed-shard repair,
this belongs to the multishard reader. Calling set_timeout() on the
multishard reader has no effect on the actual shard readers, except in
one case: when the shard reader is created, it inherits the multishard
reader's current timeout. As the shard reader can be alive for a long
time, this timeout is not refreshed and ultimately causes a timeout and
fails the repair.

Refs: #18269
(cherry picked from commit 3ebb124eb2)

Closes scylladb/scylladb#20955
2024-10-11 14:11:03 +03:00
Anna Stuchlik
57affc7fad doc: document the option to run ScyllaDB in Docker on macOS
This commit adds a description of a workaround to create a multi-node ScyllaDB cluster
with Docker on macOS.

Refs https://github.com/scylladb/scylladb/issues/16806
See https://forum.scylladb.com/t/running-3-node-scylladb-in-docker/1057/4

(cherry picked from commit 7eb1dc2ae5)

Closes scylladb/scylladb#20931
2024-10-11 14:10:06 +03:00
Raphael S. Carvalho
927e526e2d replica: Fix schema change during migration cleanup
During migration cleanup, there's a small window in which the storage
group was stopped but not yet removed from the list. So concurrent
operations traversing the list could work with stopped groups.

During a test which emitted schema changes during migrations,
a failure happened when updating the compaction strategy of a table,
but since the group was stopped, the compaction manager was unable
to find the state for that group.

In order to fix it, we'll skip stopped groups when traversing the
list since they're unused at this stage of migration and going away
soon.

Fixes #20699.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit cf58674029)

Closes scylladb/scylladb#20899
2024-10-11 14:07:42 +03:00
Calle Wilund
b224665575 database: Also forced new schema commitlog segment on user initiated memtable flush
Refs #20686
Refs #15607

In #15060 we added a forced new commitlog segment on user-initiated flush,
mainly so that tests can verify tombstone gc and other compaction-related
things without having to wait for "organic" segment deletion.
The schema commitlog was not included, mainly because we did not have tests
featuring compaction checks of schema-related tables, but also because
it was assumed to have lower general throughput.
There is, however, no real reason not to include it, and it will make some
testing much quicker and more predictable.

(cherry picked from commit 60f8a9f39d)

Closes scylladb/scylladb#20705
2024-10-11 14:03:17 +03:00
Gleb Natapov
9afb1afefa storage_proxy: make sure there is no end iterator in _live_iterators array
storage_proxy::cancellable_write_handlers_list::update_live_iterators
assumes that iterators in _live_iterators can be dereferenced, but
the code does not make any attempt to make sure this is the case. The
iterator can be the end iterator which cannot be dereferenced.

The patch makes sure that there is no end iterator in _live_iterators.

Fixes scylladb/scylladb#20874

(cherry picked from commit da084d6441)

Closes scylladb/scylladb#21003
2024-10-10 17:09:27 +03:00
Kefu Chai
72153cac96 auth: capture boost::regex_error not std::regex_error
in a3db5401, we introduced the TLS certificate authenticator, which is
configured using the `auth_certificate_role_queries` option. the
value of this option contains a regular expression, so there is a
chance that the regular expression is malformed. in that case,
when converting the value representing the regular expression to an
instance of `boost::regex`, Boost.Regex throws a `boost::regex_error`
exception, not `std::regex_error`.

since we decided to use Boost.Regex, let's catch `boost::regex_error`.

Refs a3db5401
Fixes scylladb/scylladb#20941
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 439c52c7c5)

Closes scylladb/scylladb#20952
2024-10-09 21:58:40 +03:00
Michał Chojnowski
f988980260 utils/rjson.cc: correct a comment about assert()
Commit aa1270a00c changed most uses
of `assert` in the codebase to `SCYLLA_ASSERT`.

But the comment fixed in this patch is talking specifically about
`assert`, and shouldn't have been changed. It doesn't make sense
after the change.

(cherry picked from commit da7edc3a08)

Closes scylladb/scylladb#20976
2024-10-09 21:50:26 +03:00
Anna Stuchlik
1d11adf766 doc: remove outdated JMX references
This commit removes references to JMX from the docs.

Context:
The JMX server has been dropped and removed from installation. The user can
install it manually if needed, as documented with https://github.com/scylladb/scylladb/issues/18687.

This commit removes the outdated information about JMX from other pages
in the documentation, including the docs for nodetool, the list of ports,
and the admin section.

Also, the no longer relevant JMX information is removed from
the Docker Hub docs.

Fixes https://github.com/scylladb/scylladb/issues/18687
Fixes https://github.com/scylladb/scylladb/issues/19575

(cherry picked from commit 4e43d542cd)

Closes scylladb/scylladb#20988
2024-10-09 20:57:49 +03:00
Jenkins Promoter
dae1d18145 Update ScyllaDB version to: 6.2.0-rc3 2024-10-09 15:10:48 +03:00
Kamil Braun
e9588a8a53 Merge '[Backport 6.2] Wait for all users of group0 server to complete before destroying it' from ScyllaDB
The group0 server is often used in asynchronous contexts, but we do not wait
for those users to complete before destroying the server. We already have a
shutdown gate for it, so let's use it in those async functions.

Also make sure to signal group0 abort source if initialization fails.

Fixes scylladb/scylladb#20701

Backport to 6.2 since it contains af83c5e53e and it made the race easier to hit, so tests became flaky.

(cherry picked from commit ba22493a69)

(cherry picked from commit e642f0a86d)

Refs #20891

Closes scylladb/scylladb#21008

* github.com:scylladb/scylladb:
  group: hold group0 shutdown gate during async operations
  group0: Stop group0 if node initialization fails
2024-10-09 12:19:16 +02:00
Piotr Smaron
c73d0ffbaa cql: sum of abs RFs diffs cannot exceed 1 in ALTER tablets KS
The tablets load balancer is unable to process more than a single pending
replica, so ALTER tablets KS cannot accept an ALTER statement which
would result in creating 2+ pending replicas. Hence it has to validate
that the sum of absolute differences of RFs specified in the statement is
not greater than 1.
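
The rule above can be sketched roughly as follows. This is only an
illustration, not the code from the patch: `rf_change_allowed` is a
hypothetical name, and both maps are assumed to be complete per-DC RF
mappings in which a missing DC counts as RF 0.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical helper (not ScyllaDB's actual code): returns true when the
// sum of absolute per-DC RF differences between the two mappings is at most 1.
// A DC that is absent from a map is treated as having RF 0.
bool rf_change_allowed(const std::map<std::string, int>& old_rfs,
                       const std::map<std::string, int>& new_rfs) {
    int total_diff = 0;
    // DCs present in the new mapping: compare against the old RF (0 if absent).
    for (const auto& [dc, new_rf] : new_rfs) {
        auto it = old_rfs.find(dc);
        const int old_rf = (it == old_rfs.end()) ? 0 : it->second;
        total_diff += std::abs(new_rf - old_rf);
    }
    // DCs that disappeared from the new mapping: their RF drops to 0.
    for (const auto& [dc, old_rf] : old_rfs) {
        if (new_rfs.find(dc) == new_rfs.end()) {
            total_diff += std::abs(old_rf);
        }
    }
    return total_diff <= 1;
}
```

With this sketch, changing one DC from 3 to 4 passes, while bumping two DCs
by 1 each, or introducing a new DC with RF 2, is rejected.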

(cherry picked from commit ee56bbfe61)
2024-10-08 18:06:52 +00:00
Piotr Smaron
c7b5571766 cql: join new and old KS options in ALTER tablets KS
A bug has been discovered while trying to ALTER tablets KS and
specifying only 1 out of 2 DCs: the unspecified DC's RF was
zeroed. This is because ALTER tablets KS updated the KS only with the
RF-per-DC mapping specified in the ALTER tablets KS statement, so if a
DC was omitted, it was assigned a value of RF=0.
This commit fixes that and additionally passes all the KS options, not
only the replication options, to the topology coordinator, where the KS
update is performed.
`initial_tablets` is a special case which requires special handling
in the source code, as we cannot simply update the old `initial_tablets`
setting with the new one: if only ` and TABLETS = {'enabled':
true}` is specified in the ALTER tablets KS statement, we should not zero `initial_tablets`, but
rather keep the old value. This is tested by the
`test_alter_preserves_tablets_if_initial_tablets_skipped` testcase.
Other than that, the above-mentioned testcase started to fail with
these changes, and it turned out to be an issue with the test not waiting
until ALTER is completed, and thus reading the old value. Hence the
test's body has been modified to wait for ALTER to complete before
performing validation.
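
A minimal sketch of the "join old and new options" idea, assuming options
are held as a flat string-to-string map. The names `options_map` and
`merge_ks_options` are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <map>
#include <string>

using options_map = std::map<std::string, std::string>;

// Hypothetical sketch: start from the keyspace's current options and overlay
// only the keys that the ALTER statement actually specified. Keys that the
// statement omits (another DC's RF, initial_tablets) keep their old values
// instead of being reset.
options_map merge_ks_options(const options_map& old_opts, const options_map& altered) {
    options_map merged = old_opts;
    for (const auto& [key, value] : altered) {
        merged[key] = value;  // keys given in ALTER override the old values
    }
    return merged;
}
```

Altering only `dc1` in a keyspace that also has `dc2` and `initial_tablets`
set would, under this sketch, leave the latter two untouched.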

(cherry picked from commit 2aabe7f09c)
2024-10-08 18:06:48 +00:00
Piotr Smaron
92325073a9 cql: fix validation of ALTERing RFs in tablets KS
The validation has been corrected with:
1. Checking if a DC specified in ALTER exists.
2. Removing the `REPLICATION_STRATEGY_CLASS_KEY` key from the map of
   entries whose RFs need to be validated.

(cherry picked from commit 6676e47371)
2024-10-08 18:06:47 +00:00
Piotr Smaron
f5c0969c06 cql: harden alter_keyspace_statement.cc::validate_rf_difference
This function assumed that the strings passed as arguments would
represent integers, but that wasn't the case, and we missed that because this
function didn't have any validation. This change adds proper
validation and error logging.
Arguments passed to this function were forwarded from a call to
`ks_prop_defs::get_replication_options`, which, in addition to the rf-per-dc mapping, also returns the
`class:replication_strategy` pair. The second member of that pair was cast
to an `int` type and somehow the code was still running fine; only
extra testing added later discovered the bug here.
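
As an illustration of the kind of validation described, the sketch below
parses an RF option value strictly and rejects anything that is not a
non-negative decimal integer, such as a replication strategy class name.
`parse_rf` is a hypothetical helper, not the function from the patch.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: returns the parsed RF, or std::nullopt when the whole
// string is not a non-negative decimal integer that fits in an int.
std::optional<int> parse_rf(std::string_view s) {
    int value = 0;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    // Reject parse errors, trailing garbage and negative values.
    if (ec != std::errc() || ptr != last || value < 0) {
        return std::nullopt;
    }
    return value;
}
```

A caller can then report an error for a `std::nullopt` result instead of
silently treating a non-numeric value as an RF.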

(cherry picked from commit 93d61d7031)
2024-10-08 18:06:46 +00:00
Gleb Natapov
90ced080a8 group: hold group0 shutdown gate during async operations
Wait for all outstanding async work that uses group0 to complete before
destroying group0 server.

Fixes scylladb/scylladb#20701

(cherry picked from commit e642f0a86d)
2024-10-08 18:06:45 +00:00
Piotr Smaron
7674d80c31 cql: validate RF change for new DCs in ALTER tablets KS
ALTER tablets KS validated if RF is not changed by more than 1 for DCs
that already had replicas, but not for DCs that didn't have them yet, so
specifying an RF jump from 0 to 2 was possible when listing a new DC in
ALTER tablets KS statement, which violated internal invariants of
tablets load balancer.
This PR fixes that bug and adds multi-DC testcases to check that adding
replicas to a new DC and removing replicas from a DC honor the RF
change constraints.

Refs: #20039
(cherry picked from commit 47acdc1f98)
2024-10-08 18:06:45 +00:00
Gleb Natapov
06ceef34a7 group0: Stop group0 if node initialization fails
Commit af83c5e53e moved aborting of group0 into the storage service
drain function. But it is not called if the node fails during initialization
(if it fails to join the cluster, for instance). So let's abort on both
paths (but only once).

(cherry picked from commit ba22493a69)
2024-10-08 18:06:44 +00:00
Piotr Smaron
ec83367b45 cql: extend test_alter_tablet_keyspace_rf
Added cases to also test decreasing RF and setting the same RF.
Also added extra explanatory comments.

(cherry picked from commit 9c5950533f)
2024-10-08 18:06:44 +00:00
Piotr Smaron
dfe2e20442 cql: refactor test_tablets::test_alter_tablet_keyspace
1. Renamed the testcase to emphasize that it only focuses on testing
   changing RF - there are other tests that test ALTER tablets KS
in general.
2. Fixed whitespaces according to PEP8

(cherry picked from commit adf453af3f)
2024-10-08 18:06:42 +00:00
Piotr Smaron
ad2191e84f cql: remove unused helper function from test_tablets
`change_default_rf` is not used anywhere, moreover it uses
`replication_factor` tag, which is forbidden in ALTER tablets KS
statement.

(cherry picked from commit 042825247f)
2024-10-08 18:06:41 +00:00
Sergey Zolotukhin
855abd7368 config: Add a warning about use of IP address for join topology and replace
operations.

When the '--ignore-dead-nodes-for-replace' config option contains
IP addresses, a warning will be logged, notifying the user that
using IP addresses with this option is deprecated and will no
longer be supported in the next release.

Fixes scylladb/scylladb#19218

(cherry picked from commit 6398b7548c)
2024-10-03 14:10:30 +00:00
Sergey Zolotukhin
086dc6d53c nodetool: Add IP address usage warning for 'ignore-dead-nodes'.
Since we are deprecating the use of IP addresses, a warning message will be printed
if 'nodetool removenode --ignore-dead-nodes' is used with IP addresses.

(cherry picked from commit 9c692438e9)
2024-10-03 14:10:29 +00:00
Sergey Zolotukhin
09b0b3f7d6 tests: Fix incorrect UUIDs in test_nodeops
It was found that the UUIDs used in test_nodeops were
invalid. This update replaces those UUIDs with newly generated
random UUIDs.

(cherry picked from commit a871321ecf)
2024-10-03 14:10:28 +00:00
Sergey Zolotukhin
3bbb7a24b1 utils: Optimizations for utils::split_comma_separated_list and usage of host_id_or_endpoint lists
- utils::split_comma_separated_list now accepts a reference to sstring instead
  of a copy to avoid extra memory allocations. Additionally, the results of
  trimming are moved to the resulting vector instead of being copied.
- service/storage_service removenode, raft_removenode, find_raft_nodes_from_hoeps,
  parse_node_list and api/storage_service::set_storage_service were changed to use
  std::vector<host_id_or_endpoint> instead of std::list<host_id_or_endpoint> as
  std::vector is a more cache-friendly structure,  resulting in better performance.
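
A rough sketch of the two optimizations in the first bullet, using
`std::string` as a stand-in for `sstring`. The function name
`split_comma_separated` and the choice to skip empty items are assumptions
of this sketch, not a description of the real function.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in: takes the input by const reference (no copy of the
// whole string) and moves each trimmed item into the result vector.
std::vector<std::string> split_comma_separated(const std::string& input) {
    std::vector<std::string> result;
    std::string_view rest = input;
    while (true) {
        const auto comma = rest.find(',');
        const std::string_view item = rest.substr(0, comma);
        // Trim surrounding spaces and tabs; skip items that are empty after trimming.
        const auto begin = item.find_first_not_of(" \t");
        if (begin != std::string_view::npos) {
            const auto end = item.find_last_not_of(" \t");
            std::string trimmed(item.substr(begin, end - begin + 1));
            result.push_back(std::move(trimmed));  // moved, not copied
        }
        if (comma == std::string_view::npos) {
            break;
        }
        rest.remove_prefix(comma + 1);
    }
    return result;
}
```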

(cherry picked from commit 3b9033423d)
2024-10-03 14:10:27 +00:00
Pavel Emelyanov
b43454c658 cql: Check that CREATEing tablets/vnodes is consistent with the CLI
There are two bits that control whether the replication strategy for a
keyspace will use tablets or not -- the configuration option and the CQL
parameter. This patch tunes their parsing to implement the logic shown
below:

    if (strategy.supports_tablets) {
         if (cql.with_tablets == on) {
             if (cfg.enable_tablets) {
                 return create_keyspace_with_tablets();
             } else {
                 throw "tablets are not enabled";
             }
         } else if (cql.with_tablets == off) {
              return create_keyspace_without_tablets();
         } else { // cql.with_tablets is not specified
              if (cfg.enable_tablets) {
                  return create_keyspace_with_tablets();
              } else {
                  return create_keyspace_without_tablets();
              }
         }
     } else { // strategy doesn't support tablets
         if (cql.with_tablets == on) {
             throw "invalid cql parameter";
         } else if (cql.with_tablets == off) {
             return create_keyspace_without_tablets();
         } else { // cql.with_tablets is not specified
             return create_keyspace_without_tablets();
         }
     }
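
For reference, the same decision table can be written as a compact,
compilable function. This is only a sketch: `use_tablets` and its
parameters are hypothetical names, the tri-state CQL parameter is modelled
as `std::optional<bool>`, and the exception types are arbitrary choices.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Returns true for "create keyspace with tablets", false for "without".
// with_tablets is std::nullopt when the CQL statement does not specify it.
bool use_tablets(bool strategy_supports_tablets,
                 std::optional<bool> with_tablets,
                 bool cfg_enable_tablets) {
    if (!strategy_supports_tablets) {
        // Explicitly asking for tablets on an unsupported strategy is an error.
        if (with_tablets.value_or(false)) {
            throw std::invalid_argument("invalid cql parameter");
        }
        return false;
    }
    if (!with_tablets) {
        // Not specified in CQL: follow the configuration option.
        return cfg_enable_tablets;
    }
    if (*with_tablets && !cfg_enable_tablets) {
        throw std::runtime_error("tablets are not enabled");
    }
    return *with_tablets;
}
```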

closes: #20088

In order to enable tablets "by default" for NetworkTopologyStrategy
there's an explicit check near ks_prop_defs::get_initial_tablets(), which is
not very nice. It needs more care to fix it, e.g. provide a feature
service reference to the abstract_replication_strategy constructor. But
since the ks_prop_defs code already hijacks options specifically for that
strategy type (see the prepare_options() helper), it's OK for now.

There's also #20768 misbehavior that's preserved in this patch, but
should be fixed eventually as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit ebedc57300)

Closes scylladb/scylladb#20927
2024-10-03 17:09:49 +03:00
Jenkins Promoter
93700ff5d1 Update ScyllaDB version to: 6.2.0-rc2 2024-10-02 14:58:37 +03:00
Anna Stuchlik
5e2b4a0e80 doc: add metric updates from 6.1 to 6.2
This commit specifies metrics that are new in version 6.2 compared to 6.1,
as specified in https://github.com/scylladb/scylladb/issues/20176.

Fixes https://github.com/scylladb/scylladb/issues/20176

(cherry picked from commit a97db03448)

Closes scylladb/scylladb#20930
2024-10-02 12:07:06 +03:00
Calle Wilund
bb5dc0771c commitlog: Fix buffer_list_bytes not updated correctly
Fixes #20862

With the change in 60af2f3cb2 the bookkeeping
for buffer memory was changed subtly. The problem is that we would
shrink the buffer size before using, after flush, said buffer's size to
decrement the buffer_list_bytes value, previously incremented by the full
allocated size. I.e. we would slowly grow this value instead of adjusting
properly to the actual used bytes.
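
The accounting drift can be modelled with a toy example. Everything below
is hypothetical (`buffer_accounting` and its methods are not commitlog
code); it only shows one way to keep the counter consistent, by
subtracting the same amount that was originally added rather than the
size observed after shrinking.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a "bytes held in buffers" counter.
struct buffer_accounting {
    std::size_t buffer_list_bytes = 0;

    std::vector<char> allocate(std::size_t n) {
        buffer_list_bytes += n;  // counted at the full allocated size
        return std::vector<char>(n);
    }

    void flush_and_release(std::vector<char>& buf, std::size_t used) {
        const std::size_t allocated = buf.size();  // capture before shrinking
        buf.resize(used);                          // shrink to the bytes actually written
        // ... the buffer would be written out here ...
        buffer_list_bytes -= allocated;            // subtract what was added, not buf.size()
    }
};
```

Subtracting `buf.size()` after the `resize()` instead would leave
`allocated - used` bytes permanently counted on every flush, which is the
slow growth the message describes.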

Test included.

(cherry picked from commit ee5e71172f)

Closes scylladb/scylladb#20902
2024-10-01 17:41:02 +03:00
Aleksandra Martyniuk
9ed8519362 node_ops: fix task_manager_module::get_nodes()
Currently, node ops virtual task gathers its children from all nodes contained
in a sum of service::topology::normal_nodes and service::topology::transition_nodes.
The maps may contain nodes that are down but weren't removed yet. So, if a user
requests the status of a node ops virtual task, the task's attempt to retrieve
its children list may fail with seastar::rpc::closed_error.

Filter out the nodes that are down in node_ops::task_manager_module::get_nodes.
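
A simplified picture of such filtering, with `host_id` reduced to a string
and liveness supplied by a caller-provided predicate. All names here are
stand-ins for illustration and do not reflect the real signatures.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

using host_id = std::string;  // stand-in for the real host id type

// Hypothetical sketch: union of normal and transition nodes, keeping only
// those that the is_alive predicate reports as up.
std::set<host_id> get_nodes(const std::set<host_id>& normal_nodes,
                            const std::set<host_id>& transition_nodes,
                            const std::function<bool(const host_id&)>& is_alive) {
    std::set<host_id> result;
    for (const auto* nodes : {&normal_nodes, &transition_nodes}) {
        for (const auto& id : *nodes) {
            if (is_alive(id)) {
                result.insert(id);
            }
        }
    }
    return result;
}
```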

Fixes: #20843.
(cherry picked from commit a558abeba3)

Closes scylladb/scylladb#20898
2024-10-01 14:52:11 +03:00
Avi Kivity
077d7c06a0 Merge '[Backport 6.2] sstables: Fix use-after-free on page cache buffer when parsing promoted index entries across pages' from ScyllaDB
This fixes a use-after-free bug when parsing clustering key across
pages.

Also includes a fix for allocating section retry, which is potentially not safe (not in practice yet).

Details of the first problem:

Clustering key index lookup is based on the index file page cache. We
do a binary search within the index, which involves parsing index
blocks touched by the algorithm. Index file pages are 4 KB chunks
which are stored in LSA.

To parse the first key of the block, we reuse clustering_parser, which
is also used when parsing the data file. The parser is stateful and
accepts consecutive chunks as temporary_buffers. The parser is
supposed to keep its state across chunks.

In 93482439, the promoted index cursor was optimized to avoid
a full page copy when parsing index blocks. Instead, the parser is
given a temporary_buffer which is a view on the page.

A bit earlier, in b1b5bda, the parser was changed to keep shared
fragments of the buffer passed to the parser in its internal state (across pages)
rather than copy the fragments into a new buffer. This is problematic
when buffers come from page cache because LSA buffers may be moved
around or evicted. So the temporary_buffer which is a view on the LSA
buffer is valid only around the duration of a single consume() call to
the parser.

If the blob which is parsed (e.g. variable-length clustering key
component) spans pages, the fragments stored in the parser may be
invalidated before the component is fully parsed. As a result, the
parsed clustering key may have incorrect component values. This never
causes parsing errors because the "length" field is always parsed from
the current buffer, which is valid, and component parsing will end at
the right place in the next (valid) buffer.

The problematic path for clustering_key parsing is the one which calls
primitive_consumer::read_bytes(), which is called for example for text
components. Fixed-size components are not parsed like this, they store
the intermediate state by copying data.

This may cause incorrect clustering keys to be parsed when doing
binary search in the index, diverting the search to an incorrect
block.

Details of the solution:

We adapt page_view to a temporary_buffer-like API. For this, a new concept
is introduced called ContiguousSharedBuffer. We also change parsers so that
they can be templated on the type of the buffer they work with (page_view vs
temporary_buffer). This way we don't introduce indirection to existing algorithms.

We use page_view instead of temporary_buffer in the promoted
index parser which works with page cache buffers. page_view can be safely
shared via share() and stored across allocating sections. It keeps hold of the
LSA buffer even across allocating sections by means of cached_file::page_ptr.

Fixes #20766

(cherry picked from commit 8aca93b3ec)

(cherry picked from commit ac823b1050)

(cherry picked from commit 93bfaf4282)

(cherry picked from commit c0fa49bab5)

(cherry picked from commit 29498a97ae)

(cherry picked from commit c15145b71d)

(cherry picked from commit 7670ee701a)

(cherry picked from commit c09fa0cb98)

(cherry picked from commit 0279ac5faa)

(cherry picked from commit 8e54ecd38e)

(cherry picked from commit b5ae7da9d2)

Refs #20837

Closes scylladb/scylladb#20905

* github.com:scylladb/scylladb:
  sstables: bsearch_clustered_cursor: Add trace-level logging
  sstables: bsearch_clustered_cursor: Move definitions out of line
  test, sstables: Verify parsing stability when allocating section is retried
  test, sstables: Verify parsing stability when buffers cross page boundary
  sstables: bsearch_clustered_cursor: Switch parsers to work with page_view
  cached_file: Adapt page_view to ContiguousSharedBuffer
  cached_file: Change meaning of page_view::_size to be relative to _offset rather than page start
  sstables, utils: Allow parsers to work with different buffer types
  sstables: promoted_index_block_parser: Make reset() always bring parser to initial state
  sstables: bsearch_clustered_cursor: Switch read_block_offset() to use the read() method
  sstables: bsearch_clustered_cursor: Fix parsing when allocating section is retried
2024-10-01 14:51:29 +03:00
Tomasz Grabiec
5a1575678b sstables: bsearch_clustered_cursor: Add trace-level logging
(cherry picked from commit b5ae7da9d2)
2024-10-01 01:38:48 +00:00
Tomasz Grabiec
2401f7f9ca sstables: bsearch_clustered_cursor: Move definitions out of line
In order to later use the formatter for the inner class
promoted_index_block, which is defined out of line after
cached_promoted_index class definition.

(cherry picked from commit 8e54ecd38e)
2024-10-01 01:38:47 +00:00
Tomasz Grabiec
906d085289 test, sstables: Verify parsing stability when allocating section is retried
(cherry picked from commit 0279ac5faa)
2024-10-01 01:38:47 +00:00
Tomasz Grabiec
34dd3a6daa test, sstables: Verify parsing stability when buffers cross page boundary
(cherry picked from commit c09fa0cb98)
2024-10-01 01:38:47 +00:00
Tomasz Grabiec
3afa8ee2ca sstables: bsearch_clustered_cursor: Switch parsers to work with page_view
This fixes a use-after-free bug when parsing clustering key across
pages.

Clustering key index lookup is based on the index file page cache. We
do a binary search within the index, which involves parsing index
blocks touched by the algorithm. Index file pages are 4 KB chunks
which are stored in LSA.

To parse the first key of the block, we reuse clustering_parser, which
is also used when parsing the data file. The parser is stateful and
accepts consecutive chunks as temporary_buffers. The parser is
supposed to keep its state across chunks.

In b1b5bda, the parser was changed to keep shared fragments of the
buffer passed to the parser in its internal state (across pages)
rather than copy the fragments into a new buffer. This is problematic
when buffers come from page cache because LSA buffers may be moved
around or evicted. So the temporary_buffer which is a view on the LSA
buffer is valid only around the duration of a single consume() call to
the parser.

If the blob which is parsed (e.g. variable-length clustering key
component) spans pages, the fragments stored in the parser may be
invalidated before the component is fully parsed. As a result, the
parsed clustering key may have incorrect component values. This never
causes parsing errors because the "length" field is always parsed from
the current buffer, which is valid, and component parsing will end at
the right place in the next (valid) buffer.

The problematic path for clustering_key parsing is the one which calls
primitive_consumer::read_bytes(), which is called for example for text
components. Fixed-size components are not parsed like this, they store
the intermediate state by copying data.

This may cause incorrect clustering keys to be parsed when doing
binary search in the index, diverting the search to an incorrect
block.

The solution is to use page_view instead of temporary_buffer, which
can be safely shared via share() and stored across allocating
sections. The page_view maintains its hold on the LSA buffer even
across allocating sections.

Fixes #20766

(cherry picked from commit 7670ee701a)
2024-10-01 01:38:47 +00:00
Tomasz Grabiec
3347152ff9 cached_file: Adapt page_view to ContiguousSharedBuffer
(cherry picked from commit c15145b71d)
2024-10-01 01:38:47 +00:00
Tomasz Grabiec
ff7bd937e2 cached_file: Change meaning of page_view::_size to be relative to _offset rather than page start
Will be easier to implement ContiguousSharedBuffer API as the buffer
size will be equal to _size.

(cherry picked from commit 29498a97ae)
2024-10-01 01:38:47 +00:00
Tomasz Grabiec
50ea1dbe32 sstables, utils: Allow parsers to work with different buffer types
Currently, parsers work with temporary_buffer<char>. This is unsafe
when invoked by bsearch_clustered_cursor, which reuses some of the
parsers, and passes temporary_buffer<char> which is a view onto LSA
buffer which comes from the index file page cache. This view is stable
only around consume(). If parsing requires more than one page, it will
continue with a different input buffer. The old buffer will be
invalid, and it's unsafe for the parser to store and access
it. Unfortunately, the temporary_buffer API allows sharing the buffer
via the share() method, which shares the underlying memory area. This
is not correct when the underlying memory is managed by LSA, because storage
may move. The parser uses this sharing when parsing blobs, e.g. clustering
key components. When parsing resumes in the next page, parser will try
to access the stored shared buffers pointing to the previous page,
which may result in use-after-free on the memory area.

In preparation for fixing the problem, parametrize parsers to work with
different kinds of buffers. This will allow us to instantiate them
with a buffer kind which supports sharing of LSA buffers properly in a
safe way.

It's not purely mechanical work. Some parts of the parsing state
machine still work with temporary_buffer<char>, and allocate buffers
internally, when reading into a linearized destination buffer. They used
to store this destination in _read_bytes vector, same field which is
used to store the shared buffers. Now it's not possible, since shared
buffer type may be different than temporary_buffer<char>. So those
paths were changed to use a new field: _read_bytes_buf.

(cherry picked from commit c0fa49bab5)
2024-10-01 01:38:47 +00:00
Tomasz Grabiec
45125c4d7d sstables: promoted_index_block_parser: Make reset() always bring parser to initial state
When reset() is done due to an allocating section retry, the parser can
theoretically be at an arbitrary point. So we should not assume that it
finished parsing and that the state was reset by the previous parse. We should
reset all the fields.
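
The principle can be illustrated with a toy parser. The struct and its
fields below are invented for this sketch and do not mirror
promoted_index_block_parser; the point is only that reset() restores every
field rather than relying on a previous parse having finished cleanly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical parser state, for illustration only.
struct block_parser_sketch {
    enum class state { START, OFFSET, KEY, DONE };

    state _state = state::START;
    std::uint64_t _offset = 0;
    std::string _partial_key;

    // Bring the parser back to its initial state no matter where parsing was
    // interrupted: every field gets its default value again.
    void reset() {
        *this = block_parser_sketch{};
    }
};
```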

(cherry picked from commit 93bfaf4282)
2024-10-01 01:38:46 +00:00
Tomasz Grabiec
9207f7823d sstables: bsearch_clustered_cursor: Switch read_block_offset() to use the read() method
To unify logic which handles allocating section retry, and thus
improve safety.

(cherry picked from commit ac823b1050)
2024-10-01 01:38:46 +00:00
Tomasz Grabiec
711864687f sstables: bsearch_clustered_cursor: Fix parsing when allocating section is retried
Parser's state was not reset when allocating section was retried.

This doesn't cause problems in practice, because reserves are enough
to cover allocation demands of parsing clustering keys, which are at
most 64K in size. But it's still potentially unsafe and needs fixing.

(cherry picked from commit 8aca93b3ec)
2024-10-01 01:38:45 +00:00
Kamil Braun
faf11e5bc3 Merge '[Backport 6.2] Populate raft address map from gossiper on raft configuration change' from ScyllaDB
For each new node added to the raft config, populate its ID to IP mapping in the raft address map from the gossiper. The mapping may have expired if a node is added to the raft configuration long after it first appears in the gossiper.

Fixes scylladb/scylladb#20600

Backport to all supported versions since the bug may cause bootstrapping failure.

(cherry picked from commit bddaf498df)

(cherry picked from commit 9e4cd32096)

Refs #20601

Closes scylladb/scylladb#20847

* github.com:scylladb/scylladb:
  test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join
  group0: make sure that address map has an entry for each new node in the raft configuration
2024-09-30 17:01:52 +02:00
Gleb Natapov
f9215b4d7e test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join
(cherry picked from commit 9e4cd32096)
2024-09-26 21:13:34 +00:00
Gleb Natapov
469ac9976a group0: make sure that address map has an entry for each new node in the raft configuration
The ID->IP mapping is added to the raft address map when the mapping first
appears in the gossiper, but it is added as an expiring entry. It becomes
non-expiring when a node is added to the raft configuration. But when a node
joins, those two events may be distant in time (since the node's request
may sit in the topology coordinator queue for a while) and the mapping may
have already expired from the map. This patch makes sure to transfer the
mapping from the gossiper for a node that is added to the raft
configuration instead of assuming that the mapping is already there.
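
A toy model of the behaviour described above. The types and method names
are hypothetical and greatly simplified compared to the real raft address
map; the gossiper is reduced to a plain ID->IP map passed in by the caller.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using server_id = std::string;  // stand-ins for the real ID and address types
using ip_addr = std::string;

struct address_map_sketch {
    struct entry {
        ip_addr ip;
        bool expiring;
    };
    std::unordered_map<server_id, entry> entries;

    // The gossiper first learns about a node: the entry may expire later.
    void add_expiring(const server_id& id, const ip_addr& ip) {
        entries.emplace(id, entry{ip, true});
    }

    // Simulates the expiry timer firing: drops all expiring entries.
    void expire() {
        for (auto it = entries.begin(); it != entries.end();) {
            if (it->second.expiring) {
                it = entries.erase(it);
            } else {
                ++it;
            }
        }
    }

    // The node is added to the raft configuration: copy the mapping from the
    // gossiper as a non-expiring entry instead of assuming it is still present.
    bool on_added_to_raft_config(const server_id& id,
                                 const std::unordered_map<server_id, ip_addr>& gossiper) {
        auto g = gossiper.find(id);
        if (g == gossiper.end()) {
            return false;
        }
        entries.insert_or_assign(id, entry{g->second, false});
        return true;
    }
};
```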

(cherry picked from commit bddaf498df)
2024-09-26 21:13:33 +00:00
Botond Dénes
d341f1ef1e Merge '[Backport 6.2] mark node as being replaced earlier' from ScyllaDB
Before 17f4a151ce the node was marked as
being replaced in the join_group0 state, before it actually joined group0,
so by the time it actually joined and started transferring the snapshot/log no
traffic was sent to it. The commit changed this to mark the node as
being replaced after the snapshot/log is already transferred, so we can
get traffic to the node while it still has not caught up with the
leader, and this may cause problems since the state is not complete.
Mark the node as being replaced earlier, but still add the new node to
the topology later as the commit above intended.

Fixes: https://github.com/scylladb/scylladb/issues/20629

Needs to be backported since this is a regression.

(cherry picked from commit 644e7a2012)

(cherry picked from commit c0939d86f9)

(cherry picked from commit 1b4c255ffd)

Refs #20743

Closes scylladb/scylladb#20829

* github.com:scylladb/scylladb:
  test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts
  topology coordinator:: mark node as being replaced earlier
  topology coordinator: do metadata barrier before calling finish_accepting_node() during replace
2024-09-26 10:37:25 +03:00
Kamil Braun
07dfcd1f64 service: raft: fix rpc error message
What it called "leader" is actually the destination of the RPC.

Trivial fix, should be backported to all affected versions.

(cherry picked from commit 09c68c0731)

Closes scylladb/scylladb#20826
2024-09-26 10:33:50 +03:00
Anna Stuchlik
f8d63b5572 doc: add OS support for version 6.2
This commit adds the OS support for version 6.2.
In addition, it removes support for 6.0, as the policy is only to include
information for the supported versions, i.e., the two latest versions.

Fixes https://github.com/scylladb/scylladb/issues/20804

(cherry picked from commit 8145109120)

Closes scylladb/scylladb#20825
2024-09-26 10:29:08 +03:00
Anna Stuchlik
ca83da91d1 doc: add an intro to the Features page
This commit modifies the Features page in the following way:

- It adds a short introduction and descriptions to each listed feature.
- It hides the ToC (required to control and modify the information on the page,
  e.g., to add descriptions, have full control over what is displayed, etc.)
- Removes the info about Enterprise features (following the request not to include
  Enterprise info in the OSS docs)

Fixes https://github.com/scylladb/scylladb/issues/20617
Blocks https://github.com/scylladb/scylla-enterprise/pull/4711

(cherry picked from commit da8047a834)

Closes scylladb/scylladb#20811
2024-09-26 10:22:36 +03:00
Botond Dénes
f55081fb1a Merge '[Backport 6.2] Rename Alternator batch item count metrics' from ScyllaDB
This PR addresses multiple issues with alternator batch metrics:

1. Rename the metrics to scylla_alternator_batch_item_count with op=BatchGetItem/BatchWriteItem
2. The batch size calculation was wrong and didn't count all items in the batch.
3. Add a test to validate that the metrics values increase by the correct value (not just increase). This also requires an addition to the testing to validate ops of different metrics and an exact value change.

Needs backporting to allow the monitoring to use the correct metrics names.

Fixes #20571

(cherry picked from commit 515857a4a9)

(cherry picked from commit 905408f764)

(cherry picked from commit 4d57a43815)

(cherry picked from commit 8dec292698)

Refs #20646

Closes scylladb/scylladb#20758

* github.com:scylladb/scylladb:
  alternator:test_metrics test metrics for batch item count
  alternator:test_metrics Add validating the increased value
  alternator: Fix item counting in batch operations
  Alterntor rename batch item count metrics
2024-09-26 10:22:00 +03:00
Anna Stuchlik
aa8cdec5bd doc: fix a broken link
This commit fixes a link to the Manager by adding a missing underscore
to the external link.

(cherry picked from commit aa0c95c95c)

Closes scylladb/scylladb#20710
2024-09-26 10:18:59 +03:00
Anna Stuchlik
75a2484dba doc: update the unified installer instructions
This commit updates the unified installer instructions to avoid specifying a given version.
At the moment, we're technically unable to use variables in URLs, so we need to update
the page each release.

Fixes https://github.com/scylladb/scylladb/issues/20677

(cherry picked from commit 400a14eefa)

Closes scylladb/scylladb#20708
2024-09-26 10:04:35 +03:00
Gleb Natapov
37387135b4 test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts
(cherry picked from commit 1b4c255ffd)
2024-09-26 03:45:50 +00:00
Gleb Natapov
ac24ab5141 topology coordinator:: mark node as being replaced earlier
Before 17f4a151ce the node was marked as
being replaced in join_group0 state, before it actually joins the group0,
so by the time it actually joins and starts transferring snapshot/log no
traffic is sent to it. The commit changed this to mark the node as
being replaced after the snapshot/log is already transferred so we can
get the traffic to the node while it still has not caught up with the
leader, and this may cause problems since the state is not complete.
Mark the node as being replaced earlier, but still add the new node to
the topology later as the commit above intended.

(cherry picked from commit c0939d86f9)
2024-09-26 03:45:50 +00:00
Gleb Natapov
729dc03e0c topology coordinator: do metadata barrier before calling finish_accepting_node() during replace
During replace with the same IP a node may get queries that were intended
for the node it was replacing since the new node declares itself UP
before it advertises that it is a replacement. But after the node
starts the replacing procedure the old node is marked as "being replaced"
and queries are no longer sent there. It is important to do so before the
new node starts to get the raft snapshot since the snapshot application is
not atomic and queries that run in parallel with it may see partial state
and fail in weird ways. Queries that are sent before that will fail
because the schema is empty, so they will not find any tables in the first
place. This is pre-existing and not addressed by this patch.

(cherry picked from commit 644e7a2012)
2024-09-26 03:45:50 +00:00
Kamil Braun
9d64ced982 test: fix topology_custom/test_raft_recovery_stuck flakiness
The test performs consecutive schema changes in RECOVERY mode. The
second change relies on the first. However the driver might route the
changes to different servers and we don't have group 0 to guarantee
linearizability. We must rely on the first change coordinator to push
the schema mutations to other servers before returning, but that only
happens when it sees other servers as alive when doing the schema
change. It wasn't guaranteed in the test. Fix this.

Fixes scylladb/scylladb#20791

Should be backported to all branches containing this test to reduce
flakiness.

(cherry picked from commit f390d4020a)

Closes scylladb/scylladb#20807
2024-09-25 15:11:10 +02:00
Abhinav
ea6349a6f5 raft topology: add error for removal of non-normal nodes
Currently, we check if a node being removed is normal
on the node initiating the removenode request. However, we don't have a
similar check on the topology coordinator. The node being removed could be
normal when we initiate the request, but it doesn't have to be normal when
the topology coordinator starts handling the request.
For example, the topology coordinator could have removed this node while handling
another removenode request that was added to the request queue earlier.

This commit intends to fix this issue by adding more checks in the enqueuing phase
and returning errors for duplicate requests for node removal.
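
The enqueue-time checks described above can be sketched roughly as follows. This is a minimal Python model, not the actual topology coordinator code: the class `RemoveNodeQueue`, its fields and the error texts are hypothetical names used only for illustration.

```python
class RemoveNodeQueue:
    """Hypothetical model of a request queue that validates removenode requests."""

    def __init__(self, node_states):
        # node_states maps a node id to its state string, e.g. "normal".
        self.node_states = dict(node_states)
        self.pending = []

    def enqueue(self, node_id):
        state = self.node_states.get(node_id)
        # Reject the request if the node is not (or no longer) normal.
        if state != "normal":
            raise ValueError(f"cannot remove node {node_id}: state is {state!r}, not 'normal'")
        # Reject a duplicate request for a node that is already queued.
        if node_id in self.pending:
            raise ValueError(f"removenode for {node_id} is already queued")
        self.pending.append(node_id)
```

The point of the sketch is only that the state check happens where the request is queued, not just on the node that initiated it.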

This PR fixes a bug. Hence we need to backport it.

Fixes: scylladb/scylladb#20271
(cherry picked from commit b25b8dccbd)

Closes scylladb/scylladb#20799
2024-09-25 11:34:20 +02:00
Benny Halevy
ed9122a84e time_window_compaction_strategy: get_reshaping_job: restrict sort of multi_window vector to its size
Currently the function calls boost::partial_sort with a middle
iterator that might be out of bounds and cause undefined behavior.

Check the vector size, and do a partial sort only if it's longer
than `max_sstables`; otherwise sort the whole vector.
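
The size check can be illustrated with a small Python sketch. Python has no undefined behavior here, so this only models the decision between a partial and a full sort. The function name `order_for_reshape` and the idea of returning only the leading elements are assumptions, not the actual C++ code.

```python
import heapq


def order_for_reshape(items, max_sstables, key):
    """Return up to max_sstables items, smallest first (hypothetical helper)."""
    if len(items) > max_sstables:
        # Partial sort: only the first max_sstables positions need ordering.
        return heapq.nsmallest(max_sstables, items, key=key)
    # The collection is not longer than the limit: sort all of it.
    return sorted(items, key=key)
```

In the C++ code the equivalent guard keeps the middle iterator inside the vector.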

Fixes scylladb/scylladb#20608

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#20609

(cherry picked from commit 39ce358d82)

Refs: scylladb/scylladb#20609
2024-09-23 16:02:40 +03:00
Amnon Heiman
c7d6b4a194 alternator:test_metrics test metrics for batch item count
This patch adds tests for the batch operations item count.

The tests validate that the metrics tracking the number of items
processed in a batch increase by the correct amount.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 8dec292698)
2024-09-23 11:02:55 +00:00
Amnon Heiman
a35e138b22 alternator:test_metrics Add validating the increased value
The `check_increases_operation` now allows overriding the checked metric.

Additionally, a custom validation value can now be passed, which makes it
possible to validate the amount by which a value has changed, rather
than just validating that the value increased.

The default behavior of validating that values have increased remains
unchanged, ensuring backward compatibility.
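
A helper with this shape could look like the sketch below. It is not the real `check_increases_operation` (whose signature is not shown here); `check_metric_change`, `read_metric` and `expected_delta` are hypothetical names chosen for the illustration.

```python
from contextlib import contextmanager


@contextmanager
def check_metric_change(read_metric, name, expected_delta=None):
    """Check that a metric grew while the with-block ran (hypothetical helper)."""
    before = read_metric(name)
    yield
    after = read_metric(name)
    if expected_delta is None:
        # Default behavior: only require that the value increased.
        if not after > before:
            raise AssertionError(f"{name} did not increase ({before} -> {after})")
    elif after - before != expected_delta:
        # Custom validation value: require an exact change.
        raise AssertionError(f"{name} changed by {after - before}, expected {expected_delta}")
```

A test would wrap the operation under test in `with check_metric_change(...):` and pass `expected_delta` only when the exact amount matters.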

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 4d57a43815)
2024-09-23 11:02:55 +00:00
Amnon Heiman
3db67faa8a alternator: Fix item counting in batch operations
This patch fixes the logic for counting items in batch operations.
Previously, the item count in requests was inaccurate: it counted the
number of tables in get_item and the request_items in write_items.

The new logic correctly counts each individual item in `BatchGetItem`
and `BatchWriteItem` requests.
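
For illustration, counting individual items in the DynamoDB request format can be sketched in Python as below. The function names are hypothetical and this is not the Alternator C++ code. It only relies on the documented request shapes: `BatchGetItem` carries a `Keys` list per table, and `BatchWriteItem` carries a list of put/delete requests per table.

```python
def count_batch_get_items(request):
    """Count requested keys across all tables, not the number of tables."""
    return sum(len(table.get("Keys", [])) for table in request["RequestItems"].values())


def count_batch_write_items(request):
    """Count put/delete requests across all tables."""
    return sum(len(ops) for ops in request["RequestItems"].values())
```

With one table and three keys, counting tables would report 1 while the per-item count reports 3.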

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 905408f764)
2024-09-23 11:02:55 +00:00
Amnon Heiman
6a12174e2d Alterntor rename batch item count metrics
This patch renames metrics tracking the total number of items in a batch
to `scylla_alternator_batch_item_count`.  It uses the existing `op` label to
differentiate between `BatchGetItem` and `BatchWriteItem` operations.

Ensures better clarity and distinction for batch operations in monitoring.

This is an example of what it looks like:
 # HELP scylla_alternator_batch_item_count The total number of items processed across all batches
 # TYPE scylla_alternator_batch_item_count counter
 scylla_alternator_batch_item_count{op="BatchGetItem",shard="0"} 4
 scylla_alternator_batch_item_count{op="BatchWriteItem",shard="0"} 4
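
A monitoring script could sum this metric per operation along these lines. This is a minimal sketch that assumes the text exposition format shown above; the function name `batch_item_counts` is hypothetical.

```python
import re

# One sample line: metric name, label set in braces, numeric value.
SAMPLE = re.compile(r'^(\w+)\{(.*)\}\s+(\d+(?:\.\d+)?)$')


def batch_item_counts(text):
    """Sum scylla_alternator_batch_item_count per `op` label across shards."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        match = SAMPLE.match(line)
        if not match or match.group(1) != "scylla_alternator_batch_item_count":
            continue
        labels = dict(re.findall(r'(\w+)="([^"]*)"', match.group(2)))
        op = labels["op"]
        totals[op] = totals.get(op, 0.0) + float(match.group(3))
    return totals
```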

(cherry picked from commit 515857a4a9)
2024-09-23 11:02:55 +00:00
Piotr Dulikowski
ca0096ccb8 Merge '[Backport 6.2] message/messaging_service: guard adding maintenance tenant under cluster feature' from Michał Jadwiszczak
In https://github.com/scylladb/scylladb/pull/18729, we introduced a new statement tenant $maintenance, but the change wasn't protected by any cluster feature.
This wasn't a problem for OSS, since an unknown isolation cookie just uses the default scheduling group. However, in enterprise that leads to creating a service level on not-upgraded nodes, which may end up in an error if the user has created the maximum number of service levels.

This patch adds a cluster feature to guard adding the new tenant. It's done in a way that handles two upgrade scenarios:

- version without $maintenance tenant -> version with $maintenance tenant guarded by a feature
- version with $maintenance tenant but not guarded by a feature -> version with $maintenance tenant guarded by a feature

The PR adds an `enabled` flag to statement tenants.
This way, when the tenant is disabled, it cannot be used to create a connection, but it can be used to accept an incoming connection.
The $maintenance tenant is added to the config as disabled and it gets enabled once the corresponding feature is enabled.

Fixes https://github.com/scylladb/scylladb/issues/20070
Refs https://github.com/scylladb/scylla-enterprise/issues/4403

(cherry picked from commit d44844241d)

(cherry picked from commit 71a03ef6b0)

(cherry picked from commit b4b91ca364)

Refs https://github.com/scylladb/scylladb/pull/19802

Closes scylladb/scylladb#20690

* github.com:scylladb/scylladb:
  message/messaging_service: guard adding maintenance tenant under cluster feature
  message/messaging_service: add feature_service dependency
  message/messaging_service: add `enabled` flag to statement tenants
2024-09-23 09:48:12 +02:00
Jenkins Promoter
a71d4bc49c Update ScyllaDB version to: 6.2.0-rc1 2024-09-19 10:21:33 +03:00
Michał Jadwiszczak
749399e4b8 message/messaging_service: guard adding maintenance tenant under cluster feature
Set `enabled` flag for `$maintenance` tenant to false and
enable it when `MAINTENANCE_TENANT` feature is enabled.

(cherry-picked from b4b91ca364)
2024-09-18 19:10:24 +02:00
Michał Jadwiszczak
bdd97b2950 message/messaging_service: add feature_service dependency
(cherry-picked from 71a03ef6b0)
2024-09-18 19:09:46 +02:00
Michał Jadwiszczak
1a056f0cab message/messaging_service: add enabled flag to statement tenants
Adding a new tenant needs to be done under cluster feature protection.
However it wasn't the case for adding `$maintenance` statement tenant
and to fix it we need to support an upgrade from node which doesn't
know about maintenance tenant at all and from one which uses it without
any cluster feature protection.

This commit adds `enabled` flag to statement tenants.
This way, when the tenant is disabled, it cannot be used to create
a connection, but it can be used to accept an incoming connection.
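
The asymmetry between outgoing and incoming connections can be modelled with a short Python sketch. All names here (`Tenant`, `TenantRegistry`, the `"default"` fallback) are hypothetical and only illustrate the rule; the real logic lives in the messaging service.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    enabled: bool = True


class TenantRegistry:
    """Hypothetical model: disabled tenants accept connections but do not create them."""

    def __init__(self, tenants):
        self._tenants = {t.name: t for t in tenants}

    def enable(self, name):
        # Called once the corresponding cluster feature is enabled.
        self._tenants[name].enabled = True

    def for_outgoing(self, name, fallback="default"):
        tenant = self._tenants.get(name)
        # A disabled (or unknown) tenant must not be used to create a connection.
        return tenant.name if tenant is not None and tenant.enabled else fallback

    def for_incoming(self, name, fallback="default"):
        # A known tenant is accepted even while disabled, so peers that
        # already use it keep working during the upgrade.
        return name if name in self._tenants else fallback
```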

(cherry-picked from d44844241d)
2024-09-18 19:09:06 +02:00
Tzach Livyatan
cf78a2caca Update client-node-encryption: OpsnSSL is FIPS *enabled*
Closes scylladb/scylladb#19705

(cherry picked from commit cb864b11d8)
2024-09-18 11:58:46 +03:00
Anna Mikhlin
cbc53f0e81 Update ScyllaDB version to: 6.2.0-rc0 2024-09-17 13:40:50 +03:00
Botond Dénes
a4a8cad97f Merge 'atomic_delete: allow deletion of sstables from several prefixes' from Benny Halevy
Allow create_pending_deletion_log to delete a bunch of sstables
that potentially reside in different prefixes (e.g. in the base directory
and under staging/).

The motivation arises from table::cleanup_tablet that calls compaction_group::cleanup on all cg:s via cleanup_compaction_groups.  Cleanup, in turn, calls delete_sstables_atomically on all sstables in the compaction_group, in all states, including the normal state as well as staging - hence the requirement to support deleting sstables in different sub-directories.

Also, apparently truncate calls delete_atomically for all sstables too, via table::discard_sstables, so if it happened to be executed during view update generation, i.e. when there are sstables in staging, it should hit the assertion failure reported in https://github.com/scylladb/scylladb/issues/18862 as well (although I haven't seen it yet, I see no reason why it wouldn't happen). So the issue was apparently present since the initial implementation of the pending_delete_log. It's just that with tablet migration it is more likely to be hit.

Fixes scylladb/scylladb#18862

Needs backport to 6.0 since tablets require this capability

Closes scylladb/scylladb#19555

* github.com:scylladb/scylladb:
  sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory
  sstables: storage: keep base directory in base class
  sstables: storage: define opened_directory in header file
  sstable_directory: use only dirlog
2024-09-17 08:30:40 +03:00
Lakshmi Narayanan Sreethar
626f55a2ea compaction: run cleanup under maintenance scheduling group
The cleanup compaction task is a maintenance operation that runs after
topology changes. So, run it under the maintenance scheduling group to
avoid interference with regular compaction tasks. Also remove the share
allocations done by the cleanup task, as they are unnecessary when
running under the maintenance group.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#20582
2024-09-16 16:58:43 +03:00
Avi Kivity
870d1c16f7 scripts: fix bin/cqlsh shortcut
Since 3c7af28725, the cqlsh submodule no longer contains a
bin/cqlsh shell script. This broke the supermodule's bin/cqlsh
shortcut.

Fix it by invoking cqlsh.py directly.

Closes scylladb/scylladb#20591
2024-09-16 09:52:29 +03:00
Botond Dénes
ea29fe579b Merge 'replica: ignore cleanup of deallocated storage group' from Aleksandra Martyniuk
Cleanup of a deallocated tablet throws an exception.
Since failed cleanup is retried, we end up in an infinite loop.

Ignore cleanup of deallocated storage groups.

Fixes:  #19752.

Needs to be backported to all branches with tablets (6.0 and later)

Closes scylladb/scylladb#20584

* github.com:scylladb/scylladb:
  test: check if cleanup of deallocated sg is ignored
  replica: ignore cleanup of deallocated storage group
2024-09-16 09:22:56 +03:00
Gleb Natapov
695f112795 paxos_state: release semaphore units before checking if a semaphore can be dropped
To drop a semaphore it should not be held by anyone, so we need to
release our units before checking if a semaphore can be dropped.
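
The ordering problem can be shown with a toy Python model. It counts holders per key instead of using real semaphores, and every name in it (`KeyedLocks`, `release`, `release_buggy`) is hypothetical.

```python
class KeyedLocks:
    """Toy model of per-key locks that are dropped when nobody holds them."""

    def __init__(self):
        self.holders = {}

    def acquire(self, key):
        self.holders[key] = self.holders.get(key, 0) + 1

    def release(self, key):
        # Release our own unit first ...
        self.holders[key] -= 1
        # ... so the "held by no one" check can actually succeed.
        if self.holders[key] == 0:
            del self.holders[key]

    def release_buggy(self, key):
        # Checking while still holding a unit never sees zero holders,
        # so the entry is never dropped.
        if self.holders[key] == 0:
            del self.holders[key]
        self.holders[key] -= 1
```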

Fixes: scylladb/scylladb#20602

Closes scylladb/scylladb#20607
2024-09-15 21:21:03 +03:00
Kefu Chai
028410ba58 mutation_writer: use bucket parameter instead of using it->first
as `_bucket` is an `unordered_map<bucket_id, timestamp_bucket_writer>`,
when writing to a given bucket, we try to create a writer with the
specified bucket id, so the returned iterator should point to a node
whose `first` element is always the bucket id.

so, there is no need to reference `it` for the bucket id, let's just
reference the parameter. simpler this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20598
2024-09-15 20:05:12 +03:00
Kefu Chai
49f232f405 compaction: fix a typo in comment
s/expection/exception/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20594
2024-09-15 16:09:01 +03:00
Avi Kivity
b9bc783418 cql3: selection: don't ignore regular column restriction if a regular row is not present
If a regular row isn't present, no regular column restriction
(say, r=3) can pass since all regular columns are presented as NULL,
and we don't have an IS NULL predicate. Yet we just ignore it.

Handle the restriction on a missing column by returning false, signifying
the row was filtered out.

We have to move the check after the conditional checking whether there's
any restriction at all, otherwise we exit early with a false failure.
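
The order of the two checks can be sketched in Python as follows. This is a simplified model that only handles equality restrictions; `row_passes`, `regular_row` and `restrictions` are hypothetical names, not the cql3 code.

```python
def row_passes(regular_row, restrictions):
    """Decide whether a row survives regular-column filtering (simplified model).

    regular_row is a dict of column values, or None if the regular row is missing.
    restrictions maps a regular column name to the value it must equal.
    """
    if not restrictions:
        # No restriction at all: this must be checked first, otherwise a
        # missing row would be rejected for no reason.
        return True
    if regular_row is None:
        # All regular columns read as NULL, so a restriction like r=3 cannot pass.
        return False
    return all(regular_row.get(col) == value for col, value in restrictions.items())
```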

Unit tests marked xfail on this issue are now unmarked.

A subtest of test_tombstone_limit is adjusted since it depended on this
bug. It tested a regular column which wasn't there, and this bug caused
the filter to be ignored. Change to test a static column that is there.

A test for a bug found while developing the patch is also added. It is
also tested by test_tombstone_limit, but better to have a dedicated test.

Fixes #10357

Closes scylladb/scylladb#20486
2024-09-15 13:44:16 +03:00
Botond Dénes
6d8e9645ce test/*/run: restore --vnodes into working order
This option was silently broken when --enable-tablet's default changed
from false to true. The reason is that when --vnodes is passed, run only
removes --enable-tablets=true from scylla's command line. With the new
default this is not enough, we need to explicitly disable tablets to
override the default.
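
In Python terms the difference is roughly the following. The function `scylla_args` is hypothetical, and spelling the override as `--enable-tablets=false` is an assumption made for this sketch; only `--enable-tablets=true` appears in the description above.

```python
def scylla_args(base_args, vnodes):
    """Build the scylla command line for a test run (hypothetical helper)."""
    if not vnodes:
        return list(base_args)
    # Dropping the option is no longer enough now that the default is true ...
    args = [a for a in base_args if not a.startswith("--enable-tablets")]
    # ... so add an explicit override (assumed spelling).
    args.append("--enable-tablets=false")
    return args
```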

Closes scylladb/scylladb#20462
2024-09-13 17:10:09 +03:00
Nadav Har'El
f255391d52 cql-pytest: translate Cassandra's tests for arithmetic operators
This is a translation of Cassandra's CQL unit test source file
OperationFctsTest.java into our cql-pytest framework.

This is a massive test suite (over 800 lines of code) for Cassandra's
"arithmetic operators" CQL feature (CASSANDRA-11935), which was added
to Cassandra almost 8 years ago (and reached Cassandra 4.0), but we
never implemented it in Scylla.

All of the tests in the suite fail in ScyllaDB due to our lack of this
feature:

  Refs #2693: Support arithmetic operators

One test also discovered a new issue:

  Refs #20501: timestamp column doesn't allow "UTC" in string format

All the tests pass on Cassandra.

Some of the tests insist on specific error message strings and specific
precision for decimal arithmetic operations - where we may not necessarily
want to be 100% compatible with Cassandra in our eventual implementation.
But at least the test will allow us to make deliberate - and not accidental -
deviations from compatibility with Cassandra.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#20502
2024-09-13 14:52:59 +03:00
Botond Dénes
d3a9654fcc Merge 'Make use of async() context in sstable_mutation_test' from Pavel Emelyanov
This test runs all its cases in seastar thread, but still uses .then() continuations in some of them.
This PR converts all continuations into plain .get()-s.

Closes scylladb/scylladb#20457

* github.com:scylladb/scylladb:
  test: Restore indentation after previous changes
  test: Threadify tombstone_in_tombstone2()
  test: Threadify range_tombstone_reading()
  test: Threadify tombstone_in_tombstone()
  test: Threadify broken_ranges_collection()
  test: Threadify compact_storage_dense_read()
  test: Threadify compact_storage_simple_dense_read()
  test: Threadify compact_storage_sparse_read()
  test: Simplify test_range_reads() counting
  test: Simplify test_range_reads() inner loop
  test: Threadify test_range_reads() itself
  test: Threadify test_range_reads() callers
  test: Threadify generate_clustered() itself
  test: Threadify generate_clustered() callers
  test: Threadify test_no_clustered test
  test: Threadify nonexistent_key test
2024-09-13 14:09:53 +03:00
Aleksandra Martyniuk
2c4b1d6b45 test: check if cleanup of deallocated sg is ignored 2024-09-13 13:00:58 +02:00
Aleksandra Martyniuk
20d6cf55f2 replica: ignore cleanup of deallocated storage group
Currently, an attempt to clean up a deallocated storage group throws
an exception. Failed tablet cleanup is retried, so we get stuck
in an endless loop.

Ignore cleanup of deallocated storage group.
2024-09-13 13:00:53 +02:00
Andrei Chekun
bad7407718 test.py: Add support for BOOST_DATA_TEST_CASE
Currently, test.py throws an error if a test uses
BOOST_DATA_TEST_CASE. As a first step, test.py gets all test
functions in the file, but when BOOST_DATA_TEST_CASE is used the
output has additional lines indicating a parametrized test, which
test.py cannot handle. This commit adds handling for this case. As a caveat,
all test names should start with 'test' or they will be ignored.
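
The filtering rule can be sketched as below. This is not the test.py implementation. The function name is hypothetical, and the listing format (one name per line with a trailing `*`, and extra indented sub-entries for a parametrized case) is an assumption about what the test binary prints.

```python
def collect_test_names(listing):
    """Keep only entries whose name starts with 'test' (hypothetical helper)."""
    names = []
    for line in listing.splitlines():
        name = line.strip().rstrip("*")
        # Extra lines produced for parametrized cases (and any test whose
        # name does not start with 'test') are ignored.
        if name.startswith("test"):
            names.append(name)
    return names
```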

Closes: #20530

Closes scylladb/scylladb#20556
2024-09-13 13:44:26 +03:00
Botond Dénes
7cb8cab2ae Merge 'Remove make_shared_schema() helper' from Pavel Emelyanov
This function was obsoleted by schema_builder some time ago. To avoid patching all its callers, the helper became a wrapper around it. The remaining users are all in tests, and patching them to use the builder directly makes the code shorter in many cases.

Closes scylladb/scylladb#20466

* github.com:scylladb/scylladb:
  schema: Ditch make_shared_schema() helper
  test: Tune up indentation in uncompressed_schema()
  test: Make tests use schema_builder instead of make_shared_schema
2024-09-13 12:25:10 +03:00
Pavel Emelyanov
730731da4a test: Remove unused table config from max_ongoing_compaction_test
The local config is unused since #15909, when the table creation was
changed to use env's facilities.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20511
2024-09-13 12:21:56 +03:00
Pavel Emelyanov
4c77f474ed test: Remove unused upload_path local variable
Since #14152, creation of an sstable takes the table dir and its state. The
test in question wants to create an sstable in the upload/ subdir and for
that it used to maintain the full "cf.dir/upload" path, which is not
required any more.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20514
2024-09-13 12:21:00 +03:00
Pavel Emelyanov
e9a1c0716f test: Use sstables::test_env to make sstables for directory test
This is continuation of #20431 in another test. After #20395 it's also
possible to remove unused local dir variables.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20541
2024-09-13 12:19:59 +03:00
Botond Dénes
4fb194117e Merge 'Generalize multipart upload implementations in S3 client' from Pavel Emelyanov
There are two currently -- upload_sink_base and do_upload_file. This PR merges as much code as possible (spoiler: it's already mostly copy-n-paste-d, so squashing is pretty straightforward)

Closes scylladb/scylladb#20568

* github.com:scylladb/scylladb:
  s3/client: Reuse class multipart_upload in do_upload_file
  s3/client: Split upload_sink_base class into two
2024-09-13 10:35:10 +03:00
Kefu Chai
cf1f90fe0c auth: remove unused #include
the `seastar/core/print.hh` header is no longer required by
`auth/resource.hh`. this was identified by clang-include-cleaner.
As the code is audited, we can safely remove the #include directive.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20575
2024-09-13 09:49:05 +03:00
Botond Dénes
c7c5817808 Merge 'Improve timestamp heuristics for tombstone garbage collection' from Benny Halevy
When purging a regular tombstone, consult the min_live_timestamp, if available.
This is safe since we don't need to protect dead data from resurrection, as it is already dead.

For shadowable_tombstones, consult the min_memtable_live_row_marker_timestamp,
if available, otherwise fall back to the min_live_timestamp.

If we see in a view table a shadowable tombstone with time T, then in any row where the row marker's timestamp is higher than T the shadowable tombstone is completely ignored and it doesn't hide any data in any column, so the shadowable tombstone can be safely purged without any effect or risk of resurrecting any deleted data.

In other words, rows which might cause problems for purging a shadowable tombstone with time T are rows with row markers older than or equal to T. So to know if a whole sstable can cause problems for a shadowable tombstone of time T, we need to check if the sstable's oldest row marker (and not oldest column) is older than or equal to T. And the same check applies similarly to the memtable.

If both extended timestamp statistics are missing, fall back to the legacy (and inaccurate) min_timestamp.
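
The fallback chain described above can be written down as a small Python sketch. It ignores everything else that tombstone GC depends on (such as grace periods), and the names `TimestampStats`, `min_live_row_marker_timestamp`, `purge_bound` and `can_purge` are hypothetical stand-ins for the per-sstable or per-memtable statistics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimestampStats:
    min_timestamp: int                                   # legacy, always present
    min_live_timestamp: Optional[int] = None             # extended, may be missing
    min_live_row_marker_timestamp: Optional[int] = None  # extended, may be missing


def purge_bound(stats, shadowable):
    """Pick the timestamp a tombstone must be older than to be purgeable."""
    if shadowable and stats.min_live_row_marker_timestamp is not None:
        return stats.min_live_row_marker_timestamp
    if stats.min_live_timestamp is not None:
        return stats.min_live_timestamp
    return stats.min_timestamp  # legacy fallback


def can_purge(tombstone_timestamp, stats, shadowable=False):
    # A source whose oldest relevant timestamp is older than or equal to T
    # may still be affected by a tombstone with time T.
    return tombstone_timestamp < purge_bound(stats, shadowable)
```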

Fixes scylladb/scylladb#20423
Fixes scylladb/scylladb#20424

> [!NOTE]
> no backport needed at this time
> We may consider backport later on after given some soak time in master/enterprise
> since we do see tombstone accumulation in the field under some materialized views workloads

Closes scylladb/scylladb#20446

* github.com:scylladb/scylladb:
  cql-pytest: add test_compaction_tombstone_gc
  sstable_compaction_test: add mv_tombstone_purge_test
  sstable_compaction_test: tombstone_purge_test: test that old deleted data do not inhibit tombstone garbage collection
  sstable_compaction_test: tombstone_purge_test: add testlog debugging
  sstable_compaction_test: tombstone_purge_test: make_expiring: use next_timestamp
  sstable, compaction: add debug logging for extended min timestamp stats
  compaction: get_max_purgeable_timestamp: use memtable and sstable extended timestamp stats
  compaction: define max_purgeable_fn
  tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh
  sstables: scylla_metadata: add ext_timestamp_stats
  compaction_group, storage_group, table_state: add extended timestamp stats getters
  sstables, memtable: track live timestamps
  memtable_encoding_stats_collector: update row_marker: do nothing if missing
2024-09-13 08:56:51 +03:00
Takuya ASADA
3cd2a61736 dist: drop scylla-jmx
Since the JMX server is deprecated, drop it from the submodules, build system
and package definition.

Related scylladb/scylla-tools-java#370
Related #14856

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes scylladb/scylladb#17969
2024-09-13 07:59:45 +03:00
Botond Dénes
fc9804ec31 Update tools/java submodule
* tools/java 0b4accdd...e505a6d3 (1):
  > [C-S] Make it use DCAwareRoundRobinPolicy unless rack is provided

Closes scylladb/scylladb#20562
2024-09-13 06:30:04 +03:00
Pavel Emelyanov
17e7d3145c s3/client: Reuse class multipart_upload in do_upload_file
Uploading a file is implemented by the do_upload_file class. This class
re-implements a big portion of what's currently in multipart_upload one.
This patch makes the former class inherit from the latter and removes
all the duplication from it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-12 18:38:16 +03:00
Pavel Emelyanov
14b741afc9 s3/client: Split upload_sink_base class into two
This class implements two facilities -- multipart upload protocol itself
plus some common parts of upload_sink_impl (in fact -- only close() and
plugs put(packet)).

This patch splits those two facilities into two classes. One of them
will be re-used later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-12 18:00:19 +03:00
Sergey Zolotukhin
612a141660 raft: Fix race condition on override_snapshot_thresholds.
When the server_impl::applier_fiber is paused by a co_await at line raft/server.cc:1375:
```
co_await override_snapshot_thresholds();
```
a new snapshot may be applied, which updates the actual values of the log's last applied
and snapshot indexes. As a result, the new snapshot index could become higher than the
old value stored in _applied_idx at line raft/server.cc:1365, leading to an assertion
failure in log::last_conf_for().
Since error injection is disabled in release builds, this issue does not affect production releases.

This issue was introduced in the following commit
9dfa041fe1,
when error injection was added to override the log snapshot configuration parameters.

How to reproduce:

1. Build debug version of randomized_nemesis_test
```
ninja-build build/debug/test/raft/randomized_nemesis_test
```
2. Run
```
parallel --halt now,fail=1 -j20 'build/debug/test/raft/randomized_nemesis_test \
--run_test=test_frequent_snapshotting  -- -c2 -m2G --overprovisioned --unsafe-bypass-fsync 1 \
--kernel-page-cache 1 --blocked-reactor-notify-ms 2000000  --default-log-level \
trace > tmp/logs/eraseme_{}.log  2>&1 && rm tmp/logs/eraseme_{}.log' ::: {1..1000}
```

Fixes scylladb/scylladb#20363

Closes scylladb/scylladb#20555
2024-09-12 16:19:27 +02:00
Aleksandra Martyniuk
59fba9016f docs: operating-scylla: add task manager docs
Admin-facing documentation of task manager.

Closes scylladb/scylladb#20209
2024-09-12 16:42:28 +03:00
Nadav Har'El
d49dbb944c Merge 'doc: move Alternator in the page tree and remove it's redundant ToC' from Anna Stuchlik
This PR hides the ToC on the Alternator page, as we don't need it, especially at the end of the page.

The ToC must be hidden rather than removed because removing it would, in turn, remove the "Getting Started With ScyllaDB Alternator" and "ScyllaDB Alternator for DynamoDB users" from the page tree and make them inaccessible.

In addition, this PR moves Alternator higher in the page tree.

Fixes https://github.com/scylladb/scylladb/issues/19823

Closes scylladb/scylladb#20565

* github.com:scylladb/scylladb:
  doc: move Alternator higher in the page tree
  doc: hide the redundant ToC on the Alternator page
2024-09-12 15:58:34 +03:00
Nadav Har'El
930accad12 alternator: return error on unused AttributeDefinitions
A CreateTable request defines the KeySchema of the base table and each
of its GSIs and LSIs. It also needs to give an AttributeDefinition for
each attribute used in a KeySchema - which among other things specifies
this attribute's type (e.g., S, N, etc.). Other, non-key, attributes *do
not* have a specified type, and accordingly must not be mentioned in
AttributeDefinitions.

Before this patch, Alternator just ignored unused AttributeDefinitions
entries, whereas DynamoDB throws an error in this case. This patch fixes
Alternator's behavior to match DynamoDB's - and adds a test to verify this.
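
The check can be sketched in Python over the CreateTable request JSON. The function name is hypothetical and this is not the Alternator implementation. It only relies on the documented request fields `KeySchema`, `AttributeDefinitions`, `GlobalSecondaryIndexes` and `LocalSecondaryIndexes`.

```python
def unused_attribute_definitions(request):
    """Return AttributeDefinitions names not used by any KeySchema."""
    # Key attributes of the base table ...
    used = {key["AttributeName"] for key in request["KeySchema"]}
    # ... and of every GSI and LSI.
    for kind in ("GlobalSecondaryIndexes", "LocalSecondaryIndexes"):
        for index in request.get(kind, []):
            used.update(key["AttributeName"] for key in index["KeySchema"])
    defined = [d["AttributeName"] for d in request.get("AttributeDefinitions", [])]
    return [name for name in defined if name not in used]
```

A caller would reject the request with a validation error whenever the returned list is non-empty.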

Besides being more error-path-compatible with DynamoDB, this extra check
can also help users: We already had one user complaining that an
AttributeDefinitions setting he was using was ignored, not realizing
that it wasn't used by any KeySchema. A clear error message would have
saved this user hours of investigation.

Fixes #19784.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#20378
2024-09-12 15:37:18 +03:00
Pavel Emelyanov
632a65bffa Merge 'repair: row_level: coroutinize more functions' from Avi Kivity
Coroutinize more functions in row-level repair to improve maintainability.

The functions all deal with repair buffers, so coroutinization does not affect performance.

Cleanup, no reason to backport

Closes scylladb/scylladb#20464

* github.com:scylladb/scylladb:
  repair: row_level: restore indentation
  repair: row_level: coroutinize repair_service::insert_repair_meta()
  repair: row_level: coroutinize repair_meta::get_full_row_hashes()
  repair: row_level: coroutinize repair_meta::apply_rows_on_follower()
  repair: row_level: coroutinize repair_meta::clear_working_row_buf()
  repair: row_level: coroutinize get_common_diff_detect_algorithm()
  repair: row_level: coroutinize repair_service::remove_repair_meta() (non-selective overload)
  repair: row_level: coroutinize repair_service::remove_repair_meta() (by-address overload)
  repair: row_level: coroutinize repair_service::remove_repair_meta() (by-id overload)
  repair: row_level: row_level_repair::run()
  repair: row_level: row_level_repair::send_missing_rows_to_follower_nodes()
  repair: row_level: row_level_repair::get_missing_rows_from_follower_nodes()
  repair: row_level: row_level_repair::negotiate_sync_boundary()
  repair: row_level: coroutinize repair_put_row_diff_with_rpc_stream_process_op()
  repair: row_level: coroutinize repair_meta::get_sync_boundary_handler()
  repair: row_level: coroutinize repair_meta::get_sync_boundary()
  repair: row_level: coroutinize repair_meta::repair_set_estimated_partitions_handler()
  repair: row_level: coroutinize repair_meta::repair_set_estimated_partitions()
  repair: row_level: coroutinize repair_meta::repair_get_estimated_partitions_handler()
  repair: row_level: coroutinize repair_meta::repair_get_estimated_partitions()
  repair: row_level: coroutinize repair_meta::repair_row_level_stop_handler()
  repair: row_level: coroutinize repair_meta::repair_row_level_stop()
  repair: row_level: coroutinize repair_meta::repair_row_level_start_handler()
  repair: row_level: coroutinize repair_meta::repair_row_level_start()
  repair: row_level: coroutinize repair_meta::get_combined_row_hash_handler()
  repair: row_level: coroutinize repair_meta::get_combined_row_hash()
  repair: row_level: coroutinize repair_meta::get_full_row_hashes_handler()
  repair: row_level: coroutinize repair_meta::get_full_row_hashes_with_rpc_stream()
  repair: row_level: coroutinize repair_meta::request_row_hashes()
2024-09-12 15:35:57 +03:00
Kefu Chai
197451f8c9 utils/rjson.cc: include the function name in exception message
recently, we are observing errors like:

```
stderr: error running operation: rjson::error (JSON SCYLLA_ASSERT failed on condition 'false', at: 0x60d6c8e 0x4d853fd 0x50d3ac8 0x518f5cd 0x51c4a4b 0x5fad446)
```

we only passed `false` to the `RAPIDJSON_ASSERT()` macro, so all we
have is the type of the error (rjson::error) and a backtrace.
it would be better to have more information without recompiling
or fetching the debug symbols to decipher the backtrace.

Refs scylladb/scylladb#20533
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20539
2024-09-12 15:22:49 +03:00
Anna Stuchlik
851e903f46 doc: move Alternator higher in the page tree 2024-09-12 14:08:26 +02:00
Anna Stuchlik
a32ff55c66 doc: hide the redundant ToC on the Alternator page
This commit hides the ToC, as we don't need it, especially at the end of the page.
The ToC must be hidden rather than removed because removing it would, in turn,
remove the "Getting Started With ScyllaDB Alternator" and "ScyllaDB Alternator for DynamoDB users"
from the page tree and make them inaccessible.
2024-09-12 14:01:15 +02:00
Alexey Novikov
8b6e987a99 test: add test_pinned_cl_segment_doesnt_resurrect_data
Add a test for the issue where writes in commitlog segments pinned to another table can be resurrected.
This test is based on dtest code published in #14870 and adapted for the community version.
It's a regression test for the #15060 fix and should fail before this patch and succeed afterwards.

Refs #14870, #15060

Closes scylladb/scylladb#20331
2024-09-12 10:58:22 +03:00
Takuya ASADA
90ab2a24df toolchain: restore multiarch build
When we introduced optimized clang at 6e487a4, we dropped multiarch build on frozen toolchain, because building clang on QEMU emulation is too heavy.

Actually, even after that patch was merged, there are two modes which do not build clang: --clang-build-mode INSTALL_FROM and --clang-build-mode SKIP.
So we should restore the multiarch build only in these modes, and keep skipping it in INSTALL mode since it builds clang.

Since we apply multiarch in INSTALL_FROM mode, --clang-archive is replaced
by --clang-archive-x86_64 and --clang-archive-aarch64.

Note that this breaks compatibility of existing clang archive, since it
changes clang root directory name from llvm-project to llvm-project-$ARCH.

Closes #20442

Closes scylladb/scylladb#20444
2024-09-12 10:44:45 +03:00
Kefu Chai
3e84d43f93 treewide: use seastar::format() or fmt::format() explicitly
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this worked fine until we changed the type of the format
string parameter of `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as their `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
although argument-dependent lookup (ADL for short) finds the
function which is in the same namespace as its parameter,
`using namespace` makes `seastar::format()` visible as well,
so both `std::format()` and `seastar::format()` are considered
as candidates.

that is what is happening in scylladb at quite a few call sites of
`format()`, where the compiler is not able to tell which function is
the winner of the overload resolution:

```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
  265 |     return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
      |            ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
 4290 |     format(format_string<_Args...> __fmt, _Args&&... __args)
      |     ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
  143 | format(fmt::format_string<A...> fmt, A&&... a) {
      | ^
```

in this change, we change all `format()` calls to either `fmt::format()` or
`seastar::format()` with the following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
  `seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
  because, `sstring::operator std::basic_string` would incur a deep
  copy.

we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to minimize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-11 23:21:40 +03:00
Pavel Emelyanov
f227f4332c test: Remove unused path local variable
Left after #20499 :(

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20540
2024-09-11 23:10:25 +03:00
Avi Kivity
ed7d352e7d Merge 'Validate checksums for uncompressed SSTables' from Nikos Dragazis
This PR introduces a new file data source implementation for uncompressed SSTables that will be validating the checksum of each chunk that is being read. Unlike for compressed SSTables, checksum validation for uncompressed SSTables will be active for scrub/validate reads but not for normal user reads to ensure we will not have any performance regression.

It consists of:
* A new file data source for uncompressed SSTables.
* Integration of checksums into SSTable's shareable components. The validation code loads the component on demand and manages its lifecycle with shared pointers.
* A new `integrity_check` flag to enable the new file data source for uncompressed SSTables. The flag is currently enabled only through the validation path, i.e., it does not affect normal user reads.
* New scrub tests for both compressed and uncompressed SSTables, as well as improvements in the existing ones.
* A change in JSON response of `scylla validate-checksums` to report if an uncompressed SSTable cannot be validated due to lack of checksums (no `CRC.db` in `TOC.txt`).

Refs #19058.

New feature, no backport is needed.

Closes scylladb/scylladb#20207

* github.com:scylladb/scylladb:
  test: Add test to validate SSTables with no checksums
  tools: Fix typo in help message of scylla validate-checksums
  sstables: Allow validate_checksums() to report missing checksums
  test: Add test for concurrent scrub/validate operations
  test: Add scrub/validate tests for uncompressed SSTables
  test/lib: Add option to create uncompressed random schemas
  test: Add test for scrub/validate with file-level corruption
  test: Check validation errors in scrub tests
  sstables: Enable checksum validation for uncompressed SSTables
  sstables: Expose integrity option via crawling mutation readers
  sstables: Expose integrity option via data_consume_rows()
  sstables: Add option for integrity check in data streams
  sstables: Remove unused variable
  sstables: Add checksum in the SSTable components
  sstables: Introduce checksummed file data source implementation
  sstables: Replace assert with on_internal_error
2024-09-11 23:09:45 +03:00
Calle Wilund
b7839ec5d0 cql_test_env: Use temp socket + retry to ensure usable port for message_service if listen is enabled
Fixes #20543

In cql_test_env, if cfg_in.ms_listen is set, we try to get a free port for the current test on
which message service rpc can bind. This is to allow multiple tests to run in parallel.

However, we just do this by picking a random number, without actually verifying it against
host ports in use.

This is complicated further by the fact that port reuse is effectively disabled in seastar
(see reactor::posix_reuseport_detect()). Due to this, the solution applied here is a combo
of
* Create temp socket with port = 0 to get a previously free port
* Close socket right before listen (to handle reuse not working)
* Retry on EADDRINUSE
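
The three steps above can be sketched in Python as follows. This is only an illustration of the idea, not the C++ code in cql_test_env; `pick_free_port`, `listen_with_retry` and the retry count are hypothetical:

```python
import errno
import socket


def pick_free_port(host="127.0.0.1"):
    # Bind a temporary socket to port 0 so the kernel picks a free port,
    # then close it right away so the real listener can bind that port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tmp:
        tmp.bind((host, 0))
        return tmp.getsockname()[1]


def listen_with_retry(host="127.0.0.1", attempts=5):
    # Someone else may grab the port between close and listen,
    # so retry with a fresh port on EADDRINUSE (attempt count is arbitrary).
    for _ in range(attempts):
        port = pick_free_port(host)
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            srv.bind((host, port))
            srv.listen()
            return srv, port
        except OSError as e:
            srv.close()
            if e.errno != errno.EADDRINUSE:
                raise
    raise RuntimeError("no usable port found")
```

The window between closing the temporary socket and listening is what makes the retry necessary.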

Closes scylladb/scylladb#20547
2024-09-11 23:02:41 +03:00
Aleksandra Martyniuk
31ea74b96e db: system_keyspace: change version of topology_requests schema
In 880058073b a new column (request_type)
was added to topology_requests table, but the table's schema version
wasn't changed. As a result, during a cluster upgrade both the old and the
new schema occur, but they are not distinguishable.

Add offset to schema version of topology_requests table if it contains
request_type column.

Fixes: #20299.

Closes scylladb/scylladb#20402
2024-09-11 16:36:35 +03:00
Piotr Dulikowski
d98708013c Merge 'view: move view_build_status to group0' from Michael Litvak
Migrate the `system_distributed.view_build_status` table to `system.view_build_status_v2`. The writes to the v2 table are done via raft group0 operations.

The new parameter `view_builder_version` stored in `scylla_local` indicates whether nodes should use the old or the new table.

New clusters use v2. Otherwise, the migration to v2 is initiated by the topology coordinator when the feature is enabled. It reads all the rows from the old table and writes them to the new table, and sets `view_builder_version` to v2. When the change is applied, all view_builder services are updated to write and read from the v2 table.

The old table `system_distributed.view_build_status` is set to read virtually from the new table in order to maintain compatibility.

When removing a node from the cluster, we remove its rows from the table atomically (fixes https://github.com/scylladb/scylladb/issues/11836). Also, during the migration, we remove all invalid rows.

Fixes scylladb/scylladb#15329

dtest https://github.com/scylladb/scylla-dtest/pull/4827

Closes scylladb/scylladb#19745

* github.com:scylladb/scylladb:
  view: test view_build_status table with node replace
  test/pylib: use view_build_status_v2 table in wait_for_view
  view_builder: common write view_build_status function
  view_builder: improve migration to v2 with intermediate phase
  view: delete node rows from view_build_status on node removal
  view: sanitize view_build_status during migration
  view: make old view_build_status table a virtual table
  replica: move streaming_reader_lifecycle_policy to header file
  view_builder: test view_build_status_v2
  storage_service: add view_build_status to raft snapshot
  view_builder: migration to v2
  db:system_keyspace: add view_builder_version to scylla_local
  view_builder: read view status from v2 table
  view_builder: introduce writing status mutations via raft
  view_builder: pass group0_client and qp to view_builder
  view_builder: extract sys_dist status operations to functions
  db:system_keyspace: add view_build_status_v2 table
2024-09-11 13:02:58 +02:00
Nikos Dragazis
d1152a200f test: Add test to validate SSTables with no checksums
In a previous patch we extended the return status of
`sstables::validate_checksums()` to report if an SSTable cannot be
validated due to a missing CRC component (i.e., CRC.db does not appear
in TOC.txt).

Add a test case for this.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 13:12:40 +03:00
Nikos Dragazis
1f275c71b1 tools: Fix typo in help message of scylla validate-checksums
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 13:12:39 +03:00
Nikos Dragazis
5c0a7f706b sstables: Allow validate_checksums() to report missing checksums
Change the return type of `sstable::validate_checksums()` from binary
(valid/invalid) to a ternary (valid/invalid/no_checksums). The third
status represents uncompressed SSTables without a CRC component (no
entry for CRC.db in the TOC).

Also, change the JSON response of `sstable validate-checksums` to expose
the new status. Replace the boolean value for valid/invalid checksums
with an object that contains two boolean keys: one that indicates if the
SSTable has checksums, and one that indicates if the checksums are valid
or not. The second key is optional and appears only if the SSTable has
checksums.
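
A rough Python sketch of the mapping from the ternary status to the JSON object described above. The key names `has_checksums` and `valid_checksums` and the `ChecksumStatus` enum are placeholders chosen for illustration; the actual names are the ones in the updated documentation:

```python
from enum import Enum


class ChecksumStatus(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NO_CHECKSUMS = "no_checksums"


def to_json_value(status):
    # Key names are hypothetical; the second key is emitted only
    # when the SSTable actually has checksums.
    if status is ChecksumStatus.NO_CHECKSUMS:
        return {"has_checksums": False}
    return {
        "has_checksums": True,
        "valid_checksums": status is ChecksumStatus.VALID,
    }
```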

Finally, update the documentation to reflect the changes in the API.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 13:12:39 +03:00
Nikos Dragazis
5a284f4a9d test: Add test for concurrent scrub/validate operations
Theoretically it is possible to launch more than one scrub instance
simultaneously. Since the checksum component is a shared resource,
accesses have to be synchronized.

Add a test that launches two scrub operations in validate mode and
ensures that the checksum component is loaded once, referenced by all
scrub instances via shared pointers, and deleted once the scrub
operations finish. Introduce an injection point to achieve concurrent
execution of scrubs.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 13:12:39 +03:00
Nikos Dragazis
e2353f3b3e test: Add scrub/validate tests for uncompressed SSTables
Currently the unit tests check scrub in validate mode against compressed
SSTables only. Mirror the tests for uncompressed SSTables as well.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 13:12:39 +03:00
Nikos Dragazis
2991b09c8e test/lib: Add option to create uncompressed random schemas
Extend the `random_schema_specification` to support creating both
compressed and uncompressed schemas.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 13:12:32 +03:00
Nikos Dragazis
4f56c587f6 test: Add test for scrub/validate with file-level corruption
Currently, we test scrub/validate only against a corrupted SSTable with
content-level corruption (out-of-order partition key).

Add a test for file-level corruption as well. This should trigger the
checksum check in the underlying compressed file data source
implementation.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 12:28:59 +03:00
Nikos Dragazis
cc10a5f287 test: Check validation errors in scrub tests
Scrub was extended in PR #11074 to report validation errors but the
unit tests were not updated.

Update the tests to check the validation errors reported by scrub.
Validation errors must be zero for valid SSTables and non-zero for
invalid SSTables.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 12:28:59 +03:00
Nikos Dragazis
719757fba9 sstables: Enable checksum validation for uncompressed SSTables
Extend the `sstable::validate()` to validate the checksums of
uncompressed SSTables. Given that this is already supported for
compressed SSTables, this allows us to provide consistent behavior
across any type of SSTable, be it either compressed or uncompressed.

The most prominent use case for this is scrub/validate, which is now
able to detect file-level corruption in uncompressed SSTables as
well.

Note that this change will not affect normal user reads which skip
checksum validation altogether.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 12:28:59 +03:00
Nikos Dragazis
716fc487fd sstables: Expose integrity option via crawling mutation readers
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 12:28:59 +03:00
Nikos Dragazis
1d2dc9f2e1 sstables: Expose integrity option via data_consume_rows()
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 12:28:59 +03:00
Nikos Dragazis
2feced32f7 sstables: Add option for integrity check in data streams
Add a new boolean parameter in `sstable::data_stream()` to
enable/disable integrity mechanisms in the underlying data streams.
Currently, this only affects uncompressed SSTables and it allows to
enable/disable checksum validation on each chunk. The validation happens
transparently via the checksummed data source implementation.

The reason we need this option is to allow differentiating the behavior
between normal user reads and scrub/validate reads. We would like to
enable scrub to verify checksums for uncompressed SSTables, while
leaving normal user reads unchanged for performance reasons (read
amplification due to round up of reads to chunk size and loading of the
CRC component).

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 12:27:54 +03:00
Nikos Dragazis
d5bd40ad2c sstables: Remove unused variable
Remove unused stream variable from `sstable::data_stream()`. This was
introduced in commit 47e07b787e but never used.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 12:27:54 +03:00
Nikos Dragazis
2575d20f41 sstables: Add checksum in the SSTable components
Uncompressed SSTables store their checksums in a separate CRC.db file.
Add this in the list of SSTable components.

Since this component is used only for validation, load the component
on-demand for validation tasks and delete it when all validation tasks
finish. In more detail:

- Make the checksum component shareable and weakly referencable.
  Also, add a constructor since it is no longer an aggregate.
- Use a weak pointer to store a non-owning reference in the components
  and a shared pointer to keep the object alive while validation runs.
  Once validation finishes, the component should be cleaned up
  automatically.
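
The ownership scheme can be illustrated with a small Python analogy. This is not the C++ code: `Checksum`, `Components` and the `loader` callback are hypothetical, and prompt cleanup here relies on CPython reference counting:

```python
import weakref


class Checksum:
    def __init__(self, values):
        self.values = values


class Components:
    def __init__(self, loader):
        self._loader = loader      # stand-in for reading the CRC component
        self._checksum_ref = None  # non-owning (weak) reference
        self.loads = 0             # counts actual loads, for illustration

    def get_checksum(self):
        # Reuse the component if some validation task still holds it,
        # otherwise load it on demand.
        obj = self._checksum_ref() if self._checksum_ref else None
        if obj is None:
            obj = Checksum(self._loader())
            self._checksum_ref = weakref.ref(obj)
            self.loads += 1
        return obj  # the caller's strong reference keeps it alive
```

Two concurrent validations share one loaded object; once both drop their references, the next validation loads it again.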

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 12:27:38 +03:00
Nikos Dragazis
b7dfba4c18 sstables: Introduce checksummed file data source implementation
Introduce a new data source implementation for uncompressed SSTables.

This is just a thin wrapper for a raw data source that also performs
checksum validation for each chunk. This way we can have consistent
behavior for compressed and uncompressed SSTables.
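
As a rough Python illustration of per-chunk validation (not the actual data source implementation): `read_verified` is a hypothetical helper, and the use of CRC32 via `zlib.crc32` is an assumption made for the example only:

```python
import zlib


def read_verified(data, checksums, chunk_size):
    # Yield the chunks of `data`, raising as soon as a chunk's
    # checksum does not match the stored one.
    for i, expected in enumerate(checksums):
        chunk = data[i * chunk_size:(i + 1) * chunk_size]
        if zlib.crc32(chunk) != expected:
            raise ValueError(f"checksum mismatch in chunk {i}")
        yield chunk
```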

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-11 12:26:18 +03:00
Botond Dénes
0e5b444777 Merge 'database::get_all_tables_flushed_at: fix return value' from Lakshmi Narayanan Sreethar
The `database::get_all_tables_flushed_at` method returns a variable
without setting the computed all_tables_flushed_at value. This causes
its caller, `maybe_flush_all_tables`, to flush all the tables every time
regardless of when they were last flushed. Fix this by returning
the computed value from `database::get_all_tables_flushed_at`.

Fixes #20301

Requires a backport to 6.0 and 6.1 as they have the same issue.

Closes scylladb/scylladb#20471

* github.com:scylladb/scylladb:
  cql-pytest: add test to verify compaction_flush_all_tables_before_major_seconds config
  database::get_all_tables_flushed_at: fix return value
2024-09-11 11:43:45 +03:00
Benny Halevy
4e8f3f4cdd cql-pytest: add test_compaction_tombstone_gc
Test tombstone garbage collection with:
1. conflicting live data in memtable (verifying there is no regression
   in this area)
2. deletion in memtable (reproducing scylladb/scylladb#20423)
3. materialized view update in memtable (reproducing scylladb/scylladb#20424)
in materialized_views

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:06:23 +03:00
Benny Halevy
9270348c38 sstable_compaction_test: add mv_tombstone_purge_test
Simulate view updates pattern and verify that they
don't inhibit tombstone garbage collection.

Verify fix for scylladb/scylladb#20424

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:06:23 +03:00
Benny Halevy
0407e50aa4 sstable_compaction_test: tombstone_purge_test: test that old deleted data do not inhibit tombstone garbage collection
Tests fix for scylladb/scylladb#20423

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:06:06 +03:00
Benny Halevy
a7caa79df7 sstable_compaction_test: tombstone_purge_test: add testlog debugging
Add some testlog debug printouts for the make_* helpers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:58 +03:00
Benny Halevy
470d301fe3 sstable_compaction_test: tombstone_purge_test: make_expiring: use next_timestamp
Rather than forging a timestamp from the gc_clock
just use `next_timestamp` so it can be considered
for tombstone purging purposes.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:58 +03:00
Benny Halevy
5849ba83e0 sstable, compaction: add debug logging for extended min timestamp stats
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:57 +03:00
Benny Halevy
7d893a5ed9 compaction: get_max_purgeable_timestamp: use memtable and sstable extended timestamp stats
When purging regular tombstones, consult the min_live_timestamp,
if available.

For shadowable_tombstones, consult the
min_memtable_live_row_marker_timestamp,
if available, otherwise fall back to the min_live_timestamp.

If both are missing, fall back to the legacy
(and inaccurate) min_timestamp.
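
The fallback order can be sketched in Python as follows. This is a simplification for illustration only; `pick_min_timestamp` and its parameter names are hypothetical and do not mirror the C++ signatures:

```python
def pick_min_timestamp(is_shadowable, min_timestamp,
                       min_live_timestamp=None,
                       min_live_row_marker_timestamp=None):
    # Shadowable tombstones prefer the live row marker timestamp.
    if is_shadowable and min_live_row_marker_timestamp is not None:
        return min_live_row_marker_timestamp
    # Otherwise (or if the row marker stat is missing) use the live timestamp.
    if min_live_timestamp is not None:
        return min_live_timestamp
    # Legacy fallback when no extended stats are available.
    return min_timestamp
```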

Fixes scylladb/scylladb#20423
Fixes scylladb/scylladb#20424

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:57 +03:00
Benny Halevy
57e9e9c369 compaction: define max_purgeable_fn
Before we add a new, is_shadowable, parameter to it.

And define global `can_always_purge` and `can_never_purge`
functions, a-la `always_gc` and `never_gc`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:57 +03:00
Benny Halevy
b6fabd98c6 tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh
And define `never_gc` globally, same as `always_gc`

Before adding a new, is_shadowable parameter to it.

Since it is used in the context of compaction
it better fits compaction_garbage_collector header
rather than tombstone.hh

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:57 +03:00
Benny Halevy
4de4af954f sstables: scylla_metadata: add ext_timestamp_stats
Store and retrieve the optional extended timestamp statistics
(min_live_timestamp and min_live_row_marker_timestamp)
in the scylla_metadata component.

Note that there is no need for a cluster feature to
store those attributes since the scylla_metadata
on-disk format is extensible so that old sstables
can be read by new versions, which see that the extra stats
are missing, and new sstables can be read by old
versions that ignore unknown scylla metadata section types.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:57 +03:00
Benny Halevy
6f202cf48b compaction_group, storage_group, table_state: add extended timestamp stats getters
To return the minimum live timestamp and live row-marker
timestamp across a compaction_group, storage_group, or
table_state.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:57 +03:00
Benny Halevy
14d86a3a12 sstables, memtable: track live timestamps
When garbage collecting tombstones, we care only
about shadowing of live data.  However, currently
we track the min/max timestamp of both live and dead
data, even though there is no problem with purging tombstones
that shadow dead data (expired or shadowed by other
tombstones in the sstable/memtable).

Also, for shadowable tombstones, we track live row marker timestamps
separately since, if the live row marker timestamp is greater than
a shadowable tombstone timestamp, then the row marker
would shadow the shadowable tombstone thus exposing the cells
in that row, even if their timestamp may be smaller
than the shadowable tombstone's.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:49 +03:00
Abhi
9b09439065 raft: Add descriptions for requested abort errors
Fixes: scylladb/scylladb#18902

Closes scylladb/scylladb#20291
2024-09-10 17:56:29 +02:00
Botond Dénes
de81388edb Merge 'commitlog: Handle oversized entries' from Calle Wilund
Refs #18161

Yet another approach to dealing with large commitlog submissions.

We handle an oversized single mutation by adding yet another entry
type: fragmented. In this case we only add a fragment (aha) of
the data that needs storing into each entry, along with metadata
to correlate and reconstruct the full entry on replay.

Because these fragmented entries are spread over N segments, we
also need to add references from the first segment in a chain
to the subsequent ones. These are released once we clear the
relevant cf_id count in the base.

This approach has the downside that due to how serialization etc
works w.r.t. mutations, we need to create an intermediate buffer
to hold the full serialized target entry. This is then incrementally
written into entries of < max_mutation_size, successively requesting
more segments.

On replay, when encountering a fragment chain, the fragment is
added to a "state", i.e. a mapping of currently processing
frag chains. Once we've found all fragments and concatenated
the buffers into a single fragmented one, we can issue a
replay callback as usual.
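
A toy Python model of the split-and-reassemble idea described above. The fragment layout `(entry_id, index, total, data)` and the names `split_into_fragments` and `ReplayState` are made up for illustration and are not the on-disk format:

```python
def split_into_fragments(entry_id, payload, max_size):
    # Cut the serialized entry into pieces of at most max_size bytes and
    # attach metadata to correlate and reorder them on replay.
    chunks = [payload[i:i + max_size]
              for i in range(0, len(payload), max_size)] or [b""]
    return [(entry_id, idx, len(chunks), c) for idx, c in enumerate(chunks)]


class ReplayState:
    """Mapping of fragment chains that are still being collected."""

    def __init__(self):
        self._pending = {}  # entry_id -> {index: data}

    def add(self, fragment):
        entry_id, idx, total, data = fragment
        parts = self._pending.setdefault(entry_id, {})
        parts[idx] = data
        if len(parts) < total:
            return None  # chain incomplete, nothing to replay yet
        del self._pending[entry_id]
        return b"".join(parts[i] for i in range(total))
```

`add()` returns the full entry only once every fragment of its chain has been seen, which is the point at which a replay callback could be issued.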

Note that a replay caller will need to create and provide such
a state object. Old signature replay function remains for tests
and such.

This approach bumps the file format (docs to come).

To ensure "atomicity" we both force synchronization and, should
the whole op fail, restore segment state (rewinding), thus
discarding all data we wrote.

Closes scylladb/scylladb#19472

* github.com:scylladb/scylladb:
  commitlog/database: Make some commitlog options updatable + add feature listener
  features/config: Add feature for fragmented commitlog entries
  docs: Add entry on commitlog file format v4
  commitlog_test: Add more oversized cases
  commitlog_replayer: Replay segments in order created
  commitlog_replayer: Use replay state to support fragmented entries
  commitlog_replayer: coroutinize partly
  commitlog: Handle oversized entries
2024-09-10 17:15:46 +03:00
Benny Halevy
8d67357c42 memtable_encoding_stats_collector: update row_marker: do nothing if missing
If the row_marker is missing then its timestamp
is missing as well, so there's no point
calling update_timestamp for it.  Better return early.

This should cause no functional change.

The following patch will add more logic
for tracking extended timestamp stats.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 16:46:34 +03:00
Pavel Emelyanov
b6f662417c table: Remove unused database& argument from take_snapshot() method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20496
2024-09-10 14:53:06 +03:00
Gleb Natapov
af83c5e53e group0: stop group0 before draining storage service during shutdown
Currently storage service is drained while group0 is still active. The
draining stops commitlogs, so after this point no more writes are
possible, but if group0 is still active it may try to apply commands
which will try to do writes and they will fail causing group0 state
machine errors. This is benign since we are shutting down anyway, but
better to fix shutdown order to keep logs clean.

Fixes scylladb/scylladb#19665
2024-09-10 13:15:56 +02:00
Lakshmi Narayanan Sreethar
a0f4fe3fc4 cql-pytest: add test to verify compaction_flush_all_tables_before_major_seconds config
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-10 16:39:05 +05:30
Lakshmi Narayanan Sreethar
4ca720f0bd database::get_all_tables_flushed_at: fix return value
The `database::get_all_tables_flushed_at` method returns a variable
without setting the computed all_tables_flushed_at value. This causes
its caller, `maybe_flush_all_tables`, to flush all the tables every time
regardless of when they were last flushed. Fix this by returning
the computed value from `database::get_all_tables_flushed_at`.

Fixes #20301

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-10 16:35:47 +05:30
Yaniv Michael Kaul
a4ff0aae47 HACKING.md: clarify the use of dbuild when running test.py
If you are using dbuild, that's where test.py needs to run.

Also, replace 'Docker image' with the more generic 'container' term.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#20336
2024-09-10 13:40:45 +03:00
Botond Dénes
08f109724b docs/cql/ddl.rst: fix description of sstable_compression
ScyllaDB doesn't support custom compressors. The available compressors
are the only available ones, not the default ones.
Adjust the text to reflect this.

Closes scylladb/scylladb#20225
2024-09-10 13:39:24 +03:00
Pavel Emelyanov
cfa59ab73d test: Use single temp dir for sharded<sstables::test_env>
The test-env in question is mostly started in one-shard mode. Also there
are several boost tests that start sharded<> environment. In that case
instances on different shards live in different temp dirs. That's not
critical yet, but better to have single directory for the whole test.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20412
2024-09-10 11:25:04 +03:00
Artsiom Mishuta
f95c257a1e [test.py]: Fail test teardown in case of task leakage
In test.py, every asyncio task spawned during a test must be finished before the next test starts; otherwise, tests might affect each other's results.
The developers are responsible for writing asyncio code in a way that doesn’t leave task objects unfinished.
Test.py has a mechanism that helps test writers avoid such tasks. At the end of each test case, it verifies that the test did not leave any tasks behind and, if it did, sets an event object that fails the next test at the start (issue https://github.com/scylladb/scylladb/issues/16472).
The problem with this was that breaking the next test was counterintuitive, and the logging for this situation was insufficient and unobvious.

notes:  Task.cancel() is not an option to avoid task leakage
        1) Calling cancel() does not cancel the task: the cancel() method just requests that the target task cancel.
        2) Calling cancel() does not block until the task is cancelled: if the caller needs to know the task is cancelled and done, it should await the target task.
        3) In this particular PR, task.cancel() cancels the task on the client (ManagerClient) but not on the HTTP server (ScyllaManager), so "await" is needed.
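
A small self-contained asyncio example of notes 1) and 2), unrelated to the actual test.py code (`worker` and `main` are illustrative names):

```python
import asyncio


async def worker(log):
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        log.append("cleanup")  # runs only when the task is resumed
        raise


async def main():
    log = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0)   # let the worker start
    task.cancel()            # only requests cancellation
    done_right_after_cancel = task.done()
    try:
        await task           # wait until the task has actually finished
    except asyncio.CancelledError:
        pass
    return done_right_after_cancel, task.cancelled(), log
```

Right after `cancel()` the task is not yet done; it finishes (and runs its cleanup) only once it is awaited.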

Closes scylladb/scylladb#20012
2024-09-10 10:51:45 +03:00
Pavel Emelyanov
ac2127a640 test: Call table::make_sstable() directly in compaction test
The test in question generates a bunch of table_for_tests objects and
creates sstables for each. For that it calls test_env::make_sstable(),
but it can be made shorter, by calling table method directly.

The hidden goal of this change is to remove the explicit caller of
table::dir() method. The latter is going away.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20451
2024-09-10 10:19:20 +03:00
Botond Dénes
76bb22664a Merge 'Sanitize open_sstables() helper in compaction test' from Pavel Emelyanov
This includes
- coroutinization
- elimination of unused overload

Closes scylladb/scylladb#20456

* github.com:scylladb/scylladb:
  test: Squash two open_sstables() helper together
  test: Coroutinize open_sstables() helper
2024-09-10 10:18:33 +03:00
Botond Dénes
a4a4797e27 Merge 'Alternator: tests and other preparation towards allowing adding a GSI to an existing table' from Nadav Har'El
This series prepares us for working on #11567 - allowing a GSI to be added to a pre-existing table. This will require changing the implementation of GSIs in Alternator to not use real columns in the schema for the materialized view, and instead use a computed column - a function which extracts the desired member from the `:attrs` map and de-serializes it.

This series does not contain the GSI re-implementation itself. Rather, it contains a few small cleanups and, mostly, new regression tests that cover this area, of adding and removing a GSI, and **using** a GSI, in more detail than the tests we already had. I developed most of these tests while working on **buggy** fixes for #11567; the bugs in those implementations were exposed by the tests added here - they exposed bugs both in the new feature of adding or removing a GSI, and also regressions to the ordinary operation of GSI. So these tests should be helpful for whoever ends up fixing #11567, be it me based on my buggy implementation (which is _not_ included in this patch series), or someone else.

No backports needed - this is part of a new feature, which we don't usually backport.

Closes scylladb/scylladb#20383

* github.com:scylladb/scylladb:
  test/alternator: more extensive tests for GSI with two new key attributes
  test/alternator: test invalid key types for GSI
  test/alternator: test combination of LSI and GSI
  test/alternator: expand another test to use different write operations
  test/alternator: test GSIs with different key types
  alternator: better error message in some cases of key type mismatch
  test/alternator: test for more elaborate GSI updates
  test/alternator: strengthen tests for empty attribute values
  test/alternator: fix typo in test_batch.py
  test/alternator: more checks for GSI-key attribute validation
  Alternator: drop unneeded "IS NOT NULL" clauses in MV of GSI/LSI
  test/alternator: add more checks for adding/deleting a GSI
  test/alternator: ensure table deletions in test_gsi.py
2024-09-10 10:13:52 +03:00
Pavel Emelyanov
42f8d06a17 test: Use correct schema in directory tests with created table
There are some test cases in sstable_directory_test that actually create
a table with CQL and then try to manipulate its sstables with the help
of sstable_directory. Those tests use existing local helper that starts
sharded<sstable_directory> and this helper passes test-local static
schema to sstable_directory constructor. As a result -- the schema of a
table that test case created and the schema that sstable_directory works
with are different. They match in the columns layout, which helps the
test cases pass, but otherwise are two different schema objects with
different IDs. It's more correct to use table schema for those runs.

The fix introduces another helper to start sharded<sstable_directory>,
and the older wrapper around cql_test_env becomes unused. Drop it too,
so as not to encourage future tests to use it and re-introduce the
schema mismatch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20499
2024-09-10 09:56:26 +03:00
Benny Halevy
f47b5e60bc sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory
To be able to atomically delete sstables both in
base table directory and in its sub-directories,
like `staging/`, use a shared pending_delete_dir
under the base directory.

Note that this requires loading and processing
the base directory first.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 09:28:13 +03:00
Benny Halevy
44bd183187 sstables: storage: keep base directory in base class
so we can use the base (table) directory for
e.g. pending_delete logs, in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 09:28:13 +03:00
Benny Halevy
027e64876a sstables: storage: define opened_directory in header file
So it can be used outside the storage module
in the following patches.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 09:28:13 +03:00
Benny Halevy
a7b92d7b6f sstable_directory: use only dirlog
Currently, there are leftover log messages using
sstlog rather than dirlog, that was introduced
in aebd965f0e,
and that makes debugging harder.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 09:28:11 +03:00
Botond Dénes
fc690a60d8 Update tools/cqlsh submodule
* tools/cqlsh 86a280a1...b09bc793 (6):
  > build(deps): bump actions/download-artifact in /.github/workflows
  > cqlshlib/test: Add test_formatting.py
  > cqlshlib/test: Use assertEqual instead of assertEquals
  > cqlsh.py: Send DESCRIBE statement to server before parsing
  > cqlsh.py: Fix indentation
  > cqlsh.py: change shebang to /usr/bin/env python3
2024-09-10 08:11:40 +03:00
Lakshmi Narayanan Sreethar
2148e33d37 compaction: remove unnecessary share bump for split, scrub, and upgrade
When split, scrub, and upgrade compactions ran under the compaction
group, they had to bump up their shares to a minimum of 200 to prevent
slow progress as they neared completion, especially in workloads with
inconsistent ingestion rates. Since commit e86965c2 moved these
compactions to the maintenance group, this share bump is no longer
necessary. This patch removes the unnecessary share allocation.

Fixes #20224

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#20495
2024-09-09 22:03:38 +03:00
Avi Kivity
9448260b30 Merge 'major compaction: check only sstables being compacted for tombstone garbage collection' from Lakshmi Narayanan Sreethar
Any expired tombstone can be garbage collected if it doesn't shadow data in the commit log, memtable, or uncompacting SSTables.

This PR introduces a new mode to major compaction, enabled by the `consider_only_existing_data` flag that bypasses these checks. When enabled, memtables and old commitlog segments are cleared with a system-wide flush and all the sstables (after flush) are included in the compaction, so that it works with all data generated up to a given time point.

This new mode works with the assumption that newly written data will not be shadowed by expired tombstones. So it ignores new sstables (and new data written to memtable) created after compaction started. Since there was a system wide flush, commitlog checks can also be skipped when garbage collecting tombstones. Introducing data shadowed by a tombstone during compaction can lead to undefined behavior, even without this PR, as the tombstone may or may not have already been garbage collected.
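
The difference between the two modes can be sketched as follows. This is a simplified model and not the actual compaction code: the `Tombstone` class, the `can_gc` function and the idea of passing the minimum timestamps of the other data sources as a list are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Tombstone:
    timestamp: int   # write timestamp of the deletion
    expired: bool    # True once the tombstone has expired


def can_gc(tombstone, other_min_timestamps, consider_only_existing_data=False):
    """Decide whether an expired tombstone may be dropped.

    other_min_timestamps: minimum write timestamps of the data that is
    not part of the compaction (commit log, memtable, uncompacting
    sstables) and that the tombstone might shadow.
    """
    if not tombstone.expired:
        return False
    if consider_only_existing_data:
        # After the system-wide flush all existing data is part of the
        # compaction, so the checks against other sources are bypassed.
        return True
    # Regular mode: the tombstone must be older than all outside data.
    return all(tombstone.timestamp < ts for ts in other_min_timestamps)
```

In this model, a tombstone with timestamp 100 is kept in the regular mode while some uncompacting sstable holds data with timestamp 50, but is collected when `consider_only_existing_data` is set.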

Fixes #19728

Closes scylladb/scylladb#20031

* github.com:scylladb/scylladb:
  cql-pytest: add test to verify consider_only_existing_data compaction option
  tools/scylla-nodetool: add consider-only-existing-data option to compact command
  api: compaction: add `consider_only_existing_data` option
  compaction: consider gc_check_only_compacting_sstables when deducing max purgeable timestamp
  compaction: do not check commitlog if gc_check_only_compacting_sstables is enabled
  tombstone_gc_state: introduce with_commitlog_check_disabled()
  compaction: introduce new option to check only compacting sstables for gc
  compaction: rename maybe_flush_all_tables to maybe_flush_commitlog
  compaction: maybe_flush_all_tables: add new force_flush param
2024-09-09 20:45:41 +03:00
Avi Kivity
894b85ce95 Merge 'hints: send hints with CL=ALL if target is leaving' from Piotr Dulikowski
Currently, when attempting to send a hint, we might choose its recipients in one of two ways:

- If the original destination is a natural endpoint of the hint, we only send the hint to that node and none other,
- Otherwise, we send the hint to all current replicas of the mutation.

There is a problem when we decommission a node: while data is streamed away from that node, it is still considered to be a natural endpoint of the data that it used to own. Because of that, it might happen that a hint is sent directly to it but streaming will miss it, effectively resulting in the hint being discarded.

As sending the hint _only_ to the leaving replica is a rather bad idea, send the hint to all replicas also in the case when the original destination of the hint is leaving.
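
The resulting choice of recipients can be sketched like this. It is a simplified illustration and not the hints manager code; `hint_recipients` and its parameters are hypothetical names:

```python
def hint_recipients(destination, natural_endpoints, leaving_nodes):
    """Return the nodes a hint should be sent to.

    destination: the node the hint was originally written for
    natural_endpoints: current replicas of the hinted mutation
    leaving_nodes: nodes that are currently being decommissioned
    """
    if destination in natural_endpoints and destination not in leaving_nodes:
        # The destination still owns the data and is staying: send the
        # hint to that node and none other.
        return [destination]
    # The destination is no longer a replica, or it is leaving:
    # send the hint to all current replicas.
    return list(natural_endpoints)
```

Before the fix, the second half of the condition was missing in this model, so a leaving node that was still a natural endpoint would be the only recipient.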

Note that this is a conservative fix written only with the decommission + vnode-based keyspaces combo in mind. In general, such "data loss" can occur in other situations where the replica set is changing and we go through a streaming phase, i.e. other topology operations in case of vnodes and tablet load balancing. However, the consistency guarantees of hinted handoff in the face of topology changes are not defined and it is not clear what they should be, if there should be any at all. The picture is further complicated by the fact that hints are used by materialized views, and sending view updates to more replicas than necessary can introduce inconsistencies in the form of "ghost rows". This fix was developed in response to a failing test which checked the hint replay + decommission scenario, and it makes it work again.

Fixes scylladb/scylla-dtest#4582
Refs scylladb/scylladb#19835

Should be backported to 6.0 and 6.1; the dtest started failing due to topology on raft, which sped up execution of the test and exposed the preexisting problem.

Closes scylladb/scylladb#20488

* github.com:scylladb/scylladb:
  test: topology_custom/test_hints: consistency test for decommission
  test: topology_custom/test_hints: move sync point helpers to top level
  test: topology/util: extract find_server_by_host_id
  hints: send hints with CL=ALL if target is leaving
  hints: inline do_send_one_mutation
2024-09-09 18:23:13 +03:00
Avi Kivity
c3e19425bd Merge 'docs/dev/docker-hub.md: refresh aio-max-nr calculation' from Laszlo Ersek
~~~
What we have today in "docs/dev/docker-hub.md" on "aio-max-nr" dates back
to scylla commit f4412029f4 ("docs/docker-hub.md: add quickstart section
with --smp 1", 2020-09-22). Problems with the current language:

- The "65K" claim as default value on non-production systems is wrong;
  "fs/aio.c" in Linux initializes "aio_max_nr" to 0x10000, which is 64K.

- The section in question uses equal signs (=) incorrectly. The intent was
  probably to say "which means the same as", but that's not what equality
  means.

- In the same section, the relational operator "<" is bogus. The available
  AIO count must be at least as high (>=) as the requested AIO count.

- Clearer names should be used;
  adjust_max_networking_aio_io_control_blocks() in "src/core/reactor.cc"
  sets a great example:

  - "reactor::max_aio" should be called "storage_iocbs",

  - "detect_aio_poll" should be called "preempt_iocbs",

  - "reactor_backend_aio::max_polls" should be called "network_iocbs".

- The specific value 10000 for the last one ("network_iocbs") is not
  correct in scylla's context. It is correct as the Seastar default, but
  scylla has used 50000 since commit 2cfc517874 ("main, test: adjust
  number of networking iocbs", 2021-07-18).

Rewrite the section to address these problems.

See also:
- https://github.com/scylladb/scylladb/issues/5981
- https://github.com/scylladb/seastar/pull/2396
- https://github.com/scylladb/scylladb/pull/19921

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
~~~
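
The arithmetic behind the check can be sketched as below. Only the 0x10000 Linux default and the 50000 network iocbs come from the text above; the values used for `storage_iocbs` and `preempt_iocbs`, and the assumption that the three counts are summed per shard, are placeholders for the illustration and should be checked against the refreshed documentation.

```python
LINUX_DEFAULT_AIO_MAX_NR = 0x10000   # 64K, as initialized in fs/aio.c
SCYLLA_NETWORK_IOCBS = 50000         # value used by scylla, per the text
ASSUMED_STORAGE_IOCBS = 1024         # assumption, for illustration only
ASSUMED_PREEMPT_IOCBS = 2            # assumption, for illustration only


def requested_aio(shards, storage_iocbs, preempt_iocbs, network_iocbs):
    """AIO control blocks requested by all shards together (assumed model)."""
    return shards * (storage_iocbs + preempt_iocbs + network_iocbs)


def aio_fits(available_aio, shards, storage_iocbs, preempt_iocbs, network_iocbs):
    """The available count must be at least as high (>=) as the requested one."""
    requested = requested_aio(shards, storage_iocbs, preempt_iocbs, network_iocbs)
    return available_aio >= requested
```

Under these assumptions a single shard fits into the Linux default, while two shards do not.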

No need for backporting; the documentation being refreshed targets developers as audience, not end-users.

Closes scylladb/scylladb#20398

* github.com:scylladb/scylladb:
  docs/dev/docker-hub.md: refresh aio-max-nr calculation
  docs/dev/docker-hub.md: strip trailing whitespace
2024-09-09 15:04:38 +03:00
Botond Dénes
3e0bff161c Merge 'Use yielding directory lister in sstable_directory' from Pavel Emelyanov
The yielding lister is considered to be a better replacement than the scan_dir(lambda) one.
Also, the sstable directory will be patched to scan the contents of S3 bucket and yielding lister fits better for generalization.

Closes scylladb/scylladb#20114

* github.com:scylladb/scylladb:
  sstable_directory: Fix indentation after previous patches
  sstable_directory: Use yielding lister in .handle_sstables_pending_delete()
  sstable_directory: Use yielding lister in .cleanup_column_family_temp_sst_dirs()
  sstable_directory: Use yielding lister in .prepare()
  sstable_directory: Shorten lister loop
  sstable_directory: Use with_closeable() in .process()
  directory_lister: Add noexcept default move-constructor
2024-09-09 14:35:51 +03:00
Pavel Emelyanov
0f48847d02 test: Use shorter with_sstable_directory overload()
In sstable directory test there are two of those -- one that works on
path, state, env and callback, and the other one that just needs env and
callback, getting path from env and assuming state is normal.

Two test cases in this test can enjoy the shorter one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20395
2024-09-09 14:25:24 +03:00
Pavel Emelyanov
2bfbbaffac test: Use sstables::test_env to make sstables for schema loader test
This test calls manager directly, but it's shorter to ask test_env for
that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20431
2024-09-09 14:22:58 +03:00
Takuya ASADA
e36c939505 dist: tune LimitNOFILES for large nodes
On very large nodes, LimitNOFILES=80000 may not be large enough, which
can cause a "Too many files" error.

To avoid that, let's increase LimitNOFILES at the scylla_setup stage and
generate an optimal value calculated from the memory size and the number of CPUs.

Closes scylladb/scylla-enterprise#4304

Closes scylladb/scylladb#20443
2024-09-09 14:13:49 +03:00
Piotr Smaron
60af48f5fd cql: fix exception when validating KS in CREATE TABLE
c70f321c6f added an extra check if KS
exists. This check can throw `data_dictionary::no_such_keyspace`
exception, which is supposed to be caught and a more user-friendly
exception should be thrown instead.
This commit fixes the above problem and adds a testcase to validate it
doesn't appear ever again.
Also, I moved the check for the keyspace outside of the `for` loop, as
it doesn't need to be checked repeatedly.

Fixes: scylladb/scylladb#20097

Closes scylladb/scylladb#20404
2024-09-09 13:30:57 +03:00
Nadav Har'El
ee7d4d8825 test/alternator: more extensive tests for GSI with two new key attributes
The case of a GSI with two key attributes (hash and range) which were both
not keys in the base table is a special case, not supported by CQL but
allowed in Alternator. We have several tests for this case, but they don't
cover all the strange possibilities that a GSI row disappears / reappears
when one or two of the attributes is updated / inserted / deleted.
So this patch includes a more extensive test for this case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 13:14:49 +03:00
Nadav Har'El
ad53d6a230 test/alternator: test invalid key types for GSI
This patch adds a test that types which are not allowed for GSI keys -
basically any type except S(tring), B(ytes) or N(number), are rejected
as expected - an error path that we didn't cover in existing tests.

The new test passes - Alternator doesn't have a bug in this area, and
as usual, also passes on DynamoDB.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 13:14:49 +03:00
Nadav Har'El
c4021d0819 test/alternator: test combination of LSI and GSI
To allow adding a GSI to an existing table (refs #11567), we plan to
re-implement GSIs to stop forcing their key attribute to become a real
column in the schema - and let it remain a member of the map ":attrs"
like all non-key attributes. But since LSIs can only be defined on table
creation time, we don't have to change the LSI implementation, and these
can still force their key to become a real column.

What the test in this patch does is to verify that using the same
attribute as a key of *both* GSI and LSI on the same table works.
There's a high risk that it won't work: After all, the LSI should force the
attribute to become a real column (to which base reads and writes go), but
the GSI will use a computed column which reads from ":attrs", no? Well,
it turns out that view.cc's value_getter::operator() always had a
surprising exception which "rescues" this test and makes it pass: Before
using a computed column, this code checks if a base-table column with the
same name exists, and if it does, it is used instead of the computed column!
It's not clear why this logic was chosen, but it turns out to be really
useful for making the test in this patch pass. And it's important that if
we ever change that unintuitive behavior, we will have this test as a
regression test.

The new test unsurprisingly passes on current Scylla because its
implementation of GSI and LSI is still the same. But it's an important
regression test for when we change the GSI implementation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 13:14:49 +03:00
Nadav Har'El
7563d0a8a1 test/alternator: expand another test to use different write operations
Expand another Alternator test (test_gsi.py::test_gsi_missing_attribute)
to write items not just using PutItem, but also using UpdateItem and
BatchWriteItem. There is a risk that these different operations use
slightly different code paths - so better check all of them and not
just PutItem.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 13:14:49 +03:00
Nadav Har'El
4d02beec53 test/alternator: test GSIs with different key types
All of the tests in test/alternator/test_gsi.py use strings as the GSI's
keys. This tests a lot of GSI functionality, but we implicitly assumed that
our implementation used an already-correct and already-tested implementation
of key columns and MV, which if it works for one type, works for other types
as well.

This assumption will no longer hold if we reimplement GSI on a "computed
column" implementation, which might run different code for different types
of GSI key attributes (the supported types are "S"tring, "B"ytes, and
"N"umber).

So in this patch we add tests for writing and reading different types of
GSI key attributes. These tests showed their importance as regression
tests when the first draft of the GSI reimplementation series failed them.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 13:14:49 +03:00
Nadav Har'El
80a0798e77 alternator: better error message in some cases of key type mismatch
Alternator uses a common function get_typed_value() to read the values
of key attribute and confirm they have the expected type (key attributes
have a fixed type in the schema). If the type is wrong, we want to print
a "Type mismatch" error message.

But the current implementation did the checks in the wrong order, and
as a result could print a "Malformed value object" message instead of a
"Type mismatch". That could happen if the wrong type is a boolean, map,
list, or basically any type whose JSON representation is not a string.
The allowed key types - bytes, string and number - all have string
representations in JSON, but still we should first report the mismatched
type and only report the "Malformed object" if the type matches but the
JSON is faulty.
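
The intended order of the checks can be sketched in Python as follows. This is only an illustration of the ordering, not the actual Alternator code; `get_typed_value_sketch` and `ValidationError` are hypothetical stand-ins, and values are modeled as DynamoDB-style JSON such as `{"S": "abc"}`.

```python
class ValidationError(Exception):
    pass


def get_typed_value_sketch(value, expected_type):
    """Return the payload of a key attribute after validating it."""
    if not isinstance(value, dict) or len(value) != 1:
        raise ValidationError("Malformed value object")
    (actual_type, payload), = value.items()
    # First report a wrong type...
    if actual_type != expected_type:
        raise ValidationError(
            f"Type mismatch: expected {expected_type}, got {actual_type}")
    # ...and only then complain about a payload that is not a string.
    if not isinstance(payload, str):
        raise ValidationError("Malformed value object")
    return payload
```

With this order, `{"BOOL": True}` passed for an "S" key reports a type mismatch, while `{"S": 5}` reports a malformed value.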

In addition to fixing the error message, we fix an existing test which
complained in a comment (but ignored) that the error message in some
case (when trying to use a map where a key is expected) was the strange
"Malformed value object" instead of the expected "Type mismatch".

The next patch will add an additional reproducer for this problem and
its fix. That test will do:

```
    with pytest.raises(ClientError, match='ValidationException.*mismatch'):
        test_table_gsi_6.put_item(Item={'p': p, 's': True})
```
I.e., it tries to set a boolean value for a string key column, and
expect to get the "Type mismatch" error and not the ugly "Malformed
value object".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 13:14:49 +03:00
Nadav Har'El
624ed32278 test/alternator: test for more elaborate GSI updates
Most tests in test_gsi.py involve simple updates to a GSI, just
creating a GSI row. Although a couple of tests did involve more
complex operations (such as an update requiring deleting an old row
from the GSI and inserting a new one), we did not have a single
organized test designed to check all these cases, so we add one in
this patch.

This test (test_update_gsi_pk) will be important for verifying
the low-level implementation of the new GSI implementation that
we plan to base on computed columns. Early versions of that code
passed many of the simpler tests, but not this one.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 13:14:49 +03:00
Nadav Har'El
65d4ddf093 test/alternator: strengthen tests for empty attribute values
We soon plan to refactor Alternator's GSI and change the validation of
values set in attributes which are GSI keys. It's important to test that
when updating attributes that are *not* GSI keys - and are either base-
table keys or normal non-key attributes - the validation didn't change.
For example, empty strings are still not allowed in base-table key
attributes, but are allowed (since May 2020 in DynamoDB) in non-key
attributes.

We did have tests in this area, but this patch strengthens them -
adding a test for non-key attribute, and expanding the key-attribute
test to cover the UpdateItem and BatchWriteItem operations, not just
PutItem.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 13:14:41 +03:00
Avi Kivity
9a5061209f Merge '[test.py] Enable allure for python test' from Andrei Chekun
To enhance the test reports UX:
1. switching off/on passed/failed/skipped test for better visibility
2. better searching in test results
3. understanding the trends of execution for each test
4. better configurability of the final report

Enable allure adapter for all python tests.
Add tags and parameters to the test to be able to distinguish them across modes and runs.

Related: https://github.com/scylladb/qa-tasks/issues/1665

Related: https://github.com/scylladb/scylladb/pull/19335

Related: https://github.com/scylladb/scylladb/pull/18169

Closes scylladb/scylladb#19942

* github.com:scylladb/scylladb:
  [test.py] Clean duplicated arg for test suite
  [test.py] Enable allure for python test
2024-09-09 12:53:00 +03:00
Nadav Har'El
5859daed68 test/alternator: fix typo in test_batch.py
Two tests had a typo 'item' instead of 'Item'. If Scylla had a bug, this
could have caused these tests to miss the bug.

Scylla also passes the fixed test, because Scylla's behavior is correct.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 12:09:25 +03:00
Nadav Har'El
1f8e39f680 test/alternator: more checks for GSI-key attribute validation
When an attribute is a GSI key, DynamoDB imposes certain rules when
writing values for it - it must be of the declared type for that key,
and can't be an empty string. We had tests for this, but all of them
did the write using the PutItem operation.

In this patch we also test the same things using the UpdateItem and
BatchWriteItem operations. Because Scylla has different code paths
for these three operations, and each code path needs to remember to
call the validation function, all three should be checked and not just
PutItem.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 12:09:25 +03:00
Nadav Har'El
cf5d7ce212 Alternator: drop unneeded "IS NOT NULL" clauses in MV of GSI/LSI
Scylla's materialized views naturally skip any base rows where the view's
key isn't set (is NULL), because we can't create a view row with a null
key. To make the user aware that this is happening, the user is required
to add "WHERE ... IS NOT NULL" for the view's key columns when defining
the view. However, the only place that these extra IS NOT NULL clauses
are checked is in the CQL "CREATE MATERIALIZED VIEW" statement - they
are completely ignored in all other places in the code.

In particular, when we create a materialized view in Alternator (GSI or
LSI), we don't have to add these "IS NOT NULL" clauses, as they are
outright ignored. We didn't know they were ignored, and made an effort
to add them - but no matter how incorrectly we did it, it didn't matter :-)
In commit 2bf2ffd3ed it turned out we had a
typo that caused the wrong column name to be printed. Also, even today we
are still missing base key columns that aren't listed as a view key in
Alternator but still added as view clustering keys in Scylla - and again
the fact these were missing also didn't matter. So I think it's time to
stop pretending, and stop calculating these "IS NOT NULL" strings, so
this patch outright removes them from the Alternator view-creation code.

Beyond being a nice cleanup of unnecessary and inaccurate code, it
will also be necessary when we allow in later patches to index for
an Alternator attribute "x" not a real column x in the base table but
rather an element in the ":attrs" map - so adding a "x IS NOT NULL" isn't
only unnecessary, it is outright illegal: The expression evaluation code,
even though it doesn't do anything with the "IS NOT NULL" expression,
still verifies that "x" is a valid column, which it isn't.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 12:09:25 +03:00
Nadav Har'El
8beaa9d10e test/alternator: add more checks for adding/deleting a GSI
We already have tests for the feature of adding or removing a GSI from
an existing table, which Alternator doesn't yet support (issue #11567).
In this patch we add another check, how after a GSI is added, you can
no longer add items with the wrong type for the indexed type, and after
removing a GSI, you can. The expanded tests pass on DynamoDB, and
obviously still xfail on Alternator because the feature is not yet
implemented.

Refs #11567.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 12:09:25 +03:00
Nadav Har'El
ce19311ab3 test/alternator: ensure table deletions in test_gsi.py
Most of the Alternator tests are careful to unconditionally remove the test
tables, even if the test fails. This is important when testing on a shared
database (e.g., DynamoDB) but also useful to make clean shutdown faster
as there should be no user table to flush.

We missed a few such cases in test_gsi.py, and fixed some of them in
commit 59c1498338 but still missed a few,
and this patch fixes some more instances of this problem.
We do this by using the context manager new_test_table() - which
automatically deletes the table when done - instead of the function
create_test_table() which needs an explicit delete at the end.
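
The benefit of the context-manager style can be sketched with a stand-in database. This is a minimal illustration, not the helper from the Alternator test utilities; `FakeDB` and `new_test_table_sketch` are hypothetical names:

```python
from contextlib import contextmanager


class FakeDB:
    """Stand-in for a database connection; it only tracks table names."""

    def __init__(self):
        self.tables = set()

    def create_table(self, name):
        self.tables.add(name)

    def delete_table(self, name):
        self.tables.discard(name)


@contextmanager
def new_test_table_sketch(db, name):
    db.create_table(name)
    try:
        yield name
    finally:
        # Runs even if the body of the "with" block raises,
        # so a failing test cannot leave the table behind.
        db.delete_table(name)
```

A test written as `with new_test_table_sketch(db, "t") as table:` needs no explicit delete at the end, which is also why most changed lines are reindents.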

There are no functional changes in this patch - most of the lines
changed are just reindents.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-09-09 12:09:25 +03:00
Kefu Chai
ccbd3eb9f7 main: do not register redis and alternator services if not enabled
in main.cc, we start redis with `ss.local().register_protocol_server()`
only if it is enabled. but `storage_service` always calls
`stop_server()` with _all_ registered server, no matter if they have
started or not. in general, it does not hurt. for instance,

`redis::controller::stop_server()` is a noop, if the controller
is not started. but `storage_service` still prints logging messages
like:
```
INFO  2024-09-04 11:20:02,224 [shard 0:main] storage_service - Shutting down redis server
INFO  2024-09-04 11:20:02,224 [shard 0:main] storage_service - Shutting down redis server was successful
```

this could be confusing or at least distracting when a field engineer
looks at the log. also, please note, `redis_port` and `redis_ssl_port`
cannot be changed dynamically once scylla server is up, so we do not
need to worry about "what if the redis server is started at runtime,
how can it be stopped?".

the same applies to alternator service.

in this change, to avoid surprises, we conditionally register the
protocol servers with the storage service based on their enabled statuses.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20472
2024-09-09 08:44:50 +03:00
Avi Kivity
58713f3080 types: remove some unused free functions
These functions are unused, so safe to remove, and reduce the work
to convert to managed_bytes{,_view}.

Closes scylladb/scylladb#20482
2024-09-09 08:36:33 +03:00
Kefu Chai
720997d1de cql3/statements: mark format string as constexpr const
after switching over to the new `seastar::format()` which enables
the compile-time format check, the fmt string should be a constexpr,
otherwise `fmt::format()` is not able to perform the check at compile
time.

to prepare for bumping up the seastar module to a version which
contains the change of `seastar::format()`, let's mark the format
string with `constexpr const`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20484
2024-09-09 08:35:45 +03:00
Piotr Dulikowski
6f3d0af994 test: topology_custom/test_hints: consistency test for decommission
Adds the test_hints_consistency_during_decommission test which
reproduces the failure observed in scylladb/scylla-dtest#4582. It uses
error injections, including the newly added
topology_coordinator_pause_after_streaming injection, to reliably
orchestrate the scenario observed there.

In a nutshell, the test makes sure to replay hints after streaming
during decommission has finished, but before the cluster switches to
reading from new replicas. Without the fix, hints would be replayed to
the decommissioned node and then would be lost forever after the cluster
starts reading from new replicas.
2024-09-08 10:51:38 +02:00
Piotr Dulikowski
30d53167c9 test: topology_custom/test_hints: move sync point helpers to top level
Move create_sync_point and await_sync_point from the scope of the
test_sync_point test to the file scope. They will be used in a test that
will be introduced in the commit that follows.
2024-09-08 10:51:38 +02:00
Piotr Dulikowski
a75d0c0bfa test: topology/util: extract find_server_by_host_id
Move it out from test_mv_tablets_replace.py. It will be used by a test
introduced in a later commit.
2024-09-08 10:51:38 +02:00
Piotr Dulikowski
61ac0a336d hints: send hints with CL=ALL if target is leaving
Currently, when attempting to send a hint, we might choose its
recipients in one of two ways:

- If the original destination is a natural endpoint of the hint, we only
  send the hint to that node and none other,
- Otherwise, we send the hint to all current replicas of the mutation.

There is a problem when we decommission a node: while data is streamed
away from that node, it is still considered to be a natural endpoint of
the data that it used to own. Because of that, it might happen that a
hint is sent directly to it but streaming will miss it, effectively
resulting in the hint being discarded.

As sending the hint _only_ to the leaving replica is a rather bad idea,
send the hint to all replicas also in the case when the original
destination of the hint is leaving.

Note that this is a conservative fix written only with the decommission
+ vnode-based keyspaces combo in mind. In general, such "data loss" can
occur in other situations where the replica set is changing and we go
through a streaming phase, i.e. other topology operations in case of
vnodes and tablet load balancing. However, the consistency guarantees of
hinted handoff in the face of topology changes are not defined and it is
not clear what they should be, if there should be any at all. The
picture is further complicated by the fact that hints are used by
materialized views, and sending view updates to more replicas than
necessary can introduce inconsistencies in the form of "ghost rows".
This fix was developed in response to a failing test which checked the
hint replay + decommission scenario, and it makes it work again.

Fixes scylladb/scylla-dtest#4582
Refs scylladb/scylladb#19835
2024-09-08 10:50:59 +02:00
Piotr Dulikowski
8abb06ab82 hints: inline do_send_one_mutation
It's a small method and it is only used once in send_one_mutation.
Inlining it lets us get rid of its declaration in the header - now, if
one needs to change the variables passed from one function to another,
it is no longer necessary to change the header.
2024-09-08 07:19:35 +02:00
Avi Kivity
ab32ce6b45 Merge 'Coroutinize sstable::read_summary() method' from Pavel Emelyanov
Shorter and simpler this way. Hopefully it doesn't sit on critical paths

Closes scylladb/scylladb#20460

* github.com:scylladb/scylladb:
  sstables: Fix indentation after previous patch
  sstables: Coroutinize sstable::read_summary()
2024-09-06 18:45:54 +03:00
Kefu Chai
aeaeaf345d compaction: use structured binding when appropriate
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20473
2024-09-06 18:17:48 +03:00
Kamil Braun
427ad2040f Merge 'test: randomized failure injection for Raft-based topology' from Evgeniy Naydanov
The idea of the test is to have a cluster where one node is stressed with injections and failures and the rest of the cluster is used to make progress of the raft state machine.

To achieve this, the following two lists are introduced in the PR:

  - ERROR_INJECTIONS in error_injections.py
  - CLUSTER_EVENTS in cluster_events.py

Each cluster event is an async generator which has 2 yields and should be used in the following way:

   0. Start the generator:
       ```python
       >>> cluster_event_steps = cluster_event(manager, random_tables, error_injection)
       ```

   1. Run the prepare part (before the first yield)
       ```python
       >>> await anext(cluster_event_steps)
       ```

   2.  Run the cluster event itself (between the yields)
       ```python
       >>> await anext(cluster_event_steps)
       ```

   3. Run the check part (after the second yield)
       ```python
       >>> await anext(cluster_event_steps, None)
       ```

Closes scylladb/scylladb#16223

* github.com:scylladb/scylladb:
  test: randomized failure injection for Raft-based topology
  test: error injections for Raft-based topology
  [test.py] topology.util: add get_non_coordinator_host() function
  [test.py] random_tables: add UDT methods
  [test.py] random_tables: add CDC methods
  [test.py] api: get scylla process status
  [test.py] api: add expected_server_up_state argument to server_add()
2024-09-06 14:00:41 +02:00
Pavel Emelyanov
226fd03bae Merge 'service/qos: remove unused marked_for_deletion field from service_level struct' from Piotr Dulikowski
The `service_level::marked_for_deletion` field is always set to `false`. It might have served some purpose in the past, but now it can be just removed, simplifying the code and eliminating confusion about the field.

This is just code cleanup, no backport is needed.

Closes scylladb/scylladb#20452

* github.com:scylladb/scylladb:
  service/qos: remove the marked_for_deletion parameter
  service/qos: add constructors to service_level
2024-09-06 11:44:25 +03:00
Kamil Braun
52fdf5b4c9 test: test_raft_no_quorum: increase raft timeout in debug mode
The test cases in this file use an error injection to reduce raft group
0 timeouts (from the default 1 minute), in order to speed up the tests;
the scenarios expect these timeouts to happen, so we want them to happen
as quickly as possible, but we don't want to reduce timeouts so much that
it will make other operations fail when we don't expect them to (e.g.
when the test wants to add a node to the cluster).

Unfortunately the selected 5 seconds in debug mode was not enough and
made the tests flaky: scylladb/scylladb#20111.

Increase it to 10 seconds. This unfortunately will slow down these tests
as they have to sometimes wait for 10 seconds for the timeout to happen.
But better to have this than a flaky test.

Fixes: scylladb/scylladb#20111

Closes scylladb/scylladb#20320
2024-09-06 11:40:09 +03:00
Avi Kivity
384a09585b repair: row_level: repair_get_row_diff_with_rpc_stream_process_op: simplify return value
During review of 0857b63259 it was noticed that the function

  repair_get_row_diff_with_rpc_stream_process_op()

and its _slow_path callee only ever return stop_iteration::no (or throw
an exception). As such, its return value is useless, and in fact the
only caller ignores it. Simplify by returning a plain future<>.

Closes scylladb/scylladb#20441
2024-09-06 11:39:21 +03:00
Kefu Chai
034c1df29b auth/authentication_options: move fmt::formatter up
so that it is accessible from its caller. if we enforce the
compile-time format string check, the formatter would need the access to
the specialization of `fmt::formatter` of the arguments being formatted.
to be prepared for this change, let's move the `fmt::formatter`
specialization up, otherwise we'd have following error after switching
to the compile-time format string check introduced by a recent seastar
change:

```
In file included from ./auth/authenticator.hh:22:
./auth/authentication_options.hh:50:49: error: call to consteval function 'fmt::basic_format_string<char, auth::authentication_option &>::basic_format_string<char[32], 0>' is not a constant expression
   50 |             : std::invalid_argument(fmt::format("The {} option is not supported.", k)) {
      |                                                 ^
./auth/authentication_options.hh:57:13: error: explicit specialization of 'fmt::formatter<auth::authentication_option>' after instantiation
   57 | struct fmt::formatter<auth::authentication_option> : fmt::formatter<string_view> {
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/base.h:1228:17: note: implicit instantiation first required here
 1228 |     -> decltype(typename Context::template formatter_type<T>().format(
      |                 ^
In file included from replica/distributed_loader.cc:30:
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20447
2024-09-06 09:12:38 +03:00
Pavel Emelyanov
527fc9594a sstables: Fix indentation after previous patch
And move the comment inside if while at it, it looks better in there
(and makes less churn in the patch itself)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-06 08:43:08 +03:00
Pavel Emelyanov
f7325586f3 sstables: Coroutinize sstable::read_summary()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-06 08:43:07 +03:00
Evgeniy Naydanov
dd99cf197d test: randomized failure injection for Raft-based topology
The idea of the test is to have a small cluster, where one node
is stressed with injections and failures and the rest of
the cluster is used to make progress of the Raft state machine.

To achieve this, the following two lists are introduced in the commit:

  - ERROR_INJECTIONS in error_injections.py
  - CLUSTER_EVENTS in cluster_events.py

Each cluster event is an async generator which has 2 yields and should be
used in the following way:

   0. Start the generator:
       >>> cluster_event_steps = cluster_event(manager, random_tables, error_injection)

   1. Run the prepare part (before the first yield)
       >>> await anext(cluster_event_steps)

   2. Run the cluster event itself (between the yields)
       >>> await anext(cluster_event_steps)

   3. Run the check part (after the second yield)
       >>> await anext(cluster_event_steps, None)
2024-09-05 22:11:32 +00:00
Evgeniy Naydanov
769424723b test: error injections for Raft-based topology
Add following error injections:
 - stop_after_init_of_system_ks
 - stop_after_init_of_schema_commitlog
 - stop_after_starting_gossiper
 - stop_after_starting_raft_address_map
 - stop_after_starting_migration_manager
 - stop_after_starting_commitlog
 - stop_after_starting_repair
 - stop_after_starting_cdc_generation_service
 - stop_after_starting_group0_service
 - stop_after_starting_auth_service
 - stop_during_gossip_shadow_round
 - stop_after_saving_tokens
 - stop_after_starting_gossiping
 - stop_after_sending_join_node_request
 - stop_after_setting_mode_to_normal_raft_topology
 - stop_before_becoming_raft_voter
 - topology_coordinator_pause_after_updating_cdc_generation
 - stop_before_streaming
 - stop_after_streaming
 - stop_after_bootstrapping_initial_raft_configuration
2024-09-05 22:11:31 +00:00
Evgeniy Naydanov
ac4ffbad5c [test.py] topology.util: add get_non_coordinator_host() function
Add get_non_coordinator_host() function which returns
ServerInfo for the first host which is not a coordinator
or None if there is no such host.

Also rework get_coordinator_host() to not fail if some
of the hosts don't have a host id.
2024-09-05 22:11:31 +00:00
Evgeniy Naydanov
d95d698601 [test.py] random_tables: add UDT methods
Add .add_udt() / .drop_udt() methods.
2024-09-05 22:11:31 +00:00
Evgeniy Naydanov
8cb442ca50 [test.py] random_tables: add CDC methods
Add .enabled_cdc() / .disable_cdc() methods.
2024-09-05 22:11:31 +00:00
Evgeniy Naydanov
a7119cf420 [test.py] api: get scylla process status
Add `server_get_process_status(server_id)` API call and
wait_for_scylla_process_status() helper function.
2024-09-05 22:11:31 +00:00
Evgeniy Naydanov
241bbb4172 [test.py] api: add expected_server_up_state argument to server_add()
Allow server_add() to return once the server reaches a specified state.
One of:
 - PROCESS_STARTED
 - HOST_ID_QUERIED (previously called NOT_CONNECTED)
 - CQL_CONNECTED (renamed from CONNECTED)
 - CQL_QUERIED (was just QUERIED)

Also, rename CqlUpState to ServerUpState and move to internal_types.
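
A rough sketch of how such states could be compared, assuming they are ordered as listed above. The numeric values and the `reached` helper are assumptions made for illustration, not the actual test.py definitions.

```python
from enum import IntEnum


class ServerUpState(IntEnum):
    # Assumed ordering: each state implies the ones before it.
    PROCESS_STARTED = 1
    HOST_ID_QUERIED = 2
    CQL_CONNECTED = 3
    CQL_QUERIED = 4


def reached(current: ServerUpState, expected: ServerUpState) -> bool:
    # Hypothetical helper: True once the server is at or past the expected state.
    return current >= expected
```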
2024-09-05 22:11:31 +00:00
Pavel Emelyanov
f02a686115 schema: Ditch make_shared_schema() helper
Now it's unused

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 19:34:00 +03:00
Pavel Emelyanov
d045aa6df7 test: Tune up indentation in uncompressed_schema()
After it was switched to use schema builder, the indentation of untouched
lines deserves one extra space.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 19:33:29 +03:00
Pavel Emelyanov
a1deba0779 test: Make tests use schema_builder instead of make_shared_schema
Everything but the perf test is a straightforward switch.

The perf test generated regular columns dynamically via a vector; with
the builder, the vector goes away.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 19:31:30 +03:00
Avi Kivity
c57b8dd0bf repair: row_level: restore indentation 2024-09-05 18:38:43 +03:00
Avi Kivity
710977ef88 repair: row_level: coroutinize repair_service::insert_repair_meta()
Some of the indentation was broken, and is partially repaired by this
change.
2024-09-05 17:59:42 +03:00
Avi Kivity
f23a32ed84 repair: row_level: coroutinize repair_meta::get_full_row_hashes() 2024-09-05 17:56:27 +03:00
Avi Kivity
607747beb1 repair: row_level: coroutinize repair_meta::apply_rows_on_follower() 2024-09-05 17:55:07 +03:00
Avi Kivity
89d4394d12 repair: row_level: coroutinize repair_meta::clear_working_row_buf() 2024-09-05 17:52:32 +03:00
Pavel Emelyanov
69a5ec69c4 test: Use table storage options in sstable_directory_test
When creating sstables this test allocates temporary local options.
That works, because this test doesn't run on object storage, but it's
more correct to pick storage options from the table at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20440
2024-09-05 17:48:25 +03:00
Avi Kivity
4cfc25f8d7 repair: row_level: coroutinize get_common_diff_detect_algorithm()
The function is threaded, but the inner lambda can be coroutinized.
2024-09-05 17:47:27 +03:00
Michael Litvak
9545e0a114 view: test view_build_status table with node replace
Add a test replacing a node and verifying the contents of the
view_build_status table are updated as expected, having rows for the new
node and no rows for the old node.
2024-09-05 15:42:35 +03:00
Michael Litvak
3ca5dd537f test/pylib: use view_build_status_v2 table in wait_for_view
Change the util function wait_for_view to read the view build status
from the system.view_build_status_v2 table which replaces
system_distributed.view_build_status.
The old table can still be used, but it is less efficient because it's
implemented as a virtual table which reads from the v2 table, and this
can cause slowness in tests. It's therefore better to read directly from
the v2 table.
The additional util function wait_for_view_v1 reads from the old table.
This may be needed in upgrade tests if the v2 table is not available
yet.
2024-09-05 15:42:35 +03:00
Michael Litvak
5c95aaae0d view_builder: common write view_build_status function
When writing to the view_build_status we have common logic related to
upgrade and deciding whether to write to sys_dist ks or group0.
Move this common logic to a generic function used by all functions
writing to the table.
2024-09-05 15:42:35 +03:00
Michael Litvak
c1f3517a75 view_builder: improve migration to v2 with intermediate phase
Add an intermediate phase to the view builder migration to v2 where we
write to both the old and new table in order to not lose writes during
the migration.
We add an additional view builder version v1_5 between v1 and v2 where
we write to both tables. We perform a barrier before moving to v2 to
ensure all the operations to the old table are completed.
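
The three phases can be pictured with a small sketch. The enum and the `write_targets` helper are hypothetical names used only to illustrate which table each version writes to; the real logic lives in the view builder.

```python
from enum import Enum


class ViewBuilderVersion(Enum):
    V1 = "v1"
    V1_5 = "v1_5"
    V2 = "v2"


OLD_TABLE = "system_distributed.view_build_status"
NEW_TABLE = "system.view_build_status_v2"


def write_targets(version: ViewBuilderVersion) -> list[str]:
    # Hypothetical helper: tables a status write goes to in each phase.
    if version is ViewBuilderVersion.V1:
        return [OLD_TABLE]
    if version is ViewBuilderVersion.V1_5:
        # Intermediate phase: write to both so no update is lost mid-migration.
        return [OLD_TABLE, NEW_TABLE]
    return [NEW_TABLE]
```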
2024-09-05 15:42:35 +03:00
Michael Litvak
446ad3c184 view: delete node rows from view_build_status on node removal
When a node is removed we want to clean its rows from the
view_build_status table.
Now, when removing a node and generating the topology state update, we
also generate the mutations to delete all the possible rows belonging to
the node from the table.
2024-09-05 15:42:35 +03:00
Michael Litvak
08462aaff7 view: sanitize view_build_status during migration
When migrating the view_build_status to v2, skip adding any leftover
rows that don't correspond to an existing node or an existing view.

Previously such rows could have been created and not cleaned, for
example when a node is removed.
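
As an illustration of the filtering idea, here is a minimal sketch. The `(host_id, view_name, status)` row layout and the `sanitize_rows` helper are assumptions made for the example, not the actual migration code.

```python
def sanitize_rows(rows, existing_hosts, existing_views):
    # Keep only rows whose host and view both still exist (hypothetical layout).
    return [
        (host, view, status)
        for host, view, status in rows
        if host in existing_hosts and view in existing_views
    ]


rows = [
    ("host-a", "ks.mv1", "SUCCESS"),
    ("host-removed", "ks.mv1", "STARTED"),    # leftover from a removed node
    ("host-a", "ks.dropped_mv", "SUCCESS"),   # leftover from a dropped view
]
kept = sanitize_rows(rows, {"host-a"}, {"ks.mv1"})
```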
2024-09-05 15:42:35 +03:00
Michael Litvak
78d6ff6598 view: make old view_build_status table a virtual table
After migrating the view build status from
system_distributed.view_build_status to system.view_build_status_v2, we
set system_distributed.view_build_status to be a virtual table, such
that reading from it is actually reading from the underlying new table.

The reason for this is that we want to keep compatibility with the old
table, since it exists also in Cassandra and it is used by various external
tools to check the view build status. Making the table virtual makes the
transition transparent for external users.

The two tables are in different keyspaces and have different shard
mapping. The v1 table is a distributed table with a normal shard
mapping, and the v2 table is a local table using the null sharder. The
virtual reader works by constructing a multishard reader which reads the rows
from shard zero, and then filtering it to get only the rows owned by the
current shard.
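
The read-then-filter step might look roughly like the sketch below. `shard_of` is a stand-in for the real sharder (an assumption; it is just a deterministic toy hash), and rows are modelled as tuples whose first element is the partition key.

```python
def shard_of(key: str, shard_count: int) -> int:
    # Toy replacement for the real shard mapping, for illustration only.
    return sum(key.encode()) % shard_count


def rows_owned_by(rows_from_shard_zero, this_shard, shard_count):
    # Every shard sees all rows read from shard zero and keeps only its own.
    return [
        row for row in rows_from_shard_zero
        if shard_of(row[0], shard_count) == this_shard
    ]


all_rows = [("ks.mv1", "host-a"), ("ks.mv2", "host-a"), ("ks.mv3", "host-b")]
per_shard = [rows_owned_by(all_rows, shard, 3) for shard in range(3)]
```

Because each key maps to exactly one shard, the per-shard results partition the full row set without overlap.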
2024-09-05 15:42:35 +03:00
Michael Litvak
09eadcff08 replica: move streaming_reader_lifecycle_policy to header file
move the class streaming_reader_lifecycle_policy to a header file in
order to make it reusable in other places.
2024-09-05 15:42:35 +03:00
Michael Litvak
22f4f1fa49 view_builder: test view_build_status_v2
Add tests to verify the new view_build_status_v2 is used by the
view_builder and can be read from all nodes with the expected values.
Also test a migration from the v1 layout to v2.
2024-09-05 15:42:35 +03:00
Michael Litvak
fcf66ad541 storage_service: add view_build_status to raft snapshot
Include the table system.view_build_status_v2 in the raft snapshot, and
also the view_builder version parameter.
2024-09-05 15:42:30 +03:00
Michael Litvak
8d25a4d678 view_builder: migration to v2
Migrate view_builder to v2, to store the view build status of all nodes
in the group0 based table view_build_status_v2.

Introduce a feature view_build_status_on_group0 so we know when all
nodes are ready to migrate and use the new table.

A new cluster is initialized to use v2. Otherwise, the topology coordinator
initiates the migration when the feature is enabled, if it was not done
already.

The migration reads all the rows in the v1 table and writes them via
group0 to the v2 table, together with a mutation that updates the
view_builder parameter in scylla_local to v2. When this mutation is
applied, it updates the view_builder service to start using the v2
table.
2024-09-05 15:41:04 +03:00
Michael Litvak
f3887cd80b db:system_keyspace: add view_builder_version to scylla_local
Add a new scylla_local parameter view_builder_version, and functions to
read and mutate the value.
The version value defaults to v1 if it doesn't exist in the table.
2024-09-05 15:41:04 +03:00
Michael Litvak
d58a8930c4 view_builder: read view status from v2 table
Update the view_status function to read from the new
view_build_status_v2 table when enabled.
The code to read and extract the values is identical for v1 and v2 except that it
accesses a different keyspace and table, so the common code is extracted
to the view_status_common function and used by both v1 and v2 flows with
appropriate parameters.
2024-09-05 15:41:04 +03:00
Michael Litvak
05d18b818f view_builder: introduce writing status mutations via raft
Introduce the announce_with_raft function as an alternative to writing view build
status mutations to the table in system_distributed. Instead, we can
apply the mutations via group0 operation to the view_build_status_v2
table.
All the view_builder functions that write to the view_build_status table
can be configured by a flag to either write the legacy way or via raft.
2024-09-05 15:41:04 +03:00
Michael Litvak
b8c7a10ae6 view_builder: pass group0_client and qp to view_builder
Store references of group0_client and query_processor in the
view_builder service.
They are required for generating mutations and writing them via group0.
2024-09-05 15:41:04 +03:00
Michael Litvak
b2332c5a72 view_builder: extract sys_dist status operations to functions
Extract all the update and read operations of a view build status in the
table system_distributed.view_build_status to separate functions.
2024-09-05 15:41:04 +03:00
Michael Litvak
bf4a58bf91 db:system_keyspace: add view_build_status_v2 table
add the table system.view_build_status_v2 with the same schema as
system_distributed.view_build_status.
2024-09-05 15:41:04 +03:00
Gleb Natapov
807e37502a db/consistency_level: do not use result from heat weighted load balancer if it contains duplicates
Because of https://github.com/scylladb/scylladb/issues/9285, the heat-weighted
load balancer may sometimes return the same node twice. This may cause wrong
data to be read or unexpected errors to be returned to a client. Since
the original bug is not easy to fix and is rare, let's introduce a
workaround: check for duplicates and use the non-HWLB result if
one is found.
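
The workaround amounts to a duplicate check with a fallback. A minimal sketch, assuming replicas are represented as hashable identifiers and that `pick_replicas` is a hypothetical helper rather than the real function:

```python
def pick_replicas(hwlb_result, fallback_result):
    # If the heat-weighted result repeats a node, do not trust it.
    if len(set(hwlb_result)) != len(hwlb_result):
        return fallback_result
    return hwlb_result
```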

Fixes scylladb/scylladb#20430

Closes scylladb/scylladb#20414
2024-09-05 15:21:35 +03:00
Wojciech Mitros
c1b0434c16 test: finish mv view update explicitly instead of relying on delay duration
When testing mv admission control, we perform a large view update
and check if the following view update can be admitted due to the
high view backlog usage. We rely on a delay which keeps the backlog
high for longer to make sure the backlog is still increased during
the second write. However, in some test runs the delay is not long
enough, causing the second write to miss the large backlog and not
hit admission control.

In this patch we keep the increased backlog high using another
injection instead of relying on a delay to make absolute sure
that the backlog is still high during the second write.

Fixes scylladb/scylladb#20382

Closes scylladb/scylladb#20445
2024-09-05 15:08:04 +03:00
Lakshmi Narayanan Sreethar
7c5efab7d5 cql-pytest: add test to verify consider_only_existing_data compaction option
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:34:13 +05:30
Lakshmi Narayanan Sreethar
68a902f74a tools/scylla-nodetool: add consider-only-existing-data option to compact command
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:34:06 +05:30
Lakshmi Narayanan Sreethar
84d06a13c7 api: compaction: add consider_only_existing_data option
Added a new parameter `consider_only_existing_data` to major compaction
API endpoints. When enabled, major compaction will:

- Force-flush all tables.
- Force a new active segment in the commit log.
- Compact all existing SSTables and garbage-collect tombstones by only
  checking the SSTables being compacted. Memtables, commit logs, and
  other SSTables not part of the compaction will not be checked, as they
  will only contain newer data that arrived after the compaction
  started.

The `consider_only_existing_data` is passed down to the compaction
descriptor's `gc_check_only_compacting_sstables` option to ensure that
only the existing data is considered for garbage collection.

The option is also passed to the `maybe_flush_commitlog` method to make
sure all the tables are flushed and a new active segment is created in
the commit log.

Fixes #19728

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
98bc44f900 compaction: consider gc_check_only_compacting_sstables when deducing max purgeable timestamp
When gc_check_only_compacting_sstables is enabled,
get_max_purgeable_timestamp should not check memtables and other
sstables that are not part of the compaction to deduce the max purgeable
timestamp.
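
A simplified model of the idea, not the actual implementation: the inputs are assumed to be the minimum write timestamps of the memtables and of the sstables outside the compaction, and `MAX_TIMESTAMP` and `can_purge` are illustrative names.

```python
MAX_TIMESTAMP = 2**63 - 1  # assumed "no limit" value for this sketch


def max_purgeable_timestamp(memtable_min_ts, other_sstable_min_ts,
                            gc_check_only_compacting_sstables=False):
    if gc_check_only_compacting_sstables:
        # Data outside the compaction is assumed to be newer, so it is not consulted.
        return MAX_TIMESTAMP
    # Otherwise the oldest data outside the compaction limits what may be purged.
    return min([*memtable_min_ts, *other_sstable_min_ts], default=MAX_TIMESTAMP)


def can_purge(tombstone_ts, max_purgeable):
    # A tombstone is purgeable only if it is older than everything it might shadow.
    return tombstone_ts < max_purgeable
```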

Refs #19728

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
7b9ce8e040 compaction: do not check commitlog if gc_check_only_compacting_sstables is enabled
When the compaction_descriptor's gc_check_only_compacting_sstables flag
is enabled, create and pass a copy of the get_tombstone_gc_state that
will skip checking the commitlog.

Refs #19728

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
12fa40154b tombstone_gc_state: introduce with_commitlog_check_disabled()
Added a new method, `with_commitlog_check_disabled`, that returns a new
copy of the tombstone_gc_state but with commitlog check disabled. This
will be used by a following patch to disable commitlog checks during
compaction.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
5b8c6a8a5e compaction: introduce new option to check only compacting sstables for gc
Added new option, `gc_check_only_compacting_sstables`, to
compaction_descriptor to control the garbage collection behavior. The
subsequent patches will use this flag to decide if the garbage
collection has to check only the SSTables being compacted to collect
tombstones. This option is disabled for now and will be enabled based on
a new compaction parameter that will be added later in this patch
series.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
5e6bffc146 compaction: rename maybe_flush_all_tables to maybe_flush_commitlog
Major compaction flushes all tables as a part of flushing the commitlog.
After forcing new active segments in the commitlog, all the tables are
flushed to enable reclaim of older commitlog segments. The main goal is
to flush the commitlog and flushing all the tables is just a dependency.

Rename maybe_flush_all_tables to maybe_flush_commitlog so that it
reflects the actual intent of the major compaction code. Added a new
wrapper around database::flush_all_tables(),
database::flush_commitlog(), which is now called from
maybe_flush_commitlog.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar
fa2488cc83 compaction: maybe_flush_all_tables: add new force_flush param
Add a new parameter, `force_flush` to the maybe_flush_all_tables()
method. Setting `force_flush` to true will flush all the tables
regardless of when they were flushed last. This will be used by the new
compaction option in a following patch.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-09-05 17:25:45 +05:30
Laszlo Ersek
53524974db docs/dev/maintainer.md: clarify "Updating submodule references"
Before the introduction of "scripts/refresh-submodules.sh", there was
indeed some manual work for the maintainer to do, hence "publish your
work" must have sounded correct. Today, the phrase "publish your work"
sounds confusing.

Commit 71da4e6e79 ("docs: Document sync-submodules.sh script in
maintainer.md", 2020-06-18) should have arguably reworded the last step of
the submodule refresh procedure; let's do it now.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>

Closes scylladb/scylladb#20333
2024-09-05 13:57:32 +03:00
Pavel Emelyanov
1f0db29ef6 test: Remove unused directory semaphore
The with_sstable_dir() helper no longer needs one, it used to pass it as
argument to sstable_directory constructor, but now the directory doesn't
need it (takes semaphore via table object).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20396
2024-09-05 13:11:35 +03:00
Kefu Chai
b4fc24cc1f github: use needs.read-toolchain.outputs.image for build-scylla
so we don't need to hardwire the image on which we build scylla.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20370
2024-09-05 12:58:36 +03:00
Pavel Emelyanov
955391d209 sstable_directory: Fix indentation after previous patches
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:19:19 +03:00
Pavel Emelyanov
2febde24f3 sstable_directory: Use yielding lister in .handle_sstables_pending_delete()
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:19:19 +03:00
Pavel Emelyanov
02aac3e407 sstable_directory: Use yielding lister in .cleanup_column_family_temp_sst_dirs()
Indentation is deliberately left broken

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:19:19 +03:00
Pavel Emelyanov
ff77a677a6 sstable_directory: Use yielding lister in .prepare()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:19:19 +03:00
Pavel Emelyanov
7b5fe6bee6 sstable_directory: Shorten lister loop
Squash call to lister.get() and check for the returned value into
while()'s condition. This saves a few more lines of code as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:19:19 +03:00
Pavel Emelyanov
5dc266cefa sstable_directory: Use with_closeable() in .process()
The method already uses yielding lister, but handles the exceptions
explicitly. Use with_closeable() helper, it makes the code shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:19:19 +03:00
Pavel Emelyanov
7742b90cb1 directory_lister: Add noexcept default move-constructor
It's required to make it possible to push lister into with_closeable().
Its requirement of nothrow-move-constructible doesn't accept the
default-generated one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 11:10:21 +03:00
Nikos Dragazis
2450afb934 sstables: Replace assert with on_internal_error
The `skip()` method of the compressed data source implementation uses an
assert statement to check if the given offset is valid.

Replace this with `on_internal_error()` to fail gracefully. An invalid
offset shouldn't bring the whole server down.

Also, enhance the error message for unsynced compressed readers.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-09-05 11:03:54 +03:00
Pavel Emelyanov
da598a6210 test: Restore indentation after previous changes
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:38:01 +03:00
Pavel Emelyanov
e16c07c896 test: Threadify tombstone_in_tombstone2()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
28d016f312 test: Threadify range_tombstone_reading()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
7d567d07ad test: Threadify tombstone_in_tombstone()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
a34e38f070 test: Threadify broken_ranges_collection()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
eac4ec47f8 test: Threadify compact_storage_dense_read()
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
322c1ee9c5 test: Threadify compact_storage_simple_dense_read()
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
df71b3e446 test: Threadify compact_storage_sparse_read()
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
142ccc64fb test: Simplify test_range_reads() counting
It used to keep counter with the help of a smart pointer, now it can
just use on-stack variable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
a78ab2e998 test: Simplify test_range_reads() inner loop
It used to rely on bool (wrapped with pointer) and future<>-based loop
helper, now it can just break from the while loop.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
c84ae64562 test: Threadify test_range_reads() itself
And update its callers again.
Preserve no longer relevant local smart pointers until next patch.
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:33 +03:00
Pavel Emelyanov
253d53b6a1 test: Threadify test_range_reads() callers
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:36:00 +03:00
Pavel Emelyanov
fd8bb0c46c test: Threadify generate_clustered() itself
And update its callers again.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:35:59 +03:00
Pavel Emelyanov
f500ee690b test: Threadify generate_clustered() callers
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:34:54 +03:00
Pavel Emelyanov
08186c048d test: Threadify test_no_clustered test
And update its callers.
Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:26:25 +03:00
Pavel Emelyanov
5f0a40f959 test: Threadify nonexistent_key test
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 10:26:13 +03:00
Pavel Emelyanov
a150a63259 test: Squash two open_sstables() helper together
One accepts integer generations, another one accepts "generic" ones. The
latter is only called by the former, so no sense in keeping it around.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 09:08:40 +03:00
Pavel Emelyanov
4184c688ea test: Coroutinize open_sstables() helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-05 09:08:12 +03:00
Piotr Dulikowski
ecd53db3b0 service/qos: remove the marked_for_deletion parameter
It is always set to false and it doesn't seem to serve any function now.
2024-09-04 21:52:34 +02:00
Piotr Dulikowski
bae6076541 service/qos: add constructors to service_level
Add a default constructor and a constructor which explicitly
initializes all fields of the service_level structure.

This is done in order to make sure that removal of the
marked_for_deletion field can be done safely - otherwise, for example,
service_level could be aggregate-initialized with an incomplete list of
values for the fields, and removing marked_for_deletion which is in the
middle of the struct would cause the is_static field to be initialized
with the value that was designated for marked_for_deletion.

As a bonus, make sure that marked_for_deletion and is_static bool fields
are initialized in the default constructor to false in order to avoid
potential undefined behavior.
2024-09-04 21:52:13 +02:00
Avi Kivity
ec8590ae6c Merge 'Always pass abort_source& to raft_group0_client::hold_read_apply_mutex' from Kamil Braun
There are two versions of `raft_group0_client::hold_read_apply_mutex`, one takes `abort_source&`, the other doesn't. Modify all call sites that used the non-abort-source version to pass an `abort_source&`, allowing us to remove the other overload.

If there is no explicit reason not to pass an `abort_source&`, then one should be passed by default -- it often prevents hangs during shutdown.

---

No backport needed -- no known issues affected by this change.

Closes scylladb/scylladb#19996

* github.com:scylladb/scylladb:
  raft_group0_client: remove `hold_read_apply_mutex` overload without `abort_source&`
  storage_service: pass `_abort_source` to `hold_read_apply_mutex`
  group0_state_machine: pass `_abort_source` to `hold_read_apply_mutex`
  api: move `reload_raft_topology_state` implementation inside `storage_service`
2024-09-04 21:35:27 +03:00
Kefu Chai
fe0e961856 docs: do not install scylla/ppa repo when perform upgrade
for following reasons:

1. the ppa in question does not provide the build for the latest ubuntu's LTS release. it only builds for trusty, xenial, bionic and jammy. according to https://wiki.ubuntu.com/Releases, the latest LTS release is ubuntu noble at the time of writing.
2. the ppa in question does not provide the packages used in production. it does provide the package for *building* scylla
3. after we introduced the relocatable package, there is no need to provide extra user space dependencies apart from scylla packages.

so, in this change, we remove all references to enabling the Scylla/PPA repository.

Fixes scylladb/scylladb#20449

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20450
2024-09-04 20:30:40 +03:00
Avi Kivity
20b79816f1 repair: row_level: coroutinize repair_service::remove_repair_meta() (non-selective overload) 2024-09-04 18:43:19 +03:00
Avi Kivity
3b9ac51b6b repair: row_level: coroutinize repair_service::remove_repair_meta() (by-address overload) 2024-09-04 18:39:21 +03:00
Avi Kivity
704e3f5432 repair: row_level: coroutinize repair_service::remove_repair_meta() (by-id overload) 2024-09-04 18:37:48 +03:00
Avi Kivity
9612c4d790 repair: row_level: row_level_repair::run()
The function itself is threaded, but the inner lambdas are coroutinized
(except one which is expected to run in a thread, and so is threaded).
2024-09-04 18:34:45 +03:00
Avi Kivity
2b94ee981b repair: row_level: row_level_repair::send_missing_rows_to_follower_nodes()
The function itself is threaded, but the inner lambda is coroutinized.
2024-09-04 18:28:27 +03:00
Avi Kivity
c768448339 repair: row_level: row_level_repair::get_missing_rows_from_follower_nodes()
The function itself is threaded, but the inner lambda is coroutinized.
2024-09-04 18:28:12 +03:00
Avi Kivity
d2f1b44487 repair: row_level: row_level_repair::negotiate_sync_boundary()
The function itself is threaded, but the inner lambda is coroutinized.
2024-09-04 18:21:39 +03:00
Kefu Chai
0756520f82 sstable: coroutinize sstable::seal_sstable()
for better readability.

presumably, `sstable::seal_sstable()` is not on the critical path,
and we don't need to worry about the overhead of using C++20 coroutine.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20410
2024-09-04 18:14:33 +03:00
Kefu Chai
88c5c3001a compaction: refactor compaction_manager::can_proceed()
instead of chaining the conditions with '&&', break them down.
for two reasons:

* for better readability: to group the conditions with the same
  purpose together
* so we don't look up the table twice. it's an anti-pattern of
  using STL, and it could be confusing at first glance.

this change is a cleanup, so it does not change the behavior.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20369
2024-09-04 18:12:29 +03:00
Avi Kivity
645e39e746 repair: row_level: coroutinize repair_put_row_diff_with_rpc_stream_process_op()
Both the outer function and the inner lambda are coroutinized.
2024-09-04 18:10:43 +03:00
Avi Kivity
4c05d0b965 repair: row_level: coroutinize repair_meta::get_sync_boundary_handler() 2024-09-04 15:33:40 +03:00
Avi Kivity
eea011fad5 repair: row_level: coroutinize repair_meta::get_sync_boundary()
Not really helping anything, but a coroutine is a safer platform for
future changes in administrative APIs.
2024-09-04 15:31:57 +03:00
Avi Kivity
91b88df956 repair: row_level: coroutinize repair_meta::repair_set_estimated_partitions_handler() 2024-09-04 15:20:53 +03:00
Avi Kivity
b73194c9bf repair: row_level: coroutinize repair_meta::repair_set_estimated_partitions()
Not really helping anything, but a coroutine is a safer platform for
future changes in administrative APIs.
2024-09-04 15:18:33 +03:00
Avi Kivity
a69fb626bd repair: row_level: coroutinize repair_meta::repair_get_estimated_partitions_handler() 2024-09-04 15:17:42 +03:00
Avi Kivity
5cd8207ac7 repair: row_level: coroutinize repair_meta::repair_get_estimated_partitions()
Not really helping anything, but a coroutine is a safer platform for
future changes in administrative APIs.
2024-09-04 15:16:32 +03:00
Avi Kivity
e108f867a9 repair: row_level: coroutinize repair_meta::repair_row_level_stop_handler() 2024-09-04 15:15:42 +03:00
Avi Kivity
ffbb973063 repair: row_level: coroutinize repair_meta::repair_row_level_stop()
Not really helping anything, but a coroutine is a safer platform for
future changes in administrative APIs.
2024-09-04 15:14:08 +03:00
Avi Kivity
587b6fe400 repair: row_level: coroutinize repair_meta::repair_row_level_start_handler() 2024-09-04 15:12:49 +03:00
Avi Kivity
db7b1014ff repair: row_level: coroutinize repair_meta::repair_row_level_start() 2024-09-04 15:10:45 +03:00
Avi Kivity
17b82265ae repair: row_level: coroutinize repair_meta::get_combined_row_hash_handler() 2024-09-04 15:08:58 +03:00
Avi Kivity
bacbdde791 repair: row_level: coroutinize repair_meta::get_combined_row_hash() 2024-09-04 15:07:27 +03:00
Avi Kivity
8b8dc5092f repair: row_level: coroutinize repair_meta::get_full_row_hashes_handler() 2024-09-04 15:05:28 +03:00
Avi Kivity
21e01990ff repair: row_level: coroutinize repair_meta::get_full_row_hashes_with_rpc_stream()
The when_all_succeed() call is changed to the safer coroutine::when_all(),
which avoids the temporary futures.
2024-09-04 15:03:00 +03:00
Avi Kivity
572fbfde09 repair: row_level: coroutinize repair_meta::request_row_hashes() 2024-09-04 14:07:59 +03:00
Nadav Har'El
15f8046fcb alternator ttl: fix use-after-free
The Alternator TTL scanning code uses an object "scan_ranges_context"
to hold the scanning context. One of the members of this object is
a service::query_state, and that in turn holds a reference to a
service::client_state. The existing constructor created a temporary
client_state object and saved a reference to it - which can result
in use after free as the temporary object is freed as soon as the
constructor ends.

The fix is to save a client_state in the scan_ranges_context object,
instead of a temporary object.

Fixes #19988

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#20418
2024-09-03 22:15:18 +03:00
Pavel Emelyanov
c03b1e2827 test: Remove unused database argument from make_sstable_for_all_shards() helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20427
2024-09-03 21:36:28 +03:00
Calle Wilund
2695fefa81 commitlog/database: Make some commitlog options updatable + add feature listener
Makes some commitlog options runtime updatable. Most important for this case,
the usage of fragmented entries. Also adds a subscription in database on said
feature, to possibly enable once cluster enables it.
2024-09-03 16:38:28 +00:00
Calle Wilund
238a0236e5 features/config: Add feature for fragmented commitlog entries
Hides the functionality behind a cluster feature, i.e. postpones
using it until an upgrade is complete etc. This is to allow rolling back
even with dirty nodes, at least until a cluster is committed.

Feature can also be disabled by scylla option, just in case. This will
lock it out of whole cluster, but this is probably good, because depending
on off or on, certain schema/raft ops might fail or succeed (due to large
mutations), and this should probably be equivalent across nodes.
2024-09-03 16:38:28 +00:00
Calle Wilund
9bf452c7a0 docs: Add entry on commitlog file format v4 2024-09-03 16:38:28 +00:00
Calle Wilund
ad595e4d6a commitlog_test: Add more oversized cases
Also adds some randomization to the tests.
2024-09-03 16:38:28 +00:00
Calle Wilund
1d5e509136 commitlog_replayer: Replay segments in order created
Minimizes potential buffer usage for fragmented entries.
2024-09-03 16:38:28 +00:00
Calle Wilund
61ff9486fb commitlog_replayer: Use replay state to support fragmented entries 2024-09-03 16:38:27 +00:00
Calle Wilund
7c16683184 commitlog_replayer: coroutinize partly 2024-09-03 16:38:27 +00:00
Calle Wilund
05bf2ae5d7 commitlog: Handle oversized entries
Refs #18161

Yet another approach to dealing with large commitlog submissions.

We handle oversize single mutation by adding yet another entry
type: fragmented. In this case we only add a fragment (aha) of
the data that needs storing into each entry, along with metadata
to correlate and reconstruct the full entry on replay.

Because these fragmented entries are spread over N segments, we
also need to add references from the first segment in a chain
to the subsequent ones. These are released once we clear the
relevant cf_id count in the base.

This approach has the downside that due to how serialization etc
works w.r.t. mutations, we need to create an intermediate buffer
to hold the full serialized target entry. This is then incrementally
written into entries of < max_mutation_size, successively requesting
more segments.

On replay, when encountering a fragment chain, the fragment is
added to a "state", i.e. a mapping of currently processing
frag chains. Once we've found all fragments and concatenated
the buffers into a single fragmented one, we can issue a
replay callback as usual.
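
The replay side described above can be sketched roughly as follows. This is
an illustration in Python, not the commitlog code: the names `Fragment`,
`split_entry` and `ReplayState`, and the per-fragment metadata fields, are
hypothetical and only mirror the idea of correlating fragments and handing
back the full entry once every piece has been seen.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    entry_id: int  # hypothetical id correlating fragments of one entry
    index: int     # position of this fragment within the entry
    total: int     # number of fragments the entry was split into
    data: bytes


def split_entry(entry_id, payload, max_size):
    """Cut one serialized entry into fragments of at most max_size bytes."""
    chunks = [payload[i:i + max_size]
              for i in range(0, len(payload), max_size)] or [b""]
    return [Fragment(entry_id, i, len(chunks), c) for i, c in enumerate(chunks)]


class ReplayState:
    """Mapping of fragment chains that are still being collected."""

    def __init__(self):
        self._pending = {}  # entry_id -> {index: data}

    def add(self, frag):
        """Record a fragment; return the full entry once it is complete."""
        parts = self._pending.setdefault(frag.entry_id, {})
        parts[frag.index] = frag.data
        if len(parts) < frag.total:
            return None  # still waiting for more fragments
        del self._pending[frag.entry_id]
        return b"".join(parts[i] for i in range(frag.total))
```

A caller would feed every fragment it reads into one `ReplayState` and issue
the usual replay callback only when `add()` returns a buffer.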

Note that a replay caller will need to create and provide such
a state object. Old signature replay function remains for tests
and such.

This approach bumps the file format (docs to come).

To ensure "atomicity" we both force synchronization, and should
the whole op fail, we restore segment state (rewinding), thus
discarding all data we wrote.

v2:
* Improve some bookkeeping, ensure we keep track of segments and flush
  properly, to get counter correct
2024-09-03 16:38:27 +00:00
Anna Stuchlik
35796306a7 doc: comment out redirections for pages under Features
This commit temporarily disables redirections for all pages under Features
that were moved with this PR: https://github.com/scylladb/scylladb/pull/20401

Redirections work for all versions. This means that pages in 6.1 are redirected
to URLs that are not available yet (because 6.2 has not been released yet).

The redirections are correct and should be enabled when 6.2 is released:
I've created an issue to do it: https://github.com/scylladb/scylladb/issues/20428

Closes scylladb/scylladb#20429
2024-09-03 17:16:51 +02:00
Avi Kivity
6ddcf80d89 Merge 'Reuse sstable::test_env::reusable_sst() helper for pre-existing sstables' from Pavel Emelyanov
Tests that access sstables from test/resource/ typically call sstable::load() right after creating the object. There's a reusable_sst() helper for that. This PR fixes one more caller that still takes the longer route of creating the sstable and loading it on its own.

Closes scylladb/scylladb#20420

* github.com:scylladb/scylladb:
  test: Call reusable sst from ka_sst() helper
  test: Move sstable_open_config to reusable_sst()'s argument
2024-09-03 17:40:34 +03:00
Kamil Braun
504bf68ebb raft_group0_client: remove hold_read_apply_mutex overload without abort_source&
Ensure that every caller passes `abort_source&`.
2024-09-03 15:52:05 +02:00
Kamil Braun
79983723c8 storage_service: pass _abort_source to hold_read_apply_mutex
There's no point waiting for this lock if `storage_service` is being
aborted. In theory the lock, if held, should be eventually released by
whatever is holding it during shutdown -- but if there is some cyclic
reference between the services, and e.g. whatever holds the lock is
stuck because of ongoing shutdown and would only be unstuck by
`storage_service` getting stopped (which it can't because it's waiting
on the lock), that would cause a shutdown deadlock. Better to be safe
than sorry.
2024-09-03 15:52:05 +02:00
Kamil Braun
a7097fb985 group0_state_machine: pass _abort_source to hold_read_apply_mutex
`transfer_snapshot` was already passing `_abort_source` when trying to
take the lock but other member functions didn't.
2024-09-03 15:52:05 +02:00
Kamil Braun
a4d1065628 api: move reload_raft_topology_state implementation inside storage_service
In a later commit we'll want to access more `storage_service` internals
in the API's implementation (namely, `_abort_source`)

Also moving the implementation there allows making
`service::topology_transition()` private again (it was made public in
992f1327d3 only for this API
implementation)
2024-09-03 15:52:03 +02:00
Andrei Chekun
27e5fa149a [test.py] Clean duplicated arg for test suite
Arguments mode and run_id are already set in _prepare_pytest_params, so
there is no need to set them one more time.
2024-09-03 14:41:57 +02:00
Andrei Chekun
8a9146ebda [test.py] Enable allure for python test
Enable allure adapter for all python tests. Add tag and parameters to the test to be able to distinguish them across modes and runs.

Related: https://github.com/scylladb/qa-tasks/issues/1665
2024-09-03 14:41:57 +02:00
Łukasz Paszkowski
20a6296309 test: Add reversed query tests on simulated upgrade process
Run the reversed queries on a 2-node cluster with CL=ALL with and
without NATIVE_REVERSE_QUERIES feature flag. When the flag is enabled,
the native reversed format is used, otherwise the legacy format.

The NATIVE_REVERSE_QUERIES feature flag is suppressed with an error
injection that simulates cluster upgrade process.

Backport is not required. The patch adds additional upgrade tests
for https://github.com/scylladb/scylladb/pull/18864

Closes scylladb/scylladb#20179
2024-09-03 14:45:08 +03:00
Pavel Emelyanov
0857b63259 Merge 'repair: row_level: coroutinize some slow-path functions' from Avi Kivity
This series coroutinizes some functions in repair/row_level.cc. This enhances
readability and reduces bloat:

```
size  build/release/repair/row_level.o.{before,after}
   text	   data	    bss	    dec	    hex	filename
1650619	     48	    524	1651191	 1931f7	build/release/repair/row_level.o.before
1604610	     48	    524	1605182	 187e3e	build/release/repair/row_level.o.after
```

46kB of text were saved.

Functions that only touch a single mutation fragment were not coroutinized to avoid
adding an allocation in a fast path. In one case a function was split into a fast path and a
slow path.

Clean-up series, backport not needed.

Closes scylladb/scylladb#20283

* github.com:scylladb/scylladb:
  repair: row_level: restore indentation
  repair: row_level: coroutinize repair_meta::get_full_row_hashes_sink_op()
  repair: row_level: coroutinize repair_meta::get_full_row_hashes_source_op()
  repair: row_level: coroutinize repair_get_full_row_hashes_with_rpc_stream_handler()
  repair: row_level: coroutinize repair_put_row_diff_with_rpc_stream_handler()
  repair: row_level: coroutinize repair_get_row_diff_with_rpc_stream_handler()
  repair: row_level: coroutinize repair_get_full_row_hashes_with_rpc_stream_process()
  repair: row_level: coroutinize repair_get_row_diff_with_rpc_stream_process_op_slow_path()
  repair: row_level: split repair_get_row_diff_with_rpc_stream_process_op() into fast and slow paths
  repair: row_level: coroutinize repair_meta::put_row_diff_handler()
  repair: row_level: coroutinize repair_meta::put_row_diff_sink_op()
  repair: row_level: coroutinize repair_meta::put_row_diff_source_op()
  repair: row_level: coroutinize repair_meta::put_row_diff()
  repair: row_level: coroutinize repair_meta::get_row_diff_handler()
  repair: row_level: coroutinize repair_meta::get_row_diff_sink_op()
  repair: row_level: coroutinize repair_meta::to_repair_rows_on_wire()
  repair: row_level: coroutinize repair_meta::do_apply_rows()
  repair: row_level: coroutinize repair_meta::copy_rows_from_working_row_buf_within_set_diff()
  repair: row_level: coroutinize repair_meta::copy_rows_from_working_row_buf()
  repair: row_level: coroutinize repair_meta::row_buf_csum()
  repair: row_level: coroutinize repair_meta::get_repairs_row_size()
  repair: row_level: coroutinize repair_meta::set_estimated_partitions()
  repair: row_level: coroutinize repair_meta::get_estimated_partitions()
  repair: row_level: coroutinize repair_meta::do_estimate_partitions_on_local_shard()
  repair: row_level: coroutinize repair_reader::close()
  repair: row_level: coroutinize repair_reader::end_of_stream()
  repair: row_level: coroutinize sink_source_for_repair::close()
  repair: row_level: coroutinize sink_source_for_repair::get_sink_source()
2024-09-03 14:41:22 +03:00
Nadav Har'El
dd030f8112 alternator: improve RBAC access denied error messages
This patch addresses two requests made by reviewers of the original "Add
CQL-based RBAC support to Alternator" series. Both requests were about
the error messages produced when access is denied:

1. The error message is improved to use more proper English, and also
   to include the name of the role which was denied access.

2. The permission-check and error-message-formatting code is
   de-duplicated, using a common function verify_permission().

   This de-duplication required moving the access-denied error path to
   throwing an exception instead of the previous exception-free
   implementation. However, it can be argued that this change is actually
   a good thing, because it makes the successful case, when access is
   allowed, faster.

   The de-duplicated code is shorter and simpler, and allowed changing
   the text of the error message in just one place.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#20326
2024-09-03 14:39:30 +03:00
Kefu Chai
d26bb9ae30 sstables: correct the debugging message printed when removing temp dir
in 372a4d1b79, we introduced a change
which was for debugging the logging message. but the logging message
intended for printing the temp_dir now prints an `optional<int>`. this
is both confusing, and more importantly, it hurts the debuggability.

in this change, the related change is reverted.

Fixes scylladb/scylladb#20408

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20409
2024-09-03 14:36:08 +03:00
Pavel Emelyanov
e4bc5470cf test: Call reusable sst from ka_sst() helper
The sstable_mutation_test wants to load pre-existing sstables from
resource/ subdir. For that there's reusable_sst() helper on env.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-03 14:01:28 +03:00
Pavel Emelyanov
e9980bd6dd test: Move sstable_open_config to reusable_sst()'s argument
So that callers are able to provide custom config in the future

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-03 14:00:59 +03:00
Laszlo Ersek
cd0819e3ed docs/dev/docker-hub.md: refresh aio-max-nr calculation
What we have today in "docs/dev/docker-hub.md" on "aio-max-nr" dates back
to scylla commit f4412029f4 ("docs/docker-hub.md: add quickstart section
with --smp 1", 2020-09-22). Problems with the current language:

- The "65K" claim as default value on non-production systems is wrong;
  "fs/aio.c" in Linux initializes "aio_max_nr" to 0x10000, which is 64K.

- The section in question uses equal signs (=) incorrectly. The intent was
  probably to say "which means the same as", but that's not what equality
  means.

- In the same section, the relational operator "<" is bogus. The available
  AIO count must be at least as high (>=) as the requested AIO count.

- Clearer names should be used;
  adjust_max_networking_aio_io_control_blocks() in "src/core/reactor.cc"
  sets a great example:

  - "reactor::max_aio" should be called "storage_iocbs",

  - "detect_aio_poll" should be called "preempt_iocbs",

  - "reactor_backend_aio::max_polls" should be called "network_iocbs".

- The specific value 10000 for the last one ("network_iocbs") is not
  correct in scylla's context. It is correct as the Seastar default, but
  scylla has used 50000 since commit 2cfc517874 ("main, test: adjust
  number of networking iocbs", 2021-07-18).

Rewrite the section to address these problems.
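
The corrected relation can be sketched as a small check, written in Python
for brevity. The per-core sum and the values 1024 ("storage_iocbs") and 2
("preempt_iocbs") are assumptions that this message does not state; 50000
("network_iocbs") and the 0x10000 kernel default are taken from the text
above. The function names are hypothetical.

```python
def required_iocbs(cpu_cores, storage_iocbs=1024, preempt_iocbs=2,
                   network_iocbs=50000):
    """iocbs requested for the given number of cores.

    storage_iocbs and preempt_iocbs defaults are assumptions;
    network_iocbs=50000 is the scylla value named in the message.
    """
    return cpu_cores * (storage_iocbs + preempt_iocbs + network_iocbs)


def has_enough_aio(aio_max_nr, aio_nr, cpu_cores):
    """Available AIO count must be at least (>=) the requested count."""
    available = aio_max_nr - aio_nr
    return available >= required_iocbs(cpu_cores)
```

Under these assumptions, the kernel default of 0x10000 (65536) would cover
one core but not two, so `has_enough_aio(0x10000, 0, 2)` is false.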

See also:
- https://github.com/scylladb/scylladb/issues/5981
- https://github.com/scylladb/seastar/pull/2396
- https://github.com/scylladb/scylladb/pull/19921

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-09-03 12:10:59 +02:00
Laszlo Ersek
15738d14ce docs/dev/docker-hub.md: strip trailing whitespace
Strip trailing whitespace.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-09-03 12:00:28 +02:00
Botond Dénes
2556e902b1 Update tools/jmx submodule
* tools/jmx 89308b77...793452a9 (1):
  > dist: support building packages in Github Actions
2024-09-03 11:58:37 +03:00
Anna Stuchlik
5193d2d171 doc: remove the seeds-related questions from the FAQ
This commit is one of the series to remove the FAQ page by removing irrelevant/outdated entries
or moving them to the forum.

The question about seeds is irrelevant, not frequently asked, and covered in other sections
of the docs. Also, it mentions versions that are no longer supported.

Closes scylladb/scylladb#20403
2024-09-03 11:01:49 +03:00
Takuya ASADA
9d7fed40b5 install.sh: fix more incorrect permission on strict umask
Even after 13caac7, we still have more files with incorrect permissions, since
we use "cp -r" and create new files with redirects.

To fix this, we need to replace "cp -r" with "cp -pr", and run "chmod <perm>" on
newly created files.

Fixes #14383
Related #19775

Closes scylladb/scylladb#19786
2024-09-03 10:37:53 +03:00
Anna Stuchlik
360f7b3d33 doc: move Features to the top-level page
This commit moves the Features page from the section for developers
to the top level in the page tree. This involves:
- Moving the source files to the *features* folder from the  *using-scylla* folder.
- Moving images into *features/images* folder.
- Updating references to the moved resources.
- Adding redirections to the moved pages.

Closes scylladb/scylladb#20401
2024-09-03 07:24:33 +03:00
Kefu Chai
fb2ed20b42 .github: post a comment if "Fixes" policy is violated
it's more visible than an "Error" in the action's detail message.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19271
2024-09-03 07:23:48 +03:00
Botond Dénes
8f31d3f1fc Merge 'tools/nodetool: improve backup and restore commands' from Kefu Chai
this change contains two improvements to "backup" and "restore" commands:

- let them print task id
- let them return 1 as the exit status code upon operation failure

----

these changes are improvements to the newly introduced commands, which are not in any LTS branches yet, so no need to backport.

Closes scylladb/scylladb#20371

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: return failure with exit code in backup/restore
  tools/scylla-nodetool: let backup/restore print task id
2024-09-02 16:40:55 +03:00
Takuya ASADA
59aedb38d0 locator: retry HTTP request to GCE/Azure metadata service
Like we already do on EC2, implement retrying request to the metadata
service on GCE and Azure.
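
A generic retry loop of the kind described might look like the sketch below.
It is written in Python and is not the locator code: the function name, the
attempt count, the doubling delay and the choice of `OSError` as the
retryable error are all assumptions made for illustration.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=0.1, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are used up.

    fetch is any callable performing the metadata request; the delay
    doubles after each failed attempt (an assumed policy).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as e:
            last_error = e
            if attempt + 1 < attempts:
                sleep(delay * 2 ** attempt)
    raise last_error
```

Passing `sleep` as a parameter keeps the loop easy to exercise in a test
without real waiting.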

Closes #19817

Closes scylladb/scylladb#20189
2024-09-02 13:04:05 +03:00
Kefu Chai
e66e885e5b tools/scylla-nodetool: return failure with exit code in backup/restore
before this change, "backup" and "restore" commands always return 0 as
their exit code no matter if the performed operation fails or not.

inspired by the "task" commands of nodetool, let's return 1 as the
exit code if the operation fails.

the tests are updated accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-02 15:12:26 +08:00
Kefu Chai
470c3e8535 tools/scylla-nodetool: let backup/restore print task id
in 20fffcdc, we added the "task wait" subcommand, so user is allowed to
interact with a task with its task id. and in existing implementation of
"backup" and "restore" command, if user does not pass `--nowait`, the
command just exits without any output upon sending the request to
scylladb.

in this change, we print out the task_id if user does not pass
`--nowait` command line option to "backup" or "restore" command. this
allows user to follow up on the operation if necessary.

the tests are updated accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-02 15:12:26 +08:00
Nadav Har'El
0b3890df46 test/cql-pytest: test RBAC auto-grant (and reproduce CDC bug)
This patch adds functional testing for the role-based access control
(RBAC) "auto-grant" feature, where a user that is allowed to create
a table also receives full permissions over the table it just
created. We also test permissions over new materialized views created
by a user, and over CDC logs. The test for CDC logs reproduces an
already suspected bug, #19798: A user may be allowed to create a table
with CDC enabled, but then is not allowed to read the CDC log just
created. The tests show that the other cases (base tables and views)
do not have this bug, and the creating user does get appropriate
permissions over the new table and views.

In addition to testing auto-grant, the patch also includes tests for
the opposite feature, "auto-revoke" - that permissions are removed when
the table/view/cdc is deleted. If we forget to do that while implementing
auto-grant, we risk that users may be able to use tables created by
other users just because they used the same table _name_ earlier.

It's important to have these auto-revoke tests together with the
auto-grant tests that reproduce #19798 - so we don't forget this
part when finally fixing #19798.

Refs #19798.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19845
2024-09-02 09:03:40 +03:00
Botond Dénes
52bed81a1e Merge 'cql3: add option to not unify bind variables with the same name' from Avi Kivity
Bind variables in CQL have two formats: positional (`?`) where a variable is referred to by its relative position in the statement, and named (`:var`), where the user is expected to supply a name->value mapping.

In 19a6e69001 we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to.

However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection.

Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the `dialect` and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables.
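
To make the difference concrete, here is a toy sketch in Python of how the
bind-marker metadata could change with the option. It is not the cql3 parser:
the regular expression, the function name and the `unify_named` flag are
hypothetical stand-ins for the dialect setting.

```python
import re

# `?` is a positional marker, `:name` a named one (toy grammar, no quoting rules).
_BIND = re.compile(r"\?|:([A-Za-z_]\w*)")


def bind_markers(query, unify_named=True):
    """Return one metadata entry per bind marker (None for positional `?`).

    With unify_named=True, repeated named markers collapse into one entry;
    with False, each occurrence keeps its own position.
    """
    markers = []
    seen = set()
    for m in _BIND.finditer(query):
        name = m.group(1)  # None when the marker is `?`
        if name is not None and unify_named:
            if name in seen:
                continue
            seen.add(name)
        markers.append(name)
    return markers
```

For `WHERE a = :x AND b = :x`, the unified form yields a single entry for
`x`, while the non-unified form yields two, which is what a caller binding
by position would expect.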

A unit test is added.

Fixes #15559

This may be useful to users transitioning from Cassandra, so merits a backport.

Closes scylladb/scylladb#19493

* github.com:scylladb/scylladb:
  cql3: add option to not unify bind variables with the same name
  cql3: introduce dialect infrastructure
  cql3: prepared_statement_cache: drop cache key default constructor
2024-09-02 08:34:24 +03:00
Kefu Chai
28b5471c01 docs/dev/maintainer.md: fix formatting
* in the "Backporting Seastar commits" section, there's a single quote
  instead of a backtick in this line, so fix it.
* add backticks around `refresh-submodules.sh`, which is a filename.
* correct the command line setting a git config option, because `git-config`
  does not support this command line syntax,

  ```console
  $ git config --global diff.conflictstyle = diff3
  $ git config --global get diff.conflictstyle
  =
  $ git config --global diff.conflictstyle diff3
  $ git config --global get diff.conflictstyle
  diff3
  ```

  quote from git-config(1)

  > ```
  > git config set [<file-option>] [--type=<type>] [--all] [--value=<value>] [--fixed-value] <name> <value>
  > ```
* stop using the deprecated mode of the `git-config` command, and use
  subcommand instead. as git-config(1) puts:

  > git config <name> <value> [<value-pattern>]
  >   Replaced by git config set [--value=<pattern>] <name> <value>.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20328
2024-09-01 22:24:01 +03:00
Yaniv Michael Kaul
2ebba9cd11 tools/toolchain/dbuild: prefer podman over docker
Check if podman is available before docker. If it is, use it. Otherwise, check for docker.

1. Podman is better. It runs with fewer resources, and I've had display issues with Docker (output was not shown consistently)
2. 'which docker' works even when the docker service and socket are turned off.
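
The selection order can be sketched as below. dbuild itself is a shell
script; this Python version only illustrates the preference, and the function
name and the injectable `which` parameter are hypothetical.

```python
import shutil


def pick_container_tool(which=shutil.which):
    """Return "podman" if it is on PATH, else "docker", else None."""
    for tool in ("podman", "docker"):
        if which(tool):
            return tool
    return None
```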

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#20342
2024-09-01 22:17:01 +03:00
David Garcia
c4da75e392 docs: run docs test on changing config params
Triggers the "Build Docs" PR workflow whenever the `db/config.cc` or `db/config.h` files are edited. These files are used to produce documentation, and this change will help prevent the introduction of breaking changes to the documentation build when they are modified.

Closes scylladb/scylladb#20347
2024-09-01 22:15:48 +03:00
Avi Kivity
0f4b05824e Merge 'perf/perf_sstable: add {crawling,partitioned}_streaming modes' from Kefu Chai
for testing the load performance of load_and_stream operation.

Refs #19989

---

no need to backport. it adds two new tests to the existing `perf_sstable` tool for evaluating the load performance when performing the "load_and_streaming" operation. hence has no impact on the production.

Closes scylladb/scylladb#20186

* github.com:scylladb/scylladb:
  perf/perf_sstable: add {crawling,partitioned}_streaming modes
  test/perf/perf_sstable: use switch-case when appropriate
2024-09-01 22:04:22 +03:00
Avi Kivity
7197d280b0 Merge 'scylla-gdb.py: lazy-evaluate the constants ' from Kefu Chai
instead of evaluating the constants in-class, accessing them via
a cached class property.

it would be handy if we could source `scylla-gdb.py` in `.gdbinit`,
but this script accesses some symbols which are not available without
a file being debugged. that's why gdb fails to load the init script:

```
Traceback (most recent call last):
  File "/home/kefu/dev/scylladb/scylla-gdb.py", line 167, in <module>
    class intrusive_slist:
  File "/home/kefu/dev/scylladb/scylla-gdb.py", line 168, in intrusive_slist
    size_t = gdb.lookup_type('size_t')
             ^^^^^^^^^^^^^^^^^^^^^^^^^
gdb.error: No type named size_t.
```

so we have to `file path/to/scylla` and *then*
`source scylla-gdb.py` every time when we debug scylla or a seastar
application, instead of loading `scylla-gdb.py` in `.gdbinit`.

the reason is that the script accesses the debug symbols like
`gdb.lookup_type('size_t')` in-class. so when the python interpreter
reads the script, it evaluates this statement, but at that moment,
the debug symbols are not loaded, so `source scylla-gdb.py` fails
in `.gdbinit`.

in this change, we transform all these class variables to cached
properties, so that they

* are evaluated on-demand
* are evaluated only once at most

this addresses the pain at the expense of verbosity.
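
A minimal sketch of the idea, assuming `functools.cached_property` on an
instance as a stand-in for the cached class property used in the script.
`lookup_type` below is a hypothetical replacement for `gdb.lookup_type()` so
the example runs outside gdb; the real script may be structured differently.

```python
from functools import cached_property

calls = []


def lookup_type(name):
    """Stand-in for gdb.lookup_type(); records each call to show laziness."""
    calls.append(name)
    return f"<type {name}>"


class intrusive_slist:
    # An in-class `size_t = lookup_type('size_t')` would run while the
    # script is being sourced, before any debug symbols are loaded.
    @cached_property
    def size_t(self):
        # Evaluated on first access only; the result is then cached.
        return lookup_type("size_t")
```

Defining the class performs no lookup, and reading `size_t` twice on the
same object performs exactly one.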

---

this change intends to improve the developer's user experience, and has no impacts on product, so no need to backport.

Closes scylladb/scylladb#20334

* github.com:scylladb/scylladb:
  test/scylla_gdb: test the .gdb init use case
  scylla-gdb.py: lazy-evaluate the constants
2024-09-01 20:00:53 +03:00
Pavel Emelyanov
7df43312ac test: Remove sstable making helpers from table_for_tests
All users of it have sstable_test_env at hand (in fact -- they call env
method to get table_for_test). And since sstable_test_env already has a
bunch of methods to create sstable, the table_for_test wrapper doesn't
need to duplicate this code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20360
2024-09-01 19:58:15 +03:00
Kefu Chai
bc2b7b47c8 build: cmake: add and use Scylla_CLANG_INLINE_THRESHOLD cmake parameter
so that we can set the parameter passed to `-inline-threshold` with
`configure.py` when building with CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20364
2024-09-01 19:56:02 +03:00
Kefu Chai
6970c502c9 dist: drop %pretrans section
before this change, if the user does not have `/bin/sh` around when
installing scylla packages, the script in `%pretrans` is executed
and fails due to the missing `/bin/sh`. per
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#pretrans

> Note that the %pretrans scriptlet will, in the particular case of
> system installation, run before anything at all has been installed.
> This implies that it cannot have any dependencies at all. For this
> reason, %pretrans is best avoided, but if used it MUST (by necessity)
> be written in Lua. See
> https://rpm-software-management.github.io/rpm/manual/lua.html for more
> information.

but we were trying to warn users upgrading from scylla < 1.7.3, which
was released 7 years ago at the time of writing.

in this change, we drop the `%pretrans` section. hopefully they will
find their way out if they still exist.

Fixes scylladb/scylladb#20321

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20365
2024-09-01 19:46:19 +03:00
Kefu Chai
a06e1c6545 scylla-housekeeping: use raw string to avoid using escape sequence
before this change, when running `scylla-housekeeping`:
```
/opt/scylladb/scripts/libexec/scylla-housekeeping:122: SyntaxWarning: invalid escape sequence '\s'
  match = re.search(".*http.?://repositories.*/scylladb/([^/\s]+)/.*/([^/\s]+)/scylladb-.*", line)
```

we could have the warning above, because `\s` is not a valid escape
sequence. the Python interpreter still accepts it as the two separate
characters `\` and `s` after complaining, but it's still annoying.

so, let's use a raw string here.
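
as a small sketch of the fix, the pattern below is copied from the warning above and written as a raw string. the sample `line` is made up for the example and is not taken from a real repository file.

```python
import re

# the pattern from the warning, as a raw string: the backslash in \s is passed
# through to the regex engine instead of being treated as a string escape
PATTERN = r".*http.?://repositories.*/scylladb/([^/\s]+)/.*/([^/\s]+)/scylladb-.*"

# hypothetical repository line, invented for this example
line = "deb http://repositories.example.com/scylladb/abc/deb/ubuntu/scylladb-6.1 stable main"

match = re.search(PATTERN, line)

# a raw string and an explicitly escaped string spell the same two characters
same_spelling = r"\s" == "\\s"
```

the regex itself is unchanged. only the way the literal is spelled in the Python source differs, which is what silences the `SyntaxWarning`.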

Refs scylladb/scylladb#20317
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20359
2024-09-01 18:59:23 +03:00
Kefu Chai
e431b90145 test/boost/view_build_test: include used header
before this change, when building the test of `view_build_test` with
clang-20, we can have the following build failure:

```
FAILED: test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o
/home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o -MF test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o.d -o test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o -c /home/kefu/dev/scylladb/test/boost/view_build_test.cc
/home/kefu/dev/scylladb/test/boost/view_build_test.cc:998:5: error: unknown type name 'simple_schema'
  998 |     simple_schema ss;
      |     ^
```

apparently, `simple_schema`'s declaration is not available in this
translation unit.

in this change

* we include the header where `simple_schema` is defined, so that
  the build passes with clang-20.
* also take this opportunity to reorder the header a little bit,
  so the testing headers are grouped together.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20367
2024-09-01 18:58:23 +03:00
Kefu Chai
753188c33d test: include seastar/testing/random.hh when appropriate
in a recent seastar change (644bb662), we do not include
`seastar/testing/random.hh` in `seastar/testing/test_runner.hh` anymore,
as the latter is not a facade of the former, and neither does it use the
former. as a consequence, some tests which take advantage of the
included `seastar/testing/random.hh` do not build with the latest
seastar:

```
FAILED: test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o
/usr/bin/clang++ -DBOOST_REGEX_DYN_LINK -DBOOST_REGEX_NO_LIB -DBOOST_UNIT_TEST_FRAMEWORK_DYN_LINK -DBOOST_UNIT_TEST_FRAMEWORK_NO_LIB -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSCYLLA_ENABLE_PREEMPTION_SOURCE -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/__w/scylladb/scylladb -I/__w/scylladb/scylladb/build/gen -I/__w/scylladb/scylladb/seastar/include -I/__w/scylladb/scylladb/build/seastar/gen/include -I/__w/scylladb/scylladb/build/seastar/gen/src -I/__w/scylladb/scylladb/build -isystem /__w/scylladb/scylladb/abseil -isystem /__w/scylladb/scylladb/build/rust -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/__w/scylladb/scylladb/build=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -Werror=unused-result -fstack-clash-protection -MD -MT test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o -MF test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o.d -o test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o -c /__w/scylladb/scylladb/test/lib/key_utils.cc
In file included from /__w/scylladb/scylladb/test/lib/key_utils.cc:11:
/__w/scylladb/scylladb/test/lib/random_utils.hh:25:30: error: no member named 'local_random_engine' in namespace 'seastar::testing'
   25 |     return seastar::testing::local_random_engine;
      |            ~~~~~~~~~~~~~~~~~~^
1 error generated.
```

in this change, we include `seastar/testing/random.hh` when the random
facility is used, so that they can be compiled with the latest seastar
library.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20368
2024-09-01 18:57:07 +03:00
Kefu Chai
0104c7d371 tools/scylla-nodetool: s/vm.count()/vm.contains()/
under the hood, std::map::count() and std::map::contains() are nearly
identical. both operations search for the given key within the map.
however, the former finds an equal range with the given
key, and gets the distance between the begin
and the end of the range; while the latter just searches with the given
key.

since scylla-nodetool is not a performance-critical application, the
minor difference in efficiency between these two operations is unlikely
to have a significant impact on its overall performance.

while std::map::count() is generally suitable for our need, it might be
beneficial to use a more appropriate API.

in this change, we use std::map::contains() in the place of
std::map::count() when checking for the existence of a parameter with
the given name.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20350
2024-09-01 18:39:00 +03:00
Avi Kivity
ddf344e4f1 Merge 'compaction: use structured binding and ranges library when appropriate' from Kefu Chai
for better readability

---

it's a cleanup, hence no need to backport.

Closes scylladb/scylladb#20366

* github.com:scylladb/scylladb:
  compaction: use std::views::reverse when appropriate
  compaction: use structured binding when appropriate
2024-09-01 18:35:15 +03:00
Avi Kivity
ea8441dfa3 cql3: add option to not unify bind variables with the same name
Bind variables in CQL have two formats: positional (`?`) where a
variable is referred to by its relative position in the statement,
and named (`:var`), where the user is expected to supply a
name->value mapping.

In 19a6e69001 we identified the case where a named bind variable
appears twice in a query, and collapsed it to a single entry in the
statement metadata. Without this, a driver using the named variable
syntax cannot disambiguate which variable is referred to.

However, it turns out that users can use the positional call form
even with the named variable syntax, by using the positional
API of the driver. To support this use case, we add a configuration
variable to disable the same-variable detection.
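
The effect of the option can be illustrated with a small Python sketch. It is only a model of the behavior described here, not the parser code: the function name, the `unify_duplicates` flag, and the representation of bind markers as a list (with `None` for a positional `?`) are assumptions made for the example.

```python
def bind_variable_metadata(markers, unify_duplicates=True):
    """Return the bind-variable entries a statement would expose.

    markers: bind markers in statement order; a string is a named
    variable (`:var`), None is a positional marker (`?`).
    """
    seen = set()
    entries = []
    for marker in markers:
        if unify_duplicates and marker is not None:
            if marker in seen:
                # same name seen earlier: collapse into the existing entry
                continue
            seen.add(marker)
        entries.append(marker)
    return entries


# `WHERE a = :x AND b = :y AND c = :x` has three markers but two names
by_name = bind_variable_metadata(["x", "y", "x"])
by_position = bind_variable_metadata(["x", "y", "x"], unify_duplicates=False)
```

With unification, a driver binding by name sees two variables. With it disabled, a caller using the positional API sees three slots, one per marker.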

Because the detection has to happen when the entire statement is
visible, we have to supply the configuration to the parser. We
call it the `dialect` and pass it from all callers. The alternative
would be to add a pre-prepare call similar to fill_prepare_context that
rewrites all expressions in a statement to deduplicate variables.

A unit test is added.

Fixes #15559
2024-09-01 17:27:48 +03:00
Avi Kivity
60acfd8c08 docs: cql: document ZstdCompressor for CREATE TABLE
Adjust the wording slightly to be less awkward.

Closes scylladb/scylladb#20377
2024-09-01 14:28:09 +03:00
Kefu Chai
e53a9a99cd compaction: use std::views::reverse when appropriate
let's use the standard library when appropriate.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-01 08:44:01 +08:00
Kefu Chai
3801c079e2 compaction: use structured binding when appropriate
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-01 08:34:10 +08:00
Avi Kivity
61e6a77a99 repair: row_level: restore indentation 2024-08-30 23:00:59 +03:00
Avi Kivity
a35942e09a repair: row_level: coroutinize repair_meta::get_full_row_hashes_sink_op()
Extra care is needed for exception handling.
2024-08-30 22:55:16 +03:00
Avi Kivity
8e9ebd82fc repair: row_level: coroutinize repair_meta::get_full_row_hashes_source_op() 2024-08-30 22:55:16 +03:00
Avi Kivity
f7d19e237d repair: row_level: coroutinize repair_get_full_row_hashes_with_rpc_stream_handler()
Both the handle_exception() and finally() blocks need some extra care.
2024-08-30 22:55:16 +03:00
Avi Kivity
bb8751f4b5 repair: row_level: coroutinize repair_put_row_diff_with_rpc_stream_handler()
Both the handle_exception() and finally() blocks need some extra care.
2024-08-30 22:55:16 +03:00
Avi Kivity
7ba0642da2 repair: row_level: coroutinize repair_get_row_diff_with_rpc_stream_handler()
Both the handle_exception() and finally() blocks need some extra care.
2024-08-30 22:55:16 +03:00
Avi Kivity
61bbf452c6 repair: row_level: coroutinize repair_get_full_row_hashes_with_rpc_stream_process() 2024-08-30 22:55:16 +03:00
Avi Kivity
01a578f608 repair: row_level: coroutinize repair_get_row_diff_with_rpc_stream_process_op_slow_path() 2024-08-30 22:55:16 +03:00
Avi Kivity
3733105f78 repair: row_level: split repair_get_row_diff_with_rpc_stream_process_op() into fast and slow paths
This allows coroutinization of the slow path without affecting the fast path.
2024-08-30 22:55:16 +03:00
Avi Kivity
e17c3b71a8 repair: row_level: coroutinize repair_meta::put_row_diff_handler() 2024-08-30 22:55:16 +03:00
Avi Kivity
74ea2b9663 repair: row_level: coroutinize repair_meta::put_row_diff_sink_op()
Exception handling is a bit awkward since we can't co_await in a catch block.
2024-08-30 22:55:16 +03:00
Avi Kivity
e4362a5b7b repair: row_level: coroutinize repair_meta::put_row_diff_source_op() 2024-08-30 22:55:16 +03:00
Avi Kivity
b998d69f09 repair: row_level: coroutinize repair_meta::put_row_diff() 2024-08-30 22:55:16 +03:00
Avi Kivity
3f2b5fe5dc repair: row_level: coroutinize repair_meta::get_row_diff_handler() 2024-08-30 22:55:16 +03:00
Avi Kivity
cd63971501 repair: row_level: coroutinize repair_meta::get_row_diff_sink_op()
Since sink.close() is called from an exception handler, some code
movement is needed so it isn't co_awaited from a catch block.
2024-08-30 22:55:16 +03:00
Avi Kivity
3f28dec88c repair: row_level: coroutinize repair_meta::to_repair_rows_on_wire()
coroutine::maybe_yield() introduced to compensate for loss of
stall-protected do_for_each()
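
As a rough analogy in Python's asyncio (not Seastar code), a plain loop inside a coroutine needs an explicit yield point so that a long iteration does not monopolize the event loop. The function name and the `yield_every` parameter are made up for the example, and unlike `coroutine::maybe_yield()`, which yields only when preemption is needed, this sketch yields unconditionally every `yield_every` rows.

```python
import asyncio


async def total_row_size(rows, yield_every=128):
    """Sum the sizes of `rows`, yielding to the event loop periodically."""
    total = 0
    for i, row in enumerate(rows, start=1):
        total += len(row)
        if i % yield_every == 0:
            # explicit yield point: lets other tasks run during a long loop
            await asyncio.sleep(0)
    return total
```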
2024-08-30 22:55:16 +03:00
Avi Kivity
1a84f1a73d repair: row_level: coroutinize repair_meta::do_apply_rows()
coroutine::maybe_yield() introduced to compensate for loss of
stall-protected do_for_each()
2024-08-30 22:55:16 +03:00
Avi Kivity
7f15cc446f repair: row_level: coroutinize repair_meta::copy_rows_from_working_row_buf_within_set_diff()
coroutine::maybe_yield() introduced to compensate for loss of
stall-protected do_for_each()
2024-08-30 22:55:16 +03:00
Avi Kivity
93ca202bd3 repair: row_level: coroutinize repair_meta::copy_rows_from_working_row_buf()
coroutine::maybe_yield() introduced to compensate for loss of
stall-protected do_for_each()
2024-08-30 22:55:15 +03:00
Avi Kivity
5f8895d908 repair: row_level: coroutinize repair_meta::row_buf_csum()
coroutine::maybe_yield() introduced to compensate for loss of
stall-protected do_for_each()
2024-08-30 22:55:15 +03:00
Avi Kivity
d1e45f2982 repair: row_level: coroutinize repair_meta::get_repairs_row_size()
coroutine::maybe_yield() introduced to compensate for loss of
stall-protected do_for_each()
2024-08-30 22:55:15 +03:00
Avi Kivity
0b1bf57d19 repair: row_level: coroutinize repair_meta::set_estimated_partitions() 2024-08-30 22:55:15 +03:00
Avi Kivity
aee078d8e5 repair: row_level: coroutinize repair_meta::get_estimated_partitions() 2024-08-30 22:55:15 +03:00
Avi Kivity
51534f60eb repair: row_level: coroutinize repair_meta::do_estimate_partitions_on_local_shard() 2024-08-30 22:55:12 +03:00
Kamil Braun
e01cef01a6 Merge 'Ignore seed name resolution errors during the restart of a cluster member node.' from Sergey Zolotukhin
All seeds hostname resolution errors will be ignored during a node
restart in case the node had already joined a cluster.  This will
prevent restart errors if some seed names are not resolvable.

Fixes scylladb/scylladb#14945

Closes scylladb/scylladb#20292

* github.com:scylladb/scylladb:
  Ignore seed name resolution errors on restart.
  Add a test for starting with a wrong seed.
2024-08-30 11:33:44 +02:00
Kamil Braun
292ef0d1f9 Merge 'Fix node replace with inter-dc encryption enabled.' from Gleb Natapov
Currently if a coordinator and a node being replaced are in the same DC
while inter-dc encryption is enabled (connections between nodes in the
same DC should not be encrypted) the replace operation will fail. It
fails because a coordinator uses non encrypted connection to push raft
data to the new node, but the new node will not accept such connection
until it knows which DC the coordinator belongs to and for that the raft
data needs to be transferred.

The series adds the test for this scenario and the fix for the
chicken&egg problem above.

The series (or at least the fix itself) needs to be backported because
this is a serious regression.

Fixes: scylladb/scylladb#19025

Closes scylladb/scylladb#20290

* github.com:scylladb/scylladb:
  topology coordinator: fix indentation after the last patch
  topology coordinator: do not add replacing node without a ring to topology
  test: add test for replace in clusters with encryption enabled
  test.py: add server encryption support to cluster manager
  .gitignore: fix pattern for resources to match only one specific directory
2024-08-30 11:29:05 +02:00
Kefu Chai
82fbe317ec test/scylla_gdb: test the .gdb init use case
before this change, we run all the tests in a single pytest session,
with scylladb debug symbols loaded. but we want to test another use
case, where the scylladb debug symbols are missing.

in this change,

* we do not check for the existence of debug symbols until necessary
* add a mark named "without_scylla"
* run the tests in two pytest sessions
  - one with "without_scylla" mark
  - one with "not without_scylla" mark
* add a test which is marked with the "without_scylla" mark. the test
  verifies that the scylla-gdb.py script can be loaded even without
  scylladb debug symbols.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-30 17:05:29 +08:00
Kefu Chai
7dd63c891f scylla-gdb.py: lazy-evaluate the constants
instead of evaluating the constants in-class, access them via
a cached class property.

it would be handy if we could source `scylla-gdb.py` in `.gdbinit`,
but this script accesses some symbols which are not available without
a file being debugged. that's why gdb fails to load the init script:

```
Traceback (most recent call last):
  File "/home/kefu/dev/scylladb/scylla-gdb.py", line 167, in <module>
    class intrusive_slist:
  File "/home/kefu/dev/scylladb/scylla-gdb.py", line 168, in intrusive_slist
    size_t = gdb.lookup_type('size_t')
             ^^^^^^^^^^^^^^^^^^^^^^^^^
gdb.error: No type named size_t.
```

so we have to `file path/to/scylla` and *then*
`source scylla-gdb.py` every time when we debug scylla or a seastar
application, instead of loading `scylla-gdb.py` in `.gdbinit`.

the reason is that the script accesses the debug symbols like
`gdb.lookup_type('size_t')` in-class. so when the python interpreter
reads the script, it evaluates this statement, but at that moment,
the debug symbols are not loaded, so `source scylla-gdb.py` fails
in `.gdbinit`.

in this change, we transform all these class variables to cached
properties, so that they

* are evaluated on-demand
* are evaluated only once at most

this addresses the pain at the expense of verbosity.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-30 17:05:29 +08:00
Pavel Emelyanov
cec4d207f6 Merge 'repair: throw if batchlog manager isn't initialized' from Aleksandra Martyniuk
repair_service::repair_flush_hints_batchlog_handler may access batchlog
manager while it is uninitialized.

Throw if batchlog manager isn't initialized.

Fixes:  #20236.

Needs backport to 6.0 and 6.1 as they suffer from the uninitialized bm access.

Closes scylladb/scylladb#20251

* github.com:scylladb/scylladb:
  test: add test to ensure repair won't fail with uninitialized bm
  repair: throw if batchlog manager isn't initialized
2024-08-30 11:37:24 +03:00
Anna Stuchlik
4471c80bdc doc: add the 6.1-to-6.2 upgrade guide
This commit replaces the 6.0-to-6.1 upgrade guide with the 6.1-to-6.2 upgrade guide.

The new guide is a template that covers the basic procedure.
If any 6.2-specific updates are required, they will have to be added along with development.

Closes scylladb/scylladb#20178
2024-08-30 10:10:45 +03:00
Piotr Dulikowski
c05be27e4a Merge 'db/hints: Move the code for writing hints to a separate function' from Dawid Mędrek
In scylladb/scylladb@7301a96, in the function `hint_endpoint_manager::store_hint()`,
we transformed the lambda passed to `seastar::with_gate()` to a coroutine lambda
to improve the readability. However, there was a subtle problem related to
lifetimes of the captures that needed to be addressed:

* Since we started `co_await`ing in the lambda, the captures were at risk of
  being destructed too soon. The usual solution is to wrap a coroutine lambda
  within a `seastar::coroutine::lambda` object and rely on the extended lifetime
  enforced by the semantics of the language.
  See `docs/dev/lambda-coroutine-fiasco.md` for more context.
* However, since we don't immediately `co_await` the future returned by
  `with_gate()`, we cannot rely on the extended lifetime provided by the wrapper.
  The document linked in the previous bullet point suggests keeping the passed
  coroutine lambda as a variable and pass it as a reference to `with_gate()`.
  However, that's not feasible either because we discard the returned future and
  the function returns almost instantly -- destructing every local object, which
  would encompass the lambda too.

The solution used in the commit was to move captures of the lambda into
the lambda's body. That helped because Seastar's backend is responsible for
keeping all of the local variables alive until the lambda finishes its execution.
However, we didn't move all of the captures into the lambda -- the missing one
was the `this` pointer that was implicitly used in the lambda.

Address sanitiser hasn't reported any bugs related to the pointer yet, but
the bug is most likely there.

In this commit, we transform the lambda's body into a new member function
and only call it from the lambda. This way, we don't need to care about
the lifetimes of the captures because Seastar ensures that the function's
arguments stay alive until the coroutine finishes.

Choosing this solution instead of assigning `this` to a pointer variable
inside the lambda's body and using it to refer to the object's members
has actual benefit: it's not possible to accidentally forget to refer
to a member of the object via the pointer; it also makes the code less
awkward.

Fixes scylladb/scylladb#20306

Closes scylladb/scylladb#20258

* github.com:scylladb/scylladb:
  db/hints: Fix indentation in `do_store_hint()`
  db/hints: Move code for writing hints to separate function
2024-08-30 09:09:02 +02:00
Avi Kivity
bbcfd47bf5 doc: nodetool: toppartitions: document --samplers and --capacity
In particular --capacity is critical for obtaining accurate measurements.

Closes scylladb/scylladb#20192
2024-08-30 10:07:54 +03:00
Botond Dénes
9f9346fc59 Merge 'nodetool: tasks: add nodetool commands to track task manager tasks' from Aleksandra Martyniuk
Add nodetool commands to manage task manager tasks:
- tasks abort - aborts the task
- tasks list - lists all tasks in the module
- tasks modules - lists all modules
- tasks set-ttl - sets task ttl
- tasks status - gets status of the task
- tasks tree - gets statuses of the task and all its descendants
- tasks ttl - gets task ttl
- tasks wait - waits for the task and gets its status

Fixes: https://github.com/scylladb/scylladb/issues/19201.

Closes scylladb/scylladb#19614

* github.com:scylladb/scylladb:
  test: nodetool: add tests for tasks commands
  nodetool: tasks: add nodetool commands to track task manager tasks
  api: task_manager: return status 403 if a task is not abortable
  api: task_manager: return none instead of empty task id
  api: task_manager: add timeout to wait_task
  api: task_manager: add operation to get ttl
  nodetool: add suboperations support
  nodetool: change operations_with_func type
  nodetool: prepare operation related classes for suboperations
2024-08-30 07:37:37 +03:00
Avi Kivity
d69bf4f010 cql3: introduce dialect infrastructure
A dialect is a different way to interpret the same CQL statement.

Examples:
 - how duplicate bind variable names are handled (later in this series)
 - whether `column = NULL` in LWT can return true (as is now) or
   whether it always returns NULL (as in SQL)

Currently, dialect is an empty structure and will be filled in later.
It is passed to query_processor methods that also accept a CQL string,
and from there to the parser. It is part of the prepared statement cache
key, so that if the dialect is changed online, previous parses of the
statement are ignored and the statement is prepared again.

The patch is careful to pick up the dialect at the entry point (e.g.
CQL protocol server) so that the dialect doesn't change while a statement
is parsed, prepared, and cached.
2024-08-29 21:19:23 +03:00
Avi Kivity
f9322799af cql3: prepared_statement_cache: drop cache key default constructor
It's unnecessary, and interferes with the following patch where
we change the cache key type.
2024-08-29 21:07:00 +03:00
Avi Kivity
67b24859bc Merge 'generic_server: convert connection tracking to seastar::gate' from Laszlo Ersek
~~~
generic_server: convert connection tracking to seastar::gate

If we call server::stop() right after "server" construction, it hangs:

With the server never listening (never accepting connections and never
serving connections), nothing ever calls server::maybe_stop().
Consequently,

    co_await _all_connections_stopped.get_future();

at the end of server::stop() deadlocks.

Such a server::stop() call does occur in controller::do_start_server()
[transport/controller.cc], when

- cserver->start() (sharded<cql_server>::start()) constructs a
  "server"-derived object,

- start_listening_on_tcp_sockets() throws an exception before reaching
  listen_on_all_shards() (for example because it fails to set up client
  encryption -- certificate file is inaccessible etc.),

- the "deferred_action"

      cserver->stop().get();

  is invoked during cleanup.

(The cserver->stop() call exposing the connection tracking problem dates
back to commit ae4d5a60ca ("transport::controller: Shut down distributed
object on startup exception", 2020-11-25), and it's been triggerable
through the above code path since commit 6b178f9a4a
("transport/controller: split configuring sockets into separate
functions", 2024-02-05).)

Tracking live connections and connection acceptances seems like a good fit
for "seastar::gate", so rewrite the tracking with that. "seastar::gate"
can be closed (and the returned future can be waited for) without anyone
ever having entered the gate.

NOTE: this change makes it quite clear that neither server::stop() nor
server::shutdown() must be called multiple times. The permitted sequences
are:

- server::shutdown() + server::stop()

- or just server::stop().

Fixes #10305

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
~~~
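
The property the commit message relies on, that a gate can be closed and awaited without anyone ever having entered it, can be sketched with a Python asyncio analogy. This is not `seastar::gate`; the `Gate` class and its method names are a simplified model written for this example.

```python
import asyncio


class Gate:
    """Simplified model of a gate: counts holders, close() waits for zero."""

    def __init__(self):
        self._count = 0
        self._closed = False
        self._drained = asyncio.Event()

    def enter(self):
        if self._closed:
            raise RuntimeError("gate closed")
        self._count += 1

    def leave(self):
        self._count -= 1
        if self._closed and self._count == 0:
            self._drained.set()

    async def close(self):
        self._closed = True
        if self._count == 0:
            # nobody inside: resolve immediately instead of waiting for a
            # "last connection stopped" signal that would never arrive
            self._drained.set()
        await self._drained.wait()


async def stop_without_connections():
    gate = Gate()  # a server that never accepted a connection
    await asyncio.wait_for(gate.close(), timeout=1)
    return True
```

In this model, stopping a server that never listened completes at once, whereas waiting on a promise that only the last connection fulfils would hang.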

Fixes #10305.

I think we might want to backport this -- it fixes a hang-on-misconfiguration which affects `scylla-6.1.0-0.20240804.abbf0b24a60c.x86_64` minimally. Basically every release that contains commit ae4d5a60ca has a theoretical chance for the hang, and every release that contains commit 6b178f9a4a has a practical chance for the hang.

Focusing on the more practical symptom (i.e., releases containing commit 6b178f9a4a), `git tag --contains 6b178f9a4a90` gives us (ignoring candidates and release candidates):
- scylla-6.0.0
- scylla-6.0.1
- scylla-6.0.2
- scylla-6.1.0

Closes scylladb/scylladb#20212

* github.com:scylladb/scylladb:
  generic_server: make server::stop() idempotent
  generic_server: coroutinize server::shutdown()
  generic_server: make server::shutdown() idempotent
  test/generic_server: add test case
  configure, cmake: sort the lists of boost unit tests
  generic_server: convert connection tracking to seastar::gate
2024-08-29 19:45:48 +03:00
Laszlo Ersek
db44000f8d Update seastar submodule
* seastar 83e6cdfd...ec5da7a6 (1):
  > reactor, linux-aio: advise users in more detail on setting aio-max-nr

Fixes #5981

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>

Closes scylladb/scylladb#20307
2024-08-29 19:42:02 +03:00
Raphael S. Carvalho
26facd807e storage_service: avoid processing same table unnecessarily in split monitor
If there's a token metadata for a given table, and it is in split mode,
it will be registered such that split monitor can look at it, for
example, to start split work, or do nothing if table completed it.

During topology change, e.g. drain, split is stalled since it cannot
take over the state machine.
It was noticed that the log is being spammed with a message saying the
table completed split work, since every tablet metadata update means
waking up the monitor on behalf of a table. So it makes sense to
demote the logging level to debug. That persists until drain completes
and split can finally complete.

Another thing that was noticed is that during drain, a table can be
submitted for processing faster than the monitor can handle, so the
candidate queue may end up with multiple duplicated entries for same
table, which means unnecessary work. That is fixed by using a
sequenced set, which keeps the current FIFO behavior.
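
The idea of a sequenced set, a FIFO queue that ignores an item that is already queued, can be sketched in Python. This is an illustration only: the `SequencedSet` class and the table names are made up, and it relies on Python dicts preserving insertion order rather than on the container used in the patch.

```python
class SequencedSet:
    """FIFO queue without duplicates, built on an insertion-ordered dict."""

    def __init__(self):
        self._items = {}

    def push_back(self, item):
        """Queue `item`; return False if it is already waiting."""
        if item in self._items:
            return False
        self._items[item] = None
        return True

    def pop_front(self):
        """Remove and return the oldest queued item."""
        item = next(iter(self._items))
        del self._items[item]
        return item

    def __len__(self):
        return len(self._items)


# hypothetical table names submitted faster than the monitor drains them
candidates = SequencedSet()
for table in ["ks.t1", "ks.t2", "ks.t1", "ks.t1"]:
    candidates.push_back(table)
```

Repeated submissions of the same table collapse into one pending entry, while distinct tables are still processed in the order they first arrived.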

Fixes #20339.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#20029
2024-08-29 19:38:43 +03:00
Aleksandra Martyniuk
1f46cad5de test: nodetool: add tests for tasks commands 2024-08-29 17:37:13 +02:00
Aleksandra Martyniuk
20fffcdcf5 nodetool: tasks: add nodetool commands to track task manager tasks 2024-08-29 17:37:12 +02:00
Avi Kivity
7da3314deb Merge 'Integrated restore' from Ernest Zaslavsky
Handed over from https://github.com/scylladb/scylladb/pull/20149

This adds a minimal implementation of the start-restore API call.

The method starts a task that runs load-and-stream functionality against sstables from an S3 bucket. Arguments are:

```
endpoint -- the ID in object_store.yaml config file
bucket -- the target bucket to get objects from
keyspace -- the keyspace to work on
table -- the table to work on
snapshot -- the name of the snapshot from which the backup was taken
```
The task runs in the background. Its task_id is returned from the method once the task is spawned, and it should be used with the /task_manager API to track the task's execution and completion.

Remote sstable components are scanned as if they were placed in the local upload/ directory. Then the collected sstables are fed into load-and-stream.

This branch has https://github.com/scylladb/scylladb/pull/19890 (Integrated backup), https://github.com/scylladb/scylladb/pull/20120 (S3 lister) and a few more minor PRs merged in. The restore branch itself starts with the [utils: Introduce abstract (directory) lister](29c867b54d) commit.

refs: https://github.com/scylladb/scylladb/issues/18392

Closes scylladb/scylladb#20305

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: add restore integration
  test/object_store: Add simple restore test
  test/object_store: Generalize prepare_snapshot_for_backup()
  code: Introduce restore API method
  sstable_loader: Add sstables::storage_manager dependency
  sstable_loader: Maintain task manager module
  sstable_loader: Out-line constructor
  distributed_loader: Split get_sstables_from_upload_dir()
  sstables/storage: Compose uploaded sstable path simpler
  sstable_directory: Prepare FS lister to scan files on S3
  sstable_directory: Parse sstable component without full path
  s3-client: Add support for lister::filter
  utils: Introduce abstract (directory) lister
2024-08-29 18:25:30 +03:00
Kamil Braun
9574c399ce Merge 'add support for zero-token nodes' from Patryk Jędrzejczak
We revive the `join_ring` option. We support it only in the
Raft-based topology, as we plan to remove the gossip-based topology
when we fix the last blocker - the implementation of the manual
recovery tool. In the Raft-based topology, a node can be assigned
tokens only once when it joins the cluster. Hence, we disallow
joining the ring later, which is possible in Cassandra.

The main idea behind the solution is simple. We make the unsupported
special case of zero tokens a supported normal case. Nodes with zero
tokens assigned are called "zero-token nodes" from now on.

From the topology point of view, zero-token nodes are the same as
token-owning nodes. They can be in the same states, etc. From the
data point of view, they are different. They are not members of
the token ring, so they are not present in
`token_metadata::_normal_token_owners`. Hence, they are ignored in
all non-local replication strategies. The tablet load balancer also
ignores them.

Zero-token nodes can be used as coordinator-only nodes, just like in
Cassandra. They can handle requests just like token-owning nodes.

The main motivation behind zero-token nodes is that they can prevent
the Raft majority loss efficiently. Zero-token nodes are group 0
voters, but they can run on much weaker and cheaper machines because
they neither replicate data nor handle client requests by default
(drivers ignore them). For example, if there are two DCs, one with 4
nodes and one with 5 nodes, if we add a DC with 2 zero-token nodes,
every DC will contain less than half of the nodes, so we won't lose
the majority when any DC dies.
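
The arithmetic in the example above can be checked with a short sketch.
It assumes every node is a group 0 voter and that a majority means
strictly more than half of the voters; the function name and the DC
names are made up for illustration.

```python
def survives_any_dc_loss(voters_per_dc):
    """Return True if a strict majority of voters remains after losing any one DC."""
    total = sum(voters_per_dc.values())
    majority = total // 2 + 1
    return all(total - lost >= majority for lost in voters_per_dc.values())


# Two DCs with 4 and 5 nodes: losing the 5-node DC leaves 4 of 9 voters.
two_dcs = {"dc1": 4, "dc2": 5}

# Adding a DC with 2 zero-token nodes: every DC holds less than half of 11.
with_zero_token_dc = {"dc1": 4, "dc2": 5, "dc3": 2}

print(survives_any_dc_loss(two_dcs))
print(survives_any_dc_loss(with_zero_token_dc))
```

The same check also passes for 2 DCs of equal size plus a third DC with
a single zero-token node, e.g. `{"dc1": 4, "dc2": 4, "dc3": 1}`.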

Another way of preventing the Raft majority loss is changing the
voter set, which is tracked by scylladb/scylladb#18793. That approach
can be used together with zero-token nodes. In the example above, if
we choose equal numbers of voters in both DCs, then a DC with one
zero-token node will be sufficient. However, in the typical setup of
2 DCs with the same number of nodes it is enough to add a DC with
only one zero-token node without changing the voter set.

Zero-token nodes could also be used as load balancers in the
Alternator.

Additionally, this PR fixes scylladb/scylladb#11087, which turned out to
be a blocker.

This PR introduced a new feature. There is no need to backport it.

Fixes scylladb/scylladb#6527
Fixes scylladb/scylladb#11087
Fixes scylladb/scylladb#15360

Closes scylladb/scylladb#19684

* github.com:scylladb/scylladb:
  docs: raft: document using zero-token nodes to prevent majority loss
  test: test recovery mode in the presence of zero-token nodes
  test: topology: util.py: add cqls parameter to check_system_topology_and_cdc_generations_v3_consistency
  test: topology: util.py: accept zero tokens in check_system_topology_and_cdc_generations_v3_consistency
  treewide: support zero-token nodes in the recovery mode
  storage_proxy: make TRUNCATE work locally for local tables
  test: topology: util.py: document that check_token_ring_and_group0_consistency fails with zero-token nodes
  test: test zero-token nodes
  test: test_topology_ops: move helpers to topology/util.py
  feature_service: introduce the ZERO_TOKEN_NODES feature
  storage_service: rename join_token_ring to join_topology
  storage_service: raft_topology_cmd_handler: improve warnings
  topology_coordinator: fix indentation after the previous patch
  treewide: introduce support for zero-token nodes in Raft topology
  system_keyspace: load_topology_state: remove assertion impossible to hit
  treewide: distinguish all nodes from all token owners
  gossip topology: make a replacing node remove the replaced node from topology
  locator: topology: add_or_update_endpoint: use none as the default node state
  test: boost: tablets tests: ensure all nodes are normal token owners
  token_metadata: rename get_all_endpoints and get_all_ips
  network_topology_strategy: reallocate_tablets: remove unused dc_rack_nodes
  virtual_tables: cluster_status_table: execute: set dc regardless of the token ownership
2024-08-29 16:26:21 +02:00
Gleb Natapov
32a59ba98f topology coordinator: fix indentation after the last patch 2024-08-29 17:14:09 +03:00
Gleb Natapov
17f4a151ce topology coordinator: do not add replacing node without a ring to topology
When only inter-dc encryption is enabled, a non-encrypted connection
between two nodes is allowed only if both nodes are in the same dc.
If a node that initiates the connection knows that dst is in the same
dc and hence uses a non-encrypted connection, but the dst does not yet know
the topology of the src, such a connection will not be allowed since dst
cannot guarantee that src is in the same dc.

Currently, when topology coordinator is used, a replacing node will
appear in the coordinator's topology immediately after it is added to the
group0. The coordinator will try to send raft message to the new node
and (assuming only inter dc encryption is enabled and replacing node and
the coordinator are in the same dc) it will try to open regular, non encrypted,
connection to it. But the replacing node will not have the coordinator
in its topology yet (it needs to sync the raft state for that), so it
will reject such connection.
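
The asymmetry described above can be illustrated with a minimal sketch.
It is not the actual messaging code; `plaintext_allowed` and the `None`
marker for "peer's dc not known yet" are assumptions made for the
example.

```python
from typing import Optional


def plaintext_allowed(my_dc: str, peer_dc: Optional[str]) -> bool:
    """With only inter-dc encryption enabled, a non-encrypted connection is
    acceptable only when the peer is known to be in the same dc."""
    return peer_dc is not None and peer_dc == my_dc


# The coordinator already has the replacing node in its topology.
coordinator_view = plaintext_allowed("dc1", "dc1")

# The replacing node has not synced the raft state yet, so it does not
# know the coordinator's dc and cannot accept a non-encrypted connection.
replacing_node_view = plaintext_allowed("dc1", None)

print(coordinator_view, replacing_node_view)
```

The two sides disagree, which is why the connection attempt fails until
the replacing node learns the coordinator's location.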

To solve the problem the patch does not add a replacing node that was
just added to group0 to the topology. It will be added later, when
tokens will be assigned to it. At this point a replacing node will
already make sure that its topology state is up-to-date (since it will
execute a raft barrier in join_node_response_params handler) and it knows
coordinator's topology. This aligns replace behaviour with bootstrap
since bootstrap also does not add a node without a ring to the topology.

The patch effectively reverts b8ee8911ca

Fixes: scylladb/scylladb#19025
2024-08-29 17:14:09 +03:00
Gleb Natapov
2f1b1fd45e test: add test for replace in clusters with encryption enabled 2024-08-29 17:14:09 +03:00
Gleb Natapov
b98282a976 test.py: add server encryption support to cluster manager 2024-08-29 17:14:09 +03:00
Gleb Natapov
84757a4ed3 .gitignore: fix pattern for resources to match only one specific directory 2024-08-29 17:13:58 +03:00
Dawid Medrek
d459cf91eb db/hints: Fix indentation in do_store_hint() 2024-08-29 14:47:08 +02:00
Dawid Medrek
75ce6943d0 db/hints: Move code for writing hints to separate function
In scylladb/scylladb@7301a96, in the function `hint_endpoint_manager::store_hint()`,
we transformed the lambda passed to `seastar::with_gate()` to a coroutine lambda
to improve the readability. However, there was a subtle problem related to
lifetimes of the captures that needed to be addressed:

* Since we started `co_await`ing in the lambda, the captures were at risk of
  being destructed too soon. The usual solution is to wrap a coroutine lambda
  within a `seastar::coroutine::lambda` object and rely on the extended lifetime
  enforced by the semantics of the language.
  See `docs/dev/lambda-coroutine-fiasco.md` for more context.

* However, since we don't immediately `co_await` the future returned by
  `with_gate()`, we cannot rely on the extended lifetime provided by the wrapper.
  The document linked in the previous bullet point suggests keeping the passed
  coroutine lambda as a variable and pass it as a reference to `with_gate()`.
  However, that's not feasible either because we discard the returned future and
  the function returns almost instantly -- destructing every local object, which
  would encompass the lambda too.

The solution used in the commit was to move captures of the lambda into
the lambda's body. That helped because Seastar's backend is responsible for
keeping all of the local variables alive until the lambda finishes its execution.
However, we didn't move all of the captures into the lambda -- the missing one
was the `this` pointer that was implicitly used in the lambda.

Address sanitiser hasn't reported any bugs related to the pointer yet, but
the bug is most likely there.

In this commit, we transform the lambda's body into a new member function
and only call it from the lambda. This way, we don't need to care about
the lifetimes of the captures because Seastar ensures that the function's
arguments stay alive until the coroutine finishes.

Choosing this solution instead of assigning `this` to a pointer variable
inside the lambda's body and using it to refer to the object's members
has actual benefit: it's not possible to accidentally forget to refer
to a member of the object via the pointer; it also makes the code less
awkward.
2024-08-29 14:47:02 +02:00
Aleksandra Martyniuk
627fc46ca7 api: task_manager: return status 403 if a task is not abortable 2024-08-29 13:53:40 +02:00
Aleksandra Martyniuk
10ab60f32b api: task_manager: return none instead of empty task id
If a user requests a status of a task that does not have a parent,
show "none" instead of an empty parent_id.
2024-08-29 13:53:40 +02:00
Aleksandra Martyniuk
5bcff4d544 api: task_manager: add timeout to wait_task 2024-08-29 13:53:40 +02:00
Aleksandra Martyniuk
3d78172328 api: task_manager: add operation to get ttl 2024-08-29 13:53:39 +02:00
Aleksandra Martyniuk
fb160afaf6 nodetool: add suboperations support
Modify nodetool methods so that they support suboperations.
2024-08-29 13:53:39 +02:00
Aleksandra Martyniuk
4b96f9abb9 nodetool: change operations_with_func type
Change the type of operations_with_func so that they can contain
suboperations.
2024-08-29 13:53:39 +02:00
Aleksandra Martyniuk
c6f8a0116a nodetool: prepare operation related classes for suboperations
Modify operation and add operation_action class so that information
about suboperations is stored. It's a preparation for adding
suboperations support to nodetool.
2024-08-29 13:53:39 +02:00
Kefu Chai
dbb056f4f7 build: cmake: point -ffile-prefix-map to build directory
before this change, we included `-ffile-prefix-map=${CMAKE_SOURCE_DIR}=.`
in cflags when building the tree with CMake, but this was wrong.
as the "." directory is the build directory used by CMake. and this
directory is specified by the `-B` option when generating the building
system. if `configure.py --use-cmake` is used to build the tree,
the build directory would be "build". so this option instructs the compiler
to replace the directory of source file in the debug symbols and in
`__FILE__` at compile time.

but, in a typical workspace, for instance, `build/main.cc` does not exist.
the reason why this does not apply to CMake but applies to the rules
generated by `configure.py` is that, `configure.py` puts the generated
`build.ninja` right under the top source directory, so `.` is correct and
it helps to create reproducible builds. because this practically erases
the path prefixes in the build output. while CMake puts it under the
specified build directory, replacing the source directory with the build
directory with the file prefix map is just wrong.

there are two options to address this problem:

* stop passing this option. but this would lead to non-reproducible
  builds. as we would encode the build directory in the "scylla"
  executable. if a developer needs to rebuild an executable for debugging
  a coredump generated in production, he/she would have to either
  build the tree in the same directory as our CI does. or, he/she
  has to pass `-ffile-prefix-map=...` to map the local build directory
  to the one used by CI. this is not convenient.
* instead of using `${CMAKE_SOURCE_DIR}=.`, add `${CMAKE_BINARY_DIR}=.`.
  this erases the build directory in the outputs, but preserves the
  debuggability.

so we pick the second solution.
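
a rough sketch of the effect of the two mappings follows. it only mimics
the prefix replacement performed by `-ffile-prefix-map=old=new`; the
directory names are hypothetical and the helper is not part of the build
scripts.

```python
import posixpath


def apply_prefix_map(path, old, new):
    """Replace a leading `old` prefix with `new`, leaving other paths untouched."""
    return new + path[len(old):] if path.startswith(old) else path


source_dir = "/home/dev/scylladb"        # hypothetical ${CMAKE_SOURCE_DIR}
build_dir = "/home/dev/scylladb/build"   # hypothetical ${CMAKE_BINARY_DIR}
source_file = source_dir + "/main.cc"
generated_file = build_dir + "/gen/foo.cc"

# old behavior: the source directory is mapped to ".", and "." is then
# interpreted as the build directory, where main.cc does not exist.
old_mapped = apply_prefix_map(source_file, source_dir, ".")
old_resolved = posixpath.normpath(posixpath.join(build_dir, old_mapped))

# new behavior: only paths under the build directory are rewritten.
new_mapped_source = apply_prefix_map(source_file, build_dir, ".")
new_mapped_generated = apply_prefix_map(generated_file, build_dir, ".")

print(old_resolved)
print(new_mapped_source)
print(new_mapped_generated)
```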

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20329
2024-08-29 12:28:11 +03:00
Patryk Jędrzejczak
c192a9ee3b docs: raft: document using zero-token nodes to prevent majority loss 2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
e027ffdffc test: test recovery mode in the presence of zero-token nodes
We modify existing tests to verify that the recovery mode works
correctly in the presence of zero-token nodes.

In `test_topology_recovery_basic`, we test the case when a
zero-token node is live. In particular, we test that the
gossip-based restart of such a node works.

In `test_topology_recovery_after_majority_loss`, we test the case
when zero-token nodes are unrecoverable. In particular, we test
that the gossip-based removenode of such nodes works.

Since zero-token nodes are ignored by the Python driver if it also
connects to other nodes, we use different CQL sessions for a
zero-token node in `test_topology_recovery_basic`.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
fb1e060c4c test: topology: util.py: add cqls parameter to check_system_topology_and_cdc_generations_v3_consistency
In the following commit, we modify `test_topology_recovery_basic`
to test the recovery mode in the presence of live zero-token nodes.
Unfortunately, it requires a somewhat ugly workaround. Zero-token nodes
are ignored by the Python driver if it also connects to other
nodes because of empty tokens in the `system.peers` table. In that
test, we must connect to a zero-token node to enter the recovery
mode and purge the Raft data. Hence, we use different CQL sessions
for different nodes.

In the future, we may change the Python driver behavior and revert
this workaround. Moreover, the recovery tests will be removed or
significantly changed when we implement the manual recovery tool.
Therefore, we shouldn't worry about this workaround too much.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
54905fc179 test: topology: util.py: accept zero tokens in check_system_topology_and_cdc_generations_v3_consistency
Before we use `check_system_topology_and_cdc_generations_v3_consistency`
in a test with a zero-token node, we must ensure it doesn't fail
because of zero tokens in a row of the `system.topology` table.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
02bb70da19 treewide: support zero-token nodes in the recovery mode
Before we implement the manual recovery tool, we must support
zero-token nodes in the recovery mode. This means that two topology
operations involving zero-token nodes must work in the gossip-based
topology:
- removing a dead zero-token node,
- restarting a live zero-token node.
We make changes necessary to make them work in this patch.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
87b415efdc storage_proxy: make TRUNCATE work locally for local tables
In one of the following patches, we implement support for zero-token
nodes in the recovery mode. To achieve this, we need to be able to
purge all Raft data on live zero-token nodes by using TRUNCATE.
Currently, TRUNCATE works the same for all replication strategies - it
is performed on all token owners. However, zero-token nodes are not
token owners, so TRUNCATE would ignore them. Since zero-token nodes
store only local tables, fixing scylladb/scylladb#11087 is the perfect
solution for the issue with zero-token nodes. We do it in this patch.

Fixes scylladb/scylladb#11087
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
21c8409fa4 test: topology: util.py: document that check_token_ring_and_group0_consistency fails with zero-token nodes 2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
95e14ae44b test: test zero-token nodes
We add tests to verify the basic properties of zero-token nodes.

`test_zero_token_nodes_no_replication` and
`test_not_enough_token_owners` are more or less deterministic tests.
Running them only in the dev mode is sufficient.

`test_zero_token_nodes_topology_ops` is quite slow, as expected,
considering parameterization and the number of topology operations.
In the future we can think of making it faster or skipping in the
debug mode. For now, our priority is to test zero-token nodes
thoroughly.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
d43d67c525 test: test_topology_ops: move helpers to topology/util.py
In one of the following patches, we reuse the helper functions from
`test_topology_ops` in a new test, so we move them to `util.py`.

Also, we add the `cl` parameter to `start_writes`, as the new test
will use `cl=2`.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
574c252391 feature_service: introduce the ZERO_TOKEN_NODES feature
Zero-token nodes must be supported by all nodes in the cluster.
Otherwise, the non-supporting nodes would crash on some assertion
that assumes only token-owning normal nodes make sense.

Hence, we introduce the ZERO_TOKEN_NODES cluster feature. Zero-token
nodes refuse to boot if it is not supported.

I tested this patch manually. First, I booted a node built in the
previous patch. Then, I tried to add a zero-token node built in this
patch. It refused to boot as expected.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
c25eefe217 storage_service: rename join_token_ring to join_topology
After introducing zero-token nodes that call join_token_ring but do
not join the ring, the join_token_ring name does not make much sense.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
9937cf3a24 storage_service: raft_topology_cmd_handler: improve warnings 2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
3ce936da7b topology_coordinator: fix indentation after the previous patch 2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
22d907e721 treewide: introduce support for zero-token nodes in Raft topology
We revive the `join_ring` option. We support it only in the
Raft-based topology, as we plan to remove the gossip-based topology
when we fix the last blocker - the implementation of the manual
recovery tool. In the Raft-based topology, a node can be assigned
tokens only once when it joins the cluster. Hence, we disallow
joining the ring later, which is possible in Cassandra.

The main idea behind the solution is simple. We make the unsupported
special case of zero tokens a supported normal case. Nodes with zero
tokens assigned are called "zero-token nodes" from now on.

From the topology point of view, zero-token nodes are the same as
token-owning nodes. They can be in the same states, etc. From the
data point of view, they are different. They are not members of
the token ring, so they are not present in
`token_metadata::_normal_token_owners`. Hence, they are ignored in
all non-local replication strategies. The tablet load balancer also
ignores them.

Topology operations involving zero-token nodes are simplified:
- `add` and `replace` finish in the `join_group0` state, so creating
a new CDC generation and streaming are skipped,
- `removenode` and `decommission` skip streaming,
- `rebuild` does not even contact the topology coordinator as there
is nothing to rebuild,

Also, if the topology operation involves a token-owning node,
zero-token nodes are ignored in streaming.

Zero-token nodes can be used as coordinator-only nodes, just like in
Cassandra. They can handle requests just like token-owning nodes.

The main motivation behind zero-token nodes is that they can prevent
the Raft majority loss efficiently. Zero-token nodes are group 0
voters, but they can run on much weaker and cheaper machines because
they neither replicate data nor handle client requests by default
(drivers ignore them). For example, if there are two DCs, one with 4
nodes and one with 5 nodes, if we add a DC with 2 zero-token nodes,
every DC will contain less than half of the nodes, so we won't lose
the majority when any DC dies.

Another way of preventing the Raft majority loss is changing the
voter set, which is tracked by scylladb/scylladb#18793. That approach
can be used together with zero-token nodes. In the example above, if
we choose equal numbers of voters in both DCs, then a DC with one
zero-token node will be sufficient. However, in the typical setup of
2 DCs with the same number of nodes it is enough to add a DC with
only one zero-token node without changing the voter set.

Zero-token nodes could also be used as load balancers in the
Alternator.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
ba016c9af7 system_keyspace: load_topology_state: remove assertion impossible to hit
We store tokens in a non-frozen set, which doesn't distinguish an
empty set from no value. Hence, hitting this assertion is impossible.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
ed55261650 treewide: distinguish all nodes from all token owners
In one of the following patches, we introduce support for zero-token
nodes. From that point, getting all nodes and getting all token
owners isn't equivalent. In this patch, we ensure that we consider
only token owners when we want to consider only token owners (for
example, in the replication logic), and we consider all nodes when
we want to consider all nodes (for example, in the topology logic).
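
The distinction can be sketched as follows. This is a toy model, not the
real `token_metadata` or `topology` code: the cluster layout, the helper
names, and the simplified ring walk are assumptions made for the example.

```python
# Hypothetical cluster: node name -> set of owned tokens.
# "n3" is a zero-token node.
cluster = {"n1": {10, 40}, "n2": {20, 50}, "n3": set()}


def all_nodes(nodes):
    """Topology view: every node, whether or not it owns tokens."""
    return sorted(nodes)


def token_owners(nodes):
    """Replication view: only nodes that own at least one token."""
    return sorted(name for name, tokens in nodes.items() if tokens)


def replicas_for(nodes, token, rf):
    """Walk the ring clockwise from `token` and pick up to `rf` distinct owners."""
    ring = sorted((t, name) for name, tokens in nodes.items() for t in tokens)
    start = next((i for i, (t, _) in enumerate(ring) if t >= token), 0)
    replicas = []
    for i in range(len(ring)):
        name = ring[(start + i) % len(ring)][1]
        if name not in replicas:
            replicas.append(name)
        if len(replicas) == rf:
            break
    return replicas


print(all_nodes(cluster))
print(token_owners(cluster))
print(replicas_for(cluster, 15, 3))
```

Even with `rf=3`, the zero-token node never appears among the replicas
because it contributes nothing to the ring.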

The main purpose of this patch is to make the PR introducing
zero-token nodes easier to review. The patch that introduces
zero-token nodes is already complicated. We don't want trivial
changes from this patch to make noise there.

This patch introduces changes needed for zero-token nodes only in the
Raft-based topology and in the recovery mode. Zero-token nodes are
unsupported in the gossip-based topology outside recovery.

Some functions added to `token_metadata` and `topology` are
inefficient because they compute a new data structure in every call.
They are never called in the hot path, so it's not a serious problem.
Nevertheless, we should improve it somehow. Note that it's not
obvious how to do it because we don't want to make `token_metadata`
store topology-related data. Similarly, we don't want to make
`topology` store token-related data. We can think of an improvement
in a follow-up.

We don't remove unused `topology::get_datacenter_rack_nodes` and
`topology::get_datacenter_nodes`. These functions can be useful in the
future. Also, `topology::_dc_nodes` is used internally in `topology`.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
2d9575d6a9 gossip topology: make a replacing node remove the replaced node from topology
In the following patch, we change the gossiper to work the same for
zero-token nodes and token-owning nodes. We replace occurrences of
`is_normal_token_owner` with topology-based conditions. We want to
rely on the invariant that token-owning nodes own tokens if and only
if they are in the normal or leaving state. However, this invariant
is broken by a replacing node because it does not remove the
replaced node from topology. Hence, after joining, the replacing node
has topology with a node that is not a token owner anymore but is
in a leaving state (`being_replaced`). We fix it to prevent the
following patch from introducing a regression.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
c7016dedb3 locator: topology: add_or_update_endpoint: use none as the default node state
In one of the following patches, we change the gossiper to work the
same for zero-token nodes and token-owning nodes. We replace
occurrences of `is_normal_token_owner` with topology-based
conditions. We want to rely on the invariant that token-owning nodes
own tokens if and only if they are in the normal or leaving state.
However, this invariant can be broken in the gossip-based topology
when a new node joins the cluster. When a bootstrapping node starts
gossiping, other nodes add it to their topology in
`storage_service::on_alive`. Surprisingly, the state of the new node
is set to `normal`, as it's the default value used by
`add_or_update_endpoint`. Later, the state will be set to
`bootstrapping` or `replacing`, and finally it will be set again to
`normal` when the join operation finishes. We fix this strange
behavior by setting the node state to `none` in
`storage_service::on_alive` for nodes not present in the topology.
Note that we must add such nodes to the topology. Other code needs
their Host ID, IP, and location.

We change the default node state from `normal` to `none` in
`add_or_update_endpoint` to prevent bugs like the one in
`storage_service::on_alive`. Also, we ensure that nodes in the `none`
state are ignored in the getters of `locator::topology`.
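
A minimal sketch of the intended behavior, assuming a much simplified
topology class. The names mirror the ones used above, but the class is
hypothetical and does not reflect the real `locator::topology`
interface.

```python
from enum import Enum


class NodeState(Enum):
    NONE = "none"
    BOOTSTRAPPING = "bootstrapping"
    NORMAL = "normal"


class Topology:
    def __init__(self):
        self._nodes = {}

    def add_or_update_endpoint(self, host_id, state=NodeState.NONE):
        # The default is NONE, so a caller that only knows a node is alive
        # cannot accidentally mark it as NORMAL.
        self._nodes[host_id] = state

    def has_node(self, host_id):
        # The node is still stored, so its ID and location stay available.
        return host_id in self._nodes

    def get_nodes(self):
        # Getters skip nodes whose state is still NONE.
        return sorted(h for h, s in self._nodes.items() if s is not NodeState.NONE)


topology = Topology()
topology.add_or_update_endpoint("a", NodeState.NORMAL)
topology.add_or_update_endpoint("b")  # just seen alive, state unknown
print(topology.get_nodes(), topology.has_node("b"))

topology.add_or_update_endpoint("b", NodeState.BOOTSTRAPPING)
print(topology.get_nodes())
```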
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
6adaf85634 test: boost: tablets tests: ensure all nodes are normal token owners
In one of the following patches, we make NetworkTopologyStrategy
and the tablet load balancer consider only normal token owners to
ensure they ignore zero-token nodes. Some unit tests would start
failing after this change because they do not ensure that all
nodes are normal token owners. This patch prevents it.

Judging by the logic in the test cases in
`network_topology_strategy_test`, `point++` was probably intended
anyway.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
366605224c token_metadata: rename get_all_endpoints and get_all_ips
In one of the following patches, we introduce support for zero-token
nodes. A zero-token node that has successfully joined the cluster is
in the normal state but is not a normal token owner. Hence, the names
of `get_all_endpoints` and `get_all_ips` become misleading. They
should specify that the functions return only IDs/IPs of token owners.
2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
293a66fe41 network_topology_strategy: reallocate_tablets: remove unused dc_rack_nodes 2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak
4ff08decb8 virtual_tables: cluster_status_table: execute: set dc regardless of the token ownership
If a node is in `locator::topology`, then it has a location.
We remove the token ownership condition to make the table more
descriptive.
2024-08-29 10:37:06 +02:00
Kefu Chai
ecfe0aace6 perf: perf_mutation_readers: break memtable class down
before this change, memtable serves as the fixture for 6 test cases,
actually these 6 test cases can be categorized into a matrix of 3 x 2:
{ single_row, multi_row, large_partition } x { single_partition, multi_partition }.

in this change, we break memtable into 3 different fixtures, to reflect
this fact. more readable this way. and a benefit is that each test does
not have to pay for the overhead of setup it does not use at all.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20177
2024-08-29 08:54:17 +03:00
Botond Dénes
e538e3593c Merge 'build: add --no-use-cmake option to configure.py' from Kefu Chai
as part of the efforts to address scylladb/scylladb#2717, we are
switching over to the CMake-based building system, and fade out the
machinery to create the rules manually in `configure.py`.

in this change, we add `--no-use-cmake` to `configure.py`, it serves
two purposes:

* prepare for the change which enables cmake by default, by then,
  we would set the default value of `use_cmake` to True, and allow
  user to keep using the existing machinery in the transition period
  using `--no-use-cmake`.
* allows the CI to tell if a tree is able to build with CMake.
  the command line option of `--use-cmake` is also used by the CI
  workflows, and is passed to `configure.py` if `BUILD_WITH_CMAKE`
  jenkins pipeline parameter is set. but not all branches with
  `--use-cmake` are ready to build with CMake -- only the latest
  master HEAD is ready. so the CI needs to check the capability of
  building with CMake by looking at the output of `configure.py --help`,
  to see if it includes `--no-use-cmake`.
  after this change lands. we will remove the `BUILD_WITH_CMAKE`
  parameter, and use cmake as long as `configure.py` supports
  `--no-use-cmake` option.

the existing machinery will stay with us for a short transition
period so that developers can take time to get used to the
naming of targets and the new directory arrangement.

as a side effect, #20079 will be fixed after switching to CMake.

---

this is a cmake-related change, hence no need to backport.

Closes scylladb/scylladb#20261

* github.com:scylladb/scylladb:
  build: add --no-use-cmake option to configure.py
  build: let configure.py fail if unknown option is passed to it
2024-08-29 08:51:41 +03:00
Kefu Chai
a182bfd96a tools/read_mutation: reuse parse_table_directory_name()
less repetition this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20315
2024-08-29 08:49:20 +03:00
Nadav Har'El
6391550bbc test/alternator: add another check to test_stream_list_tables
The test test_streams.py::test_stream_list_tables reproduces a bug where
enabling streams added a spurious result to ListTables. A reviewer of
that patch asked to also add a check that name of the table itself
doesn't disappear from ListTables when a stream is enabled, so this is
what this patch adds.

This theoretical scenario (a table's name disappearing from ListTables)
never happened, so the new check doesn't reproduce any known bug, but
I guess it never hurts to make the test stronger for regression testing.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19934
2024-08-29 08:45:22 +03:00
Nadav Har'El
61e5927e8e repair: fix build on older compilers
The code tries to build as "neighbors" an unordered_map from an iterator
of std::tuple, instead of the correct std::pair. Apparently, the tuples
are transparently converted to pairs on the newest compilers and the
whole thing works, but on slightly older compilers (like the one on Fedora 39)
Scylla no longer compiles - the compiler complains it can't convert a
tuple to a pair in this context.

So fix the code to use pairs, not tuples, and it fixes the build on
Fedora 39.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#20319
2024-08-28 19:56:03 +03:00
Laszlo Ersek
49bff3b1ab generic_server: make server::stop() idempotent
After server::shutdown(), make server::stop() more robust too, by allowing
callers (internal or external) to call it several times (not concurrently
though, just yet; see
<https://github.com/scylladb/scylladb/issues/20309>).
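
The idea of an idempotent stop can be sketched in a few lines. This is
an asyncio analogy rather than the Seastar code; the class, the flag,
and the counter are hypothetical, and like the patch it only covers
repeated sequential calls, not concurrent ones.

```python
import asyncio


class Server:
    def __init__(self):
        self._stopped = False
        self.teardown_runs = 0  # counts how often the real teardown ran

    async def stop(self):
        # Later calls return immediately instead of tearing down twice.
        if self._stopped:
            return
        self._stopped = True
        self.teardown_runs += 1
        await asyncio.sleep(0)  # stands in for waiting on connections to close


async def main():
    server = Server()
    await server.stop()
    await server.stop()  # safe: a no-op the second time
    return server.teardown_runs


runs = asyncio.run(main())
print(runs)
```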

Suggested-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-28 15:54:31 +02:00
Kefu Chai
03ab80501f tools/scylla-nodetool: add restore integration
as we have an API for restoring a keyspace / table, let's expose this feature
with nodetool. so we can exercise it without the help of scylla-manager
or 3rd-party tools with a user-friendly interface.

in this change:

* add a new subcommand named "restore" to nodetool
* add test to verify its interaction with the API server
* update the document accordingly.
* the bash completion script is updated accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-28 15:42:49 +03:00
Pavel Emelyanov
41b9eda398 test/object_store: Add simple restore test
The test shows how to restore previously backed up table:

- backup
- truncate to get rid of existing sstables
- start restore with the new API method
- wait for the task to finish

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-28 15:42:49 +03:00
Pavel Emelyanov
f5a22a94c6 test/object_store: Generalize prepare_snapshot_for_backup()
Give it a snapshot-name argument. Next test will want custom snapshot
name.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-28 15:42:49 +03:00
Pavel Emelyanov
11a04bfb66 code: Introduce restore API method
The method starts a task that uses sstables_loader load-and-stream
functionality to bring new sstables into the cluster. The existing
load-and-stream picks up sstables from upload/ directory, the newly
introduced task collects them from an S3 bucket and a given prefix (that
corresponds to the path where the backup API method put them).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-28 15:42:49 +03:00
Sergey Zolotukhin
65f37f3ba6 Ignore seed name resolution errors on restart.
Gossiper seeds host name resolution failures are ignored during restart if
a node is already bootstrapped (i.e. it has successfully joined the cluster).
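
The rule can be sketched as below. It is an illustration only, not the
gossiper code: `resolve_seeds`, the injected `resolve` callable, and the
use of `OSError` for resolution failures are assumptions made for the
example.

```python
def resolve_seeds(seeds, resolve, already_bootstrapped):
    """Resolve seed host names; tolerate failures only on an already bootstrapped node."""
    resolved = []
    for name in seeds:
        try:
            resolved.append(resolve(name))
        except OSError:
            if not already_bootstrapped:
                raise  # a node joining for the first time still needs valid seeds
            # A restarting node skips the seed that cannot be resolved.
    return resolved


def fake_resolve(name):
    # Hypothetical resolver: only "good-seed" is known.
    if name == "good-seed":
        return "127.0.0.1"
    raise OSError(f"cannot resolve {name}")


print(resolve_seeds(["good-seed", "bad-seed"], fake_resolve, already_bootstrapped=True))
```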

Fixes scylladb/scylladb#14945
2024-08-28 14:01:04 +02:00
Patryk Jędrzejczak
08cb3a5e2c test: test_raft_recovery_basic: add raft=trace logs
It could help when we hit scylladb/scylladb#17918 again.

This PR only changes log levels in a test, no need to backport it.

Refs scylladb/scylladb#17918

Closes scylladb/scylladb#20318
2024-08-28 13:50:09 +02:00
Sergey Zolotukhin
fc5e683d02 Add a test for starting with a wrong seed.
The test checks that a bootstrapped node starts with a wrong host name in
the seeds config.

Test for scylladb/scylladb#14945
2024-08-28 11:34:37 +02:00
Laszlo Ersek
1138347e7e generic_server: coroutinize server::shutdown()
By turning server::shutdown() into a coroutine, we need not dynamically
allocate "nr_conn".

Verified as follows:

(1) In terminal #1:

    build/Dev/scylla --overprovisioned --developer-mode=yes \
        --memory=2G --smp=1 --default-log-level error \
        --logger-log-level cql_server=debug:cql_server_controller=debug

> INFO  [...] cql_server_controller - Starting listening for CQL clients
>                                     on 127.0.0.1:9042 (unencrypted,
>                                     non-shard-aware)
> INFO  [...] cql_server_controller - Starting listening for CQL clients
>                                     on 127.0.0.1:19042 (unencrypted,
>                                     shard-aware)

(2) In terminals #2 and #3:

    tools/cqlsh/bin/cqlsh.py

(3) Press ^C in terminal #1:

> DEBUG [...] cql_server - abort accept nr_total=2
> DEBUG [...] cql_server - abort accept 1 out of 2 done
> DEBUG [...] cql_server - abort accept 2 out of 2 done
> DEBUG [...] cql_server - shutdown connection nr_total=4
> DEBUG [...] cql_server - shutdown connection 1 out of 4 done
> DEBUG [...] cql_server - shutdown connection 2 out of 4 done
> DEBUG [...] cql_server - shutdown connection 3 out of 4 done
> DEBUG [...] cql_server - shutdown connection 4 out of 4 done
> INFO  [...] cql_server_controller - CQL server stopped

This patch is best viewed with "git show --word-diff=color".

Suggested-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-28 10:59:44 +02:00
Laszlo Ersek
2216275ebd generic_server: make server::shutdown() idempotent
Make server::shutdown() more robust by allowing callers (internal or
external) to call it several times (not concurrently though, just yet; see
<https://github.com/scylladb/scylladb/issues/20309>).

Suggested-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-28 10:59:44 +02:00
Laszlo Ersek
dbc0ca6354 test/generic_server: add test case
Check whether we can stop a generic server without first asking it to
listen.

The test fails currently; the failure mode is a hang, which triggers the 5
minute timeout set in the test:

> unknown location(0): fatal error: in "stop_without_listening":
> seastar::timed_out_error: timedout
> seastar/src/testing/seastar_test.cc(43): last checkpoint
> test/boost/generic_server_test.cc(34): Leaving test case
> "stop_without_listening"; testing time: 300097447us

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-28 10:59:44 +02:00
Laszlo Ersek
931f2f8d73 configure, cmake: sort the lists of boost unit tests
Both lists were obviously meant to be sorted originally, but by today
we've introduced many instances of disorder -- thus, inserting a new test
in the proper place leaves the developer scratching their head. Sort both
lists.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-28 10:59:44 +02:00
Laszlo Ersek
5a04743663 generic_server: convert connection tracking to seastar::gate
If we call server::stop() right after "server" construction, it hangs:

With the server never listening (never accepting connections and never
serving connections), nothing ever calls server::maybe_stop().
Consequently,

    co_await _all_connections_stopped.get_future();

at the end of server::stop() deadlocks.

Such a server::stop() call does occur in controller::do_start_server()
[transport/controller.cc], when

- cserver->start() (sharded<cql_server>::start()) constructs a
  "server"-derived object,

- start_listening_on_tcp_sockets() throws an exception before reaching
  listen_on_all_shards() (for example because it fails to set up client
  encryption -- certificate file is inaccessible etc.),

- the "deferred_action"

      cserver->stop().get();

  is invoked during cleanup.

(The cserver->stop() call exposing the connection tracking problem dates
back to commit ae4d5a60ca ("transport::controller: Shut down distributed
object on startup exception", 2020-11-25), and it's been triggerable
through the above code path since commit 6b178f9a4a
("transport/controller: split configuring sockets into separate
functions", 2024-02-05).)

Tracking live connections and connection acceptances seems like a good fit
for "seastar::gate", so rewrite the tracking with that. "seastar::gate"
can be closed (and the returned future can be waited for) without anyone
ever having entered the gate.

NOTE: this change makes it quite clear that neither server::stop() nor
server::shutdown() must be called multiple times. The permitted sequences
are:

- server::shutdown() + server::stop()

- or just server::stop().

Fixes #10305

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-28 10:59:44 +02:00
Kefu Chai
6d8dca1e20 build: add --no-use-cmake option to configure.py
as part of the efforts to address scylladb/scylladb#2717, we are
switching over to the CMake-based building system, and fading out the
machinery to create the rules manually in `configure.py`.

in this change, we add `--no-use-cmake` to `configure.py`, it serves
two purposes:

* prepare for the change which enables cmake by default. by then,
  we would set the default value of `use_cmake` to True, and allow
  user to keep using the existing machinery in the transition period
  using `--no-use-cmake`.
* allows the CI to tell if a tree is able to build with CMake.
  the command line option of `--use-cmake` is also used by the CI
  workflows, and is passed to `configure.py` if `BUILD_WITH_CMAKE`
  jenkins pipeline parameter is set. but not all branches with
  `--use-cmake` are ready to build with CMake -- only the latest
  master HEAD is ready. so the CI needs to check the capability of
  building with CMake by looking at the output of `configure.py --help`,
  to see if it includes `--no-use-cmake`.
  after this change lands, we will remove the `BUILD_WITH_CMAKE`
  parameter, and use cmake as long as `configure.py` supports
  `--no-use-cmake` option.

the existing machinery will stay with us for a short transition
period so that developers can take time to get used to the
naming of targets and the new directory arrangement.

as a side effect, #20079 will be fixed after switching to CMake.

Refs scylladb/scylladb#2717
Refs scylladb/scylladb#20079

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-28 11:37:56 +08:00
Kefu Chai
a2de14be7f build: let configure.py fail if unknown option is passed to it
this allows us to use `configure.py` to tell if a certain argument is supported
without parsing its output. in the next commit, we will add `--no-use-cmake` option,
which will be used to tell if the tree is ready for using CMake for its building
system.
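
the idea of probing for a supported option by exit status can be sketched
as below. this is a minimal illustration using `argparse`, not the actual
code in `configure.py`; `make_parser()`, `supports()` and the
`configure-sketch` program name are hypothetical. `parse_args()` exits with
a non-zero status on an unknown option, whereas `parse_known_args()` would
silently accept it.

```python
import argparse


def make_parser():
    # hypothetical parser with a single known option
    parser = argparse.ArgumentParser(prog="configure-sketch")
    parser.add_argument("--mode", default="dev", help="build mode")
    return parser


def supports(argv):
    """Return True if every option in argv is known to the parser."""
    try:
        # parse_args() reports unknown options on stderr and raises SystemExit
        make_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code == 0
    return True
```

a caller such as a CI job could then rely on the exit status alone instead
of parsing the `--help` output.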

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-28 11:37:55 +08:00
Kefu Chai
e4b213f041 build: cmake: use the same options to configure seastar
in `configure.py`, a set of options are specified when configuring
seastar, but not all of them were ported to scylla's CMake building
system.

for instance, `configure.py` explicitly disables io_uring reactor
backend at build time, but the CMake-based system does not.

so, in this change, in order to preserve the existing behavior, let's
port the two previously missing options to the CMake-based building system
as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20288
2024-08-28 06:15:59 +03:00
Avi Kivity
94d5507237 Merge 'select from mutation_fragments() + tablets: handle reads for non-owned partitions' from Botond Dénes
Attempting to read a partition via `SELECT * FROM MUTATION_FRAGMENTS()`, which the node doesn't own, from a table using tablets causes a crash.
This is because when using tablets, the replica side simply doesn't handle requests for un-owned tokens and this triggers a crash.
We should probably improve how this is handled (an exception is better than a crash), but this is outside the scope of this PR.
This PR fixes this and also adds a reproducer test.

Fixes: https://github.com/scylladb/scylladb/issues/18786

Fixes a regression introduced in 6.0, so needs backport to 6.0 and 6.1

Closes scylladb/scylladb#20109

* github.com:scylladb/scylladb:
  test/tablets: Test that reading tablets' mutations from MUTATION_FRAGMENTS works
  replica/mutation_dump: enforce pinning of effective replication map
  replica/mutation_dump: handle un-owned tokens (with tablets)
2024-08-27 20:46:10 +03:00
Avi Kivity
b13ab90448 Merge 'alternator/executor: Use native reversed format' from Łukasz Paszkowski
When executing reversed queries, a native reversed format shall be used. Therefore, the table schema and the clustering key bounds are reversed before a partition slice and a read command are constructed.

It is, however, possible to run a reversed query passing a table schema but only when there are no restrictions on the clustering keys. In this particular situation, the query returns correct results. Since the current alternator tests in test.py do not imply any restrictions, this situation was not caught during development of https://github.com/scylladb/scylladb/pull/18864.

Hence, additional tests are provided that add clustering keys restrictions when executing reversed queries to capture such errors earlier than in dtests.

Additional manual tests were performed to test a mixed-node cluster (with alternator API enabled in Scylla on each node):

1. 2-node cluster with one node upgraded: reverse read queries performed on an old node
2. 2-node cluster with one node upgraded: reverse read queries performed on a new node
3. 2-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on an old node
4. 2-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on a new node

All reverse read queries above consist of:

- single-partition reverse reads with no clustering key restrictions, with single column restrictions and multi column restrictions both with and without paging turned on

The exact same tests were also performed on a fully upgraded cluster.

Fixes https://github.com/scylladb/scylladb/issues/20191

No backport is required as this is a complementary patch for the series https://github.com/scylladb/scylladb/pull/18864 that did not require backporting.

Closes scylladb/scylladb#20205

* github.com:scylladb/scylladb:
  test_query.py: Test reverse queries with clustering key bounds
  alternator::do_query Add additional trace log
  alternator::do_query: Use native reversed format
  alternator::do_query Rename schema with table_schema
2024-08-27 20:40:49 +03:00
Benny Halevy
18c45f7502 raft_rebuild: propagate source_dc force option to rebuild_option
Currently, the `force` property of the `source_dc` rebuild option
is lost and `raft_topology_cmd_handler` has no way to know
if it was given or not.

This in turn can cause rebuild to fail, even when `--force`
is set by the user, where it would succeed with gossip
topology changes, based on the source_dc --force semantics.

Fixes scylladb/scylladb#20242

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#20249
2024-08-27 17:05:48 +02:00
Kefu Chai
d27fdf9f57 Update seastar submodule
* seastar a7d81328...83e6cdfd (29):
  > fair_queue: Export the number of times class was activated
  > tests/unit: drop support of C++17
  > remove vestigial OSv support
  > cmake: undefine _FORTIFY_SOURCE on thread.cc
  > container_perf: a benchmark for container perf
  > io_sink: use chunked_fifo as _pending_io container
  > chunked_fifo: implement clear in terms of pop_n
  > chunked_fifo: pop_front_n
  > io_sink: use iteration instead of indexing
  > json2code_test: choose less popular port number
  > ioinfo: add '--max-reqsize' parameter
  > treewide: drop the support of fmtlib < 8.0.0
  > build: bump up the required fmtlib version to 8.1.1
  > conditional-variable: align when() and wait() behaviour in case of a predicate throwing an exception
  > stall-analyser: add output support for flamegraph
  > reactor: Add --io-completion-notify-ms option
  > io_queue: Stall detector
  > io_queue: Keep local variable with request execution delay
  > io_queue: Rename flow ratio timer to be more generic
  > reactor: Export _polls counter (internally)
  > dns: de-inline dns_resolver::impl methods
  > dns: enter seastar::net namespace
  > dns: drop compatibility for c-ares <= 1.16
  > reactor: add missing includes of noncopyable_function.hh
  > reactor: Reset one-shot signal to DFL before handling
  > future: correctly document nested exception type emitted by finally()
  > modules: fix FATAL_ERROR on compiler check
  > seastar.cc: include fmt/ranges.h
  > pack io_request

Closes scylladb/scylladb#20300
2024-08-27 17:51:21 +03:00
Avi Kivity
2f4ef31254 Merge 'tools/testing: update dist-check to use rockylinux and adapt to cmake' from Kefu Chai
`dist-check` tests the generated rpm packages by installing them in a centos 7 container. but this script is terribly outdated

- centos 7 is deprecated. we should use a new distro's latest stable release.
- cqlsh was added to the family of rpms a while ago. we should test it as well.
- the directory hierarchy has been changed. we should read the artifacts from the new directories.
- cmake uses a different directory hierarchy. we should check the directory used by cmake as well.

to address these breaking changes, the scripts are updated accordingly.

---

this change gives an overhaul to a test, which is not used in production. so no need to backport.

Closes scylladb/scylladb#20267

* github.com:scylladb/scylladb:
  tools/testing: add cqlsh rpm
  tools/testing: adapt to cmake build directory
  tools/testing: test with rockylinux:9 not centos:7
  tools/testing: correct the paths to rpm packages and SCYLLA-*-FILE
  dist-check: add :z option when mapping volume
2024-08-27 16:16:34 +03:00
Pavel Emelyanov
1f3f0b1926 sstable_loader: Add sstables::storage_manager dependency
The storage_manager maintains a set of clients to configured object
storage(s). The sstables loader is going to spawn tasks that will talk
to those storages, thus it needs the storage manager to get the
clients from.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:41 +03:00
Pavel Emelyanov
06c3c53deb sstable_loader: Maintain task manager module
This service is going to start tasks managed by task manager. For that,
it should have its module set up and registered.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:41 +03:00
Pavel Emelyanov
9cf95e8a07 sstable_loader: Out-line constructor
It will grow and become more complicated. Better to have it outside the
header.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:41 +03:00
Pavel Emelyanov
6a006d2255 distributed_loader: Split get_sstables_from_upload_dir()
Next patches will need this method to initialize sstable_directory
differently and then do its regular processing. For that, split the
method into two, next patch will re-use the common part it needs.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:41 +03:00
Pavel Emelyanov
630ab1dbea sstables/storage: Compose uploaded sstable path simpler
Current S3 storage driver keeps sstables in bucket in a form of

  /bucket/generation/component-name

To get sstables that are backed up on S3 this format doesn't apply,
because components are uploaded with their names unmodified. This patch
makes S3 storage driver account for that and not re-format component
paths for upload sstable state.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:41 +03:00
Pavel Emelyanov
2eda917375 sstable_directory: Prepare FS lister to scan files on S3
When component lister is created it checks the target storage options
for what kind of lister to create. For local options it creates FS
lister that collects sstables from their component files. For S3
options, it relies on sstables registry.

When collecting sstables from backup, it's not possible to use registry,
because those entries are not there. Instead, lister should pick up
individual components as if they were on local FS. This patch prepares
the lister for that -- in case S3 options are provided and the sstables'
state is "upload", don't try to read those from registry, but
instantiate the FS lister that will later use s3::bucket_lister.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:41 +03:00
Pavel Emelyanov
60d43911a9 sstable_directory: Parse sstable component without full path
When sstable directory collects an entry from storage, it tries to parse
its full path with the help of sstables::parse_path(). There are two
overloads of that function -- one with ks:cf arguments and one without.
The latter tries to "guess" keyspace and table names from the directory
name.

However, ks and table names are already known by the directory, it
doesn't even use the returned ks and cf values, so this parsing is
excessive. Also, future patches will put here backup paths, that might
not match the ks_name/table_name-table_uuid/ pattern that the parser
expects.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:41 +03:00
Pavel Emelyanov
86bc5b11fe s3-client: Add support for lister::filter
Directory lister comes with a filter function that tells lister which
entries to skip by its .get() method. For uniformity, add the same to
S3 bucket_lister.

After this change the lister reports a shorter name in the returned
directory entry (with the prefix cut), so the unit test needs to be
adjusted accordingly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:40 +03:00
Pavel Emelyanov
113d2449f8 utils: Introduce abstract (directory) lister
This patch hides directory_lister and bucket_lister behind a common
facade. The intention is to provide a uniform API for sstable_directory
that it could use to list sstables' components wherever they are.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:40 +03:00
Piotr Dulikowski
da5f4faac1 Merge 'mv: reject user requests by coordinator when a replica is overloaded by MVs' from Wojciech Mitros
Currently, when a view update backlog of one replica is full, the write is still sent by the coordinator to all replicas. Because of the backlog, the write fails on the replica, causing inconsistency that needs to be fixed by repair. To avoid these inconsistencies, this patch adds a check on the coordinator for overloaded replicas. As a result, a write may be rejected before being sent to any replicas and later retried by the user, when the replica is no longer overloaded.

This patch does not remove the replica write failures, because we still may reach a full backlog when more view updates are generated after the coordinator check is performed and before the write reaches the replica.

Fixes scylladb/scylladb#17426

Closes scylladb/scylladb#18334

* github.com:scylladb/scylladb:
  mv: test the view update behavior
  mv: add test for admission control
  storage_proxy: return overloaded_exception instead of throwing
  mv: reject user requests by coordinator when a replica is overloaded by MVs
2024-08-27 12:50:34 +02:00
Aleksandra Martyniuk
f38bb6483a test: add test to ensure repair won't fail with uninitialized bm
2024-08-27 11:37:50 +02:00
Aleksandra Martyniuk
d8e4393418 repair: throw if batchlog manager isn't initialized
repair_service::repair_flush_hints_batchlog_handler may access batchlog
manager while it is uninitialized.

Batchlog manager cannot be initialized before repair as we have the
dependencies chain:
repair_service -> storage_service::join_cluster -> batchlog_manager.

Throw if batchlog manager isn't initialized. That won't cause repair
to fail.
2024-08-27 11:22:28 +02:00
Botond Dénes
5c0f6d4613 Merge 'Make Summary support histogram with infinite bucket values' from Amnon Heiman
This series fixes an issue where histogram Summaries return an infinite value.

It updates the quantile calculation logic to address cases where values fall into the infinite bucket of a histogram.
Now, instead of returning infinite (max int), the calculation will return the last bucket limit, ensuring finite outputs in all cases.
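
The clamping behavior described above can be sketched as follows. This is a minimal Python illustration, not the code in utils/histogram.hh; the function name `quantile_upper_bound` and the bucket layout (ascending finite upper limits plus one trailing overflow counter) are assumptions made for the example.

```python
def quantile_upper_bound(limits, counts, q):
    """Estimate the q-quantile of a bucketed histogram.

    limits: ascending finite upper bounds of the regular buckets.
    counts: len(limits) + 1 counters; the last one holds samples above
            limits[-1], i.e. the "infinite" bucket (assumed layout).
    """
    if len(counts) != len(limits) + 1:
        raise ValueError("counts must have one more entry than limits")
    total = sum(counts)
    if total == 0:
        return 0
    target = q * total
    cumulative = 0
    # zip() stops after the finite buckets, so the overflow bucket is
    # never visited here.
    for limit, count in zip(limits, counts):
        cumulative += count
        if cumulative >= target:
            return limit
    # The quantile falls into the infinite bucket: report the last finite
    # limit instead of an infinite (max int) value.
    return limits[-1]
```

With limits `[1, 2, 4]` and counts `[1, 1, 1, 7]`, most samples sit in the overflow bucket, and the 0.99 quantile is reported as 4 rather than as an infinite value.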

The series adds a test for summaries with a specific test case for this scenario.

Fixes #20255
Need backport to 6.0, 6.1 and 2023.1 and above

Closes scylladb/scylladb#20257

* github.com:scylladb/scylladb:
  test/estimated_histogram_test Add summary tests
  utils/histogram.hh: Make summary support infinite bucket.
2024-08-27 10:33:54 +03:00
Kefu Chai
ae7ce38721 build: print out the default value of options
instead of using the default `argparse.HelpFormatter`, let's
use `ArgumentDefaultsHelpFormatter`, so that the default values
of options are displayed in the help messages.

this should help developer understand the behavior of the script
better.
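
for reference, the effect of the formatter can be seen in a small
standalone sketch. the options below (`--mode`, `--jobs`) and the
`build_parser()` helper are hypothetical and are not taken from
`configure.py`; only `argparse.ArgumentDefaultsHelpFormatter` is the real
class named above.

```python
import argparse


def build_parser():
    # ArgumentDefaultsHelpFormatter appends "(default: ...)" to the help
    # text of every option that has a help string
    parser = argparse.ArgumentParser(
        prog="configure-sketch",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--mode", default="dev", help="build mode")
    parser.add_argument("--jobs", type=int, default=4, help="parallel jobs")
    return parser


help_text = build_parser().format_help()
```

note that an option without a `help` string gets no default annotation.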

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20262
2024-08-27 10:04:31 +03:00
Kefu Chai
e2747e4bb5 build: cmake: add dist-check target
to achieve feature parity with our existing building system, we
need to implement a new build target "dist-check" in the CMake-based
building system.

in this change, "dist-check" is added to CMake-based building system.
unlike the rules generated by `configure.py`, the `dist-check` target
in CMake depends on the dist-*-rpm targets. the goal is to enable user
to test `dist-check` without explicitly building the artifacts being
tested.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20266
2024-08-27 10:03:41 +03:00
Kefu Chai
ea612e7065 docs: install poetry>=1.8.0
in 57def6f1, we specified "package-mode" for poetry, but this option
was introduced in poetry 1.8.0, as the "non-package" mode support.
see https://github.com/python-poetry/poetry/releases/tag/1.8.0

this change practically bumps up the minimum required poetry version
to 1.8.0. we did update `pyproject.toml` to reflect this change,
but we failed to update the `Makefile`.

in this change, we update `Makefile` to ensure that users who happen
to have an older version of poetry can install a version which supports
this option when running `make setupenv`.

Refs scylladb/scylladb#20284
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20286
2024-08-27 09:20:09 +03:00
Yaniv Michael Kaul
022eb25d98 tools/toolchain/README.md: fix wording
Forgot to add that 'reg' tool is also needed.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#20287
2024-08-27 09:18:23 +03:00
Kefu Chai
5cffb23aa3 scylla-gdb.py: use chunked_fifo to represent _sink._pending_io
we switched from `circular_buffer` to `chunked_fifo` to represent
`io_sink::_pending_io` in the latest seastar now. to be prepared for
this change, let's

* add `chunked_fifo` class in `scylla-gdb.py`.
* use `circular_buffer` as a fallback of `chunked_fifo`. instead of
  doing this the other way around, we try to send the message that
  the latest seastar uses `chunked_fifo`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20280
2024-08-27 08:44:56 +03:00
Andrei Chekun
fd51332978 test.py: Add parameter to control the pool size from the command line
Add parameter --cluster-pool-size that can control pool size for all
PythonTestSuite tests. By default, the pool size is set to 10 for most of
the suites, but this is too much for laptops. So this parameter can be
used to lower the pool size and not to freeze the system. Additionally,
the environment variable CLUSTER_POOL_SIZE was added for a convenient way
to limit pool size in the system without the need to provide each time an
additional parameter.
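
A minimal sketch of how such a parameter and environment variable could be combined is shown below. It is not the code in test.py: the `resolve_pool_size()` helper is hypothetical, and the precedence (command line first, then CLUSTER_POOL_SIZE, then the default of 10) is an assumption made for the example.

```python
import argparse

DEFAULT_POOL_SIZE = 10  # default mentioned above for most suites


def resolve_pool_size(argv, environ):
    """Pick the pool size from the command line, the environment, or the default."""
    parser = argparse.ArgumentParser(prog="test-sketch")
    parser.add_argument("--cluster-pool-size", type=int, default=None,
                        help="pool size for all PythonTestSuite tests")
    args = parser.parse_args(argv)
    # Assumed precedence: an explicit option wins over the environment.
    if args.cluster_pool_size is not None:
        return args.cluster_pool_size
    env_value = environ.get("CLUSTER_POOL_SIZE")
    if env_value is not None:
        return int(env_value)
    return DEFAULT_POOL_SIZE
```

Passing `os.environ` as `environ` gives the real behavior, while tests can pass a plain dict.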

Related: https://github.com/scylladb/scylladb/pull/20276

Closes scylladb/scylladb#20289
2024-08-26 19:55:41 +03:00
Avi Kivity
0acfa4a00d Merge 'abstract_replication_strategy: make get_ranges async' from Benny Halevy
To prevent stalls due to a large number of tokens.
For example, a large cluster with, say, 70 nodes can have
more than 16K tokens.

Fixes #19757

Closes scylladb/scylladb#19758

* github.com:scylladb/scylladb:
  abstract_replication_strategy: make get_ranges async
  database: get_keyspace_local_ranges: get vnode_effective_replication_map_ptr param
  compaction: task_manager_module: open code maybe_get_keyspace_local_ranges
  alternator: ttl: token_ranges_owned_by_this_shard: let caller make the ranges_holder
  alternator: ttl: can pass const gms::gossiper& to ranges_holder
  alternator: ttl: ranges_holder_primary: unconstify _token_ranges member
  alternator: ttl: refactor token_ranges_owned_by_this_shard
2024-08-26 16:56:18 +03:00
Botond Dénes
6d633e89ef Merge 'update CODEOWNERS' from Piotr Smaron
Removed people who no longer contribute to scylladb.git and added/substituted reviewers responsible for maintaining the frontend components.

No need to backport, this is just information for the github tool.

Closes scylladb/scylladb#20136

* github.com:scylladb/scylladb:
  codeowners: add appropriate reviewers to the cluster components
  codeowners: add appropriate reviewers to the frontend components
  codeowners: fix codeowner names
  codeowners: remove non contributors
2024-08-26 16:44:39 +03:00
Botond Dénes
4505b14fd6 Merge 'table_helper: complete coroutinization' from Avi Kivity
table_helper has some quite awkward code, improve it a little.

Code cleanup, so no reason to backport.

Closes scylladb/scylladb#20194

* github.com:scylladb/scylladb:
  table_helper: insert(): improve indentation
  table_helper: coroutinize insert()
  table_helper: coroutinize cache_table_info()
  table_helper: extract try_prepare()
2024-08-26 13:43:17 +03:00
Botond Dénes
b2c07c9b6f Merge 'compaction: change compaction stop reason' from Aleksandra Martyniuk
Currently "table removal" is logged as a reason of compaction stop for table drop,
tablet cleanup and tablet split. Modify log to reflect the reason.

Closes scylladb/scylladb#20042

* github.com:scylladb/scylladb:
  test: add test to check compaction stop log
  compaction: fix compaction group stop reason
2024-08-26 13:40:07 +03:00
Kefu Chai
4d516a8363 tools/testing: add cqlsh rpm
we need to test the installation of cqlsh rpm. also, we should use the
correct paths of the generated rpm packages.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-26 11:33:57 +08:00
Kefu Chai
baee15390e tools/testing: adapt to cmake build directory
cmake uses a different arrangement, so let's check for the existence
of the build directory and fallback to cmake's build directory.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-26 11:33:57 +08:00
Kefu Chai
b802c000e1 tools/testing: test with rockylinux:9 not centos:7
the centos image repos on docker have been deprecated, and the
repo for centos7 has been removed from the main CentOS servers.

so we are either not able to install packages from its default repo
without using the vault mirror, or no longer able to pull its image
from dockerhub.

so, in this change

* we switch over to rockylinux:9, which is the latest stable release
  of rockylinux, and rockylinux is a popular clone of RHEL, so it
  matches our expectation of a typical use case of scylla.
* use dnf to manage the packages. as dnf is the standard way to manage
  rpm packages in modern RPM-based distributions.
* do not install deltarpm.
  delta rpms have not been supported since RHEL8, and the `deltarpm` package
  has not been available since then. see
  https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html-single/considerations_in_adopting_rhel_8/index#ref_the-deltarpm-functionality-is-no-longer-supported_notable-changes-to-the-yum-stack
  as a consequence, this package does not exist in Rockylinux-9.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-26 11:33:53 +08:00
Kefu Chai
00dad27f67 tools/testing: correct the paths to rpm packages and SCYLLA-*-FILE
when building with the rules generated from `configure.py`,
these files are located under tools' own build directory.
so correct them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-26 11:19:24 +08:00
Kefu Chai
86ef63df92 dist-check: add :z option when mapping volume
if SELinux is enabled on the host, we'd have the following failure when
running `dist-check.sh`:
```
+ podman run -i --rm -v /home/kefu/dev/scylladb:/home/kefu/dev/scylladb docker.io/centos:7 /bin/bash -c 'cd /home/kefu/dev/scylladb && /home/kefu/dev/scylladb/tools/testing/dist-check/docker.io/centos-7.sh --mode debug'
/bin/bash: line 0: cd: /home/kefu/dev/scylladb: Permission denied
```

to address the permission issue, we need to instruct podman to
relabel the shared volume, so that the container can access
the shared volume.

see also https://docs.podman.io/en/stable/markdown/podman-pod-create.1.html#volume-v-source-volume-host-dir-container-dir-options

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-26 11:15:40 +08:00
Kefu Chai
8ef26a9c8c build: cmake: add "test" target
before this change, none of the targets generated by the CMake-based
building system ran `test.py`. but `build.ninja` generated directly
by `configure.py` provides a target named `test`, which runs the
`test.py` with the options passed to `configure.py`.

to be more compatible with the rules generated by `configure.py`,
in this change

* do not include "CTest" module, as we are not using CTest for
  driving tests. we use the homebrew `test.py` for this purpose.
  more importantly, the target named "test" is provided by "CTest".
  so in order to add our own "test" target, we cannot use "CTest"
  module.
* add a target named "test" to run "test.py".
* add two CMake options so we can customize the behavior of "test.py",
  this is to be compatible with the existing behavior of `configure.py`.

Refs scylladb/scylladb#2717

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20263
2024-08-25 21:45:13 +03:00
Avi Kivity
72a85e3812 Merge 'Integrated backup' from Pavel Emelyanov
This adds minimal implementation of the start-backup API call.

The method starts a task that uploads all files from the given keyspace's snapshot to the requested endpoint/bucket. Arguments are:
- endpoint -- the ID in object_store.yaml config file
- bucket -- the target bucket to put objects into
- keyspace -- the keyspace to work on
- snapshot -- the method assumes that the snapshot had been already taken and only copies sstables from it

The task runs in the background, its task_id is returned from the method once it's spawned and it should be used via /task_manager API to track the task execution and completion (hint: it's good to have non-zero TTL value to make sure fast backups don't finish before the caller manages to call wait_task API).

Sstables components are scanned for all tables in the keyspace and are uploaded into the /bucket/${cf_name}/${snapshot_name}/ path.

refs: #18391

Closes scylladb/scylladb#19890

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: add backup integration
  docs: Document the new backup method
  test/object_store: Test that backup task is abortable
  test/object_store: Add simple backup test
  test/object_store: Move format_tuples()
  test/pylib: Add more methods to rest client
  backup-task: Make it abortable (almost)
  code: Introduce backup API method
  database: Export parse_table_directory_name() helper
  database: Introduce format_table_directory_name() helper
  snapshot-ctl: Add config to snapshot_ctl
  snapshot-ctl: Add sstables::storage_manager dependency
  snapshot-ctl: Maintain task manager module
  snapshot-ctl: Add "snapshots" logger
  snapshot-ctl: Outline stop() method and constructor
  snapshot-ctl: Inline run_snapshot_list<>
  test/cql_test_env: Export task manager from cql test env
  task_manager: Print task ttl on start (for debugging)
  docs: Update object_storage.md with AWS_ environment
  docs: Restructure object_storage.md
2024-08-25 20:19:10 +03:00
Kefu Chai
f8931a4578 build: cmake: add "dist" target
since the rules generated by `configure.py` have this target, we need
to have an equivalent target as well in the CMake-based building system.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20265
2024-08-25 20:18:12 +03:00
Andrei Chekun
f54b7f5427 test.py: Increase pool size
The pool size increase was recently reverted because of flakiness in the test_gossip_boot test. The test started
to fail on adding the node to the cluster without any issues in the Scylla log file. In test logs it looked like the
installation process for the new node just hung. After investigating the problem, I found that
test.py was draining the io_executor pool, which was set to eight workers, while cleaning the directory during install. So
to fix the issue, the io_executor pool should be increased to more or less the same ratio as it was: double the cluster pool size.

Closes scylladb/scylladb#20276
2024-08-25 19:59:18 +03:00
Kefu Chai
a0688b29ea replication_strategy: add fmt::formatter<replication_strategy_type>
so that we can use {fmt} with it without the help of fmt::streamed.
also since we have a proper formatter for replication_strategy_type,
let's implement
`formatter<vnode_effective_replication_map::factory_key>`
as well.

since there are no callers of these two operator<<, let's drop
them in this change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20248
2024-08-25 19:34:52 +03:00
Kefu Chai
c88b63ce13 github: use clang-20 in clang-nightly workflow
since clang 19 has been branched, let's track the development branch,
which is clang 20.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20279
2024-08-25 19:31:43 +03:00
Benny Halevy
686a8f2939 abstract_replication_strategy: make get_ranges async
To prevent stalls due to large number of tokens.
For example, large cluster with say 70 nodes can have
more than 16K tokens.

Fixes #19757

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:57:34 +03:00
Benny Halevy
2bbbe2a8bc database: get_keyspace_local_ranges: get vnode_effective_replication_map_ptr param
Prepare for making the function async.
Then, it will need to hold on to the erm while getting
the token_ranges asynchronously.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:55:33 +03:00
Benny Halevy
ea5a0cca10 compaction: task_manager_module: open code maybe_get_keyspace_local_ranges
It is used only here and can be simplified by
checking if the keyspace replication strategy
is per table by the caller.

Prepare for making get_keyspace_local_ranges async.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:25:32 +03:00
Benny Halevy
824bdf99d2 alternator: ttl: token_ranges_owned_by_this_shard: let caller make the ranges_holder
Add static `make` methods to ranges_holder_{primary,secondary}
and use them to make the ranges objects and pass them
to `token_ranges_owned_by_this_shard`, rather than letting
token_ranges_owned_by_this_shard invoke the right constructor
of the ranges_holder class.

Prepare for making `make` async.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:25:32 +03:00
Benny Halevy
b2abbae24b alternator: ttl: can pass const gms::gossiper& to ranges_holder
There's no need to pass a mutable reference to
the gossiper.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:25:32 +03:00
Benny Halevy
333c0d7c88 alternator: ttl: ranges_holder_primary: unconstify _token_ranges member
To allow the class to be nothrow_move_constructible.
Prepare for returning it as a future value.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:25:32 +03:00
Benny Halevy
d385219a12 alternator: ttl: refactor token_ranges_owned_by_this_shard
Rather than holding a variant member (and defining
both ranges_holder_{primary,secondary} in both
specializations of the class), just make the internal
ranges_holder classes first-class citizens
and parameterize the `token_ranges_owned_by_this_shard`
template by this class type.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-25 10:25:32 +03:00
Avi Kivity
c4dd21de38 repair: row_level: coroutinize repair_reader::close() 2024-08-24 00:36:48 +03:00
Avi Kivity
b1dd470533 repair: row_level: coroutinize repair_reader::end_of_stream() 2024-08-24 00:35:59 +03:00
Avi Kivity
7ce76fd0ea repair: row_level: coroutinize sink_source_for_repair::close()
The repeat() loop translates to almost nothing.
2024-08-24 00:30:02 +03:00
Avi Kivity
168a018e45 repair: row_level: coroutinize sink_source_for_repair::get_sink_source() 2024-08-24 00:19:12 +03:00
Avi Kivity
6b370d8154 table_helper: insert(): improve indentation
Restore after coroutinization.
2024-08-24 00:08:05 +03:00
Avi Kivity
ecd7702007 table_helper: coroutinize insert()
Improves readability. The do_with() ensures it's at least as performant
(though it's not in any fast path).
2024-08-24 00:08:05 +03:00
Avi Kivity
980ec2f925 table_helper: coroutinize cache_table_info()
After we extracted try_prepare(), this is fairly simple, and improves
readability.
2024-08-24 00:08:05 +03:00
Avi Kivity
4e44a15d4d table_helper: extract try_prepare()
table_helper::cache_table_info() is fairly convoluted. It cannot be
easily coroutinized since it invokes asynchronous functions in a
catch block, which isn't supported in coroutines. To start to break it
down, extract a block try_prepare() from code that is called twice. It's
both a simplification and a first step towards coroutinization.

The new try_prepare() can return three values: `true` if it succeeded,
`false` if it failed and there's the possibility of attempting a fallback,
and an exception on error.
2024-08-24 00:08:05 +03:00
Lakshmi Narayanan Sreethar
4823a1e203 test/pylib: fix keyspace_compaction method
The `keyspace_compaction` method incorrectly appends the column family
parameter to the URL using a regular string, `"?cf={table}"`, instead of
an f-string, `f"?cf={table}"`. As a result, the column family name is
sent as `{table}` to the server, causing the compaction request to fail.
Fix this issue by passing the parameter to the POST request using a
dictionary instead of appending it to the URL.

Fixes #20264

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#20243
2024-08-23 15:20:10 +03:00
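The difference between the two string forms described in the keyspace_compaction fix above can be reproduced in isolation. This is a minimal sketch, not the actual test/pylib code: the helper names `buggy_url` and `fixed_request` and the endpoint path are assumptions for illustration; only the `"?cf={table}"` literal and the idea of passing a dictionary come from the commit message.

```python
from urllib.parse import urlencode

def buggy_url(keyspace: str, table: str) -> str:
    # Regular string: "{table}" is never interpolated, so the server
    # receives the literal text "{table}" as the column family name.
    return f"/storage_service/keyspace_compaction/{keyspace}" + "?cf={table}"

def fixed_request(keyspace: str, table: str) -> tuple[str, dict[str, str]]:
    # Pass the column family as a parameter dictionary and let the HTTP
    # client encode it, instead of appending it to the URL by hand.
    return f"/storage_service/keyspace_compaction/{keyspace}", {"cf": table}

path, params = fixed_request("ks", "t1")
print(buggy_url("ks", "t1"))
print(f"{path}?{urlencode(params)}")
```

Passing the parameters as a dictionary also avoids having to escape special characters in the table name by hand.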
Kefu Chai
4a405b0af9 perf/perf_sstable: enumerate sstables when loading them
before this change, we use the default options when creating `test_env`,
and the default options enable `use_uuid`. but the modes of
`perf-sstables` involving reads assumes that the identifiers are
deterministic. so that the previously written sstables using the "write"
mode can be read with the modes like "index_read", which just uses
`test_env::make_sstable()` in `load_sstables()`, and under the hood,
`test_env::make_sstable()` uses `test_env::new_generation()` for
retrieving the next sstable identifier. when using integer-based
identifier, this works. as the sstable identifiers are generated
from a monotonically increasing integer sequence, where the identifiers
are deterministic. but this does not apply anymore when the UUID-based
identifiers are used, as the identifiers are generated with a
pseudorandom generator of UUID v1.

in this change, to avoid relying on the determinism of the integer-based
sstable identifier generation, we enumerate sstables by listing the
given directory, and parse the path for their identifier.

after this change, we are able to support the UUID-based sstable
identifier.

another option is to disable the UUID-based sstable identifier when
loading sstables. the upside is that this approach is minimal and
straightforward. but the downside is that it encodes the assumption
in the algorithm implicitly, and could be confusing -- we create
a new generation for loading an existing sstable with this generation.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20183
2024-08-23 10:39:24 +03:00
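The approach in the perf_sstable change above (list the directory and parse each path for its identifier, rather than regenerating identifiers) can be sketched roughly as follows. This is not the perf_sstable code: the file-name layout `<version>-<generation>-<format>-<component>` and the function name `enumerate_generations` are assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path

# Assumed file-name layout: <version>-<generation>-<format>-<component>,
# e.g. "me-42-big-Data.db". The generation may be an integer or a
# UUID-like token; it is treated as an opaque string here.
SSTABLE_NAME = re.compile(
    r"^(?P<version>[a-z]{2})-(?P<generation>[^-]+)-(?P<format>[a-z]+)-(?P<component>.+)$"
)

def enumerate_generations(directory: Path) -> list[str]:
    """Return the distinct generation identifiers found in `directory`."""
    generations = set()
    for entry in directory.iterdir():
        match = SSTABLE_NAME.match(entry.name)
        # Count each sstable once, keyed on its Data component.
        if match and match["component"] == "Data.db":
            generations.add(match["generation"])
    return sorted(generations)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("me-1-big-Data.db", "me-1-big-Index.db",
                 "me-3g8w_11p2_1p4vk2jyrgmxmjjr5s-big-Data.db", "notes.txt"):
        (root / name).touch()
    found = enumerate_generations(root)
    print(found)
```

Because the identifier is read back from disk, the loader does not need to know whether it was produced by an integer sequence or by a UUID generator.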
Pavel Emelyanov
d1ac58f088 api: Get compaction throughput via compaction manager
Now the endpoint handler gets the value from db::config which is not nice
from several perspectives. First, it gets config (ab)using database.
Second, it's compaction manager that "knows" its throughput, global
config is the initial source of that information.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20173
2024-08-23 10:33:03 +03:00
Pavel Emelyanov
38edbebb10 compaction_manager: Keep flush-all-before-major option on own config
Currently the major compaction task impl grabs this (non-updateable)
value from db::config. That's not good, all services including
compaction manager have their own configs from which they take options.
That said, this patch puts the option onto
compaction_manager::config, makes use of it, and configures it from
db::config on start (and in tests).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20174
2024-08-23 10:31:55 +03:00
Botond Dénes
15fdc3f6cc Merge 'Add ability to list S3 bucket contents' from Pavel Emelyanov
This is a prerequisite for the "restore from object storage" feature. In order to collect the sstables in a bucket, one needs to list the bucket contents with the given prefix. ListObjectsV2 provides a way to do this, and here is the respective s3::client extension.

Closes scylladb/scylladb#20120

* github.com:scylladb/scylladb:
  test: Add test for s3::client::bucket_lister
  s3_client: Add bucket lister
  s3_client: Encode query parameter value for query-string
2024-08-23 10:16:07 +03:00
Kefu Chai
7f65ee3270 dbuild: pass --tty only if --interactive
in 947e2814, we pass `--tty` as long as we are using podman _or_
we are in interactive mode. but if we build the tree using podman
using jenkins, we are seeing that ninja is displaying the output
as if it's in an interactive mode. and the output includes ASCII
escape codes. this is distracting.

the reason is that we

* are using podman, and
* ninja decides whether it should display output for a "smart" terminal by
  checking isatty() and the "TERM" environment variable.

so, in this change, we add --tty only if

* we are in the interactive mode.
* or stdin is associated with a terminal. this is the use case
  where a user uses dbuild to interactively build scylla

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20196
2024-08-23 09:30:20 +03:00
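The rule stated in the dbuild change above can be written down as a small decision function. dbuild itself is a shell script, so this Python version is only a sketch of the logic; the function name `tty_flags` is hypothetical.

```python
import sys

def tty_flags(interactive: bool, stdin_is_tty: bool) -> list[str]:
    """Return the container flags implied by the rule in the commit message."""
    flags = []
    if interactive:
        flags.append("--interactive")
    # --tty is added only in interactive mode or when stdin is a terminal.
    # Whether the container tool is podman no longer plays a role.
    if interactive or stdin_is_tty:
        flags.append("--tty")
    return flags

print(tty_flags(interactive=False, stdin_is_tty=sys.stdin.isatty()))
```

Under a CI runner, stdin is normally not a terminal, so a non-interactive build gets no `--tty` and ninja falls back to plain output.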
Kefu Chai
ee19bbed05 test: do not define boost_test_print_type() for types with operator<<
in 30e82a81, we add a constraint to the template parameter of
boost_test_print_type() to prevent it from being matched with
types which can be formatted with operator<<. but it failed to
work. we still have test failure reports like:

```
[Exception] - critical check ['s', 's', 't', '_', 'm', 'r', '.', 'i', 's', '_', 'e', 'n', 'd', '_', 'o', 'f', '_', 's', 't', 'r', 'e', 'a', 'm', '(', ')'] has failed
```

this is not what we expect. the reason is that we passed the template
parameters to the `has_left_shift` trait in the wrong order, see
https://live.boost.org/doc/libs/1_83_0/libs/type_traits/doc/html/boost_typetraits/reference/has_left_shift.html.
we should have passed the lhs of operator<< expression as first
parameter, and rhs the second.

so, in this change, we correct the type constraint by passing the
template parameter in the right order, now the error message looks
better, like:

```
test/boost/mutation_query_test.cc(110): error: in "test_partition_query_is_full": check !partition_slice_builder(*s) .with_range({}) .build() .is_full() has failed
```

it turns out boost::transformed_range<> is formattable with operator<<,
as it fulfills the constraints of `boost::has_left_shift<ostream, R>`,
but when printing it, the compiler fails when it tries to insert the
elements in the range to the output stream.

so, in order to work around this issue, we add a specialization for
`boost::transformed_range<F, R>`.

also, to improve the readability, we reimplement the `has_left_shift<>`
as a concept, so that it's obvious that we need to put the output
stream as the first parameter.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20233
2024-08-23 09:26:22 +03:00
Amnon Heiman
644e6f0121 test/estimated_histogram_test Add summary tests
This patch adds tests for summary calculation. It adds two tests, the
first is a basic calculation for P50, P95, P99 by adding 100 elements
into 20 buckets.

The second test checks that if elements are found in the infinite bucket,
the result is the lower limit (33s) and not infinity.

Relates to #20255

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-08-22 23:34:24 +03:00
Amnon Heiman
011aa91a8c utils/histogram.hh: Make summary support infinite bucket.
This patch handles an edge case related to the infinite bucket
limit.

Summaries are the P50, P95, and P99 quantiles.

The quantiles are calculated from a histogram; we find the bucket and
return its upper limit.

In classic histograms, there is a notion of the infinite bucket:
anything that does not fall into the last bucket is considered to be
infinite.

With quantiles, this does not make sense. So instead of reporting infinity,
we'll report the bucket's lower limit.

Fixes #20255

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-08-22 23:34:24 +03:00
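The quantile rule described in the histogram change above (find the bucket, return its upper limit, but return the lower limit for the infinite bucket) can be sketched as below. This is not the utils/histogram.hh implementation; the function name, the bucket representation, and the sample numbers are assumptions for illustration.

```python
import math

def quantile_limit(bounds: list[float], counts: list[int], q: float) -> float:
    """bounds[i] is the upper limit of bucket i; the last bound may be infinite.

    Returns the upper limit of the bucket holding quantile q, or the lower
    limit of the infinite bucket when the quantile falls there.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    target = q * total
    seen = 0
    for i, count in enumerate(counts):
        seen += count
        if seen >= target:
            if math.isinf(bounds[i]):
                # The lower limit of the infinite bucket is the upper
                # limit of the bucket before it.
                return bounds[i - 1] if i > 0 else 0.0
            return bounds[i]
    return bounds[-1]

bounds = [1.0, 2.0, 4.0, math.inf]
normal = [60, 36, 4, 0]      # nothing lands in the infinite bucket
overflow = [0, 0, 0, 10]     # everything lands in the infinite bucket
print(quantile_limit(bounds, normal, 0.50))
print(quantile_limit(bounds, normal, 0.95))
print(quantile_limit(bounds, overflow, 0.50))
```

Reporting a finite lower limit keeps the summary usable by consumers that cannot represent infinity.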
Kefu Chai
39dd088374 test: include used headers
before this change, clang 20 fails to build the tree, like:

```
/home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/database_test.dir/Debug/database_test.cc.o -MF test/boost/CMakeFiles/database_test.dir/Debug/database_test.cc.o.d -o test/boost/CMakeFiles/database_test.dir/Debug/database_test.cc.o -c /home/kefu/dev/scylladb/test/boost/database_test.cc
/home/kefu/dev/scylladb/test/boost/database_test.cc:539:29: error: invalid use of incomplete type 'schema_builder'
  539 |                     return *schema_builder(ks_name, cf_name)
      |                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/schema/schema.hh:115:7: note: forward declaration of 'schema_builder'
  115 | class schema_builder;
      |       ^
```
and
```
/home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/group0_cmd_merge_test.dir/Debug/group0_cmd_merge_test.cc.o -MF test/boost/CMakeFiles/group0_cmd_merge_test.dir/Debug/group0_cmd_merge_test.cc.o.d -o test/boost/CMakeFiles/group0_cmd_merge_test.dir/Debug/group0_cmd_merge_test.cc.o -c /home/kefu/dev/scylladb/test/boost/group0_cmd_merge_test.cc
/home/kefu/dev/scylladb/test/boost/group0_cmd_merge_test.cc:78:18: error: member access into incomplete type 'db::config'
   78 |     cfg.db_config->commitlog_segment_size_in_mb(1);
      |                  ^
/home/kefu/dev/scylladb/data_dictionary/data_dictionary.hh:28:7: note: forward declaration of 'db::config'
   28 | class config;
      |       ^
1 error generated.
```
and
```
FAILED: test/boost/CMakeFiles/repair_test.dir/Debug/repair_test.cc.o
/home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/repair_test.dir/Debug/repair_test.cc.o -MF test/boost/CMakeFiles/repair_test.dir/Debug/repair_test.cc.o.d -o test/boost/CMakeFiles/repair_test.dir/Debug/repair_test.cc.o -c /home/kefu/dev/scylladb/test/boost/repair_test.cc
/home/kefu/dev/scylladb/test/boost/repair_test.cc:149:45: error: use of undeclared identifier 'global_schema_ptr'
  149 |         co_await e.db().invoke_on_all([gs = global_schema_ptr(gen.schema())](replica::database& db) -> future<> {
      |                                             ^
/home/kefu/dev/scylladb/test/boost/repair_test.cc:150:62: error: use of undeclared identifier 'gs'
  150 |             co_await db.add_column_family_and_make_directory(gs.get(), replica::database::is_new_cf::yes);
      |                                                              ^
2 errors generated.
```

because we are using incomplete types when their complete definitions
are required.
so, in this change, we include the headers for their complete definition.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20239
2024-08-22 20:51:38 +03:00
Kefu Chai
969cbb75ce tools/scylla-nodetool: add backup integration
as we have an API for backing up a keyspace, let's expose this feature
with nodetool. so we can exercise it without the help of scylla-manager
or 3rd-party tools with a user-friendly interface.

in this change:

* add a new subcommand named "backup" to nodetool
* add test to verify its interaction with the API server
* add two more routes to the REST API mock server, as
  the test is using /task_manager/wait_task/{task_id} API.
  for the sake of completeness, the route for
  /task_manager/{part1} is added as well.
* update the document accordingly.
* the bash completion script is updated accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-22 19:48:06 +03:00
Pavel Emelyanov
245cc852dd docs: Document the new backup method
Add the new /storage_service/backup endpoint to object_storage.md as yet
another way to use S3 from Scylla.
2024-08-22 19:47:06 +03:00
Pavel Emelyanov
de87450453 test/object_store: Test that backup task is abortable
It starts similarly to the simple backup test, but injects a pause into the
task once a single file is scheduled for upload, then aborts the task,
waits for it to fail, and checks that _not_ all files are uploaded.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 19:47:06 +03:00
Pavel Emelyanov
f8d894bc23 test/object_store: Add simple backup test
The test shows how to backup a keyspace:

- flush
- take snapshot
- start backup with the new API method
- wait for the task to finish

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 19:47:06 +03:00
Pavel Emelyanov
47e49e6dec test/object_store: Move format_tuples()
There will soon appear a new .py file in the suite that will want to use
this helper too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 19:47:06 +03:00
Pavel Emelyanov
d83d585709 test/pylib: Add more methods to rest client
Namely:
- POST /storage_service/snapshots to take snapshot on a ks
- GET /task_manager/get_task_status/{id} to get status of a running task
- GET /task_manager/wait_task/{id} to wait for a task to finish
- POST /task_manager/abort_task/{id} to abort a running task

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 19:47:06 +03:00
Pavel Emelyanov
ed6e6700ab backup-task: Make it abortable (almost)
Make the impl::is_abortable() return 'yes' and check the impl::_as in
the files listing loop. It's not real abort, since files listing loop is
expected to be fast and most of the time will be spent in s3::client
code reading data from disk and sending them to S3, but client doesn't
support aborting its requests. That's some work yet to be done.

Also add injection for future testing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 19:47:06 +03:00
Pavel Emelyanov
a812f13ddd code: Introduce backup API method
The method starts a task that uploads all files from the given
keyspace's snapshot to the requested endpoint/bucket. The task runs in
the background, its task_id is returned from the method once it's
spawned and it should be used via /task_manager API to track the task
execution and completion (hint: it's good to have non-zero TTL value to
make sure fast backups don't finish before the caller manages to call
wait_task API).

If snapshot doesn't exist, nothing happens (FIXME, need to return back
an error in that case).

If endpoint is not configured locally, the API call resolves with
bad-request instantly.

Sstables components are scanned for all tables in the keyspace and are
uploaded into the /bucket/${cf_name}/${snapshot_name}/ path.

Task is not abortable (FIXME -- to be added) and doesn't really report
its progress other than running/done state (FIXME -- to be added too).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 19:47:06 +03:00
Pavel Emelyanov
f7b380d53b database: Export parse_table_directory_name() helper
There's parse_table_directory_name() static helper in database.cc code
that is used by methods that parse table tree layout for snapshot.
Export this helper for external usage and rename to fit the format_...
one introduced by previous patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 14:57:48 +03:00
Pavel Emelyanov
33962946fc database: Introduce format_table_directory_name() helper
This helper makes the table directory name (not the full path) out of the
table name and uuid. This is to be symmetrical with yet another helper that
converts a directory name back to table name and uuid (next patch)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 14:57:48 +03:00
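The symmetry between the two helpers in the database patches above can be illustrated with a round trip. The Python functions below only borrow the names of the C++ helpers; the `<table name>-<uuid without dashes>` layout is an assumption made for this sketch and is not taken from the commit messages.

```python
import uuid

def format_table_directory_name(name: str, table_id: uuid.UUID) -> str:
    # Assumed layout: "<table name>-<32 hex digits of the uuid>".
    return f"{name}-{table_id.hex}"

def parse_table_directory_name(directory: str) -> tuple[str, uuid.UUID]:
    # Split on the last dash, so everything before it is the table name.
    name, sep, hex_id = directory.rpartition("-")
    if not sep or len(hex_id) != 32:
        raise ValueError(f"not a table directory name: {directory!r}")
    return name, uuid.UUID(hex=hex_id)

table_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
directory = format_table_directory_name("users", table_id)
print(directory)
print(parse_table_directory_name(directory))
```

Keeping both directions next to each other makes it harder for the formatter and the parser to drift apart.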
Pavel Emelyanov
dff51fd58c snapshot-ctl: Add config to snapshot_ctl
Pretty much all services in Scylla have their own config. Add one to
snapshot-ctl too, it will be populated later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 14:57:20 +03:00
Pavel Emelyanov
f37857e20a snapshot-ctl: Add sstables::storage_manager dependency
The storage_manager maintains set of clients to configured object
storage(s). The snapshot ctl is going to spawn tasks that will talk to
those storages, thus it needs the storage manager to get the clients
from.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 14:08:21 +03:00
Pavel Emelyanov
362331c89b snapshot-ctl: Maintain task manager module
This service is going to start tasks managed by task manager. For that,
it should have its module set up and registered.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 14:08:21 +03:00
Pavel Emelyanov
4ae89a9c81 snapshot-ctl: Add "snapshots" logger
Will be used later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 14:08:21 +03:00
Pavel Emelyanov
90c794172b snapshot-ctl: Outline stop() method and constructor
These two are going to grow, keep them out not to pollute the header

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 14:08:21 +03:00
Pavel Emelyanov
96946a4b11 snapshot-ctl: Inline run_snapshot_list<>
This helper will be used by a code from another .cc file, so the
template needs to be in header for smooth instantiation

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 14:08:21 +03:00
Pavel Emelyanov
4e73b4d8ad test/cql_test_env: Export task manager from cql test env
To be used by one of the next patches

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 14:08:21 +03:00
Pavel Emelyanov
4b86eede1f task_manager: Print task ttl on start (for debugging)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 14:08:21 +03:00
Pavel Emelyanov
8949d73cd9 docs: Update object_storage.md with AWS_ environment
Commit 51c53d8db6 made it possible to configure object storage endpoint
creds via environment. Mention this in the docs.
2024-08-22 14:08:21 +03:00
Pavel Emelyanov
d3f9865d2f docs: Restructure object_storage.md
Currently the doc assumes that object storage can only be used to keep
sstables on it. It's going to change, restructure the doc to allow for
more usage scenarios.
2024-08-22 14:08:21 +03:00
Pavel Emelyanov
4e2d7aa2a2 test/tablets: Test that reading tablets' mutations from MUTATION_FRAGMENTS works
Currently it doesn't: one of the nodes crashes with a std::out_of_range
exception and a meaningless call trace

[Botond]: this test checks the case of reading a partition via
MUTATION_FRAGMENTS from a node which doesn't own said partition.

refs: #18786

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-22 06:24:06 -04:00
Botond Dénes
46563d719f replica/mutation_dump: enforce pinning of effective replication map
By making it a required argument, we make sure the topology version is
pinned for the duration of the query. This is needed because mutation
dump queries bypass the storage proxy, where this pinning usually takes
place. So it has to be enforced here.
2024-08-22 06:24:06 -04:00
Botond Dénes
de5329157c replica/mutation_dump: handle un-owned tokens (with tablets)
When using tablets, the replica-side doesn't handle un-owned tokens.
table::shard_for_reads() will just return 0 for un-owned tokens, and a
later attempt at calling table::storage_group_for_token() with said
un-owned token will cause a crash (std::terminate due to
std::out_of_range thrown in noexcept context).
The replicas rely on the coordinator to not send stray requests, but for
select from mutation_fragments(table) queries, there is no coordinator
side who could do the correct dispatching. So do this in
mutation_dump(), just creating empty readers for un-owned tokens.
2024-08-22 03:06:55 -04:00
Łukasz Paszkowski
a11d19f321 test_query.py: Test reverse queries with clustering key bounds
Since a native reversed format is used for reversed queries,
additional tests with restrictions on clustering keys are required
to capture possible errors like https://github.com/scylladb/scylladb/issues/20191
earlier than in dtests.

Add parametrization to the following tests:
  + test_query_reverse
  + test_query_reverse_paging
to accept a comparison operator used in selection criteria for a Query
operation.
2024-08-21 14:21:34 +02:00
Aleksandra Martyniuk
9b7c837106 test: add test to check compaction stop log 2024-08-21 12:42:37 +02:00
Aleksandra Martyniuk
5005e19de7 compaction: fix compaction group stop reason
compaction_manager::remove passes "table removal" as a reason
of stopping ongoing compactions, but currently remove method
is also called when a tablet is migrated or split.

Pass the actual reason of compaction stop, so that logs aren't
misleading.
2024-08-21 12:42:09 +02:00
Avi Kivity
2ef5b5e4fe Revert "[test.py] Increase pool size for CI"
This reverts commit cc428e8a36. It causes
many spurious CI failures while nodes are being torn down. Revert it until
the root cause is fixed, after which it can be reinstated.

Fixes #20116.
2024-08-21 13:21:08 +03:00
Benny Halevy
f40d06b766 table: calculate_tablet_count: use sg_manager storage_groups size
Now, when each shard storage_group_manager keeps
only the storage_groups for the tablet replica it owns,
we can simply return the storage_group map size
instead of counting the number of tablet replicas
mapped to this shard.

Add a unit test that sums the tablet count
on all shards and tests that the sum is equal
to the configured default `initial_tablets`.

Fixes #18909

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#20223
2024-08-21 11:01:58 +02:00
Tomasz Grabiec
a3a97e8aad Merge 'schema_tables: calculate_schema_digest: prevent stalls due to large m…' from Benny Halevy
…utations vector

With a large number of tables the schema mutations
vector might get big enough to cause reactor stalls when freed.

For example, the following stall was hit on
2023.1.0~rc1-20230208.fe3cc281ec73 with 5000 tables:
```
 (inlined by) ~vector at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_vector.h:730
 (inlined by) db::schema_tables::calculate_schema_digest(seastar::sharded<service::storage_proxy>&, enum_set<super_enum<db::schema_feature, (db::schema_feature)0, (db::schema_feature)1, (db::schema_feature)2, (db::schema_feature)3, (db::schema_feature)4, (db::schema_feature)5, (db::schema_feature)6, (db::schema_feature)7> >, seastar::noncopyable_function<bool (std::basic_string_view<char, std::char_traits<char> >)>) at ./db/schema_tables.cc:799
```

This change returns a mutations generator from
the `map` lambda coroutine so we can process them
one at a time, destroy the mutations one at a time, and by that, reducing memory footprint and preventing reactor stalls.

Fixes #18173

Closes scylladb/scylladb#18174

* github.com:scylladb/scylladb:
  schema_tables: calculate_schema_digest: filter the key earlier
  schema_tables: calculate_schema_digest: prevent stalls due to large mutations vector
2024-08-20 21:24:38 +02:00
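The idea behind the calculate_schema_digest change above (produce mutations from a generator and consume them one at a time, rather than building a large vector) can be shown with a small sketch. The names `schema_mutations` and `digest` and the choice of SHA-256 are assumptions for illustration; the real code is a C++ coroutine and is not reproduced here.

```python
import hashlib
from typing import Iterable, Iterator

def schema_mutations(table_count: int) -> Iterator[bytes]:
    # Hypothetical stand-in for reading schema mutations: yield them one
    # at a time instead of materializing one large list.
    for i in range(table_count):
        yield f"table-{i}".encode()

def digest(mutations: Iterable[bytes]) -> str:
    hasher = hashlib.sha256()
    for mutation in mutations:
        # Only the current mutation is referenced here; no container of
        # all mutations has to be freed at the end.
        hasher.update(mutation)
    return hasher.hexdigest()

streamed = digest(schema_mutations(5000))
materialized = digest([f"table-{i}".encode() for i in range(5000)])
print(streamed == materialized)
```

Both paths yield the same digest; the streamed one keeps peak memory proportional to a single mutation and avoids one long deallocation at the end.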
Łukasz Paszkowski
f29d7ffa81 alternator::do_query Add additional trace log
Additional log prints information on the read query being executed.
It lists information like whether the query is a reversed one or
not, and table_schema and query_schema versions.
2024-08-20 20:56:15 +02:00
Łukasz Paszkowski
727cbd8151 alternator::do_query: Use native reversed format
When executing reversed queries, a native reversed format shall be used.
Therefore the table schema and the clustering key bounds are reversed
before a partition slice and a read command are constructed.

Similarly as for cql3::statements::select_statement.
2024-08-20 20:56:15 +02:00
Łukasz Paszkowski
3720e8aabe alternator::do_query Rename schema with table_schema
In order to increase readability, a schema variable is renamed to
a table_schema to emphasize a table schema is passed to the function
and used across it.

Allows us to introduce a query_schema variable in the next patch.
2024-08-20 20:56:06 +02:00
Aleksandra Martyniuk
9d9414a75d replica: add/remove table atomically
Currently, database::tables_metadata::add_table needs to hold a write
lock before adding a table. So, if we update other classes keeping
track of tables before calling add_table, and the method yields,
table's metadata will be inconsistent.

Set all table-related info in tables_metadata::add_table_helper (called
by add_table) so that the operation is atomic.

Analogously for remove_table.

Fixes: #19833.

Closes scylladb/scylladb#20064
2024-08-20 20:53:32 +03:00
Kamil Braun
5c9efdff50 Merge 'raft: store_snapshot_descriptor to use actually preserved items number when truncating the local log table' from Sergey Zolotukhin
io_fiber/store_snapshot_descriptor now gets the actual number of items
preserved when the log is truncated, fixing extra entries that remained after
log snapshot creation. Also removes incorrect check for the number of
truncated items in the
raft_sys_table_storage::store_snapshot_descriptor.

Minor change: Added error_injection test API for changing snapshot thresholds settings.

Fixes scylladb/scylladb#16817
Fixes scylladb/scylladb#20080

Closes scylladb/scylladb#20095

* github.com:scylladb/scylladb:
  raft:  Ensure const correctness in applier_fiber.
  raft: Invoke store_snapshot_descriptor with actually preserved items.
  raft: Use raft_server_set_snapshot_thresholds in tests.
  raft: Fix indentation in server.cc
  raft: Add a test to check log size after truncation.
  raft: Add raft_server_set_snapshot_thresholds injection.
  utils: Ensure const correctness of injection_handler::get().
2024-08-20 18:15:30 +02:00
Tomasz Grabiec
ff52527c54 Merge 'repair: do_rebuild_replace_with_repair: use source_dc only when safe' from Benny Halevy
It is unsafe to restrict the sync nodes for repair to the source data center if it has too low replication factor in network_topology_replication_strategy, or if other nodes in that DC are ignored.

Also, this change restricts the usage of source_dc to `network_topology` and `everywhere_topology`
strategies, as with simple replication strategy
there is no guarantee that there would be any
more replicas in that data center.

Fixes #16826

Reproducer submitted as https://github.com/scylladb/scylla-dtest/pull/3865
It fails without this fix and passes with it.

* Requires backport to live versions.  Issue hit in the field with 2022.2.14

Closes scylladb/scylladb#16827

* github.com:scylladb/scylladb:
  repair: do_rebuild_replace_with_repair: use source_dc only when safe
  repair: replace_with_repair: pass the replace_node downstream
  repair: replace_with_repair: pass ignore_nodes as a set of host_id:s
  repair: replace_rebuild_with_repair: pass ks_erms from caller
  nodetool: rebuild: add force option
  Add and use utils::optional_param to pass source_dc
2024-08-20 16:13:23 +02:00
Sergey Zolotukhin
13b3d3a795 raft: Ensure const correctness in applier_fiber.
Add 'const' to non-mutable variables in server_impl::applier_fiber() function.
2024-08-20 15:24:00 +02:00
Sergey Zolotukhin
c3e52ab942 raft: Invoke store_snapshot_descriptor with actually preserved items.
- raft_sys_table_storage::store_snapshot_descriptor now receives a number of
preserved items in the log, rather than _config.snapshot_trailing value;
- Incorrect check for truncated number of items in store_snapshot_descriptor
was removed.
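
A minimal sketch of the idea, not the actual Scylla code: the in-memory log
decides how many trailing entries it really keeps (bounded by both a count and
a byte budget), and the persisted table must be truncated with that same
number. The names apply_snapshot and truncate_persisted and the
(index, size) entry shape are assumptions made for illustration only.

```python
from typing import List, Tuple

# Simplified stand-in for a log entry: (log index, size in bytes).
Entry = Tuple[int, int]


def apply_snapshot(log: List[Entry], snap_idx: int,
                   trailing: int, trailing_size: int) -> Tuple[List[Entry], int]:
    """Drop entries covered by the snapshot, keeping a bounded tail.

    Returns the new in-memory log and the number of covered entries
    that were actually preserved.
    """
    covered = [e for e in log if e[0] <= snap_idx]
    newer = [e for e in log if e[0] > snap_idx]
    kept: List[Entry] = []
    used = 0
    # Walk backwards from the snapshot index until either limit is hit.
    for entry in reversed(covered):
        if len(kept) == trailing or used + entry[1] > trailing_size:
            break
        kept.append(entry)
        used += entry[1]
    kept.reverse()
    return kept + newer, len(kept)


def truncate_persisted(table: List[Entry], snap_idx: int,
                       preserved: int) -> List[Entry]:
    """Truncate the persisted log using the number of preserved entries."""
    return [e for e in table if e[0] > snap_idx - preserved]
```

In this sketch, passing the configured `trailing` value to
truncate_persisted() instead of the returned count leaves extra rows behind
whenever the byte budget was the tighter limit.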

Fixes scylladb/scylladb#16817
Fixes scylladb/scylladb#20080
2024-08-20 15:22:49 +02:00
Sergey Zolotukhin
922e035629 raft: Use raft_server_set_snapshot_thresholds in tests.
Replace raft_server_snapshot_reduce_threshold with raft_server_set_snapshot_thresholds in tests
as raft_server_set_snapshot_thresholds fully covers the functionality of raft_server_snapshot_reduce_threshold.
2024-08-20 15:08:49 +02:00
Sergey Zolotukhin
00a1d3e305 raft: Fix indentation in server.cc 2024-08-20 15:08:45 +02:00
Sergey Zolotukhin
b6de8230a9 raft: Add a test to check log size after truncation.
The test checks that snapshot_trailing_size parameter is taken
into consideration when the log system table is truncated.
Test for scylladb#16817
2024-08-20 14:15:50 +02:00
Sergey Zolotukhin
9dfa041fe1 raft: Add raft_server_set_snapshot_thresholds injection.
Use error injection to allow overriding following snapshot threshold settings:
- snapshot_threshold
- snapshot_threshold_log_size
- snapshot_trailing
- snapshot_trailing_size
2024-08-20 14:15:50 +02:00
Sergey Zolotukhin
c5da0775f2 utils: Ensure const correctness of injection_handler::get().
Make utils::error_injection::injection_handler::get() method 'const' as it does not mutate object's state.
2024-08-20 14:15:50 +02:00
Botond Dénes
3ee0d7f2d1 Merge 'tools: Enhance scylla sstable shard-of to support tablets' from Kefu Chai
before this change, `scylla sstable shard-of` didn't support tablets,
because:

- with tablets enabled, data distribution uses the scheduler
- this replaces the previous method of mapping based on vnodes and shard numbers
- as a result, we can no longer deduce sstable mapping from token ranges

in this change, we:
- read `system.tablets` table to retrieve tablet information
- print the tablet's replica set (list of <host, shard> pairs)
- this helps users determine where a given sstable is hosted

This approach provides the closest equivalent functionality of
`shard-of` in the tablet era.

Fixes scylladb/scylladb#16488

---

no need to backport, it's an improvement, not a critical fix.

Closes scylladb/scylladb#20002

* github.com:scylladb/scylladb:
  tools: enhance `scylla sstable shard-of` to support tablets
  replica/tablets: extract tablet_replica_set_from_cell()
  tools: extract get_table_directory() out
  tools: extract read_mutation out
  build: split the list of source file across multiple line
  tools/scylla-sstable: print warning when running shard-of with tablets
2024-08-20 13:51:12 +03:00
Avi Kivity
e2b179a3d0 Merge 'Coroutinize sstable_directory registry garbage collecting method' from Pavel Emelyanov

Closes scylladb/scylladb#20172

* github.com:scylladb/scylladb:
  sstable_directory: Coroutinize inner lambdas
  sstable_directory: Fix indentation after previous patch
  sstable_directory: Coroutinize outer cotinuation chain
2024-08-20 12:50:09 +03:00
David Garcia
fea707033f docs: improve include flag directive
The include flag directive now treats missing content as info logs instead of warnings. This prevents build failures when the enterprise-specific content isn't yet available.

If the enterprise content is undefined, the directive automatically loads the open-source content. This ensures the end user has access to some content.

address comments

Closes scylladb/scylladb#19804
2024-08-20 12:21:39 +03:00
Kefu Chai
9a10c33734 build: cmake: do not build storage_proxy.o by default
in 5ce07e5d84, the target named "storage_proxy.o" was added for
training the build of clang. but the rule for building this target
has two flaws:

* it was added as a dependency of the "all" target, but we don't need
  to build `storage_proxy.cc` twice when building the tree in the
  regular build job. we only need to build it when creating the
  profile for training the build of clang.
* it misses the include directory of abseil library. that's why we
  have following build failure when building the default target:

  ```
  [2024-08-18T14:58:37.494Z] /usr/local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/jenkins/workspace/scylla-master/scylla-ci/scylla -I/jenkins/workspace/scylla-master/scylla-ci/scylla/seastar/include -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/seastar/gen/include -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/seastar/gen/src -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/gen -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/jenkins/workspace/scylla-master/scylla-ci/scylla=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT service/CMakeFiles/storage_proxy.o.dir/Debug/storage_proxy.cc.o -MF service/CMakeFiles/storage_proxy.o.dir/Debug/storage_proxy.cc.o.d -o service/CMakeFiles/storage_proxy.o.dir/Debug/storage_proxy.cc.o -c /jenkins/workspace/scylla-master/scylla-ci/scylla/service/storage_proxy.cc
  [2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/service/storage_proxy.cc:17:
  [2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/db/commitlog/commitlog.hh:19:
  [2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/db/commitlog/commitlog_entry.hh:15:
  [2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/mutation/frozen_mutation.hh:15:
  [2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/mutation/mutation_partition_view.hh:16:
  [2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/build/gen/idl/mutation.dist.impl.hh:14:
  [2024-08-18T14:58:37.495Z] /jenkins/workspace/scylla-master/scylla-ci/scylla/serializer_impl.hh:20:10: fatal error: 'absl/container/btree_set.h' file not found
  [2024-08-18T14:58:37.495Z]    20 | #include <absl/container/btree_set.h>
  [2024-08-18T14:58:37.495Z]       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
  [2024-08-18T14:58:37.495Z] 1 error generated.
  ```
* if user only enables "dev" mode, we'd have:
  ```
  CMake Error at service/CMakeLists.txt:54 (add_library):
  No SOURCES given to target: storage_proxy.o
  ```

so, in this change, we

* exclude this target from "all"
* link this target against abseil header library, so it has access
  to the abseil library. please note, we don't need to build an
  executable in this case, so the header would suffice.
* add a proxy target to conditionally enable/disable this target.
  as CMake does not support generator expression in `add_dependencies()`
  yet at the time of writing.
  see https://gitlab.kitware.com/cmake/cmake/-/issues/19467

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20195
2024-08-19 21:30:34 +03:00
Avi Kivity
7eb3b15fff Merge 'utils/tagged_integer: remove conversion to underlying integer' from Laszlo Ersek
~~~
utils/tagged_integer: remove conversion to underlying integer

Silently converting a tagged (i.e., "dimension-ful") integer to a naked
("dimensionless") integer defeats the purpose of having tagged integers,
and is a source of practical bugs, such as
<https://github.com/scylladb/scylladb/issues/20080>.

We could make the conversion operator explicit, for enforcing

  static_cast<TAGGED_INTEGER_TYPE::value_type>(TAGGED_INTEGER_VALUE)

in every conversion location -- but that's a mouthful to write. Instead,
remove the conversion operator, and let clients call the (identically
behaving) value() member function.
~~~
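
A rough Python analogue of the idea, for illustration only (the real change
is in C++ and Python has no implicit conversion operators): a tagged wrapper
that offers no way to silently become a plain integer, so callers must ask
for value() explicitly. The class names TaggedInt, IndexT and TermT are
assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class TaggedInt:
    """Integer wrapper with no __int__/__index__ and no arithmetic."""
    _v: int

    def value(self) -> int:
        # The only way out of the "tagged" space is an explicit call.
        return self._v


class IndexT(TaggedInt):
    """Hypothetical tag for log indexes."""


class TermT(TaggedInt):
    """Hypothetical tag for terms."""


def entries_between(first: IndexT, last: IndexT) -> int:
    # Leaving the tagged space is visible at the call site.
    return last.value() - first.value() + 1
```

Comparisons between two IndexT values work, while `int(IndexT(5))`,
`IndexT(5) + 1` and `IndexT(5) < TermT(6)` all raise TypeError in this sketch.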

No backport needed (refactoring).

The series is supposed to solve #20081.

Two patches in the series touch up code that is known to be (orthogonally) buggy; see
- `service/raft_sys_table_storage: tweak dead code` (#20080)
- `test/raft/replication: untag index_t in test_case::get_first_val()` (#20151)

Fixes for those (independent) issues will have to be rebased on this series, or this series will have to be rebased on those (due to context conflicts).

The series builds at every stage. The debug and release unit test suites pass at the end.

Closes scylladb/scylladb#20159

* github.com:scylladb/scylladb:
  utils/tagged_integer: remove conversion to underlying integer
  test/raft/randomized_nemesis_test: clean up remaining index_t usage
  test/raft/randomized_nemesis_test: clean up index_t usage in store_snapshot()
  test/raft/replication: clean up remaining index_t usage
  test/raft/replication: take an "index_t start_idx" in create_log()
  test/raft/replication: untag index_t in test_case::get_first_val()
  test/raft/etcd_test: tag index_t and term_t for comparisons and subtractions
  test/raft/fsm_test: tag index_t and term_t for comparisons and subtractions
  test/raft/helpers: tighten compare_log_entries() param types
  service/raft_sys_table_storage: tweak dead code
  service/raft_sys_table_storage: simplify (snap.idx - preserve_log_entries)
  service/raft_sys_table_storage: untag index_t and term_t for queries
  raft/server: clean up index_t usage
  raft/tracker: don't drop out of index_t space for subtraction
  raft/fsm: clean up index_t and term_t usage
  raft/log: clean up index_t usage
  db/system_keyspace: promise a tagged integer from increment_and_get_generation()
  gms/gossiper: return "strong_ordering" from compare_endpoint_startup()
  gms/gossiper: get "int32_t" value of "gms::version_type" explicitly
2024-08-19 19:52:54 +03:00
Benny Halevy
5f655e41e3 repair: do_rebuild_replace_with_repair: use source_dc only when safe
It is unsafe to restrict the sync nodes for repair to
the source data center if we cannot guarantee a quorum
in the data center with network-topology replication strategy.

This change restricts the usage of source_dc in the following cases:
1. For SimpleStrategy - source_dc is ignored since there is no guarantee
that it contains remaining replicas for all tokens.
2. For EverywhereStrategy - use source_dc if there are remaining
live nodes in the datacenter.
3. For NetworkTopologyStrategy:
a. It is considered unsafe to use source_dc if number of nodes
   lost in that DC (replaced/rebuilt node + additional ignored nodes)
   is greater than 1, or it has 1 lost node and rf <= 1 in the DC.

b. If the source_dc arg is forced, as with the new
   `nodetool rebuild --force <source_dc>` option,
   we use it anyway, even if it's considered to be unsafe.
   A warning is printed in this case.

c. If the source_dc arg is user-provided, (using nodetool rebuild),
   an error exception is thrown, advising to use an alternative dc,
   if available, omit source_dc to sync with all nodes, or use the
   --force option to use the given source_dc anyhow.

d. Otherwise, we look for an alternative source datacenter,
   that has not lost any node. If such datacenter is found
   we use it as source_dc for the keyspace, and log a warning.

e. If no alternative dc is found (and source_dc is implicit), then:
   log a warning and fall back to using replicas from all nodes in the cluster.
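
The rules above can be sketched roughly as follows. This is a simplified
model, not the repair code itself: the names Origin, DcState and
choose_source_dc are hypothetical, and returning None stands for "sync with
all nodes in the cluster".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Origin(Enum):
    IMPLICIT = "implicit"  # source_dc was selected by the server
    USER = "user"          # given by the user via nodetool rebuild
    FORCED = "forced"      # given via nodetool rebuild --force


@dataclass
class DcState:
    rf: int          # replication factor of the keyspace in this DC
    lost_nodes: int  # replaced/rebuilt node plus additional ignored nodes
    live_nodes: int  # nodes still available as sync sources


def is_unsafe(dc: DcState) -> bool:
    # Rule 3a: more than one lost node, or one lost node with rf <= 1.
    return dc.lost_nodes > 1 or (dc.lost_nodes == 1 and dc.rf <= 1)


def choose_source_dc(strategy: str, source_dc: Optional[str],
                     origin: Origin, dcs: Dict[str, DcState]) -> Optional[str]:
    """Return the DC to restrict sync nodes to, or None for all nodes."""
    if source_dc is None or strategy == "SimpleStrategy":
        return None  # rule 1: source_dc is ignored
    if strategy == "EverywhereStrategy":
        # Rule 2: usable only while the DC still has live nodes.
        return source_dc if dcs[source_dc].live_nodes > 0 else None
    # NetworkTopologyStrategy from here on.
    if not is_unsafe(dcs[source_dc]):
        return source_dc
    if origin is Origin.FORCED:
        return source_dc  # rule 3b: the real code also logs a warning
    if origin is Origin.USER:
        # Rule 3c: reject and let the user pick another option.
        raise ValueError(f"unsafe to use source_dc={source_dc}; "
                         "use another dc, omit source_dc, or use --force")
    # Rule 3d: implicit source_dc, look for a DC that has lost no node.
    for name, dc in dcs.items():
        if name != source_dc and dc.lost_nodes == 0:
            return name
    return None  # rule 3e: fall back to all nodes
```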

Fixes #16826

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-19 17:23:51 +03:00
Benny Halevy
8665eef98c repair: replace_with_repair: pass the replace_node downstream
To be used by the next patch to count how many nodes
are lost in each datacenter.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-19 17:23:33 +03:00
Benny Halevy
9729dd21c3 repair: replace_with_repair: pass ignore_nodes as a set of host_id:s
The callers already pass ignore_nodes as host_id:s
and we translate them into inet_address only for repair
so delay the translation as much as possible.

Refs scylladb/scylladb#6403

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-19 17:22:01 +03:00
Benny Halevy
b5d0ab092c repair: replace_rebuild_with_repair: pass ks_erms from caller
The keyspaces replication maps must be in sync with the
token_metadata_ptr passed already to the functions,
so instead of getting it in the callee, let the caller
get the ks_erms along with retrieving the tmptr.

Note that it's already done on the rebuild path
for streaming based rebuild.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-19 17:20:27 +03:00
Benny Halevy
0419b1d522 nodetool: rebuild: add force option
To be used to force usage of source_dc, even
when it is unsafe for rebuild.

Update docs and add test/nodetool/test_rebuild.py

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-19 17:20:12 +03:00
Benny Halevy
8b1877f3ca Add and use utils::optional_param to pass source_dc
Clearly indicate if a source_dc is provided,
and if so, was it explicitly given by the user,
or was implicitly selected by scylla.

This will become useful in the next patches
that will use that to either reject the operation
if it's unsafe to use the source_dc and the dc was
explicitly given by the user, or whether
to fallback to using all nodes otherwise.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-19 17:13:54 +03:00
Anna Stuchlik
83d5cb04c2 doc: extract the info about tablets defaut to a separate file
This commit extracts the information about the default for tablets in keyspace creation
to a separate file in the _common folder. The file is then included using
the scylladb_include_flag directive.

The purpose of this commit is to make it possible to include a different file
in the scylla-enterprise repo - with a different default.

Refs https://github.com/scylladb/scylla-enterprise/issues/4585

Closes scylladb/scylladb#20181
2024-08-19 16:16:18 +03:00
Kefu Chai
25b3c50f71 test/nodetool: print default value of options in help message
would be more helpful, if the output of "--help" command line can
include the default value of options.

so, in this change, we include the default values in it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20170
2024-08-19 16:15:24 +03:00
Botond Dénes
40d2a6f0b2 Merge 'test.py: use XPath for iterating in "TestSuite/TestSuite"' from Kefu Chai
before this change, we check for the existence of "TestSuite" node
under the root of XML tree, and then enumerating all "TestSuite" nodes
under this "TestSuite", this approach works. but it

* introduces unnecessary indent
* is not very readable

in this change, we just use "./TestSuite/TestSuite" for enumerating
all "TestSuite" nodes under "TestSuite". simpler this way.
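
A small standalone illustration of the pattern, using Python's
xml.etree.ElementTree. The report below is made up for the example and only
mimics the nested "TestSuite/TestSuite" layout; it is not real test.py output.

```python
import xml.etree.ElementTree as ET

# Made-up report: an outer "TestSuite" node containing nested "TestSuite" nodes.
REPORT = """
<TestLog>
  <TestSuite name="outer">
    <TestSuite name="inner_a"/>
    <TestSuite name="inner_b"/>
  </TestSuite>
</TestLog>
"""


def inner_suite_names(xml_text: str) -> list:
    """Return the names of all TestSuite nodes nested under a TestSuite."""
    root = ET.fromstring(xml_text.strip())
    # One path expression replaces "check the outer node exists, then loop".
    return [suite.get("name") for suite in root.findall("./TestSuite/TestSuite")]
```

When the outer node is missing, findall() simply returns an empty list, so no
separate existence check (and no extra indent level) is needed.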

---

it's a cleanup in the test driver script, hence no need to backport.

Closes scylladb/scylladb#20169

* github.com:scylladb/scylladb:
  test.py: fix the indent
  test.py: use XPath for iterating in "TestSuite/TestSuite"
2024-08-19 16:13:42 +03:00
Botond Dénes
6835f7e993 Merge 'Add CQL-based RBAC support to Alternator' from Piotr Smaron
Alternator already supports **authentication** - the ability to sign each request as a particular user. The users that can be used are the different "roles" that are created by CQL "CREATE ROLE" commands. This series adds support for **authorization**, i.e., the ability to determine that only some of these roles are allowed to read or write particular tables, to create new tables, and so on.

The way we chose to do this in this series is to support CQL's existing role-based access control (RBAC) commands - GRANT and REVOKE - on Alternator tables. For example, an Alternator table "xyz" is visible to CQL as "alternator_xyz.xyz", so a `GRANT SELECT ON alternator_xyz.xyz TO myrole` will allow read commands (e.g., GetItem) on that table, and without this GRANT, a GetItem will fail with `AccessDeniedException`.

This series adds the necessary checks to all relevant Alternator operations, and also adds extensive functional testing for this feature - i.e., that certain DynamoDB API operations are not allowed without the appropriate GRANTs.

The following permissions are needed for the following Alternator API operations:

* **SELECT**:      `GetItem`, `Query`, `Scan`, `BatchGetItem`, `GetRecords`
* **MODIFY**:      `PutItem`, `DeleteItem`, `UpdateItem`, `BatchWriteItem`
* **CREATE**:      `CreateTable`
* **DROP**:        `DeleteTable`
* **ALTER**:       `UpdateTable`, `TagResource`, `UntagResource`, `UpdateTimeToLive`
* _none needed_: `ListTables`, `DescribeTable`, `DescribeEndpoints`, `ListTagsOfResource`, `DescribeTimeToLive`, `DescribeContinuousBackups`, `ListStreams`, `DescribeStream`, `GetShardIterator`
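
The list above can be read as a simple lookup table. The sketch below just
restates it as data; the names REQUIRED_PERMISSION and required_permission are
made up for this illustration and do not exist in the Alternator code.

```python
from typing import Optional

# Operation name -> CQL permission required (None means no permission needed).
REQUIRED_PERMISSION = {
    **dict.fromkeys(
        ["GetItem", "Query", "Scan", "BatchGetItem", "GetRecords"], "SELECT"),
    **dict.fromkeys(
        ["PutItem", "DeleteItem", "UpdateItem", "BatchWriteItem"], "MODIFY"),
    "CreateTable": "CREATE",
    "DeleteTable": "DROP",
    **dict.fromkeys(
        ["UpdateTable", "TagResource", "UntagResource", "UpdateTimeToLive"],
        "ALTER"),
    **dict.fromkeys(
        ["ListTables", "DescribeTable", "DescribeEndpoints",
         "ListTagsOfResource", "DescribeTimeToLive",
         "DescribeContinuousBackups", "ListStreams", "DescribeStream",
         "GetShardIterator"], None),
}


def required_permission(operation: str) -> Optional[str]:
    """Return the single permission an operation needs, or None."""
    return REQUIRED_PERMISSION[operation]
```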

Currently, I decided that for consistency each operation requires one permission only. For example, PutItem only requires MODIFY permission. This is despite the fact that in some cases (namely, `ReturnValues=ALL_OLD`) it can also _read_ the item. We should perhaps discuss this decision - and compare how it was done in CQL - e.g., what happens in LWT writes that may return old values?

Different permissions can be granted for a base table, each of its views, and the CDC table (Alternator streams). This adds power - e.g., we can allow a role to read only a view but not the base table, or read the table but not its history. GRANTing permissions on views or CDC logs requires knowing their names, which are somewhat ugly (e.g., the name of GSI "abc" in table "xyz" is `alternator_xyz.xyz:abc`). But usefully, the error message when permissions are denied contains the full name of the table that was lacking permissions and which permissions were lacking, so users can easily add them.
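
As a rough illustration of the naming scheme described in this series, the
helpers below build the CQL-visible names and a GRANT statement for a base
table. The function names are hypothetical, only the two name shapes given in
the text are covered (base table and GSI), and no CQL quoting is attempted, so
names with special characters would likely need extra care.

```python
from typing import Optional


def cql_resource_name(table: str, gsi: Optional[str] = None) -> str:
    """Return the CQL-visible name of an Alternator table or one of its GSIs."""
    name = f"alternator_{table}.{table}"
    return f"{name}:{gsi}" if gsi is not None else name


def grant_statement(permission: str, table: str, role: str) -> str:
    """Build a GRANT statement for a base table (no quoting is done here)."""
    return f"GRANT {permission} ON {cql_resource_name(table)} TO {role}"
```

For example, grant_statement("SELECT", "xyz", "myrole") yields the same
statement as the GRANT example given earlier in this description.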

In addition to permissions checking, this series also correctly supports _auto-grant_ (except #19798): When a role has permissions to `CreateTable`, any table it creates will automatically be granted all permissions for this role, so this role will be able to use the new table and eventually delete it. `DeleteTable` does the opposite - it removes permissions from tables being deleted, so that if later a second user re-creates a table with the same name, the first user will not have permissions over the new table.

The already-existing configuration parameter `alternator_enforce_authorization` (off by default), which previously only enabled authentication, now also enables authorization. Users that upgrade to the new version and already had `alternator_enforce_authorization=true` should verify that the users they use to authenticate either have the appropriate permissions or the "superuser" flag. Roles used to authenticate must also have the "login" flag.

Please note that although the new RBAC support implements the access control feature we asked for in #5047, this implementation is _not compatible_ with DynamoDB. In DynamoDB, the access control is configured through IAM operations or through the new `PutResourcePolicy` operation, not through CQL (obviously!). DynamoDB also offers finer access-control granularity than we support (Scylla's RBAC works on entire tables, DynamoDB allows setting permissions on key prefixes, on individual attributes, and more). Despite this non-compatibility, I believe this feature, as is, will already be useful to Alternator users.

Fixes #5047 (after closing that issue, a new clean issue should be opened about the DynamoDB-compatible APIs that we didn't do - just so we remember this wasn't done yet).

New feature, should not be backported.

Closes scylladb/scylladb#20135

* github.com:scylladb/scylladb:
  tests: disable test_alternator_enforce_authorization_true
  test, alternator: test for alternator_enforce_authorization config
  test/pylib: allow setting driver_connect() options in servers_add()
  test: fix test_localnodes_joining_nodes
  alternator, RBAC: reproducer for missing CDC auto-grant
  alternator: document the new RBAC support
  alternator: add RBAC enforcement to GetRecords
  test/alternator: additional tests for RBAC
  test/alternator: reduce permissions-validity-in-ms
  test/alternator: add test for BatchGetItem from multiple tables
  alternator: test for operations that do not need any permissions
  alternator: add RBAC enforcement to UpdateTimeToLive
  alternator: add RBAC enforcement to TagResource and UntagResource
  alternator: add RBAC enforcement to BatchGetItem
  alternator: add RBAC enforcement to BatchWriteItem
  alternator: add RBAC enforcement to UpdateTable
  alternator: add RBAC enforcement to Query and Scan
  alternator: add RBAC enforcement to CreateTable
  alternator: add RBAC enforcement to DeleteTable
  alternator: add RBAC enforcement to UpdateItem
  alternator: add RBAC enforcement to DeleteItem
  alternator: add RBAC enforcement to PutItem
  alternator: add RBAC enforcement to GetItem
  alternator: stop using an "internal" client_state
2024-08-19 16:09:53 +03:00
Tomasz Grabiec
c1de4859d8 Merge 'tablets: Fix race between repair and split' from Raphael "Raph" Carvalho
Consider the following:

```
T
0   split prepare starts
1                               repair starts
2   split prepare finishes
3                               repair adds unsplit sstables
4                               repair ends
5   split executes
```

If repair produces sstable after split prepare phase, the replica will not split that sstable later, as prepare phase is considered completed already. That causes split execution to fail as replicas weren't really prepared. This also can be triggered with load-and-stream which shares the same write (consumer) path.

The fix uses the same approach that prevents a race between split and migration. If migration happens during the prepare phase, the source may miss the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if the underlying table is in split mode. That's implemented in replica::table for correctness, so if a node crashes, the new sstable that missed the split is still split before it is added to the set.

Fixes #19378.
Fixes #19416.

Closes scylladb/scylladb#19427

* github.com:scylladb/scylladb:
  tablets: Fix race between repair and split
  compaction: Allow "offline" sstable to be split
2024-08-19 14:44:28 +02:00
Kefu Chai
151074240c utils: cached_file: use structured binding when appropriate
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20184
2024-08-19 14:01:42 +03:00
Piotr Smaron
f773c76bfb codeowners: add appropriate reviewers to the cluster components 2024-08-19 12:39:47 +02:00
Anna Stuchlik
8fb746a5d2 doc: fix a link on the RBAC page
This commit fixes an external link on the Role Based Access Control page.

Fixes https://github.com/scylladb/scylladb/issues/20166

Closes scylladb/scylladb#20171
2024-08-19 12:56:38 +03:00
Piotr Smaron
cdc88cd06c tests: disable test_alternator_enforce_authorization_true
The test is flaky and needs to be fixed so that it does not randomly break
our CI. OTOH it can be commented out for the time being, so that we can
merge the feature.
2024-08-19 09:57:53 +02:00
Nadav Har'El
989dbef315 test, alternator: test for alternator_enforce_authorization config
This patch adds tests that demonstrate the current way that Alternator's
authentication and authorization are both enabled or disabled by the
option "alternator_enforce_authorization".

If in the future we decide to change this option or eliminate it (e.g.,
remain just with the "authenticator" and "authorizer" options), we can
easily update these tests to fit the new configuration parameters and
check they work as expected.

Because the new tests want to start Scylla instances with different
configuration parameters, they are written in the "topology"
framework and not in the test/alternator framework. The test/alternator
framework still contains (test/alternator/test_cql_rbac.py) the vast
majority of the functional testing of the RBAC feature where all those
tests just assume that RBAC is enabled and needs to be tested.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Nadav Har'El
41418603e1 test/pylib: allow setting driver_connect() options in servers_add()
The manager.driver_connect() function allows passing parameters when
creating the connection (e.g., a special auth_provider), but unfortunately
right now the servers_add() function always calls driver_connect()
without parameters. So in this patch we just add a new optional
parameter to servers_add(), driver_connect_opts, that will be passed to
driver_connect().

In theory, instead of using the new driver_connect_opts option, a caller can
pass start=False to servers_add() and later call driver_connect()
manually with the right arguments. The problem is that start=False
avoids more than just calling driver_connect(), so it doesn't solve
the problem.

An example of using the new option is to run Scylla with authentication
enabled, and then connect to it using the correct default account
("cassandra"/"cassandra"):

    config = {
        'authenticator': 'PasswordAuthenticator',
        'authorizer': 'CassandraAuthorizer'
    }
    servers = await manager.servers_add(1, config=config,
        driver_connect_opts={'auth_provider':
            PlainTextAuthProvider(username='cassandra', password='cassandra')})

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Nadav Har'El
de20ac1a6d test: fix test_localnodes_joining_nodes
The existing test
topology_experimental_raft/test_alternator::test_localnodes_joining_nodes
tried to create a second server but *not* wait for it to complete, but
the trick it used (cancelling the task) doesn't work since commit 2ee063c
makes a list of unwaited tasks and waits for them anyway. The test
*appears* to work because it is the last test in the file, but if we
ever add another test in the same file (like I plan to do in the next
patch), that other test will find a "BROKEN" ScyllaClusterManager and
report that it failed :-(

Other tricks I tried to use (like killing the servers) also didn't work
because of various limitations and complications of the test framework
and all its layers.

So not wanting to fight the fragile testing framework any more at this
point, I just gave up and the test will *wait* for the second server
to come up. This adds 120 seconds (!) to the test, but since this whole
test file already takes more than 500 seconds to complete, let's bite
this bullet. Maybe in the future when the test framework improves, we can
avoid this 120 second wait.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Nadav Har'El
79f9b3007e alternator, RBAC: reproducer for missing CDC auto-grant
This patch adds a reproducing (xfailing) test for issue #19798, which
shows that if a role is able to create an Alternator table, the role is
able to read the new table (this is known as "auto-grant"), but is NOT
able to read the CDC log (i.e., use Alternator Streams' "GetRecords").

Once we do fix this auto-grant bug, it's important to also implement
auto-revoke - the permissions on a deleted table must be deleted as well
(otherwise the old owner of a deleted table will be able to read a new
table with the same name). This patch also adds a test verifying that
auto-revoke works. This test currently passes (because there is no auto-
grant, so nothing needs to be revoked...) but if we'll implement auto-grant
and forget auto-revoke, the second test will start to fail - so I added
this test as a precaution against a bad fix.

Refs #19798

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Nadav Har'El
7de6aedd47 alternator: document the new RBAC support
In docs/alternator/compatibility.md we said that although Alternator
supports authentication, it doesn't support authorization (access
control). Now it does, so the relevant text needs to be corrected
to fit what we have today.

It's still in the compatibility.md document because it's not the same
API as DynamoDB's, so users with existing applications may need to be
aware of this difference.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Nadav Har'El
f9ff475dfb alternator: add RBAC enforcement to GetRecords
This patch adds a requirement for the "SELECT" permission on a table to
run a GetRecords on it (the DynamoDB Streams API, i.e., CDC).

The grant is checked on the *CDC log table* - not on the base table,
which allows giving a role the ability to read the base but not its
change stream, or vice versa.

The operations ListStreams, DescribeStream, GetShardIterator do not
require any permissions to run - they do not read any data, and are
(in my opinion) similar in spirit to DescribeTable, so I think it's fine
not to require any permissions for them.

A test is also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Nadav Har'El
0789841cf8 test/alternator: additional tests for RBAC
Additional tests for support for CQL Role-Based Access Control (RBAC)
in Alternator:

1. Check that even for an Alternator table whose name isn't a valid CQL
   table name (e.g., uses the dot character) the GRANT/REVOKE commands
   work as expected.

2. Check that superuser roles have full permissions, as expected.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Nadav Har'El
409fea5541 test/alternator: reduce permissions-validity-in-ms
We set in test/cql-pytest/run.py, affecting test/alternator/run, the
configuration permissions_validity_in_ms by default to 100ms. This means
that tests that need to check how GRANT or REVOKE work always need to
sleep for more than 100ms, which can make a test with a lot of these
operations very slow.

So let's just set this configuration value to 5ms. I checked that it
doesn't adversely affect the total running speed of test/alternator/run.

This change only affects running tests through test/alternator/run, which
is expected to be fast. I left the default for test.py as it was, 100ms,
because the latency of individual tests is less important there.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Nadav Har'El
1b20a11dec test/alternator: add test for BatchGetItem from multiple tables
While working on the RBAC on BatchGetItem, I noticed that although
BatchGetItem may ask to read items from several tables, we don't have
a test covering this case! This patch fixes that testing oversight.

Note that for the write-side version of this operation, BatchWriteItem,
we do have tests that write to several tables in the same batch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Nadav Har'El
f827bd51d2 alternator: test for operations that do not need any permissions
Some operations, namely ListTables, DescribeTable, DescribeEndpoints,
ListTagsOfResource, DescribeTimeToLive and DescribeContinuousBackups
do not need any permissions to be GRANTed to a role.

Our rationale for this decision is that in CQL, "describe table" and
friends also do not require any permissions.

This patch includes a test that verifies that they really don't need
permissions.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Nadav Har'El
9417cf8bcf alternator: add RBAC enforcement to UpdateTimeToLive
This patch adds a requirement for the "ALTER" permission on a table to
run an UpdateTimeToLive on it. UpdateTimeToLive is similar in purpose to
UpdateTable, so it makes sense to use the same permission "ALTER" as we
do for UpdateTable.

A test is also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Nadav Har'El
e76316495c alternator: add RBAC enforcement to TagResource and UntagResource
This patch adds a requirement for the "ALTER" permission on a table to
run the TagResource or UntagResource operations on it. CQL does not
have an exact parallel of DynamoDB's tagging feature, but our usual
use of tags is as an extension of UpdateTable to change non-standard options
(e.g., write isolation policy or tablets setup), so it makes sense to
require the same permissions we require for UpdateTable - namely "ALTER".

A test for both operations is also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:53 +02:00
Nadav Har'El
fda4a9fad8 alternator: add RBAC enforcement to BatchGetItem
This patch adds a requirement for the "SELECT" permission on a table to
run a BatchGetItem on it. A single batch may ask to read from several
different tables, so we fail the entire batch with AccessDeniedException
if any of the tables mentioned in the batch do not have SELECT permissions
for this role.

A test is also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:57:51 +02:00
Nadav Har'El
b02288785f alternator: add RBAC enforcement to BatchWriteItem
This patch adds a requirement for the "MODIFY" permission on a table to
run a BatchWriteItem on it. A single batch may ask to write to several
different tables, so we fail the entire batch with AccessDeniedException
if any of the tables mentioned in the batch do not have MODIFY permissions
for this role.
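
The all-or-nothing rule described above can be sketched in a few lines of
Python. This is only an illustration, not Alternator's code: `check_batch`,
`AccessDenied` and the `allowed_tables` argument are hypothetical names
made up for this sketch.

```python
class AccessDenied(Exception):
    """Stand-in for Alternator's AccessDeniedException (hypothetical class)."""


def check_batch(tables, allowed_tables):
    """Fail the whole batch if any table in it is not permitted.

    `tables` lists the table names the batch touches, and `allowed_tables`
    is the set of tables on which the role holds the needed permission.
    Both are assumed inputs for this sketch.
    """
    denied = sorted(t for t in set(tables) if t not in allowed_tables)
    if denied:
        # The check runs before any table is touched, so nothing is written.
        raise AccessDenied("no MODIFY permission on: " + ", ".join(denied))
    return True
```

A batch naming tables `a` and `c` is rejected as a whole when the role may
only modify `a`, even though the writes to `a` alone would have been allowed.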

A test is also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:56:28 +02:00
Nadav Har'El
445a5d57cd alternator: add RBAC enforcement to UpdateTable
This patch adds a requirement for the "ALTER" permission on a table to
run an UpdateTable on it.

A test is also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:45:22 +02:00
Nadav Har'El
b4484158e7 alternator: add RBAC enforcement to Query and Scan
This patch adds a requirement for the "SELECT" permission on a table to
run a Query or Scan on it.

Both Query and Scan operations call the same do_query() function, so the
permission checks are put there.

Note that Query can read from either the base table or one of its views,
and the permissions on the base and each of the views can be separate
(so we can allow a role to only read one view, for example).

Tests for all of the above are also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:45:22 +02:00
Nadav Har'El
82f7e55943 alternator: add RBAC enforcement to CreateTable
This patch adds a requirement for the "CREATE" permission on ALL
KEYSPACES to run a CreateTable operation.

The CreateTable operation also performs so-called "auto-grant": When a
role creates a table, it is automatically granted full permissions to
read, write, change or delete that new table.

A test for all these things is also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:45:22 +02:00
Nadav Har'El
79dfb7b7d5 alternator: add RBAC enforcement to DeleteTable
This patch adds a requirement for the "DROP" permission on a table to
run a DeleteTable on it.

Moreover, when a table and its views are deleted, any special permissions
previously GRANTed on this table are removed. This is necessary because
if a role creates a table it is automatically granted permissions on this
table (this is known as "auto-grant" - see the CreateTable patch for
details). If this role deletes this table and later a second role creates
a table with the same name, we don't want the first role to have
permissions on this new table.
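
The interplay of auto-grant and revocation on delete can be modelled as
follows. This is a toy Python model under stated assumptions: the
`TablePermissions` class and its methods are hypothetical and only mirror
the behaviour described in this message, not the actual auth service.

```python
class TablePermissions:
    """Toy model: which roles hold full permissions on which table name."""

    def __init__(self):
        self.grants = {}  # table name -> set of roles

    def create_table(self, role, table):
        # Auto-grant: the creating role gets permissions on the new table.
        self.grants.setdefault(table, set()).add(role)

    def delete_table(self, table):
        # Revocation on delete: drop every grant recorded for this name.
        self.grants.pop(table, None)

    def can_access(self, role, table):
        return role in self.grants.get(table, set())
```

Without the `pop()` in `delete_table()`, a role that created and deleted a
table named `t` would still pass `can_access()` after another role creates a
new `t`, which is exactly the leak this patch prevents.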

Tests for permission enforcements and revocation on delete are also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:45:22 +02:00
Nadav Har'El
2ebc0501b8 alternator: add RBAC enforcement to UpdateItem
This patch adds a requirement for the "MODIFY" permission on a table to
run an UpdateItem on it.

Only the MODIFY permission is required, even if the operation may also
read the old value of the item, such as a read-modify-write operation
or even using ReturnValues='ALL_OLD'.

A test is also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:45:22 +02:00
Nadav Har'El
36d8aea654 alternator: add RBAC enforcement to DeleteItem
This patch adds a requirement for the "MODIFY" permission on a table to
run a DeleteItem on it.

Only the MODIFY permission is required, even if the operation may also
read the old value of the item (using ReturnValues='ALL_OLD').

A test is also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:45:22 +02:00
Nadav Har'El
34c975854a alternator: add RBAC enforcement to PutItem
This patch adds a requirement for the "MODIFY" permission on a table to
run a PutItem on it.

Only the MODIFY permission is required, even if the operation may also
read the old value of the item (using ReturnValues='ALL_OLD').

A test is also added.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:45:22 +02:00
Nadav Har'El
3008b8416c alternator: add RBAC enforcement to GetItem
In this patch, we begin to add role-based access control (RBAC)
enforcement to Alternator - in this patch only to GetItem.

After the preparation of client_state correctly in the previous patch,
the permission check itself in the get_item() function is very simple.
The bigger part of this patch is a full functional test in
test/alternator/test_cql_rbac.py. The test is quite self-explanatory
and heavily commented. Basically we check that a new role cannot
read with GetItem a pre-existing table, and we can add that ability
by GRANTing (in CQL) the new role the ability to SELECT the table,
the keyspace, all keyspaces, or add that ability to some other role
that this role inherits.

In the following patches, we will add role-based access control to
the other Alternator operations, but the functional tests will be shorter -
we don't need to check the role inheritance, "all keyspaces" feature,
and so on, for every operation separately since they all use the
same underlying checking functions which handle these role inheritance
issues in exactly the same way.
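
The different ways of granting SELECT that the test exercises (on the
table, on the keyspace, on all keyspaces, or through an inherited role) can
be modelled roughly like this. It is a hedged Python sketch: the `Grants`
class, its methods and the `data/...` resource strings are hypothetical and
are not Scylla's auth implementation.

```python
from collections import defaultdict


class Grants:
    """Toy model of CQL-style grants; all names here are hypothetical."""

    def __init__(self):
        # role -> set of (permission, resource) pairs granted directly
        self.direct = defaultdict(set)
        # role -> set of roles whose permissions it inherits
        self.parents = defaultdict(set)

    def grant(self, role, permission, resource):
        self.direct[role].add((permission, resource))

    def grant_role(self, parent, to_role):
        self.parents[to_role].add(parent)

    def _roles(self, role):
        # The role itself plus everything it inherits, transitively.
        seen, stack = set(), [role]
        while stack:
            r = stack.pop()
            if r not in seen:
                seen.add(r)
                stack.extend(self.parents[r])
        return seen

    def has_permission(self, role, permission, keyspace, table):
        # A grant on the table, its keyspace, or all keyspaces is enough.
        resources = {f"data/{keyspace}/{table}", f"data/{keyspace}", "data"}
        return any((permission, res) in self.direct[r]
                   for r in self._roles(role) for res in resources)
```

Because every operation would go through one function like
`has_permission()`, inheritance and the "all keyspaces" case only need to be
tested thoroughly once.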

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:45:22 +02:00
Nadav Har'El
583f060bd8 alternator: stop using an "internal" client_state
Scylla uses a "client_state" object to encapsulate the information of
who the client is - its IP address, which user was authenticated, and so on.

For an unknown reason, Alternator created for each request an "internal"
client_state, meaning that supposedly the client for each request was
some sort of internal process (e.g., repair) rather than a real client.
This was wrong, and we even had a FIXME about not putting the client's
IP address in client_state.

So in this patch, we start using a normal "external" client_state
instead of an "internal" one. The client_state constructors are very
different in the two cases, so a few lines of code had to change.

I hope that this change will cause no functional changes. For example,
Alternator was already setting its own timeouts explicitly and not
relying on the default ones for external clients. However, we need to
fix this for the following patches which introduce permissions checks
(Role-Based Access Control - RBAC) - the client_state methods for
checking permissions become no-ops for *internal* clients (even if the
client_state contains an authenticated user). We need these functions
to do their job - so we need an *external* variant of client_state.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-08-19 09:45:22 +02:00
Tomasz Grabiec
ab7656a7be Merge 'replica: fix copy constructor of tablet_sstable_set' from Lakshmi Narayanan Sreethar
Commit 9f93dd9fa3 changed
`tablet_sstable_set::_sstable_sets` to be an `absl::flat_hash_map` and
in addition, `std::set<size_t> _sstable_set_ids` was
added. `_sstable_set_ids` is set up in the
`tablet_sstable_set(schema_ptr s, const storage_group_manager& sgm,
const locator::tablet_map& tmap)` constructor, but it is not copied in
`tablet_sstable_set(const tablet_sstable_set& o)`.

This affects the `tablet_sstable_set::tablet_sstable_set` method as it
depends on the copy constructor. Since sstable set can be cloned when
a new sstable set is added, the issue will cause ids not being copied
into the new sstable set. It's healed only after compaction, since the
sstable set is rebuilt from scratch there.

This PR fixes this issue by removing the existing copy constructor of
`tablet_sstable_set` to enable the implicit default copy constructor.

Fixes #19519

Closes scylladb/scylladb#20115

* github.com:scylladb/scylladb:
  boost/sstable_set_test: add testcase to test tablet_sstable_set copy constructor
  replica: fix copy constructor of tablet_sstable_set
2024-08-19 00:53:29 +02:00
Avi Kivity
390e01673b Merge 'Adding batch latency and batch size metrics to Alternator' from Amnon Heiman
This patch adds metrics for batch get_item and batch write_item.
The new metrics record summary and histogram for latencies and batch size.

Batch sizes are implemented as ever-growing counters. To get the average batch size, divide the rate of
the batch size counter by the rate of the batch count counter:
```rate(batch_get_item_batch_size)/rate(batch_get_item)```
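
The same ratio can be computed by hand from two samples of the counters. The
following Python sketch assumes both counters are sampled at the same two
points in time, so the time interval cancels out; `average_batch_size` and
its parameter names are made up for this illustration.

```python
def average_batch_size(size_then, size_now, count_then, count_now):
    """Average items per batch between two samples of the two counters.

    `size_*` are samples of the ever-growing batch size counter and
    `count_*` are samples of the counter of batch operations.
    """
    batches = count_now - count_then
    if batches == 0:
        return None  # no batches in the interval, so no average exists
    return (size_now - size_then) / batches
```

For example, if the size counter grew from 100 to 250 while the batch
counter grew from 10 to 20, the average over that interval is 15 items per
batch.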

Relates to #17615

New code, No need to backport

Closes scylladb/scylladb#20190

* github.com:scylladb/scylladb:
  Add tests for Alternator batch operation metrics
  alternator/executor: support batch latency and size metrics
  Add metrics for Alternator get and write batch operations
2024-08-18 21:22:39 +03:00
Amnon Heiman
63fdfb89cd Add tests for Alternator batch operation metrics
This patch adds unit tests to verify the correctness of the newly
introduced histogram metrics for get and write batch operation
latencies.

The test uses the existing latency test with the added metrics.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-08-18 12:19:43 +03:00
Amnon Heiman
d20a333f51 alternator/executor: support batch latency and size metrics
This patch updates the get and write batch operations in Alternator to
record latency using the newly added histogram metrics.
It adds logic to increment the counters with the number of items
processed in each batch.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-08-18 12:14:23 +03:00
Amnon Heiman
8bad4b44f8 Add metrics for Alternator get and write batch operations
Introduced histogram metrics to track latency for Alternator's get and
write batch operations.

Added counters to record the number of items processed in each batch
operation.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-08-18 12:09:46 +03:00
Lakshmi Narayanan Sreethar
ec47b50859 boost/sstable_set_test: add testcase to test tablet_sstable_set copy constructor
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-08-17 23:38:05 +05:30
Lakshmi Narayanan Sreethar
44583eed9e replica: fix copy constructor of tablet_sstable_set
Remove the existing copy constructor to enable the use of the implicit
copy constructor. This fixes the issue of `_sstable_set_ids` not being
copied in the current copy constructor.

Fixes #19519

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-08-17 23:37:58 +05:30
Kefu Chai
3d593ceeb1 perf/perf_sstable: add {crawling,partitioned}_streaming modes
for testing the load performance of load_and_stream operation.

Refs scylladb/scylladb#19989

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-17 14:43:54 +08:00
Kefu Chai
7806c72e49 test/perf/perf_sstable: use switch-case when appropriate
this change is a follow up of 06c60f6ab, which updated the
2nd step of the test to use switch-case, but missed the 1st step.
so this change updates the first step of the test as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-17 14:38:37 +08:00
Pavel Emelyanov
6a9b8ea135 sstable_directory: Coroutinize inner lambdas
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-16 10:45:27 +03:00
Pavel Emelyanov
7401c0ace2 sstable_directory: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-16 10:45:27 +03:00
Pavel Emelyanov
7422504d35 sstable_directory: Coroutinize outer continuation chain
Indentation is deliberately left broken

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-16 10:45:27 +03:00
Kefu Chai
e8f9f71ef3 test.py: fix the indent
and take this opportunity to fix a typo in comment.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-16 13:32:57 +08:00
Kefu Chai
e88166f7a4 test.py: use XPath for iterating in "TestSuite/TestSuite"
before this change, we check for the existence of "TestSuite" node
under the root of XML tree, and then enumerate all "TestSuite" nodes
under this "TestSuite". this approach works, but it

* introduces unnecessary indent
* is not very readable

in this change, we just use "./TestSuite/TestSuite" for enumerating
all "TestSuite" nodes under "TestSuite". simpler this way.
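
a minimal sketch of the single-path lookup, using Python's standard
`xml.etree.ElementTree`. the `nested_suites` helper, the `TestResult` root
element and the suite names in the usage note are assumptions made for this
example, not code from test.py.

```python
import xml.etree.ElementTree as ET


def nested_suites(xml_text):
    """Return the names of all TestSuite nodes nested under a TestSuite."""
    root = ET.fromstring(xml_text)
    # findall() returns an empty list when the outer "TestSuite" is missing,
    # so no separate existence check (and no extra indent) is needed.
    return [suite.get("name") for suite in root.findall("./TestSuite/TestSuite")]
```

for a document with an outer "TestSuite" holding two nested suites named
`a` and `b`, this returns both names in document order. for a document
without any "TestSuite" node it returns an empty list.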

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-16 13:32:33 +08:00
Kefu Chai
afee3924b3 s3/client: check for "Key" and "Value" tag in "Tag" XML tag
despite that the API document at
https://docs.aws.amazon.com/AmazonS3/latest/API/API_Tag.htm
claims that both these tags are "Required" in the "Tag" object returned
by S3 APIs, we still have to check them before dereferencing the pointer
of the child node, as we should not trust the output of an external API.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20160
2024-08-15 20:16:35 +03:00
Andrei Chekun
f24f5b7db2 test.py: Fix boost XML conversion to allure when XML file is empty
The method cannot find the TestSuite in the XML file and fails the whole job, even though the tests passed. The cause was a misunderstanding of the boost summarization method: it creates one file for all modes, so there is no need to go through all modes to convert the XML file for allure.

Closes: https://github.com/scylladb/scylladb/issues/20161

Closes scylladb/scylladb#20165
2024-08-15 20:15:31 +03:00
Benny Halevy
52234214e5 schema_tables: calculate_schema_digest: filter the key earlier
Currently, each frozen mutation we get from
system_keyspace::query_mutations is unfrozen in whole
to a mutation and only then we check its key with
the provided `accept_keyspace` function.

This is wasteful, since the key can be processed
directly from the frozen mutation, before taking
the toll of unfreezing it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-15 12:33:34 +03:00
Benny Halevy
95a5fba0ea schema_tables: calculate_schema_digest: prevent stalls due to large mutations vector
With a large number of tables the schema mutations
vector might get big enough to cause reactor stalls
when freed.

For example, the following stall was hit on
2023.1.0~rc1-20230208.fe3cc281ec73 with 5000 tables:
```
 (inlined by) ~vector at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_vector.h:730
 (inlined by) db::schema_tables::calculate_schema_digest(seastar::sharded<service::storage_proxy>&, enum_set<super_enum<db::schema_feature, (db::schema_feature)0, (db::schema_feature)1, (db::schema_feature)2, (db::schema_feature)3, (db::schema_feature)4, (db::schema_feature)5, (db::schema_feature)6, (db::schema_feature)7> >, seastar::noncopyable_function<bool (std::basic_string_view<char, std::char_traits<char> >)>) at ./db/schema_tables.cc:799
```

This change returns a mutations generator from
the `map` lambda coroutine so we can process them
one at a time, destroy the mutations one at a time,
thereby reducing memory footprint and preventing
reactor stalls.
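
The idea of handing out mutations one at a time instead of collecting them
in a vector can be illustrated with a Python generator. This is only an
analogy for the C++ coroutine generator used in the patch: `unfreeze`,
`accepted_mutations` and the tuple layout of a "frozen" mutation are
hypothetical.

```python
def unfreeze(frozen):
    """Stand-in for the expensive conversion of a frozen mutation."""
    key, payload = frozen
    return {"key": key, "rows": list(payload)}


def accepted_mutations(frozen_mutations, accept_keyspace):
    """Yield full mutations one at a time, skipping rejected keys cheaply."""
    for frozen in frozen_mutations:
        key = frozen[0]            # readable without unfreezing
        if not accept_keyspace(key):
            continue               # never pay the unfreezing cost
        yield unfreeze(frozen)     # the caller handles it, then it can be freed
```

A consumer that folds each yielded mutation into a running digest never
holds more than one unfrozen mutation at a time, so there is no large
container to destroy at the end.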

Fixes #18173

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-08-15 12:33:34 +03:00
Kefu Chai
c628fa4e9e tools: enhance scylla sstable shard-of to support tablets
before this change, `scylla sstable shard-of` didn't support tablets,
because:

- with tablets enabled, data distribution uses the scheduler
- this replaces the previous method of mapping based on vnodes and shard numbers
- as a result, we can no longer deduce sstable mapping from token ranges

in this change, we:
- read `system.tablets` table to retrieve tablet information
- print the tablet's replica set (list of <host, shard> pairs)
- this helps users determine where a given sstable is hosted

This approach provides the closest equivalent functionality of
`shard-of` in the tablet era.

Fixes scylladb/scylladb#16488
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-15 15:49:55 +08:00
Kefu Chai
4291033b14 replica/tablets: extract tablet_replica_set_from_cell()
so it can be reused to implement a low-level tool which reads tablets
data from sstables

Refs scylladb/scylladb#16488
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-15 15:49:55 +08:00
Kefu Chai
e1162e0dae tools: extract get_table_directory() out
the `get_table_directory()` function will have applications
beyond its current use in `schema_loader.cc`. its ability to locate
the directory storing the sstables of given table could be valuable
in other subcommand(s) implementation.

so, in this change we extract it out into a dedicated source file,
so that it accepts the primary_key and an optional clustering_key.

Refs scylladb/scylladb#16488

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-15 15:49:55 +08:00
Kefu Chai
a04e0b6c7d tools: extract read_mutation out
the `read_mutation_from_table_offline()` function will have applications
beyond its current use in `schema_loader.cc`. its ability to parse
mutation data from sstables could be valuable in other subcommand(s)
implementation.

so, in this change we extract it out into a dedicated source file,
so that it accepts the primary_key and an optional clustering_key.

Refs scylladb/scylladb#16488

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-15 15:49:55 +08:00
Kefu Chai
74a670dd19 build: split the list of source files across multiple lines
Split the extended list of source files across multiple lines.
This improves readability and makes future additions easier to
review in diffs.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-15 15:49:55 +08:00
Kefu Chai
3f8f1d7274 tools/scylla-sstable: print warning when running shard-of with tablets
the subcommand of "shard-of" does not support tablets yet. so let's
print out an error message, instead of printing the mapping assuming
that the sstables are distributed based on token only.

this commit also adds two more command line options to this subcommand,
so that user is required to specify either "--vnodes" or "--tablets"
to instruct the tool how the cluster distributes the tokens across nodes
and their shards. this helps to minimize surprises for the user.

this change prepares for the succeeding changes to implement the tablets
support.

the corresponding test is updated accordingly so that it only exercises
the "shard-of" subcommand without tablets. we will test it with tablets
enabled in a succeeding change.

Refs scylladb/scylladb#16488
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-15 15:49:55 +08:00
Laszlo Ersek
baf6ec49ff utils/tagged_integer: remove conversion to underlying integer
Silently converting a tagged (i.e., "dimension-ful") integer to a naked
("dimensionless") integer defeats the purpose of having tagged integers,
and is a source of practical bugs, such as
<https://github.com/scylladb/scylladb/issues/20080>.

We could make the conversion operator explicit, for enforcing

  static_cast<TAGGED_INTEGER_TYPE::value_type>(TAGGED_INTEGER_VALUE)

in every conversion location -- but that's a mouthful to write. Instead,
remove the conversion operator, and let clients call the (identically
behaving) value() member function.
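
The design can be illustrated outside C++ as well. The following Python
sketch is an analogy, not the code in utils/tagged_integer: `TaggedInt`,
`IndexT` and `TermT` are hypothetical classes that deliberately offer no
implicit conversion to a plain integer, only an explicit `value()`.

```python
class TaggedInt:
    """Integer with a unit-like tag; mixing it with plain ints is an error."""

    __slots__ = ("_v",)

    def __init__(self, v):
        self._v = int(v)

    def value(self):
        # The only way to get the underlying integer out.
        return self._v

    def __add__(self, other):
        if type(other) is not type(self):
            return NotImplemented  # leads to TypeError for mismatched types
        return type(self)(self._v + other._v)

    def __eq__(self, other):
        return type(other) is type(self) and self._v == other._v

    def __hash__(self):
        return hash((type(self), self._v))


class IndexT(TaggedInt):
    pass


class TermT(TaggedInt):
    pass
```

With these classes, `IndexT(3) + IndexT(4)` works, while `IndexT(3) + 1` and
`IndexT(3) + TermT(1)` raise `TypeError`. Code that really needs the number
has to say so by calling `value()`.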

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-15 02:12:58 +02:00
Laszlo Ersek
9aa7d232d6 test/raft/randomized_nemesis_test: clean up remaining index_t usage
With implicit conversion of tagged integers to untagged ones going away,
explicitly tag (or untag, as necessary) the operands of the following
operations, in "test/raft/randomized_nemesis_test.cc":

- addition of tagged and untagged (both should be tagged)

- taking the minimum of an index difference and a container size (both
  should be untagged)

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 22:54:42 +02:00
Laszlo Ersek
1af3460a81 test/raft/randomized_nemesis_test: clean up index_t usage in store_snapshot()
With implicit conversion of tagged integers to untagged ones going away,
unpack and clean up the relatively complex

  first_to_remain = max(snap.idx + 1 - preserve_log_entries, 0)

calculation in persistence::store_snapshot().

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 22:54:42 +02:00
Laszlo Ersek
4dc2faa49a test/raft/replication: clean up remaining index_t usage
With implicit conversion of tagged integers to untagged ones going away,
explicitly untag the operands / arguments of the following operations, in
"test/raft/replication.hh":

- assignment to raft_cluster::_seen

- call to hasher_int::hash_range()

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 22:54:42 +02:00
Laszlo Ersek
3a32f3de81 test/raft/replication: take an "index_t start_idx" in create_log()
raft_cluster::get_states() passes a "start_idx" to create_log(), and
create_log() uses it as an "index_t" object. Match the type of "start_idx"
to its name.

This patch is best viewed with "git show -W".

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 22:54:42 +02:00
Laszlo Ersek
08e117aeb5 test/raft/replication: untag index_t in test_case::get_first_val()
In test_case::get_first_val(), the assignment

  first_val = initial_snapshots[initial_leader].snap.idx;

*both* relies on implicit conversion of the tagged integer type "index_t"
to the underlying "uint64_t", *and* is a logic bug, as reported at
<https://github.com/scylladb/scylladb/issues/20151>.

For now, wean the buggy assignment off the disappearing
tagged-to-untagged conversion.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 22:54:42 +02:00
Laszlo Ersek
6254fca7f5 test/raft/etcd_test: tag index_t and term_t for comparisons and subtractions
Properly annotate index_t and term_t constants for use in
BOOST_CHECK_EQUAL() and BOOST_CHECK().

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 22:54:42 +02:00
Laszlo Ersek
bd4fc85bf0 test/raft/fsm_test: tag index_t and term_t for comparisons and subtractions
Properly annotate index_t and term_t constants for use in
BOOST_CHECK_EQUAL(), BOOST_CHECK(). Clean up the first args of
read_quorum() calls -- stay in term_t space.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 22:54:42 +02:00
Laszlo Ersek
265655473e test/raft/helpers: tighten compare_log_entries() param types
The "from" and "to" parameters of compare_log_entries() are raft log
indices; change them to raft::index_t, and update the callers.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 22:54:42 +02:00
Piotr Smaron
3e3858521d codeowners: add appropriate reviewers to the frontend components 2024-08-14 22:26:35 +02:00
Piotr Smaron
1b2e88b96a codeowners: fix codeowner names 2024-08-14 22:26:26 +02:00
Laszlo Ersek
5dcc627465 service/raft_sys_table_storage: tweak dead code
In raft_sys_table_storage::store_snapshot_descriptor(), the condition

  preserve_log_entries > snap.idx

*both* relies on implicit conversion of the tagged integer type "index_t"
to the underlying "uint64_t", *and* is a logic bug, as reported at
<https://github.com/scylladb/scylladb/issues/20080>.

Ticket#20080 explains that this condition always evaluates to false in
practice, and that the "else" branch handles all cases correctly anyway.
For now, wean the buggy expression off the disappearing
tagged-to-untagged conversion.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 21:35:34 +02:00
Andrei Chekun
3407ae5d8f [test.py] Add Junit logger for boost test
Currently, boost tests aren't using Junit. Enable Junit report output and remove skipped tests from the reports, since boost tests are executed by function name rather than filename. This allows including boost test results in the Allure report.

Related: https://github.com/scylladb/qa-tasks/issues/1665

Closes scylladb/scylladb#19925
2024-08-14 22:18:31 +03:00
Avi Kivity
6d6f93e4b5 Merge 'test/nodetool: enable running nodetool tests under test/nodetool' from Kefu Chai
before this change, we assume the user runs nodetool tests right under the root source directory. if the user runs them under `test/nodetool`, the suppression rules are not applied, as the path is incorrect in that case.

after this change, the suppression rules' path is deduced from the top src directory. so we can now run the nodetool tests under `test/nodetool`.

---

no need to backport, this change improves developer's experience.

Closes scylladb/scylladb#20119

* github.com:scylladb/scylladb:
  test/nodetool: deduce suppression path from top srcdir
  test/nodetool: deduce path from top srcdir
2024-08-14 22:10:38 +03:00
Michał Jadwiszczak
f7eb74e31f cql3/statements/create_service_level: forbid creating SL starting with $
Tenant names starting with `$` are reserved for internal ones.
Forbid creating a new service level whose name starts with `$`
and log a warning for existing service levels with `$` prefix.

Closes scylladb/scylladb#20122
2024-08-14 21:25:31 +03:00
Kefu Chai
5ce07e5d84 build: cmake: add compiler-training target
`tools/toolchain/optimized_clang.sh` builds this target for creating
the profile in order to build clang optimized with this profile data.

so let's be compatible with `configure.py`, and add this target to
CMake building system as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20105
2024-08-14 21:21:33 +03:00
Ernest Zaslavsky
f5f65ead1e Add .clang-format, also add CLion build folder to the .gitignore file
Closes scylladb/scylladb#20123
2024-08-14 21:20:29 +03:00
Pavel Emelyanov
66d72e010c distributed_loader: Lock table via global table ptr
The lock_table() method needs database, ks and cf to find the table on
all shards. The same can be achieved with the help of global_table_ptr
thing that all the core callers already have at hand.

There's a test that doesn't have global table, but it can get one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20139
2024-08-14 20:53:21 +03:00
Pavel Emelyanov
7e3e5cfcad sstable_directory: Simplify special-purpose local-only constructor
Typically the sstable_directory is constructed out of a table object.
Some code, namely tests and schema-loader, don't have table at hand and
construct directory out of schema, sharder, path-to-sstables, etc. This
code doesn't work with any storage options other than local ones, so
there's no need (yet) to carry this argument over.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#20138
2024-08-14 20:22:50 +03:00
Avi Kivity
28d3b91cce Merge 'test/perf/perf_sstables: use test_modes as the type of its option' from Kefu Chai
before this change, we look up the mode using the command line
option as the key, but that's incorrect if the command line option does
not match any of the known names. in that case, `test_mode` just
creates another pair of <sstring, test_modes>, and returns the second
component of this pair. and the second component is not what we expect.
we should have thrown an exception.
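
the pitfall has a close analogue in Python, sketched below. this is not the
perf_sstable code: `Mode`, `KNOWN`, `lenient`, `parse_mode` and the two mode
names are hypothetical, and `defaultdict` only stands in for a mutable map
that inserts a default value when an unknown key is looked up.

```python
from collections import defaultdict
from enum import Enum


class Mode(Enum):
    SEQUENTIAL_READ = 0   # value 0 plays the role of the "default" mode
    INDEX_READ = 1


# Hypothetical option names, chosen only for this sketch.
KNOWN = {"sequential-read": Mode.SEQUENTIAL_READ, "index-read": Mode.INDEX_READ}

# Lenient table: looking up an unknown key silently inserts a default entry.
lenient = defaultdict(lambda: Mode(0), KNOWN)


def parse_mode(name):
    """Strict lookup: reject names that are not in the table."""
    try:
        return KNOWN[name]
    except KeyError:
        raise ValueError(
            f"the argument ('{name}') for option '--mode' is invalid") from None
```

with the lenient table, `lenient["index-foo"]` quietly yields
`Mode.SEQUENTIAL_READ` and the run proceeds in the wrong mode. the strict
`parse_mode("index-foo")` raises an error instead.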

in this change

* the test_mode map is marked const.
* the overloads for parsing / formatting the `test_modes` type are
  added, so that boost::program_options can parse and format it.

after this change, we print a more user-friendly error, like

```
/scylla perf-sstable --mode index-foo
error: the argument ('index-foo') for option '--mode' is invalid

Try --help.
```

instead of a bunch of output which is printed as if we had passed a correct argument to the `--mode` option.

---

it's an improvement of developer experience, hence no need to backport.

Closes scylladb/scylladb#20140

* github.com:scylladb/scylladb:
  test/perf/perf_sstable: use switch-case when appropriate
  test/perf/perf_sstables: use test_modes as the type of its option
2024-08-14 20:18:22 +03:00
Piotr Smaron
31cb5b132b codeowners: remove non contributors 2024-08-14 18:52:25 +02:00
Avi Kivity
3de4e8f91b Merge 'cql: process LIMIT for GROUP BY select queries' from Paweł Zakrzewski
This change fixes #17237, fixes #5361 and fixes #5362 by passing the limit value down the call chain in cql3. A test is also added.

fixes #17237
fixes #5361
fixes #5362

The regression happened in 5.4 as we changed the way GROUP BY is processed in 432cb02 - to force aggregation when it is used. The LIMIT value was not passed to aggregations and thus we failed to adhere to it.
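
What it means for an aggregating GROUP BY query to adhere to LIMIT can be
sketched in Python as follows. This is an illustrative model only, assuming
that LIMIT caps the number of groups returned and that rows arrive already
ordered by the grouping key; `count_per_group` is a made-up helper, not cql3
code.

```python
from itertools import groupby, islice


def count_per_group(rows, key, limit=None):
    """Return (group, count) pairs, for at most `limit` groups.

    `rows` must already be ordered by `key`, because groupby() only merges
    adjacent rows with equal keys.
    """
    grouped = ((k, sum(1 for _ in g)) for k, g in groupby(rows, key=key))
    # islice() stops pulling groups once the limit is reached, so later
    # groups are never aggregated at all.
    return list(islice(grouped, limit))
```

If the limit is not passed down to this step (`limit=None`), every group is
aggregated and returned, which corresponds to the behaviour described above.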

We want to backport this fix to 5.4 and 6.0 to have continuous correct results for the test case from #17237

This patch consists of 4 commits:
- fa4225ea0fac2057b7a9976f57dc06bcbd900cd4 - cql3: respect the user-defined page size in aggregate queries - a precondition for this patch to be implementable
- 8fbe69e74dca16ed8832d9a90489ca47ba271d0b - cql3/select_statement: simplify the get_limit function - the `do_get_limit()` function did a lot of legwork that should not be associated with it. This change makes it trivial and makes its callers do additional checks (for unset guards, or for an aggregate query)
- 162828194a2b88c22fbee335894ff045dcc943c9 - cql3: process LIMIT for GROUP BY queries - pass the limit value down the chain and make use of it. This is the actual fix to #17237
- b3dc6de6d6cda8f5c09b01463bb52f827a6a00b4 - test/cql-pytest: Add test for GROUP BY queries with LIMIT - tests

Closes scylladb/scylladb#18842

* github.com:scylladb/scylladb:
  test/cql-pytest: Add test for GROUP BY queries with LIMIT
  cql3: process LIMIT for GROUP BY queries
  cql3/select_statement: simplify the get_limit function
  cql3: respect the user-defined page size in aggregate queries
2024-08-14 17:54:59 +03:00
Avi Kivity
8c257db283 Merge 'Native reverse pages over RPC' from Łukasz Paszkowski
Drop half-reversed (legacy) format of query::partition_slice.

The select query builds a fully reversed (native) slice for reversed queries and uses it together with a reversed
schema to construct a query::read_command that is further propagated to the database.

A cluster feature is added to support nodes that still operate on half-reversed slices. When the feature is turned off:
- query::read_command is transformed (to have table schema and half-reversed slices) before sending to other nodes
- query::read_command is transformed (to have query schema (reversed) and reversed slices) after receiving it from other nodes
- Similarly, mutations are transformed. They are reversed before being sent to other nodes or after receiving them from other nodes.

Additional manual tests were performed to test a mixed-node cluster:

1. 3-node cluster with one node upgraded: reverse read queries performed on an old node
2. 3-node cluster with one node upgraded: reverse read queries performed on a new node
3. 3-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on an old node
4. 3-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on a new node

All reverse read queries above consist of:

- single-partition reverse reads with no clustering key restrictions, with single column restrictions and multi column restrictions both with and without paging turned on
- multi-partition reverse reads with range restrictions with optional partition limit and partial ordering

The exact same tests were also performed on a fully upgraded cluster.

Fixes https://github.com/scylladb/scylladb/issues/12557

Closes scylladb/scylladb#18864

* github.com:scylladb/scylladb:
  mutation_partition: drop reverse parameter in compact_for_query
  clustering_key_filter: unify get_ranges and get_native_ranges
  streamed_mutation_freezer: drop the reverse parameter
  reverse-reads.md: Drop legacy reverse format information
  Fix comments refering to half-reversed (legacy) slices
  select_statement::do_execute: Add tracing informaction
  query::trim_clustering_row_ranges_to: require reversed schema for native reversed ranges
  query-request: Drop half_reverse_slice as it is no longer used anywhere
  readers: Use reversed schema and native reversed slices
  database: accept reversed schema for reversed queries
  storage_proxy: Support reverse queries in native format
  query_pagers: Replace _schema with _query_schema
  query_pagers: Support reverse queries in native format
  select_statement: Execute reversed query in native format
  storage_proxy::remote: Add support for mixed-node clusters
  mutation_query: Add reversed function to reverse reconcilable_result
  query-request: Add reversed function to reverse read_command
  features: add native_reverse_queries
  kl::reader::make_reader: Unify interface with mx::reader::make_reader
  config: drop reversed_reads_auto_bypass_cache
  config: drop enable_optimized_reversed_reads
2024-08-14 17:51:56 +03:00
Anna Stuchlik
99be8de71e doc: set 6.1 as the latest stable version
This commit updates the configuration for ScyllaDB documentation so that:
- 6.1 is the latest version.
- 6.1 is removed from the list of unstable versions.

It must be merged when ScyllaDB 6.1 is released.

No backport is required.

Closes scylladb/scylladb#20041
2024-08-14 13:43:17 +02:00
Laszlo Ersek
d87d1ae29d service/raft_sys_table_storage: simplify (snap.idx - preserve_log_entries)
With conversion of tagged integers to untagged ones going away, replace

  static_cast<uint64_t>(snap.idx)

with

  snap.idx.value()

Furthermore, casting "preserve_log_entries" (of type "size_t") to
"uint64_t" is redundant (both "snap.idx" and "preserve_log_entries" carry
nonnegative values, and the mathematical difference is expected to be
nonnegative); remove the cast.

Finally, simplify the initialization syntax.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
e781046739 service/raft_sys_table_storage: untag index_t and term_t for queries
With implicit conversion of tagged integers to untagged ones going away,
explicitly untag index_t and term_t values in the following two contexts:

- when they are passed to CQL queries as int64_t,

- when they are default-constructed as fallbacks for int64_t fields
  missing from CQL result sets.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
4f1f207be1 raft/server: clean up index_t usage
With implicit conversion of tagged integers to untagged ones going away,
explicitly tag (or untag, as necessary) the operands of the following
operations, in "raft/server.cc":

- addition of tagged and untagged (both should be tagged)

- subscripting an array by tagged (should be untagged)

- comparing a size-like threshold against tagged (should be untagged)

- exposing tagged via gauges (should be untagged)

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
1b134d52ac raft/tracker: don't drop out of index_t space for subtraction
Tagged integers support subtraction; use it.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
b6233209d9 raft/fsm: clean up index_t and term_t usage
With implicit conversion of tagged integers to untagged ones going away,
explicitly tag (or untag, as necessary) the operands of the following
operations, in "raft/fsm.cc":

- addition of tagged and untagged (both should be tagged)

- comparison (relop) between tagged and untagged (both should be tagged)

- subscripting or sizing an array by tagged (should be untagged)

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
5b9a4428c6 raft/log: clean up index_t usage
With implicit conversion of tagged integers to untagged ones going away,
explicitly tag (or untag, as necessary) the operands of the following
operations, in raft/log.{cc,h}:

- addition of tagged and untagged (both should be tagged)

- comparison (relop) between tagged and untagged (both should be tagged)

- subscripting an array, or offsetting an iterator, by tagged (should be
  untagged)

- comparing an array bound against tagged (should be untagged)

- subtracting tagged from an array bound (should be untagged)

Note: these files mix uniform initialization syntax (index_t{...}) with
constructor call syntax (index_t()), with the former being more frequent.
Stick with the former here too, for consistency.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
9e95f3a198 db/system_keyspace: promise a tagged integer from increment_and_get_generation()
Internally, increment_and_get_generation() produces a
"gms::generation_type" value.

In turn, all callers of increment_and_get_generation() -- namely
scylla_main() [main.cc] and single_node_cql_env::run_in_thread()
[test/lib/cql_test_env.cc] -- pass the resolved value to
storage_service::init_address_map() and storage_service::join_cluster(),
both of which take a "gms::generation_type".

Therefore it is pointless to "untag" the generation value temporarily
between the producer and the consumers. Correct the return type of
increment_and_get_generation().

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
baccbc09c5 gms/gossiper: return "strong_ordering" from compare_endpoint_startup()
The callers of gossiper::compare_endpoint_startup() need not (should not)
learn of any particular (tagged or untagged) difference of generations;
they only care about the ordering of generations. Change the return type
of compare_endpoint_startup() to "std::strong_ordering", and delegate the
comparison to tagged_tagged_integer::operator<=>.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
3bb608056c gms/gossiper: get "int32_t" value of "gms::version_type" explicitly
In do_sort(), we need to drop to "int32_t" temporarily, so that we can
call ::abs() on the version difference. Do that explicitly.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Michał Chojnowski
4d77faa61e cql_test_env: ensure shutdown() before stop() for system_keyspace
If system_keyspace::stop() is called before system_keyspace::shutdown(),
it will never finish, because the uncleared shared pointers will keep
it alive indefinitely.

Currently this can happen if an exception is thrown before the construction
of the shutdown() defer. This patch moves the shutdown() call to immediately
before stop(). I see no reason why it should be elsewhere.

Fixes scylladb/scylla-enterprise#4380

Closes scylladb/scylladb#20089
2024-08-14 12:16:44 +03:00
Kefu Chai
06c60f6abe test/perf/perf_sstable: use switch-case when appropriate
use switch-case instead of a chain of `if`-`else` blocks, as it's
visually easier to follow. and since we never need to handle the
`else` case, the `throw` statement is removed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-14 17:14:42 +08:00
Kefu Chai
5141c6efe0 test/perf/perf_sstables: use test_modes as the type of its option
before this change, we look up the mode using the command line
option as the key, but that's incorrect if the command line option does
not match any of the known names. in that case, `test_mode` just
creates another pair of <sstring, test_modes>, and returns the second
component of this pair. and the second component is not what we expect.
we should have thrown an exception.

in this change

* the test_mode map is marked const.
* the overloads for parsing / formatting the `test_modes` type are
  added, so that boost::program_options can parse and format it.

after this change,

* we can print more user friendly error, like

```
/scylla perf-sstable --mode index-foo
error: the argument ('index-foo') for option '--mode' is invalid

Try --help.
```

instead of a bunch of output which is printed as if we passed the
correct option as the argument of the `--mode` option.
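
a minimal sketch of the approach, assuming the option parser relies on stream extraction to convert the argument (the mode names, `parse_mode()` and the overload bodies below are illustrative, not the actual patch):

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical subset of the modes; the real enum has more entries.
enum class test_modes { sequential_read, index_read, index_write };

// A const map cannot be indexed with operator[], so an unknown name can
// no longer silently insert a default-constructed entry.
static const std::map<std::string, test_modes> test_mode = {
    {"sequential-read", test_modes::sequential_read},
    {"index-read", test_modes::index_read},
    {"index-write", test_modes::index_write},
};

// Parsing overload: an unknown name puts the stream into a failed state,
// which a parser built on stream extraction reports as an invalid argument.
std::istream& operator>>(std::istream& is, test_modes& mode) {
    std::string token;
    is >> token;
    if (auto it = test_mode.find(token); it != test_mode.end()) {
        mode = it->second;
    } else {
        is.setstate(std::ios_base::failbit);
    }
    return is;
}

// Formatting overload: print the name registered for the mode.
std::ostream& operator<<(std::ostream& os, test_modes mode) {
    for (const auto& [name, value] : test_mode) {
        if (value == mode) {
            return os << name;
        }
    }
    return os << "unknown";
}

// Helper used only to exercise the overloads in this sketch.
bool parse_mode(const std::string& text, test_modes& out) {
    std::istringstream in(text);
    return static_cast<bool>(in >> out);
}
```

here `parse_mode("index-foo", m)` returns false instead of yielding a bogus mode, which is the behaviour the error message above reflects.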

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-14 17:14:42 +08:00
Dawid Medrek
4ba9cb0036 README: Update the version of C++ to C++23
Scylla has started being built with C++23. We update
the information in the relevant documents accordingly.

Closes scylladb/scylladb#20134
2024-08-14 12:06:23 +03:00
Kamil Braun
a3d53bd224 Merge 'Prevent ALTERing non-existing KS with tablets' from Piotr Smaron
ALTER tablets KS executes in 2 steps:
1. ALTER KS's cql handler forms a global topo req, and saves data required to execute this req,
2. global topo req is executed by topo coordinator, which reads data attached to the req.

The KS name is among the data attached to the req. There's a time window between these steps where a to-be-altered KS could have been DROPped, which results in topo coordinator forever trying to ALTER a non-existing KS. In order to avoid it, the code has been changed to first check if a to-be-altered KS exists, and if it's not the case, it doesn't perform any schema/tablets mutations, but just removes the global topo req from the coordinator's queue.
BTW. just adding this extra check resulted in broader than expected changes, which is due to the fact that the code is written badly and needs to be refactored - an effort that's already planned under #19126
(I suggest to disable displaying whitespace differences when reviewing this PR).

Fixes: scylladb/scylladb#19576

Closes scylladb/scylladb#19666

* github.com:scylladb/scylladb:
  tests: ensure ALTER tablets KS doesn't crash if KS doesn't exist
  cql: refactor rf_change indentation
  Prevent ALTERing non-existing KS with tablets
2024-08-14 10:27:41 +02:00
Piotr Smaron
ddb5204929 tests: ensure ALTER tablets KS doesn't crash if KS doesn't exist
Using the error injection framework, we inject a sleep into the
processing path of ALTER tablets KS, so that the topology coordinator of
the leader node
sleeps after the rf_change event has been scheduled, but before it is
started to be executed. During that time the second node executes a DROP
KS statement, which is propagated to the leader node. Once leader node
wakes up and resumes processing of ALTER tablets KS, the KS won't exist
and the node must not crash, which was the case before.
2024-08-13 21:51:51 +02:00
Pavel Emelyanov
05adee4c82 test: Add test for s3::client::bucket_lister
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-13 21:15:43 +03:00
Pavel Emelyanov
a02e65c649 s3_client: Add bucket lister
The lister resembles the directory_lister from util -- it returns
entries upon its .get() invocation, and should be .close()d at the end.

Internally the lister issues a ListObjectsV2 request with the provided prefix
and caps the number of entries the server returns, so as not to consume
too much local memory (we don't have a streaming XML parser for the response).
If the result is indeed truncated, the subsequent calls include the
continuation token as per [1]

[1] https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html
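
The paging loop described above can be sketched as follows. This is an illustration under assumptions, not the s3::client code: `list_page`, `fetch_page_fn` and `bucket_lister_sketch` are hypothetical names, and the HTTP request plus XML decoding are hidden behind a callback.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One decoded page of a listing.
struct list_page {
    std::vector<std::string> keys;
    std::optional<std::string> next_token;  // set only when the result is truncated
};

// Hypothetical transport hook: (prefix, max keys, continuation token) -> page.
using fetch_page_fn = std::function<list_page(
    const std::string&, std::size_t, const std::optional<std::string>&)>;

class bucket_lister_sketch {
    fetch_page_fn _fetch;
    std::string _prefix;
    std::size_t _max_keys;
    std::vector<std::string> _buffer;
    std::size_t _pos = 0;
    std::optional<std::string> _token;
    bool _exhausted = false;
public:
    bucket_lister_sketch(fetch_page_fn fetch, std::string prefix, std::size_t max_keys)
        : _fetch(std::move(fetch)), _prefix(std::move(prefix)), _max_keys(max_keys) {}

    // Returns the next key, or nullopt once the listing is finished.
    std::optional<std::string> get() {
        while (_pos == _buffer.size()) {
            if (_exhausted) {
                return std::nullopt;
            }
            // At most _max_keys entries are held in memory at a time.
            list_page page = _fetch(_prefix, _max_keys, _token);
            _buffer = std::move(page.keys);
            _pos = 0;
            _token = std::move(page.next_token);
            _exhausted = !_token.has_value();
        }
        return _buffer[_pos++];
    }
};
```

Bounding the page size trades extra round trips for a fixed memory footprint, which matters when the whole response has to be parsed in one piece.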

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-13 21:15:43 +03:00
Avi Kivity
d82fd8b5f0 Merge 'Relax sstable_directory::process_descriptor() call graph' from Pavel Emelyanov
The method logic is clean and simple -- load sstable from the descriptor and sort it into one of collections (local, shared, remote, unsorted). To achieve that there's a bunch of helper methods, but they duplicate functionality of each other. Squashing most of this code into process_descriptor() makes it easier to read and keeps sstable_directory private API much shorter.

Closes scylladb/scylladb#20126

* github.com:scylladb/scylladb:
  sstable_directory: Open-code load_sstable() into process_descriptor()
  sstable_directory: Squash sort_sstable() with process_descriptor()
  sstable_directory: Remove unused sstable_filename(desc) helper
  sstable_directory: Log sst->get_filename(), not sstable_filename(desc)
  sstable_directory: Keep loaded sst in local var
  sstable_directory: Remove unused helpers
  sstable_directory: Load sstable once when sorting
2024-08-13 16:42:52 +03:00
Pavel Emelyanov
d3870304a9 sstable_directory: Open-code load_sstable() into process_descriptor()
There are two load_sstable() overloads, and one of them is only used
inside process_descriptor(). What this loading helper does is, in fact,
process the given descriptor, so it's worth having it open-coded into its
caller.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-13 13:27:00 +03:00
Pavel Emelyanov
da4a5df339 sstable_directory: Squash sort_sstable() with process_descriptor()
The latter (caller) loads sstable, so does the former, so load it once
and then put it in either list/set, depending on flags and shard info.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-13 13:26:10 +03:00
Pavel Emelyanov
d8cb175fb7 sstable_directory: Remove unused sstable_filename(desc) helper
It's unused after previous patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-13 12:55:40 +03:00
Pavel Emelyanov
aa40aeb72f sstable_directory: Log sst->get_filename(), not sstable_filename(desc)
There are some places that log sstable Data file name via sstable
descriptor. After previous patching all those loggers have sstable at
hand and can use sstable::get_filename() instead.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-13 12:55:40 +03:00
Pavel Emelyanov
369f9111b8 sstable_directory: Keep loaded sst in local var
This will make next patch shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-13 12:55:40 +03:00
Pavel Emelyanov
ad3725fbbd sstable_directory: Remove unused helpers
After previous patch some wrappers around load_sstable() became unused.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-13 12:55:40 +03:00
Pavel Emelyanov
63f1969e08 sstable_directory: Load sstable once when sorting
In order to decide which list to put sstable into, the sort_sstable()
first calls get_shards_for_this_sstable() which loads the sstable
anyway. If loaded shards contain only the current one (which is the
common case) sstable is loaded again. In fact, if the sstable happens to
be remote it's loaded anyway to get its open info.

Fix that by loading sstable, then getting shards directly from it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-13 12:55:16 +03:00
Łukasz Paszkowski
ba2f037af5 mutation_partition: drop reverse parameter in compact_for_query
The reverse parameter is no longer used with native reverse reads.
The row ranges are provided in native reverse order together with
a reversed schema, thus the reverse parameter remains false all the
time and can be dropped.
2024-08-13 10:07:12 +02:00
Łukasz Paszkowski
43221bbeed clustering_key_filter: unify get_ranges and get_native_ranges
When a reverse slice is provided, it is given in the native reverse
format. Thus the ranges will be returned in the same order as stored
in the slice.

Therefore there is no need to distinguish between get_ranges and
get_native_ranges. The latter one gets dropped and get_ranges returns
ranges in the same order as stored in the slice.
2024-08-13 10:07:12 +02:00
Łukasz Paszkowski
8b5ec0e963 streamed_mutation_freezer: drop the reverse parameter
The reverse parameter is no longer used with native reverse reads.
A reversed schema is provided and thus the reverse parameter shall
remain false all the time.
2024-08-13 10:07:12 +02:00
Łukasz Paszkowski
f4ca734ccb reverse-reads.md: Drop legacy reverse format information 2024-08-13 10:07:12 +02:00
Łukasz Paszkowski
b3bf555036 Fix comments refering to half-reversed (legacy) slices 2024-08-13 10:07:12 +02:00
Łukasz Paszkowski
15a01c7111 select_statement::do_execute: Add tracing informaction
Add information on table and query schema versions to tracing.
2024-08-13 10:07:12 +02:00
Łukasz Paszkowski
158b994676 query::trim_clustering_row_ranges_to: require reversed schema for native reversed ranges
Simplify the implementation and, for clustering key ranges in native
reversed format, require a reversed table schema.

Trimming native reversed clustering key ranges requires a reversed
schema to be passed in. Thus, the reverse flag is no longer required
as it would always be set to false.
2024-08-13 10:07:10 +02:00
Łukasz Paszkowski
8d95d44027 query-request: Drop half_reverse_slice as it is no longer used anywhere 2024-08-13 10:03:46 +02:00
Łukasz Paszkowski
da95f44adc readers: Use reversed schema and native reversed slices
The reconcilable_result is built as it would be constructed for
forward read queries for tables with reversed order.

Mutations constructed for reversed queries are consumed forward.

Drop overloaded reversed functions that reverse read_command and
reconcilable_result directly and keep only those requiring smart
pointers. They are not used any more.
2024-08-13 10:03:46 +02:00
Łukasz Paszkowski
faa62310d9 database: accept reversed schema for reversed queries
Remove schema reversing in query() and query_mutations() methods.
Instead, a reversed schema shall be passed for reversed queries.
Rename a schema variable from s into query_schema for readability.
2024-08-13 10:03:46 +02:00
Łukasz Paszkowski
df734e35a1 storage_proxy: Support reverse queries in native format
For reversed queries, query_result() method accepts a reversed table
schema and read_command with a query schema version and a slice in
native reversed format.

Support mixed-node clusters. In such a case, the feature flag
native_reverse_queries is disabled and the read_command is sent
to replicas in the old legacy format (stores table schema version
and a slice in the legacy reverse format).

After the reconciliation, for the read+repair case, un-reversed
mutations are sent to replicas, i.e. forward ones.
2024-08-13 10:03:46 +02:00
Łukasz Paszkowski
d9e76a5295 query_pagers: Replace _schema with _query_schema
For readability purposes. As the constructor accepts a query schema,
let the variable holding a schema be called _query_schema.
2024-08-13 10:03:46 +02:00
Łukasz Paszkowski
0b2e5ff28f query_pagers: Support reverse queries in native format
For reversed queries, accept a reversed table schema and read_command
with a query schema version and a slice in native reversed format.
2024-08-13 10:03:46 +02:00
Łukasz Paszkowski
309ba68692 select_statement: Execute reversed query in native format
Use a reversed schema and a native reversed slice when constructing
a read_command and executing a reversed select statement.

Such a created read_command is passed further down to query_pagers::pager
and storage::proxy::query_result that transform it to the format
they accept/know, i.e. legacy.
2024-08-13 10:03:46 +02:00
Łukasz Paszkowski
8c391a8ebe storage_proxy::remote: Add support for mixed-node clusters
In handle_read, detect whether a coming read_command is in the
legacy reversed format or native reversed format.

The result will be used to transform the read_command between formats
as well as to transform the results before they are sent back to
the coordinator.
2024-08-13 10:03:46 +02:00
Łukasz Paszkowski
fbd324b5cd mutation_query: Add reversed function to reverse reconcilable_result
The reconcilable_result is reversed by reversing mutations for all
partitions it holds. Reversing is asynchronous to avoid potential
stalls.

Used for transitions between legacy and native formats and in order
to support mixed-node clusters.
2024-08-13 10:03:46 +02:00
Łukasz Paszkowski
b91edbacf1 query-request: Add reversed function to reverse read_command
The read_command is reversed by reversing the schema version it
holds and transforming a slice from the legacy reversed format to
the native reversed format.

Used for transitions between formats and to support mixed-node clusters.
2024-08-13 10:03:46 +02:00
Łukasz Paszkowski
9690785112 features: add native_reverse_queries
Enabled when all replicas support the native_reversed command slice
and return the result in reverse order in this case.
2024-08-13 10:03:42 +02:00
Łukasz Paszkowski
7b201e9165 kl::reader::make_reader: Unify interface with mx::reader::make_reader
Ensure both readers have the same interfaces to avoid mistakes as
both readers are used in sstable::make_reader. Less error prone.
2024-08-13 10:02:43 +02:00
Łukasz Paszkowski
b270097f1f config: drop reversed_reads_auto_bypass_cache
Reverse reads have already been with us for a while, thus this back
door option to bypass in-memory data cache for reversed queries can
be retired.
2024-08-13 10:02:42 +02:00
Łukasz Paszkowski
80df313f49 config: drop enable_optimized_reversed_reads
Reverse reads have already been with us for a while, thus this back
door option to read entire partitions forward and reverse them afterwards
can be retired.
2024-08-13 10:02:42 +02:00
Pavel Emelyanov
6675bd8a5c s3_client: Encode query parameter value for query-string
When signing an AWS query one needs to prepare a "query string", which is a
line looking like `encode(query_param)=encode(query_value)&...`. Encoded
are only the query parameter names and values. It was missing in current
code and so far worked because no encodable characters were used.
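
A minimal sketch of building such a query string, assuming the usual AWS Signature Version 4 rules (unreserved characters are left alone, everything else becomes `%XY` with uppercase hex digits, pairs are sorted by name). The function names are hypothetical and this is not the s3_client code.

```cpp
#include <algorithm>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Percent-encode everything except the unreserved characters
// A-Z a-z 0-9 '-' '_' '.' '~', using uppercase hex digits.
std::string uri_encode(std::string_view in) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                                (c >= '0' && c <= '9') ||
                                c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back('%');
            out.push_back(hex[c >> 4]);
            out.push_back(hex[c & 0x0F]);
        }
    }
    return out;
}

// Build "name=value&name=value" from encoded pairs, sorted by encoded name.
std::string canonical_query_string(
        const std::vector<std::pair<std::string, std::string>>& params) {
    std::vector<std::pair<std::string, std::string>> encoded;
    for (const auto& [name, value] : params) {
        encoded.emplace_back(uri_encode(name), uri_encode(value));
    }
    std::sort(encoded.begin(), encoded.end());
    std::string out;
    for (const auto& [name, value] : encoded) {
        if (!out.empty()) {
            out.push_back('&');
        }
        out += name + "=" + value;
    }
    return out;
}
```

A prefix such as `dir/a b` contains two characters that need encoding (`/` and the space), which is the kind of value that would break the signature if passed through unencoded.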

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-13 10:59:31 +03:00
Raphael S. Carvalho
74612ad358 tablets: Fix race between repair and split
Consider the following:

T
0   split prepare starts
1                               repair starts
2   split prepare finishes
3                               repair adds unsplit sstables
4                               repair ends
5   split executes

If repair produces sstable after split prepare phase, the replica
will not split that sstable later, as prepare phase is considered
completed already. That causes split execution to fail as replicas
weren't really prepared. This also can be triggered with
load-and-stream which shares the same write (consumer) path.

The approach to fix this is the same employed to prevent a race
between split and migration. If migration happens during prepare
phase, it can happen that the source misses the split request, but the
tablet will still be split on the destination (if needed).
Similarly, the repair writer becomes responsible for splitting
the data if underlying table is in split mode. That's implemented
in replica::table for correctness, so if node crashes, the new
sstable missing split is still split before added to the set.

Fixes #19378.
Fixes #19416.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-08-12 17:28:51 -03:00
Raphael S. Carvalho
239344ab55 compaction: Allow "offline" sstable to be split
In order to fix the race between split and repair, we must introduce
the ability to split an "offline" sstable, one that wasn't added
to any of the table's sstable set yet.

It's not safe to split a sstable after adding it to the set, because
a failure to split can result in unsplit data left in the set, causing
split to fail down the road, since the coordinator thinks this replica
has only split data in the set.

Refs #19378.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-08-12 17:27:16 -03:00
Laszlo Ersek
607abe96e8 test/sstable: merge test_using_reusable_sst*()
All lambdas passed to test_using_reusable_sst() conform to the prototype

  void (test_env&, sstable_ptr)

All lambdas passed to test_using_reusable_sst_returning() conform to the
prototype

  NON_VOID (test_env&, sstable_ptr)

The common parameter list of both prototypes can be expressed with the
concept

  std::invocable<test_env&, sstable_ptr>

Once a "Func" template parameter (i.e., function type) satisfying this
concept is taken, then "Func"'s void or non-void return type can be
commonly expressed with

  std::invoke_result_t<Func, test_env&, sstable_ptr>

In turn, test_env::do_with_async_returning<...> can be instantiated with
this return type, even if it happens to be "void".

([stmt.return] specifies, "[a] return statement with an operand of type
void shall be used only in a function that has a cv void return type",
meaning that

  return func(env)

will do the right thing in the body of
test_env::do_with_async_returning<void>().)

Merge test_using_reusable_sst() and test_using_reusable_sst_returning()
into one. Preserve the function name from the former, and the
test_env::do_with_async_returning<...>() call from the latter.

Suggested-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>

Closes scylladb/scylladb#20090
2024-08-12 17:52:01 +03:00
Kefu Chai
db4654ca49 test/nodetool: deduce subpression path from top srcdir
a developer may launch `pytest` right under
`test/nodetool`, in that case the current working directory is not
the root directory of the project, so the path to the suppression rules
does not point to a file.

to cater to the need to run the test under `test/nodetool`, let's
use the path deduced from the top_srcdir.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-12 22:50:18 +08:00
Kefu Chai
c817e13d63 test/nodetool: deduce path from top srcdir
add a helper to get path from top src dir, more readable this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-08-12 22:50:18 +08:00
Nikos Dragazis
90363ce802 test: Test the SSTable validation API against malformed SSTables
Unit testing for the SSTable validation API happens in
`sstable_validate_test`. Currently, this test checks the API against
some invalid SSTables with out-of-order clustering rows and out-of-order
partitions. However, both are types of content-level corruption that do
not trigger `malformed_sstable_exception` errors.

Extend the test to cover cases of file-level corruption as well, i.e.,
cases that would raise a `malformed_sstable_exception`. Construct an
SSTable with an invalid checksum to trigger this.

This is part of the effort to improve scrub to handle all kinds of
corruption.

Fixes scylladb/scylladb#19057

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#20096
2024-08-12 15:09:58 +03:00
Botond Dénes
fec57c83e6 Merge 'cell_locker: maybe_rehash: ignore allocation failures' from Benny Halevy
`maybe_rehash` is complementary and is not strictly required to succeed. If it fails, it will retry on the next call, but there's no reason to throw an exception that will fail its caller, since `maybe_rehash` is called as the final step after the caller has already succeeded with its action.
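
A minimal sketch of the pattern, using a hypothetical `lock_table_sketch` class rather than the real cell_locker (the `fail_next_alloc` flag is a test hook standing in for a genuine allocation failure):

```cpp
#include <cstddef>
#include <new>
#include <vector>

class lock_table_sketch {
    std::vector<int> _buckets = std::vector<int>(8);
    std::size_t _size = 0;
public:
    // Test hook standing in for a real allocation failure.
    bool fail_next_alloc = false;

    std::size_t bucket_count() const { return _buckets.size(); }

    void insert() {
        ++_size;         // the caller's real work has succeeded at this point
        maybe_rehash();  // best-effort tuning; must not fail the caller
    }

    void maybe_rehash() noexcept {
        if (_size <= _buckets.size()) {
            return;
        }
        try {
            if (fail_next_alloc) {
                fail_next_alloc = false;
                throw std::bad_alloc();
            }
            std::vector<int> bigger(_buckets.size() * 2);
            _buckets.swap(bigger);
        } catch (const std::bad_alloc&) {
            // Ignore: the table keeps its old size and the next call retries.
        }
    }
};
```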

Minor enhancement for the error path, no backport required.

Closes scylladb/scylladb#19910

* github.com:scylladb/scylladb:
  cell_locker: maybe_rehash: reindent
  cell_locker: maybe_rehash: ignore allocation failures
2024-08-12 10:54:56 +03:00
Kefu Chai
0ae04ee819 build: cmake: use $<CONFIG:cfgs> when appropriate
per
https://cmake.org/cmake/help/latest/manual/cmake-generator-expressions.7.html#genex:CONFIG,
`cfgs` can be a comma-separated list. this is supported by CMake
3.19 and up, and our minimum required CMake version is 3.27. so let's
switch over from the composition of `IN_LIST` and `CONFIG` generator
expressions to a single one. simpler this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20110
2024-08-11 21:28:38 +03:00
Avi Kivity
318278ff92 Merge 'tablets: reload only changed metadata' from Botond Dénes
Currently, each change to tablet metadata triggers a full metadata reload from disk. This is very wasteful, especially if the metadata change affects only a single row in the `system.tablets` table. This is the case when the tablet load balancer triggers a migration, this will affect a single row in the table, but today will trigger a full reload.
We expect tablet count to potentially grow to thousands and beyond and the overhead of this full reload can become significant.
This PR makes tablet metadata reload partial, instead of reloading all metadata on topology or schema changes, reload only the partitions that are affected by the change. Copy the rest from the in-memory state.
This is done with two passes: first the change mutations are scanned and a hint is produced. This hint is then passed down to the reload code, which will use it to only reload the parts (rows/partitions) of the metadata that have actually changed.
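
A simplified sketch of the second pass, tracking changes per table only. All types and names here (`change_hint`, `reload_partial`, the `read_table_from_disk` callback) are hypothetical simplifications and not the interfaces added by this PR:

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>

using table_id = std::string;           // hypothetical simplification
using tablet_map = std::map<int, int>;  // tablet index -> owning shard
using tablet_metadata = std::map<table_id, tablet_map>;

// Output of the first pass: which tables the change mutations touched.
struct change_hint {
    std::set<table_id> tables;
};

// Second pass: start from the in-memory state and re-read only the
// hinted tables; everything else is copied as-is.
tablet_metadata reload_partial(
        const tablet_metadata& current,
        const change_hint& hint,
        const std::function<tablet_map(const table_id&)>& read_table_from_disk) {
    tablet_metadata updated = current;
    for (const auto& id : hint.tables) {
        updated[id] = read_table_from_disk(id);
    }
    return updated;
}
```

The cost of a reload then scales with the size of the hint rather than with the total number of tablets.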

The performance difference between full reload and partial reload is quite drastic:
```
INFO  2024-07-25 05:06:27,347 [shard 0:stat] testlog - Tablet metadata reload:
full      616.39ms
partial     0.18ms
```
This was measured with the modified (by this PR) `perf_tablets`, which creates 100 tables, each with 2K tablets. The test was modified to change a single tablet, then do a full and partial reload respectively, measuring the time it takes for each.

Fixes: #15294

New feature, no backport needed.

Closes scylladb/scylladb#15541

* github.com:scylladb/scylladb:
  test/perf/perf_tablets: add tablet metadata reload perf measurement
  test/boost/tablets_test: add test for partial tablet metadata updates
  db/schema_tables: pass tablet hint to update_tablet_metadata()
  service/storage_service: load_tablet_metadata(): add hint parameter
  service/migration_listener: update_tablet_metadata(): add hint parameter
  service/raft/group0_state_machine: provide tablet change hint on topology change
  service/storage_service: topology_state_load(): allow providing change hint
  replica/tablets: add update_tablet_metadata()
  replica/tablets: fix indentation
  replica/tablets: extract tablet_metadata builder logic
  replica/tablets: add get_tablet_metadata_change_hint() and update_tablet_metadata_change_hint()
  locator/tablets: add tablet_map::clear_tablet_transition_info()
  locator/tablets: make tablet_metadata cheap to copy
  mutation/canonical_mutation: add key()
2024-08-11 21:27:18 +03:00
Botond Dénes
2b2db510b7 test/perf/perf_tablets: add tablet metadata reload perf measurement
Measure reload perf of full reload vs. partial reload, after changing a
single tablet.
While at it, modify the `--tablets-per-table` parameter, so that it has
a default value which works OOTB. The previous default was both too
large (causing oversized commitlog entry errors) and not a power of two.
2024-08-11 09:53:19 -04:00
Botond Dénes
65eee200b2 test/boost/tablets_test: add test for partial tablet metadata updates 2024-08-11 09:53:19 -04:00
Botond Dénes
b886ed44a7 db/schema_tables: pass tablet hint to update_tablet_metadata()
Replace the has_tablet_mutations in `merge_tables_and_views()` with a
hint parameter, which is calculated in the caller, from the original
schema change mutations. This hint is then forwarded to the notifier's
`update_tablet_metadata()` so that subscribers can refresh only the
tablet partitions that changed.
2024-08-11 09:53:19 -04:00
Botond Dénes
5bff422b54 service/storage_service: load_tablet_metadata(): add hint parameter
Allowing for reloading only those parts of the tablet metadata that were
actually changed.
2024-08-11 09:53:19 -04:00
Botond Dénes
2cec0d8dd1 service/migration_listener: update_tablet_metadata(): add hint parameter
The hint contains information related to what exactly changed, allowing
listeners to do partial updates, instead of reloading all metadata on
each notification.
2024-08-11 09:53:19 -04:00
Botond Dénes
ca302d9e28 service/raft/group0_state_machine: provide tablet change hint on topology change
So that when reloading tablet state metadata from the disk, only the
changed parts are reloaded.
2024-08-11 09:53:19 -04:00
Botond Dénes
806ec3244a service/storage_service: topology_state_load(): allow providing change hint
So that when reloading state from disk, only changed parts are reloaded
instead of all. For now, only tablets have hints implemented.
2024-08-11 09:53:18 -04:00
Botond Dénes
bb1e733fe0 replica/tablets: add update_tablet_metadata()
Allows updating tablet metadata in-place, according to the provided
hint, reading and updating only the parts that actually changed.
2024-08-11 09:52:37 -04:00
Botond Dénes
66292b4baa replica/tablets: fix indentation
Left broken from the previous patch.
2024-08-11 09:52:37 -04:00
Botond Dénes
aa378c458e replica/tablets: extract tablet_metadata builder logic
So it can be reused in a new method.
Indentation is left broken deliberately, to make the patch easier to
read.
2024-08-11 09:52:37 -04:00
Botond Dénes
f5976aa87b replica/tablets: add get_tablet_metadata_change_hint() and update_tablet_metadata_change_hint()
Extract a hint of what a tablet mutation changed. The hint can be later
used to selectively reload only the changed parts from disk.
Two variants are added:
* get_tablet_metadata_change_hint() - extracts a hint from a list of
  tablet mutations
* update_tablet_metadata_change_hint() - updates an existing hint based
  on a single mutation, allowing for incremental hint extraction
2024-08-11 09:52:37 -04:00
Botond Dénes
54ea71f8a6 locator/tablets: add tablet_map::clear_tablet_transition_info() 2024-08-11 09:52:37 -04:00
Botond Dénes
0254cfc7d3 locator/tablets: make tablet_metadata cheap to copy
Keep lw_shared_ptr<tablet_map> in the tablet map and use COW semantics.
To prevent accidental changes to shared tablet_map instances, all
modifications to a tablet_map have to go through a new
`mutate_tablet_map()` method, which implements the copy-modify-swap
idiom.
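
A minimal sketch of the copy-modify-swap idiom, assuming `std::shared_ptr` in place of Seastar's `lw_shared_ptr` and a deliberately tiny `tablet_map`. The member names other than `mutate_tablet_map()` are hypothetical and the real class is considerably richer.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified type; the real tablet_map holds much more state.
struct tablet_map {
    std::vector<std::string> replicas;
};

class tablet_metadata {
    // Pointers to const: shared instances cannot be modified by accident.
    std::map<std::string, std::shared_ptr<const tablet_map>> _tablets;
public:
    void set_tablet_map(const std::string& table, tablet_map map) {
        _tablets[table] = std::make_shared<const tablet_map>(std::move(map));
    }

    const tablet_map& get_tablet_map(const std::string& table) const {
        return *_tablets.at(table);
    }

    // Copy-modify-swap: other tablet_metadata copies keep the old instance.
    template <typename Func>
    void mutate_tablet_map(const std::string& table, Func&& func) {
        auto& slot = _tablets.at(table);
        assert(slot != nullptr);
        auto copy = std::make_shared<tablet_map>(*slot);  // copy
        func(*copy);                                       // modify
        slot = std::move(copy);                            // swap
    }
};
```

Copying a `tablet_metadata` in this sketch copies only one pointer per table, and a later mutation pays for a deep copy of just the table it touches.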
2024-08-11 09:52:37 -04:00
Botond Dénes
fb0ab3c1fb mutation/canonical_mutation: add key()
Extracts the partition key without deserializing the entire mutation.
2024-08-11 09:52:37 -04:00
Calle Wilund
e18a855abe extensions: Add exception types for IO extensions and handle in memtable write path
Fixes #19960

Write path for sstables/commitlog need to handle the fact that IO extensions can
generate errors, some of which should be considered retry-able, and some that should,
similar to system IO errors, cause the node to go into isolate mode.

One option would of course be for extensions to simply generate std::system_errors,
with system_category and appropriate codes. But this is probably a bad idea, since
it makes it more muddy at which level an error happened, as well as limits the
expressibility of the error.

This adds three distinct types (sharing base) distinguishing permission, availability
and configuration errors. These are treated akin to EACCES, ENOENT and EINVAL in
disk error handler and memtable write loop.

Tests updated to use and verify behaviour.

Closes scylladb/scylladb#19961
2024-08-11 13:52:35 +03:00
Raphael S. Carvalho
75829d75ec replica: Fix race between split compaction and migration
After removal of rwlock (53a6ec05ed), the race was introduced because the order that
compaction groups of a tablet are closed, is no longer deterministic.

Some background first:
Split compaction runs in main (unsplit) group, and adds sstable to left and right groups
on completion.

The race works as follows:
1) split compaction starts on main group of tablet X
2) tablet X reaches cleanup stage, so its compaction groups are closed in parallel
3) left or right group is closed before main (more likely when only main has flush work to do)
4) split compaction completes, and adds sstable to left and right
5) if e.g left is closed, adjusting backlog tracker will trigger an exception, and since that
happens in row cache update's execute(), node crashes.

The problem manifested as follows:
[shard 0: gms] raft_topology - Initiating tablet cleanup of 5739b9b0-49d4-11ef-828f-770894013415:15 on 102a904a-0b15-4661-ba3f-f9085a5ad03c:0
...
[shard 0:strm] compaction - [Split keyspace1.standard1 009e2f80-49e5-11ef-85e3-7161200fb137] Splitting [/var/lib/scylla/data/keyspace1/...]
...
[shard 0:strm] cache - Fatal error during cache update: std::out_of_range (Compaction state for table [0x600007772740] not found),
at: ...
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, row_cache::do_update(...
   --------
   seastar::internal::do_with_state<std::tuple<row_cache::external_updater, std::function<seastar::future<void> ()> >, seastar::future<void> >
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::(anonymous namespace)::thread_wake_task
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::async<sstables::compaction::run(...
   seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::future<sstables::compaction_resu...

From the log above, it can be seen cache update failure happens under streaming sched group and
during compaction completion, which was good evidence to the cause.
Problem was reproduced locally with the help of tablet shuffling.

Fixes: #19873.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#19987
2024-08-11 11:00:19 +03:00
Botond Dénes
1f4b9a5300 Merge 'compaction: drop compaction executors' possibility to bypass task manager' from Aleksandra Martyniuk
If parent_info argument of compaction_manager::perform_compaction
is std::nullopt, then created compaction executor isn't tracked by task
manager. Currently, all compaction operations should be visible in task
manager.

Modify split methods to keep split executor in task manager. Get rid of
the option to bypass task manager.

Closes scylladb/scylladb#19995

* github.com:scylladb/scylladb:
  compaction: replace optional<task_info> with task_info param
  compaction: keep split executor in task manager
2024-08-11 10:26:43 +03:00
Botond Dénes
0bb1075a19 Merge 'tasks: fix task handler' from Aleksandra Martyniuk
There are some bugs missed in task handler:
- wait_for_task does not wait until virtual tasks are done, but returns the status immediately;
- wait_for_task suffers from use after return;
- get_status_recursively does not set the kind of task essentials.

Fix the aforementioned.

Closes scylladb/scylladb#19930

* github.com:scylladb/scylladb:
  test: add test to check that task handler is fixed
  tasks: fix task handler
2024-08-11 10:23:17 +03:00
Paweł Zakrzewski
9db272c949 test/cql-pytest: Add test for GROUP BY queries with LIMIT
Remove xfail from all tests for #5361, as the issue is fixed.

Remove xfail from test_group_by_clustering_prefix_with_limit
It references #5362, but is fixed by #17237.

Refs #17237
2024-08-11 09:08:44 +02:00
Paweł Zakrzewski
e7ae7f3662 cql3: process LIMIT for GROUP BY queries
Currently LIMIT is not passed to the query executor at all and it was just
an accident that it worked for the case referenced in #17237. This
change passes the limit value down the chain.
2024-08-11 09:08:43 +02:00
Paweł Zakrzewski
3838ad64b3 cql3/select_statement: simplify the get_limit function
The get_limit() function performed tasks outside of its scope - for
example checked if the statement was an aggregate. This change moves the
onus of the check to the caller.
2024-08-11 09:08:43 +02:00
Paweł Zakrzewski
08f3219cb8 cql3: respect the user-defined page size in aggregate queries
The comment in the code already states that we should use the
user-defined page size if it's provided. To avoid OOM conditions we'll
use the internally defined limit as the upper bound or if no page size
is provided.
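
The rule described above can be written as a small helper. This is a sketch only: `effective_page_size()` is a hypothetical name, and treating a non-positive page size as "not provided" is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper, not the function changed by this commit.
// The internal limit acts as an upper bound and as the fallback value.
int32_t effective_page_size(std::optional<int32_t> user_page_size,
                            int32_t internal_limit) {
    assert(internal_limit > 0);
    // Assumption: a missing or non-positive page size means "not provided".
    if (!user_page_size || *user_page_size <= 0) {
        return internal_limit;
    }
    return std::min(*user_page_size, internal_limit);
}
```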

This change lays ground work for fixing #5362 and is necessary to pass
the test introduced in #19392 once it is implemented.
2024-08-11 09:08:43 +02:00
Michał Jadwiszczak
3745d0a534 gms/feature_service: allow to suppress features
This patch adds `suppress_features` error injection. It allows to revoke
support for some features and it can be used to simulate upgrade process
in test.py.

Features to suppress are passed as injection's value, separated by `;`.
Example: `PARALLELIZED_AGGREGATION;UDA_NATIVE_PARALLELIZED_AGGREGATION`
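
As an illustration of the value format, a `;`-separated list like the one above could be parsed as follows. `parse_suppressed_features()` is a hypothetical helper written with the standard library only. It is not the code added by this patch, and skipping empty tokens is an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <string_view>

// Hypothetical helper: splits the injection value on ';' into feature names.
std::set<std::string> parse_suppressed_features(std::string_view value) {
    std::set<std::string> features;
    while (!value.empty()) {
        const auto pos = value.find(';');
        const auto token = value.substr(0, pos);
        if (!token.empty()) {  // assumption: empty tokens are ignored
            features.emplace(token);
        }
        if (pos == std::string_view::npos) {
            break;
        }
        value.remove_prefix(pos + 1);
    }
    return features;
}
```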

Fixes scylladb/scylladb#20034

Closes scylladb/scylladb#20055
2024-08-09 19:15:19 +02:00
Kefu Chai
a78f46aad7 s3/client: customize options for input_stream
before this change, we use the default options for
performing reads on the input. the default options
are
```c++
struct file_input_stream_options {
    size_t buffer_size = 8192;    ///< I/O buffer size
    unsigned read_ahead = 0;      ///< Maximum number of extra read-ahead operations
};
```
which is not able to offer good throughput when
reading from disk, when we stream to S3.

so, in this change, we use options which allow better throughput.

Refs 061def001d
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20074
2024-08-09 11:52:30 +03:00
Dawid Medrek
e5d01d4000 db/hints: Make commitlog use commitlog IO scheduling group
Before these changes, we didn't specify which I/O scheduling
group commitlog instances in hinted handoff should use.
In this commit, we set it explicitly to the commitlog
scheduling group. The rationale for this choice is the fact
we don't want to cause a bottleneck on the write path
-- if hints are written too slowly, new incoming mutations
(NOT hints) might be rejected due to a too high number
of hints currently being written to disk; see
`storage_proxy::create_write_response_handler_helper()`
for more context.

Fixes scylladb/scylladb#18654

Closes scylladb/scylladb#19170
2024-08-08 16:14:07 +02:00
Piotr Dulikowski
b72906518f Merge 'service levels: update connections parameters automatically' from Michał Jadwiszczak
This patch makes all cql connections update their service level parameters automatically when:
- any service level is created or changed
- one role is granted to another
- any service level is attached to/detached from a role

First of all, the patch defines what a service level and an effective service level are 938aa10509. No new types of service level are introduced, the commit only clarifies definitions and names what an effective service level is.
(Effective service level is created by merging all service levels which are attached to all roles granted to the user. It represents exact  values of connection's parameters.)

Previously, to find an effective service level of a user, it required O(n) internal queries: O(n) queries to recursively find all granted roles (`standard_role_manager::query_granted()`) and a query for each role to get its service level (`standard_role_manager::get_attribute()`, which sums to O(n) queries).

Because we want to reload SL parameters for all opened cql connections, we don't want to do O(n) queries for every connection, every time we create or change any service level/grant one role to another/attach or detach a service level to/from a role.

To speed it up, the patch adds another layer of service level controller cache, which stores a `role_name -> effective_service_level` mapping. This way finding an effective service level for a role is only a lookup in a map.
Building the new cache requires only 2 queries: one to obtain the whole role hierarchy and one to get all roles' service levels.
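
A rough sketch of how such a cache could be built from the two query results. Everything here is illustrative: the type and function names are hypothetical, a service level is reduced to a single timeout, and the merge rule (smallest timeout wins) is an assumption made for the example rather than the rule implemented by the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, heavily simplified service level.
struct service_level {
    int timeout_ms;
};

using role_graph = std::multimap<std::string, std::string>;  // role -> directly granted role
using role_levels = std::map<std::string, service_level>;    // role -> attached service level

// Collects `role` and every role reachable from it through grants.
void collect_roles(const std::string& role, const role_graph& granted,
                   std::set<std::string>& seen) {
    if (!seen.insert(role).second) {
        return;  // already visited
    }
    auto [first, last] = granted.equal_range(role);
    for (auto it = first; it != last; ++it) {
        collect_roles(it->second, granted, seen);
    }
}

// Builds the role -> effective service level mapping from in-memory data only.
std::map<std::string, service_level> build_effective_cache(
        const std::set<std::string>& roles,
        const role_graph& granted,
        const role_levels& attached) {
    std::map<std::string, service_level> cache;
    for (const auto& role : roles) {
        std::set<std::string> reachable;
        collect_roles(role, granted, reachable);
        for (const auto& r : reachable) {
            const auto it = attached.find(r);
            if (it == attached.end()) {
                continue;  // no service level attached to this role
            }
            auto [slot, inserted] = cache.try_emplace(role, it->second);
            if (!inserted) {
                // Assumed merge rule: the smallest timeout wins.
                slot->second.timeout_ms =
                    std::min(slot->second.timeout_ms, it->second.timeout_ms);
            }
        }
    }
    return cache;
}
```

Once the map is built, finding the effective service level of a connection's role is a single lookup, which is what makes refreshing every open connection affordable.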

Fixes scylladb/scylladb#12923

Closes scylladb/scylladb#19085

* github.com:scylladb/scylladb:
  test/auth_cluster/test_raft_service_levels: add test for automatic connection update
  api/cql_server_test: add CQL server testing API
  transport/cql_server: subscribe to sl effective cache reloaded
  transport/controller: coroutinize `subscribe_server` and `unsubscribe_server`
  transport/cql_server: add method to update service level params on all connections
  generic_server: use async function in `for_each_gently()`
  service/qos/sl_controller: use effective service levels cache
  service/qos/service_level_controller: notify subscribers on effective cache reloaded
  service/raft/group0_state_machine: update effective service levels cache
  service/topology_coordinator: migrate service levels before auth
  service/qos/service_level_controller: effective service levels cache
  utils/sorting: allow to pass any container as verticies
  service/qos/service_level_controller: replace shard check to assert
  service/qos: define effective service level
  service/qos/qos_common: use const reference in `init_effective_names()`
  service/qos/service_level_controller: remove unused field
  auth: return map of directly granted roles
  test/auth/test_auth_v2_migration: create sl1 in the test
2024-08-08 15:31:04 +02:00
Anna Stuchlik
a1b4357765 doc: update Raft info in 6.1
This commit updates the Raft information regarding the Raft verification procedure.
In 6.1, the procedure is no longer related to the upgrade.

Fixes https://github.com/scylladb/scylladb/issues/19932

Closes scylladb/scylladb#20040
2024-08-08 11:25:50 +02:00
PeterFlockhart
0f9c6d24cf Update SELECT grammar to define group_by_clause explicitly
Closes scylladb/scylladb#20046
2024-08-08 12:23:20 +03:00
Avi Kivity
12c68bcf75 Merge 'querier: include cell stats in page stats' from Botond Dénes
We have two mechanisms to give visibility into reads having to process many tombstones:
* a warning in the logs, triggered if a read processed more than `tombstone_warn_threshold` dead rows/tombstones
* a trace message, which includes stats of the amount of rows in the page, including the amount of live and dead rows as well as tombstones

This series extends this to also include information on cells, so we have visibility into the case where a read has to process an excessive amount of cell tombstones (mainly because of collections).
A log line is now also logged if the amount of dead cells/tombstones in the page exceeds `tombstone_warn_threshold`. The trace message is also extended to contain cell stats.

The `tombstone_warn_threshold` log lines now receive a 10s rate-limit to avoid excessive log spamming. The rate-limit is separate for the row and cell logs.
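
A minimal sketch of interval-based rate limiting with one limiter per log kind, using only the standard library. `log_rate_limiter` is a hypothetical class for illustration and not the mechanism used by the series. The clock value is passed in so the behaviour is easy to check.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical limiter: allows at most one log line per interval.
class log_rate_limiter {
    using clock = std::chrono::steady_clock;
    clock::duration _interval;
    std::optional<clock::time_point> _last;
public:
    explicit log_rate_limiter(clock::duration interval) : _interval(interval) {
        assert(interval > clock::duration::zero());
    }

    // Returns true if a line may be logged at `now`, and records it if so.
    bool should_log(clock::time_point now) {
        if (_last && now - *_last < _interval) {
            return false;
        }
        _last = now;
        return true;
    }
};
```

Keeping two independent instances, one for row warnings and one for cell warnings, gives each its own 10s window, matching the separation described above.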

Example of the new log line (`tombstone_warn_threshold=10` ):
```
WARN  2024-05-30 07:56:44,979 [shard 0:stmt] querier - Read 98 live cells and 126 dead cells/tombstones for system_schema.scylla_tables <partition-range-scan> (-inf, +inf) (see tombstone_warn_threshold)
```

Example of the new tracing message:
```
Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead), 0 range tombstone(s) and 13 cell(s) (1 live, 12 dead) [shard 0] | 2024-05-30 08:13:19.690803 | 127.0.0.1 |           6114 | 127.0.0.1
```

Fixes: https://github.com/scylladb/scylladb/issues/18996

Improvement, not a backport candidate.

Closes scylladb/scylladb#18997

* github.com:scylladb/scylladb:
  test/boost: mutation_test: add test for cell compaction stats
  mutation/compact_and_expire_result: drop operator bool()
  querier: consume_page(): add rate-limiting to tombstone warnings
  querier: consume_page(): add cell stats to page stats trace message
  querier: consume_page(): add tombstone warning for cell tombstones
  querier: consume_page(): extract code which logs tombstone warning
  mutation/mutation_compactor: collect and aggregate cell compaction stats
  mutation: row::compact_and_expire(): use compact_and_expire_result
  collection_mutation: compact_and_expire(): use compact_and_expire_result
  mutation: introduce compact_and_expire_result
2024-08-08 12:16:13 +03:00
Calle Wilund
d6742e9bce distributed_loader: Remove load_prio_keyspaces
Fixes #13334

All required code paths (see enterprise) now use
extensions::is_extension_internal_keyspace.
The old mechanism can be removed. One less global var.

Closes scylladb/scylladb#20047
2024-08-08 12:10:27 +03:00
Avi Kivity
db77b5bd03 Merge 'convert the rest of test/boost/sstable_test.cc to co-routines and seastar::thread' from Laszlo Ersek
This is a followup to #19937, for #19803. See in particular [this comment](https://github.com/scylladb/scylladb/issues/19803#issuecomment-2258371923).

The primary conversion target is coroutines. However, while coroutines are the most convenient style, they are only infrequently usable in this case, for the following reasons:
- Wherever we have a `future::finally()` that calls a cleanup function that returns a future (which must be awaited), we cannot use `co_await`. We can only use `seastar::async()` with `deferred_close` or `defer()`.
- The code passes lots of lambdas, and `co_await` cannot be used in lambdas. First, I tried, and the compiler rejects it; second, a capturing lambda that is a coroutine is a trap [[1]](https://devblogs.microsoft.com/oldnewthing/20211103-00/?p=105870) [[2]](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rcoro-capture).

In most cases, I didn't have to use naked `seastar::async()`; there were specialized wrappers in place already. Thus, most of the changes target `seastar::thread` context under existent `seastar::async()` wrappers, and only a few functions end up as coroutines.

The last patch in the series (`test/sstable: remove useless variable from promoted_index_read()`) is an independent micro-cleanup, the opportunity for which I thought to have noticed while reading the code.

The tail of `test/boost/sstable_test.cc` (the stuff following `promoted_index_read()`) is already written as `seastar::thread`. That's already better (for readability) than future chaining; but could I have perhaps further converted those functions to coroutines? My answer was "no":
- Some of the candidate functions relied on deferred cleanups that might need to yield (all three variants of `count_rows()`).
- Some had been implemented by passing lambdas to wrappers of `seastar::async()` (`sub_partition_read()`, `sub_partitions_read()`).
- The test case `test_skipping_in_compressed_stream()` initially looked promising for co-routinization (from its starting point `seastar::async()`), because it seemed to employ no deferred cleanup (that might need to yield). However, the function uses three lambdas that must be able to yield internally, and one of those (`make_is()`) is even capturing.
- The rest (`test_empty_key_view_comparison()`, `test_parse_path_good()`, `test_parse_path_bad()`) was synchronous code to begin with.

```
 test/boost/sstable_test.cc | 188 +++++++++-----------
 1 file changed, 83 insertions(+), 105 deletions(-)
```

Refactoring; no backport needed.

Closes scylladb/scylladb#20011

* github.com:scylladb/scylladb:
  test/sstable: remove useless variable from promoted_index_read()
  test/sstable: rewrite promoted_index_read() with async()
  test/sstable: unfuturize lambda invocation in test_using_reusable_sst*()
  test/sstable: rewrite wrong_range() with async()
  test/sstable: simplify not_find_key_composite_bucket0() under test_using_reusable_sst()
  test/sstable: rewrite full_index_search() with async()
  test/sstable: simplify find_key*(), all_in_place() under test_using_reusable_sst()
  test/sstable: rewrite (un)compressed_random_access_read() with async()
  test/sstable: simplify write_and_validate_sst()
  test/sstable: simplify check_toc_func() under async()
  test/sstable: simplify check_statistics_func() under async()
  test/sstable: simplify check_summary_func() under async()
  test/sstable: coroutinize check_component_integrity()
  test/sstable: rewrite write_sst_info() with async()
  test/sstable: simplify missing_summary_first_last_sane()
  test/sstable: coroutinize summary_query_fail()
  test/sstable: rewrite summary_query() with async()
  test/sstable: coroutinize (simple/composite)_index_read()
  test/sstable: rewrite index_read() with async()
  test/sstable: rewrite test_using_reusable_sst() with async()
  test/sstable: rewrite test_using_working_sst() with async()
2024-08-08 11:55:37 +03:00
Michał Jadwiszczak
b62a8b747a test/auth_cluster/test_raft_service_levels: add test for automatic
connection update
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
870bdaa6b1 api/cql_server_test: add CQL server testing API
Add a CQL server testing API with an endpoint to dump
service level parameters of all CQL connections.

This endpoint will be later used to test functionality of
automated updating CQL connections parameters.
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
c3e8778ad4 transport/cql_server: subscribe to sl effective cache reloaded
Make the cql server (but not the maintenance server) subscribe to qos
configuration changes.
Trigger update of connections' service level params on effective cache
reloaded event.

It's not done on maintenance server because it doesn't support role
hierarchy nor attaching service levels.
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
b2f2288292 transport/controller: coroutinize subscribe_server and unsubscribe_server 2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
4af90726b6 transport/cql_server: add method to update service level params on all
connections

Trigger update of service level param on every cql connection.
In enterprise, the method needs also to update connections' scheduling
group.
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
324b3c43c0 generic_server: use async function in for_each_gently()
In the following patch, we will add a method to update service level
parameters for each cql connection.
To support this, this patch allows passing an async function as a parameter
to the `for_each_gently()` method.
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
93e6de0d04 service/qos/sl_controller: use effective service levels cache
Use cache to quickly access effective service level of a role.
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
664a1913c6 service/qos/service_level_controller: notify subscribers on effective
cache reloaded

Add event representing reload of effective service level cache and
notify subscribers when the cache is reloaded.
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
5f8132c13c service/raft/group0_state_machine: update effective service levels cache
Updates to `system.role_members` and `system.role_attributes` affect
effective service levels cache, so applying mutations to those tables
should reload the effective SL cache.
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
7b28df9b4d service/topology_coordinator: migrate service levels before auth
Effective service level cache will be updated when mutations are applied to
some of the auth tables.
But the effective cache depends on first-level service levels cache, so
service levels data should be migrated before auth data.
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
842573d0af service/qos/service_level_controller: effective service levels cache
Add a second layer of service_level_controller cache which contains
role name -> effective service level mapping.
To build the mapping, controller uses first cache layer (service level
name -> service level) and 2 queries to auth tables (one to `roles` and
one to `role_members`).
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
4922f87fed utils/sorting: allow to pass any container as verticies
The container containing all vertices doesn't have to be a vector.
Allowing any container that meets the conditions to be passed will make
the function more flexible.
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
619937c466 service/qos/service_level_controller: replace shard check to assert
The cache is only updated on shard 0, so doing assert is a better sanity
check.
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
be4c83ad3c service/qos: define effective service level
Write down definitions of `service level` and `effective service level`
in service/qos/service_level_controller.hh.

Until now, effective service level was only used as result of
`LIST EFFECTIVE SERVICE LEVEL OF <role>`.
Now we want to have quick access to effective service level of
each role and introduce cache of effective sl to do it.
New definitions clarify things.

The commit also renames:
- `update_service_levels_from_distributed_data` -> `update_service_levels_cache`
  Later we will introduce effective_service_level_cache, so this change
  standardizes the names.
- `find_service_level` -> `find_effective_service_level`
  The function actually returns effective service level.
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
0da979e013 service/qos/qos_common: use const reference in init_effective_names()
`service_level_options::init_effective_names()` method's argument has no
reason to be mutable reference.
This commit converts it to const ref.
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
37cd998993 service/qos/service_level_controller: remove unused field 2024-08-08 10:42:08 +02:00
Michał Jadwiszczak
f9048de0ce auth: return map of directly granted roles
Returns multimap of directly granted roles for each role. Uses
only one query to create the map, instead of doing recursive queries
for each individual role.
2024-08-08 10:42:08 +02:00
Michał Jadwiszczak
d643d5637c test/auth/test_auth_v2_migration: create sl1 in the test
Test `test_auth_v2_migration` creates auth data where role `users`
has assigned service level `sl:fefe` but the service level isn't
actually created.

In following patches, we are going to introduce effective service levels
cache which depends on auth and is refreshed when mutations are applied
to v2 auth tables.

Without this change, this test will fail because the service level
doesn't exist.
Also the name `sl:fefe` is changed to `sl1`.
2024-08-08 10:42:08 +02:00
Avi Kivity
3fe60560d2 Merge 'Coroutinize view_builder::start()' from Pavel Emelyanov
It runs in the background and consists of two parts -- async() lambda and following .then()-s. This PR moves the background running code into its own method and coroutinizes it in parts. With #19954 merged it finally looks really nice.

Closes scylladb/scylladb#20058

* github.com:scylladb/scylladb:
  view_builder: Restore indentation after previous patches
  view_builder: Coroutinize inner start_in_background() calls
  view_builder: Coroutinize outer start_in_background() calls
  view_builder: Add helper method for background start
2024-08-07 19:47:32 +03:00
Kamil Braun
4181a1c53e storage_service: raft topology: warn when raft_topology_cmd_handler fails due to abort
Currently we print an ERROR on all exceptions in
`raft_topology_cmd_handler`. This log level is too high, in some cases
exceptions are expected -- like during shutdown. And it causes dtest
failures.

Turn exceptions from aborts into WARN level.

Also improve logging by printing the command that failed.

Fixes scylladb/scylladb#19754

Closes scylladb/scylladb#19935
2024-08-07 17:57:23 +02:00
Tomasz Grabiec
1a4baa5f9e tablets: Do not allocate tablets on nodes being decommissioned
If tablet-based table is created concurrently with node being
decommissioned after tablets are already drained, the new table may be
permanently left with replicas on the node which is no longer in the
topology. That creates an immediate availability risk because we are
running with one replica down.

This also violates invariants about replica placement and this state
cannot be fixed by topology operations.

One effect is that this will lead to load balancer failure which will
inhibit progress of any topology operations:

  load_balancer - Replica 154b0380-1dd2-11b2-9fdd-7156aa720e1a:0 of tablet 7e03dd40-537b-11ef-9fdd-7156aa720e1a:1 not found in topology, at:  ...

Fixes #20032

Closes scylladb/scylladb#20053
2024-08-07 18:52:58 +03:00
Dawid Medrek
96509c4cf7 db/hints: Make sync points be created for all hosts when not specified
Sync points are created, via POST HTTP requests, for a subset of nodes
in the cluster. Those nodes are specified in a request's parameter
`target_hosts`. When the parameter is empty, Scylla should assume
the user wants to create a sync point for ALL nodes.

Before these changes, sync points were created only for LIVE nodes.
If a node was dead but still part of the cluster and the user
requested creating a sync point leaving the parameter `target_hosts`
empty, the dead node was skipped during the creation of the sync point.
That was inconsistent with the guarantees the sync point API provides.
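
The intended target selection can be illustrated with a small helper. `cluster_view` and `resolve_sync_point_targets()` are hypothetical names for this sketch and not part of the hinted handoff code. The only point shown is that an empty `target_hosts` resolves to all nodes rather than only the live ones.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical view of the cluster: live_nodes is a subset of all_nodes.
struct cluster_view {
    std::set<std::string> all_nodes;
    std::set<std::string> live_nodes;
};

// Empty target_hosts means "every node in the cluster", dead ones included.
std::set<std::string> resolve_sync_point_targets(
        const cluster_view& cluster,
        const std::set<std::string>& target_hosts) {
    if (target_hosts.empty()) {
        return cluster.all_nodes;  // not cluster.live_nodes
    }
    return target_hosts;
}
```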

In this commit, we fix that issue and add a test verifying that
the changes have made the implementation compliant with the design
of the sync point API -- the test only passes after this commit.

Fixes scylladb/scylladb#9413

Closes scylladb/scylladb#19750
2024-08-07 13:15:20 +02:00
Pavel Emelyanov
63afbc0fcb view_builder: Restore indentation after previous patches
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-07 14:00:01 +03:00
Pavel Emelyanov
aa1a5d3201 view_builder: Coroutinize inner start_in_background() calls
One of the co_await-ed parts of this method is async() lambda. It can be
coroutinized too. One thing to take care of is the semaphore units -- its scope
should (?) terminate earlier than the whole start_in_background() so
release it explicitly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-07 14:00:01 +03:00
Pavel Emelyanov
167c6a9c5e view_builder: Coroutinize outer start_in_background() calls
The method consists of two parts -- one running in async() thread and
continuations to it. This patch turns the latter chain into co_await-s.
The mentioned chain is "guarded" by then_wrapped() catch of any
exception, which is turned into a plain try-catch block.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-07 14:00:01 +03:00
Pavel Emelyanov
10a87f5c5b view_builder: Add helper method for background start
The view_builder::start() happens in the background. It's good to have
explicit start_in_background() method and coroutinize it next.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-07 13:59:57 +03:00
Dawid Medrek
ec691a84a5 docs/hinted_handoff: Describe sync point HTTP API
In this commit, we describe the mechanism of sync point
in Hinted Handoff in the user documentation. We explain
the motivation for it and how to use it, as well as list
and describe all of the parameters involved in the process.
Errors that the user may encounter are addressed in the article.

Fixes scylladb/scylladb#18500

Closes scylladb/scylladb#19686
2024-08-07 11:12:23 +02:00
Pavel Emelyanov
2fd60b0adc api: Move config-related endpoints from storage_service.cc
The get_all_data_file_locations and get_saved_caches_location get the
returned data from db::config and should be next to the other endpoints
working with config data.

refs: #2737

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19958
2024-08-07 10:18:29 +03:00
Piotr Dulikowski
1963619803 Merge 'Use cross shard barrier to start view builder' from Pavel Emelyanov
When starting, view builder wants all shards to synchronize with each other in the middle of initialization. For that they all synchronize via shard-0's instance counter and a shared future. There's a cross-shard barrier in utils/ that provides the same facility.

Closes scylladb/scylladb#19954

* github.com:scylladb/scylladb:
  view_builder: Drop unused members
  view_builder: Use cross-shard barrier on start
  view_builder: Add cross-shard barrier to its .start() method
2024-08-07 08:54:15 +02:00
Botond Dénes
78206a3fad test/boost: mutation_test: add test for cell compaction stats
2024-08-06 08:56:28 -04:00
Botond Dénes
259a59bd64 mutation/compact_and_expire_result: drop operator bool()
Having an operator bool() on this struct is counter-intuitive, so this
commit drops it and migrates any remaining users to bool is_live().
The purpose of this operator bool() was to help incrementally replace
the previous bool return type with compact_and_expire_result in the
compact_and_expire() call stack. Now that this is done, it has served
its purpose.
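
As an illustration only (not the code from this commit), the difference
between an implicit bool conversion and a named query can be sketched as
below. The struct name and its fields are hypothetical stand-ins; only
is_live() is taken from the message above, and its definition here is an
assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for compact_and_expire_result; the field names
// and the meaning of is_live() are assumptions made for this sketch.
struct compact_result_sketch {
    uint64_t live_cells = 0;
    uint64_t dead_cells = 0;

    // Named query instead of operator bool(): the call site states what it asks.
    bool is_live() const noexcept { return live_cells > 0; }
};

// The type has no conversion to bool, so `if (result)` does not compile.
static_assert(!std::is_constructible<bool, compact_result_sketch>::value,
              "no bool conversion expected");

// Caller-side view: count how many results still report live data.
inline std::size_t count_live(const compact_result_sketch* results, std::size_t n) {
    assert(results != nullptr || n == 0);
    std::size_t live = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (results[i].is_live()) {
            ++live;
        }
    }
    return live;
}
```

With a struct that carries several counters, `if (result)` gives no hint
which of them decides the outcome; `result.is_live()` does.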
2024-08-06 08:56:28 -04:00
Botond Dénes
f638c37c4b querier: consume_page(): add rate-limiting to tombstone warnings
These warnings can be logged once per query, which could result in
filling the logs with thousands of log lines.
Rate-limit to once per 10sec.
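
A minimal sketch of such a limiter, assuming a simple "at most one
message per interval" policy. The class name and interface are
hypothetical and are not the logger facility used by the patch; the
clock value is passed in so the behaviour is easy to check.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical helper: allows one warning per `interval`.
class warning_rate_limiter {
public:
    using clock = std::chrono::steady_clock;

    explicit warning_rate_limiter(clock::duration interval)
        : _interval(interval) {
        assert(interval > clock::duration::zero());
    }

    // Returns true if a warning may be logged at time `now`,
    // and remembers `now` as the time of the last logged warning.
    bool should_log(clock::time_point now) {
        if (_last && now - *_last < _interval) {
            return false; // still inside the quiet period
        }
        _last = now;
        return true;
    }

private:
    clock::duration _interval;
    std::optional<clock::time_point> _last;
};
```

A caller would construct it with a 10 second interval and wrap each
warning in `if (limiter.should_log(clock::now()))`.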
2024-08-06 08:56:11 -04:00
Botond Dénes
d69b16a51e querier: consume_page(): add cell stats to page stats trace message
2024-08-06 08:56:11 -04:00
Botond Dénes
98c599f73a querier: consume_page(): add tombstone warning for cell tombstones
Since it is really difficult to meaningfully aggregate cell tombstones
with row tombstones, there are two separate warnings for them.
2024-08-06 08:56:11 -04:00
Botond Dénes
fa2ee6d545 querier: consume_page(): extract code which logs tombstone warning
Soon, we want to log a warning on too many cell tombstones as well.
Extract the logging code to allow reuse between the row and cell
tombstone warnings.
2024-08-06 08:56:11 -04:00
Botond Dénes
e403644c8b mutation/mutation_compactor: collect and aggregate cell compaction stats
row::compact_and_expire() now returns detailed cell stats. Collect and
aggregate these, using the existing compaction_stats::row_stats
structure.
2024-08-06 08:56:11 -04:00
Botond Dénes
0396db497c mutation: row::compact_and_expire(): use compact_and_expire_result
Collect, store and return stats about cells, via
compact_and_expire_result.
2024-08-06 08:56:11 -04:00
Botond Dénes
2c6d4e21e6 collection_mutation: compact_and_expire(): use compact_and_expire_result
Collect, store and return stats about cells, via
compact_and_expire_result.
2024-08-06 08:56:11 -04:00
Botond Dénes
e773a8eee6 mutation: introduce compact_and_expire_result
To hold cell stats, to be collected during row::compact_and_expire().
Users will come in the next patches.
2024-08-06 08:56:11 -04:00
Aleksandra Martyniuk
9ec8000499 test: add test to check that task handler is fixed
2024-08-06 13:15:33 +02:00
Aleksandra Martyniuk
811ca00cec tasks: fix task handler
Some bugs were missed in the task handler:
- wait_for_task does not wait until virtual tasks are done, but
  returns the status immediately;
- wait_for_task suffers from use after return;
- get_status_recursively does not set the kind of task essentials.

Fix the aforementioned.
2024-08-06 13:15:13 +02:00
Anna Stuchlik
849856b964 doc: add post-installation configuration to the Web Installer page
This commit extracts the information about the configuration the user should do
right after installation (especially running scylla_setup) to a separate file.
The file is included in the relevant pages, i.e., installing with packages
and installing with Web Installer.

In addition, the examples on the Web Installer page are updated
with supported versions of ScyllaDB.

Fixes https://github.com/scylladb/scylladb/issues/19908

Closes scylladb/scylladb#20035
2024-08-06 13:49:09 +03:00
Kamil Braun
f348f33667 raft topology: improve logging
Add more logging for raft-based topology operations in INFO and DEBUG
levels.

Improve the existing logging, adding more details.

Fix a FIXME in test_coordinator_queue_management (by re-adding a log
message that was removed in the past -- probably by accident -- and
properly waiting for it to appear in the test).

Enable group0_state_machine logging at TRACE level in tests. These logs
are relatively rare (group 0 commands are used for metadata operations)
and relatively small, mostly consisting of printing `system.group0_history`
mutation in the applied command, for example:
```
TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - apply() is called with 1 commands
TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - cmd: prev_state_id: optional(dd9d47c6-50ee-11ef-d77f-500b8e1edde3), new_state_id: dd9ea5c6-50ee-11ef-ae64-dfbcd08d72c3, creator_addr: 127.219.233.1, creator_id: 02679305-b9d1-41ef-866d-d69be156c981
TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - cmd.history_append: {canonical_mutation: table_id 027e42f5-683a-3ed7-b404-a0100762063c schema_version c9c345e1-428f-36e0-b7d5-9af5f985021e partition_key pk{0007686973746f7279} partition_tombstone {tombstone: none}, row tombstone {range_tombstone: start={position: clustered, ckp{0010b4ba65c64b6e11ef8080808080808080}, 1}, end={position: clustered, ckp{}, 1}, {tombstone: timestamp=1722617232237511, deletion_time=1722617232}}{row {position: clustered, ckp{0010dd9ea5c650ee11efae64dfbcd08d72c3}, 0} tombstone {row_tombstone: none} marker {row_marker: 1722617232237511 0 0}, column description atomic_cell{ create system_distributed keyspace; create system_distributed_everywhere keyspace; create and update system_distributed(_everywhere) tables,ts=1722617232237511,expiry=-1,ttl=0}}}
```
note that the mutation contains a human-readable description of the
command -- like "create system_distributed keyspace" above.

These logs might help debugging various issues (e.g. when `apply` hangs
waiting for read_apply mutex, or takes too long to apply a command).

Ref: scylladb/scylladb#19105
Ref: scylladb/scylladb#19945

Closes scylladb/scylladb#19998
2024-08-06 11:50:16 +03:00
Kamil Braun
aa9d5fe3f5 Merge 'doc: add the 6.0-to-6.1 upgrade guide' from Anna Stuchlik
This PR adds the 6.0-to-6.1 upgrade guide (including metrics) and removes the 5.4-to-6.0 upgrade guide.

Compared to the 5.4-to-6.0 guide, the 6.0-to-6.1 guide:

- Added the "Ensure Consistent Topology Changes Are Enabled" prerequisite.
- Removed the "After Upgrading Every Node" section. Both Raft-based schema changes and topology updates
  are mandatory in 6.1 and don't require any user action after upgrading to 6.1.
- Removed the "Validate Raft Setup" section. Raft was enabled in all 6.0 clusters (for schema management),
  so now there's no scenario that would require the user to follow the validation procedure.
- Removed the references to the Enable Consistent Topology Updates page (which was in version 6.0 and is removed with this PR) across the docs.

See the individual commits for more details.

Fixes https://github.com/scylladb/scylladb/issues/19853
Fixes https://github.com/scylladb/scylladb/issues/19933

This PR must be backported to branch-6.1 as it is critical in version 6.1.

Closes scylladb/scylladb#19983

* github.com:scylladb/scylladb:
  doc: remove the 5.4-to-6.0 upgrade guide
  doc: add the 6.0-to-6.1 upgrade guide
2024-08-06 10:23:18 +02:00
Andrei Chekun
cc428e8a36 [test.py] Increase pool size for CI
Currently, the resource utilization in CI is low. Increasing the number of clusters will increase how many tests are executed simultaneously. This will decrease the time it takes to execute and improve resource utilization.

Related: https://github.com/scylladb/qa-tasks/issues/1667

Closes scylladb/scylladb#19832
2024-08-06 11:20:36 +03:00
Botond Dénes
822d3b11d0 tool/scylla-nodetool: refresh: improve error-message on missing ks/tbl args
The command has a single check for the missing keyspace and/or table
parameters and if the check fails, there is a combined error message.
Apparently this is confusing, so split the check so that missing
keyspace and missing table args each have their own check and error message.
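
The idea of splitting the check can be sketched as follows. This is an
illustration under assumptions, not the nodetool code: the function name
and the error texts are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical validation helper: one check and one message per argument,
// instead of a single combined "keyspace and/or table missing" error.
inline std::optional<std::string> check_refresh_args(const std::string& keyspace,
                                                     const std::string& table) {
    assert(keyspace.size() < 4096 && table.size() < 4096); // sanity bound for the sketch
    if (keyspace.empty()) {
        return std::string("required parameter is missing: keyspace");
    }
    if (table.empty()) {
        return std::string("required parameter is missing: table");
    }
    return std::nullopt; // both arguments are present
}
```

Each failure now names exactly the argument that is missing.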

Fixes: scylladb/scylladb#19984

Closes scylladb/scylladb#20005
2024-08-05 22:36:05 +03:00
Anna Stuchlik
32fa5aa938 doc: remove the 5.4-to-6.0 upgrade guide
This commit removes the 5.4-to-6.0 upgrade guide and all references to it.
It mainly removes references to the Enable Consistent Topology Updates page,
which was added as enabling the feature was optional.
In rare cases, when a reference to that page is necessary,
the internal link is replaced with an external link to version 6.0.
In particular, the Handling Cluster Membership Change Failures page was modified
for troubleshooting purposes rather than removed.
2024-08-05 20:13:48 +02:00
Kefu Chai
b1405da6ac s3/client: use div_ceil() defined by utils/div_ceil.hh
instead of reinventing the wheel, let's use the existing one.

in this change, we trade the `div_ceil()` implemented in s3/client.cc
for the existing one in utils/div_ceil.hh . because we are not using
`std::lldiv()` anymore, the corresponding `#include <cstdlib>` is dropped.
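
For reference, a ceiling division helper can be sketched like this. It
is a generic version written for this note; the name, the fixed
uint64_t signature and the body are assumptions and are not copied from
utils/div_ceil.hh.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: smallest q such that q * divisor >= dividend.
// Written as quotient plus "is there a remainder" so that it cannot
// overflow the way (dividend + divisor - 1) / divisor can.
inline uint64_t div_ceil_sketch(uint64_t dividend, uint64_t divisor) {
    assert(divisor != 0);
    return dividend / divisor + (dividend % divisor != 0 ? 1 : 0);
}
```

A typical use in an upload path is computing how many fixed-size parts
are needed to cover an object of a given size.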

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20000
2024-08-05 15:35:18 +03:00
Kefu Chai
12a066ccdf sstable_directory: use return_exception_ptr() when appropriate
instead of using `std::rethrow_exception()`, use
`coroutine::return_exception_ptr()` which is a little bit more
efficient.

See also 6cafd83e1c

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20001
2024-08-05 12:54:27 +03:00
Kefu Chai
0bc886d005 service: mark fmt::formatter<T>::format() as const
fmt 11 enforces the constness of `format()` member function, if it
is not marked with `const`, the tree fails to build with fmt 11, like:

```
/usr/include/fmt/base.h:1393:23: error: no matching member function for call to 'format'
 1393 |     ctx.advance_to(cf.format(*static_cast<qualified_type*>(arg), ctx));
      |                    ~~~^~~~~~
/usr/include/fmt/base.h:1374:21: note: in instantiation of function template specialization 'fmt::detail::value<fmt::context>::format_custom_arg<service::migration_badness, fmt::formatter<service::migration_badness>>' requested here
 1374 |     custom.format = format_custom_arg<
      |                     ^
/home/kefu/dev/scylladb/service/tablet_allocator.cc:170:14: note: in instantiation of function template specialization 'fmt::format_to<fmt::basic_appender<char>, const locator::global_tablet_id &, const locator::tablet_replica &, const locator::tablet_replica &, const service::migration_badness &, 0>' requested here
  170 |         fmt::format_to(ctx.out(), "{{tablet: {}, {} -> {}, badness: {}", candidate.tablet, candidate.src,
      |              ^
/home/kefu/dev/scylladb/service/tablet_allocator.cc:161:10: note: candidate function template not viable: 'this' argument has type 'const fmt::formatter<service::migration_badness>', but method is not marked const
  161 |     auto format(const service::migration_badness& badness, FormatContext& ctx) {
      |          ^
```

so, in this change, we mark these two `format()` member functions const.
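
The language rule behind the error can be shown without fmt at all. The
sketch below uses hypothetical types (it is not the tablet_allocator.cc
code): a member function that is invoked through a const reference must
itself be const-qualified.

```cpp
#include <cassert>
#include <string>

// Hypothetical value type standing in for the formatted object.
struct badness {
    int value;
};

// Hypothetical formatter-like type.
struct badness_formatter {
    // `const` is what allows the call through `const Formatter&` below.
    // Dropping it reproduces the "'this' argument has type const ..."
    // class of error quoted above.
    std::string format(const badness& b) const {
        return "badness=" + std::to_string(b.value);
    }
};

// Mimics a library that only ever holds the formatter by const reference.
template <typename Formatter, typename T>
std::string format_via_const(const Formatter& f, const T& v) {
    return f.format(v);
}
```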

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20013
2024-08-05 12:53:42 +03:00
Piotr Dulikowski
a038a1fdef Merge 'db: coroutinize do_apply_counter_update' from Michael Litvak
Rewrite the function as a coroutine to make it easier to read and maintain, following lifetime issues we had and fixed in this function.

The second commit adds a test that drops a table while there is a counter update operation ongoing in the table.
The test reproduces issue https://github.com/scylladb/scylla-enterprise/issues/4475 and verifies it is fixed.

Follow-up to https://github.com/scylladb/scylladb/pull/19948
Doesn't require backport because the fix to the issue was already done and backported. This is just cleanup and a test.

Closes scylladb/scylladb#19982

* github.com:scylladb/scylladb:
  db: test counter update while table is dropped
  db: coroutinize do_apply_counter_update
2024-08-05 10:08:18 +02:00
Nadav Har'El
247b84715a test/cql-pytest: reproducers for key length bugs
Recently, some users have seen "Key size too large" errors in various
places. Cassandra and Scylla impose a 64KB length limit on keys, and
we have known about bugs in this area for a long time - and even had
some translated Cassandra unit tests that cover some of them. But these
tests did not cover all the corner cases and left us with partial and
fragmented knowledge of this problem, spread over many test files and
many issues.

In this patch, we add a single test file, test/cql-pytest/test_key_length.py
which attempts to rigorously explore the various bugs we have with
CQL key length limits. These tests aim to reproduce all known bugs in
this area:

* Refs #3017 - CQL layer accepts set values too large to be written to
  an sstable
* Refs #10366 - Enforce Key-length limits during SELECT
* Refs #12247 - Better error reporting for oversized keys during INSERT
* Refs #16772 - Key length should be limited to exactly 65535, not less

The following less interesting bug is already covered by many tests so
I decided not to test it again:

* Refs #7745 - Length of map keys and set items are incorrectly limited
  to 64K in unprepared CQL

There's also a situation in materialized views and secondary indexes,
where a column that was _not_ a key, now becomes a key, and a length
limit needs to be enforced on it. We already have good test coverage
for this (in test/cql-pytest/test_secondary_index.py and in
test/cql-pytest/test_materialized_view.py), and we have an issue:

* Refs #8627 - Cleanly reject updates with indexed values where value > 64k

All 16 tests added here pass on Cassandra 5 except one that fails on
https://issues.apache.org/jira/browse/CASSANDRA-19270, but 11 of the
tests currently fail on Scylla (6 on #12247, 2 on #10366, 3 on #16772).

It is possible that our decision in #16772 will not be to fix Scylla
to match Cassandra but rather to declare that strict compatibility isn't
needed in this case or even that Cassandra is wrong. But even then,
having these tests which demonstrate the behavior of both Cassandra
and Scylla will be important.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16779
2024-08-05 10:13:49 +03:00
Tzach Livyatan
861a1cedea Improve tombstone_compaction_interval description
Closes scylladb/scylladb#19072
2024-08-05 10:10:55 +03:00
Pavel Emelyanov
f0f28cf685 docs: Extend debugging with info about exploring ELF notes
When debugging coredumps some (small, but useful) information is hidden
in the notes of the core ELF file. Add some words about its existence,
what it includes and the thing that is always forgotten -- the way to
get it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19962
2024-08-05 09:49:52 +03:00
Tzach Livyatan
858fd4d183 Update tracing.rst - fix table node_slow_log_time name
Closes scylladb/scylladb#19893
2024-08-05 09:47:27 +03:00
Botond Dénes
76b6e8c5aa Merge 'Drop datadir from keyspace::config' from Pavel Emelyanov
Commit ad0e6b79 (replica: Remove all_datadir from keyspace config) removed all_datadirs from keyspace config, now it's datadir's turn. After this change keyspace no longer references any on-disk directories, only the sstables' storage driver attached to keyspace's tables does.

refs #12707

Closes scylladb/scylladb#19866

* github.com:scylladb/scylladb:
  replica: Remove keyspace::config::datadir
  sstables/storage: Evaluate path for keyspace directory in storage
  sstables/storage: Add sstables_manager arg to init_keyspace_storage()
2024-08-05 09:46:29 +03:00
Avi Kivity
2eff4b41ad repair: row_level: coroutinize working_row_hashes()
It uses do_with, so it allocates unconditionally. Might as well use
the allocation for a nice coroutine.

Closes scylladb/scylladb#19915
2024-08-05 08:55:34 +03:00
Anna Stuchlik
eca2dfd8c3 doc: add OS support for version 6.1
This commit adds OS support for version 6.1 and removes OS support for 5.4
(according to our support policy for versions).

Closes scylladb/scylladb#19992
2024-08-05 08:25:16 +03:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
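
A macro of this kind can be sketched as below. This is an assumption
about the general shape only, with a hypothetical name; it is not the
contents of utils/assert.hh.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical always-on assertion. Unlike assert() from <cassert>, it
// does not look at NDEBUG, so the expression is evaluated and checked in
// every build mode.
#define ALWAYS_ASSERT_SKETCH(expr)                                      \
    do {                                                                \
        if (!(expr)) {                                                  \
            std::fprintf(stderr, "%s:%d: assertion failed: %s\n",       \
                         __FILE__, __LINE__, #expr);                    \
            std::abort();                                               \
        }                                                               \
    } while (0)
```

Because the expression is always evaluated, call sites that relied on
side effects inside the assertion keep working once NDEBUG is defined.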

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Avi Kivity
cdee667170 alternator: destroy streamed json values gently
Large json return values are streamed to avoid memory pressure
and stalls, but are destroyed all at once. This in itself can cause
stalls [1].

Destroy them gently to avoid the stalls.
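
The general technique can be sketched in plain C++ as below: instead of
letting one destructor call recurse through the whole tree, nodes are
torn down from an explicit work list, a bounded number per step, so the
caller can yield between steps. The types and names are hypothetical and
this is not the mechanism used in alternator/executor.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical tree node standing in for a nested JSON value.
struct node {
    std::vector<std::unique_ptr<node>> children;
};

class gentle_destroyer {
public:
    explicit gentle_destroyer(std::unique_ptr<node> root) {
        if (root) {
            _pending.push_back(std::move(root));
        }
    }

    // Destroys at most `budget` nodes and returns true once nothing is left.
    // A real caller would yield to the scheduler between calls.
    bool step(std::size_t budget) {
        assert(budget > 0);
        while (budget > 0 && !_pending.empty()) {
            std::unique_ptr<node> n = std::move(_pending.back());
            _pending.pop_back();
            // Detach the children first, so destroying `n` below frees
            // only this one node and never recurses into its subtree.
            for (auto& child : n->children) {
                if (child) {
                    _pending.push_back(std::move(child));
                }
            }
            --budget;
        }
        return _pending.empty();
    }

private:
    std::vector<std::unique_ptr<node>> _pending;
};
```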

[1]

++[0#1/1 100%] addr=0x46880df total=514498 count=7004 avg=73:
|              seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}> at ./build/release/seastar.lto/./seastar/include/seastar/util/backtrace.hh:64
++           - addr=0x4680b35:
|              seastar::backtrace_buffer::append_backtrace_oneline at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:839
|              (inlined by) seastar::print_with_backtrace at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:858
++           - addr=0x46800f7:
|              seastar::internal::cpu_stall_detector::generate_trace at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:1469
++           - addr=0x4680178:
|              seastar::internal::cpu_stall_detector::maybe_report at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:1206
|              (inlined by) seastar::internal::cpu_stall_detector::on_signal at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:1226
++           - addr=0x3dbaf: ?? ??:0
  ++[1#1/812 13%] addr=0x217b774 total=69336 count=990 avg=70:
  |               rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericValue at /usr/include/rapidjson/document.h:721
  | ++[2#1/3 85%] addr=0x217b7db total=58974 count=842 avg=70:
  | |             rapidjson::GenericMember<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericMember at /usr/include/rapidjson/document.h:71
  | |             (inlined by) rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericValue at /usr/include/rapidjson/document.h:733
  | | ++[3#1/4 45%] addr=0x217b7db total=902102 count=12903 avg=70:
  | | |             rapidjson::GenericMember<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericMember at /usr/include/rapidjson/document.h:71
  | | |             (inlined by) rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericValue at /usr/include/rapidjson/document.h:733
  | | -> continued at addr=0x217b7db above
  | | |+[3#2/4 40%] addr=0x217b8b3 total=794219 count=11363 avg=70:
  | | |             rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericValue at /usr/include/rapidjson/document.h:726
  | | | ++[4#1/1 100%] addr=0x217b7db total=909571 count=13012 avg=70:
  | | | |              rapidjson::GenericMember<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericMember at /usr/include/rapidjson/document.h:71
  | | | |              (inlined by) rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericValue at /usr/include/rapidjson/document.h:733
  | | | -> continued at addr=0x217b7db above
  | | |+[3#3/4 15%] addr=0x43d35a3 total=296768 count=4246 avg=70:
  | | |             seastar::shared_ptr_count_for<rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator> >::~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:492
  | | |             (inlined by) seastar::shared_ptr_count_for<rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator> >::~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:492
  | | | ++[4#1/2 98%] addr=0x43e7d06 total=289680 count=4144 avg=70:
  | | | |             seastar::shared_ptr<rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator> >::~shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:570
  | | | |             (inlined by) alternator::make_streamed(rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>&&)::$_0::operator() at ./alternator/executor.cc:127
  | | | ++          - addr=0x184e0a6:
  | | | |             std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/coroutine:240
  | | | |             (inlined by) seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose at ./build/release/seastar.lto/./seastar/include/seastar/core/coroutine.hh:125
  | | | |             (inlined by) seastar::reactor::run_tasks at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:2651
  | | | |             (inlined by) seastar::reactor::run_some_tasks at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:3114
  | | | | ++[5#1/1 100%] addr=0x2503b87 total=310677 count=4417 avg=70:
  | | | | |              seastar::reactor::do_run at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:3283
  | | | |   ++[6#1/2 78%] addr=0x46a2898 total=400571 count=5450 avg=73:
  | | | |   |             seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0::operator() at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:4501
  | | | |   |             (inlined by) std::__invoke_impl<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:61
  | | | |   |             (inlined by) std::__invoke_r<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:111
  | | | |   |             (inlined by) std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0>::_M_invoke at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:290
  | | | |   ++          - addr=0x4673fda:
  | | | |   |             std::function<void ()>::operator() at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
  | | | |   |             (inlined by) seastar::posix_thread::start_routine at ./build/release/seastar.lto/./seastar/src/core/posix.cc:90
  | | | |   ++          - addr=0x8c946: ?? ??:0
  | | | |   ++          - addr=0x11296f: ?? ??:0
  | | | |   ++[6#2/2 22%] addr=0x2502c1e total=113613 count=1549 avg=73:
  | | | |   |             seastar::reactor::run at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:3166
  | | | |   ++          - addr=0x22068e0:
  | | | |   |             seastar::app_template::run_deprecated at ./build/release/seastar.lto/./seastar/src/core/app-template.cc:276
  | | | |   ++          - addr=0x220630b:
  | | | |   |             seastar::app_template::run at ./build/release/seastar.lto/./seastar/src/core/app-template.cc:167
  | | | |   ++          - addr=0x22334bc:
  | | | |   |             scylla_main at ./main.cc:672
  | | | |   ++          - addr=0x20411cc:
  | | | |   |             std::function<int (int, char**)>::operator() at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
  | | | |   |             (inlined by) main at ./main.cc:2072
  | | | |   ++          - addr=0x27b89: ?? ??:0
  | | | |   ++          - addr=0x27c4a: ?? ??:0
  | | | |   ++          - addr=0x28c8fb4:
  | | | |   |             _start at ??:?

Closes scylladb/scylladb#19968
2024-08-05 00:35:52 +03:00
Botond Dénes
c34127092d reader_concurrency_semaphore: test constructor: don't ignore metrics param
The for_tests constructor has a metrics parameter defaulted to
register_metrics::no, but when delegating to the other constructor, a
hard-coded register_metrics::no is passed. This makes no difference
currently, because all callers use the default and the hard-coded value
corresponds to it. Let's fix it nevertheless to avoid any future
surprises.
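
The pattern being fixed can be sketched as follows. Only the names
for_tests and register_metrics come from the message above; the class,
its other members, and modelling register_metrics as a plain enum are
assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>

enum class register_metrics { no, yes };

// Hypothetical stand-in for the semaphore class.
class semaphore_sketch {
public:
    struct for_tests {};

    semaphore_sketch(std::size_t count, register_metrics metrics)
        : _count(count), _metrics(metrics) {
        assert(count > 0);
    }

    // Test-only constructor. The fix is to forward `metrics` to the
    // delegated constructor instead of passing a hard-coded
    // register_metrics::no and silently ignoring the parameter.
    explicit semaphore_sketch(for_tests, std::size_t count = 1,
                              register_metrics metrics = register_metrics::no)
        : semaphore_sketch(count, metrics) {}

    std::size_t count() const noexcept { return _count; }
    bool registers_metrics() const noexcept {
        return _metrics == register_metrics::yes;
    }

private:
    std::size_t _count;
    register_metrics _metrics;
};
```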

Closes scylladb/scylladb#20007
2024-08-04 21:14:42 +03:00
Laszlo Ersek
0933a52c0b test/sstable: remove useless variable from promoted_index_read()
The large_partition_schema() call returns a copy of the "schema_ptr"
object that points to an effectively statically initialized thread_local
"schema" object. The large_partition_schema() call has no bearing on
whether, or when, the "schema" object is constructed, and has no side
effects (other than copying an "lw_shared_ptr" object). Furthermore, the
return value of large_partition_schema() is not used for anything in
promoted_index_read().

This redundant call seems to date back to original commit 3dd079fb7a
("tests: add test for reading parts of a large partition", 2016-08-07).
Remove the call and the variable.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
bb58446258 test/sstable: rewrite promoted_index_read() with async()
For better readability, replace future::then() chaining with
future::get().

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
1f565626d4 test/sstable: unfuturize lambda invocation in test_using_reusable_sst*()
All lambdas passed to test_using_reusable_sst() and
test_using_reusable_sst_returning() have been converted to future::get()
calls (according to the seastar::thread context that they are now executed
in). None of the lambdas return futures anymore; they all directly return
void or non-void. Therefore, drop futurize_invoke(...).get() around the
lambda invocations in test_using_reusable_sst*().

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
8ea881ae04 test/sstable: rewrite wrong_range() with async()
For better readability, replace the future::then() chaining (and the
associated manual fiddling with object lifecycles) with future::get() (and
rely on seastar::thread's stack). We're already in seastar::thread
context.

Similarly, replace the future::finally() underlying with_closeable() with
deferred_close(); with the assumption that mutation_reader::close() never
fails (and is therefore safe to call in the "deferred_close" destructor).
This is actually guaranteed, as mutation_reader::close() is marked
"noexcept".

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
e7e9a0a696 test/sstable: simplify not_find_key_composite_bucket0() under test_using_reusable_sst()
According to the earlier patch "test/sstable: rewrite test_using_reusable_sst()
with async" in this series, lambdas passed to test_using_reusable_sst()
are invoked:

(a) less importantly here, in seastar::thread context,

(b) more importantly here, futurized (temporarily so).

The test case not_find_key_composite_bucket0() doesn't chain futures;
therefore it needs no conversion to future::get() for purpose (a);
however, we can eliminate its empty future return. Fact (b) will cover for
that, until all such lambdas are converted to direct "void" returns (at
which point we can remove the futurization from
test_using_reusable_sst()).

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
95cf16708d test/sstable: rewrite full_index_search() with async()
For better readability, replace future::then() chaining with
future::get(). (We're already in seastar::thread context.)

This patch is best viewed with "git show -b".

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
2a27d5b344 test/sstable: simplify find_key*(), all_in_place() under test_using_reusable_sst()
According to the earlier patch "test/sstable: rewrite test_using_reusable_sst()
with async" in this series, lambdas passed to test_using_reusable_sst()
are invoked:

(a) less importantly here, in seastar::thread context,

(b) more importantly here, futurized (temporarily so).

The test cases find_key_map(), find_key_set(), find_key_list(),
find_key_composite(), all_in_place() don't chain futures; therefore they
need no conversion to future::get() for purpose (a); however, we can
eliminate their empty future returns. Fact (b) will cover for that, until
all such lambdas are converted to direct "void" returns (at which point we
can remove the futurization from test_using_reusable_sst()).

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
d22bd93abb test/sstable: rewrite (un)compressed_random_access_read() with async()
For better readability, replace future::then() chaining with
future::get(). (We're already in seastar::thread context.)

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
6e35e584c8 test/sstable: simplify write_and_validate_sst()
All three lambdas passed to write_and_validate_sst() now use future::get()
rather than future::then() chaining; in other words, the future::get()
calls inside all these seastar::thread contexts have been pushed down to
the lambdas. Change all these lambdas' return types from future<> to void.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
8819b3f134 test/sstable: simplify check_toc_func() under async()
The lambda passed to write_and_validate_sst() already runs in
seastar::thread context; replace future::then() chaining with
future::get() calls.

We're going to eliminate the trailing "return make_ready_future<>()"
later.

This patch is best viewed with "git show -W -b".

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
de56883a17 test/sstable: simplify check_statistics_func() under async()
The lambda passed to write_and_validate_sst() already runs in
seastar::thread context; replace future::then() chaining with
future::get() calls.

We're going to eliminate the trailing "return make_ready_future<>()"
later.

This patch is best viewed with "git show -W -b".

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
1a85412f96 test/sstable: simplify check_summary_func() under async()
The lambda passed to write_and_validate_sst() already runs in
seastar::thread context; replace future::then() chaining with
future::get() calls.

We're going to eliminate the trailing "return make_ready_future<>()"
later.

This patch is best viewed with "git show -W -b".

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
7b21bce1ca test/sstable: coroutinize check_component_integrity()
check_component_integrity() does not rely on any deferred close or stop
operations; turn it into a coroutine therefore, for best readability.

This conversion demonstrates particularly well how much the stack eases
coding. We no longer need to artificially extend the lifetime of "tmp"
with a final

  .then([tmp] {})

future. Consequently, "tmp" no longer needs to be a shared pointer to an
on-heap "tmpdir" object; "tmp" can just be a "tmpdir" object on the stack.

While at it, eliminate the single-use local objects "s" and "gen", for
movability's sake. (We could use std::move() on these variables, but it
seems easier to just flatten the function calls that produce the
corresponding rvalues into the write_sst_info() argument list.)

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
caca13fe28 test/sstable: rewrite write_sst_info() with async()
For better readability, replace future::then() chaining with
future::get().

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
cfe92ee203 test/sstable: simplify missing_summary_first_last_sane()
The lambda passed to test_using_reusable_sst() is now invoked --
futurized, transitorily -- in seastar::thread context; stop returning an
explicit make_ready_future<>() from the lambda.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
10ebc0a2d2 test/sstable: coroutinize summary_query_fail()
summary_query_fail() does not rely on any deferred close or stop
operations; turn it into a coroutine therefore, for best readability.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
a403ad0703 test/sstable: rewrite summary_query() with async()
For better readability, replace future::then() chaining with
future::get(). (We're already in seastar::thread context.)

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
3a57a7cfea test/sstable: coroutinize (simple/composite)_index_read()
simple_index_read() and composite_index_read() do not rely on any deferred
close or stop operations; turn them into coroutines therefore, for best
readability.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
eeeab1110a test/sstable: rewrite index_read() with async()
For better readability, replace future::then() chaining with
future::get(). (We're already in seastar::thread context.)

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
17d4fac669 test/sstable: rewrite test_using_reusable_sst() with async()
Improve the readability of test_using_reusable_sst() by replacing
future::then() chaining with test_env::do_with_async() and future::get().

Unlike seastar::async(), test_env::do_with_async() restricts its input
lambda to returning "void". Because of this, introduce the variant
test_using_reusable_sst_returning(), based on
test_env::do_with_async_returning(), for lambdas returning non-void. Put
the latter to use in index_read() at once.

Subsequently, we'll gradually convert the lambdas passed to
test_using_reusable_sst() and test_using_reusable_sst_returning() from
returning futures to returning direct values. In order for
test_using_reusable_sst() and test_using_reusable_sst_returning() to cope
with both types of lambdas, wrap the lambdas into futurize_invoke().get().
In the seastar::thread context, future::get() will gracefully block on
genuine futures, and return immediately on direct values that were
futurized on the spot.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Laszlo Ersek
79a8a6c638 test/sstable: rewrite test_using_working_sst() with async()
Make test_using_working_sst() easier to read by:

(1) replacing test_env::do_with() with seastar::async(),
    seastar::defer(), and future::get();

(2) replacing seastar::async() and seastar::defer() with
    test_env::do_with_async().

Technically speaking, this change does not perfectly preserve exceptional
behavior. Namely, test_env::do_with() uses future::finally() to link
test_env::stop() to the chain of futures, and future::finally() permits
test_env::stop() itself to throw an exception -- potentially leading to a
seastar::nested_exception being thrown, which would carry both the
original exception and the one thrown by test_env::stop().

Contrarily, the test_env::stop() deferred with seastar::defer() runs in a
destructor, and therefore test_env::stop() had better not throw there.

However, we will assume that test_env::stop() does not throw, albeit not
marked "noexcept". Prior commits 8d704f2532 ("sstable_test_env:
Coroutinize and move to .cc test_env::stop()", 2023-10-31) and
2c78b46c78 ("sstables::test_env: Carry compaction manager on board",
2023-10-31) show that we've considered individual actions in
test_env::stop() not to throw before.

The 128KB stack of seastar::thread (which underlies seastar::async())
should be a tolerable cost in a test case, in exchange for the improved
readability.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-04 15:35:51 +02:00
Kefu Chai
0660675387 utils/div_ceil: add constraints to template arguments
to better reflect what we expect from the arguments.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20003
2024-08-04 15:32:01 +03:00
Aleksandra Martyniuk
2ab56b7f56 repair: use find_column_family in insert_repair_meta
repair_service::insert_repair_meta gets the reference to a table
and passes it to continuations. If the table is dropped in the meantime,
the reference becomes invalid.

Use find_column_family at each table occurrence in insert_repair_meta
instead.

Closes scylladb/scylladb#19953
2024-08-04 13:56:38 +03:00
Kefu Chai
571ae0ac96 docs: link to current document instead of the github wiki
before this change, the hyperlink brings us to a GitHub wiki page,
which just points the reader to
https://docs.scylladb.com/operating-scylla/snitch/.
this is not a great user experience.

so, in this change, we just reference the document in the current build.
more efficient this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19952
2024-08-04 11:47:21 +03:00
Kefu Chai
f7556edc65 build: cmake: define SCYLLA_ENABLE_PREEMPTION_SOURCE for dev build
in fabab2f4, we introduced preemption_source, and added the
`SCYLLA_ENABLE_PREEMPTION_SOURCE` preprocessor macro to opt in to
the pluggable preemption check.

but CMake building system was not updated accordingly.

so, in this change, let's sync the CMake building system with
`configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19951
2024-08-04 11:46:28 +03:00
Yaron Kaikov
8221a178d8 Revert "dist: support nonroot and offline mode for scylla-housekeeping"
This reverts commit c3bea539b6.

It breaks the offline-installer artifact tests. Also, it seems that we should not have merged it in the first place, since we don't need scylla-housekeeping checks for the offline installer.

Closes scylladb/scylladb#19976
2024-08-04 10:55:26 +03:00
Aleksandra Martyniuk
c456a43173 compaction: replace optional<task_info> with task_info param
compaction_manager::perform_compaction does not create task manager
task for compaction if parent_info is set to std::nullopt. Currently,
we always want to create task manager task for compaction.

Remove optional from task info parameters which start compaction.
Track all compactions with task manager.
2024-08-02 14:38:46 +02:00
Aleksandra Martyniuk
108d0344b8 compaction: keep split executor in task manager
If perform_compaction gets std::nullopt as a parent info then
the executor won't be tracked by task manager.

Modify storage_group::split call so that it passes empty task_info
instead of nullopt to track split.
2024-08-02 12:45:32 +02:00
Wojciech Mitros
543dab9e88 mv: test the view update behavior
With the recently added mv admission control, we can now
test how the view update backlogs are updated and propagated
without relying just on the response delays that it was causing
until now.
This patch adds a test for it, replicating issues scylladb/scylladb#18461
and scylladb/scylladb#18783.
In the test, we start with an empty view update backlog, then perform
a write to it, increasing its backlog and saving the updated backlog
on coordinator, the backlog then drops back to 0, we wait 1s for the
backlog to be gossiped and we perform another write which should
succeed.
Due to scylladb/scylladb#18461, the test would fail because
in both gossip rounds before and after the write, the backlog was empty,
causing the write to be blocked by admission control indefinitely.
Due to scylladb/scylladb#18783, the test would fail because when
the backlog drops back to 0 after the write, the change is never
registered, causing all writes to be blocked as well.
2024-08-02 12:12:24 +02:00
Wojciech Mitros
795ac177c2 mv: add test for admission control
In this patch we add 2 tests for checking that the mv admission control works.
The first one simply checks whether, after increasing the backlog on one node
over the admission control threshold, the following request is rejected with
the error message corresponding to the admission control.
The second one checks whether, after triggering admission control, the entire
user request fails instead of just failing a replica write. This is done by
performing a number of writes, some of which trigger the admission control
and cause retries, then checking if the node that had a large view update backlog
received all the writes. Before, the writes would succeed on enough replicas,
reaching QUORUM, and allowing the user write to succeed and cause no retries,
even though on the replica with a high backlog the write got rejected due to
the backlog size.
2024-08-02 12:12:24 +02:00
Wojciech Mitros
a55b7688b6 storage_proxy: return overloaded_exception instead of throwing
To avoid an expensive stack unwind, instead of throwing an error,
we can just return it thanks to the boost::result type that the
affected methods use. The result with an exception needs to be
constructed not implicitly, but with boost::outcome_v2::failure,
because the exception, once converted into coordinator_exception_container,
can then be converted both into a successful response_id_type
and into a failure.
2024-08-02 12:12:24 +02:00
Wojciech Mitros
5eaae05aaf mv: reject user requests by coordinator when a replica is overloaded by MVs
Currently, when a replica's view update backlog is full, the write is still
sent by the coordinator to all replicas. Because of the backlog, the write
fails on the replica, causing inconsistency that needs to be fixed by repair.
To avoid these inconsistencies, this patch adds a check on the coordinator
for overloaded replicas. As a result, a write may be rejected before being
sent to any replicas and later retried by the user, when the replica is no
longer overloaded.
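
The coordinator-side check described above can be sketched as follows. This is a minimal illustration in Python, not the storage_proxy code: the names `coordinate_write`, `OverloadedError`, the `threshold` parameter and the shape of `replica_backlogs` are all hypothetical.

```python
class OverloadedError(Exception):
    """Hypothetical stand-in for the coordinator's "overloaded" rejection."""


def coordinate_write(replica_backlogs, threshold, send):
    """Send a write to every replica, or to none of them.

    replica_backlogs maps a replica name to its last known view update
    backlog, expressed here (as an assumption) as a fraction in 0.0..1.0.
    """
    overloaded = sorted(r for r, b in replica_backlogs.items() if b >= threshold)
    if overloaded:
        # Reject before anything is sent, so no replica diverges.
        raise OverloadedError(f"replicas overloaded by view updates: {overloaded}")
    for replica in replica_backlogs:
        send(replica)
```

Because the check runs before the first `send()`, a rejected write leaves all replicas untouched and the client can simply retry later.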

Fixes scylladb/scylladb#17426
2024-08-02 12:12:19 +02:00
Piotr Dulikowski
39b49a41cc Merge 'mv: delete a partition in a single operation when applicable' from Michael Litvak
Currently when a partition is deleted from the base table, we generate a
row tombstone update for each one of the view rows in the partition.

When the partition key in the view is the same as the base, maybe in a
different order, this can be done more efficiently - The whole corresponding
view partition can be deleted with one partition tombstone update.

With this commit, when generating view updates, if the update mutation has a
partition tombstone then for the views which have the same partition key
we will generate a partition tombstone update, and skip the individual
row tombstone updates.
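
A minimal sketch of the decision described above, in Python for brevity. The function name, the tuple-based update representation and the argument layout are hypothetical and only illustrate the rule "same partition key columns, possibly reordered, means one partition tombstone".

```python
def view_deletion_updates(base_pk_columns, view_pk_columns, base_pk_values, view_rows):
    """Return the view updates for a base-table partition deletion."""
    if sorted(base_pk_columns) == sorted(view_pk_columns):
        # Same columns, maybe in a different order: reorder the key values
        # and delete the whole view partition in one update.
        by_column = dict(zip(base_pk_columns, base_pk_values))
        view_key = tuple(by_column[c] for c in view_pk_columns)
        return [("partition_tombstone", view_key)]
    # Otherwise fall back to one row tombstone per affected view row.
    return [("row_tombstone", row) for row in view_rows]
```

In the first branch the view rows are never consulted, which is what makes skipping the row reads possible.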

Fixes scylladb/scylladb#8199

Closes scylladb/scylladb#19338

* github.com:scylladb/scylladb:
  mv: skip reading rows when generating partition tombstone update
  mv: delete a partition in a single operation when applicable
  cql-pytest: move ScyllaMetrics to util file to allow reuse
2024-08-02 11:00:18 +02:00
Michael Litvak
0f5e8c52ad db: test counter update while table is dropped
Add a test that drops a table while there is a counter update operation
ongoing in the table.
The test reproduces issue scylladb/scylla-enterprise#4475 and verifies
it is fixed.
2024-08-01 22:23:17 +03:00
Avi Kivity
99d0aaa7d2 Merge 'tablets: load_balancer: Improve per-table balance' from Tomasz Grabiec
Tablet load balancer tries to equalize tablet load between shards by
moving tablets. Currently, the tablet load balancer assumes that each
tablet has the same hotness. This may not be true, and some tables may
be hotter than others. If some nodes end up getting more tablets of
the hot table, we can end up with request load imbalance and reduced
performance.

In 79d0711c7e we implemented a
mitigation for the problem by randomly choosing the table whose tablet
replica should be moved. This should improve fairness of
movement. However, this proved to not be enough to get a good
distribution of tablets.

This change improves candidate selection to not rely on randomness
but rather to evaluate candidates with respect to the impact on load
imbalance.  Also, if there is no good candidate, we consider picking
other source shards, not the most-loaded one. This is helpful because
when finishing node drain we get just a few candidates per shard, all
of which may belong to a single table, and the destination may already
be overloaded with that table. Another shard may contain tablets of
another table which is not yet overloaded on the destination. And
shards may be of similar load, so it doesn't matter much which shard
we choose to unload.

We also consider other destinations, not the least-loaded one. This
helps when draining nodes and the source node has few shard
candidates. Shards on the destination may have similar load so there
is more than one good destination candidate. By limiting ourselves to a
single shard, we increase the chance that we'll overload the table on
that shard.

The algorithm was evaluated using "scylla perf-load-balancing", which
simulates a sequence of 8 node bootstraps and decommissions for
different node and shard counts, RF, and tablet counts.

For example, for the following parameters:

  params: {iterations=8, nodes=5, tablets1=128 (2.4/sh), tablets2=512 (9.6/sh), rf1=3, rf2=3, shards=32}

The results are:

Before:

  Overcommit (old) : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
  Overcommit (old) : worst: {table1={shard=4.00 (best=1.25), node=1.81}, table2={shard=1.25 (best=1.04), node=1.11}}
  Overcommit (old) : last : {table1={shard=2.50 (best=1.25), node=1.41}, table2={shard=1.25 (best=1.04), node=1.05}}

After:

  Overcommit       : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
  Overcommit       : worst: {table1={shard=1.50 (best=1.25), node=1.02}, table2={shard=1.12 (best=1.04), node=1.01}}
  Overcommit       : last : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}

So worst shard overcommit for table1 was reduced from 4 to 1.5. Overcommit
of 4 means that the most-loaded shard has 4 times more tablets than
the average per-shard load in the cluster.
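
The overcommit metric defined above can be written down as a short Python sketch. The helper names are hypothetical, and the formula for the "best" value is an assumption: it is not stated in this message, but it reproduces the best=1.25 and best=1.04 figures reported for the run above.

```python
def overcommit(tablets_per_shard):
    """Most-loaded shard's tablet count divided by the average per-shard count."""
    average = sum(tablets_per_shard) / len(tablets_per_shard)
    return max(tablets_per_shard) / average


def best_possible_overcommit(tablet_replicas, shard_count):
    """Overcommit of a perfectly even distribution (assumed definition).

    Tablet replicas are indivisible, so at least one shard must hold
    ceil(tablet_replicas / shard_count) of them.
    """
    fullest = -(-tablet_replicas // shard_count)  # integer ceiling division
    return fullest * shard_count / tablet_replicas


# table1 in the run above: 128 tablets * RF 3, spread over 5 nodes * 32 shards.
print(best_possible_overcommit(128 * 3, 5 * 32))  # 1.25
print(overcommit([4, 0, 0, 0]))                   # 4.0
```

With 2.4 tablet replicas per shard on average, some shard has to hold 3, hence 3 / 2.4 = 1.25 even in the ideal case.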

Also, node overcommit for table1 was reduced from 1.81 to 1.02.

The magnitude of improvement depends greatly on test configuration, that is, on topology and tablet distribution.

The algorithm is not perfect, it finds a local optimum. In the above
test, overcommit of 1.5 is not the best possible (1.25).

One of the reasons why the current algorithm doesn't achieve the best
distribution is that it works with a single movement at a time and
replication constraints limit the choice of destinations. Viable
destinations for remaining candidates may be only on nodes which are
not least-loaded, and we won't be able to fill the least loaded
node. Doing so would require more complex movement involving moving a
tablet from one of the destination nodes which doesn't have a replica
on the least loaded node and then replacing it with the candidate from
the source node.

Another limitation is that the algorithm can only fix balance by
moving tablets away from most loaded nodes, and it does so due to
imbalance between nodes. So it cannot fix the imbalance which is
already present on the nodes if there is not much to move due to
similar load between nodes. It is designed to not make the imbalance
worse, so it works well if we started in a good shape.

Fixes https://github.com/scylladb/scylladb/issues/16824

Closes scylladb/scylladb#19779

* github.com:scylladb/scylladb:
  test: perf: tablet_load_balancing: Test with higher shard and tablet counts
  tablets: load_balancer: Avoid quadratic complexity when finding best candidate
  tablets: load_balancer: Maintain load sketch properly during intra-node migration
  tablets: load_balancer: Use "drained" flag
  test: perf: tablet_load_balancing: Report load balancer stats
  tablets: load_balancer: Move load_balancer_stats_manager to header file
  tablets: load_balancer: Split evaluate_candidate() into src and dst part
  tablets: load_balancer: Optimize evaluate_candidate()
  tablets: load_balancer: Add more statistics
  tablets: load_balancer: Track load per table on cluster level
  tablets: load_balancer: Track load per table on node level
  tablets: load_balancer: Use a single load sketch for tracking all nodes
  locator: load_sketch: Introduce populate_dc()
  tablets: load_balancer: Modify target load sketch only when emitting migration
  locator: load_sketch: Introduce get_most_loaded_shard()
  locator: load_sketch: Introduce get_least_loaded_shard()
  locator: load_sketch: Optimize pick()/unload()
  locator: load_sketch: Introduce load_type
  test: perf: tablet_load_balancing: Report total tablet counts
  test: perf: tablet_load_balancing: Print run parameters in the single simulation case too
  test: perf: tablet_load_balancing: Report time it took to schedule migrations
  tablets: load_balancer: Log table load stats after each migration
  tablets: load_balancer: Log per-shard load distribution in debug level
  tablets: load_balancer: Improve per-table balance
  tablets: load_balancer: Extract check_convergence()
  tablets: load_balancer: Extract nodes_by_load_cmp
  tablets: load_balancer: Maintain tablet count per table
  tablets: load_balancer: Reuse src_node_info
  test: perf: tablet_load_balancing: Print warnings about bad overcommit
  test: perf: tablet_load_balancing: Allow running a single simulation
  test: perf: tablet_load_balancing: Report best possible shard overcommit
  test: perf: tablet_load_balancing: Report global shard overcommit
2024-08-01 21:12:14 +03:00
Michael Litvak
22b282f5c5 db: coroutinize do_apply_counter_update
rewrite the function as coroutine to make it easier to read and maintain,
following lifetime issues we had and fixed in this function.
2024-08-01 19:09:04 +03:00
Anna Stuchlik
9972e50134 doc: add the 6.0-to-6.1 upgrade guide
This commit adds the 6.0-to-6.1 upgrade guide.

Compared to the previous upgrade guide:

- Added the "Ensure Consistent Topology Changes Are Enabled" prerequisite.
- Removed the "After Upgrading Every Node" section. Both Raft-based schema changes and topology updates
  are mandatory in 6.1 and don't require any user action after upgrading to 6.1.
- Removed the "Validate Raft Setup" section. Raft was enabled in all 6.0 clusters (for schema management),
  so now there's no scenario that would require the user to follow the validation procedure.
2024-08-01 14:58:14 +02:00
Piotr Smaron
0ea2128140 cql: refactor rf_change indentation 2024-08-01 14:37:53 +02:00
Piotr Smaron
5b089d8e10 Prevent ALTERing non-existing KS with tablets
ALTER tablets KS executes in 2 steps:
1. ALTER KS's cql handler forms a global topo req, and saves data required
   to execute this req,
2. global topo req is executed by topo coordinator, which reads data
   attached to the req.

The KS name is among the data attached to the req.
There's a time window between these steps where a to-be-altered KS could
have been DROPped, which results in topo coordinator forever trying to
ALTER a non-existing KS. In order to avoid it, the code has been changed
to first check if a to-be-altered KS exists, and if it's not the case,
it doesn't perform any schema/tablets mutations, but just removes the
global topo req from the coordinator's queue.
BTW. just adding this extra check resulted in broader than expected
changes, which is due to the fact that the code is written badly and
needs to be refactored - an effort that's already planned under #19126
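
The existence check can be sketched as below. This is a simplified Python model, not the topology coordinator code: the request dictionary, the `keyspaces` map and the `queue` list are hypothetical stand-ins.

```python
def execute_alter_keyspace_request(request, keyspaces, queue):
    """Apply a queued ALTER KEYSPACE request; return True if it was applied.

    keyspaces maps a keyspace name to its options, and queue is the list of
    pending global requests (both are simplifications for illustration).
    """
    # The request leaves the queue in either case, so it is never retried forever.
    queue.remove(request)
    name = request["keyspace"]
    if name not in keyspaces:
        # The keyspace was dropped between the two steps: mutate nothing.
        return False
    keyspaces[name].update(request["options"])
    return True
```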

Fixes: #19576
2024-08-01 14:37:53 +02:00
Piotr Dulikowski
44f327675d Merge 'Remove gossiper argument from storage_service::join_cluster()' from Pavel Emelyanov
It's only needed to start hints via proxy, but proxy can do it without gossiper argument

Closes scylladb/scylladb#19894

* github.com:scylladb/scylladb:
  storage_service: Remote gossiper argument from join_cluster()
  proxy: Use remote gossiper to start hints resource manager
  hints: Const-ify gossiper references and anchor pointers
2024-08-01 10:18:14 +02:00
Michael Litvak
c944e28e43 db: fix waiting for counter update operations on table stop
When a table is dropped it should wait for all pending operations in the
table before the table is destroyed, because the operations may use the
table's resources.
With counter update operations, currently this is not the case. The
table may be destroyed while there is a counter update operation in
progress, causing an assert to be triggered due to a resource being
destroyed while it's in use.
The reason the operation is not waited for is a mistake in the lifetime
management of the object representing the write in progress. The commit
fixes it so the object lives for the duration of the entire counter
update operation, by moving it to the `do_with` list.

Fixes scylladb/scylla-enterprise#4475

Closes scylladb/scylladb#19948
2024-08-01 09:39:49 +02:00
Nadav Har'El
5411559a94 test/cql-pytest: test ALLOW FILTERING in intersection of two indexes
A user complained that ScyllaDB is incompatible with Cassandra when it
requires ALLOW FILTERING on a restriction like WHERE x=1 AND y=1 where
x and y are two columns with secondary indexes.

In the tests added in this patch we show that:

1. Scylla *is* compatible with Cassandra when the traditional "CREATE
   INDEX" is used - ALLOW FILTERING *is* required in this case in both
   Cassandra and Scylla.

2. If SAI is used in Cassandra (CREATE CUSTOM INDEX USING 'SAI'),
   indeed ALLOW FILTERING becomes optional. I believe this is incorrect
   so I opened CASSANDRA-19795.

These two tests combined show that we're not incompatible with Cassandra,
rather Cassandra's two index implementations are incompatible between
themselves, and Scylla is in fact compatible in this case with Cassandra's
traditional index and not with SAI.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19909
2024-07-31 14:01:29 +03:00
Laszlo Ersek
e67eb0ccc1 test/sstable: coroutinize do_write_sst()
Make do_write_sst() easier to read by coroutinizing it.

Closes #19803.

Suggested-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>

Closes scylladb/scylladb#19937
2024-07-31 13:59:26 +03:00
Kefu Chai
020333fcf1 sstables: fix a typo in comment
s/guranteed/guaranteed/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19946
2024-07-31 13:58:09 +03:00
Tomasz Grabiec
28de5231f4 test: perf: tablet_load_balancing: Test with higher shard and tablet counts
We have up to 200 shards in production, so test this to catch
performance issues.
2024-07-31 12:57:15 +02:00
Tomasz Grabiec
19b7fb3a4d tablets: load_balancer: Avoid quadratic complexity when finding best candidate
If the source and destination shards picked for migration based on
global tablet balance do not have a good candidate in terms of effect
on per-table balance, the algorithm explores other source shards and
destinations. This has quadratic complexity in terms of shard count in
the worst case, when there are no good candidates.

Since we can have up to ~200 shards, this can slow down scheduling
significantly. I saw total scheduling time of 5 min in the following run:

 scylla perf-load-balancing -c1 -m1G  --iterations=8 \
    --nodes=4 --tablets1=1024 --tablets2=8096 \
    --rf1=2 --rf2=3 --shards=256

To improve, change the approach to first find the best source shard and
then best target shard, sequentially. So it's now linear in terms of
shard count.

After the change, the total scheduling time in that run is down to 4s.

Minimizing source and destination metrics piece-wise minimizes the
combined metric, so badness of the best candidate doesn't suffer after
this change.
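
To illustrate why the sequential search loses nothing, here is a small Python sketch. It assumes the combined metric is the sum of a per-source and a per-destination badness, which is one way to read the statement above; the function names and the dictionaries of badness values are hypothetical.

```python
from itertools import product


def best_pair_quadratic(src_badness, dst_badness):
    """Evaluate every (source, destination) pair: O(S * D) evaluations."""
    return min(product(src_badness, dst_badness),
               key=lambda pair: src_badness[pair[0]] + dst_badness[pair[1]])


def best_pair_linear(src_badness, dst_badness):
    """Pick the best source, then the best destination: O(S + D) evaluations."""
    src = min(src_badness, key=src_badness.get)
    dst = min(dst_badness, key=dst_badness.get)
    return src, dst


src = {0: 0.7, 1: 0.2, 2: 0.5}   # shard -> badness of taking a tablet from it
dst = {3: 0.4, 4: 0.1}           # shard -> badness of placing a tablet on it
assert best_pair_linear(src, dst) == best_pair_quadratic(src, dst) == (1, 4)
```

When the two terms are independent, the minimum of the sum is reached at the individual minima, so both searches agree on the combined badness.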
2024-07-31 12:57:15 +02:00
Tomasz Grabiec
93df82032f tablets: load_balancer: Maintain load sketch properly during intra-node migration
Affects only intra-node migration. The code was recording destination
shard as taken and did not un-take it in case we skipped the migration
due to lack of candidates.

Noticed during code review. Impact is minor, since even if this leads
to suboptimal balance, the next scheduling round should fix it.

Also, the source shard was not unloaded, but that should have no
impact on decisions. But to be future-proof, better to maintain the
load accurately in case the algorithm is extended with more steps.
2024-07-31 12:57:15 +02:00
Tomasz Grabiec
88988ce0db tablets: load_balancer: Use "drained" flag
Cleanup / optimization.
2024-07-31 12:57:15 +02:00
Tomasz Grabiec
56801b7cb7 test: perf: tablet_load_balancing: Report load balancer stats 2024-07-31 12:57:15 +02:00
Tomasz Grabiec
90c9934099 tablets: load_balancer: Move load_balancer_stats_manager to header file
So that stats can be accessed outside tablet allocator.
2024-07-31 12:57:15 +02:00
Anna Stuchlik
ae28880fc8 doc: enable publishing docs for branch-6.1
This commit enables publishing documentation from branch-6.1. The docs will be published as UNSTABLE (the warning about version 6.1 being unstable will be displayed).

Fixes https://github.com/scylladb/scylladb/issues/19926

No backport is required.

Closes scylladb/scylladb#19931
2024-07-31 12:48:51 +02:00
Kamil Braun
c05e077a13 Merge 'raft: fix the shutdown phase being stuck' from Emil Maskovsky
Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it.

This was in particular the case of operations that try to take the ownership of the raft group semaphore (`get_units(semaphore)`) - these waits should be cancelled when the abort source is triggered.

This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100):
* TestRepairAdditional::test_repair_kill_1
* TestRepairAdditional::test_repair_kill_3

Fixes scylladb/scylladb#19223

Closes scylladb/scylladb#19860

* github.com:scylladb/scylladb:
  raft: fix the shutdown phase being stuck
  raft: use the abort source reference in raft group0 client interface
2024-07-31 12:10:30 +02:00
Pavel Emelyanov
93ed978729 view_builder: Drop unused members
There's a counter and a shared future on board, that used to facilitate
start-time barrier synchronization. Now they are not needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-31 12:59:40 +03:00
Pavel Emelyanov
613161c7b9 view_builder: Use cross-shard barrier on start
When starting, view builder spawns an async background fiber, and upon
its completion each shard needs to wait for other shards to do the same.
This is exactly what cross-shard barrier is about, so instead of
synchronizing via v.b.'s shard-0 instance, use the barrier. This makes
the view_builder::start() shorter and easier to read.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-31 12:56:25 +03:00
Pavel Emelyanov
fb1b749445 view_builder: Add cross-shard barrier to its .start() method
The barrier will be used by next patch to synchronize shards with each
other. When passed to invoke_on_all() lambda like this, each lambda gets
its own copy of the barrier "handler" that maintains shared state across
shards.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-31 12:54:28 +03:00
Tomasz Grabiec
94cce4b7d3 tablets: load_balancer: Split evaluate_candidate() into src and dst part
Those parts will be used separately later.
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
4df2abe47a tablets: load_balancer: Optimize evaluate_candidate()
Moves load computation out of the hot path by relying on
data structures maintained globally during plan making.
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
5e7facd543 tablets: load_balancer: Add more statistics 2024-07-31 11:38:17 +02:00
Tomasz Grabiec
be055977c9 tablets: load_balancer: Track load per table on cluster level 2024-07-31 11:38:17 +02:00
Tomasz Grabiec
81fcee2040 tablets: load_balancer: Track load per table on node level 2024-07-31 11:38:17 +02:00
Tomasz Grabiec
e7ef7419dc tablets: load_balancer: Use a single load sketch for tracking all nodes
This is code simplification and optimization.

Avoids multiple passes over tablet metadata to construct load sketch for
each target node.
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
352b8e0ddd locator: load_sketch: Introduce populate_dc() 2024-07-31 11:38:17 +02:00
Tomasz Grabiec
9a7afd334b tablets: load_balancer: Modify target load sketch only when emitting migration
This avoids the need to unpick() a replica when the candidate is not
selected. Optimization.
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
b78657ce7d locator: load_sketch: Introduce get_most_loaded_shard() 2024-07-31 11:38:17 +02:00
Tomasz Grabiec
de404471b7 locator: load_sketch: Introduce get_least_loaded_shard() 2024-07-31 11:38:17 +02:00
Tomasz Grabiec
8fbfd595bb locator: load_sketch: Optimize pick()/unload()
They are executed frequently during tablet scheduling. Currently, they
have time complexity of O(N*log(N)) in terms of shard count. With
large shard counts, that has significant overhead.

This patch optimizes them down to O(log(N)).
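
One way to get logarithmic pick()/unload() is a binary heap keyed by load, sketched below in Python. This is an illustration only and not the load_sketch implementation: the class, its methods and the lazy handling of stale heap entries are assumptions made for the example.

```python
import heapq


class LoadSketch:
    """Tracks per-shard tablet counts with a min-heap of (load, shard)."""

    def __init__(self, shard_loads):
        self._load = list(shard_loads)
        self._heap = [(load, shard) for shard, load in enumerate(self._load)]
        heapq.heapify(self._heap)

    def pick(self):
        """Return the least-loaded shard and count one more tablet on it."""
        while True:
            load, shard = heapq.heappop(self._heap)
            if load == self._load[shard]:
                break  # entry is current; stale entries are simply dropped
        self._load[shard] += 1
        heapq.heappush(self._heap, (self._load[shard], shard))
        return shard

    def unload(self, shard):
        """Count one tablet less on the given shard."""
        self._load[shard] -= 1
        # The old heap entry becomes stale and is skipped later by pick().
        heapq.heappush(self._heap, (self._load[shard], shard))

    def load(self, shard):
        return self._load[shard]
```

Each call does a constant number of heap pushes, and every stale entry is popped at most once, so the amortized cost per operation is logarithmic in the heap size rather than a full re-sort of all shards.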
2024-07-31 11:38:17 +02:00
Tomasz Grabiec
d0b0f95849 locator: load_sketch: Introduce load_type 2024-07-31 11:38:17 +02:00
Tomasz Grabiec
8f3b623144 test: perf: tablet_load_balancing: Report total tablet counts 2024-07-31 11:38:17 +02:00
Tomasz Grabiec
662a0ff038 test: perf: tablet_load_balancing: Print run parameters in the single simulation case too 2024-07-31 11:38:16 +02:00
Tomasz Grabiec
a040404875 test: perf: tablet_load_balancing: Report time it took to schedule migrations 2024-07-31 11:38:16 +02:00
Tomasz Grabiec
ae7fd80554 tablets: load_balancer: Log table load stats after each migration 2024-07-31 11:38:16 +02:00
Tomasz Grabiec
b8996a0f59 tablets: load_balancer: Log per-shard load distribution in debug level 2024-07-31 11:38:16 +02:00
Tomasz Grabiec
469e2f3f90 tablets: load_balancer: Improve per-table balance
Tablet load balancer tries to equalize tablet load between shards by
moving tablets. Currently, the tablet load balancer assumes that each
tablet has the same hotness. This may not be true, and some tables may
be hotter than others. If some nodes end up getting more tablets of
the hot table, we can end up with request load imbalance and reduced
performance.

In 79d0711c7e we implemented a
mitigation for the problem by randomly choosing the table whose tablet
replica should be moved. This should improve fairness of
movement. However, this proved to not be enough to get a good
distribution of tablets.

This change improves candidate selection to not rely on randomness
but rather to evaluate candidates with respect to their impact on load
imbalance.  Also, if there is no good candidate, we consider picking
other source shards, not the most-loaded one. This is helpful because
when finishing node drain we get just a few candidates per shard, all
of which may belong to a single table, and the destination may already
be overloaded with that table. Another shard may contain tablets of
another table which is not yet overloaded on the destination. And
shards may be of similar load, so it doesn't matter much which shard
we choose to unload.

We also consider other destinations, not the least-loaded one. This
helps when draining nodes and the source node has few shard
candidates. Shards on the destination may have similar load so there
is more than one good destination candidate. By limiting ourselves to a
single shard, we increase the chance that we'll overload the table on
that shard.

The algorithm was evaluated using "scylla perf-load-balancing", which
simulates a sequence of 8 node bootstraps and decommissions for
different node and shard counts, RF, and tablet counts.

For example, for the following parameters:

  params: {iterations=8, nodes=5, tablets1=128 (2.4/sh), tablets2=512 (9.6/sh), rf1=3, rf2=3, shards=32}

The results are:

After:

  Overcommit       : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
  Overcommit       : worst: {table1={shard=1.50 (best=1.25), node=1.02}, table2={shard=1.12 (best=1.04), node=1.01}}
  Overcommit       : last : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}

Before:

  Overcommit (old) : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
  Overcommit (old) : worst: {table1={shard=4.00 (best=1.25), node=1.81}, table2={shard=1.25 (best=1.04), node=1.11}}
  Overcommit (old) : last : {table1={shard=2.50 (best=1.25), node=1.41}, table2={shard=1.25 (best=1.04), node=1.05}}

So shard overcommit for table1 was reduced from 4 to 1.5. Overcommit
of 4 means that the most-loaded shard has 4 times more tablets than
the average per-shard load in the cluster.
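
The overcommit definition above can be sketched as follows. The function
names are hypothetical. The "best" formula (ceiling of the average divided
by the average) is an assumption inferred from the reported numbers, not
something the commit states; it does reproduce best=1.25 for table1
(128 tablets, RF=3, 5 nodes with 32 shards each):

```python
import math


def shard_overcommit(tablets_per_shard):
    """Load of the most-loaded shard divided by the average per-shard load."""
    average = sum(tablets_per_shard) / len(tablets_per_shard)
    if average == 0:
        raise ValueError("overcommit is undefined when there are no tablets")
    return max(tablets_per_shard) / average


def best_shard_overcommit(replica_count, shard_count):
    """Assumed lower bound: some shard must hold at least ceil(average)."""
    average = replica_count / shard_count
    return math.ceil(average) / average
```

For example, shard_overcommit([8, 0, 4, 4]) is 2.0, since the busiest shard
holds 8 tablet replicas against an average of 4.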

Also, node overcommit for table1 was reduced from 1.81 to 1.02.

The magnitude of improvement depends greatly on test configuration, that is, on topology and tablet distribution.

The algorithm is not perfect, it finds a local optimum. In the above
test, overcommit of 1.5 is not the best possible (1.25).

One of the reasons why the current algorithm doesn't achieve best
distribution is that it works with a single movement at a time and
replication constraints limit the choice of destinations. Viable
destinations for remaining candidates may be only on nodes which are
not least-loaded, and we won't be able to fill the least loaded
node. Doing so would require more complex movement involving moving a
tablet from one of the destination nodes which doesn't have a replica
on the least loaded node and then replacing it with the candidate from
the source node.

Another limitation is that the algorithm can only fix balance by
moving tablets away from most loaded nodes, and it does so due to
imbalance between nodes. So it cannot fix the imbalance which is
already present on the nodes if there is not much to move due to
similar load between nodes. It is designed to not make the imbalance
worse, so it works well if we start in good shape.

Fixes #16824
2024-07-31 11:38:16 +02:00
Tomasz Grabiec
b7661aa6c9 tablets: load_balancer: Extract check_convergence()
Will be reused when evaluating different targets for migration in later
stages.

The refactoring drops updating of _stats.for_dc(dc).stop_no_candidates
and we update _stats.for_dc(dc).stop_load_inversion in both cases
where convergence check may fail. The reason is that stat updates must
be outside check_convergence(), since the new use case should not
update those stats (it doesn't stop balancing, just drops
candidates). Propagating the information for distinguishing the two
cases would be a burden. But it's not necessary, since both cases are
actually load inversion cases, one pre-migration the other
post-migration, so we don't need the distinction.

It's actually wrong to increment stop_no_candidates, since there may
still be candidates, it's the load which is inverted.
2024-07-31 11:26:11 +02:00
Tomasz Grabiec
41e643ddb9 tablets: load_balancer: Extract nodes_by_load_cmp
Will be reused in a different place.
2024-07-31 11:26:11 +02:00
Tomasz Grabiec
8a7257971d tablets: load_balancer: Maintain tablet count per table 2024-07-31 11:26:11 +02:00
Tomasz Grabiec
4e4f13ac9d tablets: load_balancer: Reuse src_node_info 2024-07-31 11:26:11 +02:00
Tomasz Grabiec
71b8d6b7aa test: perf: tablet_load_balancing: Print warnings about bad overcommit 2024-07-31 11:26:11 +02:00
Tomasz Grabiec
0d50a028a5 test: perf: tablet_load_balancing: Allow running a single simulation 2024-07-31 11:26:11 +02:00
Tomasz Grabiec
3f3660c3fe test: perf: tablet_load_balancing: Report best possible shard overcommit 2024-07-31 11:26:11 +02:00
Tomasz Grabiec
c89a320925 test: perf: tablet_load_balancing: Report global shard overcommit
Rather than maximum per-node shard overcommit. Global shard overcommit
is a better metric since we want to equalize global load not just
per-node load.
2024-07-31 11:26:11 +02:00
Emil Maskovsky
5dfc50d354 raft: fix the shutdown phase being stuck
Some of the calls inside the `raft_group0_client::start_operation()`
method were missing the abort source parameter. This caused the repair
test to be stuck in the shutdown phase - the abort source has been
triggered, but the operations were not checking it.

This was in particular the case of operations that try to take the
ownership of the raft group semaphore (`get_units(semaphore)`) - these
waits should be cancelled when the abort source is triggered.

This should fix the following tests that were failing in some percentage
of dtest runs (about 1-3 of 100):
* TestRepairAdditional::test_repair_kill_1
* TestRepairAdditional::test_repair_kill_3

Fixes scylladb/scylladb#19223
2024-07-31 09:18:54 +02:00
Emil Maskovsky
2dbe9ef2f2 raft: use the abort source reference in raft group0 client interface
Most callers of the raft group0 client interface are passing a real
source instance, so we can use the abort source reference in the client
interface. This change makes the code simpler and more consistent.
2024-07-31 09:18:54 +02:00
Benny Halevy
82333036f3 cell_locker: maybe_rehash: reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-31 10:06:07 +03:00
Benny Halevy
8853adea96 cell_locker: maybe_rehash: ignore allocation failures
`maybe_rehash` is complementary and is not strictly required to succeed.
If it fails, it will retry on the next call, but there's no reason
to throw a bad_alloc exception that will fail its caller, since `maybe_rehash`
is called as the final step after the caller has already succeeded with
its action.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-31 10:06:06 +03:00
Pavel Emelyanov
9214aecbe7 storage_service: Remove orphan forward declaration of a method
The start_sys_dist_ks() itself was removed by bc051387c5

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19928
2024-07-30 16:17:49 +03:00
Benny Halevy
e58ca8c44b service_level_controller: stop: always call subscription on_abort
We want to call `service_level_controller::do_abort()` in all cases.
The current code (introduced in
535e5f4ae7)
calls do_abort if abort was not requested, however, since
it does so by checking the subscription bool operator,
it would miss the case where abort was already requested
before the subscription took place (in service_level_controller
ctor).

With scylladb/seastar@470b539b1c and
scylladb/seastar@8ecce18c51
we can just unconditionally call the subscription `on_abort`
method, that ensures only-once semantics, even if abort
was already requested at subscription time.

Fixes scylladb/scylladb#19075

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#19929
2024-07-30 13:23:17 +03:00
Kefu Chai
35394c3f9a docs/dev: fix a typo
remove the extraneous "is".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19902
2024-07-30 10:46:25 +03:00
Pavel Emelyanov
97154b0671 Merge 'mapreduce_service: complete coroutinization' from Avi Kivity
mapreduce_server was previously coroutinized, but only partially. This
series completes coroutinization and eliminates remaining continuation chains.

None of this code is performance sensitive as it runs at the super-coordinator level
and is amortized over a full scan of the entire table.

No backport needed as this is a cleanup.

Closes scylladb/scylladb#19913

* github.com:scylladb/scylladb:
  mapreduce_service: reindent
  mapreduce_service: coroutinize retrying_dispatcher::dispatch_to_node()
  mapreduce_service: coroutinize dispatch() inner lambda
2024-07-30 10:44:34 +03:00
Nadav Har'El
d293a5787f alternator: exclude CDC log table from ListTables
The Alternator command ListTables is supposed to list actual tables
created with CreateTable, and should not list things like materialized views
(created for GSI or LSI) or CDC log tables.

We already properly excluded materialized views from the list - and
had the tests to prove it - but forgot both the exclusion and the testing
for CDC log tables - so creating a table xyz with streams enabled would
cause ListTables to also list "xyz_scylla_cdc_log".

This patch fixes both oversights: It adds the code to exclude CDC logs
from the output of ListTables, and adds a test which reproduces the bug
before this fix, and verifies the fix works.
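
A minimal sketch of the intended ListTables behaviour, assuming CDC log
tables can be recognised by the "_scylla_cdc_log" suffix seen in the example
above. The function name is hypothetical, and the real code may well inspect
schema metadata rather than table names:

```python
CDC_LOG_SUFFIX = "_scylla_cdc_log"


def list_user_tables(all_tables, materialized_views=()):
    """Return only tables a user created, hiding views and CDC log tables."""
    views = set(materialized_views)
    return [
        name
        for name in all_tables
        if name not in views and not name.endswith(CDC_LOG_SUFFIX)
    ]
```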

Fixes #19911.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19914
2024-07-30 10:43:29 +03:00
Nadav Har'El
ca8b91f641 test: increase timeouts for /localnodes test
In commit bac7c33313 we introduced a new
test for the Alternator "/localnodes" request, checking that a node
that is still joining does not get returned. The tests used what I
thought were "very high" timeouts - we had a timeout of 10 seconds
for starting a single node, and injected a 20 second sleep to leave
us 10 seconds after the first sleep.

But the test failed in one extremely slow run (a debug build on
aarch64), where starting just a single node took more than 15 seconds!

So in this patch I increase the timeouts significantly: We increase
the wait for the node to 60 seconds, and the sleeping injection to
120 seconds. These should definitely be enough for anyone (famous
last words...).

The test doesn't actually wait for these timeouts, so the ridiculously
high timeouts shouldn't affect the normal runtime of this test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19916
2024-07-30 10:41:48 +03:00
Avi Kivity
52ee6127dd Merge 'Use boto3 in object_store test to list bucket' from Pavel Emelyanov
There's a test in the object_store suite that verifies the contents of a bucket. It does so with a plain HTTP request, but unfortunately this doesn't work: even the local minio uses a restricted bucket, and a plain HTTP request results in a 403 (Forbidden) error code. The test doesn't check for this and continues working with an empty list of objects, which, in turn, is what it expects to see.

The fix is to use boto3. With it, the access/secret key pair is picked up and listing the bucket finally works.
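
As a hedged illustration of listing a bucket through boto3 against a custom
endpoint, something along these lines should work. The function names and
parameters are hypothetical and are not the helper added by this series:

```python
def make_s3_resource(endpoint_url, access_key, secret_key):
    """Create a boto3 S3 resource for a custom endpoint such as a local minio."""
    import boto3  # imported lazily so the listing helper works without it

    return boto3.resource(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def list_bucket_keys(s3_resource, bucket_name):
    """Return the keys of all objects in the bucket."""
    return [obj.key for obj in s3_resource.Bucket(bucket_name).objects.all()]
```

Unlike an unauthenticated HTTP request, a failed listing here raises an
exception instead of silently yielding an empty result.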

Closes scylladb/scylladb#19889

* github.com:scylladb/scylladb:
  test/object_store: Use boto3.resource to list bucket
  test/object_store: Add get_s3_resource() helper
2024-07-29 13:49:50 +03:00
Pavel Emelyanov
8b1a106b62 test/object_store: Use boto3.resource to list bucket
Instead of a plain HTTP request, use the power of the boto3 package. The
recently added get_s3_resource() helper facilitates creating the resource.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-29 12:29:16 +03:00
Pavel Emelyanov
172e1cb0da test/object_store: Add get_s3_resource() helper
It creates a boto3.resource object that points to the endpoint maintained
by the s3_server argument (which tests obtain via a fixture). This allows
using boto3 to access an S3 bucket on the local minio server.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-29 12:25:57 +03:00
Kefu Chai
1094c71282 cql3/statement: use compile-time format string
instead of using fmt::runtime, use compile-time format string in
order to detect the bad format string, or missing format arguments,
or arguments which are not formattable at compile time.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19901
2024-07-28 21:54:43 +03:00
Benny Halevy
be880ab22c Update seastar submodule
* seastar 67065040...a7d81328 (30):
  > reactor: Initialize _aio_pollfd later
  > abortable_fifo: fix a typo in comment
  > net: Expose DNS error category
  > pollable_fd_state: use default-generated dtor
  > perftune: tune tcp_mem
  > scripts/perftune.py: clock source tweaking: special case Amazon and Google KVM virtualizations
  > abort_source: subscription: keep callback function alive after abort
  > github: disable ccache when building with C++ modules
  > github: add enable-ccache input to test.yaml
  > pollable_fd_state: Mark destructor protected and make non-virtual
  > reactor: Mark .configure() private
  > reactor: Set aio_nowait_supported once
  > reactor: Add .no_poll_aio to reactor_config
  > reactor: Move .max_poll_time on reactor_config
  > reactor: Move .task_quota on reactor_config
  > reactor: Move .strict_o_direct on reactor_config
  > reactor: Move .bypass_fsync on reactor_config
  > reactor: Move .max_task_backlog on reactor_config
  > reactor: Move .force_io_getevents_syscall on reactor_config
  > reactor: Move .have_aio_fsync on reactor_config
  > reactor: Move .kernel_page_cache on reactor_config
  > reactor: Move .handle_sigint on reactor_config
  > reactor_backend: Construct _polling_io from reactor config
  > reactor: Move config when constructing
  > reactor: Use designated initializers to set up reactor_config
  > native-stack: use queue::pop_eventually() in listener::accept()
  > abort_source: subscription: allow calling on_abort explicitly
  > file: document that close() returns the file object to uninitialized state
  > code-cleanup: do not include 'smp.hh' in 'reactor.hh'
  > code-cleanup: remove redundant includes of smp.hh

Closes scylladb/scylladb#19912
2024-07-28 21:04:45 +03:00
Kefu Chai
36f5032b2d db: correct the doxygen comment
the parameter names do not match with the ones we are using.
these comments were inherited from Origin, but we failed to update
them accordingly.

in this change, the comments are updated to reflect the function
signatures.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19900
2024-07-28 18:24:57 +03:00
Kefu Chai
67e07bee25 build: cmake: use per-mode build dir
The build_unified.sh script accepts a --build-dir option, which
specifies the directory used for storing temporary files extracted
from tarballs defined by the --pkgs option. When performing parallel
builds of multiple modes, it's crucial that each build uses a unique
build directory. Reusing the same build directory for different modes
can lead to conflicts, resulting in build failures or, more seriously,
the creation of tarballs containing corrupted files.

so, in this change, we specify a different directory for each mode,
so that they don't share the same one.

Refs scylladb/scylladb#2717
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19905
2024-07-28 18:11:37 +03:00
Avi Kivity
149a47088e mapreduce_service: reindent 2024-07-28 17:55:51 +03:00
Avi Kivity
0dd03789f3 mapreduce_service: coroutinize retrying_dispatcher::dispatch_to_node()
Simplify the function by converting it to a coroutine.

Note that while the final co_return co_await looks like a loop (and
therefore an await would introduce an O(n) allocation), it really isn't -
we retry at most once.
2024-07-28 17:54:01 +03:00
Avi Kivity
b019927a0e mapreduce_service: coroutinize dispatch() inner lambda
dispatch() is a coroutine, but the inner lambda that is executed per
node is still a continuation chain. Make it uniform by converting to
a coroutine.
2024-07-28 17:36:08 +03:00
Kefu Chai
ee80742c39 cql3: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19906
2024-07-28 17:29:07 +03:00
Benny Halevy
26abad23d9 sstable_directory: delete_atomically: allow sstables from multiple prefixes
Currently, delete_atomically can be called with
a list of sstables from mixed prefixes in two cases:
1. truncate: where we delete all the sstables in the table directory
2. tablet cleanup: similar to truncate but restricted to sstables in a
   single tablet replica

In both cases, it is possible that sstables in staging (or quarantine)
are mixed with sstables in the base directory.

Until a more comprehensive fix is in place,
(see https://github.com/scylladb/scylladb/pull/19555)
this change just lifts the ban on atomic deletion
of sstables from different prefixes, and acknowledging
that the implementation is not atomic across
prefixes.  This is better than crashing for now,
and can be backported more easily to branches
that support tablets so tablet migration can
be done safely in the presence of repair of
tables with views.

Refs scylladb/scylladb#18862

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#19816
2024-07-28 17:26:31 +03:00
Pavel Emelyanov
aaad2bbeaf storage_service: Remove gossiper argument from join_cluster()
This pointer was only needed to be passed all the way down to the hints
resource manager start() method. It's no longer needed for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-26 16:29:58 +03:00
Pavel Emelyanov
a1dbaba9e1 proxy: Use remote gossiper to start hints resource manager
By the time the hints resource manager is started, proxy already has its
remote part initialized. Remote returns a const gossiper pointer, but
after the previous change the hints code can live with it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-26 16:29:03 +03:00
Pavel Emelyanov
dd7c7c301d hints: Const-ify gossiper references and anchor pointers
There are two places in hints code that need gossiper: hint_sender
calling gossiper::is_alive() and endpoint_downtime_not_bigger_than()
helper in manager. Both can live with const gossiper, so the dependency
references and anchor pointers can be restricted to const too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-26 16:28:54 +03:00
Lakshmi Narayanan Sreethar
27b305b9d1 boost/bloom_filter_test: wait for total memory reclaimed update
The testcase `test_bloom_filter_reclaim_during_reload` checks the
SSTable manager's `_total_memory_reclaimed` against an expected value to
verify that a Bloom filter was reloaded. However, it does not wait for
the manager to update the variable, causing the check to fail if the
update has not occurred yet. Fix it by making the testcase wait until
the variable is updated to the expected value.
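
The general pattern of waiting for an asynchronously updated value, rather
than checking it once, can be sketched like this. It is a generic polling
helper with hypothetical names, not the code used in the Boost test:

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.01):
    """Poll predicate() until it returns true or the timeout expires.

    Returns True if the predicate became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would then assert on the return value of wait_until() instead of
comparing the counter immediately after triggering the reload.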

Fixes #19879

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#19883
2024-07-26 08:15:11 +03:00
Tomasz Grabiec
851da230c8 Merge 'db/view: drop view updates to replaced node marked as left' from Piotr Dulikowski
When a node that is permanently down is replaced, it is marked as "left" but it still can be a replica of some tablets. We also don't keep IPs of nodes that have left and the `node` structure for such node returns an empty IP (all zeros) as the address.

This interacts badly with the view update logic. The base replica paired with the left node might decide to generate a view update. Because storage proxy still uses IPs and not host IDs, it needs to obtain the view replica's IP and tell the storage proxy to write a view update to that node - so, it chooses 0.0.0.0. Apparently, storage proxy decides to write a hint towards this address - hinted handoff on the other hand operates on host IDs and not IPs, so it attempts to translate the IP back, which triggers an assertion as there is no replica with IP 0.0.0.0.

As a quick workaround for this issue just drop view updates towards nodes which seem to have IPs that are all zeros. It would be more proper to keep the view updates as hints and replay them later to the new paired replica, but achieving this right now would require much more significant changes. For now, fixing a crash is more important than keeping views consistent with base replicas.
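
The workaround amounts to a check of roughly the following shape. This is a
sketch with a hypothetical function name, using Python's ipaddress module
purely to illustrate what an "all zeros" address means:

```python
import ipaddress


def should_drop_view_update(target_ip):
    """True when the paired view replica has no usable (all-zeros) address."""
    return ipaddress.ip_address(target_ip).is_unspecified
```

With this check, should_drop_view_update("0.0.0.0") is true, while any real
replica address passes through to the normal write path.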

In addition to the fix, this PR also includes a regression test heavily based on the test that @kbr-scylla prepared during his investigation of the issue.

Fixes: scylladb/scylladb#19439

This issue can cause multiple nodes to crash at once and the fix is quite small, so I think this justifies backporting it to all affected versions. 6.0 and 6.1 are affected. No need to backport to 5.4 as this issue only happens with tablets, and tablets are experimental there.

Closes scylladb/scylladb#19765

* github.com:scylladb/scylladb:
  test: regression test for MV crash with tablets during decommission
  db/view: drop view updates to replaced node marked as left
2024-07-25 11:47:14 +02:00
Michael Litvak
6f25f4b387 mv: skip reading rows when generating partition tombstone update
when deleting a base partition, in some cases we can update the view by
generating a single partition deletion update, instead of generating a
row deletion update for each of the partition rows.
If this is the case for all the affected views, and there are no other
updates besides deleting the partition, then we can skip reading and
iterating over all the rows, since this won't generate any additional
updates that are not covered already.
2024-07-25 11:12:58 +03:00
Michael Litvak
d0b02dc0d0 mv: delete a partition in a single operation when applicable
Currently when a partition is deleted from the base table, we generate a
row tombstone update for each one of the view rows in the partition.

When the partition key in the view is the same as the base, maybe in a
different order, this can be done more efficiently - The whole corresponding
view partition can be deleted with one partition tombstone update.

With this commit, when generating view updates, if the update mutation has a
partition tombstone then for the views which have the same partition key
we will generate a partition tombstone update, and skip the individual
row tombstone updates.
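
The decision described above can be sketched as follows. The names and the
tuple-based update representation are hypothetical and only illustrate the
rule "same partition key columns, in any order":

```python
def same_partition_key(base_pk_columns, view_pk_columns):
    """True when both keys use the same columns, possibly in another order."""
    return set(base_pk_columns) == set(view_pk_columns)


def view_updates_for_partition_delete(base_pk_columns, view_pk_columns, view_rows):
    """Return the deletion updates a view needs when a base partition is deleted."""
    if same_partition_key(base_pk_columns, view_pk_columns):
        # One update covers the whole view partition.
        return [("partition_tombstone",)]
    # Otherwise fall back to one row tombstone per affected view row.
    return [("row_tombstone", row) for row in view_rows]
```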

Fixes scylladb/scylladb#8199
2024-07-25 11:12:58 +03:00
Michael Litvak
98cc707c76 cql-pytest: move ScyllaMetrics to util file to allow reuse
ScyllaMetrics is a useful generic component for retrieving metrics in a
pytest.
The commit moves the implementation from test_shedding.py to util.py to
make it reusable in other tests in cql-pytest.
2024-07-25 11:12:58 +03:00
Botond Dénes
1bfe73c2ea Merge 'Order API endpoints registration in main' from Pavel Emelyanov
There are a few api::set_foo()-s left in main that are placed in ~~random~~ legacy order. This PR fixes it and makes a few more associated cleanups.

refs: #2737

Closes scylladb/scylladb#19682

* github.com:scylladb/scylladb:
  api: Unset cache_service endpoints on stop
  main: Don't ignore set_cache_service() future
  api: Move storage API few steps above
  api: Register token-metadata API next to token-metadata itself
  api: Do not return zero local host-id
  api: Move snitch API registration next to snitch itself
2024-07-25 09:59:38 +03:00
Pavel Emelyanov
456dbc122b api: Unset cache_service endpoints on stop
They currently stay registered long after the dependent services get
stopped. There's a need for batch unsetting (scylladb/seastar#1620), so
currently only this explicit listing :(

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-24 18:51:32 +03:00
Pavel Emelyanov
61fb0ad996 main: Don't ignore set_cache_service() future
The call itself seems to be in the wrong place: there's no "cache service",
and the API uses database and snapshot_ctl to do its work. So it deserves
more cleanup, but at least don't throw the returned future<> away.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-24 18:51:32 +03:00
Pavel Emelyanov
e1eb48f9c2 api: Move storage API few steps above
The sequence currently is

sharded<storage_service>.start()
sharded<query_processor>.invoke_on_all(start_remote)
api::set_server_storage_service()

The last two steps can be safely swapped to keep storage service API
next to its service.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-24 18:51:32 +03:00
Pavel Emelyanov
6ae09cc6bf api: Register token-metadata API next to token-metadata itself
Right now API registration happens quite late because it waits for the storage
service to register its "function" first. This can be done beforehand
and the t.m. API can be moved to where it should be.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-24 18:51:32 +03:00
Pavel Emelyanov
10566256fd api: Do not return zero local host-id
The local host id is read from local token metadata and returned to the
caller as string. The t.m. itself starts with default-constructed host
id value which is updated later. However, even such "unset" host id
value can be rendered as string without errors. This makes the correct
work of the API endpoint depend on the initialization sequence which may
(spoiler: it will) change in the future.
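
The commit message does not say how the endpoint reports an unset host id.
A minimal sketch, assuming it should fail loudly rather than render the
all-zero value (the function name and error type are hypothetical):

```python
import uuid

NULL_HOST_ID = uuid.UUID(int=0)  # what a default-constructed host id looks like


def local_host_id_as_string(host_id):
    """Render the local host id, refusing to return the unset (zero) value."""
    if host_id == NULL_HOST_ID:
        raise RuntimeError("local host id is not set yet")
    return str(host_id)
```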

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-24 18:51:32 +03:00
Pavel Emelyanov
29738f0cb6 api: Move snitch API registration next to snitch itself
Once sharded<snitch> is started, it can register its handlers

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-24 18:51:07 +03:00
Pavel Emelyanov
6357755624 replica: Remove keyspace::config::datadir
It's finally no longer used. Now only sstables storage code "knows" that
keyspace may have its on-disk directory.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-24 17:45:51 +03:00
Pavel Emelyanov
f767e25c8b sstables/storage: Evaluate path for keyspace directory in storage
Currently the init_keyspace_storage() expects that the caller would
tell it where the ks directory is, but it's not nice as keyspace may
not necessarily keep its sstables in any directory.

This patch moves the directory path evaluation into storage code,
specifically to the lambda that is called for on-disk sstables. The
way directory is evaluated mirrors the one from make_keyspace_config()
that will be removed by next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-24 17:45:50 +03:00
Pavel Emelyanov
3ae41bd6f6 sstables/storage: Add sstables_manager arg to init_keyspace_storage()
Will be needed by next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-07-24 17:41:45 +03:00
Botond Dénes
6337372b9d test/boost/reader_concurrency_semaphore_test: un-flake test admission
The admission test has a section which tests admission when the
semaphore has inactive reads. This section (and therefore the entire
test) became flaky lately, after a seemingly unrelated seastar upgrade,
which improved timers.
The cause of the flakiness is the permit which is made inactive later:
this permit is created with 0 timeout (times out immediately). For some
time now, when the timeout timer of a permit fires, if the permit is
inactive, it is evicted. This is what makes the test fail: the inactive
read times out and ends up evicting this permit, which is not expected
for the test. The reason this was not a problem before, is that the test
finishes very quickly, usually, before the timer could even be polled by
the reactor. The recent seastar changes changed this and now the timer
sometimes gets polled and fires, failing the test.

Fixes: #19801

Closes scylladb/scylladb#19859
2024-07-24 13:04:50 +03:00
Takuya ASADA
02b20089cb scylla_raid_setup: install update-initramfs when it's not available
scylla_raid_setup may fail on the Ubuntu minimal image since it calls
update-initramfs without installing it.

Closes scylladb/scylladb#19651
2024-07-24 11:55:16 +03:00
Pavel Emelyanov
b02d20d12d Merge 'Minor improvements around compaction groups' from Raphael "Raph" Carvalho
Minor changes, no backporting needed.

Closes scylladb/scylladb#19723

* github.com:scylladb/scylladb:
  replica: rename for_each_const_compaction_group()
  replica: Fix comment about compaction group
  replica: remove unused compaction_group_vector
2024-07-24 11:22:24 +03:00
Nadav Har'El
edc5bca6b1 alternator: do not allow authentication with a non-"login" role
Alternator allows authentication into the existing CQL roles, but
roles which have the flag "login=false" should be refused in
authentication, and this patch adds the missing check.
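
In outline, the missing check looks something like the sketch below. The
role table layout, field names and exception type are assumptions made for
illustration and do not reflect the Alternator implementation:

```python
def authenticate(roles, username):
    """Return the role's secret key, or raise if the role may not log in."""
    role = roles.get(username)
    if role is None:
        raise PermissionError(f"unknown role: {username}")
    if not role["can_login"]:
        # Roles created with login=false must be refused.
        raise PermissionError(f"role {username} is not allowed to log in")
    return role["secret_key"]
```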

The patch also adds a regression test for this feature in the
test/alternator test framework, in a new test file
test/alternator/cql_rbac.py. This test file will later include more
tests of how the CQL RBAC commands (CREATE ROLE, GRANT, REVOKE)
affect authentication and authorization in Alternator.
In particular, these tests need to use not just the DynamoDB API but
also CQL, so this new test file includes the "cql" fixture that allows
us to run CQL commands, to create roles, to retrieve their secret keys,
and so on.

Fixes scylladb/scylladb#19735

Closes scylladb/scylladb#19740
2024-07-24 08:20:23 +02:00
Botond Dénes
84db147c58 Merge 'tasks: introduce virtual tasks' from Aleksandra Martyniuk
Introduce virtual tasks - task manager tasks which cover
cluster-wide operations.

Virtual tasks aren't kept in memory, instead their statuses
are retrieved from associated service when user requests
them with task manager API. From API users' perspective,
virtual tasks behave similarly to regular tasks, but they can
be queried from any node in a cluster.

Virtual tasks cannot have a parent task. They can have
children on each node in a cluster, but do not keep references
to them. So, if a direct child of a virtual task is unregistered
from task manager, it will no longer be shown in parent's
children vector.

virtual_task class corresponds to all virtual tasks in one
group. If users want to list all tasks in a module, a virtual_task
returns all recent supported operations; if they request virtual
task's status - info about the one specified operation is
presented. Time to live, number of tracked operations etc.
depend on the implementation of individual virtual_task.
All virtual_tasks are kept only on shard 0.

Refs: https://github.com/scylladb/scylladb/issues/15852

New feature, no backport needed.

Closes scylladb/scylladb#16374

* github.com:scylladb/scylladb:
  docs: describe virtual tasks
  db: node_ops: filter topology request entries
  test: add a topology suite for testing tasks
  node_ops: service: create streaming tasks
  node_ops: register node_ops_virtual_task in task manager
  service: node_ops: keep node ops module in storage service
  node_ops: implement node_ops_virtual_task methods
  db: service: modify methods to get topology_requests data
  db: service: add request type column to topology_requests
  node_ops: add task manager module and node_ops_virtual_task
  tasks: api: add virtual task support to get_task_status_recursively
  tasks: api: add virtual task support
  tasks: api: add virtual tasks support to get_tasks
  tasks: add task_handler to hide task and virtual_task differences from user
  tasks: modify invoke_on_task
  tasks: implement task_manager::virtual_task::impl::get_children
  tasks: keep virtual tasks in task manager
  tasks: introduce task_manager::virtual_task
2024-07-24 08:34:28 +03:00
Botond Dénes
0bb6413ea5 Merge 'github: disable scheduled workflow on forks' from Kefu Chai
as these workflows are scheduled periodically, and if they fail, notifications are sent to the repo's owner. to minimize the surprises to the contributors using github, let's disable these workflows on fork repos.

Closes scylladb/scylladb#19736

* github.com:scylladb/scylladb:
  github: do not run clang-tidy as a cron job
  github: disable scheduled workflow on forks
2024-07-24 07:50:39 +03:00
Avi Kivity
3c930a61c9 Merge 'test: scylla_cluster: support more test scenarios' from Patryk Jędrzejczak
We modify `ScyllaCluster.server_start` so that it changes seeds of the
starting node to all currently running nodes. This allows writing tests like
```python
s1 = await manager.server_add(start=False)
await manager.server_add()
await manager.server_start(s1.server_id)
```
However, it disallows writing tests that start multiple clusters. To fix this,
we add the `seeds` parameter to `server_start`.

We also improve the logic in `ScyllaCluster.add_server` to allow writing
tests like
```python
await manager.server_add(expected_error="...")
await manager.server_add()
```

This PR only adds improvements to the `test.py` framework, no need
to backport it.

Closes scylladb/scylladb#19847

* github.com:scylladb/scylladb:
  test: scylla_cluster: improve expected_error in add_server
  test: scylla_cluster: support more test scenarios
  test: scylla_cluster: correctly change seeds in server_start
2024-07-23 22:05:31 +03:00
Patryk Jędrzejczak
02ccd2e3af test: scylla_cluster: improve expected_error in add_server
We make two changes:
- we lease the IP address of a node that failed to boot because of
  an expected error,
- we don't log "Cluster ... added ..." when a node fails to boot
  because of an expected error.
2024-07-23 14:35:09 +02:00
Patryk Jędrzejczak
4079cd1a7b test: scylla_cluster: support more test scenarios
Here are some examples of tests that don't work with no initial
nodes, but they should work:

1.
```
await manager.server_add(expected_error="...")
await manager.server_add()
```

2.
```
await manager.servers_add(2, expected_error="...")
await manager.servers_add(2)
```

3.
```
s1 = await manager.server_add(start=False)
await manager.server_start(s1.server_id, expected_error="...")
await manager.server_add()
```

4.
```
[s1, s2] = await manager.servers_add(2, start=False)
await manager.server_start(s1.server_id, expected_error="...")
await manager.server_start(s2.server_id, expected_error="...")
await manager.servers_add(2)
```

5.
```
s1 = await manager.server_add(start=False)
await manager.server_add()
await manager.server_start(s1.server_id)
```

6.
```
[s1, s2] = await manager.servers_add(2, start=False)
await manager.servers_add(2)
await manager.server_start(s1.server_id)
await manager.server_start(s2.server_id)
```

In this patch, we make a few improvements to make tests like the ones
presented above work. I tested all the examples above manually.

From now on, servers receive correct seeds if the first servers added
in the test didn't start or failed to boot.

Also, we remove the assertion preventing the creation of a second
cluster. This assertion failed the tests presented above. We could
weaken it to make these tests pass, but it would require some work.
Moreover, we have tests that intentionally create two clusters.
Therefore, we go for the easiest solution and accept that a single
`ScyllaCluster` may not correspond to a single Scylla cluster.
2024-07-23 14:35:09 +02:00
Patryk Jędrzejczak
e196c1727e test: scylla_cluster: correctly change seeds in server_start
We change seeds in `ScyllaCluster.server_start` to all currently
running nodes. The previous code only pretended that it did it.

After doing this change, writing tests that create multiple clusters
is impossible. To allow it, we add the `seeds` parameter to
`ManagerClient.server_start`. We use it to fix and simplify the only
test that creates two clusters - `test_different_group0_ids`.
2024-07-23 14:35:08 +02:00
Aleksandra Martyniuk
d04159e7de docs: describe virtual tasks 2024-07-23 13:35:02 +02:00
Aleksandra Martyniuk
c64cb98bcf db: node_ops: filter topology request entries
system_keyspace::get_topology_request_entries returns entries for
requests which are running or have finished after specified time.

In task manager, node ops tasks set this time so that they are shown
for task_ttl seconds after they have finished.
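
The filtering rule above can be sketched roughly as follows. This is a
simplified in-memory model, not the actual system_keyspace code; `Entry`,
`visible_entries` and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    request_id: str
    end_time: Optional[float]  # None while the request is still running


def visible_entries(entries, now, task_ttl):
    """Keep running requests and those that finished less than task_ttl seconds ago."""
    cutoff = now - task_ttl
    return [e for e in entries if e.end_time is None or e.end_time > cutoff]
```

With now=100 and task_ttl=10, a request that finished at 95 is still listed,
while one that finished at 50 is not.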
2024-07-23 13:35:02 +02:00
Aleksandra Martyniuk
36b77c0592 test: add a topology suite for testing tasks
Add topology_tasks test suite for testing task manager's node ops
tasks. Add TaskManagerClient to topology_tasks for easy use
of the task manager REST API.

Write a test for bootstrap, replace, rebuild, decommission and remove
top level tasks using the above.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
a903971a74 node_ops: service: create streaming tasks
Create tasks which cover streaming part of topology changes. These
tasks are children of respective node_ops_virtual_task.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
63e82764e1 node_ops: register node_ops_virtual_task in task manager 2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
8e56913fdf service: node_ops: keep node ops module in storage service
Keep task manager node ops module in storage service. It will be
used to create and manage tasks related to topology changes.

The module is created and registered in storage service constructor.
In storage_service::stop() the module is stopped and so all the remaining
tasks would be unregistered immediately after they are finished.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
b97a348361 node_ops: implement node_ops_virtual_task methods 2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
94282b5214 db: service: modify methods to get topology_requests data
Modify get_topology_request_state (and wait_for_topology_request_completion),
so that it doesn't call on_internal_error when request_id isn't
in the topology_requests table if require_entry == false.

Add other methods to get topology request entry.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
880058073b db: service: add request type column to topology_requests
topology_requests table will be used by task manager node ops tasks,
but it loses info about request type, which is required by tasks.

Add request_type column to topology_requests.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
91fbfbf98a node_ops: add task manager module and node_ops_virtual_task
Add task manager node ops module and node_ops_virtual_task.

Some methods will be implemented in later patches.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
d2e6010670 tasks: api: add virtual task support to get_task_status_recursively
Virtual tasks are supported by get_task_status_recursively. Currently
only local descendants' statuses are shown.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
5f7f403a15 tasks: api: add virtual task support
Virtual tasks are supported by get_task_status, abort_task and
wait_task.

Task status returned by get_task_status and wait_task:
- contains task_kind to indicate whether it's a virtual (cluster) or
  regular (node) task;
- has a children list that, apart from task_id, contains the node address of each task.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
20ba7ceff9 tasks: api: add virtual tasks support to get_tasks
task_manager/list_module_tasks/{module} starts supporting virtual tasks,
which means that their stats will also be shown for users.

An additional task_kind param is added to indicate whether the task is
virtual (cluster-wide) or regular (node-wide).

Support in other paths will be added in following patches.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
1d85b319e0 tasks: add task_handler to hide task and virtual_task differences from user
Contrary to regular tasks, which are per-operation, virtual tasks
are associated with the whole group of operations. There may be many
operations of each group performed at the same time. Info about each
running operation will be shown to a user through the API.

For virtual tasks, task manager imitates a regular task covering
each operation, but task_manager::tasks aren't actually created
in the memory. Instead, information (e.g. status) about the operation
is retrieved from associated service and passed to a user.

To hide most of the differences from user, task_handler class is created.
Task handler performs appropriate actions depending on task's kind.

However, users need to stay aware of the kind of task, because:
- get_task_status and wait_task do not unregister virtual tasks;
- the time for which a virtual task stays in task manager depends
  on the associated service and the task's implementation;
- the number of a virtual task's children shown by get_tasks doesn't have
  to be monotonic.

API is modified to use task_handler.
API-specific classes are moved to task_handler.{cc,hh}.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
abde7ba271 tasks: modify invoke_on_task
Modify task_manager::invoke_on_task to also check virtual tasks.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
6029936665 tasks: implement task_manager::virtual_task::impl::get_children
Return a vector of task_identity of all children of a virtual task
in a cluster.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
9de8d4b5b0 tasks: keep virtual tasks in task manager
Virtual tasks are kept in task manager together with regular tasks.
All virtual tasks are stored on shard 0.

task_manager::module::make_task is modified to consider virtual
tasks as possible parents.
2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk
00cfc49d18 tasks: introduce task_manager::virtual_task
A virtual task is a new kind of task supported by task manager,
which covers cluster-wide operations.

From users' perspective virtual tasks behave similarly
to task_manager::tasks. The API side of virtual tasks will be
covered in the following patches.

Contrary to task_manager::task, virtual task does not update
its fields proactively. Moreover, no object is kept in memory
for each individual virtual task's operation. Instead a service
(or services) is queried on API user's demand to learn about
the status of running operation. Hence the name.

task_manager::virtual_task is responsible for a whole group
of virtual tasks, i.e. for tracking and generating statuses
of all operations of similar type.

To enable tracking of some kind of operations, one needs to
override task_manager::virtual_task::impl and provide implementations
of the methods returning appropriate information about the operations.
task_manager::virtual_task must be kept on shard 0.
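
A minimal sketch of the idea, assuming a toy service that already tracks its
own operations. `NodeOpsService` and `VirtualTask` here are hypothetical
stand-ins and do not mirror the real task_manager interfaces.

```python
class NodeOpsService:
    """Stand-in for a service that already knows the state of its operations."""

    def __init__(self):
        self._ops = {}  # operation id -> state string

    def set_state(self, op_id, state):
        self._ops[op_id] = state

    def states(self):
        return dict(self._ops)


class VirtualTask:
    """One object for a whole group of operations; keeps no per-operation state."""

    def __init__(self, service):
        self._service = service

    def list_tasks(self):
        # Computed on demand from the service, nothing is cached here.
        return sorted(self._service.states())

    def status(self, op_id):
        # Queried only when a user asks for it.
        return self._service.states().get(op_id)
```

Because nothing is stored in the virtual task itself, a status request always
reflects what the service reports at the time of the call.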

Similarly to task_manager::tasks, virtual tasks can have child tasks,
responsible for tracking suboperations' progress. But virtual tasks
cannot have parents - they are always roots in task trees.

Some methods and structs will be implemented in later patches.
2024-07-23 13:35:01 +02:00
Nadav Har'El
bac7c33313 alternator: fix "/localnodes" to not return nodes still joining
Alternator's "/localnodes" HTTP request is supposed to return the list of
nodes in the local DC to which the user can send requests.

The existing implementation incorrectly used gossiper::is_alive() to check
for which nodes to return - but "alive" nodes include nodes which are still
joining the cluster and not really usable. These nodes can remain in the
JOINING state for a long time while they are copying data, and an attempt
to send requests to them will fail.

The fix for this bug is trivial: change the call to is_alive() to a call
to is_normal().
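
As an illustration only, the difference between the two checks can be modeled
like this. The tuple layout, `State` and `local_nodes` are assumptions made
for the sketch and are not the gossiper API.

```python
from enum import Enum


class State(Enum):
    JOINING = "joining"
    NORMAL = "normal"


def local_nodes(nodes, local_dc):
    """nodes: iterable of (address, dc, alive, state) tuples."""
    # Filtering on `alive` alone would also return nodes that are still joining.
    return [addr for addr, dc, alive, state in nodes
            if dc == local_dc and state is State.NORMAL]
```

A node that is alive but still JOINING is left out of the result, which is the
behavior the fix aims for.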

But the hard part of this patch is the testing:

1. An existing multi-node test for "/localnodes" assumed that right after
   a new node was created, it appears on "/localnodes". But after this
   patch, it may take a bit more time for the bootstrapping to complete
   and the new node to appear in /localnodes - so I had to add a retry loop.

2. I added a test that reproduces the bug fixed here, and verifies its
   fix. The test is in the multi-node topology framework. It adds an
   injection which delays the bootstrap, which leaves a new node in JOINING
   state for a long time. The test then verifies that the new node is
   alive (as checked by the REST API), but is not returned by "/localnodes".

3. The new injection for delaying the bootstrap is unfortunately not
   very pretty - I had to do it in three places because we have several
   code paths of how bootstrap works without repair, with repair, without
   Raft and with Raft - and I wanted to delay all of them.
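
The retry loop mentioned in item 1 could look roughly like the sketch below.
`wait_for` is a hypothetical helper written for this illustration, not the
helper used by the test framework.

```python
import time


def wait_for(predicate, timeout=10.0, period=0.1):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(period)
```

A test would pass a predicate that fetches "/localnodes" and checks whether
the new node is listed, instead of asserting on the first response.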

Fixes #19694.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19725
2024-07-23 13:51:16 +03:00
Pavel Emelyanov
65565a56c3 Merge 's3/client: add client::upload_file()' from Kefu Chai
this member function prepares for the backup feature, where the
object to be stored in the object storage is already persisted as a
file on local filesystem. this brings us two benefits:

- with the file, we don't need to accumulate the payloads in memory
  and send them in batch, as we do in upload_sink and in
  upload_jumbo_sink. this puts less pressure on the memory subsystem.
- with the file, we can read multiple parts in parallel if multipart
  upload applies to it, which helps to improve the throughput.

so, this new helper is introduced to help upload an sstable from local
filesystem to the object storage.
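
A rough sketch of how a file could be split into independently readable
parts. The two limits are the documented S3 multipart constraints (5 MiB
minimum for every part except the last, at most 10,000 parts); the names
`MIN_PART_SIZE`, `MAX_PARTS` and `plan_parts` are made up for this example
and do not describe how client::upload_file() is implemented.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum size of every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_parts(file_size, part_size=MIN_PART_SIZE):
    """Return (offset, length) ranges that can be read and uploaded independently."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # Grow the part size if needed so the part count stays within MAX_PARTS.
    min_needed = -(-file_size // MAX_PARTS)  # ceiling division
    part_size = max(part_size, min_needed)
    return [(off, min(part_size, file_size - off))
            for off in range(0, file_size, part_size)]
```

Since each range is known up front, the parts can be read from the file in
any order, or concurrently, without buffering the whole payload in memory.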

Fixes https://github.com/scylladb/scylladb/issues/16287

Closes scylladb/scylladb#16387

* github.com:scylladb/scylladb:
  s3/client: add client::upload_file()
  s3/client: move constants related to aws constraints out
2024-07-23 12:39:27 +03:00
Kefu Chai
061def001d s3/client: add client::upload_file()
this member function prepares for the backup feature, where the
object to be stored in the object storage is already persisted as a
file on local filesystem. this brings us two benefits:

- with the file, we don't need to accumulate the payloads in memory
  and send them in batch, as we do in upload_sink and in
  upload_jumbo_sink. this puts less pressure on the memory subsystem.
- with the file, we can read multiple parts in parallel if multipart
  upload applies to it, which helps to improve the throughput.

so, this new helper is introduced to help upload an sstable from local
filesystem to the object storage.

Fixes #16287
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-23 14:39:30 +08:00
Kefu Chai
6701ce50a5 s3/client: move constants related to aws constraints out
minimum_part_size and aws_maximum_parts_in_piece are
AWS S3 related constraints, they can be reused out of
client::upload_sink and client::upload_jumbo_sink, so
in this change

* extract them out.
* use the user-defined literal with IEC prefix for
  better readablity to define minimum_part_size
* add "aws_" prefix to `minimum_part_size` to be
  more consistent.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-23 14:33:54 +08:00
Takuya ASADA
c3bea539b6 dist: support nonroot and offline mode for scylla-housekeeping
Introduce support for nonroot and offline mode in scylla-housekeeping.

Closes #13084

Closes scylladb/scylladb#13088
2024-07-23 07:57:32 +03:00
Aleksandra Martyniuk
dfe3af40ed test: tasks: adjust tests to new wait_task behavior
After c1b2b8cb2c, /task_manager/wait_task/
does not unregister tasks anymore.

Delete the check if the task was unregistered from test_task_manager_wait.
Check task status in drain_module_tasks to ensure that the task
is removed from task manager.

Fixes: #19351.

Closes scylladb/scylladb#19834
2024-07-22 18:24:54 +03:00
Nadav Har'El
9eb47b3ef0 Merge 'config: round-trip boolean configuration variables' from Avi Kivity
When you SELECT a boolean from system.config, it reads as true/false, but this isn't accepted
on UPDATE (instead, we accept 1/0). This is surprising and annoying, so accept true/false in
both directions.
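
A small sketch of the round-trip property, assuming exact lowercase
spellings (the message does not say how case is handled). `parse_bool` and
`format_bool` are hypothetical names, not the config code.

```python
def parse_bool(text):
    """Accept both the spellings shown by SELECT and the older numeric ones."""
    value = text.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


def format_bool(value):
    """Render a boolean the way it is described as being read back."""
    return "true" if value else "false"
```

The property of interest is that parse_bool(format_bool(x)) returns x for
both values, so whatever is read can be written back unchanged.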

Not a regression, so a backport isn't strictly necessary.

Closes scylladb/scylladb#19792

* github.com:scylladb/scylladb:
  config: specialize from-string conversion for bool
  config: wrap boost::lexical_cast<> when converting from strings
2024-07-22 17:53:02 +03:00
Botond Dénes
d3135db457 Merge 'commitlog: Add optional max lifetime parameter to cl instance' from Calle Wilund
If set, any remaining segment that has data older than this threshold will request flushing, regardless of data pressure. I.e. even a system where nothing happens will, after X seconds, flush data to free up the commit log.

Related to  #15820

The functionality here is to prevent pathological/test cases where a silent system cannot fully process stuff like compaction, GC etc due to things like CL forcing smaller GC windows etc.
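
The age check can be pictured with the sketch below. It models segments as a
plain mapping and is only an illustration; `segments_to_flush` and its
arguments are hypothetical and do not reflect the commitlog internals.

```python
def segments_to_flush(segments, now, max_data_lifetime):
    """segments: mapping of segment id -> timestamp of its oldest unflushed data."""
    if max_data_lifetime is None:
        # Option not set: data age alone never triggers a flush request.
        return []
    return [seg for seg, oldest in segments.items()
            if now - oldest > max_data_lifetime]
```

Segments are selected by age alone, so an idle system still ends up flushing
once its data is older than the configured lifetime.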

Closes scylladb/scylladb#15971

* github.com:scylladb/scylladb:
  commitlog: Make max data lifetime runtime-configurable
  db::config: Expose commitlog_max_data_lifetime_in_s parameter
  commitlog: Add optional max lifetime parameter to cl instance
2024-07-22 17:21:33 +03:00
Botond Dénes
3ff33e9c70 Update ./tools/java submodule
* ./tools/java dbaf7ba7...0b4accdd (1):
  > cassandra-stress: Make default repl. strategy NetworkTopologyStrategy

Closes scylladb/scylladb#19818
2024-07-22 17:12:09 +03:00
Kamil Braun
8ec90a0e60 docs: extend "forbidden operations" section for Raft-topology upgrade
The Raft-topology upgrade procedure must not be run concurrently with
version upgrade.

Closes scylladb/scylladb#19746
2024-07-22 12:45:38 +03:00
Botond Dénes
591876b44e Merge 'sstables: do not reload components of unlinked sstables' from Lakshmi Narayanan Sreethar
The SSTable is removed from the reclaimed memory tracking logic only
when its object is deleted. However, there is a risk that the Bloom
filter reloader may attempt to reload the SSTable after it has been
unlinked but before the SSTable object is destroyed. Prevent this by
removing the SSTable from the reclaimed list maintained by the manager
as soon as it is unlinked.

The original logic that updated the memory tracking in
`sstables_manager::deactivate()` is left in place as (a) the variables
have to be updated only when the SSTable object is actually deleted, as
the memory used by the filter is not freed as long as the SSTable is
alive, and (b) the `_reclaimed.erase(*sst)` is still useful during
shutdown, for example, when the SSTable is not unlinked but just
destroyed.

Fixes https://github.com/scylladb/scylladb/issues/19722

Closes scylladb/scylladb#19717

* github.com:scylladb/scylladb:
  boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded
  sstables: do not reload components of unlinked sstables
  sstables/sstables_manager: introduce on_unlink method
2024-07-22 12:08:25 +03:00
Avi Kivity
358147959e Merge 'keep table directory open for flushing' from Laszlo Ersek
`filesystem_storage` methods frequently call `sync_directory()`, for the sake of flushing (sync'ing) a directory. `sync_directory()` always brackets the sync with open and close, and given that most `sync_directory()` calls target the sstable base directory, those repeated opens and closes are considered wasteful. Rework the `filesystem_storage::_dir` member (from a mere pathname) so that it stands for an `opened_directory` object, which keeps the sstable base directory open, for the purpose of repeated sync'ing.

Resolves #2399.

Closes scylladb/scylladb#19624

* github.com:scylladb/scylladb:
  sstables/storage: synch "dst_dir" more leanly in create_links_common()
  sstables/storage: close previous directory asynchronously upon dir change
  sstables/storage: futurize change_dir_for_test()
  sstables/storage: sync through "opened_directory" in filesystem...::move()
  sstables/storage: sync through "opened_directory" in the "easy" cases
  sstables/storage: introduce "opened_directory" class
2024-07-21 17:07:44 +03:00
Yaron Kaikov
d3cbe04130 .github/mergify.yml: update conf to support 6.1
Modify Mergify configuration to support `6.1` instead of `5.2`, which is
EOL

Closes scylladb/scylladb#19810
2024-07-21 17:02:19 +03:00
Łukasz Paszkowski
781eb7517c api/system: add highest_supported_sstable_format path
The current upgrade dtests rely on a ccm node function,
get_highest_supported_sstable_version(), that looks for
r'Feature (.*)_SSTABLE_FORMAT is enabled' in the log files.

Starting from scylla-6.0 ME_SSTABLE_FORMAT is enabled by default
and there is no cluster feature for it. Thus get_highest_supported_sstable_version()
returns an empty list, causing the upgrade tests to fail.
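
To illustrate why the log-based lookup comes back empty, here is a sketch
that applies the same regular expression to some made-up log lines.
`formats_from_log` is a hypothetical helper, not the ccm function, and the
sample lines in the usage note are invented.

```python
import re

# The pattern quoted in the message above.
PATTERN = re.compile(r'Feature (.*)_SSTABLE_FORMAT is enabled')


def formats_from_log(lines):
    """Collect the format names announced as enabled cluster features."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found.append(match.group(1))
    return found
```

A line such as "Feature MD_SSTABLE_FORMAT is enabled" yields ["MD"], but when
no such line is logged the result is an empty list and the caller has nothing
to pick a format from.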

This change introduces a separate API path that returns the highest
sstable format (one of la, mc, md, me) supported by a scylla node.

Fixes scylladb/scylladb#19772

Backports to 6.0 and 6.1 required. The current upgrade test in dtest
checks scylla upgrades up to version 5.4 only. This patch is a
prerequisite to backport the upgrade tests fix in dtest.

Closes scylladb/scylladb#19787
2024-07-21 17:00:19 +03:00
Avi Kivity
36b57f3432 Merge 'token: inline optimizations' from Benny Halevy
This series contains several optimizations for dht::token
around its comparison functions as well as minimum_token and maximum_token definitions,
by moving them inline into dht/token.hh

This results in a nice improvement in perf-simple-query:
```
==> perf-simple-query.pre <== (21c67a5a64)
         throughput: mean=95774.01 standard-deviation=1129.83 median=96243.64 median-absolute-deviation=1090.08 maximum=96864.09 minimum=94471.19
instructions_per_op: mean=41813.68 standard-deviation=16.27 median=41809.29 median-absolute-deviation=7.02 maximum=41841.64 minimum=41799.41
  cpu_cycles_per_op: mean=22383.19 standard-deviation=331.01 median=22254.53 median-absolute-deviation=332.26 maximum=22744.11 minimum=21996.73

==> perf-simple-query.post.0 <== (token: move ordering operator inline)
         throughput: mean=96350.01 standard-deviation=640.10 median=96228.88 median-absolute-deviation=621.45 maximum=96988.16 minimum=95478.51
instructions_per_op: mean=41627.13 standard-deviation=37.55 median=41627.06 median-absolute-deviation=2.43 maximum=41679.44 minimum=41573.31
  cpu_cycles_per_op: mean=22184.65 standard-deviation=151.03 median=22163.05 median-absolute-deviation=120.83 maximum=22348.49 minimum=21967.30

==> perf-simple-query.post.1 <== (token: operator<=>: optimize the common case)
         throughput: mean=96778.29 standard-deviation=1719.34 median=97021.72 median-absolute-deviation=1059.56 maximum=98300.99 minimum=93893.75
instructions_per_op: mean=41590.25 standard-deviation=5.53 median=41589.50 median-absolute-deviation=4.17 maximum=41598.39 minimum=41584.57
  cpu_cycles_per_op: mean=22135.33 standard-deviation=471.98 median=21969.30 median-absolute-deviation=244.89 maximum=22905.24 minimum=21685.33

==> perf-simple-query.post.3 <== (token: always initialize data member)
         throughput: mean=98264.33 standard-deviation=998.49 median=98533.02 median-absolute-deviation=780.45 maximum=99075.40 minimum=96656.51
instructions_per_op: mean=41657.61 standard-deviation=22.53 median=41648.49 median-absolute-deviation=12.89 maximum=41696.81 minimum=41642.07
  cpu_cycles_per_op: mean=21808.57 standard-deviation=93.63 median=21794.56 median-absolute-deviation=75.41 maximum=21949.46 minimum=21719.55

==> perf-simple-query.post.4 <== (token: constexpr ctors, methods, and minimum/maximum_token)
         throughput: mean=98095.05 standard-deviation=1333.32 median=98930.22 median-absolute-deviation=906.80 maximum=99209.38 minimum=96194.25
instructions_per_op: mean=41572.28 standard-deviation=6.04 median=41574.49 median-absolute-deviation=4.76 maximum=41579.56 minimum=41564.72
  cpu_cycles_per_op: mean=21831.35 standard-deviation=169.56 median=21732.86 median-absolute-deviation=102.93 maximum=22091.66 minimum=21689.63

==> perf-simple-query.post.5 <== (token: initialize non-key tokens with min() value)
         throughput: mean=99502.32 standard-deviation=1003.70 median=99744.03 median-absolute-deviation=388.87 maximum=100482.95 minimum=97813.42
instructions_per_op: mean=41593.48 standard-deviation=17.27 median=41585.25 median-absolute-deviation=8.46 maximum=41619.41 minimum=41575.86
  cpu_cycles_per_op: mean=21545.90 standard-deviation=86.66 median=21578.01 median-absolute-deviation=43.17 maximum=21612.41 minimum=21395.42
```

Optimization only. No backport required

Closes scylladb/scylladb#19782

* github.com:scylladb/scylladb:
  token: initialize non-key tokens with min() value
  token: make kind-based ctor private
  token: constexpr ctors, methods, and minimum/maximum_token
  token: always initialize data member
  everywhere: use dht::token is_{minimum,maximum}
  token: operator<=>: optimize the common case
  token: move ordering operator inline
  partitioner_test: add more token-level tests
2024-07-21 15:07:36 +03:00
Benny Halevy
365e1fb1b9 token: initialize non-key tokens with min() value
We already have code to return min() for
the minimum and maximum tokens in long_token()
and raw(), so instead of using code to return
it, just make sure to set it in the _data member.

Note that although this change affects serialization,
the existing codebase ignores the deserialized bytes
and places a constant (0 before this patch, or min()
with it) in _data for non-key (minimum or maximum) tokens.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:42 +03:00
Benny Halevy
9f05072527 token: make kind-based ctor private
Users outside of the token module don't
need to mess with the token::kind.
They can only create key tokens,
never minimum or maximum tokens with a particular
data value.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:42 +03:00
Benny Halevy
6806112189 token: constexpr ctors, methods, and minimum/maximum_token
sizeof(dht::token) is only 16 bytes and therefore
it can be passed with 2 registers.

There is no sense in defining minimum_token
and maximum_token out of line, returning a token&
to statically allocated values that require memory
access/copy, while the only call sites that need
to point to the static min/max tokens are in
dht::ring_position_view.
Instead, they can be defined inline as constexpr
functions and return their const values.

Respectively, define token ctors and methods
as constexpr where applicable (and noexcept while at it
where applicable)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:42 +03:00
Benny Halevy
e509ccd184 token: always initialize data member
Make sure to always initialize the _data member
to 0 for non-key (minimum or maximum) tokens.

This allows simplifying the equality operator,
which no longer needs to rely on `operator<=>`

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:42 +03:00
Benny Halevy
850f298ccd everywhere: use dht::token is_{minimum,maximum}
The is_minimum/is_maximum predicates are more
efficient than comparing with the {minimum,maximum}_token
values, respectively, since the is_* functions
need to check only the token kind.
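
A toy model of the difference, written in Python for brevity. The `Kind` and
`Token` types below are illustrative assumptions and are not the dht::token
definition.

```python
from dataclasses import dataclass
from enum import IntEnum


class Kind(IntEnum):
    BEFORE_ALL_KEYS = 0  # minimum token
    KEY = 1
    AFTER_ALL_KEYS = 2   # maximum token


@dataclass(frozen=True)
class Token:
    kind: Kind
    data: int = 0  # only meaningful for KEY tokens

    def is_minimum(self):
        # Looks at the kind only; no second token is built or compared.
        return self.kind is Kind.BEFORE_ALL_KEYS

    def is_maximum(self):
        return self.kind is Kind.AFTER_ALL_KEYS


def minimum_token():
    return Token(Kind.BEFORE_ALL_KEYS)
```

Writing `t == minimum_token()` gives the same answer as `t.is_minimum()` in
this model, but it constructs a token and compares every field, whereas the
predicate inspects a single field.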

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:42 +03:00
Benny Halevy
5a60ba5c5f token: operator<=>: optimize the common case
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:42 +03:00
Benny Halevy
adc1d7f68f token: move ordering operator inline
Token comparisons are abundant.
The equality operator is defined inline
in dht/token.hh by calling `t1 <=> t2`,
and so is `tri_compare_raw`, which `operator<=>`
calls in the common path, but `operator<=>` itself
is defined out of line, losing the benefits of inlining.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:42 +03:00
Benny Halevy
7e745d31ed partitioner_test: add more token-level tests
Before changing how minimum and maximum
tokens are represented in memory.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:37 +03:00
Kamil Braun
ad68a7f799 Merge 'test: raft: fix the flaky test_raft_recovery_stuck' from Emil Maskovsky
Use the rolling restart to avoid spurious driver reconnects.

This can be eventually reverted once the scylladb/python-driver#295 is fixed.

Fixes scylladb/scylladb#19154

Closes scylladb/scylladb#19771

* github.com:scylladb/scylladb:
  test: raft: fix the flaky `test_raft_recovery_stuck`
  test: raft: code cleanup in `test_raft_recovery_stuck`
2024-07-19 19:34:43 +02:00
Piotr Dulikowski
4571262e46 Merge 'Improve constness of functions schema code' from Marcin Maliszkiewicz
In v4 of scylladb/scylladb#19598 the last commit of the patch was replaced, but this change missed the merge, so it is submitted as a separate patch.

In the current patch, the original functions class correctly marks methods as const where appropriate, and the instance() method now returns a const object. This ensures protection against accidental modifications, as all changes must go through the change_batch object.

Since the functions_changer class was intended to serve the same purpose, it is now redundant. Therefore, we are reverting the commit that introduced it.

Relates scylladb/scylladb#19153

Closes scylladb/scylladb#19647

* github.com:scylladb/scylladb:
  cql3: functions: replace template with std::function in with_udf_iter()
  cql3: functions: improve functions class constness handling
  Revert "cql3: functions: make modification functions accessible only via batch class"
2024-07-19 19:23:11 +02:00
Emil Maskovsky
9ab25e5cbf test: raft: replace the use of read_barrier work-around
Replaced the old `read_barrier` helper from "test/pylib/util.py"
by the new helper from "test/pylib/rest_client.py" that is calling
the newly introduced direct REST API.

Replaced in all relevant tests and decommissioned the old helper.

Introduced a new helper `get_host_api_address` to retrieve the host API
address - which in some cases can be different from the host address
(e.g. if the RPC address is changed).

Fixes: scylladb/scylladb#19662

Closes scylladb/scylladb#19739
2024-07-19 19:20:44 +02:00
Laszlo Ersek
680403d2cd sstables/storage: synch "dst_dir" more leanly in create_links_common()
filesystem_storage::create_links_common() runs on directories that
generally differ from "_dir", thus, we can't replace its sync_directory()
calls with _dir.sync(). We can still use a common (temporary)
"opened_directory" object for synching "dst_dir" three times, saving two
open and two close operations.

This patch is best viewed with "git show -W".

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-07-19 15:46:31 +02:00
Laszlo Ersek
0057ee2431 sstables/storage: close previous directory asynchronously upon dir change
In "filesystem_storage", change_dir_for_test() and move() replace "_dir"
with "opened_directory(new_dir)" using the move assignment operator.
Consequently, the file descriptor underlying "_dir" is closed
synchronously as a part of object destruction.

Expose the async file::close() function through "opened_directory".
Introduce filesystem_storage::change_dir() as a common async workhorse for
both change_dir_for_test() and move(). In change_dir(), close the old
directory asynchronously.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-07-19 15:43:19 +02:00
Laszlo Ersek
6711574646 sstables/storage: futurize change_dir_for_test()
Currently change_dir_for_test() is synchronous. Make it return a future,
so that we can use async operations in change_dir_for_test() overrides.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-07-19 15:43:19 +02:00
Laszlo Ersek
ef446c4da0 sstables/storage: sync through "opened_directory" in filesystem...::move()
Near the end of filesystem_storage::move(), we sync both the old
directory, and the new directory, if "delay_commit" is null. At that
point, the new directory is just "_dir"; call _dir.sync() instead of
sync_directory().

This patch is best viewed with "git show -W".

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-07-19 15:14:46 +02:00
Laszlo Ersek
4d33640481 sstables/storage: sync through "opened_directory" in the "easy" cases
Replace

  sst.sstable_write_io_check(sync_directory, _dir.native())

with

  _dir.sync(sst._write_error_handler)

Also replace the explicit (but still relatively "easy")
open_checked_directory() + flush() + flush() operations in
filesystem_storage::seal() with two _dir.sync() calls.

Because filesystem_storage::create_links_common() is marked "const", we
need to declare "_dir" mutable.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-07-19 15:14:46 +02:00
Laszlo Ersek
2c01171a4d sstables/storage: introduce "opened_directory" class
"filesystem_storage::_dir" is currently of type "std::filesystem::path".
Introduce a new class called "opened_directory", and change the type of
"_dir" to the new class "opened_directory".

"opened_directory" keeps the directory open, and offers synchronization on
that open directory (i.e., without having to reopen the directory every
time). In subsequent patches, that will be put to use.

The opening and closing of the wrapped directory cannot easily be handled
explicitly in the "filesystem_storage" member functions.

(

  Namely, test::store() and test::rewrite_toc_without_scylla_component()
  -- both in "test/lib/sstable_utils.hh" -- perform "open -> ... -> seal"
  sequences, and such a sequence may be executed repeatedly. For example,
  sstable_directory_shared_sstables_reshard_correctly()
  [test/boost/sstable_directory_test.cc] does just that; it "reopens" the
  "filesystem_storage" object repeatedly.

)

Rather than trying to restrict the order of "filesystem_storage" member
function calls, replace the "opened_directory" object with a new one
whenever the directory pathname is re-set; namely in
filesystem_storage::change_dir_for_test() and filesystem_storage::move().

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-07-19 15:14:46 +02:00
Piotr Dulikowski
204a479e82 Merge 'db/hints: Test manager::too_many_in_flight_hints_for()' from Dawid Mędrek
In 6e79d64, the behavior of `manager::too_many_in_flight_hints_for()`
was accidentally modified. The change went unnoticed for some time
and was then fixed. In this commit, we add a test verifying that
the concurrency of hints being written to disk is indeed limited
and the limitations are imposed properly.

Refs scylladb/scylladb#17636
Fixes scylladb/scylladb#17660

Closes scylladb/scylladb#19741

* github.com:scylladb/scylladb:
  db/hints: Verify that Scylla limits the concurrency of written hints
  db/hints: Coroutinize `hint_endpoint_manager::store_hint()`
  db/hints: Move a constant value to the TU it's used in
2024-07-19 13:26:34 +02:00
Lakshmi Narayanan Sreethar
0615c8a46b boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-19 13:15:57 +05:30
Lakshmi Narayanan Sreethar
31ff69a13c sstables: do not reload components of unlinked sstables
The SSTable is removed from the reclaimed memory tracking logic only
when its object is deleted. However, there is a risk that the Bloom
filter reloader may attempt to reload the SSTable after it has been
unlinked but before the SSTable object is destroyed. Prevent this by
removing the SSTable from the reclaimed list maintained by the manager
as soon as it is unlinked.

The original logic that updated the memory tracking in
`sstables_manager::deactivate()` is left in place as (a) the variables
have to be updated only when the SSTable object is actually deleted, as
the memory used by the filter is not freed as long as the SSTable is
alive, and (b) the `_reclaimed.erase(*sst)` is still useful during
shutdown, for example, when the SSTable is not unlinked but just
destroyed.

Fixes #19722

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-19 13:15:57 +05:30
Lakshmi Narayanan Sreethar
dbf22848a8 sstables/sstables_manager: introduce on_unlink method
Added a new method, on_unlink() to the sstable_manager. This method is
now used by the sstable to notify the manager when it has been unlinked,
enabling the manager to update its bookkeeping as required. The
on_unlink method doesn't do anything yet but will be updated by the next
patch.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-19 13:15:55 +05:30
Kefu Chai
c52f49facb build: cmake: do not mark cqlsh noarch
in 3c7af287, cqlsh's reloc package was marked as "noarch", and its
filename was updated accordingly in `configure.py`, so let's update
the CMake building system accordingly.

this change should address the build failure of

```
08:48:14  [3325/4124] Generating ../Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz
08:48:14  FAILED: Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz
08:48:14  cd /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist && /usr/bin/cmake -E copy /jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz
08:48:14  Error copying file "/jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz" to "/jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz".
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19710
2024-07-19 08:00:17 +03:00
Kefu Chai
34bf10050b build: cmake: bump up the minimal required fmt to 10.0.0
in cccec07581, we started using a feature introduced by {fmt} v10,
so we need to bump up the required version in CMake as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19709
2024-07-19 07:58:31 +03:00
Botond Dénes
79567c1c98 scripts/open-coredump.sh: allow complete bypass of S3 server
In some cases, the S3 server will not know about a certain build and
any attempt to open a coredump which was generated by this build will
fail, because the S3 server returns an empty/illegal response.
There is already a bypass for missing package-url in the S3 server
response, but this doesn't help in the case when the response is also
missing other metadata, like build-id and version info.
Extend this existing mechanism with a new --scylla-package-url flag,
which provides complete bypass. When provided, the S3 server will not be
queried at all, instead the package is downloaded from the link and
version metadata is extracted from the package itself.

Closes scylladb/scylladb#19769
2024-07-18 21:43:53 +03:00
Avi Kivity
58a8fd6f19 Update tools/python3 submodule (install umask, selinux)
* tools/python3 18fa79e...fbf12d0 (1):
  > install.sh: fix incorrect permission on strict umask

Ref https://github.com/scylladb/scylladb/issues/8589
Ref https://github.com/scylladb/scylladb/issues/19775
2024-07-18 21:36:50 +03:00
Avi Kivity
7984e595ce Update tools/java submodule (install selinux context)
* tools/java 33938ec16f...dbaf7ba7db (1):
  > install.sh: apply correct security context on offline installer

Ref https://github.com/scylladb/scylladb/issues/8589
2024-07-18 21:03:32 +03:00
Kefu Chai
4fbfecbb3e Update seastar submodule
* seastar 908ccd93...67065040 (44):
  > metrics: Use this_shard_id unconditionally
  > sstring: prevent fmt from formatting sstring as a sequence
  > coding style: allow lines up to 160 chars in length
  > src/core: remove unnecessary includes
  > when_all: stop using deprecated std::aligned_union_t
  > reactor: respect preempt requests in debug mode
  > core: fix -Wunused-but-set-variable
  > gate: add try_hold
  > sstring: declare nested type with typename
  > rpc: pass start time to `wait_for_reply()` which accepts `no_wait_type`
  > scripts/perftune.py: get rid of "SyntaxWarning: invalid escape sequence"
  > scripts/perftune.py: add support for tweaking VLAN interfaces
  > scripts/perftune.py: improve discovery of bond device slaves
  > scripts/perftune.py: refactor __learn_slaves() function
  > code-cleanup: add missing header guards
  > code-cleanup: remove redundant includes of 'reactor.hh'
  > code-cleanup: explicitly depend on io_desc.hh
  > scripts/perftune.py: aRFS should be disabled by default in non-MQ mode
  > code-cleanup: remove unneeded includes of fair_queue.hh
  > docker: fix mount of install-dependencies
  > code-cleanup: remove redundant includes of linux-aio.hh
  > fstream: reformat the doxygen comment of make_file_input_stream()
  > iostream: use new-style consumer to implement copy()
  > stall-analyser: use 0 for default value of --minimum
  > reactor: fix crash during metrics gathering
  > build: run socket test with linux-aio reactor backend
  > test: Add testing of connect()-ion abort ability
  > linux_perf_event: exclude_idle only on x86_64
  > linux_perf_event: add make_linux_perf_event
  > stall-analyser: gracefully handle empty input
  > shared_token_bucket: resolve FIXME
  > io_tester: ensure that file object is valid when closing it
  > tutorial.md: fix typo in Dan Kegel's name
  > test,rpc: Extend simple ping-pong case
  > rpc: Calculate delay and export it via metrics
  > rpc: Exchange handler duration with server responses
  > rpc: Track handler execution time
  > rpc: Fix hard-coded constants when sending unknown verb reply
  > reactor: Unfriend alien and smp queues
  > reactor: Add and use stopped() getter
  > reactor: Generalize wakeup() callers
  > file: Use lighter access to map of fs-info-s
  > file: Fix indentation after previous patch
  > file: Don't return chain of ready futures from make_file_impl

Closes scylladb/scylladb#19780
2024-07-18 20:00:15 +03:00
Avi Kivity
f7e24cf0b1 Update tools/jmx submodule (umask fix)
* tools/jmx 3328a22...89308b7 (1):
  > install.sh: fix incorrect permission on strict umask

Ref scylladb/scylladb#14383
Ref scylladb/scylladb#8589
2024-07-18 19:37:57 +03:00
Avi Kivity
c3b9e64713 Merge 'sstable::open_sstable: pass origin from the writer' from Lakshmi Narayanan Sreethar
Pass origin when opening the sstable from the writer and store it in the
sstable object. This will make the origin available for the entire write
path.

Closes scylladb/scylladb#19721

* github.com:scylladb/scylladb:
  sstables: use _origin in write path
  sstable::open_sstable: pass and store origin
2024-07-18 19:30:32 +03:00
Avi Kivity
926a02451e Merge 'sstables/index_reader: abort reading during shutdown' from Lakshmi Narayanan Sreethar
This PR adds support for aborting index reads from within `index_consume_entry_context::consume_input` when the server is being stopped. The abort source is now propagated down to the `index_consume_entry_context`, making it available for `consume_input` to check if an abort has been requested. If an abort is detected, `consume_input` will throw an exception to stop the index read operation.

Closes scylladb/scylladb#19453

* github.com:scylladb/scylladb:
  test/boost: test abort behaviour during index read
  sstables/index_reader: stop consuming index when abort has been requested
  sstables::index_consume_entry_context: store abort_source
  sstable: drop old filter only after the new filter is built during rebuild
  sstables/sstables_manager: store abort_source in sstable_manager
  replica/database: pass abort_source to database constructor
2024-07-18 19:26:22 +03:00
Avi Kivity
0780228aa2 config: specialize from-string conversion for bool
The yaml/json representation for bool is true/false, but boost::lexical_cast
expects 1/0. Specialize bool conversion to accept true/false (for yaml/json
compatibility) and 1/0 (for backward compatibility). This provides
round-trip conversion for bool configs in system.config.
2024-07-18 18:38:22 +03:00
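The bool handling described in 0780228aa2 can be pictured with a small
sketch. It is written in Python rather than the C++ of the actual patch,
and `config_from_string` is a hypothetical name; only the accepted
spellings (true/false and 1/0) come from the commit message. Treating the
spellings as case-sensitive is an assumption of this sketch.

```python
def config_from_string(text: str, target_type: type):
    """Convert a config string to target_type (hypothetical helper).

    For bool, accept the YAML/JSON spellings as well as the legacy 1/0
    forms. Case-sensitivity here is an assumption, not taken from the patch.
    """
    if target_type is bool:
        if text in ("true", "1"):
            return True
        if text in ("false", "0"):
            return False
        raise ValueError(f"cannot convert {text!r} to bool")
    # Every other type is forwarded unchanged to its ordinary constructor,
    # in the same spirit as a thin wrapper around a generic conversion.
    return target_type(text)
```

With such a conversion, a bool that is written out as "true" can be read
back in, which is the round-trip property the commit message refers to.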
Avi Kivity
33eaa61cdd config: wrap boost::lexical_cast<> when converting from strings
Configuration uses boost::lexical_cast to convert strings to native
values (e.g. bools/ints). However, boost::lexical_cast doesn't
recognize true/false for bool. Since we can't change boost::lexical_cast,
replace it with a wrapper that forwards directly to boost::lexical_cast.

In the next step, we'll specialize it for bool.
2024-07-18 18:38:19 +03:00
Piotr Dulikowski
5ec8c06561 test: regression test for MV crash with tablets during decommission
Regression test for scylladb/scylladb#19439.

Co-authored-by: Kamil Braun <kbraun@scylladb.com>
2024-07-18 16:00:26 +02:00
Anna Mikhlin
cd007123c3 Update ScyllaDB version to: 6.2.0-dev 2024-07-18 16:07:07 +03:00
Avi Kivity
47e99f4e04 Merge 'Fix lwt semaphore guard accounting' from Gleb Natapov
Currently the guard does not account correctly for ongoing operation if semaphore acquisition fails. It may signal a semaphore when it is not held.

Should be backported to all supported versions.

Closes scylladb/scylladb#19699

* github.com:scylladb/scylladb:
  test: add test to check that coordinator lwt semaphore continues functioning after locking failures
  paxos: do not signal semaphore if it was not acquired
2024-07-18 14:58:31 +03:00
Dawid Medrek
8b6e887e02 db/hints: Verify that Scylla limits the concurrency of written hints
In 6e79d64, the behavior of `manager::too_many_in_flight_hints_for()`
was accidentally modified. The change went unnoticed for some time
and was then fixed. In this commit, we add a test verifying that
the concurrency of hints being written to disk is indeed limited
and the limitations are imposed properly.
2024-07-18 13:49:29 +02:00
Kefu Chai
db56af2e41 replication_strategy: mark fmt::formatter<..>::format() const
since fmt 11, format() is required to be const, otherwise
its caller in the fmt library is not able to call it, and compilation
fails like:

```
/home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/abseil -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT locator/CMakeFiles/scylla_locator.dir/RelWithDebInfo/abstract_replication_strategy.cc.o -MF locator/CMakeFiles/scylla_locator.dir/RelWithDebInfo/abstract_replication_strategy.cc.o.d -o locator/CMakeFiles/scylla_locator.dir/RelWithDebInfo/abstract_replication_strategy.cc.o -c /home/kefu/dev/scylladb/locator/abstract_replication_strategy.cc
In file included from /home/kefu/dev/scylladb/locator/abstract_replication_strategy.cc:9:
In file included from /home/kefu/dev/scylladb/locator/abstract_replication_strategy.hh:16:
In file included from /home/kefu/dev/scylladb/gms/inet_address.hh:11:
In file included from /usr/include/fmt/ostream.h:23:
In file included from /usr/include/fmt/chrono.h:23:
In file included from /usr/include/fmt/format.h:41:
/usr/include/fmt/base.h:1393:23: error: no matching member function for call to 'format'
 1393 |     ctx.advance_to(cf.format(*static_cast<qualified_type*>(arg), ctx));
      |                    ~~~^~~~~~
/usr/include/fmt/base.h:1374:21: note: in instantiation of function template specialization 'fmt::detail::value<fmt::context>::format_custom_arg<locator::vnode_effective_replication_map::factory_key, fmt::formatter<locator::vnode_effective_replication_map::factory_key>>' requested here
 1374 |     custom.format = format_custom_arg<
      |                     ^
/home/kefu/dev/scylladb/seastar/include/seastar/util/log.hh:299:33: note: in instantiation of function template specialization 'fmt::format_to<seastar::internal::log_buf::inserter_iterator &, locator::vnode_effective_replication_map::factory_key &, const void *, 0>' requested here
  299 |                     return fmt::format_to(it, fmt.format, std::forward<Args>(args)...);
      |                                 ^
/home/kefu/dev/scylladb/seastar/include/seastar/util/log.hh:428:9: note: in instantiation of function template specialization 'seastar::logger::log<locator::vnode_effective_replication_map::factory_key &, const void *>' requested here
  428 |         log(log_level::debug, std::move(fmt), std::forward<Args>(args)...);
      |         ^
/home/kefu/dev/scylladb/locator/abstract_replication_strategy.cc:561:18: note: in instantiation of function template specialization 'seastar::logger::debug<locator::vnode_effective_replication_map::factory_key &, const void *>' requested here
  561 |         rslogger.debug("create_effective_replication_map: found {} [{}]", key, fmt::ptr(erm.get()));
      |                  ^
/home/kefu/dev/scylladb/locator/abstract_replication_strategy.hh:471:10: note: candidate function template not viable: 'this' argument has type 'const fmt::formatter<locator::vnode_effective_replication_map::factory_key>', but method is not marked const
  471 |     auto format(const locator::vnode_effective_replication_map::factory_key& key, FormatContext& ctx) {
      |          ^
1 error generated.
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19768
2024-07-18 13:52:36 +03:00
Avi Kivity
c93e2662ae build: regenerate toolchain for optimized clang
Generate a profile-guided-optimization build of clang and install it.
See bd34f2fe46.

The optimized clang package can be found in

  https://devpkg.scylladb.com/clang/clang-18.1.6-Fedora-40-x86_64.tar.gz
  https://devpkg.scylladb.com/clang/clang-18.1.6-Fedora-40-aarch64.tar.gz

Closes scylladb/scylladb#19685
2024-07-18 12:57:45 +03:00
Botond Dénes
8cc99973eb Merge 'Apply sstable io error handler to exceptions generated when opening file' from Calle Wilund
Fixes #19753

SSTable file open provides an `io_error_handler` instance which is applied to a file-wrapper to process any IO errors happening during read/write via the handler in `storage_service`, which in turn will effectively disable the node. However, this is not applied to the actual open operation itself, i.e. any exception generated by the file open call itself will instead just escape to the caller.

This PR adds filtering via the `error_handler` to sstable open + makes `storage_service` "isolate" mechanism non-module-static (thus making it testable) and adds tests to check we exhibit the same behaviour in both cases.

The main motivation for this issue is the discussion that secondary-level IO issues (i.e. caused by extensions) should trigger the same behaviour as, for example, running out of disk space.

Closes scylladb/scylladb#19766

* github.com:scylladb/scylladb:
  memtable_test: Add test for isolate behaviour on exceptions during flush
  cql_test_env: Expose storage service
  storage_service: Make isolate guard non-static and add test accessor
  sstable: apply error_handler on open exceptions
2024-07-18 08:14:40 +03:00
Avi Kivity
d5af86bd8a test: cql-pytest: config_value_context: remove strange ast.literal_eval call
cql-pytest's config_value_context is used to run a code sequence with
different ScyllaDB configuration applied for a while. When it reads
the original value (in order to restore it later), it applies
ast.literal_eval() to it. This is strange, since the config variable isn't
a Python literal.

It was added in 8c464b2ddb ("guardrails: restrict replication
strategy (RS)"). Presumably, as a workaround for #19604 - it sufficiently
massaged the input we read via SELECT to be acceptable later via UPDATE.

Now that #19604 is fixed, we can remove the call to ast.literal_eval,
but have to fix up the parameters to config_value_context to something
that will be accepted without further massaging.

This is a step towards fixing #15559, where we want to run some tests
with a boolean configuration variable changed, and literal_eval is
transforming the string representation of integers to integers and
confusing the driver.

Closes scylladb/scylladb#19696
2024-07-18 08:11:26 +03:00
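The effect of ast.literal_eval() that d5af86bd8a describes can be
reproduced with the standard library alone. The input strings below are
illustrative and are not taken from the test suite.

```python
import ast

# A string that happens to look like a Python literal is converted, so the
# value read back is an int and no longer the string "1".
as_int = ast.literal_eval("1")

# A bare word that is not a Python literal is rejected outright.
try:
    ast.literal_eval("true")
    rejected = False
except ValueError:
    rejected = True
```

Passing the original string through untouched avoids both behaviours,
which matches the direction the commit takes.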
Dawid Medrek
414ea68cac exceptions/exceptions.hh: Wrap #include <concepts> within an #ifdef
`GitHub Actions / Analyze #includes in source files` keeps reporting
that the include shouldn't be present in the file. The reason is
that we use FMT with version >10, so the fragment of the code that
uses the include is not compiled. We move the include to a place
where it's used, which should fix the warnings.

Closes scylladb/scylladb#19776
2024-07-17 22:09:41 +03:00
Yaron Kaikov
ddcc6ec1e4 dist/docker/debian/build_docker.sh: Build container based on Ubuntu24.04
Now that we added support for Ubuntu24.04 and are also migrating our images
to be based on it
(https://github.com/scylladb/scylla-machine-image/pull/530), we should
also modify our docker image.
Fixes: https://github.com/scylladb/scylladb/issues/19738

Closes scylladb/scylladb#19764
2024-07-17 18:45:48 +03:00
Calle Wilund
91b1be6736 memtable_test: Add test for isolate behaviour on exceptions during flush
Tests that certain exceptions thrown during flush to sstable do not
crash the node, but do trigger io_error_handler and cause node isolation
2024-07-17 09:36:28 +00:00
Calle Wilund
f996dfc4fa cql_test_env: Expose storage service
So tests can play with it.
2024-07-17 09:36:28 +00:00
Calle Wilund
de728958d1 storage_service: Make isolate guard non-static and add test accessor
Makes storage service isolate repeatable in same process and more testable.
Note, since the test var now is shard-local we need to check twice: once
on error, once on reaching shard zero for actual shutdown.
2024-07-17 09:36:28 +00:00
Calle Wilund
7918ec2e39 sstable: apply error_handler on open exceptions 2024-07-17 09:36:27 +00:00
Emil Maskovsky
a89facbc74 test: raft: fix the flaky test_raft_recovery_stuck
Use the rolling restart to avoid spurious driver reconnects.

This can be eventually reverted once the scylladb/python-driver#295 is
fixed.

Fixes scylladb/scylladb#19154
2024-07-17 09:16:06 +02:00
Emil Maskovsky
ef3393bd36 test: raft: code cleanup in test_raft_recovery_stuck
Cleaning up the imports.
2024-07-17 09:09:46 +02:00
Lakshmi Narayanan Sreethar
7b58fa2534 sstables: use _origin in write path
Now that the origin is available inside the sstable object, no need to
pass it to the methods called in the write path.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:44:28 +05:30
Lakshmi Narayanan Sreethar
b762a09dcd sstable::open_sstable: pass and store origin
Pass origin when opening the sstable from the writer and store it in the
sstable object. This will make the origin available for the entire write
path.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:43:30 +05:30
Lakshmi Narayanan Sreethar
7d0f3ace4a test/boost: test abort behaviour during index read
Added a new boost test, index_reader_test, with a testcase to verify
the abort behaviour during an index read using
index_consume_entry_context.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:42:50 +05:30
Lakshmi Narayanan Sreethar
64dadd5ec2 sstables/index_reader: stop consuming index when abort has been requested
When an abort is requested, stop further reading of the index file and
throw an exception from index_consume_entry_context::process_state().

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:42:50 +05:30
Lakshmi Narayanan Sreethar
c2524337a2 sstables::index_consume_entry_context: store abort_source
Store abort source inside sstables::index_consume_entry_context, so that
the next patch can implement cancelling the index read when abort is
requested.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:42:50 +05:30
Lakshmi Narayanan Sreethar
587da62686 sstable: drop old filter only after the new filter is built during rebuild
sstable::maybe_rebuild_filter_from_index drops the existing filter first
and then rebuilds the new filter as the method is only called before the
sstable is sealed. But to make the index read abortable, the old filter
can be dropped only after the new filter is built, so that if the
index consumer gets aborted, we still have the old filter to write to
disk.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:42:47 +05:30
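The ordering that 587da62686 describes (build the new filter first, drop
the old one only afterwards) can be sketched as follows. This is a
language-neutral illustration in Python, not the C++ code of the patch;
`FilterHolder` and `rebuild_filter` are hypothetical names.

```python
class FilterHolder:
    """Hypothetical stand-in for an sstable object that owns a filter."""

    def __init__(self, current_filter):
        self.filter = current_filter

    def rebuild_filter(self, build_new_filter):
        # Build first: if this raises (for example because an abort was
        # requested), self.filter has not been touched and is still usable.
        new_filter = build_new_filter()
        # Only now is the old filter dropped, by replacing it.
        self.filter = new_filter
```

If the builder raises, the holder still carries the old filter, which is
the property the commit relies on when the index read is aborted.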
Lakshmi Narayanan Sreethar
6a3e7a5e7a sstables/sstables_manager: store abort_source in sstable_manager
Add a new member that stores the abort_source. This can later be used by
the sstables to check if an abort has been requested. Also implement
sstables_manager::get_abort_source() that returns a const reference to
the abort source.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:36:06 +05:30
Lakshmi Narayanan Sreethar
e2142974f8 replica/database: pass abort_source to database constructor
This is in preparation for the following patch that adds abort_source
variable to the sstables_manager.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-07-16 20:36:06 +05:30
Piotr Dulikowski
6af7882c59 db/view: drop view updates to replaced node marked as left
When a node that is permanently down is replaced, it is marked as "left"
but it still can be a replica of some tablets. We also don't keep IPs of
nodes that have left and the `node` structure for such node returns an
empty IP (all zeros) as the address.

This interacts badly with the view update logic. The base replica paired
with the left node might decide to generate a view update. Because
storage proxy still uses IPs and not host IDs, it needs to obtain the
view replica's IP and tell the storage proxy to write a view update to
that node - so, it chooses 0.0.0.0. Apparently, storage proxy decides to
write a hint towards this address - hinted handoff on the other hand
operates on host IDs and not IPs, so it attempts to translate the IP
back, which triggers an assertion as there is no replica with IP
0.0.0.0.

As a quick workaround for this issue just drop view updates towards
nodes which seem to have IPs that are all zeros. It would be more proper
to keep the view updates as hints and replay them later to the new
paired replica, but achieving this right now would require much more
significant changes. For now, fixing a crash is more important than
keeping views consistent with base replicas.

Fixes: scylladb/scylladb#19439
2024-07-16 15:50:11 +02:00
Emil Maskovsky
21c67a5a64 test: raft: fix the flaky test_change_ip
The python driver might currently trigger spurious reconnects that cause
the `NoHostAvailable` to be thrown, which is not expected.

This patch adds a retry mechanism to the test so that it gets past this
failure if it occurs, as a work-around.

The proper fix is expected to be done in the scylladb/python-driver#295,
once fixed there this work-around can be reverted.

Fixes: scylladb/scylla#18547

Closes scylladb/scylladb#19759
2024-07-16 15:46:16 +02:00
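A retry work-around of the kind 21c67a5a64 describes could look roughly
like this. It is a minimal sketch, not the code of the patch:
`run_with_retry` is a hypothetical helper, and the local `NoHostAvailable`
class only stands in for the driver exception of the same name.

```python
import time


class NoHostAvailable(Exception):
    """Stand-in for the driver exception of the same name (assumption)."""


def run_with_retry(operation, attempts=3, delay=0.0):
    """Call operation(), retrying when NoHostAvailable is raised."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except NoHostAvailable:
            if attempt == attempts:
                raise  # out of retries: surface the failure to the test
            time.sleep(delay)
```

Only the one exception type is retried, so unrelated failures still fail
the test immediately.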
Botond Dénes
1be6cfb16e Update tools/java submodule
* tools/java 01ba3c19...33938ec1 (1):
  > cassandra-stress: delay before retry
2024-07-16 16:29:51 +03:00
Gleb Natapov
4178589826 test: add test to check that coordinator lwt semaphore continues functioning after locking failures 2024-07-16 12:32:25 +03:00
Gleb Natapov
87beebeed0 paxos: do not signal semaphore if it was not acquired
The guard signals a semaphore during destruction if it is marked as
locked, but currently it may be marked as locked even if locking failed.
Fix this by using semaphore_units instead of managing the locked flag
manually.

Fixes: https://github.com/scylladb/scylladb/issues/19698
2024-07-16 12:32:25 +03:00
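The idea behind 87beebeed0 is to tie the release to an object that exists
only when the acquisition succeeded, instead of tracking a separate
"locked" flag. The sketch below illustrates that idea in Python with the
standard threading module; it is an analogy, not Seastar's
semaphore_units, and `SemaphoreUnits` and `try_get_units` are hypothetical
names.

```python
import threading


class SemaphoreUnits:
    """Holds one acquired unit and gives it back at most once."""

    def __init__(self, sem):
        self._sem = sem

    def release(self):
        if self._sem is not None:
            self._sem.release()
            self._sem = None  # a second release() is a no-op


def try_get_units(sem):
    """Return SemaphoreUnits if a unit was acquired, else None."""
    if sem.acquire(blocking=False):
        return SemaphoreUnits(sem)
    return None  # nothing was acquired, so nothing can be released later
```

Because a failed acquisition yields no units object at all, there is no
code path that signals the semaphore without holding it.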
Avi Kivity
dde209390f Merge 'sstables: fix some mixups between the writer's schema and the sstable's schema' from Michał Chojnowski
There are two schemas associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).

It's easy to mix up the two and break something as a result.

The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.

The sstables's schema is used to configure some parameters for newly created
sstable, such as bloom filter false positive ratio, or compression.

This series fixes the known mixups between the two — when setting up compression,
and when setting up the bloom filters.

Fixes #16065

The bug is present in all supported versions, so the patch has to be backported to all of them.

Closes scylladb/scylladb#19695

* github.com:scylladb/scylladb:
  sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's
  sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's
  sstables: for i_filter downcasts, use dynamic_cast instead of static_cast
2024-07-16 12:17:41 +03:00
Raphael S. Carvalho
c061ec8d1c test: Fix max_ongoing_compaction_test test
```
DEBUG 2024-07-03 00:59:58,291 [shard 0:main] compaction_manager - Compaction task 0x51800002a480 for table tests.3 compaction_group=0 [0x503000062050]: switch_state: none -> pending: pending=2 active=0 done=0 errors=0

DEBUG 2024-07-03 01:00:02,868 [shard 0:main] compaction - Checking droppable sstables in tests.3, candidates=0
DEBUG 2024-07-03 01:00:02,868 [shard 0:main] compaction - time_window_compaction_strategy::newest_bucket:
  now 1720314000000000
  buckets = {
    key=1720314000000000, size=2
    key=1720310400000000, size=2

1720314000000000: GMT: Sunday, July 7, 2024 1:00:00 AM
1720310400000000: GMT: Sunday, July 7, 2024 12:00:00 AM
```

the test failed to complete when run across different clock hours, as it
expected all sstables produced to belong to the same window of 1h size.
let's fix it by reusing timestamps, so it's always consistent.

Fixes #13280.
Fixes #18564.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#19749
2024-07-16 07:29:10 +03:00
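The two bucket keys in the log quoted in c061ec8d1c are exactly one hour
apart. A simplified model of how a timestamp maps to its one-hour window
shows why writes that straddle a clock hour land in different buckets.
`window_lower_bound` is a hypothetical helper and this is not the actual
time_window_compaction_strategy code.

```python
HOUR_US = 3600 * 1_000_000  # one-hour window, in microseconds


def window_lower_bound(timestamp_us: int, window_us: int = HOUR_US) -> int:
    """Round a microsecond timestamp down to the start of its window."""
    return timestamp_us - (timestamp_us % window_us)


# One microsecond before and exactly at the hour boundary from the log.
before = window_lower_bound(1720314000000000 - 1)
at = window_lower_bound(1720314000000000)
```

Here `before` falls into the window starting at 1720310400000000 and `at`
into the one starting at 1720314000000000, so a test that draws fresh
timestamps around the boundary sees two buckets instead of one.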
Kefu Chai
c911832ed9 github: do not run clang-tidy as a cron job
we already run it for every pull request, so no need to run it
periodically.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-15 19:19:49 +08:00
Kefu Chai
dc189c67a6 github: disable scheduled workflow on forks
as these workflows are scheduled periodically, and if they fail,
notifications are sent to the repo's owner. to minimize the surprises
to the contributors using github, let's disable these workflows on
fork repos.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-15 19:19:28 +08:00
Emil Maskovsky
144794a952 raft: Fix crash in leader_host API handler
The leader_host API handler could end up using the `req` unique_ptr
after it had already been destroyed (it was passed down to the future
lambda by reference). This was causing an occasional crash in some tests.

Reworked the leader_host handler to use the req only outside of the
future lambda.

Also updated the code to handle the possibility that the non-default
leader group (other than Group 0) might reside on a different shard
than the shard 0 - using the same concept of calling on all shards via
`invoke_on_all()` as done for the other requests.

Fixes scylladb/scylladb#19714

Closes scylladb/scylladb#19715
2024-07-15 11:06:56 +02:00
Marcin Maliszkiewicz
395dec35c1 cql3: functions: replace template with std::function in with_udf_iter()
Templates are slower to compile and more difficult to read,
in this case generalization is not needed and can be replaced
by std::function.
2024-07-15 09:39:20 +02:00
Marcin Maliszkiewicz
85d38e013c cql3: functions: improve functions class constness handling
Declares getters as const methods. Makes the instance() function
return a const object so that it may only be modified via the change_batch
class.
2024-07-15 09:39:20 +02:00
Marcin Maliszkiewicz
b9861c0bb7 Revert "cql3: functions: make modification functions accessible only via batch class"
This reverts commit 3f1c2fecc2.

This access control property will be implemented differently
(by using const) in a subsequent commit, hence the revert.
2024-07-15 09:39:20 +02:00
Dawid Medrek
7301a96ff4 db/hints: Coroutinize hint_endpoint_manager::store_hint() 2024-07-15 04:15:25 +02:00
Avi Kivity
c11f2c9bcd Merge 'scylla-housekeeping: fix exception on parsing version string v2' from Takuya ASADA
This reverts 65fbf72ed0 and introduces a new version of the patch, which fixes the SCT breakage that appeared after that commit was merged.

----

Since Python 3.12, version parsing has become strict: parse_version() does
not accept a version string like '6.1.0~dev'.
To fix this, we need to pass an acceptable version string to parse_version(), like
'6.1.0.dev0', which is allowed by the Python version scheme.

reference: https://packaging.python.org/en/latest/specifications/version-specifiers/

Fixes https://github.com/scylladb/scylladb/issues/19564

Closes https://github.com/scylladb/scylladb/pull/19572

Closes scylladb/scylladb#19670

* github.com:scylladb/scylladb:
  scylla-housekeeping: fix exception on parsing version string
  Revert "scylla-housekeeping: fix exception on parsing version string"
2024-07-14 16:24:41 +03:00
Raphael S. Carvalho
8df7f78969 replica: rename for_each_const_compaction_group()
use the same name as the non-const-qualified variant, by relying on
overloading.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-07-12 16:33:34 -03:00
Raphael S. Carvalho
518677d7f9 replica: Fix comment about compaction group
there's not a 1:1 relationship between compaction group count and
tablet count. a tablet replica has a storage group instance, which
may map to multiple compaction groups during split mode.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-07-12 16:24:51 -03:00
Raphael S. Carvalho
f139aa1df6 replica: remove unused compaction_group_vector
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-07-12 16:16:47 -03:00
Botond Dénes
53a6ec05ed Merge 'replica: remove rwlock for protecting iteration over storage group map' from Raphael "Raph" Carvalho
rwlock was added to protect iterations against concurrent updates to the map.

the updates can happen when allocating a new tablet replica or removing an old one (tablet cleanup).

the rwlock is very problematic because it can result in topology changes being blocked, as updating token metadata takes the exclusive lock, which is serialized with table wide ops like split / major / explicit flush (and those can take a long time).

to get rid of the lock, we can copy the storage group map and guard individual groups with a gate (not a problem since map is expected to have a maximum of ~100 elements). so cleanup can close that gate (carefully closed after stopping individual groups such that migrations aren't blocked by long-running ops like major), and ongoing iterations (e.g. triggered by nodetool flush) can skip a group that was closed, as such a group is being migrated out.
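
The copy-and-gate idea above can be sketched roughly as follows. This is a plain Python stand-in under stated assumptions: `Gate`, `StorageGroup` and `for_each_group` are hypothetical names for illustration and are not the Seastar or Scylla API.

```python
class Gate:
    """Tiny stand-in for a gate: entering fails once the gate is closed."""

    def __init__(self):
        self.closed = False
        self.holders = 0

    def try_enter(self) -> bool:
        if self.closed:
            return False
        self.holders += 1
        return True

    def leave(self) -> None:
        self.holders -= 1

    def close(self) -> None:
        self.closed = True


class StorageGroup:
    def __init__(self, name: str):
        self.name = name
        self.gate = Gate()


def for_each_group(groups: dict, action) -> list:
    """Run action on every open group, iterating over a copy of the map."""
    visited = []
    # The snapshot keeps the loop valid even if `groups` changes meanwhile.
    for group_id, group in list(groups.items()):
        if not group.gate.try_enter():
            continue  # closed: the group is being cleaned up or migrated out
        try:
            action(group)
            visited.append(group_id)
        finally:
            group.gate.leave()
    return visited


groups = {i: StorageGroup(f"sg{i}") for i in range(3)}
groups[1].gate.close()  # simulate tablet cleanup closing one group
flushed = for_each_group(groups, lambda g: None)
print(flushed)
```

In this sketch the iteration never needs a map-wide lock: it only skips the group whose gate was closed.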

Fixes #18821.

```
WRITE
=====

./build/release/scylla perf-simple-query --smp 1 --memory 2G --initial-tablets 10 --tablets --write

- BEFORE

65559.52 tps ( 59.6 allocs/op,  16.4 logallocs/op,  14.3 tasks/op,   52841 insns/op,   30946 cycles/op,        0 errors)
67408.05 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53018 insns/op,   30874 cycles/op,        0 errors)
67714.72 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53026 insns/op,   30881 cycles/op,        0 errors)
67825.57 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53015 insns/op,   30821 cycles/op,        0 errors)
67810.74 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53009 insns/op,   30828 cycles/op,        0 errors)

         throughput: mean=67263.72 standard-deviation=967.40 median=67714.72 median-absolute-deviation=547.02 maximum=67825.57 minimum=65559.52
instructions_per_op: mean=52981.61 standard-deviation=79.09 median=53014.96 median-absolute-deviation=36.54 maximum=53025.79 minimum=52840.56
  cpu_cycles_per_op: mean=30869.90 standard-deviation=50.23 median=30874.06 median-absolute-deviation=42.11 maximum=30945.94 minimum=30820.89

- AFTER
65448.76 tps ( 59.5 allocs/op,  16.4 logallocs/op,  14.3 tasks/op,   52788 insns/op,   31013 cycles/op,        0 errors)
67290.83 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53025 insns/op,   30950 cycles/op,        0 errors)
67646.81 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53025 insns/op,   30909 cycles/op,        0 errors)
67565.90 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53058 insns/op,   30951 cycles/op,        0 errors)
67537.32 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   52983 insns/op,   30963 cycles/op,        0 errors)

         throughput: mean=67097.93 standard-deviation=931.44 median=67537.32 median-absolute-deviation=467.97 maximum=67646.81 minimum=65448.76
instructions_per_op: mean=52975.85 standard-deviation=108.07 median=53024.55 median-absolute-deviation=49.45 maximum=53057.99 minimum=52788.49
  cpu_cycles_per_op: mean=30957.17 standard-deviation=37.43 median=30951.31 median-absolute-deviation=7.51 maximum=31013.01 minimum=30908.62

READ
=====

./build/release/scylla perf-simple-query --smp 1 --memory 2G --initial-tablets 10 --tablets

- BEFORE

79423.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41840 insns/op,   26820 cycles/op,        0 errors)
81076.70 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41837 insns/op,   26583 cycles/op,        0 errors)
80927.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41829 insns/op,   26629 cycles/op,        0 errors)
80539.44 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41841 insns/op,   26735 cycles/op,        0 errors)
80793.10 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41864 insns/op,   26662 cycles/op,        0 errors)

         throughput: mean=80551.99 standard-deviation=661.12 median=80793.10 median-absolute-deviation=375.37 maximum=81076.70 minimum=79423.36
instructions_per_op: mean=41842.20 standard-deviation=13.26 median=41840.14 median-absolute-deviation=5.68 maximum=41864.50 minimum=41829.29
  cpu_cycles_per_op: mean=26685.88 standard-deviation=93.31 median=26662.18 median-absolute-deviation=56.47 maximum=26820.08 minimum=26582.68

- AFTER
79464.70 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41799 insns/op,   26761 cycles/op,        0 errors)
80954.58 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41803 insns/op,   26605 cycles/op,        0 errors)
81160.90 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41811 insns/op,   26555 cycles/op,        0 errors)
81263.10 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41814 insns/op,   26527 cycles/op,        0 errors)
81162.97 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41806 insns/op,   26549 cycles/op,        0 errors)

         throughput: mean=80801.25 standard-deviation=755.54 median=81160.90 median-absolute-deviation=361.72 maximum=81263.10 minimum=79464.70
instructions_per_op: mean=41806.47 standard-deviation=5.85 median=41806.05 median-absolute-deviation=4.05 maximum=41813.86 minimum=41799.36
  cpu_cycles_per_op: mean=26599.22 standard-deviation=94.84 median=26554.54 median-absolute-deviation=50.51 maximum=26761.06 minimum=26527.05
```
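
As a rough way to read the numbers above, the following sketch compares the change in mean throughput with the run-to-run spread of the baseline. The sample values are copied from the runs listed above; the helper name `relative_change_pct` is an assumption made for illustration.

```python
from statistics import mean, stdev

# Throughput samples (tps) copied from the runs above.
write_before = [65559.52, 67408.05, 67714.72, 67825.57, 67810.74]
write_after = [65448.76, 67290.83, 67646.81, 67565.90, 67537.32]
read_before = [79423.36, 81076.70, 80927.36, 80539.44, 80793.10]
read_after = [79464.70, 80954.58, 81160.90, 81263.10, 81162.97]


def relative_change_pct(before, after):
    """Percentage change of mean(after) relative to mean(before)."""
    return (mean(after) - mean(before)) / mean(before) * 100.0


write_delta = relative_change_pct(write_before, write_after)
read_delta = relative_change_pct(read_before, read_after)
# Sample standard deviation of the baseline, as a percentage of its mean.
write_noise = stdev(write_before) / mean(write_before) * 100.0
read_noise = stdev(read_before) / mean(read_before) * 100.0

print(f"write: {write_delta:+.2f}% (baseline stdev {write_noise:.2f}%)")
print(f"read:  {read_delta:+.2f}% (baseline stdev {read_noise:.2f}%)")
```

For both workloads the change in the mean comes out smaller than the baseline's own standard deviation, so these runs do not show a measurable throughput difference.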

Closes scylladb/scylladb#19469

* github.com:scylladb/scylladb:
  replica: remove rwlock for protecting iteration over storage group map
  replica: get rid of fragile compaction group intrusive list
2024-07-12 15:45:36 +03:00
Dawid Medrek
3e02e66ca8 db/hints: Move a constant value to the TU it's used in
Until now, the constant `HINT_FILE_WRITE_TIMEOUT` was
declared as a static member of `db::hints::manager`.
However, the constant is only ever used in one
translation unit, so it makes more sense to move it
there and not include boilerplate in a header.
2024-07-12 13:08:33 +02:00
Piotr Dulikowski
3cdf549da2 Merge 'remove utils::in' from Avi Kivity
utils::in uses std::aligned_storage, which is deprecated. Rather than fixing it, replace its only
user with simpler code and remove it.

No backport needed as this isn't fixing a bug.

Closes scylladb/scylladb#19683

* github.com:scylladb/scylladb:
  utils: remove utils/in.hh
  gossiper: remove initializer-list overload of add_local_application_state()
2024-07-12 12:06:09 +02:00
Takuya ASADA
373a7825b5 scylla-housekeeping: fix exception on parsing version string
Since Python 3.12, version parsing has become strict: parse_version() does
not accept a version string like '6.1.0~dev'.
To fix this, we need to pass an acceptable version string to parse_version(), like
'6.1.0.dev0', which is allowed by the Python version scheme.

Also, a release candidate version like '6.0.0~rc3' has the same issue; it
should be replaced with '6.0.0rc3' for comparison in parse_version().

reference: https://packaging.python.org/en/latest/specifications/version-specifiers/
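
A minimal sketch of the rewriting described above, assuming only the two forms named in this message ('~dev' and '~rcN'). The function name `normalize_for_parse_version` is hypothetical and this is not the actual scylla-housekeeping code:

```python
def normalize_for_parse_version(version: str) -> str:
    """Return a copy of version rewritten for strict parsing.

    The caller's original string is left untouched, so it can still be
    reported as-is; only the copy is meant to be handed to parse_version().
    """
    return version.replace("~dev", ".dev0").replace("~rc", "rc")


for raw in ("6.1.0~dev", "6.0.0~rc3", "6.0.1"):
    print(raw, "->", normalize_for_parse_version(raw))
```

Keeping the rewrite local to the comparison avoids changing the version string seen by other consumers.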

Fixes #19564

Closes scylladb/scylladb#19572
2024-07-12 03:23:34 +09:00
Takuya ASADA
db04f8b16e Revert "scylla-housekeeping: fix exception on parsing version string"
This reverts commit 65fbf72ed0, since
it breaks scylla-housekeeping and SCT because the patch modified
the version string.

We shouldn't modify the version string directly; we need to pass a
modified string just to parse_version() instead.
2024-07-12 03:23:34 +09:00
Emil Maskovsky
b9abad0515 test: raft: fix the topology failure recovery test flakiness
Setting the error condition for all nodes in the cluster to avoid
having to check which one is the coordinator. This should make the test
more stable and avoid the flakiness observed when the coordinator node
is the one that got the error condition injected.

Randomizing the retrieved running servers to reproduce the issue more
frequently and to avoid making any assumptions about the order of the
servers.

Note that only the "raft_topology_barrier_fail" needs to run
on a non-coordinator node, the other error "stream_ranges_fail" can be
injected on any node (including the coordinator).

Fixes: scylladb/scylladb#18614

Closes scylladb/scylladb#19663
2024-07-11 16:23:26 +02:00
Piotr Dulikowski
188b4ac0fc Merge 'service_level_controller: update configuration on raft change' from Michał Jadwiszczak
This patch is a follow-up to scylladb/scylladb#16585.

Once we have service levels on raft, we can get rid of the update loop, which updates the configuration at a configured interval (default is 10s).
Instead, this PR introduces methods to `group0_state_machine` which look through table ids in mutations in `write_mutation` and update submodules based on those ids.

Fixes: scylladb/scylladb#18060

Closes scylladb/scylladb#18758

* github.com:scylladb/scylladb:
  test: remove `sleep()`s which were required to reload service levels configuration
  test/cql_test_env: remove unit test service levels data accessors
  service/storage_service: reload SL cache on topology_state_load()
  service/qos/service_level_controller: move semaphore breaking to stop
  service/qos/service_level_controller: maybe start and stop legacy update loop
  service/qos/service_level_controller: make update loop legacy
  raft/group0_state_machine: update submodules based on table_id
  service/storage_service: add a proxy method to reload sl cache
2024-07-11 16:18:48 +02:00
Kefu Chai
2a1c9ed7cb github: use needs.read-toolchain.outputs.image for iwyu's container
in 9a71543fd2, we introduced a regression,
which failed to use the proper value for the container image in which
the iwyu workflow is run.

in this change, we pass the correct value, as we do in clang-tidy.yaml
workflow.

Refs 9a71543fd2
Fixes scylladb/scylladb#19704
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19697
2024-07-11 17:17:37 +03:00
Michał Chojnowski
1a8ee69a43 sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's
There are two schemas associated with an sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).

It's easy to mix up the two and break something as a result.

The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.

The sstable's schema is used to configure some parameters for the newly created
sstable, such as the bloom filter false positive ratio, or compression.

The problem fixed by this patch is that the writer was wrongly creating
the compressor objects based on its own schema, but using them based
on the sstable's schema.
This patch forces the writer to use the sstable's schema for both.
2024-07-11 12:53:54 +02:00
Michał Chojnowski
d10b38ba5b sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's
There are two schemas associated with an sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).

It's easy to mix up the two and break something as a result.

The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.

The sstable's schema is used to configure some parameters for the newly created
sstable, such as the bloom filter false positive ratio, or compression.

The problem fixed by this patch is that the writer was wrongly creating
the filter based on its own schema, while the layer outside the writer
was interpreting it as if it was created with the sstable's schema.

This patch forces the writer to pick the filter's parameters based on the
sstable's schema instead.
2024-07-11 12:53:54 +02:00
Michał Chojnowski
a1834efd82 sstables: for i_filter downcasts, use dynamic_cast instead of static_cast
As of this patch, those static_casts are actually invalid in some cases
(they cast to the wrong type) because of an oversight.
A later patch will fix that. But to even write a reliable reproducer
for the problem, we must force the invalid casts to manifest as a crash
(instead of weird results).

This patch both allows writing a reproducer for the bug and serves
as a bit of defensive programming for the future.
2024-07-11 12:53:54 +02:00
Tomas Nozicka
26466a3043 Allow configuring default loglevel with args for container images
Closes scylladb/scylladb#19671
2024-07-11 12:37:53 +03:00
Piotr Dulikowski
19c5e1807c Merge 'schema: fix describe of indexes on collections' from Michał Jadwiszczak
If an index was created on a collection (frozen or not), its description wasn't a correct create statement.
This patch fixes the bug and includes in the description the functions like `full()`, `keys()`, `values()`, ... used to create indexes on collections.

Fixes scylladb/scylladb#19278

Closes scylladb/scylladb#19381

* github.com:scylladb/scylladb:
  cql-pytest/test_describe: add a test for describe indexes
  schema/schema: fix column names in index description
2024-07-11 09:11:01 +02:00
Kefu Chai
9a71543fd2 github: always use the tools/toolchain/image for lint workflows
instead of hardwiring the toolchain image in github workflows, read it
from `tools/toolchain/image`. a dedicated reusable workflow is added to
read from this file, and expose its content with an output parameter.

also, switch the iwyu.yaml workflow to this image, which is more maintainable.
please note, before this change, we were also using the latest
stable build of clang, and since fedora 40 is also using clang 18,
the behavior does not change. but with this change, we don't have
the flexibility of using other clang versions provided by
https://apt.llvm.org in the future.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19655
2024-07-10 23:45:35 +03:00
Avi Kivity
65a7fc9902 Merge 'transport, service: move definition of destructors into .cc' from Kefu Chai
this changeset includes two changes:

- service: move storage_service::~storage_service() into .cc
- transport: move the cql_server::~cql_server() into .cc

they intend to address the compile failures when building scylladb with clang-19. clang-19 is pickier when generating defaulted destructors with incomplete types, but its behavior makes sense with regard to standard compliance. so let's update accordingly.

---

it's a cleanup, hence no need to backport.

Closes scylladb/scylladb#19668

* github.com:scylladb/scylladb:
  transport: move the cql_server::~cql_server() into .cc
  service: move storage_service::~storage_service() into .cc
2024-07-10 23:43:16 +03:00
Kefu Chai
06ba523818 sstable: extract file_writer out
`sstables::write()` has multiple overloads, which are defined in
`sstables/writer.hh`. two of these overloads are template functions,
which have a template parameter named `W`, which has a type constraint
requiring it to fulfill the `Writer` concept. but in `types.hh`, when
the compiler tries to instantiate the template function with signature
of `write(sstable_version_types v, W& out, const T& t)` with
`file_writer` as the template parameter `W`, `file_writer` is only
forward-declared using `class file_writer` in the same header file, so
this type is still an incomplete type at that moment. that's why the
compiler is not able to determine if `file_writer` fulfills the
constraint or not. actually, the declaration of `file_writer` is located
in `sstables/writer.hh`, which in turn includes `types.hh`. so they
form a cyclic dependency.

in this change, in order to break this cycle, we extract file_writer out
into a separate header file, so that both `sstables/writer.hh` and
`sstables/types.hh` can include it. this addresses the build failure.

Fixes scylladb/scylladb#19667
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19669
2024-07-10 23:32:47 +03:00
Michał Chojnowski
fdd8b03d4b scylla-gdb.py: add $coro_frame()
Adds a convenience function for inspecting the coroutine frame of a given
seastar task.

Short example of extracting a coroutine argument:

```
(gdb) p *$coro_frame(seastar::local_engine->_current_task)
$1 = {
  __resume_fn = 0x2485f80 <sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&)>,
  ...
  PointerType_7 = 0x601008e67880,
  ...
  __coro_index = 0 '\000'
  ...
(gdb) p $downcast_vptr($->PointerType_7)
  $2 = (schema *) 0x601008e67880
```

Closes scylladb/scylladb#19479
2024-07-10 21:46:27 +03:00
Avi Kivity
45e27c0da2 config, enum_option: allow round-trip string conversion
The default configuration for replication_strategy_warn_list is
["SimpleStrategy"], but one cannot set this via CQL:

cqlsh> select * from system.config where name = 'replication_strategy_warn_list';

 name                           | source  | type                      | value
--------------------------------+---------+---------------------------+--------------------
 replication_strategy_warn_list | default | replication strategy list | ["SimpleStrategy"]

(1 rows)
cqlsh> update system.config set value  = '[NetworkTopologyStrategy]' where name = 'replication_strategy_warn_list';
cqlsh> select * from system.config where name = 'replication_strategy_warn_list';

 name                           | source | type                      | value
--------------------------------+--------+---------------------------+-----------------------------
 replication_strategy_warn_list |    cql | replication strategy list | ["NetworkTopologyStrategy"]

(1 rows)
cqlsh> update system.config set value  = '["NetworkTopologyStrategy"]' where name = 'replication_strategy_warn_list';
WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="Operation failed for system.config - received 0 responses and 1 failures from 1 CL=ONE." info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 1}

Fix by allowing quotes in enum_set parsing.
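
The round trip can be sketched as below, assuming a simple comma-separated list syntax without escaping. `format_enum_list` and `parse_enum_list` are hypothetical helpers written for illustration and do not mirror the actual enum_set parser:

```python
def format_enum_list(values):
    """Render values the way the config table prints them: ["A", "B"]."""
    return "[" + ", ".join(f'"{v}"' for v in values) + "]"


def parse_enum_list(text):
    """Parse [A, B] or ["A", "B"]; surrounding double quotes are optional."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"expected a bracketed list, got {text!r}")
    body = body[1:-1].strip()
    if not body:
        return []
    items = []
    for raw in body.split(","):
        item = raw.strip()
        # Accept a quoted element by dropping one matching pair of quotes.
        if len(item) >= 2 and item[0] == '"' and item[-1] == '"':
            item = item[1:-1]
        items.append(item)
    return items


printed = format_enum_list(["NetworkTopologyStrategy"])
print(printed, "->", parse_enum_list(printed))
```

Because the parser accepts exactly what the formatter prints, a value read from the table can be written back unchanged.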

Bug present since 8c464b2ddb ("guardrails: restrict
replication strategy (RS)", 6.0).

Fixes #19604.

Closes scylladb/scylladb#19605
2024-07-10 20:39:01 +03:00
Yaron Kaikov
e33126fc3e .github/script/label_promoted_commit.py: add label only if ref is PR
we got a failure during check-commit action:
```
Run python .github/scripts/label_promoted_commits.py --commit_before_merge 30e82a81e8 --commit_after_merge f31d5e3204 --repository scylladb/scylladb --ref refs/heads/master

Commit sha is: d5a149fc01
Commit sha is: 415457be2b
Commit sha is: d3b1ccd03a
Commit sha is: 1fca341514
Commit sha is: f784be6a7e
Commit sha is: 80986c17c3
Commit sha is: 492d0a5c86
Commit sha is: 7b3f55a65f
Commit sha is: 78d6471ce4
Commit sha is: 7a69d9070f
Commit sha is: a9e985fcc9
master branch, pr number is: 19213
Traceback (most recent call last):
  File "/home/runner/work/scylladb/scylladb/.github/scripts/label_promoted_commits.py", line 87, in <module>
    main()
  File "/home/runner/work/scylladb/scylladb/.github/scripts/label_promoted_commits.py", line 81, in main
    pr = repo.get_pull(pr_number)
  File "/usr/lib/python3/dist-packages/github/Repository.py", line 2746, in get_pull
    headers, data = self._requester.requestJsonAndCheck(
  File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck
    return self.__check(
  File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/pulls/pulls#get-a-pull-request", "status": "404"}
Error: Process completed with exit code 1.
```

The reason for this failure is that one of the promoted commits
(a9e985fcc9) had a `Closes` reference
to an issue rather than to a pull request.

Fixes: https://github.com/scylladb/scylladb/issues/19677

Closes scylladb/scylladb#19678
2024-07-10 15:27:12 +03:00
Botond Dénes
9bdcba7a46 Merge 'conf: scylla.yaml: update documentation for tablets' from Benny Halevy
Tablets are no longer in experimental_features since 83d491a, so remove them from the experimental_features section documentation.

Also, expand the documentation for the `enable_tablets` option.

Fixes #19456

Needs backport to 6.0

Closes scylladb/scylladb#19516

* github.com:scylladb/scylladb:
  conf: scylla.yaml: enable_tablets: expand documentation
  conf: scylla.yaml: remove tablets from experimental_features doc comment
2024-07-10 14:32:40 +03:00
Avi Kivity
8b7a2661c1 utils: remove utils/in.hh
It uses the deprecated std::aligned_storage and had only one user (now
removed). Rather than maintain it, remove it.
2024-07-10 14:11:27 +03:00
Avi Kivity
d50ba03965 gossiper: remove initializer-list overload of add_local_application_state()
The initializer_list overload uses a too-clever technique to avoid copies.
While copies here are unlikely to pose any real problem (we're allocating
map nodes anyway), it's simple enough to provide a copy-less replacement
that doesn't require questionable tricks.

We replace the initializer_list<..., in<>> overload with a variadic
template that constructs a temporary map.
2024-07-10 14:11:27 +03:00
Michał Jadwiszczak
375499b727 test: remove sleep()s which were required to reload service levels configuration
Previously, some service levels tests had to sleep in order to
ensure the in-memory configuration of service levels was updated.

Now that we update the configuration as the raft log is applied,
doing a read barrier (for instance by executing `DROP TABLE IF EXISTS
non_existing_table`) is enough and the sleeps are not needed.
2024-07-10 10:42:21 +02:00
Michał Jadwiszczak
23bebb8037 test/cql_test_env: remove unit test service levels data accessors
Unit test data accessors were created to avoid starting the update loop in
unit tests and to update the controller's configuration directly.

With the raft data accessor and configuration updates on applying the raft log,
we can get rid of the unit test data accessors and use the raft one.

This also makes the unit test env a bit more like a real Scylla environment.
2024-07-10 10:42:21 +02:00
Michał Jadwiszczak
de857d9ce3 service/storage_service: reload SL cache on topology_state_load()
Since SL cache is no longer updated in a loop, it needs to be
initialized on startup and because we are updating the cache while
applying raft commands, we can initialize it on topology_state_load().
2024-07-10 10:42:20 +02:00
Jadw1
cf29242962 service/qos/service_level_controller: move semaphore breaking to stop
Before this, the notification semaphore was broken() in do_abort(),
which was triggered by early abort source.
However we are going to reload sl cache on topology state reload
and it can happen after the early abort source is triggered, so
it may throw broken_semaphore exception.

We can move semaphore breaking to stop() method. Legacy update loop
is still stopped in do_abort(), so it doesn't change the order of
service level controller shutdown.
2024-07-10 10:33:24 +02:00
Michał Jadwiszczak
85119b90df service/qos/service_level_controller: maybe start and stop legacy update loop
In previous commit, we marked the update loop as legacy.

For compatibility reasons, we need to start legacy update loop
when the cluster is in recovery mode or it hasn't been upgraded to raft topology.
Then, in the update loop we check if all conditions are met and stop the
loop.

This commit also moves start of update loop later (after topology state is loaded) in main.cc.
There is no risk in doing it later.
2024-07-10 10:23:04 +02:00
Michał Jadwiszczak
b0f76db9f2 service/qos/service_level_controller: make update loop legacy
Rename method which started update loop to better reflect
what it does.

Previously the method was named `update_from_distributed_data`,
however it doesn't update anything but only starts the update loop,
which we are making legacy.
2024-07-10 10:23:04 +02:00
Michał Jadwiszczak
5ddf5e3d7d raft/group0_state_machine: update submodules based on table_id
We want to update service levels cache when any new mutations are
applied to service levels table.
To avoid creating a new raft command type, this commit changes the design of
`write_mutations` to update in-memory structures based on the
mutations' table_id.
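
The dispatch idea can be sketched as follows, assuming a registry that maps a table id to a reload hook. All names here (`apply_write_mutations`, `reload_hooks`, the table id strings) are hypothetical and only illustrate the approach:

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (table_id, payload): a stand-in for a real mutation in this sketch.
Mutation = Tuple[str, bytes]


def apply_write_mutations(
    mutations: Iterable[Mutation],
    reload_hooks: Dict[str, Callable[[], None]],
) -> List[str]:
    """Apply mutations, then run each registered hook once per touched table."""
    touched: List[str] = []
    for table_id, _payload in mutations:
        # Applying the mutation to storage is elided in this sketch.
        if table_id not in touched:
            touched.append(table_id)
    reloaded = []
    for table_id in touched:
        hook = reload_hooks.get(table_id)
        if hook is not None:
            hook()  # e.g. reload the in-memory service levels cache
            reloaded.append(table_id)
    return reloaded


calls = []
hooks = {"service_levels": lambda: calls.append("sl_cache_reload")}
batch = [("service_levels", b"a"), ("other_table", b"b"), ("service_levels", b"c")]
print(apply_write_mutations(batch, hooks), calls)
```

A single command type carries any mutation, and the set of touched table ids decides which in-memory structures get refreshed.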
2024-07-10 10:23:04 +02:00
Michał Jadwiszczak
b61047a3f8 service/storage_service: add a proxy method to reload sl cache
In this series of patches, we want to reload service levels cache
when any changes to SL table are applied.

First, we need a way to trigger a reload of the cache from
`group0_state_machine`.
To avoid introducing another dependency, we can use `storage_service` (which
has access to the SL controller) and add a proxy method to it.
2024-07-10 10:23:04 +02:00
Nadav Har'El
c6cffe36dd Merge 'cql: forbid having counter columns in tablets tables' from Piotr Smaron
Counter updates break under tablet migration (#18180), and for this reason counters need to be disabled until the problem is fixed. It's enough to forbid creating a table with counters, as altering a table without counters already cannot result in the table having counters:
1) Adding a counter column to a table without counters:
```
cqlsh> ALTER TABLE temp.cf ADD (col_name counter);
ConfigurationException: Cannot add a counter column (col_name) in a non counter column family
```
2) Altering a column to be of the counter type:
```
cqlsh> ALTER TABLE temp.cf ALTER col_name TYPE counter;
ConfigurationException: Cannot change col_name from type int to type counter: types are incompatible.
```

Fixes: #19449
Fixes: https://github.com/scylladb/scylladb/issues/18876

Need to backport to 6.0, as this is broken there.

Closes scylladb/scylladb#19518

* github.com:scylladb/scylladb:
  doc: add notes to feature pages which don't support tablets
  cql: adjust warning about tablets
  cql: forbid having counter columns in tablets tables
2024-07-10 10:18:30 +03:00
Michał Jadwiszczak
b65a4c66f0 cql-pytest/test_describe: add a test for describe indexes 2024-07-10 07:14:46 +02:00
Kefu Chai
7e4e685964 transport: move the cql_server::~cql_server() into .cc
because transport/server.cc has the complete definition of event_notifier, the
compiler can default-generate the destructor of `cql_server` with the necessary
information. otherwise, clang-19 would fail to build, like:

```
FAILED: CMakeFiles/scylla.dir/Dev/main.cc.o
/home/kefu/.local/bin/clang++ -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_PROGRAM_OPTIONS_NO_LIB -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Dev\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -I/home/kefu/dev/scylladb/build -isystem /home/kefu/dev/scylladb/build/rust -isystem /home/kefu/dev/scylladb/abseil -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -MD -MT CMakeFiles/scylla.dir/Dev/main.cc.o -MF CMakeFiles/scylla.dir/Dev/main.cc.o.d -o CMakeFiles/scylla.dir/Dev/main.cc.o -c /home/kefu/dev/scylladb/main.cc
In file included from /home/kefu/dev/scylladb/main.cc:11:
In file included from /usr/include/yaml-cpp/yaml.h:10:
In file included from /usr/include/yaml-cpp/parser.h:11:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/memory:78:
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:91:16: error: invalid application of 'sizeof' to an incomplete type 'cql_transport::cql_server::event_notifier'
   91 |         static_assert(sizeof(_Tp)>0,
      |                       ^~~~~~~~~~~
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:398:4: note: in instantiation of member function 'std::default_delete<cql_transport::cql_server::event_notifier>::operator()' requested here
  398 |           get_deleter()(std::move(__ptr));
      |           ^
/home/kefu/dev/scylladb/transport/server.hh:135:7: note: in instantiation of member function 'std::unique_ptr<cql_transport::cql_server::event_notifier>::~unique_ptr' requested here
  135 | class cql_server : public seastar::peering_sharded_service<cql_server>, public generic_server::server {
      |       ^
/home/kefu/dev/scylladb/transport/server.hh:135:7: note: in implicit destructor for 'cql_transport::cql_server' first required here
/home/kefu/dev/scylladb/transport/server.hh:149:11: note: forward declaration of 'cql_transport::cql_server::event_notifier'
  149 |     class event_notifier;
      |           ^
1 error generated.
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-10 12:52:51 +08:00
Kefu Chai
79ffde063a service: move storage_service::~storage_service() into .cc
as repair/repair.cc has the complete definition of node_ops_meta_data, the
compiler can default-generate the destructor of `storage_service` with the necessary
information. otherwise, clang-19 would fail to build, like:

```
FAILED: repair/CMakeFiles/repair.dir/Dev/repair.cc.o
/home/kefu/.local/bin/clang++ -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Dev\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -MD -MT repair/CMakeFiles/repair.dir/Dev/repair.cc.o -MF repair/CMakeFiles/repair.dir/Dev/repair.cc.o.d -o repair/CMakeFiles/repair.dir/Dev/repair.cc.o -c /home/kefu/dev/scylladb/repair/repair.cc
In file included from /home/kefu/dev/scylladb/repair/repair.cc:9:
In file included from /home/kefu/dev/scylladb/repair/repair.hh:11:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/unordered_map:41:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unordered_map.h:33:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable.h:35:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:34:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/tuple:38:
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_pair.h:291:11: error: field has incomplete type 'service::node_ops_meta_data'
  291 |       _T2 second;                ///< The second member
      |           ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/ext/aligned_buffer.h:93:28: note: in instantiation of template class 'std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>' requested here
   93 |     : std::aligned_storage<sizeof(_Tp), __alignof__(_Tp)>
      |                            ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:334:43: note: in instantiation of template class '__gnu_cxx::__aligned_buffer<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>>' requested here
  334 |       __gnu_cxx::__aligned_buffer<_Value> _M_storage;
      |                                           ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:373:7: note: in instantiation of template class 'std::__detail::_Hash_node_value_base<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>>' requested here
  373 |     : _Hash_node_value_base<_Value>
      |       ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable.h:1662:21: note: in instantiation of template class 'std::__detail::_Hash_node_value<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>, false>' requested here
 1662 |                         ._M_bucket_index(declval<const __node_value_type&>(),
      |                                          ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unordered_map.h:109:11: note: in instantiation of member function 'std::_Hashtable<utils::tagged_uuid<node_ops_id_tag>, std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>, std::allocator<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>>, std::__detail::_Select1st, std::equal_to<utils::tagged_uuid<node_ops_id_tag>>, std::hash<utils::tagged_uuid<node_ops_id_tag>>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>>::~_Hashtable' requested here
  109 |     class unordered_map
      |           ^
/home/kefu/dev/scylladb/service/storage_service.hh:109:7: note: forward declaration of 'service::node_ops_meta_data'
  109 | class node_ops_meta_data;
      |       ^
In file included from /home/kefu/dev/scylladb/repair/repair.cc:9:
In file included from /home/kefu/dev/scylladb/repair/repair.hh:11:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/unordered_map:41:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unordered_map.h:33:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable.h:35:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:34:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/tuple:38:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_pair.h:60:
```
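
The pattern behind this change can be sketched in a single file. This is only an illustration under stated assumptions, not the actual code: `meta_data` and `service_like` are hypothetical stand-ins for `node_ops_meta_data` and `storage_service`, and `std::vector` is used instead of the `std::unordered_map` member because it is a container that the standard (since C++17) allows to be declared with an incomplete element type.

```cpp
#include <cstddef>
#include <vector>

// "Header" part: only a forward declaration of the element type is visible.
class meta_data;

class service_like {
    std::vector<meta_data> _ops;   // fine since C++17: incomplete element type in a declaration
public:
    service_like();
    ~service_like();               // declared only, so no implicit destructor is generated here
    std::size_t size() const;
    void add(int id);
};

// ".cc" part: the complete type is visible before any member is defined.
class meta_data {
public:
    explicit meta_data(int id) : _id(id) {}
    int id() const { return _id; }
private:
    int _id;
};

service_like::service_like() = default;
service_like::~service_like() = default;   // generated where meta_data is complete
std::size_t service_like::size() const { return _ops.size(); }
void service_like::add(int id) { _ops.emplace_back(id); }
```

Any translation unit that sees only the "header" part can still create and destroy a `service_like`, because the destructor body lives where the element type is complete.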

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-10 12:52:51 +08:00
Michał Jadwiszczak
253feb6811 schema/schema: fix column names in index description
Previously, the description of an index didn't include functions for
indexes on collections, such as full(), keys(), values(), etc.
2024-07-09 22:37:05 +02:00
Raphael S. Carvalho
c539b7c861 replica: remove rwlock for protecting iteration over storage group map
rwlock was added to protect iterations against concurrent updates to the map.

the updates can happen when allocating a new tablet replica or removing an old one (tablet cleanup).

the rwlock is very problematic because it can result in topology changes being blocked, as updating
token metadata takes the exclusive lock, which is serialized with table wide ops like
split / major / explicit flush (and those can take a long time).

to get rid of the lock, we can copy the storage group map and guard individual groups with a gate
(not a problem since map is expected to have a maximum of ~100 elements).
so cleanup can close that gate (carefully closed after stopping individual groups such that
migrations aren't blocked by long-running ops like major), and ongoing iterations (e.g. triggered
by nodetool flush) can skip a group that was closed, as such a group is being migrated out.

Check documentation added to compaction_group.hh to understand how
concurrent iterations and updates to the map work without the rwlock.

Yielding variants that iterate over groups are no longer returning group
id since id stability can no longer be guaranteed without serializing split
finalization and iteration.

Fixes #18821.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-07-09 16:59:24 -03:00
Raphael S. Carvalho
ad5c5bca5f replica: get rid of fragile compaction group intrusive list
It was added to make integration of storage groups easier, but it's
complicated since it's another source of truth and we could have
problems if it becomes inconsistent with the group map.

Fixes #18506.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-07-09 16:53:35 -03:00
Piotr Smaron
531659f8dc doc: add notes to feature pages which don't support tablets
There's already a page which lists which features are not working with
tablets: architecture/tablets.html#limitations-and-unsupported-features,
but it's also helpful for users to be warned about this when visiting a
specific feature doc page.
2024-07-09 18:18:05 +02:00
Avi Kivity
f31d5e3204 Merge 'repair/streaming: enable toggling tombstone gc with a config item' from Botond Dénes
We currently disable tombstone GC for compaction done on the read path of streaming and repair, because those expired tombstones can still prevent data resurrection. With time-based tombstone GC, missing a repair for long enough can cause data resurrection because a tombstone is potentially GC'd before it could be spread to every node by repair. So repair disseminating these expired tombstones helps clusters which missed repair for long enough. It is not a guarantee because compaction could have done the GC itself, but it is better than nothing.
This last resort is getting less important with repair-based tombstone GC. Furthermore, we have seen this cause huge repair amplification in a cluster, where expired tombstones triggered repair replicating otherwise identical rows.

This series makes tombstone GC on the streaming/repair compaction path configurable with a config item. This new config item defaults to `false` (the current behaviour); setting it to `true` enables tombstone GC.

Fixes: https://github.com/scylladb/scylladb/issues/19015

Not a regression, no backport needed

Closes scylladb/scylladb#19016

* github.com:scylladb/scylladb:
  test/topology_custom/test_repair: add test for enable_tombstone_gc_for_streaming_and_repair
  replica/table: maybe_compact_for_streaming(): toggle tombstone GC based on the control flag
  replica: propagate enable_tombstone_gc_for_streaming_and_repair to maybe_compact_for_streaming()
  db/config: introduce enable_tombstone_gc_for_streaming_and_repair
2024-07-09 19:04:11 +03:00
Piotr Smaron
5bfabff9a0 cql: adjust warning about tablets
Made it shorter, simpler and mentioned also that counters aren't
supported with tablets.

Fixes: #18876
2024-07-09 18:01:37 +02:00
Piotr Smaron
c70f321c6f cql: forbid having counter columns in tablets tables
Counter updates break under tablet migration (#18180), and for this
reason they need to be disabled until the problem is fixed.
It's enough to forbid creating a table with counters, as altering a
table without counters already cannot result in the table having
counters:
1) Adding a counter column to a table without counters:
```
cqlsh> ALTER TABLE temp.cf ADD (col_name counter);
ConfigurationException: Cannot add a counter column (col_name) in a non counter column family
```
2) Altering a column to be of the counter type:
```
cqlsh> ALTER TABLE temp.cf ALTER col_name TYPE counter;
ConfigurationException: Cannot change col_name from type int to type counter: types are incompatible.
```

Fixes: #19449
2024-07-09 18:01:31 +02:00
Patryk Wrobel
a89e3d10af code-cleanup: add missing header guards
The following command had been executed to get the
list of headers that did not contain '#pragma once':
'grep -rnw . -e "#pragma once" --include *.hh -L'

This change adds missing include guard to headers
that did not contain any guard.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#19626
2024-07-09 18:31:35 +03:00
Calle Wilund
8295980d14 commitlog: Make max data lifetime runtime-configurable
2024-07-09 12:30:49 +00:00
Calle Wilund
0c6679e55f db::config: Expose commitlog_max_data_lifetime_in_s parameter
To allow user control of commitlog time based expiry.
Set to 24h initially.
2024-07-09 12:30:48 +00:00
Calle Wilund
55d6afda6e commitlog: Add optional max lifetime parameter to cl instance
If set, any remaining segment that has data older than this threshold
will request flushing, regardless of data pressure. I.e. even a system
where nothing happens will flush data after X seconds to free up the
commit log.
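
The check described above could look roughly like the following. This is a hedged sketch, not the commitlog implementation: `segment_info` and `needs_flush` are hypothetical names, and the only input assumed is the time of the oldest write still held by a segment.

```cpp
#include <chrono>
#include <optional>

using clock_type = std::chrono::steady_clock;

// Hypothetical stand-in for a commitlog segment: only the age of its oldest data matters here.
struct segment_info {
    clock_type::time_point oldest_write;
};

// Returns true when the segment holds data older than the configured lifetime.
// An unset lifetime disables time-based expiry entirely.
inline bool needs_flush(const segment_info& seg,
                        std::optional<std::chrono::seconds> max_lifetime,
                        clock_type::time_point now) {
    if (!max_lifetime) {
        return false;
    }
    return now - seg.oldest_write > *max_lifetime;
}
```

Keeping the lifetime optional mirrors the "if set" wording: without a value, only data pressure decides when a segment is flushed.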
2024-07-09 12:30:48 +00:00
Takuya ASADA
cae999c094 toolchain: change optimized clang install method to standard one
Previously, the optimized clang installation did not use the standard build
script; it overwrote Fedora's preinstalled clang binaries instead.
However, this breaks with clang-18.1.8 because of the libLTO versioning convention.
To avoid such problems, let's switch to the standard installation method and
switch the install prefix to /usr/local.

Fixes #19203

Closes scylladb/scylladb#19505
2024-07-09 14:22:42 +03:00
Tomasz Grabiec
252110bc54 Merge 'mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion' from Michał Chojnowski
apply_monotonically() is run with reclaim disabled. So with some bad luck,
sentinel insertion might fail with bad_alloc even on a perfectly healthy node.
We can't deal with the failure of sentinel insertion, so this will result in a
crash.

This patch prevents the spurious OOM by reserving some memory (1 LSA segment)
and only making it available right before the critical allocations.

Fixes https://github.com/scylladb/scylladb/issues/19552

Closes scylladb/scylladb#19617

* github.com:scylladb/scylladb:
  mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion
  logalloc: add hold_reserve
  logalloc: generalize refill_emergency_reserve()
2024-07-09 13:09:01 +02:00
Anna Stuchlik
948459b1ac doc: replace a link on the CDC+Kafka page
This commit replaces a link to the installation section with a link to the getting started section.

Closes scylladb/scylladb#19658
2024-07-09 12:35:43 +03:00
Michael Litvak
ed33e59714 storage_proxy: remove response handler if no targets
When writing a mutation, it might happen that there are no live targets
to send the mutation to, yet the request can be satisfied. For example,
when writing with CL=ANY to a dead node, the request is completed by
storing a local hint.

Currently, in that case, a write response handler is created for the
request and it remains active until it times out because it is not
removed anywhere, even though the write is completed successfully after
storing the hint. The response handler should be removed usually when
receiving responses from all targets, but in this case there are no
targets to trigger the removal.

In this commit we check if we don't have live targets to send the
mutation to. If so, we remove the response handler immediately.

Fixes scylladb/scylladb#19529

Closes scylladb/scylladb#19586
2024-07-09 12:11:05 +03:00
Kamil Braun
98c18d8904 Merge 'Add API for read barrier' from Emil Maskovsky
Introduce REST API for triggering a read barrier.

This is to make sure the database schema is up to date on the node where
the read barrier is triggered. One of the use cases is the database
backup via the Scylla Manager, which requires that the schema backed up
is matching the data or newer (data can be migrated, but an older schema
would cause issues).

Fixes scylladb/scylladb#19213

Closes scylladb/scylladb#19597

* github.com:scylladb/scylladb:
  raft: add the read barrier REST API
  raft: use `raft_timeout` in trigger_snapshot
  raft: use bad_param_exception for consistency
  test: raft: verify schema updated after read barrier
2024-07-09 10:58:21 +02:00
Kefu Chai
6af989782c test: sstable_directory_test: use THREADSAFE_BOOST_REQUIRE_EQUAL when appropriate
for better debugging experience.

before this change, we have

```
fatal error: in "sstable_directory_test_generation_sanity": critical check sst->generation() == sst1->generation() has failed
```
after this change, we have

```
fatal error: in "sstable_directory_test_generation_sanity": critical
check sst->generation() == sst1->generation() has failed [3ghm_0ntw_29vj625yegw7jodysc != 3ghm_0ntw_29vj625yegw7jodysd]
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19639
2024-07-09 10:54:23 +03:00
Kefu Chai
30e82a81e8 test: do not define boost_test_print_type() for types with operator<<
before this change, we provide `boost_test_print_type()` for all types
which can be formatted using {fmt}. these types include those which
fulfill the concept of range and whose elements can be formatted using
{fmt}, if the compilation unit happens to include `fmt/ranges.h`.
such ranges are formatted with `boost_test_print_type()` as well. this
is what we expect. in other words, we use {fmt} to format types which
do not natively support {fmt}, but which fulfill the range concept.

but `boost::unit_test::basic_cstring` is one of them

- it can be formatted using operator<<, but it does not provide
  fmt::format specialization
- it fulfills the concept of range
- and its element type is `char const`, which can be formatted using
  {fmt}

that's why it's formatted like:

```
test/boost/sstable_directory_test.cc(317): fatal error: in "sstable_directory_test_generation_sanity": critical check ['s', 's', 't', '-', '>', 'g', 'e', 'n', 'e', 'r', 'a', 't', 'i', 'o', 'n', '(', ')', ' ', '=', '=', ' ', 's', 's', 't', '1', '-', '>', 'g', 'e', 'n', 'e', 'r', 'a', 't', 'i', 'o', 'n', '(', ')'] has failed
```

where the string is formatted as a sequence-alike container. this
is far from readable.

so, in this change, we do not define `boost_test_print_type()` for
the types which natively support `operator<<` anymore. so they can
be printed with `operator<<` when boost::test prints them.
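
the selection rule can be illustrated with a small detection trait. this is a sketch under assumptions, not the code from this change: `has_ostream_operator` and `describe` are hypothetical names, and the fallback branch merely stands in for the {fmt}-based printer.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Detects whether `os << value` is a valid expression for T (C++17 detection idiom).
template <typename T, typename = void>
struct has_ostream_operator : std::false_type {};

template <typename T>
struct has_ostream_operator<
    T, std::void_t<decltype(std::declval<std::ostream&>() << std::declval<const T&>())>>
    : std::true_type {};

// Hypothetical printer: prefers the native operator<< and only falls back to a
// custom formatter for types that cannot be streamed.
template <typename T>
std::string describe(const T& value) {
    if constexpr (has_ostream_operator<T>::value) {
        std::ostringstream os;
        os << value;
        return os.str();
    } else {
        return "<formatted by fallback>";
    }
}
```

with this rule a string-like type keeps its native output, while a plain `std::vector<int>` (which has no `operator<<`) still goes through the fallback.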

Fixes scylladb/scylladb#19637
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19638
2024-07-09 10:34:37 +03:00
Botond Dénes
9544c364be scylla-gdb.py: introduce scylla large-objects
The equivalent of small-objects, but for large objects (spans).
Allows listing the objects of a large-class, and therefore investigating a
run-away class, by attempting to identify the owners of the objects in
it.

Written to investigate #16493

Closes scylladb/scylladb#16711
2024-07-09 10:21:09 +03:00
Emil Maskovsky
a9e985fcc9 raft: add the read barrier REST API
This will allow triggering the read barrier directly via the API,
instead of doing work-arounds (like dropping a non-existent table).

The intended use-case is in the Scylla Manager, to make sure that
the database schema is up to date after the data has been backed up
and before attempting to back up the database schema.

The database schema in particular is being backed up just on a single
node, which might not yet have the schema at least as new as the data
(data can be migrated to a newer schema, but not vice versa).

The read barrier issued on the node ensures that the node has
a schema at least as new as the data.

Closes #19213
2024-07-08 18:16:27 +02:00
Emil Maskovsky
7a69d9070f raft: use raft_timeout in trigger_snapshot
Migrate the "trigger_snapshot" to use the standardized `raft_timeout` approach.
2024-07-08 18:13:31 +02:00
Michał Chojnowski
78d6471ce4 mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion
apply_monotonically() is run with reclaim disabled. So with some bad luck,
sentinel insertion might fail with bad_alloc even on a perfectly healthy node.
We can't deal with the failure of sentinel insertion, so this will result in a
crash.

This patch prevents the spurious OOM by reserving some memory (1 LSA segment)
and only making it available right before the critical allocations.

Fixes scylladb/scylladb#19552
2024-07-08 16:08:27 +02:00
Michał Chojnowski
7b3f55a65f logalloc: add hold_reserve
mutation_partition_v2::apply_monotonically() needs to perform some allocations
in a destructor, to ensure that the invariants of the data structure are
restored before returning. But it is usually called with reclaiming disabled,
so the allocations might fail even in a perfectly healthy node with plenty of
reclaimable memory.

This patch adds a mechanism which allows reserving some LSA memory (by
asking the allocator to keep it unused) and make it available for allocation
right when we need to guarantee allocation success.
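
The idea can be sketched with a toy pool. This is not the logalloc code: `segment_pool`, its counters and the `release()` method are hypothetical, and the sketch only assumes that a reserve is taken out of circulation up front and handed back right before the critical allocations.

```cpp
#include <cstddef>
#include <new>

// Toy pool of fixed-size segments; a hypothetical stand-in for the LSA segment pool.
class segment_pool {
    std::size_t _free;
public:
    explicit segment_pool(std::size_t free_segments) : _free(free_segments) {}

    // Ordinary allocations can only use segments that are not held in reserve.
    void allocate() {
        if (_free == 0) {
            throw std::bad_alloc();
        }
        --_free;
    }
    std::size_t free_segments() const { return _free; }

    friend class hold_reserve;
};

// RAII guard: takes segments out of circulation on construction and hands them
// back to the pool on release() or, at the latest, on destruction.
class hold_reserve {
    segment_pool& _pool;
    std::size_t _held;
public:
    hold_reserve(segment_pool& pool, std::size_t n) : _pool(pool), _held(n) {
        if (pool._free < n) {
            throw std::bad_alloc();
        }
        pool._free -= n;
    }
    hold_reserve(const hold_reserve&) = delete;
    hold_reserve& operator=(const hold_reserve&) = delete;

    // Makes the held segments available again; safe to call more than once.
    void release() noexcept {
        _pool._free += _held;
        _held = 0;
    }
    ~hold_reserve() { release(); }
};
```

The reserve is taken while failure is still acceptable, so that the later allocation, which must not fail, finds the memory waiting for it.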
2024-07-08 16:08:27 +02:00
Wojciech Przytuła
691e245152 storage_proxy: fix uninitialized LWT contention counter
When debugging the issue of high LWT contention metric, we (the
drivers team) discovered that at least 3 drivers (Go, Java, Rust)
cause high numbers in that metric in LWT workloads - we doubted that
all those drivers route LWT queries badly. We tried to understand that
metric and its semantics. It took 3 people over 10 hours to figure out
what it is supposed to count.

People from core team suspected that it was the drivers sending
requests to different shards, causing contention. Then we ran the
workload against a single node single shard cluster... and observed
contention. Finally, we looked into the Scylla code and saw it.

**Uninitialized stack value.**

The core member was shocked. But we, the drivers people, felt we always
knew it. It's yet another time that we are blamed for a server-side
issue. We rebuilt scylla with the variable initialized to 0 and the
metric kept being 0.

To prevent such errors in the future, let's consider some lints that
warn against uninitialized variables. This is such an obvious feature
of e.g. Rust, and yet this has been shown to cause a painful bug in 2024.
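
For illustration only, the class of bug looks like the sketch below. `count_contentions` and its `attempt` callback are hypothetical and not taken from storage_proxy; the point is the explicit initializer. Compilers such as GCC and Clang offer `-Wuninitialized`, which catches some, though not all, cases of this kind.

```cpp
#include <cstdint>

// Hypothetical retry loop; `attempt` stands in for one round of an LWT operation
// and returns true once the round succeeds.
template <typename Attempt>
std::uint64_t count_contentions(Attempt attempt, unsigned max_rounds) {
    std::uint64_t contentions = 0;   // without "= 0" the starting value would be indeterminate
    for (unsigned round = 0; round < max_rounds; ++round) {
        if (attempt(round)) {
            break;
        }
        ++contentions;               // every failed round counts as one contention
    }
    return contentions;
}
```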

Closes scylladb/scylladb#19625
2024-07-08 16:55:46 +03:00
Emil Maskovsky
492d0a5c86 raft: use bad_param_exception for consistency
Replace the `std::runtime_error` by the `bad_param_exception` that is used in other places.
2024-07-08 14:31:11 +02:00
Takuya ASADA
cbf33aba5c scylla_coredump_setup: install systemd-coredump before has_zstd()
On Ubuntu/Debian, we have to install systemd-coredump before
running has_zstd(), since it detects ZSTD support by running coredumpctl.

Move pkg_install('systemd-coredump') to the head of the script.

Fixes #19643

Closes scylladb/scylladb#19648
2024-07-08 15:04:34 +03:00
Kefu Chai
229250ef3e .github: use scylla-toolchain for newer fmt
in cccec07581, we started using a feature introduced by {fmt} v10.
but we are still using the {fmt} cooked using seastar, and it is
9.1.0, so this breaks the build when running the clang-tidy workflow.

in this change, instead of building on ubuntu jammy, we use the
scylladb/scylla-toolchain image based on fedora 40, which provides
{fmt} v10.2.1. since we have clang 18 in fedora 40, this change
does not sacrifice anything.

after this change, clang-tidy workflow should be back to normal.

Fixes scylladb/scylladb#19621
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19628
2024-07-08 11:14:02 +02:00
Emil Maskovsky
80986c17c3 test: raft: verify schema updated after read barrier
Regression test for #19213.
2024-07-08 10:50:32 +02:00
Piotr Dulikowski
3c535641fd Merge 'service/storage_proxy: Add metrics keeping track of incoming hints' from Dawid Mędrek
Although Scylla already exposes metrics keeping track of various information related to hinted handoff, all of them correspond to either storing or sending hints. However, when debugging, it's also crucial to be aware of how many hints are coming to a given node and what their size is. Unfortunately, the existing metrics are not enough to obtain that information.

This PR introduces the following new metrics:

* `sent_bytes_total` – the total size of the hints that have been sent from a given shard,
* `received_hints_total` – the total number of hints that a given shard has received,
* `received_hints_bytes_total` – the total size of the hints a given shard has received.

It also renames `hints_manager_sent` to `hints_manager_sent_total` to avoid conflicts of prefixes between that metric and `sent_bytes_total` in tests.

Fixes scylladb/scylladb#10987

Closes scylladb/scylladb#18976

* github.com:scylladb/scylladb:
  db/hints: Add a metric for the size of sent hints
  service/storage_proxy: Add metrics for received hints
2024-07-08 10:29:53 +02:00
Botond Dénes
56c194e52c Merge 'compaction: not include unused headers' from Kefu Chai
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

---

it's a cleanup, hence no need to backport.

Closes scylladb/scylladb#19581

* github.com:scylladb/scylladb:
  .github: add compaction to iwyu's CLEANER_DIR
  compaction: not include unused headers
2024-07-08 10:03:51 +03:00
Israel Fruchter
32e6725b8e Update tools/cqlsh submodule
* tools/cqlsh 73bdbeb0...86a280a1 (1):
  > remove cassandra from the shiv package

Ref: scylladb/scylla-cqlsh#96

Closes scylladb/scylladb#19558
2024-07-08 10:00:59 +03:00
Michael Litvak
407274e828 view: drain view builder before database
The view builder is doing write operations to the database.
In order for the view builder to shut down gracefully without errors, we
need to ensure the database can handle writes while it is drained.
The commit changes the drain order, so that view builder is drained
before the database shuts down.

Fixes scylladb/scylladb#18929

Closes scylladb/scylladb#19609
2024-07-05 22:17:40 +03:00
Botond Dénes
103bd8334a service/paxos/paxos_state: restore resilience against dropped tables
Recently, the code in paxos_state::prepare(), paxos_state::accept() and
paxos_state::learn() was coroutinized by 58912c2cc1, 887a5a8f62 and
2b7acdb32c respectively. This introduced a regression: the latency
histogram updater code was moved from a finally() to a defer(). Unlike
the former, the latter runs in a noexcept context so the possible
replica::no_such_column_family raised from the latency update code now
crashes the node, instead of failing just the paxos operation as before.
Fix by only updating the latency histogram if the table still exists.
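
A simplified model of the fix, under stated assumptions: `table_registry`, `deferred` and `paxos_step` are hypothetical stand-ins (for the database's table lookup, for a scope-exit helper in the spirit of `defer()`, and for a paxos operation), not the real code.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

struct table_stats { unsigned long samples = 0; };

// Hypothetical registry standing in for the database's table lookup.
class table_registry {
    std::unordered_map<std::string, table_stats> _tables;
public:
    void add(const std::string& name) { _tables.emplace(name, table_stats{}); }
    void drop(const std::string& name) { _tables.erase(name); }
    // Non-throwing lookup: returns nullptr for a dropped table instead of raising.
    table_stats* find(const std::string& name) noexcept {
        auto it = _tables.find(name);
        return it == _tables.end() ? nullptr : &it->second;
    }
};

// Runs a callable when the scope ends. Destructors are noexcept by default, so an
// exception escaping the callable would terminate the process.
template <typename Func>
class deferred {
    Func _func;
public:
    explicit deferred(Func func) : _func(std::move(func)) {}
    deferred(const deferred&) = delete;
    ~deferred() { _func(); }
};

inline void paxos_step(table_registry& tables, const std::string& name) {
    deferred update_latency([&]() noexcept {
        if (auto* stats = tables.find(name)) {   // skip the update if the table is gone
            ++stats->samples;
        }
    });
    // ... the operation itself would run here ...
}
```

The cleanup never throws because it checks for the table first; a dropped table simply means no latency sample is recorded.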

Fixes: scylladb/scylladb#19620

Closes scylladb/scylladb#19623
2024-07-05 14:58:11 +02:00
Anna Stuchlik
8759dfae96 doc: add Run in Docker page to the documentation
The page was missing from the docs. I created the page based on
the information in the download center (which will be closed down soon)
and other ScyllaDB resources.

Closes scylladb/scylladb#19577
2024-07-04 20:20:03 +03:00
Dawid Medrek
0e1cb0dc73 db/hints: Add logging when ignoring hint directories
In 2446cce, we stopped attempting to create
endpoint managers for invalid hint directories
even when their names represented IP addresses or
host IDs. In this commit, we add logging informing
the user about it.

Refs scylladb/scylladb#19173

Closes scylladb/scylladb#19618
2024-07-04 20:14:52 +03:00
Botond Dénes
155acbb306 reader_concurrency_semaphore: execution_loop(): move maybe_admit_waiters() to the inner loop
Now that the CPU concurrency limit is configurable, new reads might be
ready to execute right after the current one was executed. So move the
poll for admitting new reads into the inner loop, to prevent the
situation where the inner loop yields and a concurrent
do_wait_admission() finds that there are waiters (queued because at the
time they arrived to the semaphore, the _ready_list was not empty) but it
is possible to admit a new read. When this happens the semaphore will
dump diagnostics to help debug the apparent contradiction, which can
generate a lot of log spam. Moving the poll into the inner loop prevents
the false-positive contradiction detection from firing.

Refs: scylladb/scylladb#19017

Closes scylladb/scylladb#19600
2024-07-04 17:47:52 +03:00
Avi Kivity
0626e0487d Merge 'Add copy on write to functions schema code' from Marcin Maliszkiewicz
This is the first patch of a series that will allow us to unify the raft command code. The property we want to achieve is that all modifications performed by a single raft command can be made visible atomically. This helps to exclude accidental dependencies across subsystem updates and makes it easier to reason about state.

Here we alter functions schema code so that changes are first applied to a copy of declared functions and then made visible atomically. Later work will apply a similar strategy to the whole schema.

Relates scylladb/scylladb#19153

Closes scylladb/scylladb#19598

* github.com:scylladb/scylladb:
  cql3: functions: make modification functions accessible only via batch class
  db: replica: batch functions schema modifications
  cql3: functions: introduce class for batching functions modifications
  cql3: functions: make functions class non-static
  cql3: functions: remove reduntant class access specifiers
  cql3: functions: remove unused java snippet
2024-07-04 17:40:23 +03:00
Anna Stuchlik
822a58f964 doc: remove support for Debian 10
This PR removes support for Debian 10, which reached end of life on June 30, 2024.

Refs https://github.com/scylladb/scylla-enterprise/issues/4377

Closes scylladb/scylladb#19616
2024-07-04 17:24:57 +03:00
Marcin Maliszkiewicz
3f1c2fecc2 cql3: functions: make modification functions accessible only via batch class
This is to ensure that all the code uses batching.
2024-07-04 13:10:26 +02:00
Marcin Maliszkiewicz
32fe101f9d db: replica: batch functions schema modifications
Previously, each function change was immediately visible, because
the event notification logic yielded.

Now we first gather the modifications and then commit them.

Further work will broaden the scope of atomicity to the whole
schema and even across other subsystems.
2024-07-04 13:10:26 +02:00
Michał Chojnowski
f784be6a7e logalloc: generalize refill_emergency_reserve()
In the next patch, we will want to do the same thing as
refill_emergency_reserve() does, just with a quantity different
than _emergency_reserve_max. So we split off the shareable part
to a new function, and use it to implement refill_emergency_reserve().
2024-07-04 12:19:01 +02:00
Pavel Emelyanov
9a654730a7 tablet_allocator: Put more info into failed-to-drain exception
When the balancer fails to find a node to balance drained tablets into, it
throws an exception with tablet id and node id, but it's also good to
know more details about the balancing state that led to the failure.

refs: #19504

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19588
2024-07-04 12:18:50 +02:00
Marcin Maliszkiewicz
4d937c5a17 cql3: functions: introduce class for batching functions modifications
It will hold a temporary shallow copy of declared functions.
Then each modification adds/removes/replaces stored function object.
At the end, the change is committed by moving the temporary copy to the
main functions class instance.
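
A rough sketch of this batching scheme follows. The names `function_def`, `function_map` and `change_batch` are hypothetical and the real classes carry much more state; the sketch only assumes a map of shared function objects that is copied, modified and then swapped in.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical function object; the real one carries a full signature and body.
struct function_def { std::string body; };

using function_map = std::map<std::string, std::shared_ptr<const function_def>>;

class functions {
    function_map _declared;
public:
    const function_map& declared() const { return _declared; }
    friend class change_batch;
};

// Collects modifications on a shallow copy and publishes them in one step.
class change_batch {
    functions& _target;
    function_map _pending;   // shallow copy: the shared_ptr values are shared, not cloned
public:
    explicit change_batch(functions& target)
        : _target(target), _pending(target._declared) {}

    void add(const std::string& name, function_def def) {
        _pending[name] = std::make_shared<const function_def>(std::move(def));
    }
    void remove(const std::string& name) { _pending.erase(name); }

    // Nothing is visible through `functions` until commit() moves the map in.
    void commit() { _target._declared = std::move(_pending); }
};
```

Readers of `functions` see either the old set or the new set of declarations, never a partially applied batch.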
2024-07-04 12:14:36 +02:00
Nadav Har'El
96dff367f8 Merge 'storage_proxy: update view update backlog on correct shard when writing' from Wojciech Mitros
This series is another approach to https://github.com/scylladb/scylladb/pull/18646 and https://github.com/scylladb/scylladb/pull/19181. In this series we only change where the view backlog gets
updated - we do not ensure that the view update backlog returned in a response is necessarily the backlog
that increased due to the corresponding write; the returned backlog may be outdated by up to 10ms. Because
this series does not include this change, it's considerably less complex and it doesn't modify the common
write path, so no particular performance considerations were needed in that context. The issue being fixed
is still the same; the full description can be seen below.

When a replica applies a write on a table which has a materialized view
it generates view updates. These updates take memory which is tracked
by `database::_view_update_concurrency_sem`, separate on each shard.
The fraction of units taken from the semaphore to the semaphore limit
is the shard's view update backlog. Based on these backlogs, we want
to estimate how busy a node is with its view updates work. We do that
by taking the max backlog across all shards.
To avoid excessive cross-shard operations, the node's (max) backlog isn't
calculated each time we need it, but at most once per 10ms (the `_interval`), with an optimization where the backlog of the calculating shard is immediately up-to-date (we don't need cross-shard operations for it):
```
update_backlog node_update_backlog::fetch() {
    auto now = clock::now();
    if (now >= _last_update.load(std::memory_order_relaxed) + _interval) {
        _last_update.store(now, std::memory_order_relaxed);
        auto new_max = boost::accumulate(
                _backlogs,
                update_backlog::no_backlog(),
                [] (const update_backlog& lhs, const per_shard_backlog& rhs) {
                    return std::max(lhs, rhs.load());
                });
        _max.store(new_max, std::memory_order_relaxed);
        return new_max;
    }
    return std::max(fetch_shard(this_shard_id()), _max.load(std::memory_order_relaxed));
}
```
For the same reason, even when we do calculate the new node's backlog,
we don't read from the `_view_update_concurrency_sem`. Instead, for
each shard we also store an update_backlog atomic which we use for
calculation:
```
    struct per_shard_backlog {
        // Multiply by 2 to defeat the prefetcher
        alignas(seastar::cache_line_size * 2) std::atomic<update_backlog> backlog = update_backlog::no_backlog();
        need_publishing need_publishing = need_publishing::no;

        update_backlog load() const {
            return backlog.load(std::memory_order_relaxed);
        }
    };
 std::vector<per_shard_backlog> _backlogs;
```
Due to this distinction, the update_backlog atomic needs to be updated
separately, when the `_view_update_concurrency_sem` changes.
This is done by calling `storage_proxy::update_view_update_backlog`, which reads the `_view_update_concurrency_sem` of the shard (in `database::get_view_update_backlog`)
and then calls `node_update_backlog::add` where the read backlog
is stored in the atomic:
```
void storage_proxy::update_view_update_backlog() {
    _max_view_update_backlog.add(get_db().local().get_view_update_backlog());
}
void node_update_backlog::add(update_backlog backlog) {
    _backlogs[this_shard_id()].backlog.store(backlog, std::memory_order_relaxed);
    _backlogs[this_shard_id()].need_publishing = need_publishing::yes;
}
```
For this implementation of calculating the node's view update backlog to work,
we need the atomics to be updated correctly when the semaphores of corresponding
shards change.

The main event where the view update backlog changes is an incoming write
request. That's why when handling the request and preparing a response
we update the backlog calling `storage_proxy::get_view_update_backlog` (also
because we want to read the backlog and send it in the response):
backlog update after local view updates (`storage_proxy::send_to_live_endpoints` in `mutate_begin`):
```
 auto lmutate = [handler_ptr, response_id, this, my_address, timeout] () mutable {
     return handler_ptr->apply_locally(timeout, handler_ptr->get_trace_state())
             .then([response_id, this, my_address, h = std::move(handler_ptr), p = shared_from_this()] {
         // make mutation alive until it is processed locally, otherwise it
         // may disappear if write timeouts before this future is ready
         got_response(response_id, my_address, get_view_update_backlog());
     });
 };
```
backlog update after remote view updates (`storage_proxy::remote::handle_write`):
```
 auto f = co_await coroutine::as_future(send_mutation_done(netw::messaging_service::msg_addr{reply_to, shard}, trace_state_ptr,
         shard, response_id, p->get_view_update_backlog()));
```
Now assume that on a certain node we have a write request received on shard A,
which updates a row on shard B (A!=B). As a result, shard B will generate view
updates and consume units from its `_view_update_concurrency_sem`, but will
not update its atomic in `_backlogs` yet. Because both shards in the example
are on the same node, shard A will perform a local write calling `lmutate` shown
above. In the `lmutate` call, the `apply_locally` will initiate the actual write on
shard B and the `storage_proxy::update_view_update_backlog` will be called back
on shard A. In no place will the backlog atomic on shard B get updated even
though it increased in size due to the view updates generated there.
Currently, what we calculate there doesn't really matter: it's only used for the
MV flow control delays, so in this scenario we may only overload
a replica, causing failed replica writes which will later be retried as hints. However,
when we add MV admission control, the calculated backlog will be the difference
between an accepted and a rejected request.
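
The scenario above can be reproduced with a small standalone model. This is only a sketch under simplifying assumptions: `node_backlog_model`, `scenario_result` and `run_scenario` are hypothetical names, the backlog is a plain integer rather than an `update_backlog`, and the 10ms caching of the maximum is left out.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for node_update_backlog:
// one published backlog value per shard, aggregated by taking the maximum.
class node_backlog_model {
    std::vector<std::atomic<std::uint64_t>> _published;
public:
    explicit node_backlog_model(std::size_t shards) : _published(shards) {
        for (auto& p : _published) {
            p.store(0, std::memory_order_relaxed);
        }
    }
    // Publish the backlog that was measured on `shard`.
    void add(std::size_t shard, std::uint64_t backlog) {
        _published[shard].store(backlog, std::memory_order_relaxed);
    }
    // Node-wide backlog: the maximum over all published per-shard values.
    std::uint64_t max_backlog() const {
        std::uint64_t m = 0;
        for (const auto& p : _published) {
            m = std::max(m, p.load(std::memory_order_relaxed));
        }
        return m;
    }
};

struct scenario_result {
    std::uint64_t published_on_receiving_shard; // update done on shard A
    std::uint64_t published_on_applying_shard;  // update done on shard B
};

// A write is received on shard A (0) but applied on shard B (1), where the
// view updates consume 100 units of the (modelled) semaphore.
inline scenario_result run_scenario() {
    const std::size_t shard_a = 0;
    const std::size_t shard_b = 1;
    std::vector<std::uint64_t> real_backlog = {0, 0};
    real_backlog[shard_b] += 100;

    node_backlog_model receiving(2);
    receiving.add(shard_a, real_backlog[shard_a]); // shard B is never published

    node_backlog_model applying(2);
    applying.add(shard_b, real_backlog[shard_b]);  // publish where it changed

    return {receiving.max_backlog(), applying.max_backlog()};
}
```

In this model the node-wide maximum stays at zero when only the receiving shard publishes its value, and reflects the increase once the shard that applied the mutation publishes its own.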

Fixes: https://github.com/scylladb/scylladb/issues/18542

Without admission control (https://github.com/scylladb/scylladb/pull/18334), this patch doesn't affect much, so I'm marking it as backport/none

Closes scylladb/scylladb#19341

* github.com:scylladb/scylladb:
  test: add test for view backlog not being updated on correct shard
  test: move auxiliary methods for waiting until a view is built to util
  mv: update view update backlog when it increases on correct shard
2024-07-04 11:40:09 +03:00
Marcin Maliszkiewicz
16b770ff1a cql3: functions: make functions class non-static
This is done to ease code reuse in the following commit.
It would also help should we ever want to properly attach the functions
class to a schema object instead of keeping it in static storage.
2024-07-04 10:24:57 +02:00
Marcin Maliszkiewicz
47033dce7a cql3: functions: remove reduntant class access specifiers 2024-07-04 10:24:57 +02:00
Marcin Maliszkiewicz
e86191b19f cql3: functions: remove unused java snippet
It doesn't seem to serve any purpose now.
2024-07-04 10:24:57 +02:00
Kefu Chai
cccec07581 db: use format_as() in favor of fmt::streamed()
since fedora 38 is EOL, fedora 39 comes with fmt v10.0.0, and
we've switched to the build image based on fedora 40, which ships
fmt-devel v10.2.1, there is no need to use fmt::streamed() when
the corresponding format_as() is available.

simpler this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19594
2024-07-04 11:10:43 +03:00
Kefu Chai
35e7a0b36f test/cql-pytest: use offset-aware API to avoid deprecate warning
to avoid warning like

```
DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
```

and to be future-proof, let's use the offset-aware timestamp.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19536
2024-07-04 10:48:00 +03:00
Kefu Chai
03e1fce7aa zstd: include external header with brackets
zstd.h is a header provided by libzstd, so let's include it with
brackets, more consistent this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19538
2024-07-04 10:42:29 +03:00
Takuya ASADA
09e22690dc scylla_coredump_setup: enable compress by default when zstd support detected
We disabled coredump compression by default because it was too slow,
but recent versions of systemd-coredump support faster zstd-based compression,
so let's enable compression by default when zstd support is detected.

Related scylladb/scylla-machine-image#462

Closes scylladb/scylladb#18854
2024-07-04 10:38:22 +03:00
Botond Dénes
e3e5f8209d Merge 'alternator: fix "/localnodes" to use broadcast_rpc_address' from Nadav Har'El
This short series fixes Alternator's "/localnodes" request to allow a node's external IP address - configured with `broadcast_rpc_address` - to be listed instead of its usual, internal, IP address.

The first patch fixes a bug in gossiper::get_rpc_address(), which the second patch needs to implement the feature. The second patch also contains regression tests.

Fixes #18711.

Closes scylladb/scylladb#18828

* github.com:scylladb/scylladb:
  alternator: fix "/localnodes" to use broadcast_rpc_address
  gossiper: fix get_rpc_address() for this node
2024-07-04 10:37:28 +03:00
Takuya ASADA
65fbf72ed0 scylla-housekeeping: fix exception on parsing version string
Since Python 3.12, version parsing has become strict: parse_version() does
not accept a version string like '6.1.0~dev'.
To fix this, we need to change the version string from '6.1.0~dev' to
'6.1.0.dev0', which is allowed by the Python version scheme.

reference: https://packaging.python.org/en/latest/specifications/version-specifiers/

Fixes #19564

Closes scylladb/scylladb#19572
2024-07-04 10:27:51 +03:00
Avi Kivity
69450780a7 docs: explain tuning for a node that is overcommitted at the hypervisor level
Closes scylladb/scylladb#19589
2024-07-04 10:23:25 +03:00
Pavel Emelyanov
8809b99736 s3/client: Unmark put-object lambdas from mutable
They don't need to modify the captured objects. In fact, they must not
do it in the first place, because the request can be called more than
once and the buffers must not change between those invocations.

For the memory_sink_buffers there must be a const method to get the vector
of temporary_buffers themselves.
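
To illustrate why dropping `mutable` matters here, the following is a minimal sketch, not the actual s3 client code. `make_replayable_writer` and `make_consuming_writer` are hypothetical helpers, and the buffers are modelled as `std::string` instead of `temporary_buffer`.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Without `mutable`, the lambda's call operator is const, so the captured
// buffers cannot be modified and every invocation sees the same data.
inline auto make_replayable_writer(std::vector<std::string> bufs) {
    return [bufs = std::move(bufs)] () -> std::size_t {
        std::size_t total = 0;
        for (const auto& b : bufs) {
            total += b.size();
        }
        return total; // number of bytes that would be sent
    };
}

// With `mutable`, the lambda may change its captures. Here it drops the
// buffers after the first call, so a retry would send nothing.
inline auto make_consuming_writer(std::vector<std::string> bufs) {
    return [bufs = std::move(bufs)] () mutable -> std::size_t {
        std::size_t total = 0;
        for (const auto& b : bufs) {
            total += b.size();
        }
        bufs.clear();
        return total;
    };
}
```

A const call operator turns an accidental modification of the captured buffers into a compile error, which is the guarantee a handler that may run more than once needs.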

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19599
2024-07-04 10:07:48 +03:00
Lakshmi Narayanan Sreethar
c80df8504c sstables::maybe_rebuild_filter_from_index: log sstable origin
Log the sstable origin when its bloom filter is being rebuilt. The
origin has to be passed to the method by the caller as it is not
available in the sstable object when the filter is rebuilt.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#19601
2024-07-04 10:01:23 +03:00
Wojciech Mitros
1fdc65279d test: add test for view backlog not being updated on correct shard
This patch adds a test for reproducing issue https://github.com/scylladb/scylladb/issues/18542
The test performs writes on a table with a materialized view and
checks that the view backlog increases. To get the current view
update backlog, a new metric "view_update_backlog" is added to
the `storage_proxy` metrics. It differs from the `database` metric
with the same name: it takes the backlog from max_view_update_backlog,
which keeps the view update backlogs of all shards and may be a bit
outdated, instead of directly checking the view_update_semaphore
that the backlog is based on.
2024-07-03 23:18:52 +02:00
Wojciech Mitros
c4f5659c11 test: move auxiliary methods for waiting until a view is built to util
In many materialized view tests we need to wait until a view is built before
actually working on it, and future tests will need this too. Existing tests
each carry the same, duplicated method for achieving that.
In this patch the method is deduplicated and moved to pylib/util.py
and existing tests are modified to use it instead.
2024-07-03 23:18:52 +02:00
Wojciech Mitros
fd9c7d4d59 mv: update view update backlog when it increases on correct shard
When performing a write, we should update the view update backlog
on the shard where the mutation is actually applied. Instead,
currently we only update it on the shard that initially received
the write request (which didn't change at all) and as a result,
the backlog on the correct shard and the aggregated max view update
backlog are not updated at all.
This patch enables updating the backlog on the correct shard. The
update is now performed just after the view generation and propagation
finishes, so that all backlog increases are noted and the backlog is
ready to be used in the write response.
Additionally, after this patch, we no longer (falsely) assume that
the backlog is modified on the same shard as where we later read it
to attach to a response. However, we still compare the aggregated
backlog from all shards and the backlog from the shard retrieving
the max, as with a shard-aware driver, it's likely the exact shard
whose backlog changed.
2024-07-03 23:18:52 +02:00
Avi Kivity
3fc4e23a36 forward_service: rename to mapreduce_service
forward_service is nondescriptive and misnamed, as it does more than
forward requests. It's a classic map/reduce algorithm (and in fact one
of its parameters is "reducer"), so name it accordingly.

The name "forward" leaked into the wire protocol for the messaging
service RPC isolation cookie, so it's kept there. It's also maintained
in the name of the logger (for "nodetool setlogginglevel") for
compatibility with tests.

Closes scylladb/scylladb#19444
2024-07-03 19:29:47 +03:00
Avi Kivity
f798217293 Merge 'build: cmake: include the whole archive of zstd.a' from Kefu Chai
before this change, when linking scylla-main, the linker discards
the unreferenced symbols defined by zstd.cc. but we use constructor
of static variable `registerator` to register the zstd compressor,
this variable is not used from the linker's point of view. but we
do rely on the side effect of its constructor.
that's why the rules generated by CMake fail to build tests and
scylla executables with zstd support. that's why we have following
test failure:
```
boost.sstable_3_x_test.test_uncompressed_collections_read
...
[Exception] - no_such_class: unable to find class 'org.apache.cassandra.io.compress.ZstdCompressor'
 == [File] - seastar/src/testing/seastar_test.cc
 == [Line] - 43
```

in this change, we single out zstd.cc and build it as an archive,
so that scylla-main can include as a whole. an alternative is to
link scylla-main as a whole archive, but that might increase the disk
footprint when building lots of tests -- some of them do not use all
symbols exposed by scylla-main, and can potentially have smaller
size if linker can discard the unused symbols.
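
the registration pattern that makes the whole-archive link necessary can be sketched as follows. this is an illustration only: `registry`, `registrator` and `is_registered` are hypothetical stand-ins, not the actual class registry used by zstd.cc.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

using factory = std::function<std::string()>;

// Global name -> factory table, created on first use.
inline std::map<std::string, factory>& registry() {
    static std::map<std::string, factory> r;
    return r;
}

// Registers a factory as a side effect of its constructor.
struct registrator {
    registrator(std::string name, factory f) {
        registry().emplace(std::move(name), std::move(f));
    }
};

// Nothing else refers to this variable; only its constructor matters.
// If this translation unit sits in a static archive and no other symbol
// from it is referenced, the linker may leave the whole object file out,
// and the registration never happens.
static registrator zstd_registrator(
        "org.apache.cassandra.io.compress.ZstdCompressor",
        [] { return std::string("zstd"); });

inline bool is_registered(const std::string& name) {
    return registry().count(name) != 0;
}
```

in a single translation unit the registration always runs. the failure only shows up once the object file is pulled from an archive on demand, which is why the archive has to be included as a whole.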

Refs https://github.com/scylladb/scylladb/issues/2717

---

cmake related change, hence no need to backport.

Closes scylladb/scylladb#19539

* github.com:scylladb/scylladb:
  build: cmake: include the whole archive of zstd.a
  build: cmake: find libzstd before using it
2024-07-03 17:38:22 +03:00
Botond Dénes
fca0a58674 Merge 'Close output_stream in get_compaction_history() API handler' from Pavel Emelyanov
If an httpd body writer is called with output_stream<>, it must close the stream on its own regardless of any exceptions it may generate while working, otherwise the stream destructor may trip the non-closed assertion. This was hit with a different handler, see #19541

Coroutinize the handler as the first step while at it (though the fix would have been notably shorter if done with .finally() lambda)

Closes scylladb/scylladb#19543

* github.com:scylladb/scylladb:
  api: Close response stream of get_compaction_history()
  api: Flush output stream in get_compaction_history() call
  api: Coroutinize get_compaction_history inner function
2024-07-03 17:00:26 +03:00
Kefu Chai
fd5c04acbb .github: use the latest dbuild image
scylla does not build using scylla-toolchain:fedora-38-20240521, like:

```
FAILED: repair/CMakeFiles/repair.dir/repair.cc.o
/usr/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/__w/scylladb/scylladb -I/__w/scylladb/scylladb/build/gen -I/__w/scylladb/scylladb/seastar/include -I/__w/scylladb/scylladb/build/seastar/gen/include -I/__w/scylladb/scylladb/build/seastar/gen/src -isystem /__w/scylladb/scylladb/abseil -O2 -std=gnu++2b -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/__w/scylladb/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -MD -MT repair/CMakeFiles/repair.dir/repair.cc.o -MF repair/CMakeFiles/repair.dir/repair.cc.o.d -o repair/CMakeFiles/repair.dir/repair.cc.o -c /__w/scylladb/scylladb/repair/repair.cc
In file included from /__w/scylladb/scylladb/repair/repair.cc:10:
In file included from /__w/scylladb/scylladb/repair/row_level.hh:14:
In file included from /__w/scylladb/scylladb/repair/task_manager_module.hh:14:
In file included from /__w/scylladb/scylladb/tasks/task_manager.hh:20:
In file included from /__w/scylladb/scylladb/seastar/include/seastar/coroutine/parallel_for_each.hh:24:
/usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/ranges:6161:14: error: requires clause differs in template redeclaration
    requires forward_range<_Vp>
             ^
/usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/ranges:5860:14: note: previous template declaration is here
    requires input_range<_Vp>
             ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19547
2024-07-03 16:57:22 +03:00
Kefu Chai
a88496318b alternator: use std::to_underlying() when appropriate
now that we can use C++23 features, there is no need to hardcode the
underlying type anymore.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19546
2024-07-02 18:51:29 +03:00
Kefu Chai
57def6f1e2 docs: install in non-package node
when running `make setup`, we could have following failure:
```
Installing the current project: scylla (4.3.0)
The current project could not be installed: No file/folder found for package scylla
If you do not want to install the current project use --no-root
```

because docs is not a proper python project named "scylla",
and does not have the directory structure expected by poetry. what we
expect from poetry, is to manage the dependencies for building
the document.

so, in this change, we install in the `non-package` mode when running
`poetry install`, this skips the root package, which does not exist.
as an alternative, we could put an empty `scylla.py` under `docs`
directory, but that'd be overkill. or we could pass `--no-root`
to `poetry install`, but it would be ideal if we can keep the settings
in a single place.

see also https://python-poetry.org/docs/basic-usage/#operating-modes,
and https://python-poetry.org/docs/cli/#options-2, for more
details on the settings and command line options of poetry.

please note this setting was added to poetry 1.8, so the required
poetry version is updated. we might need to upgrade poetry in existing
installation.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19498
2024-07-02 18:03:20 +03:00
Michael Litvak
08b29460fc mv: skip building view updates on a pending replica
Currently, a pending replica that applies a write on a table that has
materialized views, will build all the view updates as a normal replica,
only to realize at a late point, in db::view::get_view_natural_endpoint(),
that it doesn't have a paired view replica to send the updates to. It will
then either drop the view updates, or send them to a pending view
replica, if such exists.

This work is unnecessary since it may be dropped, and even if there is a
pending view replica to send the updates to, the updates that are built
by the pending replica may be wrong since it may have incomplete
information.

This commit fixes the inefficiency by skipping the view update building
step when applying an update on a pending replica.

The metric total_view_updates_on_wrong_node is added to count the cases
that a view update is determined to be unnecessary.

The test reproduces the scenario of writing to a table and applying
the update on a pending replica, and verifies that the pending replica
doesn't try to build view updates.

Fixes scylladb/scylladb#19152

Closes scylladb/scylladb#19488
2024-07-02 13:10:18 +02:00
Nadav Har'El
d61513c41c Merge 'reader_concurrency_semaphore: make CPU concurrency configurable' from Botond Dénes
The reader concurrency semaphore restricts the concurrency of reads that require CPU (intention: they read from the cache) to 1, meaning that if there is even a single active read which declares that it needs just CPU to proceed, no new read is admitted. This is meant to keep the concurrency of reads in the cache at 1. The idea is that concurrency in the cache is not useful: it just leads to the reactor rotating between these reads, all of them finishing later than they could if they were the only active read in the cache.
This was observed to backfire in the case where reads from a single table are mostly very fast, but on some keys are very slow (hint: collection full of tombstones). In this case the slow read holds up the fast reads in the queue, increasing the 99th percentile latencies significantly.

This series proposes to fix this, by making the CPU concurrency configurable. We don't like tunables like this and this is not a proper fix, but a workaround. The proper fix would be to allow to cut any page early, but we cannot cut a page in the middle of a row. We could maybe have a way of detecting slow reads and excluding them from the CPU concurrency. This would be a heuristic and it would be hard to get right. So in this series a robust and simple configurable is offered, which can be used on those few clusters which do suffer from the too strict concurrency limit. We have seen it in very few cases so far, so this doesn't seem to be wide-spread.

Fixes: https://github.com/scylladb/scylladb/issues/19017

This fixes a regression introduced in 5.0, so we have to backport to all currently supported releases

Closes scylladb/scylladb#19018

* github.com:scylladb/scylladb:
  test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency
  test/boost/reader_concurrency_semaphore_test: hoist require_can_admit
  reader_concurrency_semaphore: wire in the configurable cpu concurrency
  reader_concurrency_semaphore: add cpu_concurrency constructor parameter
  db/config: introduce reader_concurrency_semahore_cpu_concurrency
2024-07-02 13:39:00 +03:00
Tzach Livyatan
6ea475ec76 Docs: Fix a typo in sstable-corruption.rst
Closes scylladb/scylladb#19515
2024-07-02 11:58:27 +02:00
Kamil Braun
bcfdeda080 Merge 'co-routinize paxos_state functions' from Gleb
Co-routinize paxos_state functions to make them more readable.

* 'gleb/coroutineze-paxos-state' of github.com:scylladb/scylla-dev:
  paxos: simplify paxos_state::prepare code to not work with raw futures
  paxos: co-routinize paxos_state::learn function
  paxos: remove no longer used with_locked_key functions
  paxos: co-routinize paxos_state::accept function
  paxos: co-routinize paxos_state::prepare function
  paxos: introduce get_replica_lock() function to take RAII guard for local paxos table access
2024-07-02 11:54:13 +02:00
Tzach Livyatan
4938927fc2 Docs: fix typo in config-commands.rst
This is a leftover from https://github.com/scylladb/scylladb/pull/19578,
which mistakenly updated the "scylla" script name to "ScyllaDB"

Closes scylladb/scylladb#19583
2024-07-02 10:54:47 +02:00
Kamil Braun
edeb266fc2 Merge 'docs, config: render logging related options' from Kefu Chai
this changeset adds a filter to customize the rendering of default
values, and enables the `scylladb_cc_properties` extension to display
the logging message related options. it prepares for the further
improvements in
https://opensource.docs.scylladb.com/master/reference/configuration-parameters.html.

this changeset also prepares for the improvements requested by #19463

---

it's an improvement in the document, hence no need to backport.

Closes scylladb/scylladb#19483

* github.com:scylladb/scylladb:
  config: add descriptions for default_log_level and friends
  config: define log_to_syslog in a different line
  docs: parse log_legacy_value as declarations of config option
2024-07-02 10:44:50 +02:00
Kefu Chai
aedd145d6b .github: add compaction to iwyu's CLEANER_DIR
to avoid future violations of include-what-you-use.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-02 14:06:42 +08:00
Kefu Chai
e87b64b7bb compaction: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-02 14:06:42 +08:00
Tzach Livyatan
91401f7da5 docs: Update Scylla to ScyllaDB in *all* RST docs files v3
Closes scylladb/scylladb#19578
2024-07-01 18:04:21 +02:00
Andrei Chekun
b6aabca9a7 Add documentation how to use allure reporting
Add documentation on how to install the allure reporting tool, with a basic usage example.
Fix a typo in test/README.md

Related: scylladb/qa-tasks#1665

Depends on: scylladb/scylladb#18169

Closes scylladb/scylladb#18710
2024-07-01 16:21:50 +02:00
Gleb Natapov
9ebdb23002 raft: add more raft metrics to make debug easier 2024-07-01 10:55:22 +02:00
Kamil Braun
94bc9d4f5b Merge 'Do not expire local addres in raft address map since the local node cannot disappear' from Gleb Natapov
A node may wait in the topology coordinator queue for a while before being
joined. Since the local address is added as an expiring entry to the raft
address map, it may expire in the meantime and the bootstrap will fail. The
series makes the entry non-expiring.

Fixes  scylladb/scylladb#19523

Needs to be backported to 6.0 since the bug may cause bootstrap to fail.

Closes scylladb/scylladb#19557

* github.com:scylladb/scylladb:
  test: add test that checks that local address cannot expire between join request placemen and its processing
  storage_service: make node's entry non expiring in raft address map
2024-07-01 09:12:48 +02:00
Kefu Chai
90be71d959 build: cmake: include the whole archive of zstd.a
before this change, when linking scylla-main, the linker discards
the unreferenced symbols defined by zstd.cc. but we use constructor
of static variable `registerator` to register the zstd compressor,
this variable is not used from the linker's point of view. but we
do rely on the side effect of its constructor.
that's why the rules generated by CMake fail to build tests and
scylla executables with zstd support. that's why we have following
test failure:
```
boost.sstable_3_x_test.test_uncompressed_collections_read
...
[Exception] - no_such_class: unable to find class 'org.apache.cassandra.io.compress.ZstdCompressor'
 == [File] - seastar/src/testing/seastar_test.cc
 == [Line] - 43
```

in this change, we single out zstd.cc and build it as an archive,
so that scylla-main can include as a whole. an alternative is to
link scylla-main as a whole archive, but that might increase the disk
footprint when building lots of tests -- some of them do not use all
symbols exposed by scylla-main, and can potentially have smaller
size if linker can discard the unused symbols.

Refs #2717
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-01 11:51:19 +08:00
Kefu Chai
1e0af0fb7e build: cmake: find libzstd before using it
we use libzstd in zstd.cc. so let's find this library before using
it. this helps user to identify problem when preparing the building
environment, instead of being greeted by a compile-time failure.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-01 11:51:19 +08:00
Kefu Chai
b71b638b2e config: add descriptions for default_log_level and friends
so that their description can be displayed in
`reference/configuration-parameters/` web page.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-01 09:47:28 +08:00
Kefu Chai
b486f4ef01 config: define log_to_syslog in a different line
before this change, docs/_ext/scylladb_cc_properties.py parses
the options line by line, because `log_to_stdout` and `log_to_syslog`
are defined in a single line, this script is not able to parse them,
hence fails to display them on the `reference/configuration-parameters/`
web page.

after this change, these two member variables are defined on different
lines. both of them can be displayed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-01 09:47:28 +08:00
Kefu Chai
34cab80103 docs: parse log_legacy_value as declarations of config option
before this change, we only consider "named_value<type>" as the
declaration of option, and the "Type" field of the corresponding
option is displayed if its declaration is found. otherwise, "Type"
field is not rendered. but some logging related options are declared
using `log_legacy_value`, so they are missing.

after this change, they are displayed as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-01 09:47:28 +08:00
Kefu Chai
405f624776 cql3: define dtor of modification_statement in .cc file
before this change, we rely on the compiler to use the
definition of `cql3::attributes` to generate the defaulted
destructor in .cc file. but with clang-19, it insists that
we should have a complete definition available for defining
the defaulted destructor, otherwise it fails the build:

```
/home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT CMakeFiles/scylla-main.dir/RelWithDebInfo/table_helper.cc.o -MF CMakeFiles/scylla-main.dir/RelWithDebInfo/table_helper.cc.o.d -o CMakeFiles/scylla-main.dir/RelWithDebInfo/table_helper.cc.o -c /home/kefu/dev/scylladb/table_helper.cc
In file included from /home/kefu/dev/scylladb/table_helper.cc:10:
In file included from /home/kefu/dev/scylladb/seastar/include/seastar/core/coroutine.hh:25:
In file included from /home/kefu/dev/scylladb/seastar/include/seastar/core/future.hh:30:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/memory:78:
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:91:16: error: invalid application of 'sizeof' to an incomplete type 'cql3::attributes'
   91 |         static_assert(sizeof(_Tp)>0,
      |                       ^~~~~~~~~~~
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:398:4: note: in instantiation of member function 'std::default_delete<cql3::attributes>::operator()' requested here
  398 |           get_deleter()(std::move(__ptr));
      |           ^
/home/kefu/dev/scylladb/cql3/statements/modification_statement.hh:40:7: note: in instantiation of member function 'std::unique_ptr<cql3::attributes>::~unique_ptr' requested here
   40 | class modification_statement : public cql_statement_opt_metadata {
      |       ^
/home/kefu/dev/scylladb/cql3/statements/modification_statement.hh:40:7: note: in implicit destructor for 'cql3::statements::modification_statement' first required here
/home/kefu/dev/scylladb/cql3/statements/modification_statement.hh:28:7: note: forward declaration of 'cql3::attributes'
   28 | class attributes;
      |       ^
```

so, in this change, we define the destructor in .cc file, where
the complete definition of `cql3::attributes` is available.
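
the pattern can be shown in a single file. this is a minimal sketch with hypothetical names (`sketch::statement`, `sketch::attributes`, `time_to_live`), not the actual `modification_statement` code. the first part plays the role of the header, the second part the role of the .cc file.

```cpp
#include <memory>

namespace sketch {

// --- header part: only a forward declaration is visible ---
class attributes;

class statement {
    std::unique_ptr<attributes> _attrs;
public:
    explicit statement(int time_to_live);
    // Declared only. An implicit or in-class defaulted destructor would
    // instantiate std::default_delete<attributes> here, where the type
    // is still incomplete.
    ~statement();
    int time_to_live() const;
};

// --- .cc part: the complete type is available from here on ---
class attributes {
public:
    explicit attributes(int ttl) : _ttl(ttl) {}
    int ttl() const { return _ttl; }
private:
    int _ttl;
};

statement::statement(int time_to_live)
    : _attrs(std::make_unique<attributes>(time_to_live)) {}

// Defaulted where sizeof(attributes) is known, so the deleter compiles.
statement::~statement() = default;

int statement::time_to_live() const { return _attrs->ttl(); }

} // namespace sketch
```

the constructor is defined out of line for the same reason: it may also need to destroy the `unique_ptr` member if construction fails.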

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19545
2024-06-30 19:35:05 +03:00
Avi Kivity
0ce00ebfbd Merge 'Close output stream in task manager's API get_tasks handler' from Pavel Emelyanov
If client stops reading response early, the server-side stream throws but must be closed anyway. Seen in another endpoint and fixed by #19541

Closes scylladb/scylladb#19542

* github.com:scylladb/scylladb:
  api: Fix indentation after previous patch
  api: Close response stream on error
  api: Flush response output stream before closing
2024-06-30 19:34:00 +03:00
Avi Kivity
3a85d88b68 Merge 'Close output_stream in get_snapshot_details() API handler' from Pavel Emelyanov
All streams used by httpd handlers are to be closed by the handler itself, caller doesn't take care of that.

fixes: #19494

Closes scylladb/scylladb#19541

* github.com:scylladb/scylladb:
  api: Fix indentation after previous patch
  api: Close output_stream on error
  api: Flush response output stream before closing
2024-06-30 19:33:16 +03:00
Avi Kivity
2fbc532e4d Update tools/python3 submodule
* tools/python3 3e833f1...18fa79e (1):
  > reloc: use `--add-rpath` and not `--set-rpath`
2024-06-30 19:31:23 +03:00
Kefu Chai
77d2d5821d build: cmake: do not mark cqlsh noarch
in 3c7af287, cqlsh's reloc package was marked as "noarch", and its
filename was updated accordingly in `configure.py`, so let's update
the CMake building system accordingly.

this change should address the build failure of

```
08:48:14  [3325/4124] Generating ../Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz
08:48:14  FAILED: Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz
08:48:14  cd /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist && /usr/bin/cmake -E copy /jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz
08:48:14  Error copying file "/jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz" to "/jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz".
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19544
2024-06-30 19:26:54 +03:00
Nadav Har'El
44e036c53c alternator: fix "/localnodes" to use broadcast_rpc_address
Alternator's non-standard "/localnodes" HTTP request returns a list of
live nodes on this DC, to consider for load balancing. The returned
node addresses should be external IP addresses usable by the clients.
Scylla has a configuration parameter - broadcast_rpc_address - which
defines for a node an external IP address. If such a configuration
exists, we need to use those external IP addresses, not the internal
ones.

Finding these broadcast_rpc_address of all nodes is easy, because the
gossiper already gossips them.

This patch also tests the new feature:
1. The existing single-node test is extended to verify that without
   broadcast_rpc_address we get the usual IP address.
2. A new two-node test is added to check that when broadcast_rpc_address
   is configured, we get that address and not the usual internal IP
   addresses.

Fixes #18711.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-06-30 18:38:15 +03:00
Nadav Har'El
2a2e8167c8 gossiper: fix get_rpc_address() for this node
Commit dd46a92e23 introduced a function gossiper::get_rpc_address()
as a shortcut for get_application_state_ptr(endpoint, RPC_ADDRESS) -
i.e., it fetches the endpoint's configured broadcast_rpc_address
(despite its confusing name, this is the endpoint's external IP address
that clients can use to make CQL connections).

But strangely, the implementation get_rpc_address() made an exception
for asking about the *current* host - where instead of getting this
node's broadcast_rpc_address, it returns its internal address, which
is not what this function was supposed to do - it's not useful for
it to do one thing for this node, and a different thing for other
nodes, and when I wrote code that uses this function (see the next
patch), this resulted in wrong results for the current node.

The fix is simple - drop the wrong if(), and get the
broadcast_rpc_address stored by the gossiper unconditionally - the
gossiper knows it for this node just like for other nodes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-06-30 18:38:15 +03:00
Gleb Natapov
3f136cf2eb test: add test that checks that local address cannot expire between join request placement and its processing 2024-06-30 15:52:23 +03:00
Gleb Natapov
5d8f08c0d7 storage_service: make node's entry non expiring in raft address map
Local address map entry should never expire in the address map.
2024-06-30 15:08:50 +03:00
Kefu Chai
947e28146d dbuild: pass --tty when running in interactive mode
podman does not allocate a tty by default, so without `-t` or `--tty`,
one cannot use a functional terminal when interacting with the
container. a functional terminal is what one expects when running
`dbuild -i --`, but instead we are greeted with:

```
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
```

after this change, one can enjoy the good-old terminal as usual
after being dropped to the container provided by `dbuild -i --`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19550
2024-06-30 12:06:55 +03:00
Pavel Emelyanov
d034cde01f Merge 'build: update C++ standard to C++23' from Avi Kivity
Switch the C++ standard from C++20 to C++23. This is straightforward, but there are a few
fallouts (mostly due to std::unique_ptr that became constexpr) that need to be fixed first.

Internal enhancement - no backport required

Closes scylladb/scylladb#19528

* github.com:scylladb/scylladb:
  build: switch to C++23
  config: avoid binding an lvalue reference to an rvalue reference
  readers: define query::partition_slice before using it in default argument
  test: define table_for_tests earlier
  compaction: define compaction_group::table_state earlier
  compaction: compaction_group: define destructor out-of-line
  compaction_manager: define compaction_manager::strategy_control earlier
2024-06-28 18:02:33 +03:00
Avi Kivity
cf66f233aa build: remove aarch64 workarounds
In 90a6c3bd7a ("build: reduce release mode inline tuning on aarch64") we
reduced inlining on aarch64, due to miscompiles.

In 224a2877b9 ("build: disable -Og in debug mode to avoid coroutine
asan breakage") we disabled optimization in debug mode, due to miscompiles.

With clang 18.1, it appears the miscompiles are gone, and we can remove
the two workarounds.

Closes scylladb/scylladb#19531
2024-06-28 17:53:51 +03:00
Pavel Emelyanov
b4f9387a9d api: Close response stream of get_compaction_history()
The function must close the stream even if it throws along the way.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-28 16:56:53 +03:00
Pavel Emelyanov
6d4ba98796 api: Flush output stream in get_compaction_history() call
It's currently implicitly flushed on its close, but in that case close
can throw while flushing. Next patch wants close not to throw and that's
possible if flushing the stream in advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-28 16:55:58 +03:00
Pavel Emelyanov
acb351f4ee api: Coroutinize get_compaction_history inner function
The handler returns a function which is then invoked with output_stream
argument to render the json into. This function is converted into
coroutine. It has yet another inner lambda that's passed into
compaction_manager::get_compaction_history() as consumer lambda. It's
coroutinized too.

The indentation looks weird as preparation for future patching.
Hopefully it's still possible to understand what's going on.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-28 16:53:46 +03:00
Pavel Emelyanov
1be8b2fd25 api: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-28 16:07:21 +03:00
Pavel Emelyanov
986a04cb11 api: Close response stream on error
The handler's lambda is called with && stream object and must close the
stream on its own regardless of what.
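
A minimal, synchronous sketch of the pattern used across these handler
patches: flush explicitly first (the step that may throw), then close
unconditionally and re-raise whatever went wrong. `mock_stream` and
`render` are hypothetical stand-ins written for illustration; the real
handlers use Seastar's asynchronous output_stream and coroutines.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an output stream; not the Seastar API.
struct mock_stream {
    std::string buffered;
    std::string written;
    bool closed = false;
    bool fail_flush = false;   // test knob: make flush() throw

    void write(const std::string& s) { buffered += s; }

    void flush() {
        if (fail_flush) {
            throw std::runtime_error("flush failed");
        }
        written += buffered;
        buffered.clear();
    }

    // Because callers flush explicitly beforehand, close() has nothing
    // left that could fail, so it can be noexcept.
    void close() noexcept { closed = true; }
};

// Write and flush the body, then close the stream whether or not an
// exception was thrown, and finally propagate the exception (if any).
inline void render(mock_stream& out, const std::string& body) {
    std::exception_ptr ex;
    try {
        out.write(body);
        out.flush();
    } catch (...) {
        ex = std::current_exception();
    }
    out.close();
    if (ex) {
        std::rethrow_exception(ex);
    }
}
```

The point of splitting flush from close is that the cleanup step on the
error path can no longer throw a second exception.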

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-28 16:06:41 +03:00
Pavel Emelyanov
4897d8f145 api: Flush response output stream before closing
The .close() method flushes the stream, but it may throw doing it. Next
patch will want .close() not to throw, for that stream must be flushed
explicitly before closing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-28 16:05:20 +03:00
Pavel Emelyanov
1839030e3b api: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-28 15:41:12 +03:00
Pavel Emelyanov
a0c1552cea api: Close output_stream on error
If the get_snapshot_details() lambda throws, the output stream remains
non-closed which is bad. Close it regardless of what.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-28 15:40:42 +03:00
Pavel Emelyanov
d1fd886608 api: Flush response output stream before closing
Otherwise close() may throw and this is what next patch will want not to
happen.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-28 15:40:00 +03:00
Piotr Dulikowski
f00c4eaf72 Merge '[test.py] add --extra-scylla-cmdline-options argument for test.py' from Artsiom Mishuta
this PR has 2 commits
- [test: pass Scylla extra CMD args from test.py args](6b367a04b5)
- [test: adjust scylla_cluster.merge_cmdline_options behavior](c60b36090a)

the main goal is to solve [test.py: provide an easy-to-remember, univeral way to run scylla with trace level logging](https://github.com/scylladb/scylladb/issues/14960) issue

but also can be used to easily apply additional arguments for all UnitTests and PythonTests on the fly from the test.py CMD

Closes scylladb/scylladb#19509

* github.com:scylladb/scylladb:
  test: adjust scylla_cluster.merge_cmdline_options behavior
  test: pass scylla extra CMD args from test.py args
2024-06-28 11:11:29 +02:00
Kamil Braun
6ec8143e56 Merge 'Remove dead code from migration_manager and schema_tables' from Benny Halevy
This short series removed some ancient legacy code from
migration_manager and schema_tables, before I make further changes in this area.

We have more such code under the cql3 hierarchy but it can be dealt with as a follow up.

No backport required

Closes scylladb/scylladb#19530

* github.com:scylladb/scylladb:
  schema_tables: remove dead code
  migration_manager: remove dead code
2024-06-28 10:59:21 +02:00
Piotr Smaron
88eda47f13 cql: forbid switching from tablets to vnodes in ALTER KS
This check is already in place, but isn't fully working, i.e.
switching from a vnode KS to a tablets KS is not allowed, but
this check doesn't work in the other direction. To fix the
latter, `ks_prop_defs::get_initial_tablets()` has been changed
to handle 3 states: (1) init_tablets is set, (2) it was skipped,
(3) tablets are disabled. These couldn't fit into std::optional,
so a new local struct to hold these states has been introduced.
Callers of this function have been adjusted to set init_tablets
to an appropriate value according to the circumstances, i.e. if
tablets are globally enabled, but have been skipped in the CQL,
init_tablets is automatically set to 0, but if someone executes
ALTER KS and doesn't provide tablets options, they're inherited
from the old KS.
I tried various approaches and this one resulted in the least
lines of code changed. I also provided testcases to explain how
the code behaves.
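
A rough sketch of why a plain std::optional is not enough here: it can
only tell "set" from "not set", while the patch needs a third state for
"tablets disabled". The struct and the two resolve helpers below are
hypothetical names made up for illustration, not the actual code in
ks_prop_defs, and the ALTER KS validation itself is not modeled.

```cpp
#include <cassert>
#include <optional>

// Hypothetical three-state holder:
//   enabled == false              -> tablets are disabled
//   enabled, specified is empty   -> the option was skipped in the CQL
//   enabled, specified has value  -> init_tablets was set explicitly
struct init_tablets {
    bool enabled = true;
    std::optional<unsigned> specified;
};

// CREATE KS: a skipped option falls back to 0 when tablets are enabled.
// An empty result means the keyspace uses vnodes.
inline std::optional<unsigned> resolve_for_create(const init_tablets& t) {
    if (!t.enabled) {
        return std::nullopt;
    }
    return t.specified.value_or(0);
}

// ALTER KS: a skipped option inherits the value from the old keyspace.
inline std::optional<unsigned> resolve_for_alter(const init_tablets& t,
                                                 std::optional<unsigned> old_count) {
    if (!t.enabled) {
        return std::nullopt;
    }
    if (t.specified) {
        return t.specified;
    }
    return old_count;
}
```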

Fixes: #18795

Closes scylladb/scylladb#19368
2024-06-28 11:41:41 +03:00
Gleb Natapov
5c72af7a93 paxos: simplify paxos_state::prepare code to not work with raw futures 2024-06-28 07:30:45 +03:00
Gleb Natapov
2b7acdb32c paxos: co-routinize paxos_state::learn function 2024-06-28 07:30:45 +03:00
Gleb Natapov
6bf307ffe8 paxos: remove no longer used with_locked_key functions 2024-06-28 07:30:45 +03:00
Gleb Natapov
887a5a8f62 paxos: co-routinize paxos_state::accept function 2024-06-28 07:30:45 +03:00
Benny Halevy
b7f00ba4bf schema_tables: remove dead code
Well, even after 10 years, the c++ compilers still
do not compile Java...

And having that legacy code lying around
not only fails to help anyone understand what's
going on; on the contrary, it's confusing and distracting.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-27 20:34:02 +03:00
Benny Halevy
5f6c411656 migration_manager: remove dead code
Well, even after 10 years, the c++ compilers still
do not compile Java...

And having that legacy code lying around
not only fails to help anyone understand what's
going on; on the contrary, it's confusing and distracting.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-27 20:30:33 +03:00
Avi Kivity
4d85db9f39 build: switch to C++23
Set the C++ dialect to C++23, allowing us to use the new features.
2024-06-27 19:36:13 +03:00
Avi Kivity
d14eec8160 config: avoid binding an lvalue reference to an rvalue reference
config_file::add_deprecated_options() returns an lvalue reference
to a parameter which itself is an rvalue reference. In C++20 this
is bad practice (but not a bug in this case) as rvalue references
are not expected to live past the call. In C++23, it fails to compile.

Fix by accepting an lvalue reference for the parameter, and adjust the
caller.
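
For illustration, a reduced sketch of the shape of the problem and of
the fix. `options` and `add_deprecated` are hypothetical stand-ins, not
the real config_file code. The C++23 behaviour described in the comment
comes from the "simpler implicit move" change (P2266), to the best of
my understanding.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the real class; only the reference
// categories matter here.
struct options {
    std::vector<std::string> names;
};

// Before the fix the function had roughly this shape:
//
//     options& add_deprecated(options&& opts) { /* ... */ return opts; }
//
// C++20 accepts it because the named parameter `opts` is an lvalue.
// In C++23 the returned `opts` is treated as an rvalue, which cannot
// bind to the `options&` return type, so the function is ill-formed.

// After the fix: accept an lvalue reference. Returning it is valid in
// both dialects, and the caller visibly owns the object.
inline options& add_deprecated(options& opts) {
    opts.names.push_back("deprecated-option");
    return opts;
}
```

The caller then has to name its object before passing it in, which is
the "adjust the caller" part of the patch.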
2024-06-27 19:36:13 +03:00
Avi Kivity
ed816afac4 readers: define query::partition_slice before using it in default argument
C++23 made std::unique_ptr constexpr. A side effect of this (presumably)
is that the compiler compiles it more eagerly, requiring the full definition
of the class in std::make_unique, while it previously was content with
finding the definition later.

One victim of this change is the default argument of make_reversing_reader;
define it earlier (by including its header) to build with C++23.
2024-06-27 19:36:13 +03:00
Piotr Dulikowski
f9abe52d3b Merge 'test: auth: add random tag to resources in test_auth_v2_migration' from Marcin Maliszkiewicz
Those tests are sometimes failing on CI and we have two hypotheses:
1. Something wrong with consistency of statements
2. Interruption from another test run (e.g. same queries performed concurrently or data remained after previous run)

To exclude or confirm 2, we add a random marker to avoid potential collisions; if one still occurs, it will be clearly visible that the wrong data comes from a different run.

Related scylladb/scylladb#18931
Related scylladb/scylladb#18319

backport: no, just a test fix

Closes scylladb/scylladb#19484

* github.com:scylladb/scylladb:
  test: auth: add random tag to resources in test_auth_v2_migration
  test: extend unique_name with random suffix
2024-06-27 17:35:14 +02:00
Gleb Natapov
58912c2cc1 paxos: co-routinize paxos_state::prepare function 2024-06-27 18:10:49 +03:00
Gleb Natapov
4f546b8b79 paxos: introduce get_replica_lock() function to take RAII guard for local paxos table access 2024-06-27 18:09:30 +03:00
Avi Kivity
e5807555bd test: define table_for_tests earlier
C++23 made std::unique_ptr constexpr. A side effect of this (presumably)
is that the compiler compiles it more eagerly, requiring the full definition
of the class in std::make_unique, while it previously was content with
finding the definition later.

One victim of this change is table_for_tests; define it earlier to
build with C++23.
2024-06-27 17:54:12 +03:00
Avi Kivity
d5ba0b4041 compaction: define compaction_group::table_state earlier
C++23 made std::unique_ptr constexpr. A side effect of this (presumably)
is that the compiler compiles it more eagerly, requiring the full definition
of the class in std::make_unique, while it previously was content with
finding the definition later.

One victim of this change is compaction_group::table_state; define
it earlier to build with C++23.
2024-06-27 17:54:12 +03:00
Avi Kivity
9ecf4ada49 compaction: compaction_group: define destructor out-of-line
Define compaction_group::~compaction_group() out-of-line to prevent
problems instantiating compaction_group::_table_state, which is an
std::unique_ptr. In C++23, std::unique_ptr is constexpr, which means
its methods (in this case the destructor) require seeing the definition
of the class at the point of instantiation.
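
A single-file sketch of the technique, with hypothetical names (`group`
and `state` stand in for compaction_group and its table_state). In the
real code the declaration and the definitions live in different files.

```cpp
#include <cassert>
#include <memory>

struct state;   // forward declaration only; incomplete at this point

class group {
    std::unique_ptr<state> _state;
public:
    group();
    // Declared here, defined further down. An inline or implicit
    // destructor would need to destroy `state` at this point, where the
    // type is still incomplete.
    ~group();
    int value() const;
};

// The full definition becomes visible only here.
struct state {
    int value = 42;
};

// Out-of-line definitions: `state` is complete by now, so the
// unique_ptr constructor and destructor can be instantiated safely.
group::group() : _state(std::make_unique<state>()) {}
group::~group() = default;
int group::value() const { return _state->value; }
```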
2024-06-27 17:54:12 +03:00
Avi Kivity
050e7bbd64 compaction_manager: define compaction_manager::strategy_control earlier
C++23 made std::unique_ptr constexpr. A side effect of this (presumably)
is that the compiler compiles it more eagerly, requiring the full definition
of the class in std::make_unique, while it previously was content with
finding the definition later.

One victim of this change is compaction_manager::strategy_control; define
it earlier to build with C++23.
2024-06-27 17:54:12 +03:00
Andrei Chekun
561e88f00e [test.py] Throw meaningful error when something wrong with Scylla binary
Fixes: https://github.com/scylladb/scylladb/issues/19489

There is already a check that the Scylla binary is executable, but it's done at a later stage. So the log for a specific test file will contain a message about something being wrong with the binary, but the console will show no sign of that. Moreover, there will be an error that completely misrepresents what actually happened and why the test run failed. With this check, the test run fails earlier and reports the correct reason for the failure.

Closes scylladb/scylladb#19491
2024-06-27 17:38:32 +03:00
Avi Kivity
581d619572 storage_proxy: trace speculative retries
A speculative retry can appear out of the blue[1] and confuse people, as
it looks like the consistency level was elevated. Fix by adding such
a tracepoint.

Sample output:

```
 activity                                                                                                                                    | timestamp                  | source    | source_elapsed | client
---------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                                          Execute CQL3 query | 2024-06-27 14:25:58.947000 | 127.0.0.1 |              0 | 127.0.0.1
                                                                                                               Parsing a statement [shard 0] | 2024-06-27 14:25:58.947918 | 127.0.0.1 |              2 | 127.0.0.1
                                                                          Processing a statement for authenticated user: anonymous [shard 0] | 2024-06-27 14:25:58.948025 | 127.0.0.1 |            108 | 127.0.0.1
 Creating read executor for token -4069959284402364209 with all: [127.0.0.1, 127.0.0.2] targets: [127.0.0.2] repair decision: NONE [shard 0] | 2024-06-27 14:25:58.948125 | 127.0.0.1 |            209 | 127.0.0.1
                                                                                 Added extra target 127.0.0.1 for speculative read [shard 0] | 2024-06-27 14:25:58.948128 | 127.0.0.1 |            212 | 127.0.0.1
                                                                                                Creating speculating_read_executor [shard 0] | 2024-06-27 14:25:58.948129 | 127.0.0.1 |            213 | 127.0.0.1
                                                                                        read_data: sending a message to /127.0.0.2 [shard 0] | 2024-06-27 14:25:58.948138 | 127.0.0.1 |            222 | 127.0.0.1
                                                                                              Launching speculative retry for data [shard 0] | 2024-06-27 14:25:58.948234 | 127.0.0.1 |            318 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2024-06-27 14:25:58.948235 | 127.0.0.1 |            319 | 127.0.0.1
                                                          Start querying singular range {{-4069959284402364209, pk{000400000001}}} [shard 0] | 2024-06-27 14:25:58.948246 | 127.0.0.1 |            330 | 127.0.0.1
                                                                          [reader concurrency semaphore user] admitted immediately [shard 0] | 2024-06-27 14:25:58.948250 | 127.0.0.1 |            334 | 127.0.0.1
                                                                                [reader concurrency semaphore user] executing read [shard 0] | 2024-06-27 14:25:58.948258 | 127.0.0.1 |            342 | 127.0.0.1
                                      Querying cache for range {{-4069959284402364209, pk{000400000001}}} and slice [(-inf, +inf)] [shard 0] | 2024-06-27 14:25:58.948281 | 127.0.0.1 |            365 | 127.0.0.1
       Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2024-06-27 14:25:58.948311 | 127.0.0.1 |            395 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2024-06-27 14:25:58.948320 | 127.0.0.1 |            404 | 127.0.0.1
                                                                                       read_data: message received from /127.0.0.1 [shard 0] | 2024-06-27 14:25:58.948351 | 127.0.0.2 |             12 | 127.0.0.1
                                                                                              Done processing - preparing a result [shard 0] | 2024-06-27 14:25:58.948354 | 127.0.0.1 |            438 | 127.0.0.1
                                                          Start querying singular range {{-4069959284402364209, pk{000400000001}}} [shard 0] | 2024-06-27 14:25:58.948370 | 127.0.0.2 |             31 | 127.0.0.1
                                                                          [reader concurrency semaphore user] admitted immediately [shard 0] | 2024-06-27 14:25:58.948374 | 127.0.0.2 |             35 | 127.0.0.1
                                                                                [reader concurrency semaphore user] executing read [shard 0] | 2024-06-27 14:25:58.948388 | 127.0.0.2 |             49 | 127.0.0.1
                                      Querying cache for range {{-4069959284402364209, pk{000400000001}}} and slice [(-inf, +inf)] [shard 0] | 2024-06-27 14:25:58.948405 | 127.0.0.2 |             66 | 127.0.0.1
       Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2024-06-27 14:25:58.948424 | 127.0.0.2 |             85 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2024-06-27 14:25:58.948430 | 127.0.0.2 |             91 | 127.0.0.1
                                                                      read_data handling is done, sending a response to /127.0.0.1 [shard 0] | 2024-06-27 14:25:58.948436 | 127.0.0.2 |             97 | 127.0.0.1
                                                                                           read_data: got response from /127.0.0.2 [shard 0] | 2024-06-27 14:25:58.949140 | 127.0.0.1 |           1224 | 127.0.0.1
                                                                                                                            Request complete | 2024-06-27 14:25:58.947449 | 127.0.0.1 |            449 | 127.0.0.1
```

Ref #18988

[1] not completely out of the blue, ff29f430 indicates that a speculative read
    *can* happen.

Closes scylladb/scylladb#19520
2024-06-27 17:37:36 +03:00
Botond Dénes
b4f3809ad2 test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency
2024-06-27 09:57:11 -04:00
Botond Dénes
9cbdd8ef92 test/boost/reader_concurrency_semaphore_test: hoist require_can_admit
This is currently a lambda in a test, hoist it into the global scope and
make it into a function, so other tests can use it too (in the next
patch).
2024-06-27 09:57:11 -04:00
Botond Dénes
07c0a8a6f8 reader_concurrency_semaphore: wire in the configurable cpu concurrency
Before this patch, the semaphore was hard-wired to stop admission if
even a single permit was in the need_cpu state, thereby keeping
the CPU concurrency at 1.
This patch makes use of the new cpu_concurrency parameter, which was
wired in in the last patches, allowing for a configurable amount of
concurrent need_cpu permits. This is to address workloads where some
small subset of reads are expected to be slow, and can hold up faster
reads behind them in the semaphore queue.
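
As a rough illustration of the admission rule only: the model below is
hypothetical and ignores the memory and count resources the real
semaphore also checks. The struct and its members are invented names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified model of the CPU-concurrency check.
struct admission_model {
    // 1 reproduces the old hard-wired behaviour.
    std::size_t cpu_concurrency = 1;
    // Number of permits currently in the need_cpu state.
    std::size_t need_cpu_permits = 0;

    // A new read may be admitted only while fewer than cpu_concurrency
    // permits are waiting for, or using, the CPU.
    bool can_admit() const {
        return need_cpu_permits < cpu_concurrency;
    }
};
```

With the limit at 2, one slow read in the need_cpu state no longer
blocks admission of the next read in the queue.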
2024-06-27 09:57:11 -04:00
Botond Dénes
59faa6d4ff reader_concurrency_semaphore: add cpu_concurrency constructor parameter
In the case of the user semaphore, this receives the new
reader_concurrency_semaphore_cpu_limit config item.
Not used yet.
2024-06-27 09:57:11 -04:00
Benny Halevy
7f05f95ec4 conf: scylla.yaml: enable_tablets: expand documentation
The existing documentation comment for `enable_tablets`
is very terse and lacks details about the effect of enabling
or disabling tablets.

This change adds more details about the impact of `enable_tablets`
on newly created keyspaces, and how to disable tablets when
keyspaces are created.

Also, a note was added to warn about the irreversibility
of the tablets enablement per keyspace.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-27 14:41:43 +03:00
Avi Kivity
0d23b8165e build: update frozen toolchain to Fedora 40 with clang 18.1.6
This refreshes our dependencies to a supported distribution.

Closes scylladb/scylladb#19205
2024-06-27 14:27:21 +03:00
Yaron Kaikov
efa94b06c2 .github/scripts/label_promoted_commits.py: fix adding labels when PR is closed
`prs = response.json().get("items", [])` will return empty when there are no merged PRs, and this will just skip the all-label replacement process.

This is a regression following the work done in #19442

Adding another part to handle closed PRs (which is the majority of the cases we have in Scylla core)

Fixes: https://github.com/scylladb/scylladb/issues/19441

Closes scylladb/scylladb#19497
2024-06-27 14:00:44 +03:00
Pavel Emelyanov
6c1e5c248f main,proxy: Drain proxy in its stop_remote
Currently proxy initialization is pretty dispersed, in particular it's
stopped in several steps -- first drain_on_shutdown() then
stop_remote(). In between there's nothing that needs proxy in any
particular state, so those two steps can be merged into one.

refs: scylladb/scylladb#2737

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19344
2024-06-27 12:26:51 +02:00
Pavel Emelyanov
1a219c674c s3/client: Always retry http requests
Real S3 server is known to actively close connections, thus breaking S3
storage backend at random places. The recent http client update is more
robust against that, but the needed feature is OFF by default.

refs: scylladb/seastar#1883

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19461
2024-06-27 13:14:24 +03:00
Artsiom Mishuta
919d44e0c7 test: adjust scylla_cluster.merge_cmdline_options behavior
adjust merge_cmdline_options behaviour to
append --logger-log-level option instead of merge

this behaviour can be changed (if needed)
back to the previous version (merge everything):
merge_cmdline_options(list1, list2, appending_options=[])

or, to append different cmd options:
merge_cmdline_options(list1, list2, appending_options=[option1,option2])
2024-06-27 10:03:31 +02:00
Artsiom Mishuta
440785bc41 test: pass scylla extra CMD args from test.py args
this commit introduces a test.py option --extra-scylla-cmdline-options
to pass extra scylla cmdline options for all tests.

Options should be space-separated:
'--logger-log-level raft=trace --default-log-level error'
2024-06-27 10:02:55 +02:00
Artsiom Mishuta
677173bf8b test: generate core dumps on crashes in nodetool tests
The nodetool tests do not set the asan/ubsan options
to abort on error and create core dumps.

Fix by setting the environment variables in nodetool tests.

Closes scylladb/scylladb#19503
2024-06-27 10:44:33 +03:00
Marcin Maliszkiewicz
b708c5701f test: auth: add random tag to resources in test_auth_v2_migration
Those tests are sometimes failing on CI and we have two hypotheses:
1. Something wrong with consistency of statements
2. Interruption from another test run (e.g. same queries performed
  concurrently or data remained after previous run)

To exclude or confirm 2, we add a random marker to avoid potential collisions;
if one still occurs, it will be clearly visible that the wrong data comes from
a different run.

Related scylladb/scylladb#18931
Related scylladb/scylladb#18319
2024-06-27 09:28:27 +02:00
Marcin Maliszkiewicz
d08a80b34f test: extend unique_name with random suffix
This reduces collision risk in an unlikely
and incorrect setup where tests would be
run concurrently by multiple processes.
2024-06-27 09:28:02 +02:00
Anna Stuchlik
e2994a19d5 doc: update Scylla Doctor installation
This commit updates the instructions on how to download and run Scylla Doctor,
following the changes in how Scylla Doctor is released.

Closes scylladb/scylladb#19510
2024-06-27 10:22:08 +03:00
Botond Dénes
2fe50cda22 Merge 'chunked_vector enhancements' from Benny Halevy
This short series enhances utils::chunked_vector so it could be used more easily to convert dht::partition_range_vector to chunked_vector, for example.

- utils: chunked_vector: document invalidation of iterators on move
- utils: chunked_vector: add ctor from std::initializer_list
- utils: chunked_vector: add ctor from a single value

No backport required

Closes scylladb/scylladb#19462

* github.com:scylladb/scylladb:
  chunked_vector_test: add tests for value-initialization constructor
  utils: chunked_vector: add ctor from std::initializer_list
  utils: chunked_vector: document invalidation of iterators on move
2024-06-27 10:20:47 +03:00
Benny Halevy
92f8d219b3 conf: scylla.yaml: remove tablets from experimental_features doc comment
tablets are no longer in experimental_features
since 83d491af02.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-27 08:55:30 +03:00
Anna Stuchlik
072542a5cc doc: add a page with ScyllaDB limits
This commit adds a page listing the ScyllaDB limits
we know today.
The page can and should be extended when other limits
are confirmed.

Closes scylladb/scylladb#19399
2024-06-27 08:28:51 +03:00
Kefu Chai
52f1168a3d repair: remove unused operator<<
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19508
2024-06-26 21:57:03 +03:00
Israel Fruchter
3c7af28725 cqlsh: update cqlsh submodule
this change updates the cqlsh submodule:

* tools/cqlsh/ ba83aea3...73bdbeb0 (4):
  > install.sh: replace tab with spaces
  > define the the debug packge is empty
  > tests: switch from using cqlsh bash to the test the python file
  > package python driver as wheels

it also includes the following change, which packages cqlsh as a regular
rpm instead of as a "noarch" rpm:

so far cqlsh bundled the python-driver in, but only as source,
meaning the package wasn't architecture-specific, and also didn't
have the libev eventloop compiled in.

From python 3.12 and up, that would mean we would
fall back to the asyncio eventloop (which is still experimental)
or hit an error (once we sync with the driver upstream).

So to avoid those, we are changing the packaging of cqlsh
to be architecture-specific, getting cqlsh compiled, and bundling
all of its requirements as a per-architecture bundle of wheels
using `shiv`, i.e. a one-file virtualenv that we'll be packing
into our artifacts

Ref: https://github.com/scylladb/scylla-cqlsh/issues/90
Ref: https://github.com/scylladb/scylla-cqlsh/pull/91
Ref: https://github.com/linkedin/shiv

Closes scylladb/scylladb#19385

* tools/cqlsh ba83aea...242876c (1):
  > Merge 'package python driver as wheels' from Israel Fruchter

Update tools/cqlsh/ submodule

in which, the change of `define the the debug packge is empty`
should address the build failure like

```
Processing files: scylla-cqlsh-debugsource-6.1.0~dev-0.20240624.c7748f60c0bc.aarch64
error: Empty %files file /jenkins/workspace/scylla-master/next/scylla/tools/cqlsh/build/redhat/BUILD/scylla-cqlsh/debugsourcefiles.list
RPM build errors:
    Empty %files file /jenkins/workspace/scylla-master/next/scylla/tools/cqlsh/build/redhat/BUILD/scylla-cqlsh/debugsourcefiles.list
```

Closes scylladb/scylladb#19473
2024-06-26 12:07:21 +03:00
Botond Dénes
1fca341514 test/topology_custom/test_repair: add test for enable_tombstone_gc_for_streaming_and_repair 2024-06-26 04:05:17 -04:00
Botond Dénes
d3b1ccd03a replica/table: maybe_compact_for_streaming(): toggle tombstone GC based on the control flag
Now enable_tombstone_gc_for_streaming_and_repair is wired in all the way
to maybe_compact_for_streaming(), so we can implement the toggling of
tombstone GC based on it.
2024-06-26 04:05:17 -04:00
Botond Dénes
415457be2b replica: propagate enable_tombstone_gc_for_streaming_and_repair to maybe_compact_for_streaming()
Just wiring, the new flag will be used in the next patch.
2024-06-26 04:05:17 -04:00
Botond Dénes
d5a149fc01 db/config: introduce enable_tombstone_gc_for_streaming_and_repair
To control whether the compacting reader (if enabled) for streaming and
repair can garbage-collect tombstones.
Default is false (previous behaviour).
Not wired yet.
2024-06-26 04:05:17 -04:00
Pavel Emelyanov
263668bc85 transport: Use sharded<>::invoke_on_others()
When preparing statement, the server code first does it on non-local
shards, then on local one. The former call is done the hard way, while
there's a short sugar sharded<> class method doing it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19485
2024-06-25 22:17:59 +03:00
Kamil Braun
13fc2bd854 Merge notify other nodes on boot from Gleb
The series adds a step during node's boot process, just before completing
the initialization, in which the node sends a notification to all other
normal nodes in the cluster that it is UP now. Other nodes wait for this
node to be UP and in normal state before replying. This ensures that,
in a healthy cluster, when a node starts serving queries, the entire
cluster knows its up-to-date state. The notification is best-effort,
though: if some nodes are down or do not reply in time, the boot process
continues. It is somewhat similar to shutdown notification in this regard.
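
A minimal sketch of the best-effort behaviour described above, written in Python with asyncio rather than the project's own code. The names `notify_up` and `send_up` and the `timeout` parameter are hypothetical and only illustrate the idea: wait for acknowledgements, but give up on peers that are down or slow so that boot can continue.

```python
import asyncio

async def notify_up(peers, send_up, timeout):
    """Tell every peer we are UP; return the peers that acknowledged in time."""
    if not peers:
        return set()
    # One task per peer; send_up(peer) is a hypothetical coroutine.
    tasks = {asyncio.ensure_future(send_up(p)): p for p in peers}
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()  # slow or unreachable peer: give up and keep booting
    # Peers whose notification failed are skipped, not treated as fatal.
    return {tasks[t] for t in done if t.exception() is None}
```

The caller would continue booting regardless of how many peers appear in the returned set.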

* 'gleb/notify-up-v2' of github.com:scylladb/scylla-dev:
  gossiper: wait for a bootstrapping node to be seen as normal on all nodes before completing initialization
  Wait for booting node to be marked UP before complete booting.
  gossiper: move gossip verbs to the idl
2024-06-25 17:58:17 +02:00
Aleksandra Martyniuk
2394e3ee7a repair: drop timeout from table_sync_and_check
Delete 10s timeout from read barrier in table_sync_and_check,
so that the function always considers all previous group0 changes.

Fixes: #18490.

Closes scylladb/scylladb#18752
2024-06-25 17:44:31 +02:00
Avi Kivity
c80dc57156 Merge 'batchlog replay: bypass tombstones generated by past replays' from Botond Dénes
The `system.batchlog` table has a partition for each batch that failed to complete. After finally applying the batch, the partition is deleted. Although the table has gc_grace_seconds = 0, tombstones can still accumulate in memory, because we don't purge partition tombstones from either the memtable or the cache. This can cause the cache and memtable of this table to accumulate many thousands or even millions of tombstones, making batchlog replay very slow. We didn't notice this before, because we would only replay all failed batches on unbootstrap, which is rare and a heavy and slow operation in its own right.
With repair-based tombstone-gc however, we do a full batchlog replay at the beginning of each repair, and now this extra delay is noticeable.
Fix this by making sure batchlog replays don't have to scan through all the tombstones generated by previous replays:
* flush the `system.batchlog` memtable at the end of each batchlog replay, so it is cleared of tombstones
* bypass the cache

Fixes: https://github.com/scylladb/scylladb/issues/19376

Although this is not a regression -- replay has been like this forever -- now that repair calls into batchlog replay, every release which uses repair-based tombstone-gc should get this fix.

Closes scylladb/scylladb#19377

* github.com:scylladb/scylladb:
  db/batchlog_manager: bypass cache when scanning batchlog table
  db/batchlog_manager: replace open-coded paging with internal one
  db/batchlog_manager: implement cleanup after all batchlog replay
  cql3/query_processor: for_each_cql_result(): move func to the coro frame
2024-06-25 16:11:01 +03:00
Avi Kivity
371e37924f Merge 'Rebuild bloom filters that have bad partition estimates' from Lakshmi Narayanan Sreethar
The bloom filters are built with partition estimates because the actual
partition count might not be available in all cases. If the estimate is
inaccurate, the bloom filters might end up being too large or too small
compared to their optimal sizes. This PR rebuilds bloom filters with
inaccurate partition estimates using the actual partition count before
the filter is written to disk. A bloom filter is considered to have an
inaccurate estimate if its false positive rate based on the current
bitmap size is either less than 75% or more than 125% of the configured
false positive rate.

Fixes #19049

A manual test was run to check the impact of rebuild on compaction.

Table definition used : CREATE TABLE scylla_bench.simple_table (id int PRIMARY KEY);

Setup : 3 billion random rows with id in the range [0, 1e8) were inserted as batches of 5 rows into scylla_bench.simple_table via 80 threads.

Compaction statistics :

scylla_bench.simple_table :
(a) Total number of compactions : `1501`
(b) Total time spent in compaction : `9h58m47.269s`
(c) Number of compactions which rebuilt bloom filters : `16`
(d) Total time taken by these 16 compactions which rebuilt bloom filters : `2h55m11.89s`
(e) Total time spent by these 16 compactions to rebuild bloom filters : `8m6.221s` which is
- `4.63%` of the total time taken by the compactions which rebuilt filters (d)
- `1.35%` of the total compaction time (b).

(f) Total bytes saved by rebuilding filters : `388 MB`
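
The two percentages in (e) follow directly from the durations in (b), (d) and (e). A short Python check, where the helper `seconds` is only an illustrative name:

```python
def seconds(hours=0, minutes=0, secs=0.0):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + secs

rebuild_time = seconds(minutes=8, secs=6.221)      # (e) 8m6.221s
rebuilding_compactions = seconds(2, 55, 11.89)     # (d) 2h55m11.89s
all_compactions = seconds(9, 58, 47.269)           # (b) 9h58m47.269s

# Share of rebuild time in the rebuilding compactions and in all compactions.
share_of_rebuilding = 100 * rebuild_time / rebuilding_compactions
share_of_total = 100 * rebuild_time / all_compactions
print(f"{share_of_rebuilding:.2f}% of (d), {share_of_total:.2f}% of (b)")
```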

system.compaction_history :
(a) Total number of compactions : `77`
(b) Total time spent in compaction : `21.24s`
(c) Number of compactions which rebuilt bloom filters : `74`
(d) Time taken by these 74 compactions which rebuilt bloom filters : `20.48s`
(e) Time spent by these 74 compactions to rebuild bloom filters : `377ms` which is
- `1.84%` of the total time taken by the compactions which rebuilt filters (d)
- `1.77%` of the total compaction time (b).

(f) Total bytes saved by rebuilding filters : `20 kB`

The following tables also had compactions and the bloom filter was rebuilt in all those compactions.
However, the time taken for every rebuild was observed as 0ms from the logs as it completed within a microsecond :

system.raft :
(a) Total number of compactions : `2`
(b) Total time spent in compaction : `106ms`
(c) Total bytes saved by rebuilding filters : `960 B`

system_schema.tables :
(a) Total number of compactions : `1`
(b) Total time spent in compaction : `25ms`
(c) Total bytes saved by rebuilding filter : `312 B`

system.topology :
(a) Total number of compactions : `1`
(b) Total time spent in compaction : `25ms`
(c) Total bytes saved by rebuilding filter : `320 B`

Closes scylladb/scylladb#19190

* github.com:scylladb/scylladb:
  bloom_filter_test: add testcase to verify filter rebuilds
  test/boost: move bloom filter tests from sstable_datafile_test into a new file
  sstables/mx/writer: rebuild bloom filters with bad partition estimates
  sstables/mx/writer: add variable to track number of partitions consumed
  sstable: introduce sstable::maybe_rebuild_filter_from_index()
  sstable: add method to return filter format for the given sstable version
  utils/i_filter: introduce get_filter_size()
2024-06-25 15:35:09 +03:00
Nadav Har'El
35ace0af5c Merge 'Move some /storage_proxy API endpoints to config.cc' from Pavel Emelyanov
API endpoints that need a particular service to get data from are registered next to this service (#2737). Some endpoints registered by the /storage_proxy function work with config, so this PR moves them to the existing config.cc, which holds the config-related endpoints. The path these endpoints are registered with remains intact, so this PR also tweaks the proxy API registration.

Closes scylladb/scylladb#19417

* github.com:scylladb/scylladb:
  api: Use provided db::config, not the one from ctx
  api: Move some config endpoints from proxy to config
  api: Split storage_proxy api registration
  api: Unset config endpoints
2024-06-25 13:55:58 +03:00
Michał Chojnowski
c7dc3b9b58 scylla-gdb.py: add line information to coroutine names in scylla fiber
For convenience.

Note that this line info only points to the function as a whole, not to the
current suspend point.
I think there's no facility for converting the `__coro_index` to the current suspend point automatically.

Before:

```
(gdb) scylla fiber seastar::local_engine->_current_task
[shard  1] #0  (task*) 0x0000601008e8e970 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16  (.resume is seastar::future<void> sstables::parse<unsigned int, std::pair<sstables::metadata_type, unsigned int> >(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::disk_array<unsigned int, std::pair<sstables::metadata_type, unsigned int> >&) [clone .resume] )
[shard  1] #1  (task*) 0x00006010092acf10 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16  (.resume is sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&) [clone .resume] )
[shard  1] #2  (task*) 0x0000601008e648d0 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16  (.resume is sstables::sstable::read_simple<(sstables::component_type)8, sstables::statistics>(sstables::statistics&)::{lambda(sstables::sstable_version_types, seastar::file&&, unsigned long)#1}::operator()(sstables::sstable_version_types, seastar::file&&, unsigned long) const [clone .resume] )
```

After:

```
(gdb) scylla fiber seastar::local_engine->_current_task
[shard  1] #0  (task*) 0x0000601008e8e970 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16  (sstables::parse<unsigned int, std::pair<sstables::metadata_type, unsigned int> >(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::disk_array<unsigned int, std::pair<sstables::metadata_type, unsigned int> >&) at sstables/sstables.cc:352)
[shard  1] #1  (task*) 0x00006010092acf10 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16  (sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&) at sstables/sstables.cc:570)
[shard  1] #2  (task*) 0x0000601008e648d0 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16  (sstables::sstable::read_simple<(sstables::component_type)8, sstables::statistics>(sstables::statistics&)::{lambda(sstables::sstable_version_types, seastar::file&&, unsigned long)#1}::operator()(sstables::sstable_version_types, seastar::file&&, unsigned long) const at sstables/sstables.cc:992)

```

Closes scylladb/scylladb#19478
2024-06-25 13:55:10 +03:00
Kefu Chai
def432617d docs: print out invalid branch name
to help the user understand what the extension is expecting.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19477
2024-06-25 13:17:25 +03:00
Botond Dénes
31c0fa07d8 db/batchlog_manager: bypass cache when scanning batchlog table
Scans should not pollute the cache with cold data, in general. In the
case of the batchlog table, there is another reason to bypass the cache:
this table can have a lot of partition tombstones, which currently are
not purged from the cache. So in certain cases, using the cache can make
batch replay very slow, because it has to scan past tombstones of
already replayed batches.
2024-06-25 06:15:47 -04:00
Botond Dénes
29f610d861 db/batchlog_manager: replace open-coded paging with internal one
query_processor has built-in paging support, no need to open-code paging
in batchlog manager code.
2024-06-25 06:15:47 -04:00
Botond Dénes
2dd057c96d db/batchlog_manager: implement cleanup after all batchlog replay
We have a commented code snippet from Origin with cleanup and a FIXME to
implement it. Origin flushes the memtables and kicks a compaction. We
only implement the flush here -- the flush will trigger a compaction
check and we leave it up to the compaction manager to decide when a
compaction is worthwhile.
This method used to be called only from unbootstrap, so a cleanup was
not really needed. Now it is also called at the end of repair, if the
table is using repair-based tombstone-gc. If the memtable is filled with
tombstones, this can add a lot of time to the runtime of each repair. So
flush the memtable at the end, so the tombstones can be purged (they
aren't purged from memtables yet).
2024-06-25 06:15:47 -04:00
Botond Dénes
4e96e320b4 cql3/query_processor: for_each_cql_result(): move func to the coro frame
Said method has a func parameter (called just f), which it receives as
rvalue ref and just uses as a reference. This means that if the caller
doesn't keep the func alive, for_each_cql_result() will run into
use-after-free after the first suspension point. This is unexpected for
callers, who don't expect to have to keep something alive, which they
passed in with std::move().
Adjust the signature to take a value instead; value parameters are moved
to the coro frame and survive suspension points.
Adjust internal callers (query_internal()) the same way.

There are no known vulnerable external callers.
2024-06-25 06:15:25 -04:00
Benny Halevy
3f23016cc0 perf-simple-query: add mean and standard deviation stats
Currently, perf-simple-query summarizes the statistics
only for the throughput, printing the median,
median absolute deviation, minimum, and maximum.
But the throughput is typically highly variable
and its median is noisy.

This patch calculates also the mean and standard deviation
and does that also for instructions_per_op and cpu_cycles_per_op
to present a fuller picture of the performance metrics.

Output example:
```
random-seed=3383668492
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
95613.97 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42456 insns/op,   22117 cycles/op,        0 errors)
97538.45 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42454 insns/op,   22094 cycles/op,        0 errors)
95883.37 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42438 insns/op,   22268 cycles/op,        0 errors)
96791.45 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42433 insns/op,   22256 cycles/op,        0 errors)
97894.71 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42420 insns/op,   22010 cycles/op,        0 errors)

         throughput: mean=96744.39 standard-deviation=996.89 median=96791.45 median-absolute-deviation=861.02 maximum=97894.71 minimum=95613.97
instructions_per_op: mean=42440.08 standard-deviation=14.99 median=42437.59 median-absolute-deviation=13.58 maximum=42456.15 minimum=42420.10
  cpu_cycles_per_op: mean=22148.98 standard-deviation=110.43 median=22117.04 median-absolute-deviation=106.89 maximum=22267.70 minimum=22010.42
```
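
The summary statistics can be recomputed from the five throughput samples in the example output. The sketch below uses Python's `statistics` module, not the tool's own code. The sample standard deviation (n - 1 in the denominator) reproduces the printed value. For the median absolute deviation, the snippet uses the textbook definition (deviations taken around the median), which is an assumption and need not be how perf-simple-query computes its printed figure.

```python
import statistics

# Throughput samples (tps) copied from the example output above.
throughput = [95613.97, 97538.45, 95883.37, 96791.45, 97894.71]

mean = statistics.mean(throughput)
stdev = statistics.stdev(throughput)    # sample standard deviation (n - 1)
median = statistics.median(throughput)
# Textbook MAD: median of absolute deviations from the median (assumption).
mad = statistics.median(abs(x - median) for x in throughput)

print(f"mean={mean:.2f} standard-deviation={stdev:.2f} median={median:.2f}")
```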

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#19450
2024-06-25 12:25:59 +03:00
Yaron Kaikov
394cba3e4b .github/workflow: close and replace label when backport promoted
Today, after Mergify opens a backport PR, it stays open until someone manually closes it. Also, we can't use labels to track which backports were done, since there is no indication of that short of digging into the PR and looking for a comment or a commit ref.

The following changes were made in this PR:
* trigger add-label-when-promoted.yaml also when the push was made to `branch-x.y`
* Replace label `backport/x.y` with `backport/x.y-done` in the original PR (this will automatically update the original Issue as well)
* Add a comment on the backport PR and close it
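
The label replacement step can be sketched in Python as follows. This is not the workflow's actual code; `promote_labels` is a hypothetical helper that only shows the intended `backport/x.y` to `backport/x.y-done` mapping.

```python
def promote_labels(labels, version):
    """Replace 'backport/<version>' with 'backport/<version>-done'."""
    pending = f"backport/{version}"
    done = f"backport/{version}-done"
    # Labels for other versions, and unrelated labels, are left untouched.
    return [done if label == pending else label for label in labels]
```

For example, promoting version `6.0` leaves a `backport/5.4` label on the same PR unchanged.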

Fixes: https://github.com/scylladb/scylladb/issues/19441

Closes scylladb/scylladb#19442
2024-06-25 12:11:28 +03:00
Benny Halevy
8daf755f8a statement_restrictions: partition_ranges_from_singles: no need to default-initialize result
Currently, the returned `ranges` vector is first initialized
to `product_size` and then the returned partition ranges are
copied into it.

Instead, we can simply reserve the vector capacity,
without initializing it, and then emplace all partition ranges
onto it using std::back_inserter.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#19457
2024-06-25 12:11:28 +03:00
Laszlo Ersek
656a9468bb HACKING.md: fix typo in "--overprovisioned" option name
Grepped the tree for "--overprovisioned" (coming from
<https://university.scylladb.com/courses/scylla-essentials-overview/lessons/high-availability/topic/consistency-level-demo-part-1/>),
and noticed that this instance was *not* matched by grep (while another
one just below was).

Fixes: 4f838a82e2
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>

Closes scylladb/scylladb#19458
2024-06-25 12:11:28 +03:00
Kefu Chai
adca415245 bytes: drop unused operator<<
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.

the callers in alternator/streams.cc are updated to use `fmt::print()`
to format the `bytes` instances.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19448
2024-06-25 12:11:28 +03:00
Kefu Chai
94e36d4af4 auth: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

this change addresses the leftover of 850ee7e170a.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19467
2024-06-25 12:11:28 +03:00
Benny Halevy
378578b481 chunked_vector_test: add tests for value-initialization constructor
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-25 12:08:11 +03:00
Benny Halevy
5bd2ee7507 utils: chunked_vector: add ctor from std::initializer_list
Prepare for using utils::chunked_vector for
dht::partition_range_vector

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-25 12:08:06 +03:00
Benny Halevy
7780af2e84 utils: chunked_vector: document invalidation of iterators on move
chunked_vector differs from std::vector in that
the latter's move constructor is required to preserve
any iterators to the moved-from vector.
In contrast, chunked_vector::iterator keeps a pointer
to the chunked_vector::_chunks data, which is
a utils::small_vector, and when moved, it might
invalidate the iterator since the moved-to _chunks
might copy the contents of the internal capacity
rather than moving the allocated capacity.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-25 11:44:50 +03:00
Botond Dénes
c7317be09a db/config: introduce reader_concurrency_semahore_cpu_concurrency
To allow increasing the semaphore's CPU concurrency, which is currently
hard-limited to 1. Not wired yet.
2024-06-25 04:00:11 -04:00
Piotr Dulikowski
85219e9294 configure.py: fix the 'configure' rule generated during regeneration
The Ninja makefile (build.ninja) generated by the ./configure.py script
is smart enough to notice when the configure.py script is modified and
re-runs the script in order to regenerate itself. However, this
operation is currently not idempotent and quickly breaks because
information about the Ninja makefile's name is not passed properly.

This is the rule used for makefile's regeneration:

```
rule configure
  command = {python} configure.py --out={buildfile}.new $configure_args && mv {buildfile}.new {buildfile}
  generator = 1
  description = CONFIGURE $configure_args
```

The `buildfile` variable holds the value of the `--out` option which is
set to `build.ninja` if not provided explicitly.

Note that regenerating the makefile passes a name with the `.new` suffix
added to the end; we want to first write the file in full and then
overwrite the old file via a rename. However, notice that the script was
called with `--out=build.ninja.new`; the `configure` rule in the
regenerated file will have `configure.py --out=build.ninja.new.new` and
then `mv build.ninja.new.new build.ninja.new`. So, second regeneration
will just leave a build.ninja.new file which is not useful.

Fix this by introducing an additional parameter `--out-final-name`.
This parameter is only supposed to be used in the regeneration rule and
its purpose is to preserve information about the original file name.
After this change I no longer see `build.ninja.new` being created after
a sequence of `touch configure.py && ninja` calls.
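
The idea behind the fix can be sketched in a few lines of Python. This is a simplified stand-in, not the real configure.py: the function `configure_rule` is hypothetical, and only the `--out` and `--out-final-name` options are taken from the description above.

```python
import argparse

def configure_rule(argv):
    """Return the regeneration command a simplified configure script would emit."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--out", default="build.ninja")
    # Only passed by the regeneration rule: remembers the real target name.
    parser.add_argument("--out-final-name", default=None)
    args = parser.parse_args(argv)
    # Without --out-final-name, the name would be taken from --out, which
    # already carries the ".new" suffix during regeneration.
    final = args.out_final_name or args.out
    return (f"python configure.py --out={final}.new "
            f"--out-final-name={final} && mv {final}.new {final}")
```

Calling it with the arguments the emitted rule itself passes yields the same rule again, which is the idempotence the commit is after.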

Closes scylladb/scylladb#19428
2024-06-24 21:20:32 +03:00
Laszlo Ersek
a4c6ae688a install-dependencies.sh: set file mode creation mask to 0022
The docs [1] clearly say "install-dependencies.sh" should be run as
"root"; however, the script silently assumes that the umask inherited from
the calling environment is 0022. That's not necessarily the case, and
there's an argument to be made for "root" setting umask 0077 by default.
The script behaves unexpectedly under such circumstances; files and
directories it creates under /opt and /usr/local are then not accessible
to unprivileged users, leading to compilation failures later on.

Set the creation mask explicitly to 0022.
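
The script itself is a shell script, so the following Python sketch only illustrates the effect of the mask on newly created files. The helper name `created_mode` is made up for the example, which assumes a POSIX system.

```python
import os
import stat
import tempfile

def created_mode(mask: int) -> int:
    """Create a file under the given umask and return its permission bits."""
    old = os.umask(mask)
    try:
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "probe")
            # Request rw for everyone; the umask removes bits from this.
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # restore the caller's mask
```

With a mask of 0077 the file ends up readable by its owner only, which is how files under /opt and /usr/local become inaccessible to unprivileged users.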

[1] https://github.com/scylladb/scylladb/blob/master/HACKING.md#dependencies

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>

Closes scylladb/scylladb#19464
2024-06-24 19:46:15 +03:00
Marcin Maliszkiewicz
a4e26585e5 git: add build.ninja.new to .gitignore
For some time now, executing our ninja build
targets has also generated a build.ninja.new file.
Adding it to .gitignore for convenience as we
won't commit this file.

Closes scylladb/scylladb#19367
2024-06-24 16:48:50 +03:00
Kefu Chai
e61061d19f test.py: improve help message on tests selection
Since 3afbd21f, we are able to selectively choose a single test
in a boost test executable which represents a test suite, and to
choose a single test in a pytest script with the syntax of
"test_suite::test_case". it's very handy for manual testing.

so let's document in the command line help message as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19454
2024-06-24 14:27:02 +03:00
Kefu Chai
e9d8c25e86 alternator: define static variable
before this change, when linking an executable referencing `marker`,
we could get the following error:
```
13:58:02  ld.lld: error: undefined symbol: alternator::event_id::marker
13:58:02  >>> referenced by streams.cc
13:58:02  >>>               build/dev/alternator/streams.o:(from_string_helper<rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>, alternator::event_id>::Set(rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>&, alternator::event_id, rjson::internal::throwing_allocator&))
13:58:02  clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
```
it turns out `event_id::marker` is only declared, but never defined.
please note, the non-inline static member variable in its class
definition is not considered as a definition, see
[class.static.data](https://eel.is/c++draft/class.static.data#3)

> The declaration of a non-inline static data member in its class
> definition is not a definition and may be of an incomplete type
> other than cv void.

so, let's declare it as a `constexpr` instead. it implies `inline`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19452
2024-06-24 13:15:00 +03:00
Kefu Chai
af2b0b030b test/pylib: use raw string to avoid using escape sequence
before this change, when running test like:
```console
./test.py --mode release topology_experimental_raft/test_tablets
/home/kefu/dev/scylladb/test/pylib/scylla_cluster.py:333: SyntaxWarning: invalid escape sequence '\('
  deleted_sstable_re = f"^.*/{keyspace}/{table}-[0-9a-f]{{32}}/.* \(deleted\)$"
```
we could have the warning above, because `\(` is not a valid escape
sequence. the Python interpreter accepts it as the two separate
characters of `\(` after complaining, but it's still annoying.

so, let's use a raw string here, as we want to match "(deleted)".
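
A small self-contained illustration of the raw f-string form. The keyspace, table and path values are made up for the example; only the pattern itself comes from the warning quoted above.

```python
import re

keyspace, table = "ks", "t"  # hypothetical names for illustration

# rf"..." keeps the backslashes literal while still interpolating {keyspace}
# and {table}; {{32}} is an escaped brace pair, i.e. the regex quantifier {32}.
deleted_sstable_re = rf"^.*/{keyspace}/{table}-[0-9a-f]{{32}}/.* \(deleted\)$"

line = "/var/lib/scylla/data/ks/t-" + "0" * 32 + "/me-1-big-Data.db (deleted)"
matched = re.match(deleted_sstable_re, line) is not None
```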

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19451
2024-06-24 11:11:44 +03:00
Lakshmi Narayanan Sreethar
a09556a49f bloom_filter_test: add testcase to verify filter rebuilds
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-24 12:11:37 +05:30
Lakshmi Narayanan Sreethar
4aa5698f0d test/boost: move bloom filter tests from sstable_datafile_test into a new file
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-24 12:06:02 +05:30
Lakshmi Narayanan Sreethar
21e463b108 sstables/mx/writer: rebuild bloom filters with bad partition estimates
The bloom filters are built with partition estimates, as the actual
partition count might not be available in all the cases. If the estimate
was bad, the bloom filters might end up too large or too small compared to
their optimal sizes. Rebuild such bloom filters with actual partition
count before the filter is written to disk and the sstable is sealed.

Fixes #19049

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-24 12:06:02 +05:30
Lakshmi Narayanan Sreethar
afc90657d6 sstables/mx/writer: add variable to track number of partitions consumed
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-24 12:06:02 +05:30
Lakshmi Narayanan Sreethar
fccb1a11e5 sstable: introduce sstable::maybe_rebuild_filter_from_index()
Add method sstable::maybe_rebuild_filter_from_index() that rebuilds
bloom filters which had bad partition estimates when they were built.
The method checks the false positive rate based on the current bitset
size against the configured false positive rate to decide whether a
filter needs to be rebuilt. If the current false positive rate is within
75% to 125% of the configured false positive rate, the bloom filter will
not be rebuilt. Otherwise, the filter will be rebuilt from the index
entries. This method should only be called before an SSTable is sealed
as the bloom filter is updated in-place.
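
The decision rule can be sketched as follows. This is not the code of sstable::maybe_rebuild_filter_from_index(): the function names are hypothetical, and the false positive rate uses the textbook approximation (1 - e^(-kn/m))^k for k hash functions, n partitions and m bits, which may differ from the formula the real implementation uses.

```python
import math

def false_positive_rate(num_bits: int, num_hashes: int, num_partitions: int) -> float:
    """Textbook approximation of a bloom filter's false positive rate."""
    if num_partitions == 0:
        return 0.0
    return (1.0 - math.exp(-num_hashes * num_partitions / num_bits)) ** num_hashes

def needs_rebuild(current_fp: float, configured_fp: float) -> bool:
    """Rebuild when the current rate is outside 75%..125% of the configured rate."""
    return not (0.75 * configured_fp <= current_fp <= 1.25 * configured_fp)
```

A filter sized for an overestimated partition count has a rate well below the configured one (it wastes memory), and an underestimated count gives a rate well above it; both cases fall outside the band and trigger a rebuild.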

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-24 12:06:02 +05:30
Lakshmi Narayanan Sreethar
a7d77f6304 sstable: add method to return filter format for the given sstable version
Extract out the filter format computing logic from sstable::read_filter
into a separate function. This is done so that the subsequent patches
can make use of this function.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-24 12:06:01 +05:30
Botond Dénes
6dd6f0198e utils/i_filter: introduce get_filter_size()
Currently, the only way to get the size of a filter, for certain
parameters is to actually create one. This requires a seastar thread
context and potentially also allocates a huge amount of memory.
Provide a method which just calculates the size, without any of the
above mentioned baggage.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-24 12:06:01 +05:30
Kefu Chai
a230ecc4eb utils/murmur_hash: replace rotl64() with std::rotl()
since we are now able to use C++20, there is no need to use the
homebrew rotl64(). so in this change, we replace rotl64() with
std::rotl(), and remove the former from the source tree.

the underlying implementations of these two solutions are equivalent,
so no performance changes are expected. all caller sites have been
audited: all of them pass `uint64` as the first parameter.
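
For readers unfamiliar with the operation, a 64-bit left rotation can be sketched in Python as below. This only illustrates what such a rotation computes on a 64-bit unsigned value; it is not the removed helper's source.

```python
MASK64 = (1 << 64) - 1

def rotl64(x: int, r: int) -> int:
    """Rotate the low 64 bits of x left by r positions."""
    r %= 64
    x &= MASK64
    # Bits shifted out on the left re-enter on the right.
    return ((x << r) | (x >> (64 - r))) & MASK64
```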

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19447
2024-06-24 08:24:43 +03:00
Marcin Maliszkiewicz
794440eb85 test: skip checking default role in test_auth_v2_migration
Default role creation in auth-v1 is asynchronous and all nodes race to
create it so we'd need to delay the test and wait. Checking this particular
role doesn't bring much value to the test as we check other roles
to demonstrate correctness.

Fixes scylladb/scylladb#19039

Closes scylladb/scylladb#19424
2024-06-23 19:50:55 +03:00
Avi Kivity
0d52f0684a Merge 'Sanitize gossiper API endpoints management' from Pavel Emelyanov
Gossiper has two blocks of endpoints, both are registered in legacy/random place in main. This PR moves them next to gossiper start and adds unregistration for both.

refs: #2737

Closes scylladb/scylladb#19425

* github.com:scylladb/scylladb:
  api: Remove dedicated failure_detector registration method
  api: Move failure_detector endpoints set/unset to gossiper
  api: Unset failure detector endpoints method
  api: (Un)Register gossiper API in correct place
  api: Unset gossiper endpoints on stop
  asi: Coroutinize set_server_gossip()
2024-06-23 19:35:11 +03:00
Kefu Chai
850ee7e170 auth: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19429
2024-06-23 19:25:23 +03:00
Kefu Chai
72fdee1efb README.md: add badges for cron jobs
these jobs are scheduled to verify the builds of scylla, like
if it builds with the latest Seastar, if scylla can generate
reproducible builds, and if it builds with the nightly build of
clang. the failures of these workflows are not very visible without
clicking into the corresponding workflow in
https://github.com/scylladb/scylladb/actions.

in this change, we add their badges in the testing section of README.md,
so one can identify the test failures of them if any.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19430
2024-06-23 19:24:40 +03:00
Kefu Chai
a7e38ada8e test: remove unused operator<<
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19432
2024-06-23 18:02:52 +03:00
zhouxiang
694014591a test/alternator/test_projection_expression.py: remove useless comparisons
pytest.raises expects a block of code that will raise an exception, not
a comparison of results.

Closes scylladb/scylladb#19436
2024-06-23 13:53:14 +03:00
Pavel Emelyanov
d8009ed843 api/cache_service: Don't use database to perform map+reduce on
The sharded<database> is used as a map_reduce0() method provider,
there's no real need in database itself. Simple smp::map_reduce()
would work just as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19364
2024-06-21 19:47:25 +03:00
Kefu Chai
f781c3babe .github: add reproducible-build workflow
to verify that scylla builds are reproducible.

the new workflow builds scylla twice with master HEAD, and compares
the md5sums of the built scylla executables. it fails if the md5sum:s
do not match.
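
The comparison step can be sketched in Python as below. The workflow itself is a GitHub Actions definition, so this is only an illustration of the check; `md5sum` and `builds_match` are hypothetical helper names, and the two paths would be the executables produced by the two builds.

```python
import hashlib

def md5sum(path: str) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def builds_match(first: str, second: str) -> bool:
    """True when both build outputs are byte-for-byte identical by MD5."""
    return md5sum(first) == md5sum(second)
```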

this workflow is triggered at 5AM every Friday. its status can be
found at https://github.com/scylladb/scylladb/actions/workflows/reproducible-build.yaml
after it's built for the first time.

Refs #19225
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19409
2024-06-21 19:39:37 +03:00
Nadav Har'El
81a02f06dd test/cql-pytest: add more tests for SELECT's LIMIT
SELECT's "LIMIT" feature is tested in combination with other features
in different test/cql-pytest/*.py source files - for example, the
combination of LIMIT and GROUP BY is tested in test_group_by.py.

This patch adds a new test file, test_limit.py, for testing basic
aspects of LIMIT usage that weren't already tested in other files.
The new file also has a comment saying where we have other tests
for LIMIT combined with other features.

All the new tests pass (on both Scylla and Cassandra). But they can
be useful as regression tests to test patches which modify the
behavior of LIMIT - e.g., pull request #18842.

This patch also adds another test in test_group_by.py. This adds to
one of the tests for the combination of LIMIT and GROUP BY (in this
case, GROUP BY of clustering prefix, no aggregation) also a check
for paging, that was previously missing.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19392
2024-06-21 19:35:15 +03:00
Pavel Emelyanov
755be887a6 api: Remove dedicated failure_detector registration method
It's now empty and can be dropped

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 19:30:54 +03:00
Pavel Emelyanov
2bfa1b3832 api: Move failure_detector endpoints set/unset to gossiper
These two api functions both need gossiper service and only it, and thus
should have set/unset calls next to each other. It's worth putting them
into a single place

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 19:30:54 +03:00
Pavel Emelyanov
88a6094121 api: Unset failure detector endpoints method
There's one more set of endpoints that need gossiper -- the
failure_detector ones. They are registered, but not unregistered, so
here's the method to do it. It's not called by any code yet, because
next patch would need to rework the caller anyway.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 19:30:53 +03:00
Pavel Emelyanov
f84694166e api: (Un)Register gossiper API in correct place
Each service's endpoints are to be registered just after the service
itself, so should gossiper's

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 19:30:53 +03:00
Pavel Emelyanov
19f3a9805a api: Unset gossiper endpoints on stop
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 19:30:53 +03:00
Pavel Emelyanov
c7547b9c7e api: Coroutinize set_server_gossip()
One of the next patches will add more async calls here, so not to
create then-chains, convert it into a coroutine

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 19:30:53 +03:00
Kefu Chai
eef64a6bb8 build: cmake: do not add "absl::headers" to include dirs
`absl::headers` is a library, not the path to its headers.

before this change, the command lines of the generated build rules look
like:

```
-I/home/kefu/dev/scylladb/repair/absl::headers
```

this does not hurt, as other libraries might add the intended include
dir to the compiler command line, but this is just wrong.

so let's remove it. please note, `repair` target already links against
`absl::headers`. so we don't need to add `absl::headers` to its linkage
again.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19384
2024-06-21 19:22:17 +03:00
Kefu Chai
7b10cc8079 treewide: include seastar headers with brackets
this change was created in the same spirit of ebff5f5d.

despite that we include Seastar as a submodule, Seastar is not a
part of scylla project. so we'd better include its headers using
brackets.

ebff5f5d addressed this cosmetic issue a while back. but clangd's
header-insertion has probably led some contributors to insert
the missing headers with `"`. so this style of `include` returned
to the tree with these new changes.

unfortunately, clangd does not allow us to configure the style
of `include` at the time of writing.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19406
2024-06-21 19:20:27 +03:00
Kefu Chai
987fd59f21 test: correct some misspellings
fix a typo in source code. this typo was identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19412
2024-06-21 19:16:11 +03:00
Kefu Chai
52693fc21c Update seastar submodule
* seastar 9ce62705...908ccd93 (42):
  > include/seastar: do not include unused headers
  > timer-set: Add missing sanity headers
  > tutorial.md: fix typos
  > Update tutorial.md to reflect update preemption methods
  > tutorial.md: remove trailing whitespace
  > json: Add a test for jsonable objects
  > json: Make formatter::write(vector/map/umap) copy their arguments
  > json: Make formatter call write for jsonable
  > test: futures: verify stream yields the consumed value
  > build: add pyyaml to install-dependencies.sh
  > stall-analyser: remove unused variable
  > stall-analyser: use itertools.dropwhile when appropriate
  > scripts: sort packages alphanumerically
  > docker: bind the file instead of copying during the build stage
  > docker: lint dockerfile
  > dns: use undeprecated c-ares APIs
  > stall-analyser: use argparse.FileType when appropriate
  > http/client: Retry request over fresh connection in case old one failed
  > http/client: Fix indentation after previous patch
  > http/client: Pass request and handle by reference
  > http/client: Introduce make_new_connection()
  > http/client: Fix parser result checking
  > http/client: Document max_connections
  > test/http: Generalize http connection factory
  > loopback_socket: Shutdown socket on EOF close
  > loopback_socket: Rename buffer's shutdown() to abort()
  > test: Add test for sharded<>::invoke_on_...() compilation
  > net/tls: Added additional error codes
  > io-tester.md: update available parameters for job description
  > io_tester: expose extent_allocation_size_hint via job param
  > file: Unfriend reactor class
  > memory.cc: fix cross-shard shrinking realloc
  > sharded: Mark invoke_on_others() helper lambda mutable
  > scheduling: Unfriend reactor from scheduling_group_key
  > reactor: Make allocate_scheduling_group_specific_data() accept key_id argument
  > reactor: Add local key_id variable to allocate_scheduling_group_specific_data()
  > timer: Unfriend reactor
  > reactor: Generalize timer removal
  > timer: Add type alias for timer_set
  > reactor: Move reactor::complete_timers() to timer_set
  > tests: test protobuf support in prometheus_test.py
  > tests: enable prometheus_test.py to test metrics without aggregation

Closes scylladb/scylladb#19405
2024-06-21 18:52:58 +03:00
Dawid Medrek
2446cce272 db/hints: Initialize endpoint managers only for valid hint directories
Before these changes, it could happen that Scylla initialized
endpoint managers for hint directories representing

* host IDs before migrating hinted handoff to using host IDs,
* IP addresses after the migration.

One scenario looked like this:

1. Start Scylla and upgrade the cluster to using host IDs.
2. Create, by hand, a hint directory representing an IP address.
3. Trigger changing the host filter in hinted handoff; it could
   be achieved by, for example, restricting the set of data
   centers Scylla is allowed to save hints for.

When changing the host filter, we browse the hint directories
and create endpoint managers if we can send hints towards
the node corresponding to a given hint directory. We only
accepted hint directories representing IP addresses
and host IDs. However, we didn't check whether the local node
has already been upgraded to host-ID-based hinted handoff
or not. As a result, endpoint managers were created for
both IP addresses and host IDs, no matter whether we were
before or after the migration.

These changes make sure that any time we browse the hint
directories, we take that into account.

Fixes scylladb/scylladb#19172

Closes scylladb/scylladb#19173
2024-06-21 15:59:49 +02:00
Avi Kivity
3cfb0503a9 Update tools/cqlsh submodule for v6.0.21-scylla
* tools/cqlsh 0d58e5c...ba83aea (1):
  > requirements: update scylla-driver
2024-06-21 16:04:21 +03:00
Piotr Dulikowski
cf2b4bf721 Merge 'cdc: do not include unused headers' from Kefu Chai
also add `auth` and `cdc` to iwyu's `CLEANER_DIR` setting.

---

it's a cleanup, hence no need to backport.

Closes scylladb/scylladb#19410

* github.com:scylladb/scylladb:
  .github: add auth and cdc to iwyu's CLEANER_DIR
  cdc: do not include unused headers
2024-06-21 13:44:40 +02:00
Pavel Emelyanov
0330640b4d api: Use provided db::config, not the one from ctx
The set_server_config() already has the db::config reference for
endpoints to work with, there's no need to obtain one via ctx and
database.

This change kills two birds with one stone -- fewer users of database as
config provider, fewer places that need http context -> database
dependency.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 13:30:54 +03:00
Pavel Emelyanov
afb48d8ab9 api: Move some config endpoints from proxy to config
The endpoints that get (and set, though setting is not implemented) various
timeouts work on config, yet they register themselves in the storage_proxy
function. Since the "service" they need to work with is config, move the
endpoints to config endpoints code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 13:29:38 +03:00
Pavel Emelyanov
0aad406a2f api: Split storage_proxy api registration
The set_server_storage_proxy() does two things -- registers
storage_proxy "function" and sets proxy routes, that depend on it. Next
patches will move some /storage_proxy/... endpoints registration to
earlier stage, so the function should be ready in advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 13:28:29 +03:00
Pavel Emelyanov
473cb62a9a api: Unset config endpoints
The set_server_config() needs the stop-time peer, here it is.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 13:28:06 +03:00
Kefu Chai
c429a8d8ae sstables: use "me" sstable format by default
in 7952200c, we changed the `selected_format` from `mc` to `me`,
but to be backward compatible the cluster starts with "md", so
when the nodes in cluster agree on the "ME_SSTABLE_FORMAT" feature,
the format selector believes that the node is already using "ME",
which is specified by `_selected_format`, even though it is actually still
using "md", which is specified by `sstable_manager::_format`, as
changed by 54d49c04. as explained above, it was set to "md"
in hope to be backward compatible when upgrading from an existing
installation which might be still using "md". but after a second
thought, since we are able to read sstables persisted with older
formats, this concern is not valid.

in other words, 7952200c introduced a regression which changed the
"default" sstable format from `me` to `md`.

to address this, we just change `sstable_manager::_format` to "me",
so that all sstables are created using "me" format.

a test is added accordingly.

Fixes #18995
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19293
2024-06-21 12:56:01 +03:00
Yaron Kaikov
57428d373b [actions] fix sync label from PR to linked issue
in
b8c705bc54
I modified the event name to `pull_request_target`,

This caused skipping sync process when PR label was added/removed

Fixing it

Closes scylladb/scylladb#19408
2024-06-21 11:39:44 +03:00
Kamil Braun
627d566811 Merge 'join_token_ring, gossip topology: recalculate sync nodes in wait_alive' from Patryk Jędrzejczak
The node booting in gossip topology waits until all NORMAL
nodes are UP. If we removed a different node just before,
the booting node could still see it as NORMAL and wait for
it to be UP, which would time out and fail the bootstrap.

This issue caused scylladb/scylladb#17526.

Fix it by recalculating the nodes to wait for in every step of
the `wait_alive` loop.

Although the issue fixed by this PR caused only test flakiness,
it could also manifest in real clusters. It's best to backport this
PR to 5.4 and 6.0.

Fixes scylladb/scylladb#17526

Closes scylladb/scylladb#19387

* github.com:scylladb/scylladb:
  join_token_ring, gossip topology: update obsolete comment
  join_token_ring, gossip topology: fix indentation after previous patch
  join_token_ring, gossip topology: recalculate sync nodes in wait_alive
2024-06-21 10:22:32 +02:00
Piotr Dulikowski
c3536015e4 Merge 'cql3/statement/select_statement: do not parallelize single-partition aggregations' from Michał Jadwiszczak
This patch adds a check for whether an aggregation query is doing a single-partition read and, if so, makes the query skip forward_service so that the request is not parallelized.

Fixes scylladb/scylladb#19349

Closes scylladb/scylladb#19350

* github.com:scylladb/scylladb:
  test/boost/cql_query_test: add test for single-partition aggregation
  cql3/select_statement: do not parallelize single-partition aggregations
2024-06-21 08:50:00 +02:00
Kefu Chai
694fe58d6e .github: add auth and cdc to iwyu's CLEANER_DIR
to avoid future violations of include-what-you-use.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-21 14:29:48 +08:00
Kefu Chai
1a4740ddc0 cdc: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-21 14:29:48 +08:00
Avi Kivity
fdc1449392 treewide: rename flat_mutation_reader_v2 to mutation_reader
flat_mutation_reader_v2 was introduced in a pair of commits in 2021:

  e3309322c3 "Clone flat_mutation_reader related classes into v2 variants"
  08b5773c12 "Adapt flat_mutation_reader_v2 to the new version of the API"

as a replacement for flat_mutation_reader, using range_tombstone_change
instead of range_tombstone to represent range tombstones. See
those commits for more information.

The transition was incremental; the last use of the original
flat_mutation_reader was removed in 2022 in commit

  026f8cc1e7 "db: Use mutation_partition_v2 in mvcc"

In turn, flat_mutation_reader was introduced in 2017 in commit

  748205ca75 "Introduce flat_mutation_reader"

To transition from a mutation_reader that nested rows within
a partition in a separate stream, to a flat reader that streamed
partitions and rows in the same stream.

Here, we reclaim the original name and rename the awkward
flat_mutation_reader_v2 to mutation_reader.

Note that mutation_fragment_v2 remains since we still use the original
for compatibility, sometimes.

Some notes about the transition:

 - files were also renamed. In one case (flat_mutation_reader_test.cc), the
   rename target already existed, so we rename to
    mutation_reader_another_test.cc.

 - a namespace 'mutation_reader' with two definitions existed (in
   mutation_reader_fwd.hh). Its contents were folded into the mutation_reader
   class. As a result, a few #includes had to be adjusted.

Closes scylladb/scylladb#19356
2024-06-21 07:12:06 +03:00
Avi Kivity
185338c8cf Merge 'Reduce TWCS off-strategy space overhead' from Raphael "Raph" Carvalho
Normally, the space overhead for TWCS is 1/N, where N is the number of windows. But during off-strategy, the overhead is 100% because input sstables cannot be released earlier.

Reshaping a TWCS table that takes ~50% of available space can result in system running out of space.

That's fixed by restricting every TWCS off-strategy job to 10% of free space in disk. Tables that aren't big will not be penalized with increased write amplification, as all input (disjoint) sstables can still be compacted in a single round.
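
As an illustration only (this is not the compaction code), the budget idea can be sketched in Python. The function name `plan_reshape_jobs`, the greedy batching and the sample sizes are assumptions made for this example; only the 10% figure comes from the description above:

```python
def plan_reshape_jobs(sstable_sizes, free_space, fraction=0.10):
    """Greedily batch input sstables so each job stays within a space budget.

    Hypothetical helper: the budget is `fraction` of the free disk space.
    A single sstable larger than the budget still gets a job of its own.
    """
    budget = free_space * fraction
    jobs, current, used = [], [], 0
    for size in sstable_sizes:
        # Close the current job once adding this sstable would exceed the budget.
        if current and used + size > budget:
            jobs.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        jobs.append(current)
    return jobs


# A small table fits in a single round, so it pays no extra write amplification.
small_table = plan_reshape_jobs([1, 2, 3], free_space=1000)
# A big table is split into several rounds, each bounded by the budget (25 here).
big_table = plan_reshape_jobs([10, 10, 10, 10], free_space=250)
```

With this shape, the temporary space needed by one job is bounded by the budget rather than by the size of the whole table.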

Fixes #16514.

Closes scylladb/scylladb#18137

* github.com:scylladb/scylladb:
  compaction: Reduce twcs off-strategy space overhead to 10% of free space
  compaction: wire storage free space into reshape procedure
  sstables: Allow to get free space from underlying storage
  replica: don't expose compaction_group to reshape task
2024-06-20 18:51:25 +03:00
Kefu Chai
42b9784650 build: cmake: mark wasm "ALL"
so that "wasm" target is built. "wasm" generates the text format
of wasm code. and these wasm applications are used by the test_wasm
tests.

the rules generated by `configure.py` add these .wat files as a
dependency of `{mode}-build`, which is in turn a dependency of `{mode}`.

in this change, let's mirror this behavior by making `wasm` ALL,
so it is built by the default target.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19391
2024-06-20 18:45:31 +03:00
Kefu Chai
caf1149f11 cql-pytest/test_sstable: do not import unused modules
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19389
2024-06-20 17:14:28 +03:00
Avi Kivity
02cf17f4dc Merge 'Sanitize load_meter API handlers management' from Pavel Emelyanov
The service in question is a pretty small one, but it has an API endpoint that lives in the /storage_service group. Currently when a service starts and has any endpoints that depend on it, the endpoint registration should follow it (#2737). Here's the PR that does it for load meter. Another goal of this change is that http context now has one less dependency onboard.

Closes scylladb/scylladb#19390

* github.com:scylladb/scylladb:
  api: Remove ctx->load_meter dependency
  api: Use local load_meter reference in handlers
  api: Fix indentation after previous patch
  api: Coroutinize load_meter::get_load_map handler
  api: Move load meter handlers
  api: Add set/unset methods for load_meter
2024-06-20 17:07:19 +03:00
Gleb Natapov
7bc05c3880 gossiper: wait for a bootstrapping node to be seen as normal on all nodes before completing initialization
When a node bootstraps it may happen that some nodes still see it as
bootstrapping while the node itself already is in normal state and ready
to serve queries. We want to delay the bootstrap completion until all
nodes see the new node as normal. Piggy back on UP notification to do so
and wait for the node that sent the notification to be seen as normal.

Fixes #18678
2024-06-20 16:37:56 +03:00
Anna Stuchlik
027cf3f47d doc: remove the link to ScyllaDB Google group
The group is no longer active and should be removed from resources.

Closes scylladb/scylladb#19379
2024-06-20 15:31:03 +02:00
Yaron Kaikov
f2705b3887 [action] add github context info for better debugging
It seems that we skip the sync label process between PR and linked
Issues

Adding those debug prints will allow us to understand why

Closes scylladb/scylladb#19393
2024-06-20 16:17:04 +03:00
Gleb Natapov
28c0a27467 Wait for booting node to be marked UP before completing booting.
Currently a node does not wait to be marked UP by other nodes before
completing booting, which creates a usability issue: during a rolling restart
it is not enough to wait for the local CQL port to be opened before
restarting the next node; it is also necessary to check that all other
nodes already see this node as alive. Otherwise, if the next node is
restarted, some nodes may see two nodes as dead instead of one.

This patch improves the situation by making sure that the boot process does
not complete until all other nodes see the booting one as alive.
This is still a best effort thing: if some nodes are unreachable or
gossiper propagation takes too much time the boot process continues
anyway.

Fixes scylladb/scylladb#19206
2024-06-20 14:55:40 +03:00
Pavel Emelyanov
de80094815 Merge 'treewide: remove unused operator<<' from Kefu Chai
since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s.
there are more occurrences of unused operator<< in the tree, but let's do the cleanup piecemeal.

---

this is a cleanup, so no need to backport

Closes scylladb/scylladb#19346

* github.com:scylladb/scylladb:
  types: remove unused operator<<
  node_ops: remove unused operator<<
  lang: remove unused operator<<
  gms: remove unused operator<<
  dht: remove unused operator<<
  test: do not use operator<< for std::optional
2024-06-20 13:18:59 +03:00
Pavel Emelyanov
873d76c02b api: Remove ctx->load_meter dependency
Now the API uses captured reference and the explicit dependency is not
needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-20 12:38:28 +03:00
Pavel Emelyanov
d85e70ef98 api: Use local load_meter reference in handlers
Now it uses ctx.lm dependency, but the idiomatic way for API is to use
the argument one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-20 12:37:48 +03:00
Pavel Emelyanov
bc5e360066 api: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-20 12:37:39 +03:00
Pavel Emelyanov
e54f651beb api: Coroutinize load_meter::get_load_map handler
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-20 12:37:18 +03:00
Pavel Emelyanov
40c178bee2 api: Move load meter handlers
Now they are in storage service set/unset helper, but there's the
dedicated set/unset pair for meter's endpoints.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-20 12:36:38 +03:00
Pavel Emelyanov
724d62aa87 api: Add set/unset methods for load_meter
The meter is a pretty small service and its API is also tiny. Still, it's a
standalone top-level service, and its API should come next to it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-20 12:35:58 +03:00
Botond Dénes
b09196ac49 Merge 'tasks: fix tasks abort' from Aleksandra Martyniuk
Currently if task_manager::task::impl::abort preempts before children are recursively aborted and then the task gets unregistered, we hit use after free since abort uses children vector which is no longer alive.

Modify abort method so that it goes over all tasks in task manager and aborts those with the given parent.

Fixes: #19304.

Requires backport to all versions containing task manager

Closes scylladb/scylladb#19305

* github.com:scylladb/scylladb:
  test: add test for abort while a task is being unregistered
  tasks: fix tasks abort
2024-06-20 12:09:30 +03:00
Kefu Chai
1a724f22f9 mutation: silence false alarm from clang-tidy
before this change, clang-tidy warned because it appears that we move
from `p2` in each iteration, so that the succeeding iterations would be
moving from an already moved-from `p2`.

but we only move from `p2._static_row` in the first iteration when the
dest `mutation_partition` instance's static row is empty. and in the
succeeding iterations, the dest `mutation_partition` instance's static
row is not empty anymore if it is set. so, this is a false alarm.

in this change, we silence this warning. another option is to extract
the single-shot mutation out of the loop, and pass the `std::move(p2)`
only for the single-shot mutation, but that'd be a much more intrusive
change. we can revisit this later.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19331
2024-06-20 12:05:20 +03:00
Kefu Chai
9f0b60c7a0 rust: disable incremental build for release build
so that the release build is reproducible. a reproducible build helps
developers to perform postmortem debugging.

Fixes #19225
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19374
2024-06-20 12:01:14 +03:00
Patryk Jędrzejczak
bcc0a352b7 join_token_ring, gossip topology: update obsolete comment
The code mentioned in the comment has already been added. We change
the comment to prevent confusion.
2024-06-20 10:59:50 +02:00
Patryk Jędrzejczak
7735bd539b join_token_ring, gossip topology: fix indentation after previous patch 2024-06-20 10:59:50 +02:00
Patryk Jędrzejczak
017134fd38 join_token_ring, gossip topology: recalculate sync nodes in wait_alive
Before this patch, if we booted a node just after removing
a different node, the booting node may still see the removed node
as NORMAL and wait for it to be UP, which would time out and fail
the bootstrap.

This issue caused scylladb/scylladb#17526.

Fix it by recalculating the nodes to wait for in every step of
the `wait_alive` loop.
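
A minimal Python sketch of the idea, not the actual implementation: the callbacks `get_normal_nodes` and `is_up`, the step limit and the node names are all hypothetical. The point is only that the set of nodes to wait for is rebuilt on every step instead of being computed once before the loop:

```python
def wait_alive(get_normal_nodes, is_up, max_steps):
    """Return the step at which every NORMAL node was seen UP, or None on timeout."""
    for step in range(max_steps):
        # Recalculate the sync nodes on every step, so a node that was
        # removed in the meantime drops out of the set we wait for.
        sync_nodes = set(get_normal_nodes())
        if all(is_up(node) for node in sync_nodes):
            return step
    return None


# Simulated view of the cluster: the first lookup still lists the removed
# node as NORMAL, later lookups no longer do.
views = [{"a", "b", "removed"}, {"a", "b"}]


def get_normal_nodes():
    return views.pop(0) if len(views) > 1 else views[0]


up_nodes = {"a", "b"}
finished_at = wait_alive(get_normal_nodes, lambda node: node in up_nodes, max_steps=5)
```

Had the set been computed once from the first view, the loop would keep waiting for "removed" until the step limit was reached.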
2024-06-20 10:59:49 +02:00
Anna Stuchlik
680405b465 doc: separate Enterprise- from OSS-only content
This commit adds files that contain Open Source-specific information
and includes these files with the .. scylladb_include_flag:: directive.
The files include a) a link and b) Table of Contents.

The purpose of this update is to enable adding
Open Source/Enterprise-specific information in the Reference section.

Closes scylladb/scylladb#19362
2024-06-20 11:58:32 +03:00
Piotr Dulikowski
75441ee120 Merge 'mv: fix value of the gossiped view update backlog' from Wojciech Mitros
Currently, when calculating the view update backlog for gossip,
we start with `db::view::update_backlog()` and compare it to backlogs
from all shards. However, this backlog can't be compared to other
backlogs - it has size 0 and we compare the fraction current/size
when comparing backlogs, causing us to compare with `NaN`.
This patch fixes it by starting the comparisons with an empty backlog.

The patch introducing this issue (f70f774e40) wasn't backported, so this one doesn't need to be either

Closes scylladb/scylladb#19247

* github.com:scylladb/scylladb:
  mv: make the view update backlog unmodifiable
  mv: fix value of the gossiped view update backlog
2024-06-20 06:27:11 +02:00
Piotr Dulikowski
78a40dbe2c Merge 'cql: remove global_req_id from schema_altering_statement' from Marcin Maliszkiewicz
Such field is no longer needed as the information comes
directly from group0_batch.

Fixes scylladb/scylladb#19365

Backport: no, we don't backport code cleanups

Closes scylladb/scylladb#19366

* github.com:scylladb/scylladb:
  cql: remove global_req_id from schema_altering_statement
  cql: switch alter keyspace prepare_schema_mutations to use group0_batch
2024-06-20 06:21:48 +02:00
Dawid Medrek
c56de90a26 test/boost/hint_test.cc: Add missing parse() callback
Before these changes, compilation was failing with the following
error:

In file included from test/boost/hint_test.cc:12:
/usr/include/fmt/ranges.h:298:7: error: no member named 'parse' in 'fmt::formatter<db::hints::sync_point::host_id_or_addr>'
  298 |     f.parse(ctx);
      |     ~ ^

We add the missing callback.

Closes scylladb/scylladb#19375
2024-06-19 23:19:33 +02:00
Wojciech Mitros
cde14a5788 mv: make the view update backlog unmodifiable
Currently, a view update backlog may reach an invalid state, when
its max is 0 and its relative_size() is NaN as a result. This can
be achieved either by constructing the backlog with a 0 max or by
modifying the max of an existing backlog. In particular, this
happens when creating the backlog using the default constructor.

In this patch the default constructor is deleted and a check that
the max is different from 0 is added to the remaining constructor.
If the check fails, we construct an empty backlog instead, to handle
the possibility of getting an invalid backlog sent from a node with
a version that's missing this check. Additionally, we make the
backlog's members private, exposing them only through const getters.
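
The shape of such a type can be sketched in Python, purely as an illustration. The class name `UpdateBacklog`, its field names, and the choice of "zero current with the largest possible max" as the empty backlog are assumptions of this sketch, not taken from the patch:

```python
import sys
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class UpdateBacklog:
    """Immutable backlog; a zero max is replaced by an empty backlog."""

    current: int
    max: int

    def __post_init__(self):
        if self.max == 0:
            # Assumed representation of an "empty" backlog for this sketch.
            object.__setattr__(self, "current", 0)
            object.__setattr__(self, "max", sys.maxsize)

    def relative_size(self) -> float:
        # Safe: max can never be 0 once construction has finished.
        return self.current / self.max


from_old_node = UpdateBacklog(current=5, max=0)  # e.g. received from an older node
half_full = UpdateBacklog(current=50, max=100)

try:
    half_full.max = 0  # mutation after construction is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because there is no way to reach a zero max, `relative_size()` never yields NaN for a value of this type.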
2024-06-19 19:44:57 +02:00
Pavel Emelyanov
5fe4290f66 gitattributes: Mark swagger .js files as binary
The goal is the same as in 29768a2d02 (gitattributes: Mark *.svg as
binary) -- prevent grep from searching patterns in those files.

Although those files are, in fact, javascript code, the way they are
formatted is not suitable for human reading, so it's unlikely that anyone
would be interested in grep-ing patterns in them. At the same time, those
files consist of very long lines, so if a grep finds a pattern in one
of those, the output is spoiled.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19357
2024-06-19 15:07:56 +03:00
Botond Dénes
9d1fa828be Merge 'utils/large_bitset: replace reserve_partial with utils::reserve_gently' from Lakshmi Narayanan Sreethar
Replace the reserve_partial loop in large_bitset constructor with a new function - reserve_gently() that can reserve memory without stalling by repeatedly calling reserve_partial() method of the passed container.

Closes scylladb/scylladb#19361

* github.com:scylladb/scylladb:
  utils/large_bitset: replace reserve_partial with utils::reserve_gently
  utils/stall_free: introduce reserve_gently
2024-06-19 14:31:59 +03:00
Michał Jadwiszczak
8eb5ca8202 test/boost/cql_query_test: add test for single-partition aggregation 2024-06-19 09:24:17 +02:00
Piotr Dulikowski
7567b87e72 Merge 'auth: reuse roles select query during cache population' from Marcin Maliszkiewicz
With a big number of shards in the cluster (e.g. 500+), the periodic
cache refresh causes high load on the role_permissions table
(e.g. 1k op/s). The load on roles table is amplified because to populate
single entry in the cache we do several selects on roles table. Some
of this can't be avoided because roles are arranged in a tree-like
structure where permissions can be inherited.

This patch tries to reuse queries which are simply duplicated. It should
reduce the load on roles table by up to 50%.
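
A rough Python sketch of the general idea (reusing a result instead of issuing the same select again while populating one cache entry). The role names, the `ROLES` table stand-in and the helper names are hypothetical, and the real code is structured differently:

```python
# Hypothetical stand-in for the roles table: role -> roles it is a member of.
ROLES = {
    "alice": ["dev", "ops"],
    "dev": ["staff"],
    "ops": ["staff"],
    "staff": [],
}

query_count = 0


def select_role(name):
    """Pretend to run one SELECT against the roles table."""
    global query_count
    query_count += 1
    return ROLES[name]


def collect_roles(root):
    """Walk the role tree, running each role's select at most once."""
    fetched = {}

    def visit(name):
        if name in fetched:
            return  # reuse the earlier result instead of selecting again
        fetched[name] = select_role(name)
        for granted in fetched[name]:
            visit(granted)

    visit(root)
    return set(fetched)


all_roles = collect_roles("alice")
```

In this diamond-shaped hierarchy a naive walk would select "staff" twice (five selects in total), while the memoized walk needs four. Selects for distinct roles in the tree still have to happen.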

Fixes scylladb/scylladb#19299

Closes scylladb/scylladb#19300

* github.com:scylladb/scylladb:
  auth: reuse roles select query during cache population
  auth: coroutinize service::get_uncached_permissions
  auth: coroutinize service::has_superuser
2024-06-19 07:53:47 +02:00
Marcin Maliszkiewicz
56707e2965 cql: remove global_req_id from schema_altering_statement
Such field is no longer needed as the information comes
directly from group0_batch.

Fixes scylladb/scylladb#19365
2024-06-18 20:26:09 +02:00
Lakshmi Narayanan Sreethar
9ad800cfb9 utils/large_bitset: replace reserve_partial with utils::reserve_gently
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-18 23:36:30 +05:30
Lakshmi Narayanan Sreethar
31414f54c6 utils/stall_free: introduce reserve_gently
Add reserve_gently() that can reserve memory without stalling by
repeatedly calling reserve_partial() method of the passed container.
Update the comments of existing reserve_partial() methods to mention
this newly introduced reserve_gently() wrapper.
Also, add test to verify the functionality.
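
To illustrate the pattern only, here is a small Python/asyncio analogue. It is not the C++ code: `ChunkedBuffer`, its `reserve_partial()` semantics (grow by at most one chunk per call) and the chunk size are assumptions of this sketch:

```python
import asyncio


class ChunkedBuffer:
    """Toy container whose capacity grows by at most `max_chunk` per call."""

    def __init__(self, max_chunk):
        self.capacity = 0
        self.max_chunk = max_chunk

    def reserve_partial(self, wanted):
        # Do a bounded amount of work per call.
        self.capacity = min(wanted, self.capacity + self.max_chunk)


async def reserve_gently(container, wanted):
    """Reserve `wanted` capacity in bounded steps, yielding between steps."""
    steps = 0
    while container.capacity < wanted:
        container.reserve_partial(wanted)
        steps += 1
        await asyncio.sleep(0)  # give other tasks a chance to run
    return steps


buf = ChunkedBuffer(max_chunk=128)
steps_taken = asyncio.run(reserve_gently(buf, 1000))
```

The caller writes a single `await reserve_gently(...)` instead of repeating the loop at every call site.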

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-18 23:36:30 +05:30
Marcin Maliszkiewicz
685aecde61 cql: switch alter keyspace prepare_schema_mutations to use group0_batch
This is needed to simplify the code in the following commit.
2024-06-18 19:54:55 +02:00
Michał Jadwiszczak
e9ace7c203 cql3/select_statement: do not parallelize single-partition aggregations
Currently, reads whose WHERE clause restricts them to a single
partition are unnecessarily parallelized.

This commit checks for this condition and makes single-partition
reads skip forward_service.
2024-06-18 19:21:32 +02:00
Pavel Emelyanov
f7d5d4877c Merge '[test.py] Fix several issues in log gathering' from Andrei Chekun
Related: https://github.com/scylladb/scylladb/issues/17851

Fix the issue that test logs were not deleted
Fix the issue that the URL to the failed test directory was incorrectly shown even when artifacts_dir_url option was not provided
Fix the issue that there were no node logs when it failed to join the cluster

Closes scylladb/scylladb#19115

* github.com:scylladb/scylladb:
  [test.py] Fix logs had multiplication of lines
  [test.py] Fix log not deleted
  [test.py] Fix log for failed node was not added to failed directory
  [test.py] Fix URL for failed logs directory in CI
2024-06-18 15:37:29 +03:00
Aleksandra Martyniuk
50cb797d95 test: add test for abort while a task is being unregistered 2024-06-18 13:41:51 +02:00
Aleksandra Martyniuk
3463f495b1 tasks: fix tasks abort
Currently if task_manager::task::impl::abort preempts before children
are recursively aborted and then the task gets unregistered, we hit
use after free since abort uses children vector which is no
longer alive.

Modify abort method so that it goes over all tasks in task manager
and aborts those with the given parent.
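
A simplified Python model of the new approach, for illustration. The `Task` and `TaskManager` classes here are hypothetical stand-ins and ignore preemption and sharding; they only show that the children are found by scanning the manager's task map instead of a vector owned by the parent:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    task_id: str
    parent_id: Optional[str] = None
    aborted: bool = False


class TaskManager:
    def __init__(self):
        self.tasks: Dict[str, Task] = {}

    def register(self, task):
        self.tasks[task.task_id] = task

    def unregister(self, task_id):
        self.tasks.pop(task_id, None)

    def abort(self, task_id):
        task = self.tasks.get(task_id)
        if task is not None:
            task.aborted = True
        # Look the children up in the manager; nothing here depends on
        # state owned by the (possibly already unregistered) parent.
        children = [t.task_id for t in self.tasks.values() if t.parent_id == task_id]
        for child_id in children:
            self.abort(child_id)


manager = TaskManager()
manager.register(Task("root"))
manager.register(Task("child", parent_id="root"))
manager.register(Task("grandchild", parent_id="child"))

manager.unregister("root")  # the parent goes away before the abort completes
manager.abort("root")       # the descendants are still found and aborted
```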

Fixes: #19304.
2024-06-18 13:39:29 +02:00
Botond Dénes
2123b22526 Merge 'doc: add 6.x.y to 6.x.z and remove 5.x.y to 5.x.z upgrade guide' from Anna Stuchlik
This PR removes the 5.x.y to 5.x.z upgrade guide and adds the 6.x.y to 6.x.z upgrade guide.

The previous maintenance upgrade guides, such as from 5.x.y to 5.x.z, consisted of several documents - separate for each platform.
The new 6.x.y to 6.x.z upgrade guide is one document - there are tabs to include platform-specific information (we've already done it for other upgrade guides as one generic document is more convenient to use and maintain).

I did not modify the procedures. At some point, they have been reviewed for previous upgrade guides.

Fixes https://github.com/scylladb/scylladb/issues/19322

-  This PR must be backported to branch-6.0, as it adds 6.x specific content.

Closes scylladb/scylladb#19340

* github.com:scylladb/scylladb:
  doc: remove the 5.x.y to 5.x.z upgrade guide
  doc: add the 6.x.y to 6.x.z upgrade guide-6
2024-06-18 14:24:38 +03:00
Wojciech Mitros
1de5566cfa mv: fix value of the gossiped view update backlog
Currently, when calculating the view update backlog for gossip,
we start with `db::view::update_backlog()` and compare it to backlogs
from all shards. However, this backlog can't be compared to other
backlogs - it has size 0 and we compare the fraction current/size
when comparing backlogs, causing us to compare with `NaN`.
This patch fixes it by starting the comparisons with an empty backlog.
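
Why a NaN starting point breaks the comparison can be shown with a short Python sketch. The `Backlog` class, the `pick_largest` fold and the sample numbers are assumptions for illustration; the division helper mimics floating-point semantics, where 0/0 is NaN:

```python
import math
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Backlog:
    current: int
    max: int

    def relative_size(self) -> float:
        if self.max == 0:
            return math.nan  # mimic floating-point 0/0
        return self.current / self.max


def pick_largest(start, others):
    """Fold over per-shard backlogs, keeping the relatively largest one."""
    best = start
    for backlog in others:
        # Any comparison with NaN is False, so a NaN `best` is never replaced.
        if backlog.relative_size() > best.relative_size():
            best = backlog
    return best


shards = [Backlog(10, 100), Backlog(90, 100)]

broken = pick_largest(Backlog(0, 0), shards)           # starts from NaN
fixed = pick_largest(Backlog(0, sys.maxsize), shards)  # starts from an empty backlog
```

Starting from a backlog whose relative size is 0.0 lets the fullest shard win, whereas the NaN start is returned unchanged.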
2024-06-18 13:15:18 +02:00
Kefu Chai
87247c6542 .github: add workflow to build with latest seastar
so we can be aware of whether scylla builds with seastar master HEAD,
and be prepared if a build failure is found.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19135
2024-06-18 13:34:43 +03:00
Andrei Chekun
6a4b441bf2 [test.py] Fix logs had multiplication of lines
Since the test name was not unique across the run, using the --repeat option created several handlers for the same file. With this change the test name, and accordingly the log name, differs between repeats of the same test. Remove mode from the test name since it's already in the mode directory.
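
The effect is easy to reproduce with Python's standard `logging` module. The logger names and the in-memory streams below are made up for the demonstration and are unrelated to the names test.py uses:

```python
import io
import logging


def attach(logger_name, stream):
    """Attach one more handler writing to `stream` to the named logger."""
    logger = logging.getLogger(logger_name)  # same name -> same logger object
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    return logger


# Same name for every repeat: the handlers pile up on one logger.
shared_stream = io.StringIO()
for repeat in range(2):
    same_name_logger = attach("demo_same_name", shared_stream)
same_name_logger.info("hello")
duplicated_lines = shared_stream.getvalue().count("hello")

# Unique name per repeat: each logger has exactly one handler.
unique_stream = io.StringIO()
unique_loggers = [attach(f"demo_unique.{repeat}", unique_stream) for repeat in range(2)]
unique_loggers[0].info("hello")
single_lines = unique_stream.getvalue().count("hello")
```

Here one message is written twice through the shared name and once through the unique name.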
2024-06-18 11:14:07 +02:00
Andrei Chekun
b01a5f9bd9 [test.py] Fix log not deleted
One of the created log files was never deleted, because there was no delete command for it. The unlink is now done explicitly at a later stage, after removing the handler that writes to this file, to avoid the possibility that something is written after the file is removed.
2024-06-18 11:14:01 +02:00
Kefu Chai
0a74d45425 build: cmake: add commitlog_cleanup_test
in 94cdfcaa94, we added commitlog_cleanup_test to `configure.py`,
but didn't add it to the CMake building system.

in this change, let's add it to the CMake building system.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19314
2024-06-18 12:12:28 +03:00
Kefu Chai
68ef7dda79 config: correct the comment on printable_to_json()
seastar::format() does not use operator<< under the hood, it uses
{fmt}, so update the comment accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19315
2024-06-18 12:08:59 +03:00
Nadav Har'El
2ec1e0f0d5 test/cql-pytest: tests verifying UUID sort order
In issue #15561 some doubts were raised regarding the way ScyllaDB sorts
UUID values. This patch adds a heavily-commented cql-pytest test that
helps understand - and verify that understanding of - the way Scylla sorts
UUIDs, and shows there is some reason in the madness (in particular,
Version 1 UUIDs (time uuids) are sorted like timeuuids, and not as byte
arrays).
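
To illustrate why the two orders differ for version 1 UUIDs, here is a small sketch using only Python's standard `uuid` module. It is not the cql-pytest test from this patch. `make_v1()` and `timestamp_key()` are hypothetical helpers, and the sketch compares only the embedded timestamp, so it ignores whatever tie-breaking rules the real comparison applies.

```python
import uuid


def make_v1(timestamp, node=0x0123456789AB):
    """Build a version 1 UUID from a 60-bit timestamp (hypothetical helper)."""
    time_low = timestamp & 0xFFFFFFFF                         # lowest 32 bits come first in the bytes
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000   # version nibble = 1
    # 0x80 sets the RFC 4122 variant bits; the clock sequence is zero.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, node))


def timestamp_key(u):
    """Sort key that orders version 1 UUIDs by their embedded timestamp only."""
    return u.time


early = make_v1(0x0000FFFFFFFF)   # time_low is all ones, time_mid is zero
late = make_v1(0x000100000000)    # one tick later: time_low wraps to zero, time_mid is one

by_bytes = sorted([early, late], key=lambda u: u.bytes)
by_time = sorted([early, late], key=timestamp_key)
```

Because the low 32 bits of the timestamp are stored first, the byte-wise order puts `late` before `early`, while the timestamp order puts `early` first.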

The new tests check the different cases (see the comments in the test),
and as usual for cql-pytest tests - they pass also on Cassandra, which
allows us to confirm that the sort order we used is identical to the one
used by Cassandra and not something that Scylla mis-implemented.

Having this test in our suite will also ensure that the UUID ordering
never changes accidentally in the future. If it ever changes, it can
break access to existing tables that use UUID clustering keys, so
it shouldn't change.

Fixes #15561

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19343
2024-06-18 12:05:30 +03:00
Pavel Emelyanov
147552c34a Merge 'configurable maintenance (streaming) semaphore count resource limit' from Botond Dénes
Make the count resources on the maintenance (streaming) semaphore live-updatable via config. This will allow us to improve repair speed on mixed-shard clusters, where we suspect that reader thrashing -- due to the combination of a high number of readers on each shard and a very conservative reader count limit (10) -- is the main cause of the slowness.
Making this count limit configurable allows us to start experimenting with this fix, without committing to a count limit increase (or removal), addressing the pain in the field.

Refs: #18269

No OSS backport needed.

Closes scylladb/scylladb#19248

* github.com:scylladb/scylladb:
  replica/database: wire in maintenance_reader_concurrency_semaphore_count_limit
  db/config: introduce maintenance_reader_concurrency_semaphore_count_limit
  reader_concurrency_semaphore: make count parameter live-update
2024-06-18 12:02:24 +03:00
Gleb Natapov
fb764720d3 topology coordinator: add more trace level logging for debugging
Add more logging that provides more visibility into what happens during
topology loading.

Message-ID: <ZnE5OAmUbExVZMWA@scylladb.com>
2024-06-18 10:34:03 +02:00
Botond Dénes
1acc57e19d Merge 'schema: Make "describe" use extensions to string' from Calle Wilund
Fixes #19334

Current impl uses hardcoded printing of a few extensions.
Instead, use extension options to string and print all.

Note: required to make enterprise CI happy again.

Closes scylladb/scylladb#19337

* github.com:scylladb/scylladb:
  schema: Make "describe" use extensions to string
  schema_extensions: Add an option to string method
2024-06-18 11:28:11 +03:00
Botond Dénes
495f7160da Update tools/jmx submodule
* tools/jmx 53696b13...3328a229 (1):
  > scylla-apiclient: add missing license for SBOM report
2024-06-18 11:11:57 +03:00
Kefu Chai
fd0de02b81 types: remove unused operator<<
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-18 15:55:22 +08:00
Kefu Chai
2c1a3e7191 node_ops: remove unused operator<<
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-18 15:55:22 +08:00
Kefu Chai
84f0fd6823 lang: remove unused operator<<
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-18 15:55:22 +08:00
Kefu Chai
ec5f0fccce gms: remove unused operator<<
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-18 15:55:22 +08:00
Kefu Chai
51d686ea9f dht: remove unused operator<<
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-18 11:26:20 +08:00
Kefu Chai
ef0f4eaef2 test: do not use operator<< for std::optional
we don't provide it anymore. even if some existing type provides a
constructor accepting an `optional<>`, and hence can be formatted
using operator<< after converting it, we should not rely on this
behavior, as it is fragile.

so, in this change, we switch to `fmt::print()` to use {fmt} to
print `optional<>`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-18 10:41:48 +08:00
Andrei Chekun
3c921d5712 Add allure pytest adaptor to the toolchain
Add allure-pytest pip dependency to be able to use it for generating the allure report later.
Main benefits of the allure report:
1. Group test failures
2. Possibility to attach log files to the test itself
3. Timeline of test run
4. Test description on the report
5. Search by test name or tag

[avi: regenerate toolchain]

Closes scylladb/scylladb#19335
2024-06-17 23:17:01 +03:00
Nadav Har'El
4faceeaa33 Merge 'treewide: drop thrift support' from Kefu Chai
thrift support was deprecated since ScyllaDB 5.2

> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.

so let's drop it. in this change,

* thrift protocol support is dropped
* all references to thrift support in document are dropped
* the "thrift_version" column in system.local table is preserved for backward compatibility, as we could load from an existing system.local table which still contains this column, so we need to write this column as well.
* "/storage_service/rpc_server" is only preserved for backward compatibility with java-based nodetool.

Fixes #3811
Fixes #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

- [x] not a fix, no need to backport

Closes scylladb/scylladb#18453

* github.com:scylladb/scylladb:
  config: expand on rpc_keepalive's description
  api: s/rpc/thrift/
  db/system_keyspace: drop thrift_version from system.local table
  transport: do not return client_type from cql_server::connection::make_client_key()
  treewide: drop thrift support
2024-06-17 22:36:49 +03:00
Andrei Chekun
8845978ec5 [test.py] Unbreak cql-pytest and alternator
Provide the possibility to run pytest without explicitly providing the mode parameter

Closes scylladb/scylladb#19342
2024-06-17 21:41:09 +03:00
Piotr Dulikowski
85128c5b10 Merge 'cql3: always return created event in create keyspace statement' from Marcin Maliszkiewicz
cql3: always return created event in create ks/table/type/view statement

If multiple clients concurrently issue CREATE KEYSPACE IF NOT EXISTS
and later USE KEYSPACE, the schema in a driver's session can end up
out of sync, because the driver syncs only when it receives a special
message in the CREATE KEYSPACE response.

Similar situation occurs with other schema change statements.

In this patch we fix only create keyspace/table/type/view statements
by always sending created event. Behavior of any other schema altering
statements remains unchanged.

Fixes https://github.com/scylladb/scylladb/issues/16909

**backport: no, it's not a regression**

Closes scylladb/scylladb#18819

* github.com:scylladb/scylladb:
  cql3: always return created event in create ks/table/type/view statement
  cql3: auth: move auto-grant closer to resource creation code
  cql3: extract create ks/table/type/view event code
2024-06-17 19:58:38 +02:00
Anna Stuchlik
ea35982764 doc: remove the 5.x.y to 5.x.z upgrade guide
This commit removes the upgrade guide from 5.x.y to 5.x.z.
It is redundant in version 6.x.
2024-06-17 17:28:39 +02:00
Anna Stuchlik
ead201496d doc: add the 6.x.y to 6.x.z upgrade guide-6
This commit adds the upgrade guide from 6.x.y to 6.x.z.
2024-06-17 17:23:00 +02:00
Marcin Maliszkiewicz
95673907ca auth: reuse roles select query during cache population
With a big number of shards in the cluster (e.g. 500+), the periodic
cache refresh puts a high load on the role_permissions table
(e.g. 1k op/s). The load on the roles table is amplified because, to
populate a single entry in the cache, we do several selects on the
roles table. Some of this can't be avoided because roles are arranged
in a tree-like structure where permissions can be inherited.

This patch tries to reuse queries which are simply duplicated. It should
reduce the load on roles table by up to 50%.

Fixes scylladb/scylladb#19299
2024-06-17 16:46:33 +02:00
Marcin Maliszkiewicz
547eb6d59b auth: coroutinize service::get_uncached_permissions 2024-06-17 16:46:28 +02:00
Marcin Maliszkiewicz
00a24507cb auth: coroutinize service::has_superuser 2024-06-17 16:46:22 +02:00
Kefu Chai
a5a5ca0785 auth: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19312
2024-06-17 17:33:55 +03:00
Yaniv Michael Kaul
9b0eb82175 dist/common/scripts/scylla_coredump_setup: fix typo
Does not able -> Unable

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#19328
2024-06-17 17:33:46 +03:00
Kefu Chai
b64126fe1c db: remove unused operator<<
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19313
2024-06-17 17:33:31 +03:00
Calle Wilund
73abc56d79 schema: Make "describe" use extensions to string
Fixes #19334

Current impl uses hardcoded printing of a few extensions.
Instead, use extension options to string and print all.
2024-06-17 13:30:24 +00:00
Calle Wilund
d27620e146 schema_extensions: Add an option to string method
Allow an extension to describe itself as the CQL property
string that created it (and is serialized to schema tables)

Only paxos extension requires override.
2024-06-17 13:30:10 +00:00
Gleb Natapov
09556bff0e gossiper: move gossip verbs to the idl 2024-06-17 12:47:17 +03:00
Kefu Chai
7e9550e9f9 test/py/minio_server.py: do not reference non-existent old_env
in 51c53d8db6, we check `self.old_env[env]` for None, but there
are chances `self.old_env` does not contain a value with `env`.
in that case, we'd have following failure:

```
Traceback (most recent call last):
  File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 307, in <module>
    asyncio.run(main())
  File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 304, in main
    await server.stop()
  File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 274, in stop
    self._unset_environ()
  File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 211, in _unset_environ
    if self.old_env[env] is not None:
       ~~~~~~~~~~~~^^^^^
KeyError: 'S3_CONFFILE_FOR_TEST'
```

this happens if we run `pylib/minio_server.py` as a standalone
application.

in this change, instead of getting the value with index, we use
`dict.get()`, so that it does not throw when the dict does not
have the given key.
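
The difference between indexing and `dict.get()` can be sketched as follows. `restore_environ()` is a hypothetical helper written for illustration, not the actual `_unset_environ()` method. Dropping a variable that has no saved value is an assumption about the intended behavior.

```python
import os


def restore_environ(old_env, names, environ=os.environ):
    """Restore saved values for `names`; drop variables that had no saved value."""
    for name in names:
        # old_env[name] would raise KeyError when nothing was saved for `name`;
        # dict.get() returns None in that case instead.
        saved = old_env.get(name)
        if saved is not None:
            environ[name] = saved
        else:
            # pop() with a default does not raise if the variable is already absent.
            environ.pop(name, None)
```

For example, calling `restore_environ({}, ["S3_CONFFILE_FOR_TEST"])` with an empty saved environment returns normally, which corresponds to the standalone case from the traceback above.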

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19291
2024-06-17 12:42:43 +03:00
Andrei Chekun
293cf355df [test.py] Fix log for failed node was nod added to failed directory
If something goes wrong while a node is being added to the cluster, the node is not registered as a part of the cluster. As a result, the logs for such a node are missing during log gathering.
2024-06-17 11:16:55 +02:00
Andrei Chekun
7bbb8d9260 [test.py] Fix URl for failed logs directory in CI
Incorrect passing of the artifacts_dir_url parameter from test.py to pytest leads to None being passed as a string, so pytest generates an incorrect URL.
2024-06-17 11:16:48 +02:00
Aleksandra Martyniuk
fb3153d253 api: task_manager: delete module from full_task_status
Delete module field from full_task_status as it is unused.

Closes scylladb/scylladb#18853
2024-06-17 09:03:19 +03:00
Nadav Har'El
9fc70a28ca test: unflake test test_alternator_ttl_scheduling_group
This test in topology_experimental_raft/test_alternator.py wants to
check that during Alternator TTL's expiration scans, ALL of the CPU was
used in the "streaming" scheduling group and not in the "statement"
scheduling group. But to allow for some fluke requests (e.g., from the
driver), the test actually allows work in the statement group to be
up to 1% of the work.

Unfortunately, in one test run - a very slow debug+aarch64 run - we
saw the work on the statement group reach 1.4%, failing the test.
I don't know exactly where this work comes from, perhaps the driver,
but before this bug was fixed we saw more than 58% of the work in the
wrong scheduling group, so neither 1% nor 1.4% is a sign that the bug
came back. In fact, let's just change the threshold in the test to 10%,
which is also much lower than the pre-fix value of 58%, so is still a
valid regression test.

Fixes #19307

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19323
2024-06-17 08:39:38 +03:00
Yaron Kaikov
996be2e235 dbuild: update toolchain to get latest scylla-api-client
a new Scylla-api-client was released to get proper license information
in our SBOM report.

Refs: https://github.com/scylladb/scylla-jmx/issues/237

Closes scylladb/scylladb#19324
2024-06-17 08:37:49 +03:00
Dawid Medrek
670830091c db/hints: Use dedicated functions to lock a shared mutex
Seastar has functions implementing locking a `seastar::shared_mutex`.
We should use those now instead of reimplementing them in Scylla.

Closes scylladb/scylladb#19253
2024-06-14 20:31:37 +02:00
Kamil Braun
bbb424a757 Merge '[test.py] Add uniqueness to the test name' from Andrei Chekun
In CI, tests are always executed with the --repeat=3 option, which generates 3 test results with the same name. The Junit plugin in CI cannot correctly distinguish between these results. When there are two passes and one failure, the link to the test result is sometimes redirected to the wrong one, because the test name is the same. To fix this, a ReportPlugin is added that modifies the test case name during junit report generation, adding the mode and run id to the test name.

Fixes: https://github.com/scylladb/scylladb/issues/17851

Fixes: https://github.com/scylladb/scylladb/issues/15973

Closes scylladb/scylladb#19235

* github.com:scylladb/scylladb:
  [test.py] Add uniqueness to the test name
  [test.py] Refactor alternator, nodetool, rest_api
2024-06-14 17:59:07 +02:00
Botond Dénes
5b87fa4cea Merge 'doc: document keyspace and table for nodetool ring' from Kefu Chai
these two arguments are critical when tablets are enabled.

Fixes https://github.com/scylladb/scylladb/issues/19296

---

6.0 is the first release with tablets support. and `nodetool ring` is an important tool to understand the data distribution. so we need to backport this document change to 6.0

Closes scylladb/scylladb#19297

* github.com:scylladb/scylladb:
  doc: document `keyspace` and `table` for `nodetool ring`
  doc: replace tab with space
2024-06-14 16:04:23 +03:00
Kefu Chai
ea3b8c5e4f doc: document keyspace and table for nodetool ring
these two arguments are critical when tablets are enabled.

Fixes #19296
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-14 21:01:14 +08:00
Botond Dénes
c563acdbe9 Merge 'build: cmake: use path to be compatible with CI' from Kefu Chai
this change is created in the same spirit as 1186ddef16, which updated the rule for generating the stripped dist pkg, but failed to update the one for generating the unstripped dist pkg. that's why we have a build failure when the workflow is looking for the unstripped tar.gz:

```
08:02:47  ++ ls /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz
08:02:47  ls: cannot access '/jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz': No such file or directory`
```

so, in this change, we fix the path.

Refs #2717

---

* cmake related change, hence no need to backport.

Closes scylladb/scylladb#19290

* github.com:scylladb/scylladb:
  build: cmake: use per-mode path for building unstripped_dist_pkg
  build: cmake: use path to be compatible with CI
2024-06-14 15:35:26 +03:00
Kefu Chai
d498ca3afa test: randomized_nemesis_test: use BOOST_REQUIRE_* when appropriate
for better debuggability.

Refs #17030
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19282
2024-06-14 15:33:07 +03:00
Kefu Chai
d887fd2402 build: use default modes when no modes are selected
when `--use-cmake` option is passed to `configure.py`,

- before this change, all modes are selected if no
  `--mode` options are passed to `configure.py`.
- after this change, only the modes whose `build_by_default` is
  `True` are selected, if no `--mode` options are specified.

the new behavior matches the existing behavior. otherwise,
`ninja -C build mode_list` would list the mode which
is not built by default.
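
The selection rule described above can be sketched like this. `select_modes()` and the contents of `MODES` are hypothetical and only illustrate the rule. They are not copied from `configure.py`, and which modes are built by default there is not stated in this message.

```python
# Hypothetical mode table; the `build_by_default` values here are assumptions.
MODES = {
    "debug": {"build_by_default": True},
    "release": {"build_by_default": True},
    "dev": {"build_by_default": True},
    "sanitize": {"build_by_default": False},
    "coverage": {"build_by_default": False},
}


def select_modes(all_modes, requested):
    """Return the requested modes, or the default-built ones if none were requested."""
    if requested:
        return list(requested)
    # No --mode given: keep only the modes flagged as built by default.
    return [name for name, cfg in all_modes.items() if cfg.get("build_by_default")]
```

With this table, `select_modes(MODES, [])` leaves out `sanitize` and `coverage`, while an explicit request such as `select_modes(MODES, ["sanitize"])` is honored as given.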

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19292
2024-06-14 15:31:58 +03:00
Botond Dénes
b2ebc172d0 Merge 'Fix usage of utils/chunked_vector::reserve_partial' from Lakshmi Narayanan Sreethar
utils/chunked_vector::reserve_partial: fix usage in callers

The method reserve_partial(), when used as documented, quits before the
intended capacity can be reserved fully. This can lead to overallocation
of memory in the last chunk when data is inserted to the chunked vector.
The method itself doesn't have any bug but the way it is being used by
the callers needs to be updated to get the desired behaviour.

Instead of calling it repeatedly with the value returned from the
previous call until it returns zero, it should be repeatedly called with
the intended size until the vector's capacity reaches that size.
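
The corrected calling pattern can be modeled with a toy container. This is a Python sketch, not the C++ `utils::chunked_vector`. `ToyChunkedVector` and `reserve_in_steps()` are hypothetical, and growing by at most one chunk per call is an assumption about how the partial reservation proceeds.

```python
class ToyChunkedVector:
    """Toy model that tracks capacity only (not the real chunked_vector)."""

    def __init__(self, chunk_capacity):
        self._chunk_capacity = chunk_capacity
        self._capacity = 0

    def capacity(self):
        return self._capacity

    def reserve_partial(self, n):
        """Grow capacity toward n by at most one chunk per call; returns nothing."""
        if self._capacity < n:
            self._capacity = min(n, self._capacity + self._chunk_capacity)


def reserve_in_steps(vec, n, yield_point=lambda: None):
    """Call reserve_partial(n) with the intended size until capacity reaches n."""
    steps = 0
    while vec.capacity() < n:
        vec.reserve_partial(n)   # always pass the intended size, not a remainder
        yield_point()            # a caller could yield to other work between steps
        steps += 1
    return steps
```

The loop condition checks the vector's capacity rather than a value returned by `reserve_partial()`, so it cannot stop before the intended capacity is reached.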

This PR updates the method comment and all the callers to use the
right way.

Fixes #19254

Closes scylladb/scylladb#19279

* github.com:scylladb/scylladb:
  utils/large_bitset: remove unused includes identified by clangd
  utils/large_bitset: use thread::maybe_yield()
  test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial
  utils/lsa/chunked_managed_vector: fix reserve_partial()
  utils/chunked_vector: return void from reserve_partial and make_room
  test/boost/chunked_vector_test: fix testcase tests_reserve_partial
  utils/chunked_vector::reserve_partial: fix usage in callers
2024-06-14 15:31:00 +03:00
Kefu Chai
5c41073e00 tools/scylla-sstable: format error message with compile-time check
before this change, we use runtime format string to format error
messages. but it does not have the compile time format check.
if we pass arguments which are not formattable, {fmt} throws at
runtime, instead of erroring out at compile-time. this could be very
annoying, because we format error messages at the error handling
path. but if user ends up seeing an exception for {fmt} instead
of a nice error message, it would be far from helpful.

in this change, we

- use compile-time format string
- fix two caller sites, where we pass `std::exception_ptr` to
  {fmt}, but `std::exception_ptr` is not formattable by {fmt} at
  the time of writing. we do have operator<< based formatter for
  it though. so we delegate to `fmt::streamed` to format it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19294
2024-06-14 15:30:19 +03:00
Kefu Chai
aef1718833 doc: replace tab with space
more consistent this way, also easier to format in a regular editor
without additional setup.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-14 18:46:09 +08:00
Kamil Braun
982fa31250 Merge 'test: servers_add: fix the expected_error parameter' from Patryk Jędrzejczak
This PR fixes two problems with the `expected_error`
parameter in `server_add` and `servers_add`.
1. It didn't work in `server_add` if the cluster was empty
because of an incorrect attempt to connect the driver.
2. It didn't work in `servers_add` completely because the
`seeds` parameter was handled incorrectly.

This PR only adds improvements in the testing framework,
no need to backport it.

Closes scylladb/scylladb#19255

* github.com:scylladb/scylladb:
  test: manager_client, scylla_cluster: fix type annotations in add_servers
  test: manager_client: don't connect driver after failed server_{add, start}
  test: scylla_cluster: pass seeds to add_servers
2024-06-14 11:33:21 +02:00
Wojciech Mitros
d31437b589 mv: replicate the gossiped backlog to all shards
On each shard of each node we store the view update backlogs of
other nodes to, depending on their size, delay responses to incoming
writes, lowering the load on these nodes and helping them get their
backlog to normal if it were too high.

These backlogs are propagated between nodes in two ways: the first
one is adding them to replica write responses. The second one
is gossiping any changes to the node's backlog every 1s. The gossip
becomes useful when writes stop to some node for some time and we
stop getting the backlog using the first method, but we still want
to be able to select a proper delay for new writes coming to this
node. It will also be needed for the mv admission control.

Currently, the backlog is gossiped from shard 0, as expected.
However, we also receive the backlog only on shard 0 and only
update this shard's backlogs for the other node. Instead, we'd
want to have the backlogs updated on all shards, allowing us
to use proper delays also when requests are received on shards
different than 0.

This patch changes the backlog update code, so that the backlogs
on all shards are updated instead. This will only be performed
up to once per second for each other node, and is done with
a lower priority, so it won't severely impact other work.

Fixes: scylladb/scylladb#19232

Closes scylladb/scylladb#19268
2024-06-14 11:24:20 +02:00
Andrei Chekun
8d1d206aff [test.py] Add uniqueness to the test name
In CI, tests are always executed with the --repeat=3 option, which generates 3 test results with the same name. The Junit plugin in CI cannot correctly distinguish between these results. When there are two passes and one failure, the link to the test result is sometimes redirected to the wrong one, because the test name is the same.
To fix this, a ReportPlugin is added that modifies the test case name during junit report generation, adding the mode and run id to the test name.

Fixes: https://github.com/scylladb/scylladb/issues/17851

Fixes: https://github.com/scylladb/scylladb/issues/15973
2024-06-14 11:23:04 +02:00
Wojciech Mitros
9bae1814ab test: add test for failed view building write
For various reasons, a view building write may fail. When that
happens, the view building should not finish until these writes
are successfully retried and they should not interfere with any
writes that are performed to the base table while the view is
building.

The test introduced in this patch confirms that this is the case.

Refs scylladb/scylladb#19261

Closes scylladb/scylladb#19263
2024-06-14 10:38:21 +02:00
Lakshmi Narayanan Sreethar
c49f6391ab utils/large_bitset: remove unused includes identified by clangd
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar
83190fa075 utils/large_bitset: use thread::maybe_yield()
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar
310c5da4bb test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial
Update the maximum size tested by the testcase. The test always created
only one chunk as the maximum size tested by it (1 << 12 = 4KB) was less
than the default max chunk size (12.8 KB). So, use twice the
max_chunk_capacity as the test size distribution upper limit to verify
that partial_reserve can reserve multiple chunks.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar
d4f8b91bd6 utils/lsa/chunked_managed_vector: fix reserve_partial()
Fix the method comment and return types of chunked_managed_vector's
reserve_partial() similar to chunked_vector's reserve_partial() as it
has the same issues mentioned in #19254. Also update the usage in the
chunked_managed_vector_test.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar
0a22759c2a utils/chunked_vector: return void from reserve_partial and make_room
Since reserve_partial does not depend on the number of remaining
capacity to be reserved, there is no need to return anything from it and
the make_room method.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-14 13:43:07 +05:30
Lakshmi Narayanan Sreethar
29f036a777 test/boost/chunked_vector_test: fix testcase tests_reserve_partial
Fix the usage of reserve_partial in the testcase. Also update the
maximum chunk size used by the testcase. The test always created only
one chunk as the maximum size tested by it (1 << 12 = 4KB) was less
than the default max chunk size (128 KB). So, use smaller chunk size,
512 bytes, to verify that partial_reserve can reserve multiple chunks.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-14 13:43:07 +05:30
Kefu Chai
df094061e3 test: randomized_nemesis_test: define static variable
before this change, when linking randomized_nemesis_test with ld.lld:

```
[4/4] Linking CXX executable test/raft/RelWithDebInfo/randomized_nemesis_test
FAILED: test/raft/RelWithDebInfo/randomized_nemesis_test
: && /home/kefu/.local/bin/clang++ -ffunction-sections -fdata-sections -O3 -g -gz -Xlinker --build-id=sha1 --ld-path=ld.lld -dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2 -Xlinker --gc-sections test/raft/CMakeFiles/test-raft-helper.dir/RelWithDebInfo/helpers.cc.o test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o -o test/raft/RelWithDebInfo/randomized_nemesis_test -L/home/kefu/dev/scylladb/idl/absl::headers -Wl,-rpath,/home/kefu/dev/scylladb/idl/absl::headers  test/lib/RelWithDebInfo/libtest-lib.a  seastar/RelWithDebInfo/libseastar.a  /usr/lib64/libxxhash.so  seastar/RelWithDebInfo/libseastar_testing.a  test/lib/RelWithDebInfo/libtest-lib.a  -Xlinker --push-state -Xlinker --whole-archive  auth/RelWithDebInfo/libscylla_auth.a  -Xlinker --pop-state  /usr/lib64/libcrypt.so  cdc/RelWithDebInfo/libcdc.a  compaction/RelWithDebInfo/libcompaction.a  mutation_writer/RelWithDebInfo/libmutation_writer.a  -Xlinker --push-state -Xlinker --whole-archive  dht/RelWithDebInfo/libscylla_dht.a  -Xlinker --pop-state  types/RelWithDebInfo/libtypes.a  index/RelWithDebInfo/libindex.a  -Xlinker --push-state -Xlinker --whole-archive  locator/RelWithDebInfo/libscylla_locator.a  -Xlinker --pop-state  message/RelWithDebInfo/libmessage.a  gms/RelWithDebInfo/libgms.a  sstables/RelWithDebInfo/libsstables.a  readers/RelWithDebInfo/libreaders.a  schema/RelWithDebInfo/libschema.a  -Xlinker --push-state -Xlinker --whole-archive  tracing/RelWithDebInfo/libscylla_tracing.a  -Xlinker 
--pop-state  RelWithDebInfo/libscylla-main.a  abseil/absl/strings/RelWithDebInfo/libabsl_cord.a  abseil/absl/strings/RelWithDebInfo/libabsl_cordz_info.a  abseil/absl/strings/RelWithDebInfo/libabsl_cord_internal.a  abseil/absl/strings/RelWithDebInfo/libabsl_cordz_functions.a  abseil/absl/strings/RelWithDebInfo/libabsl_cordz_handle.a  abseil/absl/crc/RelWithDebInfo/libabsl_crc_cord_state.a  abseil/absl/crc/RelWithDebInfo/libabsl_crc32c.a  abseil/absl/crc/RelWithDebInfo/libabsl_crc_internal.a  abseil/absl/crc/RelWithDebInfo/libabsl_crc_cpu_detect.a  abseil/absl/strings/RelWithDebInfo/libabsl_str_format_internal.a  /usr/lib64/libz.so  service/RelWithDebInfo/libservice.a  node_ops/RelWithDebInfo/libnode_ops.a  service/RelWithDebInfo/libservice.a  node_ops/RelWithDebInfo/libnode_ops.a  -lsystemd  raft/RelWithDebInfo/libraft.a  repair/RelWithDebInfo/librepair.a  streaming/RelWithDebInfo/libstreaming.a  replica/RelWithDebInfo/libreplica.a  db/RelWithDebInfo/libdb.a  mutation/RelWithDebInfo/libmutation.a  data_dictionary/RelWithDebInfo/libdata_dictionary.a  cql3/RelWithDebInfo/libcql3.a  transport/RelWithDebInfo/libtransport.a  cql3/RelWithDebInfo/libcql3.a  transport/RelWithDebInfo/libtransport.a  lang/RelWithDebInfo/liblang.a  /usr/lib64/liblua-5.4.so  -lm  /usr/lib64/libsnappy.so.1.1.10  abseil/absl/container/RelWithDebInfo/libabsl_raw_hash_set.a  abseil/absl/hash/RelWithDebInfo/libabsl_hash.a  abseil/absl/hash/RelWithDebInfo/libabsl_city.a  abseil/absl/types/RelWithDebInfo/libabsl_bad_variant_access.a  abseil/absl/hash/RelWithDebInfo/libabsl_low_level_hash.a  abseil/absl/types/RelWithDebInfo/libabsl_bad_optional_access.a  abseil/absl/container/RelWithDebInfo/libabsl_hashtablez_sampler.a  abseil/absl/profiling/RelWithDebInfo/libabsl_exponential_biased.a  abseil/absl/synchronization/RelWithDebInfo/libabsl_synchronization.a  abseil/absl/debugging/RelWithDebInfo/libabsl_stacktrace.a  abseil/absl/synchronization/RelWithDebInfo/libabsl_graphcycles_internal.a  
abseil/absl/synchronization/RelWithDebInfo/libabsl_kernel_timeout_internal.a  abseil/absl/debugging/RelWithDebInfo/libabsl_symbolize.a  abseil/absl/debugging/RelWithDebInfo/libabsl_debugging_internal.a  abseil/absl/base/RelWithDebInfo/libabsl_malloc_internal.a  abseil/absl/debugging/RelWithDebInfo/libabsl_demangle_internal.a  abseil/absl/time/RelWithDebInfo/libabsl_time.a  abseil/absl/strings/RelWithDebInfo/libabsl_strings.a  abseil/absl/strings/RelWithDebInfo/libabsl_strings_internal.a  abseil/absl/strings/RelWithDebInfo/libabsl_string_view.a  abseil/absl/base/RelWithDebInfo/libabsl_throw_delegate.a  abseil/absl/numeric/RelWithDebInfo/libabsl_int128.a  abseil/absl/base/RelWithDebInfo/libabsl_base.a  abseil/absl/base/RelWithDebInfo/libabsl_raw_logging_internal.a  abseil/absl/base/RelWithDebInfo/libabsl_log_severity.a  abseil/absl/base/RelWithDebInfo/libabsl_spinlock_wait.a  -lrt  abseil/absl/time/RelWithDebInfo/libabsl_civil_time.a  abseil/absl/time/RelWithDebInfo/libabsl_time_zone.a  rust/RelWithDebInfo/libwasmtime_bindings.a  rust/librust_combined.a  /usr/lib64/libdeflate.so  utils/RelWithDebInfo/libutils.a  /usr/lib64/libxxhash.so  /usr/lib64/libcryptopp.so  /usr/lib64/libboost_regex.so.1.83.0  /usr/lib64/libicui18n.so  /usr/lib64/libicuuc.so  /usr/lib64/libboost_unit_test_framework.so.1.83.0  seastar/RelWithDebInfo/libseastar_testing.a  seastar/RelWithDebInfo/libseastar.a  /usr/lib64/libboost_program_options.so  /usr/lib64/libboost_thread.so  /usr/lib64/libboost_chrono.so  /usr/lib64/libboost_atomic.so  /usr/lib64/libcares.so  /usr/lib64/libfmt.so.10.2.1  /usr/lib64/liblz4.so  -ldl  /usr/lib64/libgnutls.so  -latomic  /usr/lib64/libsctp.so  /usr/lib64/libprotobuf.so  /usr/lib64/libyaml-cpp.so  /usr/lib64/libhwloc.so  //usr/lib64/liburing.so  /usr/lib64/libnuma.so  /usr/lib64/libboost_unit_test_framework.so && :
ld.lld: error: undefined symbol: append_seq::magic
>>> referenced by impl.hpp:92 (/usr/include/boost/test/tools/old/impl.hpp:92)
>>>               test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o:(__cxx_global_var_init.38)
>>> referenced by impl.hpp:92 (/usr/include/boost/test/tools/old/impl.hpp:92)
>>>               test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o:(__cxx_global_var_init.38)
>>> referenced by impl.hpp:92 (/usr/include/boost/test/tools/old/impl.hpp:92)
>>>               test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o:(append_seq::append(int) const)
>>> referenced 5 more times
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```

it turns out `append_seq::magic` is only declared, but never defined.
please note, the non-inline static member variable in its class
definition is not considered as a definition, see
[class.static.data](https://eel.is/c++draft/class.static.data#3)

> The declaration of a non-inline static data member in its class
> definition is not a definition and may be of an incomplete type
> other than cv void.

so, let's declare it as a `constexpr` instead. it implies `inline`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19283
2024-06-14 10:00:21 +03:00
Kefu Chai
4c1006a5bb dist: s/SafeConfigParser/ConfigParser/
`SafeConfigParser` was renamed to `ConfigParser` in Python 3.2,
and Python warns us:

> scylla-housekeeping:183: DeprecationWarning: The SafeConfigParser
> class has been renamed to ConfigParser in Python 3.2. This alias will
> be removed in Python 3.12. Use ConfigParser directly instead.

see https://docs.python.org/3.2/library/configparser.html#configparser.ConfigParser
and https://docs.python.org/3.1/library/configparser.html#configparser.SafeConfigParser
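
a minimal sketch of the rename, assuming a hypothetical `[housekeeping]` section and `check-version` key that are not taken from scylla-housekeeping itself. since `SafeConfigParser` was only an alias, switching to `ConfigParser` should not change parsing behaviour:

```python
import configparser

# Hypothetical configuration text, used only for illustration.
SAMPLE = "[housekeeping]\ncheck-version = true\n"


def load_config(text):
    """Parse INI-style text with ConfigParser (formerly SafeConfigParser)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


if __name__ == "__main__":
    cfg = load_config(SAMPLE)
    print(cfg.getboolean("housekeeping", "check-version"))
```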

Fixes #13046
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19285
2024-06-14 09:59:22 +03:00
Kefu Chai
3a5898880e alternator: drop unused friend declaration
in 57c408ab, we dropped operator<< for `parsed::path`, but we forgot
to drop the friend declaration for it along with the operator. so in
this change, let's drop the friend declaration.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19287
2024-06-14 09:58:09 +03:00
Kefu Chai
83c6ae10c4 sstables/compress: put type constraints into template type param
more compact this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19284
2024-06-14 09:50:55 +03:00
Kefu Chai
6556cd684e cql3: remove unused operator<<
as these operators are not used anymore.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19288
2024-06-14 09:45:35 +03:00
Botond Dénes
d50688efee Merge 'api: do not include unused headers' from Kefu Chai
these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning.

also, add api to iwyu github workflow's CLEANER_DIR, to prevent future violations.

---

it's a cleanup, hence no need to backport.

Closes scylladb/scylladb#19269

* github.com:scylladb/scylladb:
  .github: add api to iwyu's CLEANER_DIR
  api: do not include unused headers
2024-06-14 09:34:13 +03:00
Kefu Chai
28a4298005 build: cmake: use per-mode path for building unstripped_dist_pkg
before this change, we use "scylla" as the dependency of
unstripped_dist_pkg, but that implies the scylla built with the
default mode. if the build rules are generated using the
multi-config generator, the default mode is not necessarily
identical to the current `$<CONFIG>`, so let's be more explicit.
otherwise, we could run into a build failure like

```
FAILED: dist/RelWithDebInfo/scylla-unstripped-6.1.0~dev-0.20240614.5f36888e7fbd.x86_64.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist/RelWithDebInfo/scylla-unstripped-6.1.0~dev-0.20240614.5f36888e7fbd.x86_64.tar.gz
cd /jenkins/workspace/scylla-master/scylla-ci/scylla && scripts/create-relocatable-package.py --build-dir /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo --node-exporter-dir /jenkins/workspace/scylla-master/scylla-ci/scylla/build/node_exporter --debian-dir /jenkins/workspace/scylla-master/scylla-ci/scylla/build/debian /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist/RelWithDebInfo/scylla-unstripped-6.1.0~dev-0.20240614.5f36888e7fbd.x86_64.tar.gz
ldd: /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/scylla: No such file or directory
Traceback (most recent call last):
  File "/jenkins/workspace/scylla-master/scylla-ci/scylla/scripts/create-relocatable-package.py", line 109, in <module>
    libs.update(ldd(exe))
                ^^^^^^^^
  File "/jenkins/workspace/scylla-master/scylla-ci/scylla/scripts/create-relocatable-package.py", line 37, in ldd
    for ldd_line in subprocess.check_output(
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ldd', '/jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/scylla']' returned non-zero exit status 1.
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-14 13:27:26 +08:00
Kefu Chai
b94420a9dd build: cmake: use path to be compatible with CI
this change is created in the same spirit of 1186ddef16, which
updated the rule for generating the stripped dist pkg, but it
failed to update the one for generating the unstripped dist pkg.
that's why we have a build failure when the workflow is looking for
the unstripped tar.gz:

```
08:02:47  ++ ls /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz
08:02:47  ls: cannot access '/jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz': No such file or directory
```

so, in this change, we fix the path.

Refs #2717

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-14 13:27:26 +08:00
Botond Dénes
ea40567bbc Merge 'Some cleanups for replica table' from Raphael "Raph" Carvalho
backport not needed, these are just cleanups.

Closes scylladb/scylladb#19260

* github.com:scylladb/scylladb:
  replica: simplify perform_cleanup_compaction()
  replica: return storage_group by reference on storage_group_for*()
  replica: devirtualize storage_group_of()
2024-06-14 08:14:58 +03:00
Botond Dénes
bf429695b6 Merge 'test_tablets: add test_tablet_storage_freeing' from Michał Chojnowski
Before work on tablets was completed, it was noticed that — due to some missing pieces of implementation — Scylla doesn't properly close sstables for migrated-away tablets. Because of this, disk space wasn't being reclaimed properly.

Since the missing pieces of implementation were added, the problem should be gone now. This patch adds a test which was used to reproduce the problem earlier. It's expected to pass now, validating that the issue was fixed.

Should be backported to branch-6.0, because the tested problem was also affecting that branch.

Fixes #16946

Closes scylladb/scylladb#18906

* github.com:scylladb/scylladb:
  test_tablets: add test_tablet_storage_freeing
  test: pylib: add get_sstables_disk_usage()
2024-06-14 08:08:54 +03:00
Raphael S. Carvalho
f143f5b90d replica: remove linear search when picking memtable_list for range scan with tablets
with tablets, we're expected to have a worst case of ~100 tablets in a given
table and shard, so let's avoid linear search when looking for the
memtable_list in a range scan. we're bounded by ~100 elements, so
shouldn't be a big problem, but it's an inefficiency we can easily
get rid of.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#19286
2024-06-14 08:00:17 +03:00
Benny Halevy
fb3db7d81f perf-simple-query: add cpu_cycles / op metric
Example output:
```
bhalevy@[] scylla$ build/release/scylla perf-simple-query --default-log-level=error -c 1 --duration 10
random-seed=4058714023
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
86912.75 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42346 insns/op,   22811 cycles/op,        0 errors)
91348.29 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42306 insns/op,   22362 cycles/op,        0 errors)
87965.84 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42338 insns/op,   22966 cycles/op,        0 errors)
90793.67 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42351 insns/op,   22783 cycles/op,        0 errors)
90104.27 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42358 insns/op,   22875 cycles/op,        0 errors)
90397.13 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42355 insns/op,   22735 cycles/op,        0 errors)
89142.39 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42363 insns/op,   22996 cycles/op,        0 errors)
90410.40 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42363 insns/op,   22725 cycles/op,        0 errors)
88173.10 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42366 insns/op,   23160 cycles/op,        0 errors)
88416.51 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42379 insns/op,   23102 cycles/op,        0 errors)

median 90104.26849997675
median absolute deviation: 1244.02
maximum: 91348.29
minimum: 86912.75
```
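
The summary lines above can be reproduced from the per-iteration tps figures. The sketch below assumes the tool picks the upper-middle element of the sorted sample instead of averaging the two middle values; that assumption is inferred from the numbers in the example output, not from the perf-simple-query source. The helper names are illustrative.

```python
# tps values copied from the example output above, rounded to two decimals.
TPS = [
    86912.75, 91348.29, 87965.84, 90793.67, 90104.27,
    90397.13, 89142.39, 90410.40, 88173.10, 88416.51,
]


def upper_median(values):
    """Return the upper-middle element of the sorted sample (no averaging)."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def median_absolute_deviation(values):
    """Median of absolute deviations from the median, same convention."""
    center = upper_median(values)
    return upper_median([abs(v - center) for v in values])


if __name__ == "__main__":
    print(f"median {upper_median(TPS):.2f}")                                     # median 90104.27
    print(f"median absolute deviation: {median_absolute_deviation(TPS):.2f}")  # 1244.02
    print(f"maximum: {max(TPS):.2f}")                                            # 91348.29
    print(f"minimum: {min(TPS):.2f}")                                            # 86912.75
```

With ten samples, averaging the two middle values would give about 89623.33 instead, which does not match the printed median.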

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18818
2024-06-14 07:42:09 +03:00
Lakshmi Narayanan Sreethar
64768b58e5 utils/chunked_vector::reserve_partial: fix usage in callers
The method reserve_partial(), when used as documented, quits before the
intended capacity can be reserved fully. This can lead to overallocation
of memory in the last chunk when data is inserted to the chunked vector.
The method itself doesn't have any bug but the way it is being used by
the callers needs to be updated to get the desired behaviour.

Instead of calling it repeatedly with the value returned from the
previous call until it returns zero, it should be repeatedly called with
the intended size until the vector's capacity reaches that size.

This commit updates the method comment and all the callers to use the
right way.
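
The corrected calling pattern can be sketched with a toy model. `ChunkedVectorModel` below is a hypothetical stand-in, not `utils::chunked_vector`: it only assumes that each `reserve_partial(n)` call grows the capacity toward `n` by at most one chunk. The caller keeps passing the intended size until the capacity reaches it.

```python
class ChunkedVectorModel:
    """Toy stand-in for a chunked vector; only capacity is modelled."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.capacity = 0

    def reserve_partial(self, n):
        """Grow capacity toward n by at most one chunk per call (assumed)."""
        if self.capacity < n:
            self.capacity = min(n, self.capacity + self.chunk_size)


def reserve_fully(vec, n):
    """Call reserve_partial(n) with the intended size until capacity >= n.

    Returns the number of calls made; a real caller would yield to the
    scheduler between calls instead of counting them.
    """
    calls = 0
    while vec.capacity < n:
        vec.reserve_partial(n)
        calls += 1
    return calls


if __name__ == "__main__":
    vec = ChunkedVectorModel(chunk_size=4)
    print(reserve_fully(vec, 10), vec.capacity)  # 3 10
```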

Fixes #19254

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-06-13 21:42:11 +05:30
Raphael S. Carvalho
ace4e5111e compaction: Reduce twcs off-strategy space overhead to 10% of free space
TWCS off-strategy suffers with 100% space overhead, so a big TWCS table
can cause scylla to run out of disk space during node ops.

To not penalize TWCS tables, that take a small percentage of disk,
with increased write ampl, TWCS off-strategy will be restricted to
10% of free disk space. Then small tables can still compact all
disjoint sstables in a single round.
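
One way to picture the 10% cap is as a per-round budget. The sketch below is an assumption-laden illustration, not the TWCS reshape code: `plan_reshape_rounds` is a hypothetical helper that packs sstable sizes greedily into rounds whose total stays within a fraction of the free space.

```python
def plan_reshape_rounds(sstable_sizes, free_space, fraction=0.10):
    """Greedily split sstables into rounds bounded by fraction * free_space.

    A single sstable larger than the budget still gets a round of its own,
    so the plan always makes progress (an assumption of this sketch).
    """
    budget = free_space * fraction
    rounds, current, used = [], [], 0
    for size in sstable_sizes:
        if current and used + size > budget:
            rounds.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        rounds.append(current)
    return rounds


if __name__ == "__main__":
    # Small table: everything fits under the budget of 100, so one round.
    print(plan_reshape_rounds([10, 10, 10], free_space=1000))
    # Big table: each sstable needs its own round under the same budget.
    print(plan_reshape_rounds([60, 60, 60], free_space=1000))
```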

Fixes #16514.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-06-13 13:06:51 -03:00
Raphael S. Carvalho
0ce8ee03f1 compaction: wire storage free space into reshape procedure
After this, TWCS reshape procedure can be changed to limit job
to 10% of available space.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-06-13 12:53:27 -03:00
Raphael S. Carvalho
51c7ee889e sstables: Allow to get free space from underlying storage
That will be used in turn to restrict reshape to 10% of available space
in underlying storage.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-06-13 12:43:14 -03:00
Raphael S. Carvalho
b8bd4c51c2 replica: don't expose compaction_group to reshape task
compaction_group sits in replica layer and compaction layer is
supposed to talk to it through compaction::table_state only.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-06-13 12:43:14 -03:00
Andrei Chekun
93b9b85c12 [test.py] Refactor alternator, nodetool, rest_api
Make the alternator, nodetool and rest_api test directories Python packages.
Move scylla-gdb to scylla_gdb and make it a Python package.
2024-06-13 13:56:10 +02:00
Avi Kivity
f1819419cc Merge 'scylla-sstable: add method to load the schema from the sstable itself' from Botond Dénes
As it turns out, each sstable carries its own schema in its serialization header (Statistics component). This schema is incomplete -- the names of the key columns are not stored, just their type. Static and regular columns do have names and types stored however. This bare-bones schema is enough to parse and display the content of the sstable. Another thing missing is schema options (the stuff after the `WITH` keyword, except the clustering order). The only options stored are the compression options (in the CompressionInfo component), this is actually needed to read the Data component.

This series adds a new method to `tools/schema_loader.cc` to extract the schema stored in the sstable itself. This new schema load method is used as the last fall-back for obtaining the schema, in case scylla-sstable is trying to autodetect the schema of the sstable. Although, right now this bare-bones schema is enough for everything scylla-sstable does, it is more future proof to stick to the "full" schema if possible, so this new method is the last resort for now.

Fixes: https://github.com/scylladb/scylladb/issues/17869
Fixes: https://github.com/scylladb/scylladb/issues/18809

New functionality, no backport needed.

Closes scylladb/scylladb#19169

* github.com:scylladb/scylladb:
  tools/scylla-sstable: log loaded schema with trace level
  tools/scylla-sstable: load schema from the sstable as fallback
  tools/schema_loader: introduce load_schema_from_sstable()
  test/lib/random_schema: remove assert on min number of regular columns
  sstables: introduce load_metadata()
2024-06-13 12:21:09 +03:00
Benny Halevy
34dfa4d3a3 storage_service: join_token_ring: reject replace on different dc or rack
Do not allow replacing a node on one dc/rack
with a node on a different dc/rack as this violates
the assumption of replace node operation that
all token ranges previously owned by the dead
node would be rebuilt on the new node.

Fixes scylladb/scylladb#16858
Refs scylladb/scylla-enterprise#3518

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16862
2024-06-13 11:19:47 +02:00
Botond Dénes
6868add228 replica/database: wire in maintenance_reader_concurrency_semaphore_count_limit
Make the count resources of the maintenance (streaming) semaphore
live-updatable via config. This will allow us to improve repair speed on
mixed-shard clusters, where we suspect that reader thrashing -- due to
the combination of a high number of readers on each shard and a very
conservative reader count limit (10) -- is the main cause of the
slowness.
Making this count limit configurable allows us to start experimenting
with this fix, without committing to a count limit increase (or
removal), addressing the pain in the field.
2024-06-13 01:59:21 -04:00
Botond Dénes
665fdd6ce4 db/config: introduce maintenance_reader_concurrency_semaphore_count_limit
To control the amount of count resources of the maintenance (streaming)
semaphore. Not wired yet.
2024-06-13 01:59:21 -04:00
Botond Dénes
ba0cc29d82 reader_concurrency_semaphore: make count parameter live-update
So that the amount of count resources can be changed at run-time,
triggered by e.g. a config change.
Previous constant-count based constructor is left intact, to avoid
patching all clients, as only a small subset will want the new
functionality.
2024-06-13 01:59:21 -04:00
Nadav Har'El
44ea1993ba test/cql-pytest: tests CREATE/DROP INDEX during paged query
This patch includes extensive testing for what happens to an ongoing
paged query when a secondary index is suddenly added or dropped.
Issue #18992 was opened suggesting that this would be broken, and
the tests included here show that it is indeed broken.

The four tests included in this patch are heavily commented to explain
what they are testing and why, but here is a short summary of what is
being tested by each of them:

1. A paged query filtering on v=17 continues correctly even if an
   index is created on v.

2. A paged query filtering on v1 and v2 where v2 is indexed,
   continues correctly even if an index is created on v1 (remember
   that Scylla prefers to use the first index mentioned in the query).

3. A paged query using an index on v continues correctly even if that
   index is deleted.

4. However, if the query doesn't say "ALLOW FILTERING", it cannot
   be continued after the index is deleted.

All these tests pass on Cassandra, but all of them except the fourth
fail on Scylla, reproducing issue #18992. Somewhat to my surprise, the
failure of the query in all the failed tests is silent (i.e., trying to
fetch the next page just fetches nothing and says the iteration is done).
I was expecting more dramatic failures ("marshaling error" messages,
crashes, etc.) but didn't get them.

Refs #18992

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19000
2024-06-13 08:39:22 +03:00
Botond Dénes
145a67f77c tools/scylla-sstable: log loaded schema with trace level
The schema of the sstable can be interesting, so log it with trace
level. Unfortunately, this is not the nice CQL statement we are used to
(that requires a database object), but the not-nearly-so-nice CFMetadata
printout. Still, it is better than nothing.
2024-06-13 01:32:17 -04:00
Botond Dénes
43c44f0af5 tools/scylla-sstable: load schema from the sstable as fallback
When auto-detecting the schema of the sstable, if all other methods
failed, load the schema from the sstable's serialization header. This
schema is incomplete. It is just enough to parse and display the content
of the sstable. Although parsing and displaying the content of the
sstable is all scylla-sstable does, it is more future-compatible to use
the full schema when possible. So the always-available but minimal
schema that each sstable has on itself, is used just as a fallback.

The test which tested the case when all schema load attempts fail,
doesn't work now, because loading the serialization header always
succeeds. So convert this test into two positive tests, testing the
serialization header schema fallback instead.
2024-06-13 01:32:17 -04:00
Botond Dénes
8f2ba03465 tools/schema_loader: introduce load_schema_from_sstable()
Allows loading the schema from an sstable's serialization header. This
schema is incomplete, but it is enough to parse and display the content
of the sstable.
2024-06-13 01:32:17 -04:00
Botond Dénes
0d7335dd27 test/lib/random_schema: remove assert on min number of regular columns
It is legal for a schema to have 0 regular columns, so remove the assert
on the schema specification's regular column count.
2024-06-13 01:32:17 -04:00
Piotr Dulikowski
0b5a0c969a Merge 'hinted handoff: migrate sync point to host ID' from Michael Litvak
Change the format of sync points to use host ID instead of IPs, to be consistent with the use of host IDs in hinted handoff module.
Introduce sync point v3 format which is the same as v2 except it stores host IDs instead of IPs.
Decoding supports both the host ID and the IP formats, so a sync point now contains a variant of either type, and with the new type the translation is avoided.

Fixes #18653

Closes scylladb/scylladb#19134

* github.com:scylladb/scylladb:
  db/hints: migrate sync point to host ID
  db/hints: rename sync point structures with _v1 suffix to _v1_v2
2024-06-13 06:16:00 +02:00
Kefu Chai
9d8d9168e6 .github: add api to iwyu's CLEANER_DIR
to avoid future violations of include-what-you-use.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-13 09:32:51 +08:00
Kefu Chai
c03141b4b2 api: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-13 09:32:51 +08:00
Anna Stuchlik
603c662049 doc: remove an entry about seeds from FAQ
This commit removes a useless entry from the FAQ page.
It contains a false recommendation to configure multiple seeds.

Closes scylladb/scylladb#19259
2024-06-12 19:11:52 +02:00
Dawid Medrek
dc41086c57 db/hints: Add a metric for the size of sent hints
In this commit, we add a new metric `sent_total_size`
keeping track of how many bytes of hints a node
has sent. The metric is supposed to complement its
counterpart in storage proxy that counts how many
bytes of hints a node has received. That information
should prove useful in analyzing statistics of
a cluster -- load on given nodes and where it comes
from.

We also change the name of the metric `sent`
to `sent_total` to avoid the conflict of prefixes
between the two metrics.
2024-06-12 18:20:08 +02:00
Raphael S. Carvalho
f3a1f5df83 replica: simplify perform_cleanup_compaction()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-06-12 12:44:21 -03:00
Raphael S. Carvalho
6214dda506 replica: return storage_group by reference on storage_group_for*()
those functions cannot return nullptr, will throw when group is not
found, so better return ref instead.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-06-12 11:53:06 -03:00
Patryk Jędrzejczak
a7ab9a015a test: manager_client, scylla_cluster: fix type annotations in add_servers 2024-06-12 16:51:20 +02:00
Patryk Jędrzejczak
1eb25d22c6 test: manager_client: don't connect driver after failed server_{add, start}
If adding or starting a server fails expectedly, there is no reason
to update or connect the driver. Moreover, before this patch, we
couldn't use `server_add` and `servers_add` with `expected_error`
if the cluster was empty. After expected bootstrap failures, we
tried to connect the driver, which rightfully failed on
`assert len(hosts) > 0` in `cluster_con`.
2024-06-12 16:51:20 +02:00
Patryk Jędrzejczak
8f486de8d3 test: scylla_cluster: pass seeds to add_servers
This parameter was incorrectly missing. For this reason,
`expected_error` was passed from `add_servers` to `add_server` as
`seeds`, which caused strange crashes.
2024-06-12 16:51:19 +02:00
Botond Dénes
435c01d1e6 sstables: introduce load_metadata()
Loads just the metadata components. No validation.
Split off from load(), to allow scylla-sstable to partially load an
sstable.
2024-06-12 10:46:38 -04:00
Botond Dénes
aa27f8f365 Merge 'Improve handling of outdated --experimental-features' from Pavel Emelyanov
Some time ago it turned out that if an unrecognized feature name is encountered in scylla.yaml, the whole experimental features list is ignored, but scylla continues to boot. There's the UNUSED feature value, which is the proper way to deprecate a feature, and this PR improves its handling in several ways.

1. The recently removed "tablets" feature is partially brought back, but marked as UNUSED
2. Any UNUSED features met while parsing are printed into logs
3. The enum_option<> helper is enlightened along the way
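
A rough sketch of the behaviour described in points 1 and 2, written in Python for brevity. The feature table, the logger name and `parse_experimental_features` are all hypothetical; the real list and parsing live in Scylla's C++ config code. Unknown names raise here only to contrast them with a deprecated name; how Scylla treats them is not modelled.

```python
import logging

log = logging.getLogger("experimental_features")

# Illustrative table only, not the real feature list.
FEATURES = {
    "udf": "UDF",
    "alternator-streams": "ALTERNATOR_STREAMS",
    "tablets": "UNUSED",  # deprecated name, still accepted but ignored
}


def parse_experimental_features(names):
    """Map feature names to enum-like values, warning about UNUSED ones."""
    enabled = []
    for name in names:
        if name not in FEATURES:
            raise ValueError(f"unknown experimental feature: {name!r}")
        if FEATURES[name] == "UNUSED":
            log.warning("ignoring unused experimental feature %r", name)
            continue
        enabled.append(FEATURES[name])
    return enabled


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(parse_experimental_features(["udf", "tablets"]))
```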

refs: #18968

Closes scylladb/scylladb#19230

* github.com:scylladb/scylladb:
  config: Mark tablets feature as unused
  main: Warn unused features
  enum_option: Carry optional key on board
  enum_option: Remove on-board _map member
2024-06-12 17:33:14 +03:00
Botond Dénes
d2a4cd9cae Merge 'Register API endpoints next to corresponding services' from Pavel Emelyanov
The API endpoints are registered for particular services (with rare exceptions), and once the corresponding service is ready, its endpoints section can be registered too. The same applies in reverse to shutdown, which is automatic with deferred actions.

refs: #2737

Closes scylladb/scylladb#19208

* github.com:scylladb/scylladb:
  main: Register task manager API next to task manager itself
  main: Register messaging API next to messaging service
  main: Register repair API next to repair service
2024-06-12 17:31:30 +03:00
Kefu Chai
2eca8b54de auth/role_or_anonymous: drop operator<< for role_or_anonymous
its declaration was removed in 84a9d2fa, which failed to remove
the implementation from .cc file.

in this change, let's remove operator<< for role_or_anonymous
completely.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19243
2024-06-12 17:30:20 +03:00
Raphael S. Carvalho
9c1d3bcc02 replica: devirtualize storage_group_of()
can be made private to tablet_storage_group_manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-06-12 11:29:49 -03:00
Kamil Braun
a441d06d6c raft: fsm: add details to on_internal_error_noexcept message
If we receive a message in the same term but from a different leader
than we expect, we print:
```
Got append request/install snapshot/read_quorum from an unexpected leader
```
For some reason the message did not include the details (who the leader
was and who the sender was) which requires almost zero effort and might
be useful for debugging. So let's include them.

Ref: scylladb/scylla-enterprise#4276

Closes scylladb/scylladb#19238
2024-06-12 17:29:42 +03:00
Pavel Emelyanov
4400f9082e lang: Return context as future, not via reference argument
Commit 882b2f4e9f (cql3, schema_tables: Generalize function creation)
erroneously says that optional<context> is not suitable for future<>
type, but in fact it is.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19204
2024-06-12 16:54:46 +03:00
Kefu Chai
8c99d9e721 .github: use libstdc++-13
since gcc-13 is packaged by ppa:ubuntu-toolchain-r, and GCC-13 was
released 1 year ago, let's use it instead. fewer warnings, as the
standard library from GCC-13 is more standard compliant.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19162
2024-06-12 16:52:05 +03:00
Botond Dénes
e91f82fd5c Merge '.github: add workflow to build with clang nightly' from Kefu Chai
to be prepared for changes from clang, and enjoy the new warnings/errors from this compiler.

* it is an improvement in our CI, no need to backport.

Closes scylladb/scylladb#19164

* github.com:scylladb/scylladb:
  .github: add workflow to build with clang nightly
  .github: rename clang-tidy-matcher.json to clang-matcher.json
2024-06-12 16:50:21 +03:00
Pavel Emelyanov
24c818453d main: Start view builder earlier
Commit 47dbf23773 (Rework view services and system-distributed-keyspace
dependencies) made streaming and repair services depend on view builder,
but missed the fact that the builder itself starts much later.

Move view builder earlier, that's safe, no activity is started upon
that, real building is kicked much later when invoke_on_all(start)
happens.

Other than that, start system distributed keyspace earlier, which also
looks safe, as it's also started "for real" later, by storage service
when it joins the ring.

fixes: #19133

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19250
2024-06-12 16:46:55 +03:00
Anna Stuchlik
3f9cc0ec3f doc: reorganize ToC of the Reference section
This commit adds a proper ToC to the Reference section to improve
how it renders.

Closes scylladb/scylladb#18901
2024-06-12 16:16:04 +03:00
Kefu Chai
da59710fb9 doc: remove unused documents
upgrade/_common are document fragments included by other documents.
but quite a few of the documents that previously included these fragments
were removed, and we didn't remove the fragments along with them.

in this change, we drop them.

Fixes #19245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19251
2024-06-12 16:14:57 +03:00
Botond Dénes
cd05de6cfb Merge 'test: memtable_test: increase unspooled_dirty_soft_limit ' from Kefu Chai
before this change, when performing memtable_test, we expect that
the memtables of ks.cf are the only memtables being flushed. and
we inject 4 failures in the code path of flush, and wait until 4
of them are triggered. but in the background, `dirty_memory_manager`
performs flush on all tables when necessary. so, the total number of
failures is not necessarily the total number of failures triggered
when flushing ks.cf, some of them could be triggered when flushing
system tables. that's why we have sporadic test failures from
this test, as we might check `t.min_memtable_timestamp()` too soon.

after this change, we increase `unspooled_dirty_soft_limit` setting,
in order to disable `dirty_memory_manager`, so that the only flush
is performed by the test.

Fixes https://github.com/scylladb/scylladb/issues/19034

---

the issue applies to both 5.4 and 6.0, and this issue hurts the CI stability, hence we should backport it.

Closes scylladb/scylladb#19252

* github.com:scylladb/scylladb:
  test: memtable_test: increase unspooled_dirty_soft_limit
  test: memtable_test: replace BOOST_ASSERT with BOOST_REQUIRE
2024-06-12 16:14:05 +03:00
Dawid Medrek
23bea50de0 service/storage_proxy: Add metrics for received hints
In this commit, we add two new metrics to storage proxy:

* `received_hints_total`,
* `received_hints_bytes_total`.

Before these changes, we had to rely solely on other
metrics indicating how many hints nodes have written,
rejected, sent, etc. Because hints are subject to
many more or less controllable factors, e.g. a target
node still being a replica for a mutation, it was
very difficult to approximate how many hints a given
node might have received or what part of its load
they were. The newly introduced metrics are supposed
to help reason about those.
2024-06-12 14:44:47 +02:00
Kefu Chai
223fba3243 test: memtable_test: increase unspooled_dirty_soft_limit
before this change, when performing memtable_test, we expect that
the memtables of ks.cf are the only memtables being flushed. and
we inject 4 failures in the code path of flush, and wait until 4
of them are triggered. but in the background, `dirty_memory_manager`
performs flush on all tables when necessary. so, the total number of
failures is not necessarily the total number of failures triggered
when flushing ks.cf, some of them could be triggered when flushing
system tables. that's why we have sporadic test failures from
this test, as we might check `t.min_memtable_timestamp()` too soon.

after this change, we increase `unspooled_dirty_soft_limit` setting,
in order to disable `dirty_memory_manager`, so that the only flush
is performed by the test.

Fixes #19034
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-12 19:17:27 +08:00
Kefu Chai
2df4e9cfc2 test: memtable_test: replace BOOST_ASSERT with BOOST_REQUIRE
before this change, we verify the behavior of design under test using
`BOOST_ASSERT()`, which is a wrapper around `assert()`, so if a test
fails, the test just aborts. this is not very helpful for postmortem
debugging.

after this change, we use `BOOST_REQUIRE` macro for verifying the
behavior, so that Boost.Test prints out the condition if it does not
hold when we test it.

Refs #19034
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-12 19:17:27 +08:00
Pavel Emelyanov
c752bda0a2 Merge '.github: change severity to error in clang-include-cleaner ' from Kefu Chai
in this changeset, we tighten the clang-include-cleaner workflow, and address the warnings in two more subdirectories in the source tree.

* it's a cleanup, no need to backport

Closes scylladb/scylladb#19155

* github.com:scylladb/scylladb:
  .github: add alternator to iwyu's CLEANER_DIR
  alternator: do not include unused headers
  .github: change severity to error in clang-include-cleaner
  exceptions: do not include unused headers
2024-06-12 10:16:17 +03:00
Kefu Chai
0c9ea654f5 service/paxos: drop operator<< for proposal
since we stopped using the generic container formatters, which in turn
use operator<< for formatting the elements, we can drop more
operator<< operators.

so, in this change, we drop operator<< for proposal.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19156
2024-06-12 10:14:47 +03:00
Dawid Medrek
431ec55f6c service/storage_proxy: Move a comment to its relevant place
In b92fb35, we put a comment in the wrong place. These changes
move it to the right one.

Closes scylladb/scylladb#19215
2024-06-12 10:10:02 +03:00
Avi Kivity
dffd0901b3 dist: scylla_util: sysconfig_parser: replace deprecated ConfigParser.readfp
ConfigParser.readfp was deprecated in Python 3.2 and removed in Python 3.12.

Under Fedora 40, the container fails to launch because it cannot parse its
configuration.

Fix by using the newer read_file().
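
A minimal sketch of the replacement call, assuming a sysconfig-style file
of plain KEY=value lines. `parse_sysconfig` and the synthetic `[global]`
section are hypothetical and are not taken from scylla_util; only
`ConfigParser.read_file()` is the real standard-library API:

```python
import configparser
import io


def parse_sysconfig(text):
    """Parse shell-style KEY=value text that has no section header.

    Hypothetical helper, not the actual sysconfig_parser from scylla_util.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    # ConfigParser requires a section, so prepend a synthetic one.
    stream = io.StringIO("[global]\n" + text)
    # read_file() is the replacement for readfp(), removed in Python 3.12.
    parser.read_file(stream)
    return dict(parser.items("global"))
```

Because `read_file()` accepts any iterable of lines, the same call works
for an open file object as well as for the in-memory stream used here.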

Closes scylladb/scylladb#19236
2024-06-12 10:07:10 +03:00
Benny Halevy
2ed81cbf84 locator/topology: update_node: format also shard_count in debug log message
The format string is missing `shard_count={}`

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#19242
2024-06-12 10:04:23 +03:00
Kefu Chai
4175e02d9d clustering_bounds_comparator: drop operator<< for bound_kind
turns out operator<< for bound_kind is not used anymore, so let's
drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19159
2024-06-11 18:01:06 +02:00
Avi Kivity
6608f49718 Merge 'make enable_compacting_data_for_streaming_and_repair truly live-update' from Botond Dénes
This config item is propagated to the table object via table::config. Although the field in `table::config`, used to propagate the value, was `utils::updateable_value<T>`, it was assigned a constant and so the live-update chain was broken.
This series fixes this and adds a test which fails before the patch and passes after. The test needed new test infrastructure, around the failure injection api, namely the ability to exfiltrate the value of internal variable. This infrastructure is also added in this series.

Fixes: https://github.com/scylladb/scylladb/issues/18674

- [x] This patch has to be backported because it fixes broken functionality

Closes scylladb/scylladb#18705

* github.com:scylladb/scylladb:
  test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update
  test/pylib: rest_client: add get_injection()
  api/error_injection: add getter for error_injection
  utils/error_injection: add set_parameter()
  replica/database: fix live-update enable_compacting_data_for_streaming_and_repair
2024-06-11 15:53:19 +03:00
Kefu Chai
d05db52d11 build: remove coverage compiling options from the cxx_flags
in 44e85c7d, we remove coverage compiling options from the cflags
when building abseil. but in 535f2b21, these options were brought
back as parts of cxx_flags.

so we need to remove them again from cxx_flags.
Fixes #19219
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19220
2024-06-11 14:58:27 +03:00
Pavel Emelyanov
b2520b8185 config: Mark tablets feature as unused
This feature used to be there for a while, but then it was removed by
83d491af02. This patch partially brings it back, but maps it to UNUSED,
so that if it is met in config, a warning is printed, and the other
features are still parsed.

refs: #18968

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-11 12:58:19 +03:00
Pavel Emelyanov
b85a02a3fe main: Warn unused features
When seeing an UNUSED feature -- print it into log. This is where the
enum_option::key is in use. The thing is that experimental features map
different unused feature names into the single UNUSED feature enum
value, so once the feature is parsed its configured name only persists
in the option's key member (saved by previous patch).
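
A rough Python model of why the key has to be kept, under the assumption
that several retired names share one enum value. `Feature`, `FEATURE_MAP`,
`EnumOption` and the feature names are all hypothetical stand-ins, not the
real C++ types:

```python
from dataclasses import dataclass
from enum import Enum


class Feature(Enum):
    UNUSED = 0
    UDF = 1


# Hypothetical name-to-value map: several retired names share UNUSED.
FEATURE_MAP = {
    "udf": Feature.UDF,
    "tablets": Feature.UNUSED,
    "lwt": Feature.UNUSED,
}


@dataclass(frozen=True)
class EnumOption:
    key: str        # the name exactly as written in the config
    value: Feature  # the enum value that the name maps to


def parse_features(names):
    """Turn configured names into options that remember their key."""
    return [EnumOption(name, FEATURE_MAP[name]) for name in names]


def unused_feature_names(options):
    """Names to warn about; the value alone cannot tell them apart."""
    return [opt.key for opt in options if opt.value is Feature.UNUSED]
```

With only the value stored, both "tablets" and "lwt" would collapse into
the same UNUSED entry and the warning could not name the offending option.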

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-11 12:56:51 +03:00
Pavel Emelyanov
0c0a7d9b9a enum_option: Carry optional key on board
It facilitates option formatting, but the main purpose is to be able to
find out the exact keys, not values, later (see next patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-11 12:55:14 +03:00
Pavel Emelyanov
f56cdb1cac enum_option: Remove on-board _map member
The map in question is immutable and can be obtained from the Mapper type
at any time, so there's no need to keep a copy of it on each enum_option

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-11 12:54:39 +03:00
Michael Litvak
afc9a1a8a6 db/hints: migrate sync point to host ID
Change the format of sync points to use host ID instead of IPs, to be
consistent with the use of host IDs in hinted handoff module.
Introduce sync point v3 format which is the same as v2 except it stores
host IDs instead of IPs.
The encoding of sync points now always uses the new v3 format with host
IDs.
The decoding supports both formats with host IDs and IPs, so a sync point
contains now a variant of either types, and in the case of the new
format the translation from IP to host ID is avoided.
2024-06-11 11:07:00 +02:00
Michael Litvak
b824e73418 db/hints: rename sync point structures with _v1 suffix to _v1_v2
rename sync point types and variables to have v1/v2 suffix according to
their use.
2024-06-11 11:05:59 +02:00
Avi Kivity
03e776ce3e Update tools/java submodule
* tools/java 88809606c8...01ba3c196f (3):
  > Revert "build: don't add nonexistent directory 'lib' to relocatable packages"
  > build: run antlr in a separate process
  > build: don't add nonexistent directory 'lib' to relocatable packages
2024-06-11 11:58:56 +03:00
Botond Dénes
8ef4fbdb87 test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update
Prevent the live-update feature of this config item from breaking
silently.
2024-06-11 04:17:48 -04:00
Botond Dénes
0c61b1822c test/pylib: rest_client: add get_injection()
The /v2/error_injection/{injection} endpoint now has a GET method too,
expose this.
2024-06-11 04:17:48 -04:00
Botond Dénes
feea609e37 api/error_injection: add getter for error_injection
Allow external code to obtain information about an error injection
point, including whether it is enabled, and importantly, what its
parameters are. Together with the `set_parameter()` added in the
previous patch, this allows tests to read out the values of internal
parameters, via a set_parameter() injection point.
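
The idea can be modelled in a few lines of Python. This is an in-memory
toy, not the real error-injection API: `InjectionRegistry` and its method
names are hypothetical, and the returned dictionary shape is an assumption
made for illustration only:

```python
class InjectionRegistry:
    """Toy model of injection points that carry a parameter map."""

    def __init__(self):
        self._injections = {}

    def enable(self, name, parameters=None):
        # Enabling an injection creates its parameter map.
        self._injections[name] = dict(parameters or {})

    def set_parameter(self, name, key, value):
        # Called from the code under test: records an internal value,
        # but only if the injection point is enabled.
        if name in self._injections:
            self._injections[name][key] = str(value)

    def get(self, name):
        # Called from the test: reports the state and the parameters.
        params = self._injections.get(name)
        return {"enabled": params is not None,
                "parameters": dict(params or {})}
```

A test would enable the injection, let the code under test call
`set_parameter()`, and then read the recorded value back through `get()`.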
2024-06-11 04:17:48 -04:00
Botond Dénes
4590026b38 utils/error_injection: add set_parameter()
Allow injection points to write values into the parameter map, which
external code can then examine. This allows exfiltrating the values of
internal variables, to be examined by tests, without exposing these
variables via an "official" path.
2024-06-11 04:17:48 -04:00
Pavel Emelyanov
1b9cedb3f3 test: Reduce failure detector timeout for failed tablets migration test
This test spends most of its time waiting for a node to die. The shorter timeout makes it almost 3x faster

Was
  real	9m21,950s
  user	1m11,439s
  sys	1m26,022s

Now
  real	3m37,780s
  user	0m58,439s
  sys	1m13,698s

refs: #17764

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19222
2024-06-11 09:55:06 +02:00
Calle Wilund
dfd996e7c1 describe_statement: Filter out "extension internal" keyspaces in DESC SCHEMA
Fixes /scylladb/scylla-enterprise#4168

Unless listing all (including system) keyspaces, filter out "extension internal"
keyspaces. These are to be considered "system" for the purposes of exposing to
end user.

Closes scylladb/scylladb#19214
2024-06-11 10:01:42 +03:00
Botond Dénes
dbccb61636 replica/database: fix live-update enable_compacting_data_for_streaming_and_repair
This config item is propagated to the table object via table::config.
Although the field in table::config, used to propagate the value, was
utils::updateable_value<T>, it was assigned a constant and so the
live-update chain was broken.
This patch fixes this.
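
A toy Python illustration of the broken chain, assuming nothing about the
real C++ classes: `ConfigItem` and `UpdateableValue` below are hypothetical
stand-ins. A value built from a constant keeps the snapshot it was given,
while a value bound to its source follows later updates:

```python
class ConfigItem:
    """Stand-in for a live-updateable config option."""

    def __init__(self, value):
        self.value = value


class UpdateableValue:
    """Stand-in: either tracks a source or holds a constant."""

    def __init__(self, source=None, constant=None):
        self._source = source
        self._constant = constant

    def get(self):
        if self._source is not None:
            return self._source.value  # follows live updates
        return self._constant          # frozen at construction time
```

Constructing the field from `item.value` corresponds to the bug described
above; constructing it from `item` itself corresponds to the fix.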
2024-06-11 01:15:20 -04:00
Raphael S. Carvalho
7b41630299 replica: Refresh mutation source when allocating tablet replicas
Consider the following:

1) table A has N tablets and views
2) migration starts for a tablet of A from node 1 to 2.
3) migration is at write_both_read_old stage
4) coordinator will push writes to both nodes (pending and leaving)
5) A has view, so writes to it will also result in reads (table::push_view_replica_updates())
6) tablet's update_effective_replication_map() is not refreshing tablet sstable set (for new tablet migrating in)
7) so read on step 5 is not being able to find sstable set for tablet migrating in

Causes the following error:
"tablets - SSTable set wasn't found for tablet 21 of table mview.users"

which means loss of write on pending replica.

The fix will refresh the table's sstable set (tablet_sstable_set) and cache's snapshot.
It's not a problem to refresh the cache snapshot as long as the logical
state of the data hasn't changed, which is true when allocating new
tablet replicas. That's also done in the context of compactions for example.

Fixes #19052.
Fixes #19033.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#19099
2024-06-11 06:59:04 +03:00
Calle Wilund
51c53d8db6 main/minio_server.py: Respect any preexisting AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY vars
Fixes scylladb/scylla-pkg#3845

Don't overwrite (or rather change) AWS credentials variables if already set in
enclosing environment. Ensures EAR tests for AWS KMS can run properly in CI.

v2:
* Allow environment variables in reading obj storage config - allows CI to
  use real credentials in env without risking putting them into less secure
  files
* Don't write credentials info from miniserver into config, instead use said
  environment vars to propagate creds.

v3:
* Fix python launch scripts to not clear environment, thus retaining above aws envs.
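
The "do not overwrite" rule can be sketched as follows. The function name
and the default values are hypothetical; only the two environment variable
names come from the commit title:

```python
def with_default_credentials(env, default_key, default_secret):
    """Return a copy of env where the AWS variables are set only if missing."""
    result = dict(env)
    # setdefault() leaves a preexisting value untouched.
    result.setdefault("AWS_ACCESS_KEY_ID", default_key)
    result.setdefault("AWS_SECRET_ACCESS_KEY", default_secret)
    return result
```

A launcher could pass `os.environ` as `env` and hand the result to the
child process, so that credentials provided by CI survive.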

Closes scylladb/scylladb#19086
2024-06-11 06:59:04 +03:00
Nadav Har'El
73dfa4143a cql-pytest: translate Cassandra's tests for SELECT DISTINCT
This is a translation of Cassandra's CQL unit test source file
DistinctQueryPagingTest.java into our cql-pytest framework.

The 5 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for one already-known issue:

Refs #10354: SELECT DISTINCT should allow filter on static columns,
             not just partition keys

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18971
2024-06-11 06:59:04 +03:00
Michał Chojnowski
823da140dd test_tablets: add test_tablet_storage_freeing
Tests that tablet storage is freed after it is migrated away.

Fixes #16946
2024-06-10 14:25:37 +02:00
Michał Chojnowski
7741491b47 test: pylib: add get_sstables_disk_usage()
Adds a util for measuring the disk usage of the given table on the given
node.
Will be used in a follow-up patch for testing that sstables are freed
properly.
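
The commit does not show how the helper measures usage. As a rough sketch
under the assumption that the table's files live in one local directory,
summing file sizes could look like this (`directory_disk_usage` is a
hypothetical name, not the pylib function):

```python
import os


def directory_disk_usage(path):
    """Sum the apparent sizes of all regular files under path, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

A storage-freeing test could compare this number before and after a
migration and expect it to drop.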
2024-06-10 14:25:37 +02:00
Pavel Emelyanov
b10ddcfd18 main: Register task manager API next to task manager itself
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-10 12:49:11 +03:00
Pavel Emelyanov
02c36ebd2e main: Register messaging API next to messaging service
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-10 12:49:02 +03:00
Pavel Emelyanov
f7e4724770 main: Register repair API next to repair service
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-10 12:48:51 +03:00
Anna Stuchlik
55ed18db07 doc: mark tablets as GA in the CREATE KEYSPACE section
This commit removes the information that tablets are an experimental feature
from the CREATE KEYSPACE section.

In addition, it removes the notes and cautions that are redundant when
a feature is GA, especially the information and warnings about the future
plans.

Fixes https://github.com/scylladb/scylladb/issues/18670

Closes scylladb/scylladb#19063
2024-06-10 12:36:36 +03:00
Kefu Chai
069be01451 lang: remove redundant std::move()
C++ standard enforces copy elision in this case. and copy elision is
more performant than constructing the return value with a move
constructor, so no need to use `std::move()` here.

and GCC-14 rightfully points this out:

```
/home/kefu/dev/scylladb/lang/lua.cc: In member function ‘data_value {anonymous}::from_lua_visitor::operator()(const utf8_type_impl&)’:
/var/ssd/scylladb/lang/lua.cc:797:25: error: redundant move in return statement [-Werror=redundant-move]
  797 |         return std::move(s);
      |                ~~~~~~~~~^~~
/home/kefu/dev/scylladb/lang/lua.cc:797:25: note: remove ‘std::move’ call
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19187
2024-06-10 07:41:25 +03:00
Botond Dénes
7b2aad56c4 test/boost/sstable_datafile_test: remove unused semaphores
The tests use the ones from test_env, the explicitly created ones are
unused.

Closes scylladb/scylladb#19167
2024-06-09 20:43:59 +03:00
Kefu Chai
535f2b2134 build: populate cxxflags to abseil
before this change, when building abseil, we don't pass cxxflags
to compiler, and abseil libraries are built with the default
optimization level. in the case of clang, its default optimization
level is `-O0`, it compiles the fastest, but the performance of
the emitted code is not optimized for runtime performance. but we
expect good performance for the release build. a typical command line
for building abseil looks like
```
clang++  -I/home/kefu/dev/scylladb/master/abseil -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o -MF absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o.d -o absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o -c /home/kefu/dev/scylladb/master/abseil/absl/base/internal/scoped_set_env.cc
```

so, in this change, we populate cxxflags to abseil, so that the
per-mode `-O` option can be populated when building abseil.

after this change, the command line building abseil in release mode
looks like

```
clang++  -I/home/kefu/dev/scylladb/master/abseil -ffunction-sections -fdata-sections  -O3 -mllvm -inline-threshold=2500 -fno-slp-vectorize -DSCYLLA_BUILD_MODE=release -g -gz -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o -MF absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o.d -o absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o -c /home/kefu/dev/scylladb/master/abseil/absl/flags/internal/commandlineflag.cc
```

Refs 0b0e661a85
Fixes #19161
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19160
2024-06-09 20:01:50 +03:00
Tomasz Grabiec
c8f71f4825 test: tablets: Fix flakiness of test_removenode_with_ignored_node due to read timeout
The check query may be executed on a node which doesn't yet see that
the downed server is down, as it is not shut down gracefully. The
query coordinator can choose the down node as a CL=1 replica for read
and time out.

To fix, wait for all nodes to notice the node is down before executing
the checking query.
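
The waiting step can be sketched with a generic polling helper. Both
functions are hypothetical and do not use the real test framework API;
`live_views` is assumed to map each node to the set of peers it currently
considers alive:

```python
import time


def wait_until(predicate, timeout=60.0, period=0.1):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met in time")
        time.sleep(period)


def all_see_node_down(live_views, down_node):
    """True once no node lists down_node among its live peers."""
    return all(down_node not in view for view in live_views.values())
```

The check query would run only after
`wait_until(lambda: all_see_node_down(views, node))` returns.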

Fixes #17938

Closes scylladb/scylladb#19137
2024-06-09 19:39:57 +03:00
Kefu Chai
b5dce7e3d0 docs: correct the link pointing to Scylla U
before this change it points to
https://university.scylladb.com/courses/scylla-operations/lessons/change-data-capture-cdc/
which then redirects the browser to
https://university.scylladb.com/courses/scylla-operations/,
but it should have pointed to
https://university.scylladb.com/courses/data-modeling/lessons/change-data-capture-cdc/

in this change, the hyperlink is corrected.

Fixes #19163
Refs 6e97b83b60
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19182
2024-06-09 19:37:21 +03:00
Avi Kivity
7b301f0cb9 Merge 'Encapsulate wasm and lua management in lang::manager service' from Pavel Emelyanov
After wasm udf appeared, code in main, create_function_statement and schema_tables got some involvements into details of wasm engine management. Also, even prior to this, there was duplication in how function context is created by statement code and schema_tables code.

This PR generalizes function context creation and encapsulates the management in sharded<lang::manager> service. Also it removes the wasm::startup_context thing and makes wasm start/stop be "classical" (see #2737)

Closes scylladb/scylladb#19166

* github.com:scylladb/scylladb:
  code: Enlighten wasm headers usage
  lang: Unfriend wasm context from manager
  lang, cql3, schema_tables: Don't mess with db::config
  lang: Don't use db::config to create lua context
  lang: Don't use db::config to create wasm context
  lang: Drop manager::precompile() method
  cql3, schema_tables: Generalize function creation
  wasm: Replace startup_context with wasm_config
  lang: Add manager::start() method
  lang: Move manager to lang namespace
  lang: Move wasm::manager to its .cc/.hh files
2024-06-09 19:32:26 +03:00
Kefu Chai
9318d21a22 sstables: change const_iterator::value_type to uint64_t
in general, the value_type of a `const_iterator` is `T` instead of
`const T`; what has the const specifier is `reference`. when
dereferencing an iterator by value, a const qualifier on the value type
does not matter, as the result is always a copy.

and GCC-14 points this out:

```
/home/kefu/dev/scylladb/sstables/compress.hh:224:13: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
  224 |             value_type operator*() const {
      |             ^~~~~~~~~~
/home/kefu/dev/scylladb/sstables/compress.hh:228:13: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
  228 |             value_type operator[](ssize_t i) const {
      |             ^~~~~~~~~~
```

so, in this change, let's change the value_type to `uint64_t`.
please note, it's not typical to return `value_type` from `operator*`
or `operator[]` of an iterator. but due to the design of
segmented_offsets, we cannot return a reference, so let's keep it
this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19186
2024-06-09 19:21:16 +03:00
Avi Kivity
b2a500a9a1 Merge 'alternator: keep TTL work in the maintenance scheduling group' from Botond Dénes
Alternator has a custom TTL implementation. This is based on a loop, which scans existing rows in the table, then decides whether each row has reached its end-of-life and deletes it if it did. This work is done in the background, and therefore it uses the maintenance (streaming) scheduling group. However, it was observed that part of this work leaks into the statement scheduling group, competing with user workloads and negatively affecting their latencies. This was found to be caused by the reads and writes done on behalf of the alternator TTL, which lose their maintenance scheduling group when they have to go to a remote node. This is because the messaging service was not configured to recognize the streaming scheduling group when statement verbs like reads or writes are invoked. The messaging service currently recognizes two statement "tenants": the user tenant (statement scheduling group) and system (default scheduling group), as we used to have only user-initiated operations and system (internal) ones. With alternator TTL, there is now a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group).
This series adds a streaming tenant to the messaging service configuration and it adds a test which confirms that with this change, alternator TTL is entirely contained in the maintenance scheduling group.

Fixes: #18719

- [x] Scans executed on behalf of alternator TTL are running in the statement group, disturbing user-workloads, this PR has to be backported to fix this.

Closes scylladb/scylladb#18729

* github.com:scylladb/scylladb:
  alternator, scheduler: test reproducing RPC scheduling group bug
  main: add maintenance tenant to messaging_service's scheduling config
2024-06-09 19:20:18 +03:00
Kefu Chai
58edee8d93 mutation/mutation_rebuilder: remove redundant std::move()
GCC-14 rightfully points out:

```
/var/ssd/scylladb/mutation/mutation_rebuilder.hh: In member function ‘const mutation& mutation_rebuilder::consume_new_partition(const dht::decorated_key&)’:
/var/ssd/scylladb/mutation/mutation_rebuilder.hh:24:36: error: redundant move in initialization [-Werror=redundant-move]
   24 |         _m = mutation(_s, std::move(dk));
      |                           ~~~~~~~~~^~~~
/var/ssd/scylladb/mutation/mutation_rebuilder.hh:24:36: note: remove ‘std::move’ call
```

as `dk` is passed with a const reference, `std::move()` does not help
the callee to consume from it. so drop the `std::move()` here.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19188
2024-06-09 19:19:37 +03:00
Nadav Har'El
13cf6c543d test/alternator: fix flaky test test_item_latency
The Alternator test test_metrics.py::test_item_latency confirms that
for several operation types (PutItem, GetItem, DeleteItem, UpdateItem)
we did not forget to measure their latencies.

The test checked that a latency was updated by checking that two metrics
increase:
    scylla_alternator_op_latency_count
    scylla_alternator_op_latency_sum

However, it turns out that the "sum" is only an approximate sum of all
latencies, and when the total sum grows large it sometimes does *not*
increase when a short latency is added to the statistics. When this
happens, this test fails on the assertion that the "sum" increases after
an operation. We saw this happening sometimes in CI runs.

The simple fix is to stop checking _sum at all, and only verify that
the _count increases - this is really an integer counter that
unconditionally increases when a latency is added to the histogram.
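
The commit does not say why the sum is approximate. As an analogy only,
and assuming a floating-point running total (which may not be the actual
mechanism), this toy shows how a sum can stay put while an integer count
always grows. `LatencyStats` is a hypothetical class:

```python
class LatencyStats:
    """Toy stats: an integer count plus a floating-point running sum."""

    def __init__(self, total=0.0):
        self.count = 0
        self.total = total

    def add(self, latency):
        self.count += 1        # exact: always increases by one
        self.total += latency  # inexact: a small addend can be absorbed
```

Starting from a total of 1e17, adding a latency of 1.0 leaves `total`
unchanged because the addend is below the spacing of adjacent doubles at
that magnitude, while `count` still goes up by one.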

Don't worry that the strength of this test is reduced - this test was
never meant to check the accuracy or correctness of the histograms -
we should have different (and better) tests for that, unrelated to
Alternator. The purpose of *this* test is only to verify that for some
specific operation like PutItem, Alternator didn't forget to measure its
latency and update the histogram. We want to avoid a bug like we had
in counters in the past (#9406).

Fixes #18847.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19080
2024-06-09 19:19:09 +03:00
Botond Dénes
37fd568139 sstables/compress.hh: remove unused forward declaration
struct compress is forward declared right before its definition. At some
point in the past there was probably some code there using it, but now
it's gone so remove it.

Closes scylladb/scylladb#19168
2024-06-09 17:52:05 +03:00
Guilherme Nogueira
cf157e4423 Remove comma that breaks CQL DML on tablets.rst
The current sample reads:

```cql
CREATE KEYSPACE my_keyspace
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'replication_factor': 3,
} AND tablets = {
    'enabled': false
};
```

The additional comma after `'replication_factor': 3` breaks the query execution.

Closes scylladb/scylladb#19177
2024-06-09 14:58:13 +03:00
Botond Dénes
6e3b997e04 docs: nodetool status: document keyspace and table arguments
Also fix the example nodetool status invocation.

Fixes: #17840

Closes scylladb/scylladb#18037
2024-06-09 00:37:12 +02:00
Kefu Chai
f4706be8a8 test: test_topology_ops: adapt to tablets
in e7d4e080, we reenabled the background writes in this test, but
when running with tablets enabled, background writes are still
disabled because of #17025, which was fixed last week. so we can
enable background writes with tablets.

in this change,

* background writes are enabled with tablets.
* increase the number of nodes by 1 so that we have enough nodes
  to fulfill the needs of tablets, which enforces that the number
  of replicas should always satisfy RF.
* pass rf to `start_writes()` explicitly, so we have less
  magic numbers in the test, and make the data dependencies
  more obvious.

Fixes #17589
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18707
2024-06-08 17:46:37 +02:00
Dawid Medrek
a5528a2093 db/hints: Log when ignoring invalid hint directories
In 58784cd, aa4b06a and other commits migrating
hinted handoff from IPs to host IDs (scylladb/scylladb#15567),
we started ignoring hint directories of invalid names,
i.e. those that represent neither an IP address, nor a host ID.
They remain on disk and are taken into account while computing
e.g. the total size of hints, but they're not used in any way.

These changes add logs informing the user when Scylla
encounters such a directory.

Closes scylladb/scylladb#17566
2024-06-07 19:19:15 +02:00
Michał Chojnowski
fee48f67ef storage_proxy: avoid infinite growth of _throttled_writes
storage_proxy has a throttling mechanism which attempts to limit the number
of background writes by forcefully raising CL to ALL
(it's not implemented exactly like that, but that's the effect) when
the amount of background and queued writes is above some fixed threshold.
If this is applied to a write, it becomes "throttled",
and its ID is appended to _throttled_writes.

Whenever the amount of background and queued writes falls below the threshold,
writes are "unthrottled" — some IDs are popped from _throttled_writes
and the writes represented by these IDs — if their handlers still exist —
have their CL lowered back.

The problem here is that IDs are only ever removed from _throttled_writes
if the number of queued and background writes falls below the threshold.
But this doesn't have to happen in any finite time, if there's constant write
pressure. And in fact, in one load test, it hasn't happened in 3 hours,
eventually causing the buffer to grow into gigabytes and trigger OOM.
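
A simplified model of the growth, not the storage_proxy code: the function
and its parameters are hypothetical, and draining is reduced to clearing
the whole queue. IDs are queued while the load is above the threshold and
are removed only when it falls below:

```python
from collections import deque


def throttled_backlog(load, threshold):
    """Return how many IDs remain queued after replaying the load samples.

    load is a sequence with one in-flight write count per incoming write.
    """
    throttled = deque()
    for write_id, inflight in enumerate(load):
        if inflight > threshold:
            throttled.append(write_id)  # write gets throttled
        else:
            throttled.clear()           # pressure dropped: unthrottle
    return len(throttled)
```

Under constant pressure the second branch is never taken, so the queue
length equals the number of writes seen so far and grows without bound.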

This patch is intended to be a good-enough-in-practice fix for the problem.

Fixes scylladb/scylladb#17476
Fixes scylladb/scylladb#1834

Closes scylladb/scylladb#19136
2024-06-07 15:56:23 +02:00
Gleb Natapov
34cf5c81f6 group0, topology coordinator: run group0 and the topology coordinator in gossiper scheduling group
Currently they both run in streaming group and it may become busy during
repair/mv building and affect group0 functionality. Move it to the
gossiper group where it should have more time to run.

Fixes scylladb/scylladb#18863

Closes scylladb/scylladb#19138
2024-06-07 15:31:44 +02:00
Pavel Emelyanov
bebd121936 code: Enlighten wasm headers usage
Now when function context creation is encapsulated in lang::manager,
some .cc files can stop using wasm-specific headers and just go with the
lang/manager.hh one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 13:07:05 +03:00
Pavel Emelyanov
ceebbc5948 lang: Unfriend wasm context from manager
The friendship was needed to get engine and instance cache from manager,
but there's a shorter way to create context with the info it needs.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 13:07:05 +03:00
Pavel Emelyanov
b0ffc03599 lang, cql3, schema_tables: Don't mess with db::config
Now function context creation is encapsulated in lang::manager so it's
possible to patch-out few more places that use database as config
provider.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 13:07:05 +03:00
Pavel Emelyanov
b854bf4b83 lang: Don't use db::config to create lua context
Similarly to previous patch, lua context needs db::config for creation.
It's better to get the configurables via lang::manager::config.

One thing to note -- lua config carries updateable_values on board, but
respective db::config options are _not_ LiveUpdate-able, so the lua
config could just use simple data types. This patch keeps updateable
values intact for brevity.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 13:07:05 +03:00
Pavel Emelyanov
783ccc0a74 lang: Don't use db::config to create wasm context
The manager needs to get two "fuel" configurables from db::config in
order to create context. Instead of carrying db config from callers,
keep the options on existing lang::manager::config and use them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 13:07:05 +03:00
Pavel Emelyanov
f277bd89f5 lang: Drop manager::precompile() method
It's not helping much any longer. Manager can call wasm:: stuff directly
with less code involved.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 13:07:05 +03:00
Pavel Emelyanov
882b2f4e9f cql3, schema_tables: Generalize function creation
When a function is created with the CREATE FUNCTION statement, the
statement handler does all the necessary preparations on its own. The
very same code exists in schema_tables, when the function is loaded on
boot. This patch generalizes both and keeps function language-specific
context creation inside lang/ code.

The creation function returns context via argument reference. It would
have been nicer if it was returned via future<>, but it's not suitable
for future<T> type :(

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 13:07:05 +03:00
Pavel Emelyanov
fe7ff7172d wasm: Replace startup_context with wasm_config
The lang::manager starts with the help of a context because it needs to
have std::shared_ptr<> pointing to cross-shard shared wasm engine and
runner thread. For that a context is created in advance, that then helps
sharing the engine and runner across manager instances.

This patch removes the "context" and replaces it with classical
manager::config. With it, it's lang::manager who's now responsible for
initializing itself.

In order to have cross-shard engine and thread pointers, the start()
method uses invoke_on_others() facility to share the pointer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 12:35:57 +03:00
Pavel Emelyanov
0dad72b736 lang: Add manager::start() method
Just like any other sharded<> service, the lang::manager now starts and
stops in a classical sequence of

  await sharded<manager>::start()
  defer([] { await sharded<manager>::stop() })
  await sharded<manager>::invoke_on_all(&manager::start)

For now the method is no-op, next patches will start using it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 12:35:57 +03:00
Pavel Emelyanov
f950469af5 lang: Move manager to lang namespace
And, while at it, rename the local variable that refers to it to "manager",
not "wasm". Query processor and database also have getters named
"wasm()", these are not renamed yet to keep patch smaller (and those
getters are going to be reworked further anyway).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 12:35:57 +03:00
Pavel Emelyanov
1dec79e97d lang: Move wasm::manager to its .cc/.hh files
It's going to become a facade in front of both -- wasm and lua, so keep
it in files with language independent names.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 12:35:57 +03:00
Marcin Maliszkiewicz
c13fea371c cql3: always return created event in create ks/table/type/view statement
In case multiple clients issue concurrently CREATE KEYSPACE IF NOT EXISTS
and later USE KEYSPACE it can happen that schema in driver's session is
out of sync because it syncs when it receives a special message from
CREATE KEYSPACE response.

Similar situation occurs with other schema change statements.

In this patch we fix only create keyspace/table/type/view statements
by always sending created event. Behavior of any other schema altering
statements remains unchanged.
2024-06-07 10:36:40 +02:00
Marcin Maliszkiewicz
f6108a72d3 cql3: auth: move auto-grant closer to resource creation code
This should reduce the risk of re-introducing issue similar to
the one fixed in ab6988c52f

When grant code is closer to actual creation code (announcing mutations)
there is lower chance of those two effects being triggered differently,
if we ever call grant_permissions_to_creator and not announce mutations
that's very likely a security vulnerability.

Additionally comment was rewritten to be more accurate.
2024-06-07 10:26:32 +02:00
Piotr Dulikowski
e18aeb2486 Merge 'mv: gossip the same backlog if a different backlog was sent in a response' from Wojciech Mitros
Currently, there are 2 ways of sharing a backlog with other nodes: through
a gossip mechanism, and with responses to replica writes. In gossip, we
check each second if the backlog changed, and if it did we update other
nodes with it. However if the backlog for this node changed on another
node with a write response, the gossiped backlog is currently not updated,
so if after the response the backlog goes back to the value from the previous
gossip round, it will not get sent and the other node will stay with an
outdated backlog - this can be observed in the following scenario:

1. Cluster starts, all nodes gossip their empty view update backlog to one another
2. On node N, `view_update_backlog_broker` (the backlog gossiper) performs an iteration of its backlog update loop, sees no change (backlog has been empty since the start), schedules the next iteration after 1s
3. Within the next 1s, coordinator (different than N) sends a write to N causing a remote view update (which we do not wait for). As a result, node N replies immediately with an increased view update backlog, which is then noted by the coordinator.
4. Still within the 1s, node N finishes the view update in the background, dropping its view update backlog to 0.
5. In the next and following iterations of `view_update_backlog_broker` on N, backlog is empty, as it was in step 2, so no change is seen and no update is sent due to the check
```
auto backlog = _sp.local().get_view_update_backlog();
if (backlog_published && *backlog_published == backlog) {
    sleep_abortable(gms::gossiper::INTERVAL, _as).get();
    continue;
}
```

After this scenario happens, the coordinator keeps outdated information about an increased view update backlog on N even though it's actually already empty.

This patch fixes this issue by notifying the gossip that a different backlog
was sent in a response, causing it to send an unchanged backlog to other
nodes in the following gossip round.
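
The bookkeeping described above can be sketched outside Seastar as follows. This is a minimal single-threaded model under stated assumptions, not the code of the patch: `backlog_publisher`, `note_sent_in_response` and `should_gossip` are hypothetical names, and the backlog is reduced to a single integer.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of the gossip-side bookkeeping; all names are illustrative.
class backlog_publisher {
    std::optional<std::uint64_t> _published; // last value sent through gossip
    bool _sent_other_in_response = false;    // a write response carried a different value
public:
    // Called when a write response piggybacks `backlog` to a coordinator.
    void note_sent_in_response(std::uint64_t backlog) {
        if (!_published || *_published != backlog) {
            _sent_other_in_response = true;
        }
    }

    // Called once per gossip round; returns true if `backlog` has to be gossiped.
    bool should_gossip(std::uint64_t backlog) {
        if (_published && *_published == backlog && !_sent_other_in_response) {
            return false; // nothing changed and no response diverged from gossip
        }
        _published = backlog;
        _sent_other_in_response = false;
        return true;
    }
};
```

In this model, an unchanged backlog is gossiped again exactly once after a response carried a different value, which covers steps 3 to 5 of the scenario.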

Fixes: https://github.com/scylladb/scylladb/issues/18461

Similarly to https://github.com/scylladb/scylladb/pull/18646, without admission control (https://github.com/scylladb/scylladb/pull/18334), this patch doesn't affect much, so I'm marking it as backport/none

Tests: manual. Currently this patch only affects the length of MV flow control delay, which is not reliable to base a test on. A proper test will be added when MV admission control is added, so we'll be able to base the test on rejected requests

Closes scylladb/scylladb#18663

* github.com:scylladb/scylladb:
  mv: gossip the same backlog if a different backlog was sent in a response
  node_update_backlog: divide adding and fetching backlogs
2024-06-07 10:20:21 +02:00
Marcin Maliszkiewicz
281c06ba2e cql3: extract create ks/table/type/view event code
So that the code in subsequent commit is cleaner.

Create function/aggregate code was not changed as it
would require bigger refactor.
2024-06-07 10:07:50 +02:00
Wojciech Mitros
4aa7ada771 exceptions: make view update timeouts inherit from timed_out_error
Currently, when generating and propagating view updates, if we notice
that we've already exceeded the time limit, we throw an exception
inheriting from `request_timeout_exception`, to later catch and
log it when finishing request handling. However, when catching, we
only check timeouts by matching the `timed_out_error` exception,
so the exception thrown in the view update code is not registered
as a timeout exception, but an unknown one. This can cause tests
which were based on the log output to start failing, as in the past
we were noticing the timeout at the end of the request handling
and using the `timed_out_error` to keep processing it and now, even
though we do notice the timeout even earlier, due to its type we
log an error, instead of treating it as a regular timeout.
In this patch we make the error thrown on timeout during view updates
inherit from `timed_out_error` instead of the `request_timeout_exception`
(it is also moved from the "exceptions" directory, where we define
exceptions returned to the user).
Aside from helping with the issue described above, we also improve our
metrics, as the `request_timeout_exception` is also not checked for
in the `is_timeout_exception` method, and because we're using it to
check whether we should update write timeout metrics, they will only
start getting updated after this patch.
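
A minimal sketch of why the base class matters, assuming a classifier that only matches `timed_out_error`. The types below are stand-ins, not the real Seastar/ScyllaDB classes, and `view_update_timeout` is a hypothetical name for the exception thrown by the view update code after this patch.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Stand-ins for the real classes; only the inheritance relationship matters here.
struct timed_out_error : std::exception {
    const char* what() const noexcept override { return "timedout"; }
};
struct request_timeout_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};
// Hypothetical name for the view update timeout after the change.
struct view_update_timeout : timed_out_error {};

// Simplified classifier: only exceptions derived from timed_out_error count as timeouts.
inline bool is_timeout_exception(std::exception_ptr ep) {
    try {
        std::rethrow_exception(ep);
    } catch (const timed_out_error&) {
        return true;
    } catch (...) {
        return false;
    }
}
```

With this hierarchy the view update timeout is matched by the `timed_out_error` handler, while an exception deriving from `request_timeout_exception` falls through to the generic handler.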

Closes scylladb/scylladb#19102
2024-06-07 09:54:48 +02:00
Kefu Chai
01568a36a5 .github: add workflow to build with clang nightly
to be prepared for changes from clang, and enjoy the new
warnings/errors from this compiler.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 14:23:06 +08:00
Kefu Chai
bbeabe2989 .github: rename clang-tidy-matcher.json to clang-matcher.json
as the matcher actually applies to all warnings from clang frontend,
and hence can be reused when building the tree with clang, so let's
rename it before using it in the clang build workflows.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 14:23:06 +08:00
Anna Stuchlik
582bafabb3 doc: set 6.0 as the latest stable version
This commit updates the configuration for ScyllaDB documentation so that:

6.0 is the latest version.
6.0 is removed from the list of unstable versions.

It must be merged when ScyllaDB 6.0 is released.

No backport is required.

Closes scylladb/scylladb#19003
2024-06-07 09:13:56 +03:00
Kefu Chai
571ab9f5f0 config: expand on rpc_keepalive's description
before this change, we used "RPC or native". before thrift support
was removed, "RPC" implied "thrift". now that we've dropped thrift
support, "RPC" could be confusing here, so let's be more specific,
and put all connection types in place of "RPC or native".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 09:23:10 +08:00
Kefu Chai
c75442bc2a api: s/rpc/thrift/
replace all occurrences of "rpc" in function names and debugging
messages to "thrift", as "rpc" is way too general, and since we
are removing "thrift" support, let's take this opportunity to
use a more specific name.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 09:23:10 +08:00
Kefu Chai
36239ec592 db/system_keyspace: drop thrift_version from system.local table
so we don't create new sstables with this unused column, but we
can still open old sstables of this table which was created with
the old schema.

Refs #3811
Refs #18416

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 09:23:10 +08:00
Kefu Chai
f688fa16bc transport: do not return client_type from cql_server::connection::make_client_key()
since we've dropped the thrift support, the `client_type` is always
`cql`, there is no need to differentiate different clients anymore.
so, we change `make_client_key()` so that it only returns the IP address
and port.

Refs #3811
Refs #18416

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 09:23:06 +08:00
Kefu Chai
0e04a033af .github: add alternator to iwyu's CLEANER_DIR
to avoid future violations of include-what-you-use.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 07:45:00 +08:00
Kefu Chai
a2f54ded80 alternator: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 07:45:00 +08:00
Kefu Chai
0ff66bf564 .github: change severity to error in clang-include-cleaner
since we've addressed all warnings, we are ready to tighten the
standards of this workflow, so that contributors are made aware of
the violation of include-what-you-use policy.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 07:28:52 +08:00
Kefu Chai
d33ab21ef8 exceptions: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 07:28:52 +08:00
Kefu Chai
ad649be1bf treewide: drop thrift support
thrift support was deprecated since ScyllaDB 5.2

> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.

so let's drop it. in this change,

* thrift protocol support is dropped
* all references to thrift support in document are dropped
* the "thrift_version" column in system.local table is
  preserved for backward compatibility, as we could load
  from an existing system.local table which still contains
  this column, so we need to write this column as well.
* "/storage_service/rpc_server" is only preserved for
  backward compatibility with java-based nodetool.
* `rpc_port` and `start_rpc` options are preserved, but
  they are marked as "Unused". so that the new release
  of scylladb can consume existing scylla.yaml configurations
  which might contain these settings. by making them
  deprecated, users will get warned, and can update
  their configurations before we actually remove them
  in the next major release.

Fixes #3811
Fixes #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 06:44:59 +08:00
Avi Kivity
cd553848c1 Merge 'auth-v2: use a single transaction in auth related statements ' from Marcin Maliszkiewicz
Due to the gradual introduction of raft into the statements code, we violated transactional logic in cases where a single statement modified more than one table, or where a mutation-producing function was composed out of simpler ones: statement execution was not atomic as a whole.

This patch changes that, so now either all changes resulting from statement execution are applied or none. Affected statements types are:
- schema modification
- auth modifications
- service levels modifications

Fixes https://github.com/scylladb/scylladb/issues/17738

Closes scylladb/scylladb#17910

* github.com:scylladb/scylladb:
  raft: rename mutations_collector to group0_batch
  raft: rename announce to commit
  cql3: raft: attach description to each mutations collector group
  auth: unify mutations_generator type
  auth: drop redundant 'this' keyword
  auth: remove no longer used code from standard_role_manager::legacy_modify_membership
  cql3: auth: use mutation collector for service levels statements
  cql3: auth: use mutation collector for alter role
  cql3: auth: use mutation collector for grant role and revoke role
  cql3: auth: use mutation collector for drop role and auto-revoke
  auth: add refactored modify_membership func in standard_role_manager
  auth: implement empty revoke_all in allow_all_authorizer
  auth: drop request_execution_exception handling from default_authorizer::revoke_all
  Revert "Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks"
  cql3: auth: use mutation collector for grant and revoke permissions
  cql3: extract changes_tablets function in alter_keyspace_statement
  cql3: auth: use mutation collector for create role statement
  auth: move create_role code into service
  auth: add a way to announce mutations having only client_state ref
  auth: add collect_mutations common helper
  auth: remove unused header in common.hh
  auth: add class for gathering mutations without immediate announce
  auth: cql3: use auth facade functions consistently on write path
  auth: remove unused is_enforcing function
2024-06-06 17:31:26 +03:00
Yaniv Michael Kaul
82875095e9 Raft: improve descriptions of metrics
1. Fixed a single typo (send -> sent)
2. Rephrase 'How many' to 'Number of' and use less passive tense.
3. Be more specific in the description of the different metrics instead of the more generic descriptions.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#19067
2024-06-06 15:18:47 +03:00
Kefu Chai
bac7e1e942 doc: document "enable_tablets" option
it sets the cluster feature of tablets, and is a prerequisite for
using tablets.

Refs #18670
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19090
2024-06-06 15:06:32 +03:00
Marcin Maliszkiewicz
63e6334a64 raft: rename mutations_collector to group0_batch 2024-06-06 13:26:34 +02:00
Kamil Braun
57e810c852 Merge 'Serialize repair with tablet migration' from Tomasz Grabiec
We want to exclude repair with tablet migrations to avoid races
between repair reads and writes with replica movement. Repair is not
prepared to handle topology transitions in the middle.

One reason why it's not safe is that repair may successfully write to
a leaving replica post streaming phase and consider all replicas to be
repaired, but in fact they are not, the new replica would not be
repaired.

Other kinds of races could result in repair failures. If repair writes
to a leaving replica which was already cleaned up, such writes will
fail, causing repair to fail.

Excluding works by keeping effective_replication_map_ptr in a version
which doesn't have table's tablets in transitions. That prevents later
transitions from starting because topology coordinator's barrier will
wait for that erm before moving to a stage later than
allow_write_both_read_old, so before any requests start using the new
topology. Also, if transitions are already running, repair waits for
them to finish.

A blocked tablet migration (e.g. due to down node) will block repair,
whereas before it would fail. Once admin resolves the cause of blocked migration,
repair will continue.

Fixes #17658.
Fixes #18561.

Closes scylladb/scylladb#18641

* github.com:scylladb/scylladb:
  test: pylib: Do not block async reactor while removing directories
  repair: Exclude tablet migrations with tablet repair
  repair_service: Propagate topology_state_machine to repair_service
  main, storage_service: Move topology_state_machine outside storage_service
  storage_srvice, toplogy: Extract topology_state_machine::await_quiesced()
  tablet_scheduler: Make disabling of balancing interrupt shuffle mode
  tablet_scheduler: Log whether balancing is considered as enabled
2024-06-06 11:27:03 +02:00
Kamil Braun
256517b570 Merge 'tablets: Filter-out left nodes in get_natural_endpoints()' from Tomasz Grabiec
The API already promises this, the comment on effective_replication_map says:
"Excludes replicas which are in the left state".

Tablet replicas on the replaced node are rebuilt after the node
already left. We may no longer have the IP mapping for the left node
so we should not include that node in the replica set. Otherwise,
storage_proxy may try to use the empty IP and fail:

  storage_proxy - No mapping for :: in the passed effective replication map

It's fine to not include it, because storage proxy uses keyspace RF
and not replica list size to determine quorum. The node is not coming
up, so no one should need to contact it.
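
As a worked example of the argument above, assuming the usual QUORUM definition of floor(RF / 2) + 1 (`quorum_for` is a hypothetical helper, not a ScyllaDB function):

```cpp
#include <cassert>
#include <cstddef>

// Quorum derived from the keyspace replication factor, not from the replica list size.
constexpr std::size_t quorum_for(std::size_t replication_factor) {
    return replication_factor / 2 + 1;
}
```

With RF=3 and one replica on a left node filtered out, the replica list has 2 entries, but the required quorum is still `quorum_for(3)`, that is 2, so the two remaining replicas can satisfy it.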

Users which need replica list stability should use the host_id-based API.

Fixes #18843

Closes scylladb/scylladb#18955

* github.com:scylladb/scylladb:
  tablets: Filter-out left nodes in get_natural_endpoints()
  test: pylib: Extract start_writes() load generator utility
2024-06-06 11:23:27 +02:00
Wojciech Mitros
f70f774e40 mv: gossip the same backlog if a different backlog was sent in a response
Currently, there are 2 ways of sharing a backlog with other nodes: through
a gossip mechanism, and with responses to replica writes. In gossip, we
check each second if the backlog changed, and if it did we update other
nodes with it. However if the backlog for this node changed on another
node with a write response, the gossiped backlog is currently not updated,
so if after the response the backlog goes back to the value from the previous
gossip round, it will not get sent and the other node will stay with an
outdated backlog.
This patch changes this by notifying the gossip that the backlog changed
since the last gossip round so a different backlog could have been sent
through the response piggyback mechanism. With that information, gossip
will send an unchanged backlog to other nodes in the following gossip round.

Fixes: https://github.com/scylladb/scylladb/issues/18461
2024-06-06 10:45:15 +02:00
Wojciech Mitros
272e80fe0a node_update_backlog: divide adding and fetching backlogs
Currently, we only update the backlogs in node_update_backlog at the
same time when we're fetching them. This is done using storage_proxy's
method get_view_update_backlog, which is confusing because it's a getter
with side-effects. Additionally, we don't always want to update the
backlog when we're reading it (as in gossip which is only on shard 0)
and we don't always want to read it when we're updating it (when we're
not handling any writes but the backlog drops due to background work
finish).

This patch divides the node_view_backlog::add_fetch as well the
storage_proxy::get_view_update_backlog both into two methods; one
for updating and one for reading the backlog. This patch only replaces
the places where we're currently using the view backlog getter, more
situations where we should get/update the backlog should be considered
in a following patch.
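
The split can be pictured with the sketch below. It is a hypothetical single-threaded stand-in, not the real `node_update_backlog`: the class name, the `add`/`fetch` signatures, and the choice of reporting the maximum over shards are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: one backlog slot per shard, updated and read separately.
class node_backlog_model {
    std::vector<std::uint64_t> _per_shard;
public:
    explicit node_backlog_model(std::size_t shards) : _per_shard(shards, 0) {}

    // Update only: record the current backlog of one shard, no read involved.
    void add(std::size_t shard, std::uint64_t backlog) {
        _per_shard.at(shard) = backlog;
    }

    // Read only: report the node-wide backlog without touching any slot.
    // Taking the maximum over shards is an assumption of this sketch.
    std::uint64_t fetch() const {
        if (_per_shard.empty()) {
            return 0;
        }
        return *std::max_element(_per_shard.begin(), _per_shard.end());
    }
};
```

A reader such as gossip would call only `fetch()`, while a shard whose backlog drops due to finished background work would call only `add()`.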
2024-06-06 10:45:13 +02:00
Botond Dénes
8ff1742182 Merge 'Relax production_snitch_base's property file parsing' from Pavel Emelyanov
It consists of a reading method and a parsing one, and it uses class fields to carry data between the two. The former is additionally built with curly continuation chains, while it's naturally linear, so turn it into a coroutine while at it

Closes scylladb/scylladb#18994

* github.com:scylladb/scylladb:
  snitch: Remove production_snitch_base::_prop_file_contents
  snitch: Remove production_snitch_base::_prop_file_size
  snitch: Coroutinize load_property_file()
2024-06-06 09:14:33 +03:00
Botond Dénes
cd10beb89d Merge 'Don't use db::config by gossiper' from Pavel Emelyanov
All sharded<service>'s are supposed to have their own config and not use global db::config one. The service config, in turn, is to be created by main/cql_test_env/whatever out of db::config and, maybe, other data. Gossiper is almost there, but it still uses db::config in a few places.

Closes scylladb/scylladb#19051

* github.com:scylladb/scylladb:
  gossiper: Stop using db::config
  gossiper: Move force_gossip_generation on gossip_config
  gossiper: Move failure_detector_timeout_ms on gossip_config
  main: Fix indentation after previous patch
  main: Make gossiper config a sharded parameter
  main: Add local variable for set of seeds
  main: Add local variable for group0 id
  main: Add local variable for cluster_name
2024-06-06 09:12:51 +03:00
Botond Dénes
44975abe18 Merge 'Sanitize start-stop of protocol servers' from Pavel Emelyanov
Protocol servers are started last, and are registered in storage_service, which stops them. Also there are deferred actions scheduled to stop protocol servers on aborted start and a FIXME asking to make even this case rely on storage_service. Also, there's a (rather rare) aborted-start bug in alternator and redis. Yet, thrift can be left started in some weird circumstances. This patch fixes it all. As a side effect, the start-stop code becomes shorter and a bit better structured.

refs: #2737

Closes scylladb/scylladb#19042

* github.com:scylladb/scylladb:
  main: Start alternator expiration service earlier
  main: Start redis transparently
  main: Start alternator transparently
  main: Start thrift transparently
  main: Start native transport transparently
  storage_service: Make register_protocol_server() start the server
  storage_service: Turn register_protocol_server() async method
  storage_service: Outline register_protocol_server()
  main: Schedule deferred drain_on_shutdown() prior to protocol servers
  main: Move some trailing startup earlier
2024-06-06 09:08:05 +03:00
Botond Dénes
db5c23491e Merge '.github: annotate the report from clang-include-cleaner' from Kefu Chai
this series

* add annotation to the github pull request when extraneous `#include` preprocessor directives are identified
* add `exceptions` subdirectory to `CLEANER_DIRS` to demonstrate the annotation. we will fix the identified issue in a follow-up change.
 ---
* This is a CI workflow improvement. No backporting is required.

Closes scylladb/scylladb#19037

* github.com:scylladb/scylladb:
  .github: add exception to CLEANER_DIRS
  .github: annotate the report from clang-include-cleaner
  .github: build headers before running clang-include-cleaner
2024-06-06 09:02:26 +03:00
Pavel Emelyanov
acc438e98b view-update-generator: Start in provided scheduling group
Currently it gets the streaming/maintenance one from database, but it
can as well just assume that it's already running in the correct one,
and the main code fulfils this assumption.

This removes one more place that uses database as sched groups provider.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19078
2024-06-06 08:58:05 +03:00
Tzach Livyatan
c30f81c389 Docs: fix start command in Update replace-dead-node.rst
Fix #18920

Closes scylladb/scylladb#18922
2024-06-06 08:56:07 +03:00
Botond Dénes
7aa9bfa661 Merge 'util/result_try: pass template arg list explicitly' from Kefu Chai
clang-19 introduced a change which enforces the change proposed by [CWG 96](https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#96), which was accepted by C++20 in [P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html), as [[temp.names]p5](https://eel.is/c++draft/temp.names#6).

so, to be future-proof and to be standard compliant, let's pass the
template arguments. otherwise we'd have build failure like
```
error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
```
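
A minimal illustration of the rule, not taken from `util/result_try`: `converter`, `make` and `convert_with` are hypothetical names. The commented-out line is the form the diagnostic above refers to.

```cpp
#include <cassert>

struct converter {
    template <typename T = int>
    T make(int v) const { return static_cast<T>(v); }
};

template <typename C>
long convert_with(const C& c, int v) {
    // `c` has a dependent type, so `template` is needed to name the member template.
    // return c.template make(v);     // no template argument list after `template`
    return c.template make<long>(v);   // explicit template argument list
}
```

Calling `convert_with(converter{}, 7)` instantiates `make<long>` and returns 7.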

---

no need to backport, as this change only addresses a FTBFS with a recent build of clang-19, and our CI does not use a clang built from llvm's main HEAD.

Closes scylladb/scylladb#19100

* github.com:scylladb/scylladb:
  util/result_try: pass template arg list explicitly
  util/result_try: pass func as `const F&` instead of `F&&`
2024-06-06 08:54:42 +03:00
Nadav Har'El
b5fd854c77 cql-pytest: be more forgiving to ancient versions of Scylla
We recently added to cql-pytest tests the ability to check if tablets
are enabled or not (for some tablet-specific tests). When running
tests against Cassandra or old pre-tablet versions of Scylla, this
fact is detected and "False" is returned immediately. However, we
still look at a system table which didn't exist on really ancient
versions of Scylla, and tests couldn't run against such versions.

The fix is trivial: if that system table is missing, just ignore the
error and return False (i.e., no tablets). There were no tablets on
such ancient versions of Scylla.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#19098
2024-06-06 08:53:26 +03:00
Pavel Emelyanov
4606302ead distributed_loader: Remove base_path from populator
It's unused, populator uses it to print debugging messages, but it can
as well use table->dir() for it, just as sstable_directory does. One
message looks useless and is removed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19113
2024-06-06 08:49:41 +03:00
Pavel Emelyanov
84f0bab27c hints/manager: Simplify hints dir evaluation
Currently the code wraps simple "if" with std::invoke over a lambda.
Also, the local variable that gets the result, is declared as const one,
which prevents it from being std::move()-d in the very next line.
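
The effect of the const qualifier can be shown with a small standalone sketch. `tracked` and the two helper functions are hypothetical and unrelated to the hints manager code; they only count which constructor runs.

```cpp
#include <cassert>
#include <utility>

// Records whether an instance was produced by a copy or by a move.
struct tracked {
    int copies = 0;
    int moves = 0;
    tracked() = default;
    tracked(const tracked& o) : copies(o.copies + 1), moves(o.moves) {}
    tracked(tracked&& o) noexcept : copies(o.copies), moves(o.moves + 1) {}
};

// std::move on a const object yields `const tracked&&`, which cannot bind to
// the move constructor, so the copy constructor is selected instead.
inline bool moving_const_copies() {
    const tracked t{};
    tracked out(std::move(t));
    return out.copies == 1 && out.moves == 0;
}

// Without const, the same expression selects the move constructor.
inline bool moving_mutable_moves() {
    tracked t{};
    tracked out(std::move(t));
    return out.copies == 0 && out.moves == 1;
}
```

Both functions return true, so dropping the const is what lets the `std::move()` in the next line actually move.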

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19106
2024-06-06 08:31:30 +03:00
Pavel Emelyanov
ad0e6b79fc replica: Remove all_datadir from keyspace config
This vector of paths is only used to generate the same vector of paths
for table config, but the latter already has all the needed info.

It's the part of the plan to stop using paths/directories in keyspaces
and tables, because with storage-options tables no longer keep their
data in "files on disk", so this information goes to sstables storage
manager (refs #12707)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19119
2024-06-06 08:30:34 +03:00
Kefu Chai
4a36918989 topology_coordinator: handle/wait futures when stopping topology_coordinator
before this change, unlike other services in scylla,
topology_coordinator is not properly stopped when it is aborted,
because the scylla instance is no longer a leader or is being shut down.
its `run()` method just stops the grand loop and bails out before
topology_coordinator is destroyed. but we are tracking the migration
state of tablets using a bunch of futures, which might not be
handled yet, and some of them could carry failures. in that case,
when the `future` instances with failure state get destroyed,
seastar calls `report_failed_future`. and seastar considers this
practice a source of bugs -- as one just fails to handle an error.
that's why we have following error:

```
WARN  2024-05-19 23:00:42,895 [shard 0:strm] seastar - Exceptional future ignored: seastar::rpc::unknown_verb_error (unknown verb), backtrace: /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56c14e /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56c770 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56ca58 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x38c6ad 0x29cdd07 0x29b376b 0x29a5b65 0x108105a /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff1df /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x400367 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff838 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36de58 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36d092 0x1017cba 0x1055080 0x1016ba7 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27b89 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27c4a 0x1015524
```
and the backtrace looks like:
```
seastar::current_backtrace_tasklocal() at ??:?
seastar::current_tasktrace() at ??:?
seastar::current_backtrace() at ??:?
seastar::report_failed_future(seastar::future_state_base::any&&) at ??:?
service::topology_coordinator::tablet_migration_state::~tablet_migration_state() at topology_coordinator.cc:?
service::topology_coordinator::~topology_coordinator() at topology_coordinator.cc:?
service::run_topology_coordinator(seastar::sharded<db::system_distributed_keyspace>&, gms::gossiper&, netw::messaging_service&, locator::shared_token_metadata&, db::system_keyspace&, replica::database&, service::raft_group0&, service::topology_state_machine&, seastar::abort_source&, raft::server&, seastar::noncopyable_function<seastar::future<service::raft_topology_cmd_result> (utils::tagged_tagged_integer<raft::internal::non_final, raft::term_tag, unsigned long>, unsigned long, service::raft_topology_cmd const&)>, service::tablet_allocator&, std::chrono::duration<long, std::ratio<1l, 1000l> >, service::endpoint_lifecycle_notifier&) [clone .resume] at topology_coordinator.cc:?
seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at main.cc:?
seastar::reactor::run_some_tasks() at ??:?
seastar::reactor::do_run() at ??:?
seastar::reactor::run() at ??:?
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ??:?
```

and even worse, these futures are indirectly owned by `topology_coordinator`.
so there are chances that they could be used even after `topology_coordinator`
is destroyed. this is a use-after-free issue. because the
`run_topology_coordinator` fiber exits when the scylla instance retires
from the leader's role, this could be fatal to a running instance,
as use-after-free is undefined behavior.

so, in this change, we handle the futures in `_tablets`, and note
down the failures carried by them if any.

Fixes #18745
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18991
2024-06-06 07:55:03 +03:00
Israel Fruchter
1fd600999b Update tools/cqlsh submodule v6.0.20
* tools/cqlsh c8158555...0d58e5ce (6):
  > cqlsh.py: fix server side describe after login command
  > cqlsh: try server-side DESCRIBE, then client-side
  > Refactor tests to accept both client and server side describe
  > github actions: support testing with enterprise release
  > Add the tab-completion support of SERVICE_LEVEL statements
  > reloc/build_reloc.sh: don't use `--no-build-isolation`

Closes scylladb/scylladb#18990
2024-06-06 07:32:05 +03:00
Tomasz Grabiec
2c3f7c996f test: pylib: Fetch all pages by default in run_async
Fetching only the first page is not the intuitive behavior expected by users.

This causes flakiness in some tests which generate variable amount of
keys depending on execution speed and verify later that all keys were
written using a single SELECT statement. When the amount of keys
becomes larger than page size, the test fails.

Fixes #18774

Closes scylladb/scylladb#19004
2024-06-05 18:07:24 +03:00
Tomasz Grabiec
5ca54a6e88 test: pylib: Do not block async reactor while removing directories
This fixes a problem where suite cleanup schedules lots of uninstall()
tasks for servers started in the suite, which schedules lots of tasks,
which synchronously call rmtree(). These take over a minute to finish,
which blocks other tasks for tests which are still executing.

In particular, this was observed to cause
ManagerClient.server_stop_gracefully() to time-out. It has a timeout
of 60 seconds. The server was stopped quickly, but the RESTful API
response was not processed in time and the call timed out when it got
the async reactor.
2024-06-05 16:11:22 +02:00
Tomasz Grabiec
98323be296 repair: Exclude tablet migrations with tablet repair
We want to exclude repair with tablet migrations to avoid races
between repair reads and writes with replica movement. Repair is not
prepared to handle topology transitions in the middle.

One reason why it's not safe is that repair may successfully write to
a leaving replica post streaming phase and consider all replicas to be
repaired, but in fact they are not, the new replica would not be
repaired.

Other kinds of races could result in repair failures. If repair writes
to a leaving replica which was already cleaned up, such writes will
fail, causing repair to fail.

Excluding works by keeping effective_replication_map_ptr in a version
which doesn't have table's tablets in transitions. That prevents later
transitions from starting because topology coordinator's barrier will
wait for that erm before moving to a stage later than
allow_write_both_read_old, so before any requests start using the new
topology. Also, if transitions are already running, repair waits for
them to finish.

Fixes #17658.
Fixes #18561.
2024-06-05 16:11:22 +02:00
Tomasz Grabiec
e97acf4e30 repair_service: Propagate topology_state_machine to repair_service 2024-06-05 16:11:22 +02:00
Tomasz Grabiec
c45ce41330 main, storage_service: Move topology_state_machine outside storage_service
It will be propagated to repair_service to avoid cyclic dependency:

storage_service <-> repair_service
2024-06-05 16:11:22 +02:00
Tomasz Grabiec
476c076a21 storage_service, topology: Extract topology_state_machine::await_quiesced()
Will be used later in a place which doesn't have access to storage_service
but does have access to topology_state_machine.

It's not necessary to start group0 operation around polling because
the busy() state can be checked atomically and if it's false it means
the topology is no longer busy.
2024-06-05 16:11:22 +02:00
Tomasz Grabiec
1513d6f0b0 tablet_scheduler: Make disabling of balancing interrupt shuffle mode
Tests will rely on that, they will run in shuffle mode, and disable
balancing around section which otherwise would be infinitely blocked
by ongoing shuffling (like repair).
2024-06-05 16:11:22 +02:00
Tomasz Grabiec
6c64cf33df tablet_scheduler: Log whether balancing is considered as enabled 2024-06-05 16:11:22 +02:00
Benny Halevy
b2fa954d82 gms: endpoint_state: get_dc_rack: do not assign to uninitialized memory
Assigning to a member of an uninitialized optional
does not initialize the object before assigning to it.
This resulted in AddressSanitizer detecting an attempted
double-free when the uninitialized string apparently
contained a bogus pointer.

The change emplaces the returned optional when needed
without resorting to the copy-assignment operator.
So it's not susceptible to assigning to uninitialized
memory, and it's more efficient as well.
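
A minimal sketch of the difference, assuming a hypothetical `dc_rack` struct and `make_dc_rack()` helper (neither is the actual ScyllaDB code). The broken pattern is kept in a comment because it is undefined behavior; the safe pattern constructs the value with `emplace()`:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a datacenter/rack pair; not the real ScyllaDB type.
struct dc_rack {
    std::string dc;
    std::string rack;
};

// Broken pattern (comment only, since it is undefined behavior):
//
//     std::optional<dc_rack> ret;   // disengaged: no dc_rack object exists yet
//     ret->dc = "dc1";              // assigns to a std::string that was never constructed
//
// operator-> on an empty optional does not create the contained object.

// Safe pattern: construct the value in place, then return the optional.
std::optional<dc_rack> make_dc_rack(std::optional<std::string> dc,
                                    std::optional<std::string> rack) {
    std::optional<dc_rack> ret;
    if (dc && rack) {
        // emplace() constructs the contained dc_rack; nothing is assigned
        // to uninitialized storage.
        ret.emplace(dc_rack{std::move(*dc), std::move(*rack)});
    }
    return ret;
}
```

The returned optional stays empty when either input is missing, so callers can still distinguish "unknown" from a real value.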

Fixes scylladb/scylladb#19041

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#19043
2024-06-05 13:09:01 +03:00
Kamil Braun
18f5d6fd89 Merge 'Fail bootstrap if ip mapping is missing during double write stage' from Gleb Natapov
If a node restarts just before it stores the bootstrapping node's IP, it will
not have an ID to IP mapping for the bootstrapping node, which may cause a failure
on the write path. Detect this and fail bootstrapping if it happens.

Closes scylladb/scylladb#18927

* github.com:scylladb/scylladb:
  raft topology: fix indentation after previous commit
  raft topology: do not add bootstrapping node without IP as pending
  test: add test of bootstrap where the coordinator crashes just before storing IP mapping
  schema_tables: remove unused code
2024-06-05 11:15:15 +02:00
Raphael S. Carvalho
3983f69b2d topology_experimental_raft/test_tablets: restore usage of check_with_down
e7246751b6 incorrectly dropped its usage in
test_tablet_missing_data_repair.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#19092
2024-06-05 10:11:02 +02:00
Kefu Chai
b7994ee4f6 util/result_try: pass template arg list explicitly
clang-19 introduced a change which enforces the change proposed
by [CWG 96](https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#96),
which was accepted by C++20 in
[P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html),
as [[temp.names]p5](https://eel.is/c++draft/temp.names#6).

so, to be future-proof and to be standard compliant, let's pass the
template arguments. otherwise we'd have build failure like
```
error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
```
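
A small sketch of the rule, assuming a hypothetical `noop_converter` with a static member template `invoke` and a hypothetical `call_through()` caller; the names only loosely follow the ones in the error output and are not the code in `utils/result_try.hh`:

```cpp
#include <cassert>
#include <utility>

// Hypothetical converter with a static member function template.
struct noop_converter {
    template <typename F, typename... Args>
    static decltype(auto) invoke(const F& f, Args&&... args) {
        return f(std::forward<Args>(args)...);
    }
};

template <typename Converter, typename Cb>
int call_through(const Cb& cb, int x) {
    // Converter is a dependent type, so the `template` keyword is needed to
    // name its member template. The keyword must then be followed by a
    // template argument list. The form without one,
    //
    //     Converter::template invoke(cb, x);
    //
    // is what triggers the error quoted above.
    return Converter::template invoke<Cb, int&>(cb, x);
}
```

Here `Args` is given as `int&`, so `Args&&` collapses to `int&` and the lvalue `x` binds as before.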

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-05 13:19:45 +08:00
Kefu Chai
e2158a0c72 util/result_try: pass func as const F& instead of F&&
as the functor passed to `invoke()` is not an rvalue, if we specify
the template parameter explicitly, clang errors out like:

```
/home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -I/home/kefu/dev/scylladb/build -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/build/rust -isystem /home/kefu/dev/scylladb/abseil -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT transport/CMakeFiles/transport.dir/RelWithDebInfo/server.cc.o -MF transport/CMakeFiles/transport.dir/RelWithDebInfo/server.cc.o.d -o transport/CMakeFiles/transport.dir/RelWithDebInfo/server.cc.o -c /home/kefu/dev/scylladb/transport/server.cc
In file included from /home/kefu/dev/scylladb/transport/server.cc:39:
/home/kefu/dev/scylladb/utils/result_try.hh:210:28: error: no matching function for call to 'invoke'
  210 |                     return Converter::template invoke<const Cb, const Ex&>(_cb, ex);
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/utils/result_try.hh:194:143: note: while substituting into a lambda expression here
  194 |         return [this, cont = std::forward<Continuation>(cont)] (bool& already_caught) mutable -> typename Converter::template wrapped_type<R> {
      |                                                                                                                                               ^
/home/kefu/dev/scylladb/utils/result_try.hh:327:40: note: in instantiation of function template specialization 'utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>::wrap_in_catch<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, (lambda at /home/kefu/dev/scylladb/utils/result_try.hh:518:76)>' requested here
  327 |                 first_handler.template wrap_in_catch<R, Converter, Continuation>(std::forward<Continuation>(cont)),
      |                                        ^
/home/kefu/dev/scylladb/utils/result_try.hh:518:54: note: in instantiation of function template specialization 'utils::internal::try_catch_chain_impl<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, 
utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>::invoke_in_try_catch<(lambda at /home/kefu/dev/scylladb/utils/result_try.hh:518:76)>' requested here
  518 |     result_type res = try_catch_chain_type::template invoke_in_try_catch<>([&fun] (bool&) { return fun(); }, handlers...);
      |                                                      ^
/home/kefu/dev/scylladb/transport/server.cc:484:83: note: in instantiation of function template specialization 'utils::result_try<(lambda at /home/kefu/dev/scylladb/transport/server.cc:484:94), utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>' requested here
  484 |         return utils::result_into_future<result_with_foreign_response_ptr>(utils::result_try([&] () -> result_with_foreign_response_ptr {
      |                                                                                   ^
/home/kefu/dev/scylladb/utils/result_try.hh:33:5: note: candidate function template not viable: expects an rvalue for 1st argument
   33 |     invoke(F&& f, Args&&... args) {
      |     ^      ~~~~~
/home/kefu/dev/scylladb/utils/result_try.hh:210:28: error: no matching function for call to 'invoke'
  210 |                     return Converter::template invoke<const Cb, const Ex&>(_cb, ex);
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/utils/result_try.hh:194:143: note: while substituting into a lambda expression here
  194 |         return [this, cont = std::forward<Continuation>(cont)] (bool& already_caught) mutable -> typename Converter::template wrapped_type<R> {
      |                                                                                                                                               ^
/home/kefu/dev/scylladb/utils/result_try.hh:327:40: note: in instantiation of function template specialization 'utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>::wrap_in_catch<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, (lambda at /home/kefu/dev/scylladb/utils/result_try.hh:194:16)>' requested here
  327 |                 first_handler.template wrap_in_catch<R, Converter, Continuation>(std::forward<Continuation>(cont)),
      |                                        ^
/home/kefu/dev/scylladb/utils/result_try.hh:326:79: note: in instantiation of function template specialization 'utils::internal::try_catch_chain_impl<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>::invoke_in_try_catch<(lambda at 
/home/kefu/dev/scylladb/utils/result_try.hh:194:16)>' requested here
  326 |         return try_catch_chain_impl<R, Converter, CatchHandlers...>::template invoke_in_try_catch<>(
      |                                                                               ^
/home/kefu/dev/scylladb/utils/result_try.hh:518:54: note: in instantiation of function template specialization 'utils::internal::try_catch_chain_impl<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, 
utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>::invoke_in_try_catch<(lambda at /home/kefu/dev/scylladb/utils/result_try.hh:518:76)>' requested here
  518 |     result_type res = try_catch_chain_type::template invoke_in_try_catch<>([&fun] (bool&) { return fun(); }, handlers...);
      |                                                      ^
/home/kefu/dev/scylladb/transport/server.cc:484:83: note: in instantiation of function template specialization 'utils::result_try<(lambda at /home/kefu/dev/scylladb/transport/server.cc:484:94), utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>' requested here
  484 |         return utils::result_into_future<result_with_foreign_response_ptr>(utils::result_try([&] () -> result_with_foreign_response_ptr {
      |                                                                                   ^
/home/kefu/dev/scylladb/utils/result_try.hh:33:5: note: candidate function template not viable: expects an rvalue for 1st argument
   33 |     invoke(F&& f, Args&&... args) {
      |     ^      ~~~~~
```

so to prepare for the change to pass template parameter explicitly,
let's pass `f` as a `const` reference, instead of as an rvalue reference.
also, this parameter type matches with our usage case -- we always
pass a member variable `_cb` to `invoke`, and we don't expect that
`invoke()` would move it away.
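
The reasoning can be illustrated with a short sketch. `invoke_cref` and `handler` are hypothetical names used only for illustration, with `int` standing in for the exception type:

```cpp
#include <cassert>
#include <utility>

// With F deduced, `F&&` is a forwarding reference and accepts lvalues.
// With F given explicitly as a non-reference type, `F&&` is a plain
// rvalue reference, so an lvalue argument no longer binds:
//
//     template <typename F> int invoke_rvalue(F&& f) { return f(); }
//     invoke_rvalue<const Cb>(_cb);   // error: expects an rvalue for 1st argument
//
// Taking the callable by const reference works with explicit arguments.
template <typename F, typename... Args>
decltype(auto) invoke_cref(const F& f, Args&&... args) {
    return f(std::forward<Args>(args)...);
}

// Hypothetical handler holding a callback member, like `_cb` in the message.
template <typename Cb>
struct handler {
    Cb _cb;
    int run(const int& ex) const {
        // _cb is an lvalue member; it is only called, never moved from.
        return invoke_cref<const Cb, const int&>(_cb, ex);
    }
};
```

Because the callback is never moved from, the same handler can be run repeatedly.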

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-05 13:19:40 +08:00
Kefu Chai
cfd6084edd Update seastar submodule
* seastar 914a4241...9ce62705 (18):
  > github: do not set --dpdk-machine haswell
  > io_tester: correct calculation of writes count
  > io-tester.md: update information about file size
  > reactor: align used hint for extent size to 128KB for XFS
  > Fix compilation failure on Ubuntu 22.04
  > io_tester: align the used file size to 1MB
  > circular_buffer_fixed_capacity: arrow operator instead of . operator
  > posix-file-impl: Do not keep device-id on board
  > github: s/clang++-18/clang++/
  > include: include used headers
  > include: include used headers
  > iotune: allow user to set buffer size for random IO
  > abort_source: add method to get exception pointer
  > github: cancel a job if it takes longer than 40 minutes
  > std-compat: remove #include:s which were added for pre C++17
  > perf_tests: measure and report also cpu cycles
  > linux_perf_events: add user_cpu_cycles_retired
  > linux_perf_event: user_instructions_retired: exclude_idle

Closes scylladb/scylladb#19019
2024-06-05 08:13:55 +03:00
Michał Chojnowski
c901139d07 scylla-gdb.py: print coroutine names in scylla fiber
Enriches the output of `scylla fiber` with resolved names of coroutine resume functions.

Before:

```
[shard  2] #0  (task*) 0x0000602004c9fbf0 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16
[shard  2] #1  (task*) 0x0000602000344c90 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16
[shard  2] #2  (task*) 0x0000602004b30c50 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16
```

After:

```
[shard  2] #0  (task*) 0x0000602004c9fbf0 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16  (.resume is seastar::future<void> sstables::parse<unsigned int, std::pair<sstables::metadata_type, unsigned int> >(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::disk_array<unsigned int, std::pair<sstables::metadata_type, unsigned int> >&) [clone .resume] )
[shard  2] #1  (task*) 0x0000602000344c90 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16  (.resume is sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&) [clone .resume] )
[shard  2] #2  (task*) 0x0000602004b30c50 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16  (.resume is sstables::sstable::read_simple<(sstables::component_type)8, sstables::statistics>(sstables::statistics&)::{lambda(sstables::sstable_version_types, seastar::file&&, unsigned long)#1}::operator()(sstables::sstable_version_types, seastar::file&&, unsigned long) const [clone .resume] )
```

Closes scylladb/scylladb#19091
2024-06-04 22:32:17 +03:00
Pavel Emelyanov
dcc083110d gossiper: Stop using db::config
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-04 20:19:47 +03:00
Pavel Emelyanov
00d8590d7e gossiper: Move force_gossip_generation on gossip_config
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-04 20:19:47 +03:00
Pavel Emelyanov
e3abc5d2fd gossiper: Move failure_detector_timeout_ms on gossip_config
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-04 20:19:47 +03:00
Pavel Emelyanov
53906aa431 main: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-04 20:19:47 +03:00
Pavel Emelyanov
fcab847f31 main: Make gossiper config a sharded parameter
Next patches will put updateable_value's on it, but a plain copy of them
across shards doesn't work (see #7316)

Indentation is deliberately left broken

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-04 20:19:26 +03:00
Pavel Emelyanov
77361e1661 main: Add local variable for set of seeds
Next patch will do seeds assignment to gossiper config on each
shard, so it's good to have it once, then copy around

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-04 20:18:47 +03:00
Pavel Emelyanov
9c719a0a02 main: Add local variable for group0 id
Next patch will do group0_id assignment to gossiper config on each
shard, so it's good to have it once, then copy around

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-04 20:17:58 +03:00
Pavel Emelyanov
b069544d16 main: Add local variable for cluster_name
It's modified if it's empty. The next patch will make this code be called on
each shard, so the modification must happen only once

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-04 20:17:58 +03:00
Marcin Maliszkiewicz
ac0e164a6b raft: rename announce to commit
The old wording was derived from existing code which
originated from schema code. The name "commit" better
describes what we do here.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
370a5b547e cql3: raft: attach description to each mutations collector group
This description is readable from the raft log table.
Previously a single description was provided for the whole
announce call, but since it can now contain mutations from
various subsystems, the description was moved to the
add_mutation(s)/add_generator function calls.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
3289fbd71e auth: unify mutations_generator type
mutation_collector supports generators but it was added to
/service/raft code so it couldn't depend on /auth/ but once
it's added we can remove generator type from /auth/ as it
can depend on /service/raft.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
64b635bb58 auth: drop redundant 'this' keyword 2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
b639350933 auth: remove no longer used code from standard_role_manager::legacy_modify_membership
Since we gradually switched all auth-v2 code paths
to use modify_membership it's now safe to delete unused code.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
a88b7fc281 cql3: auth: use mutation collector for service levels statements
This is done to achieve single transaction semantics.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
97a5da5965 cql3: auth: use mutation collector for alter role
This is done to achieve single transaction semantics.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
a12c8ebfce cql3: auth: use mutation collector for grant role and revoke role
This is done to achieve single transaction semantics.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
5ba7d1b116 cql3: auth: use mutation collector for drop role and auto-revoke
The main theme of this commit is executing drop
keyspace/table/aggregate/function statements in a single
transaction together with auth auto-revoke logic.
This is the logic which cleans related permissions after
resource is deleted.

It contains several parts which couldn't easily be split
into separate commits, mainly because mutation collector related
paths can't be mixed together. It would require holding multiple
guards, which we don't support. Another reason is that with the mutation
collector the changes are announced in a single place, at the end
of statement execution. If we announced something in the middle,
it would lead to a raft concurrent modification infinite loop, as it would
invalidate our guard taken at the beginning of statement execution.

So this commit contains:

- moving auto-revoke code to statement execution from migration_listener
 * only for auth-v2 flow, to not break the old one
 * it's now executed during statement execution and not merging schemas,
   which means it produces mutations once as it should and not on each
   node separately
 * on_before callback family wasn't used because I consider it much
   less readable code. Long term we want to remove
   auth_migration_listener.

- adding mutation collector to revoke_all
 * auto-revoke uses this function so it had to be changed,
   auth::revoke_all free function wrapper was added as cql3
   layer should not use underlying_authorizer() directly.

- adding mutation collector to drop_role
 * because it depends on revoke_all and we can't mix old and new flows
 * we need to switch all functions auth::drop_role call uses
 * gradual use of previously introduced modify_membership, otherwise
   we would need to switch even more code in this commit
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
9ca15a3ada auth: add refactored modify_membership func in standard_role_manager
The new function is simplified and handles only auth-v2 flow
with mutation_collector (single transaction logic).

It's not used in this commit and we'll switch code paths
gradually in subsequent commits.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
f67761f5b6 auth: implement empty revoke_all in allow_all_authorizer
There is no need to throw an exception because it was
always ignored later with an empty catch block.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
75ccab9693 auth: drop request_execution_exception handling from default_authorizer::revoke_all
The change applies only to auth-v2 code path.

It seems nothing in the code except cdc and truncate
throws this exception so it's probably dead code.

I'll keep it for now in other places to not accidentally
break things in auth-v1, in auth-v2 even if this exception
is used it should likely fail the query because otherwise
data consistency is silently violated.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
01fb43e35f Revert "Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks"
This reverts commit 80ed442be2.

This logic was replaced in previous commit by dynamic cast.
Hopefully even this cast will be eliminated in the future.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
0573fee2a9 cql3: auth: use mutation collector for grant and revoke permissions
This is done to achieve single transaction semantics.

The change includes auto-grant feature. In particular
for schema related auto-grant we don't use normal
mutation collector announce path but follow migration manager,
this may be unified in the future.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
9ddfc2ce4b cql3: extract changes_tablets function in alter_keyspace_statement
It will be used outside this class in the following commit
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
2a6cfbfb33 cql3: auth: use mutation collector for create role statement
This is done to achieve single transaction semantics.

grant_permissions_to_creator is logically part of create role
but its change will be included in following commits
as it spans multiple usages.

Additionally, we disabled rollback during create role as
it won't work and is not needed with single transaction logic.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
e4a83008b6 auth: move create_role code into service
We need this later as we'll add condition
based on legacy_mode(qp) and free function
doesn't have access to qp.

Moreover long term we should get rid of this
weird free function pattern bloat.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
6f654675c6 auth: add a way to announce mutations having only client_state ref
Statement code only has access to client_state, from
which it takes auth::service. It has neither abort_source
nor group0_client, so we need to add them to auth::service.

Additionally since abort_source can't be const the whole
announce_mutations method needs non const auth::service
so we need to remove const from the getter function.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
47864b991a auth: add collect_mutations common helper
It will be used in subsequent commits.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
b2cbcb21e8 auth: remove unused header in common.hh 2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
7e0a801f53 auth: add class for gathering mutations without immediate announce
To achieve write atomicity across different tables we need to announce
mutations in a single transaction. So instead of each function doing
a separate announce we need to collect mutations and announce them once
at the end.
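
A minimal Python sketch of the collect-then-announce idea described above. The class name `MutationCollector` and the `announce_fn` callback are hypothetical stand-ins, not the actual ScyllaDB (C++) interfaces; the sketch only shows callers appending mutations and a single announce publishing them together.

```python
class MutationCollector:
    """Hypothetical collector: gathers mutations instead of announcing each one."""

    def __init__(self):
        self._mutations = []

    def add(self, mutation):
        # Record the mutation; nothing is announced yet.
        self._mutations.append(mutation)

    def announce(self, announce_fn):
        # Hand every collected mutation to announce_fn in one call, then reset.
        batch, self._mutations = self._mutations, []
        if batch:
            announce_fn(batch)
        return len(batch)


announced = []  # each entry models one announced transaction
collector = MutationCollector()
collector.add("roles: insert alice")
collector.add("permissions: grant alice")
collector.announce(announced.append)
```

Both mutations end up in one entry of `announced`, which is the property the commit is after: writes to different tables travel in a single transaction.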
2024-06-04 15:43:04 +02:00
Piotr Dulikowski
01ff8108c1 Merge 'db/hints: Use host ID to IP mappings to choose the ep manager to drain when node is leaving' from Dawid Mędrek
In d0f5873, we introduced mappings IP–host ID between hint directories and the hint endpoint managers managing them. As a consequence, it may happen that one hint directory stores hints towards multiple nodes at the same time. If any of those nodes leaves the cluster, we should drain the hint directory. However, before these changes that doesn't happen – we only drain it when the node of the same host ID as the hint endpoint manager leaves the cluster.

This PR fixes that draining issue in the pre-host-ID-based hinted handoff. Now no matter which of the nodes corresponding to a hint directory leaves the cluster, the directory will be drained.
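
The rule stated above can be sketched in Python as follows. The mapping `dir_to_hosts` and the function name are illustrative assumptions, not the real hint manager data structures; the sketch only shows that a directory is selected for draining when any of the hosts it holds hints for leaves.

```python
def directories_to_drain(dir_to_hosts, leaving_host):
    """Return hint directories holding hints towards leaving_host.

    dir_to_hosts is a hypothetical mapping: directory name -> set of host IDs
    whose hints are stored in that directory.
    """
    return sorted(d for d, hosts in dir_to_hosts.items() if leaving_host in hosts)


# One directory (named after an IP) ended up with hints for two nodes.
mapping = {"10.0.0.1": {"host-n1", "host-n2"}, "10.0.0.3": {"host-n3"}}
to_drain = directories_to_drain(mapping, "host-n2")
```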

We also introduce error injections to be able to test that it indeed happens.

Fixes scylladb/scylladb#18761

Closes scylladb/scylladb#18764

* github.com:scylladb/scylladb:
  db/hints: Introduce an error injection to test draining
  db/hints: Ensure that draining happens
2024-06-04 10:17:14 +02:00
Botond Dénes
d120f0d7d3 Merge 'tasks: introduce task manager's task folding' from Aleksandra Martyniuk
Task manager's tasks stay in memory after they are finished.
Moreover, even if a child task is unregistered from task manager,
it is still alive since its parent keeps a foreign pointer to it. Also,
when a task has finished successfully there is no point in keeping
all of its descendants in memory.

The patch introduces folding of task manager's tasks. Whenever
a task which has a parent is finished it is unregistered from task
manager and foreign_ptr to it (kept in its parent) is replaced
with its status. Children's statuses of the task are dropped unless
they or one of their descendants failed. So for each operation we
keep a tree of tasks which contains:
- a root task and its direct children (status if they are finished, a task
  otherwise);
- running tasks and their direct children (same as above);
- a statuses path from root to failed tasks.
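
A rough Python sketch of the folding rule described above. `TaskStatus`, `has_failure` and `fold` are hypothetical names chosen for illustration (the real implementation is C++ and is not shown in this log); the sketch assumes a finished task is reduced to a status whose child statuses are kept only if their subtree contains a failure.

```python
from dataclasses import dataclass, field


@dataclass
class TaskStatus:
    """Hypothetical stand-in for what is kept about a finished task."""
    name: str
    failed: bool
    children: list = field(default_factory=list)


def has_failure(status):
    # True if this status or any of its descendants failed.
    return status.failed or any(has_failure(c) for c in status.children)


def fold(name, failed, child_statuses):
    # Fold a finished task: drop child statuses unless their subtree failed.
    kept = [c for c in child_statuses if has_failure(c)]
    return TaskStatus(name, failed, kept)


# root -> A -> {A1 (ok), A2 (failed)}; children finish first, then A.
a1 = fold("A1", False, [])
a2 = fold("A2", True, [])
a = fold("A", False, [a1, a2])
```

After folding, `a.children` holds only the status of `A2`, so the path from the root to the failed task survives while the successful sibling is dropped.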

/task_manager/wait_task/ does not unregister tasks anymore.

Refs: #16694.

- [ ] ** Backport reason (please explain below if this patch should be backported or not) **
Requires backport to 6.0 as task number exploded with tablets.

Closes scylladb/scylladb#18735

* github.com:scylladb/scylladb:
  docs: describe task folding
  test: rest_api: add test for task tree structure
  test: rest_api: modify new_test_module
  tasks: test: modify test_task methods
  api: task_manager: do not unregister task in /task_manager/wait_task/
  tasks: unregister tasks with parents when they are finished
  tasks: fold finished tasks info their parents
  tasks: make task_manager::task::impl::finish_failed noexcept
  tasks: change _children type
2024-06-04 08:43:44 +03:00
Pavel Emelyanov
9e65434692 main: Start alternator expiration service earlier
Prior to registering drain_on_shutdown and all the protocol servers,
to keep the natural sequence:

- start core
- register drain-on-shutdown
- start transport(s)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-03 23:01:17 +03:00
Pavel Emelyanov
d7c231ede9 main: Start redis transparently
It's now possible to start protocol server when registered. It will also
be stopped automatically on shutdown / aborted shutdown.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-03 23:01:17 +03:00
Pavel Emelyanov
4204d7f4f9 main: Start alternator transparently
It's now possible to start protocol server when registered. It will also
be stopped automatically on shutdown / aborted shutdown.

Also move the controller variable lower to keep it all next to each
other.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-03 23:01:17 +03:00
Pavel Emelyanov
d3e1121793 main: Start thrift transparently
It's now possible to start protocol server when registered. It will also
be stopped automatically on shutdown / aborted shutdown.

It also fixes a rare bug. If thrift is not asked to be started on boot,
its deferred shutdown action isn't created, so if it's later started via
the API, it won't be stopped on shutdown.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-03 23:01:17 +03:00
Pavel Emelyanov
830a87e862 main: Start native transport transparently
It's now possible to start protocol server when registered. It will also
be stopped automatically on shutdown / aborted shutdown.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-03 23:01:17 +03:00
Marcin Maliszkiewicz
09b26208e9 auth: cql3: use auth facade functions consistently on write path
Auth interface is quite mixed-up but general rule is that cql
statements code calls auth::* free functions from auth/service.hh
to execute auth logic.

There are many exceptions where underlying_authorizer or
underlying_role_manager or auth::service method is used instead.
Service should not leak its internal APIs to upper layers so
functions like underlying_role_manager should not exist.

In this commit we fix a tiny fragment related to the auth write path.
2024-06-03 14:27:13 +02:00
Marcin Maliszkiewicz
126c82a6f5 auth: remove unused is_enforcing function 2024-06-03 14:27:13 +02:00
Wojciech Mitros
2cafa573df mv: update the backlogs when view updates finish
Currently, the backlog used for MV flow control is only updated
after we generate view updates as a result of a write request.
However, when the resources are no longer used, we should also
notice that to prevent excessive slowdowns caused by the MV
flow control calculating the delays based on an outdated, large
backlog.
This patch makes it so the backlogs are updated every time
a view update finishes, and not only when the updates start.

Fixes #18783

Closes scylladb/scylladb#18804
2024-06-03 14:10:49 +03:00
Avi Kivity
f133ae945a Merge 'repair: Introduce new primary replica selection algorithm for tablets' from Benny Halevy
Tablet allocation does not guarantee fairness of
the first replica in the replicas set across dcs.
The lack of this fix causes the following dtest to fail:
repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc

Use the tablet_map get_primary_replica or get_primary_replica_within_dc,
respectively to see if this node is the primary replica for each tablet
or not.

Fixes https://github.com/scylladb/scylladb/issues/17752

No backport is required before 6.0 as tablets (and tablet repair) are introduced in 6.0

Closes scylladb/scylladb#18784

* github.com:scylladb/scylladb:
  repair: repair_tablets: use get_primary_replica
  repair: repair_tablets: no need to check ranges_specified per tablet
  locator: tablet_map: add get_primary_replica_within_dc
  locator: tablet_map: get_primary_replica: do not copy tablet info
  locator: tablet_map: get_primary_replica: return tablet_replica
2024-06-03 13:16:49 +03:00
Kefu Chai
0da0461668 build: cmake: do not scan for C++20 modules
when creating the build rules using CMake 3.28 and up, it generates
the rules to scan for C++20 modules for C++20 projects by default.
but this slows down the compilation, and introduces unnecessary
dependencies for each of the targets when building .cc files. also,
it prevents the static analysis tools from running from a repo which
only has its build system generated, but not yet built. as,
these tools would need to process the source files just like a compiler
does, and if any of the included header files is missing, they just
fail.

so, before we migrate to C++20 modules, let's disable this feature.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19038
2024-06-03 12:51:40 +03:00
Pavel Emelyanov
9292d326b7 storage_service: Make register_protocol_server() start the server
After a protocol server is registered, it can be instantly started by
the main code. It makes sense to generalize this sequence by teaching
register_protocol_server() start it.

For now it's a no-op change, as "start_instantly" is false by default,
but next patches will make use of it.
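
A small Python model of the registration sequence described above. `ProtocolServer` and `StorageService` here are simplified, hypothetical stand-ins for the C++ classes; only the names `register_protocol_server`, `start_instantly` and `stop_transport` are taken from this series, and their exact signatures are assumed.

```python
import asyncio


class ProtocolServer:
    """Hypothetical protocol server that only records whether it is running."""

    def __init__(self, name):
        self.name = name
        self.running = False

    async def start(self):
        self.running = True

    async def stop(self):
        self.running = False


class StorageService:
    """Hypothetical registry modelling register_protocol_server()."""

    def __init__(self):
        self._servers = []

    async def register_protocol_server(self, server, start_instantly=False):
        # Registration always happens; starting is opt-in.
        self._servers.append(server)
        if start_instantly:
            await server.start()

    async def stop_transport(self):
        # Stop every registered server, whether or not it was started here.
        for server in self._servers:
            await server.stop()


async def demo():
    ss = StorageService()
    native = ProtocolServer("native")
    redis = ProtocolServer("redis")
    await ss.register_protocol_server(native, start_instantly=True)
    await ss.register_protocol_server(redis)
    started = (native.running, redis.running)
    await ss.stop_transport()
    return started, (native.running, redis.running)


result = asyncio.run(demo())
```

With the default `start_instantly=False` the call only registers the server, which is why the change is a no-op until callers opt in.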

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-03 12:12:03 +03:00
Pavel Emelyanov
2aab9f6340 storage_service: Turn register_protocol_server() async method
To make the next patch shorter

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-03 12:12:03 +03:00
Pavel Emelyanov
eb033e3c5f storage_service: Outline register_protocol_server()
To make next patch shorter

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-03 12:12:03 +03:00
Pavel Emelyanov
315ef4c484 main: Schedule deferred drain_on_shutdown() prior to protocol servers
Next patches will remove protocol servers' deferred stops and will rely
on drain_on_shutdown -> stop_transport to do it, so the drain deferred
action should come before protocol servers' registration.

This also fixes a bug. Currently alternator and redis both rely on
protocol servers to stop them on shutdown. However, when startup is
aborted prior to drain_on_shutdown() registration, protocol servers are
not stopped and alternator and redis can remain unstopped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-03 12:11:04 +03:00
Pavel Emelyanov
2fa89d8696 main: Move some trailing startup earlier
The set_abort_on_ebadf() call and some api endpoints registration come
after protocol servers. The latter is going to be shuffled, so move the
former earlier not to hang around.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-03 12:01:24 +03:00
Kefu Chai
c6691d3217 .github: add exception to CLEANER_DIRS
to cover more directories to prevent regressions of violating
the "include what you use" policy in this directory.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-03 12:45:04 +08:00
Kefu Chai
21bdda550a .github: annotate the report from clang-include-cleaner
before this change, the user has to click into the "Details" link to
access the report from clang-include-cleaner. but this is neither
convenient nor obvious.

after this change, the report is annotated in the github web interface,
this helps the reviewers and contributors to use this tool in a
more efficient way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-03 12:45:04 +08:00
Kefu Chai
3d056a0cf2 .github: build headers before running clang-include-cleaner
clang-include-cleaner actually interprets the preprocessor macros,
and looks at the symbols. so we have to prepare the included headers
before using it.

but in ScyllaDB, we don't have a single target for building all the
used headers, so we have to build them either in batch or separately.

in this change, we build the included headers before running
clang-include-cleaner. this allows us to run clang-include-cleaner on
more source files.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-03 11:30:31 +08:00
Nadav Har'El
95db1c60d6 test/alternator: fix a test failing on Amazon DynamoDB
The test test_table.py::test_concurrent_create_and_delete_table failed
on Amazon DynamoDB because of a silly typo - "false" instead of "False".
A function detecting Scylla tried to return false when noticing this
isn't Scylla - but had a typo, trying to return "false" instead of "False".

This patch fixes this typo, and the test now works on DynamoDB:
test/alternator/run --aws test_table.py::test_concurrent_create_and_delete_table

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17799
2024-06-02 22:25:56 +03:00
Avi Kivity
79d0711c7e Merge 'tablets: load balancer: Use random selection of candidates when moving tablets' from Tomasz Grabiec
In order to avoid per-table tablet load imbalance from forming
in the cluster after adding nodes, the load balancer now picks the
candidate tablet at random. This should keep the per-table
distribution on the target node similar to the distribution on the
source nodes.

Currently, candidate selection picks the first tablet in the
unordered_set, so the distribution depends on hashing in the unordered
set. Due to the way hash is calculated, table id dominates the hash
and a single table can be chosen more often for migration away. This
can result in imbalance of tablets for any given table after
bootstrapping a new node.

For example, consider the following results of a simulation which
starts with a 6-node cluster and does a sequence of node bootstraps
and decommissions.  One table has 4096 tablets and RF=1, and the other
has 256 tablets and RF=2.  Before the patch, the smaller table has
node overcommit of 2.34 in the worst topology state, while after the
patch it has overcommit of 1.65. overcommit is calculated as max load
(tablet count per node) divided by perfect average load (all tablets / nodes):

  Run #861, params: {iterations=6, nodes=6, tablets1=4096 (10.7/sh), tablets2=256 (1.3/sh), rf1=1, rf2=2, shards=64}
  Overcommit       : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}}
  Overcommit       : worst: {table1={shard=1.23, node=1.10}, table2={shard=9.85, node=1.65}}
  Overcommit (old) : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}}
  Overcommit (old) : worst: {table1={shard=1.31, node=1.12}, table2={shard=64.00, node=2.34}}

The worst state before the patch had the following distribution of tablets for the smaller table:

  Load on host ba7f866d...: total=171, min=1, max=7, spread=6, avg=2.67, overcommit=2.62
  Load on host 4049ae8d...: total=102, min=0, max=6, spread=6, avg=1.59, overcommit=3.76
  Load on host 3b499995...: total=89, min=0, max=4, spread=4, avg=1.39, overcommit=2.88
  Load on host ad33bede...: total=63, min=0, max=3, spread=3, avg=0.98, overcommit=3.05
  Load on host 0c2e65dc...: total=57, min=0, max=3, spread=3, avg=0.89, overcommit=3.37
  Load on host 3f2d32d4...: total=27, min=0, max=2, spread=2, avg=0.42, overcommit=4.74
  Load on host 9de9f71b...: total=3, min=0, max=1, spread=1, avg=0.05, overcommit=21.33

One node has as many as 171 tablets of that table and another one has as few as 3.

After the patch, the worst distribution looks like this:

  Load on host 94a02049...: total=121, min=1, max=6, spread=5, avg=1.89, overcommit=3.17
  Load on host 65ac6145...: total=87, min=0, max=5, spread=5, avg=1.36, overcommit=3.68
  Load on host 856a66d1...: total=80, min=0, max=5, spread=5, avg=1.25, overcommit=4.00
  Load on host e3ac4a41...: total=77, min=0, max=4, spread=4, avg=1.20, overcommit=3.32
  Load on host 81af623f...: total=66, min=0, max=4, spread=4, avg=1.03, overcommit=3.88
  Load on host 4a038569...: total=47, min=0, max=2, spread=2, avg=0.73, overcommit=2.72
  Load on host c6ab3fe9...: total=34, min=0, max=3, spread=3, avg=0.53, overcommit=5.65

Most-loaded node has 121 tablets and least loaded node has 34 tablets.
It's still not good, a better distribution is possible, but it's an improvement.
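
As a worked check of the node-level figures, the following Python sketch applies the overcommit definition quoted above (max load divided by perfect average load) to the per-host totals from the two listings. The function name is illustrative only and not part of the patch.

```python
def node_overcommit(tablets_per_node):
    """Max per-node tablet count divided by the perfect average (all tablets / nodes)."""
    average = sum(tablets_per_node) / len(tablets_per_node)
    return max(tablets_per_node) / average


before = [171, 102, 89, 63, 57, 27, 3]   # totals from the "before" listing
after = [121, 87, 80, 77, 66, 47, 34]    # totals from the "after" listing

print(f"before: {node_overcommit(before):.2f}")  # before: 2.34
print(f"after:  {node_overcommit(after):.2f}")   # after:  1.65
```

Both lists sum to 512 (256 tablets with RF=2), and the results match the node overcommit of 2.34 and 1.65 reported for table2.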

Refs #16824

Closes scylladb/scylladb#18885

* github.com:scylladb/scylladb:
  tablets: load balancer: Use random selection of candidates when moving tablets
  test: perf: Add test for tablet load balancer effectiveness
  load_sketch: Extract get_shard_minmax()
  load_sketch: Allow populating only for a given table
2024-06-02 22:03:37 +03:00
Benny Halevy
18df36d920 repair: repair_tablets: use get_primary_replica
Tablet allocation does not guarantee fairness of
the first replica in the replicas set across dcs.
The lack of this fix causes the following dtest to fail:
repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc

Use the tablet_map get_primary_replica* functions to get
the primary replica for each tablet, possibly within a dc.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-02 20:28:39 +03:00
Benny Halevy
009767455d repair: repair_tablets: no need to check ranges_specified per tablet
The code already turns off `primary_replica_only`
if `!ranges_specified.empty()`, so there's no need to
check it again inside the per-tablet loop.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-02 20:26:09 +03:00
Benny Halevy
84761acc31 locator: tablet_map: add get_primary_replica_within_dc
Will be needed by repair in a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-02 20:26:09 +03:00
Benny Halevy
2de79c39dc locator: tablet_map: get_primary_replica: do not copy tablet info
Currently, the function needlessly copies the tablet_info
(all tablet replicas in particular) to a local variable.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-02 20:26:09 +03:00
Benny Halevy
c52f70f92c locator: tablet_map: get_primary_replica: return tablet_replica
This is required by repair when it will start using get_primary_replica
in a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-06-02 20:26:09 +03:00
Tomasz Grabiec
603abddca9 tablets: load balancer: Use random selection of candidates when moving tablets
In order to avoid per-table tablet load imbalance from forming
in the cluster after adding nodes, the load balancer now picks the
candidate tablet at random. This should keep the per-table
distribution on the target node similar to the distribution on the
source nodes.

Currently, candidate selection picks the first tablet in the
unordered_set, so the distribution depends on hashing in the unordered
set. Due to the way hash is calculated, table id dominates the hash
and a single table can be chosen more often for migration away. This
can result in imbalance of tablets for any given table after
bootstrapping a new node.

For example, consider the following results of a simulation which
starts with a 6-node cluster and does a sequence of node bootstraps
and decommissions.  One table has 4096 tablets and RF=1, and the other
has 256 tablets and RF=2.  Before the patch, the smaller table has
node overcommit of 2.34 in the worst topology state, while after the
patch it has overcommit of 1.65. overcommit is calculated as max load
(tablet count per node) divided by perfect average load (all tablets / nodes):

  Run #861, params: {iterations=6, nodes=6, tablets1=4096 (10.7/sh), tablets2=256 (1.3/sh), rf1=1, rf2=2, shards=64}
  Overcommit       : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}}
  Overcommit       : worst: {table1={shard=1.23, node=1.10}, table2={shard=9.85, node=1.65}}
  Overcommit (old) : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}}
  Overcommit (old) : worst: {table1={shard=1.31, node=1.12}, table2={shard=64.00, node=2.34}}

The worst state before the patch had the following distribution of tablets for the smaller table:

  Load on host ba7f866d...: total=171, min=1, max=7, spread=6, avg=2.67, overcommit=2.62
  Load on host 4049ae8d...: total=102, min=0, max=6, spread=6, avg=1.59, overcommit=3.76
  Load on host 3b499995...: total=89, min=0, max=4, spread=4, avg=1.39, overcommit=2.88
  Load on host ad33bede...: total=63, min=0, max=3, spread=3, avg=0.98, overcommit=3.05
  Load on host 0c2e65dc...: total=57, min=0, max=3, spread=3, avg=0.89, overcommit=3.37
  Load on host 3f2d32d4...: total=27, min=0, max=2, spread=2, avg=0.42, overcommit=4.74
  Load on host 9de9f71b...: total=3, min=0, max=1, spread=1, avg=0.05, overcommit=21.33

One node has as many as 171 tablets of that table and another one has as few as 3.

After the patch, the worst distribution looks like this:

  Load on host 94a02049...: total=121, min=1, max=6, spread=5, avg=1.89, overcommit=3.17
  Load on host 65ac6145...: total=87, min=0, max=5, spread=5, avg=1.36, overcommit=3.68
  Load on host 856a66d1...: total=80, min=0, max=5, spread=5, avg=1.25, overcommit=4.00
  Load on host e3ac4a41...: total=77, min=0, max=4, spread=4, avg=1.20, overcommit=3.32
  Load on host 81af623f...: total=66, min=0, max=4, spread=4, avg=1.03, overcommit=3.88
  Load on host 4a038569...: total=47, min=0, max=2, spread=2, avg=0.73, overcommit=2.72
  Load on host c6ab3fe9...: total=34, min=0, max=3, spread=3, avg=0.53, overcommit=5.65

Most-loaded node has 121 tablets and least loaded node has 34 tablets.
It's still not good, a better distribution is possible, but it's an improvement.
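
The per-host rows can be checked the same way at shard level. The Python sketch below assumes, based on the numbers being consistent, that `avg` is the host total divided by the 64 shards from the run parameters and that the per-host `overcommit` is `max` divided by `avg`; the function name is illustrative only.

```python
def shard_stats(total, max_per_shard, shards=64):
    """Return (avg, overcommit) for one host: avg tablets per shard and max / avg."""
    avg = total / shards
    return avg, max_per_shard / avg


# (total, max) pairs copied from three of the "before" rows.
for total, max_per_shard in [(171, 7), (102, 6), (3, 1)]:
    avg, overcommit = shard_stats(total, max_per_shard)
    print(f"total={total}, avg={avg:.2f}, overcommit={overcommit:.2f}")
```

Under that assumption the printed values reproduce the listed avg=2.67/overcommit=2.62, avg=1.59/overcommit=3.76 and avg=0.05/overcommit=21.33.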

Refs #16824
2024-06-02 14:23:00 +02:00
Tomasz Grabiec
7b1eea794b test: perf: Add test for tablet load balancer effectiveness 2024-06-02 14:23:00 +02:00
Tomasz Grabiec
c9bcb5e400 load_sketch: Extract get_shard_minmax() 2024-06-02 14:23:00 +02:00
Tomasz Grabiec
3be6120e3b load_sketch: Allow populating only for a given table 2024-06-02 14:23:00 +02:00
Avi Kivity
db4e4df762 alternator: yield while converting large responses to json text
We have two paths for generating the json text representation, one
for large items and one for small items, but the large item path is
lacking:

 - it doesn't yield, so a response with many items will stall
 - it doesn't wait for network sends to be accepted by the network
   stack, so it will allocate a lot of memory

Fix by moving the generation to a thread. This allows us to wait for
the network stack, which incidentally also fixes stalls.

The cost of the thread is amortized by the fact we're emitting a large
response.

Fixes #18806

Closes scylladb/scylladb#18807
2024-06-02 13:07:13 +03:00
Michał Jadwiszczak
5b4e688668 docs/procedures/backup-restore: use DESC SCHEMA WITH INTERNALS
Update docs for backup procedure to use `DESC SCHEMA WITH INTERNALS`
instead of plain `DESC SCHEMA`.
Add a note to use cqlsh in a proper version (at least 6.0.19).

Closes scylladb/scylladb#18953
2024-05-31 15:26:36 +02:00
Aleksandra Martyniuk
beef77a778 docs: describe task folding 2024-05-31 10:40:04 +02:00
Aleksandra Martyniuk
d7e80a6520 test: rest_api: add test for task tree structure
Add test which checks whether the tasks are folded into their parent
as expected.
2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk
fc0796f684 test: rest_api: modify new_test_module
Remove remaining test tasks when a test module is removed, so that
a node could shutdown even if a test fails.
2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk
30f97ea133 tasks: test: modify test_task methods
Wait until the task is done in test_task::finish_failed and
test_task::finish to ensure that it is folded into its parent.
2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk
c1b2b8cb2c api: task_manager: do not unregister task in /task_manager/wait_task/
If /task_manager/wait_task/ unregisters the task, then there is no
way to examine children failures, since their statuses can be checked
only through their parent.
2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk
a82a2f0624 tasks: unregister tasks with parents when they are finished
Unregister children that are finished from task manager. They can be
examined through their parents.
2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk
e6c50ad2d0 tasks: fold finished tasks info their parents
Currently, when a child task is unregistered, it is still kept by its parent. This leads
to excessive memory usage, especially when the tasks are configured to be kept in task
manager after they are finished (task_ttl_in_seconds).

Introduce task_essentials struct which keeps only data necessary for task manager API.
When a task which has a parent is finished, a foreign pointer to it in its parent is replaced
with respective task_essentials. Once a parent task is finished it is also folded into
its parent (if it has one). Children details of a folded task are lost, unless they
(or some of their subtrees) failed. That is, when a task is finished, we keep:
- a root task (until it is unregistered);
- task_essentials of root's direct children;
- a path (of task_essentials) from root to each failed task (so that the reason
  of a failure could be examined).
2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk
319e799089 tasks: make task_manager::task::impl::finish_failed noexcept 2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk
6add9edf8a tasks: change _children type
Keep task children in a map. It's a preparation for further changes.
2024-05-31 10:27:09 +02:00
Pavel Emelyanov
273dca6f27 query_processor: Coroutinize stop()
This effectively removes "finally" block so if
authorized_prepared_cache.stop() resolves with exception, the
prepared_cache.stop() is skipped. But that's not a problem -- even if
.stop() throws the whole scylla stop aborts so we don't really care if
it was clean or not.

Also, authorized_prepared_cache.stop() closes the gate and cancels the
timer. None of those can resolve with exception.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19001
2024-05-31 10:22:08 +03:00
Benny Halevy
427acb393e data_dictionary: keyspace_metadata: format: print also initial_tablets
Currently, there is no indication of tablets in the logged KSMetaData.
Print the tablets configuration of either the `initial` number of tablets,
if enabled, or {'enabled':false} otherwise.

For example:
```
migration_manager - Create new Keyspace: KSMetaData{name=tablets_ks, strategyClass=org.apache.cassandra.locator.NetworkTopologyStrategy, strategyOptions={"datacenter1": "1"}, cfMetaData={}, durable_writes=true, tablets={"initial":0}, userTypes=org.apache.cassandra.config.UTMetaData@0x600004d446a8}

migration_manager - Create new Keyspace: KSMetaData{name=vnodes_ks, strategyClass=org.apache.cassandra.locator.NetworkTopologyStrategy, strategyOptions={"datacenter1": "1"}, cfMetaData={}, durable_writes=true, tablets={"enabled":false}, userTypes=org.apache.cassandra.config.UTMetaData@0x600004c33ea8}
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18998
2024-05-31 10:09:58 +03:00
Nadav Har'El
c786621b4c test/cql-pytest: reproduce bug of secondary index used before built
This patch adds a test reproducing the known issue #7963, where
after adding a secondary-index to a table, queries might immediately
start to use this index - even before it is built - and produce wrong
results.

The issue is still open and unfixed, so the new test is marked "xfail".

Interestingly, even though Cassandra claims to have found and fixed
a similar bug in 2015 (CASSANDRA-8505), this test also fails on
Cassandra - trying a query right after CREATE INDEX and before it
was fully built may cause the query to fail.

Refs #7963

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18993
2024-05-31 10:05:00 +03:00
Raphael S. Carvalho
b396b05e20 replica: Fix race of tablet snapshot with compaction
tablet snapshot, used by migration, can race with compaction and
can find files deleted. That won't cause data loss because the
error is propagated back into the coordinator that decides to
retry streaming stage. So the consequence is delayed migration,
which might in turn reduce node operation throughput (e.g.
when decommissioning a node). It should be rare though, so
shouldn't have drastic consequences.

Fixes #18977.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18979
2024-05-31 09:58:49 +03:00
Lakshmi Narayanan Sreethar
3d7d1fa72a db/config.cc: increment components_memory_reclaim_threshold config default
Incremented the components_memory_reclaim_threshold config's default
value to 0.2 as the previous value was too strict and caused unnecessary
eviction in otherwise healthy clusters.

Fixes #18607

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#18964
2024-05-30 18:03:51 +03:00
Botond Dénes
0ead3570b4 Merge 'Run sstables loader in scheduling group' from Pavel Emelyanov
Currently the loader is called via API, which inherits the maintenance scheduling group from API http server. The loader then can either do load_and_stream() or call (legacy) distributed_loader::upload_new_sstables(). The latter first switches into streaming scheduling group, but the former doesn't and continues running in the maintenance one.

All this is not really a problem, because streaming sched group and maintenance sched group is one group under two different variable names. However, it's messy and worth delegating the sched group switch (even if it's a no-op) to the sstables-loader. As a nice side effect, this patch removes one place that uses database as proxy object to get configuration parameters.

Closes scylladb/scylladb#18928

* github.com:scylladb/scylladb:
  sstables-loader: Run loading in its scheduling group
  sstables-loader: Add scheduling group to constructor
2024-05-30 18:03:51 +03:00
Pavel Emelyanov
83d491af02 config: Remove experimental TABLETS feature
... and replace it with boolean enable_tablets option. All the places
in the code are patched to check the latter option instead of the former
feature.

The option is OFF by default, but the default scylla.yaml file sets this
to true, so that newly installed clusters turn tablets ON.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18898
2024-05-30 18:03:51 +03:00
Pavel Emelyanov
dc588d1eef replication_strategy: Remove unused factory_key::to_sstring() declaration
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18908
2024-05-30 18:03:51 +03:00
Anna Stuchlik
8f5c15b78f doc: add support for Ubuntu 24.04
Closes scylladb/scylladb#18954
2024-05-30 18:03:51 +03:00
Pavel Emelyanov
91f74989ba snitch: Remove production_snitch_base::_prop_file_contents
This field was used to carry a string with property file contents into the
parse_property_file(), but it can go with an argument just as well

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-30 13:55:14 +03:00
Pavel Emelyanov
1cdeabdc50 snitch: Remove production_snitch_base::_prop_file_size
This field was used to carry property file size across then-lambdas, now
the code is coroutinized and can live with on-stack variable

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-30 13:54:30 +03:00
Pavel Emelyanov
b62aa276d1 snitch: Coroutinize load_property_file()
Cleaner and easier to read this way

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-30 13:54:15 +03:00
Kefu Chai
fb87ab1c75 compress, auth: include used headers
before this change, we rely on `seastar/util/std-compat.hh` to
include the used headers provided by the standard library. this was
necessary before we moved to a C++20 compliant standard library
implementation. but since Seastar has dropped C++17 support, its
`seastar/util/std-compat.hh` is not responsible for providing these
headers anymore.

so, in this change, we include the used header directly instead
of relying on `seastar/util/std-compat.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18986
2024-05-30 09:16:23 +03:00
Kefu Chai
810da830ef build: add sanitizer compiling options directly
before this change, in order to avoid repeating/hardwiring the
compiling options set by Seastar, we just inherit the compiling
options of Seastar for building Abseil, as the former exposes the
options to enable sanitizers.

this works fine, despite that, strictly speaking, not all options
are necessary for building abseil, as abseil is not a Seastar
application -- it is just a C++ library.

but when we introduce dependencies which are only generated at
build time, and these dependencies are passed to the compiler
at build time, this breaks the build of Abseil. because these
dependencies are exposed by the Seastar's .pc file, and consumed
by Abseil. when building Abseil, apparently, the building process
driven by ninja is not started yet, so we are not able to build
Abseil with these settings due to missing dependencies.

so instead of inheriting the compiling options from Seastar, just
set the sanitizer related compiling options directly, to avoid
referencing these missing dependencies.

the upside is that we pass a much smaller set of compiling options
to compiler when building Abseil, the downside is that we hardwire
these options related to sanitizer manually, they are also detected
by Seastar's building system. but fortunately, these options are
relatively stable across the building environments we support.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18987
2024-05-30 09:14:03 +03:00
Aleksandra Martyniuk
8a72324ff1 docs: add docs to task manager
Closes scylladb/scylladb#18967
2024-05-30 09:05:02 +03:00
Raphael S. Carvalho
a56664b8e9 readers: combined: Avoid reallocation in prepare_forwardable_readers()
reserve() is missing conditional addition of single and galloping
readers.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18980
2024-05-30 08:57:27 +03:00
Dawid Medrek
e855794327 db/hints: Introduce an error injection to test draining
We want to verify that a hint directory is drained
when any of the nodes corresponding to it leaves
the cluster. The test scenario should happen before
the whole cluster has been migrated to
the host-ID-based hinted handoff, so when we still
rely on the mappings between hint endpoint managers
and the hint directories managed by them.

To make such a test possible, in these changes we
introduce an error injection rejecting incoming
hints. We want to test a scenario when:

1. hints are saved towards a given node -- node N1,
2. N1 changes its IP to a different one,
3. some other node -- node N2 -- changes its IP
   to the original IP of N1,
4. hints are saved towards N2 and they are stored
   in the same directory as the hints saved towards
   N1 before,
5. we start draining N2.

Because at some point N2 needs to be stopped,
it may happen that some mutations towards
a distributed system table generate a hint
to N2 BEFORE it has finished changing its IP,
effectively creating another hint directory
where ALL of the hints towards the node
will be stored from there on. That would disturb
the test scenario. Hence, this error injection is
necessary to ensure that all of the steps in the
test proceed as expected.
2024-05-29 19:32:41 +02:00
Dawid Medrek
745a9c6ab8 db/hints: Ensure that draining happens
Before hinted handoff is migrated to using host IDs
to identify nodes in the cluster, we keep track
of mappings between hint endpoint managers
identified by host IDs and the hint directories
managed by them and represented by IP addresses.
As a consequence, it may happen that one hint
directory corresponds to multiple nodes
-- it's intended. See 64ba620 for more details.

Before these changes, we only started the draining
process of a hint directory if the node leaving
the cluster corresponded to that hint directory
AND was identified by the same host ID as
the hint endpoint manager managing that directory.
As a result, the draining did not always happen
when it was supposed to.

Draining should start no matter which of the nodes
corresponding to a hint directory is leaving
the cluster. This commit ensures that it happens.
2024-05-29 19:32:38 +02:00
Wojciech Mitros
0de3a5f3ff test mv: remove injection delaying shutdown of a node
In the test_mv_topology_change case, we use an injection to
delay the view updates application, so that the ERMs have
a chance to change in the process. This injection was also
enabled on a new node in the test, which was later decommissioned.
During the shutdown, writes were still being performed, causing
view update generation and delays due to the injection which in
turn delayed the node shutdown, causing the test to time out.
This patch removes the injection for the node being shut down.
At the same time, the force_gossip_topology_changes=True option
is also removed from its config, but for that option it's enough
to enable on the first node in the cluster and all nodes use it.

Fixes: https://github.com/scylladb/scylladb/issues/18941

Closes scylladb/scylladb#18958
2024-05-29 15:29:55 +02:00
Kefu Chai
a415bb07ab sl_controller: fix a typo in comment
s/necessairy/necessary/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18950
2024-05-29 16:23:31 +03:00
Nadav Har'El
4b04ed1360 test/alternator: be more forgiving on authorizer configuration
The Alternator test suite usually runs on a specific configuration of
Scylla set up by test.py or test/alternator/run. However, we do consider
it an important design goal of this test suite that developers should be
able to run these tests against any DynamoDB-API implementation, including
any version of Scylla manually run by the developer in *any way* he or she
pleases.

The recent commit dc80b5dafe changed the way
we retrieve the configured authentication key, which is needed if Scylla is
run with --alternator-enforce-authorization. However, the new code assumed
that Scylla was also run with
     --authenticator PasswordAuthenticator --authorizer CassandraAuthorizer
so that the default role of "cassandra" has a valid, non-null, password
(namely, "cassandra"). If the developer ran Scylla manually without
these options, the test initialization code broke, and all tests in the
suite failed.

This patch fixes this breakage. You can now run the Alternator test
suite against Scylla run manually without any of the aforementioned
options, and everything will work except some tests in test_authorization.py
will fail as expected.

This patch has no effect on the usual test.py or test/alternator/run
runs, as they already run Scylla with all the aforementioned options
and weren't exposed to the problem fixed here.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18957
2024-05-29 16:22:45 +03:00
Raphael S. Carvalho
578a6c1e07 replica: Only consume memtable of the tablet intersecting with range read
storage_proxy is responsible for intersecting the range of the read
with tablets, and calling replica with a single tablet range, therefore
it makes sense to avoid touching memtables of tablets that don't
intersect with a particular range.

Note this is a performance issue, not correctness one, as memtable
readers that don't intersect with current range won't produce any
data, but cpu is wasted until that's realized (they're added to list
of readers in mutation_reader_merger, more allocations, more data
sources to peek into, etc).

That's also important for streaming e.g. after decommission, that
will consume one tablet at a time through a reader, so we don't want
memtables of streamed tablets (that weren't cleaned up yet) to
be consumed.
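The filtering described above can be sketched roughly as follows. This is a simplified Python illustration, not the actual C++ change: `TokenRange`, `memtables_for_read` and the integer half-open token ranges are hypothetical names and assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRange:
    """Half-open token interval [start, end). Hypothetical type for this sketch."""
    start: int
    end: int

    def intersects(self, other: "TokenRange") -> bool:
        # Two half-open intervals overlap when each one starts before the other ends.
        return self.start < other.end and other.start < self.end


def memtables_for_read(tablet_ranges: dict, read_range: TokenRange) -> list:
    """Return the ids of tablets whose token range intersects the read range.

    Only the memtables of these tablets would need a reader; the others
    cannot contribute any data to the read.
    """
    return [tid for tid, rng in tablet_ranges.items() if rng.intersects(read_range)]
```

With three tablets covering [0, 100), [100, 200) and [200, 300), a read of [120, 180) would select only the second tablet, so the other two memtables are never added to the list of readers.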

Refs #18904.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18907
2024-05-29 15:58:33 +03:00
Tomasz Grabiec
0d596a425c tablets: Filter-out left nodes in get_natural_endpoints()
The API already promises this, the comment on effective_replication_map says:
"Excludes replicas which are in the left state".

Tablet replicas on the replaced node are rebuilt after the node
already left. We may no longer have the IP mapping for the left node
so we should not include that node in the replica set. Otherwise,
storage_proxy may try to use the empty IP and fail:

  storage_proxy - No mapping for :: in the passed effective replication map

It's fine to not include it, because storage proxy uses keyspace RF
and not replica list size to determine quorum. The node is not coming
up, so no one should need to contact it.

Users which need replica list stability should use the host_id-based API.
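A minimal Python sketch of why dropping the left node is safe, assuming the usual quorum formula `RF // 2 + 1`. The function names and node labels are hypothetical and only illustrate the reasoning; they are not the real `get_natural_endpoints()` implementation.

```python
def natural_endpoints(replicas: list, left_nodes: set) -> list:
    """Return the replica list without nodes that are in the left state."""
    return [node for node in replicas if node not in left_nodes]


def quorum(replication_factor: int) -> int:
    """Quorum is derived from the keyspace RF, not from the replica list size."""
    return replication_factor // 2 + 1
```

For RF=3 and one replaced node, the filtered list has two entries while the quorum stays at 2, because it is still computed from the RF.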

Fixes #18843
2024-05-29 14:49:49 +02:00
Anna Stuchlik
888d7601a2 doc: add the tablets information to the nodetool describering command
This commit adds an explanation of how the `nodetool describering` command
works if tablets are enabled.

Closes scylladb/scylladb#18940
2024-05-29 15:31:46 +03:00
Pavel Emelyanov
e74a4b038f Merge 'tablets: alter keyspace' from Piotr Smaron
This change supports changing replication factor in tablets-enabled keyspaces.
This covers both increasing and decreasing the number of tablets replicas through
first building topology mutations (`alter_keyspace_statement.cc`) and then
tablets/topology/schema mutations (`topology_coordinator.cc`).
For the limitations of the current solution, please see the docs changes attached to this PR.

Fixes: #16129

Closes scylladb/scylladb#16723

* github.com:scylladb/scylladb:
  test: Do not check tablets mutations on nodes that don't have them
  test: Fix the way tablets RF-change test parses mutation_fragments
  test/tablets: Unmark RF-changing test with xfail
  docs: document ALTER KEYSPACE with tablets
  Return response only when tablets are reallocated
  cql-pytest: Verify RF is changes by at most 1 when tablets on
  cql3/alter_keyspace_statement: Do not allow for change of RF by more than 1
  Reject ALTER with 'replication_factor' tag
  Implement ALTER tablets KEYSPACE statement support
  Parameterize migration_manager::announce by type to allow executing different raft commands
  Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks
  Extend system.topology with 3 new columns to store data required to process alter ks global topo req
  Allow query_processor to check if global topo queue is empty
  Introduce new global topo `keyspace_rf_change` req
  New raft cmd for both schema & topo changes
  Add storage service to query processor
  tablets: tests for adding/removing replicas
  tablet_allocator: make load_balancer_stats_manager configurable by name
2024-05-29 14:17:51 +03:00
Gleb Natapov
f91db0c1e4 raft topology: fix indentation after previous commit 2024-05-29 12:11:28 +03:00
Gleb Natapov
6853b02c00 raft topology: do not add bootstrapping node without IP as pending
If there is no mapping from host id to ip while a node is in bootstrap
state there is no point adding it to pending endpoint since write
handler will not be able to map it back to host id anyway. If the
transition state requires double writes though we still want to fail.
In case the state is write_both_read_old we fail the barrier that will
cause topology operation to rollback and in case of write_both_read_new
we assert but this should not happen since the mapping is persisted by
this point (or we failed in write_both_read_old state).

Fixes: scylladb/scylladb#18676
2024-05-29 12:11:18 +03:00
Gleb Natapov
27445f5291 test: add test of bootstrap where the coordinator crashes just before storing IP mapping
On the next boot there is no host ID to IP mapping which causes node to
crash again with "No mapping for :: in the passed effective replication map"
assertion.
2024-05-29 11:46:23 +03:00
Marcin Maliszkiewicz
1b1bc6f9bb docs: document if not exists option for create index
Closes scylladb/scylladb#18956
2024-05-29 11:35:01 +03:00
Gleb Natapov
1faef47952 schema_tables: remove unused code 2024-05-29 11:30:24 +03:00
Tomasz Grabiec
3e1ba4c859 test: pylib: Extract start_writes() load generator utility 2024-05-29 10:02:56 +02:00
Piotr Smaron
8a77a74d0e cql: fix a crash lurking in ks_prop_defs::get_initial_tablets
`tablets_options->erase(it);` invalidates `it`, but it's still referred
to later in the code in the last `else`, and when that code is invoked,
we get a `heap-use-after-free` crash.

Fixes: #18926

Closes scylladb/scylladb#18936
2024-05-28 23:46:43 +03:00
Botond Dénes
aae3cfaff4 readers: compacting_reader: remove unused _ignore_partition_end
This member is read-only since ac44efea11
so remove it.

Closes scylladb/scylladb#18726
2024-05-28 20:53:00 +03:00
Kefu Chai
719d53a565 service/storage_proxy: coroutinize handle_paxos_accept()
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18765
2024-05-28 20:51:10 +03:00
Nadav Har'El
00d10aa84a alternator: clean up target string splitting
This patch cleans up a bit the code in Alternator which splits up
the operation's X-Amz-Target header (the second part of it is the
name of the operation, e.g., CreateTable).

The patch doesn't change any functionality or change performance in
any meaningful way. I was just reviewing this code and was annoyed by
the unnecessary variable and unnecessary creation of strings and
vectors for such a simple operation - and wanted to clean it up.
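For illustration only, here is a Python sketch of splitting the header once without building an intermediate list. It assumes the header has the form `<prefix>.<operation>` (for example `DynamoDB_20120810.CreateTable`); `operation_from_target` is a hypothetical name and the real Alternator code is C++.

```python
from typing import Optional


def operation_from_target(target: str) -> Optional[str]:
    """Return the operation name from an X-Amz-Target value, or None if malformed.

    str.partition() splits on the first "." and returns a fixed 3-tuple,
    so no list of substrings is created.
    """
    _prefix, separator, operation = target.partition(".")
    if not separator or not operation:
        return None
    return operation
```

How the real code treats a header with more than one dot is not stated in the commit message; this sketch simply returns everything after the first dot.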

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18830
2024-05-28 20:42:47 +03:00
Botond Dénes
d37eca0593 test/boost/mutation_reader_test: compacting_reader_next_partition: fix partition order
The test creates two partitions and passes them through the reader, but
the partitions are out-of-order. This is benign but best to fix it
anyway.
Found after bumping validation level inside the compactor.

Closes scylladb/scylladb#18848
2024-05-28 20:41:54 +03:00
Aleksandra Martyniuk
b7ae7e0b0e test: fix test_tombstone_gc.py
Tests in test_tombstone_gc.py are parametrized with string instead
of bool values. Fix that. Use the value to create a keyspace with
or without tablets.

Fixes: #18888.

Closes scylladb/scylladb#18893
2024-05-28 20:40:15 +03:00
Kefu Chai
f58f6dfe20 data_dictionary: include <variant>
otherwise when compiling with the new seastar, which removed
`#include <variant>` from `std-compat.hh`, the {mode}-headers
target would fail to build, like:

```
 ./data_dictionary/storage_options.hh:34:29: error: no template named 'variant' in namespace 'std'
10:45:15      using value_type = std::variant<local, s3>;
10:45:15                         ~~~~~^
10:45:15  ./data_dictionary/storage_options.hh:35:5: error: unknown type name 'value_type'; did you mean 'std::_Bit_const_iterator::value_type'?
10:45:15      value_type value = local{};
10:45:15      ^~~~~~~~~~
10:45:15      std::_Bit_const_iterator::value_type
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18921
2024-05-28 20:38:55 +03:00
Anna Stuchlik
cfa3cd4c94 doc: add the tablet limitation to the manual recovery procedure
This commit adds the information that the manual recovery procedure
is not supported if tablets are enabled.

In addition, the content in the Manual Recovery Procedure is reorganized
by adding the Prerequisites and Procedure subsections - in this way,
we can limit the number of Note and Warning boxes that made the page
hard to follow.

Fixes https://github.com/scylladb/scylladb/issues/18895

Closes scylladb/scylladb#18935
2024-05-28 18:19:22 +02:00
Nadav Har'El
1fe8f22d89 alternator, scheduler: test reproducing RPC scheduling group bug
This patch adds a test for issue #18719: Although the Alternator TTL
work is supposedly done in the "streaming" scheduling group, it turned
out we had a bug where work sent on behalf of that code to other nodes
failed to inherit the correct scheduling group, and was done in the
normal ("statement") group.

Because this problem only happens when more than one node is involved,
the test is in the multi-node test framework test/topology_experimental_raft.

The test uses the Alternator API. We already had in that framework a
test using the Alternator API (a test for alternator+tablets), so in
this patch we move the common Alternator utility functions to a common
file, test_alternator.py, where I also put the new test.

The test is based on metrics: We write expiring data, wait for it to expire,
and then check the metrics on how much CPU work was done in the wrong
scheduling group ("statement"). Before #18719 was fixed, a lot of work
was done there (more than half of the work done in the right group).
After the issue was fixed in the previous patch, the work on the wrong
scheduling group went down to zero.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-05-28 10:58:08 -04:00
Anna Stuchlik
2bfdb1b583 doc: document RF limitation
This commit adds the information that the Replication Factor
must be the same or higher than the number of nodes.

Closes scylladb/scylladb#18760
2024-05-28 17:14:40 +03:00
Botond Dénes
5d3f7c13f9 main: add maintenance tenant to messaging_service's scheduling config
Currently only the user tenant (statement scheduling group) and system
(default scheduling group) tenants exist, as we used to have only
user-initiated operations and system (internal) ones. Now there is a need
to distinguish between two kinds of system operation: foreground and
background ones. The former should use the system tenant while the
latter will use the new maintenance tenant (streaming scheduling group).
2024-05-28 10:08:46 -04:00
Wojciech Mitros
519317dc58 mv: handle different ERMs for base and view table
When calculating the base-view mapping while the topology
is changing, we may encounter a situation where the base
table noticed the change in its effective replication map
while the view table hasn't, or vice-versa. This can happen
because the ERM update may be performed during the preemption
between taking the base ERM and view ERM, or, due to f2ff701,
the update may have just been performed partially when we are
taking the ERMs.

Until now, we assumed that the ERMs are synchronized while
finding the base-view endpoint mapping, so in particular, we were
using the topology from the base's ERM to check the datacenters of
all endpoints. Now that the ERMs are more likely to not be the same,
we may try to get the datacenter of a view endpoint that doesn't
exist in the base's topology, causing us to crash.

This is fixed in this patch by using the view table's topology for
endpoints coming from the view ERM. The mapping resulting from the
call might now be a temporary mapping between endpoints in different
topologies, but it still maps base and view replicas 1-to-1.

Fixes: #17786
Fixes: #18709

Closes scylladb/scylladb#18816
2024-05-28 16:01:39 +02:00
Botond Dénes
aae263ef0a Merge 'Harden the repair_service shutdown path' from Benny Halevy
This series ignores errors in `load_history()` to prevent `abort_requested_exception` coming from `get_repair_module().check_in_shutdown()` from escaping during `repair_service::stop()`, causing
```
repair_service::~repair_service(): Assertion `_stopped' failed.
```

Fixes https://github.com/scylladb/scylladb/issues/18889

Backport to 6.0 required due to 523895145d

Closes scylladb/scylladb#18890

* github.com:scylladb/scylladb:
  repair: load_history: warn and ignore all errors
  repair_service: debug stop
2024-05-28 15:30:39 +03:00
Pavel Emelyanov
66f6001c77 test: Do not check tablets mutations on nodes that don't have them
The check is performed by selecting from mutation_fragments(table), but
it's known that this query crashes Scylla when there's no tablet replica
on that node.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-28 13:56:46 +02:00
Pavel Emelyanov
6e0e2674f0 test: Fix the way tablets RF-change test parses mutation_fragments
When the test changes RF from 2 to 3, the extra node executes "rebuild"
transition which means that it streams tablets replicas from two other
peers. When doing it, the node receives two sets of sstables with
mutations from the given tablet. The test part that checks if the extra
node received the mutations notices two mutation fragments on the new
replica and erroneously fails by seeing that RF=3 is not equal to the
number of mutations found, which is 4.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-28 13:56:46 +02:00
Pavel Emelyanov
2567e300d1 test/tablets: Unmark RF-changing test with xfail
Now the scaling works and the test must check that it does

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-28 13:56:46 +02:00
Piotr Smaron
1b913dd880 docs: document ALTER KEYSPACE with tablets 2024-05-28 13:56:46 +02:00
Piotr Smaron
39181c4bf2 Return response only when tablets are reallocated
Up until now we waited until mutations are in place and then returned
directly to the caller of the ALTER statement, but that doesn't imply
that tablets were deleted/created, so we must wait until the whole
processing is done and return only then.
2024-05-28 13:56:46 +02:00
Dawid Medrek
ec5708bdee cql-pytest: Verify RF is changes by at most 1 when tablets on
This commit adds a test verifying that we can only
change the RF of a keyspace for any DC by at most 1
when using tablets.

Fixes #18029
2024-05-28 13:56:46 +02:00
Dawid Medrek
951915ed84 cql3/alter_keyspace_statement: Do not allow for change of RF by more than 1
We want to ensure that when the replication factor
of a keyspace changes, it changes by at most 1 per DC
if it uses tablets. The rationale for that is to make
sure that the old and new quorums overlap by at least
one node.

After these changes, attempts to change the RF of
a keyspace in any DC by more than 1 will fail.
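The quorum-overlap rationale can be checked with a little arithmetic. The Python sketch below assumes the quorum is `RF // 2 + 1` and that the smaller replica set is a subset of the larger one; `quorums_must_overlap` is a hypothetical helper, not code from this commit.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a quorum for the given RF."""
    return replication_factor // 2 + 1


def quorums_must_overlap(old_rf: int, new_rf: int) -> bool:
    """True if any old quorum and any new quorum are guaranteed to share a node.

    Both quorums are picked from at most max(old_rf, new_rf) replicas, so by
    the pigeonhole principle they must intersect when their sizes add up to
    more than that number.
    """
    return quorum(old_rf) + quorum(new_rf) > max(old_rf, new_rf)
```

Under these assumptions, changing the RF from 3 to 4 keeps the overlap (2 + 3 > 4), while changing it from 3 to 5 does not (2 + 3 is not greater than 5).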
2024-05-28 13:56:46 +02:00
Piotr Smaron
b875151405 Reject ALTER with 'replication_factor' tag
This patch removes the support for the "wildcard" replication_factor
option for ALTER KEYSPACE when the keyspace supports tablets.
It will still be supported for CREATE KEYSPACE so that a user doesn't
have to know all datacenter names when creating the keyspace,
but ALTER KEYSPACE will require that and the user will have to
specify the exact change in replication factors they wish to make by
explicitly specifying the datacenter names.
Expanding the replication_factor option in the ALTER case is
unintuitive and it's a trap many users fell into.

See #8881, #15391, #16115
2024-05-28 13:56:46 +02:00
Piotr Smaron
fbd75c5c06 Implement ALTER tablets KEYSPACE statement support
This commit adds support for executing ALTER KS for keyspaces with
tablets and utilizes all the previous commits.
The ALTER KS is handled in alter_keyspace_statement, where a global
topology request is generated with data attached to system.topology
table. Then, once topology state machine is ready, it starts to handle
this global topology event, which results in producing mutations
required to change the schema of the keyspace, delete the
system.topology's global req, produce tablets mutations and additional
mutations for a table tracking the lifetime of the whole req. Tracking
the lifetime is necessary to not return the control to the user too
early, so the query processor only returns the response while the
mutations are sent.
2024-05-28 13:56:42 +02:00
Piotr Smaron
7081215552 Parameterize migration_manager::announce by type to allow executing different raft commands
Since ALTER KS requires creating topology_change raft command, some
functions need to be extended to handle it. RAFT commands are recognized
by types, so some functions are just going to be parameterized by type,
i.e. made into templates.
These templates are instantiated already, so that only 1 instance of
each template exists across the whole code base, to avoid compiling it
in each translation unit.
2024-05-28 13:55:11 +02:00
Piotr Smaron
80ed442be2 Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks 2024-05-28 13:55:11 +02:00
Piotr Smaron
59d3fd615f Extend system.topology with 3 new columns to store data required to process alter ks global topo req
Because ALTER KS will result in creating a global topo req, we'll have
to pass the req data to topology coordinator's state machine, and the
easiest way to do it is through system.topology table, which is going to
be extended with 3 extra columns carrying all the data required to
execute ALTER KS from within topology coordinator.
2024-05-28 13:55:11 +02:00
Piotr Smaron
6fd0a49b63 Allow query_processor to check if global topo queue is empty
With current implementation only 1 global topo req can be executed at a
time, so when ALTER KS is executed, we'll have to check if any other
global topo req is ongoing and fail the req if that's the case.
2024-05-28 13:55:11 +02:00
Piotr Smaron
c174eee386 Introduce new global topo keyspace_rf_change req
It will be used when processing ALTER KS statement, but also to
create a separate processing path for a KS with tablets (as opposed to
a vnode KS).
2024-05-28 13:54:48 +02:00
Kamil Braun
247eb9020b Merge 'cdc, raft topology: fix and test cdc in the recovery mode' from Patryk Jędrzejczak
This PR ensures that CDC keeps working correctly in the recovery
mode after leaving the raft-based topology.

We update `system.cdc_local` in `topology_state_load` to ensure
a node restarting in the recovery mode sees the last CDC generation
created by the topology coordinator.

Additionally, we extend the topology recovery test to verify
that the CDC keeps working correctly during the whole recovery
process. In particular, we test that after restarting nodes in the
recovery mode, they correctly use the active CDC generation created
by the topology coordinator.

Fixes scylladb/scylladb#17409
Fixes scylladb/scylladb#17819

Closes scylladb/scylladb#18820

* github.com:scylladb/scylladb:
  test: test_topology_recovery_basic: test CDC during recovery
  test: util: start_writes_to_cdc_table: add FIXME to increase CL
  test: util: start_writes_to_cdc_table: allow restarting with new cql
  storage_service: update system.cdc_local in topology_state_load
2024-05-28 11:53:28 +02:00
Patryk Jędrzejczak
c44d8eca15 test: test_topology_ops: run correctly without tablets
This patch fixes two bugs in `test_topology_ops`:
1. The values of `tablets_enabled` were nonempty strings, so they
always evaluated to `True` in the if statement responsible for
enabling writing workers only if tablets are disabled. Hence, the
writing workers were always disabled.
2. The `topology_experimental_raft suite` uses tablets by default,
so we need a config with empty `experimental_features` to disable
them.
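The first bug comes down to Python truthiness: any nonempty string, including "False", evaluates to `True`. A minimal sketch, where `should_start_writers` is a hypothetical stand-in for the condition in the test:

```python
def should_start_writers(tablets_enabled) -> bool:
    """Writing workers are started only when tablets are disabled."""
    return not tablets_enabled


# Buggy parametrization: string values are always truthy, so the
# workers are never started, whatever the string says.
string_params = ["True", "False"]
started_with_strings = [should_start_writers(p) for p in string_params]

# Fixed parametrization: real booleans give one run with workers
# and one run without.
bool_params = [True, False]
started_with_bools = [should_start_writers(p) for p in bool_params]
```

Here `started_with_strings` is `[False, False]`, while `started_with_bools` is `[False, True]`.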

Ensuring this test works with and without tablets is considered
a part of 6.0, so we should backport this patch.

Closes scylladb/scylladb#18900
2024-05-28 10:08:41 +02:00
Pavel Emelyanov
ae622d711e sstables-loader: Run loading in its scheduling group
Now the loading code has two different paths, and only one of them
switches sched group. It's cleaner and more natural to switch the sched
group in the loader itself, so that all code paths run in it and don't
care about switching.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-28 11:07:58 +03:00
Pavel Emelyanov
7fefd57b74 sstables-loader: Add scheduling group to constructor
So that it knows in which group to run its code in the future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-28 11:07:22 +03:00
Nadav Har'El
b7fa5261c8 Merge 'Fix parsing of initial tablets by ALTER' from Pavel Emelyanov
To change the default initial tablets value, the user runs an ALTER KEYSPACE statement. However, specifying `WITH tablets = { initial: $value }` has no effect, because the statement analyzer only applies `tablets` parameters together with the `replication` ones, so the working statement has to be `WITH replication = $old_parameters AND tablets = ...`, which is not very convenient.

This PR changes the analyzer so that altering `tablets` happens independently from `replication`. Test included.

fixes: #18801

Closes scylladb/scylladb#18899

* github.com:scylladb/scylladb:
  cql-pytest: Add validation of ALTER KEYSPACE WITH TABLETS
  cql3: Fix parsing of ALTER KEYSPACE's tablets parameters
  cql3: Remove unused ks_prop_defs/prepare_options() argument
2024-05-27 23:10:39 +03:00
Kefu Chai
e42d83dc46 treewide: include used headers
before this change, we rely on `seastar/util/std-compat.hh` to
include the used headers provided by the standard library. this was
necessary before we moved to a C++20 compliant standard library
implementation. but since Seastar has dropped C++17 support, its
`seastar/util/std-compat.hh` is not responsible for providing these
headers anymore.

so, in this change, we include the used headers directly instead
of relying on `seastar/util/std-compat.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18883
2024-05-27 17:34:38 +03:00
Anna Stuchlik
806dd5a68a doc: describe Tablets in ScyllaDB
This commit adds the main description of tablets and their
benefits.
The article can be used as a reference in other places
across the docs where we mention tablets.

Closes scylladb/scylladb#18619
2024-05-27 15:41:37 +02:00
Botond Dénes
2d79b0106c Merge 'storage_service: Fix race between tablet split and stats retrieval' from Raphael "Raph" Carvalho
Retrieval of tablet stats must be serialized with mutation to token metadata, as the former requires tablet id stability.
If tablet split is finalized while retrieving stats, the saved erm, used by all shards, can have a lower tablet count than the one in a particular shard, causing an abort as tablet map requires that any id fed into it is lower than its current tablet count.

Fixes #18085.

Closes scylladb/scylladb#18287

* github.com:scylladb/scylladb:
  test: Fix flakiness in topology_experimental_raft/test_tablets
  service: Use tablet read selector to determine which replica to account table stats
  storage_service: Fix race between tablet split and stats retrieval
2024-05-27 16:32:54 +03:00
Pavel Emelyanov
1003391ed6 cql-pytest: Add validation of ALTER KEYSPACE WITH TABLETS
There's a test that checks how ALTER changes the initial tablets value,
but it equips the statement with `replication` parameters because of
limitations that parser used to impose. Now the `tablets` parameters can
come on their own, so add a new test. The old one is kept for
compatibility considerations.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-27 16:27:45 +03:00
Pavel Emelyanov
a172ef1bdf cql3: Fix parsing of ALTER KEYSPACE's tablets parameters
When the `WITH` doesn't include the `replication` parameters, the
`tablets` one is ignored, even if it's present in the statement. That's
not great, those two parameter sets are pretty much independent and
should be parsed individually.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-27 16:25:38 +03:00
Pavel Emelyanov
8a612da155 cql3: Remove unused ks_prop_defs/prepare_options() argument
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-27 16:25:22 +03:00
Benny Halevy
c32c418cd5 repair: load_history: warn and ignore all errors
Currently, the call to `get_repair_module().check_in_shutdown()`
may throw `abort_requested_exception` that causes
`repair_service::stop()` to fail, and trigger assertion
failure in `~repair_service`.

We already ignore failure from `update_repair_time`,
so expand the logic to cover the whole function body.

Fixes scylladb/scylladb#18889

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-27 15:57:54 +03:00
Patryk Jędrzejczak
7c1e6ba8b3 test: test_topology_ops: stop a write worker after the first error
`test_topology_ops` is flaky, which has been uncovered by gating
in scylladb/scylladb#18707. However, debugging it is harder than it
should be because write workers can flood the logs. They may send
a lot of failed writes before the test fails. Then, the log file
can become huge, even up to 20 GB.

Fix this issue by stopping a write worker after the first error.
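The idea can be sketched as an asyncio worker loop that returns on the first failed write instead of retrying forever. This is a hedged illustration only: `write_worker`, `do_write` and the return value are hypothetical and do not reproduce the actual test helper.

```python
import asyncio


async def write_worker(do_write, stop_event: asyncio.Event):
    """Issue writes until asked to stop or until the first write fails.

    Returns (number_of_successful_writes, first_error_or_None), so the
    caller sees one error instead of a log flooded with repeated failures.
    """
    successful = 0
    while not stop_event.is_set():
        try:
            await do_write(successful)
        except Exception as error:
            # Stop after the first error; do not keep sending failing writes.
            return successful, error
        successful += 1
    return successful, None
```

The test can then await each worker at the end and fail with the single recorded error.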

This test is important for 6.0, so we can backport this change.

Closes scylladb/scylladb#18851
2024-05-27 13:49:30 +02:00
Piotr Dulikowski
fa142a9ce7 Merge 'qos/raft_service_level_distributed_data_accessor: print correct error message when trying to modify a service level in recovery mode' from Michał Jadwiszczak
Raft service levels are read-only in recovery mode. This patch adds a check and a proper error message when a user tries to modify service levels in recovery mode.

Fixes https://github.com/scylladb/scylladb/issues/18827

Closes scylladb/scylladb#18841

* github.com:scylladb/scylladb:
  test/auth_cluster/test_raft_service_levels: try to create sl in recovery
  service/qos/raft_sl_dda: reject changes to service levels in recovery mode
  service/qos/raft_sl_dda: extract raft_sl_dda steps to common function
2024-05-27 13:26:06 +02:00
Kefu Chai
cbc83f92d3 .github: add iwyu workflow
iwyu is short for "include what you use". this workflow is added to
identify missing "#include" and extraneous "#include" in C++ source
files.

This workflow is triggered when a pull request is created targeting
the "master" branch. It uses the clang-include-cleaner tool provided
by clang-tools package to analyze all the ".cc" and ".hh" source files.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18122
2024-05-27 14:19:11 +03:00
Kefu Chai
e70b116333 api/api-doc/utils: fix a typo in description
s/mintues/minutes/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18869
2024-05-27 14:15:23 +03:00
Kefu Chai
2d7545ade6 test/lib: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18884
2024-05-27 14:13:51 +03:00
Piotr Smaron
06008970fb New raft cmd for both schema & topo changes
Allows executing combined topology & schema mutations under a single RAFT command
2024-05-27 12:48:44 +02:00
Piotr Smaron
cb40f13831 Add storage service to query processor
Query processor needs to access storage service to check if global
topology request is still ongoing and to be able to wait until it
completes.
2024-05-27 12:48:44 +02:00
Paweł Zakrzewski
c888945354 tablets: tests for adding/removing replicas
Note we're suppressing a UBSanitizer overflow error in UTs. That's
because our linter complains about a possible overflow, which never
happens, but tests are still failing because of it.
2024-05-27 12:48:44 +02:00
Paweł Zakrzewski
65deddd967 tablet_allocator: make load_balancer_stats_manager configurable by name
This is needed because the same name cannot be used for 2 separate
entities: doing so results in a double-metrics-registration error. Thus
the names have to be configurable, not hardcoded.
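
A rough sketch of why the name has to be configurable, using only the standard library. Both classes are hypothetical: `metric_registry` imitates a metrics layer that rejects duplicate group names, and `stats_manager` is a simplified stand-in, not the real `load_balancer_stats_manager`.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical registry that, like a metrics layer, rejects duplicate names.
class metric_registry {
    std::set<std::string> _names;
public:
    void register_group(const std::string& name) {
        if (!_names.insert(name).second) {
            throw std::runtime_error("double registration: " + name);
        }
    }
    void unregister_group(const std::string& name) { _names.erase(name); }
};

// Hypothetical stats manager whose group name is supplied by the caller
// instead of being hardcoded, so two instances can coexist.
class stats_manager {
    metric_registry& _registry;
    std::string _group_name;
public:
    stats_manager(metric_registry& registry, std::string group_name)
        : _registry(registry), _group_name(std::move(group_name)) {
        _registry.register_group(_group_name);
    }
    ~stats_manager() { _registry.unregister_group(_group_name); }
    stats_manager(const stats_manager&) = delete;
    stats_manager& operator=(const stats_manager&) = delete;
    const std::string& group_name() const { return _group_name; }
};
```

Two managers built with distinct names register cleanly against the same registry, while a second one reusing a name fails at construction.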
2024-05-27 12:48:44 +02:00
Benny Halevy
38845754c4 repair_service: debug stop
Seen the following unexplained assertion failure with
pytest -s -v --scylla-version=local_tarball --tablets repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc
```
INFO  2024-05-27 11:18:05,081 [shard 0:main] init - Shutting down repair service
INFO  2024-05-27 11:18:05,081 [shard 0:main] task_manager - Stopping module repair
INFO  2024-05-27 11:18:05,081 [shard 0:main] task_manager - Unregistered module repair
INFO  2024-05-27 11:18:05,081 [shard 1:main] task_manager - Stopping module repair
INFO  2024-05-27 11:18:05,081 [shard 1:main] task_manager - Unregistered module repair
scylla: repair/row_level.cc:3230: repair_service::~repair_service(): Assertion `_stopped' failed.
Aborting on shard 0.
Backtrace:
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3f040c
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x41c7a1
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x3dbaf
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x8e883
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x3dafd
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x2687e
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x2679a
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x36186
  0x26f2428
  0x10fb373
  0x10fc8b8
  0x10fc809
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x456c6d
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x456bcf
  0x10fc65b
  0x10fc5bc
  0x10808d0
  0x1080800
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff22f
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x4003b7
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff888
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36dea8
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36d0e2
  0x101cefa
  0x105a390
  0x101bde7
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27b89
  /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27c4a
  0x101a764
```

Decoded:
```
~repair_service at ./repair/row_level.cc:3230
~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:491
 (inlined by) ~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:491
~shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:569
 (inlined by) seastar::shared_ptr<repair_service>::operator=(seastar::shared_ptr<repair_service>&&) at ././seastar/include/seastar/core/shared_ptr.hh:582
 (inlined by) seastar::shared_ptr<repair_service>::operator=(decltype(nullptr)) at ././seastar/include/seastar/core/shared_ptr.hh:588
 (inlined by) operator() at ././seastar/include/seastar/core/sharded.hh:727
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&>(seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&) at ././seastar/include/seastar/core/future.hh:2035
 (inlined by) seastar::futurize<std::invoke_result<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>::type>::type seastar::smp::submit_to<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>(unsigned int, seastar::smp_submit_to_options, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&&) at ././seastar/include/seastar/core/smp.hh:367
seastar::futurize<std::invoke_result<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>::type>::type seastar::smp::submit_to<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>(unsigned int, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&&) at ././seastar/include/seastar/core/smp.hh:394
 (inlined by) operator() at ././seastar/include/seastar/core/sharded.hh:725
 (inlined by) seastar::future<void> std::__invoke_impl<seastar::future<void>, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int>(std::__invoke_other, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<seastar::future<void>, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int>, seastar::future<void> >::type std::__invoke_r<seastar::future<void>, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int>(seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:114
 (inlined by) std::_Function_handler<seastar::future<void> (unsigned int), seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}>::_M_invoke(std::_Any_data const&, unsigned int&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:290
```

FWIW, gdb crashed when opening the coredump.

This commit will help catch the issue earlier
when repair_service::stop() fails (and it must never fail)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-27 13:02:10 +03:00
Kefu Chai
61b5bfae6d docs: fix typos in dev documents
these typos were identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18871
2024-05-27 12:28:34 +03:00
Botond Dénes
c137f84535 Merge 'Mark prepare_statement as immutable' from Pavel Emelyanov
Users of prepared statement reference it with the help of "smart" pointers. None of the users are supposed to modify the object they point to, so mark the respective pointer type as `pointer<const prepared_statement>`. Also mark the fields of prepared statement itself with const's (some of them already are)

Closes scylladb/scylladb#18872

* github.com:scylladb/scylladb:
  cql3: Mark prepared_statement's fields const
  cql3: Define prepared_statement weak pointer as const
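
The idea of pointing at a const object and making the fields const as well can be sketched as follows. This is a simplified illustration, not the cql3 code: `std::shared_ptr` stands in for the smart pointer type actually used, and the two fields are hypothetical.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical, simplified statement: its fields are const, so even code
// holding a non-const reference cannot modify them after construction.
struct prepared_statement {
    const std::string query;
    const bool is_lwt;
};

// The pointer alias points to a const object, so users cannot mutate it.
using prepared_ptr = std::shared_ptr<const prepared_statement>;

inline prepared_ptr make_prepared(std::string query, bool is_lwt) {
    return std::make_shared<const prepared_statement>(
        prepared_statement{std::move(query), is_lwt});
}

// Compile-time checks of both properties.
static_assert(std::is_const_v<prepared_ptr::element_type>);
static_assert(!std::is_copy_assignable_v<prepared_statement>);
```

Any attempt to assign through a `prepared_ptr`, or to reassign a field, then fails at compile time rather than at review time.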
2024-05-27 12:27:54 +03:00
Kefu Chai
f1f3f009e7 docs: fix typos in upgrade document
s/Montioring/Monitoring/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18870
2024-05-27 12:26:59 +03:00
Patryk Jędrzejczak
2111cb01df test: test_topology_recovery_basic: test CDC during recovery
In topology on raft, management of CDC generations is moved to the
topology coordinator. We extend the topology recovery test to verify
that the CDC keeps working correctly during the whole recovery
process. In particular, we test that after restarting nodes in the
recovery mode, they correctly use the active CDC generation created
by the topology coordinator. A node restarting in the recovery mode
should learn about the active generation from `system.cdc_local`
(or from gossip, but we don't want to rely on it). Then, it should
load its data from `system.cdc_generations_v3`.

Fixes scylladb/scylladb#17409
2024-05-27 10:39:04 +02:00
Patryk Jędrzejczak
388db33dec test: util: start_writes_to_cdc_table: add FIXME to increase CL
2024-05-27 10:39:04 +02:00
Patryk Jędrzejczak
68b6e8e13e test: util: start_writes_to_cdc_table: allow restarting with new cql
This patch allows us to restart writing (to the same table with
CDC enabled) with a new CQL session. It is useful when we want to
continue writing after closing the first CQL session, which
happens during the `reconnect_driver` call. We must stop writing
before calling `reconnect_driver`. If a write started just before
the first CQL session was closed, it would time out on the client.

We rename `finish_and_verify` - `stop_and_verify` is a better
name after introducing `restart`.
2024-05-27 10:39:04 +02:00
Patryk Jędrzejczak
4351eee1f6 storage_service: update system.cdc_local in topology_state_load
When the node with CDC enabled and with the topology on raft
disabled bootstraps, it reads system.cdc_local for the last
generation. Nodes with both enabled use group0 to get the last
generation.

In the following scenario with a cluster of one node:
1. the node is created with CDC and the topology on raft enabled
2. the user creates table T
3. the node is restarted in the recovery mode
4. the CDC log of T is extended with new entries
5. the node restarts in normal mode
The generation created in step 3 is seen in
system_distributed.cdc_generation_timestamps but not in
system.cdc_generations_v3, thus there are used streams that the CDC
based on raft doesn't know about. Instead of creating a new
generation, the node should use the generation already committed
to group0.

Save the last CDC generation in the system.cdc_local during loading
the topology state so that it is visible for CDC not based on raft.

Fixes scylladb/scylladb#17819
2024-05-27 10:39:04 +02:00
Kefu Chai
f70e888ed5 build: cmake: pass -fprofile-list to compiler
to mirror the behavior of the build.ninja generated by configure.py

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18734
2024-05-27 11:22:55 +03:00
Botond Dénes
47dbf23773 Merge 'Rework view services and system-distributed-keyspace dependencies' from Pavel Emelyanov
The system-distributed-keyspace and view-update-generator often go as a pair, because streaming, repair and sstables-loader (via distributed-loader) need them both to check whether an sstable is staging and to register it if so. The check is performed by messing directly with the system_distributed.view_build_status table, and the registration happens via view-update-generator.

That's not nice: other services shouldn't know that view status is kept in a system table. Also, view-update-generator is a service to generate and push view updates; the fact that it keeps the list of staging sstables is an implementation detail.

This PR replaces dependencies on the mentioned pair of services with the single dependency on view-builder (repair, sstables-loader and stream-manager are enlightened) and hides the view building-vs-staging details inside the view_builder.

Along the way, some simplification of repair_writer_impl class is done.

Closes scylladb/scylladb#18706

* github.com:scylladb/scylladb:
  stream_manager: Remove system_distributed_keyspace and view_update_generator
  repair: Remove system_distributed_keyspace and view_update_generator
  streaming: Remove system_distributed_keyspace and view_update_generator
  sstables_loader: Remove system_distributed_keyspace and view_update_generator
  distributed_loader: Remove system_distributed_keyspace and view_update_generator
  view: Make register_staging_sstable() a method of view_builder
  view: Make check_view_build_ongoing() helper a method of view_builder
  streaming: Proparage view_builder& down to make_streaming_consumer()
  repair: Keep view_builder& on repair_writer_impl
  distributed_loader: Propagate view_builder& via process_upload_dir()
  stream_manager: Add view builder dependency
  repair_service: Add view builder dependency
  sstables_loader: Add view_bulder dependency
  main: Start sstables loader later
  repair: Remove unwanted local references from repair_meta
2024-05-27 10:51:11 +03:00
Botond Dénes
e0f4d79f3b Merge 'Do not export statement scheduling group from database' from Pavel Emelyanov
Database used to be (and still is in many ways) an object used to get configuration from. Part of the configuration is the set of pre-configured scheduling groups. That's not nice, services should use each other for some real need, not as proxies to configuration. This patch patches the places that explicitly switch to statement group _not_ to use database to get the group itself.

fixes: #17643

Closes scylladb/scylladb#18799

* github.com:scylladb/scylladb:
  database: Don't export statement scheduling group
  test: Use async attrs and cql-test-env scheduling groups
  test: Use get_scheduling_groups() to get scheduling groups
  api: Don't switch sched group to start/stop protocol servers
  main: Don't switch sched group to start protocol servers
  code: Switch to sched group in request_stop_server()
  code: Switch to server sched group in start()
  protocol_server: Keep scheduling group on board
  code: Add scheduling group to controllers
  redis: Coroutinize start() method
2024-05-27 10:48:33 +03:00
Kefu Chai
46d993a283 test: revert 4c1b6f04
in 4c1b6f04, we added a concept for fmt::is_formattable<>. but it
was not necessary. the fmt::is_formattable<> trait was enough. the
reason is that 4c1b6f04 was actually a leftover of a bigger change which
tried to add a trait for the cases that fmt::is_formattable<> was
not able to cover. but that was based on the wrong impression that
fmt::is_formattable<> should be able to work with container types
without including, for instance, `fmt/ranges.h`. but in 222dbf2c,
we include `fmt/ranges.h` in tests where the range-alike formatter
is used, and that enables `fmt::is_formattable<>` to tell that container
types are formattable.

in short, 4c1b6f04 was created based on a misunderstanding, and
it was a reduced type trait, which has proved to be unnecessary.

so, in this change, it is dropped. but the type constraint is
preserved to make the build failure more explicit if the fallback
formatter does not match the type to be formatted by Boost.test.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18879
2024-05-27 10:14:59 +03:00
Marcin Maliszkiewicz
2ab143fb40 db: auth: move auth tables to system keyspace
Separate keyspace which also behaves as system brings
little benefit while creating some compatibility problems
like schema digest mismatch during rollback. So we decided
to move auth tables into system keyspace.

Fixes https://github.com/scylladb/scylladb/issues/18098

Closes scylladb/scylladb#18769
2024-05-26 22:30:42 +03:00
Avi Kivity
56d523b071 Merge 'build, test: disable operator<< for vector and unordered_map' from Kefu Chai
this series disables the operator<< overloads for vector and unordered_map, and drops operator<< for mutation, because we don't have to keep it to work with these operators anymore. this change is a follow up of https://github.com/scylladb/seastar/issues/1544

this change is a cleanup. so no need to backport

Closes scylladb/scylladb#18866

* github.com:scylladb/scylladb:
  mutation,db: drop operator<< for mutation and seed_provider_type&
  build: disable operator<< for vector and unordered_map
  db/heat_load_balance: include used header
  test: define a more generic boost_test_print_type
  test/boost: define fmt::formatter for service_level_controller_test.cc
  test/boost: include test/lib/test_utils.hh
2024-05-26 19:19:20 +03:00
Kefu Chai
4e9596a5a9 treewide: replace std::result_of_t with std::invoke_result_t
in theory, std::result_of_t should have been removed in C++20. and
std::invoke_result_t is available since C++17. thanks to libstdc++,
the tree is compiling. but we should not rely on this.

so, in this change, we replace all `std::result_of_t` with
`std::invoke_result_t`. actually, clang + libstdc++ is already warning
us like:

```
In file included from /home/runner/work/scylladb/scylladb/multishard_mutation_query.cc:9:
In file included from /home/runner/work/scylladb/scylladb/schema/schema_registry.hh:11:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/unordered_map:38:
Warning: /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/type_traits:2624:5: warning: 'result_of<void (noop_compacted_fragments_consumer::*(noop_compacted_fragments_consumer &))()>' is deprecated: use 'std::invoke_result' instead [-Wdeprecated-declarations]
 2624 |     using result_of_t = typename result_of<_Tp>::type;
      |     ^
/home/runner/work/scylladb/scylladb/mutation/mutation_compactor.hh:518:43: note: in instantiation of template type alias 'result_of_t' requested here
  518 |         if constexpr (std::is_same_v<std::result_of_t<decltype(&GCConsumer::consume_end_of_stream)(GCConsumer&)>, void>) {
      |
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18835
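
For reference, the replacement looks roughly like this in isolation. The consumer types and the `finish` helper are hypothetical and only mirror the shape of the check quoted in the warning; the difference to note is that `std::invoke_result_t` takes the callable type and the argument types as separate template arguments instead of a function-type expression.

```cpp
#include <type_traits>

// Hypothetical consumer types mirroring the shape of the check in the warning.
struct void_consumer { void consume_end_of_stream() {} };
struct int_consumer { int consume_end_of_stream() { return 7; } };

// Deprecated spelling (removed in C++20):
//   std::result_of_t<decltype(&C::consume_end_of_stream)(C&)>
// Replacement: callable type and argument types are separate parameters.
template <typename C>
using end_of_stream_result_t =
    std::invoke_result_t<decltype(&C::consume_end_of_stream), C&>;

template <typename C>
constexpr bool returns_void_v = std::is_same_v<end_of_stream_result_t<C>, void>;

static_assert(returns_void_v<void_consumer>);
static_assert(!returns_void_v<int_consumer>);

// Dispatch on the result type, as the quoted `if constexpr` does.
template <typename C>
int finish(C& c) {
    if constexpr (returns_void_v<C>) {
        c.consume_end_of_stream();
        return 0;
    } else {
        return c.consume_end_of_stream();
    }
}
```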
2024-05-26 16:45:42 +03:00
Pavel Emelyanov
9108952a52 test/cql-pytest: Add test for token() filter againts mutation_fragments()
When selecting from mutation_fragments(table) one may want to apply
token() filtering against the partition key. This doesn't work currently,
but it used to crash. This patch adds a regression test for that.

refs: #18637
refs: #18768

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18759
2024-05-26 15:31:20 +03:00
Kefu Chai
125464f2d9 migration_manager: do not reference moved-away smart pointer
this change is inspired by clang-tidy. it warns like:
```
[752/852] Building CXX object service/CMakeFiles/service.dir/migration_manager.cc.o
Warning: /home/runner/work/scylladb/scylladb/service/migration_manager.cc:891:71: warning: 'view' used after it was moved [bugprone-use-after-move]
  891 |             db.get_notifier().before_create_column_family(*keyspace, *view, mutations, ts);
      |                                                                       ^
/home/runner/work/scylladb/scylladb/service/migration_manager.cc:886:86: note: move occurred here
  886 |             auto mutations = db::schema_tables::make_create_view_mutations(keyspace, std::move(view), ts);
      |                                                                                      ^
```
in which `view` is an instance of view_ptr, a type with the
semantics of a shared pointer. it's backed by a member variable of type
`seastar::lw_shared_ptr<const schema>`, whose move-ctor actually resets
the original instance. so we are actually accessing the moved-away
pointer in

```c++
db.get_notifier().before_create_column_family(*keyspace, *view, mutations, ts)
```

so, in this change, instead of moving away from `view`, we create
a copy, and pass the copy to
`db::schema_tables::make_create_view_mutations()`. this should be fine,
as the behavior of `db::schema_tables::make_create_view_mutations()`
does not depend on whether the `view` passed to it is moved or copied.

the change which introduced this use-after-move was 88a5ddabce

Refs 88a5ddabce
Fixes #18837
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18838
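
The copy-instead-of-move fix can be sketched with standard types. This is an assumption-laden illustration: `std::shared_ptr<const std::string>` stands in for `view_ptr`, and `make_mutations` is a hypothetical by-value consumer, not the real `make_create_view_mutations()`.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Stand-in for view_ptr: a shared pointer whose move leaves the source empty.
using view_ptr = std::shared_ptr<const std::string>;

// Hypothetical consumer that takes ownership of its argument by value.
inline std::size_t make_mutations(view_ptr v) { return v ? v->size() : 0; }

// Fixed shape: pass a copy, so `view` is still valid when dereferenced later.
inline std::string create_view_fixed(view_ptr view) {
    auto mutations = make_mutations(view);  // copy, not std::move(view)
    return *view + ":" + std::to_string(mutations);
}

// Shows why the buggy shape is dangerous: after std::move, the source
// std::shared_ptr is guaranteed to be empty, so `*view` would be invalid.
inline bool moved_from_is_empty(view_ptr view) {
    auto sink = std::move(view);
    (void)sink;
    return view == nullptr;
}
```

The copy costs one reference-count increment, which is usually negligible next to dereferencing an empty pointer.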
2024-05-26 12:04:00 +03:00
Kefu Chai
dbfdc71d2d treewide: fix typos in comment and error messages
these typos were identified by codespell

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18868
2024-05-26 11:54:36 +03:00
Kefu Chai
35e1fcde1f mutation,db: drop operator<< for mutation and seed_provider_type&
since we've migrated away from the generic homebrew formatters
for range-alike containers, there is no need to keep these operator<<
around -- they were preserved in order to work with the container
formatters which expect operator<< of the elements.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-26 13:44:55 +08:00
Kefu Chai
9bd9f283f4 build: disable operator<< for vector and unordered_map
seastar provides an option named `Seastar_DEPRECATED_OSTREAM_FORMATTERS`
to enable the operator<< for `std::vector` and `std::unordered_map`,
and this option is enabled by default. but we intend to avoid using
them, so that we can use the fmt::formatter specializations when
Boost.test prints variables. if we keep these two operator<< enabled,
Boost.test would use them when printing the compared variables
after a check fails, but if the elements of the vector or unordered_map
being compared do not provide operator<<, compiling would fail.

so, in this change, let's disable these operator<< implementations.
this allows us to ditch the operator<< implementations which are
preserved only for testing.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-26 13:44:55 +08:00
Kefu Chai
8e0a6ea021 db/heat_load_balance: include used header
in this header, we use `hr_logger.trace("returned _pp={}", p)` to
print a `vector<float>`, so we need to include `fmt/ranges.h`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-26 13:44:55 +08:00
Kefu Chai
4c1b6f0476 test: define a more generic boost_test_print_type
fmt::is_formattable<T>::value is false, even if

* T is a container of U, and
* fmt::is_formattable<U>, and
* U can be formatted using fmt::formatter

so, we have to define a more generic boost_test_print_type()
for the all types supported by {fmt}. it will help us to ditch the
operator<< for vector and unordered_map in Seastar, and allow us
to use the fmt::formatter specialization of the element
types.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-26 12:32:43 +08:00
Kefu Chai
bfe918ac9e test/boost: define fmt::formatter for service_level_controller_test.cc
since we are moving away from operator<< based formatters, more and more
types now only have {fmt} based formatters. the same will apply to the
STL container types after ditching the generic homebrew formatter in
to_string.hh, so to be prepared for the change, let's add the
fmt::formatter for tests as well.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-26 12:32:43 +08:00
Kefu Chai
222dbf2ce4 test/boost: include test/lib/test_utils.hh
this change was created in the same spirit as 505900f18f. because
we are deprecating the operator<< for vector and unordered_map in
Seastar, some tests do not compile anymore if we disable these
operators. so to be prepared for the change disabling them, let's
include test/lib/test_utils.hh for accessing the printer dedicated
for Boost.test. and also '#include <fmt/ranges.h>' when necessary,
because, in order to format the ranges using {fmt}, we need to
use fmt/ranges.h.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-26 12:32:43 +08:00
Pavel Emelyanov
cf564d7a54 cql3: Mark prepared_statement's fields const
Not only users of prepared_statement point to immutable object, but the
class itself doesn't assume modifications of its fields, so mark them
const too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-25 16:41:30 +03:00
Pavel Emelyanov
828862bdff cql3: Define prepared_statement weak pointer as const
The pointer points to an immutable prepared_statement, so adjust the type
accordingly. Tracing has its own alias for it; fix that one too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-25 16:40:35 +03:00
Michał Chojnowski
de798775fd test: test_coordinator_queue_management: wait for logs properly
The modified lines of code intend to await the first appearance of a log
on one of the nodes.

But due to misplaced parentheses, instead of creating a list of log-awaiting
tasks with a list comprehension, they pass a generator expression to
asyncio.create_task().

This is nonsense, and it fails immediately with a type error.
But since they don't actually check the result of the await,
the test just assumes that the search completed successfully.

This was uncovered by an upgrade to Python 3.12, because its typing is stronger
and asyncio.create_task() screams when it's passed a regular generator.

This patch fixes the bad list comprehension, and also adds an error check
on the completed awaitables (by calling `await` on them).

Fixes #18740

Closes scylladb/scylladb#18754
2024-05-25 10:54:44 +03:00
Pavel Emelyanov
31edab277a database: Don't export statement scheduling group
Now all the code gets this group from elsewhere and the method can be
removed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 18:00:01 +03:00
Pavel Emelyanov
ddc511872e test: Use async attrs and cql-test-env scheduling groups
Continuation of the previous patch, but with its own flavor. There's a
manual test that wants to run seastar thread in statement scheduling
group and gets one from database. This patch makes it get the group from
cql-test-env and, while at it, makes it switch to that group using
thread attributes passed to async() method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 18:00:01 +03:00
Pavel Emelyanov
2e3a057db1 test: Use get_scheduling_groups() to get scheduling groups
There's such a helper in cql-test-env that other tests use to get sched
groups from. A few other tests (ab)use database for that; this patch fixes
those remnants.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 18:00:01 +03:00
Pavel Emelyanov
d86a8252d4 api: Don't switch sched group to start/stop protocol servers
All the protocol servers implementations now maintain scheduling group
on their own, so the API handler can stop caring

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 18:00:01 +03:00
Pavel Emelyanov
ee0239b2ef main: Don't switch sched group to start protocol servers
Now each of them does this switch on its own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 18:00:01 +03:00
Pavel Emelyanov
7c76a35e0b code: Switch to sched group in request_stop_server()
This method is used to stop protocol server in the runtime (via the
API). Since it's not just "kick it and wait to wrap up", it's needed to
perform this in the inherited sched group too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 18:00:01 +03:00
Pavel Emelyanov
fe349a73c8 code: Switch to server sched group in start()
This patch makes all protocol servers implementations use the inherited
sched group in their start methods.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 17:56:02 +03:00
Pavel Emelyanov
bf5894cc69 protocol_server: Keep scheduling group on board
The group is now mandatory for the real protocol server implementation
to initialize. The previous patch made all of them get the sched group as
a constructor argument, so that's where to take it from.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 17:54:29 +03:00
Pavel Emelyanov
fc3c3e1099 code: Add scheduling group to controllers
There are four of them currently -- transport, thrift, alternator and
redis. This patch makes main pass the statement scheduling group to all
of them as a constructor argument. Next patches will make use of it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 17:53:16 +03:00
Pavel Emelyanov
82511f3c25 redis: Coroutinize start() method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 17:52:48 +03:00
Michał Jadwiszczak
af0b6bcc56 test/auth_cluster/test_raft_service_levels: try to create sl in recovery
2024-05-23 17:49:59 +02:00
Pavel Emelyanov
8906126a2c stream_manager: Remove system_distributed_keyspace and view_update_generator
Now all the code is happy with view_builder and can be shortened

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:56 +03:00
Pavel Emelyanov
84ef6a8179 repair: Remove system_distributed_keyspace and view_update_generator
Now all the code is happy with view_builder and can be shortened

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:56 +03:00
Pavel Emelyanov
ae2dcdc7c2 streaming: Remove system_distributed_keyspace and view_update_generator
Now all the code is happy with view_builder and can be shortened

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:55 +03:00
Pavel Emelyanov
afa94d2837 sstables_loader: Remove system_distributed_keyspace and view_update_generator
Now all the code is happy with view_builder and can be shortened

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:47 +03:00
Pavel Emelyanov
b728857954 distributed_loader: Remove system_distributed_keyspace and view_update_generator
Now all the code is happy with view_builder and can be shortened

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:47 +03:00
Pavel Emelyanov
66a8035b64 view: Make register_staging_sstable() a method of view_builder
Callers of it had just checked if an sstable still has some views
building, so they should talk to view-builder to register the sstable
that's now considered to be staging.

Effectively, this is to hide the view-update-generator from other
services and make them communicate with the builder only.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:47 +03:00
Pavel Emelyanov
92ff0d3fc3 view: Make check_view_build_ongoing() helper a method of view_builder
This helper checks if there's an ongoing build of a view, and it's in
fact internal to view-builder, who keeps its status in one of its
system tables.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:47 +03:00
Pavel Emelyanov
57517d5987 streaming: Proparage view_builder& down to make_streaming_consumer()
Continuation of the previous patch. Repair itself doesn't need it, but
streaming consumer does.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:46 +03:00
Pavel Emelyanov
5e6893075d repair: Keep view_builder& on repair_writer_impl
Preparation patch, next patches will make use of this new member

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:32:29 +03:00
Pavel Emelyanov
0d946a5fdf distributed_loader: Propagate view_builder& via process_upload_dir()
Preparation to next patches, they'll make use of this new argument

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:32:28 +03:00
Pavel Emelyanov
d917b06857 stream_manager: Add view builder dependency
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:32:28 +03:00
Pavel Emelyanov
f0f1097d0c repair_service: Add view builder dependency
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:32:28 +03:00
Pavel Emelyanov
f269a37541 sstables_loader: Add view_builder dependency
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:32:28 +03:00
Pavel Emelyanov
ff63f8b1a5 main: Start sstables loader later
This service is on its own, nothing depends on it. Nor can it work
before the system distributed keyspace is started, so move it lower.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:32:28 +03:00
Pavel Emelyanov
f4341ea088 repair: Remove unwanted local references from repair_meta
When constructed, the class copies local references to services just to
push them into make_repair_writer() later in the same initializers list.
There's no need to keep those references.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:32:28 +03:00
Marcin Maliszkiewicz
9adf74ae6c docs: remove note about performance degradation with default superuser
This doesn't apply to auth-v2, as we improved data placement and
removed the Cassandra quirk that set a different CL for some
operations involving the default superuser.

Fixes #18773

Closes scylladb/scylladb#18785
2024-05-23 13:16:11 +03:00
Kefu Chai
dfeef4e4e8 build: use f-string when appropriate
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18808
2024-05-23 11:19:39 +03:00
Anna Stuchlik
2da25cca1a doc: enable publishing docs for branch-6.0
This commit enables publishing documentation
from branch-6.0. The docs will be published
as UNSTABLE (the warning about version 6.0
being unstable will be displayed).

Closes scylladb/scylladb#18832
2024-05-23 10:37:55 +03:00
Michał Jadwiszczak
ee08d7fdad service/qos/raft_sl_dda: reject changes to service levels in recovery mode

When a cluster goes into recovery mode and service levels were migrated
to raft, service levels become temporarily read-only.

This commit adds a proper error message in case a user tries to do any
changes.
2024-05-23 08:18:03 +02:00
Michał Jadwiszczak
2b56158d13 service/qos/raft_sl_dda: extract raft_sl_dda steps to common function
When setting/dropping a service level using raft data accessor, the same
validation steps are executed (this_shard_id = 0 and guard is present).
To not duplicate the calls in both functions, they can be extracted to a
helper function.
2024-05-23 08:16:00 +02:00
Raphael S. Carvalho
e7246751b6 test: Fix flakiness in topology_experimental_raft/test_tablets
One source of flakiness is in test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode
due to gossiper being aborted prematurely, and causing reconnection
storm.

Another is test_tablet_missing_data_repair, which is flaky due to an issue
in the Python driver where a session might not reconnect on rolling restart
(tracked by https://github.com/scylladb/python-driver/issues/230)

Refs #15356.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-22 17:02:29 -03:00
Raphael S. Carvalho
eb8ef38543 replica: Fix tablet's compaction_groups_for_token_range() with unowned range
File-based tablet streaming calls every shard to return data of every
group that intersects with a given range.
After dynamic group allocation, that breaks as the tablet range will
only be present in a single shard, so an exception is thrown causing
migration to halt during streaming phase.
Ideally, only one shard is invoked, but that's out of the scope of this
fix and compaction_groups_for_token_range() should return empty result
if none of the local groups intersect with the range.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18798
2024-05-22 20:15:33 +03:00
Anna Stuchlik
6626d72520 doc: replace Raft-disabled with Raft-enabled procedure
This commit fixes the incorrect Raft-related information on the Handling Cluster Membership Change Failures page
introduced with https://github.com/scylladb/scylladb/pull/17500.

The page describes the procedure for when Raft is disabled. Since Raft for consistent schema management
is enabled and mandatory (cannot be disabled) as of 6.0, this commit adds the procedure for Raft-enabled setups.

Closes scylladb/scylladb#18803
2024-05-22 17:45:20 +02:00
David Garcia
de2b30fafd docs: docs: autogenerate metrics
Autogenerates metrics documentation using the scripts/get_description.py script introduced in #17479

docs: add beta

Closes scylladb/scylladb#18767
2024-05-22 15:49:41 +03:00
Raphael S. Carvalho
551bf9dd58 service: Use tablet read selector to determine which replica to account table stats
Since we introduced the ability to revert migrations, we can no longer
rely on ordering of transition stages to determine whether to account
pending or leaving replica. Let's use read selector instead, which
correctly indicates which replica type holds the valid stats.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-22 09:25:29 -03:00
Raphael S. Carvalho
abcc68dbe7 storage_service: Fix race between tablet split and stats retrieval
If tablet split is finalized while retrieving stats, the saved erm, used by all
shards, will be invalidated. It can either cause incorrect behavior or
crash if id is not available.

It's worked around by feeding local tablet map into the "coordinator"
collecting stats from all shards. We will also no longer have a snapshot
of erm shared between shards to help intra-node migration. This is
simplified by serializing token metadata changes and the retrieval of
the stats (latter should complete pretty fast, so it shouldn't block
the former for any significant time).

Fixes #18085.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-22 09:25:29 -03:00
Yaron Kaikov
9cc42c98f5 [Mergify] update configuration for 6.0
Updating mergify conf to support 6.0 release

Closes scylladb/scylladb#18823
2024-05-22 14:28:43 +03:00
Yaron Kaikov
219daf3489 Update ScyllaDB version to: 6.1.0-dev 2024-05-22 14:08:56 +03:00
Botond Dénes
2f87bfd634 Update tools/java submodule
* tools/java 4ee15fd9...88809606 (2):
  > Update Scylla Java driver to 3.11.5.3.
  > install-dependencies.sh: s/python/python3/

[botond: regenerate toolchain image]

Closes scylladb/scylladb#18790
2024-05-22 11:39:02 +03:00
Asias He
1a03e3d5ae repair: Add missing db/config.hh
Since commit 952dfc6157 "repair: Introduce
repair_partition_count_estimation_ratio config option", get_config() is
used. We need to include db/config.hh for that.

Spotted when backporting to 5.4 branch.

Refs #18615

Closes scylladb/scylladb#18780
2024-05-22 11:00:16 +03:00
Nadav Har'El
dc80b5dafe test/alternator: do not write to auth tables
As part of the Alternator test suite, we check Alternator's support for
authentication. Alternator maps Scylla's existing CQL roles to AWS's
authentication:
  * AWS's access_key_id     <- the name of the CQL role
  * AWS's secret_access_key <- the salted hash of the password of the CQL role

Before this patch, the Alternator test suite created a new role with a
preset salted hash (role "alternator", salted hash "secret_pass")
and then used that in the tests. However, with the advent of Raft-based
metadata it is wrong to write directly to the roles table, and starting
with #17952 such writes will be outright forbidden.

But we don't actually need to create a new CQL role! We already have
a perfectly good CQL role called "cassandra", and our tests already use
it. So what this patch does is to have the Alternator tests (conftest.py)
read from the roles system-table the salted hash of the "cassandra" role,
and then use that - instead of the hard-coded pair alternator/secret_pass -
in the tests.

A couple more tests assumed that the role name that was used was
"alternator", but now it was changed to "cassandra" so those tests
needed minor fixes as well.

After this patch, the Alternator tests no longer *write* to the roles
system table. Moreover, after this patch, test/alternator/run and
test/alternator/suite.yaml (used when testing with test.py) no longer
need to do extra ugly CQL setup before starting the Alternator tests.

Fixes #18744

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18771
2024-05-22 11:00:15 +03:00
Avi Kivity
c37f2c2984 version: bump version to 6.0.0-dev
The next release will be called 6.0, not 5.5, so bump the version to reflect that.

Closes scylladb/scylladb#18789
2024-05-22 11:00:15 +03:00
Kefu Chai
0610eda1b5 Update seastar submodule
* seastar 42f15a5f...914a4241 (33):
  > sstring: deprecate formatters for vector and unordered_map
  > github: use fedora:40 image for testing
  > github: add 2 testing combinations back to the matrix
  > github: extract test.yaml into a resusable workflow
  > build: use initial-exec TLS when building seastar as shared library
  > coroutine: preserve this->container before calling dtor
  > smp: allocate hugepages eagerly when kernel support is available
  > shared_mutex: Add tests for std::shared_lock and std::unique_lock
  > shared_mutex: Add RAII locks
  > README.md: replace C++17 with C++23
  > treewide: do not check for SEASTAR_COROUTINES_ENABLED
  > build: support enabled options when building seastar-module
  > treewide: include required header files
  > build: move add_subdirectory(src) down
  > README.md: replace CircleCI badge with GitHub badge
  > weak_ptr: Make it possible to convert to "compatible" pointers
  > circleci: remove circleci CI tests
  > build: use DPDK_MACHINE=haswell when testing dpdk build on github-hosted runner
  > build: add --dpdk-machine option to configure.py
  > build: stop translating -march option to names recognized by DPDK
  > github: encode matrix.enables in cache key
  > doc/prometheus.md: add metrics? in URL exporter URI
  > tests/unit/metrics_tester: use deferred_stop() when appropriate
  > httpd: mark http_server_control::stop() noexcept
  > reactor: print scheduling group along with backtrace
  > reactor: update lowres_clock when max_task_backlog is exceeded
  > tests: add test for prometheus exporter
  > tests: move apps/metrics_tester to tests/unit
  > apps/metrics_tester: keep metrics with "private" labels
  > apps/metrics_tester: support "labels" in conf.yaml
  > apps/metrics_tester: stop server properly
  > apps/metrics_tester: always start exporter
  > apps/metrics_tester: fix typo in conf-example.yaml

Closes scylladb/scylladb#18800
2024-05-22 11:00:15 +03:00
Pavel Emelyanov
26eda88401 test/tablets: Check that after RF change data is replicated properly
There's a test that checks system.tablets contents to see that after
changing ks replication factor via ALTER KEYSPACE the tablet map is
updated properly. This patch extends this test to also validate that
mutations themselves are replicated according to the desired replication
factor.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18644
2024-05-22 11:00:15 +03:00
Anna Stuchlik
92bc8053e2 doc: remove outdated MV error from Troubleshooting
This commit removes the MV error message, which only
affects older versions of ScyllaDB, from the Troubleshooting section.

Fixes https://github.com/scylladb/scylladb/issues/17205

Closes scylladb/scylladb#17229
2024-05-21 19:02:31 +03:00
Avi Kivity
2bf2e24fcd Merge 'Coroutinize some auth and service levels related functions' from Marcin Maliszkiewicz
Coroutinization will help improve readability and allow easier changes planned for this code.

This work was separated from https://github.com/scylladb/scylladb/pull/17910 to make it smoother to review and merge.

Closes scylladb/scylladb#18788

* github.com:scylladb/scylladb:
  cql3: coroutinize create/alter/drop service levels
  auth: coroutinize alter_role and drop_role
  auth: coroutinize grant_permissions and revoke_permissions
  auth: coroutinize create_role
  cql3: statements: co-routinize auth related statements
  cql3: statements: release unused guard explicitly in auth related statements
2024-05-21 17:45:19 +03:00
Botond Dénes
5e41dd28c7 Merge 'Sanitize sl controller draining' from Pavel Emelyanov
The sl-controller is stopped in three steps. The first (and instantly the second) is unsubscribing from lifecycle notification and draining. The third is stop itself. First two steps are "out of order" as compared to the desired start-stop sequence of any service, this patch fixes these steps.

After this PR the drain_on_shutdown() (the call that drains the node upon stop) finally becomes clean and tidy and is no longer accompanied by ad-hoc fellow drains/stops/aborts/whatever.

refs: #2737

Closes scylladb/scylladb#18731

* github.com:scylladb/scylladb:
  sl_controller: Remove drain() method
  sl_controller: Move abort kicking into do_abort()
  main,sl_controller: Subscribe for early abort
  main: Unsubscribe sl controller next to subscribing
2024-05-21 17:16:23 +03:00
Anna Stuchlik
a86fb293fe doc: update Raft information in 6.0
This commit updates the documentation about Raft in version 6.0.

- "Introduction": The outdated information about consistent topology updates not being supported
  is removed and replaced with the correct information.
- "Enabling Raft": The relevant information is moved to other sections. The irrelevant information
   is removed. The section no longer exists.
- "Verifying that the Raft upgrade procedure finished successfully" - moved under Schema
   (in the same document). I additionally removed the include saying that after you verify
   that schema on Raft is enabled, you MUST enable topology changes on Raft (it is not mandatory;
   also, it should be part of the upgrade guide, not the Raft document).
- Unnecessary or incorrect references to versions are removed.

Refs https://github.com/scylladb/scylladb/issues/18580

Closes scylladb/scylladb#18689
2024-05-21 11:45:36 +02:00
Anna Stuchlik
eefa4a7333 doc: replace 5.4-to-5.5 with 5.4-to-6.0 upgrade guide
This commit replaces the 5.4-to-5.5 upgrade guide with the 5.4-to-6.0 upgrade guide,
including the metrics update information.

The guide references the "Enable Consistent Topology Updates" document,
as enabling consistent topology updates is a new step when upgrading to version 6.0.

Also, a procedure for image upgrades has been added (as verified by @yaronkaikov).

Fixes scylladb/scylladb#18254
Fixes scylladb/scylladb#17896
Refs scylladb/scylladb#18580

Closes scylladb/scylladb#18728
2024-05-21 11:31:04 +02:00
Piotr Dulikowski
9820472277 main: introduce schema commitlog scheduling group
Currently, we do not explicitly set a scheduling group for the schema
commitlog which causes it to run in the default scheduling group (called
"main"). However:

- It is important and significant enough that it should run in a
  scheduling group that is separate from the main one,
- It should not run in the existing "commitlog" group as user writes may
  sometimes need to wait for schema commitlog writes (e.g. read barrier
  done to learn the schema necessary to interpret the user write) and we
  want to avoid priority inversion issues.

Therefore, introduce a new scheduling group dedicated to the schema
commitlog.

Fixes: scylladb/scylladb#15566

Closes scylladb/scylladb#18715
2024-05-21 11:29:57 +02:00
Kefu Chai
5db315930e sstables: fix a typo in comment: s/Mimicks/Mimics/
this typo was identified by the codespell workflow

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18781
2024-05-21 12:14:10 +03:00
Nadav Har'El
dcd26d8a16 Merge 'docs: update isolation.md' from Botond Dénes
Update `docs/dev/isolation.md`:
* Update the list of scheduling groups
* Remove IO priority groups (they were folded into scheduling groups)
* Add section on RPC isolation

Closes scylladb/scylladb#18749

* github.com:scylladb/scylladb:
  docs: isolation.md: add section on RPC call isolation
  docs: isolation.md: remove mention of IO priority groups
  docs: isolation.md: update scheduling group list, add aliases
2024-05-21 11:46:57 +03:00
Kefu Chai
44e85c7d79 build: "undo" the coverage compiling options added to abseil
we are not interested in the code coverage of abseil library, so no need
to apply the compiling options enabling the coverage instrumentation
when building the abseil library.

moreover, the path of the file passed to `-fprofile-list` is a relative
path, so when coverage is enabled, the build fails while building
abseil, like:

```
 /usr/lib64/ccache/clang++  -I/jenkins/workspace/scylla-master/scylla-ci/scylla/abseil -std=c++20 -I/jenkins/workspace/scylla-master/scylla-ci/scylla/seastar/include -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/debug/seastar/gen/include -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -DSEASTAR_API_LEVEL=7 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_SSTRING -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_DEBUG -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEBUG_PROMISE -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_TYPE_ERASE_MORE -DBOOST_NO_CXX98_FUNCTION_BASE -DFMT_SHARED -I/usr/include/p11-kit-1 -fprofile-instr-generate -fcoverage-mapping -fprofile-list=./coverage_sources.list -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/strings/CMakeFiles/strings.dir/str_cat.cc.o -MF absl/strings/CMakeFiles/strings.dir/str_cat.cc.o.d -o absl/strings/CMakeFiles/strings.dir/str_cat.cc.o -c /jenkins/workspace/scylla-master/scylla-ci/scylla/abseil/absl/strings/str_cat.cc
clang-16: error: no such file or directory: './coverage_sources.list'
```

in this change, we just remove the compiling options enabling the
coverage instrumentation from the cflags when building abseil.

Fixes #18686
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18748
2024-05-21 11:43:16 +03:00
Marcin Maliszkiewicz
570b766e8b cql3: coroutinize create/alter/drop service levels 2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz
f98cb6e309 auth: coroutinize alter_role and drop_role 2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz
21556c39d3 auth: coroutinize grant_permissions and revoke_permissions 2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz
6709947ccf auth: coroutinize create_role 2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz
7f5d259b54 cql3: statements: co-routinize auth related statements 2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz
dee17e5ab6 cql3: statements: release unused guard explicitly in auth related statements
Currently guard is released immediately because those functions are
based on continuations and guard lifetime is not extended. In the following
commit we rewrite those functions to coroutines and lifetime will be
automatically extended. This would deadlock the client because we'd
try to take second guard inside auth code without releasing this unused
one.

In the future commits auth guard will be removed and the one from
statement will be used but this needs some more code re-arrangements.
2024-05-21 10:37:26 +02:00
Botond Dénes
11fa79a537 docs: isolation.md: add section on RPC call isolation 2024-05-21 03:12:22 -04:00
Kefu Chai
86b988a70b test/lib: do not use variable which could be moved away
C++ standard does not define the order in which the parameters
passed to a function are evaluated. so in theory, in
```c++
reusable_sst(sst->get_schema(), std::move(sst));
```
`std::move(sst)` could be evaluated before `sst->get_schema`.
but please note, `std::move(sst)` does not move `sst`
away, it merely casts `sst` to an rvalue reference, it is
`reusable_sst()` which *could* move `sst` away by
consuming it. so following call is much more dangerous
than the above one:
```c++
reusable_sst(sst->get_schema(), modify_sst(std::move(sst)))
```
nevertheless, this usage is still confusing. so instead
of passing a copy of `sst` to `reusable_sst`.

this change is inspired by clang-tidy, it warns like:

```
Warning: /home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: warning: 'sst' used after it was moved [bugprone-use-after-move]
  397 |     return reusable_sst(sst->get_schema(), std::move(sst));
      |                         ^
/home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:44: note: move occurred here
  397 |     return reusable_sst(sst->get_schema(), std::move(sst));
      |                                            ^
/home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
  397 |     return reusable_sst(sst->get_schema(), std::move(sst));
      |
```

per the analysis above, this is a false alarm.
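
One way to sidestep the ordering question entirely is to read what is needed from the pointer in its own statement and only then hand the pointer over. The sketch below illustrates that pattern with hypothetical stand-in types (`fake_sstable`, `consume`, `reuse_safely`); it is not the code from this change and does not use the real sstable API.

```c++
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an sstable; not the ScyllaDB type.
struct fake_sstable {
    std::string schema_name;
    const std::string& get_schema() const { return schema_name; }
};
using fake_shared_sstable = std::shared_ptr<fake_sstable>;

// Hypothetical consumer that takes ownership of the pointer by value.
inline std::string consume(std::string schema, fake_shared_sstable sst) {
    return schema + ":" + (sst ? "owned" : "empty");
}

inline std::string reuse_safely(fake_shared_sstable sst) {
    assert(sst);
    // Read from `sst` in a separate full expression first.
    std::string schema = sst->get_schema();
    // Only now give the pointer away; no other argument touches `sst`.
    return consume(std::move(schema), std::move(sst));
}
```

With this shape, the read of the schema is sequenced before the call, so neither a reader nor clang-tidy has to reason about argument evaluation order.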

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18775
2024-05-21 10:02:10 +03:00
Pavel Emelyanov
428e0bd7d4 locator: Remove unused lshift-operator for topology
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18714
2024-05-21 09:46:30 +03:00
Pavel Emelyanov
b24fb8dc87 inet_address: Remove to_sstring() in favor of fmt::to_string
The existing inet_address::to_string() calls fmt::format("{}", *this)
anyway. However, the to_string() method is declared in .cc file, while
the fmt formatter is in the header and is equipped with constexprs so
that converting an address to string is done as much as possible
compile-time.

Also, though minor, fmt::to_string(foo) is believed to be even faster
than fmt::format("{}", foo).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18712
2024-05-21 09:43:08 +03:00
Pavel Emelyanov
fed457eb06 sl_controller: Remove drain() method
The draining now only consists of waiting for the data update future to
resolve. It can be safely moved to .stop() (i.e. -- later) because its
stopping had already been initiated by abort-source, and no other
services depend on sl-controller to be stopped and drained.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-21 09:42:16 +03:00
Pavel Emelyanov
535e5f4ae7 sl_controller: Move abort kicking into do_abort()
Draining sl controller consists of two parts. The first kicks the wrap-up
process by aborting operations, breaking semaphores, etc. It's the
no-waiting part. The second is the co_await of the completion future.
This patch moves the no-waiting part into the recently introduced abort
subscription, so that wrap-up starts a bit earlier.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-21 09:42:16 +03:00
Kefu Chai
b6e2d6868b build: add dependencies from binaries to abseil libraries
in 0b0e661a, we brought abseil submodule back. but we didn't update
the build.ninja rules properly -- we should have added the abseil
libraries to the dependencies of the binaries so that the abseil
libraries are always generated before a certain binary is built.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18753
2024-05-21 08:50:48 +03:00
Avi Kivity
33ec6ccea9 test: boost: chunked_vector_test: include <optional>
std::optional is used but not imported. This fails on libstdc++-14.

Closes scylladb/scylladb#18739
2024-05-21 07:37:11 +03:00
Pavel Emelyanov
8d4c8711fa main,sl_controller: Subscribe for early abort
There's stop-signal in main that fires an abort source on stop. Lots of
other services are subscribed in it, add the sl-controller too. For now
it's a no-op, but next patches will make use of it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-20 21:26:31 +03:00
Pavel Emelyanov
5105ee3284 main: Unsubscribe sl controller next to subscribing
The subscription only handles on_leave_cluster() and only for local
node, so even if controller gets subscribed for longer, it won't do any
harm.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-20 21:26:31 +03:00
Yaron Kaikov
bc596a3e76 pull_request_template: clarify the template and remove checkbox verification
It seems that having the checkbox in the PR template and failing the action is confusing. Let's remove it completely and just ask in the template for an explanation of the backport reason

Closes scylladb/scylladb#18708
2024-05-20 18:24:28 +03:00
Botond Dénes
f239339a29 Merge 'Improve modularity of some per-table API endpoints' from Pavel Emelyanov
There's a set of API endpoints that toggle per-table auto-compaction and tombstone-gc booleans. They all live in two different .cc files under api/ directory and duplicate each other's code. This PR generalizes those handlers, places them next to each other, fixes a leak on stop and, as a nice side effect, lightens the database.hh header.

Closes scylladb/scylladb#18703

* github.com:scylladb/scylladb:
  api,database: Move auto-compaction toggle guard
  api: Move some table manipulation helpers from storage_service
  api: Move table-related calls from storage_service domain
  api: Reimplement some endpoints using existing helpers
  api: Lost unset of tombstone-gc endpoints
2024-05-20 18:01:54 +03:00
Avi Kivity
61505d057e Merge 'Sort user-defined types in describe statements' from Michał Jadwiszczak
User-defined types can depend on each other, creating directed acyclic graph.

In order to support restoring schema from `DESC SCHEMA`, UDTs should be
ordered topologically, not alphabetically as it was till now.

This patch changes the way UDTs are ordered in `DESC SCHEMA`/`DESC KEYSPACE <ks>` statements, so the output can be safely copy-pasted to restore the schema.
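
To illustrate why alphabetical order is not enough, here is a minimal sketch of a depth-first topological sort over a type dependency map. The names `dependency_map` and `sort_types_for_restore` are hypothetical, and this is not the generic algorithm added under utils/ by this series; it only shows the ordering property the description relies on.

```c++
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each type name maps to the names of the types it references.
using dependency_map = std::map<std::string, std::set<std::string>>;

// Emits a type only after every type it references has been emitted.
// Iterating a std::map keeps the output deterministic.
inline std::vector<std::string> sort_types_for_restore(const dependency_map& deps) {
    std::vector<std::string> ordered;
    std::set<std::string> done;
    std::set<std::string> in_progress;

    auto visit = [&](auto&& self, const std::string& name) -> void {
        if (done.count(name)) {
            return;
        }
        // Re-entering a type would mean a cycle; the graph is assumed acyclic.
        assert(!in_progress.count(name));
        in_progress.insert(name);
        auto it = deps.find(name);
        if (it != deps.end()) {
            for (const auto& dep : it->second) {
                self(self, dep);
            }
        }
        in_progress.erase(name);
        done.insert(name);
        ordered.push_back(name);
    };

    for (const auto& entry : deps) {
        visit(visit, entry.first);
    }
    return ordered;
}
```

For types `address`, `person` (references `address`) and `company` (references both), alphabetical order would list `company` before `person`, whereas this sketch yields `address`, `person`, `company`, which can be replayed as-is.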

Fixes #18539

Closes scylladb/scylladb#18302

* github.com:scylladb/scylladb:
  test/cql-pytest/test_describe: add test for UDTs ordering
  cql3/statements/describe_statement: UDTs topological sorting
  cql3/statements/describe_statement: allow to skip alphabetical sorting
  types: add a method to get all referenced user types
  db/cql_type_parser: use generic topological sorting
  db/cql_type_parser: futurize raw_builder::build()
  test/boost: add test for topological sorting
  utils: introduce generic topological sorting algorithm
2024-05-20 16:58:17 +03:00
Pavel Emelyanov
159e44d08a test.py: Make it possible to avoid wildcard test names matching
There's a nasty scenario in which this search plays a bad joke.

When CI picks up a new branch and notices that a test had changed, it
spawns a custom job with test.py --repeat 100 $changed_test_name in
it. Next, when the test.py tries opt-in test name matching, it uses the
wildcard search and can pick up extra unwanted tests into the run.

To solve this, the case-selection syntax is extended. Now if the caller
specifies `suite/test::*` as test, the test file is selected by exact
name match, but the specific test-case is not selected, the `*` makes it
run all cases.
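
The selection rule described above can be sketched as follows. This is an illustration only: test.py itself is written in Python, the names `test_selector`, `parse_selector` and `selects_file` are hypothetical, and the loose match is modelled here as a prefix match, which is an assumption about the existing wildcard behaviour.

```c++
#include <cassert>
#include <optional>
#include <string>

// Hypothetical parsed form of a test selector; not test.py's data model.
struct test_selector {
    std::string file;                      // e.g. "suite/test"
    std::optional<std::string> case_name;  // set only for "file::case"
    bool exact_file = false;               // true when "::" was present
};

inline test_selector parse_selector(const std::string& arg) {
    assert(!arg.empty());
    test_selector sel;
    const auto pos = arg.find("::");
    if (pos == std::string::npos) {
        sel.file = arg;  // no "::": the file name is matched loosely
        return sel;
    }
    sel.file = arg.substr(0, pos);
    sel.exact_file = true;
    const std::string rest = arg.substr(pos + 2);
    if (rest != "*") {
        sel.case_name = rest;  // "*" leaves the case unset: run all cases
    }
    return sel;
}

// Assumption: the loose match is a prefix match on the file name.
inline bool selects_file(const test_selector& sel, const std::string& file) {
    if (sel.exact_file) {
        return file == sel.file;
    }
    return file.compare(0, sel.file.size(), sel.file) == 0;
}
```

Under these assumptions, `topology/test_tablets` would also pick up `topology/test_tablets_migration`, while `topology/test_tablets::*` selects only the exact file and runs every case in it.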

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18704
2024-05-20 15:50:47 +02:00
Botond Dénes
e1c4e6c151 Merge 'sstables_manager: use maintenance scheduling group to run components reload fiber' from Lakshmi Narayanan Sreethar
PR https://github.com/scylladb/scylladb/pull/18186 introduced a fiber that reloads reclaimed bloom filters when memory becomes available. Use maintenance scheduling group to run that fiber instead of running it in the main scheduling group.

Fixes #18675

Closes scylladb/scylladb#18721

* github.com:scylladb/scylladb:
  sstables_manager: use maintenance scheduling group to run components reload fiber
  sstables_manager: add member to store maintenance scheduling group
2024-05-20 16:38:42 +03:00
Takuya ASADA
33af97ca5a dist/docker: revert dropping systemd package
In 7ce6962141 we dropped openssh-server, which
also dropped the systemd package and caused an error on Scylla Operator
(#17787).

This reverts dropping the systemd package and fixes the issue.

Fix #17787

Closes scylladb/scylladb#18643
2024-05-20 16:38:15 +03:00
Andrei Chekun
bce53efd36 Enrich test results produced by test.py
This PR resolves the issue of test results being counted twice for topology tests. They will no longer appear twice in the consolidated report.
Another fix makes it clearer which test failed by enriching the test case name in the report with the mode and run id, making names unique across the run.

The scope of this change is:
1. Modify the test name to have run id in name
2. Add handlers to get logs of test.py and pytest in one file that are related to test, rather than to the full suite
3. Remove topology tests from aggregating them on a suite level in Junit results
4. Add a link to the logs related to the failed tests in Junit results, so it will be easier to navigate to all logs related to test
5. Gather logs related to the failed test to one directory for better logs investigation

Ref: scylladb/scylladb#17851

Closes scylladb/scylladb#18277
2024-05-20 15:33:57 +02:00
Avi Kivity
52fe351c31 Merge 'Balance tablets within nodes (intra-node migration)' from Tomasz Grabiec
This is needed to avoid severe imbalance between shards which can
happen when some table grows and is split. The inter-node balance can
be equal, so inter-node migration cannot fix the imbalance. Also, if RF=N
then there is not even a possibility of moving tablets around to fix the imbalance.
The only way to bring the system to balance is to move tablets within the nodes.

The system is not prepared for intra-node migration currently. Request coordination
is host-based, while for intra-node migration it should be (also) shard-based.
The solution employed here is to keep the coordination between nodes as-is,
and for intra-node migration storage_proxy-level coordinator is not aware of
the migration (no pending host). The replica-side request handler will be a
second-level coordinator which routes requests to shards, similar to how
the first-level coordinator routes them to hosts.

Tablet sharder is adjusted to handle intra-node migration where a tablet
can have two replicas on the same host. For reads, sharder uses the
read selector to resolve the conflict. For writes, the write selector
is used.

The old shard_of() API is kept to represent shard for reads, and new
method is introduced to query the shards for writing:
shard_for_writes(). All writers should be switched to that API, which
is not done in this patch yet.

The request handler on replica side acts as a second-level
coordinator, using sharder to determine routing to shards. A given
sharder has a scope of a single topology version, a single
effective_replication_map_ptr, which should be kept alive during
writes.

perf-simple-query test results show no signs of regression:

Command: perf-simple-query -c1 -m1G --write --tablets --duration=10

Before:

> 83294.81 tps ( 59.5 allocs/op,  14.3 tasks/op,   53725 insns/op,        0 errors)
> 87756.72 tps ( 59.5 allocs/op,  14.3 tasks/op,   54049 insns/op,        0 errors)
> 86428.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54208 insns/op,        0 errors)
> 86211.38 tps ( 59.7 allocs/op,  14.3 tasks/op,   54219 insns/op,        0 errors)
> 86559.89 tps ( 59.6 allocs/op,  14.3 tasks/op,   54188 insns/op,        0 errors)
> 86609.39 tps ( 59.6 allocs/op,  14.3 tasks/op,   54117 insns/op,        0 errors)
> 87464.06 tps ( 59.5 allocs/op,  14.3 tasks/op,   54039 insns/op,        0 errors)
> 86185.43 tps ( 59.6 allocs/op,  14.3 tasks/op,   54169 insns/op,        0 errors)
> 86254.71 tps ( 59.6 allocs/op,  14.3 tasks/op,   54139 insns/op,        0 errors)
> 83395.35 tps ( 60.2 allocs/op,  14.4 tasks/op,   54693 insns/op,        0 errors)
>
> median 86428.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54208 insns/op,        0 errors)
> median absolute deviation: 243.04
> maximum: 87756.72
> minimum: 83294.81
>

After:

> 85523.06 tps ( 59.5 allocs/op,  14.3 tasks/op,   53872 insns/op,        0 errors)
> 89362.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54226 insns/op,        0 errors)
> 88167.55 tps ( 59.7 allocs/op,  14.3 tasks/op,   54400 insns/op,        0 errors)
> 87044.40 tps ( 59.7 allocs/op,  14.3 tasks/op,   54310 insns/op,        0 errors)
> 88344.50 tps ( 59.6 allocs/op,  14.3 tasks/op,   54289 insns/op,        0 errors)
> 88355.06 tps ( 59.6 allocs/op,  14.3 tasks/op,   54242 insns/op,        0 errors)
> 88725.46 tps ( 59.6 allocs/op,  14.3 tasks/op,   54230 insns/op,        0 errors)
> 88640.08 tps ( 59.6 allocs/op,  14.3 tasks/op,   54210 insns/op,        0 errors)
> 90306.31 tps ( 59.4 allocs/op,  14.3 tasks/op,   54043 insns/op,        0 errors)
> 87343.62 tps ( 59.8 allocs/op,  14.3 tasks/op,   54496 insns/op,        0 errors)
>
> median 88355.06 tps ( 59.6 allocs/op,  14.3 tasks/op,   54242 insns/op,        0 errors)
> median absolute deviation: 1007.41
> maximum: 90306.31
> minimum: 85523.06

Command (reads): perf-simple-query -c1 -m1G  --tablets --duration=10

Before:

> 95860.18 tps ( 63.1 allocs/op,  14.1 tasks/op,   42476 insns/op,        0 errors)
> 97537.69 tps ( 63.1 allocs/op,  14.1 tasks/op,   42454 insns/op,        0 errors)
> 97549.23 tps ( 63.1 allocs/op,  14.1 tasks/op,   42470 insns/op,        0 errors)
> 97511.29 tps ( 63.1 allocs/op,  14.1 tasks/op,   42470 insns/op,        0 errors)
> 97227.32 tps ( 63.1 allocs/op,  14.1 tasks/op,   42471 insns/op,        0 errors)
> 94031.94 tps ( 63.1 allocs/op,  14.1 tasks/op,   42441 insns/op,        0 errors)
> 96978.04 tps ( 63.1 allocs/op,  14.1 tasks/op,   42462 insns/op,        0 errors)
> 96401.70 tps ( 63.1 allocs/op,  14.1 tasks/op,   42473 insns/op,        0 errors)
> 96573.77 tps ( 63.1 allocs/op,  14.1 tasks/op,   42440 insns/op,        0 errors)
> 96340.54 tps ( 63.1 allocs/op,  14.1 tasks/op,   42468 insns/op,        0 errors)
>
> median 96978.04 tps ( 63.1 allocs/op,  14.1 tasks/op,   42462 insns/op,        0 errors)
> median absolute deviation: 571.20
> maximum: 97549.23
> minimum: 94031.94
>

After:

> 99794.67 tps ( 63.1 allocs/op,  14.1 tasks/op,   42471 insns/op,        0 errors)
> 101244.99 tps ( 63.1 allocs/op,  14.1 tasks/op,   42472 insns/op,        0 errors)
> 101128.37 tps ( 63.1 allocs/op,  14.1 tasks/op,   42485 insns/op,        0 errors)
> 101065.27 tps ( 63.1 allocs/op,  14.1 tasks/op,   42465 insns/op,        0 errors)
> 101212.98 tps ( 63.1 allocs/op,  14.1 tasks/op,   42456 insns/op,        0 errors)
> 101413.31 tps ( 63.1 allocs/op,  14.1 tasks/op,   42463 insns/op,        0 errors)
> 101464.92 tps ( 63.1 allocs/op,  14.1 tasks/op,   42466 insns/op,        0 errors)
> 101086.74 tps ( 63.1 allocs/op,  14.1 tasks/op,   42488 insns/op,        0 errors)
> 101559.09 tps ( 63.1 allocs/op,  14.1 tasks/op,   42468 insns/op,        0 errors)
> 100742.58 tps ( 63.1 allocs/op,  14.1 tasks/op,   42491 insns/op,        0 errors)
>
> median 101212.98 tps ( 63.1 allocs/op,  14.1 tasks/op,   42456 insns/op,        0 errors)
> median absolute deviation: 200.33
> maximum: 101559.09
> minimum: 99794.67
>

Fixes #16594

Closes scylladb/scylladb#18026

* github.com:scylladb/scylladb:
  Implement fast streaming for intra-node migration
  test: tablets_test: Test sharding during intra-node migration
  test: tablets_test: Check sharding also on the pending host
  test: py: tablets: Test writes concurrent with migration
  test: py: tablets: Test crash during intra-node migration
  api, storage_service: Introduce API to wait for topology to quiesce
  dht, replica: Remove deprecated sharder APIs
  test: Avoid using deprecated sharded API
  db: do_apply_many() avoid deprecated sharded API
  replica: mutation_dump: Avoid deprecated sharder API
  repair: Avoid deprecated sharder API
  table: Remove optimization which returns empty reader when key is not owned by the shard
  dht: is_single_shard: Avoid deprecated sharder API
  dht: split_range_to_single_shard: Work with static_sharder only
  dht: ring_position_range_sharder: Avoid deprecated sharder APIs
  dht: token: Avoid use of deprecated sharder API by switching to static_sharder
  selective_token_sharder: Avoid use of deprecated sharder API
  docs: Document tablet sharding vs tablet replica placement
  readers/multishard.cc: use shard_for_reads() instead of shard_of()
  multishard_mutation_query.cc: use shard_for_reads() instead of shard_of()
  storage_proxy: Extract common code to apply mutations on many shards according to sharder
  storage_proxy: Prepare per-partition rate-limiting for intra-node migration
  storage_proxy: Avoid shard_of() use in mutate_counter_on_leader_and_replicate()
  storage_proxy: Prepare mutate_hint() for intra-node tablet migration
  commitlog_replayer: Avoid deprecated sharder::shard_of()
  lwt: Avoid deprecated sharder::shard_of()
  compaction: Avoid deprecated sharder::shard_of()
  dht: Extract dht::static_sharder
  replica: Deprecate table::shard_of()
  locator: Deprecate effective_replication_map::shard_of()
  dht: Deprecate old sharder API: shard_of/next_shard/token_for_next_shard
  tests: tablets: py: Add intra-node migration test
  tests: tablets: Test that drained nodes are not balanced internally
  tests: tablets: Add checks of replica set validity to test_load_balancing_with_random_load
  tests: tablets: Verify that disabling balancing results in no intra-node migrations
  tests: tablets: Check that nodes are internally balanced
  tests: tablets: Improve debuggability by showing which rows are missing
  tablets, storage_service: Support intra-node migration in move_tablet() API
  tablet_allocator: Generate intra-node migration plan
  tablet_allocator: Extract make_internode_plan()
  tablet_allocator: Maintain candidate list and shard tablet count for target nodes
  tablet_allocator: Lift apply_load/can_accept_load lambdas to member functions
  tablets, streaming: Implement tablet streaming for intra-node migration
  dht, auto_refreshing_sharder: Allow overriding write selector
  multishard_writer: Handle intra-node migration
  storage_proxy: Handle intra-node tablet migration for writes
  tablets: Get rid of tablet_map::get_shard()
  tablets: Avoid tablet_map::get_shard in cleanup
  tablets: test: Use sharder instead of tablet_map::get_shard()
  tablets: tablet_sharder: Allow working with non-local host
  sharding: Prepare for intra-node-migration
  docs: Document sharder use for tablets
  tablets: Introduce tablet transition kind for intra-node migration
  tests: tablets: Fix use-after-move of skiplist in rebalance_tablets()
  sstables, gdb: Track readers in a linked list
  raft topology: Fix global token metadata barrier to not fence ahead of what is drained
2024-05-20 16:13:01 +03:00
Kefu Chai
a517fcf970 service/storage_proxy: capture tr_state by copy in handle_paxos_accept()
this change is inspired by the following warning from clang-tidy

```
Warning: /home/runner/work/scylladb/scylladb/service/storage_proxy.cc:884:13: warning: 'tr_state' used after it was moved [bugprone-use-after-move]
  884 |         if (tr_state) {
      |             ^
/home/runner/work/scylladb/scylladb/service/storage_proxy.cc:872:139: note: move occurred here
  872 |         auto f = get_schema_for_read(proposal.update.schema_version(), src_addr, *timeout).then([&sp = _sp, &sys_ks = _sys_ks, tr_state = std::move(tr_state),
      |                                                                                                                                           ^
```

this is not a false positive. `tr_state` is moved into a variable in
the capture list of a lambda, which is in turn passed to the expression
that evaluates to `f`.

even though the lambda body has not run yet when we reference
`tr_state` to check whether it is empty, `tr_state` has already been
moved into the captured variable. so the statement
`f = f.finally(...)` is never evaluated, because `tr_state` is always
empty by then.

so before this change, the trace message is never recorded.

in this change, we address this issue by capturing `tr_state` by
copy. as `tr_state` is backed by a `lw_shared_ptr`, the overhead is
negligible.

after this change, the tracing message is recorded.

the change that introduced this issue was 548767f91e.

please note, we could coroutinize this function to improve its
readability, but since this is a fix and should be backported,
let's start with a minimal fix, and worry about the readability
in a follow-up change.
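
a minimal sketch of the hazard, assuming `std::shared_ptr` as a
stand-in for the `lw_shared_ptr`-backed trace state. the function
names are hypothetical and this is not the code in storage_proxy.cc:

```c++
#include <cassert>
#include <memory>
#include <utility>

// std::shared_ptr stands in for the lw_shared_ptr-backed trace state.
using trace_state_ptr = std::shared_ptr<int>;

// Buggy shape: the init-capture moves from tr_state as soon as the
// lambda object is created, so the check that follows sees an empty
// pointer even though the lambda body has not run.
bool checked_after_move(trace_state_ptr tr_state) {
    auto f = [tr_state = std::move(tr_state)] { return bool(tr_state); };
    (void)f;
    return bool(tr_state);  // a moved-from std::shared_ptr is empty
}

// Fixed shape: capture a copy, leaving the original usable.
bool checked_after_copy(trace_state_ptr tr_state) {
    auto f = [tr_state] { return bool(tr_state); };
    (void)f;
    return bool(tr_state);
}
```

copying a shared pointer only bumps a reference count, which matches
the reasoning above that the extra cost is small.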

Refs 548767f91e
Fixes #18725
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18702
2024-05-20 12:58:49 +03:00
Kefu Chai
40ce52c3cc test: use generic boost_test_print_type()
in this change, we trade the `boost_test_print_type()` overloads
for the generic template of `boost_test_print_type()`, except for
those in the very small tests, which presumably want to keep
themselves relatively self-contained.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18727
2024-05-20 12:56:20 +03:00
Botond Dénes
0e23cd45ad Merge 'feature: grandfather some old cluster features' from Avi Kivity
This series grandfathers the following features:

  MD_SSTABLE_FORMAT
  ME_SSTABLE feature
  VIEW_VIRTUAL_COLUMNS
  DIGEST_INSENSITIVE_TO_EXPIRY
  CDC
  NONFROZEN_UDTS
  PER_TABLE_PARTITIONERS
  PER_TABLE_CACHING
  DIGEST_FOR_NULL_VALUES
  CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX

Note that for the last (CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX) some code remains to support indexes created before the new feature was adopted.

Each patch names the version where the feature was introduced.

Closes scylladb/scylladb#18428

* github.com:scylladb/scylladb:
  feature, index: grandfather CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX
  feature: grandfather DIGEST_FOR_NULL_VALUES
  storage_proxy: drop use of MD5 as a digest algorithm
  feature: grandfather PER_TABLE_CACHING
  feature: grandfather LWT
  feature: grandfather HINTED_HANDOFF_SEPARATE_CONNECTION
  feature: grandfather PER_TABLE_PARTITIONERS
  test: schema_change_test: regenerate digest for PER_TABLE_PARTITIONERS
  test: test_schema_change_digest: drop unneeded reference digests
  feature: grandfather NONFROZEN_UDTS
  feature: grandfather CDC
  feature: grandfather DIGEST_INSENSITIVE_TO_EXPIRY
  feature: grandfather VIEW_VIRTUAL_COLUMNS
  feature: grandfather ME_SSTABLE feature
  feature: grandfather MD_SSTABLE_FORMAT
2024-05-20 11:48:07 +03:00
Botond Dénes
936a7e282b docs: isolation.md: remove mention of IO priority groups
They were folded into CPU scheduling groups, which now apply to both CPU
and IO.
2024-05-20 03:33:24 -04:00
Botond Dénes
8f61468322 docs: isolation.md: update scheduling group list, add aliases 2024-05-20 03:30:04 -04:00
Lakshmi Narayanan Sreethar
6f58768c46 sstables_manager: use maintenance scheduling group to run components reload fiber
Fixes #18675

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-19 15:23:45 +05:30
Lakshmi Narayanan Sreethar
79f6746298 sstables_manager: add member to store maintenance scheduling group
Store the maintenance scheduling group inside the sstables_manager. The
next patch will use this to run the components reloader fiber.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-19 15:23:45 +05:30
Avi Kivity
54a82fed6b feature, index: grandfather CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX
This feature corrected how we store the token in secondary indexes. It
was introduced in 7ff72b0ba5 (2020; 4.4) and can now be assumed present
everywhere. Note that we still support indexes created with the old format.
2024-05-18 00:24:11 +03:00
Avi Kivity
2fbd78c769 feature: grandfather DIGEST_FOR_NULL_VALUES
The DIGEST_FOR_NULL_VALUES feature was added in 21a77612b3 (2020; 4.4)
and can now be assumed to be always present. The hasher which it invoked
is removed.
2024-05-18 00:24:00 +03:00
Avi Kivity
879583c489 storage_proxy: drop use of MD5 as a digest algorithm
The XXHASH feature was introduced in 0bab3e59c2 (2017; 2.2) and made
mandatory in defe6f49df (2020; 4.4), but some vestiges remain.
Remove them now. Note that md5_hasher itself is still in use by
other components, so it cannot be removed.
2024-05-18 00:23:47 +03:00
Avi Kivity
7c264e8a71 feature: grandfather PER_TABLE_CACHING
The PER_TABLE_CACHING feature was added in 0475dab359 (2020; 4.2)
and can now be assumed to be always present.
2024-05-18 00:23:30 +03:00
Avi Kivity
d52c424a5f feature: grandfather LWT
LWT was made non-experimental in 9948f548a5 (2020; 4.1) and can now be
assumed to be always present.
2024-05-18 00:20:53 +03:00
Avi Kivity
93088d0921 feature: grandfather HINTED_HANDOFF_SEPARATE_CONNECTION
The HINTED_HANDOFF_SEPARATE_CONNECTION feature was introduced in 3a46b1bb2b (2019; 3.3)
and can be assumed always present.
2024-05-18 00:18:27 +03:00
Avi Kivity
3bead8cea0 feature: grandfather PER_TABLE_PARTITIONERS
The PER_TABLE_PARTITIONERS feature was added in 90df9a44ce (2020; 4.0)
and can now be assumed to be always present. We also remove the associated
schema_feature.
2024-05-18 00:15:07 +03:00
Avi Kivity
6b532fd40b test: schema_change_test: regenerate digest for PER_TABLE_PARTITIONERS
The first digest tested was generated without the PER_TABLE_PARTITIONERS
schema feature. We're about to make that feature mandatory, so we won't
be able (and won't need) to generate a digest without it.

Update the digest to include the feature. Note it wasn't untested before,
we have a test with schema_features::full().
2024-05-18 00:14:43 +03:00
Avi Kivity
c4d8b17f4c test: test_schema_change_digest: drop unneeded reference digests
digests[0] was used by the VIEW_VIRTUAL_COLUMNS feature, which
no longer exists.

digests[1] is the same as digests[2], so drop it.
2024-05-17 20:41:20 +03:00
Avi Kivity
93113da01b feature: grandfather NONFROZEN_UDTS
The NONFROZEN_UDTS feature was added in e74b5deb5d (2019; 3.2)
and can now be assumed to be always present.
2024-05-17 20:41:20 +03:00
Avi Kivity
c7d7ca2c23 feature: grandfather CDC
The CDC feature was made non-experimental in e9072542c1 (2020; 4.4)
and can now be assumed to be always present. We also remove the corresponding
schema_feature.
2024-05-17 20:41:20 +03:00
Avi Kivity
82ad2913ca feature: grandfather DIGEST_INSENSITIVE_TO_EXPIRY
The DIGEST_INSENSITIVE_TO_EXPIRY feature was added in 9de071d214 (2019; 3.2)
and can now be assumed to be always present. We enable the corresponding
schema_feature unconditionally.

We do not remove the corresponding schema feature, because it can be disabled
when the related TABLE_DIGEST_INSENSITIVE_TO_EXPIRY is present.
2024-05-17 20:41:19 +03:00
Avi Kivity
b5f6021a6b feature: grandfather VIEW_VIRTUAL_COLUMNS
The VIEW_VIRTUAL_COLUMNS feature was added in a108df09f9 (2019; 3.1)
and can now be assumed to be always present.

The corresponding schema_feature is removed. Note schema_features are not sent
over the wire. A digest calculation without VIEW_VIRTUAL_COLUMNS is no longer tested.
2024-05-17 20:41:19 +03:00
Avi Kivity
7952200c8c feature: grandfather ME_SSTABLE feature
"me" format sstables were introduced in d370558279 (Jan 2022; 5.1)
and so can be assumed always present. The listener that checks when
the cluster understands ME_SSTABLE was removed and in its place
we default to sstable_version_types::me (and call on_enabled()
immediately).
2024-05-17 20:41:19 +03:00
Avi Kivity
6d0c0b542c feature: grandfather MD_SSTABLE_FORMAT
"md" sstable support was introduced in e8d7744040 (2020; 4.4)
and so can be assumed to be present on all versions we upgrade from.
Nothing appears to depend on it.
2024-05-17 20:41:19 +03:00
Anna Stuchlik
c93a7d2664 doc: replace 5.5 with 6.0 in SStable docs (me)
This commit replaces the version number 5.5 with 6.0,
because 5.5 has never been released.

This is a follow-up to https://github.com/scylladb/scylladb/pull/16716.

Refs https://github.com/scylladb/scylladb/issues/16551
Refs https://github.com/scylladb/scylladb/issues/18580

Closes scylladb/scylladb#18730
2024-05-17 16:34:18 +03:00
Botond Dénes
db70e8dd5f test/cql-pytest: test_tombstone_limit.py: enable xfailing tests
These tests were marked as xfail because they used to fail with tablets.
They don't anymore, so remove the xfail.

Fixes: #16486

Closes scylladb/scylladb#18671
2024-05-16 20:14:47 +03:00
Nadav Har'El
c7aa47354a Merge 'mutation_fragment_stream_validating_filter: respect validating_level::none' from Botond Dénes
Even when configured to not do any validation at all, the validator still did some. This small series fixes this, and adds a test to check that validation levels in general are respected, and the validator doesn't validate more than it is asked to.

Fixes: #18662

Closes scylladb/scylladb#18667

* github.com:scylladb/scylladb:
  test/boost/mutation_fragment_test.cc: add test for validator validation levels
  mutation: mutation_fragment_stream_validating_filter: fix validation_level::none
  mutation: mutation_fragment_stream_validating_filter: add raises_error ctor parameter
2024-05-16 19:57:49 +03:00
Kamil Braun
734c5de314 Merge 'fix test teardown race with ongoing test operation' from Artsiom Mishuta
This commit brings several new features in scylla_cluster.py to fix runaway asyncio task problems in topology tests

- Start-Stop Lock and Stop Event in ScyllaServer
- Tasks History, Wait for tasks from Tasks History and Manager broken state in ScyllaClusterManager
- make ManagerClient object function scope
- test_finished_event in ManagerClient

Fixes: scylladb/scylladb#16472
Fixes: scylladb/scylladb#16651

Closes scylladb/scylladb#18236

* github.com:scylladb/scylladb:
  test/pylib: Introduce ManagerClient.test_finished_event
  test/topology: make ManagerClient object function scope
  test/pylib: Introduce Manager broken state:
  test/pylib: Wait for tasks from Tasks History:
  test/pylib: Introduce Tasks History:
  test/pylib: Introduce Stop Event
  test/pylib: Introduce Start-Stop Lock:
2024-05-16 17:42:00 +02:00
Kefu Chai
759156b56d test: perf: alternator: mark format string as constexpr
before this change, we use `update_item_suffix` as a format string
fed to `format(...)`, which is resolved to `seastar::format()`.
but with a patch which migrates the `seastar::format()` to the backend
with compile-time format check, the caller sites using `format()` would
fail to build, because `update_item_suffix` is not a `constexpr`:
```
/home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT test/perf/CMakeFiles/test-perf.dir/RelWithDebInfo/perf_alternator.cc.o -MF test/perf/CMakeFiles/test-perf.dir/RelWithDebInfo/perf_alternator.cc.o.d -o test/perf/CMakeFiles/test-perf.dir/RelWithDebInfo/perf_alternator.cc.o -c /home/kefu/dev/scylladb/test/perf/perf_alternator.cc
/home/kefu/dev/scylladb/test/perf/perf_alternator.cc:249:69: error: call to consteval function 'fmt::basic_format_string<char, const char (&)[1]>::basic_format_string<const char *, 0>' is not a constant expression
  249 |     return make_request(cli, "UpdateItem", prefix + seastar::format(update_item_suffix, ""));
      |                                                                     ^
/usr/include/fmt/core.h:2776:67: note: read of non-constexpr variable 'update_item_suffix' is not allowed in a constant expression
 2776 |   FMT_CONSTEVAL FMT_INLINE basic_format_string(const S& s) : str_(s) {
      |                                                                   ^
/home/kefu/dev/scylladb/test/perf/perf_alternator.cc:249:69: note: in call to 'basic_format_string<const char *, 0>(update_item_suffix)'
  249 |     return make_request(cli, "UpdateItem", prefix + seastar::format(update_item_suffix, ""));
      |                                                                     ^~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/test/perf/perf_alternator.cc:198:6: note: declared here
  198 | auto update_item_suffix = R"(
      |      ^
```

so, to prepare the change switching to compile-time format checking,
let's mark this variable `static constexpr`. this is also more correct,
as this variable is

* a compile time constant, and
* is not shared across different compilation units.
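
a small illustration of why `constexpr` matters here. the
`count_placeholders()` helper and the `suffix` variable are
hypothetical stand-ins for a compile-time format check; they are not
part of fmt or seastar:

```c++
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a compile-time format check: counts "{}"
// placeholders in a NUL-terminated string.
constexpr std::size_t count_placeholders(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (s[0] == '{' && s[1] == '}') {
            ++n;
        }
    }
    return n;
}

// `static constexpr` makes the string usable in constant expressions.
// A plain `auto suffix = "...";` would be an ordinary pointer variable,
// and reading it in the static_assert below would not compile.
static constexpr const char* suffix = "item-{}";
static_assert(count_placeholders(suffix) == 1, "expects exactly one argument");
```

the same rule is what the compiler enforces in the error above: a
format string checked at compile time has to be readable in a constant
expression.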

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18685
2024-05-16 15:18:42 +03:00
Avi Kivity
6982de6dde Merge 'Fix stalls in forward_service::dispatch() with large tablet count' from Raphael "Raph" Carvalho
With a large tablet count, e.g. 128k, forward_service::dispatch() can potentially stall when grouping ranges per endpoint.

`    Reactor stalled for 4 ms on shard 1. Backtrace: 0x5eb15ea 0x5eb09f5 0x5eb1daf 0x3dbaf 0x2d01e57 0x33f7d1e 0x348255f 0x2d005d4 0x2d3d017 0x2d3d58c 0x2d3d225 0x5e59622 0x5ec328f 0x5ec4577 0x5ee84e0 0x5e8394a 0x8c946 0x11296f
`

Also there are inefficient copies that are being removed. partition_range_vector for a single endpoint can grow beyond 1M.

Closes scylladb/scylladb#18695

* github.com:scylladb/scylladb:
  service: fix indentation in dispatch()
  service: fix reactor stall with large tablet count
  service: avoid potential expensive copies in forward_service::dispatch()
  service: coroutinize forward_service::dispatch()
2024-05-16 15:17:43 +03:00
Kefu Chai
617e532859 db: config: drop operator<<() for error_injection_at_startup
it is not used anymore, so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18701
2024-05-16 15:10:57 +03:00
Pavel Emelyanov
dffd985401 data_dictionary: Resurrect formatter for keyspace_metadata
It was commented out by a439ebcfce (treewide: include fmt/ranges.h
and/or fmt/std.h), probably by mistake.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18665
2024-05-16 15:09:45 +03:00
Pavel Emelyanov
31d05925cc api,database: Move auto-compaction toggle guard
Toggling the per-table auto-compaction bit is guarded by a boolean on
the database and a RAII guard. The guard is only used by a single file,
api/column_family.cc, so it can live there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:51 +03:00
Pavel Emelyanov
a43b178f72 api: Move some table manipulation helpers from storage_service
Continuation of the previous patch -- helpers toggling tombstone_gc and
auto_compaction on tables should live in the same file that uses them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:50 +03:00
Pavel Emelyanov
862fcd7bc7 api: Move table-related calls from storage_service domain
The storage_service/(enable|disable)_(tombstone_gc|auto_compaction)
endpoints are not handled by storage_service _service_ and should rather
live in the column_family/ domain which is handled by replica::database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:50 +03:00
Pavel Emelyanov
ba53283d21 api: Reimplement some endpoints using existing helpers
The (enable|disable)_(tombstone_gc|auto_compaction) endpoints living in
column_family domain can benefit from the helpers that do the same in
the storage_service domain. The "difference" is that c.f. endpoints do
it per-table, while s.s. ones operate on a vector of tables, so the
former is a corner case of the latter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:50 +03:00
Pavel Emelyanov
231ffa623c api: Lost unset of tombstone-gc endpoints
On stop, all endpoints must be unregistered; these three were missed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:50 +03:00
Michał Jadwiszczak
b3e6a39604 test/cql-pytest/test_describe: add test for UDTs ordering 2024-05-16 13:30:03 +02:00
Michał Jadwiszczak
f29820fb27 cql3/statements/describe_statement: UDTs topological sorting
User-defined types can depend on each other, forming a directed acyclic
graph.
In order to support restoring a schema from `DESC SCHEMA`, UDTs should be
ordered topologically, not alphabetically as they have been until now.
Michał Jadwiszczak
7be938192b cql3/statements/describe_statement: allow to skip alphabetical sorting
In the next commit, we are going to introduce topological sorting of
user-defined types, so alphabetical sorting must be skipped so that it
does not interfere.
2024-05-16 13:30:03 +02:00
Michał Jadwiszczak
8157d260f2 types: add a method to get all referenced user types
The method collects all UDTs used to create a type.
This is required to sort UDTs in a topological order.
2024-05-16 13:30:03 +02:00
Michał Jadwiszczak
573e13e3f1 db/cql_type_parser: use generic topological sorting 2024-05-16 13:30:03 +02:00
Michał Jadwiszczak
3830f3bd23 db/cql_type_parser: futurize raw_builder::build()
In order to use the generic topological sort,
the build() method needs to return a future.
2024-05-16 13:30:03 +02:00
Michał Jadwiszczak
7f04c88395 test/boost: add test for topological sorting 2024-05-16 13:30:03 +02:00
Michał Jadwiszczak
aa08e586fd utils: introduce generic topological sorting algorithm
Until now, we have implemented topological sorting in
db/cql_type_parser.cc but it is specific to its usage.

Now we want to use topological sorting in another place,
so a generic sorting algorithm provides one implementation
to be reused in several places.
2024-05-16 13:30:03 +02:00
Nadav Har'El
27ab560abd cql: fix hang during certain SELECT statements
The function intersection(r1,r2) in statement_restrictions.cc is used
when several WHERE restrictions were applied to the same column.
For example, for "WHERE b<1 AND b<2" the intersection of the two ranges
is calculated to be b<1.

As noted in issue #18690, Scylla is inconsistent in where it allows or
doesn't allow these intersecting restrictions. But where they are
allowed they must be implemented correctly. And it turns out the
function intersection() had a bug that caused it to sometimes enter
an infinite loop - when the intent was only to call itself once with
swapped parameters.
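
The hazard can be illustrated with a simplified model. The
`restriction` struct below is hypothetical and far simpler than the
real code in statement_restrictions.cc. The point is only that the
swapped call must land in a branch that does not swap again:

```c++
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical restriction: either an explicit value list or "b < upper".
struct restriction {
    bool is_list;
    std::vector<int> values;  // used when is_list is true
    int upper;                // used when is_list is false
};

restriction intersection(const restriction& a, const restriction& b) {
    if (!a.is_list && !b.is_list) {
        // b<1 AND b<2 -> b<1
        return {false, {}, std::min(a.upper, b.upper)};
    }
    if (a.is_list && !b.is_list) {
        restriction r{true, {}, 0};
        for (int v : a.values) {
            if (v < b.upper) {
                r.values.push_back(v);
            }
        }
        return r;
    }
    if (a.is_list && b.is_list) {
        restriction r{true, {}, 0};
        for (int v : a.values) {
            if (std::find(b.values.begin(), b.values.end(), v) != b.values.end()) {
                r.values.push_back(v);
            }
        }
        return r;
    }
    // Remaining case: a is a range and b is a list. Swap exactly once;
    // the swapped call hits the (list, range) branch, so it terminates.
    return intersection(b, a);
}
```

If the swap were reachable from a case that is still unhandled after
swapping, the two calls would keep handing the arguments back to each
other, which is the kind of endless recursion described above.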

This patch includes a test reproducing this bug, and a fix for the
bug. The test hangs before the fix, and passes after the fix.

While at it, I carefully reviewed the entire code used to implement
the intersection() function to try to make sure that the bug we found
was the only one. I also added a few more comments where I thought they
were needed to understand complicated logic of the code.

The bug, the fix and the test were originally discovered by
Michał Chojnowski.

Fixes #18688
Refs #18690

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18694
2024-05-16 11:25:44 +03:00
Piotr Dulikowski
68eca3778c Merge 'mv: throttle view update generation for large queries' from Wojciech Mitros
This series is a reupload of #13792 with a few modifications, namely a test is added and the conflicts with recent tablet related changes are fixed.

See https://github.com/scylladb/scylladb/issues/12379 and https://github.com/scylladb/scylladb/pull/13583 for a detailed description of the problem and discussions.

This PR aims to extend the existing throttling mechanism to work with requests that internally generate a large amount of view updates, as suggested by @nyh.

The existing mechanism works in the following way:

* Client sends a request, we generate the view updates corresponding to the request and spawn background tasks which will send these updates to remote nodes
* Each background task consumes some units from the `view_update_concurrency_semaphore`, but doesn't wait for these units, it's just for tracking
* We keep track of the percent of consumed units on each node, this is called `view update backlog`.
* Before sending a response to the client we sleep for a short amount of time. The amount of time to sleep for is based on the fullness of this `view update backlog`. For a well behaved client with limited concurrency this will limit the amount of incoming requests to a manageable level.

This mechanism doesn't handle large DELETE queries. Deleting a partition is fast for the base table, but it requires us to generate a view update for every single deleted row. The number of deleted rows per single client request can be in the millions. Delaying response to the request doesn't help when a single request can generate millions of updates.

To deal with this, we could treat the view update generator just like any other client and force it to wait a short time before sending the next batch of updates. The wait time is calculated just like in the existing throttling code: it is based on the fullness of the `view update backlogs`.

The new algorithm of view update generation looks something like this:
```c++
for(;;) {
    auto updates = generate_updates_batch_with_max_100_rows();
    co_await seastar::sleep(calculate_sleep_time_from_backlogs());
    spawn_background_tasks_for_updates(updates);
}
```
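
One possible shape for `calculate_sleep_time_from_backlogs()` is
sketched here. The `view_update_backlog` struct, the `max_delay`
parameter, and the linear mapping are all assumptions made for
illustration; the real calculation may use a different curve:

```c++
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical backlog: units consumed out of a fixed budget.
struct view_update_backlog {
    std::size_t current;
    std::size_t max;
};

// Assumed linear mapping from backlog fullness to delay, capped at max_delay.
std::chrono::microseconds calculate_sleep_time(view_update_backlog b,
                                               std::chrono::microseconds max_delay) {
    if (b.max == 0) {
        return max_delay;  // no budget at all: back off fully
    }
    double fullness = std::min(1.0, double(b.current) / double(b.max));
    return std::chrono::microseconds(
        static_cast<std::chrono::microseconds::rep>(fullness * max_delay.count()));
}
```

An empty backlog yields no delay, so an idle system is not slowed
down, while a full backlog makes the generator wait the maximum time
between batches.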
Fixes: https://github.com/scylladb/scylladb/issues/12379

Closes scylladb/scylladb#16819

* github.com:scylladb/scylladb:
  test: add test for bad_allocs during large mv queries
  mv: throttle view update generation for large queries
  exceptions: add read_write_timeout_exception, a subclass of request_timeout_exception
  db/view: extract view throttling delay calculation to a global function
  view_update_generator: add get_storage_proxy()
  storage_proxy: make view backlog getters public
2024-05-16 08:22:54 +02:00
Botond Dénes
af9e173c99 Merge 'repair: Don't get topology via database' from Pavel Emelyanov
The database carries token metadata, and other services use it to obtain the topology. Repair code has simpler and cleaner ways to access the topology.

Closes scylladb/scylladb#18677

* github.com:scylladb/scylladb:
  repair: Get topology via replication map
  repair: Use repair_service::my_address() in handlers
  repair: Remove repair_meta::_myip
  repair: Use repair_meta::myip() everywhere
  repair: Add repair_service::my_address() method
2024-05-16 08:28:14 +03:00
Raphael S. Carvalho
715ae689c0 Implement fast streaming for intra-node migration
With intra-node migration, all the movement is local, so we can make
streaming faster by just cloning the sstable set of leaving replica
and loading it into the pending one.

This cloning is underlying storage specific, but s3 doesn't support
snapshot() yet (the sstables::storage procedure which clone is built
upon). It's only supported by file system, with help of hard links.
A new generation is picked for new cloned sstable, and it will
live in the same directory as the original.

A challenge I bumped into was to understand why table refused to
load the sstable at pending replica, as it considered them foreign.
Later I realized that sharder (for reads) at this stage of migration
will point only to leaving replica. It didn't fail with mutation
based streaming, because the sstable writer considers the shard --
that the sstable was written into -- as its owner, regardless of what
sharder says. That was fixed by mimicking this behavior during
loading at pending.

test:
./test.py --mode=dev intranode --repeat=100 passes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
a179f37780 test: tablets_test: Test sharding during intra-node migration 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
5f32d2ddb6 test: tablets_test: Check sharding also on the pending host 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
6d809c75fb test: py: tablets: Test writes concurrent with migration 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
ad02d85c16 test: py: tablets: Test crash during intra-node migration 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
7956a2991e api, storage_service: Introduce API to wait for topology to quiesce 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
679baff25a dht, replica: Remove deprecated sharder APIs 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
32a191384a test: Avoid using deprecated sharder API
There is no tablet migration in unit tests, so shard_of() can be
safely replaced with shard_for_reads(). Even if it's used for writes.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
539460dd71 db: do_apply_many() avoid deprecated sharder API 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
0f50504c39 replica: mutation_dump: Avoid deprecated sharder API 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
7bf5733fa5 repair: Avoid deprecated sharder API 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
7c03646f99 table: Remove optimization which returns empty reader when key is not owned by the shard
This check would lead to correctness issues with intra-node migration
because the shard may switch during read, from "read old" to "read
new". If the coordinator used "read old" for shard routing, but table
on the old shard is already using "read new" erm, such a read would
observe empty result, which is wrong.

Drop the optimization. In the scenario above, read will observe all
past writes because:

  1) writes are still using "write both"

  2) writes are switched to "write new" only after all requests which
  might be using "read old" are done

Replica-side coordinators should already route single-key requests to
the correct shard, so it's not important as an optimization.

This issue shows how assumptions about static sharding are embedded in
the current code base and how intra-node migration, by violating those
assumptions, can lead to correctness issues.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
26f2e6aa8e dht: is_single_shard: Avoid deprecated sharder API
All current uses are in the read path.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
c9e6b4dca7 dht: split_range_to_single_shard: Work with static_sharder only
In preparation for intra-node tablet migration, to avoid
using deprecated sharder APIs.

This function is used for generating sstable sharding metadata.
For tablets, it is not invoked, so we can safely work with the
static sharder. The call site already passes static_sharder only.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
c380aecf64 dht: ring_position_range_sharder: Avoid deprecated sharder APIs
In preparation for tablet intra-node migration.

Existing uses are for reads, so it's safe to use shard_for_reads():
  - in multishard reader
  - in forward_service

The ring_position_range_vector_sharder is used when computing sstable
shards, which for intra-node migration should use the view for
reads. If we haven't completed streaming, sstables should be attached
to the old shard (used by reads). When in write-both-read-new stage,
streaming is complete, reads are using the new shard, and we should
attach sstables to the new shard.

When not in intra-node migration, the view for reads on the pending
node will return the pending shard even if read selector is "read old".
So if pending node restarts during streaming, we will attach sstables to
to the shard which is used by writes even though we're using the selector
for reads.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
a1aac409bf dht: token: Avoid use of deprecated sharder API by switching to static_sharder
The touched APIs are used only with static_sharder.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
dd4a086b87 selective_token_sharder: Avoid use of deprecated sharder API
I analyzed all the uses and all except the alternator/ttl.cc seem to
be interested in the result for the purpose of reading.

Alternator is not supported with tablets yet, so the use was annotated
with a relevant issue.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
eb3a22d5a8 docs: Document tablet sharding vs tablet replica placement 2024-05-16 00:28:47 +02:00
Botond Dénes
635aba435b readers/multishard.cc: use shard_for_reads() instead of shard_of()
The latter is deprecated.
2024-05-16 00:28:47 +02:00
Botond Dénes
bc779ed00c multishard_mutation_query.cc: use shard_for_reads() instead of shard_of()
The latter is deprecated.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
3b7d7088d1 storage_proxy: Extract common code to apply mutations on many shards according to sharder 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
660b3d1765 storage_proxy: Prepare per-partition rate-limiting for intra-node migration
Note: there is a potential problem with rate-limit count going out of sync
during intra-node migration between old and the new shard.

Before this patch, when coordinator accounted and admitted the
request, so the rate_limit_info passed to apply_locally() is
account_only, it was converted to std::monostate for requests to the
local replica. This makes sense because the request was already
accounted by the coordinator.

However, during intra-node migration when we do double writes to two
shards locally, that means that the new shard will not account the
write, it will have lower count than the limiter on the old
shard. This means that the new shard may accept writes which will end
up being rejected. This is not desirable, but not the end of the world
since it's temporary, and the new shard will still protect itself from
overload based on its own rate limiter.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
7c3291b5ea storage_proxy: Avoid shard_of() use in mutate_counter_on_leader_and_replicate()
Counters are not supported with tablets, so we should not reach this path.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
db2809317d storage_proxy: Prepare mutate_hint() for intra-node tablet migration 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
feafe0f6a7 commitlog_replayer: Avoid deprecated sharder::shard_of()
shard_for_writes() is appropriate, because we're writing.  It can
happen that the tablet was migrated away and no shard is the owner. In
that case the mutation is dropped, as it should be, because "shards"
is empty.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
c9294b1642 lwt: Avoid deprecated sharder::shard_of()
Instead, use shard_for_reads(). The justification is that:

 1) In cas_shard(), we need to pick a single request coordinator.
    shard_for_reads() gives that, which is equivalent to shard_of()
    if there is no intra-node migration.

 2) In paxos handler for prepare(), the shard we execute it on is
    the shard from which we read, so shard_for_reads() is the one.

 3) Updates of paxos state are separate CQL requests, and use their
    own sharding.

 4) Handler for learn is executing updates using calls to
    storage_proxy::mutate_locally() which will use the right sharder for writes

However, the code is still not prepared for intra-node migration, and
possibly regular migration too in case of abandoned requests, because
the locking of paxos state assumes that the shard is static. That
would have to be fixed separately, e.g. by locking both shards
(shard_for_writes()) during migration, so that the set of locked
shards always intersects during migration and local serialization of
paxos state updates is achieved. I left FIXMEs for that.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
1631bab658 compaction: Avoid deprecated sharder::shard_of() 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
9da3bd84c7 dht: Extract dht::static_sharder
Before the patch, dht::sharder could be instantiated and it would
behave like a static sharder. This is not safe with regards to
extensions of the API because if a derived implementation forgets to
override some method, it would incorrectly default to the
implementation from static sharder. Better to fail the compilation in
this case, so extract static sharder logic to dht::static_sharder
class and make all methods in dht::sharder pure virtual.

This also allows us to have algorithms indicate that they only work
with static sharder by accepting the type, and have compile-time
safety for this requirement.

schema::get_sharder() is changed to return the static_sharder&.
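
The shape of this split can be illustrated with a minimal sketch. The type aliases, the single-method interface, the modulo mapping and the `owned_by()` helper below are assumptions made for illustration; they are not the real dht declarations.

```cpp
#include <cstdint>
#include <type_traits>

using token = std::int64_t;   // stand-in for dht::token (assumption)
using shard_id = unsigned;

// Every method is pure virtual, so a derived sharder that forgets to
// override one cannot be instantiated and the build fails.
class sharder {
public:
    virtual ~sharder() = default;
    virtual shard_id shard_for_reads(token t) const = 0;
};

// The static logic lives in its own class instead of the base class.
class static_sharder : public sharder {
    unsigned _shard_count;
public:
    explicit static_sharder(unsigned shard_count) : _shard_count(shard_count) {}
    shard_id shard_for_reads(token t) const override {
        // Placeholder mapping; the real algorithm is different.
        return static_cast<shard_id>(static_cast<std::uint64_t>(t) % _shard_count);
    }
};

static_assert(std::is_abstract_v<sharder>);
static_assert(!std::is_abstract_v<static_sharder>);

// Taking static_sharder& states in the signature that this (hypothetical)
// algorithm is only valid for static sharding.
inline bool owned_by(const static_sharder& s, token t, shard_id shard) {
    return s.shard_for_reads(t) == shard;
}
```

Passing any other sharder implementation to `owned_by()` is rejected by the compiler, which is the compile-time safety the message refers to.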
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
dbca598e99 replica: Deprecate table::shard_of() 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
a1bee16ee9 locator: Deprecate effective_replication_map::shard_of() 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
10a4903d0c dht: Deprecate old sharder API: shard_of/next_shard/token_for_next_shard
Require users to specify whether we want shard for reads or for writes
by switching to appropriate non-deprecated variant.

For example, shard_of() can be replaced with shard_for_reads() or
shard_for_writes().

The next_shard/token_for_next_shard APIs have only for-reads variant,
and the act of switching will be a testimony to the fact that the code
is valid for intra-node migration.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
b3cdf9a379 tests: tablets: py: Add intra-node migration test 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
d26cd97633 tests: tablets: Test that drained nodes are not balanced internally
It would be a waste of effort to do so, since we migrate tablets away
anyway.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
04f0088679 tests: tablets: Add checks of replica set validity to test_load_balancing_with_random_load 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
c76ba52c70 tests: tablets: Verify that disabling balancing results in no intra-node migrations 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
0addca88b9 tests: tablets: Check that nodes are internally balanced
Existing tests are augmented with a check which verifies that
all nodes are internally balanced.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
0e2617336a tests: tablets: Improve debuggability by showing which rows are missing 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
329342bfb2 tablets, storage_service: Support intra-node migration in move_tablet() API 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
db9d3f0128 tablet_allocator: Generate intra-node migration plan
Intra-node migrations are scheduled for each node independently with
the aim to equalize per-shard tablet count on each node.

This is needed to avoid severe imbalance between shards which can
happen when some table grows and is split. The inter-node balance can
be equal, so inter-node migration cannot fix the imbalance. Also, if
RF=N then there is not even a possibility of moving tablets around to
fix the imbalance.  The only way to bring the system to balance is to
move tablets within the nodes.

After scheduling inter-node migrations, the algorithm schedules
intra-node migrations. This means that across-node migrations can
proceed in parallel with intra-node migrations if there is free
capacity to carry them out, but across-node migrations have higher
priority.
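
As a rough illustration of "equalize per-shard tablet count", the sketch below greedily moves one tablet at a time from the most loaded shard to the least loaded one. It is an assumption-laden toy: `shard_move` and `plan_intranode_moves()` are hypothetical names, and the real load balancer also has to respect capacity limits and candidate selection that are ignored here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One planned move of a single tablet between two shards of the same node.
struct shard_move {
    std::size_t src;
    std::size_t dst;
};

// Greedy sketch: keep moving a tablet from the fullest shard to the
// emptiest shard until the counts differ by at most one.
inline std::vector<shard_move> plan_intranode_moves(std::vector<std::size_t> tablets_per_shard) {
    std::vector<shard_move> plan;
    if (tablets_per_shard.size() < 2) {
        return plan;  // nothing to balance on a single shard
    }
    for (;;) {
        auto [min_it, max_it] = std::minmax_element(tablets_per_shard.begin(),
                                                    tablets_per_shard.end());
        if (*max_it - *min_it <= 1) {
            break;  // balanced as well as integer counts allow
        }
        --*max_it;
        ++*min_it;
        plan.push_back({static_cast<std::size_t>(max_it - tablets_per_shard.begin()),
                        static_cast<std::size_t>(min_it - tablets_per_shard.begin())});
    }
    return plan;
}
```

For example, a node whose three shards hold 6, 0 and 0 tablets ends up with 2 tablets per shard after four moves, all of them out of shard 0.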

Fixes #16594
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
793af3d6e1 tablet_allocator: Extract make_internode_plan()
Currently the load balancer is only generating an inter-node plan, and
the algorithm is embedded in make_plan(). The method will become even
harder to follow once we add more kinds of plan generating steps,
e.g. intra-node plan. Extract the inter-node plan to make it easier to
add other plans and see the grand flow.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
f95a0f0182 tablet_allocator: Maintain candidate list and shard tablet count for
target nodes

The node_load datastructure was not updated to reflect migration
decisions on the target node. This is not needed for inter-node
migration because target nodes are not considered as sources. But we
want it to reflect migration decisions so that later intra-node
migration sees an accurate picture with earlier migrations reflected
in node_load.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
c86f659421 tablet_allocator: Lift apply_load/can_accept_load lambdas to member functions
Will be needed by member methods which generate migration plans.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
fdcaaea91a tablets, streaming: Implement tablet streaming for intra-node migration 2024-05-16 00:28:46 +02:00
Tomasz Grabiec
aafeacc8d9 dht, auto_refreshing_sharder: Allow overriding write selector
During streaming for intra-node migration we want to write only to the
new shard. To achieve that, allow altering write selector in
sharder::shard_for_writes() and per-instance of
auto_refreshing_sharder.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
dfed4efcc5 multishard_writer: Handle intra-node migration
This writer is used by streaming, on tablet migration and
load-and-stream.

The caller of distribute_reader_and_consume_on_shards(), which provides
a sharder, is supposed to ensure that effective_replication_map is kept
alive around it, in order for topology coordinator to wait for any writes
which may be in flight to reach their shards before tablet replica starts
another migration. This is already the case:

  1) repair and load-and-stream keep the erm around writing.

  2) tablet migration uses auto_refreshing_sharder, so it does not, but
     it keeps the topology_guard around the operation in the consumer,
     which serves the same purpose.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
4df818db98 storage_proxy: Handle intra-node tablet migration for writes
When sharder says that the write should go to multiple shards,
we need to consider the write as applied only if it was applied
to all those shards.

This can happen during intra-node tablet migration. During such migration,
the request coordinator on storage_proxy side is coordinating to hosts
as if no migration was in progress. The replica-side coordinator coordinates
to shards based on sharder response.

One way to think about it is that
effective_replication_map::get_natural_endpoints()/get_pending_endpoints()
tells how to coordinate between nodes, and sharder tells how to
coordinate between shards. Both work with some snapshot of tablet
metadata, which should be kept alive around the operation. Sharder is
associated with its own effective_replication_map, which marks the
topology version as used and allows barriers to synchronize with
replica-side operations.
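
The "applied only if applied to all shards" rule can be sketched as follows. This is a synchronous simplification with a hypothetical `apply_on_shards()` helper; the real code is asynchronous and works with futures.

```cpp
#include <functional>
#include <vector>

using shard_id = unsigned;

// Returns true only if the write was applied on every shard the sharder
// asked for. Every shard is attempted even after a failure. An empty shard
// list is treated as "nothing to apply" in this sketch.
inline bool apply_on_shards(const std::vector<shard_id>& shards,
                            const std::function<bool(shard_id)>& apply_on_shard) {
    bool all_ok = true;
    for (shard_id s : shards) {
        all_ok = apply_on_shard(s) && all_ok;
    }
    return all_ok;
}
```

During intra-node migration the shard list would contain both the old and the new shard, so a failure on either one makes the whole write count as not applied.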
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
6c6ce2d928 tablets: Get rid of tablet_map::get_shard()
Its semantics do not fit well with intra-node migration which allows
two owning shards. Replace uses with the new has_replica() API.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
d000ad0325 tablets: Avoid tablet_map::get_shard in cleanup
In preparation for intra-node migration for which get_shard() is not
prepared.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
daaceda963 tablets: test: Use sharder instead of tablet_map::get_shard()
tablet_map::get_shard() will go away as it is not prepared for
intra-node migration.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
d47dfceb34 tablets: tablet_sharder: Allow working with non-local host
Will be used in tests.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
6946ad2a45 sharding: Prepare for intra-node-migration
Tablet sharder is adjusted to handle intra-node migration where a tablet
can have two replicas on the same host. For reads, sharder uses the
read selector to resolve the conflict. For writes, the write selector
is used.

The old shard_of() API is kept to represent shard for reads, and new
method is introduced to query the shards for writing:
shard_for_writes(). All writers should be switched to that API, which
is not done in this patch yet.

The request handler on replica side acts as a second-level
coordinator, using sharder to determine routing to shards. A given
sharder has a scope of a single topology version, a single
effective_replication_map_ptr, which should be kept alive during
writes.
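
A minimal sketch of how the two selectors could resolve the two same-host replicas is shown below. The enum names, the `intranode_migration` struct and the free functions are illustrative assumptions, not the actual tablet sharder code.

```cpp
#include <vector>

using shard_id = unsigned;

// Hypothetical selector types for illustration.
enum class read_selector { old_shard, new_shard };
enum class write_selector { old_shard, both, new_shard };

// Hypothetical per-tablet state while the tablet moves between two shards
// of the same host.
struct intranode_migration {
    shard_id leaving;       // shard the tablet is moving away from
    shard_id pending;       // shard the tablet is moving to
    read_selector reads;
    write_selector writes;
};

// Reads always resolve to exactly one shard.
inline shard_id shard_for_reads(const intranode_migration& m) {
    return m.reads == read_selector::old_shard ? m.leaving : m.pending;
}

// Writes may have to go to one shard or to both of them.
inline std::vector<shard_id> shard_for_writes(const intranode_migration& m) {
    switch (m.writes) {
    case write_selector::old_shard: return {m.leaving};
    case write_selector::both:      return {m.leaving, m.pending};
    case write_selector::new_shard: return {m.pending};
    }
    return {};
}
```

The asymmetry is the point of the API split: a reader needs a single answer, while a writer has to cover every shard that may be read from during the transition.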
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
b5bb46357b docs: Document sharder use for tablets 2024-05-16 00:28:46 +02:00
Tomasz Grabiec
82b34d34d8 tablets: Introduce tablet transition kind for intra-node migration
We need a separate transition kind for intra node migration so that we
don't have to recover this information from replica set in an
expensive way. This information is needed in the hot path - in
effective_replication_map, to not return the pending tablet replica to
the coordinator. From its perspective, replica set is not
transitional.

The transition will also be used to alter the behavior of the
sharder. When not in intra-node migration, the sharder should
advertise the shard which is either in the previous or next replica
set. During intra-node migration, that's not possible as there may be
two such shards. So it will return the shard according to the current
read selector.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
942ea39bf0 tests: tablets: Fix use-after-move of skiplist in rebalance_tablets()
balance_tablets() is invoked in a loop, so only the first call will
see non-empty skiplist.

This bug starts to manifest after adding intra-node migration plan,
causing failures of the test_load_balancing_with_skiplist test
case. The reason is that rebalancing will now require multiple passes
before convergence is reached, due to intra-node migrations, and later
calls will not see the skiplist and try to balance skipped nodes,
violating test's assertions.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
4d84451cf1 sstables, gdb: Track readers in a linked list
For the purpose of scylla-gdb.py command "scylla
active-sstables". Before the patch, readers were located by scanning
the heap for live objects with vtable pointers corresponding to
readers. It was observed that the test scylla_gdb/test_misc.py::test_active_sstables started failing like this:

  gdb.error: Error occurred in Python: Cannot access memory at address 0x300000000000000

This could be explained by there being a live object on the heap which
used to be a reader but now is a different object, and the _sst field
contains some other data which is not a pointer.

To fix, track readers explicitly in a linked list so that the gdb
script can reliably walk readers.

Fixes #18618.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
fad6c41cee raft topology: Fix global token metadata barrier to not fence ahead of what is drained
Topology version may be updated, for example, by executing a RESTful
API call to move a tablet. If that is done concurrently with an
ongoing token metadata barrier executed by topology coordinator
(because there is active tablet migration, for example), then some
requests may fail due to being fenced out unnecessarily.

The problem is that barrier function assumes no concurrent topology
updates so it sets the fence version to the one which is current after
other nodes are drained. This patch changes it to set the fence to the
version which was current before other nodes were drained. Semantics
of the barrier are preserved because it only guarantees that topology
state from before the invocation of barrier is propagated.

Fixes #18699
2024-05-16 00:28:46 +02:00
Benny Halevy
3c4c81c2d9 utils: chunked_vector: optimize for trivially_copyable types
Use std::uninitialized_{copy,move} and std::destroy
that have optimizations for trivially copyable and
trivially moveable types.
In those cases, memory can be copied onto the uninitialized
memory, rather than invoking the respective copy/move constructors,
one item at a time.
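
A small sketch of the technique, using hypothetical `clone_chunk()`/`free_chunk()` helpers rather than the actual chunked_vector code: the standard algorithms construct into raw storage and let the library pick a bulk copy for trivially copyable types where it can.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Copy n objects into freshly allocated, uninitialized storage.
// Assumes T has no extended alignment requirement.
template <typename T>
T* clone_chunk(const T* src, std::size_t n) {
    T* dst = static_cast<T*>(::operator new(n * sizeof(T)));
    try {
        // Constructs the copies in place; on an exception the elements
        // already constructed are destroyed by the algorithm itself.
        std::uninitialized_copy(src, src + n, dst);
    } catch (...) {
        ::operator delete(dst);
        throw;
    }
    return dst;
}

// Destroy the n objects and release the storage.
template <typename T>
void free_chunk(T* p, std::size_t n) {
    std::destroy(p, p + n);
    ::operator delete(p);
}
```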

perf-simple-query results:
```
base: median 95954.90 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42312 insns/op,        0 errors)
post: median 97530.65 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42331 insns/op,        0 errors)
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18609
2024-05-15 22:32:45 +03:00
Raphael S. Carvalho
012ba25b5b service: fix indentation in dispatch()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-15 16:30:06 -03:00
Raphael S. Carvalho
0a9e073154 service: fix reactor stall with large tablet count
with a large tablet count, e.g. 128k, forward_service::dispatch() can
potentially stall when grouping ranges per endpoint.

Reactor stalled for 4 ms on shard 1. Backtrace: 0x5eb15ea 0x5eb09f5 0x5eb1daf 0x3dbaf 0x2d01e57 0x33f7d1e 0x348255f 0x2d005d4 0x2d3d017 0x2d3d58c 0x2d3d225 0x5e59622 0x5ec328f 0x5ec4577 0x5ee84e0 0x5e8394a 0x8c946 0x11296f

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-15 16:30:06 -03:00
Raphael S. Carvalho
f7659b357c service: avoid potential expensive copies in forward_service::dispatch()
each partition_range_vector might grow to ~9600 elements, assuming
96-shard nodes, each with 100 tablets.

~9600 elements, where each is 120 bytes (sizeof(partition_range))
can result in vector with capacity of ~2M due to growth factor of
2.
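
The estimate above can be checked with a few lines of arithmetic. The sketch assumes a vector that starts at capacity 1 and doubles on every reallocation; real capacities depend on the standard library's growth policy and on any reserve() calls.

```cpp
#include <cstddef>

// Smallest capacity reached by repeated doubling that can hold n elements.
constexpr std::size_t doubled_capacity(std::size_t n) {
    std::size_t cap = 1;
    while (cap < n) {
        cap *= 2;
    }
    return cap;
}

constexpr std::size_t elements = 96 * 100;   // 96 shards, 100 tablets each
constexpr std::size_t element_size = 120;    // sizeof(partition_range) per the message

static_assert(elements == 9600);
static_assert(doubled_capacity(elements) == 16384);
static_assert(doubled_capacity(elements) * element_size == 1966080);  // roughly 2M bytes
```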

we're copying each range 3x in dispatch(), and we can easily avoid
it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-15 16:30:06 -03:00
Raphael S. Carvalho
f9d2b9a83b service: coroutinize forward_service::dispatch()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-15 16:30:06 -03:00
Pavel Emelyanov
16db2f650e functions: Do not crash when schema is missing
Getting the token() function first tries to find a schema for the underlying
table and continues with nullptr if there is none. Later, when creating
token_fct, the schema is passed as is and dereferenced. If it's null, a crash
happens.

It used to throw before 5983e9e7b2 (cql3: test_assignment: pass optional
schema everywhere) on missing schema, but this commit changed the way
schema is looked up, so nullptr is now possible.

fixes: #18637

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18639
2024-05-15 17:20:40 +03:00
Pavel Emelyanov
d267fbd894 repair: Get topology via replication map
When row_level_repair is constructed it sorts provided list of endpoints.
For that it needs to get topology from somewhere and it goes the
database->token_metadata->topology chain. Patch this place to get
topology from erm instead. It's consistent with how other code from
row_level_repair gets it and removes one more place that uses database
to token metadata "provider".

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-15 17:07:45 +03:00
Pavel Emelyanov
2706f27cd9 repair: Use repair_service::my_address() in handlers
Some handlers want to print local node address in logs. Now the
repair_service has a method to get one, so those places can stop getting
it via database->token_metadata dependency chain.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-15 17:07:45 +03:00
Pavel Emelyanov
7fb405ba65 repair: Remove repair_meta::_myip
In favor of recently introduced my_address() one.

One nice side effect of this change is minus one place that gets token
metadata from database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-15 17:07:45 +03:00
Pavel Emelyanov
017f650955 repair: Use repair_meta::myip() everywhere
The method returns _myip and some places in this class use _myip
directly. Next patch is going to remove _myip, so prepare for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-15 17:07:45 +03:00
Pavel Emelyanov
6899bf83ec repair: Add repair_service::my_address() method
To be used in next patches

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-15 17:07:45 +03:00
Avi Kivity
a5fea84d82 Merge 'scylla-nodetool: add tablet support for ring command' from Botond Dénes
Currently, invoking `nodetool ring` on a tablet keyspace fails with an error, because it doesn't pass the required table parameter to `/storage_service/ownership/{keyspace}`. Further to this, the command will currently always output the vnode ring, regardless of the keyspace and table parameter. This series fixes this, adding tablet support to `/storage_service/tokens_endpoint`, which will now return the tablet ring (tablet token -> tablet primary replica mapping) if the new keyspace and table parameters are provided.
`nodetool status` also gets a touch-up, to provide the tablet ring's token count (the tablet count) when invoked with a tablet keyspace and table.

Fixes: #17889
Fixes: #18474

- [x] ** native-nodetool is new functionality, no backport is needed **

Closes scylladb/scylladb#18608

* github.com:scylladb/scylladb:
  test/nodetool: make test pass with cassandra nodetool
  tools/scylla-nodetool: status: fix token count for tablets
  tools/scylla-nodetool: add tablet support to ring command
  api/storage_service: add tablet support for /storage_service/tokens_endpoint
  service/storage_service: introduce get_tablet_to_endpoint_map()
  locator/tablets: introduce the primary replica concept
2024-05-15 16:05:10 +03:00
Artsiom Mishuta
d659d9338b test/pylib: Introduce ManagerClient.test_finished_event
introduce ManagerClient.test_finished_event
to block access to REST client object from the test if
ManagerClient.after_test method was called
(test teardown started)
2024-05-15 11:33:45 +02:00
Botond Dénes
7b41bb601c Merge 'Simplify access to topology::my_address()' from Pavel Emelyanov
Recent commit 12f160045b (Get rid of fb_utilities) replaced the usage of global fb_utilities and made all services use topology::my_address() in order to get local node broadcast address. Some places resulted in long dependency chain dereferences to get to topology. This PR fixes some of them.

Closes scylladb/scylladb#18672

* github.com:scylladb/scylladb:
  service_level_controller_test: Use topology::is_me() helper
  service_level_controller: Add dependency on shared_token_metadata
  tracing: Get my_address() via proxy
  storage_proxy: Get token metadata via local member, not database
2024-05-15 11:23:16 +03:00
Wojciech Mitros
5154429713 mv gossip: check errno instead of value returned by strtoull
Currently, when a view update backlog is changed and sent
using gossip, we check whether the strtoll/strtoull
function used for reading the backlog returned
LLONG_MAX/ULLONG_MAX, signaling an error of a value
exceeding the type's limit, and if so, we do not store
it as the new value for the node.

However, the ULLONG_MAX value can also be used as the max
backlog size when sending empty backlogs that were never
updated. In theory, we could avoid sending the default
backlog because each node has its real backlog (based on
the node's memory, different than the ULLONG_MAX used in
the default backlog). In practice, if the node's
backlog changed to 0, the backlog sent by it will be
likely the default backlog, because when selecting
the biggest backlog across node's shards, we use the
operator<=>(), which treats the default backlog as
equal to an empty backlog and we may get the default
backlog during comparison if the backlog of some shard
was never changed (also it's the initial max value
we compare shard's backlogs against).

This patch removes the (U)LLONG_MAX check and replaces
it with the errno check, which is also set to ERANGE during
the strtoll error, and which won't prevent empty backlogs
from being read.
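
The errno-based check can be sketched like this. `parse_backlog_field()` is a hypothetical helper, not the function used in the gossip code, and the "no digits consumed" check is an extra assumption of the sketch.

```cpp
#include <cerrno>
#include <cstdlib>
#include <optional>
#include <string>

// Returns std::nullopt only when the conversion really failed, so a
// legitimate ULLONG_MAX value is still accepted.
inline std::optional<unsigned long long> parse_backlog_field(const std::string& s) {
    errno = 0;  // strtoull never clears errno, so reset it before the call
    char* end = nullptr;
    unsigned long long v = std::strtoull(s.c_str(), &end, 10);
    if (errno == ERANGE) {
        return std::nullopt;  // the value exceeded the type's limit
    }
    if (end == s.c_str()) {
        return std::nullopt;  // no digits were consumed
    }
    return v;
}
```

Comparing the result against ULLONG_MAX cannot tell an overflow from a genuine maximum value; errno can, because it is set to ERANGE only in the overflow case.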

Fixes: #18462

Closes scylladb/scylladb#18560
2024-05-15 07:14:36 +02:00
Pavel Emelyanov
59aec1f300 database: Don't break namespace with external alias
The namespace replica is broken in the middle with sstable_list alias,
while the latter can be declared earlier

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18664
2024-05-14 16:45:20 +03:00
Piotr Dulikowski
9ab57b12bb Merge 'cql/describe: hide cdc log tables' from Michał Jadwiszczak
Currently all tables are printed in statements like `DESC TABLES`, `DESC KEYSPACE ks` or `DESC SCHEMA`.
But when we create a table with cdc enabled, an additional table with the `_scylla_cdc_log` suffix is created.
Those tables shouldn't be recreated manually; they are created automatically when the base table is created.

This patch hides tables with `_scylla_cdc_log` suffix in all describe statements.
To preserve the property values of those tables, an `ALTER TABLE` statement with all properties and their current values for the cdc log table is added to the description of the base table.
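
The hiding rule described above can be sketched as a simple name filter. This is only an illustration based on the suffix mentioned in the description; the helper names are hypothetical, and the actual implementation may identify log tables by other means than the name.

```cpp
#include <string>
#include <string_view>
#include <vector>

constexpr std::string_view cdc_log_suffix = "_scylla_cdc_log";

// True if the table name ends with the cdc log suffix.
inline bool is_cdc_log_table(std::string_view name) {
    return name.size() >= cdc_log_suffix.size()
        && name.substr(name.size() - cdc_log_suffix.size()) == cdc_log_suffix;
}

// Keep only the tables that a describe statement should list.
inline std::vector<std::string> visible_tables(const std::vector<std::string>& all) {
    std::vector<std::string> out;
    for (const auto& t : all) {
        if (!is_cdc_log_table(t)) {
            out.push_back(t);
        }
    }
    return out;
}
```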

Fixes #18459

Closes scylladb/scylladb#18467

* github.com:scylladb/scylladb:
  test/cql-pytest/test_describe: add test for hiding cdc tables
  cql3/statements/describe_statement: hide cdc tables
  schema: add a method to generate ALTER statement with all properties
  schema: extract schema's properties generation
2024-05-14 15:02:29 +02:00
Pavel Emelyanov
a30337e719 service_level_controller_test: Use topology::is_me() helper
The on_leave_cluster() callback needs to check if the leaving node is
the local one. It currently compares endpoint with the my_address()
obtained via pretty long dependency chain of

  auth_service->query_processor->storage_proxy->database->token_metadata

This patch makes the whole thing _much_ shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-14 15:47:12 +03:00
Pavel Emelyanov
634c066c43 service_level_controller: Add dependency on shared_token_metadata
The controller needs to access topology, so it needs the token metadata
at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-14 15:43:01 +03:00
Pavel Emelyanov
f9c34f7bd5 tracing: Get my_address() via proxy
The my_address() helper method gets the address via a long
qp->proxy->database->token_metadata->topology chain. That's quite an
overkill, storage_proxy has public my_address() method. The latter also
accesses topology, but without the help of the database. Also this
change makes tracing code a bit shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-14 15:41:04 +03:00
Pavel Emelyanov
75d5eb96f2 storage_proxy: Get token metadata via local member, not database
The my_address() method eventually needs to access topology and goes
long way via sharded<database>. No need in that, shared token metadata
is available on proxy itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-14 15:40:10 +03:00
Artsiom Mishuta
fb6b572b9e test/topology: make ManagerClient object function scope
move ManagerClient object creation/clear
to functions scope instead of session scope

to prevent test cases from affecting each other
by stopping sharing connections to cluster between tests
2024-05-14 14:31:10 +02:00
Artsiom Mishuta
efb079ec15 test/pylib: Introduce Manager broken state:
Waiting for all tasks does not guarantee
that test will not spawn new tasks while we wait

Manager broken state prevents all future put requests in case of
1) fail during task waiting
2) Test continues to create tasks in test_after stage
2024-05-14 14:24:03 +02:00
Artsiom Mishuta
a8bab03c15 test/pylib: Wait for tasks from Tasks History:
To ensure the atomicity of tests and recycle clusters without any issues, it is crucial
that all active requests in ScyllaClusterManager are completed before proceeding further.
2024-05-14 14:24:03 +02:00
Artsiom Mishuta
2ee063c90c test/pylib: Introduce Tasks History:
Topology tests might spawn asynchronous tasks in parallel in ScyllaClusterManager.
Tasks history is introduced to be able to log and analyze all actions
against cluster in case of failures
2024-05-14 14:24:03 +02:00
Artsiom Mishuta
38125a0049 test/pylib: Introduce Stop Event
Introduce a stop event that
interrupts node start in the "wait for node started" state if someone wants to stop the node
2024-05-14 14:24:03 +02:00
Artsiom Mishuta
4c2527efce test/pylib: Introduce Start-Stop Lock:
The methods stop, stop_gracefully, and start in ScyllaServer
are not designed for parallel execution.
To circumvent issues arising from concurrent calls,
a start_stop_lock has been introduced.
This lock ensures that these methods are executed sequentially.
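
A minimal sketch of the locking pattern, assuming an asyncio-based server object. The `Server` class and its method bodies are hypothetical stand-ins, not the actual ScyllaServer code; only the use of a shared lock around start and stop reflects the description above.

```python
import asyncio


class Server:
    """Hypothetical stand-in for ScyllaServer; only the locking pattern matters."""

    def __init__(self):
        # One lock shared by start() and stop(), so they never interleave.
        self.start_stop_lock = asyncio.Lock()
        self.running = False
        self.log = []

    async def start(self):
        async with self.start_stop_lock:
            self.log.append("start:begin")
            await asyncio.sleep(0)  # yield to the event loop, as real startup would
            self.running = True
            self.log.append("start:end")

    async def stop(self):
        async with self.start_stop_lock:
            self.log.append("stop:begin")
            await asyncio.sleep(0)
            self.running = False
            self.log.append("stop:end")


async def demo():
    server = Server()
    # Both calls are issued concurrently; the lock forces them to run one after another.
    await asyncio.gather(server.start(), server.stop())
    return server


server = asyncio.run(demo())
```

Without the lock, the `stop` body could begin while `start` is still suspended at its await point.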
2024-05-14 14:24:03 +02:00
Botond Dénes
a15a9c3e8d Merge 'utils: chunked_vector: fill ctor: make exception safe' from Benny Halevy
Currently, if the fill ctor throws an exception,
the destructor won't be called, as the object is not fully constructed yet.

Call the default ctor first (which doesn't throw)
to make sure the destructor will be called on exception.

Fixes scylladb/scylladb#18635

- [x] Although the fix is for a rare bug, it has very low risk and so it's worth backporting to all live versions

Closes scylladb/scylladb#18636

* github.com:scylladb/scylladb:
  chunked_vector_test: add more exception safety tests
  chunked_vector_test: exception_safe_class: count also moved objects
  utils: chunked_vector: fill ctor: make exception safe
2024-05-14 13:35:02 +03:00
Botond Dénes
78afb3644c test/boost/mutation_fragment_test.cc: add test for validator validation levels
To make sure that the validator doesn't validate what the validation
level doesn't include.
2024-05-14 06:03:20 -04:00
Botond Dénes
e7b07692b6 mutation: mutation_fragment_stream_validating_filter: fix validation_level::none
Despite its name, this validation level still did some validation. Fix
this, by short-circuiting the catch-all operator(), preventing any
validation when the user asked for none.
2024-05-14 06:02:10 -04:00
Botond Dénes
f6511ca1b0 mutation: mutation_fragment_stream_validating_filter: add raises_error ctor parameter
When set to false, no exceptions will be raised from the validator on
validation error. Instead, it will just return false from the respective
validator methods. This makes testing simpler, asserting exceptions is
clunky.
When true (default), the previous behaviour will remain: any validation
error will invoke on_internal_error(), resulting in either std::abort()
or an exception.
2024-05-14 05:59:40 -04:00
Piotr Dulikowski
448f651049 Merge 'hinted handoff: Prevent segmentation fault when initializing endpoint managers ' from Dawid Mędrek
We don't attempt to create an endpoint manager for a hint directory if there is no mapping host ID–IP corresponding to the directory's name, an IP address. That prevents a segmentation fault.

Fixes scylladb/scylladb#18649

Closes scylladb/scylladb#18650

* github.com:scylladb/scylladb:
  db/hints: Remove an unused header
  db/hints: Remove migrating flag before initializing endpoint managers
  db/hints: Prevent segmentation fault when initializing endpoint managers
2024-05-14 07:34:16 +02:00
Amnon Heiman
0c84692c97 replica/table.cc: Add metrics per-table-per-node
This patch adds metrics that will be reported per-table per-node.
The added metrics (that are part of the per-table per-shard metrics)
are:
scylla_column_family_cache_hit_rate
scylla_column_family_read_latency
scylla_column_family_write_latency
scylla_column_family_live_disk_space

Fixes #18642

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes scylladb/scylladb#18645
2024-05-14 07:54:34 +03:00
Raphael S. Carvalho
0b2ec3063c sstables: Fix incremental_reader_selector (for range reads) with tablets
incremental_reader_selector is the mechanism for incremental consumption
of disjoint sstables on range reads.

tablet_sstable_set was implemented, such that selector is efficient with
tablets.

The problem is selector is vnode addicted and will only consider a given
set exhausted when maximum token is reached.

With tablets, that means a range read on first tablet of a given shard
will also consume other tablets living in the same shard. That results
in combined reader having to work with empty sstable readers of tablets
that don't intersect with the range of the read. It won't cause extra
I/O because the underlying sstables don't intersect with the range of
the read. It's only unnecessary CPU work, as it involves creating
readers (= allocation), feeding them into combined reader, which will
in turn invoke the sstable readers only to realize they don't have any
data for that range.

With 100k tablets (ranges), and 100 tablets per shard, and ~5 sstables
per tablet, there will be this amount of readers (empty or not):
  (100k * ((100^2 + 100) / 2) * avg_sstable_per_tablet=5) = ~2.5 billion.

That is ~5000 times more readers, which can add significant CPU
work, even though I/O dominates in scans. It's an inefficiency
that we would rather get rid of.
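
The arithmetic above can be checked with a short sketch. The variable names are illustrative, and the baseline of one reader per sstable of each tablet is an assumption used here to reproduce the "~5000 times" figure.

```python
tablets = 100_000
tablets_per_shard = 100
avg_sstables_per_tablet = 5

# (100^2 + 100) / 2 is the closed form of 1 + 2 + ... + 100.
per_shard_factor = (tablets_per_shard**2 + tablets_per_shard) // 2

# Number of readers (empty or not) according to the formula in the message.
readers_created = tablets * per_shard_factor * avg_sstables_per_tablet

# Assumed baseline: each tablet's sstables get exactly one reader each.
readers_needed = tablets * avg_sstables_per_tablet

print(readers_created)                    # 2525000000
print(readers_created // readers_needed)  # 5050
```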

The behavior can be observed from logs (there's 1 sstable for each of
4 tablets, but note how readers are created for every single one of
them when reading only 1 tablet range):
```
table - make_reader_v2 - range=(-inf, {-4611686018427387905, end}]
    incremental_reader_selector - create_new_readers(null): selecting on pos {minimum token, w=-1}
    sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._34qn42... that has range [{-9151620220812943033, start},{-4813568684827439727, end}]
    incremental_reader_selector - create_new_readers(null): selecting on pos {-4611686018427387904, w=-1}
    sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._368nk2... that has range [{-4599560452460784857, start},{-78043747517466964, end}]
    incremental_reader_selector - create_new_readers(null): selecting on pos {0, w=-1}
    sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._38lj42... that has range [{851021166589397842, start},{3516631334339266977, end}]
    incremental_reader_selector - create_new_readers(null): selecting on pos {4611686018427387904, w=-1}
    sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._3dba82... that has range [{5065088566032249228, start},{9215673076482556375, end}]
```

Fix is about making sure the tablet set won't select past the
supplied range of the read.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18556
2024-05-14 07:43:22 +03:00
Wojciech Mitros
485eb7a64c test: add test for bad_allocs during large mv queries
This patch adds a test for reproducing issue #12379, which is
being fixed in #16819.
The test case works by creating a table with a materialized
view, and then performing a partition delete query on it.
At the same time, it uses injections to limit the memory
to a level lower than usual, in order to increase the
consistency of the test, and to limit its runtime.
Before #16819, the test would exceed the limit and fail,
and now the next allocation is throttled using a sleep.
2024-05-13 18:16:39 +02:00
Jan Ciolek
e0442d7bfa mv: throttle view update generation for large queries
For every mutation applied to the base table we have to
generate the corresponding materialized view table updates.

In case of simple requests, like INSERT or UPDATE, the number
of view updates generated per base table mutation is limited
to at most a few view table updates per base table update.

The situation is different for DELETE queries, which can delete
the whole partitions or clustering ranges. Range deletions are
fast on the base table, but for the view table the situation
is different. Deleting a single partition in the base table
will generate as many singular view updates as there are rows
in the deleted partition, which could potentially be in the millions.

To prevent OOM view updates are generated in batches of at most 100 rows.
There is a loop which generates the next batch of updates, spawns tasks
to send them to remote nodes, generates another batch and so on.

The problem is that there is no concurrency control - each batch is scheduled
to be sent in the background, but the following batch is generated without
waiting for the previously generated updates to be sent. This can lead to
unbounded concurrency and OOM.

To protect against this view update generation should be limited somehow.

There is an existing mechanism for limiting view updates - throttling.
We keep track of how many pending view updates there are, in the view backlog,
and delay responses to the client based on this backlog's fullness.
For a well behaved client with limited concurrency this will slow down
the amount of incoming requests until it reaches an optimal point.

This works for simple queries (INSERT, UPDATE, ...), but it doesn't do anything
for range DELETEs. A DELETE is a single request that generates millions of view
updates, delaying client response doesn't help.

The throttling mechanism could be extended to cover this case - we could treat the
DELETE request like any other client and force it to wait before sending more updates.

This commit implements this approach - before sending the next batch of updates
the generator is forced to sleep for a bit of time, calculated using the existing
throttling equation.
The more full the backlog gets the more the generator will have to sleep for,
and hopefully this will prevent overloading the system with view updates.
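
A rough sketch of the approach in Python, for illustration only. `delay_for_backlog` is a hypothetical placeholder for the throttling equation (its cubic shape and `max_delay` are assumptions), and `send`, `backlog_fullness` and `sleep` are injected callables invented for this example.

```python
import time


def delay_for_backlog(fullness, max_delay=1.0):
    """Hypothetical stand-in for the throttling equation: grows with backlog fullness."""
    clamped = min(max(fullness, 0.0), 1.0)
    return max_delay * clamped ** 3


def send_in_batches(rows, send, backlog_fullness, batch_size=100, sleep=time.sleep):
    """Send view updates in batches, sleeping before each one based on the backlog."""
    for start in range(0, len(rows), batch_size):
        delay = delay_for_backlog(backlog_fullness())
        if delay > 0:
            sleep(delay)  # the fuller the backlog, the longer the generator waits
        send(rows[start:start + batch_size])


# Usage: 250 rows, a backlog that is constantly half full, and a recorded fake sleep.
sent, slept = [], []
send_in_batches(list(range(250)), sent.append, lambda: 0.5, sleep=slept.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.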

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2024-05-13 18:16:23 +02:00
Jan Ciolek
cd62697605 exceptions: add read_write_timeout_exception, a subclass of request_timeout_exception
The `request_timeout_exception` is thrown when a client request can't be completed in time.
Previously this class included some fields specific to read/write timeouts:
```
db::consistency_level consistency;
int32_t received;
int32_t block_for;
```

The problem is that a request can timeout for reasons other than read/write timeout,
for example the request might timeout due to materialized view update generation taking
too long.

In such cases of non read/write timeouts we would like to be able to use request_timeout_exception,
but it contains fields that aren't relevant in these cases.

To deal with this let's create read_write_timeout_exception, which inherits
from request_timeout_exception. read_write_timeout_exception will contain all
of these fields that are specific to read/write timeouts. request_timeout_exception
will become the base class that doesn't have any fields, the other case-specific
exceptions will derive from it and add the desired fields.
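
The resulting hierarchy can be pictured with a Python analogue (the real classes are C++). The class names mirror the ones above in Python naming style, and `ViewUpdateTimeoutException` is a hypothetical example of a non read/write timeout, not a class named in this commit.

```python
class RequestTimeoutException(Exception):
    """Base class: carries no read/write-specific fields."""


class ReadWriteTimeoutException(RequestTimeoutException):
    """Holds the fields that only make sense for read/write timeouts."""

    def __init__(self, consistency, received, block_for):
        super().__init__(
            f"timeout: received {received} of {block_for} responses at {consistency}"
        )
        self.consistency = consistency
        self.received = received
        self.block_for = block_for


class ViewUpdateTimeoutException(RequestTimeoutException):
    """Hypothetical timeout for another cause; it needs none of the fields above."""


def is_request_timeout(exc):
    # Code that handles the base class covers every case-specific subclass.
    return isinstance(exc, RequestTimeoutException)


rw_timeout = ReadWriteTimeoutException("QUORUM", 1, 2)
view_timeout = ViewUpdateTimeoutException("view update generation took too long")
```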

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2024-05-13 18:16:09 +02:00
Jan Ciolek
ae28b8bdb7 db/view: extract view throttling delay calculation to a global function
In order to prevent overload caused by too many view updates,
their number is limited by delaying client responses.
The amount of time to delay for is calculated based on the
fullness of the view update backlog.

Currently this is done in the function calculate_delay,
used by abstract_write_response_handler.

In the following commits I will introduce another throttling
mechanism that uses the same equation to calculate wait time,
so it would be good to reuse the existing function.

Let's make the function globally accessible.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2024-05-13 18:14:56 +02:00
Pavel Emelyanov
bb1696910c Merge 'scylla-nodetool: make documentation links product and version dependant' from Botond Dénes
Currently, all documentation links that feature anywhere in the help output of scylla-nodetool, are hard-coded to point to the documentation of the latest stable release. As our documentation is version and product (open-source or enterprise) specific, this is not correct. This PR addresses this, by generating documentation links such that they point to the documentation appropriate for the product and version of the scylladb release.

Fixes: https://github.com/scylladb/scylladb/issues/18276

- [x] the native nodetool is a new feature, no backport needed

Closes scylladb/scylladb#18476

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: make doc link version-specific
  release: introduce doc_link()
  build: pass scylla product to release.cc
2024-05-13 18:03:45 +03:00
Botond Dénes
d82a31f15f service/storage_proxy: add useful version of base write throttle metrics
There are two metrics to help observe base-write throttling:
* current_throttled_base_writes
* last_mv_flow_control_delay

Both show a snapshot of what is happening right at the time of querying
these metrics. This doesn't work well when one wants to investigate the
role throttling is playing in occasional write timeouts. Prometheus
scrapes metrics in multi-second intervals, and the probability of that
instant catching the throttling at play is very small (almost zero).
Add two new metrics:
* throttled_base_writes_total
* mv_flow_control_delay_total

These accumulate all values, allowing Grafana to derive the values and
extract information about throttle events that happened in the past
(but not necessarily at the instant of the scrape).
Note that dividing the two values will yield the average delay for a
throttle, which is also useful.
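
As an illustration of the division mentioned above, the sketch below computes the average delay per throttled write between two scrapes. The dict-based sample format and the use of deltas between scrapes are assumptions made for the example; only the two metric names come from this commit.

```python
def average_delay(prev, curr):
    """Average flow-control delay per throttled write between two scrapes."""
    throttled = curr["throttled_base_writes_total"] - prev["throttled_base_writes_total"]
    delay = curr["mv_flow_control_delay_total"] - prev["mv_flow_control_delay_total"]
    if throttled <= 0:
        return None  # nothing was throttled in this interval
    return delay / throttled


# Two hypothetical scrapes of the accumulating counters.
prev = {"throttled_base_writes_total": 10, "mv_flow_control_delay_total": 500}
curr = {"throttled_base_writes_total": 14, "mv_flow_control_delay_total": 900}
print(average_delay(prev, curr))  # 100.0
```

The result is in whatever unit the delay counter uses.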

Closes scylladb/scylladb#18435
2024-05-13 18:02:06 +03:00
Dawid Medrek
ef8f14d44b db/hints: Remove an unused header 2024-05-13 16:40:47 +02:00
Dawid Medrek
c9bbb92b1a db/hints: Remove migrating flag before initializing endpoint managers
Before these changes, if initializing endpoint
managers after the migration of hinted handoff
to host ID is done throws an exception, we
don't remove the flag indicating the migration
is still in progress. However, the migration
has, in practice, finished -- all of the
hint directories have been mapped to host IDs
and all of the nodes in the cluster are
host-ID-based. Because of that, it makes sense
to remove the flag early on.
2024-05-13 16:40:47 +02:00
Dawid Medrek
bdcde0c210 db/hints: Prevent segmentation fault when initializing endpoint managers
If hinted handoff is still IP-based and there is
a hint directory representing an IP without
a corresponding mapping to a host ID in
`locator::token_metadata`, an attempt to initialize
its endpoint manager will result in a segmentation
fault. This commit prevents that.
2024-05-13 16:40:47 +02:00
Benny Halevy
4bbb66f805 chunked_vector_test: add more exception safety tests
For insertion, with and without reservation,
and for fill and copy constructors.

Reproduces https://github.com/scylladb/scylladb/issues/18635

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-13 17:18:38 +03:00
Benny Halevy
88b3173d03 chunked_vector_test: exception_safe_class: count also moved objects
We have to account for moved objects as well
as copied objects so they will be balanced with
the respective `del_live_object` calls called
by the destructor.

However, since chunked_vector requires the
value_type to be nothrow_move_constructible,
just count the additional live object, but
do not modify _countdown or, respectively, throw
an exception, as this should be considered only
for the default and copy constructors.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-13 17:18:38 +03:00
Benny Halevy
64c51cf32c utils: chunked_vector: fill ctor: make exception safe
Currently, if the fill ctor throws an exception,
the destructor won't be called, as the object is not
fully constructed yet.

Call the default ctor first (which doesn't throw)
to make sure the destructor will be called on exception.

Fixes scylladb/scylladb#18635

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-13 17:18:38 +03:00
Michał Jadwiszczak
3e5c34831c test/cql-pytest/test_describe: add test for hiding cdc tables 2024-05-13 16:14:11 +02:00
Michał Jadwiszczak
f12edbdd95 cql3/statements/describe_statement: hide cdc tables
Tables with `_scylla_cdc_log` suffix are internal tables used by cdc.

We want to hide those tables in all describe statements, as they
shouldn't be created by user but created by Scylla when user creates a
table with cdc enabled.

Instead, we include `ALTER TABLE <cdc log table> WITH <all table properties>`
to the description of cdc base table, so all changes to cdc log table's
properties are preserved in backup.
2024-05-13 16:11:13 +02:00
Michał Jadwiszczak
05a51c9286 schema: add a method to generate ALTER statement with all properties
In the describe statement, we need to generate `ALTER TABLE` statement
with all schema's properties for some tables (cdc log tables).

The method prints valid CQL statement with current values of
the properties.
2024-05-13 16:11:06 +02:00
Michał Jadwiszczak
b62f7a1dd3 schema: extract schema's properties generation
In a later commit, we want to add a method to create
`ALTER TABLE ... WITH` statement including all schema's
properties with current values.
2024-05-13 14:52:32 +02:00
Asias He
952dfc6157 repair: Introduce repair_partition_count_estimation_ratio config option
In commit 642f9a1966 (repair: Improve
estimated_partitions to reduce memory usage), a 10% hard coded
estimation ratio is used.

This patch introduces a new config option to specify the estimation
ratio of partitions written by repair out of the total partitions.

It is set to 0.1 by default.

Fixes #18615

Closes scylladb/scylladb#18634
2024-05-13 15:16:55 +03:00
Botond Dénes
afa870a387 Merge 'Some sstable set related improvements' from Raphael "Raph" Carvalho
Closes scylladb/scylladb#18616

* github.com:scylladb/scylladb:
  replica: Make it explicit table's sstable set is immutable
  replica: avoid reallocations in tablet_sstable_set
  replica: Avoid compound set if only one sstable set is filled
2024-05-13 14:17:24 +03:00
Botond Dénes
a77796f484 test/nodetool: make test pass with cassandra nodetool
After the recent fixes 4 tests started failing with the java nodetool
implementation. We are about to ditch the java implementation, but until
we actually do, it is valuable to keep the tests passing with both the
native and java implementation.
So in this patch, these tests are fixed to pass with the java
implementation too.
There is one test, test_help.py, which fails only if run together with
all the tests. I couldn't confirm this 100%, but it seems like this is
due to JMX sending a rogue request on some timer, which happens to hit
this test. I don't think this is worth trying to fix.
2024-05-13 07:09:20 -04:00
Botond Dénes
bec4c17db4 tools/scylla-nodetool: status: fix token count for tablets
Currently, the token count column is always based on the vnodes, which
makes no sense for tablet keyspaces. If a tablet keyspace is provided as
the keyspace argument, don't print the vnode token count. If the user
provided a table argument as well, print the tablet count, otherwise
print "?".
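
The decision described above can be sketched as a small function. This is an illustration in Python rather than the nodetool implementation, and the function and parameter names are hypothetical.

```python
def token_count_column(vnode_token_count, keyspace_uses_tablets, tablet_count=None):
    """Return the text for the token count column of one node."""
    if not keyspace_uses_tablets:
        # Vnode keyspace (or no keyspace given): keep the vnode-based count.
        return str(vnode_token_count)
    if tablet_count is not None:
        # Tablet keyspace with a table argument: the tablet count is known.
        return str(tablet_count)
    # Tablet keyspace without a table argument: no meaningful number to show.
    return "?"
```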
2024-05-13 07:09:20 -04:00
Botond Dénes
e82455beab tools/scylla-nodetool: add tablet support to ring command
Add a table parameter. Pass both keyspace and table (when provided) to
the /storage_service/tokens_endpoint API endpoint, so that the returned
(and printed) token ring is that of the table's tablets, not the vnode
ring.
Also pass the table param to the ownership API, which will complain if
this param is missing for a tablet keyspace.
2024-05-13 07:09:20 -04:00
Botond Dénes
fd25bb6f9f api/storage_service: add tablet support for /storage_service/tokens_endpoint
Add a keyspace and cf parameter. When specified, the endpoint will
return token -> primary replica mapping for the table's tablet tokens,
not the vnodes.
2024-05-13 07:09:20 -04:00
Botond Dénes
8690dbf8ad service/storage_service: introduce get_tablet_to_endpoint_map()
The tablet variant of the existing get_token_to_endpoint_map(), which
returns a list of tablet tokens and the primary replica for each.
2024-05-13 06:57:13 -04:00
Pavel Emelyanov
2ce643d06b table: Directly compare std::optional<shard_id> with shard_id
There's a loop that calculates the number of shard matches over a tablet
map. The check of the given shard against optional<shard> can be made
shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18592
2024-05-13 13:25:05 +03:00
Andrei Chekun
76a766cab0 Migrate alternator tests to PythonTestSuite
As part of the unification process, alternator tests are migrated to the PythonTestSuite instead of using the RunTestSuite. The main idea is to have one suite, so that it is easier to maintain and to introduce new features.
Introduce the prepare_sql option for suite.yaml to add the possibility of running CQL statements as a precondition for the test suite.
Related: https://github.com/scylladb/scylladb/issues/18188

Closes scylladb/scylladb#18442
2024-05-13 13:23:29 +03:00
Avi Kivity
51d09e6a2a cql3: castas_fcts: do not rely on boost casting large multiprecision integers to floats behavior
In [1] a bug casting large multiprecision integers to floats is documented (note that it
received two fixes, the most recent and relevant is [2]). Even with the fix, boost now
returns NaN instead of ±∞ as it did before [3].

Since we cannot rely on boost, detect the conditions that trigger the bug and return
the expected result.

The unit test is extended to cover large negative numbers.

Boost version behavior:
 - 1.78 - returns ±∞
 - 1.79 - terminates
 - 1.79 + fix - returns NaN

Fixes https://github.com/scylladb/scylladb/issues/18508

[1] https://github.com/boostorg/multiprecision/issues/553
[2] ea786494db
[3] https://github.com/boostorg/math/issues/1132

Closes scylladb/scylladb#18532
2024-05-13 13:18:28 +03:00
Yaniv Michael Kaul
4639ca1bf5 compaction_strategy.cc: typo -> "performanceimproves" -> "performance improves"
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#18629
2024-05-13 08:43:38 +03:00
Patryk Wrobel
ec820e214c scylla_io_setup: ensure correct RLIMIT_NOFILE for iotune
The default limit of open file descriptors
per process may be too small for iotune on
certain machines with a large number of cores.

In such a case iotune reports failure due to
inability to create files or to set up seastar
framework.

This change configures the limit of open file
descriptors before running iotune to ensure
that the failure does not occur.

The limit is set via 'resource.setrlimit()' in
the parent process. The limit is then inherited
by the child process.
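
A minimal sketch of the mechanism using Python's standard `resource` and `subprocess` modules. The helper names and the value 4096 are assumptions for illustration and are not taken from scylla_io_setup; the child here is a Python interpreter standing in for iotune.

```python
import resource
import subprocess
import sys


def ensure_nofile_limit(desired):
    """Raise the soft RLIMIT_NOFILE to `desired`, capped by the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        desired = min(desired, hard)
    if soft == resource.RLIM_INFINITY or soft >= desired:
        return soft  # already high enough, nothing to do
    resource.setrlimit(resource.RLIMIT_NOFILE, (desired, hard))
    return desired


def child_nofile_limit():
    """Spawn a child interpreter and report the soft limit it observes."""
    code = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    result = subprocess.run([sys.executable, "-c", code], check=True,
                            capture_output=True, text=True)
    return int(result.stdout)


# Set the limit in the parent; any child started afterwards inherits it.
parent_limit = ensure_nofile_limit(4096)
```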

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#18546
2024-05-13 08:35:52 +03:00
Botond Dénes
32a0867b38 locator/tablets: introduce the primary replica concept
The primary replica is an arbitrary replica of the tablet's, which is
considered to be the "main" owner of the tablet, similar to how
replicas own tokens in the vnode world.
To avoid aliasing the primary replicas with a certain DC or rack,
primary replicas are rotated among the tablet's replicas, selecting
tablet_id % replica_count as the primary replica.
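
The rotation rule can be illustrated in a few lines of Python. The function name and the replica labels are hypothetical; only the `tablet_id % replica_count` selection comes from the description above.

```python
def primary_replica(tablet_id, replicas):
    """Pick the primary replica by rotating through the tablet's replica list."""
    return replicas[tablet_id % len(replicas)]


replicas = ["node-a", "node-b", "node-c"]
primaries = [primary_replica(tablet_id, replicas) for tablet_id in range(4)]
print(primaries)  # ['node-a', 'node-b', 'node-c', 'node-a']
```

Consecutive tablet IDs land on different positions in the replica list, so no single position is always chosen as primary.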
2024-05-13 01:35:05 -04:00
Avi Kivity
cc8b4e0630 batchlog_manager, test: initialize delay configuration
In b4e66ddf1d (4.0) we added a new batchlog_manager configuration
named delay, but forgot to initialize it in cql_test_env. This somehow
worked, but doesn't with clang 18.

Fix it by initializing to 0 (there isn't a good reason to delay it).
Also provide a default to make it safer.

Closes scylladb/scylladb#18572
2024-05-13 07:57:35 +03:00
Israel Fruchter
a1a6bd6798 Update tools/cqlsh submodule to v6.0.18
* tools/cqlsh e5f5eafd...c8158555 (11):
  > cqlshlib/sslhandling: fix logic of `ssl_check_hostname`
  > cqlshlib/sslhandling.py: don't use empty userkey/usercert
  > Dockerfile: noninteractive isn't enough for answering yet on apt-get
  > fix cqlsh version print
  > cqlshlib/sslhandling: change `check_hostname` deafult to False
  > Introduce new ssl configuration for disableing check_hostname
  > set the hostname in ssl_options.server_hostname when SSL is used
  > issue-73 Fixed a bug where username and password from the credentials file were ignored.
  > issue-73 Fixed a bug where username and password from the credentials file were ignored.
  > issue-73
  > github actions: update `cibuildwheel==v2.16.5`

Fixes: scylladb/scylladb#18590

Closes scylladb/scylladb#18591
2024-05-13 07:25:10 +03:00
Yaron Kaikov
3eb81915c1 docker: drop jmx and tools-java from installation
Following the work done in dd0779675f,
removing the scylla-jmx and scylla-tools-java from our docker image

Closes scylladb/scylladb#18566
2024-05-13 07:24:23 +03:00
Takuya ASADA
9538af0d95 scylla_kernel_check: fix block device size error on latest mkfs.xfs
The latest mkfs.xfs does not allow formatting a block device which is
smaller than 300MB.
There are options to ignore this validation but that is an unsupported
feature, so it is better to increase the loopback image size to
"supported size" == 300MB.

reference: https://lore.kernel.org/all/164738662491.3191861.15611882856331908607.stgit@magnolia/

Fixes #18568

Closes scylladb/scylladb#18620
2024-05-13 07:23:29 +03:00
Avi Kivity
c8cc47df2d Merge 'replica: allocate storage groups dynamically' from Aleksandra Martyniuk
Allocate storage groups dynamically, i.e.:
- on table creation allocate only storage groups that are on this
  shard;
- allocate a storage group for tablet that is moved to this shard;
- deallocate storage group for tablet that is moved out of this shard.

Output of `./build/release/scylla perf-simple-query -c 1 --random-seed=2248493992` before change:
```
random-seed=2248493992
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
64933.90 tps ( 63.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42163 insns/op,        0 errors)
65865.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42155 insns/op,        0 errors)
66649.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
67029.60 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
68361.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42166 insns/op,        0 errors)

median 66649.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
median absolute deviation: 784.00
maximum: 68361.21
minimum: 64933.90
```

Output of `./build/release/scylla perf-simple-query -c 1 --random-seed=2248493992` after change:
```
random-seed=2248493992
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
63744.12 tps ( 63.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42153 insns/op,        0 errors)
66613.16 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42153 insns/op,        0 errors)
69667.39 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42184 insns/op,        0 errors)
67824.78 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42180 insns/op,        0 errors)
67244.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42174 insns/op,        0 errors)

median 67244.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42174 insns/op,        0 errors)
median absolute deviation: 631.05
maximum: 69667.39
minimum: 63744.12
```

Fixes: #16877.

Closes scylladb/scylladb#17664

* github.com:scylladb/scylladb:
  test: add test for back and forth tablets migration
  replica: allocate storage groups dynamically
  replica: refresh snapshot in compaction_group::cleanup
  replica: add rwlock to storage_group_manager
  replica: handle reads of non-existing tablets gracefully
  service: move to cleanup stage if allow_write_both_read_old fails
  replica: replace table::as_table_state
  compaction: pass compaction group id to reshape_compaction_group
  replica: open code get_compaction_group in perform_cleanup_compaction
  replica: drop single_compaction_group_if_available
2024-05-12 21:22:02 +03:00
Nadav Har'El
9813ec9446 Merge 'test: perf: add end-to-end benchmark for alternator' from Marcin Maliszkiewicz
The code is based on similar idea as perf_simple_query. The main differences are:
  - it starts full scylla process
  - communicates with alternator via http (localhost)
  - uses richer table schema with all dynamoDB types instead of only strings

  Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc).

  Results on my machine (with 1 vCPU):
  > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
  ...
  median 23402.59616090321
  median absolute deviation: 598.77
  maximum: 24014.41
  minimum: 19990.34

  > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
  ...
  median 16089.34211320635
  median absolute deviation: 552.65
  maximum: 16915.95
  minimum: 14781.97

  The above seem more realistic than results from perf_simple_query which are 96k and 49k tps (per core).

Related: https://github.com/scylladb/scylladb/issues/12518

Closes scylladb/scylladb#13121

* github.com:scylladb/scylladb:
  test: perf: alternator: add option to skip data pre-population
  perf-alternator-workloads: add operations-per-shard option
  test: perf: add global secondary indexes write workload for alternator
  test: perf: add option to continue after failed request
  test: perf: add read modify write workload for alternator (lwt)
  test: perf: add scan workload for alternator
  test: perf: add end-to-end benchmark for alternator
  test: perf: extract result aggregation logic to a separate struct
2024-05-12 18:15:29 +03:00
Kefu Chai
fd14b6f26b test/nodetool: do not accept 1 return code when passing --help to nodetool
in 906700d5, we accepted 0 as well as the return code of
"nodetool <command> --help", because we needed to be prepared for
the newer seastar submodule while being compatible with the older
seastar versions. now that in 305f1bd3, we bumped up the seastar
module, and this commit picked up the change to return 0 when
handling "--help" command line option in seastar, we are able to
drop the workaround.

so, in this change, we only use "0" as the expected return code.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18627
2024-05-12 14:30:31 +03:00
Avi Kivity
be76527781 Merge 'build: cmake build dist-unified by default and put tarballs under per-config paths' from Kefu Chai
in the same spirit of d57a82c156, this change adds `dist-unified` as one of the default targets. so that it is built by default. the unified package is required when redistributing the precompiled packages -- we publish the rpm, deb and tar balls to S3.

- [x] cmake related change, no need to backport

Closes scylladb/scylladb#18621

* github.com:scylladb/scylladb:
  build: cmake: use paths to be compatible with CI
  build: cmake build dist-unified by default
2024-05-12 11:16:03 +03:00
Benny Halevy
796ca367d1 gossiper: rename topo_sm member to _topo_sm
Follow scylla convention for class member naming.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18528
2024-05-12 11:02:35 +03:00
Avi Kivity
2ad13e5d76 auth: complete coroutinization of password_authenticator::create_default_if_missing
password_authenticator::create_default_if_missing() is a confusing mix of
coroutines and continuations, simplify it to a normal coroutine.

Closes scylladb/scylladb#18571
2024-05-11 17:04:20 +03:00
Kefu Chai
1186ddef16 build: cmake: use paths to be compatible with CI
our CI workflow for publishing the packages expects the tar balls
to be located under `build/$buildMode/dist/tar`, where `$buildMode`
is "release" or "debug".

before this change, the CMake building system puts the tar balls
under "build/dist" when the multi-config generator is used. and
`configure.py` uses multi-config generator.

in this change, we put the tar balls for redistribution under
`build/$<CONFIG>/dist/tar`, where `$<CONFIG>` is "RelWithDebInfo"
or "Debug", this works better with the CI workflow -- we just need
to map "release" and "debug" to "RelWithDebInfo" and "Debug" respectively.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-11 21:56:50 +08:00
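The mapping described in 1186ddef16 can be sketched as a small lookup. This is only an illustration, assuming a hypothetical helper named `tarball_dir`; the mode names and the directory layout are taken from the commit message, while the function and dictionary names are not part of the ScyllaDB build scripts.

```python
# Hypothetical helper, not taken from the ScyllaDB build scripts.
# CI build modes on the left, CMake multi-config names on the right,
# as listed in the commit message.
CI_MODE_TO_CMAKE_CONFIG = {
    "release": "RelWithDebInfo",
    "debug": "Debug",
}


def tarball_dir(build_mode: str) -> str:
    """Return the per-config directory that holds the redistributable tarballs."""
    config = CI_MODE_TO_CMAKE_CONFIG[build_mode]
    return f"build/{config}/dist/tar"
```

With such a mapping the CI side only needs to translate its own mode name before looking for the tarballs.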
Kefu Chai
0f85255c74 build: cmake build dist-unified by default
in the same spirit of d57a82c156, this change adds `dist-unified`
as one of the default targets. so that it is built by default.
the unified package is required when redistributing the precompiled
packages -- we publish the rpm, deb and tar balls to S3.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-11 18:44:11 +08:00
Raphael S. Carvalho
7faba69f28 replica: Make it explicit table's sstable set is immutable
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-10 11:58:08 -03:00
Raphael S. Carvalho
55c0272b68 replica: avoid reallocations in tablet_sstable_set
reserve upfront wherever possible to avoid reallocations.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-10 10:44:39 -03:00
Raphael S. Carvalho
35a0d47408 replica: Avoid compound set if only one sstable set is filled
Most of the time only main set is filled, so we can avoid one layer
of indirection (= compound set) when maintenance set is empty.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-10 10:44:34 -03:00
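The idea in 35a0d47408 can be modelled in a few lines. This is a simplified sketch, not the replica code: the classes `SSTableSet` and `CompoundSSTableSet` and the function `make_sstable_set` are hypothetical stand-ins for the real set types.

```python
class SSTableSet:
    """Stand-in for a single sstable set (main or maintenance)."""

    def __init__(self, sstables=()):
        self._sstables = list(sstables)

    def empty(self):
        return not self._sstables

    def all(self):
        return list(self._sstables)


class CompoundSSTableSet:
    """Stand-in for the extra layer that merges several sets."""

    def __init__(self, sets):
        self._sets = list(sets)

    def all(self):
        return [s for sub in self._sets for s in sub.all()]


def make_sstable_set(main, maintenance):
    # Skip the compound layer whenever only one of the two sets holds sstables.
    if maintenance.empty():
        return main
    if main.empty():
        return maintenance
    return CompoundSSTableSet([main, maintenance])
```

In the common case, where the maintenance set is empty, callers get the main set back directly and pay no indirection.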
Aleksandra Martyniuk
51fdda4199 test: add test for back and forth tablets migration 2024-05-10 15:08:56 +02:00
Aleksandra Martyniuk
b4371a0ea0 replica: allocate storage groups dynamically
Currently empty storage_groups are allocated for tablets that are
not on this shard.

Allocate storage groups dynamically, i.e.:
- on table creation allocate only storage groups that are on this
  shard;
- allocate a storage group for tablet that is moved to this shard;
- deallocate storage group for tablet that is cleaned up.

Stop compaction group before it's deallocated.

Add a flag to table::cleanup_tablet deciding whether to deallocate
sgs and use it in commitlog tests.
2024-05-10 15:08:21 +02:00
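The allocation rules listed in b4371a0ea0 can be sketched with a dictionary keyed by tablet id. This is a toy model under stated assumptions: the class `StorageGroupManager` and its method names are hypothetical, and the real code additionally stops the compaction group before deallocating it. The empty result for a missing tablet mirrors the behaviour described in 54fcb7be53.

```python
class StorageGroupManager:
    """Toy model: storage groups exist only for tablets on this shard."""

    def __init__(self, local_tablet_ids):
        # On table creation, allocate only the groups owned by this shard.
        self._groups = {tablet_id: [] for tablet_id in local_tablet_ids}

    def has_group(self, tablet_id):
        return tablet_id in self._groups

    def on_tablet_moved_in(self, tablet_id):
        # Allocate a group when a tablet migrates to this shard.
        self._groups.setdefault(tablet_id, [])

    def on_tablet_cleaned_up(self, tablet_id):
        # Deallocate the group once the tablet has been cleaned up.
        self._groups.pop(tablet_id, None)

    def sstables_for(self, tablet_id):
        # A read of a tablet that is not here behaves like an empty set.
        return list(self._groups.get(tablet_id, []))
```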
Aleksandra Martyniuk
6e1e082e8c replica: refresh snapshot in compaction_group::cleanup
During compaction_group::cleanup sstables set is updated, but
row_cache::_underlying still keeps a shared ptr to the old set.
Due to that descriptors to deleted sstables aren't closed.

Refresh snapshot in order to store new sstables set in _underlying
mutation source.
2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk
c283746b32 replica: add rwlock to storage_group_manager
Add rwlock which prevents storage groups from being added/deleted
while some other layer iterates over them (or their compaction
groups).

Add methods to iterate over storage groups with the lock held.
2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk
54fcb7be53 replica: handle reads of non-existing tablets gracefully
In the following patches, storage groups (and so also sstables sets)
will be allocated only for tablets that are located on this shard.
Some layers may try to read non-existing sstable sets.

Handle this case as if the sstables set was empty instead of calling
on_internal_error.
2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk
561fb1dd09 service: move to cleanup stage if allow_write_both_read_old fails
If allow_write_both_read_old tablet transition stage fails, move
to cleanup_target stage before reverting migration.

It's a preparation for further patches which deallocate storage
group of a tablet during cleanup.
2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk
532653f118 replica: replace table::as_table_state
Replace table::as_table_state with table::try_get_table_state_with_static_sharding
which throws if a table does not use static sharding.
2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk
cf9913b0b7 compaction: pass compaction group id to reshape_compaction_group
Pass compaction group id to
shard_reshaping_compaction_task_impl::reshape_compaction_group.
Modify table::as_table_state to return table_state of the given
compaction group.
2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk
90d618d8c9 replica: open code get_compaction_group in perform_cleanup_compaction
Open code get_compaction_group in table::perform_cleanup_compaction
as its definition won't be relevant once storage groups are allocated
dynamically.
2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk
8505389963 replica: drop single_compaction_group_if_available
Drop single_compaction_group_if_available as it's unused.
2024-05-10 14:56:38 +02:00
Lakshmi Narayanan Sreethar
d39adf6438 compaction: improve partition estimates for garbage collected sstables
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.

Fixes #18283

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#18465
2024-05-10 13:02:34 +03:00
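A rough sketch of the estimate described in d39adf6438 follows. The commit message only says that a fraction based on the droppable tombstone ratio is used; the exact formula, the clamping, and the function name `estimated_partitions` are assumptions made for illustration.

```python
def estimated_partitions(total_estimate, is_garbage_collected, droppable_tombstone_ratio):
    """Scale the partition estimate down for garbage collected sstables.

    Assumed formula: estimate * ratio, with the ratio clamped to [0, 1]
    and the result kept at one partition or more.
    """
    if not is_garbage_collected:
        return total_estimate
    ratio = min(max(droppable_tombstone_ratio, 0.0), 1.0)
    return max(1, int(total_estimate * ratio))
```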
Botond Dénes
3286a6fa14 Merge 'Reload reclaimed bloom filters when memory is available' from Lakshmi Narayanan Sreethar
PR #17771 introduced a threshold for the total memory used by all bloom filters across SSTables. When the total usage surpasses the threshold, the largest bloom filter will be removed from memory, bringing the total usage back under the threshold. This PR adds support for reloading such reclaimed bloom filters back into memory when memory becomes available (i.e., within the 10% of available memory earmarked for the reclaimable components).

The SSTables manager now maintains a list of all SSTables whose bloom filter was removed from memory and attempts to reload them when an SSTable, whose bloom filter is still in memory, gets deleted. The manager reloads from the smallest to the largest bloom filter to maximize the number of filters being reloaded into memory.

Closes scylladb/scylladb#18186

* github.com:scylladb/scylladb:
  sstable_datafile_test: add testcase to test reclaim during reload
  sstable_datafile_test: add test to verify auto reload of reclaimed components
  sstables_manager: reload previously reclaimed components when memory is available
  sstables_manager: start a fiber to reload components
  sstable_directory_test: fix generation in sstable_directory_test_table_scan_incomplete_sstables
  sstable_datafile_test: add test to verify reclaimed components reload
  sstables: support reloading reclaimed components
  sstables_manager: add new intrusive set to track the reclaimed sstables
  sstable: add link and comparator class to support new intrusive set
  sstable: renamed intrusive list link type
  sstable: track memory reclaimed from components per sstable
  sstable: rename local variable in sstable::total_reclaimable_memory_size
2024-05-10 13:01:01 +03:00
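The reload order described in 3286a6fa14 (smallest filter first, to maximize the number of filters brought back) can be sketched as a greedy selection. This is an illustration only: `filters_to_reload` and its arguments are hypothetical, and sizes are plain integers rather than real memory accounting.

```python
def filters_to_reload(reclaimed_sizes, available_memory):
    """Pick reclaimed bloom filters to reload, smallest first.

    reclaimed_sizes maps an sstable name to the memory its filter needs.
    Returns the names that fit into available_memory, in reload order.
    """
    reloaded = []
    for name, size in sorted(reclaimed_sizes.items(), key=lambda kv: kv[1]):
        if size > available_memory:
            # Every remaining filter is at least this large, so stop here.
            break
        reloaded.append(name)
        available_memory -= size
    return reloaded
```

Sorting in ascending order matches the intrusive set from 2340ab63c6, which keeps the reclaimed sstables ordered by the total memory reclaimed.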
Kefu Chai
305f1bd382 Update seastar submodule
* seastar b73e5e7d...42f15a5f (27):
  > prometheus: revert the condition for enabling aggregation
  > tests/unit: add a unit test for json2code
  > seastar-json2code: fix the path param handling
  > github/workflow: do not override <clang++,23,release>
  > github/workflow: add a github workflow for running tests
  > prometheus: support disabling aggregation at query time
  > apps/httpd: free allocated http_server_control
  > rpc: cast rpc::tuple to std::tuple when passing it to std::apply
  > stall-analyser: move `args` into main()
  > stall-analyser: move print_command_line_options() out of Graph
  > stall-analyser: pass branch_threshold via parameter
  > stall-analyser: move process_graph() into Graph class
  > scripts: addr2line: cache the results of resolve_address()
  > stall-analyser: document the parser of log lines
  > stall-analyser: move resolver into main()
  > stall-analyser: extract get_command_line_parser() out
  > stall-analyser: move graph into main()
  > stall-analyser: extract main() out
  > stall-analyser: extract print_command_line_options() out
  > stall-analyser: add more typing annotatins
  > stall-analyser: surround top-level function with two empty lines
  > core/app_template: return status code 0 for --help
  > iotune: Print file alignments too
  > seastar-json2code: extract Parameter class
  > seastar-json2code: use f-string when appropriate
  > seastar-json2code: use nickname in place of oper['nickname']
  > seastar-json2code: use dict.get() when checking allowMultiple

Closes scylladb/scylladb#18598
2024-05-10 12:50:16 +03:00
Patryk Jędrzejczak
a04ea7b997 topology_coordinator: send barrier to a decommissioning node
The code in `global_token_metadata_barrier` allows drain to fail.
Then, it relies on fencing. However, we don't send the barrier
command to a decommissioning node, which may still receive requests.
The node may accept a write with a stale topology version. It makes
fencing ineffective.

Fix this issue by sending the barrier command to a decommissioning
node.

The raft-based topology is moved out of experimental in 6.0, no need
to backport the patch.

Fixes scylladb/scylladb#17108

Closes scylladb/scylladb#18599
2024-05-10 10:53:16 +02:00
Botond Dénes
c35031dda5 Merge 'repair: tablet_repair: make best effort in spite of errors' from Benny Halevy
Currently if any shard repair task fails,
`tablet_repair_task_impl` per-shard loop
breaks, since it doesn't handle the exception.
Although repair does return an error, which
is as expected, we change vnode-based repair
to make a best effort and try to repair
as much as it can, even if any of the ranges
failed.

This causes the `test_repair_with_down_nodes_2b`
dtest to fail with tablets, as seen in, e.g.
https://jenkins.scylladb.com/view/master/job/scylla-master/job/tablets/job/gating-dtest-release-with-tablets/52/testReport/repair_additional_test/TestRepairAdditional/FullDtest___full_split002___test_repair_with_down_nodes_2b/
```
AssertionError: assert 1765 == 2000
```

- [x] ** Backport reason (please explain below if this patch should be backported or not) **
Tablet repair code will be introduced in 6.0, no need to backport to earlier versions.

Closes scylladb/scylladb#18518

* github.com:scylladb/scylladb:
  repair: tablet_repair_task_impl: modernize table lookup
  repair: tablet_repair: make best effort in spite of errors
2024-05-10 10:51:09 +03:00
Piotr Dulikowski
a3070089de main: initialize scheduling group keys before service levels
Due to scylladb/seastar#2231, creating a scheduling group and a
scheduling group key is not safe to do in parallel. The service level
code may attempt to create scheduling groups while
the cql_transport::cql_sg_stats scheduling group key is being created.

Until the seastar issue is fixed, move initialization of the cql sg
states before service level initialization.

Refs: scylladb/seastar#2231

Closes scylladb/scylladb#18581
2024-05-10 10:35:05 +03:00
Kefu Chai
28791aa2c1 build: cmake: link thrift against absl::header
this change is a leftover of 0b0e661a85.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18596
2024-05-09 18:43:23 +03:00
Avi Kivity
37d32a5f8b Merge 'Cleanup inactive reads on tablet migration' from Botond Dénes
When a tablet is migrated away, any inactive read which might be reading from said tablet, has to be dropped. Otherwise these inactive reads can prevent sstables from being removed and these sstables can potentially survive until the tablet is migrated back and resurrect data.
This series introduces the fix as well as a reproducer test.

Fixes: https://github.com/scylladb/scylladb/issues/18110

Closes scylladb/scylladb#18179

* github.com:scylladb/scylladb:
  test: add test for cleaning up cached querier on tablet migration
  querier: allow injecting cache entry ttl by error injector
  replica/table: cleanup_tablet(): clear inactive reads for the tablet
  replica/database: introduce clear_inactive_reads_for_tablet()
  replica/database: introduce foreach_reader_concurrency_semaphore
  reader_concurrency_semaphore: add range param to evict_inactive_reads_for_table()
  reader_concurrency_semaphore: allow storing a range with the inactive reader
  reader_concurrency_semaphore: avoid detach() in inactive_read_handle::abandon()
2024-05-09 17:34:49 +03:00
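The eviction rule behind 37d32a5f8b can be sketched as an overlap check between the range stored with each inactive read and the range of the migrated tablet. This is a simplified model: `InactiveRead` and `evict_inactive_reads` are hypothetical names, and tokens are plain integers with inclusive bounds instead of the real range types.

```python
from dataclasses import dataclass


@dataclass
class InactiveRead:
    table: str
    first_token: int
    last_token: int  # inclusive


def evict_inactive_reads(reads, table, first_token, last_token):
    """Split reads into (kept, evicted) for the given table and token range."""
    kept, evicted = [], []
    for read in reads:
        overlaps = (
            read.table == table
            and read.first_token <= last_token
            and first_token <= read.last_token
        )
        (evicted if overlaps else kept).append(read)
    return kept, evicted
```

Evicting only the overlapping reads lets reads on other tablets of the same table stay cached.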
Lakshmi Narayanan Sreethar
4d22c4b68b sstable_datafile_test: add testcase to test reclaim during reload
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 19:57:40 +05:30
Pavel Emelyanov
5497bb5a3d loading_shared_values: Replace static-assert with concept
The templatized get_or_load() accepts Loader template parameter and
static-asserts on its signature. Concept is more suitable here.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18582
2024-05-09 16:29:49 +03:00
Patryk Jędrzejczak
332bd8ea98 raft: raft_group_registry: start_server_for_group: catch and rethrow abort_requested_exception
If we initiate the shutdown while starting the group 0 server,
we could catch `abort_requested_exception` in `start_server_for_group`
and call `on_internal_error`. Then, Scylla aborts with a coredump.
It causes problems in tests that shut down bootstrapping nodes.

The `abort_requested_exception` can be thrown from
`gossiper::lock_endpoint` called in
`storage_service::topology_state_load`. So, the issue is new and
applies only to the raft-based topology. Hence, there is no need
to backport the patch.

Fixes scylladb/scylladb#17794
Fixes scylladb/scylladb#18197

Closes scylladb/scylladb#18569
2024-05-09 14:55:11 +02:00
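The control flow of the fix in 332bd8ea98 can be sketched as follows. This is only a model: `AbortRequested` and `InternalError` are hypothetical stand-ins for `abort_requested_exception` and for the abort triggered by `on_internal_error`, and `start` is an arbitrary callable.

```python
class AbortRequested(Exception):
    """Stand-in for abort_requested_exception."""


class InternalError(Exception):
    """Stand-in for the failure reported through on_internal_error."""


def start_server_for_group(start):
    try:
        return start()
    except AbortRequested:
        # Shutdown is an expected event: pass it through unchanged.
        raise
    except Exception as exc:
        # Anything else is still treated as an internal error.
        raise InternalError(f"failed to start group server: {exc}") from exc
```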
Benny Halevy
073680768f repair: tablet_repair_task_impl: modernize table lookup
Currently, the loop that goes over all repair metas
checks for the table's existence using `find_column_family()`.
Although this is correct, it might cause an exception storm
if a table or keyspace is dropped during repair.

This can be avoided by using the more modern interface,
`get_table_if_exists` in the database `tables_metadata`
that returns a `lw_shared_ptr<replica::table>`, exactly
as we need, that has value iff the table still exists
without throwing any exception.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-09 15:43:00 +03:00
Benny Halevy
c55aa4b121 repair: tablet_repair: make best effort in spite of errors
Currently if any shard repair task fails,
`tablet_repair_task_impl` per-shard loop
breaks, since it doesn't handle the exception.
Although repair does return an error, which
is as expected, we change vnode-based repair
to make a best effort and try to repair
as much as it can, even if any of the ranges
failed.

This causes the `test_repair_with_down_nodes_2b`
dtest to fail with tablets, as seen in, e.g.
https://jenkins.scylladb.com/view/master/job/scylla-master/job/tablets/job/gating-dtest-release-with-tablets/52/testReport/repair_additional_test/TestRepairAdditional/FullDtest___full_split002___test_repair_with_down_nodes_2b/
```
AssertionError: assert 1765 == 2000
```

This change adds a check for the keyspace and table presence
whenever an individual repair task fails, instead of the
global check at the end, so that failures due to dropping
of the keyspace or the table are logged as warnings, but
ignored for the purpose of failing the overall repair status.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-09 15:42:59 +03:00
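The best-effort behaviour described in c55aa4b121 can be sketched as a loop that keeps going after a failure and checks table presence at the point of failure. This is a simplified model with hypothetical names (`repair_all`, `repair_one`, `table_exists`); the real code works per shard and per repair task.

```python
def repair_all(ranges, repair_one, table_exists, warnings):
    """Try to repair every range; return the number of ranges repaired.

    Failures caused by a dropped table are logged and ignored; other
    failures are logged and reported once all ranges have been tried.
    """
    repaired = 0
    failed = 0
    for r in ranges:
        try:
            repair_one(r)
            repaired += 1
        except Exception as exc:
            if not table_exists():
                warnings.append(f"range {r}: table dropped during repair, ignoring: {exc}")
                continue
            failed += 1
            warnings.append(f"range {r}: repair failed: {exc}")
    if failed:
        raise RuntimeError(f"{failed} of {len(ranges)} ranges failed to repair")
    return repaired
```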
Lakshmi Narayanan Sreethar
a080daaa94 sstable_datafile_test: add test to verify auto reload of reclaimed components
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:49:22 +05:30
Lakshmi Narayanan Sreethar
0b061194a7 sstables_manager: reload previously reclaimed components when memory is available
When an SSTable is dropped, the associated bloom filter gets discarded
from memory, bringing down the total memory consumption of bloom
filters. Any bloom filter that was previously reclaimed from memory due
to the total usage crossing the threshold, can now be reloaded back into
memory if the total usage can still stay below the threshold. Added
support to reload such reclaimed filters back into memory when memory
becomes available.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:49:22 +05:30
Lakshmi Narayanan Sreethar
f758d7b114 sstables_manager: start a fiber to reload components
Start a fiber that gets notified whenever an sstable gets deleted. The
fiber doesn't do anything yet but the following patch will add support
to reload reclaimed components if there is sufficient memory.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:49:22 +05:30
Lakshmi Narayanan Sreethar
24064064e9 sstable_directory_test: fix generation in sstable_directory_test_table_scan_incomplete_sstables
The testcase uses an sstable whose mutation key and the generation are
owned by different shards. Due to this, when process_sstable_dir is
called, the sstable gets loaded into a different shard than the one that
was intended. This also means that the sstable and the sstable manager
end up in different shards.

The following patch will introduce a condition variable in sstables
manager which will be signalled from the sstables. If the sstable and
the sstable manager are in different shards, the signalling will cause
the testcase to fail in debug mode with this error : "Promise task was
set on shard x but made ready on shard y". So, fix it by supplying
appropriate generation number owned by the same shard which owns the
mutation key as well.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
69b2a127b0 sstable_datafile_test: add test to verify reclaimed components reload
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
54bb03cff8 sstables: support reloading reclaimed components
Added support to reload components from which memory was previously
reclaimed as the total memory of reclaimable components crossed a
threshold. The implementation is kept simple as only the bloom filters
are considered reclaimable for now.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
2340ab63c6 sstables_manager: add new intrusive set to track the reclaimed sstables
The new set holds the sstables from where the memory has been reclaimed
and is sorted in ascending order of the total memory reclaimed.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
140d8871e1 sstable: add link and comparator class to support new intrusive set
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
3ef2f79d14 sstable: renamed intrusive list link type
Renamed the intrusive list link type to differentiate it from the set
link type that will be added in an upcoming patch.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
02d272fdb3 sstable: track memory reclaimed from components per sstable
Added a member variable _total_memory_reclaimed to the sstable class
that tracks the total memory reclaimed from a sstable.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
a53af1f878 sstable: rename local variable in sstable::total_reclaimable_memory_size
Renamed local variable in sstable::total_reclaimable_memory_size in
preparation for the next patch which adds a new member variable
_total_memory_reclaimed to the sstable class.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Marcin Maliszkiewicz
a1099791c4 test: perf: alternator: add option to skip data pre-population 2024-05-09 13:59:17 +02:00
Marcin Maliszkiewicz
fd416fac3b perf-alternator-workloads: add operations-per-shard option 2024-05-09 13:59:13 +02:00
Marcin Maliszkiewicz
5b8acf182a test: perf: add global secondary indexes write workload for alternator 2024-05-09 13:59:08 +02:00
Marcin Maliszkiewicz
43a64ac558 test: perf: add option to continue after failed request 2024-05-09 13:59:03 +02:00
Marcin Maliszkiewicz
70b5b5024b test: perf: add read modify write workload for alternator (lwt) 2024-05-09 13:58:58 +02:00
Marcin Maliszkiewicz
5b8e554431 test: perf: add scan workload for alternator 2024-05-09 13:58:54 +02:00
Marcin Maliszkiewicz
55030b1550 test: perf: add end-to-end benchmark for alternator
The code is based on a similar idea as perf_simple_query. The main differences are:
- it starts full scylla process
- communicates with alternator via http (localhost)
- uses richer table schema with all dynamoDB types instead of only strings

Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc).

Results on my machine (with 1 vCPU):
> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
...
median 23402.59616090321
median absolute deviation: 598.77
maximum: 24014.41
minimum: 19990.34

> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
...
median 16089.34211320635
median absolute deviation: 552.65
maximum: 16915.95
minimum: 14781.97

The above seems more realistic than results from perf_simple_query which are 96k and 49k tps (per core).
2024-05-09 13:58:40 +02:00
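The statistics printed in the results of 55030b1550 (median, median absolute deviation, maximum, minimum) can be computed as sketched here. The function name `aggregate` and the returned dictionary keys are illustrative and do not reflect the struct extracted in 6152223890.

```python
import statistics


def aggregate(samples):
    """Summarize throughput samples the way the benchmark output does."""
    samples = list(samples)
    median = statistics.median(samples)
    # Median absolute deviation: the median of the distances from the median.
    mad = statistics.median([abs(s - median) for s in samples])
    return {
        "median": median,
        "median absolute deviation": mad,
        "maximum": max(samples),
        "minimum": min(samples),
    }
```

The median absolute deviation is less sensitive to a single slow or fast iteration than the standard deviation, which suits short benchmark runs.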
Marcin Maliszkiewicz
6152223890 test: perf: extract result aggregation logic to a separate struct
It will be reused later by a new tool.
2024-05-09 13:58:29 +02:00
Gleb Natapov
3b40d450e5 gossiper: try to locate an endpoint by the host id when applying state if search by IP fails
Even if there is no endpoint for the given IP the state can still belong to an existing endpoint that
was restarted with a different IP, so let's try to locate the endpoint by host id as well. Do it in raft
topology mode only to not have impact on gossiper mode.

Also make the test more robust in detecting wrong amount of entries in
the peers table. Today it may miss that there is a wrong entry there
because the map will squash two entries for the same host id into one.

Fixes: scylladb/scylladb#18419
Fixes: scylladb/scylladb#18457
2024-05-09 13:14:54 +02:00
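Both points of 3b40d450e5 can be sketched briefly: the fallback lookup by host id, and the reason a map hides duplicate peers entries. The names `find_endpoint` and `has_duplicate_host_ids`, and the dictionary-based endpoint state, are assumptions made for the example.

```python
def find_endpoint(endpoints_by_ip, ip, host_id, raft_topology_mode):
    """Look up endpoint state by IP, falling back to host id in raft mode."""
    state = endpoints_by_ip.get(ip)
    if state is None and raft_topology_mode:
        # The node may have restarted with a different IP.
        for candidate in endpoints_by_ip.values():
            if candidate["host_id"] == host_id:
                return candidate
    return state


def has_duplicate_host_ids(peer_rows):
    """peer_rows is a list of (host_id, ip) pairs read from the peers table.

    Building a dict keyed by host id would squash duplicates, so compare
    the raw row count with the number of distinct host ids instead.
    """
    host_ids = [host_id for host_id, _ip in peer_rows]
    return len(host_ids) != len(set(host_ids))
```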
Patrik
b0fbe71eaf Update launch-on-gcp.rst
Closes scylladb/scylladb#18512
2024-05-09 10:12:31 +03:00
Avi Kivity
b7055b5f2f storage_service: don't rely on optional<> formatting for removed node error
std::optional formatting changed while moving from the home-grown formatter to
the fmt provided formatter; don't rely on it for user visible messages.

Here, the optional being formatted is known to be engaged, so just print its value.

Closes scylladb/scylladb#18534
2024-05-09 10:03:23 +03:00
Kefu Chai
906700d523 test/nodetool: accept -1 returncode also when --help is invoked
in newer seastar, 0 is returned as the returncode of the application
when handling `--help`. to prepare for this behavior, let's
accept it before updating the seastar submodule.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18574
2024-05-09 08:26:44 +03:00
Kefu Chai
6047b3b6aa build: cmake: build async_utils.cc
async_utils.cc was introduced in e1411f39, so let's
update the cmake building system to build it. without
which, we'd run into link failure like:

```
ld.lld: error: undefined symbol: to_mutation_gently(canonical_mutation const&, seastar::lw_shared_ptr<schema const>)
>>> referenced by storage_service.cc
>>>               storage_service.cc.o:(service::storage_service::merge_topology_snapshot(service::raft_snapshot)) in archive service/Dev/libservice.a
>>> referenced by group0_state_machine.cc
>>>               group0_state_machine.cc.o:(service::write_mutations_to_database(service::storage_proxy&, gms::inet_address, std::vector<canonical_mutation, std::allocator<canonical_mutation>>)) inarchive service/Dev/libservice.a
>>> referenced by group0_state_machine.cc
>>>               group0_state_machine.cc.o:(service::write_mutations_to_database(service::storage_proxy&, gms::inet_address, std::vector<canonical_mutation, std::allocator<canonical_mutation>>) (.resume)) in archive service/Dev/libservice.a
>>> referenced 1 more times
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18524
2024-05-09 08:26:44 +03:00
Kefu Chai
c336904722 build: cmake: mark abseil include SYSTEM
this change is a followup of 0b0e661a. it helps to ensure that the header files in
abseil submodule have higher priority when the compiler includes abseil headers
when building with CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18523
2024-05-09 08:26:44 +03:00
Kefu Chai
2a9a874e19 db,service: fix typos in comments
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18567
2024-05-09 08:26:44 +03:00
Anna Stuchlik
65c8b81051 doc: add OS support in version 6.0
This commit adds OS support in version 6.0.
In addition, it removes the information about version 5.2, as this version is no longer supported, according to our policy.

Closes scylladb/scylladb#18562
2024-05-09 08:26:44 +03:00
Anna Stuchlik
74fb9808ed doc: update Consistent Topology with Raft
This PR:
- Removes the `.. only:: opensource` directive from Consistent Topology with Raft.
  This feature is no longer an Open Source-only experimental feature.
- Removes redundant version-specific information.
- Moves the necessary version-specific information to a separate file.

This is a follow-up to 55b011902e.

Refs https://github.com/scylladb/scylladb/pull/18285/

Closes scylladb/scylladb#18553
2024-05-09 08:26:44 +03:00
Calle Wilund
79d56ccaad commitlog: Fix request_controller semaphore accounting.
Fixes #18488

Due to the discrepancy between bytes added to CL and bytes written to disk
(due to CRC sector overhead), we fail to account for the proper byte count
when issuing account_memory_usage in allocate (using bytes added) and in
cycle's notify_memory_written (disk bytes written).

This leads us to slowly, but surely, add to the semaphore all the time.
Eventually rendering it useless.

Also, terminate call would _not_ take any of this into account,
and the chunk overhead there would cause a (smaller) discrepancy
as well.

Fix by simply ensuring that buffer alloc handles its byte usage,
then accounting based on buffer position, not input byte size.

Closes scylladb/scylladb#18489
2024-05-09 08:26:44 +03:00
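The drift described in 79d56ccaad can be reproduced with a toy counter. This is a simplified model, not the commitlog code: `RequestController` is reduced to an integer, `run_cycle` is a hypothetical helper, and the overhead value is made up.

```python
class RequestController:
    """Toy semaphore: units are consumed on allocate and returned on write."""

    def __init__(self, units):
        self.units = units

    def account_memory_usage(self, n):
        self.units -= n

    def notify_memory_written(self, n):
        self.units += n


def run_cycle(ctrl, payload_bytes, overhead_bytes, fixed):
    # Bytes that end up in the buffer and on disk, including CRC overhead.
    buffer_bytes = payload_bytes + overhead_bytes
    # Before the fix only the payload was consumed; after it, accounting
    # follows the buffer position, so both sides use the same number.
    consumed = buffer_bytes if fixed else payload_bytes
    ctrl.account_memory_usage(consumed)
    ctrl.notify_memory_written(buffer_bytes)
```

In the unfixed mode every cycle returns more units than it took, so the semaphore grows without bound and eventually stops limiting anything.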
Botond Dénes
155332ebf8 Merge 'Drain view_builder in generic drain (again)' from Pavel Emelyanov
Some time ago #16558 was merged that moved view builder drain into generic drain. After this merge dtests started to fail from time to time, so the PR was reverted (see #18278). In #18295 the hang was found. View builder drain was moved from "before" stopping messaging service to "after" it, and view update write handlers in proxy hung for a hard-coded timeout of 5 minutes without being aborted. Tests don't wait for 5 minutes and kill scylla, then complain about it and fail.

This PR brings back the original PR as well as the necessary fix that cancels view update write handlers on stop.

Closes scylladb/scylladb#18408

* github.com:scylladb/scylladb:
  Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB"
  view: Abort pending view updates when draining
2024-05-09 08:26:44 +03:00
Aleksandra Martyniuk
67bbaad62e tasks: use default task_ttl in scylla.yaml
Currently default task_ttl_in_seconds is 0, but scylla.yaml changes
the value to 10.

Change task_ttl_in_seconds in scylla.yaml to 0, so that there are
consistent defaults. Comment it out.

Fixes: #16714.

Closes scylladb/scylladb#18495
2024-05-09 08:26:44 +03:00
Botond Dénes
0438febdc9 Merge 'alternator: fix REST API access to an Alternator LSI' from Nadav Har'El
The name of the Scylla table backing an Alternator LSI looks like `basename:!lsiname`. Some REST API clients (including Scylla Manager) when they send a "!" character in the REST API request path may decide to "URL encode" it - convert it to `%21`.

Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725) Scylla's REST API server forgets to do the URL decoding on the path part of the request, which leads to the REST API request failing to address the LSI table.

The first patch in this PR fixes the bug by using a new Seastar API introduced in https://github.com/scylladb/seastar/pull/2125 that does the URL decoding as appropriate. The second patch in the PR is a new test for this bug, which fails without the fix, and passes afterwards.

Fixes #5883.

Closes scylladb/scylladb#18286

* github.com:scylladb/scylladb:
  test/alternator: test addressing LSI using REST API
  REST API: stop using deprecated, buggy, path parameter
2024-05-09 08:26:43 +03:00
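The encoding issue behind 0438febdc9 is easy to reproduce with Python's standard library. The sketch below only demonstrates percent-encoding of the LSI table name from the commit message; `decode_path_param` is a hypothetical name and is unrelated to the Seastar API used by the actual fix.

```python
from urllib.parse import quote, unquote


def decode_path_param(raw):
    """Percent-decode a path parameter before using it as a table name."""
    return unquote(raw)


# A client that URL-encodes "!" sends %21 in the request path.
encoded = quote("basename:!lsiname", safe=":")
decoded = decode_path_param(encoded)
```

Without the decoding step the server would look for a table literally named `basename:%21lsiname`, which does not exist.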
Yaniv Michael Kaul
124064844f docs/dev/object_stroage.md: convert example AWS keys to be more innocent
Someone thought that they actually represent real keys (the 'EXAMPLE' in their name was not enough).
Converted them to be as clear as can be, example data.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#18565
2024-05-09 08:26:43 +03:00
Asias He
46269a99d8 repair: Add ranges_parallelism option support for tablet
The ranges_parallelism option is introduced in commit 9b3fd9407b.
Currently, this option works for vnode table repair only.

This patch enables it for tablet repair, since it is useful for
tablet repair too.

Fixes #18383

Closes scylladb/scylladb#18385
2024-05-09 08:26:43 +03:00
Benny Halevy
0156e97560 storage_proxy: cas: reject for tablets-enabled tables
Currently, LWT is not supported with tablets.
In particular the interaction between paxos and tablet
migration is not handled yet.

Therefore, it is better to outright reject LWT queries
for tablets-enabled tables rather than support them
in a flaky way.

This commit also marks tests that depend on LWT
as expected to fail.

Fixes scylladb/scylladb#18066

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18103
2024-05-09 08:26:43 +03:00
Patryk Jędrzejczak
053a2893cf raft topology: join_token_ring: prevent shutdown hangs
Shutdown of a bootstrapping node could hang on
`_topology_state_machine.event.when()` in
`wait_for_topology_request_completion`. It caused
scylladb/scylladb#17246 and scylladb/scylladb#17608.

On a normal node, `wait_for_group0_stop` would prevent it, but this
function won't be called before we join group 0. Solve it by adding
a new subscriber to `_abort_source`.

Additionally, trigger `_group0_as` to prevent other hang scenarios.

Note that if both the new subscriber and `wait_for_group0_stop` are
called, nothing will break. `abort_source::request_abort` and
`condition_variable::broken` can be called multiple times.

The raft-based topology is moved out of experimental in 6.0, no need
to backport the patch.

Fixes scylladb/scylladb#17246
Fixes scylladb/scylladb#17608

Closes scylladb/scylladb#18549
2024-05-09 08:26:43 +03:00
Botond Dénes
96a7ed7efb Merge 'sstables: add dead row count when issuing warning to system.large_partitions' from Ferenc Szili
This is the second half of the fix for issue #13968. The first half is already merged with PR #18346

Scylla issues warnings for partitions containing more rows than a configured threshold. The warning is issued by inserting a row into the `system.large_partitions` table. This row contains the information about the partition for which the warning is issued: keyspace, table, sstable, partition key and size, compaction time and the number of rows in the partition. A previous PR #18346 also added range tombstone count to this row.

This change adds a new counter for dead rows to the large_partitions table.

This change also adds cluster feature protection for writing into these new counters. This is needed in case a cluster is in the process of being upgraded to this new version, after which an upgraded node writes data with the new schema into `system.large_partitions`, and finally a node is then rolled back to an old version. This node will then revert the schema to the old version, but the written sstables will still contain data with the new counters, causing any readers of this table to throw errors when they encounter these cells.

This is an enhancement, and backporting is not needed.

Fixes #13968

Closes scylladb/scylladb#18458

* github.com:scylladb/scylladb:
  sstable: added test for counting dead rows
  sstable: added docs for system.large_partitions.dead_rows
  sstable: added cluster feature for dead rows and range tombstones
  sstable: write dead_rows count to system.large_partitions
  sstable: added counter for dead rows
2024-05-09 08:26:43 +03:00
David Garcia
d63d418ae3 docs: change "create an issue" github label to "type/documentation"
Closes scylladb/scylladb#18550
2024-05-09 08:26:43 +03:00
Kefu Chai
02be1e9309 .github: add clang-tidy workflow
clang-tidy is a tool provided by Clang to perform static analysis on
C++ source files. here, we are mostly interested in using its
https://clang.llvm.org/extra/clang-tidy/checks/bugprone/use-after-move.html
check to reveal the potential issues.

this workflow is added to run clang-tidy when building the tree, so
that the warnings from clang-tidy can be noticed by developers.

a dedicated action is added so other github workflow can reuse it to
setup the building environment in an ubuntu:jammy runner.

clang-tidy-matcher.json is added to annotate the change, so that the
warnings are more visible with github webpage.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18342
2024-05-09 08:26:43 +03:00
David Garcia
4a1b109641 docs: add swagger ui extension
Renders the API Reference from api/api-doc using Swagger UI 2.2.10.

address comments

Closes scylladb/scylladb#18253
2024-05-09 08:26:43 +03:00
Botond Dénes
c7c4964b1c tools/scylla-nodetool: make doc link version-specific
Generate documentation links such that they point to the documentation
page appropriate to the current product (open-source or enterprise) and
version. The links are generated by a new function and injected into the
description of each nodetool command via fmt::format().
2024-05-08 09:41:18 -04:00
Botond Dénes
2d1e938849 release: introduce doc_link()
Allows generating documentation links that are appropriate for the
current product (open-source or enterprise) and version.
To be used in the next patch to make scylla-nodetool's documentation
links product and version appropriate.
2024-05-08 09:41:17 -04:00
Botond Dénes
9d2156bd8a build: pass scylla product to release.cc
In the form of -DSCYLLA_PRODUCT. To be used in the next patch.
2024-05-08 09:40:24 -04:00
Kamil Braun
4dcae66380 Merge 'test: {auth,topology}: use manager.rolling_restart' from Piotr Dulikowski
Instead of performing a rolling restart by calling `restart` in a loop over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other nodes see the currently processed node as up or down before proceeding to the next step. Not doing so may lead to surprising behavior.

In particular, in scylladb/scylladb#18369, a test failed shortly after restarting three nodes. Because nodes were restarted one after another too fast, when the third node was restarted it didn't send a notification to the second node because it still didn't know that the second node was alive. This led the second node to notice that the third node restarted by observing that it incremented its generation in gossip (it restarted too fast to be marked as down by the failure detector). In turn, this caused the second node to send "third node down" and "third node up" notifications to the driver in a quick succession, causing it to drop and reestablish all connections to that node. However, this happened _after_ rolling upgrade finished and _after_ the test logic confirmed that all nodes were alive. When the notifications were sent to the driver, the test was executing some statements necessary for the test to pass - as they broke, the test failed.

Fixes: scylladb/scylladb#18369

Closes scylladb/scylladb#18379

* github.com:scylladb/scylladb:
  test: get rid of server-side server_restart
  test: util: get rid of the `restart` helper
  test: {auth,topology}: use manager.rolling_restart
2024-05-08 09:45:08 +02:00
Piotr Dulikowski
180cb7a2b9 storage_service: notify lifecycle subs only after token metadata update
Currently, in raft mode, when raft topology is reloaded from disk or a
notification is received from gossip about an endpoint change, token
metadata is updated accordingly. While updating token metadata we detect
whether some nodes are joining or are leaving and we notify endpoint
lifecycle subscribers if such an event occurs. These notifications are
fired _before_ we finish updating token metadata and before the updated
version is globally available.

This behavior, for "node leaving" notifications specifically, was not
present in legacy topology mode. Hinted handoff depends on token
metadata being updated before it is notified about a leaving node (we
had a similar issue before: scylladb/scylladb#5087, and we fixed it by
enforcing this property). Because this is not true right now for raft
mode, this causes the hint draining logic not to work properly - when a
node leaves the cluster, there should be an attempt to send out hints
for that node, but instead hints are not sent out and are kept on disk.

In order to fix the issue with hints, postpone notifying endpoint
lifecycle subscribers about joined and left nodes until after the final
token metadata is computed and replicated to all shards.
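
The ordering described above can be sketched in a few lines of Python. This is only a toy model: `Topology`, `apply` and the subscriber callback signature are hypothetical names chosen for illustration and do not correspond to the storage_service code. It shows the membership being published first and subscribers being notified afterwards, so a "left" handler never sees the departed node in the metadata.

```python
class Topology:
    """Toy stand-in for token metadata plus lifecycle subscribers (hypothetical)."""

    def __init__(self, members=()):
        self.members = set(members)
        self.subscribers = []

    def apply(self, new_members):
        new_members = set(new_members)
        joined = new_members - self.members
        left = self.members - new_members
        # Publish the final membership first ...
        self.members = new_members
        # ... and only then tell subscribers what changed.
        for notify in self.subscribers:
            for node in sorted(joined):
                notify("joined", node, self)
            for node in sorted(left):
                notify("left", node, self)


observed = []
topology = Topology({"n1", "n2", "n3"})
# Record whether the node is still a member at notification time.
topology.subscribers.append(
    lambda event, node, t: observed.append((event, node, node in t.members)))
topology.apply({"n1", "n2"})
```

A subscriber that drains hints for a leaving node would, in this model, always run against the updated membership.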

Fixes: scylladb/scylladb#17023

Closes scylladb/scylladb#18377
2024-05-08 09:40:44 +02:00
Kamil Braun
03818c4aa9 direct_failure_detector: increase ping timeout and make it tunable
The direct failure detector design is simplistic. It sends pings
sequentially and times out listeners that reached the threshold (i.e.
didn't hear from a given endpoint for too long) in-between pings.

Given the sequential nature, the previous ping must finish so the next
ping can start. We timeout pings that take too long. The timeout was
hardcoded and set to 300ms. This is too low for wide-area setups --
latencies across the Earth can indeed go up to 300ms. 3 subsequent timed
out pings to a given node were sufficient for the Raft listener to "mark
server as down" (the listener used a threshold of 1s).

Increase the ping timeout to 600ms which should be enough even for
pinging the opposite side of Earth, and make it tunable.

Increase the Raft listener threshold from 1s to 2s. Without the
increased threshold, one timed out ping would be enough to mark the
server as down. Increasing it to 2s requires 3 timed out pings which
makes it more robust in presence of transient network hiccups.

In the future we'll most likely want to decrease the Raft listener
threshold again, if we use Raft for data path -- so leader elections
start quickly after leader failures. (Faster than 2s). To do that we'll
have to improve the design of the direct failure detector.

Ref: scylladb/scylladb#16410
Fixes: scylladb/scylladb#16607

---

I tested the change manually using `tc qdisc ... netem delay`, setting
network delay on local setup to ~300ms with jitter. Without the change,
the result is as observed in scylladb/scylladb#16410: interleaving
```
raft_group_registry - marking Raft server ... as dead for Raft groups
raft_group_registry - marking Raft server ... as alive for Raft groups
```
happening once every few seconds. The "marking as dead" happens whenever
we get 3 subsequent failed pings, which happens with a certain (high)
probability depending on the latency jitter. Then as soon as we get a
successful ping, we mark server back as alive.

With the change, the phenomenon no longer appears.

Closes scylladb/scylladb#18443
2024-05-07 23:40:23 +02:00
Anna Stuchlik
98367cb6a1 doc: Snitch switch is not supported with tablets
This commit adds the tablets-related limitation:
if you use tablets, then changing snitch is not supported

Refs:https://github.com/scylladb/scylladb/issues/17513
See: https://github.com/scylladb/scylladb/issues/17513#issuecomment-2022552677

Closes scylladb/scylladb#18548
2024-05-07 17:26:05 +02:00
Pavel Emelyanov
677e80a4d5 table: Coroutinize table::delete_sstables_atomically()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18499
2024-05-07 17:10:28 +02:00
Kamil Braun
53443f566a Merge 'Coroutinize generic_server's listen() method' from Pavel Emelyanov
It needs some local naming cleanup, but otherwise it's pretty simple

Closes scylladb/scylladb#18510

* github.com:scylladb/scylladb:
  generic_server: Fix indentation after previous patch
  generic_server: Coroutinize listen() method
  generic_server: Rename creds argument to builder
2024-05-07 17:08:59 +02:00
Ferenc Szili
60bf846f68 sstable: added test for counting dead rows 2024-05-07 15:44:33 +02:00
Ferenc Szili
8e9771d010 sstable: added docs for system.large_partitions.dead_rows 2024-05-07 15:44:33 +02:00
Avi Kivity
9b8dfb2b19 compaction: compaction_strategy validation: don't rely on optional<> formatting
std::optional formatting changed while moving from the home-grown formatter to
the fmt provided formatter; don't rely on it for user visible messages.

Here, the optional formatted is known to be engaged, so just print it.

Closes scylladb/scylladb#18533
2024-05-07 12:02:33 +03:00
Kefu Chai
7e578ae964 message: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18527
2024-05-07 11:59:36 +03:00
Raphael S. Carvalho
570e3f8df0 compaction: exclude expired sstables from calculation of base timestamps
base timestamps are fed into the sstable writer for calculating
delta, used by varints. given that expired ssts are bypassed, we
don't have to account for them. so if we are compacting fully expired and
new sstable together, we can save a bit by having a base ts closer
to the data actually written into output. also I wanted to move
the calculation into the loop in setup(), to avoid two iterations
over input set that can have even more than 1k elements.
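
A minimal sketch of the idea, assuming the base timestamp is the minimum write timestamp over the inputs that actually contribute data. The `SSTable` record and `base_timestamp` helper are hypothetical and only illustrate the single pass that skips fully expired inputs.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    name: str
    min_timestamp: int
    fully_expired: bool


def base_timestamp(inputs):
    """Return the smallest timestamp among non-expired inputs, or None."""
    base = None
    for sst in inputs:  # one pass over the input set
        if sst.fully_expired:
            continue  # bypassed by compaction, so it contributes no data
        if base is None or sst.min_timestamp < base:
            base = sst.min_timestamp
    return base


inputs = [
    SSTable("old-expired", min_timestamp=100, fully_expired=True),
    SSTable("new", min_timestamp=9_000, fully_expired=False),
    SSTable("newer", min_timestamp=9_500, fully_expired=False),
]
```

With the expired input excluded, the base stays close to the timestamps that are written, which keeps the encoded deltas small.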

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18504
2024-05-07 08:43:50 +03:00
Raphael S. Carvalho
2d9142250e Fix flakiness in test_tablet_load_and_stream due to premature gossiper abort on shutdown
Until https://github.com/scylladb/scylladb/issues/15356 is fixed, this
will be handled by explicitly closing the connection, so if scylla fails
to update gossiper state due to premature abort on shutdown, then we
won't be stuck in an endless reconnection attempt (later through
heartbeats (30s interval)), causing the test to timeout.

Manifests in scylla logs like this:
gossip - failure_detector_loop: Got error in the loop, live_nodes={127.147.5.10, 127.147.5.16}: seastar::sleep_aborted (Sleep is aborted)
gossip - failure_detector_loop: Finished main loop
migration_manager - stopping migration service
storage_service - Shutting down native transport server
gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
cql_server_controller - CQL server stopped
...
gossip - My status = NORMAL
gossip - Announcing shutdown
gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
gossip - Sending a GossipShutdown to 127.147.5.10 with generation 1714449924
gossip - Sending a GossipShutdown to 127.147.5.16 with generation 1714449924
gossip - === Gossip round FAIL: seastar::abort_requested_exception (abort requested)

Refs #14746.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18484
2024-05-07 02:31:02 +02:00
Piotr Dulikowski
5459cfed6a Merge 'auth: don't run legacy migrations in auth-v2 mode' from Marcin Maliszkiewicz
We won't run:
- old pre auth-v1 migration code
- code creating auth-v1 tables

We will keep running:
- code creating default rows
- code creating auth-v1 keyspace (needed due to a cqlsh legacy hack:
  it errors when executing `list roles` or `list users` if
  there is no system_auth keyspace, but it does support the case when
  the expected tables are missing)

Fixes https://github.com/scylladb/scylladb/issues/17737

Closes scylladb/scylladb#17939

* github.com:scylladb/scylladb:
  auth: don't run legacy migrations on auth-v2 startup
  auth: fix indent in password_authenticator::start
  auth: remove unused service::has_existing_legacy_users func
2024-05-06 19:53:35 +02:00
Wojciech Mitros
8472c46c8a service_level_controller: coroutinize notify_service_level_removed
To avoid conflicts arising from the discrepancy between different
versions of the repository, use coroutines instead of continuations
in service_level_controller::notify_service_level_removed().

Closes scylladb/scylladb#18525
2024-05-06 14:20:49 +03:00
Piotr Dulikowski
92e5018ddb test: get rid of server-side server_restart
Restarting a node amounts to just shutting it down and then starting
again. There is no good reason to have a dedicated endpoint in the
ScyllaClusterManager for restarting when it can be implemented by
calling two endpoints in a sequence: stop and start - it's just code
duplication.

Remove the server_restart endpoint in ScyllaClusterManager and
reimplement it as two endpoint calls in the ManagerClient.
2024-05-06 12:54:53 +02:00
Piotr Dulikowski
8de2bda7ae test: util: get rid of the restart helper
We already have `ManagerClient.server_restart`, which can be used in its
place.
2024-05-06 12:24:40 +02:00
Piotr Dulikowski
897e603bf0 test: {auth,topology}: use manager.rolling_restart
Instead of performing a rolling restart by calling `restart` in a loop
over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other
nodes see the currently processed node as up or down before proceeding
to the next step. Not doing so may lead to surprising behavior.
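
The sequencing can be illustrated with a small Python sketch. `FakeCluster` and its methods are hypothetical stand-ins that only record the order of steps; they are not the test framework's `ManagerClient` API.

```python
class FakeCluster:
    """Records the order of operations instead of touching real nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.log = []

    def stop(self, node):
        self.log.append(("stop", node))

    def start(self, node):
        self.log.append(("start", node))

    def wait_others_see(self, node, state):
        # A real implementation would poll every other node here.
        self.log.append(("wait_" + state, node))


def rolling_restart(cluster):
    for node in cluster.nodes:
        cluster.stop(node)
        cluster.wait_others_see(node, "down")  # do not proceed too early
        cluster.start(node)
        cluster.wait_others_see(node, "up")    # before touching the next node


cluster = FakeCluster(["n1", "n2"])
rolling_restart(cluster)
```

The two waits are what a plain loop over `restart` lacks: without them the next node can be stopped before its peers have registered the previous state change.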

In particular, in scylladb/scylladb#18369, a test failed shortly after
restarting three nodes. Because nodes were restarted one after another
too fast, when the third node was restarted it didn't send a
notification to the second node because it still didn't know that the
second node was alive. This led the second node to notice that the third
node restarted by observing that it incremented its generation in gossip
(it restarted too fast to be marked as down by the failure detector). In
turn, this caused the second node to send "third node down" and "third
node up" notifications to the driver in a quick succession, causing it
to drop and reestablish all connections to that node. However, this
happened _after_ rolling upgrade finished and _after_ the test logic
confirmed that all nodes were alive. When the notifications were sent to
the driver, the test was executing some statements necessary for the
test to pass - as they broke, the test failed.

Fixes: scylladb/scylladb#18369
2024-05-06 12:24:40 +02:00
Kamil Braun
ccbb9f5343 Merge 'topology_coordinator: clear obsolete generations earlier' from Patryk Jędrzejczak
We want to clear CDC generations that are no longer needed
(because all writes are already using a new generation) so they
don't take space and are not sent during snapshot transfers
(see e.g. https://github.com/scylladb/scylladb/issues/17545).

The condition used previously was that we clear generations which
were closed (i.e., a new generation started at this time) more than
24h ago. This is a safe choice, but too conservative: we could
easily end up with a large number of obsolete generations if we
boot multiple nodes during 24h (which is especially easy to do
with tablets.)

Change this bound from 24h to `5s + ring_delay`. The choice is
explained in a comment in the code.
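
As a rough illustration of the new bound, the following Python sketch selects generations that were closed longer ago than `5s + ring_delay`. The function name, the dictionary layout and the 30-second ring delay used in the example are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def obsolete_generations(closed_at, now, ring_delay):
    """Return ids of generations closed more than 5s + ring_delay ago."""
    bound = timedelta(seconds=5) + ring_delay
    return sorted(gen for gen, closed in closed_at.items()
                  if now - closed > bound)


now = datetime(2024, 5, 6, 12, 0, 0)
closed_at = {
    "gen-a": now - timedelta(minutes=1),   # closed 60s ago
    "gen-b": now - timedelta(seconds=10),  # closed 10s ago
}
to_clear = obsolete_generations(closed_at, now, ring_delay=timedelta(seconds=30))
```

Under the previous 24h bound neither generation in this example would have been cleared.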

Additionally, improve `test_raft_snapshot_request` that would
become flaky after the change so it's not sensitive to changes
anymore.

The raft-based topology was experimental before 6.0, no need
to backport.

Ref: scylladb/scylladb#17545

Closes scylladb/scylladb#18497

* github.com:scylladb/scylladb:
  topology_coordinator: clear obsolete generations earlier
  test: test_raft_snapshot_request: improve the last assertion
  test: test_raft_snapshot_request: find raft leader after restart
  test: test_raft_snapshot_request: simplify appended_command
2024-05-06 12:03:33 +02:00
Kamil Braun
1a50a524e7 Merge 'topology_coordinator: compute cluster size correctly during upgrade' from Piotr Dulikowski
During upgrade to raft topology, information about service levels is copied from the legacy tables in system_distributed to the raft-managed tables of group 0. system_distributed has RF=3, so if the cluster has only one or two nodes we should use lower consistency level than ALL - and the current procedure does exactly that, it selects QUORUM in case of two nodes and ONE in case of only one node. The cluster size is determined based on the call to _gossiper.num_endpoints().

Despite its name, gossiper::num_endpoints() does not necessarily return the number of nodes in the cluster but rather the number of endpoint states in gossiper (this behavior is documented in a comment near the declaration of this function). In some cases, e.g. after gossiper-based nodetool remove, the state might be kept for some time after removal (3 days in this case).

The consequence of this is that gossiper::num_endpoints() might return more than the current number of nodes during upgrade, and that in turn might cause migration of data from one table to another to fail - causing the upgrade procedure to get stuck if there are only one or two nodes in the cluster.

In order to fix this, use token_metadata::get_all_endpoints() as a measure of the cluster size.
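
The effect of overcounting can be shown with a short Python sketch. The helper name and the sample node sets are hypothetical; only the ONE / QUORUM / ALL mapping comes from the description above.

```python
def pick_consistency_level(cluster_size):
    """Map cluster size to the consistency level used for the migration."""
    if cluster_size < 1:
        raise ValueError("cluster must have at least one node")
    if cluster_size == 1:
        return "ONE"
    if cluster_size == 2:
        return "QUORUM"
    return "ALL"


# A two-node cluster that still keeps gossip state for a removed node.
gossip_endpoint_states = {"n1", "n2", "n3-removed"}
token_owners = {"n1", "n2"}

from_gossip = pick_consistency_level(len(gossip_endpoint_states))
from_tokens = pick_consistency_level(len(token_owners))
```

Counting gossip endpoint states yields ALL for a cluster that really has two nodes, while counting token owners yields QUORUM.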

Fixes: scylladb/scylladb#18198

Closes scylladb/scylladb#18261

* github.com:scylladb/scylladb:
  test: topology: test that upgrade succeeds after recent removal
  topology_coordinator: compute cluster size correctly during upgrade
2024-05-06 11:06:09 +02:00
Piotr Dulikowski
64ba620dc2 Merge 'hinted handoff: Use host IDs instead of IPs in the module' from Dawid Mędrek
This pull request introduces host ID in the Hinted Handoff module. Nodes are now identified by their host IDs instead of their IPs. The conversion occurs on the boundary between the module and `storage_proxy.hh`, but aside from that, IPs have been erased.

The changes take into consideration that there might still be old hints, still identified by IPs, on disk – at start-up, we map them to host IDs if it's possible so that they're not lost.

Refs scylladb/scylladb#6403
Fixes scylladb/scylladb#12278

Closes scylladb/scylladb#15567

* github.com:scylladb/scylladb:
  docs: Update Hinted Handoff documentation
  db/hints: Add endpoint_downtime_not_bigger_than()
  db/hints: Migrate hinted handoff when cluster feature is enabled
  db/hints: Handle arbitrary directories in resource manager
  db/hints: Start using hint_directory_manager
  db/hints: Enforce providing IP in get_ep_manager()
  db/hints: Introduce hint_directory_manager
  db/hints/resource_manager: Update function description
  db/hints: Coroutinize space_watchdog::scan_one_ep_dir()
  db/hints: Expose update lock of space watchdog
  db/hints: Add function for migrating hint directories to host ID
  db/hints: Take both IP and host ID when storing hints
  db/hints: Prepare initializing endpoint managers for migrating from IP to host ID
  db/hints: Migrate to locator::host_id
  db/hints: Remove noexcept in do_send_one_mutation()
  service: Add locator::host_id to on_leave_cluster
  service: Fix indentation
  db/hints: Fix indentation
2024-05-06 09:58:18 +02:00
Patryk Jędrzejczak
628d7e709e cdc: generation: fix retrieve_generation_data_v2
`system_keyspace::read_cdc_generation_opt` queries
`system.cdc_generations_v3`, which stores ids of CDC generations
as timeuuids. This function shouldn't be called with a normal uuid
(used by `system.cdc_generations_v2` to store generation ids).
Such a call would end with a marshaling error.

Before this patch,`retrieve_generation_data_v2` could call
`system_keyspace::read_cdc_generation_opt` with a normal uuid if
the generation wasn't present in `system.cdc_generations_v2`.
This logic caused a marshaling error while handling the
`check_and_repair_cdc_streams` request in the
`cdc_test.TestCdc.test_check_and_repair_cdc_streams_liveness` dtest.

This patch fixes the code being added in 6.0, no need to backport it.

Fixes scylladb/scylladb#18473

Closes scylladb/scylladb#18483
2024-05-06 09:12:47 +02:00
Kamil Braun
16846bf5ce Merge 'Do not serialize removenode operation with api lock if topology over raft is enabled' from Gleb
With topology over raft all operations are already serialized by the
coordinator anyway, so no need to synchronize removenode using api lock.
All others are still synchronized since they cannot be executed in
parallel for the same node anyway.

* 'gleb/17681-fix' of github.com:scylladb/scylla-dev:
  storage_service: do not take API lock for removenode operation if topology coordinator is enabled
  test: return file mark from wait_for that points after the found string
2024-05-06 09:03:03 +02:00
Benny Halevy
ebff5f5d70 everywhere: include seastar headers using angle brackets
seastar is an external library therefore it should
use the system-include syntax.

Closes scylladb/scylladb#18513
2024-05-06 10:00:31 +03:00
Kefu Chai
5ca9a46a91 test/lib: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18515
2024-05-05 23:31:48 +03:00
Kefu Chai
0b0e661a85 build: bring abseil submodule back
because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689,
the rebuilt abseil package provided by fedora has different settings
than the ones if the tree is built with the sanitizer enabled. this
inconsistency leads to a crash.

to address this problem, we have to reinstate the abseil submodule, so
we can build it with the same compiler options with which we build the
tree.

in this change

* Revert "build: drop abseil submodule, replace with distribution abseil"
* update CMake building system with abseil header include settings
* bump up the abseil submodule to the latest LTS branch of abseil:
  lts_2024_01_16
* update scylla-gdb.py to adapt to the new structure of
  flat_hash_map

This reverts commit 8635d24424.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18511
2024-05-05 23:31:09 +03:00
Kefu Chai
ea791919cf service/storage_proxy: drop unused operator<<
operator<<(ostream, paxos_response_handler) is not used anymore,
so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18520
2024-05-05 16:33:29 +03:00
Nadav Har'El
21557cfaa6 cql3: Fix invalid JSON parsing for JSON object with different key types
More than three years ago, in issue #7949, we noticed that trying to
set a `map<ascii, int>` from JSON input (i.e., using INSERT JSON or the
fromJson() function) fails - the ascii key is incorrectly parsed.
We fixed that issue in commit 75109e9519
but unfortunately, did not do our due diligence: We did not write enough
tests inspired by this bug, and failed to discover that actually we have
the same bug for many other key types, not just for "ascii". Specifically,
the following key types have exactly the same bug:

  * blob
  * date
  * inet
  * time
  * timestamp
  * timeuuid
  * uuid

Other types, like numbers or boolean worked "by accident" - instead of
parsing them as a normal string, we asked the JSON parser to parse them
again after removing the quotes, and because unquoted numbers and
unquoted true/false happen to work in JSON, this didn't fail.

The fix here is very simple - for all *native* types (i.e., not
collections or tuples), the encoding of the key in JSON is simply a
quoted string - and removing the quotes is all we need to do and there's
no need to run the JSON parser a second time. Only for more elaborate
types - collections and tuples - we need to run the JSON parser a
second time on the key string to build the more elaborate object.
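
The difference between the two code paths can be sketched with Python's standard `json` module. `decode_map_key` and the `NATIVE_KEY_TYPES` set are hypothetical and much simpler than the cql3 code: the native branch just returns the unquoted string that the type's own from-string conversion would then consume.

```python
import json

# Illustrative subset of native CQL types; not an exhaustive list.
NATIVE_KEY_TYPES = {
    "ascii", "text", "blob", "date", "inet", "time",
    "timestamp", "timeuuid", "uuid", "int", "boolean",
}


def decode_map_key(raw_key, key_type):
    """raw_key is the JSON object key with its outer quotes already removed."""
    if key_type in NATIVE_KEY_TYPES:
        # Native types: the unquoted string is all we need.
        return raw_key
    # Collections and tuples: the key string is itself JSON.
    return json.loads(raw_key)


def old_behaviour(raw_key):
    """Re-parse every key as JSON, as was done before the fix."""
    return json.loads(raw_key)
```

For example, `old_behaviour("42")` succeeds because an unquoted number is valid JSON, whereas `old_behaviour("127.0.0.1")` raises `json.JSONDecodeError`, which mirrors why numeric keys worked by accident and `inet` keys did not.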

This patch also includes tests for fromJson() reading a map with all
native key types, confirming that all the aforementioned key types
were broken before this patch, and all key types (including the numbers
and booleans which worked even before this patch) work with this patch.

Fixes #18477.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18482
2024-05-05 15:42:43 +03:00
Kefu Chai
f2b1c47dfc test/boost: s/boost::range::random_shuffle/std::ranges::shuffle/
`boost::range::random_shuffle()` uses the deprecated
`std::random_shuffle()` under the hood, so let's use
`std::ranges::shuffle()` which is available since C++20.

this change should address the warning like:

```
[312/753] CXX build/debug/test/boost/counter_test.o
In file included from test/boost/counter_test.cc:17:
/usr/include/boost/range/algorithm/random_shuffle.hpp:106:13: warning: 'random_shuffle<__gnu_cxx::__normal_iterator<counter_shard *, std::vector<counter_shard>>>' is deprecated: use 'std::shuffle' instead [-Wdeprecated-declarations]
  106 |     detail::random_shuffle(boost::begin(rng), boost::end(rng));
      |             ^
test/boost/counter_test.cc:507:27: note: in instantiation of function template specialization 'boost::range::random_shuffle<std::vector<counter_shard>>' requested here
  507 |             boost::range::random_shuffle(shards);
      |                           ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_algo.h:4489:5: note: 'random_shuffle<__gnu_cxx::__normal_iterator<counter_shard *, std::vector<counter_shard>>>' has been explicitly marked deprecated here
 4489 |     _GLIBCXX14_DEPRECATED_SUGGEST("std::shuffle")
      |     ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:1957:45: note: expanded from macro '_GLIBCXX14_DEPRECATED_SUGGEST'
 1957 | # define _GLIBCXX14_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT)
      |                                             ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:1941:19: note: expanded from macro '_GLIBCXX_DEPRECATED_SUGGEST'
 1941 |   __attribute__ ((__deprecated__ ("use '" ALT "' instead")))
      |                   ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18517
2024-05-05 15:39:57 +03:00
Pavel Emelyanov
99f9807f15 sstables: Remove operator<<(std::ostream&, const deletion_time&)
It's completely unused, likely in favor of recently added formatter
for the type in question.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18502
2024-05-05 14:43:27 +03:00
Pavel Emelyanov
ddd2623418 generic_server: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-03 12:29:08 +03:00
Pavel Emelyanov
a1daa7093e generic_server: Coroutinize listen() method
Straightforward. Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-03 12:28:42 +03:00
Pavel Emelyanov
030f1ef81c generic_server: Rename creds argument to builder
So that it doesn't clash with local creds variable that will appear in
this method after its coroutinization.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-03 12:27:37 +03:00
Kefu Chai
53b98a8610 test: string_format_test: disable test if {fmt} >= 10.0.0
{fmt} v10.0.0 introduces formatter for `std::optional`, so there
is no need to test it. furthermore the behavior of this formatter
is different from our homebrew one. so let's skip this test if
{fmt} v10.0.0 or up is used.

Refs #18508

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18509
2024-05-03 11:34:23 +03:00
Kefu Chai
3421e6dcc1 tools/scylla-nodetool: add formatter for char*
{fmt} version 10.0.0 has a regression which dropped the
formatter for `char *`, even though it does format `const char*`, as the
latter is convertible to `fmt::string_view`.

and this issue was addressed in 10.1.0 using 616a4937, which adds
the formatter for `Char *` back, where `Char` is a template parameter.

but we do need to print `vector<char*>`, so, to address the build
failure with {fmt} version 10.0.0, which is shipped along with
fedora 39, let's backport this formatter.

Fixes #18503
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18505
2024-05-02 23:25:24 +03:00
Avi Kivity
8de81f8f91 Merge 'Unstall merge topology snapshot' from Benny Halevy
This series adds facilities to gently convert canonical mutations back to mutations
and to gently make canonical mutations or freeze mutations in a seastar thread.

Those are used in storage_service::merge_topology_snapshot to prevent reactor stalls
due to large mutations, as seen in the test_add_many_nodes_under_load dtest.

Also, migration_manager migration_request was converted to use a seastar thread
to use the above facilities to prevent reactor stalls with large schema mutations,
e.g. with a large number of tables, and/or when reading tablets mutations with
a large number of tablets in a table.

perf-simple-query --write results:
Before:
```
median 79151.53 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53289 insns/op,        0 errors)
```
After:
```
median 79716.73 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53314 insns/op,        0 errors)
```

Closes scylladb/scylladb#18290

* github.com:scylladb/scylladb:
  storage_proxy: add mutate_locally(vector<frozen_mutation_and_schema>) method
  raft: group0_state_machine: write_mutations_to_database: freeze mutations gently
  database: apply_in_memory: unfreeze_gently large mutations
  storage_service: get_system_mutations: make_canonical_mutation_gently
  tablets: read_tablet_mutations: make_canonical_mutation_gently
  schema_tables: convert_schema_to_mutations: make_canonical_mutation_gently
  schema_tables: redact_columns_for_missing_features: get input mutation using rvalue reference
  storage_service: merge_topology_snapshot: freeze_gently
  canonical_mutation: add make_canonical_mutation_gently
  frozen_mutation: move unfreeze_gently to async_utils
  mutation: add freeze_gently
  idl-compiler: generate async serialization functions for stub members
  raft: group0_state_machine: write_mutations_to_database: use to_mutation_gently
  storage_service: merge_topology_snapshot: co_await to_mutation_gently
  canonical_mutation: add to_mutation_gently
  idl-compiler: emit include directive in generated impl header file
  mutation_partition: add apply_gently
  collection_mutation: improve collection_mutation_view formatting
  mutation_partition: apply_monotonically: do not support schema upgrade
  test/perf: report also log_allocations/op
2024-05-02 23:24:38 +03:00
Nadav Har'El
f604269f0a cql3, secondary index: consistently choose index to use in a query
When a table has secondary indexes on *multiple* columns, and several
such columns are used for filtering in a query, Scylla chooses one
of these indexes as the main driver of the query, and the second
column's restriction is implemented as filtering.

Before this patch, the index to use was chosen fairly randomly, based on
the order of the indexes in the schema. This order may be different in
different coordinators, and may even change across restarts on the same
coordinators. This is not only inconsistent, it can cause outright wrong
results when using *paging* and switching (or restarting) coordinators
in the middle of a paged scan... One coordinator saves one index's key
in the paging state, and then the other coordinator gets this paging
state and wrongly believes it is supposed to be a key of a *different*
index.

The fix in this patch is to pick the index suitable for the first
indexed column mentioned in the query. This has two benefits over
the situation before the patch:

1. The decision of which index to use no longer changes between
   coordinators or across restarts - it just depends on the schema
   and the specific query.

2. Different indexes can have different "specificity" so using one
   or the other can change the query's performance. After this patch,
   the user is in control over which index is used by changing the
   order of terms in the query. A curious user can use tracing to
   check which index was used to implement a particular query.
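
The selection rule described above can be sketched as follows. This is
only an illustration in standard C++, not the code from the patch:
`choose_index_column` and its parameters are hypothetical names, and the
real implementation works on CQL restrictions rather than plain strings.

```cpp
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the first column, in the order the
// restrictions appear in the query, that has a secondary index.
// The result depends only on the query and on the set of indexed
// columns, not on the order in which indexes are stored in the schema.
std::optional<std::string> choose_index_column(
        const std::vector<std::string>& restricted_columns_in_query_order,
        const std::unordered_set<std::string>& indexed_columns) {
    for (const auto& col : restricted_columns_in_query_order) {
        if (indexed_columns.count(col) != 0) {
            return col;
        }
    }
    return std::nullopt; // none of the restricted columns is indexed
}
```

Because the set of indexed columns is only queried for membership, two
coordinators that hold the indexes in a different order would still pick
the same column for the same query.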

An xfailing test we had for this issue no longer fails, so the "xfail"
marker is removed.

Fixes #7969

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#14450
2024-05-02 19:52:42 +02:00
Benny Halevy
890b890e36 storage_proxy: add mutate_locally(vector<frozen_mutation_and_schema>) method
Generalizing the ad-hoc implementation out of
group0_state_machine.write_mutations_to_database.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:42:58 +03:00
Benny Halevy
4ae5bbb058 raft: group0_state_machine: write_mutations_to_database: freeze mutations gently
write_mutations_to_database might need to handle
large mutations from system tables, so to prevent
reactor stalls, freeze the mutations gently
and call proxy.mutate_locally in parallel on
the individual frozen mutations, rather than
calling the vector<mutation> based entry point
that eventually freezes each mutation synchronously.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:37:06 +03:00
Benny Halevy
a9f157b648 database: apply_in_memory: unfreeze_gently large mutations
Prevent stalls coming from applying large
mutations in memory synchronously,
like the ones seen with the test_add_many_nodes_under_load
dtest:
```
  | | |   ++[5#2/2 44%] addr=0x1498efb total=256 count=3 avg=85:
  | | |   |             replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}::operator() at ./replica/memtable.cc:804
  | | |   |             (inlined by) logalloc::allocating_section::with_reclaiming_disabled<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&> at ././utils/logalloc.hh:500
  | | |   |             (inlined by) logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}>(logalloc::region&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&&)::{lambda()#1}::operator() at ././utils/logalloc.hh:527
  | | |   |             (inlined by) logalloc::allocating_section::with_reserve<logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}>(logalloc::region&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&&)::{lambda()#1}> at ././utils/logalloc.hh:471
  | | |   |             (inlined by) logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}> at ././utils/logalloc.hh:526
  | | |   |             (inlined by) replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator() at ./replica/memtable.cc:800
  | | |   |             (inlined by) with_allocator<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0> at ././utils/allocation_strategy.hh:318
  | | |   |             (inlined by) replica::memtable::apply at ./replica/memtable.cc:799
  | | |     ++[6#1/1 100%] addr=0x145047b total=1731 count=21 avg=82:
  | | |     |              replica::table::do_apply<frozen_mutation const&, seastar::lw_shared_ptr<schema const>&> at ./replica/table.cc:2896
  | | |       ++[7#1/1 100%] addr=0x13ddccb total=2852 count=32 avg=89:
  | | |       |              replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0::operator() at ./replica/table.cc:2924
  | | |       |              (inlined by) seastar::futurize<void>::invoke<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&> at ././seastar/include/seastar/core/future.hh:2032
  | | |       |              (inlined by) seastar::futurize_invoke<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&> at ././seastar/include/seastar/core/future.hh:2066
  | | |       |              (inlined by) replica::dirty_memory_manager_logalloc::region_group::run_when_memory_available<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0> at ./replica/dirty_memory_manager.hh:572
  | | |       |              (inlined by) replica::table::apply at ./replica/table.cc:2923
  | | |       ++           - addr=0x1330ba1:
  | | |       |              replica::database::apply_in_memory at ./replica/database.cc:1812
  | | |       ++           - addr=0x1360054:
  | | |       |              replica::database::do_apply at ./replica/database.cc:2032
```

This change has virtually no effect on small mutations
(up to 128KB in size).
build/release/scylla perf-simple-query --write --default-log-level=error --random-seed=1 -c 1
Before:
median 80092.06 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53291 insns/op,        0 errors)
After:
median 78780.86 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53311 insns/op,        0 errors)

To estimate the performance ramifications on
large mutations, I measured perf-simple-query --write
calling unfreeze_gently in all cases:
median 77411.26 tps ( 71.3 allocs/op,   8.0 logallocs/op,  14.3 tasks/op,   53280 insns/op,        0 errors)

Showing the allocations that moved out of logalloc
(in memtable::apply of frozen_mutation) into seastar
allocations (in unfreeze_gently) and <1% cpu overhead.
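
A minimal sketch of the size-based choice implied by the message above,
assuming the 128KB figure is used as a simple cut-off. The names
`apply_path`, `large_mutation_threshold` and `choose_apply_path` are
hypothetical, and whether the boundary is inclusive is an assumption:

```cpp
#include <cstddef>

// Which path a frozen mutation takes when applied in memory.
enum class apply_path { synchronous, gentle };

// Assumed cut-off, taken from the "up to 128KB" figure in the message.
constexpr std::size_t large_mutation_threshold = 128 * 1024;

// Small mutations keep the existing synchronous path; only mutations
// above the threshold pay for the yielding (gentle) unfreeze.
constexpr apply_path choose_apply_path(std::size_t frozen_size_bytes) {
    return frozen_size_bytes > large_mutation_threshold
            ? apply_path::gentle
            : apply_path::synchronous;
}
```

Keeping small mutations on the synchronous path is what would leave the
common case unchanged, consistent with the before/after numbers above.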

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:37:06 +03:00
Benny Halevy
7dd6a81026 storage_service: get_system_mutations: make_canonical_mutation_gently
and also unfreeze_gently the resulting frozen_mutations
to prevent the following stalls that were seen
with the test_add_many_nodes_under_load dtest:
```
  ++[1#1/58 5%] addr=0x16330e9 total=321 count=4 avg=80:
  |             utils::uleb64_express_encode_impl at ././utils/vle.hh:73
  |             (inlined by) utils::uleb64_express_encode<void (&)(char const*, unsigned long), void (&)(char const*, unsigned long)> at ././utils/vle.hh:82
  |             (inlined by) logalloc::region_impl::object_descriptor::encode at ./utils/logalloc.cc:1658
  |             (inlined by) logalloc::region_impl::alloc_small at ./utils/logalloc.cc:1743
  ++          - addr=0x1634cff:
  |             logalloc::region_impl::alloc at ./utils/logalloc.cc:2104
  | ++[2#1/2 83%] addr=0x116e22c total=321 count=4 avg=80:
  | |             managed_bytes::managed_bytes at ././utils/managed_bytes.hh:552
  | | ++[3#1/3 51%] addr=0x1551288 total=198 count=3 avg=66:
  | | |             compound_wrapper<clustering_key_prefix, clustering_key_prefix_view>::compound_wrapper at ././keys.hh:149
  | | |             (inlined by) prefix_compound_wrapper<clustering_key_prefix, clustering_key_prefix_view, clustering_key_prefix>::prefix_compound_wrapper at ././keys.hh:574
  | | |             (inlined by) clustering_key_prefix::clustering_key_prefix at ././keys.hh:865
  | | |             (inlined by) rows_entry::rows_entry at ./mutation/mutation_partition.hh:957
  | | ++          - addr=0x153f09f:
  | | |             allocation_strategy::construct<rows_entry, schema const&, position_in_partition_view&, seastar::bool_class<dummy_tag>&, seastar::bool_class<continuous_tag>&> at ././utils/allocation_strategy.hh:160
  | | ++          - addr=0x151409a:
  | | |             mutation_partition::append_clustered_row at ./mutation/mutation_partition.cc:719
  | | ++          - addr=0x14ab38f:
  | | |             partition_builder::accept_row at ././partition_builder.hh:57
  | | | ++[4#1/1 100%] addr=0x1579766 total=577 count=7 avg=82:
  | | | |              mutation_partition_view::do_accept<partition_builder> at ./mutation/mutation_partition_view.cc:212
  | | |   ++[5#1/2 56%] addr=0x14e737c total=321 count=4 avg=80:
  | | |   |             frozen_mutation::unfreeze at ./mutation/frozen_mutation.cc:116
  | | |   | ++[6#1/1 100%] addr=0x24fb47e total=1476 count=18 avg=82:
  | | |   | |              service::storage_service::get_system_mutations at ./service/storage_service.cc:6401
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:37:06 +03:00
Benny Halevy
3143f575e5 tablets: read_tablet_mutations: make_canonical_mutation_gently
To prevent reactor stalls due to large tablet
mutations (that can contain over 100,000 rows).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:37:06 +03:00
Benny Halevy
7f372dd9ae schema_tables: convert_schema_to_mutations: make_canonical_mutation_gently
To prevent stalls due to large schema mutations.
While at it, reserve the result canonical_mutation vector.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:37:05 +03:00
Benny Halevy
61dea98185 schema_tables: redact_columns_for_missing_features: get input mutation using rvalue reference
The function upgrades the input mutation
only in certain cases.  Currently it accepts
the input mutation by value, which may cause
an extraneous copy if the caller doesn't move
the mutation, as done in
`adjust_schema_for_schema_features`.

Getting an rvalue reference instead makes the
interface clearer.
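
The difference between the two signatures can be illustrated with a
small standard C++ sketch. `fake_mutation`, `redact_by_value` and
`redact_by_rvalue` are hypothetical stand-ins, not the functions touched
by this patch; the copy counter exists only to make the cost visible.

```cpp
#include <string>
#include <utility>
#include <vector>

// Stand-in for a mutation; counts copies so the cost is observable.
struct fake_mutation {
    std::vector<std::string> columns;
    static inline int copies = 0;

    explicit fake_mutation(std::vector<std::string> c) : columns(std::move(c)) {}
    fake_mutation(const fake_mutation& o) : columns(o.columns) { ++copies; }
    fake_mutation(fake_mutation&&) noexcept = default;
};

// By-value parameter: passing an lvalue silently copies it.
fake_mutation redact_by_value(fake_mutation m, bool redact) {
    if (redact && !m.columns.empty()) {
        m.columns.pop_back();
    }
    return m;
}

// Rvalue-reference parameter: the caller has to write std::move(m),
// so passing an lvalue by accident is a compile error, not a copy.
fake_mutation redact_by_rvalue(fake_mutation&& m, bool redact) {
    if (redact && !m.columns.empty()) {
        m.columns.pop_back();
    }
    return std::move(m);
}
```

With the second form, a call such as `redact_by_rvalue(m, true)` on a
named object does not compile, which makes the ownership transfer
explicit at every call site.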

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:37:05 +03:00
Benny Halevy
bc1985b8ce storage_service: merge_topology_snapshot: freeze_gently
Freezing large mutations synchronously may cause
reactor stalls, as seen in the test_add_many_nodes_under_load
dtest:
```
  ++[1#1/37 5%] addr=0x15b0bf total=99 count=2 avg=50: ?? ??:0
  | ++[2#1/2 67%] addr=0x15a331f total=66 count=1 avg=66:
  | |             bytes_ostream::write at ././bytes_ostream.hh:248
  | |             (inlined by) bytes_ostream::write at ././bytes_ostream.hh:263
  | |             (inlined by) ser::serialize_integral<unsigned int, bytes_ostream> at ././serializer.hh:203
  | |             (inlined by) ser::integral_serializer<unsigned int>::write<bytes_ostream> at ././serializer.hh:217
  | |             (inlined by) ser::serialize<unsigned int, bytes_ostream> at ././serializer.hh:254
  | |             (inlined by) ser::writer_of_column<bytes_ostream>::write_id at ./build/dev/gen/idl/mutation.dist.impl.hh:4680
  | | ++[3#1/1 100%] addr=0x159df71 total=132 count=2 avg=66:
  | | |              (anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}::operator() at ./mutation/mutation_partition_serializer.cc:99
  | | |              (inlined by) row::maybe_invoke_with_hash<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1} const, cell_and_hash const> at ./mutation/mutation_partition.hh:133
  | | |              (inlined by) row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}::operator() at ./mutation/mutation_partition.hh:152
  | | |              (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>::operator() at ././utils/compact-radix-tree.hh:1888
  | | |              (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::visit_slot<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>&> at ././utils/compact-radix-tree.hh:1560
  | | ++           - addr=0x159d84d:
  | | |              compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>&> at ././utils/compact-radix-tree.hh:1364
  | | |              (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, 
cell_and_hash const&)#1}, true>&, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> > at ././utils/compact-radix-tree.hh:799
  | | |              (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, 
cell_and_hash const&)#1}, true>&> at ././utils/compact-radix-tree.hh:807
  | |   ++[4#1/1 100%] addr=0x1596f4a total=329 count=5 avg=66:
  | |   |              compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true> > at ././utils/compact-radix-tree.hh:473
  | |   |              (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true> > at ././utils/compact-radix-tree.hh:1626
  | |   |              (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::walk<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}> at ././utils/compact-radix-tree.hh:1909
  | |   |              (inlined by) row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}> at ./mutation/mutation_partition.hh:151
  | |   |              (inlined by) (anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> > at ./mutation/mutation_partition_serializer.cc:97
  | |   |              (inlined by) write_row<ser::writer_of_deletable_row<bytes_ostream> > at ./mutation/mutation_partition_serializer.cc:168
  | |     ++[5#1/2 80%] addr=0x15a310c total=263 count=4 avg=66:
  | |     |             mutation_partition_serializer::write_serialized<ser::writer_of_mutation_partition<bytes_ostream> > at ./mutation/mutation_partition_serializer.cc:180
  | |     | ++[6#1/2 62%] addr=0x14eb60a total=428 count=7 avg=61:
  | |     | |             frozen_mutation::frozen_mutation(mutation const&)::$_0::operator()<ser::writer_of_mutation_partition<bytes_ostream> > at ./mutation/frozen_mutation.cc:85
  | |     | |             (inlined by) ser::after_mutation__key<bytes_ostream>::partition<frozen_mutation::frozen_mutation(mutation const&)::$_0> at ./build/dev/gen/idl/mutation.dist.impl.hh:7058
  | |     | |             (inlined by) frozen_mutation::frozen_mutation at ./mutation/frozen_mutation.cc:84
  | |     | | ++[7#1/1 100%] addr=0x14ed388 total=532 count=9 avg=59:
  | |     | | |              freeze at ./mutation/frozen_mutation.cc:143
  | |     | |   ++[8#1/2 74%] addr=0x252cf55 total=394 count=6 avg=66:
  | |     | |   |             service::storage_service::merge_topology_snapshot at ./service/storage_service.cc:763
```

This change uses freeze_gently to freeze
the cdc_generations_v2 mutations one at a time
to prevent the stalls reported above.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:37:05 +03:00
Benny Halevy
a016e1d05d canonical_mutation: add make_canonical_mutation_gently
Make a canonical mutation gently using an
async serialization function.
Similar to freeze_gently, yielding is considered
only in-between range tombstones and rows.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:37:04 +03:00
Benny Halevy
a126160d7e frozen_mutation: move unfreeze_gently to async_utils
unfreeze_gently doesn't have to be a method of
frozen_mutation.  It might as well be implemented as
a free function reading from a frozen_mutation
and preparing a mutation gently.

The logic will be used in a later patch
to make a canonical mutation directly from
a frozen_mutation instead of unfreezing it
and then converting it to a canonical_mutation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:27:56 +03:00
Benny Halevy
aa27ef8811 mutation: add freeze_gently
Allow yielding in between serializing of
range tombstones and rows to prevent reactor
stalls due to large mutations with many
rows or range tombstones.

Mutations that have many cells might still
stall but those are considered infrequent enough
to ignore for now.
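
The yield granularity described above can be sketched in plain C++ as
follows. This is not the Seastar-based implementation: `row`,
`serialize_gently` and the `maybe_yield` callback are hypothetical, and
in reactor code the callback would presumably be replaced by awaiting a
yield primitive inside a coroutine.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a row: a list of already-encoded cells.
using row = std::vector<std::string>;

// Serializes rows into `out`, offering a yield point after each row.
// Cells within a single row are written without any yield point, so a
// row with very many cells can still run for a long time uninterrupted.
// Returns the number of yield points offered.
std::size_t serialize_gently(const std::vector<row>& rows,
                             std::string& out,
                             const std::function<void()>& maybe_yield) {
    std::size_t yield_points = 0;
    for (const auto& r : rows) {
        for (const auto& cell : r) {
            out += cell;
            out += ';';
        }
        out += '\n';
        maybe_yield(); // the only place where other work may run
        ++yield_points;
    }
    return yield_points;
}
```

The number of yield points grows with the number of rows, not with the
number of cells, which is why wide rows remain a possible source of
stalls.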

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:27:56 +03:00
Benny Halevy
0da2940c72 idl-compiler: generate async serialization functions for stub members
To be used in a following patch for e.g.
mutation::freeze_gently.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:27:56 +03:00
Benny Halevy
504a9ab897 raft: group0_state_machine: write_mutations_to_database: use to_mutation_gently
Prevent stalls coming from writing large mutations
like the ones seen with the test_add_many_nodes_under_load
dtest:
```
  ++[1#11/11 6%] addr=0x15408f6 total=33 count=1 avg=33:
  |              managed_bytes::managed_bytes at ././utils/managed_bytes.hh:284
  |              (inlined by) atomic_cell_or_collection::atomic_cell_or_collection at ./mutation/atomic_cell_or_collection.hh:25
  |              (inlined by) cell_and_hash::cell_and_hash at ./mutation/mutation_partition.hh:73
  |              (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::emplace<atomic_cell_or_collection, seastar::optimized_optional<cell_hash> > at ././utils/compact-radix-tree.hh:1809
  ++           - addr=0x1518bae:
  |              row::append_cell at ./mutation/mutation_partition.cc:1344
  ++           - addr=0x14acb23:
  |              partition_builder::accept_row_cell at ././partition_builder.hh:70
  ++           - addr=0x157a6a6:
  |              mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor::accept_atomic_cell at ./mutation/mutation_partition_view.cc:218
  |              (inlined by) (anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor::operator() at ./mutation/mutation_partition_view.cc:138
  |              (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>::internal_visit<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>&> at /usr/include/boost/variant/variant.hpp:1028
  |              (inlined by) boost::detail::variant::visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>, void*, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type> > at /usr/include/boost/variant/detail/visitation_impl.hpp:117
  |              (inlined by) boost::detail::variant::visitation_impl_invoke<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>, void*, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::has_fallback_type_> at /usr/include/boost/variant/detail/visitation_impl.hpp:157
  |              (inlined by) boost::detail::variant::visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<3l>, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, boost::mpl::l_item<mpl_::long_<2l>, ser::collection_cell_view, boost::mpl::l_item<mpl_::long_<1l>, ser::unknown_variant_type, boost::mpl::l_end> > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>, void*, boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::has_fallback_type_> at /usr/include/boost/variant/detail/visitation_impl.hpp:238
  |              (inlined by) boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::internal_apply_visitor_impl<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>, void*> at /usr/include/boost/variant/variant.hpp:2337
  |              (inlined by) boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::internal_apply_visitor<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false> > at /usr/include/boost/variant/variant.hpp:2349
  |              (inlined by) boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::apply_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const> at /usr/include/boost/variant/variant.hpp:2393
  |              (inlined by) boost::apply_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor, boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>&> at /usr/include/boost/variant/detail/apply_visitor_unary.hpp:68
  |              (inlined by) (anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor> at ./mutation/mutation_partition_view.cc:158
  |              (inlined by) mutation_partition_view::do_accept<partition_builder> at ./mutation/mutation_partition_view.cc:224
  ++           - addr=0x151234a:
  |              mutation_partition::apply at ./mutation/mutation_partition.cc:476
  ++           - addr=0x14e1103:
  |              canonical_mutation::to_mutation at ./mutation/canonical_mutation.cc:76
  ++           - addr=0x283f9ee:
  |              service::write_mutations_to_database at ./service/raft/group0_state_machine.cc:124
  ++           - addr=0x283f36c:
  |              service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2::operator() at ./service/raft/group0_state_machine.cc:165
  ++           - addr=0x28395e3:
  |              std::__invoke_impl<seastar::future<void>, seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>, service::topology_change&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:61
  |              (inlined by) std::__invoke<seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>, service::topology_change&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:96
  |              (inlined by) std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<std::__detail::__variant::__deduce_visit_result<seastar::future<void> > (*)(seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>&&, std::variant<service::schema_change, service::broadcast_table_query, service::topology_change, service::write_mutations>&)>, std::integer_sequence<unsigned long, 2ul> >::__visit_invoke at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/variant:1032
  |              (inlined by) std::__do_visit<std::__detail::__variant::__deduce_visit_result<seastar::future<void> >, seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>, std::variant<service::schema_change, service::broadcast_table_query, service::topology_change, service::write_mutations>&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/variant:1793
  |              (inlined by) std::visit<seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>, std::variant<service::schema_change, service::broadcast_table_query, service::topology_change, service::write_mutations>&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/variant:1854
  |              (inlined by) service::group0_state_machine::merge_and_apply at ./service/raft/group0_state_machine.cc:156
  ++           - addr=0x284781e:
  |              service::group0_state_machine::apply at ./service/raft/group0_state_machine.cc:220
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:27:56 +03:00
Benny Halevy
574cb7d977 storage_service: merge_topology_snapshot: co_await to_mutation_gently
Prevent stalls from "unpacking" of large
canonical mutations seen with test_add_many_nodes_under_load
when called from `group0_state_machine::transfer_snapshot`:

```
  ++[1#1/44 14%] addr=0x395b2f total=569 count=6 avg=95: ?? ??:0
  | ++[2#1/2 56%] addr=0x3991e3 total=321 count=4 avg=80: ?? ??:0
  | ++          - addr=0x1587159:
  | |             std::__new_allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> >::allocate at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/new_allocator.h:147
  | |             (inlined by) std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> >::allocate at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/allocator.h:198
  | |             (inlined by) std::allocator_traits<std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > >::allocate at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/alloc_traits.h:482
  | |             (inlined by) std::_Vector_base<seastar::basic_sstring<signed char, unsigned int, 31u, false>, std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > >::_M_allocate at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_vector.h:378
  | |             (inlined by) std::vector<seastar::basic_sstring<signed char, unsigned int, 31u, false>, std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > >::reserve at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/vector.tcc:79
  | |             (inlined by) ser::idl::serializers::internal::vector_serializer<std::vector<seastar::basic_sstring<signed char, unsigned int, 31u, false>, std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > > >::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ././serializer_impl.hh:226
  | |             (inlined by) ser::deserialize<std::vector<seastar::basic_sstring<signed char, unsigned int, 31u, false>, std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > >, seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ././serializer.hh:264
  | |             (inlined by) ser::serializer<clustering_key_prefix>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}::operator()<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ./build/dev/gen/idl/keys.dist.impl.hh:31
  | ++          - addr=0x1587085:
  | |             seastar::with_serialized_stream<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>, ser::serializer<clustering_key_prefix>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}, void, void> at ././seastar/include/seastar/core/simple-stream.hh:646
  | |             (inlined by) ser::serializer<clustering_key_prefix>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ./build/dev/gen/idl/keys.dist.impl.hh:28
  | |             (inlined by) ser::deserialize<clustering_key_prefix, seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ././serializer.hh:264
  | |             (inlined by) ser::deletable_row_view::key() const::{lambda(auto:1&)#1}::operator()<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> const> at ./build/dev/gen/idl/mutation.dist.impl.hh:1268
  | | ++[3#1/1 100%] addr=0x15865a3 total=577 count=7 avg=82:
  | | |              seastar::memory_input_stream<bytes_ostream::fragment_iterator>::with_stream<ser::deletable_row_view::key() const::{lambda(auto:1&)#1}> at ././seastar/include/seastar/core/simple-stream.hh:491
  | | |              (inlined by) seastar::with_serialized_stream<seastar::memory_input_stream<bytes_ostream::fragment_iterator> const, ser::deletable_row_view::key() const::{lambda(auto:1&)#1}, void> at ././seastar/include/seastar/core/simple-stream.hh:639
  | | |              (inlined by) ser::deletable_row_view::key at ./build/dev/gen/idl/mutation.dist.impl.hh:1264
  | |   ++[4#1/1 100%] addr=0x157cf27 total=643 count=8 avg=80:
  | |   |              mutation_partition_view::do_accept<partition_builder> at ./mutation/mutation_partition_view.cc:212
  | |   ++           - addr=0x1516cac:
  | |   |              mutation_partition::apply at ./mutation/mutation_partition.cc:497
  | |     ++[5#1/1 100%] addr=0x14e4433 total=1765 count=22 avg=80:
  | |     |              canonical_mutation::to_mutation at ./mutation/canonical_mutation.cc:60
  | |       ++[6#1/2 98%] addr=0x2452a60 total=1732 count=21 avg=82:
  | |       |             service::storage_service::merge_topology_snapshot at ./service/storage_service.cc:761
  | |       ++          - addr=0x2858782:
  | |       |             service::group0_state_machine::transfer_snapshot at ./service/raft/group0_state_machine.cc:303
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:27:56 +03:00
Benny Halevy
c485ed6287 canonical_mutation: add to_mutation_gently
to_mutation_gently generates mutation from canonical_mutation
asynchronously using the newly introduced mutation_partition
accept_gently method.
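
As a rough analogy only (Python `asyncio`, not Seastar, with hypothetical names), the "gently" pattern amounts to doing the same per-row work while periodically yielding to the scheduler so other tasks are not stalled:

```python
import asyncio
from typing import Iterable, List, Tuple


async def build_gently(rows: Iterable[int], yield_every: int = 128) -> List[int]:
    """Process rows one by one, yielding to the event loop every `yield_every` rows."""
    out: List[int] = []
    for i, row in enumerate(rows, start=1):
        out.append(row * 2)  # stand-in for applying a single row
        if i % yield_every == 0:
            await asyncio.sleep(0)  # give other tasks a chance to run
    return out


async def demo() -> Tuple[List[int], int]:
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often another task gets scheduled while the build runs.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0)

    hb = asyncio.create_task(heartbeat())
    result = await build_gently(range(1000))
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return result, ticks
```

With a synchronous loop the heartbeat would not run at all until the whole build finished; here it gets scheduled at every yield point.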

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 19:27:54 +03:00
Benny Halevy
7f7e4616ab idl-compiler: emit include directive in generated impl header file
The generated implementation header file depends
on the generated header file for the types it uses.
Generate a respective #include directive to make it self-sufficient.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 18:50:16 +03:00
Benny Halevy
e1411f3911 mutation_partition: add apply_gently
To be used for freezing mutations or
making canonical mutations gently.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 18:45:24 +03:00
Benny Halevy
f625cd76a9 collection_mutation: improve collection_mutation_view formatting
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 18:42:41 +03:00
Benny Halevy
15e8ecb670 mutation_partition: apply_monotonically: do not support schema upgrade
Currently, if the input mutation_partition requires
schema upgrade, apply_monotonically always silently reverts to
being non-preemptible, even if the caller passed is_preemptible::yes.

To prevent that from happening, put the burden of upgrading
the mutation_partition schema on the caller, which is
today the apply() methods, which are synchronous anyhow.

With that, we reduce the proliferation of the
`apply_monotonically` overloads and keep only the
low level one (which could potentially be private as well,
as it's called only from within the mutation/ source files
and from tests)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 18:42:41 +03:00
Benny Halevy
e5ca65f78b test/perf: report also log_allocations/op
Currently perf-simple-query --write ignores
log allocations that happen on the memtable
apply path.

This change adds tracking and accounting
of the number of log allocations,
and reporting thereof.

For reference, here's the output of
build/release/scylla perf-simple-query --write --default-log-level=error --random-seed=1 -c 1
```
random-seed=1
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
78073.55 tps ( 59.4 allocs/op,  16.3 logallocs/op,  14.3 tasks/op,   52991 insns/op,        0 errors)
77263.59 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53282 insns/op,        0 errors)
79913.07 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53295 insns/op,        0 errors)
79554.32 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53284 insns/op,        0 errors)
79151.53 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53289 insns/op,        0 errors)

median 79151.53 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53289 insns/op,        0 errors)
median absolute deviation: 761.54
maximum: 79913.07
minimum: 77263.59
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 18:42:41 +03:00
Avi Kivity
e0d597348b Merge 'Remove sstable_directory::_sstable_dir member' from Pavel Emelyanov
Different sstable storage backends use slightly different notions of what an sstable location is. Filesystem storage knows it as the `/var/lib/data/ks/cf-uuid/state` path, while s3 storage keeps only the part of this path without the state (and even that's not very accurate, because the bucket prefix is missing, and the "/var/lib/data" prefix is not needed and should eventually be omitted). Nonetheless, the sstable_directory still keeps the filesystem-like path, while it's really only needed by the filesystem lister. This PR removes it.

Closes scylladb/scylladb#18496

* github.com:scylladb/scylladb:
  sstable_directory: Remove _sstable_dir member
  sstable_directory: Create sstable path with make_path() when logging
  sstable_directory: Use make_path to construct filesystem lister
  sstable_directory: Move some logging around
2024-05-02 17:52:21 +03:00
Patryk Jędrzejczak
b8e3bf4b09 topology_coordinator: clear obsolete generations earlier
We want to clear CDC generations that are no longer needed
(because all writes are already using a new generation) so they
don't take space and are not sent during snapshot transfers
(see e.g. scylladb/scylladb#17545).

The condition used previously was that we clear generations which
were closed (i.e., a new generation started at this time) more than
24h ago. This is a safe choice, but too conservative: we could
easily end up with a large number of obsolete generations if we
boot multiple nodes during 24h (which is especially easy to do
with tablets.)

Change this bound from 24h to `5s + ring_delay`. The choice is
explained in a comment in the code.
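
A minimal sketch of the changed bound, with hypothetical names and an assumed `ring_delay` value; the actual condition and its justification live in the code comment mentioned above and may differ in detail:

```python
from datetime import datetime, timedelta

OLD_BOUND = timedelta(hours=24)


def new_bound(ring_delay: timedelta) -> timedelta:
    # The new bound from the commit message: 5s + ring_delay.
    return timedelta(seconds=5) + ring_delay


def is_clearable(closed_at: datetime, now: datetime, bound: timedelta) -> bool:
    # A generation is "closed" at the time the next generation started.
    return now - closed_at > bound


now = datetime(2024, 5, 2, 12, 0, 0)
closed_at = now - timedelta(minutes=10)
ring_delay = timedelta(seconds=30)  # assumed value, for illustration only
```

Under these assumptions a generation closed ten minutes ago is kept by the old 24h bound but cleared by the new one.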

Also, prevent `test_cdc_generation_clearing` from being flaky by
firing the `increase_cdc_generation_leeway` error injection on
the server being the topology coordinator.

Ref: scylladb/scylladb#17545
2024-05-02 12:46:33 +02:00
Patryk Jędrzejczak
f61c50baa4 test: test_raft_snapshot_request: improve the last assertion
The last assertion in the test is very sensitive to changes. The
constant has already been increased from 0 to 1 due to flakiness.
The old comment explains it.

In the following patch, we change the CDC generation publisher so
that it clears the obsolete CDC generations earlier. This change
would make this assertion flaky again. After restarting the servers,
the new topology coordinator could remove the first generation if it
became obsolete. This operation appends a new entry to the log. If
it happened after triggering snapshot, the assertion could fail
with `2 <= 1`.

We could increase the constant again to unflake the test, but we
better improve it once and for all. We change the assertion so
that it's not sensitive to changes in the code based on Raft. The
explanation is in the new comment.
2024-05-02 12:46:33 +02:00
Patryk Jędrzejczak
44791a849e test: test_raft_snapshot_request: find raft leader after restart
Finding the new Raft leader after restart simplifies the test
and makes it easier to reason about. There are two improvements:
- we only need to wait until the leader appends a command, so
  the read barrier becomes unnecessary,
- we only need to trigger snapshot on the leader.

We also use the knowledge about the leader in the following patch.
2024-05-02 12:46:33 +02:00
Patryk Jędrzejczak
41198998c5 test: test_raft_snapshot_request: simplify appended_command
We shorten the code and remove the unused `log_size` variable.
2024-05-02 12:46:31 +02:00
Yaron Kaikov
2cf7cc1ea5 scylla_setup: Remove jmx and tools packages from being verified
Following b8634fb244, the machine image started to fail
with the following error:
```
10:44:59      googlecompute.gce: scylla-jmx package is not installed.
10:44:59  ==> googlecompute.gce: Traceback (most recent call last):
10:44:59  ==> googlecompute.gce:   File "/home/ubuntu/scylla_install_image", line 135, in <module>
10:44:59  ==> googlecompute.gce:     run('/opt/scylladb/scripts/scylla_setup --no-coredump-setup --no-sysconfig-setup --no-raid-setup --no-io-setup --no-ec2-check --no-swap-setup --no-cpuscaling-setup --no-ntp-setup', shell=True, check=True)
10:44:59  ==> googlecompute.gce:   File "/usr/lib/python3.10/subprocess.py", line 526, in run
10:44:59  ==> googlecompute.gce:     raise CalledProcessError(retcode, process.args,
10:44:59  ==> googlecompute.gce: subprocess.CalledProcessError: Command '/opt/scylladb/scripts/scylla_setup --no-coredump-setup --no-sysconfig-setup --no-raid-setup --no-io-setup --no-ec2-check --no-swap-setup --no-cpuscaling-setup --no-ntp-setup' returned non-zero exit status 1.
```

It seems we no longer need to verify that jmx and tools-java packages are installed.

Closes scylladb/scylladb#18494
2024-05-02 13:30:50 +03:00
Pavel Emelyanov
b8f9eeb82b sstable_directory: Remove _sstable_dir member
It's no longer in use.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-02 13:12:59 +03:00
Pavel Emelyanov
608762adda sstable_directory: Create sstable path with make_path() when logging
The sstable_directory::sstable_filename() should generate a name of an
sstable for log messages. It's not accurate, because it silently assumes
that the filename is on local storage, which might not be the case.
Fixing it is a large change, so for now replace _sstable_dir with explicit
call to make_path(). The change is idempotent, as _sstable_dir is
initialized with the result of make_path() call in constructor.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-02 13:12:59 +03:00
Pavel Emelyanov
07c1df575e sstable_directory: Use make_path to construct filesystem lister
The _sstable_dir is used currently, but it's initialized with
make_path() result anyway.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-02 13:12:59 +03:00
Pavel Emelyanov
ef98777b27 sstable_directory: Move some logging around
At the beginning of the .process() method there's a log message saying which path
and which storage is being processed. That's not really nice, because,
e.g. filesystem lister may skip processing quarantine directory. Also,
the registry lister doesn't list entries by their _sstable_dir, but
rather by its _location (spoiler: dir = location / state).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-02 13:08:28 +03:00
Ferenc Szili
90634b419c sstable: added cluster feature for dead rows and range tombstones
Previously, writing into system.large_partitions was done by calling
record_large_partition(). In order to write different data based on
the cluster feature flag, another level of indirection was added by
calling _record_large_partitions which is initialized to a lambda
which calls internal_record_large_partitions(). This function does
not record the values of the two new columns (dead_rows and
range_tombstones). After the cluster feature flag becomes true,
_record_large_partitions is set to a lambda which calls
internal_record_large_partitions_all_data() which records the values
of the two new columns.
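
The indirection described above can be sketched as follows. This is a Python illustration with hypothetical class, method, and column-dictionary names, not the C++ code from the patch:

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class LargeDataRecorder:
    def __init__(self) -> None:
        self.rows: List[Row] = []
        # Until the cluster feature is enabled, write only the old columns.
        self._record: Callable[[str, int, int, int], None] = self._record_basic

    def _record_basic(self, key: str, size: int, dead_rows: int, range_tombstones: int) -> None:
        # Ignores the two new values on purpose.
        self.rows.append({"partition_key": key, "partition_size": size})

    def _record_all_data(self, key: str, size: int, dead_rows: int, range_tombstones: int) -> None:
        self.rows.append({
            "partition_key": key,
            "partition_size": size,
            "dead_rows": dead_rows,
            "range_tombstones": range_tombstones,
        })

    def on_feature_enabled(self) -> None:
        # Swap the callable once every node in the cluster supports the new columns.
        self._record = self._record_all_data

    def record_large_partition(self, key: str, size: int, dead_rows: int, range_tombstones: int) -> None:
        self._record(key, size, dead_rows, range_tombstones)
```

Callers keep using `record_large_partition()` unchanged; only the callable behind it is replaced when the feature becomes available.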
2024-05-02 11:49:46 +02:00
Ferenc Szili
b06af5b2b9 sstable: write dead_rows count to system.large_partitions 2024-05-02 11:49:10 +02:00
Ferenc Szili
63e724c974 sstable: added counter for dead rows 2024-05-02 11:49:10 +02:00
Nadav Har'El
5558143014 test/alternator: test addressing LSI using REST API
The name of the Scylla table backing an Alternator LSI looks like
basename:!lsiname. Some REST API clients (including Scylla Manager)
when they send a "!" character in the REST API request may decide
to "URL encode" it - convert it to %21.

Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725)
Scylla's REST API server forgets to do the URL decoding, which leads
to the REST API request failing to address the LSI table.
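
A minimal sketch of the round trip, using Python's standard `urllib.parse` rather than Seastar's HTTP code; the helper names are hypothetical. It shows how a client that percent-encodes the table name produces `%21`, and why the server has to decode the path segment before looking the table up:

```python
from urllib.parse import quote, unquote


def encode_path_segment(name: str) -> str:
    # safe="" makes quote() percent-encode every reserved character,
    # including "!" (%21) and ":" (%3A).
    return quote(name, safe="")


def decode_path_segment(segment: str) -> str:
    # What the server side must do before using the segment as a table name.
    return unquote(segment)


lsi_table = "basename:!lsiname"
encoded = encode_path_segment(lsi_table)

# Without decoding, the server would look up the encoded string verbatim.
lookup_without_decoding = encoded
lookup_with_decoding = decode_path_segment(encoded)
```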

This patch introduces a test for this bug, which fails without the
Seastar issue being fixed, and passes afterwards (i.e., after the
previous patch that starts to use the new, fixed, Seastar API).

The test creates an LSI, uses the REST API to find its name and then
tries to call some REST API ("compaction_strategy") on this table name,
after deliberately URL-encoding it.

Refs #5883.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-05-02 12:33:54 +03:00
Nadav Har'El
1aacfdf460 REST API: stop using deprecated, buggy, path parameter
The API req->param["name"] to access parameters in the path part of the
URL was buggy - it forgot to do URL decoding and the result of our use
of it in Scylla was bugs like #5883 - where special characters in certain
REST API requests got botched up (encoded by the client, then not
decoded by the server).

The solution is to replace all uses of req->param["name"] by the new
req->get_path_param("name"), which does the decoding correctly.

Unfortunately we needed to change 104 (!) callers in this patch, but the
transformation is mostly mechanical and there are no functional changes in
this patch. Another set of changes was to bring req, not req->param, to
a few functions that want to get the path param.

This patch avoids the numerous deprecation warnings we had before, and
more importantly, it fixes #5883.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-05-02 12:33:46 +03:00
Jan Ciolek
59b7920b0b view_update_generator: add get_storage_proxy()
During view generation we would like to be able
to access information about the current state
of view update backlogs, but this information
is kept inside storage_proxy.

A reference to storage_proxy is kept inside view_update_generator,
so the easiest way to get access to it from the view update code
is by adding a public getter there.

There's already a similar getter for replica::database: get_db(),
so it's in line with the rest of the code.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2024-05-02 10:59:55 +02:00
Jan Ciolek
4c5cfc7683 storage_proxy: make view backlog getters public
Storage proxy maintains information about both local
and remote view update backlogs.

This information might also be useful outside of storage_proxy,
so let's expose the functions that give access to backlog information.

There aren't any implementation quirks that would make
it unsafe to make the functions public, the worst that
can happen is that someone causes a lot of atomic operations
by repeatedly calling get_view_update_backlog().

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2024-05-02 10:59:55 +02:00
Pavel Emelyanov
67736b5cd3 Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB"
This reverts commit 9c2a836607.
2024-05-02 08:16:14 +03:00
Pavel Emelyanov
d47053266b view: Abort pending view updates when draining
When view builder is drained (it now happens very early, but next patch
moves this into regular drain) it waits for all on-going view build
steps to complete. This includes waiting for any outstanding proxy view
writes to complete as well.

View writes in proxy have a very high timeout of 5 minutes, but they are
cancellable. However, cancelling of such writes happens in proxy's
drain_on_shutdown() call which, in turn, happens pretty late on
shutdown. Effectively, by the time it happens all view writes must have
completed already, so stop-time cancelling doesn't really work nowadays.

Next patch makes view builder drain happen a bit later during shutdown,
namely -- _after_ shutting down messaging service. When it happens that
late, non-working view writes cancellation becomes critical, as view
builder drain hangs for aforementioned 5 minutes. This patch explicitly
cancels all view writes when view builder stops.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-02 08:16:12 +03:00
Kefu Chai
f183f5aa80 Update seastar submodule
* seastar 2b43417d...b73e5e7d (11):
  > treewide: inherit from formatter<string_view> not formatter<std::string_view>
  > CMakeLists.txt: Apply CXX deprecated flags conditionally
  > tls: add assignment operator for gnutls_datum
  > tls: s/get0()/get()/
  > io_queue: do not reference moved variable
  > TLS: use helper function in get_distinguished_name & get_alt_name_information
  > TLS: Add support for TLS1.3 session tickets
  > iotune: ignore shards with id above max_iodepth
  > core/future: remove a template parameter from set_callback()
  > util: with_file_input_stream: always close file
  > core/sleep: Use more raii-sh aproach to maintain sleeper

Fixes #5181

Closes scylladb/scylladb#18491
2024-05-02 07:35:42 +03:00
Takuya ASADA
b8634fb244 dist: stop installing scylla-tools, scylla-jmx by default
Since we added native nodetool, we no longer need to install scylla-tools
and scylla-jmx. Drop them from the scylla metapackage and make them optional
packages.

Closes #18472

Closes scylladb/scylladb#18487
2024-05-01 22:15:40 +03:00
Kefu Chai
af5674211d redis/server.hh: suppress -Wimplicit-fallthrough from protocol_parser.hh
when compiling the tree with clang-18 and ragel 6.10, the compiler
warns like:

```
/usr/local/bin/cmake -E __run_co_compile --tidy="clang-tidy-18;--checks=-*,bugprone-use-after-move;--extra-arg-before=--driver-mode=g++" --source=/home/runner/work/scylladb/scylladb/redis/controller.cc -- /usr/bin/clang++-18 -DBOOST_NO_CXX98_FUNCTION_BASE -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -I/home/runner/work/scylladb/scylladb -I/home/runner/work/scylladb/scylladb/build/gen -I/home/runner/work/scylladb/scylladb/seastar/include -I/home/runner/work/scylladb/scylladb/build/seastar/gen/include -I/home/runner/work/scylladb/scylladb/build/seastar/gen/src -isystem /home/runner/work/scylladb/scylladb/cooking/include -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/runner/work/scylladb/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT redis/CMakeFiles/redis.dir/controller.cc.o -MF redis/CMakeFiles/redis.dir/controller.cc.o.d -o redis/CMakeFiles/redis.dir/controller.cc.o -c /home/runner/work/scylladb/scylladb/redis/controller.cc
error: too many errors emitted, stopping now [clang-diagnostic-error]
Error: /home/runner/work/scylladb/scylladb/build/gen/redis/protocol_parser.hh:110:1: error: unannotated fall-through between switch labels [clang-diagnostic-implicit-fallthrough]
  110 | case 1:
      | ^
/home/runner/work/scylladb/scylladb/build/gen/redis/protocol_parser.hh:110:1: note: insert 'FMT_FALLTHROUGH;' to silence this warning
  110 | case 1:
      | ^
      | FMT_FALLTHROUGH;
```

since we have `-Werror`, the warnings like this are considered as error,
hence the build fails. in order to address this failure, let's silence
this warning when including this generated header file.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18447
2024-05-01 18:47:24 +03:00
Kefu Chai
08d1362f80 utils/chunked_vector: fix some typos in comment
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18486
2024-05-01 16:38:43 +03:00
Nadav Har'El
4e78e2d506 test/cql-pytest, cdc: add test for what happens when log name is taken
In our CDC implementation, the CDC log table for table "xyz" is always
called "xyz_scylla_cdc_log". If this table name is taken, and the user
tries to create a table "xyz" with CDC enabled - or enable CDC on the
table "xyz", the creation/enabling should fail gracefully, with a clear
error message. This test verifies this.
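
The naming rule and the expected failure mode can be sketched as below. This is an illustration only: the exception type, message, and helper names are hypothetical and are not Scylla's actual error or API.

```python
CDC_LOG_SUFFIX = "_scylla_cdc_log"


class TableNameTaken(Exception):
    """Hypothetical error raised when the CDC log table name is already in use."""


def cdc_log_name(base_table: str) -> str:
    # The log table name is derived from the base table name.
    return base_table + CDC_LOG_SUFFIX


def enable_cdc(existing_tables: set, base_table: str) -> str:
    log_name = cdc_log_name(base_table)
    if log_name in existing_tables:
        # Fail with a clear message instead of crashing or overwriting the table.
        raise TableNameTaken(
            f"cannot enable CDC on {base_table!r}: table {log_name!r} already exists"
        )
    existing_tables.add(log_name)
    return log_name
```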

The new test passes - the code is already correct. I just wanted to
verify that it is (and to prevent future regressions).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18485
2024-05-01 14:46:19 +03:00
Pavel Emelyanov
5d992a4f01 proxy: Remove declaration of nonexisting view_update_write_response_handler class
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18417
2024-05-01 10:15:41 +03:00
Botond Dénes
65a385f5d0 Merge 'Relax the way view builder code checks if a table exists' from Pavel Emelyanov
There are two places that work around the db.column_family_exists() call with a fancy exception-catching lambda.
This PR makes things simpler.

Closes scylladb/scylladb#18441

* github.com:scylladb/scylladb:
  view: Open-code one line lambda checking if table exists
  view: Use non-throwoing check if a table exists
2024-05-01 10:14:58 +03:00
Kefu Chai
94ac0799d9 build: cmake: link scylla_tracing against scylla-main
because tracing/trace_keyspace_helper.cc references symbols
defined by table_helper, which is in turn provided by scylla-main,
we should link scylla_tracing against scylla-main.

otherwise we could have the following link failure:

```
./build/./tracing/trace_keyspace_helper.cc:214: error: undefined reference to 'table_helper::setup_keyspace(cql3::query_processor&, service::migration_manager&, std::basic_string_view<char, std::char_traits<char> >, seastar::basic_sstring<char, unsigned int, 15u, true>, service::query_state&, std::vector<table_helper*, std::allocator<table_helper*> >)'
./build/./tracing/trace_keyspace_helper.cc:396: error: undefined reference to 'table_helper::cache_table_info(cql3::query_processor&, service::migration_manager&, service::query_state&)'
./table_helper.hh:92: error: undefined reference to 'table_helper::insert(cql3::query_processor&, service::migration_manager&, service::query_state&, seastar::noncopyable_function<cql3::query_options ()>)'
./table_helper.hh:92: error: undefined reference to 'table_helper::insert(cql3::query_processor&, service::migration_manager&, service::query_state&, seastar::noncopyable_function<cql3::query_options ()>)'
./table_helper.hh:92: error: undefined reference to 'table_helper::insert(cql3::query_processor&, service::migration_manager&, service::query_state&, seastar::noncopyable_function<cql3::query_options ()>)'
./table_helper.hh:92: error: undefined reference to 'table_helper::insert(cql3::query_processor&, service::migration_manager&, service::query_state&, seastar::noncopyable_function<cql3::query_options ()>)'
clang++-18: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18455
2024-05-01 10:08:11 +03:00
Kefu Chai
f0d12df7fc reloc: create $BUILDDIR for getting its path
when building with CMake, there is a use case where the $BUILDDIR
is not created yet, when `reloc/build_rpm.sh` is launched. in order
to enable us to run this script without creating $BUILDDIR first, let's
create this directory first.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18464
2024-05-01 09:52:17 +03:00
Kefu Chai
8168f02550 raft_group_registry: do not use moved variable
clang-tidy warns like:
```
[628/713] Building CXX object service/CMakeFiles/service.dir/raft/raft_group_registry.cc.o
Warning: /home/runner/work/scylladb/scylladb/service/raft/raft_group_registry.cc:543:66: warning: 'id' used after it was moved [bugprone-use-after-move]
  543 |             auto& rate_limit = _rate_limits.try_get_recent_entry(id, std::chrono::minutes(5));
      |                                                                  ^
/home/runner/work/scylladb/scylladb/service/raft/raft_group_registry.cc:539:19: note: move occurred here
  539 |     auto dst_id = raft::server_id{std::move(id)};
      |                   ^
```

this is a false alarm, as the type of `id` is actually `utils::UUID`,
which is a struct enclosing two `int64_t` variables, and we don't
define a move constructor for `utils::UUID`. so the value of `id`
is intact after being moved away. but it is still confusing at
first glance, as we are indeed referencing a moved-away variable.

so in order to reduce the confusion and to silence the warning, let's
just not `std::move(id)`.
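
The reasoning above can be checked with a minimal sketch. `uuid_like` and `server_id_like` are hypothetical stand-ins for `utils::UUID` and `raft::server_id` (assumptions for illustration, not the real definitions); the only property relied on is that the type holds two `int64_t` values and has no user-declared move constructor, so "moving" it is a plain copy.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for utils::UUID: two 64-bit integers and no
// user-declared move constructor.
struct uuid_like {
    std::int64_t most_sig_bits;
    std::int64_t least_sig_bits;
};

// For a trivially copyable type, move construction is a bitwise copy.
static_assert(std::is_trivially_copyable_v<uuid_like>);

// Hypothetical stand-in for raft::server_id wrapping the UUID.
struct server_id_like {
    uuid_like id;
};

// Returns true if `id` still holds its original value after std::move().
inline bool value_survives_move(uuid_like id) {
    const uuid_like before = id;
    server_id_like dst{std::move(id)};  // copies; `id` is left untouched
    (void)dst;
    return id.most_sig_bits == before.most_sig_bits
        && id.least_sig_bits == before.least_sig_bits;
}
```

Dropping the `std::move()` keeps the behavior identical for such a type while removing the pattern that the check flags.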

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18449
2024-05-01 09:45:12 +03:00
Kefu Chai
bd0d246b57 tools/scylla-nodetool: implement the resetlocalschema command
Fixes #18468
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18470
2024-05-01 08:49:11 +03:00
Raphael S. Carvalho
b980634ff2 test: Verify tablet cleanup is properly retried on failure
Tests not only the coordinator's ability to retry on failure, but also
that the replica is able to properly continue cleanup of a storage
group from where it left off (when the failure happened), without
leaving any sstables behind.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18426
2024-04-30 19:27:17 +02:00
Raphael S. Carvalho
62b1cfa89c topology_coordinator: Fix synchronization of tablet split with other concurrent ops
Finalization of tablet split was only synchronizing with migrations, but
that's not enough, as we want to make sure that all processes like repair
complete first: they might hold an erm and would therefore be working
with a "stale" version of token metadata.

For synchronization to work properly, handling of tablet split finalize
will now take over the state machine, when possible, and execute a
global token metadata barrier to guarantee that update in topology by
split won't cause problems. Repair for example could be writing a
sstable with stale metadata, and therefore, could generate a sstable
that spans multiple tablets. We don't want that to happen, therefore
we need the barrier.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18380
2024-04-30 19:23:28 +02:00
Botond Dénes
525553aa41 SCYLLA-VERSION-GEN: warn against using - or _ in custom version names
Doing so is a pitfall that will make one waste a lot of time rebuilding
the packages, just because at the end it turns out that the version has
illegal characters in it. The author of this patch has certainly fallen
into this pitfall a lot of times.

Closes scylladb/scylladb#18429
2024-04-30 18:14:51 +03:00
Avi Kivity
ea15ddc7dc Merge 'Fix population of non-normal sstables from registry' from Pavel Emelyanov
On boot sstables are populated from normal location as well as from quarantine and staging. It turned out that sstables listed in registry (S3-backed ones) are not populated from non-normal states.

Closes scylladb/scylladb#18439

* github.com:scylladb/scylladb:
  test: Add test for how quarantined sstables registry entries are loaded
  sstable_directory: Use sstable location to initialize registry lister
2024-04-30 18:10:11 +03:00
Avi Kivity
329b135b5e Merge 'chunked_vector: fix use after free in emplace back' from Benny Halevy
Currently, push_back or emplace_back reallocate the last chunk
before constructing the new element.

If the arg passed to push_back/emplace_back is a reference to an
existing element in the vector, reallocating the last chunk will
invalidate the arg reference before it is used.

This patch changes the order when reallocating
the last chunk in reserve_for_emplace_back:
First, a new chunk_ptr is allocated.
Then, the back_element is emplaced in the
newly allocated array.
And only then, existing elements in the current
last chunk are migrated to the new chunk.
Eventually, the new chunk replaces the existing chunk.

If no reservation is required, the back element
is emplaced "in place" in the current last chunk.
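
The ordering described above can be sketched with a deliberately simplified single-chunk container. `tiny_vector` is a hypothetical illustration, not the real `utils::chunked_vector`: it uses assignment into default-constructed storage instead of placement construction, and it grows by doubling, both of which are assumptions made for brevity.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical simplified container; requires a default-constructible T.
template <typename T>
class tiny_vector {
    std::unique_ptr<T[]> _data;
    std::size_t _size = 0;
    std::size_t _capacity = 0;
public:
    template <typename... Args>
    T& emplace_back(Args&&... args) {
        if (_size == _capacity) {
            const std::size_t new_capacity = _capacity ? _capacity * 2 : 1;
            // 1. Allocate the new storage first.
            auto new_data = std::make_unique<T[]>(new_capacity);
            // 2. Build the new back element while the old storage is still
            //    alive, so `args` may safely refer to an existing element.
            new_data[_size] = T(std::forward<Args>(args)...);
            // 3. Only then migrate the existing elements.
            for (std::size_t i = 0; i < _size; ++i) {
                new_data[i] = std::move(_data[i]);
            }
            // 4. Finally replace the old storage with the new one.
            _data = std::move(new_data);
            _capacity = new_capacity;
        } else {
            // No reservation needed: build the element in the current storage.
            _data[_size] = T(std::forward<Args>(args)...);
        }
        return _data[_size++];
    }

    void push_back(const T& value) { emplace_back(value); }

    T& operator[](std::size_t i) { return _data[i]; }
    std::size_t size() const { return _size; }
};
```

With this order, a call like `v.push_back(v[0])` reads `v[0]` before the old storage is released, which is the situation the fix targets.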

Fixes scylladb/scylladb#18072

Closes scylladb/scylladb#18073

* github.com:scylladb/scylladb:
  test: chunked_managed_vector_test: add test_push_back_using_existing_element
  utils: chunked_vector: reserve_for_emplace_back: emplace before migrating existing elements
  utils: chunked_vector: push_back: call emplace_back
  utils: chunked_vector: define min_chunk_capacity
  utils: chunked*vector: use std::clamp
2024-04-30 18:09:04 +03:00
David Garcia
f62197ee1e docs: enable concurrent downloads
Downloads chunks of 10 CSV concurrently to speed up doc builds.

Closes scylladb/scylladb#18469
2024-04-30 16:13:40 +03:00
Raphael S. Carvalho
d7a01598ce tools: Make sstable shard-of efficient by loading minimum to compute owners
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18440
2024-04-30 16:10:58 +03:00
Gleb Natapov
f2b0a5e9e1 storage_service: do not take API lock for removenode operation if topology coordinator is enabled
The topology coordinator serializes operations internally, so there is no
need for an external lock.

Fixes: scylladb/scylladb#17681
2024-04-30 15:13:50 +03:00
Gleb Natapov
0a7101923c test: return file mark from wait_for that points after the found string
Returning the file mark allows the next search to start from the point
where the previous string was found.
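
The idea can be sketched outside of the test framework as follows. `find_after` is a hypothetical helper (the real `wait_for` lives in the Python test library and also waits for the log to grow); it only shows why returning the position just past the match lets consecutive searches skip earlier occurrences.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Searches `log` for `pattern`, starting at `from_mark`. On success returns
// the mark pointing just after the match, so that the next search can start
// there. Returns std::nullopt if the pattern is not found.
inline std::optional<std::size_t> find_after(std::string_view log,
                                             std::string_view pattern,
                                             std::size_t from_mark = 0) {
    const auto pos = log.find(pattern, from_mark);
    if (pos == std::string_view::npos) {
        return std::nullopt;
    }
    return pos + pattern.size();
}
```

Passing the returned mark back in finds the second occurrence of the same message instead of matching the first one again.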
2024-04-30 15:06:32 +03:00
Kefu Chai
3a1ceb96d7 utils: UUID_gen: include <atomic>
in UUID_gen.cc, we are using `std::atomic<int64_t>` in
`make_thread_local_node()`, but this template is not defined by
any of the included headers. we should include the headers we use
so that the source file is self-contained.

when compiling on ubuntu:jammy with libstdc++-13, we have the following
error:
```
/usr/local/bin/cmake -E __run_co_compile --tidy="clang-tidy-18;--checks=-*,bugprone-use-after-move;--extra-arg-before=--driver-mode=g++" --source=/home/runner/work/scylladb/scylladb/utils/UUID_gen.cc -- /usr/bin/clang++-18 -DBOOST_ALL_NO_LIB -DBOOST_NO_CXX98_FUNCTION_BASE -DBOOST_REGEX_DYN_LINK -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -I/home/runner/work/scylladb/scylladb -I/home/runner/work/scylladb/scylladb/seastar/include -I/home/runner/work/scylladb/scylladb/build/seastar/gen/include -I/home/runner/work/scylladb/scylladb/build/seastar/gen/src -isystem /home/runner/work/scylladb/scylladb/cooking/include -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overl
Error: /home/runner/work/scylladb/scylladb/utils/UUID_gen.cc:29:33: error: implicit instantiation of undefined template 'std::atomic<long>' [clang-diagnostic-error]
   29 |     static std::atomic<int64_t> thread_id_counter;
      |                                 ^
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/shared_ptr_atomic.h:361:11: note: template is declared here
  361 |     class atomic;
      |           ^
```
so, in this change, we include `<atomic>` to address this
build failure.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18387
2024-04-30 09:07:22 +03:00
Kefu Chai
6a73c911e3 tools: lua_sstable_consumer.cc: be compatible with Lua 5.3's lua_resume()
in Lua 5.3, lua_resume() only accepts three parameters, while in Lua 5.4,
this function accepts four parameters. so in order to be compatible with
Lua 5.3, we should not pass the 4th parameter to this function.
a macro is defined to conditionally pass this parameter based on the
Lua's version.

see https://www.lua.org/manual/5.3/manual.html#lua_resume

Refs 5b5b8b3264
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18450
2024-04-30 09:06:25 +03:00
Botond Dénes
0ace90ad04 test: add test for cleaning up cached querier on tablet migration
Check that a cached querier, which exists prior to a migration, will be
cleaned up afterwards. This reproduces #18110.
The test fails before the fix for the above and passes afterwards.
2024-04-30 01:47:16 -04:00
Botond Dénes
64c817462e querier: allow injecting cache entry ttl by error injector
To allow making tests more robust by setting the TTL to a very large value
when the test relies on entries being present for a given time.
2024-04-30 01:47:16 -04:00
Botond Dénes
03995d9397 replica/table: cleanup_tablet(): clear inactive reads for the tablet
To avoid any resource surviving the cleanup, via some inactive read
pinning it. This can cause data resurrection if the tablet is later
migrated back and the pinned data source is added back to the tablet.
2024-04-30 01:47:16 -04:00
Botond Dénes
a062e3f650 replica/database: introduce clear_inactive_reads_for_tablet()
To be used on the tablet cleanup path, to clear any inactive read which
might be related to the cleaned-up tablet.
2024-04-30 01:44:03 -04:00
Botond Dénes
338af5055c replica/database: introduce foreach_reader_concurrency_semaphore
Currently we have a single method -- detach_column_family() -- which
does something with each semaphore. Soon there will be another one.
Introduce a method to do something with all semaphores, to make this
smoother. Enterprise has a different set of semaphores, and this will
reduce friction.
2024-04-30 01:43:56 -04:00
Botond Dénes
3c813fbb99 reader_concurrency_semaphore: add range param to evict_inactive_reads_for_table()
When the new optional parameter has a value, evict only inactive reads
whose ranges overlap with the provided range. The range for the inactive
read is provided in `register_inactive_read()`. If the inactive read has
no range, overlap is assumed and the read is evicted.
This will be used to evict all inactive reads that could potentially use
a cleaned-up tablet.
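
A minimal sketch of this selection rule, under stated assumptions: `token_range`, `inactive_read` and `evict_inactive_reads` are hypothetical simplifications (closed integer intervals, a plain vector of reads), not the semaphore's real types, and evicting everything when no filter range is given is assumed to match the pre-existing per-table behavior.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical closed interval standing in for a partition range.
struct token_range {
    long start;
    long end;
};

inline bool overlaps(const token_range& a, const token_range& b) {
    return a.start <= b.end && b.start <= a.end;
}

struct inactive_read {
    int id;
    std::optional<token_range> range;  // empty: the read's range is unknown
};

// Evicts matching reads from `reads` and returns how many were evicted.
inline std::size_t evict_inactive_reads(std::vector<inactive_read>& reads,
                                        const std::optional<token_range>& filter) {
    auto should_evict = [&](const inactive_read& r) {
        if (!filter) {
            return true;   // assumption: no filter means evict all reads
        }
        if (!r.range) {
            return true;   // unknown range: overlap is assumed
        }
        return overlaps(*r.range, *filter);
    };
    const auto first_evicted = std::remove_if(reads.begin(), reads.end(), should_evict);
    const auto evicted = static_cast<std::size_t>(std::distance(first_evicted, reads.end()));
    reads.erase(first_evicted, reads.end());
    return evicted;
}
```

Treating a missing range as overlapping errs on the side of evicting too much, which is the safe direction when the goal is that nothing keeps using a cleaned-up tablet.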
2024-04-30 01:31:08 -04:00
Botond Dénes
9e7a957ffb reader_concurrency_semaphore: allow storing a range with the inactive reader
This allows specifying the range the inactive read is reading from. To
be used in the next patch to selectively evict inactive reads whose
range overlaps with a certain (tablet) range.
2024-04-30 01:31:08 -04:00
Botond Dénes
67684308d1 reader_concurrency_semaphore: avoid detach() in inactive_read_handle::abandon()
inactive_read_handle::abandon() evicts and destroys the inactive read,
so it is not left behind. Currently, while doing so, it triggers the
inactive_read's own version of abandon(): detach(). The two interact
badly when the inactive_read_handle stores the last permit
instance, causing a (so far benign) use-after-free. Prevent triggering
detach() to avoid this bad interaction altogether.
2024-04-30 01:31:08 -04:00
Piotr Dulikowski
35f456c483 Merge 'Extend ALTER TABLE ... DROP to allow specifying timestamp of column drop' from Michał Jadwiszczak
In order to correctly restore schema from `DESC SCHEMA WITH INTERNALS`, we need a way to drop a column with a timestamp in the past.

Example:
- table t(a int pk, b int)
- insert some data1
- drop column b
- add column b int
- insert some data2

If the sstables weren't compacted, after restoring the schema from description:
- we will lose column b in data2 if we simply do `ALTER TABLE t DROP b` and `ALTER TABLE t ADD b int`
- we will resurrect column b in data1 if we skip dropping and re-adding the column

Test for this: https://github.com/scylladb/scylla-dtest/pull/4122

Fixes #16482

Closes scylladb/scylladb#18115

* github.com:scylladb/scylladb:
  docs/cql: update ALTER TABLE docs
  test/cqlpytest: add test for prepared `ALTER TABLE ... DROP ... USING TIMESTAMP ?`
  test/cql-pytest: remove `xfail` from alter table with timestamp tests
  cql3/statements: extend `ALTER TABLE ... DROP` to allow specifying timestamp of column drop
  cql3/statements: pass `query_options` to `prepare_schema_mutations()`
  cql3/statements: add bound terms to alter table statement
  cql3/statements: split alter_table_statement into raw and prepared
  schema: allow to specify timestamp of dropped column
2024-04-29 14:05:05 +02:00
Piotr Dulikowski
dec652de9e test: topology: test that upgrade succeeds after recent removal
Adds a regression test for scylladb/scylladb#18198 - start a two node
cluster in legacy topology mode, use nodetool removenode on one of the
nodes, upgrade the remaining 1-node cluster and observe that it
succeeds.
2024-04-29 13:33:40 +02:00
Piotr Dulikowski
cb4a4f2caf topology_coordinator: compute cluster size correctly during upgrade
During upgrade to raft topology, information about service levels is
copied from the legacy tables in system_distributed to the raft-managed
tables of group 0. system_distributed has RF=3, so if the cluster has
only one or two nodes we should use a lower consistency level than ALL -
and the current procedure does exactly that, it selects QUORUM in case
of two nodes and ONE in case of only one node. The cluster size is
determined based on the call to _gossiper.num_endpoints().

Despite its name, gossiper::num_endpoints() does not necessarily return
the number of nodes in the cluster but rather the number of endpoint
states in gossiper (this behavior is documented in a comment near the
declaration of this function). In some cases, e.g. after gossiper-based
nodetool remove, the state might be kept for some time after removal (3
days in this case).

The consequence of this is that gossiper::num_endpoints() might return
more than the current number of nodes during upgrade, and that in turn
might cause migration of data from one table to another to fail -
causing the upgrade procedure to get stuck if there are only one or two
nodes in the cluster.

In order to fix this, use token_metadata::get_all_endpoints() as a
measure of the cluster size.
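
The effect of an overcounted cluster size can be illustrated with a small sketch. `pick_consistency_level` is a hypothetical helper that mirrors the selection described above (ONE for one node, QUORUM for two, ALL otherwise); it is not the coordinator's actual code.

```cpp
#include <cstddef>

enum class consistency_level { ONE, QUORUM, ALL };

// Chooses the consistency level for reading an RF=3 keyspace during the
// upgrade, based on how many nodes the cluster is believed to have.
inline consistency_level pick_consistency_level(std::size_t cluster_size) {
    if (cluster_size <= 1) {
        return consistency_level::ONE;
    }
    if (cluster_size == 2) {
        return consistency_level::QUORUM;
    }
    return consistency_level::ALL;
}
```

If a one-node cluster is reported as two because a removed node's endpoint state is still around, the helper picks QUORUM, which a single live replica cannot satisfy for RF=3. That is the failure mode a more accurate node count avoids.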

Fixes: scylladb/scylladb#18198
2024-04-29 13:26:29 +02:00
Takuya ASADA
af0c0ee8af configure.py: revert changing builddir as absolute path
In be3776ec2a, we changed outdir to an
absolute path.
This causes an "unknown target" error when we build Scylla using a relative
path such as "ninja build/dev/scylla", since the target name
becomes an absolute path.
Revert the change to be able to build with the relative path.

Also, change optimized_clang.sh to use relative path for --builddir,
since we reference "../../$builddir/SCYLLA-*-FILE" when we build
submodule, it won't work with absolute path.

Fixes #18321

Closes scylladb/scylladb#18338
2024-04-29 09:35:21 +03:00
Kefu Chai
4433d2e10e build: cmake: let iotune depends on config specific file
before this change, in order to build `${iotune_path}`, we use
the rule to build `app_iotune` but this target is built using
the default build type, see
https://cmake.org/cmake/help/latest/variable/CMAKE_DEFAULT_BUILD_TYPE.html#variable:CMAKE_DEFAULT_BUILD_TYPE
so, if we want to build `${iotune_path}` for a configuration
which is not listed as the first item in `CMAKE_CONFIGURATION_TYPES`,
we would end up copying a nonexistent file.

to address this issue, we override this behavior using
the `$<OUTPUT_CONFIG:...>` generator-expression, so that we
can depend on a non-unique path, and the file-level dependency
between ${iotune_path} and $<CONFIG>/iotune can be established.

see also
https://cmake.org/cmake/help/latest/generator/Ninja%20Multi-Config.html#custom-commands

Refs #2717

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18395
2024-04-29 09:06:39 +03:00
Kefu Chai
f03f69ad4f partition_version: move the base class in move ctor
before this change, `partition_version` uses a hand-crafted move
constructor. but it suffers from a warning from clang-tidy, which
believes there is a use-after-move issue, as the inner instance of
its parent class is constructed using
`anchorless_list_base_hook(std::move(pv))`, and its other member
variables are initialized like `_partition(std::move(pv._partition))`.

`std::move(pv)` does not do anything by itself, but *indicates* that `pv`
may be moved from. and what is actually moved away is only the part
belonging to its parent class. so this issue is benign.

but it's still annoying, as we need to tell the genuine issues
reported by clang-tidy from the false alarms. so we have several
options:

- stop using clang-tidy
- ignore this warning
- silence this warning using a NOLINT directive in a comment
- use another way to implement the move constructor

in this change, we just cast the moved instance to its
base class and move it instead. this should appease
clang-tidy.
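
a minimal sketch of the last option, assuming hypothetical types: `hook_base` stands in for `anchorless_list_base_hook` and `versioned` for `partition_version`. the real classes are more involved; only the shape of the move constructor is the point here.

```cpp
#include <string>
#include <utility>

// Hypothetical base class with its own move constructor.
struct hook_base {
    int link = 0;
    hook_base() = default;
    hook_base(hook_base&& other) noexcept
        : link(std::exchange(other.link, 0)) {}
};

// Hypothetical derived class with a hand-written move constructor.
struct versioned : hook_base {
    std::string payload;

    versioned(int l, std::string p) : payload(std::move(p)) { link = l; }

    versioned(versioned&& other) noexcept
        // Cast to the base and move only the base subobject, instead of
        // writing std::move(other), which reads as if all of `other`
        // were moved from before its members are used below.
        : hook_base(static_cast<hook_base&&>(other))
        , payload(std::move(other.payload)) {}
};
```

the cast makes it explicit that only the base part is consumed, so the later `std::move(other.payload)` refers to a member that has not been moved yet.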

Fixes #18354
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18359
2024-04-28 18:34:45 +02:00
Dawid Medrek
bf802e99eb docs: Update Hinted Handoff documentation
We briefly explain the process of migration
of Hinted Handoff to host IDs, the rationale
for it, consequences, and possible side effects.
2024-04-28 01:22:59 +02:00
Dawid Medrek
46ab22f805 db/hints: Add endpoint_downtime_not_bigger_than()
We add an auxiliary function checking if a node
hasn't been down for too long. Although
`gms::gossiper` already exposes a function
responsible for that, it requires that its
argument be an IP address. That's the reason
we add a new function.
2024-04-28 01:22:59 +02:00
Dawid Medrek
0ef8d67d32 db/hints: Migrate hinted handoff when cluster feature is enabled
These changes migrate hinted handoff to using
host ID as soon as the corresponding cluster
feature is enabled.

When a node starts, it defaults to creating
directories naming them after IP addresses.
When the whole cluster has upgraded
to a version of Scylla that can handle
directories representing host IDs,
we perform a migration of the IP folders,
i.e. we try to rename them to host IDs.
Invalid directories, i.e. those that
represent neither an IP address, nor a host
ID, are removed.

During the migration, hinted handoff is
disabled. It is necessary because we have
to modify the disk's contents, so new hints
cannot be saved until the migration finishes.
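
The directory handling described above can be sketched as a planning step. Everything here is a hypothetical illustration: `classify`, `plan_migration` and the IP-to-host-ID map are assumed names, only IPv4 names are recognized, and removing an IP directory that has no known host ID is an assumption about how a failed rename is treated.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <vector>

enum class dir_kind { ip_address, host_id, invalid };

// Classifies a hint directory name. Only IPv4 is handled in this sketch.
inline dir_kind classify(const std::string& name) {
    static const std::regex ipv4(
        R"re((\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}))re");
    static const std::regex uuid(
        R"re([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})re");
    std::smatch m;
    if (std::regex_match(name, m, ipv4)) {
        for (std::size_t i = 1; i <= 4; ++i) {
            if (std::stoi(m[i].str()) > 255) {
                return dir_kind::invalid;
            }
        }
        return dir_kind::ip_address;
    }
    if (std::regex_match(name, uuid)) {
        return dir_kind::host_id;
    }
    return dir_kind::invalid;
}

struct migration_plan {
    std::map<std::string, std::string> renames;  // old name -> new name
    std::vector<std::string> removals;
};

// Decides what to do with each directory: keep host-ID directories,
// rename IP directories to their host ID, remove everything else.
inline migration_plan plan_migration(
        const std::vector<std::string>& dirs,
        const std::map<std::string, std::string>& ip_to_host_id) {
    migration_plan plan;
    for (const auto& dir : dirs) {
        switch (classify(dir)) {
        case dir_kind::host_id:
            break;  // already migrated
        case dir_kind::ip_address: {
            const auto it = ip_to_host_id.find(dir);
            if (it != ip_to_host_id.end()) {
                plan.renames.emplace(dir, it->second);
            } else {
                plan.removals.push_back(dir);  // assumption: rename impossible
            }
            break;
        }
        case dir_kind::invalid:
            plan.removals.push_back(dir);
            break;
        }
    }
    return plan;
}
```

Computing the plan separately from the filesystem changes keeps the classification logic easy to test.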
2024-04-28 01:22:57 +02:00
Dawid Medrek
58784cd8db db/hints: Handle arbitrary directories in resource manager
Before these changes, resource manager only handled
the case when directories it browsed represented
valid host IDs. However, since before migrating
hinted handoff to using host IDs we still name
directories after IP addresses, that would lead
to exceptions that shouldn't happen.

We make resource manager handle directories
of arbitrary names correctly.
2024-04-27 22:31:07 +02:00
Dawid Medrek
ee84e810ca db/hints: Start using hint_directory_manager
We start keeping track of mappings IP - host ID.
The mappings are between endpoint managers
(identified by host IDs) and the hint directories
managed by them (represented by IP addresses).

This is a prelude to handling IP directories
by the hint shard manager.

The structure should only be used by the hint
manager before it's migrated to using host IDs.
The reason for that is that we rely on the
information obtained from the structure, but
it might not make sense later on.

When we start creating directories named after
host IDs and there are no longer directories
representing IP addresses, there is no relation
between host IDs and IPs -- just because
the structure is supposed to keep track between
endpoint managers and hint directories that
represent IP addresses. If they represent
host IDs, the connection between the two
is lost.

Still using the data structure could lead
to bugs, e.g. if we tried to associate
a given endpoint manager's host ID with its
corresponding IP address from
locator::token_metadata, it could happen that
two different host IDs would be bound to
the same IP address by the data structure:
node A has IP I1, node A changes its IP to I2,
node B changes its IP to I1. Though nodes
A and B have different host IDs (because they
are unique), the code would try to save hints
towards node B in node A's hint directory,
which should NOT happen.

Relying on the data structure is thus only
safe before migrating hinted handoff to using
host IDs. It may happen that we save a hint
in the hint directory of the wrong node indeed,
but since migration to using host IDs is
a process that only happens once, it's a price
we are ready to pay. It's only imperative to
prevent it from happening in normal
circumstances.
2024-04-27 22:31:07 +02:00
Dawid Medrek
aa4b06a895 db/hints: Enforce providing IP in get_ep_manager()
We drop the default argument in the function's signature.
Also, we adjust the code of change_host_filter() to
be able to perform calls to get_ep_manager().
2024-04-27 22:31:07 +02:00
Dawid Medrek
d0f58736c8 db/hints: Introduce hint_directory_manager
This commit introduces a new class responsible
for keeping track of mappings IP-host ID.
Before hinted handoff is migrated to using
host IDs, hint directories still have to
represent IP addresses. However, since
we identify endpoint managers by host IDs
already, we need to be able to associate
them with the directories they manage.
This class serves this purpose.
2024-04-27 22:31:07 +02:00
Dawid Medrek
f9af01852d db/hints/resource_manager: Update function description
The current description of the function
`space_watchdog::scan_one_ep_dir` is
not up-to-date with the function's
signature. This commit updates it.
2024-04-27 22:31:07 +02:00
Dawid Medrek
59d49c5219 db/hints: Coroutinize space_watchdog::scan_one_ep_dir() 2024-04-27 22:31:07 +02:00
Dawid Medrek
8fd9c80387 db/hints: Expose update lock of space watchdog
We expose the update lock of space watchdog
to be able to prevent it from scanning
hint directories. It will be necessary in an
upcoming commit when we will be renaming hint
directories and possibly removing some of them.
Race conditions are unacceptable, so the resource
manager must not be able to access the directories
during that time.
2024-04-27 22:31:07 +02:00
Dawid Medrek
934e4bb45e db/hints: Add function for migrating hint directories to host ID
We add a function that will be used while
migrating hinted handoff to using host IDs.
It iterates over existing hint directories
and tries to rename them to the corresponding
host IDs. In case of a failure, we remove the
directory so that at the end of the function's execution
the only remaining directories are those
that represent host IDs.
2024-04-27 22:31:04 +02:00
Dawid Medrek
e36f853f9b db/hints: Take both IP and host ID when storing hints
The store_hint() method starts taking both an IP
and a host ID as its arguments. The rationale
for the change is that, depending on the stage of
the cluster (before an upgrade to
host-ID-based hinted handoff or after it),
we might need to create a directory representing
either an IP address or a host ID.

Because locator::topology can change between
the moment we obtain the host ID
and the moment the function is executed,
we need to pass both parameters explicitly
to ensure the consistency between them.
2024-04-27 20:35:58 +02:00
Dawid Medrek
063d4d5e91 db/hints: Prepare initializing endpoint managers for migrating from IP to host ID
We extract the initialization of endpoint managers
from the start method of the hint manager
to a separate function and make it handle directories
that represent either IP addresses, or host IDs;
other directories are ignored.

It's necessary because before Scylla is upgraded
to a version that uses host-ID-based hinted handoff,
we need to continue only managing IP directories.
When Scylla has been upgraded, we will need to handle
host ID directories.

It may also happen that after an upgrade (but not
before it), Scylla fails while renaming
the directories, so we end up with some of them
representing IP addresses, and some representing
host IDs. After these changes, the code handles
that scenario as well.
2024-04-27 20:35:53 +02:00
Dawid Medrek
cfd03fe273 db/hints: Migrate to locator::host_id
We change the type of node identifiers
used within the module and fix compilation.
Directories storing hints to specific nodes
are now represented by host IDs instead of
IPs.
2024-04-26 22:44:04 +02:00
Dawid Medrek
1af7fa74e8 db/hints: Remove noexcept in do_send_one_mutation()
While the function is marked as noexcept, the returned
future can in fact store an exception. We remove the
specifier to reflect the actual behavior of the
function.
2024-04-26 22:44:04 +02:00
Dawid Medrek
54ae9797b9 service: Add locator::host_id to on_leave_cluster
We extend the function
endpoint_lifecycle_subscriber::on_leave_cluster
by another argument -- locator::host_id.
It's more convenient to have a consistent
pair of IP and host ID.
2024-04-26 22:44:03 +02:00
Dawid Medrek
a36387d942 service: Fix indentation 2024-04-26 22:44:03 +02:00
Dawid Medrek
c585444c60 db/hints: Fix indentation 2024-04-26 22:44:03 +02:00
Pavel Emelyanov
7f2742893e view: Open-code one line lambda checking if table exists
Continuation of the previous patch. The lambda in question used to be
heavyweight code, but now it's a one-liner. And it's only called once,
so there's no more point in keeping it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-26 20:19:38 +03:00
Pavel Emelyanov
a3e76f9c93 view: Use non-throwing check if a table exists
Two places in view code check if a table exists by finding its schema ID
and catching no_such_column_family exception. That's a bit heavyweight,
database has column_family_exists() method for such cases.
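
The difference between the two styles can be sketched with a hypothetical `catalog` class. It is not the real `database` API; `find_id` merely plays the role of a lookup that throws when the table is missing, and `exists` the role of a direct existence check.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical table catalog, used only to contrast the two check styles.
class catalog {
    std::unordered_map<std::string, int> _tables;
public:
    void add(const std::string& name, int id) { _tables.emplace(name, id); }

    // Throwing lookup: the caller pays for exception handling on a miss.
    int find_id(const std::string& name) const {
        const auto it = _tables.find(name);
        if (it == _tables.end()) {
            throw std::out_of_range("no such table: " + name);
        }
        return it->second;
    }

    // Non-throwing check: a plain lookup that reports presence.
    bool exists(const std::string& name) const {
        return _tables.find(name) != _tables.end();
    }
};

// The heavier pattern: use the throwing lookup only to learn about existence.
inline bool exists_by_catching(const catalog& c, const std::string& name) {
    try {
        c.find_id(name);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Both functions give the same answer; the direct check simply avoids constructing and unwinding an exception for what is an expected outcome.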

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-26 20:17:35 +03:00
Pavel Emelyanov
5e23493d25 test: Add test for how quarantined sstables registry entries are loaded
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-26 16:54:43 +03:00
Pavel Emelyanov
ba512c52a5 sstable_directory: Use sstable location to initialize registry lister
When populating sstables on boot a bunch of sstable_directory objects is
created. For each sstable there come three -- one for normal, quarantine
and staging state. Each is initialized with sstable location (which is
now a datadir/ks_name/cf_name-and-uuid) and the desired state (an enum
class). When created, the directory object wires up a component lister,
depending on which storage options are provided. For local sstables a
legacy filesystem lister is created and it's initialized with the path
to search for files in -- location + / + string(state). But for s3
sstables, which keep their entries in the registry, the lister is
erroneously initialized with the same location + / + string(state)
value. The mistake is that sstables in the registry keep location and state
in different columns, so for any state the lister should query the registry with
the same location value (then it filters entries by state on its own).
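
A small sketch of why the combined path finds nothing in the registry. `registry_entry` and `list_generations` are hypothetical simplifications of the registry and its lister, assumed for illustration only.

```cpp
#include <string>
#include <vector>

enum class sstable_state { normal, staging, quarantine };

// Hypothetical registry row: location and state live in separate columns.
struct registry_entry {
    std::string location;
    sstable_state state;
    long generation;
};

// Lists generations stored under `location` that are in `state`.
inline std::vector<long> list_generations(const std::vector<registry_entry>& registry,
                                          const std::string& location,
                                          sstable_state state) {
    std::vector<long> result;
    for (const auto& entry : registry) {
        if (entry.location == location && entry.state == state) {
            result.push_back(entry.generation);
        }
    }
    return result;
}
```

Querying with the plain location and the quarantine state returns the quarantined entry, while querying with `location + "/quarantine"` matches no row at all, because no entry stores the state inside its location.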

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-26 16:36:47 +03:00
Kamil Braun
d8313dda43 Merge 'db: config: move consistent-topology-changes out of experimental and make it the default for new clusters' from Patryk Jędrzejczak
We move consistent cluster management out of experimental and
make it the default for new clusters in 6.0. In code, we make the
`consistent-topology-changes` flag unused and assumed to be true.

In 6.0, the topology upgrade procedure will be manual and
voluntary, so some clusters will still be using the gossip-based
topology even though they support the raft-based topology.
Therefore, we need to continue testing the gossip-based topology.
This is possible by using the `force-gossip-topology-changes` flag
introduced in scylladb/scylladb#18284.

Ref scylladb/scylladb#17802

Closes scylladb/scylladb#18285

* github.com:scylladb/scylladb:
  docs: raft.rst: update after removing consistent-topology-changes
  treewide: fix indentation after the previous patch
  db: config: make consistent-topology-changes unused
  test: lib: single_node_cql_env: restart a node in noninitial run_in_thread calls
  test: test_read_required_hosts: run with force-gossip-topology-changes
  storage_service: join_cluster: replace force_gossip_based_join with force-gossip-topology-changes
  storage_service: join_token_ring: fix finish_setup_after_join calls
2024-04-26 14:45:29 +02:00
Botond Dénes
b96f28356a Merge 'api/storage_service: convert runtime_error from repair to http error ' from Kefu Chai
in `set_repair()`, despite that the repair is performed asynchronously,
we check the options specified by client immediately, and throw
`std::runtime_error`, if any of them is not supported.

before this change, these unhandled exceptions are translated to an HTTP
500 error by the underlying HTTP router. but this is misleading, as
these errors are caused by the client, not the server.

in this change, we handle the `runtime_error`, and translate them
into `httpd::bad_param_exception`, so that the client can have
HTTP 400 (Bad Request) instead of HTTP 500 (Internal Server Error),
and with informative error message.

for instance, if we apply repair with "small_table_optimization" enabled
on a keyspace with tablets enabled, we should have an HTTP error 400
with "The small_table_optimization option is not supported for tablet repair"
as the body of the error. this would be much more helpful.
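
a minimal sketch of the translation, assuming hypothetical names: `http_reply` and `run_handler` are not the Seastar httpd API, and `std::invalid_argument` is used as the "client error" type because the series changes `do_repair_start()` to throw it.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical reply type: status code plus body.
struct http_reply {
    int status;
    std::string body;
};

// Runs a handler and maps its exceptions to HTTP status codes:
// invalid arguments become 400, anything else stays 500.
inline http_reply run_handler(const std::function<std::string()>& handler) {
    try {
        return {200, handler()};
    } catch (const std::invalid_argument& e) {
        return {400, e.what()};   // the client sent unsupported options
    } catch (const std::exception& e) {
        return {500, e.what()};   // a genuine server-side failure
    }
}
```

the more specific catch clause comes first, so a bad option is reported as a bad request together with its message, while unexpected failures keep their 500 status.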

Closes scylladb/scylladb#18389

* github.com:scylladb/scylladb:
  api/storage_service: convert runtime_error from repair to http error
  repair: change runtime_error to invalid_argument in do_repair_start()
  api/storage_service: coroutinize set_repair()
2024-04-26 13:27:51 +03:00
Patryk Jędrzejczak
3a100cd16c test: test_raft_recovery_stuck: ensure raft upgrade procedure failed
We have log browsing in test.py now, so we can fix this TODO easily.

Closes scylladb/scylladb#18425
2024-04-26 10:16:49 +02:00
Asias He
62a9ecff51 repair: Cleanup repair history status entry for tablet
The entry in the repair history map that is used to track repair status
internally for each repair job should be removed after the repair job is
done. We do the same for vnode repairs.

This patch adds the automatic history cleanup code that was
missing from the initial tablet repair support in commit 54239514af,
which did not support repair history updates back then.

Refs #17046

Closes scylladb/scylladb#18434
2024-04-26 10:56:45 +03:00
Botond Dénes
044fd7a3ec Merge 'Move some view updating methods from table to view_update_generator' from Pavel Emelyanov
The populate_views() and generate_and_propagate_view_updates() both naturally belong to view_update_generator -- they don't need anything special from table itself, but rather depend on some internals of the v.u.generator itself.

Moving them there allows removing the view concurrency semaphore from keyspace and table, thus reducing the cross-component dependencies.

Closes scylladb/scylladb#18421

* github.com:scylladb/scylladb:
  replica: Do not carry view concurrency semaphore pointer around
  view: Get concurrency semaphore via database, not table
  view_update_generator: Mark mutate_MV() private
  view: Move view_update_generator methods' code
  view: Move table::generate_and_propagate_view_updates into view code
  view: Move table::populate_views() into view_update_generator class
2024-04-26 10:55:38 +03:00
Botond Dénes
d566eec89a Merge 'treewide: remove {dclocal_,}read_repair_chance options' from Kefu Chai
dclocal_read_repair_chance and read_repair_chance have been removed in Cassandra 3.11 and 4.x, see
https://issues.apache.org/jira/browse/CASSANDRA-13910. if we expose these properties via DDL, Cassandra would fail to consume the CQL statement creating the table when performing migration from Scylla to Cassandra 4.x, as the latter does not understand these properties anymore.

currently the default values of `dc_local_read_repair_chance` and `read_repair_chance` are both "0". so they are practically disabled, unless user deliberately set them to a value greater than 0.

also, as a side effect, Cassandra 4.x has better support of Python3. the cqlsh shipped along with Cassandra 3.11.16 only supports python2.7, see
https://github.com/apache/cassandra/blob/cassandra-3.11.16/bin/cqlsh.py it errors out if the system only provides python3 with the error of
```
No appropriate python interpreter found.
```
but modern linux systems do not provide python2 anymore.

so, in this change, we deprecate these two options.

Fixes #3502
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18087

* github.com:scylladb/scylladb:
  docs: drop documents related to {,dclocal_}read_repair_chance
  treewide: remove {dclocal_,}read_repair_chance options
2024-04-26 10:48:47 +03:00
Michał Chojnowski
c1146314a1 docs: clarify that DELETE can be used with USING TIMEOUT
The current text seems to suggest that `USING TIMEOUT` doesn't work with `DELETE` and `BATCH`. But that's wrong.

Closes scylladb/scylladb#18424
2024-04-26 10:48:17 +03:00
Pavel Emelyanov
4ac30e5337 view-builder: Print correct exception in built step exception handler
Inside a .handle_exception() continuation, std::current_exception() doesn't
work; the handler's lambda receives the exception as its std::exception_ptr ex argument instead.
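
As a rough illustration of why this matters, the sketch below uses only the
standard library (no Seastar). The helper names `describe` and
`capture_failure` are hypothetical and are not taken from the patch. It shows
that std::current_exception() is empty once control has left the catch block,
while an exception pointer handed over explicitly still carries the error.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: turns an explicitly passed exception pointer into text,
// the way a continuation-style handler would use its argument.
std::string describe(std::exception_ptr ex) {
    if (!ex) {
        return "<no exception>";
    }
    try {
        std::rethrow_exception(ex);
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "<unknown>";
    }
}

// Hypothetical helper: captures a failure the way a future would store it
// for a handler that runs later, outside of any catch block.
std::exception_ptr capture_failure() {
    try {
        throw std::runtime_error("build step failed");
    } catch (...) {
        return std::current_exception();
    }
}
```

Calling `describe(std::current_exception())` from ordinary code yields the
placeholder text, whereas `describe(capture_failure())` yields the real message.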

fixes #18423

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18349
2024-04-26 09:58:45 +03:00
Kefu Chai
0bbaded4ce api/storage_service: convert runtime_error from repair to http error
in `set_repair()`, despite that the repair is performed asynchronously,
we check the options specified by client immediately, and throw
`std::runtime_error`, if any of them is not supported.

before this change, these unhandled exceptions are translated to HTTP
500 error by the underlying HTTP router. but this is misleading, as
these errors are caused by the client, not the server. and the error message
is missing in the HTTP error message when performing the translation.

in this change, we handle the `runtime_error`, and translate them
into `httpd::bad_param_exception`, so that the client can have
HTTP 400 (Bad Request) instead of HTTP 500 (Internal Server Error),
and with informative error message.

for instance, if we apply repair with "small_table_optimization" enabled
on a keyspace with tablets enabled, we should have an HTTP error 400
with "The small_table_optimization option is not supported for tablet repair"
as the body of the error. this would be much more helpful.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-26 14:25:15 +08:00
Kefu Chai
9de9f401a1 repair: change runtime_error to invalid_argument in do_repair_start()
if an error is caused by an option provided by the user, it would be better
to throw an `std::invalid_argument` instead of `std::runtime_error`,
so that the caller can make a better decision when handling the
thrown exceptions.

so, in this change, we change the exceptions raised directly in
`repair_service::do_repair_start()` from `std::runtime_error` to
`std::invalid_argument`. please note, in the lambda named `host2ip`,
since the hostname is not provided by the user, we are not changing
the exception type in that lambda.
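
a minimal sketch of the "better decision" a caller can make, assuming a
hypothetical `http_status_for()` helper that is not part of the patch.
`std::invalid_argument` derives from `std::logic_error`, not from
`std::runtime_error`, so a caller can tell a bad option apart from other
failures:

```cpp
#include <exception>
#include <stdexcept>
#include <utility>

// Hypothetical helper: runs an action and maps its outcome to an HTTP-like
// status code. The numeric codes are illustrative only.
template <typename Func>
int http_status_for(Func&& action) {
    try {
        std::forward<Func>(action)();
        return 200;
    } catch (const std::invalid_argument&) {
        // the caller supplied an unsupported option: blame the client
        return 400;
    } catch (const std::exception&) {
        // anything else, including std::runtime_error: blame the server
        return 500;
    }
}
```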

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-26 14:24:45 +08:00
Kefu Chai
d737ba1ab2 api/storage_service: coroutinize set_repair()
before this change, `set_repair()` uses a lambda for handling
the client-side requests. and this works great. but the underlying
`repair_start()` throws if any of the given options is not sane.
and we don't handle any of these thrown exceptions in `set_repair()`, so
from the client's point of view, it would get an HTTP 500 error code,
which implies an "Internal Server Error". but actually, we should
blame the client for the error, not the server.

so, to prepare the error handling, let's take the opportunity to
coroutinize the lambda handling the request, so that we can handle
the exception in a more elegant way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-26 14:24:03 +08:00
Michał Jadwiszczak
7f839f727e docs/cql: update ALTER TABLE docs
2024-04-26 07:01:08 +02:00
Michał Jadwiszczak
7cbce78480 test/cqlpytest: add test for prepared ALTER TABLE ... DROP ... USING TIMESTAMP ?
2024-04-26 07:01:02 +02:00
Botond Dénes
7cbe5c78b4 install.sh: use the native nodetool directly
* tools/java b810e8b00e...4ee15fd9ea (1):
  > install.sh: don't install nodetool into /usr/bin

Add a bin/nodetool and install it to bin/ in install.sh. This script
simply forwards to scylla nodetool and it is the replacement for the
Java nodetool, which is dropped from the java-tools's install.sh, in the
submodule update also included in this patch.
With this change, we now hardwire the usage of the native nodetool, as
*the* nodetool, with the intermediary nodetool wrapper script removed
from the picture.
Bash completion was copied from the java tools repository and it is now
installed by the scylla package, together with nodetool.

The Java nodetool is still available as a fall-back, in case the
native nodetool has problems, at the path of
/opt/scylladb/share/cassandra/bin/nodetool.

Testing

I tested upgrades on a DEB and RPM distro: Ubuntu and Fedora.
First I installed scylla-5.4, then I installed the packages for this PR.
On Ubuntu, I had to use dpkg -i --auto-deconfigure, otherwise, dpkg would
refuse to install the new packages because they break the old ones. No
extra flags were required on Fedora.
In both cases, /usr/bin/nodetool was changed from a thunk calling the
Java nodetool (from 5.4) to the native launcher script from this PR.
/opt/scylladb/share/cassandra/bin/nodetool remained in place and still
works after the upgrade.

I also verified that --nonroot installs also work. Nodetool works both
when called with an absolute path, or when ~/scylladb/bin is added to
$PATH.

Fixes: #18226
Fixes: #17412

Closes scylladb/scylladb#18255

[avi: reset submodule to actual hash we ended up with]
2024-04-25 22:52:00 +03:00
Michał Jadwiszczak
27a4331dcd test/cql-pytest: remove xfail from alter table with timestamp tests
Previous patch introduced `ALTER TABLE ... DROP ... USING TIMESTAMP ...`
so those tests should no longer fail.

Refs #9929
2024-04-25 21:27:40 +02:00
Michał Jadwiszczak
80f0357436 cql3/statements: extend ALTER TABLE ... DROP to allow specifying timestamp of column drop
2024-04-25 21:27:40 +02:00
Michał Jadwiszczak
7dc0d068c0 cql3/statements: pass query_options to prepare_schema_mutations()
The object is needed to get timestamp from attributes (in a case when
the statement was prepared with parameter marker).
2024-04-25 21:27:40 +02:00
Michał Jadwiszczak
998a65a4f6 cql3/statements: add bound terms to alter table statement
Until now, alter table couldn't take any parameter marker, so the bound
terms were always 0.
Adding `USING TIMESTAMP` to `ALTER TABLE ... DROP` also adds the possibility
to prepare an alter table statement with a parameter marker.
2024-04-25 21:27:40 +02:00
Michał Jadwiszczak
d268641c27 cql3/statements: split alter_table_statement into raw and prepared
Currently alter table doesn't prepare any parameters so raw statement
and prepared one could be the same class.
Later commit will add attributes to the statement, which needs to be
prepared, that's why I'm splitting.
2024-04-25 21:27:40 +02:00
Michał Jadwiszczak
1c5563ba44 schema: allow to specify timestamp of dropped column
In order to drop a column with a specified timestamp, we need to
allow it in our schema class.
2024-04-25 21:27:40 +02:00
Avi Kivity
c2b8ca7d71 Merge 'cql3: statements: change default tombstone_gc mode for tablets' from Aleksandra Martyniuk
Repair may miss some tablets that migrated across nodes.
So if tombstones expire after some timeout, then we can
have data resurrection.

Set default tombstone_gc mode to "repair" for tables which
use tablets (if repair is required).

Fixes: #16627.

Closes scylladb/scylladb#18013

* github.com:scylladb/scylladb:
  test: check default value of tombstone_gc
  test: topology: move some functions to util.py
  cql3: statements: change default tombstone_gc mode for tablets
2024-04-25 19:18:37 +03:00
Lakshmi Narayanan Sreethar
6af2659b57 sstables: reclaim_memory_from_components: do not update _recognised_components
When reclaiming memory from bloom filters, do not remove them from
_recognised_components, as that leads to the on-disk filter component
being left back on disk when the SSTable is deleted.

Fixes #18398

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#18400
2024-04-25 19:15:59 +03:00
Raphael S. Carvalho
4a5fdc5814 table: Remove outdated FIXME about sstable spanning multiple tablets
The FIXME was added back then because we thought the interface of
compaction_group_for_sstable might have to be adjusted if a sstable
were allowed to temporarily span multiple tablets until it's split,
but we have gone a different path.
If a sstable's key range incorrectly spans more than one tablet,
that will be considered a bug and an exception is thrown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18410
2024-04-25 17:21:11 +03:00
Marcin Maliszkiewicz
7085339f72 cql3: test: include get_mutations_internal log in test.py
We have a concurrent modification conflict in tests and suspect
duplicated requests but since we don't log successful requests
we have no way to verify if that's the case. get_mutations_internal log
will help to tell which nodes are trying to push auth or
service levels mutations into raft.

Refs scylladb/scylladb#18319

Closes scylladb/scylladb#18413
2024-04-25 17:17:53 +03:00
Botond Dénes
0234b4542a Merge '[github] add PR template and action to verify PR tasks was completed' from Yaron Kaikov
Today, with the backport automation, the developer adds the relevant backport label, but without any explanation of why.

Add a PR template with a placeholder for the developer to record the decision on whether to backport.

The placeholder is marked as a task, so once the explanation is added, the task must be checked as completed.

Also add another check to the PR summary, which makes it clear to the maintainer/reviewer whether the developer explained the backport decision.

Closes scylladb/scylladb#18275

* github.com:scylladb/scylladb:
  [github] add action to verify PR tasks was completed
  [github] add PR template
2024-04-25 17:14:50 +03:00
Pavel Emelyanov
18cc2cfa31 replica: Generalize snapshot details for single table/snapshot dir
There are two places that get total:live stats for a table snapshot --
database::get_snapshot_details() and table::get_snapshot_details(). Both
do a pretty similar thing -- walk the table/snapshots/ directory, then
each of the found sub-directories, and accumulate the found files' sizes
into the snapshot details structure.

Both try to tell total from live sizes by checking whether an sstable
component found in snapshots is present in the table datadir. The
database code does it in a more correct way -- it not only checks the file
presence, but also checks whether it's a hardlink to the snapshot file,
while the table code just checks if a file of the same name exists.

This patch does both -- makes both database and table call the same
helper method for a single snapshot's details, and makes the generalized
version use the more elaborate collision check, thus fixing the per-table
details getting behavior.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18347
2024-04-25 17:12:42 +03:00
Asias He
1ca779d287 streaming: Fix use after move in fire_stream_event
The event is used in a loop.

Found by clang-tidy:

```
streaming/stream_result_future.cc:80:49: warning: 'event' used after it was moved [bugprone-use-after-move]
        listener->handle_stream_event(std::move(event));
                                                ^
streaming/stream_result_future.cc:80:39: note: move occurred here
        listener->handle_stream_event(std::move(event));
                                      ^
streaming/stream_result_future.cc:80:49: note: the use happens in a later loop iteration than the move
        listener->handle_stream_event(std::move(event));
                                                ^
```
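
One way to avoid this class of bug, shown as a sketch with hypothetical
stand-in types (`stream_event` and `listener` below are not the real
streaming classes, and this is not necessarily the exact fix applied):
pass the event by const reference into the loop and let each listener
receive its own copy.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real event type.
struct stream_event {
    std::string description;
};

// Hypothetical stand-in for a listener; it consumes the event it is given.
struct listener {
    std::vector<std::string> seen;
    void handle_stream_event(stream_event ev) {
        seen.push_back(std::move(ev.description));
    }
};

// Delivers the same event to every listener. Each call receives a fresh copy,
// so no loop iteration ever observes a moved-from object.
void fire_stream_event(std::vector<listener>& listeners, const stream_event& event) {
    for (auto& l : listeners) {
        l.handle_stream_event(event); // copy per listener, no std::move in the loop
    }
}
```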

Fixes #18332

Closes scylladb/scylladb#18333
2024-04-25 16:48:54 +03:00
Botond Dénes
2c8bd99cd4 Merge 'Coroutinize view_builder::stop()' from Pavel Emelyanov
It's pretty straightforward, but prior to that, exception handling needs some care

Closes scylladb/scylladb#18378

* github.com:scylladb/scylladb:
  view-builder: Coroutinize stop()
  view_builder: Do not try to handle step join exceptions on stop
2024-04-25 16:48:25 +03:00
Kefu Chai
014a069ed2 build: cmake: require {fmt} >= 9.0.0
we are using `fmt::ostream_formatter` which was introduced in
{fmt} v9.0.0, see https://github.com/fmtlib/fmt/releases/tag/9.0.0 .

before this change, we depend on Seastar to find {fmt}. but
the minimal version of {fmt} required by Seastar is 5.0.0, which
cannot fulfill the needs to build scylladb.

in this change, we find {fmt} package in scylla, and specify the
minimal required version of 9.0.0, so the build can fail at the
configuration time. {fmt} v8 could still be in use by users.
for instance, ubuntu:jammy comes with libfmt-dev 8.1.1. and
ubuntu:jammy is EOL in Apr 2027, see
https://ubuntu.com/about/release-cycle .

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18386
2024-04-25 16:35:08 +03:00
Amnon Heiman
dfea50a7e9 db/config.cc add metric family config from file
Metric family config lets a user configure the metric family aggregate labels.
This patch modifies the existing relabel-config from file to accept
metric family config.

Similar to the existing relabel_config, it adds a metric_family_configs
section.  For example, the following configuration demonstrates changing
aggregate labels by name and regular expression.

```
metric_family_configs:
 - name: storage_service
   aggregate_labels: [shard]
 - regex: (storage_proxy.*)
   aggregate_labels: [shard, scheduling_group_name]
```

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes scylladb/scylladb#18339
2024-04-25 16:03:39 +03:00
Kefu Chai
e9b31cb4c1 test: locator_topology: s/get0()/get()/
this change addresses the leftover of 9e8805bb49

Refs 9e8805bb49

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18390
2024-04-25 16:03:01 +03:00
Patryk Jędrzejczak
55b011902e docs: raft.rst: update after removing consistent-topology-changes
2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak
0d428a3857 treewide: fix indentation after the previous patch
2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak
3a34bb18cd db: config: make consistent-topology-changes unused
We make the `consistent-topology-changes` experimental feature
unused and assumed to be true in 6.0. We remove code branches that
executed if `consistent-topology-changes` was disabled.
2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak
77342ffb34 test: lib: single_node_cql_env: restart a node in noninitial run_in_thread calls
In the following commit, we make the `consistent-topology-changes`
experimental feature unused. Then, all unit tests in the boost suite
will start using the raft-based topology by default. Unfortunately,
tests with multiple `single_node_cql_env::run_in_thread` calls
(usually coming from the `do_with_cql_env_thread` calls) would fail.

In a noninitial `run_in_thread` call, a node is started as if it
booted for the first time. On the other hand, it has its persistent
state from previous boots. Hence, the node can behave strangely and
unexpectedly. In particular, `SYSTEM.TOPOLOGY` is not empty and the
assertion that expects it to be empty when we boot for the first
time fails.

We fix this issue by making noninitial `run_in_thread` calls
behave as normal restarts.

After this change,
`test_schema_digest_does_not_change_with_disabled_features` starts
failing. This test copies the data directory before booting for the
first time, so the new
`_sys_ks.local().build_bootstrap_info().get();` makes the node
incorrectly think it restarts. Then, after noticing it is not a part
of group 0, the node would start the raft upgrade procedure if we
didn't run it in the raft RECOVERY mode. This procedure would get
stuck because it depends on messaging being enabled even if the node
communicates only with itself and messaging is disabled in boost tests.
2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak
88038d958a test: test_read_required_hosts: run with force-gossip-topology-changes
In one of the following commits, we make the
`consistent-topology-changes` experimental feature unused. Then,
all unit tests in the boost suite will start using the raft-based
topology by default. Unfortunately, some tests would start failing
and `test_read_required_hosts` is one of them.

`tablet_cql_test_config` in `tablets_test.cc` doesn't use
`consistent-topology-changes`, so all test cases in this file
run incorrectly with the gossip-based topology changes. With
`consistent-topology-changes`, only `test_read_required_hosts`
fails. The failure happens on `auto table2 = add_table(e).get();`:
```
ERROR 2024-04-17 11:14:16,083 [shard 0:main] load_balancer -
Replica 9b94d710-fbfb-11ee-9c4f-448617b47e11:0 of tablet
9b94d713-fbfb-11ee-9c4f-448617b47e11:0 not found in topology
```
This test case needs to be investigated and rewritten so that
it passes with the raft-based topology. However, we don't want
this issue to block the process of making the
`consistent-topology-changes` experimental feature unused. We
leave a FIXME and we will open a new issue to track it.
2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak
213f2f6882 storage_service: join_cluster: replace force_gossip_based_join with force-gossip-topology-changes
The `force_gossip_based_join` error injection does exactly what we
expect from `force-gossip-topology-changes` so we can do a simple
replacement.

We prefer a flag over an error injection because we will use it
a lot in CI jobs' configurations, some tests, manual testing etc.
It's much more convenient.

Moreover, the flag can be used in the release mode, so we re-enable
all tests that were disabled in release mode only because of using
the `force_gossip_based_join` error injection.

The name of the `force-gossip-topology-changes` flag suggests that
using it should always successfully force the gossip-based topology
or, if forcing is not possible, the booting should fail. We don't
want a node with `force-gossip-topology-changes=true` that silently
boots in the raft-topology mode. We achieve it by throwing a
runtime error from `join_cluster` in two cases:
- the node is restarting in the cluster that is using raft topology
- the node is joining the cluster that is using raft topology
2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak
d6ee540efc storage_service: join_token_ring: fix finish_setup_after_join calls
The `topology_change_enabled` parameter of `finish_setup_after_join`
is used underneath to enable pulling raft topology snapshots in two
cases:
- when the node joins the cluster that uses the raft-based topology,
- when the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature is enabled.
The first case happens in the first changed call.
`_raft_experimental_topology` always equals true there. The second
call was incorrect as it could enable pulling snapshots before
SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES was enabled. It could cause
problems during rolling upgrade to 6.0. For more information see
07aba3abc4.
2024-04-25 14:33:21 +02:00
Yaron Kaikov
5e63f74984 [github] add action to verify PR tasks was completed
Adding another check to the PR summary makes it clear to the maintainer/reviewer whether the developer explained the backport decision.
2024-04-25 15:24:22 +03:00
Botond Dénes
aaa76d4c0e Merge 'Getting per-table snapshot size is racy wrt creating new snapshots' from Pavel Emelyanov
The API endpoint in question calls table::get_snapshot_detail() which just walks the table/snapshots/ directory. This can clash with creating a new snapshot. The database-wide walk is guarded with snapshot-ctl's locking, and so should the per-table API be.

Closes scylladb/scylladb#18414

* github.com:scylladb/scylladb:
  snapshot: Get per-table snapshot size under snapshot lock
  snapshot: Move per-table snap API to other snapshot endpoints
2024-04-25 14:57:52 +03:00
Kefu Chai
e5b30ae2ad partition_version: do not rereference moved variable
in `partition_entry::apply_to_incomplete()`, we pass `*dst_snp` and
`std::move(dst_snp)` to build the capture variable list of a lambda,
but the order of evaluation of these variables is unspecified.
fortunately, we haven't run into any issues at this moment. but this
is not future-proof. so, let's avoid this by storing a reference
of the dereferenced smart pointer, and use it later on.

this issue is identified by clang-tidy:

```
/home/kefu/dev/scylladb/mutation/partition_version.cc:500:53: warning: 'dst_snp' used after it was moved [bugprone-use-after-move]
  500 |             cur = partition_snapshot_row_cursor(s, *dst_snp),
      |                                                     ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:502:23: note: move occurred here
  502 |             dst_snp = std::move(dst_snp),
      |                       ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:500:53: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
  500 |             cur = partition_snapshot_row_cursor(s, *dst_snp),
      |                                                     ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:501:57: warning: 'src_snp' used after it was moved [bugprone-use-after-move]
  501 |             src_cur = partition_snapshot_row_cursor(s, *src_snp, can_move),
      |                                                         ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:504:23: note: move occurred here
  504 |             src_snp = std::move(src_snp),
      |                       ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:501:57: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
  501 |             src_cur = partition_snapshot_row_cursor(s, *src_snp, can_move),
      |                                                         ^
```
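
a minimal sketch of the approach described above, with hypothetical
stand-in types (`snapshot`, `cursor` and `make_task` are illustrative
names, not the real partition_version code): bind a reference to the
pointee before the capture list is evaluated, so that no capture
initializer dereferences the smart pointer that another one moves from.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-ins for the snapshot and the cursor built from it.
struct snapshot {
    int version;
};

struct cursor {
    int version;
    explicit cursor(const snapshot& s) : version(s.version) {}
};

auto make_task(std::unique_ptr<snapshot> snp) {
    // Dereference first. Moving the unique_ptr later transfers ownership
    // but does not relocate the snapshot, so this reference stays valid.
    snapshot& snp_ref = *snp;
    return [cur = cursor(snp_ref), snp = std::move(snp)] {
        return cur.version + snp->version;
    };
}
```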

Fixes #18360
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18361
2024-04-25 14:57:52 +03:00
Pavel Emelyanov
8aaa09ee97 replica: Do not carry view concurrency semaphore pointer around
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-25 14:27:43 +03:00
Pavel Emelyanov
2ee7c41139 view: Get concurrency semaphore via database, not table
The _view_update_concurrency_sem field on database propagates itself via
keyspace config down to table config and view_update_generator then
grabs one via table:: helper. That's an overkill; view_update_generator
has a direct reference to the database and can get this semaphore from
there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-25 14:25:57 +03:00
Pavel Emelyanov
3d8b572d96 view_update_generator: Mark mutate_MV() private
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-25 14:25:40 +03:00
Pavel Emelyanov
bc4552740f view: Move view_update_generator methods' code
Now that the two methods belong to another class, move the code itself
to db/view, where the class itself resides.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-25 14:24:20 +03:00
Pavel Emelyanov
c2bf6b43b2 view: Move table::generate_and_propagate_view_updates into view code
Similarly to populate_views() method, this one also naturally belongs to
view_update_generator class.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-25 14:20:06 +03:00
Pavel Emelyanov
670c7c925c view: Move table::populate_views() into view_update_generator class
The method in question has little to do with table, effectively it only
needs stats and concurrency semaphore. And the semaphore in question is
obtained from table indirectly, it really resides on database. On the
other hand, the method carries lots of bits from db::view, e.g. the
view_update_builder class, memory_usage_of() helper and a bit more.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-25 14:17:20 +03:00
Kefu Chai
e5bcea6718 docs: drop documents related to {,dclocal_}read_repair_chance
since "read_repair_chance" and "dclocal_read_repair_chance" are
removed, and not supported anymore. let's stop documenting them.

Refs #3502

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-25 17:15:27 +08:00
Kefu Chai
c323c93fa4 treewide: remove {dclocal_,}read_repair_chance options
dclocal_read_repair_chance and read_repair_chance have been removed
in Cassandra 3.11 and 4.x, see
https://issues.apache.org/jira/browse/CASSANDRA-13910.
if we expose the properties via DDL, Cassandra would fail to consume
the CQL statement creating the table when performing migration
from Scylla to Cassandra 4.x, as the latter does not understand
these properties anymore.

currently the default values of `dc_local_read_repair_chance` and
`read_repair_chance` are both "0". so this is practically disabled,
unless user deliberately set them to a value greater than 0.

also, as a side effect, Cassandra 4.x has better support of
Python3. the cqlsh shipped along with Cassandra 3.11.16 only
supports python2.7, see
https://github.com/apache/cassandra/blob/cassandra-3.11.16/bin/cqlsh.py
it errors out if the system only provides python3 with the error
of

```
No appropriate python interpreter found.
```
but modern linux systems do not provide python2 anymore.

so, in this change, we deprecate these two options.

Fixes #3502
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-25 17:15:27 +08:00
Botond Dénes
ca26899c36 Merge 'sstable: large data handler needs to count range tombstones as rows' from Ferenc Szili
When issuing warnings about partitions with the number of rows above a configured threshold, the large partitions handler does not take into consideration the number of range tombstone markers in the total rows count. This fix adds the number of range tombstone markers to the total number of rows and saves this total in system.large_partitions.rows (if it is above the threshold). It also adds a new column range_tombstones to the system.large_partitions table which only contains the number of range tombstone markers for the given partition.

This PR fixes the first part of issue #13968
It does not cover distinguishing between live and dead rows. A subsequent PR will handle that.

Closes scylladb/scylladb#18346

* github.com:scylladb/scylladb:
  sstables: add docs changes for system.large_partitions
  sstable: large data handler needs to count range tombstones as rows
2024-04-25 11:38:30 +03:00
Pavel Emelyanov
e97abfc473 tablets: Fix indentation after flat-hash-map patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18364
2024-04-25 11:36:37 +03:00
Kefu Chai
0b5a861961 build: cmake: reference build_mode with ${scylla_build_mode_${CMAKE_BUILD_TYPE}}
before this change, if we generate the building system with plain
`Ninja`, instead of `Ninja Multi-Config` using cmake, the build
fails, because `${scylla_build_mode_${CMAKE_BUILD_TYPE}}` is not
defined. so the profile used for building the rust library would be
"rust-", which does not match any of the profiles defined by
`Cargo.toml`.

in this change, we use `$CMAKE_BUILD_TYPE` instead of "$config", as
the former is defined for non-multi-config generators, while the latter
is not. see https://cmake.org/cmake/help/latest/generator/Ninja%20Multi-Config.html

with this change, we are able to generate the building system properly
with the "Ninja" generator. if we just want to run some static analyzer
against the source tree or just want to build scylladb with a single
configuration, the "Ninja" generator is a good fit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18353
2024-04-25 10:51:54 +03:00
Pavel Emelyanov
ae4c1c44ec snapshot: Get per-table snapshot size under snapshot lock
Walking per-table snapshot directory without lock is racy. There's
snapshot-ctl locking that's used to get db-wide snapshot details, it
should be used to get per-table snapshot details too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-25 10:05:51 +03:00
Pavel Emelyanov
186b36165e snapshot: Move per-table snap API to other snapshot endpoints
So that they are collected in one place and to facilitate next patch
that's going to use snapshot-ctl for per-table API too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-25 10:05:01 +03:00
Anna Stuchlik
b5d256a991 doc: add Scylla Doctor to the docs
This commit adds the description and usage instructions of Scylla Doctor
to the "How to Report a ScyllaDB Problem" page.

Scylla Doctor replaces Health Check Report, so the description of
and references to the latter are removed with this commit.

Fixes https://github.com/scylladb/scylladb/issues/16276

Closes scylladb/scylladb#17617
2024-04-25 09:50:38 +03:00
Asias He
037bba0ca1 repair: Turn on off_strategy_updater for tablet repair
The off_strategy_updater is used during repair to update the automatic
off strategy timer so off_strategy compaction starts automatically only
after repair finishes. We still use off_strategy for tablets. So we
should still turn on the updater.

The update logic is used for vnode tables. We could share the code with
vnode tables instead of copying it, but since there is a possibility we
could disable off_strategy for tablets, we'd better postpone the code
sharing to a follow-up. If we later decide to disable off_strategy for
tablets, we can remove the updater for tablets.

Fixes #18196

Closes scylladb/scylladb#18266
2024-04-25 09:03:07 +03:00
Kamil Braun
3363f6e1e8 Merge 'Fix write failures during node replace with same IP with topology over raft' from Gleb
Currently a new node is marked as alive too late, after it is already
reported as a pending node. The patch series changes replace procedure
to be the same as what node_ops do: first stop reporting the IP of the
node that is being replaced as a natural replica for writes, then mark
the IP as alive, and only after that report the IP as a pending endpoint.

Fixes: scylladb/scylladb#17421

* 'gleb/17421-fix-v2' of github.com:scylladb/scylla-dev:
  test_replace_reuse_ip: add data plane load
  sync_raft_topology_nodes: make replace procedure similar to nodeops one
  storage_service: topology_coordinator: fix indentation after previous patch
  storage_service: topology coordinator: drop ring check in node_state::replacing state
2024-04-24 17:09:01 +02:00
Petr Gusev
bc98774f83 test_replace_reuse_ip: add data plane load
In this commit we enhance test_replace_reuse_ip
to reproduce #17421. We create a test table and run
insert queries on it while the first node is
being replaced. In this form the test fails
without the fix from the previous commit. Some
insert requests fail with [Unavailable exception]
"Cannot achieve consistency level for cl QUORUM...".
2024-04-24 16:59:24 +03:00
Gleb Natapov
4614fedd22 sync_raft_topology_nodes: make replace procedure similar to nodeops one
In replace-with-same-ip a new node calls gossiper.start_gossiping
from join_token_ring with the 'advertise' parameter set to false.
This means that this node will fail echo RPC-s from other nodes,
making it appear as not alive to them. The node changes this only
in storage_service::join_node_response_handler, when the topology
coordinator notifies it that it's actually allowed to join the
cluster. The node calls _gossiper.advertise_to_nodes({}), and
only from this moment other nodes can see it as alive.

The problem is that topology coordinator sends this notification
in topology::transition_state::join_group0 state. In this state
nodes of the cluster already see the new node as pending,
they react with calling tmpr->add_replacing_endpoint and
update_topology_change_info when they process the corresponding
raft notification in sync_raft_topology_nodes. When the new
token_metadata is published, assure_sufficient_live_nodes
sees the new node in pending_endpoints. All of this happens
before the new node has handled the successful join notification,
so it's not alive yet. Suppose we had a cluster with three
nodes and we're replacing one of them with a fourth node.
For cl=quorum assure_sufficient_live_nodes throws if
live < need + pending, which in our case becomes 2 < 2 + 1.
The end effect is that during replace-with-same-ip
data plane requests can fail with unavailable_exception,
breaking availability.
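
The arithmetic above can be illustrated with a minimal C++ sketch. The names `quorum_for` and `is_unavailable` are hypothetical and only mirror the inequality `live < need + pending` quoted in this message; this is not the actual `assure_sufficient_live_nodes` code.

```cpp
#include <cstddef>

// Hypothetical helper: number of replicas a QUORUM request needs
// for a given replication factor (rf / 2 + 1).
constexpr std::size_t quorum_for(std::size_t rf) {
    return rf / 2 + 1;
}

// Mirrors the inequality from the commit message: a request is rejected
// as unavailable when live < need + pending.
constexpr bool is_unavailable(std::size_t live, std::size_t need, std::size_t pending) {
    return live < need + pending;
}
```

With three replicas, `quorum_for(3)` is 2. While the new node is pending but not yet alive, `is_unavailable(2, 2, 1)` is true, which is the `2 < 2 + 1` case; once the new node counts as alive, `is_unavailable(3, 2, 1)` is false.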

The patch makes the boot procedure more similar to the node ops one.
It splits marking a node as "being replaced" and adding it to the
pending set into different steps and marks it as alive in the middle.
So when the node is in topology::transition_state::join_group0 state
it is marked as "being replaced", which means it will no longer be used for
reads and writes. Then, in the next state, the new node is marked as alive and
is added to the pending list.

fixes scylladb/scylladb#17421
2024-04-24 16:59:22 +03:00
Kamil Braun
1297b9a322 mutation: mutation_by_size_splitter: skip last mutation if it's empty
Currently, the last mutation emitted by split_mutation could be empty.
It can happen as follows:
- consume range tombstone change at pos `1` with some timestamp
- consume clustering row at pos `2`
- flush: this will create mutation with range tombstone (1, 2) and
  clustering row at 2
- consume range tombstone change at pos `2` with no timestamp (i.e.
  closing rtc)
- end of partition

since the closing rtc has the same position as the clustering row, no
additional range tombstone will be emitted -- the only necessary range
tombstone was already emitted in the previous mutation.

On the other hand, `test_split_mutations` expects all emitted mutations
to be non-empty, which is a sane expectation for this function.

The test caught a case like this with random-seed=629157129.

Fix this by skipping the last mutation if it turns out to be empty.

Fixes: scylladb/scylladb#18042

Closes scylladb/scylladb#18375
2024-04-24 16:25:31 +03:00
Raphael S. Carvalho
71682aebdd storage_service: Fix use-after-move in storage_service::node_ops_cmd_handler
```
service/storage_service.cc:4288:62: warning: 'req' used after it was moved [bugprone-use-after-move]
            node_ops_insert(ops_uuid, coordinator, std::move(req.ignore_nodes), [this, coordinator, req = std::move(req)] () mutable {
                                                             ^
service/storage_service.cc:4288:107: note: move occurred here
            node_ops_insert(ops_uuid, coordinator, std::move(req.ignore_nodes), [this, coordinator, req = std::move(req)] () mutable {
                                                                                                          ^
service/storage_service.cc:4288:62: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
            node_ops_insert(ops_uuid, coordinator, std::move(req.ignore_nodes), [this, coordinator, req = std::move(req)] () mutable {
                                                             ^

```

if evaluation order is right-to-left (GCC), req is moved first, and req.ignore_nodes will be empty,
so nodes that should be ignored will still be considered, potentially resulting in a failure during
replace.
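
One way to avoid this kind of unsequenced use and move is sketched below. The `request` type, `insert_op()` and `handle()` are hypothetical stand-ins, not the real `node_ops_cmd_handler` code; the point is only that taking the member out in a separate statement sequences it before the move of the whole object.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the request type.
struct request {
    std::vector<std::string> ignore_nodes;
    std::string payload;
};

// Hypothetical stand-in for node_ops_insert(): takes the ignore list by value
// and a callback, and reports how many ignored nodes it actually received.
template <typename Func>
std::size_t insert_op(std::vector<std::string> ignore_nodes, Func func) {
    func();
    return ignore_nodes.size();
}

std::size_t handle(request req) {
    // Take the member out in its own statement, so it is sequenced
    // before `req` itself is moved into the lambda capture below.
    auto ignore_nodes = std::move(req.ignore_nodes);
    return insert_op(std::move(ignore_nodes), [req = std::move(req)] () mutable {
        // The callback owns the rest of the request.
        return req.payload.size();
    });
}
```

With this shape the ignore list reaches the callee intact regardless of the order in which the compiler evaluates the call arguments.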

https://godbolt.org/z/jPcM6GEx1

courtesy of clang-tidy.

Fixes #18324.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18366
2024-04-24 15:36:28 +03:00
Aleksandra Martyniuk
06f6aaf2cf test: check default value of tombstone_gc
Add a test which checks whether default tombstone_gc value is properly
set and if it does not override previous setting.
2024-04-24 10:57:51 +02:00
Aleksandra Martyniuk
e0d498716a test: topology: move some functions to util.py
Move functions marked with asynccontextmanager from test/topology/test_mv.py
to test/topology/util.py so that they can be used in other tests.
2024-04-24 10:57:51 +02:00
Aleksandra Martyniuk
58f72f9019 cql3: statements: change default tombstone_gc mode for tablets
Currently, if tombstone_gc mode isn't specified for a table,
then "timeout" is used by default. With tablets, running
"nodetool repair -pr" may miss a tablet if it migrated across
the nodes. Then, if we expire tombstones for ranges that
weren't repaired, we may get data resurrection.

Set default tombstone_gc mode value for DDLs that don't
specify it. It's set to "repair" for tables which use tablets
unless they use local replication strategy or rf = 1.
Otherwise it's set to "timeout".
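
The rule above can be summarized with a small C++ sketch. The `table_info` struct and `default_tombstone_gc_mode()` are hypothetical names used for illustration only; the real logic lives in the cql3 statements code and may be structured differently.

```cpp
#include <cstddef>

enum class tombstone_gc_mode { timeout, repair };

// Hypothetical summary of a table's replication setup.
struct table_info {
    bool uses_tablets;
    bool local_replication_strategy;
    std::size_t replication_factor;
};

// Default chosen when the DDL does not specify tombstone_gc, following
// the rule described in the commit message.
tombstone_gc_mode default_tombstone_gc_mode(const table_info& t) {
    if (t.uses_tablets && !t.local_replication_strategy && t.replication_factor != 1) {
        return tombstone_gc_mode::repair;
    }
    return tombstone_gc_mode::timeout;
}
```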
2024-04-24 10:42:10 +02:00
Kamil Braun
8876b9b0ef test/pylib: random_tables: use IF NOT EXISTS when creating keyspace
Due to Python driver's unexpected behavior, "CREATE KEYSPACE" statement
may sometimes get executed twice (scylladb/python-driver#317), leading
to "Keyspace ... already exists" error in our tests
(scylladb/scylladb#17654). Work around this by using "IF NOT EXISTS".

Fixes: scylladb/scylladb#17654

Closes scylladb/scylladb#18368
2024-04-24 10:09:26 +03:00
Pavel Emelyanov
1b1b86809d view-builder: Coroutinize stop()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-23 20:43:42 +03:00
Pavel Emelyanov
eaf78fca04 view_builder: Do not try to handle step join exceptions on stop
Commit 23c891923e (main: make sure view_builder doesn't propagate
semaphore errors) ignored some exceptions that could pop up from the
_build_step/do_build_step() serialized action, since they are "benign"
on stop.

Later there came b56b10a4bb (view_builder: do_build_step: handle
unexpected exceptions) that plugged any exception from the action in
question, regardless of whether they happen on stop or at run-time.

Apparently, the latter commit supersedes the former.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-23 20:26:14 +03:00
Anna Stuchlik
c0e4f3e646 doc: include OSS-specific info as separate files
This commit moves OSS-specific links and content
added in https://github.com/scylladb/scylladb/pull/17624
to separate files and adds the include directive `.. scylladb_include_flag::`
to include these files in the doc source files.

Reason: Adding the link to the Open Source upgrade guide
(/upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology)
breaks the Enterprise documentation because the Enterprise docs don't
contain that upgrade guide.  We must add separate files for OSS and
Enterprise to prevent failing the Enterprise build and breaking the
links.

Closes scylladb/scylladb#18372
2024-04-23 16:59:05 +02:00
Raphael S. Carvalho
fa2dc5aefa sstables: Fix use-after-move in an error path of FS-based sstable writer
```
sstables/storage.cc:152:21: warning: 'file_path' used after it was moved [bugprone-use-after-move]
        remove_file(file_path).get();
                    ^
sstables/storage.cc:145:64: note: move occurred here
    auto w = file_writer(output_stream<char>(std::move(sink)), std::move(file_path));

```

It's a regression when TOC is found for a new sstable, and we try to delete temporary TOC.

courtesy of clang-tidy.

Fixes #18323.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18367
2024-04-23 17:19:55 +03:00
Pavel Emelyanov
f5f57dc817 table: No need to open directory in snapshot_exists()
In order to check if a snapshot of a certain name exists, the checking
method opens the directory. This can be done with a more lightweight call.

Also, though not critical, it forgets to close the directory.

Coroutinize the method while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18365
2024-04-23 17:19:24 +03:00
Botond Dénes
572003c469 Merge 'Cleanup the way snapshot details are propagated via API' from Pavel Emelyanov
There's a database::get_snapshot_details() method that returns collection of all snapshots for all ks.cf out there and there are several *snapshot_details* aux structures around it. This PR keeps only one "details" and cleans up the way it propagates from database up to the respective API calls.

Closes scylladb/scylladb#18317

* github.com:scylladb/scylladb:
  snapshot_ctl: Brush up true_snapshots_size() internals
  snapshot_ctl: Remove unused details struct
  snapshot_ctl: No double recoding of details
  database,snapshots: Move database::snapshot_details into snapshot_ctl
  database,snapshots: Make database::get_snapshot_details() return map, not vector
  table,snapshots: Move table::snapshot_details into snapshot_ctl
2024-04-23 16:28:25 +03:00
Kefu Chai
9e8805bb49 repair, transport: s/get0()/get()/
`future::get0()` was deprecated in favor of `future::get()`. so
let's use the latter instead. this change silences a `-Wdeprecated`
warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18357
2024-04-23 15:48:54 +03:00
Kefu Chai
4fd9b2a791 reader: silence false-positive use-after-move warning
when compiling with clang-tidy, it warns:
```
[6/9] Building CXX object readers/CMakeFiles/readers.dir/multishard.cc.o
/home/kefu/dev/scylladb/readers/multishard.cc:84:53: warning: 'fut_and_result' used after it was moved [bugprone-use-after-move]
   84 |                 auto result = std::get<1>(std::move(fut_and_result));
      |                                                     ^
/home/kefu/dev/scylladb/readers/multishard.cc:79:34: note: move occurred here
   79 |             _read_ahead_future = std::get<0>(std::move(fut_and_result));
      |                                  ^
```

but this warning is a false alarm, as we are not really moving away
the *whole* tuple, we are just moving an element out of it. but
clang-tidy cannot tell which element we are actually moving. so, silence
the warning at both places of `std::move()`.
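
Why the pattern is safe can be shown with a minimal C++ sketch. `split_result` and `take_apart()` are hypothetical names, unrelated to the multishard reader code; the sketch only demonstrates that `std::get<I>()` on an rvalue tuple exposes a single element.

```cpp
#include <string>
#include <tuple>
#include <utility>
#include <vector>

struct split_result {
    std::string first;
    std::vector<int> second;
};

// Moves each element out of the tuple separately. Every std::move() only
// casts the tuple to an rvalue; std::get<I>() then yields an rvalue reference
// to element I alone, so the other element is left untouched.
split_result take_apart(std::tuple<std::string, std::vector<int>> t) {
    split_result r;
    r.first = std::get<0>(std::move(t));   // moves element 0 only
    r.second = std::get<1>(std::move(t));  // element 1 is still intact here
    return r;
}
```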

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18363
2024-04-23 15:47:50 +03:00
Botond Dénes
5a1e3b25d0 Merge 'Sanitize sstables::directory_semaphore usage' from Pavel Emelyanov
The semaphore in question is used to limit parallelism of manipulations with table's sstables. It's currently used in two places -- sstable_directory (mainly on boot) and by table::take_snapshot() to take snapshot. For the latter, there's also a database -> sharded<directory_semaphore> reference.

This PR sanitizes the semaphore usage. The results are
- directory_semaphore no longer needs to friend several classes that mess with its internals
- database no longer references directory_semaphore

Closes scylladb/scylladb#18281

* github.com:scylladb/scylladb:
  database: Keep local directory_semaphore to initialize sstables managers
  database: Don't reference directory_semaphore
  table: Use directory semaphore from sstables manager
  table: Indentation fix after previous patch
  table: Use directory_semaphore for rate-limited snapshot taking
  sstables: Move directory_semaphore::parallel_for_each() to header
  sstables: Move parallel_for_each_restricted to directory_semaphore
  table: Use smp::all_cpus() to iterate over all CPUs locally
2024-04-23 13:54:52 +03:00
Kefu Chai
ab4de1f470 auth: move fmt::formatter<auth::resource_kind> up
before this change, `fmt::formatter<auth::resource_kind>` is located at
line 250 in this file, but it is used at line 130. so, {fmt} is not able
to find it:

```
/usr/include/fmt/core.h:2593:45: error: implicit instantiation of undefined template 'fmt::detail::type_is_unformattable_for<auth::resource_kind, char>'
 2593 |     type_is_unformattable_for<T, char_type> _;
      |                                             ^
/usr/include/fmt/core.h:2656:23: note: in instantiation of function template specialization 'fmt::detail::parse_format_specs<auth::resource_kind, fmt::detail::compile_parse_context<char>>' requested here
 2656 |         parse_funcs_{&parse_format_specs<Args, parse_context_type>...} {}
      |                       ^
/usr/include/fmt/core.h:2787:47: note: in instantiation of member function 'fmt::detail::format_string_checker<char, auth::resource_kind, auth::resource_kind>::format_string_checker' requested here
 2787 |       detail::parse_format_string<true>(str_, checker(s));
      |                                               ^
/home/kefu/dev/scylladb/auth/resource.hh:130:29: note: in instantiation of function template specialization 'fmt::basic_format_string<char, auth::resource_kind &, auth::resource_kind &>::basic_format_string<char[65], 0>' requested here
  130 |             seastar::format("This resource has kind '{}', but was expected to have kind '{}'.", actual, expected)) {
      |                             ^
/usr/include/fmt/core.h:1578:45: note: template is declared here
 1578 | template <typename T, typename Char> struct type_is_unformattable_for;
      |                                             ^
```

in this change, `fmt::formatter<auth::resource_kind>` is moved up to
where `auth::resource_kind` is defined. so that it can be used by its
caller.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18316
2024-04-23 12:11:17 +03:00
Kefu Chai
48048c2f94 utils/to_string: include fmt/std.h if fmt >= v10
in to_string.hh, we define the specialization of
`fmt::formatter<std::optional<T>>`, which is available in {fmt} v10
and up. to avoid conditionally including `utils/to_string.hh` and
`fmt/std.h` in all source files formatting `std::optional<T>` using
{fmt}, let's include `fmt/std.h` if {fmt}'s version is greater or equal
to 10. in future, we should drop the specialization and use `fmt/std.h`
directly.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18325
2024-04-23 12:09:05 +03:00
Kefu Chai
e2d5054c53 types: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18326
2024-04-23 12:08:23 +03:00
Pavel Emelyanov
4445ee9a55 Merge 'install-dependencies.sh: add more dependencies for debian' from Kefu Chai
in this changeset, we install `libxxhash-dev` and `cargo` for debian, and install cxxbridge for all distros, so that at least debian can be built without further preparations after running `install-dependencies.sh`.

Closes scylladb/scylladb#18335

* github.com:scylladb/scylladb:
  install-dependencies.sh: move cargo out of fedora branch
  install-dependencies: install cargo and wabt for debian
  install-dependencies.sh: add libxxhash-dev for debian
2024-04-23 12:04:47 +03:00
Lakshmi Narayanan Sreethar
de6570e1ec serializer_impl, sstables: fix build failure due to missing includes
When building scylla with cmake, it fails due to missing includes in
serializer_impl.hh and sstables/compress.hh files. Fix that by adding
the appropriate include files.

Fixes #18343

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#18344
2024-04-23 12:03:51 +03:00
Kefu Chai
826f413cad thrift: avoid use-after-move in make_non_overlapping_ranges()
in handler.cc, `make_non_overlapping_ranges()` references a moved
instance of `ColumnSlice` when something unexpected happens, in order to
format the error message in an exception. the move constructor of
`ColumnSlice` is default-generated, so the members' move constructors
are used to construct the new instance in the move constructor. this
could lead to undefined behavior when dereferencing the moved instance.

in this change, in order to avoid the use-after-move, let's keep
a copy of the referenced member variables and reference them when
formatting the error message in the exception.

this use-after-move issue was introduced in 822a315dfa, which implemented
`get_multi_slice` verb and this piece in the first place. since both 5.2
and 5.4 include this commit, we should backport this change to them.

Refs 822a315dfa
Fixes #18356
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18358
2024-04-23 12:02:09 +03:00
Kefu Chai
ad2c26824a main: do not reference moved variable
before this change, we dereference `linfo` after moving it away.
and clang-tidy warns us like

```
[19/171] Building CXX object CMakeFiles/scylla.dir/main.cc.o
/home/kefu/dev/scylladb/main.cc:559:12: warning: 'linfo' used after it was moved [bugprone-use-after-move]
  559 |     return linfo.host_id;
      |            ^
/home/kefu/dev/scylladb/main.cc:558:36: note: move occurred here
  558 |     sys_ks.local().save_local_info(std::move(linfo), snitch.local()->get_location(), broadcast_address, broadcast_rpc_address).get();
      |                                    ^
```

the default-generated move constructor of `local_info` uses the
default-generated move constructor of `locator::host_id`, which in turn
uses the default-generated move constructor of
`utils::tagged_uuid<struct host_id_tag>`, and then `utils::UUID`'s
move constructor. `UUID` does not contain any movable resources; all
it has is two `int64_t` member variables, so this is a benign
issue. but still, it is distracting.

in this change, we keep the value of `host_id` locally, and return it
instead to silence this warning, and to improve the maintainability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18362
2024-04-23 11:58:58 +03:00
Patryk Jędrzejczak
14911051ee db: config: introduce force-gossip-topology-changes
We are going to make the `consistent-topology-changes` experimental
feature unused in 6.0. However, the topology upgrade procedure will
be manual and voluntary, so some 6.0 clusters will be using the
gossip-based topology. Therefore, we need to continue testing the
gossip-based topology. The solution is introducing a new flag,
`force-gossip-topology-changes`, that will enforce the gossip-based
topology in a fresh cluster.

In this patch, we only introduce the parameter without any effect.
Here is the explanation. Making `consistent-topology-changes` unused
and introducing `force-gossip-topology-changes` requires adjustments
in scylla-dtest. We want to merge changes to scylladb and scylla-dtest
in a way that ensures all tests are run correctly during the whole
process. If we merged all changes to scylladb first, before merging
the scylla-dtest changes, all tests would run with the raft-based
topology and the ones excluded in the raft-based topology would fail.
We also can't merge all changes to scylla-dtest first. However, we
can follow this plan:
1. scylladb: merge this patch
2. scylla-dtest: start using `force-gossip-topology-changes`
   in jobs that run without the raft-based topology
3. scylladb: merge the rest of the changes
4. scylla-dtest: merge the rest of the changes

Ref scylladb/scylladb#17802

Closes scylladb/scylladb#18284
2024-04-23 09:42:46 +02:00
Botond Dénes
275ed9a9bc replica/mutation_dump: create_underlying_mutation_sources(): remove false move
transformed_cr is moved in a loop, in each iteration. This is harmless
because the variable is const and the move has no effect, yet it is
confusing to readers and triggers false positives in clang-tidy
(moved-from object reused). Remove it.
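
The "move has no effect" claim can be demonstrated with a short C++ sketch. `replicate()` is a hypothetical function written for illustration and is not the code in `create_underlying_mutation_sources()`.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Builds `n` copies of `value`. Because `value` is const-qualified,
// std::move(value) produces a `const std::string&&`, which cannot bind to the
// move overload; the copying overload is selected, so `value` is never emptied.
std::vector<std::string> replicate(const std::string& value, std::size_t n) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(std::move(value));  // copies: the source is const
    }
    return out;
}
```

Every iteration sees the original value, which is why the construct is harmless, yet dropping the `std::move()` states the intent more clearly.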

Fixes: #18322

Closes scylladb/scylladb#18348
2024-04-23 01:21:36 +02:00
Kamil Braun
e9285e5c04 Merge 'various fixes for topology coordinator' from Gleb
The series contains fixes for some problems found during scalability
testing and one clean up patch.

Ref: scylladb/scylladb#17545

* 'gleb/topology-fixes-v4' of github.com:scylladb/scylla-dev:
  gossiper: disable status check for endpoints in raft mode
  storage_service: introduce a setter for topology_change_kind
  topology coordinator: drop unused structure
  storage_service: yield in get_system_mutations
2024-04-22 17:37:47 +02:00
Calle Wilund
82d97da3e0 commitlog: Remove (benign) use-after-move
Fixes #18329

named_file::assign call uses old object "known_size" after a move
of the object. While this is wholly ok, since the attribute accessed
will not be modified/destroyed by the move, it causes warnings in
"tidy" runs, and might confuse or cause real errors should impl. change.

Closes scylladb/scylladb#18337
2024-04-22 17:20:19 +03:00
Ferenc Szili
c528597a84 sstables: add docs changes for system.large_partitions
This commit updates the documentation for the new column
range_tombstones in system.large_partitions.
2024-04-22 15:25:41 +02:00
Ferenc Szili
98bec4e02a sstable: large data handler needs to count range tombstones as rows
When issuing warnings about partitions with the number of rows above a configured threshold,
the large partitions handler does not take into consideration the number of range tombstone
markers in the total rows count. This fix adds the number of range tombstone markers to the
total number of rows and saves this total in system.large_partitions.rows (if it is above
the threshold). It also adds a new column range_tombstones to the system.large_partitions
table which only contains the number of range tombstone markers for the given partition.

This PR fixes the first part of issue #13968
It does not cover distinguishing between live and dead rows. A subsequent PR will handle that.
2024-04-22 15:24:18 +02:00
Kefu Chai
ff04375016 main: drop unused namespace alias
`fs` namespace alias was introduced in ff4d8b6e85, but we don't
use it anymore. so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18308
2024-04-22 13:50:28 +03:00
Nadav Har'El
59b40484c8 Update seastar submodule
* seastar 8fabb30a...2b43417d (6):
  > future: deprecate future::get0()
  > build: do not export valgrind with export()
  > http: deprecate buggy path param[]
  > http/request: add get_path_param method
  > http/request: get_query_param refactor
  > http/util: add path_decode method

Refs #5883 (fixes https://github.com/scylladb/seastar/issues/725 and
provides a new API to read the decoded paths).

Closes scylladb/scylladb#18297
2024-04-22 11:12:49 +03:00
Kefu Chai
85406a450c install-dependencies.sh: move cargo out of fedora branch
so that we install cxxbridge-cmd on all distros, and cxxbridge is
available when building scylladb.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-22 15:41:20 +08:00
Kefu Chai
835742af6d install-dependencies: install cargo and wabt for debian
cargo is used for installing cxxbridge-cmd, which is in turn used
when building the cxx bindings for the rust modules. so we need it
on all distros.

in this change, we add cargo for debian. so that we don't have
build failure like:

```
CMake Error at rust/CMakeLists.txt:32 (find_program):
  Could not find CXXBRIDGE using the following names: cxxbridge
```

for similar reason, we also need wabt, which provides wasm2wat,
without which, we'd have

```
CMake Error at test/resource/wasm/CMakeLists.txt:1 (find_program):
  Could not find WASM2WAT using the following names: wasm2wat
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-22 15:41:20 +08:00
Kefu Chai
a70a288627 install-dependencies.sh: add libxxhash-dev for debian
libxxhash is used for building on both fedora and debian. `xxhash-devel`
is already listed in `fedora_packages`, we should have its counterpart
in `debian_base_packages`. otherwise the build on debian and its
derivatives could fail like

```
CMake Error at /usr/local/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find xxHash (missing: xxhash_LIBRARY xxhash_INCLUDE_DIR) (found
  version "")
Call Stack (most recent call first):
  /usr/local/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  cmake/FindxxHash.cmake:30 (find_package_handle_standard_args)
  CMakeLists.txt:75 (find_package)
```

if we are using CMake to generate the build system. if we use
`configure.py` to generate `build.ninja`, the build would fail at
build time.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-22 15:22:51 +08:00
Gleb Natapov
0c77e96b0b storage_service: topology_coordinator: fix indentation after previous patch 2024-04-21 18:53:21 +03:00
Gleb Natapov
b8ee8911ca storage_service: topology coordinator: drop ring check in node_state::replacing state
Always modify topology metadata in node_state::replacing state. There is
no dependency on the ring value at all.
2024-04-21 18:53:04 +03:00
Gleb Natapov
06e6ed09ed gossiper: disable status check for endpoints in raft mode
Gossiper automatically removes endpoints that do not have tokens in
normal state and either do not send gossiper updates or are dead for a
long time. We do not need this with topology coordinator mode since in
this mode the coordinator is responsible for managing the set of nodes in
the cluster. In addition the patch disables quarantined endpoint
maintenance in gossiper in raft mode and uses left node list from the
topology coordinator to ignore updates for nodes that are no longer part
of the topology.
2024-04-21 16:36:07 +03:00
Gleb Natapov
0e3f92fa49 storage_service: introduce a setter for topology_change_kind
In the next patch we will extend it to have other side effects.
2024-04-21 16:36:07 +03:00
Gleb Natapov
040c6ca0c1 topology coordinator: drop unused structure 2024-04-21 16:36:07 +03:00
Gleb Natapov
d0a00f3489 storage_service: yield in get_system_mutations
Yield in a loop that converts a result to canonical_mutation. We
observed stalls for very large tables.
2024-04-21 16:36:07 +03:00
Avi Kivity
87b08c957f Merge 'treewide: drop FMT_DEPRECATED_OSTREAM macro and homebrew range formatters' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h` for formatting the container types, like vector, map, optional and variant, using {fmt} instead of the homebrew formatter based on operator<<.
with this change, the changes adding fmt::formatter and the changes using ostream formatter explicitly, we are allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Closes scylladb/scylladb#17968

* github.com:scylladb/scylladb:
  treewide: do not define FMT_DEPRECATED_OSTREAM
  treewide: include fmt/ranges.h and/or fmt/std.h
  utils/managed_bytes: add support for fmt::to_string() to bytes and friends
2024-04-20 22:25:00 +03:00
Mikołaj Grzebieluch
65cfb9b4e0 storage_service: skip wait_for_gossip_to_settle if topology changes are based on raft
Waiting for gossip to settle slows down the bootstrap of the cluster.
It is safe to disable it if the topology is based on Raft.

Fixes scylladb/scylladb#16055

Closes scylladb/scylladb#17960
2024-04-20 17:56:51 +02:00
Pavel Emelyanov
67a408447f snapshot_ctl: Brush up true_snapshots_size() internals
Previous patches broke indentation in this method. Fix it by shortening
the summation loop with the help of std::accumulate()
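
A summation of this kind might look like the following C++ sketch. The `snapshot_details` struct, its fields, and the choice of summing the `live` field are assumptions made for illustration; the real `true_snapshots_size()` in snapshot_ctl may differ.

```cpp
#include <cstdint>
#include <map>
#include <numeric>
#include <string>
#include <utility>

// Hypothetical per-snapshot details.
struct snapshot_details {
    std::int64_t live;
    std::int64_t total;
};

// Sums the live size over all snapshots with std::accumulate()
// instead of a hand-written loop.
std::int64_t true_snapshots_size(const std::map<std::string, snapshot_details>& snapshots) {
    return std::accumulate(snapshots.begin(), snapshots.end(), std::int64_t(0),
        [] (std::int64_t sum, const std::pair<const std::string, snapshot_details>& entry) {
            return sum + entry.second.live;
        });
}
```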

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 21:06:06 +03:00
Pavel Emelyanov
50add3314d snapshot_ctl: Remove unused details struct
Now the details are manipulated via some other structs and this one can
just be removed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 20:04:34 +03:00
Pavel Emelyanov
e8f10be12e snapshot_ctl: No double recoding of details
Currently database::get_snapshot_details() returns a collection of
snapshots. The snapshot_ctl converts this collection into similarly
looking one with slightly different structures inside. The resulting
collection is converted one more time on the API layer into another
similarly looking map.

This patch removes the intermediate conversion.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 20:04:32 +03:00
Pavel Emelyanov
8ec3f057a8 database,snapshots: Move database::snapshot_details into snapshot_ctl
Similar to how it was done for table::snapshot_details

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 20:04:29 +03:00
Pavel Emelyanov
f6bc283bbb database,snapshots: Make database::get_snapshot_details() return map, not vector
So that it's in-sync with table::get_snapshot_details(). Next patches
will improve this place even further.

Also, there can be many snapshots and vector can grow large, but that's
less of an issue here.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 20:04:25 +03:00
Pavel Emelyanov
a36c13beb3 table,snapshots: Move table::snapshot_details into snapshot_ctl
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 19:59:34 +03:00
Kefu Chai
372a4d1b79 treewide: do not define FMT_DEPRECATED_OSTREAM
since we do not rely on FMT_DEPRECATED_OSTREAM to define the
fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`.

in this change,

* utils: drop the range formatters in to_string.hh and to_string.c, as
  we don't use them anymore. and the tests for them in
  test/boost/string_format_test.cc are removed accordingly.
* utils: use fmt to print chunk_vector and small_vector. as
  we are not able to print the elements using operator<< anymore
  after switching to {fmt} formatters.
* test/boost: specialize fmt::details::is_std_string_like<bytes>
  due to a bug in {fmt} v9, {fmt} fails to format a range whose
  element type is `basic_sstring<uint8_t>`, as it considers it
  as a string-like type, but `basic_sstring<uint8_t>`'s char type
  is signed char, not char. this issue does not exist in {fmt} v10,
  so, in this change, we add a workaround to explicitly specialize
  the type trait to assure that {fmt} format this type using its
  `fmt::formatter` specialization instead of trying to format it
  as a string. also, {fmt}'s generic ranges formatter calls the
  pair formatter's `set_brackets()` and `set_separator()` methods
  when printing the range, but the operator<< based formatter does not
  provide these methods, so we have to include this change in the change
  switching to {fmt}, otherwise the change specializing
  `fmt::details::is_std_string_like<bytes>` won't compile.
* test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends
  for comparing values. but without the operator<< based formatters,
  Boost.Test would not be able to print them. after removing
  the homebrew formatters, we need to use the generic
  `boost_test_print_type()` helper to do this job. so we are
  including `test_utils.hh` in tests so that we can print
  the formattable types.
* treewide: add `#include "utils/to_string.hh"` where
  `fmt::formatter<optional<>>` is used.
* configure.py: do not define FMT_DEPRECATED_OSTREAM
* cmake: do not define FMT_DEPRECATED_OSTREAM

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:57:36 +08:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map,
optional and variant, using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Kefu Chai
01f13850cb utils/managed_bytes: add support for fmt::to_string() to bytes and friends
in 3835ebfcdc, `fmt::formatter` was added to `bytes` and friends, but
their `format()` methods were intentionally implemented as plain
methods, which only accept `fmt::format_context`. it was a deliberate
decision. the intention was to reduce the usage of template, to speed
up the compilation at the expense of dropping the support of other
appenders, notably the one used by `fmt::to_string()`, where the type
of "format_context" is not a `fmt::format_context`, but a string
appender. but it turns out we still have users in tests using
`fmt::to_string()`, to convert, for instance, `bytes` to `std::string`.

so, to make their life easier, we add the templated `format()` to
these types. an alternative is to change the callers to use something
like `fmt::format("{}", v)`, which is less convenient though.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:13 +08:00
Kefu Chai
5ab527e669 main: do not echo parsed options when calling scylla interactively
in 2f0f53ac, we added logging of parsed command line options so that we
can see how scylla is launched in case it fails to boot. but when scylla
is called interactively in a console, this echo is a little bit annoying.
see following console session
```console
$ scylla --help-loggers
Scylla version 5.5.0~dev-0.20240419.3c9651adf297 with build-id 7dd6a110e608535e5c259a03548eda6517ab4bde starting ...
command used: "./RelWithDebInfo/scylla --help-loggers"
pid: 996503
parsed command line options: [help-loggers]
Available loggers:
    BatchStatement
    LeveledManifest
    alter_keyspace
    alter_table
...
```

so in this change, we check if the stdin is associated with a terminal
device. if that is the case, we don't print the scylla version, parsed
command line and pid. and the interactive session looks like:

```console
$ scylla --help-loggers
Available loggers:
    BatchStatement
    LeveledManifest
    alter_keyspace
    alter_table
```
no more distracting information printed. the original behavior
can be tested like:

```console
$ : | ./RelWithDebInfo/scylla --help-loggers
```

assuming scylla is always launched with systemd, which connects
stdin to /dev/null. see
https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Logging%20and%20Standard%20Input/Output
. so this behavior is preserved with this change.
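
the check described above can be sketched roughly as follows, assuming
POSIX `isatty()`. the helper name `should_print_startup_banner` is
hypothetical and is not taken from the scylla source:

```cpp
#include <unistd.h>  // isatty(), STDIN_FILENO (POSIX)

// Hypothetical helper: returns true when the startup banner (version,
// command line, pid) should be printed, i.e. when the given descriptor
// is NOT attached to a terminal device.
inline bool should_print_startup_banner(int fd = STDIN_FILENO) {
    return isatty(fd) == 0;
}
```

with stdin connected to a pipe or to /dev/null, as under systemd, the
helper returns true and the banner is kept.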

Refs #4203

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18309
2024-04-19 15:00:05 +03:00
Raphael S. Carvalho
223214439b compaction: Disconsider active tables in the hourly compaction reevaluation
This hourly reevaluation is there to help tablets that have very low
write activity, which can go a long time without flushing a memtable,
and it's important to reevaluate compaction as data can get expired.
Today it can happen that we reevaluate a table that is being compacted
actively, which is a waste of CPU as the reevaluation will happen anyway
when there are changes to the sstable set. This waste can be amplified with
a significant tablet count in a given shard.
Eventually, we could make the reevaluation time per table based on
expiration histogram, but until we get there, let's avoid this waste
by only reevaluating tables that are compaction idle for more than 1h.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18280
2024-04-19 14:33:40 +03:00
Pavel Emelyanov
ba58b71eea database: Keep local directory_semaphore to initialize sstables managers
Now database is constructed with sharded<directory_semaphore>, but it no
longer needs sharded, local is enough.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 13:53:57 +03:00
Pavel Emelyanov
53909da390 database: Don't reference directory_semaphore
It was only used by table taking snapshot code. Now it uses sstables
manager's reference and database can stop carrying it around.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 13:53:57 +03:00
Pavel Emelyanov
be5bc38cde table: Use directory semaphore from sstables manager
Since a table naturally iterates over its sstables, get the semaphore
from the sstables manager, not from the database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 13:53:57 +03:00
Pavel Emelyanov
7e7dd2649b table: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 13:53:57 +03:00
Pavel Emelyanov
2fced3c557 table: Use directory_semaphore for rate-limited snapshot taking
The table::take_snapshot() already limits its parallelism with the help of
the directory semaphore, but implements it "by hand". There's already a
parallel_for_each() method on the directory_semaphore class that does exactly that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 13:53:57 +03:00
Pavel Emelyanov
6514c67fae sstables: Move directory_semaphore::parallel_for_each() to header
It's a template and in order to use it in other .cc files it's more
convenient to move it into a header file

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 13:53:57 +03:00
Pavel Emelyanov
ad1a9d4c11 sstables: Move parallel_for_each_restricted to directory_semaphore
In order not to make sstable_directory mess with private members of this
class. Next patch will also make use of this new method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 13:53:57 +03:00
Pavel Emelyanov
0d2178202d table: Use smp::all_cpus() to iterate over all CPUs locally
Currently it uses irange(0, smp::count), but seastar provides
a convenient helper call for the very same range object.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 13:53:57 +03:00
Kefu Chai
a5dae74aee doc: update nodetool setlogginglevel sample output with most recent loggers list
in order to reduce the confusion like:

> I cannot find foobar in the list, is it supported?

also, take this opportunity to use "console" instead of "shell" for
rendering the code block. it's a better fit in this case. since we
are using Pygments for syntax highlighting,
see https://pygments.org/docs/lexers/#pygments.lexers.shell.BashSessionLexer
for details on the "console" lexer.

and add a prompt before the command line, so that "console" lexer
can render the command line and output better.

also, add a note explaining that users should refer to the output of
`scylla` to see the list of logger classes.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18311
2024-04-19 13:25:39 +03:00
Kefu Chai
c04654e865 storage_service: capture this explicitly
clang-19 complains with `-Wdeprecated-this-capture`:

```
/home/kefu/dev/scylladb/service/storage_service.cc:5837:22: error: implicit capture of 'this' with a capture default of '=' is deprecated [-Werror,-Wdeprecated-this-capture]
 5837 |         auto* node = get_token_metadata().get_topology().find_node(dst.host);
      |                      ^
/home/kefu/dev/scylladb/service/storage_service.cc:5830:44: note: add an explicit capture of 'this' to capture '*this' by reference
 5830 |     co_await transit_tablet(table, token, [=] (const locator::tablet_map& tmap, api::timestamp_type write_timestamp) {
      |                                            ^
      |                                             , this
```

https://open-std.org/JTC1/SC22/WG21/docs/papers/2018/p0806r2.html
was approved, see https://eel.is/c++draft/depr.capture.this, and newer
versions of C++ compilers implement it. so we need to capture `this`
explicitly to be more standard compliant, and to be more future-proof
in this regard.
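
a minimal sketch of the explicit capture, assuming a C++20 compiler. the
`counter` class is hypothetical and only illustrates the capture syntax,
it is not the code touched by this change:

```cpp
#include <functional>

// Hypothetical class, used only to illustrate the capture syntax.
struct counter {
    int base = 10;

    std::function<int(int)> make_adder() {
        // With a plain [=], `this` would be captured implicitly, which is
        // deprecated in C++20. Naming `this` in the capture list makes the
        // by-reference access to *this explicit.
        return [=, this] (int x) { return base + x; };
    }
};
```

since `this` is captured as a pointer, the lambda observes later changes
to `base`, which is exactly the behavior the implicit capture had.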

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18306
2024-04-19 10:05:57 +03:00
Kefu Chai
168ade72f8 treewide: replace formatter<std::string_view> with formatter<string_view>
before v10, {fmt} provides the specialization of `fmt::formatter<..>`
for `std::string_view` as well as the specialization of `fmt::formatter<..>`
for `fmt::string_view`, which is an implementation builtin in {fmt} for
compatibility with pre-C++17 code. this type is used even if the code is
compiled with a C++ standard greater than or equal to C++17. also, before v10,
the `fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. after v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using `format_as()` machinery, so its
`format()` method does not actually accept `std::string_view`, it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.

this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would have following compilation failure:

```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
  254 |     return formatter<std::string_view>::format(it->second, ctx);
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
 2759 |   FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
      |                      ^      ~~~~~~~~~~~~
```

because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::formatter<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood, as we do not
use the specialization created by `FMT_FORMAT_AS` which inherits
from `formatter<fmt::string_view>`, hopefully this can improve
the compilation speed a little bit. also, this change addresses
the build failure with {fmt} v10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18299
2024-04-19 07:44:07 +03:00
Avi Kivity
6e487a49aa Merge 'toolchain: support building an optimized clang' from Takuya ASADA
This is the complete version of #12786, since I took over the issue from @mykaul.
Updates from the original version are:
- Support ARM64 build (disable BOLT for now since it doesn't function)
- Changed toolchain settings so that current scylla can be built (support WASM, etc)
- Stop git-cloning the scylladb repo; create a git-archive of the current scylla directory and import it
- Update Clang to 17.0.6
- Save the entire clang directory for BUILD mode, not just the /usr/bin/clang binary
- Implemented INSTALL_PREBUILT mode to install a prebuilt image which was built in BUILD mode

Note that this patch drops cross-build support of the frozen toolchain, since building clang and scylla multiple times in qemu-user-static would be very slow and is not usable.
Instead, we should build the image for each architecture natively.

----

This is a different attempt to combine building an optimized clang (using LTO, PGO and BOLT, based on compiling ScyllaDB) with dbuild. Per Avi's request, there are 3 options: skip this phase (which is the current default), build it, or build and install it to the default path.

Fixes: #10985
Fixes: scylladb/scylla-enterprise#2539

Closes scylladb/scylladb#17196

* github.com:scylladb/scylladb:
  toolchain: support building an optimized clang
  configure.py: add --build-dir option
2024-04-18 19:20:23 +00:00
Anna Stuchlik
a3481a4566 doc: document the system_auth_v2 feature
This commit includes updates related to replacing system_auth with system_auth_v2.

- The keyspace name system_auth is renamed to system_auth_v2.
- The procedures are updated to account for system_auth_v2.
- No longer required system_auth RF changes are removed from procedures.
- The information is added that if the consistent topology updates feature
  was not enabled upon upgrade from 5.4, there are limitations or additional
  steps to do (depending on the procedure).
  The files with that kind of information are to be found in _common folders
  and included as needed.
- The upgrade guide has been updated to reflect system_auth_v2 and related impacts.

Closes scylladb/scylladb#18077
2024-04-18 18:33:49 +02:00
Kefu Chai
21b03d2ce3 topology_coordinator: remove unused variable
when compiling the tree with clang-19, it complains:

```
/home/kefu/dev/scylladb/service/topology_coordinator.cc:1968:31: error: variable 'reject' set but not used [-Werror,-Wunused-but-set-variable]
 1968 |                     if (auto* reject = std::get_if<join_node_response_params::rejected>(&validation_result)) {
      |                               ^
1 error generated.
```
so, even though we evaluate the assignment statement to see whether it
evaluates to true or false, the compiler still believes that the
variable is not used. probably, the value of the statement is not
dependent on the value being assigned. either way,
let's use `std::holds_alternative<..>` instead of `std::get_if<..>`,
to silence this warning, and the code is a little bit more compact,
in the sense of fewer tokens in the `if` statement.

in order to be self-contained, we take the opportunity to
include `<variant>` in this source file, as a function declared
in this header is used.
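
a minimal sketch of the replacement pattern. the `accepted` and `rejected`
types below are hypothetical stand-ins for the real response types, used
only for illustration:

```cpp
#include <string>
#include <variant>

// Hypothetical stand-ins for the accepted/rejected response types.
struct accepted {};
struct rejected { std::string reason; };
using validation_result = std::variant<accepted, rejected>;

// Only the active alternative matters here, not the payload of `rejected`,
// so no pointer variable is bound in the condition and nothing is left
// "set but not used".
inline bool is_rejected(const validation_result& r) {
    return std::holds_alternative<rejected>(r);
}
```

`std::get_if<..>` remains the better fit when the payload is actually read
inside the `if` body.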

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18291
2024-04-18 18:04:56 +03:00
Amnon Heiman
e8410848a8 Update seastar submodule
This patch updates the seastar submodule to get the latest safety patch for the metric layer.
The latest patch allows manipulating metric_families early in the
start-up process and is safer if someone chooses to aggregate summaries.

* seastar f3058414...8fabb30a (4):
  > stall-analyser: improve stall pattern matching
  > TLS: Move background BYE handshake to engine::run_in_background
  > metrics.cc: Safer set_metric_family_configs
  > src/core/metrics.cc: handle SUMMARY add operator

Closes scylladb/scylladb#18293
2024-04-18 18:02:28 +03:00
Tomasz Grabiec
393cb54c01 Merge 'Generalize tablet transition API calls' from Pavel Emelyanov
Recently, add_tablet_replica and del_tablet_replica API calls were added that copy a big portion of the existing move_tablet API call's logic. This PR generalizes the common parts.

Closes scylladb/scylladb#18272

* github.com:scylladb/scylladb:
  tablets: Generalize transition mutations preparation
  tablets: Generalize tablet-already-in-transition check
  tablets: Generalize raft communications for tablet transition API calls
  tablets: Drop src vs dst equality check from move_tablet()
2024-04-18 14:42:10 +02:00
Anna Stuchlik
ad81f9f56a doc: replace Scylla with ScyllaDB in Glossary
This commit replaces "Scylla" with "ScyllaDB" on the Glossary page.

The product has been rebranded as "ScyllaDB".

Closes scylladb/scylladb#18296
2024-04-18 14:59:23 +03:00
Kamil Braun
9c2a836607 Revert "Merge 'Drain view_builder in generic drain' from ScyllaDB"
This reverts commit 298a7fcbf2, reversing
changes made to 5cf53e670d.

The change made CI flaky.

Fixes: scylladb/scylladb#18278
2024-04-18 11:50:41 +02:00
Aleksandr Bykov
e8833c6f2a test: Kill coordinator during topology operation
If the coordinator node is killed, restarted, or becomes
inoperable during a topology operation, a new coordinator should be elected,
the operation should be aborted and the cluster should be rolled back.

Error injection is used to kill the coordinator before streaming
starts.

Closes scylladb/scylladb#16197
2024-04-17 17:24:20 +02:00
Tomasz Grabiec
c6c8347493 migration_manager: Pull all of group0 state on repair
Current code uses non-raft path to pull the schema, which violates
group0 linearizability because the node will have latest schema but
miss group0 updates of other system tables. In particular,
system.tablets. This manifests as repair errors due to missing
tablet_map for a given table when trying to access it. Tablet map is
always created together with the table in the same group0 command.

When a node is bootstrapping, repair calls sync_schema() to make
sure local schema is up to date. This races with group0 catch up,
and if sync_schema() wins, repair may fail on a missing tablet map.

Fix by making sync_schema() do a group0 read barrier when in raft
mode.

Fixes #18002

Closes scylladb/scylladb#18175
2024-04-17 16:21:05 +02:00
Yaron Kaikov
44d1ffe86b [github] add PR template
Today with the backport automation, the developer adds the relevant backport label, but without any explanation of why.

Add the PR template with a placeholder for the developer to explain the decision on whether to backport or not.

The placeholder is marked as a task, so once the explanation is added, the task must be checked as completed
2024-04-17 15:40:32 +03:00
Nadav Har'El
e78fc75323 Merge 'tools/scylla-nodetool: add doc link for getsstables and sstableinfo commands' from Botond Dénes
Just like all the other commands already have it. These commands didn't have documentation at the point where they were implemented, hence the missing doc link.

The links don't work yet, but they will work once we release 6.0 and the current master documentation is promoted to stable.

Closes scylladb/scylladb#18147

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: fix typo: Fore -> For
  tools/scylla-nodetool: add doc link for getsstables and sstableinfo commands
2024-04-17 15:15:56 +03:00
Asias He
642f9a1966 repair: Improve estimated_partitions to reduce memory usage
Currently, we use the sum of the estimated_partitions from each
participant node as the estimated_partitions for sstable produced by
repair. This way, the estimated_partitions is the biggest possible
number of partitions repair would write.

Since repair will write only the difference between repair participant
nodes, using the biggest possible estimation will overestimate the
partitions written by repair, most of the time.

The problem is that overestimated partitions makes the bloom filter
consume more memory. It is observed that it causes OOM in the field.

This patch changes the estimation to use a fraction of the average
partitions per node instead of sum. It is still not a perfect estimation
but it already improves memory usage significantly.
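
A rough sketch of the two estimation strategies described above. The
function names are hypothetical, and the fraction is left as a
caller-supplied parameter because this message does not state the value
actually used:

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Old approach: the sum of the per-node estimates, i.e. the biggest
// possible number of partitions repair could write.
inline uint64_t estimate_by_sum(const std::vector<uint64_t>& per_node) {
    return std::accumulate(per_node.begin(), per_node.end(), uint64_t{0});
}

// New approach: a fraction of the per-node average. `fraction` is an
// assumption supplied by the caller, not a value taken from the patch.
inline uint64_t estimate_by_average_fraction(const std::vector<uint64_t>& per_node,
                                             double fraction) {
    if (per_node.empty()) {
        return 0;
    }
    const double average =
        static_cast<double>(estimate_by_sum(per_node)) / per_node.size();
    return static_cast<uint64_t>(average * fraction);
}
```

A smaller estimate translates directly into a smaller bloom filter for
the sstable written by repair.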

Fixes #18140

Closes scylladb/scylladb#18141
2024-04-17 14:31:38 +03:00
Pavel Emelyanov
1b2cd56bcc tablets: Generalize transition mutations preparation
Tablet transition handlers prepare two mutations -- one for tablets
table, that sets transition state, transition mode and few others; and
another one for topology table that "activates" the tablet_migration
state for topology coordinator.

The latter is common to all three handlers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-17 12:01:51 +03:00
Pavel Emelyanov
3beccb8165 tablets: Generalize tablet-already-in-transition check
Continuation of the previous patch -- there's a common sanity check of
tablet transition API handlers, namely that this tablet is not in
transition already.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-17 12:01:02 +03:00
Pavel Emelyanov
14923812ad tablets: Generalize raft communications for tablet transition API calls
There are three transition calls -- move, add replica and del replica --
and all three work similarly. In a loop they try to get guard for raft
operation, then perform sanity checks on topology state, then prepare
mutations and then try to apply them to raft. After the loop finishes
all three wait for transition for the given tablet to complete.

This patch generalizes the raft kicking loop and the transition
completion waiting code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-17 11:59:03 +03:00
Pavel Emelyanov
c4d538320e tablets: Drop src vs dst equality check from move_tablet()
The code here looks like this

    if src.host == dst.host
        throw "Local migration not possible"

    if src == dst
        co_return;

The 2nd check is apparently never satisfied -- if src == dst this means
that src.host == dst.host and it should have thrown already
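
The reasoning can be sketched as below. The `replica` struct and
`check_move` function are hypothetical simplifications (a host plus a
shard), not the real types:

```cpp
#include <stdexcept>

// Hypothetical stand-in for a tablet replica: a host plus a shard on it.
struct replica {
    int host;
    unsigned shard;

    friend bool operator==(const replica& a, const replica& b) {
        return a.host == b.host && a.shard == b.shard;
    }
};

// Returns normally only when src and dst are on different hosts.
inline void check_move(const replica& src, const replica& dst) {
    if (src.host == dst.host) {
        throw std::runtime_error("Local migration not possible");
    }
    // A following `if (src == dst)` would be dead code: equal replicas
    // have equal hosts, so the throw above has already fired.
}
```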

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-17 11:57:10 +03:00
Kefu Chai
e431e7dc16 test: paritioner_test: print using fmt::print()
instead of using `operator<<`, use `fmt::print()` to
format and print, so we can ditch the `operator<<`-based formatters.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18259
2024-04-17 07:13:20 +03:00
Kefu Chai
0ff28b2a2a test: extract boost_test_print_type() into test_utils.hh
since Boost.Test relies on operator<< or `boost_test_print_type()`
to print the value of variables being compared, instead of defining
the fallback formatter of `boost_test_print_type()` for each
individual test, let's define it in `test/lib/test_utils.hh`, so
that it can be shared across tests.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18260
2024-04-17 07:12:39 +03:00
Kefu Chai
2bb8e7c3c3 utils: include "seastarx.hh" in composite_abort_source.hh
there is a chance that `utils/small_vector.hh` does not include
`using namespace seastar`, and even if it does, we should not rely
on it. but if it does not, checkhh would fail. so let's include
"seastarx.hh" in this header, so it is self-contained.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18265
2024-04-17 07:11:01 +03:00
David Garcia
6707bc673c docs: update theme 1.7
Closes scylladb/scylladb#18252
2024-04-16 13:48:11 +02:00
Kamil Braun
eb9ba914a3 Merge 'Set dc and rack in gossiper when loaded from system.peers and load the ignored nodes state for replace' from Benny Halevy
The problem this series solves is correctly ignoring DOWN nodes state
when replacing a node.

When a node is replaced and there are other nodes that are down, the
replacing node is told to ignore those DOWN nodes using the
`ignore_dead_nodes_for_replace` option.

Since the replacing node is bootstrapping it starts with an empty
system.peers table so it has no notion about any node state and it
learns about all other nodes via gossip shadow round done in
`storage_service::prepare_replacement_info`.

Normally, since the DOWN nodes to ignore already joined the ring, the
remaining node will have their endpoint state already in gossip, but if
the whole cluster was restarted while those DOWN nodes did not start,
the remaining nodes will only have a partial endpoint state from them,
which is loaded from system.peers.

Currently, the partial endpoint state contains only `HOST_ID` and
`TOKENS`, and in particular it lacks `STATUS`, `DC`, and `RACK`.

The first part of this series loads also `DC` and `RACK` from
system.peers to make them available to the replacing node as they are
crucial for building a correct replication map with network topology
replication strategy.

But still, without a `STATUS` those nodes are not considered as normal
token owners yet, and they do not go through handle_state_normal which
adds them to the topology and token_metadata.

The second part of this series uses the endpoint state retrieved in the
gossip shadow round to explicitly add the ignored nodes' state to
topology (including dc and rack) and token_metadata (tokens) in
`prepare_replacement_info`.  If there are more DOWN nodes that are not
explicitly ignored replace will fail (as it should).

Fixes scylladb/scylladb#15787

Closes scylladb/scylladb#15788

* github.com:scylladb/scylladb:
  storage_service: join_token_ring: load ignored nodes state if replacing
  storage_service: replacement_info: return ignore_nodes state
  locator: host_id_or_endpoint: keep value as variant
  gms: endpoint_state: add getters for host_id, dc_rack, and tokens
  storage_service: topology_state_load: set local STATUS state using add_saved_endpoint
  gossiper: add_saved_endpoint: set dc and rack
  gossiper: add_saved_endpoint: fixup indentation
  gossiper: add_saved_endpoint: make host_id mandatory
  gossiper: add load_endpoint_state
  gossiper: start_gossiping: log local state
2024-04-16 10:27:36 +02:00
Pavel Emelyanov
2c3d6fe72f storage_proxy: Simplify create_hint_sync_point() code
It tries to call container().invoke_on_all() the hard way.
Calling it directly is not possible, because there's no
sharded::invoke_on_all() const overload

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18202
2024-04-16 07:26:06 +03:00
Nadav Har'El
a175e34375 cql-pytest: add instructions on how to get Cassandra
The cql-pytest framework allows running tests also against Cassandra,
but developers need to install Cassandra on their own because modern
distributions such as Fedora no longer carry a Cassandra package.

This patch adds clear and easy to follow (I think) instructions on how
to download a pre-compiled Cassandra, or alternatively how to download
and build Cassandra from source - and how either can be used with the
test/cql-pytest/run-cassandra script.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18138
2024-04-16 07:23:36 +03:00
Botond Dénes
298a7fcbf2 Merge 'Drain view_builder in generic drain' from ScyllaDB
For view builder draining there's a dedicated deferred action in main, while all other services that need to be drained do it via storage_service. The latter is to unify shutdown for services and to make `nodetool drain` drain everything, not just some part of those. This PR makes view builder drain look the same. As a side effect it also moves `mark_existing_views_as_built` from storage service to view builder and generalizes this marking code inside view builder itself.

refs: #2737
refs: #2795

Closes scylladb/scylladb#16558

* github.com:scylladb/scylladb:
  storage_service: Drain view builder on drain too
  view_builder: Generalize mark_as_built(view_ptr) method
  view_builder: Move mark_existing_views_as_built from storage service
  storage_service: Add view_builder& reference
  main,cql_test_env: Move view_builder start up (and make unconditional)
2024-04-16 07:21:42 +03:00
Pavel Emelyanov
5cf53e670d replica: Remove unused ex variable from table::take_snapshot
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18215
2024-04-16 07:16:38 +03:00
Pavel Emelyanov
f17c594d21 large_data_handler: If-less statistics increment
The partitions_bigger_than_threshold is incremented only if the previous
check detects that the partition exceeds a threshold by its size. It's
done with an extra if, but it can be done without (explicit) condition
as bool type is guaranteed by the standard to convert into integers as
true = 1 and false = 0
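
A minimal sketch of the if-less increment. The `stats` struct and
`record_partition` function are hypothetical; only the counter name
follows the message above:

```cpp
#include <cstdint>

// Hypothetical stats holder; only the counter name follows the message.
struct stats {
    uint64_t partitions_bigger_than_threshold = 0;
};

inline void record_partition(stats& s, uint64_t partition_size, uint64_t threshold) {
    const bool above = partition_size > threshold;
    // bool converts to 1 (true) or 0 (false), so no explicit `if` is needed.
    s.partitions_bigger_than_threshold += above;
}
```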

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18217
2024-04-16 07:16:05 +03:00
Pavel Emelyanov
0f70d276d2 tools/scylla-sstable: Use shorter check is unordered_set contains a key
Current code counts the number of keys in it just to see if this number
is non-zero. Using the .contains() method is a better fit here.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18219
2024-04-16 07:14:48 +03:00
Pavel Emelyanov
1df7c2a0e9 topology_coordinator: Mark retake_node() const
Runaway from 4d83a8c12c

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18218
2024-04-16 07:13:07 +03:00
Pavel Emelyanov
05c4042511 api/lsa: Don't use database to perform invoke-on-all
The sharded<database> is used as an invoke_on_all() method provider,
there's no real need in database itself. Simple smp::invoke_on_all()
would work just as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18221
2024-04-16 07:12:40 +03:00
Pavel Emelyanov
4a6291dce5 test/sstable: Use .handle_exception_type() shortcut
Some tests want to ignore out_of_range exception in continuation and go
the longer route for that

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18216
2024-04-16 07:11:35 +03:00
Pavel Emelyanov
1612aa01ca cql3: Reserve vector with pk columns
When constructing a vector with partition key data, the size of that
vector is known beforehand
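
A sketch of the idea with plain standard containers. The function name and
the use of strings for column values are assumptions made for
illustration only:

```cpp
#include <string>
#include <vector>

// Copies one value per partition key column. The final size is known up
// front, so reserve() avoids repeated reallocation while pushing back.
inline std::vector<std::string> collect_pk_values(const std::vector<std::string>& pk_columns) {
    std::vector<std::string> values;
    values.reserve(pk_columns.size());
    for (const auto& column : pk_columns) {
        values.push_back(column);
    }
    return values;
}
```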

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18239
2024-04-16 07:06:07 +03:00
Pavel Emelyanov
f3edde7d2e api: Qualify callback commitlog* argument with const
There's a helper map-reducer that accepts a function to call on
commitlog. All callers accumulate statistics with it, so the commitlog
argument is const pointer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18238
2024-04-16 07:02:31 +03:00
Botond Dénes
162c9ad6f6 Merge 'gossiper: lock local endpoint when updating heart_beat' from Kamil Braun
In testing, we've observed multiple cases where nodes would fail to
observe updated application states of other nodes in gossiper.

For example:
- in scylladb/scylladb#16902, a node would finish bootstrapping and enter
NORMAL state, propagating this information through gossiper. However,
other nodes would never observe that the node entered NORMAL state,
still thinking that it is in joining state. This would lead to further
bad consequences down the line.
- in scylladb/scylladb#15393, a node got stuck in bootstrap, waiting for
schema versions to converge. Convergence would never be achieved and the
test eventually timed out. The node was observing outdated schema state
of some existing node in gossip.

I created a test that would bootstrap 3 nodes, then wait until they all
observe each other as NORMAL, with timeout. Unfortunately, thousands of
runs of this test on different machines failed to reproduce the problem.

After banging my head against the wall failing to reproduce, I decided
to sprinkle randomized sleeps across multiple places in gossiper code
and finally: the test started catching the problem in about 1 in 1000
runs.

With additional logging and additional head-banging, I determined
the root cause.

The following scenario can happen, 2 nodes are sufficient, let's call
them A and B:
- Node B calls `add_local_application_state` to update its gossiper
  state, for example, to propagate its new NORMAL status.
- `add_local_application_state` takes a copy of the endpoint_state, and
  updates the copy:
```
            auto local_state = *ep_state_before;
            for (auto& p : states) {
                auto& state = p.first;
                auto& value = p.second;
                value = versioned_value::clone_with_higher_version(value);
                local_state.add_application_state(state, value);
            }
```
  `clone_with_higher_version` bumps `version` inside
  gms/version_generator.cc.
- `add_local_application_state` calls `gossiper.replicate(...)`
- `replicate` works in 2 phases to achieve exception safety: in first
  phase it copies the updated `local_state` to all shards into a
  separate map. In second phase the values from separate map are used to
  overwrite the endpoint_state map used for gossiping.

  Due to the cross-shard calls of the first phase, there is a yield before
  the second phase. *During this yield* the following happens:
- `gossiper::run()` loop on B executes and bumps node B's `heart_beat`.
  This uses the monotonic version_generator, so it uses a higher version
  then the ones we used for states added above. Let's call this new version
  X. Note that X is larger than the versions used by application_states
  added above.
- now node B handles a SYN or ACK message from node A, creating
  an ACK or ACK2 message in response. This message contains:
    - old application states (NOT including the update described above,
      because `replicate` is still sleeping before phase 2),
    - but bumped heart_beat == X from `gossiper::run()` loop,
  and sends the message.
- node A receives the message and remembers that the max
  version across all states (including heart_beat) of node B is X.
  This means that it will no longer request or apply states from node B
  with versions smaller than X.
- `gossiper.replicate(...)` on B wakes up, and overwrites
  endpoint_state with the ones it saved in phase 1. In particular it
  reverts heart_beat back to a smaller value, but the larger problem is that it
  saves updated application_states that use versions smaller than X.
- now when node B sends the updated application_states in ACK or ACK2
  message to node A, node A will ignore them, because their versions are
  smaller than X. Or node B will never send them, because whenever node
  A requests states from node B, it only requests states with versions >
  X. Either way, node A will fail to observe new states of node B.

If I understand correctly, this is a regression introduced in
38c2347a3c, which introduced a yield in
`replicate`. Before that, the updated state would be saved atomically on
shard 0, there could be no `heart_beat` bump in-between making a copy of
the local state, updating it, and then saving it.

With the description above, it's easy to make a consistent
reproducer for the problem -- introduce a longer sleep in
`add_local_application_state` before second phase of replicate, to
increase the chance that gossiper loop will execute and bump heart_beat
version during the yield. Further commit adds a test based on that.

The fix is to bump the heart_beat under local endpoint lock, which is
also taken by `replicate`.
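
The scenario can be modelled in a few lines. The following is a minimal
sketch, not the gossiper code: `versioned_value`, `endpoint_state`,
`version_generator`, `remote_view` and `update_is_observed` are hypothetical,
simplified stand-ins, and the flag only models whether the heart_beat bump is
allowed to interleave with `replicate` (which the lock is meant to prevent).

```cpp
#include <algorithm>
#include <map>
#include <string>

// Simplified, hypothetical model of the scenario; not ScyllaDB's real types.
struct versioned_value {
    std::string value;
    int version;
};

struct endpoint_state {
    int heart_beat = 0;
    std::map<std::string, versioned_value> app_states;

    // Max version across heart_beat and all application states.
    int max_version() const {
        int m = heart_beat;
        for (const auto& entry : app_states) {
            m = std::max(m, entry.second.version);
        }
        return m;
    }
};

// One monotonic counter shared by heart_beat and application states.
struct version_generator {
    int last = 0;
    int next() { return ++last; }
};

// Node A's view of node B: states at or below max_seen are ignored.
struct remote_view {
    int max_seen = 0;
    std::map<std::string, std::string> states;

    void apply(const endpoint_state& msg) {
        for (const auto& entry : msg.app_states) {
            if (entry.second.version > max_seen) {
                states[entry.first] = entry.second.value;
            }
        }
        max_seen = std::max(max_seen, msg.max_version());
    }
};

// Returns true if node A ends up observing B's NORMAL status.
bool update_is_observed(bool heart_beat_bumped_during_yield) {
    version_generator gen;
    endpoint_state live;  // B's state used for gossiping
    live.heart_beat = gen.next();
    live.app_states["STATUS"] = {"BOOTSTRAPPING", gen.next()};

    remote_view a;
    a.apply(live);

    // add_local_application_state: copy the state and update the copy.
    endpoint_state local_state = live;
    local_state.app_states["STATUS"] = {"NORMAL", gen.next()};

    if (heart_beat_bumped_during_yield) {
        // Yield inside replicate: heart_beat gets version X and is gossiped.
        live.heart_beat = gen.next();
        a.apply(live);
    }

    // Second phase of replicate: overwrite the live state with the copy.
    live = local_state;
    a.apply(live);
    return a.states["STATUS"] == "NORMAL";
}
```

In this model `update_is_observed(true)` loses the NORMAL update, while
`update_is_observed(false)`, where the bump cannot interleave, delivers it.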

The PR also adds a regression test.

Fixes: scylladb/scylladb#15393
Fixes: scylladb/scylladb#15602
Fixes: scylladb/scylladb#16668
Fixes: scylladb/scylladb#16902
Fixes: scylladb/scylladb#17493
Fixes: scylladb/scylladb#18118
Ref: scylladb/scylla-enterprise#3720

Closes scylladb/scylladb#18184

* github.com:scylladb/scylladb:
  test: reproducer for missing gossiper updates
  gossiper: lock local endpoint when updating heart_beat
2024-04-16 06:46:24 +03:00
Tzach Livyatan
289793d964 Update Driver root page
The right term is Amazon DynamoDB not AWS DynamoDB
See https://aws.amazon.com/dynamodb/

Closes scylladb/scylladb#18214
2024-04-16 06:41:28 +03:00
Beni Peled
223275b4d1 test.py: add the pytest junit_suite_name parameter
By default, the suite name in the junit files generated by pytest
is `pytest` for all suites instead of the actual suite, e.g. `topology_experimental_raft`.
With this change, the junit files will use the real suite name.

This change doesn't affect the Test Report in Jenkins, but it
was raised as part of another task: publishing the test results to
elasticsearch (https://github.com/scylladb/scylla-pkg/pull/3950),
where we parse the XMLs and need the correct suite name.

Closes scylladb/scylladb#18172
2024-04-15 21:07:00 +03:00
Tomasz Grabiec
95d93c1668 Merge 'Extend tablet_transition_kind::rebuild to remove replicas' from Pavel Emelyanov
When altering rf for a keyspace, all tablets in this ks may have fewer replicas. Part of this process is removing replicas from some node(s). This PR extends the tablets rebuild transition to handle this case by making pending_replica optional.

fixes: #18176

Closes scylladb/scylladb#18203

* github.com:scylladb/scylladb:
  test: Tune up tablet-transition test to check del_replica
  api: Add method to delete replica from tablet
  tablet: Make pending replica optional
2024-04-15 21:01:03 +03:00
Pavel Emelyanov
c60639d582 sstables: Coroutinize drop_caches() method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18220
2024-04-15 17:22:59 +03:00
Pavel Emelyanov
b06b85c270 test: Tune up tablet-transition test to check del_replica
For that the test case is modified to have 3 nodes and 2 replicas on
start. Existing test cases are changed slightly in the way "from" host
is detected.

Also, the final check for data presence is modified to check that hosts
in "replicas" have data and other hosts don't have it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-15 16:31:07 +03:00
Pavel Emelyanov
8bad828208 api: Add method to delete replica from tablet
Copied from the add_replica counterpart

TODO: Generalize common parts of move_tablet and add_|del_tablet_replica

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-15 16:31:07 +03:00
Pavel Emelyanov
725b2863d2 tablet: Make pending replica optional
Just like leaving replica could be optional when adding replica to
tablet, the pending replica can be optional too if we're removing a
replica from tablet

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-15 16:31:07 +03:00
Amnon Heiman
06dc56df01 Update seastar submodule
Fixes scylladb/scylladb#18083

* seastar cd8a9133...f3058414 (18):
  > src/core/metrics.cc: rewrite set_metric_family_configs
  > include/seastar/core/metrics_api.hh: Revert d2929c2ade5bd0125a73d53280c82ae5da86218e
  > sstring: include <fmt/format.h> instead of <fmt/ostream.h>
  > seastar.cc: include used header
  > tls: include used header of <unordered_set>
  > docs: remove unused parameter from handle_connection function of echo-HTTP-server tutorial example
  > stall-analyser: use 0 for the default value of --width
  > http: Move parsed params and urls
  > scripts: use raw string to avoid invalid escape sequences
  > timed_out_error: add fmt::formatter for timed_out_error
  > scripts/stall-analyser: change default branch-threshold to 3%
  > scripts/stall-analyser: resolve string escape sequence warning
  > io_queue: Use static vector for fair groups too
  > io_queue: Use static vector to store fair queues
  > stall-analyser: add space around '=' in param list
  > stall-analyser: add a space between 'var: Type' in type annotation
  > stall-analyser: move variables closer to where they are used
  > memory: drop support for compilers that don't support aligned new

Closes scylladb/scylladb#18235
2024-04-15 15:19:59 +02:00
Tomasz Grabiec
2ceef1d600 scripts: tablet-mon.py: Support for annotating tablets by table id
Closes scylladb/scylladb#18225
2024-04-15 15:19:59 +02:00
Marcin Maliszkiewicz
7e749cd848 auth: don't run legacy migrations on auth-v2 startup
We won't run:
- old pre auth-v1 migration code
- code creating auth-v1 tables

We will keep running:
- code creating default rows
- code creating auth-v1 keyspace (needed due to a cqlsh legacy hack:
  it errors when executing `list roles` or `list users` if
  there is no system_auth keyspace, although it does support the case
  where the expected tables are missing)
2024-04-15 12:09:39 +02:00
Marcin Maliszkiewicz
d40ff81c5b auth: fix indent in password_authenticator::start 2024-04-15 12:09:32 +02:00
Marcin Maliszkiewicz
3e8cf20b98 auth: remove unused service::has_existing_legacy_users func 2024-04-15 12:09:32 +02:00
Benny Halevy
655d624e01 storage_service: join_token_ring: load ignored nodes state if replacing
When a node bootstraps or replaces a node after full cluster
shutdown and restart, some nodes may be down.

Existing nodes in the cluster load the down nodes TOKENS
(and recently, in this series, also DC and RACK) from system.peers
and then populate locator::topology and token_metadata
accordingly with the down nodes' tokens in storage_service::join_cluster.

However, a bootstrapping/replacing node has no persistent knowledge
of the down nodes, and it learns about their existence only from gossip.
But since the down nodes have unknown status, they never go
through `handle_state_normal` (in gossiper mode) and therefore
they are not accounted as normal token owners.
This is handled by `topology_state_load`, but not with
gossip-based node operations.

This patch updates the ignored nodes (for replace) state in topology
and token_metadata as if they were loaded from system tables,
after calling `prepare_replacement_info` when raft topology changes are
disabled, based on the endpoint_state retrieved in the shadow round
initiated in prepare_replacement_info.

Fixes scylladb/scylladb#15787

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:45:55 +03:00
Benny Halevy
e4c3c07510 storage_service: replacement_info: return ignore_nodes state
Instead of `parse_node_list` resolving host ids to inet_address
let `prepare_replacement_info` get host_id_or_endpoint from
parse_node_list and prepare `loaded_endpoint_state` for
the ignored nodes so it can be used later by the callers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:43:19 +03:00
Benny Halevy
7c2bd8dc34 locator: host_id_or_endpoint: keep value as variant
Rather than allowing to keep both
host_id and endpoint, keep only one of them
and provide resolve functions that use the
token_metadata to resolve the host_id into
an inet_address or vice versa.
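
A rough illustration of the idea, assuming nothing about the real class
beyond what is described above: the value holds exactly one alternative, and
the other one is resolved on demand. `topology_view` is a hypothetical
stand-in for token_metadata, and the member names and string-based
`host_id` / `inet_address` types here are simplifications for the sketch.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical, string-based stand-ins for the real types.
struct host_id { std::string uuid; };
struct inet_address { std::string addr; };

// Stand-in for token_metadata: maps host id to address.
struct topology_view {
    std::map<std::string, std::string> addr_by_host_id;
};

class host_id_or_endpoint {
    // Exactly one of the two is stored, never both.
    std::variant<host_id, inet_address> _value;

public:
    explicit host_id_or_endpoint(host_id id) : _value(std::move(id)) {}
    explicit host_id_or_endpoint(inet_address ep) : _value(std::move(ep)) {}

    bool has_host_id() const { return std::holds_alternative<host_id>(_value); }

    // Return the stored address, or look it up from the stored host id.
    inet_address resolve_endpoint(const topology_view& tm) const {
        if (const auto* ep = std::get_if<inet_address>(&_value)) {
            return *ep;
        }
        const auto& id = std::get<host_id>(_value);
        auto it = tm.addr_by_host_id.find(id.uuid);
        if (it == tm.addr_by_host_id.end()) {
            throw std::runtime_error("unknown host id: " + id.uuid);
        }
        return inet_address{it->second};
    }

    // Return the stored host id, or look it up from the stored address.
    host_id resolve_id(const topology_view& tm) const {
        if (const auto* id = std::get_if<host_id>(&_value)) {
            return *id;
        }
        const auto& ep = std::get<inet_address>(_value);
        for (const auto& entry : tm.addr_by_host_id) {
            if (entry.second == ep.addr) {
                return host_id{entry.first};
            }
        }
        throw std::runtime_error("unknown endpoint: " + ep.addr);
    }
};
```

Keeping a single alternative means the two representations can never
disagree; the cost is that resolution needs the topology at the call site.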

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:25:50 +03:00
Benny Halevy
86f1fcdcdd gms: endpoint_state: add getters for host_id, dc_rack, and tokens
Allow getting metadata from the endpoint_state based
on the respective application states instead of going
through the gossiper.

To be used by the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:16:58 +03:00
Benny Halevy
239069eae5 storage_service: topology_state_load: set local STATUS state using add_saved_endpoint
When loading this node endpoint state and it has
tokens in token_metadata, its status can already be set
to normal.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:07:00 +03:00
Benny Halevy
6aaa1b0f48 gossiper: add_saved_endpoint: set dc and rack
When loading endpoint_state from system.peers,
pass the loaded nodes dc/rack info from
storage_service::join_token_ring to gossiper::add_saved_endpoint.

Load the endpoint DC/RACK information to the endpoint_state,
if available so they can propagate to bootstrapping nodes
via gossip, even if those nodes are DOWN after a full cluster-restart.

Note that this change makes the host_id presence
mandatory following https://github.com/scylladb/scylladb/pull/16376.
The reason to do so is that the other states: tokens, dc, and rack
are useless without the host_id.
This change is backward compatible since the HOST_ID application state
was written to system.peers since inception in scylla
and it would be missing only due to potential exception
in older versions that failed to write it.
In this case, manual intervention is needed and
the correct HOST_ID needs to be manually updated in system.peers.

Refs #15787

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:07:00 +03:00
Benny Halevy
468462aa73 gossiper: add_saved_endpoint: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:07:00 +03:00
Benny Halevy
b9e2aa4065 gossiper: add_saved_endpoint: make host_id mandatory
Require all callers to provide a valid host_id parameter.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:07:00 +03:00
Benny Halevy
1061455442 gossiper: add load_endpoint_state
Pack the topology-related data loaded from system.peers
in `gms::load_endpoint_state`, to be used in a following
patch for `add_saved_endpoint`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:06:56 +03:00
Benny Halevy
6b2d94045a gossiper: start_gossiping: log local state
The trace level message hides important information
about the initial node state in gossip.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:06:30 +03:00
Benny Halevy
a7c5fccab9 test: chunked_managed_vector_test: add test_push_back_using_existing_element
chunked_managed_vector isn't susceptible to #18072
since the elements it keeps are managed_ref<T> and
those must be constructed by the caller, before reallocation
takes place, so it's safer in that respect.

The unit test is added to verify that and prevent
regressions in the future.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-11 14:34:50 +03:00
Benny Halevy
2afc584f08 utils: chunked_vector: reserve_for_emplace_back: emplace before migrating existing elements
Currently, push_back or emplace_back reallocate the last chunk
before constructing the new element.

If the arg passed to push_back/emplace_back is a reference to an
existing element in the vector, reallocating the last chunk will
invalidate the arg reference before it is used.

This patch changes the order when reallocating
the last chunk in reserve_for_emplace_back:
First, a new chunk_ptr is allocated.
Then, the back_element is emplaced in the
newly allocated array.
And only then, existing elements in the current
last chunk are migrated to the new chunk.
Eventually, the new chunk replaces the existing chunk.

If no reservation is required, the back element
is emplaced "in place" in the current last chunk.
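
The ordering can be shown on a much smaller container. This is a sketch
only: `tiny_vector` is a hypothetical single-chunk vector, not
chunked_vector, and it assumes that T's move constructor does not throw and
that T needs no over-aligned storage.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical single-chunk container used only to illustrate the ordering.
template <typename T>
class tiny_vector {
    T* _data = nullptr;
    std::size_t _size = 0;
    std::size_t _capacity = 0;

public:
    tiny_vector() = default;
    tiny_vector(const tiny_vector&) = delete;
    tiny_vector& operator=(const tiny_vector&) = delete;

    ~tiny_vector() {
        std::destroy_n(_data, _size);
        ::operator delete(_data);
    }

    template <typename... Args>
    T& emplace_back(Args&&... args) {
        if (_size == _capacity) {
            const std::size_t new_capacity = _capacity ? _capacity * 2 : 1;
            T* fresh = static_cast<T*>(::operator new(new_capacity * sizeof(T)));
            try {
                // 1. Construct the new element first: `args` may refer to an
                //    element of `_data`, which is still intact at this point.
                ::new (static_cast<void*>(fresh + _size)) T(std::forward<Args>(args)...);
            } catch (...) {
                ::operator delete(fresh);
                throw;
            }
            // 2. Only then migrate the existing elements.
            for (std::size_t i = 0; i < _size; ++i) {
                ::new (static_cast<void*>(fresh + i)) T(std::move(_data[i]));
            }
            // 3. Finally, replace the old storage with the new one.
            std::destroy_n(_data, _size);
            ::operator delete(_data);
            _data = fresh;
            _capacity = new_capacity;
        } else {
            // No reservation needed: construct in place.
            ::new (static_cast<void*>(_data + _size)) T(std::forward<Args>(args)...);
        }
        return _data[_size++];
    }

    // Single code path: push_back forwards to emplace_back.
    void push_back(const T& value) { emplace_back(value); }

    T& operator[](std::size_t i) {
        assert(i < _size);
        return _data[i];
    }

    std::size_t size() const { return _size; }
};
```

With this ordering, `v.push_back(v[0])` stays valid even when the call
triggers a reallocation, because `v[0]` is read before the old storage is
touched.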

Fixes scylladb/scylladb#18072

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-11 14:34:48 +03:00
Benny Halevy
2c0e40a21f utils: chunked_vector: push_back: call emplace_back
When pushing an element with a value referencing
an existing element in the vector, we currently
risk use-after-free when that element gets moved
to a reallocated chunk, if capacity needs to be reserved,
thereby invalidating the reference to the existing element
before it is used.

This patch prepares for fixing that in the emplace path
by converging to a single code path.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-11 14:33:43 +03:00
Benny Halevy
882bb21903 utils: chunked_vector: define min_chunk_capacity
Expose the number of items in the first allocated chunk.
This will be used by a unit test in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-11 14:33:43 +03:00
Benny Halevy
e066f81cb3 utils: chunked*vector: use std::clamp
It is available in the std library since C++17.
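
For reference, a minimal sketch of the call; `clamp_capacity` and its
parameter names are hypothetical and not taken from the chunked vector code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: bound a requested capacity to [lo, hi].
inline std::size_t clamp_capacity(std::size_t wanted, std::size_t lo, std::size_t hi) {
    assert(lo <= hi);  // std::clamp has undefined behaviour if hi < lo
    return std::clamp(wanted, lo, hi);
}
```

This replaces the usual `std::max(lo, std::min(wanted, hi))` pair with a
single call.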

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-11 14:33:43 +03:00
Kefu Chai
0be61e51d3 treewide: include <fmt/ostream.h>
this header was previously brought in by seastar's sstring.hh. but
since sstring.hh does not include <fmt/ostream.h> anymore,
`gms/application_state.cc` does not have access to this header.
also, `gms/application_state.cc` should `#include` the used header
by itself.

so, in this change, let's include  <fmt/ostream.h> in `gms/application_state.cc`.
this change addresses the FTBFS with the latest seastar.

the same applies to other places changed in this commit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18193
2024-04-11 11:59:41 +03:00
Yaniv Kaul
bd34f2fe46 toolchain: support building an optimized clang
This is a different attempt at integrating the build of an optimized clang (using LTO, PGO and BOLT, based on compiling ScyllaDB) into dbuild. Per Avi's request, there are 3 options: skip this phase (the current default), build it, or build and install it to the default path.

Fixes: #10985
Fixes: scylladb/scylla-enterprise#2539
2024-04-08 22:53:59 +09:00
Pavel Emelyanov
1e0d96cfed storage_service: Drain view builder on drain too
This gets rid of the dangling deferred drain on stop and makes nodetool drain
more "consistent" by stopping one more unneeded background activity

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:56:12 +03:00
Pavel Emelyanov
90593f4e82 view_builder: Generalize mark_as_built(view_ptr) method
Marking is performed in two places and they can be generalized

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:56:12 +03:00
Pavel Emelyanov
3c3f2cd337 view_builder: Move mark_existing_views_as_built from storage service
Now it's in the correct component

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:56:11 +03:00
Pavel Emelyanov
895391fb4b storage_service: Add view_builder& reference
Storage service will need to drain v.b. on its drain. Also on cluster
join it marks existing views as built while it's v.b.'s job to do it.
Both will be fixed by the next patches, and this is a prerequisite.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:55:07 +03:00
Pavel Emelyanov
f00f1f117b main,cql_test_env: Move view_builder start up (and make unconditional)
Just starting sharded<view_builder> is lightweight, its constructor does
nothing but initializes on-board variables. Real work takes off on
view_builder::start() which is not moved.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-05 19:53:33 +03:00
Botond Dénes
c01b19fcb3 Merge 'test/boost: add test for writing large partition notifications' from Ferenc Szili
The current test in boost/cql_query_large_test::test_large_data only checks whether notifications for large rows and cells are written into the system keyspace. It doesn't check this for partitions.

This change adds this check for partitions.

Closes scylladb/scylladb#18189

* github.com:scylladb/scylladb:
  test/boost: added test for large row count warning
  test/boost: add test for writing large partition notifications
2024-04-05 15:35:54 +03:00
Botond Dénes
f6efa17713 Merge 'repair: fix memory counting in repair' from Aleksandra Martyniuk
Repair memory limit includes only the size of frozen mutation
fragments in repair row. The size of other members of repair
row may grow uncontrollably and cause out of memory.

Modify what's counted to repair memory limit.

Fixes: #16710.

Closes scylladb/scylladb#17785

* github.com:scylladb/scylladb:
  test: add test for repair_row::size()
  repair: fix memory accounting in repair_row
2024-04-05 14:53:55 +03:00
Tomasz Grabiec
0c74c2c12f Merge 'Extend tablet_transition_kind::rebuild to rebuild tablet to new replica' from Pavel Emelyanov
When altering rf for a keyspace, all tablets in this ks will get more replicas. Part of this process is rebuilding tablets onto new node(s). This PR extends the tablets transition code to support rebuilding of tablet on new replica.

fixes: #18030

Closes scylladb/scylladb#18082

* github.com:scylladb/scylladb:
  test: Check data presense as well
  test: Test how tablets are copied between nodes
  test: Add sanity test for tablet migration
  api: Add method to add replica to a tablet
  tablet: Make leaving replica optional
2024-04-05 12:51:10 +02:00
Ferenc Szili
443192e36d test/boost: added test for large row count warning 2024-04-05 11:50:09 +02:00
Pavel Emelyanov
639cc1f576 compaction: Replace formatted_sstables_list with fmt:: facilities
The formatted_sstables_list is an auxiliary class that collects a bunch of
sstables::to_string(shared_sstable)-generated strings. One bad side
effect of this helper is that it allocates memory for the vector of
strings.

This patch achieves the same goal with the help of fmt::join() combined
with the boost `transformed` adaptor.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18160
2024-04-05 09:17:15 +03:00
Kefu Chai
ff43628b44 gms: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18194
2024-04-05 08:48:17 +03:00
Pavel Emelyanov
2a98e95cd0 api: Coroutinize API get_snapshot_details handler
Now it's possible to understand what it does

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18190
2024-04-04 22:20:28 +03:00
Kamil Braun
72955093eb test: reproducer for missing gossiper updates
Regression test for scylladb/scylladb#17493.
2024-04-04 18:47:01 +02:00
Kamil Braun
a0b331b310 gossiper: lock local endpoint when updating heart_beat
In testing, we've observed multiple cases where nodes would fail to
observe updated application states of other nodes in gossiper.

For example:
- in scylladb/scylladb#16902, a node would finish bootstrapping and enter
NORMAL state, propagating this information through gossiper. However,
other nodes would never observe that the node entered NORMAL state,
still thinking that it is in joining state. This would lead to further
bad consequences down the line.
- in scylladb/scylladb#15393, a node got stuck in bootstrap, waiting for
schema versions to converge. Convergence would never be achieved and the
test eventually timed out. The node was observing outdated schema state
of some existing node in gossip.

I created a test that would bootstrap 3 nodes, then wait until they all
observe each other as NORMAL, with timeout. Unfortunately, thousands of
runs of this test on different machines failed to reproduce the problem.

After banging my head against the wall failing to reproduce, I decided
to sprinkle randomized sleeps across multiple places in gossiper code
and finally: the test started catching the problem in about 1 in 1000
runs.

With additional logging and additional head-banging, I determined
the root cause.

The following scenario can happen, 2 nodes are sufficient, let's call
them A and B:
- Node B calls `add_local_application_state` to update its gossiper
  state, for example, to propagate its new NORMAL status.
- `add_local_application_state` takes a copy of the endpoint_state, and
  updates the copy:
```
            auto local_state = *ep_state_before;
            for (auto& p : states) {
                auto& state = p.first;
                auto& value = p.second;
                value = versioned_value::clone_with_higher_version(value);
                local_state.add_application_state(state, value);
            }
```
  `clone_with_higher_version` bumps `version` inside
  gms/version_generator.cc.
- `add_local_application_state` calls `gossiper.replicate(...)`
- `replicate` works in 2 phases to achieve exception safety: in the first
  phase it copies the updated `local_state` to all shards into a
  separate map. In the second phase the values from the separate map are
  used to overwrite the endpoint_state map used for gossiping.

  Due to the cross-shard calls of the first phase, there is a yield before
  the second phase. *During this yield* the following happens:
- `gossiper::run()` loop on B executes and bumps node B's `heart_beat`.
  This uses the monotonic version_generator, so it uses a higher version
  than the ones we used for states added above. Let's call this new version
  X. Note that X is larger than the versions used by application_states
  added above.
- now node B handles a SYN or ACK message from node A, creating
  an ACK or ACK2 message in response. This message contains:
    - old application states (NOT including the update described above,
      because `replicate` is still sleeping before phase 2),
    - but bumped heart_beat == X from `gossiper::run()` loop,
  and sends the message.
- node A receives the message and remembers that the max
  version across all states (including heart_beat) of node B is X.
  This means that it will no longer request or apply states from node B
  with versions smaller than X.
- `gossiper.replicate(...)` on B wakes up, and overwrites
  endpoint_state with the ones it saved in phase 1. In particular it
  reverts heart_beat back to a smaller value, but the larger problem is that it
  saves updated application_states that use versions smaller than X.
- now when node B sends the updated application_states in ACK or ACK2
  message to node A, node A will ignore them, because their versions are
  smaller than X. Or node B will never send them, because whenever node
  A requests states from node B, it only requests states with versions >
  X. Either way, node A will fail to observe new states of node B.

If I understand correctly, this is a regression introduced in
38c2347a3c, which introduced a yield in
`replicate`. Before that, the updated state would be saved atomically on
shard 0, there could be no `heart_beat` bump in-between making a copy of
the local state, updating it, and then saving it.

With the description above, it's easy to make a consistent
reproducer for the problem -- introduce a longer sleep in
`add_local_application_state` before second phase of replicate, to
increase the chance that gossiper loop will execute and bump heart_beat
version during the yield. Further commit adds a test based on that.

The fix is to bump the heart_beat under local endpoint lock, which is
also taken by `replicate`.

Fixes: scylladb/scylladb#15393
Fixes: scylladb/scylladb#15602
Fixes: scylladb/scylladb#16668
Fixes: scylladb/scylladb#16902
Fixes: scylladb/scylladb#17493
Fixes: scylladb/scylladb#18118
Ref: scylladb/scylla-enterprise#3720
2024-04-04 18:46:56 +02:00
Ferenc Szili
5624abfbeb test/boost: add test for writing large partition notifications
The current test in boost/cql_query_large_test::test_large_data only checks whether notifications for large rows and cells are written into the system keyspace. It doesn't check this for partitions.

This change adds this check for partitions.
2024-04-04 17:33:23 +02:00
Pavel Emelyanov
c7908c319f test: Check data presense as well
Other than making sure that system.tablets is updated with correct
replica set, it's also good to check that the data is present on the
respective nodes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 18:01:24 +03:00
Aleksandra Martyniuk
51c09a84cc test: add test for repair_row::size()
Add test which checks whether repair_row::size() considers external
memory.
2024-04-04 16:03:05 +02:00
Aleksandra Martyniuk
a4dc6553ab repair: fix memory accounting in repair_row
In repair, only the size of frozen mutation fragments of repair row
is counted to the memory limit. So, huge keys of repair rows may
lead to OOM.

Include other repair_row's members' memory size in repair memory
limit.
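
To make the difference concrete, here is a small sketch under stated
assumptions: `repair_row_sketch` and both member functions are hypothetical,
and a row is reduced to a fragment buffer plus a key buffer, which is far
simpler than the real repair_row.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, reduced model of a repair row.
struct repair_row_sketch {
    std::vector<char> frozen_fragment;  // serialized mutation fragment
    std::vector<char> key;              // may be arbitrarily large

    // Old-style accounting: only the fragment is counted.
    std::size_t fragment_only_size() const { return frozen_fragment.size(); }

    // Accounting that also covers the object itself and the external
    // memory owned by its other members.
    std::size_t size() const {
        return sizeof(*this) + frozen_fragment.capacity() + key.capacity();
    }
};
```

With a 1 MiB key and a 16-byte fragment, `fragment_only_size()` reports 16
bytes while `size()` reports more than 1 MiB, which is the gap that can let
memory use escape the limit.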
2024-04-04 15:50:53 +02:00
Raphael S. Carvalho
9f93dd9fa3 replica: Use flat_hash_map for tablet storage
The reason that we want to switch to flat_hash_map is that only a small
subset of tablets will be allocated on any given shard, therefore it's
wasteful to use a sparse array, and iterations are slow.
Also, the map gives greater development flexibility as one doesn't have
to worry about empty entries.
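
A small sketch of the trade-off, with loud assumptions: `std::unordered_map`
stands in for flat_hash_map, and `storage_group`, `sparse_storage`,
`map_storage` and `sum_ids` are hypothetical names used only for this
illustration.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical per-tablet storage object.
struct storage_group {
    std::size_t tablet_id;
};

// One slot per tablet of the table; most slots are empty on a given shard.
using sparse_storage = std::vector<std::unique_ptr<storage_group>>;
// Only the tablets that actually live on this shard.
using map_storage = std::unordered_map<std::size_t, std::unique_ptr<storage_group>>;

// Iterate the sparse array: every slot is visited and must be null-checked.
std::size_t sum_ids(const sparse_storage& groups, std::size_t& steps) {
    std::size_t sum = 0;
    for (const auto& g : groups) {
        ++steps;
        if (g) {
            sum += g->tablet_id;
        }
    }
    return sum;
}

// Iterate the map: only existing entries are visited, no empty-slot checks.
std::size_t sum_ids(const map_storage& groups, std::size_t& steps) {
    std::size_t sum = 0;
    for (const auto& entry : groups) {
        ++steps;
        sum += entry.second->tablet_id;
    }
    return sum;
}
```

For a table with 1024 tablets of which 2 are local, the sparse variant takes
1024 iteration steps and the map variant takes 2.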

perf result:

-- reads

scylla_with_chunked_vector-read-no-tablets.txt
median 73223.28 tps ( 62.3 allocs/op,  13.3 tasks/op,   41932 insns/op,        0 errors)
median 74952.87 tps ( 62.3 allocs/op,  13.3 tasks/op,   41969 insns/op,        0 errors)
median 73016.37 tps ( 62.3 allocs/op,  13.3 tasks/op,   41934 insns/op,        0 errors)
median 74078.14 tps ( 62.3 allocs/op,  13.3 tasks/op,   41938 insns/op,        0 errors)
median 75323.07 tps ( 62.3 allocs/op,  13.3 tasks/op,   41944 insns/op,        0 errors)

scylla_with_hash_map-read-no-tablets.txt
median 74963.30 tps ( 62.3 allocs/op,  13.3 tasks/op,   41926 insns/op,        0 errors)
median 74032.09 tps ( 62.3 allocs/op,  13.3 tasks/op,   41918 insns/op,        0 errors)
median 74850.09 tps ( 62.3 allocs/op,  13.3 tasks/op,   41937 insns/op,        0 errors)
median 74239.37 tps ( 62.3 allocs/op,  13.3 tasks/op,   41921 insns/op,        0 errors)
median 74798.14 tps ( 62.3 allocs/op,  13.3 tasks/op,   41925 insns/op,        0 errors)

scylla_with_chunked_vector-read-tablets-1.txt
median 74234.27 tps ( 62.1 allocs/op,  13.3 tasks/op,   41903 insns/op,        0 errors)
median 75775.98 tps ( 62.1 allocs/op,  13.3 tasks/op,   41910 insns/op,        0 errors)
median 76481.56 tps ( 62.1 allocs/op,  13.2 tasks/op,   41874 insns/op,        0 errors)
median 74056.67 tps ( 62.1 allocs/op,  13.3 tasks/op,   41894 insns/op,        0 errors)
median 75287.68 tps ( 62.1 allocs/op,  13.3 tasks/op,   41894 insns/op,        0 errors)

scylla_with_hash_map-read-tablets-1.txt
median 75613.63 tps ( 62.1 allocs/op,  13.2 tasks/op,   41990 insns/op,        0 errors)
median 74819.51 tps ( 62.1 allocs/op,  13.2 tasks/op,   41973 insns/op,        0 errors)
median 75648.41 tps ( 62.1 allocs/op,  13.3 tasks/op,   42025 insns/op,        0 errors)
median 74170.89 tps ( 62.1 allocs/op,  13.2 tasks/op,   42002 insns/op,        0 errors)
median 75447.72 tps ( 62.1 allocs/op,  13.3 tasks/op,   41952 insns/op,        0 errors)

scylla_with_chunked_vector-read-tablets-128.txt
median 73788.57 tps ( 62.1 allocs/op,  13.2 tasks/op,   41956 insns/op,        0 errors)
median 76563.63 tps ( 62.1 allocs/op,  13.3 tasks/op,   42006 insns/op,        0 errors)
median 75536.12 tps ( 62.1 allocs/op,  13.2 tasks/op,   42005 insns/op,        0 errors)
median 74679.17 tps ( 62.1 allocs/op,  13.3 tasks/op,   41958 insns/op,        0 errors)
median 75380.95 tps ( 62.1 allocs/op,  13.2 tasks/op,   41946 insns/op,        0 errors)

scylla_with_hash_map-read-tablets-128.txt
median 75459.99 tps ( 62.1 allocs/op,  13.3 tasks/op,   42055 insns/op,        0 errors)
median 74280.11 tps ( 62.1 allocs/op,  13.3 tasks/op,   42085 insns/op,        0 errors)
median 74502.61 tps ( 62.1 allocs/op,  13.3 tasks/op,   42063 insns/op,        0 errors)
median 74692.27 tps ( 62.1 allocs/op,  13.3 tasks/op,   41994 insns/op,        0 errors)
median 75402.64 tps ( 62.1 allocs/op,  13.3 tasks/op,   42015 insns/op,        0 errors)

-- writes

scylla_with_chunked_vector-write-no-tablets.txt
median 68635.17 tps ( 58.4 allocs/op,  13.3 tasks/op,   52709 insns/op,        0 errors)
median 68716.36 tps ( 58.4 allocs/op,  13.3 tasks/op,   52691 insns/op,        0 errors)
median 68512.76 tps ( 58.4 allocs/op,  13.3 tasks/op,   52721 insns/op,        0 errors)
median 68606.14 tps ( 58.4 allocs/op,  13.3 tasks/op,   52696 insns/op,        0 errors)
median 68619.25 tps ( 58.4 allocs/op,  13.3 tasks/op,   52697 insns/op,        0 errors)

scylla_with_hash_map-write-no-tablets.txt
median 67678.10 tps ( 58.4 allocs/op,  13.3 tasks/op,   52723 insns/op,        0 errors)
median 67966.06 tps ( 58.4 allocs/op,  13.3 tasks/op,   52736 insns/op,        0 errors)
median 67881.47 tps ( 58.4 allocs/op,  13.3 tasks/op,   52743 insns/op,        0 errors)
median 67856.81 tps ( 58.4 allocs/op,  13.3 tasks/op,   52730 insns/op,        0 errors)
median 67812.58 tps ( 58.4 allocs/op,  13.3 tasks/op,   52740 insns/op,        0 errors)

scylla_with_chunked_vector-write-tablets-1.txt
median 67741.83 tps ( 58.4 allocs/op,  13.3 tasks/op,   53425 insns/op,        0 errors)
median 68014.20 tps ( 58.4 allocs/op,  13.3 tasks/op,   53455 insns/op,        0 errors)
median 68228.48 tps ( 58.4 allocs/op,  13.3 tasks/op,   53447 insns/op,        0 errors)
median 67950.96 tps ( 58.4 allocs/op,  13.3 tasks/op,   53443 insns/op,        0 errors)
median 67832.69 tps ( 58.4 allocs/op,  13.3 tasks/op,   53462 insns/op,        0 errors)

scylla_with_hash_map-write-tablets-1.txt
median 66873.70 tps ( 58.4 allocs/op,  13.3 tasks/op,   53548 insns/op,        0 errors)
median 67568.23 tps ( 58.4 allocs/op,  13.3 tasks/op,   53547 insns/op,        0 errors)
median 67653.70 tps ( 58.4 allocs/op,  13.3 tasks/op,   53525 insns/op,        0 errors)
median 67389.21 tps ( 58.4 allocs/op,  13.3 tasks/op,   53536 insns/op,        0 errors)
median 67437.91 tps ( 58.4 allocs/op,  13.3 tasks/op,   53537 insns/op,        0 errors)

scylla_with_chunked_vector-write-tablets-128.txt
median 67115.41 tps ( 58.3 allocs/op,  13.3 tasks/op,   53341 insns/op,        0 errors)
median 66836.07 tps ( 58.3 allocs/op,  13.3 tasks/op,   53342 insns/op,        0 errors)
median 67214.07 tps ( 58.3 allocs/op,  13.3 tasks/op,   53303 insns/op,        0 errors)
median 67198.25 tps ( 58.3 allocs/op,  13.3 tasks/op,   53347 insns/op,        0 errors)
median 67368.78 tps ( 58.3 allocs/op,  13.3 tasks/op,   53374 insns/op,        0 errors)

scylla_with_hash_map-write-tablets-128.txt
median 66273.50 tps ( 58.3 allocs/op,  13.3 tasks/op,   53400 insns/op,        0 errors)
median 66564.89 tps ( 58.3 allocs/op,  13.3 tasks/op,   53432 insns/op,        0 errors)
median 66568.52 tps ( 58.3 allocs/op,  13.3 tasks/op,   53408 insns/op,        0 errors)
median 66368.00 tps ( 58.3 allocs/op,  13.3 tasks/op,   53441 insns/op,        0 errors)
median 66293.55 tps ( 58.3 allocs/op,  13.3 tasks/op,   53408 insns/op,        0 errors)

Fixes #18010.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18093
2024-04-04 16:25:48 +03:00
Yaniv Kaul
2ce2649ec1 Typo: you -> your
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#17806
2024-04-04 14:55:46 +03:00
Nadav Har'El
c24bc3b57a alternator: do not use tablets on new Alternator tables
A few months ago, in merge d3c1be9107,
we decided that if Scylla has the experimental "tablets" feature enabled,
new Alternator tables should use this feature by default - exactly like
this is the default for new CQL tables.

Sadly, it was now decided to reverse this decision: we do not yet trust
LWT on tablets enough, and since Alternator often (if not always) relies
on LWT, we want Alternator tables to continue to use vnodes - not tablets.

The fix is trivial - just changing the default. No test needed to change
because anyway, all Alternator tests work correctly on Scylla with the
tablets experimental feature disabled. I added a new test to enshrine
the fact that Alternator does not use tablets.

An unfortunate result of this patch will be that Alternator tables
created on versions with this patch (e.g., Scylla 6.0) will not use
tablets and will continue to not use tablets even if Scylla is upgraded
(currently, the use of tablets is decided at table creation time, and
there is no way to "upgrade" a vnode-based table to be tablet based).

This patch should be reverted as soon as LWT support matures on tablets.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#18157
2024-04-04 12:11:29 +03:00
Pavel Emelyanov
1c1004d1bd sstables_loader: Format list of sstables' filenames in place
Loader wants to print set of sstables' names. For that it collects names
into a dedicated vector, then prints it using fmt/ranges facility.

There's a way to achieve the same goal without allocating extra vector
with names -- use fmt::format() and pass it a range converting sstables
into their names.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18159
2024-04-04 12:09:52 +03:00
Ferenc Szili
f1cc6252fd logging: Don't log PK/CK in large partition/row/cell warning
Currently, Scylla logs a warning when it writes a cell, row or partition that is larger than a configured size threshold. These warnings contain the partition key and, in the case of rows and cells, also the cluster key, which allows the large row or partition to be identified. However, these keys can contain user-private, sensitive information. The information which identifies the partition/row/cell is also inserted into the tables system.large_partitions, system.large_rows and system.large_cells, respectively.

This change removes the partition and cluster keys from the log messages, but still inserts them into the system tables.

The logged data will look like this:

Large cells:
WARN  2024-04-02 16:49:48,602 [shard 3:  mt] large_data - Writing large cell ks_name/tbl_name: cell_name (SIZE bytes) to sstable.db

Large rows:
WARN  2024-04-02 16:49:48,602 [shard 3:  mt] large_data - Writing large row ks_name/tbl_name: (SIZE bytes) to sstable.db

Large partitions:
WARN  2024-04-02 16:49:48,602 [shard 3:  mt] large_data - Writing large partition ks_name/tbl_name: (SIZE bytes) to sstable.db

Fixes #18041

Closes scylladb/scylladb#18166
2024-04-04 12:06:31 +03:00
Kefu Chai
3b50c39a83 scylla-gdb: access io_queue::_streams and io_queue::_fgs with static_vector
in seastar's b28342fa5a301de3facf5e83dc691524a6b20604, we switched
* `io_queue::_streams` from
  `boost::container::small_vector<fair_queue, 2>` to
  `boost::container::static_vector<fair_queue, 2>`
* `io_queue::_fgs` from
  `std::vector<std::unique_ptr<fair_group>>` to
  `boost::container::static_vector<fair_group, 2>`

so we need to update the gdb script accordingly to reflect this
change, and to avoid the nested try-except blocks, we switch to
a `while` statement to simplify the code structure.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18165
2024-04-04 11:39:10 +03:00
Anna Stuchlik
994f807bf6 docs: add the latest image info to GCP and Azure pages
This commit adds image information for the latest patch release
to the GCP and Azure deployment page.
The information now replaces the reference to the Download Center
so that the user doesn't have to jump to another website.

Fixes https://github.com/scylladb/scylladb/issues/18144

Closes scylladb/scylladb#18168
2024-04-04 11:24:39 +03:00
Kefu Chai
64b8bb239f api/storage_service: throw if table is not found when move tablets
`database::find_column_family()` throws no_such_column_family
if an unknown ks.cf is fed to it. and we call into this function
without checking for the existence of ks.cf first. since
"/storage_service/tablets/move" is a public interface, we should
translate this error to a better http error.

in this change, we check for the existence of the given ks.cf, and
throw an exception so that it can be caught by seastar::httpd::routers,
and converted to an HTTP error.

Fixes #17198
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17217
2024-04-04 11:23:52 +03:00
Pavel Emelyanov
590f0329ae test: Test how tablets are copied between nodes
This patches the previously introduced test by introducing the 'action'
test parameter and tweaking the final checking assertions around tablet
replicas read from system.tablets

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 09:22:57 +03:00
Pavel Emelyanov
28964ba5fe test: Add sanity test for tablet migration
It just checks that after api call to move_tablet the resulting replica
is in expected state. This test will be later expanded to check for
rebuild transition.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 09:22:31 +03:00
Pavel Emelyanov
79ad760e95 api: Add method to add replica to a tablet
The new API submits rebuild transition with new replicas set to be old
(current) replicas plus the provided one. It looks and acts like the
move_tablet API call with several changes:

- lacks the "source" replica argument
- submits "rebuild" transition kind
- cross racks checks are not performed

The 'force' argument is inherited from move_tablet, but is unused now
and is left for future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 09:22:16 +03:00
Tomasz Grabiec
1a839bcb36 main: Skip tablet metadata loading in maintenance mode
If system.tablets is corrupted, the node would not boot in maintenance
mode, which is needed to fix system.tablets.

Closes scylladb/scylladb#17990
2024-04-04 09:20:09 +03:00
Pavel Emelyanov
b0cba57e29 tablet: Make leaving replica optional
When getting leaving replica from tablet info and transition info,
the getter code assumes that this replica always exists. It's not going
to be the case soon, so make the return value be optional.

There are four places that mess with leaving replica:

- stream tablet handler: this place checks that the leaving replica is
  _not_ current host. If leaving replica is missing, the check should
  pass

- cleanup tablet handler: this place checks that the leaving replica
  _is_ current host. If leaving replica is missing, the check should
  fail as well

- topology coordinator: it gets leaving replica to call cleanup on. If
  leaving replica is missing, the cleanup call is short-circuited to
  succeed immediately

- load-stats calculator: it checks if the leaving replica is self. This
  check is not patched as it's automatically satisfied by std::optional
  comparison operator overload for wrapped type
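
As an illustration only, the stream and cleanup checks described above can
be sketched in Python, with `None` standing in for an empty std::optional.
The function names and host ids are hypothetical and are not taken from the
patch:

```python
from typing import Optional


def stream_check_passes(leaving: Optional[str], this_host: str) -> bool:
    # Streaming requires that the leaving replica is NOT this host.
    # A missing leaving replica (None) never equals a host, so the check passes.
    return leaving != this_host


def cleanup_check_passes(leaving: Optional[str], this_host: str) -> bool:
    # Cleanup requires that the leaving replica IS this host.
    # A missing leaving replica (None) never equals a host, so the check fails.
    return leaving == this_host
```

Comparing an empty optional with a plain value yields "not equal", which is
why the load-stats check needs no change.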

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 09:03:36 +03:00
Michał Chojnowski
8147ab69ac row_cache_test: avoid a throw in external_updater
In test_exception_safety_of_update_from_memtable, we have a potential
throw from external_updater.

external_updater is supposed to be infallible.
Scylla currently aborts when an external_updater throws, so a throw from
there just fails the test.

This isn't intended. We aren't testing external_updater in this test.

Fixes #18163

Closes scylladb/scylladb#18171
2024-04-03 23:22:08 +02:00
Piotr Dulikowski
baae811142 Merge 'auth: keep auth version in scylla_local' from Marcin Maliszkiewicz
Before the patch selection of auth version depended
on consistent topology feature but during raft recovery
procedure this feature is disabled so we need to persist
the version somewhere to not switch back to v1 as this
is not supported.

During recovery auth works in read-only mode, writes
will fail.

Fixes https://github.com/scylladb/scylladb/issues/17736

Closes scylladb/scylladb#18039

* github.com:scylladb/scylladb:
  auth: keep auth version in scylla_local
  auth: coroutinize service::start
2024-04-03 12:25:56 +02:00
Kefu Chai
e2f3fed373 service: qos: fix a typo
s/accesor/accessor/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18124
2024-04-03 10:33:54 +02:00
Raphael S. Carvalho
12714a4123 locator: Avoid tablet map lookup on every write for getting replicas
We can cache tablet map in erm, to avoid looking it up on every write for
getting write replicas. We do that in tablet_sharder, but not in tablet
erm. Tablet map is immutable in the context of a given erm, so the
address of the map is stable during erm lifetime.

This caught my attention when looking at perf diff output
(comparing tablet and vnode modes).

It also helps when erm is called again on write completion for
checking locality, used for forwarding info to the driver if needed.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18158
2024-04-03 10:28:04 +02:00
Botond Dénes
d43670046b test/lib: random_schema: disallow boolean_type in keys
They result in poor distribution and poor cardinality, interfering with
tests which want to generate N partitions or rows.

Fixes: #17821

Closes scylladb/scylladb#17856
2024-04-03 09:52:36 +03:00
Botond Dénes
2cb5dcabf7 docs/dev/maintainer.md: document another exceptions to rule no.0
Maintainers are also allowed to commit their own backport PR. They are
allowed to backport their own code, opening a PR to get a CI run for a
backport doesn't change this.

Closes scylladb/scylladb#17727
2024-04-03 09:51:19 +03:00
Botond Dénes
6771c646c4 tools/scylla-nodetool: fix typo: Fore -> For
2024-04-03 02:16:59 -04:00
Botond Dénes
b6db56286a tools/scylla-nodetool: add doc link for getsstables and sstableinfo commands
Just like all the other commands already have it. These commands didn't
have documentation at the point where they were implemented, hence the
missing doc link.

The links don't work yet, but they will work once we release 6.0 and the
current master documentation is promoted to stable.
2024-04-03 02:16:03 -04:00
Piotr Dulikowski
3ba7a4ead2 Merge 'api: upgrade_to_raft topology: add logging' from Benny Halevy
Upgrading raft topology is an important api call
that should be logged.

When failed, it is also important to log the
exception to get better visibility into why
the call failed.

Closes scylladb/scylladb#18143

* github.com:scylladb/scylladb:
  api: storage_service: upgrade_to_raft_topology: fixup indentation
  api: storage_service: upgrade_to_raft_topology: add logging
2024-04-03 07:00:10 +02:00
Pavel Emelyanov
8550a38a8b cql: Reserve vector of column definitions in advance
The vector in question is populated from the content of another map, so
its size is known in advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18155
2024-04-02 22:35:10 +03:00
Marcin Maliszkiewicz
562caaf6c6 auth: keep auth version in scylla_local
Before the patch selection of auth version depended
on consistent topology feature but during raft recovery
procedure this feature is disabled so we need to persist
the version somewhere to not switch back to v1 as this
is not supported.

During recovery auth works in read-only mode, writes
will fail.
2024-04-02 19:04:21 +02:00
Benny Halevy
1272d736c0 api: storage_service: upgrade_to_raft_topology: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-02 20:02:51 +03:00
Benny Halevy
31026ae27f api: storage_service: upgrade_to_raft_topology: add logging
Upgrading raft topology is an important api call
that should be logged.

When failed, it is also important to log the
exception to get better visibility into why
the call failed.

Indentation will be fixed in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-02 20:02:49 +03:00
Kefu Chai
15d59db98b cql3: select_statement: include <ranges>
we should include used header, to avoid compilation failures like:
```
cql3/statements/select_statement.cc:229:79: error: no member named 'filter' in namespace 'std::ranges::views'
        for (const auto& used_function : used_functions | std::ranges::views::filter(not_native)) {
                                                          ~~~~~~~~~~~~~~~~~~~~^
1 error generated.
```
if one of the included headers drops its own `#include <ranges>`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18145
2024-04-02 18:47:54 +03:00
Botond Dénes
2179bfc40d Merge 'Relax initialization of virtual tables' from Pavel Emelyanov
It now happens in initialize_virtual_tables(), but this function is split into sub-calls and iterates over virtual tables map several times to do its work. This PR squashes it into a straightforward code which is shorter and, hopefully, easier to read.

Closes scylladb/scylladb#18133

* github.com:scylladb/scylladb:
  virtual_tables: Open-code install_virtual_readers_and_writers()
  virtual_tables: Move readers setup loop into add_table()
  virtual_tables: Move tables creation loop into add_table()
  virtual_tables: Make add_tablet() a coroutine
  virtual_tables: Open-code register_virtual_tables()
2024-04-02 13:39:26 +03:00
Botond Dénes
469ff4f290 Merge 'repair: Load repair history in background' from Asias He
Currently, we load the repair history during boot up. If the number of
repair history entries is high, it might take a while to load them.

In my test, to load 10M entries, it took around 60 seconds.

It is not a must to load the entries during boot up. It is better to
load them in the background to speed up the boot time.

Fixes #17993

Closes scylladb/scylladb#17994

* github.com:scylladb/scylladb:
  repair: Load repair history in background
  repair: Abort load_history process in shutdown
2024-04-02 10:53:10 +03:00
Botond Dénes
fd12052c89 Update tools/java/ submodule
* tools/java/ d61296dc...b810e8b0 (1):
  > do not include {dclocal_,}read_repair_chance if not enabled
2024-04-02 10:47:57 +03:00
Yaron Kaikov
fcdb80773e github: sync-labels: run only in scylladb oss repo
We currently support the sync-label workflow only in OSS. Since Scylla-enterprise
gets all the commits from the OSS repo, the sync-label workflow runs there and fails
during checkout (since it's a private repo and should have a different
configuration).

For now, let's limit the workflow to the OSS repo.

Closes scylladb/scylladb#18142
2024-04-02 10:45:17 +03:00
Botond Dénes
ffdd47c2b1 Merge 'Track and limit memory used by bloom filters' from Lakshmi Narayanan Sreethar
Added support to track and limit the memory usage by sstable components. A reclaimable component of an SSTable is one from which memory can be reclaimed. SSTables and their managers now track such reclaimable memory and limit the component memory usage accordingly. A new configuration variable defines the memory reclaim threshold. If the total memory of the reclaimable components exceeds this limit, memory will be reclaimed to keep the usage under the limit. This PR considers only the bloom filters as reclaimable and adds support to track and limit them as required.

The feature can be manually verified by doing the following :
1. run a single-node single-shard 1GB cluster
2. create a table with bloom-filter-false-positive-chance of 0.001 (to intentionally cause large bloom filter)
3. populate with tiny partitions
4. watch the bloom filter metrics get capped at 100MB

The default value of the `components_memory_reclaim_threshold` config variable which controls the reclamation process is `.1`. This can also be reduced further during manual tests to easily hit the threshold and verify the feature.

Fixes #17747

Closes scylladb/scylladb#17771

* github.com:scylladb/scylladb:
  test_bloom_filter.py: disable reclaiming memory from components
  sstable_datafile_test: add tests to verify auto reclamation of components
  test/lib: allow overriding available memory via test_env_config
  sstables_manager: support reclaiming memory from components
  sstables_manager: store available memory size
  sstables_manager: add variable to track component memory usage
  db/config: add a new variable to limit memory used by table components
  sstable_datafile_test: add testcase to verify reclamation from sstables
  sstables: support reclaiming memory from components
2024-04-02 10:40:52 +03:00
Amnon Heiman
803d414896 get_description.py: Make the Script a library
This patch makes the get_description.py script easier to use by the
documentation automation:
1. The script is now a library.
2. You can choose the output of the script, currently supported pipee
   and yml.

You can still call the script from the command line, like before, but you can
also call it from another python script.

For example, the following python script would generate the documentation
for the metrics description of the ./alternator/ttl.cc file.
```

import get_description

metrics = get_description.get_metrics_from_file("./alternator/ttl.cc", "scylla", get_description.get_metrics_information("metrics-config.yml"))
get_description.write_metrics_to_file("out.yaml", metrics, "yml")
```

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes scylladb/scylladb#18136
2024-04-02 10:07:11 +03:00
Botond Dénes
ea8478a3e7 scripts/open-coredump.sh: introduce --ci
Coredumps coming from CI are produced by a commit, which is not
available in the scylla.git repository, as CI runs on a merge commit
between the main branch (master or enterprise) and the tested PR branch.
Currently the script will attempt to checkout this commit and will fail
as the commit hash is unrecognized.
To work around this, add a --ci flag, which when used, will force the
main branch to be checked out, instead of the commit hash.

Closes scylladb/scylladb#18023
2024-04-02 09:27:52 +03:00
Kefu Chai
55d0ea48bd test: randomized_nemesis_test: remove fmt::formatter for seastar::timed_out_error
This reverts commit 97b203b1af.

since Seastar provides the formatter, it's not necessary to vendor it in
scylladb anymore.

Refs #13245

Closes scylladb/scylladb#18114
2024-04-02 09:25:51 +03:00
Benny Halevy
d5ac0c06b3 test_sstable_reversing_reader_random_schema: drop workaround for #9352
Issue #9352 was fixed about a year and a half ago
so this workaround should not be needed anymore.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18121
2024-04-02 09:25:06 +03:00
Raphael S. Carvalho
29f9f7594f replica: Kill table::storage_group_id_for_token()
storage_group_id_for_token() was only needed from within
tablet_storage_group_manager, so we can kill
table::storage_group_id_for_token().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18134
2024-04-02 09:23:23 +03:00
Asias He
99b7ccfa8b repair: Load repair history in background
Currently, we load the repair history during boot up. If the number of
repair history entries is high, it might take a while to load them.

In my test, to load 10M entries, it took around 60 seconds.

It is not a must to load the entries during boot up. It is better to
load them in the background to speed up the boot time.

Fixes #17993
2024-04-02 09:24:35 +08:00
Asias He
523895145d repair: Abort load_history process in shutdown
If the node is shutting down, there is no point to continue to load the
repair history.

Refs #17993
2024-04-02 09:24:35 +08:00
Lakshmi Narayanan Sreethar
d86505e399 test_bloom_filter.py: disable reclaiming memory from components
Disabled reclaiming memory from sstable components in the testcase as it
interferes with the false positive calculation.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
d261f0fbea sstable_datafile_test: add tests to verify auto reclamation of components
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
169629dd40 test/lib: allow overriding available memory via test_env_config
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
a36965c474 sstables_manager: support reclaiming memory from components
Reclaim memory from the SSTable that has the most reclaimable memory if
the total reclaimable memory has crossed the threshold. Only the bloom
filter memory is considered reclaimable for now.
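
A minimal sketch of this policy in Python, for illustration only. The
function name and the dict-based bookkeeping are hypothetical, and repeating
the step until the total drops under the limit is an assumption; the 0.1
default mirrors `components_memory_reclaim_threshold`:

```python
def reclaim_until_under_threshold(reclaimable, available_memory, threshold=0.1):
    """Pick SSTables to reclaim from, largest reclaimable memory first.

    reclaimable: dict mapping an SSTable name to its reclaimable bytes
                 (hypothetical bookkeeping; entries are removed as reclaimed).
    Returns the list of names that were reclaimed, in order.
    """
    limit = available_memory * threshold
    total = sum(reclaimable.values())
    reclaimed = []
    while total > limit and reclaimable:
        # The victim is the SSTable holding the most reclaimable memory.
        victim = max(reclaimable, key=reclaimable.get)
        total -= reclaimable.pop(victim)
        reclaimed.append(victim)
    return reclaimed
```

With 1000 bytes available and filters of 60, 30 and 20 bytes, the limit is
100 bytes, so only the 60-byte filter is dropped.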

Fixes #17747

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
2ca4b0a7a2 sstables_manager: store available memory size
The available memory size is required to calculate the reclaim memory
threshold, so store that within the sstables manager.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
f05bb4ba36 sstables_manager: add variable to track component memory usage
sstables_manager::_total_reclaimable_memory variable tracks the total
memory that is reclaimable from all the SSTables managed by it.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
e8026197d2 db/config: add a new variable to limit memory used by table components
A new configuration variable, components_memory_reclaim_threshold, has
been added to configure the maximum allowed percentage of available
memory for all SSTable components in a shard. If the total memory usage
exceeds this threshold, it will be reclaimed from the components to
bring it back under the limit. Currently, only the memory used by the
bloom filters will be restricted.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
e0b6186d16 sstable_datafile_test: add testcase to verify reclamation from sstables
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
4f0aee62d1 sstables: support reclaiming memory from components
Added support to track total memory from components that are reclaimable
and to reclaim memory from them if and when required. Right now only the
bloom filters are considered as reclaimable components but this can be
extended to any component in the future.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Pavel Emelyanov
627c5fdf04 virtual_tables: Open-code install_virtual_readers_and_writers()
It's pretty short already and is naturally a "part" of
initialize_virtual_tables(). Nor does it install writers any longer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:02:40 +03:00
Pavel Emelyanov
1d79cfc6cf virtual_tables: Move readers setup loop into add_table()
Similarly to previous patch, after virtual tables are registered the
registry is iterated over to install virtual readers onto each entry.
Again, this can happen at the time of registering, no need in dedicated
loop for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:01:50 +03:00
Pavel Emelyanov
891e792717 virtual_tables: Move tables creation loop into add_table()
Once virtual_tables map is populated, it's iterated over to create
replica::table entries for each virtual table. This can be done in the
same place where the virtual table is created, no need in dedicated loop
for it nowadays.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:00:38 +03:00
Pavel Emelyanov
420ce3634f virtual_tables: Make add_tablet() a coroutine
Next patches will populate it with sleeping calls, this patch prepares
for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 19:00:15 +03:00
Pavel Emelyanov
ddc6f9279f virtual_tables: Open-code register_virtual_tables()
It's naturally a "part" of initialize_virtual_tables(). Further patching
gets possible with it being open-coded.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-01 18:59:18 +03:00
Kefu Chai
c5601a749e github: sync_labels: do not error out if PR's cover letter is empty
if a pull request's cover letter is empty, `pr.body` is None. in that
case we should not try to pass it to `re.findall()` as the "string"
parameter. otherwise, we'd get

```
TypeError: expected string or bytes-like object, got 'NoneType'
```
so, in this change, we just return an empty list if the PR in question
has an empty cover letter.
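
A minimal sketch of the guard, assuming a hypothetical helper name and a
simplified pattern (the regular expression actually used by sync_labels.py
is not shown here):

```python
import re


def extract_issue_refs(body):
    # pr.body is None when the cover letter is empty; re.findall() would
    # raise TypeError on None, so return an empty list instead.
    if not body:
        return []
    # Simplified, hypothetical pattern: collect numbers written as "#1234".
    return re.findall(r"#(\d+)", body)
```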

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18125
2024-04-01 18:13:22 +03:00
Avi Kivity
88fb686d67 test: generate core dumps on crashes in debug clusters
The cluster manager library doesn't set the asan/ubsan options
to abort on error and create core dumps; this makes debugging much
harder.

Fix by preparing the environment correctly.

Fixes scylladb/scylladb#17510

Closes scylladb/scylladb#17511
2024-04-01 18:11:41 +03:00
Kefu Chai
07c40f5600 github: sync_labels: use ${{}} expression syntax in "if" condition
to ensure that the expression is evaluated properly.
see https://docs.github.com/en/actions/creating-actions/metadata-syntax-for-github-actions#runsstepsif

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18127
2024-04-01 17:17:43 +03:00
Kefu Chai
1494499f90 github: sync_labels: checkout a single file not the whole repo
what we need is but a script, so instead of checkout the whole repo,
with all history for all tags and branches, let's just checkout
a single file. faster this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18126
2024-04-01 17:15:50 +03:00
Yaron Kaikov
b8c705bc54 .github: sync-labels: fix pull request permissions
When adding a label to a pull request, we keep getting the following error
message:
```
Traceback (most recent call last):
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 93, in <module>
    main()
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 89, in main
    sync_labels(repo, args.number, args.label, args.action, args.is_issue)
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 74, in sync_labels
    target.add_to_labels(label)
  File "/usr/lib/python3/dist-packages/github/Issue.py", line 321, in add_to_labels
    headers, data = self._requester.requestJsonAndCheck(
  File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck
    return self.__check(
  File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.GithubException: 403 {"message": "Resource not accessible by integration", "documentation_url": "https://docs.github.com/rest/issues/labels#add-labels-to-an-issue"}
```

Based on
https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token,
the maximum access for pull requests from public forked repositories is
set to `read`.

Switch to `pull_request_target` to solve it.

Fixes: https://github.com/scylladb/scylladb/issues/18102

Closes scylladb/scylladb#18052
2024-04-01 17:11:35 +03:00
Pavel Emelyanov
46bbfc0c53 expression: Shorten making raw_value from FragmetedView
The read_field is std::optional<View>. The raw_value::make_value()
accepts managed_bytes_opt which is std::optional<managed_bytes>.
Finally, there's std::optional<T>::optional(std::optional<U>&&)
move constructor (and its copy-constructor peer).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18128
2024-04-01 16:52:18 +03:00
Benny Halevy
01fc1a9f66 schema_tables: std::move mutation into the mutation vector
To save a copy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18120
2024-04-01 14:16:30 +03:00
Pavel Emelyanov
5427967f45 schema: Introduce build() && overload
The schema_builder::build() method creates a copy of raw schema
internally in the hope that the builder will be updated and be asked to build
the resulting schema again (e.g. alternator uses this).

However, there are places that build schema using temporary object once
in a `return schema_builder().with_...().build()` manner. For those
invocations copying raw schema is just waste of cycles.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18094
2024-04-01 14:00:42 +03:00
Takuya ASADA
be3776ec2a configure.py: add --build-dir option
Add --build-dir option to specify build directory.
This is needed for optimized clang support, since it requires building
Scylla in tools/toolchain/prepare without deleting the current build/
directory.
2024-04-01 18:35:42 +09:00
Nadav Har'El
b6854cbb21 Merge 'test/cql-pytest: match error message formated using {fmt} ' from Kefu Chai
currently, our homebrew formatter formats `std::map` like
```
{{k1, v1}, {k2, v2}}
```
while {fmt} formats a map like:
```
{k1: v1, k2: v2}
```
and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like
```
{"max_threshold": "1"}
```
before switching the formatter to the ones supported by {fmt},
let's update the test to match with the new format. this should
reduce the overhead of reviewing the change of switching the
formatter. we can revert this change, and use a simpler approach
after the change of formatter lands.
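
For illustration, the two renderings can be mimicked in Python. Both helper
names are hypothetical and only reproduce the shapes shown above; they are
not the C++ formatters themselves:

```python
def homebrew_format(m):
    # Mimics the old style: {{k1, v1}, {k2, v2}}
    return "{" + ", ".join("{" + f"{k}, {v}" + "}" for k, v in m.items()) + "}"


def fmt_style_format(m):
    # Mimics the {fmt} style: {k1: v1, k2: v2}, with strings quoted.
    def quote(x):
        return f'"{x}"' if isinstance(x, str) else str(x)

    return "{" + ", ".join(f"{quote(k)}: {quote(v)}" for k, v in m.items()) + "}"
```

A test that accepts either form has to allow for both the braces and the
quoting to differ, not only the separator.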

Closes scylladb/scylladb#18058

* github.com:scylladb/scylladb:
  test/cql-pytest: match error message formated using {fmt}
  test/cql-pytest: extract scylla_error() for not allowed options test
2024-04-01 11:23:24 +03:00
Kefu Chai
fcf7ca5675 utils/logalloc: do not allocate memory in reclaim_timer::report()
before this change, `reclaim_timer::report()` calls

```c++
fmt::format(", at {}", current_backtrace())
```

which allocates a `std::string` on heap, so it can fail and throw. in
that case, `std::terminate()` is called. but at that moment, the reason
why `reclaim_timer::report()` gets called is that we fail to reclaim
memory for the caller. so we are more likely to run into this issue. anyway,
we should not allocate memory in this path.

in this change, a dedicated printer is created so that we don't format
to a temporary `std::string`, and instead write directly to the buffer
of logger. this avoids the memory allocation.

Fixes #18099
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18100
2024-04-01 11:01:52 +03:00
Botond Dénes
885cb2af07 utils/rjson: include tasklocal backtrace in rapidjson assert error message
Currently, the error message on a failed RAPIDJSON_ASSERT() is this:

    rjson::error (JSON error: condition not met: false)

This is printed e.g. when the code processing a json expects an object
but the JSON has a different type. Or if a JSON object is missing an
expected member. This message however is completely inadequate for
determining what went wrong. Change this to include a task-local
backtrace, like a real assert failure would. The new error looks like
this:

    rjson::error (JSON assertion failed on condition '{}' at: libseastar.so+0x56dede 0x2bde95e 0x2cc18f3 0x2cf092d 0x2d2316b libseastar.so+0x46b623)

Closes scylladb/scylladb#18101
2024-03-29 18:41:54 +01:00
Pavel Emelyanov
41a1b1c0d0 move_tablets: Emplace mutations into vector, not push
It's more applicable in this case.

Also, the built tablet mutations are cast to canonical_mutation, but
when emplacing, the compiler can pick up the canonical_mutation(const mutation&)
constructor, so the cast is not required.
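
A minimal sketch of the difference, assuming stand-in types (`mutation_sketch` and `canonical_mutation_sketch` are hypothetical, and the constructor is assumed to be explicit for the sake of the illustration):

```c++
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real mutation types.
struct mutation_sketch {
    std::string key;
};

struct canonical_mutation_sketch {
    std::string data;
    // Converting constructor; assumed explicit in this sketch.
    explicit canonical_mutation_sketch(const mutation_sketch& m)
        : data("canonical:" + m.key) {}
};

inline std::vector<canonical_mutation_sketch>
to_canonical(const std::vector<mutation_sketch>& muts) {
    std::vector<canonical_mutation_sketch> out;
    out.reserve(muts.size());
    for (const auto& m : muts) {
        // push_back would need the conversion spelled out:
        //   out.push_back(canonical_mutation_sketch(m));
        // emplace_back forwards `m` to the constructor and builds in place.
        out.emplace_back(m);
    }
    return out;
}
```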

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18090
2024-03-29 15:21:49 +02:00
Kamil Braun
f5603ad9ca Merge 'test.py: test_topology_upgrade_basic: make ring_delay_ms nonzero' from Mikołaj Grzebieluch
Test.py uses `ring_delay_ms = 0` by default. CDC creates generation's timestamp by adding `ring_delay_ms` to it.

In this test, nodes are learning about new generations (introduced by upgrade procedure and then by node bootstrap) concurrently with doing writes that should go to these generations.

Because of `ring_delay_ms = 0`, the generation could have been committed when it should have already been in use.

This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```

Creating writes during such a generation can result in assigning them a wrong generation or a failure. Failure may occur if the write hits the short time window when `generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but `_cdc_metadata.insert(...)` has not yet been executed. With a nonzero ring_delay_ms it's not a problem, because during this time window, the generation should not be in use.

Write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```

Set ring_delay_ms to 15000 for the debug mode and 5000 in other modes. Wait for the last generation to be in use and sleep one second to make sure there are writes to the CDC table in this generation.

Fixes scylladb/scylladb#17977

Reapply b4144d14c6.

Closes scylladb/scylladb#17998

* github.com:scylladb/scylladb:
  test.py: test_topology_upgrade_basic: make ring_delay_ms nonzero
  Reapply "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
2024-03-29 12:52:31 +01:00
Tzach Livyatan
4930095d39 Docs: Fix link fro scylla-sstable.rst to /architecture/sstable/
Fix https://github.com/scylladb/scylladb/issues/18096

Closes scylladb/scylladb#18097
2024-03-29 10:48:24 +02:00
Piotr Dulikowski
57719ece4f Merge 'main: reload service levels data accessor after join_cluster' from Marcin Maliszkiewicz
Setting data accessor implicitly depends on node joining the cluster
with raft leader elected as only then service level mutation is put
into scylla_local table. Calling it after join_cluster avoids starting
new cluster with older version only to immediately migrate it to the
latest one in the background.

Closes scylladb/scylladb#18040

* github.com:scylladb/scylladb:
  main: reload service levels data accessor after join_cluster
  service: qos: create separate function for reloading data accessor
2024-03-29 09:39:11 +01:00
Kefu Chai
1632fbbef9 test/cql-pytest: match error message formated using {fmt}
currently, our homebrew formatter formats `std::map` like

{{k1, v1}, {k2, v2}}

while {fmt} formats a map like:

{k1: v1, k2: v2}

and if the type of key/value is string, {fmt} quotes it, so a

compaction strategy option is formatted like

{"max_threshold": "1"}

before switching the formatter to the ones supported by {fmt},
let's update the test to match with the new format. this should
reduce the overhead of reviewing the change of switching the
formatter. we can revert this change, and use a simpler approach
after the change of formatter lands.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-29 08:07:59 +08:00
Kefu Chai
8f47fcedf6 test/cql-pytest: extract scylla_error() for not allowed options test
currently, our homebrew formatter formats `std::map` like

{{k1, v1}, {k2, v2}}

while {fmt} formats a map like:

{k1: v1, k2: v2}

and if the type of key/value is string, {fmt} quotes it, so a

compaction strategy option is formatted like

{"max_threshold": "1"}

as we are switching to the formatters provided by {fmt}, it would be
better to support its convention directly.

so, in this change, to prepare the change, before migrating to
{fmt}, let's refactor the test to support both formats by
extracting a helper to format the error message, so that we can
change it to emit both formats.
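
For reference, the two map conventions described above can be reproduced with a small standalone sketch. The helper in the commit lives in the Python test suite; the C++ functions below (`format_legacy`, `format_fmt_style`) are hypothetical names used only to make the two output shapes concrete, and escaping of special characters is left out.

```c++
#include <cassert>
#include <map>
#include <sstream>
#include <string>

using options_map = std::map<std::string, std::string>;

// Homebrew style described above: {{k1, v1}, {k2, v2}}
inline std::string format_legacy(const options_map& m) {
    std::ostringstream os;
    os << '{';
    const char* sep = "";
    for (const auto& [k, v] : m) {
        os << sep << '{' << k << ", " << v << '}';
        sep = ", ";
    }
    os << '}';
    return os.str();
}

// {fmt} style described above, with string keys and values quoted:
// {"k1": "v1", "k2": "v2"}
inline std::string format_fmt_style(const options_map& m) {
    std::ostringstream os;
    os << '{';
    const char* sep = "";
    for (const auto& [k, v] : m) {
        os << sep << '"' << k << "\": \"" << v << '"';
        sep = ", ";
    }
    os << '}';
    return os.str();
}
```

A test that has to pass both before and after the formatter switch can accept either string in the expected error message.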

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-29 08:03:02 +08:00
Mikołaj Grzebieluch
1e2607563f test.py: test_topology_upgrade_basic: make ring_delay_ms nonzero
Test.py uses `ring_delay_ms = 0` by default. CDC creates generation's timestamp
by adding `ring_delay_ms` to it.

In this test, nodes are learning about new generations (introduced by upgrade
procedure and then by node bootstrap) concurrently with doing writes that
should go to these generations.

Because of `ring_delay_ms = 0`, the generation could have been committed when
it should have already been in use.

This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```

Creating writes during such a generation can result in assigning them a wrong
generation or a failure. Failure may occur if the write hits the short time window when
`generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but `_cdc_metadata.insert(...)` has not yet
been executed. With a nonzero ring_delay_ms it's not a problem, because during
this time window, the generation should not be in use.

Write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```

Set ring_delay_ms to 15000 for the debug mode and 5000 in other modes.
Wait for the last generation to be in use and sleep one second to make sure
there are writes to the CDC table in this generation.

Fixes #17977
2024-03-28 17:13:43 +01:00
Botond Dénes
4c0dadee7c Merge 'test: changes to prepare for dropping FMT_DEPRECATED_OSTREAM' from Kefu Chai
this series includes test related changes to enable us to drop `FMT_DEPRECATED_OSTREAM` deprecated in {fmt} v10.

Refs #13245

Closes scylladb/scylladb#18054

* github.com:scylladb/scylladb:
  test: unit: add fmt::formatter for test_data in tests
  test/lib: do not print with fmt::to_string()
  test/boost: print runtime_error using e.what()
2024-03-28 15:33:56 +02:00
Kamil Braun
33751f8f4e Merge 'raft topology: drop RAFT_PULL_TOPOLOGY_SNAPSHOT RPC' from Gleb
* 'gleb/raft_snapshot_rpc-v3' of github.com:scylladb/scylla-dev:
  raft topology: drop RAFT_PULL_TOPOLOGY_SNAPSHOT RPC
  Use correct limit for raft commands throughout the code.
2024-03-28 14:25:58 +01:00
Nadav Har'El
566223c34a Merge ' tools/scylla-nodetool: repair: abort on first failed repair' from Botond Dénes
When repairing multiple keyspaces, bail out on the first failed keyspace repair, instead of continuing and reporting all failures at the end. This is what Origin does as well.

To be able to test this, a bit of refactoring was needed, to be able to assert that `scylla-nodetool` doesn't make repair requests, beyond the expected ones.

Refs: https://github.com/scylladb/scylla-cluster-tests/issues/7226

Closes scylladb/scylladb#17678

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: repair: abort on first failed repair
  test/nodetool: nodetool(): add check_return_code param
  test/nodetool: nodetool(): return res object instead of just stdout
  test/nodetool: count unexpected requests
2024-03-28 14:02:29 +02:00
Botond Dénes
81bbfae77a tools/scylla-nodetool: implement the checkAndRepairCdcStreams command
Closes scylladb/scylladb#18076
2024-03-28 13:54:37 +02:00
Pavel Emelyanov
1adf16ce73 Merge 'network_topology_strategy: reallocate_tablets: support for rf changes' from Benny Halevy
This series provides a reallocate_tablets function, that's initially called by allocate_tablets_for_new_table.
The new allocation implementation is independent of vnodes/token ownership.
Rather than using the natural_endpoints_tracker, it implements its own tracking
based on dc/rack load (== number of replicas in rack), with the additional benefit
that tablet allocation will balance the allocation across racks, using a heap structure,
similar to the one we use to balance tablet allocation across shards in each node.

reallocate_tablets may also be called with an optional parameter pointing to the current tablet_map.
In this case the function either allocates more tablet replicas in datacenters for which the replication factor was increased,
or it will deallocate tablet replicas from datacenters for which replication factor was decreased.

The NetworkTopologyStrategy_tablets_test unit test was extended to cover replication factor changes.

Closes scylladb/scylladb#17846

* github.com:scylladb/scylladb:
  network_topology_strategy: reallocate_tablets: consider new_racks before existing racks
  network_topology_startegy_test: add NetworkTopologyStrategy_tablet_allocation_balancing_test
  network_topology_strategy: reallocate_tablets: support deallocation via rf change
  network_topology_startegy_test: tablets_test: randomize cases
  network_topology_strategy: allocate_tablets_for_new_table: do not rely on token ownership
  network_topology_startegy_test: add NetworkTopologyStrategy_tablets_negative_test
  network_topology_strategy_test: endpoints_check: use particular BOOST_CHECK_* functions
  network_topology_strategy_test: endpoints_check: verify that replicas are placed on unique nodes
  network_topology_strategy_test: endpoints_check: strictly check rf for tablets
  network_topology_strategy_test: full_ring_check for tablets: drop unused options param
2024-03-28 11:19:11 +03:00
Kefu Chai
2bfc7324d4 mutation: friend fmt::formatter<atomic_cell> in atomic_cell_view
GCC-14 rightly points out that the constructor of `atomic_cell_view`
is marked private, and cannot be called from its formatter:
```
/usr/bin/g++-14 -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/var/ssd/scylladb -I/var/ssd/scylladb/build/gen -I/var/ssd/scylladb/seastar/include -I/var/ssd/scylladb/build/seastar/gen/include -I/var/ssd/scylladb/build/seastar/gen/src -g -Og -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unused-parameter -ffile-prefix-map=/var/ssd/scylladb=. -march=westmere -Wstack-usage=40960 -U_FORTIFY_SOURCE -Wno-maybe-uninitialized -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o -MF mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o.d -o mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o -c /var/ssd/scylladb/mutation/atomic_cell.cc
In file included from /var/ssd/scylladb/mutation/atomic_cell.cc:9:
/var/ssd/scylladb/mutation/atomic_cell.hh: In member function ‘auto fmt::v10::formatter<atomic_cell>::format(const atomic_cell&, fmt::v10::format_context&) const’:
/var/ssd/scylladb/mutation/atomic_cell.hh:413:67: error: ‘atomic_cell_view::atomic_cell_view(basic_atomic_cell_view<is_mutable>) [with mutable_view is_mutable = mutable_view::yes]’ is private within this context
  413 |         return fmt::format_to(ctx.out(), "{}", atomic_cell_view(ac));
      |                                                                   ^
/var/ssd/scylladb/mutation/atomic_cell.hh:275:5: note: declared private here
  275 |     atomic_cell_view(basic_atomic_cell_view<is_mutable> view)
      |     ^~~~~~~~~~~~~~~~
```
so, in this change, we make the formatter a friend of
`atomic_cell_view`.
since the operator<< was dropped, there is no need to keep its friend
declaration around, so it is dropped in this change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18081
2024-03-28 09:44:00 +02:00
Kefu Chai
99e743de9d test: nodetool: match with vector printed by {fmt}
our homebrew formatter for std::vector<string> formats like

```
{hello, world}
```

while {fmt}'s formatter for sequence-like container formats like

```
["hello", "world"]
```

since we are moving to {fmt} formatters, and in this context
quoting the verbatim text makes more sense to the user, let's
support the format used by {fmt} as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18057
2024-03-28 09:35:37 +02:00
Kefu Chai
c2ffa0d813 bytes.hh: stop at '}' in fmt::formatter<fmt_hex>
according to {fmt}'s document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,

```
  // the range will contain "f} continued". The formatter should parse
  // specifiers until '}' or the end of the range. In this example the
  // formatter should parse the 'f' specifier and return an iterator
  // pointing to '}'.
```

so we should check for _both_ '}' and end of the range. when building
scylla with {fmt} 10.2.1, it fails to build code like

```c++
fmt::format_to(out, "{}", fmt_hex(frag))
```

as {fmt}'s compile-time checker fails to parse this format string
along with given argument, as at compile time,
```c++
throw format_error("invalid group_size")
```
is executed.

so, in this change, we check both '}' and the end of range.
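
The parsing contract quoted above can be sketched without {fmt} itself. The function below is a hypothetical illustration: `hex_spec_sketch` and `parse_hex_spec` are made-up names, and the spec grammar (an optional decimal group size) is an assumption, not the actual `fmt_hex` grammar. What it shows is the loop condition that stops at `'}'` or at the end of the range, whichever comes first.

```c++
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical parsed result; 0 means "no grouping".
struct hex_spec_sketch {
    std::size_t group_size = 0;
};

// Parses an optional decimal group size from [it, end).
// Stops at '}' or at the end of the range and returns the position
// where parsing stopped, mirroring the contract of a formatter's parse().
inline const char* parse_hex_spec(const char* it, const char* end,
                                  hex_spec_sketch& spec) {
    std::size_t n = 0;
    while (it != end && *it != '}') {
        if (*it < '0' || *it > '9') {
            throw std::invalid_argument("invalid group_size");
        }
        n = n * 10 + static_cast<std::size_t>(*it - '0');
        ++it;
    }
    spec.group_size = n;
    return it;
}
```

With only an end-of-range check, an empty spec followed by `}` would be fed to the digit check and rejected, which matches the failure mode the commit message describes.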

the change which introduced this formatter was
2f9dfba800

Refs 2f9dfba800
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18080
2024-03-28 08:58:36 +02:00
Marcin Maliszkiewicz
50e0032bca test: auth: remove if not exists from auth cql statement
They were added due to https://github.com/scylladb/python-driver/issues/296
but looks like it no longer reproduces.

Change was tested with ./test.py -vv --repeat=100 test_auth
to minimize chance of introducing flakiness.

Closes scylladb/scylladb#18043
2024-03-28 06:06:45 +01:00
Raphael S. Carvalho
902c71bac8 storage_service: Fix undefined behavior in stream_tablet()
correctness when constructing range_streamer depends on compiler
evaluation order of params.
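
The commit message does not show the offending expression, so the following is only an assumed illustration of this class of bug: one argument moves from an object while another argument reads from it. All names (`metadata_sketch`, `streamer_sketch`, `make_streamer`) are hypothetical.

```c++
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct metadata_sketch {
    std::string topology;
};

struct streamer_sketch {
    std::shared_ptr<metadata_sketch> md;
    std::string topology;
    streamer_sketch(std::shared_ptr<metadata_sketch> m, std::string t)
        : md(std::move(m)), topology(std::move(t)) {}
};

// Hazardous form (left as a comment on purpose):
//   streamer_sketch s(std::move(md), md->topology);
// The order in which the two parameters are initialized is unspecified.
// If the first one is initialized first, `md` is already null when
// `md->topology` is evaluated.

inline streamer_sketch make_streamer(std::shared_ptr<metadata_sketch> md) {
    // Read everything needed from `md` in a separate statement,
    // so the read is sequenced before the move.
    std::string topology = md->topology;
    return streamer_sketch(std::move(md), std::move(topology));
}
```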

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18079
2024-03-27 23:50:37 +01:00
Gleb Natapov
6e6aefc9ab raft topology: drop RAFT_PULL_TOPOLOGY_SNAPSHOT RPC
We have new, more generic, RPC to pull group0 mutations now: RAFT_PULL_SNAPSHOT.
Use it instead of more specific RAFT_PULL_TOPOLOGY_SNAPSHOT one.
2024-03-27 19:18:45 +02:00
Gleb Natapov
c1dcf0fae7 Use correct limit for raft commands throughout the code.
Raft uses schema commitlog, so all its limits should be derived from
this commitlog segment size, but many places used regular commitlog size
to calculate the limits and did not do what they were really supposed
to do.
2024-03-27 19:16:09 +02:00
Kamil Braun
c3989d8e03 Merge 'storage_service: keep subscription to raft topology feature alive' from Piotr Dulikowski
The storage_service::track_upgrade_progress_to_topology_coordinator
function is supposed to wait on the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
cluster feature (among other things) before starting the
raft_state_monitor_fiber. The wait is realized by passing a callback to
feature::when_enabled which sets a shared_promise that is waited on by
the tracking fiber. If the feature is already enabled, when_enabled will
call the callback immediately. However, if it's not, then it will return
a non-null listener_registration object - as long as it is alive, the
callback is registered. The listener_registration object was not
assigned to a variable which caused it to be destroyed shortly after the
when_enabled function returns.

Due to that, if upgrade was requested but the current group0 leader
didn't have the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature enabled
right after boot, the upgrade would not start until the leader is
changed to a node which has that cluster feature already enabled on
boot. Moreover, the topology coordinator would not start on such a node
until the node were rebooted.

Fix the issue by assigning the subscription to a variable.

Fixes: scylladb/scylladb#18049

Closes scylladb/scylladb#18051

* github.com:scylladb/scylladb:
  gms: feature: mark when_enabled(func) with nodiscard
  storage_service: keep subscription to raft topology feature alive
2024-03-27 14:46:43 +01:00
Avi Kivity
96a3544739 Merge 'alternator: reduce stall for Query and Scan with large pages' from Nadav Har'El
Before this series, Alternator's Query and Scan operations convert an
entire result page to JSON without yielding. For a page of maximum
size (1MB) and tiny rows, this can cause a significant stall - the
test included in this PR reported stalls of 14-26ms on my laptop.

The problem is the describe_items() function, which does this conversion
immediately, without yielding. This patch changes this function to
return a future, and use a new result_set::visit_gently() method
that does what visit() does, but with yields when needed.

This PR improves #17995, but does not completely fix it, as the stalls
are not completely eliminated. But on my laptop it usually reduces the stalls
to around 5ms. It appears that the remaining stalls come from other places
not fixed in this PR, such as perhaps query_page::handle_result(), and will need
to be fixed by additional patches.

Closes scylladb/scylladb#18036

* github.com:scylladb/scylladb:
  alternator: reduce stall for Query and Scan with large pages
  result_set: introduce visit_gently()
  alternator: coroutinize do_query() function
2024-03-27 15:06:32 +02:00
Kamil Braun
404406e6a1 Merge ' test/cql-pytest: test_select_from_mutation_fragments.py: move away from memtables' from Botond Dénes
Memtables are fickle, they can be flushed when there is memory pressure,
if there is too much commitlog or if there is too much data in them. The
tests in test_select_from_mutation_fragments.py currently assume data
written is in the memtable. This is true most of the time but we have
seen some odd test failures that couldn't be understood.  To make the
tests more robust, flush the data to the disk and read it from the
sstables. This means that some range scans need to filter to read from
just a single mutation source, but this does not influence the tests.
Also fix a use-after-return found when modifying the tests.

This PR tentatively fixes the below issues, based on our best guesses on why they failed (each was seen just once):
Fixes: scylladb/scylladb#16795
Fixes: scylladb/scylladb#17031

Closes scylladb/scylladb#17562

* github.com:scylladb/scylladb:
  test/cql-pytest: test_select_from_mutation_fragments.py: move away from memtables
  cql3: select_statement: mutation_fragments_select_statement: fix use-after-return
2024-03-27 13:21:19 +01:00
Botond Dénes
fdd5367974 Merge 'compaction: implement unchecked_tombstone_compaction' from Ferenc Szili
This change adds the missing Cassandra compaction option unchecked_tombstone_compaction.
Setting this option to true causes the compaction to ignore tombstone_threshold, and decide whether to do a compaction only based on the value of tombstone_compaction_interval
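
The decision described above can be sketched as a small predicate. This is an assumption-laden illustration, not the code from the series: the struct, the function name, and the default values shown are placeholders chosen for the example.

```c++
#include <cassert>
#include <chrono>

// Hypothetical option bundle; the defaults here are illustrative only.
struct tombstone_options_sketch {
    double tombstone_threshold = 0.2;
    std::chrono::seconds tombstone_compaction_interval{86400};
    bool unchecked_tombstone_compaction = false;
};

// Returns true if an sstable is a candidate for tombstone compaction.
inline bool worth_tombstone_compaction(std::chrono::seconds sstable_age,
                                       double droppable_tombstone_ratio,
                                       const tombstone_options_sketch& o) {
    // The interval always applies.
    if (sstable_age < o.tombstone_compaction_interval) {
        return false;
    }
    // With the unchecked option set, the threshold is ignored.
    if (o.unchecked_tombstone_compaction) {
        return true;
    }
    return droppable_tombstone_ratio >= o.tombstone_threshold;
}
```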

Fixes #1487

Closes scylladb/scylladb#17976

* github.com:scylladb/scylladb:
  removed forward declaration of resharding_descriptor
  compaction options and troubleshooting docs
  cql-pytest/test_compaction_strategy_validation.py
  test/boost/sstable_compaction_test.cc
  compaction: implement unchecked_tombstone_compaction
2024-03-27 13:56:02 +02:00
Kefu Chai
6bd0be71ab mutation: add fmt::formatter for invalid_mutation_fragment_stream
before this change, we rely on the default-generated fmt::formatter
created from operator<<. but this depends on the
`FMT_DEPRECATED_OSTREAM` macro which is not respected in {fmt} v10.

this change addresses the formatting with fmtlib < 10, and without
`FMT_DEPRECATED_OSTREAM` defined. please note, in {fmt} v10 and up,
it defines formatter for classes derived from `std::exception`, so
our formatter is only added when compiled with {fmt} < 10.

in this change, `fmt::formatter<invalid_mutation_fragment_stream>`
is added for backward compatibility with {fmt} < 10.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18053
2024-03-27 13:37:48 +02:00
Kefu Chai
d1e8d89ae2 doc: topology-over-raft: add transition_state to node state diagram
in order to help the developers to understand the transitions
of `node_state` and the `transition_state` on each of the `node_state`,
in this change, the nested state machine diagram is added to the
node state diagram.

please note, instead of trying to merge similar states like
bootstrapping and replacing into a single state, we keep them as
separate ones, and replicate the nested state machine diagram in them
as well, to be more clear.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18025
2024-03-27 12:16:35 +01:00
Andrei Chekun
0752ef1481 test: remove skip annotation for multi-DC test with 5 DCs with one node in each
As a follow-up to https://github.com/scylladb/scylladb/pull/17503, remove the skip annotation for the multi-DC test, with the number of DCs used in it reduced from 30 to 5.

Closes scylladb/scylladb#17898
2024-03-27 13:13:13 +02:00
Michał Chojnowski
295b27a07b cache_flat_mutation_reader: only call get_iterator_in_latest() when pointing at a row
Calling `_next_row.get_iterator_in_latest()` is illegal when `_next_row` is not
pointing at a row. In particular, the iterator returned by such call might be
dangling.

We have observed this to cause a use-after-free in the field, when a reverse
read called `maybe_add_to_cache` after `_latest_it` was left dangling after
a dead row removal in `copy_from_cache_to_buffer`.

To fix this, we should ensure that we only call `_next_row.get_iterator_in_latest`
when `_next_row` is pointing at a row.

Only the occurrences of this problem in `maybe_add_to_cache` are truly dangerous.
As far as I can see, other occurrences can't break anything as of now.
But we apply fixes to them anyway.

Closes scylladb/scylladb#18046
2024-03-27 11:48:42 +01:00
Kamil Braun
d274f63d89 Merge 'Add support for "initial-token" parameter in raft mode' from Gleb
Fixes scylladb/scylladb#17893

* 'gleb/initial-token-v1' of github.com:scylladb/scylla-dev:
  dht: drop unused parameter from get_random_bootstrap_tokens() function
  test: add test for initial_token parameter
  topology coordinator: use provided initial_token parameter to choose bootstrap tokens
  topology cooordinator: propagate initial_token option to the coordinator
2024-03-27 11:41:06 +01:00
Kefu Chai
71a519dee8 test: unit: add fmt::formatter for test_data in tests
this change is created in same spirit of d1c35f943d.

before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for test_data in
radix_tree_stress_test.cc, and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 18:18:32 +08:00
Kefu Chai
4f8c1a4729 test/lib: do not print with fmt::to_string()
we should not format a variable unless we want to print it. in this
case, we format `first_row` using `fmt::to_string()` to a string,
and then insert the string into another string. even though this is
in a cold path, this is still an anti-pattern -- both convoluted
and not performant.

so let's just pass `first_row` to `format()`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 18:18:32 +08:00
Kefu Chai
d0ceb35e7e test/boost: print runtime_error using e.what()
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter. but fortunately, fmt v10 brings the builtin
formatter for classes derived from `std::exception`. but before
switching to {fmt} v10, and after dropping `FMT_DEPRECATED_OSTREAM`
macro, we need to print out `std::runtime_error`. so far, we don't
have a shared place for formatter for `std::runtime_error`. so we
are addressing the needs on a case-by-case basis.

in this change, we just print it using `e.what()`. its behavior
is identical to what we have now.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 18:18:32 +08:00
Benny Halevy
8a77319cb7 network_topology_strategy: reallocate_tablets: consider new_racks before existing racks
Allocate first from new (unpopulated) racks before
allocating from racks that are already populated
with replicas.

Still, rotate both new and existing racks by tablet id
to ensure fairness.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Benny Halevy
c5ff060dee network_topology_startegy_test: add NetworkTopologyStrategy_tablet_allocation_balancing_test
Test that tablet allocation is balanced across
racks, nodes, and shards.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Benny Halevy
4a7d57525e network_topology_strategy: reallocate_tablets: support deallocation via rf change
Add support for deallocating tablet replicas when the
datacenter replication factor is decreased.

We deallocate replicas in back-to-front order to maintain
replica pairing between the base table and
its materialized views.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Benny Halevy
1e8f8db5b8 network_topology_startegy_test: tablets_test: randomize cases
Instead of deterministically testing a very small set of cases,
randomize the shard_count per node, the cluster topology
and the NetworkTopologyStrategy options.

The next patch will extend the test to also test
`reallocate_tablets` with randomized options.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Benny Halevy
898cd1d404 network_topology_strategy: allocate_tablets_for_new_table: do not rely on token ownership
Base initial tablets allocation for new table
on the dc/rack topology, rather than on the token ring,
to remove the dependency on token ownership.

We keep the rack ordinal order in each dc
to facilitate in-rack pairing of base/view
replicas, and we apply load-balancing
principles by sorting the nodes in each rack
by their load (number of tablets allocated to
the node), and attempting to fill least-loaded
nodes first.

This method is more efficient than circling
the token ring and attempting to insert the endpoints
to the natural_endpoint_tracker until the replication
factor per dc is fulfilled, and it allows an easier
way to incrementally allocate more replicas after
rf is increased.
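
The "fill least-loaded nodes first" principle can be sketched with a min-heap keyed by load. This is a simplified illustration, not the allocator from the commit: `pick_least_loaded` is a hypothetical function that places one replica for each of `tablet_count` tablets within a single rack, and ties are broken by node name only to keep the result deterministic.

```c++
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// (load, node name); the smallest load sits on top of the heap.
using node_load = std::pair<std::size_t, std::string>;

// For each tablet, pick the currently least-loaded node, then put the
// node back with its load incremented.
inline std::vector<std::string>
pick_least_loaded(std::vector<node_load> nodes, std::size_t tablet_count) {
    std::priority_queue<node_load, std::vector<node_load>, std::greater<node_load>>
        heap(std::greater<node_load>{}, std::move(nodes));
    std::vector<std::string> picked;
    while (picked.size() < tablet_count && !heap.empty()) {
        node_load top = heap.top();
        heap.pop();
        picked.push_back(top.second);
        ++top.first;
        heap.push(std::move(top));
    }
    return picked;
}
```

Each pick costs a logarithmic heap operation, and loads converge toward equality as more tablets are placed.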

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:21 +02:00
Botond Dénes
f70f04c240 tools/scylla-nodetool: repair: abort on first failed repair
When repairing multiple keyspaces, bail out on the first failed keyspace
repair, instead of continuing and reporting all failures at the end.
This is what Origin does as well.
2024-03-27 05:46:18 -04:00
Mikołaj Grzebieluch
fa4193e09f Reapply "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
This reverts commit 230f23004b.
2024-03-27 10:39:01 +01:00
Benny Halevy
40a4b349bd network_topology_startegy_test: add NetworkTopologyStrategy_tablets_negative_test
Test that attempting to allocate tablets
throws an error when there are not enough nodes
for the configured replication factor.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Benny Halevy
f19dbb4ae5 network_topology_strategy_test: endpoints_check: use particular BOOST_CHECK_* functions
Using e.g. `BOOST_CHECK_EQUAL(endpoints.size(), total_rf)`
rather than `BOOST_CHECK(endpoints.size() == total_rf)`
prints a more detailed error message that includes the
runtime values, if it fails.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Benny Halevy
93b6573a90 network_topology_strategy_test: endpoints_check: verify that replicas are placed on unique nodes
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Benny Halevy
c11ffd14cc network_topology_strategy_test: endpoints_check: strictly check rf for tablets
With tablets we want to verify that the number of
replicas allocated per tablet per dc exactly matches
the replication strategy per-dc replication factor options.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Benny Halevy
ffa5870758 network_topology_strategy_test: full_ring_check for tablets: drop unused options param
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 10:35:04 +02:00
Botond Dénes
764e9a344d test/nodetool: nodetool(): add check_return_code param
When set to false, the returncode is not checked, this is left to the
caller. This in turn allows for checking the expected and unexpected
requests which is not checked when the nodetool process fails.
This is used by utils._do_check_nodetool_fails_with(), so that expected
and unexpected requests are checked even for failed invocations.

Some tests need adjustment to the stricter checks.
2024-03-27 04:18:19 -04:00
Botond Dénes
8f3b1db37f test/nodetool: nodetool(): return res object instead of just stdout
So callers have access to stderr, return code and more.
This causes some churn in the test, but the changes are mechanical.
2024-03-27 04:18:19 -04:00
Kefu Chai
2e2c3a5fea locator: fix a typo in comment
s/Substracts/Subtracts/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18048
2024-03-27 10:15:18 +02:00
Piotr Dulikowski
e76817502f gms: feature: mark when_enabled(func) with nodiscard
The feature::when_enabled function takes a callback and returns a
listener_registration object. Unless the feature were enabled right from
the start, the listener_registration will be non-null and will keep the
callback registered until the registration is destroyed. If the
registration is destroyed before the feature is enabled, the callback
will not be called. It's easy to make a mistake and forget to keep the
returned registration alive - especially when, in tests, the feature is
enabled early in boot, because in that case when_enabled calls the
callback immediately and returns a null object instead.

In order to prevent issues with prematurely dropped
listener_registration in the future, mark feature::when_enabled with the
[[nodiscard]] attribute.
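
The pitfall can be reproduced in a small standalone sketch. The classes below are hypothetical stand-ins (`feature_sketch`, `listener_registration_sketch`), not the gms types, and the lifetime tracking via `std::shared_ptr`/`std::weak_ptr` is an implementation choice made for this illustration.

```c++
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Owns the callback; once this object is destroyed the callback is gone.
class listener_registration_sketch {
    std::shared_ptr<std::function<void()>> _cb;
public:
    listener_registration_sketch() = default;
    explicit listener_registration_sketch(std::shared_ptr<std::function<void()>> cb)
        : _cb(std::move(cb)) {}
    explicit operator bool() const { return static_cast<bool>(_cb); }
};

class feature_sketch {
    bool _enabled = false;
    std::vector<std::weak_ptr<std::function<void()>>> _listeners;
public:
    // [[nodiscard]] makes the compiler warn when the result is dropped.
    [[nodiscard]] listener_registration_sketch when_enabled(std::function<void()> cb) {
        if (_enabled) {
            cb();                                   // already enabled: call now
            return listener_registration_sketch();  // null registration
        }
        auto holder = std::make_shared<std::function<void()>>(std::move(cb));
        _listeners.push_back(holder);
        return listener_registration_sketch(std::move(holder));
    }
    void enable() {
        _enabled = true;
        for (auto& w : _listeners) {
            if (auto cb = w.lock()) {
                (*cb)();  // only callbacks whose registration is still alive
            }
        }
        _listeners.clear();
    }
};
```

A caller that writes `f.when_enabled(...)` as a bare statement loses the callback silently; with the attribute in place that statement draws a compiler warning, and keeping the result in a variable (or a member) is the fix.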
2024-03-27 08:55:45 +01:00
Piotr Dulikowski
7ea6e1ec0a storage_service: keep subscription to raft topology feature alive
The storage_service::track_upgrade_progress_to_topology_coordinator
function is supposed to wait on the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
cluster feature (among other things) before starting the
raft_state_monitor_fiber. The wait is realized by passing a callback to
feature::when_enabled which sets a shared_promise that is waited on by
the tracking fiber. If the feature is already enabled, when_enabled will
call the callback immediately. However, if it's not, then it will return
a non-null listener_registration object - as long as it is alive, the
callback is registered. The listener_registration object was not
assigned to a variable which caused it to be destroyed shortly after the
when_enabled function returns.

Due to that, if upgrade was requested but the current group0 leader
didn't have the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature enabled
right after boot, the upgrade would not start until the leader is
changed to a node which has that cluster feature already enabled on
boot. Moreover, the topology coordinator would not start on such a node
until the node were rebooted.

Fix the issue by assigning the subscription to a variable.
2024-03-27 08:55:45 +01:00
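The lifetime pitfall described in commits e76817502f and 7ea6e1ec0a can be illustrated with a small model. This is only a sketch in Python, not ScyllaDB's C++ code: `Feature` and `Registration` are hypothetical stand-ins for `gms::feature` and `listener_registration`, and the sketch relies on CPython's reference counting to destroy a discarded registration immediately.

```python
import weakref


class Registration:
    """Hypothetical stand-in: keeps a callback registered while it is alive."""

    def __init__(self, callback):
        self.callback = callback


class Feature:
    """Hypothetical stand-in for a cluster feature with enable listeners."""

    def __init__(self, enabled=False):
        self._enabled = enabled
        # Weak references only: a registration nobody holds simply disappears.
        self._listeners = weakref.WeakSet()

    def when_enabled(self, callback):
        if self._enabled:
            # Already enabled: call right away, there is nothing to keep alive.
            callback()
            return None
        registration = Registration(callback)
        self._listeners.add(registration)
        return registration

    def enable(self):
        self._enabled = True
        for registration in list(self._listeners):
            registration.callback()


def demo():
    calls = []
    feature = Feature()
    # Return value discarded: the registration dies before the feature is enabled.
    feature.when_enabled(lambda: calls.append("discarded"))
    # Return value kept in a variable: the callback stays registered.
    kept = feature.when_enabled(lambda: calls.append("kept"))
    feature.enable()
    return calls
```

In this model only the callback whose registration was assigned to a variable runs when the feature is enabled later, while a feature that is already enabled calls the callback immediately and returns nothing to hold on to, which is why the mistake is easy to miss in tests.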
Botond Dénes
2d12db81cf Merge 'docs: document nodetool {getsstables, sstableinfo}' from Kefu Chai
these two subcommands are provided by cassandra, and are also implemented natively in scylla. so let's document them.

Closes scylladb/scylladb#17982

* github.com:scylladb/scylladb:
  docs/operating-scylla: document nodetool sstableinfo
  docs/operating-scylla: document nodetool getsstables
2024-03-27 09:04:55 +02:00
Botond Dénes
4d98b7d532 test/nodetool: count unexpected requests
We currently check at the end of each test, that all expected requests
set by the test were consumed. This patch adds a mechanism to count
unexpected requests -- requests which didn't match any of the expected
ones set by the test. This can be used to assert that nodetool didn't
make any request to the server, beyond what the test expected it to do.
Before this patch, requests like this would only be noticed by the test,
if the response of 404/500 caused nodetool to fail, which is not always
the case.
2024-03-27 02:39:28 -04:00
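The idea of counting unexpected requests (commit 4d98b7d532) can be sketched roughly as follows. This is not the code from `rest_api_mock.py`: the `MockServer` class, its in-order matching rule and the `/example/...` paths are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MockServer:
    """Hypothetical mock: matches requests against an ordered expectation list."""

    expected: list = field(default_factory=list)  # (method, path) pairs
    unexpected_count: int = 0

    def handle(self, method, path):
        """Return an HTTP-like status code for the request."""
        if self.expected and self.expected[0] == (method, path):
            self.expected.pop(0)
            return 200
        # The request matched nothing the test asked for: remember it.
        self.unexpected_count += 1
        return 404

    def verify(self):
        """Fail on leftover expectations and on any unexpected request."""
        assert not self.expected, f"unconsumed expected requests: {self.expected}"
        assert self.unexpected_count == 0, (
            f"{self.unexpected_count} unexpected request(s)"
        )
```

With a counter like this, a stray request is caught by `verify()` even when the client tolerates the 404 response and exits successfully.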
Kefu Chai
8af9c735f2 docs/operating-scylla: document nodetool sstableinfo
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 07:29:24 +08:00
Kefu Chai
da90e368dc docs/operating-scylla: document nodetool getsstables
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-27 07:29:24 +08:00
Pavel Emelyanov
04370dc8a4 tablets: Introduce substract_sets()
There are several places in code that calculate replica sets associated
with specific tablet transition. Having a helper to subtract two sets
improves code readability.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18033
2024-03-26 23:33:06 +02:00
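A set-subtraction helper of the kind introduced in commit 04370dc8a4 might look like the sketch below. It is written in Python for brevity and does not mirror the real C++ signature: the name `subtract_sets`, the list return type and the `(host, shard)` tuples used as replicas are illustrative assumptions.

```python
def subtract_sets(minuend, subtrahend):
    """Return the replicas in `minuend` that are absent from `subtrahend`.

    The order of `minuend` is preserved so that results are deterministic.
    """
    excluded = set(subtrahend)
    return [replica for replica in minuend if replica not in excluded]


# Replicas are modelled as hypothetical (host, shard) tuples.
current_replicas = [("host-a", 0), ("host-b", 1), ("host-c", 0)]
next_replicas = [("host-a", 0), ("host-b", 1), ("host-d", 2)]

# Replicas that leave during the transition, and those that are newly added.
leaving = subtract_sets(current_replicas, next_replicas)
pending = subtract_sets(next_replicas, current_replicas)
```

Naming the two differences once keeps each call site down to a single readable line.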
Tomasz Grabiec
042a4b7627 Merge 'tablets: add warning on CREATE KEYSPACE' from Nadav Har'El
The CDC feature is not supported on a table that uses tablets
(Refs https://github.com/scylladb/scylladb/issues/16317), so if a user creates a keyspace with tablets enabled
they may be surprised later (perhaps much later) when they try to enable
CDC on the table and can't.

The LWT feature always had issue Refs https://github.com/scylladb/scylladb/issues/5251, but it has become potentially
more common with tablets.

So it was proposed that as long as we have missing features (like CDC or
LWT), every time a keyspace is created with tablets it should output a
warning (a bona-fide CQL warning, not a log message) that some features
are missing, and if you need them you should consider re-creating the
keyspace without tablets.

This PR does this.

The warning text which will be produced is the following (obviously, it can
be improved later, as we perhaps find more missing features):

>   "Tables in this keyspace will be replicated using tablets, and will
>    not support the CDC feature (issue https://github.com/scylladb/scylladb/issues/16317) and LWT may suffer from
>    issue https://github.com/scylladb/scylladb/issues/5251 more often. If you want to use CDC or LWT, please drop
>    this keyspace and re-create it without tablets, by adding AND TABLETS
>    = {'enabled': false} to the CREATE KEYSPACE statement."

This PR also includes a test that checks that this warning is
indeed generated when a keyspace is created with tablets (either by default
or explicitly), and not generated if the keyspace is created without
tablets. It also fixes existing tests which didn't like the new warning.

Fixes https://github.com/scylladb/scylladb/issues/16807

Closes scylladb/scylladb#17318

* github.com:scylladb/scylladb:
  tablets: add warning on CREATE KEYSPACE
  test/cql-pytest: fix guadrail tests to not be sensitive to more warnings
2024-03-26 20:04:07 +01:00
Gleb Natapov
9b00847f31 dht: drop unused parameter from get_random_bootstrap_tokens() function 2024-03-26 18:43:31 +02:00
Gleb Natapov
ed534fde8f test: add test for initial_token parameter
Test that configured tokens are used and token collisions are detected.
2024-03-26 18:43:31 +02:00
Gleb Natapov
06952ec6dd topology coordinator: use provided initial_token parameter to choose bootstrap tokens
Use the same logic as with gossiper to choose bootstrap tokens in case
the initial_token parameter is not empty.
2024-03-26 18:43:25 +02:00
Gleb Natapov
6ab78e13c6 topology cooordinator: propagate initial_token option to the coordinator
The patch propagates initial_token option to the topology coordinator
where it is added to join request parameter.
2024-03-26 18:43:16 +02:00
Marcin Maliszkiewicz
e1fea3af6b main: reload service levels data accessor after join_cluster
Setting data accessor implicitly depends on node joining the cluster
with raft leader elected as only then service level mutation is put
into scylla_local table. Calling it after join_cluster avoids starting
new cluster with older version only to immediately migrate it to the
latest one in the background.
2024-03-26 17:36:03 +01:00
Nadav Har'El
ba97fd98a3 alternator: reduce stall for Query and Scan with large pages
Before this patch, Alternator's Query and Scan operations convert an
entire result page to JSON without yielding. For a page of maximum
size (1MB) and tiny rows, this can cause a significant stall - the
test included in this patch reported stalls of 14-26ms on my laptop.

The problem is the describe_items() function, which does this conversion
immediately, without yielding. This patch changes this function to
return a future, and use the result_set::visit_gently() method
instead of visit() that yields when needed.

This patch does not completely eliminate stalls in the test, but
on my laptop usually reduces them to around 5ms. It appears that
the remaining stalls come from other places not fixed in this PR,
such as perhaps query_page::handle_result(), and will need to be
fixed by additional patches.

The test included in this patch is useful for manually reproducing
the stall, but not useful as a regression test: It is slow (requiring
a couple of seconds to set up the large partition) and doesn't
check anything, and can't even report the stall without modifying the
test runner. So the test is skipped by default (using the "veryslow"
marker) and can be enabled and run manually by developers who want
to continue working on #17995.

Refs #17995.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-26 18:32:45 +02:00
Nadav Har'El
e24629a635 result_set: introduce visit_gently()
Whereas result_set::visit() passes all the rows to the visitor and
returns void, this patch introduces a method visit_gently() that returns
a future, and may yield before visiting each row.

This method will be used in the next patch to allow Alternator, which
used visit() to convert a result_set into JSON format, to potentially
yield between rows and avoid large stalls when converting a large
result set.

Note that I decided to add the yield points in the new visit_gently()
between rows - not between each cell. Many places in our code (including
the memtable) already work on a per-row basis and do not yield in the
middle of a row, so it won't really be helpful to do this either.
But if we'll want, we will still be able to modify visit_gently() later
to be even more gentle, and yield between individual cells. The callers
shouldn't know or care.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-26 18:32:11 +02:00
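The difference between `visit()` and `visit_gently()` (commit e24629a635) can be modelled with `asyncio`. This is a sketch of the idea only, not Seastar code: the function names are reused for readability, and the sketch yields unconditionally before every row, whereas the commit describes a method that merely may yield.

```python
import asyncio


def visit(rows, visitor):
    """Visit every row in one go, never giving up the CPU."""
    for row in rows:
        visitor(row)


async def visit_gently(rows, visitor):
    """Visit every row, letting other tasks run before each one."""
    for row in rows:
        await asyncio.sleep(0)  # yield point between rows, not between cells
        visitor(row)


async def demo():
    events = []

    async def other_task():
        # Stands in for unrelated work that a long loop would otherwise stall.
        for i in range(3):
            events.append(f"other {i}")
            await asyncio.sleep(0)

    task = asyncio.create_task(other_task())
    await visit_gently(["r0", "r1", "r2"], lambda row: events.append(f"row {row}"))
    await task
    return events
```

Running `asyncio.run(demo())` shows the other task making progress while the rows are still being visited. Since the caller only awaits the result, the yield granularity can be changed later without touching call sites.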
Marcin Maliszkiewicz
ff17a29b54 service: qos: create separate function for reloading data accessor
Scylla's main is already too long, it's better to contain this logic inside qos service.
2024-03-26 17:26:19 +01:00
Avi Kivity
4ddf82e58b treewide: don't #include "gms/feature_service.hh" from other headers
feature_service.hh is a high-level header that integrates much
of the system functionality, so including it in lower-level headers
causes unnecessary rebuilds. Specifically, when retiring features.

Fix by removing feature_service.hh from headers, and supply forward
declarations and includes in .cc where needed.

Closes scylladb/scylladb#18005
2024-03-26 15:31:18 +02:00
Nadav Har'El
c146b1224c alternator: coroutinize do_query() function
This patch changes the do_query() function, used to implement Alternator's
Query and Scan operations, from using continuations to be a coroutine.
There are no functional changes in this patch, it's just the necessary
changes to convert the function to a coroutine.

The new code is easier to read and less indented, but more importantly,
will be easier to extend in the next patch to add additional awaits
in the middle of the function.

In addition to the obvious changes, I also had to rename one local
variable (as the same name was used in two scopes), and to convert
pass-by-rvalue-reference to pass-by-value (these parameters are *moved*
by the caller, and moreover the old code had to move them again to a
continuation, so there is no performance penalty in this change).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-26 15:08:08 +02:00
Pavel Emelyanov
8bf9098663 system_keyspace: Consolidate node-state vs tokens checks
When loading topology state, nodes are checked to have or not to have
"tokens" field set. The check is done based on node state and it's
spread across the loading method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17957
2024-03-26 14:55:46 +02:00
Avi Kivity
22b8065a89 Merge 'tools/scylla-nodetool: implement the getsstables and sstableinfo commands' from Botond Dénes
These commands manage to avoid detection because they are not documented on https://opensource.docs.scylladb.com/stable/operating-scylla/nodetool.html.

They were discovered when running dtests, with ccm tuned to use the native nodetool directly. See https://github.com/scylladb/scylla-ccm/pull/565.

The commands come with tests, which pass with both the native and Java nodetools. I also checked that the relevant dtests pass with the native implementation.

Closes scylladb/scylladb#17979

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the sstableinfo command
  tools/scylla-nodetool: implement the getsstables command
  tools/scylla-nodetool: move get_ks_cfs() to the top of the file
  test/nodetool: rest_api_mock.py: add expected_requests context manager
2024-03-26 14:38:00 +02:00
Kefu Chai
101fdfc33a test: randomized_nemesis_test: add fmt::formatter for stop_crash::result_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

also, since it's impossible to partially specialize for a nested type
of a template class, we cannot specialize the `fmt::formatter` for
`stop_crash<M>::result_type`. as a workaround, a new type is
added.

in this change,

* define a new type named `stop_crash_result`
* add fmt::formatter for `stop_crash_result`
* define stop_crash::result_type as an alias of `stop_crash_result`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18018
2024-03-26 12:18:55 +02:00
Pavel Emelyanov
67c2a06493 api: Rename (un)set_server_load_sstable -> (un)set_server_column_family
The method sets up column family API, not load-sstables one

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18022
2024-03-26 12:16:08 +02:00
Botond Dénes
7edbf189e6 Merge 'treewide: use fmt::to_string() to transform a UUID to std::string and drop UUID::to_sstring()' from Kefu Chai
`UUID::to_sstring()` relies on `FMT_DEPRECATED_OSTREAM` to generate `fmt::formatter` for `UUID`, and this feature is deprecated in {fmt} v9, and dropped in {fmt} v10.

in this series, all callers of `UUID::to_sstring()` are switched to `fmt::to_string()`, and this function is dropped.

Closes scylladb/scylladb#18020

* github.com:scylladb/scylladb:
  utils: UUID: drop UUID::to_sstring()
  treewide: use fmt::to_string() to transform a UUID to std::string
2024-03-26 12:14:56 +02:00
Kefu Chai
f3532cbaa0 db: commitlog: use fmt::streamed() to print segment
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change:

* add `format_as()` for `segment` so we can use it as a fallback
  after upgrading to {fmt} v10
* use fmt::streamed() when formatting `segment`, this will be used
  as the intermediate solution before {fmt} v10, after dropping
  the `FMT_DEPRECATED_OSTREAM` macro

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18019
2024-03-26 12:13:29 +02:00
Botond Dénes
cd9589ec78 Merge 'test.py: Sanitize test list creation' from Pavel Emelyanov
To create the list of tests to run there's a loop that first collects all tests from suites, then filters the list in two ways -- excludes opt-out-ed lists (disabled and matching the skip pattern) or leaves there only opt-in-ed (those specified as positional arguments).

This patch keeps both list-checking code close to each other so that the intent is explicitly clear.

Closes scylladb/scylladb#17981

* github.com:scylladb/scylladb:
  test.py: Give local variable meaningful name
  test.py: Sanitize test list creation
2024-03-26 12:09:49 +02:00
Marcin Maliszkiewicz
5844d66676 auth: coroutinize service::start 2024-03-26 09:45:15 +01:00
Patryk Jędrzejczak
13fecd4e36 raft topology: decommission: allow only in NORMAL mode
We move the mode check so that the raft-based decommission also uses
it. Without this check, it hung after the drain operation instead
of instantly failing. `test_decommission_after_drain_is_invalid` was
failing because of it with the raft-based topology enabled.

Fixes scylladb/scylladb#16761

Closes scylladb/scylladb#18000
2024-03-26 08:52:26 +01:00
Botond Dénes
f0ff23492f Merge 'Sanitize topology suites' skiplists' from Pavel Emelyanov
There are skip_in_<mode> lists in suite yaml that tell test.py not to run the test from it. This PR sanitizes these lists in two ways.

First, to skip pytests the skip-decorators are much more convenient, e.g. because they show the reason why the test is skipped.

Also, if a test wants to be opt-in-ed for some mode only, it's opt-out-ed in all other lists instead. There's run_in_<mode> list in suite for that.

Closes scylladb/scylladb#17964

* github.com:scylladb/scylladb:
  test: Do not duplicate test name in several skip-lists
  test: Mark tests with skip_mode instead of suite skip-list
2024-03-26 08:24:57 +02:00
Kefu Chai
a047178fe7 utils: UUID: drop UUID::to_sstring()
this function is not used anymore, and it relies on
`FMT_DEPRECATED_OSTREAM` to generate `fmt::formatter` for
`UUID`, and this feature is deprecated in {fmt} v9, and
dropped in {fmt} v10.

in this change, let's drop this member function.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-26 13:38:37 +08:00
Kefu Chai
1b859e484f treewide: use fmt::to_string() to transform a UUID to std::string
without `FMT_DEPRECATED_OSTREAM` macro, `UUID::to_sstring()` is
implemented using its `fmt::formatter`, which is not available
at the end of this header file where `UUID` is defined. at this moment,
we still use `FMT_DEPRECATED_OSTREAM` and {fmt} v9, so we can
still use `UUID::to_sstring()`, but in {fmt} v10, we cannot.

so, in this change, we change all callers of `UUID::to_sstring()`
to `fmt::to_string()`, so that we don't depend on
`FMT_DEPRECATED_OSTREAM` and {fmt} v9 anymore.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-26 13:38:37 +08:00
Wojciech Mitros
9789a3dc7c mv: keep semaphore units alive until the end of a remote view update
When a view update has both a local and remote target endpoint,
it extends the lifetime of its memory tracking semaphore units
only until the end of the local update, while the resources are
actually used until the remote update finishes.
This patch changes the semaphore transferring so that in case
of both local and remote endpoints, both view updates share the
units, causing them to be released only after the update that
takes longer finishes.

Fixes #17890

Closes scylladb/scylladb#17891
2024-03-25 19:43:58 +02:00
Tzach Livyatan
6702ba3664 Docs: Add link from migration tools page to nodetool refresh load and stream
Closes scylladb/scylladb#18006
2024-03-25 17:47:05 +02:00
Botond Dénes
1ea7b408db tools/scylla-nodetool: implement the sstableinfo command 2024-03-25 11:29:30 -04:00
Botond Dénes
50da93b9c8 tools/scylla-nodetool: implement the getsstables command 2024-03-25 11:29:30 -04:00
Botond Dénes
f51061b198 tools/scylla-nodetool: move get_ks_cfs() to the top of the file
So it can be used by all commands.
2024-03-25 11:29:30 -04:00
Botond Dénes
4ff88b848c test/nodetool: rest_api_mock.py: add expected_requests context manager
So tests and fixtures can use `with expected_requests():` and have
cleanup be taken care for them. I just discovered that some tests do not
clean up after themselves and when running all tests in a certain order,
this causes unrelated tests to fail.
Fix by using the context everywhere, getting guaranteed cleanup after
each test.
2024-03-25 11:29:30 -04:00
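The guaranteed-cleanup behaviour of an `expected_requests` context manager (commit 4ff88b848c) can be sketched with `contextlib`. The `FakeRestServer` class and the two-argument signature below are hypothetical; the commit only shows the `with expected_requests():` form and not its parameters.

```python
from contextlib import contextmanager


class FakeRestServer:
    """Hypothetical stand-in for the mock REST API server."""

    def __init__(self):
        self.expected = []

    def set_expected(self, requests):
        self.expected = list(requests)

    def clear_expected(self):
        leftover, self.expected = self.expected, []
        return leftover


@contextmanager
def expected_requests(server, requests):
    """Install expectations for the duration of the `with` block."""
    server.set_expected(requests)
    try:
        yield server
    finally:
        # Runs even when the test body raises, so no state leaks into
        # whichever test happens to run next.
        server.clear_expected()
```

Because the cleanup sits in a `finally` clause, a failing test can no longer leave stale expectations behind for unrelated tests.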
Petr Gusev
7c84fc527b test_invalid_user_type_statements: increase raft timeout
The test creates ut4 with a lot of fields,
this may take a while in debug builds,
to avoid raft operation timeout set the threshold
to some big value.

The error injector is disabled in release builds,
so this setting won't be applied to them.
This shouldn't be a problem since release builds
are fast enough, even on arm.

Fixes scylladb/scylladb#17987

Closes scylladb/scylladb#17997
2024-03-25 14:52:16 +01:00
Ferenc Szili
8bb7a18de2 test/cql-pytest: add --omit-scylla-output to Cassandra test runs
Currently, the tests in test/cql-pytest can be run against both ScyllaDB and Cassandra.
Running the test for either will first output the test results, and subsequently
print the stdout output of the process under test. Using the command line
option --omit-scylla-output it is possible to disable this print for Scylla,
but it is not possible for tests run against Cassandra.

This change adds the option to suppress output for Cassandra tests, too. By default,
the stdout of the Cassandra run will still be printed after the test results, but
this can now be disabled with --omit-scylla-output

Closes scylladb/scylladb#17996
2024-03-25 15:14:45 +02:00
Pavel Emelyanov
16343b3edc test: Do not duplicate test name in several skip-lists
Some tests are only run in dev mode for some reason. For such tests
there's run_in_dev list, no need in putting it in all the non-dev
skip_in_... ones.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-25 14:56:37 +03:00
Pavel Emelyanov
90dfcec86b test: Mark tests with skip_mode instead of suite skip-list
There are many tests that are skipped in release mode because they rely
on error-injection machinery which doesn't work in release mode. Most of
those tests are listed in suite's skip_in_release, but it's not very
handy, mainly because it's not clear why the test is there. The
skip_mode decoration is much more convenient.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-25 14:56:37 +03:00
Pavel Emelyanov
2c90aeb5ee test.py: Give local variable meaningful name
Rename t to testname as it's more informative

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-25 14:53:48 +03:00
Pavel Emelyanov
b2f5b63aaa test.py: Sanitize test list creation
To create the list of tests to run there's a loop that first collects all
tests from suites, then filters the list in two ways -- excludes
opt-out-ed lists (disabled and matching the skip pattern) or leaves
there only opt-in-ed (those, specified as positional arguments).

This patch keeps both list-checking code close to each other so that the
intent is explicitly clear.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-25 14:53:20 +03:00
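The two filters described in commit b2f5b63aaa can be kept side by side roughly as in the sketch below. This is not the code from `test.py`: the function name, its parameters, the substring matching and the choice to apply both filters in one pass are assumptions made for illustration.

```python
def select_tests(all_tests, disabled=(), skip_patterns=(), requested=()):
    """Return the names of the tests that should actually run."""
    selected = []
    for testname in all_tests:
        # Opt-out: drop disabled tests and those matching a skip pattern.
        if testname in disabled or any(p in testname for p in skip_patterns):
            continue
        # Opt-in: if names were given on the command line, keep only those.
        if requested and not any(r in testname for r in requested):
            continue
        selected.append(testname)
    return selected
```

Placing the opt-out and opt-in checks next to each other makes it visible at a glance which rule removed a given test.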
Kamil Braun
69bf962522 Merge 'allow changing snitch with topology over raft' from Gleb
Fixes scylladb/scylladb#17513

* 'gleb/raft-snitch-change-v3' of github.com:scylladb/scylla-dev:
  doc: amend snitch changing procedure to work with raft
  test: add test to check that snitch change takes effect.
  raft topology: update rack/dc info in topology state on reboot if changed
2024-03-25 10:41:39 +01:00
Gleb Natapov
3b272c5650 doc: amend snitch changing procedure to work with raft
To change snitch with raft all nodes need to be started simultaneously
since each node will try to update its state in the raft and for that
quorum is required.
2024-03-25 11:31:30 +02:00
Beni Peled
eecfd164ff Remove docs-amplify-enhanced github-workflow
Since we implemented the CI-Docs on pkg, there is no need for this
workflow

Closes scylladb/scylladb#17908
2024-03-25 11:30:06 +02:00
Kefu Chai
e97ae6b0de raft: server: print pointee of server_impl::_fsm
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, instead of printing the `unique_ptr` instance, we
print the pointee of it. since `server_impl` uses pimpl paradigm,
`_fsm` is always valid after `server_impl::start()`, we can always
dereference it without checking for null.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17953
2024-03-25 11:20:34 +02:00
Botond Dénes
ff421168d0 Update tools/jmx submodule
* tools/jmx 3257897a...53696b13 (1):
  > dist/debian: do not use substvar of ${shlib:Depends}
2024-03-25 11:16:25 +02:00
Gleb Natapov
d7adf26a56 test: add test to check that snitch change takes effect.
The test creates a two-node cluster with the default snitch (SimpleSnitch) and
checks that dc and rack names are as expected. Then it changes the
config to use GossipingPropertyFileSnitch with different names, restarts
the nodes and checks that the peers table now has the new names.
2024-03-25 10:41:49 +02:00
Kefu Chai
4eabf8b617 topology_coordinator: add fmt::formatter for wait_for_ip_timeout
before this change, we rely on the default-generated fmt::formatter
created from operator<<. but this depends on the
`FMT_DEPRECATED_OSTREAM` macro which is not respected in {fmt} v10.

this change addresses the formatting with fmtlib < 10, and without
`FMT_DEPRECATED_OSTREAM` defined. please note, in {fmt} v10 and up,
it defines formatter for classes derived from `std::exception`, so
our formatter is only added when compiled with {fmt} < 10.

in this change, `fmt::formatter<service::wait_for_ip_timeout>` is
added for backward compatibility with {fmt} < 10.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17955
2024-03-25 10:39:38 +02:00
Kefu Chai
5d59dd585f configure.py: always rebuild SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE
before this change, SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE is generated
at the first run of `configure.py`, once these files are around, they
are not updated despite that `SCYLLA_VERSION_GEN` does not generate
them as long as the release string retrieved from git sha1 is identical
to the one stored in `SCYLLA-RELEASE-FILE`, because we don't rerun
`SCYLLA_VERSION_GEN` at all.

but the pain is, when performing incremental build, like other built
artifacts, these generated files stay with the build directory, so
even if the sha1 of the workspace changes, the SCYLLA-RELEASE-FILE
keeps the same -- it still contains the original git sha1 when it
was created. this could lead to confusion if developer or even our
CI perform incremental build using the same workspace and build
directory, as the built scylla executables always report the same
version number.

in this change, we always rebuild the said
SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE files, and instruct ninja
to re-stat the output files, see
https://ninja-build.org/manual.html#ref_rule, in order to avoid
unnecessary rebuild. so the downside is that `SCYLLA_VERSION_GEN`
is executed every time we run `ninja` even if all targets are updated.
but the upside is that the release number reported by scylla is
accurate even if we perform incremental build.

also, since we encode the product, version and release stored
in the above files in the generated `build.ninja` file, in this change,
these three files are added as dependencies of `build.ninja`,
so that this file is regenerated if any of them is newer than
`build.ninja`.

Fixes #8255

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17974
2024-03-25 10:29:42 +02:00
Kefu Chai
5bc6d83f3b build: cmake: always rebuild SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE
before this change, SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE is generated
when CMake generates `build.ninja` for the first time, once these files
are around, they are not updated anymore. despite that
`SCYLLA_VERSION_GEN` does not generate them as long as the release
string retrieved from git sha1 is identical to the one stored in
`SCYLLA-RELEASE-FILE`, because we don't rerun `SCYLLA_VERSION_GEN` at
all.

but the pain is, when performing incremental build, like other built
artifacts, these generated files stay with the build directory, so
even if the sha1 of the workspace changes, the SCYLLA-RELEASE-FILE
keeps the same -- it still contains the original git sha1 when it
was created. this could lead to confusion if developer or even our
CI perform incremental build using the same workspace and build
directory, as the built scylla executables always report the same
version number.

in this change, we always rebuild the said
SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE files, and instruct CMake
to regenerate `build.ninja` if any of these files is updated.

Fixes #17975
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17983
2024-03-25 10:28:28 +02:00
Kefu Chai
0eb990fbf6 .github: skip "raison" when running codespell workflow
codespell workflow checks for common misspellings. it considers
"raison" in "raison d'etre" (the accent mark over "e" is removed,
so the commit message can be encoded in ASCII) to be a
misspelling of "reason" or "raisin". apparently, the
dictionary it uses does not include les mots francais les plus
utilises.

so, in this change, let's ignore "raison" for this very use case,
before we start the l10n support of the document.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17985
2024-03-25 09:51:12 +02:00
Kefu Chai
0713c324d4 cql3: provide fmt::formatter for cql3_type::raw only for {fmt} < 10
since we already have `format_as()` for `cql3_type::raw`, there is no
need to provide `fmt::formatter` for `cql3_type::raw` if the tree is
compiled with {fmt} >= 10, otherwise compiler is not able to figure out
which one to match, see the error at the end of this commit message. so, in this change, we only
provide the specialized `fmt::formatter` for `cql3_type::raw` when
{fmt} < 10. this should address the FTBFS with {fmt} >= 10.

```
/usr/lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/type_traits:1040:25: error: ambiguous partial specializations of 'formatter<cql3::cql3_type::raw>'
 1040 |       = __bool_constant<__is_constructible(_Tp, _Args...)>;
      |                         ^
/usr/lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/type_traits:1046:16: note: in instantiation of template type alias '__is_constructible_impl' requested here
 1046 |       : public __is_constructible_impl<_Tp, _Args...>
      |                ^
/usr/include/fmt/core.h:1420:13: note: in instantiation of template class 'std::is_constructible<fmt::formatter<cql3::cql3_type::raw>>' requested here
 1420 |            !has_formatter<T, Context>::value))>
      |             ^
/usr/include/fmt/core.h:1421:22: note: while substituting prior template arguments into non-type template parameter [with T = cql3::cql3_type::raw]
 1421 |   FMT_CONSTEXPR auto map(const T&) -> unformattable_pointer {
      |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1422 |     return {};
      |     ~~~~~~~~~~
 1423 |   }
      |   ~
```

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17986
2024-03-25 09:49:40 +02:00
Yaron Kaikov
cb2c69a3f7 github: mergify: Add Ref to original PR
When opening a backport PR, adding a reference to the original PR.
This will be used later for updating the original PR/issue once the
backport is done (with different label)

Closes scylladb/scylladb#17973
2024-03-25 08:12:47 +02:00
Raphael S. Carvalho
6bdb456fad sstables_loader: Fix loader when write selector is previous during tablet migration
The loader is writing to pending replica even when write selector is set
to previous. If migration is reverted, then the writes won't be rolled
back as it assumes pending replicas weren't written to yet. That can
cause data resurrection if tablet is later migrated back into the same
replica.

NOTE: write selector is handled correctly when set to next, because
get_natural_endpoints() will return the next replica set, and none
of the replicas will be considered leaving. And of course, selector
set to both is also handled correctly.

Fixes #17892.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17902
2024-03-24 01:20:50 +01:00
Kamil Braun
230f23004b Revert "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
This reverts commit b4144d14c6.

The test is flaky and blocks next promotions.
2024-03-22 17:25:04 +01:00
Petr Gusev
2a5f5d1948 test_fencing: fix flakiness
To cause the stale topology exception the test reads
the version from the last bootstrapped host and assigns its decremented
value to version and fence_version fields of system.topology.
The test assumes that version == fence_version here, if version
is greater than fence_version we won't get stale topology
exception in this setup. Tablet balancer can break
this -- it may increment the version after the last node is
bootstrapped.

Fix this by disabling the tablet balancer earlier.

fixes scylladb/scylladb#17807

Closes scylladb/scylladb#17940
2024-03-22 12:49:13 +01:00
Piotr Dulikowski
f23f8f81bf Merge 'Raft-based service levels' from Michał Jadwiszczak
This patch introduces raft-based service levels.

The difference to the current method of working is:
- service levels are stored in `system.service_levels_v2`
- reads are executed with `LOCAL_ONE`
- writes are done via raft group0 operation

Service levels are migrated to v2 in topology upgrade.
After the service levels are migrated, `key: service_level_v2_status; value: data_migrated` is written to `system.scylla_local` table. If this row is present, raft data accessor is created from the beginning and it handles recovery mode procedure (service levels will be read from v2 table even if consistent topology is disabled then)

Fixes #17926

Closes scylladb/scylladb#16585

* github.com:scylladb/scylladb:
  test: test service levels v2 works in recovery mode
  test: add test for service levels migration
  test: add test for service levels snapshot
  test:topology: extract `trigger_snapshot` to utils
  main: create raft dda if sl data was migrated
  service:qos: store information about sl data migration
  service:qos: service levels migration
  main: assign standard service level DDA before starting group0
  service:qos: fix `is_v2()` method
  service:qos: add a method to upgrade data accessor
  test: add unit_test_raft_service_levels_accessor
  service:storage_service: add support for service levels raft snapshot
  service:qos: add abort_source for group0 operations
  service:qos: raft service level distributed data accessor
  service:qos: use group0_guard in data accessor
  cql3:statements: run service level statements on shard0 with raft guard
  test: fix overrides in unit_test_service_levels_accessor
  service:qos: fix indentation
  service:qos: coroutinize some of the methods
  db:system_keyspace: add `SERVICE_LEVELS_V2` table
  service:qos: extract common service levels' table functions
2024-03-22 11:51:53 +01:00
Ferenc Szili
b50a9f9bab removed forward declaration of resharding_descriptor
resharding_descriptor has been removed in e40aa042 in 2020
2024-03-22 11:35:10 +01:00
Ferenc Szili
93395e2ebe compaction options and troubleshooting docs
Added unchecked_tombstone_compaction description to compaction docs.
Added section to troubleshooting pointless compaction.
2024-03-22 11:26:17 +01:00
Ferenc Szili
455959b80e cql-pytest/test_compaction_strategy_validation.py
Adds the check for the wording of the validation error on invalid
values of unchecked_tombstone_compaction
2024-03-22 11:22:56 +01:00
Ferenc Szili
5c0de3b097 test/boost/sstable_compaction_test.cc
Checks if the tombstone_threshold value will be ignored if
unchecked_tombstone_compaction is set to true
2024-03-22 11:21:21 +01:00
Kamil Braun
9979adb670 Merge 'topology_coordinator: do not clear unpublished CDC generation's data' from Patryk Jędrzejczak
In this PR, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.

We also add a test that passes only after the fix. However, this test
needs to block execution of the CDC generation publisher's loop
twice. Currently, error injections with handlers do not allow it
because handlers always share received messages. Apart from the
first created handler, all handlers would be instantly unblocked by
a message from the past that has already unblocked the first
handler. This seems like a general limitation that could cause
problems in the future, so in this PR, we extend injections with
handlers to solve it once and for all. We add the `share_messages`
parameter to the `inject` (with handler) function. Depending on its
value, handlers will share messages (as before) or not.

Fixes scylladb/scylladb#17497

Closes scylladb/scylladb#17934

* github.com:scylladb/scylladb:
  topology_coordinator: clean_obsolete_cdc_generations: fix log
  topology_coordinator: do not clear unpublished CDC generation's data
  topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages
  error_injection: allow injection handlers to not share messages
2024-03-22 11:20:26 +01:00
Ferenc Szili
5a65169f46 compaction: implement unchecked_tombstone_compaction
This change adds the missing Cassandra compaction option unchecked_tombstone_compaction.
Setting this option to true causes the compaction to ignore tombstone_threshold,
and decide whether to do a compaction only on the value of tombstone_compaction_interval
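
The decision described above can be sketched as a small predicate. This is a toy model, not the Scylla implementation: the function name, the parameter names and the default values (taken from the values commonly documented for Cassandra) are assumptions, as is the strict `>` comparison against the threshold.

```python
def worth_tombstone_compaction(sstable_age_s, droppable_ratio, *,
                               tombstone_compaction_interval_s=86400,
                               tombstone_threshold=0.2,
                               unchecked_tombstone_compaction=False):
    """Hypothetical predicate: should a single sstable be compacted for tombstones?"""
    # The interval is always honoured: a too-young sstable is never picked.
    if sstable_age_s < tombstone_compaction_interval_s:
        return False
    # With the option enabled, tombstone_threshold is ignored entirely.
    if unchecked_tombstone_compaction:
        return True
    # Otherwise the estimated droppable-tombstone ratio must exceed the threshold.
    return droppable_ratio > tombstone_threshold
```

With the option enabled, an old enough sstable qualifies even when its estimated droppable ratio is far below the threshold.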
2024-03-22 11:19:43 +01:00
Kamil Braun
4359a1b460 Merge 'raft timeouts: better handling of lost quorum' from Petr Gusev
In this PR we add timeouts support to raft groups registry. We introduce
the `raft_server_with_timeouts` class, which wraps the `raft::server`
and exposes its interface with additional `raft_timeout` parameter. If
it's set, the wrapper cancels the `abort_source` after certain amount of
time. The value of the timeout can be specified either in the
`raft_timeout` parameter, or the default value can be set in the
`raft_server_with_timeouts` class constructor.

The `raft_group_registry` interface is extended with
`group0_with_timeouts()` method. It returns an instance of
`raft_server_with_timeouts` for group0 raft server. The timeout value
for it is configured in `create_server_for_group0`. It's one minute by
default and can be overridden for tests with
`group0-raft-op-timeout-in-ms` parameter.

The new api allows the client to decide whether to use timeouts or not.
In this PR we are reviewing all the group0 call sites and add
`raft_timeout` if that makes sense. The general principle is that if the
code is handling a client request and the client expects a potential
error, we use timeouts. We don't use timeouts for background fibers
(such as topology coordinator), since they wouldn't add much value. The
only thing the background fiber can do with a timeout is to retry, and
this will have the same end effect as not having a timeout at all.

Fixes scylladb/scylladb#16604

Closes scylladb/scylladb#17590

* github.com:scylladb/scylladb:
  migration_manager: use raft_timeout{}
  storage_service::join_node_response_handler: use raft_timeout{}
  storage_service::start_upgrade_to_raft_topology: use raft_timeout{}
  storage_service::set_tablet_balancing_enabled: use raft_timeout{}
  storage_service::move_tablet: use raft_timeout{}
  raft_check_and_repair_cdc_streams: use raft_timeout{}
  raft_timeout: test that node operations fail properly
  raft_rebuild: use raft_timeout{}
  do_cluster_cleanup: use raft_timeout{}
  raft_initialize_discovery_leader: use raft_timeout{}
  update_topology_with_local_metadata: use with_timeout{}
  raft_decommission: use raft_timeout{}
  raft_removenode: use raft_timeout{}
  join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
  raft_group0: make_raft_config_nonvoter: add raft_timeout parameter
  raft_group0: make_raft_config_nonvoter: add abort_source parameter
  manager_client: server_add with start=false shouldn't call driver_connect
  scylla_cluster: add seeds parameter to the add_server and servers_add
  raft_server_with_timeouts: report the lost quorum
  join_node_request_handler: add raft_timeout{} for start_operation
  skip_mode: add platform_key
  auth: use raft_timeout{}
  raft_group0_client: add raft_timeout parameter
  raft_group_registry: add group0_with_timeouts
  utils: add composite_abort_source.hh
  error_injection: move api registration to set_server_init
  error_injection: add inject_parameter method
  error_injection: move injection_name string into injection_shared_data
  error_injection: pass injection parameters at startup
2024-03-22 10:45:33 +01:00
Botond Dénes
f02baef871 Merge 'test/lib: sstable::test_env consolidate and reduce header footprint' from Avi Kivity
Reduce the sprawl of sstables::test_env in .cc and .hh files, to ease
maintenance and reduce recompilations.

Closes scylladb/scylladb#17965

* github.com:scylladb/scylladb:
  test: sstables::test_env: complete pimplification
  test/lib: test_env: move test_env::reusable_sst() to test_services.cc
2024-03-22 11:26:12 +02:00
Botond Dénes
8b2856339a Merge 'github: sync-labels: use more descriptive name for workflow' from Kefu Chai
* rename `sync_labels.yaml` to `sync-labels.yaml`
* use more descriptive name for workflow

Closes scylladb/scylladb#17971

* github.com:scylladb/scylladb:
  github: sync-labels: use more descriptive name for workflow
  github: sync_labels: rename sync_labels to sync-labels
2024-03-22 10:01:56 +02:00
David Garcia
0375faa6aa docs: add experimental tag
Closes scylladb/scylladb#17633
2024-03-22 09:53:30 +02:00
Patryk Wrobel
28ed20d65e scylla-nodetool: adjust effective ownership handling
When a keyspace uses tablets, then effective ownership
can be obtained per table. If the user passes only a
keyspace, then /storage_service/ownership/{keyspace}
returns an error.

This change:
 - adds an additional positional parameter to 'status'
   command that allows a user to query status for table
   in a keyspace
 - makes usage of /storage_service/ownership/{keyspace}
   optional to avoid errors when user tries to obtain
   effective ownership of a keyspace that uses tablets
 - implements new frontend tests in 'test_status.py'
   that verify the new logic

Refs: scylladb#17405
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17827
2024-03-22 09:51:57 +02:00
Yaron Kaikov
407d25e47b [mergify] delete backport branch after merge
Since those branches clutter the branch search UI and we don't need them
after merging

Closes scylladb/scylladb#17961
2024-03-22 09:51:22 +02:00
Calle Wilund
7e09517433 Update seastar submodule
Submodule seastar 6b7b16a8a3..cd8a9133d2:
  > abort_source: add fmt::formatter for abort_requested_exception
  > memory: Ensure thread locals etc are minimally initialized even with non-seastar reactor options for alloc
  > rpc: add fmt::formatter for rpc::error classes and rpc::optional
  > Merge 'Adding Metrics family config' from Amnon Heiman
  > util: add fmt::formatter for bool_class<Tag>
  > util/bool_class: use the default-generated comparison operators
  > membarrier: cooperatively serialize calls to sys_membarrier
  > Merge 'build: relax the version constraint for Protobuf' from Kefu Chai
  > tls: add fmt::formatter for tls::subject_alt_name
  > memory.cc: Fix static init fiasco in system malloc override

diff --git a/seastar b/seastar
index 6b7b16a8a3..cd8a9133d2 160000
--- a/seastar
+++ b/seastar
@@ -1 +1 @@
-Subproject commit 6b7b16a8a329d831b94fdd4b41f6f55b260e9afd
+Subproject commit cd8a9133d2c02f63dbd578d882cf7333a427e194

Closes scylladb/scylladb#17865
2024-03-22 09:49:23 +02:00
Kefu Chai
7ebdfdb705 github: sync-labels: use more descriptive name for workflow
"label-sync" is not very helpful for developers to understand what
this workflow is for.

the "name" field of a job shows in the webpage on github of the
pull request against which the job is performed, so if the author
or reviewer checks the status of the pull request, he/she would
notice these names aside of the workflow's name. for this very
job, what we have now is:

```
Sync labels / label-sync
```

after this change it will be:
```
Sync labels / Synchronize labels between PR and the issue(s) fixed by it
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-22 10:41:20 +08:00
Kefu Chai
af879759b9 github: sync_labels: rename sync_labels to sync-labels
to be more consistent with other github workflows

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-22 10:31:31 +08:00
Michał Jadwiszczak
c0853b461c test: test service levels v2 works in recovery mode 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
c551a85cda test: add test for service levels migration 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
5811f696be test: add test for service levels snapshot 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
bf3aed1ecb test:topology: extract trigger_snapshot to utils
The function was defined separately in a few tests.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
a08918a320 main: create raft dda if sl data was migrated
Create `raft_service_levels_distributed_data_accessor` if service levels
were migrated to v2 table.
This supports raft recovery mode, as service levels will be read from v2
table in the mode.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
dab909b1d1 service:qos: store information about sl data migration
Save information whether service levels data was migrated to v2 table.
The information is stored in `system.scylla_local` table. It's
written with raft command and included in raft snapshot.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
2917ec5d51 service:qos: service levels migration
Migrate data from `system_distributed.service_levels` to
`system.service_levels_v2` during raft topology upgrade.

Migration process reads data from old table with CL ALL
and inserts the data to the new table via raft.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
36c9afda99 main: assign standard service level DDA before starting group0
`topology_state_load()` is responsible for upgrading service level DDA,
so the standard DDA has to be assigned beforehand so that it can be upgraded
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
159a6a2169 service:qos: fix is_v2() method 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
fd32f5162a service:qos: add a method to upgrade data accessor 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
d403bdfdd5 test: add unit_test_raft_service_levels_accessor
Raft service level data accessor with logic similar to
`unit_test_service_levels_accessor` to avoid sleeps in boost tests.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
8bbeea0169 service:storage_service: add support for service levels raft snapshot
Include mutations from `system.service_levels_v2` in `raft_snapshot`.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
d5fa0747d7 service:qos: add abort_source for group0 operations
Add mechanism to abort ongoing group0 operations while draining
service_level_controller or leaving the cluster.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
7e61bbb0d5 service:qos: raft service level distributed data accessor
`raft_service_level_distributed_data_accessor` works this way:
- on read path it reads service levels from `SYSTEM.SERVICE_LEVELS_V2`
  table with CL = LOCAL_ONE
- on write path it starts group0 operation and it makes the change
  using raft command
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
71c07addb5 service:qos: use group0_guard in data accessor
Adjust service_level_controller and
service_level_controller::service_level_distributed_data_accessor
interfaces to take `group0_guard` while adding/altering/dropping a
service level.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
da82c5f0b0 cql3:statements: run service level statements on shard0 with raft guard
To migrate service levels to be raft managed, obtain `group0_guard` to
be able to pass it to service_level_controller's methods.

Using this mechanism also automatically provides retries in case of
concurrent group0 operation.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
674286b868 test: fix overrides in unit_test_service_levels_accessor 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
c0e22fcb9c service:qos: fix indentation 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
1f3c6b2813 service:qos: coroutinize some of the methods
Functions:
- `service_level_controller::set_distributed_service_level()`
- `service_level_controller::drop_distributed_service_level()`
- `service_level_controller::drain()`

Coroutines increase readability of those functions.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
8e242f5acd db:system_keyspace: add SERVICE_LEVELS_V2 table
The table has the same schema as `system_distributed.service_levels`.
However it's created entirely at once (unlike old table which creates
base table first and then it adds other columns) because `system` tables
are local to the node.
2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
990c5e7dd0 service:qos: extract common service levels' table functions
Getting a service level(s) will be done the same way in raft-based
service levels as it's done in standard service levels, so those
functions are extracted to be reused.
2024-03-21 23:14:57 +01:00
Avi Kivity
b530dc1e3b test: sstables::test_env: complete pimplification
sstables::test_env uses the pimpl idiom, but incompletely. This
prevents reaping some of the benefits.

Complete the pimplification:
 - the `impl` nested struct is moved out-of-line
 - all non-template member functions are moved out-of-line
 - a destructor is declared and defined out-of-line
 - the move constructor is also defined (necessary after the destructor is
   defined)

After this, we can forward-declare more components.
2024-03-21 22:29:01 +02:00
Avi Kivity
d745929b44 test/lib: test_env: move test_env::reusable_sst() to test_services.cc
test_env implementation is scattered around two .cc, concentrate it
in test_services.cc, which happens to be the file that doesn't cause
link errors.

Move toc_filename with it, as it is its only caller and it is static.
2024-03-21 22:21:02 +02:00
Kefu Chai
900b56b117 raft_group0: print runtime_error by printing e.what()
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter. but fortunately, fmt v10 brings the builtin
formatter for classes derived from `std::exception`. but before
switching to {fmt} v10, and after dropping `FMT_DEPRECATED_OSTREAM`
macro, we need to print out `std::runtime_error`. so far, we don't
have a shared place for formatter for `std::runtime_error`. so we
are addressing the needs on a case-by-case basis.

in this change, we just print it using `e.what()`. its behavior
is identical to what we have now.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17954
2024-03-21 19:43:52 +02:00
Avi Kivity
f0ca5e5a08 Merge 'treewide: add fmt::formatter for exception types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter` is added for following types for backward compatibility with {fmt} < 10:

* `utils::bad_exception_container_access`
* `cdc::no_generation_data_exception`
* classes derived from `sstables::malformed_sstable_exception`
* classes derived from `cassandra_exception`

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#17944

* github.com:scylladb/scylladb:
  cdc: add fmt::formatter for exception types in data_dictionary.hh
  utils: add fmt::formatter for utils::bad_exception_container_access
  sstables: add fmt::formatter for classes derived from sstables::malformed_sstable_exception
  exceptions: add fmt::formatter for classes derived from cassandra_exception
  cdc: add fmt::formatter for cdc::no_generation_data_exception
2024-03-21 18:44:37 +02:00
Botond Dénes
f9104fbfa9 tools/toolchain/image: update python driver (implicit)
Fixes: #17662

Closes scylladb/scylladb#17956
2024-03-21 18:27:40 +02:00
Andrei Chekun
7de28729e7 test: change maintenance socket location to /tmp
Fixes #16912

By default, ScyllaDB stores the maintenance socket in the workdir. Test.py by default uses the location for the ScyllaDB workdir as testlog/{mode}/scylla-#. The usual location for cloning the repo is the user's home folder. In some cases, it can lead to the socket path being too long and the test will start to fail. The simple way is to move the maintenance socket to /tmp folder to eliminate such a possibility.
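
The length problem can be illustrated with a quick check. On Linux, `sockaddr_un.sun_path` is 108 bytes including the terminating NUL, so a Unix socket path must be shorter than that. The directory layout and the socket file name below are made up for illustration and are not the real test.py paths.

```python
import os

SUN_PATH_MAX = 108  # size of sockaddr_un.sun_path on Linux, trailing NUL included


def fits_in_sun_path(path):
    """Return True if `path` is short enough to be bound as a Unix socket."""
    return len(os.fsencode(path)) < SUN_PATH_MAX


# Hypothetical paths: a deep checkout under a home directory versus /tmp.
deep = os.path.join("/home/some-long-user-name/projects/work/scylladb-checkout",
                    "testlog/debug/scylla-12", "x" * 40, "maintenance.sock")
short = "/tmp/scylla-12-maintenance.sock"
```

Here `fits_in_sun_path(deep)` is false while `fits_in_sun_path(short)` is true, which is the motivation for moving the socket.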

Closes scylladb/scylladb#17941
2024-03-21 18:22:21 +02:00
Patryk Jędrzejczak
33a0864aaa topology_coordinator: clean_obsolete_cdc_generations: fix log
We use a non-inclusive bound here, so the log was incorrect.
2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak
27465a00e0 topology_coordinator: do not clear unpublished CDC generation's data
In this commit, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.

We also add a test that passes only after the fix.
2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak
f45aebeee2 topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages
In the following commit, we add a test that needs to block the CDC
generation publisher's loop twice. We allow it in this commit by
making handlers of the `cdc_generation_publisher_fiber` injection
share messages. From now on, unblocking every step of the loop will
require sending a new message from the test.

This change breaks the test already using the
`cdc_generation_publisher_fiber` injection, so we adjust the test.
2024-03-21 14:35:38 +01:00
Patryk Jędrzejczak
c5c4cc7d00 error_injection: allow injection handlers to not share messages
For a single injection, all created injection handlers share all
received messages. In particular, it means that one received message
unblocks all handlers waiting for the first message. This behavior
is often desired, for example, if multiple fibers execute the
injected code and we want to unblock them all with a single message.
However, there is a problem if we want to block every execution
of the injected code. Apart from the first created handler, all
handlers will be instantly unblocked by messages from the past that
have already unblocked the first handler.

In one of the following commits, we add a test that needs to block
the CDC generation publisher's loop twice. Since it looks like there
are no good workarounds for this arguably general problem, we extend
injections with handlers in a way that solves it. We introduce the
new `share_messages` parameter. Depending on its value, handlers
will share messages or not. The details are described in the new
comments in `error_injection.hh`.

We also add some basic unit tests for the new functionality.
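
The difference between the two modes can be modelled with plain counters. This is a toy sketch of the behaviour described above, not the code in `error_injection.hh`; the class and method names are hypothetical, and a blocking wait is replaced by a non-blocking "would this wait finish now?" check.

```python
class Injection:
    """Toy stand-in for one named injection point (hypothetical)."""

    def __init__(self):
        self.received = 0   # messages sent to this injection so far
        self.consumed = 0   # messages already taken by non-sharing handlers

    def send_message(self):
        self.received += 1

    def make_handler(self, share_messages=True):
        return Handler(self, share_messages)


class Handler:
    def __init__(self, injection, share_messages):
        self.injection = injection
        self.share_messages = share_messages
        self.read = 0       # private read position, used only when sharing

    def try_receive(self):
        """Consume one message if available; True means a wait would unblock."""
        inj = self.injection
        if self.share_messages:
            # Every handler sees every message, including ones from the past.
            if self.read < inj.received:
                self.read += 1
                return True
            return False
        # Not sharing: each message unblocks exactly one handler.
        if inj.consumed < inj.received:
            inj.consumed += 1
            return True
        return False
```

In the sharing mode a handler created after a message was sent is unblocked by that old message; in the non-sharing mode the second handler stays blocked until a second message arrives.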
2024-03-21 14:35:38 +01:00
Petr Gusev
ae0ec19537 migration_manager: use raft_timeout{}
Checking all the call sites of the migration manager shows
that all of them are initiated by user requests,
not background activities. Therefore, we add a global
raft_timeout{} here.
2024-03-21 16:35:48 +04:00
Petr Gusev
294e1ff464 storage_service::join_node_response_handler: use raft_timeout{}
This function is called as part of a node join procedure
initiated by the user, so having timeouts here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
e53189dcdc storage_service::start_upgrade_to_raft_topology: use raft_timeout{}
This function is called from the REST api,
so having a timeout here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
6e350fb580 storage_service::set_tablet_balancing_enabled: use raft_timeout{}
This function is called from the REST api,
so having a timeout here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
22d7c62c3c storage_service::move_tablet: use raft_timeout{}
This function is called from the REST api,
so having a timeout here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
dafd5d0160 raft_check_and_repair_cdc_streams: use raft_timeout{}
This function is called from the REST api,
so having a timeout here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
ca21362ade raft_timeout: test that node operations fail properly 2024-03-21 16:35:48 +04:00
Petr Gusev
dcc275cb0f raft_rebuild: use raft_timeout{}
This is a user-requested operation, so having
a timeout here makes sense.

The test will be provided in a subsequent commit.
2024-03-21 16:35:48 +04:00
Petr Gusev
8deb06647a do_cluster_cleanup: use raft_timeout{}
This function is called from the REST api,
so having a timeout here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
d5d2f04cd6 raft_initialize_discovery_leader: use raft_timeout{}
This function is called as part of a node startup
procedure, so a timeout may be useful.

As outlined in the comment, there is no valid way
we can lose quorum here, but some subsystems may
just become unreasonably slow for various reasons,
so we nonetheless use raft_timeout{} here.
2024-03-21 16:35:48 +04:00
Petr Gusev
f498cfae79 update_topology_with_local_metadata: use with_timeout{}
This function is called as part of a node startup
procedure, so having a timeout here makes sense.
2024-03-21 16:35:48 +04:00
Petr Gusev
f1f77b4882 raft_decommission: use raft_timeout{}
This is a user requested operation, so having
a timeout here makes sense.

The test will be provided in a subsequent commit.
2024-03-21 16:35:48 +04:00
Petr Gusev
aabcc0852a raft_removenode: use raft_timeout{}
This is a user requested operation, so having
a timeout here makes sense.

The test will be provided in a subsequent commit.
2024-03-21 16:35:48 +04:00
Petr Gusev
099c756ba1 join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
We also add a specific test_quorum_lost_during_node_join. It
exercises the case when the quorum is lost after start_operation
but before these methods are called.
2024-03-21 16:35:48 +04:00
Petr Gusev
0ad852e323 raft_group0: make_raft_config_nonvoter: add raft_timeout parameter
We'll use this parameter in subsequent commits.
2024-03-21 16:35:48 +04:00
Petr Gusev
ce7fb39750 raft_group0: make_raft_config_nonvoter: add abort_source parameter 2024-03-21 16:35:48 +04:00
Petr Gusev
99ddffac32 manager_client: server_add with start=false shouldn't call driver_connect
If the server is not started there is no point
in starting the driver, it would fail because there
are no nodes to connect to. On the other hand, we
should connect the driver in server_start()
if it's not connected yet.
2024-03-21 16:35:48 +04:00
Petr Gusev
3f6cf38dd5 scylla_cluster: add seeds parameter to the add_server and servers_add
If this parameter is set, we use its value for
the scylla.yaml of the new node, otherwise we
use IPs of all running nodes as before.

We'll need this parameter in subsequent commits to
restrict the communication between nodes.

We remove default values for _create_server_add_data parameters
since they are redundant - in the two call sites we pass all
of them.
2024-03-21 16:35:48 +04:00
Petr Gusev
99419d5964 raft_server_with_timeouts: report the lost quorum
In this commit we extend the timeout error message with
additional context - if we see that there is no quorum of
available nodes, we report this as the most likely
cause of the error.

We adjust the test by adding this new information to the
expected_error. We need raft-group-registry-fd-threshold-in-ms
to make _direct_fd threshold less than
group0-raft-op-timeout-in-ms.
2024-03-21 16:35:48 +04:00
Petr Gusev
1a3fc58438 join_node_request_handler: add raft_timeout{} for start_operation
In the test, we use the group0-raft-op-timeout-in-ms parameter to
reduce the timeout to one second so as not to waste time.

The join_node_request_handler method contains other group0 calls
which should have timeouts (make_nonvoters and add_entry). They
will be handled in a separate commit.
2024-03-21 16:35:48 +04:00
Petr Gusev
854531ae8e skip_mode: add platform_key
In subsequent commits we are going to add test.py
tests for raft_timeout{} feature. The problem is that
aarch64/debug configuration is infamously slow. Timeout
settings used in tests work for all platforms but aarch64/debug.

In this commit we extend the skip_mode attribute with the
platform_key property. We'll use @skip_mode('debug', platform_key='aarch64')
to skip the tests for this specific configuration.
The tests will still be run for aarch64/release.
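
The skip rule described above can be sketched as a predicate. The function name, its arguments and the substring match against the platform string are assumptions made for illustration; the real marker handling in test.py may differ.

```python
import platform


def should_skip(current_mode, skip_mode, platform_key=None, current_platform=None):
    """Hypothetical check: skip only if the build mode and the platform both match."""
    if current_platform is None:
        current_platform = platform.platform()
    if current_mode != skip_mode:
        return False
    # Without a platform_key the mode alone decides, as before.
    return platform_key is None or platform_key in current_platform
```

Under this model a `('debug', platform_key='aarch64')` rule skips aarch64/debug only, leaving aarch64/release and x86_64/debug untouched.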
2024-03-21 16:35:43 +04:00
Yaron Kaikov
5bd6b4f4c2 github: sync_labels: match issue number with better pattern
Seen in https://github.com/scylladb/scylladb/actions/runs/8357352616/job/22876314535

```
python .github/scripts/sync_labels.py --repo scylladb/scylladb --number 17309 --action labeled --label backport/none
  shell: /usr/bin/bash -e {0}
  env:
    GITHUB_TOKEN: ***

Found issue number: ('', '', '15465')
Traceback (most recent call last):
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 93, in <module>
    main()
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 89, in main
    sync_labels(repo, args.number, args.label, args.action, args.is_issue)
  File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 71, in sync_labels
    target = repo.get_issue(int(pr_or_issue_number))
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'tuple'
Error: Process completed with exit code 1.
```

Fixing the pattern to catch all GitHub supported close keywords as
described in https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword
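
The tuple in the log above is consistent with how `re.findall` behaves: when a pattern has more than one capturing group, each match comes back as a tuple of groups, and passing that tuple to `int()` raises `TypeError`. A minimal sketch follows; the PR body and both patterns are illustrative and are not the ones in `sync_labels.py`, and the URL form of an issue reference is not handled.

```python
import re

# Hypothetical PR description, not taken from a real pull request.
body = "Fixes: #15465\nCloses scylladb/scylladb#17920"

# Three capturing groups: findall() returns one 3-tuple per match.
buggy = re.findall(r"(fix(es|ed)?|close[sd]?)\s*:?\s*\S*#(\d+)", body, re.IGNORECASE)

# Non-capturing groups (?:...) leave a single capturing group,
# so findall() returns plain strings that int() accepts.
fixed = re.findall(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?):?\s+\S*#(\d+)",
    body,
    re.IGNORECASE,
)
issue_numbers = [int(n) for n in fixed]
```

Keeping exactly one capturing group is one way to make the result usable directly; iterating with `re.finditer` and reading a named group would work as well.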

Fixed: https://github.com/scylladb/scylladb/issues/17917
Fixed: https://github.com/scylladb/scylladb/issues/17921

Closes scylladb/scylladb#17920
2024-03-21 14:25:24 +02:00
Petr Gusev
e335b17190 auth: use raft_timeout{}
The only place where we don't need raft_timeout{}
is migrate_to_auth_v2 since it's called from
topology_coordinator fiber. All other places are
called from user context, so raft_timeout{} is used.
2024-03-21 16:12:51 +04:00
Petr Gusev
cebf87bf59 raft_group0_client: add raft_timeout parameter
In this commit we add raft_timeout parameter to
start_operation and add_entry method.

We fix compilation in default_authorizer.cc,
bind_front doesn't account for default parameter
values. We should use raft_timeout{} here, but this
is for another commit.
2024-03-21 16:12:51 +04:00
Petr Gusev
3d1b94475f raft_group_registry: add group0_with_timeouts
In this commit we add timeouts support to raft groups
registry. We introduce the raft_server_with_timeouts
class, which wraps the raft::server and exposes its
interface with additional raft_timeout parameter.
If it's set, the wrapper cancels the abort_source
after certain amount of time. The value of the timeout
can be specified in the raft_timeout parameter,
or the default value can be set in the raft_server_with_timeouts
class constructor.

The raft_group_registry interface is extended with
get_server_with_timeouts(group_id) and group0_with_timeouts()
methods. They return an instance of raft_server_with_timeouts for
a specified group id or for group0. The timeout value for it is configured in
create_server_for_group0. It's one minute by default, can be overridden
for tests with group0-raft-op-timeout-in-ms parameter.

The new api allows the client to decide whether to use timeouts or not.
In subsequent commits we are going to review all group0 call sites
and add raft_timeout if that makes sense. The general principle is that
if the code is handling a client request and the client expects
a potential error, we use timeouts. We don't use timeouts for
background fibers (such as topology coordinator), since they won't
add much value. The only thing the background fiber can do
with a timeout is to retry, and this will have the same effect
as not having a timeout at all.
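
The wrapper idea can be sketched with asyncio. This is only an analogy under stated assumptions: the class and parameter names are hypothetical, `asyncio.wait_for` stands in for cancelling an `abort_source`, and `StuckServer` simulates a group that has lost its quorum.

```python
import asyncio


class StuckServer:
    """Simulates a raft group without quorum: add_entry never completes."""

    async def add_entry(self, entry):
        await asyncio.Event().wait()


class ServerWithTimeouts:
    """Hypothetical wrapper that lets the caller opt in to a timeout."""

    def __init__(self, server, default_timeout=60.0):
        self._server = server
        self._default_timeout = default_timeout

    async def add_entry(self, entry, use_timeout=False, timeout=None):
        if not use_timeout:
            # Background-fiber style: wait for as long as it takes.
            return await self._server.add_entry(entry)
        # Client-request style: explicit value, else the constructor default.
        limit = self._default_timeout if timeout is None else timeout
        return await asyncio.wait_for(self._server.add_entry(entry), limit)


async def demo():
    wrapped = ServerWithTimeouts(StuckServer(), default_timeout=0.01)
    try:
        await wrapped.add_entry("entry", use_timeout=True)
    except asyncio.TimeoutError:
        return "timed out"
    return "applied"
```

Running `asyncio.run(demo())` takes the timeout branch, so the caller gets an error it can report instead of hanging.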
2024-03-21 16:12:51 +04:00
Petr Gusev
532a720c3d utils: add composite_abort_source.hh 2024-03-21 16:12:51 +04:00
Kefu Chai
8dacec589d cql3: add fmt::formatter for cql3_type and cql3_type::raw
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<>` is added for following classes:

* `cql3::cql3_type`
* `cql3::cql3_type::raw`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17945
2024-03-21 14:08:50 +02:00
Nadav Har'El
fdeb14b468 Merge 'scylla-nodetool: make command-line parsing fully compatible with the legacy nodetool' from Botond Dénes
There were two more things missing:
* Allow global options to be positioned before the operation/command option (https://github.com/scylladb/scylladb/issues/16695)
* Ignore JVM args (https://github.com/scylladb/scylladb/issues/16696)

This PR fixes both. With this, hopefully we are fully compatible with nodetool as far as command line parsing is concerned.
After this PR goes in, we will need another fix to tools/java/bin/nodetool-wrapper, to allow user to benefit from this fix. Namely, after this PR, we can just try to invoke scylla-nodetool first with all the command-line args as-is. If it returns with exit-code 100, we fall back to nodetool. We will not need the current trick with `--help $1`. In fact, this trick doesn't work currently, because `$1` is not guaranteed to be the command in the first place.

In addition to the above, this PR also introduces a new option, to help us in the switching process. This is `--rest-api-port`, which can also be provided as `-Dcom.scylladb.apiPort`. When provided, this option takes precedence over `--port|-p`. This is intended as a bridge for `scylla-ccm`, which currently provides the JMX port as `--port`. With this change, it can also provide the REST API port as `-Dcom.scylladb.apiPort`. The legacy nodetool will ignore this, while the native nodetool will use it to connect to the correct REST API address. After the switch we can ditch these options.

Fixes: https://github.com/scylladb/scylladb/issues/16695
Fixes: https://github.com/scylladb/scylladb/issues/16696
Refs: https://github.com/scylladb/scylladb/issues/16679
Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#17168

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: add --rest-api-port option
  tools/scylla-nodetool: ignore JVM args
  tools/utils: make finding the operation command line option more flexible
  tools/utils: get_selected_operation(): remove alias param
  tools: add constant with current help command-line arguments
2024-03-21 14:06:45 +02:00
Pavel Emelyanov
c8fc43d169 test: Update topology_custom/suite::run_first list
The recently added test_tablets_migration dominates with its run-time (10
minutes). Also update other tests, e.g. test_read_repair is not in top-7
for any mode, test_replace and test_raft_recovery_majority_loss are both
not notably slower than most of other tests (~40 sec both). On the other
hand, the test_raft_recovery_basic and test_group0_schema_versioning are
both 1+ minute

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17927
2024-03-21 12:48:50 +01:00
Gleb Natapov
e26b0f34a0 raft topology: update rack/dc info in topology state on reboot if changed
It is allowed to change a snitch after the cluster is already running.
Changing a snitch may cause dc and/or rack names to be changed and
gossiper handles it by gossiping new names on restart. The patch changes
raft mode to update the names on restart as well.
2024-03-21 12:44:12 +02:00
Andrei Chekun
a5455460d8 test: fix flakiness of the multi_dc tests
The initial version used a redundant method, and it did not cover all
cases. So that leads to the flakiness of the test that used this method.
Switching to the cluster_con() method removes flakiness since it's
written more robustly.

Fixes scylladb/scylladb#17914

Closes scylladb/scylladb#17932
2024-03-21 11:17:22 +01:00
Asias He
9587352f13 repair: Invoke group0 read barrier in repair_tablets
This allows the repair master to see all previous metadata changes.

Refs #17658

Closes scylladb/scylladb#17942
2024-03-21 10:54:40 +01:00
Kamil Braun
4dfb7e3051 Merge 'storage_service::merge_topology_snapshot: handle big mutations' from Petr Gusev
The group0 state machine calls `merge_topology_snapshot` from
`transfer_snapshot`. It feeds it with `raft_topology_snapshot` returned
from `raft_pull_topology_snapshot`. This snapshot includes the entire
`system.cdc_generations_v3` table. It can be huge and break the
commitlog `max_record_size` limit.

The `system.cdc_generations_v3` is a single-partition table, so all the
data is contained in one mutation object. To fit the commitlog limit we
split this mutation into many smaller ones and apply them in separate
`database::apply` calls. That means we give up the atomicity guarantee,
but we actually don't need it for `system.cdc_generations_v3` and
`system.topology_requests`.

This PR fixes the dtest
`update_cluster_layout_tests.py::TestLargeScaleCluster::test_add_many_nodes_under_load`

Fixes scylladb/scylladb#17545

Closes scylladb/scylladb#17632

* github.com:scylladb/scylladb:
  test_cdc_generation_data: test snapshot transfer
  storage_service::merge_topology_snapshot: handle big cdc_generations_v3 mutations
  mutation: add split_mutation function
  storage_service::merge_topology_snapshot: fix indentation
2024-03-21 10:50:03 +01:00
Avi Kivity
628017c810 test: sstables::test_env: mock sstables_registry
sstables::test_env is intended for sstable unit tests, but to satisfy its
dependency of an sstables_registry we instantiate an entire database.

Remove the dependency by having a mock implementation of sstables_registry
and using that instead.

Closes scylladb/scylladb#17895
2024-03-21 10:19:46 +01:00
Tomasz Grabiec
baf12b0b2f test: tablets: Avoid infinite loop in rebalance_tablets()
If there is a bug in the tablet scheduler which makes it never
converge for a given state of topology, rebalance_tablets() will never
complete and will generate a huge amount of logs. This patch adds a
sanity limit so that we fail earlier.
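
The sanity limit described above can be sketched as follows. This is a minimal Python illustration, not the actual C++ test helper: `rebalance_until_converged`, `make_plan`, `apply_plan` and the default of 1000 iterations are hypothetical names and values chosen for the example.

```python
from typing import Callable, List


def rebalance_until_converged(
    make_plan: Callable[[], List[str]],
    apply_plan: Callable[[List[str]], None],
    max_iterations: int = 1000,  # hypothetical limit, not the value used in the test
) -> int:
    """Apply migration plans until the scheduler returns an empty plan.

    Returns the number of plans applied. Raises RuntimeError instead of
    looping forever if the scheduler never converges.
    """
    for iteration in range(max_iterations):
        plan = make_plan()
        if not plan:
            # An empty plan means the scheduler considers the state balanced.
            return iteration
        apply_plan(plan)
    raise RuntimeError(f"rebalance did not converge after {max_iterations} iterations")
```

Failing with an exception keeps the log output bounded and points directly at the non-converging scheduler state.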

This was observed in one of the test_load_balancing_with_random_load runs in CI.

Fixes scylladb/scylladb#17894.

Closes scylladb/scylladb#17916
2024-03-21 10:19:46 +01:00
Kamil Braun
bc42a5a092 Merge 'make sure that address map entry is not dropped between join request placement and the request handling' from Gleb
The series marks nodes to be non expiring in the address map earlier, when
they are placed in the topology.

Fixes: scylladb/scylladb#16849

* 'gleb/16849-fix-v2' of github.com:scylladb/scylla-dev:
  test: add test to check that address cannot expire between join request placemen and its processing
  topology_coordinator: set address map entry to nonexpiring when a node is added to the topology
  raft_group0: add modifiable_address_map() function
2024-03-21 10:19:46 +01:00
Kamil Braun
676af581d8 Merge 'cdc: should_propose_first_generation: get my_host_id from caller' from Benny Halevy
There is no need to map this node's inet_address to host_id. The
storage_service can easily just pass the local host_id. While at it, get
the other node's host_id directly from their endpoint_state instead of
looking it up yet again in the gossiper, using the nodes' address.

Refs #12283

Closes scylladb/scylladb#17919

* github.com:scylladb/scylladb:
  cdc: should_propose_first_generation: get my_host_id from caller
  storage_service: add my_host_id
2024-03-21 10:19:46 +01:00
Avi Kivity
43bcaeb87f Merge 'test: randomized_nemesis_test: add fmt::formatter for some types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* raft_call
* raft_read
* network_majority_grudge
* reconfiguration
* stop_crash
* operation::thread_id
* append_seq
* AppendReg::append
* AppendReg::ret
* operation::either_of<Ops...>
* operation::exceptional_result<Op>
* operation::completion<Op>
* operation::invocable<Op>

and drop their operator<<:s.

in which,

* `operator<<` for append_entry is never used. so it is removed.
* `operator<<` for `std::monostate` and `std::variant` are dropped. as we are now using their counterparts in {fmt}.
* stop_crash::result_type 's `fmt::formatter` is not added, as we cannot define a partial specialization of `fmt::formatter` for a nested class for a template class. we will tackle this struct in another change.

Refs #13245

Closes scylladb/scylladb#17884

* github.com:scylladb/scylladb:
  test: raft: generator: add fmt::formatter:s
  test: randomized_nemesis_test: add fmt::formatter for some types
  test: randomized_nemesis_test: add fmt::formatter for seastar::timed_out_error
  raft: add fmt::formatter for error classes
2024-03-21 10:19:46 +01:00
Kefu Chai
6d77283941 cdc: add fmt::formatter for exception types in data_dictionary.hh
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<>` is added for following classes for
backward compatibility with {fmt} < 10:

* `data_dictionary::no_such_keyspace`
* `data_dictionary::no_such_column_family`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-21 13:26:01 +08:00
Kefu Chai
a58be49abf utils: add fmt::formatter for utils::bad_exception_container_access
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<utils::bad_exception_container_access>` is
added for backward compatibility with {fmt} < 10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-21 12:48:19 +08:00
Kefu Chai
0d6bff0f56 sstables: add fmt::formatter for classes derived from sstables::malformed_sstable_exception
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<T>` is added for classes derived from
`malformed_sstable_exception`, where `T` is the class type derived from
`malformed_sstable_exception`.

this change is implemented to be backward compatible with {fmt} < 10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-21 12:48:19 +08:00
Kefu Chai
0609cd676f exceptions: add fmt::formatter for classes derived from cassandra_exception
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<T>` is added for classes derived from
`cassandra_exception`, where `T` is the class type derived from
`cassandra_exception`.

this change is implemented to be backward compatible with {fmt} < 10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-21 12:48:19 +08:00
Kefu Chai
f5e1f0ccc7 cdc: add fmt::formatter for cdc::no_generation_data_exception
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, `fmt::formatter<cdc::no_generation_data_exception>` is
added for backward compatibility with {fmt} < 10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-21 12:48:19 +08:00
Petr Gusev
740b240e9d test_cdc_generation_data: test snapshot transfer
The test only looked at the initial cdc_generation
generation. It made the changes bigger to go
past the raft max_command_size limit.
It then made sure this large mutation set is saved
in several raft commands.

In this commit we enhance the test to check that the
mutations are properly handled during snapshot transfer.
The problem is that the entire system.cdc_generations_v3
table is read into the topology_snapshot and its total
size can exceed the commitlog max_record_size limit.

We need a separate injection since the compaction
could nullify the effects of the previous injection.

The test fails without the fix from the previous commit.
2024-03-20 22:40:03 +04:00
Petr Gusev
276d58114d storage_service::merge_topology_snapshot: handle big cdc_generations_v3 mutations
The group0 state machine calls merge_topology_snapshot
from transfer_snapshot. It feeds it with raft_topology_snapshot
returned from raft_pull_topology_snapshot. This snapshot
includes the entire system.cdc_generations_v3 table.
It can be huge and break the commitlog max_record_size limit.

The system.cdc_generations_v3 is a single-partition table,
so all the data is contained in one mutation object. To
fit the commitlog limit we split this mutation into several
smaller ones and apply them in separate database::apply calls.
That means we give up the atomicity guarantee, but we
actually don't need it for system.cdc_generations_v3.
The cdc_generations_v3 data is not used in any way until
it's referenced from the topology table. By applying the
cdc_generations_v3 mutations before topology mutations
we ensure that the lack of atomicity isn't a problem here.

The database::apply method takes frozen_mutation parameter by
const reference, so we need to keep them alive until
all the futures are complete.

fixes #17545
2024-03-20 22:40:03 +04:00
Petr Gusev
db1afa0aba mutation: add split_mutation function
The function splits the source mutation into multiple
mutations so that their size does not exceed the
max_size limit. The size of a mutation is calculated
as the sum of the memory_usage() of its constituent
mutation_fragments.

The implementation is taken from view_updating_consumer.
We use mutation_rebuilder_v2 to reconstruct mutations from
a stream of mutation fragments and recreate the output
mutation whenever we reach the limit.

We'll need this function in the next commit.
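
The size-bounded splitting described above can be sketched as follows. This is a simplified Python illustration, not the C++ implementation: `Fragment` and `split_by_size` are hypothetical names, a fragment is modelled as a `(payload, size)` pair, and the sketch closes a chunk before the limit would be crossed, which the commit message does not specify.

```python
from typing import Iterable, List, Tuple

# Stand-in for a mutation fragment: (payload, size in bytes).
Fragment = Tuple[str, int]


def split_by_size(fragments: Iterable[Fragment], max_size: int) -> List[List[Fragment]]:
    """Group fragments, in order, into chunks whose total size stays within max_size.

    A single fragment larger than max_size still gets a chunk of its own.
    """
    chunks: List[List[Fragment]] = []
    current: List[Fragment] = []
    current_size = 0
    for fragment in fragments:
        size = fragment[1]
        # Close the current chunk if adding this fragment would cross the limit.
        if current and current_size + size > max_size:
            chunks.append(current)
            current, current_size = [], 0
        current.append(fragment)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Concatenating the chunks in order yields the original fragment stream, so no data is lost by splitting.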
2024-03-20 22:39:51 +04:00
Petr Gusev
d07e0efdd8 storage_service::merge_topology_snapshot: fix indentation
It was three spaces, should be four.
2024-03-20 22:30:48 +04:00
Kefu Chai
61424b615c test: raft: generator: add fmt::formatter:s
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* operation::either_of<Ops...>
* operation::exceptional_result<Op>
* operation::completion<Op>
* operation::invocable<Op>

and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-20 21:01:29 +08:00
Kefu Chai
72899f573e test: randomized_nemesis_test: add fmt::formatter for some types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* raft_call
* raft_read
* network_majority_grudge
* reconfiguration
* stop_crash
* operation::thread_id
* append_seq
* append_entry
* AppendReg::append
* AppendReg::ret

and drop their operator<<:s.

in which,

* `operator<<` for `std::monostate` and `std::variant` are dropped.
  as we are now using their counterparts in {fmt}.
* stop_crash::result_type 's `fmt::formatter` is not added, as we
  cannot define a partial specialization of `fmt::formatter` for
  a nested class for a template class. we will tackle this struct
  in another change.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-20 21:01:29 +08:00
Kefu Chai
97b203b1af test: randomized_nemesis_test: add fmt::formatter for seastar::timed_out_error
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatter for `seastar::timed_out_error`,
which will be used by the `fmt::formatter` for `std::variant<...>`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-20 21:01:29 +08:00
Kefu Chai
50637964ed raft: add fmt::formatter for error classes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatter for classes derived from
`raft::error`. since {fmt} v10 defines the formatter for all classes
derived from `std::exception`, the definition is provided only when
the tree is compiled with {fmt} < 10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-20 21:01:29 +08:00
Pavel Emelyanov
21a5911e60 Merge 'db/virtual_tables: make token_ring_table tablet aware' from Botond Dénes
The token ring table is a virtual table (`system.token_ring`), which contains the ring information for all keyspaces in the system. This is essentially an alternative to `nodetool describering`, but since it is a virtual table, it allows for all the usual filtering/aggregation/etc. that CQL supports.
Up until now, this table only supported keyspaces which use vnodes. This PR adds support for tablet keyspaces. To accommodate these keyspaces a new `table_name` column is added, which is set to `ALL` for vnodes keyspaces. For tablet keyspaces, this contains the name of the table.
Simple sanity tests are added for this virtual table (it had none).

Fixes: #16850

Closes scylladb/scylladb#17351

* github.com:scylladb/scylladb:
  test/cql-pytest: test_virtual_tables: add test for token_ring table
  db/virtual_tables: token_ring_table: add tablet support
  db/virtual_tables: token_ring_table: add table_name column
  db/virtual_tables: token_ring_table: extract ring emit
  service/storage_service: describe_ring_for_table(): use topology to map hostid to ip
2024-03-20 14:05:49 +03:00
Benny Halevy
fceb1183d3 cdc: should_propose_first_generation: get my_host_id from caller
There is no need to map this node's inet_address to host_id.
The storage_service can easily just pass the local host_id.
While at it, get the other node's host_id directly
from their endpoint_state instead of looking it up
yet again in the gossiper, using the nodes' address.

Refs #12283

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-20 12:53:49 +02:00
Benny Halevy
37adcd3ecf storage_service: add my_host_id
Shorthand for getting this node's host_id
from token_metadata.topology, similar to the
`get_broadcast_address` helper.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-20 12:53:49 +02:00
Mikołaj Grzebieluch
b4144d14c6 test.py: adjust the test for topology upgrade to write to and read from CDC tables
In topology on raft, management of CDC generations is moved to the topology coordinator.
We need to verify that the CDC keeps working correctly during the upgrade for topology on the raft.

A similar change will be made in the topology recovery test. It will reuse
the `start_writes_to_cdc_table` function.

Ref #17409

Closes scylladb/scylladb#17828
2024-03-20 11:15:02 +01:00
Yaron Kaikov
d859067486 [action sync labels] improve pr search when labeling an issue
This PR contains a few fixes and improvements seen during
https://github.com/scylladb/scylladb/issues/15902 label addition

When we add a label to an issue, we go through all PRs.
1) Setting PR base to `master` (release PRs are not relevant)
2) Since for each Issue we have only one PR, ending the search after a
   match was found
3) Make sure to skip PR with empty body (mainly debug one)
4) Set backport label prefix to `backport/`

Closes scylladb/scylladb#17912
2024-03-20 12:14:42 +02:00
David Garcia
559dc9bb27 docs: Implement relative link support for configuration properties
Introduces relative link support for individual properties listed on the configuration properties page.  For instance, to link to a property from a different document, use the syntax :ref:`memtable_flush_static_shares <confprop_memtable_flush_static_shares>`.

Additionally, it also adds support for linking groups. For example, :ref:`Ungrouped properties <confgroup_ungrouped_properties>`.

Closes scylladb/scylladb#17753
2024-03-20 11:39:30 +02:00
Gleb Natapov
2b11842cb4 test: add test to check that address cannot expire between join request placemen and its processing 2024-03-20 11:05:31 +02:00
Kefu Chai
2479328e3b Update seastar submodule
> Revert "build: do not provide zlib as an ingredient"
> Fix reference to sstring type in tutorial about concurrency in coroutines
> Merge 'Adding a Metrics tester app' from Amnon Heiman
> cooking.sh: do not quote backtick in here document

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17887
2024-03-20 09:18:35 +02:00
Kefu Chai
432c000dfa ./: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17888
2024-03-20 09:16:46 +02:00
Raphael S. Carvalho
6115c113fe sstables_loader: Don't discard sstable that is not fully exhausted
Affects load-and-stream for tablets only.

The intention is that only this loop is responsible for detecting
exhausted sstables and then discarding them for next iterations:
        while (sstable_it != _sstables.rend() && exhausted(*sstable_it)) {
            sstable_it++;
        }

But the loop which consumes non exhausted sstables, on behalf of
each tablet, was incorrectly advancing the iterator, despite the
sstable wasn't considered exhausted.

Fixes #17733.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17899
2024-03-20 09:11:59 +02:00
Yaron Kaikov
0cbe5f1aa8 [action] add Fixes validation in backport PR
When we open a backport PR we should make sure the patch contains a ref to the issue it is supposed to fix, in order to make sure we have more accurate backport information

This action will only be triggered when base branch is `branch-*`

If `Fixes` are missing, this action will fail and notify the author.

Ref: https://github.com/scylladb/scylla-pkg/issues/3539

Closes scylladb/scylladb#17897
2024-03-20 08:55:36 +02:00
Nadav Har'El
8df2ea3f95 cql: don't crash when creating a view during a truncate
The test dtest materialized_views_test.py::TestMaterializedViews::
test_mv_populating_from_existing_data_during_truncate reproduces an
assertion failure, and crash, while doing a CREATE MATERIALIZED VIEW
during a TRUNCATE operation.

This patch fixes the crash by removing the assert() call for a view
(replacing it by a warning message) - we'll explain below why this is fine.
Also, for base tables we change the assertion to an on_internal_error
(Refs #7871).
This makes the test stop crashing Scylla, but it still fails due to
issue #17635.

Let's explain the crash, and the fix:

The test starts TRUNCATE on a table that doesn't yet have a view.
truncate_table_on_all_shards() begins by disabling compaction on
the table and all its views (of which there are none, at this
point). At this point, the test creates a new view on this table.
The new view has, by default, compaction enabled. Later, TRUNCATE
calls discard_sstables() on this new view, asserts that it has
compaction disabled - and this assertion fails.

The fix in this patch is to not do the assert() for views. In other words,
we acknowledge that in this use case, the view *will* have compactions
enabled while being truncated. I claim that this is "good enough", if we
remember *why* we disable compaction in the first place: It's important
to disable compaction while truncating because truncating during compaction
can lead us to data resurrection when the old sstable is deleted during
truncation but the result of the compaction is written back. True,
this can now happen in a new view (a view created *DURING* the
truncation). But I claim that worse things can happen for this
new view: Notably, we may truncate a view and then the ongoing
view building (which happens in a new view) might copy data from
the base to the view and only then truncate the base - ending up
with an empty base and non-empty view. This problem - issue #17635 -
is more likely, and more serious, than the compaction problem, so
will need to be solved in a separate patch.

Fixes #17543.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17634
2024-03-20 08:54:39 +02:00
Raphael S. Carvalho
d5a5005afa sstables: Fix clone semantics for runs in partitioned_sstable_set
When a sstable set is cloned, we don't want a change in cloned set
propagating to the former one.

It happens today with partitioned_sstable_set::_all_runs, because
sets are sharing ownership of runs, which is wrong.

Let's not violate clone semantics by copying all_runs when cloning.
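
The difference between sharing and copying runs on clone can be illustrated with a small Python analogy. This is only a sketch under stated assumptions: `SSTableSet`, `shallow_clone` and `clone` are hypothetical names, and a run is modelled as a plain list of sstable names rather than the C++ types.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class SSTableSet:
    # Stand-in for partitioned_sstable_set; each run is a list of sstable names.
    all_runs: List[List[str]] = field(default_factory=list)

    def shallow_clone(self) -> "SSTableSet":
        # Problematic variant: the outer list is new, but the runs are shared,
        # so a change made through the clone is visible in the original.
        return SSTableSet(all_runs=list(self.all_runs))

    def clone(self) -> "SSTableSet":
        # Variant with proper clone semantics: every run is copied.
        return SSTableSet(all_runs=copy.deepcopy(self.all_runs))
```

With `shallow_clone`, appending to a run in the clone also changes the original set; with `clone`, the two sets evolve independently.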

Doesn't affect data correctness as readers work directly with
sstables, which are properly cloned. Can result in a crash in ICS
when it is estimating pending tasks, but should be very rare in
practice.

Fixes #17878.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17879
2024-03-20 08:41:32 +02:00
Botond Dénes
c2425ca135 tools/scylla-nodetool: add --rest-api-port option
This option is an alternative to --port|-p and takes precedence over it.
This is meant to aid the switch from the legacy nodetool to the native
one. Users of the legacy nodetool pass the port of JMX to --port. We
need a way to provide both the JMX port (via --port) and also the REST
API port, which only the native nodetool will interpret. So we add this
new --rest-api-port, which when provided, overwrites the --port|-p
option. To ensure the legacy nodetool doesn't try to interpret this,
this option can also be provided as -Dcom.scylladb.apiPort (which is
substituted to --rest-api-port behind the scenes).
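
The precedence rule described above can be sketched as follows. This is a Python illustration using `argparse`, not the actual boost::program_options code; `resolve_api_port` is a hypothetical helper, and both the `-Dcom.scylladb.apiPort=<port>` spelling (with `=`) and the fallback port of 10000 are assumptions made for the example.

```python
import argparse
from typing import List


def resolve_api_port(argv: List[str], default_port: int = 10000) -> int:
    """Pick the REST API port: --rest-api-port wins over --port/-p."""
    # Rewrite the JVM-style spelling into the regular option first.
    prefix = "-Dcom.scylladb.apiPort="
    rewritten = [
        "--rest-api-port=" + arg[len(prefix):] if arg.startswith(prefix) else arg
        for arg in argv
    ]
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-p", "--port", type=int)
    parser.add_argument("--rest-api-port", type=int)
    # Unknown arguments (the operation and its options) are left alone.
    opts, _unknown = parser.parse_known_args(rewritten)
    if opts.rest_api_port is not None:
        return opts.rest_api_port
    if opts.port is not None:
        return opts.port
    return default_port
```

A caller such as a test harness can then pass the JMX port via `--port` for the legacy tool and the REST API port via the `-D` form in the same command line.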
2024-03-20 02:11:47 -04:00
Botond Dénes
a85ec6fc60 tools/scylla-nodetool: ignore JVM args
Legacy scripts and tests for nodetool might pass JVM args like
-Dcom.sun.jndi.rmiURLParsing=legacy. Ignore these, by dropping anything
that starts with -D from the command line args.
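
Dropping the JVM arguments amounts to a simple filter over the argument list. A minimal Python sketch, with `drop_jvm_args` as a hypothetical helper name; it does not model the special handling of `-Dcom.scylladb.apiPort` described in the `--rest-api-port` commit:

```python
from typing import List


def drop_jvm_args(argv: List[str]) -> List[str]:
    """Return argv without JVM-style system property arguments (those starting with -D)."""
    return [arg for arg in argv if not arg.startswith("-D")]
```

For example, `["-Dcom.sun.jndi.rmiURLParsing=legacy", "-h", "localhost", "status"]` is reduced to `["-h", "localhost", "status"]`.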
2024-03-20 02:11:47 -04:00
Botond Dénes
12516b0861 tools/utils: make finding the operation command line option more flexible
Currently all scylla-tools assume that the operation/command is in
argv[1]. This is not very flexible, because most programs allow global
options (that are not dependent on the current operation/command) to be
passed before the operation name on the command line. Notably C*'s
nodetool is one such program and indeed scripts and tests using nodetool
do utilize this.
This patch makes this more flexible. Instead of looking at argv[1], do
an initial option parsing with boost::program_options to locate the
operation parameter. This initial parser knows about the global options,
and the operation positional argument. It allows for unrecognized
positional and non-positional arguments, but only after the command.
With this, any combination of global options + operation is allowed, in
any order.
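
The idea of locating the operation behind leading global options can be sketched as follows. The patch uses boost::program_options for this initial pass; the Python below replaces that with a hand-written scan purely for illustration. `find_operation` and its parameters are hypothetical, and `argv` is assumed to exclude the program name.

```python
from typing import List, Optional, Set


def find_operation(
    argv: List[str],
    options_with_value: Set[str],
    flags: Set[str],
) -> Optional[str]:
    """Return the first positional argument, skipping global options that precede it."""
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in options_with_value:
            i += 2  # skip the option and its separate value
        elif arg in flags:
            i += 1
        elif "=" in arg and arg.split("=", 1)[0] in options_with_value:
            i += 1  # option and value in one token, e.g. --port=7199
        elif arg.startswith("-"):
            # Unrecognized options are only tolerated after the operation.
            raise ValueError(f"unrecognized option before the operation: {arg}")
        else:
            return arg
    return None
```

Everything after the returned operation is left for the operation-specific parser, which is why unknown arguments are accepted only from that point on.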
2024-03-20 02:11:47 -04:00
Botond Dénes
7ae98c586a tools/utils: get_selected_operation(): remove alias param
This method has a single caller, who always passes "operation". Just
hard-code this into the method, no need to keep a param for it.
2024-03-20 02:11:47 -04:00
Botond Dénes
28e7eecf0b tools: add constant with current help command-line arguments
Unfortunately, we have code in scylla-nodetool.cc which needs to know
what are the current help options available. Soon, there will be more
code like this in tools/utils.cc, so centralize this list in a const
static tool_app_template member.
2024-03-20 02:11:47 -04:00
Petr Gusev
5db6b8b3c2 error_injection: move api registration to set_server_init
The set_server_done function is called only
when a node is fully initialized. To allow error
injection to be used during initialization we
move the handler registration to set_server_init,
which is called as soon as the api http server
is started.
2024-03-19 20:18:29 +04:00
Petr Gusev
e4318e139d error_injection: add inject_parameter method
In this commit we extend the error_injector
with a new method inject_parameter. It allows
to pass parameters from tests to scylla, e.g. to
lower timeouts or limits. A typical use case is
described in scylladb/scylladb#15571.

It's logically the same as inject_with_handler,
whose lambda reads the parameter named 'value'.
The only difference is that the inject_parameter
doesn't return a future, it just reads the
parameter from the injection shared_data.
2024-03-19 20:18:23 +04:00
Petr Gusev
460567c4fd error_injection: move injection_name string into injection_shared_data
In a subsequent commit we'll need the injection_name from inside
injection_shared_data, so in this commit we move it there.
Additionally, we fix the todo about switching the injections dictionary
from map to unordered_set, now unordered_map contains
string_views, pointing to injection_name inside
injection_shared_data.
2024-03-19 20:17:02 +04:00
Petr Gusev
49a4220fea error_injection: pass injection parameters at startup
Injection parameters can be used in the lambda passed to
inject_with_handler method to take some values from
the test. However, there was no way to set values to these
parameters on node startup, only through
the error injection REST api. Therefore, we couldn't rely
on this when inject_with_handler is used during
node startup, it could trigger before we call the api
from the test.

In this commit we solve this problem by allowing these
parameters to be assigned through scylla.yaml config.

The defer.hh header was added to error_injection.hh to fix
compilation after adding error_injection.hh to config.hh,
defer function is used in error_injection.hh.
2024-03-19 20:17:02 +04:00
Andrei Chekun
b52f79b1ce Fix leaking file descriptors in test.py
Fixes #17569

Tests are not closing file descriptors after they finish. This leads to an inability to continue tests, since the default limit on open files in Linux is 1024. The issue is easy to reproduce with the following command:
```
$ ./test.py --mode debug test_native_transport --repeat 1500
```
After the fix is applied, all tests pass with the following command:
```
$ ./test.py --mode debug test_native_transport --repeat 10000
```

Closes scylladb/scylladb#17798
2024-03-19 14:59:14 +01:00
Piotr Dulikowski
70cb1dc8fe doc: describe upgrade and recovery for raft topology
Document the manual upgrade procedure that is required to enable
consistent cluster management in clusters that were upgraded from an
older version to ScyllaDB Open Source 6.0. This instruction is placed in
previously placeholder "Enable Raft-based Topology" page which is a part
of the upgrade instructions to ScyllaDB Open Source 6.0.

Add references to the new description in the "Raft Consensus Algorithm
in ScyllaDB" document in relevant places.

Extend the "Handling Node Failures" document so that it mentions steps
required during recovery of a ScyllaDB cluster running version 6.0.

Fixes: scylladb/scylladb#17341

Closes scylladb/scylladb#17624
2024-03-19 14:59:14 +01:00
Gleb Natapov
fde3068530 topology_coordinator: set address map entry to nonexpiring when a node is added to the topology
Currently a node's address is set to nonexpiring in the address map when
the node is added to group0, but the node is added to the topology earlier
(during the join request) and the cluster must be able to communicate
with it (potentially) much later when the request will be processed.
The patch marks nodes that are in the topology, but not yet in group0 as
non expiring, so they will not be dropped from address map until their
join request is processed.

Fixes: scylladb/scylladb#16849
2024-03-19 13:35:19 +02:00
Gleb Natapov
9651ae875f raft_group0: add modifiable_address_map() function
Provide access to non const address_map. We will need it later.
2024-03-19 13:34:41 +02:00
Yaron Kaikov
ad76f0325e [action] Sync labels from an Issue to linked PR
After merging https://github.com/scylladb/scylladb/pull/17365, all backport labels should be added to PR (before we used to add backport labels to the issues).

Adding a GitHub action which will be triggered in the following conditions only:

1) The base branch is `master` or `next`
2) Pull request events:
- opened: For every new PR that someone opens, we will sync all labels from the linked issue (if available)
- labeled: This rule only applies to labels with the `backport/` prefix. When we add a new label for the backport we will update the relevant issue or PR to get them both to sync
- unlabeled: Same as `labeled`, only applies to the `backport/` prefix. When we remove a label for backport we will update the relevant issue or PR

Closes scylladb/scylladb#17715
2024-03-19 09:17:07 +02:00
Avi Kivity
e48eb76f61 sstables_manager: decouple from system_keyspace
sstables_manager now depends on system_keyspace for access to the
system.sstables table, needed by object storage. This violates
modularity, since sstables_manager is a relatively low-level leaf
module while system_keyspace integrates large parts of the system
(including, indirectly, sstables_manager).

One area where this is grating is sstables::test_env, which has
to include the much higher level cql_test_env to accommodate it.

Fix this by having sstables_manager expose its dependency on
system_keyspace as an interface, sstables_registry, and have
system_keyspace implement the glue logic in
system_keyspace_sstables_manager.

Closes scylladb/scylladb#17868
2024-03-18 20:38:07 +03:00
Anna Stuchlik
a13694daea doc: fix the image upgrade page
This commit updates the Upgrade ScyllaDB Image page.

- It removes the incorrect information that updating underlying OS packages is mandatory.
- It adds information about the extended procedure for non-official images.

Closes scylladb/scylladb#17867
2024-03-18 18:27:46 +02:00
Gleb Natapov
af218d0063 raft_group0_client: assert that hold_read_apply_mutex is called on shard 0
group0 operations are valid on shard 0 only. Assert that. We already do
that in the version of the function that gets abort source.

Message-ID: <ZeCti70vrd7UFNim@scylladb.com>
2024-03-18 16:20:41 +01:00
Pavel Emelyanov
a8f48e0f6b test/boost/tablets: Use verbose BOOST_REQUIRE checkers
Lots of BOOST_REQUIREs in this test require some integers to be in some
eq/gt/le relations to each other. And one place that compares rack names
as strings. Using more verbose boost checkers is preferred in such cases

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17866
2024-03-18 17:09:02 +02:00
Botond Dénes
270d01f16a Merge 'build: cmake: put server deb packages under build/dist/$<CONFIG>/debian' from Kefu Chai
this change is a follow up of ca7f7bf8e2, which changed the output path to build/$<CONFIG>/debian. but what dist/docker/debian/build_docker.sh expects is `build/dist/$config/debian/*.deb`, where `$config` is the normalized mode, when the debian packages are built using CMake generated rules, `$mode` is CMake configuration name, i.e., `$<CONFIG>`. so, ca7f7bf8e2 made a mistake, as it does not match the expectation of `build_docker.sh`.

in this change, this issue is addressed. so we use the same path in both `dist/CMakeLists.txt` and `dist/docker/debian/build_docker.sh`.

Closes scylladb/scylladb#17848

* github.com:scylladb/scylladb:
  build: cmake: add dist-* targets to the default build target
  build: cmake: put server deb packages under build/dist/$<CONFIG>/debian
2024-03-18 16:18:35 +02:00
Avi Kivity
72bbe75d5b Merge 'Fix node replace with tablets for RF=N' from Tomasz Grabiec
This PR fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because tablet replica allocation for
rebuild will not be able to find a viable destination, as the replacing node
is not considered to be a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.

The solution taken here is to not drain tablet replicas from replaced
node during topology request but leave it to happen later after the
replaced node is in left state and replacing node is in normal state.

The replacing node waits for this draining to be complete on boot
before the node is considered booted.

Fixes https://github.com/scylladb/scylladb/issues/17025

Nodes in the left state will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those nodes' location (dc, rack) for two reasons:

 1) algorithms which work with replica sets filter nodes based on their location. For example materialized views code which pairs base replicas with view replicas filters by datacenter first.

 2) tablet scheduler needs to identify each node's location in order to make decisions about new replica placement.

It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.

Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacenter endpoints excludes them).

In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet replica sets.

Currently left nodes are never removed from topology, so will
accumulate in memory. We could garbage-collect them from topology
coordinator if a left node is absent in any replica set. That means we
need a new state - left_for_real.

Closes scylladb/scylladb#17388

* github.com:scylladb/scylladb:
  test: py: Add test for view replica pairing after replace
  raft, api: Add RESTful API to query current leader of a raft group
  test: test_tablets_removenode: Verify replacing when there is no spare node
  doc: topology-on-raft: Document replace behavior with tablets
  tablets, raft topology: Rebuild tablets after replacing node is normal
  tablets: load_balancer: Access node attributes via node struct
  tablets: load_balancer: Extract ensure_node()
  mv: Switch to using host_id-based replica set
  effective_replication_map: Introduce host_id-based get_replicas()
  raft topology: Keep nodes in the left state to topology
  tablets: Introduce read_required_hosts()
2024-03-18 16:16:08 +02:00
Kefu Chai
d1c35f943d test: unit: add fmt::formatter for test_data in tests
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* test_data in two different tests
* row_cache_stress_test::reader_id

and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17861
2024-03-18 15:35:28 +02:00
Kefu Chai
de6803de92 build: cmake: use --ld-path for specifying linker for clang
Clang > 12 starts to complain like
```
warning:  '-fuse-ld=' taking a path is deprecated; use '--ld-path=' instead [-Wfuse-ld-path]'
```
this option is not supported by GCC yet. also instead of using
the generic driver's name, use the specific name. otherwise ld
fails like
```
lld is a generic driver.
Invoke ld.lld (Unix), ld64.lld (macOS), lld-link (Windows), wasm-ld (WebAssembly) instead
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17825
2024-03-18 14:49:11 +02:00
Pavel Emelyanov
933b346166 test/tablets: Add test to check how ALTER changes RF (in one DC)
For now test is incomplete in several ways

1. It xfails, until #17116
2. It doesn't rebuild/repair tablets
3. It doesn't check that tablet data actually exists on replicas

refs: #17575

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17808
2024-03-18 14:47:57 +02:00
Yaron Kaikov
6406d3083c [mergify] set draft PR when conflicts
When Mergify opens a backport PR and identifies conflicts, it adds the
`conflicts` label. Since GitHub can't identify conflicts in a PR, set
a rule to move the PR to draft, so that we do not trigger CI

Once we resolve the conflicts developer should make the PR `ready for
review` (which is not draft) and then CI will be triggered

`conflict` label can also be removed

Closes scylladb/scylladb#17834
2024-03-18 14:45:08 +02:00
Beni Peled
bddac3279e Skip the backport-label workflow for draft pull requests
It's not necessary (and annoying) when this workflow runs and fails
against PRs in draft mode

Closes scylladb/scylladb#17864
2024-03-18 14:42:55 +02:00
Wojciech Mitros
efcb718e0a mv: adjust memory tracking of single view updates within a batch
Currently, when dividing memory tracked for a batch of updates
we do not take into account the overhead that we have for processing
every update. This patch adds the overhead for single updates
and joins the memory calculation path for batches and their parts
so that both use the same overhead.

Fixes #17854

Closes scylladb/scylladb#17855
2024-03-18 14:31:54 +02:00
Kefu Chai
d57a82c156 build: cmake: add dist-* targets to the default build target
also, add a target of `dist-server`, which mirrors the structure
of the targets created by `configure.py`, and it is consistent
with the ones defined by `build_submodule()`.

so that they are built when our CI runs `ninja -C $build`. CI
expects all these rpm and deb packages to be built when
`ninja -C $build` finishes. so that it can continue with
building the container image. let's make it happen. so that
the CMake-based rules can work better with CI.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-18 20:02:43 +08:00
Raphael S. Carvalho
2c9b13d2d1 compaction: Check for key presence in memtable when calculating max purgeable timestamp
It was observed that some use cases might append old data constantly to
memtable, blocking GC of expired tombstones.

That's because timestamp of memtable is unconditionally used for
calculating max purgeable, even when the memtable doesn't contain the
key of the tombstone we're trying to GC.

The idea is to treat memtable as we treat L0 sstables, i.e. it will
only prevent GC if it contains data that is possibly shadowed by the
expired tombstone (after checking for key presence and timestamp).

Memtable will usually have a small subset of keys in largest tier,
so after this change, a large fraction of keys containing expired
tombstones can be GCed when memtable contains old data.
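
A minimal sketch of the idea above, modelled in Python for brevity (the actual
code is C++). `DataSource`, `max_purgeable_timestamp` and `MAX_TIMESTAMP` are
hypothetical names, not ScyllaDB APIs, and the sketch assumes that a tombstone
may be purged only if its timestamp is lower than the returned value:

```python
from dataclasses import dataclass, field

MAX_TIMESTAMP = 2**63 - 1  # stand-in for "nothing limits GC"


@dataclass
class DataSource:
    """Hypothetical model of a memtable or an sstable outside the compaction."""
    min_timestamp: int
    keys: set = field(default_factory=set)


def max_purgeable_timestamp(sources, key):
    """Lowest min_timestamp among the sources that contain `key`."""
    result = MAX_TIMESTAMP
    for src in sources:
        # A source without the key cannot hold data shadowed by the
        # key's tombstone, so it does not limit garbage collection.
        if key in src.keys and src.min_timestamp < result:
            result = src.min_timestamp
    return result


memtable = DataSource(min_timestamp=5, keys={"a"})
sstable = DataSource(min_timestamp=10, keys={"b"})
```

In this model the old data in the memtable (timestamp 5) no longer caps the
result for key "b", because the memtable does not contain that key.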

Fixes #17599.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17835
2024-03-18 13:37:44 +02:00
Benny Halevy
2c0b1d1fa7 compaction: get_max_purgeable_timestamp: optimize sstable filtering by min_timestamp
There is no point in checking `sst->filter_has_key(*hk)`
if the sstable contains no data older than the running
minimum timestamp, since even if it matches, it won't change
the minimum.
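
The ordering of the two checks can be sketched as follows (Python model, not
the actual C++ code). `scan_sstables` and its `(min_timestamp, keys)` tuples
are hypothetical; the probe counter only illustrates how many filter lookups
are spared:

```python
def scan_sstables(sstables, key):
    """Return (running minimum timestamp, number of filter probes).

    `sstables` is a list of (min_timestamp, keys) tuples.
    """
    running_min = 2**63 - 1
    probes = 0
    for min_timestamp, keys in sstables:
        if min_timestamp >= running_min:
            # Even a match could not lower the minimum: skip the probe.
            continue
        probes += 1  # models the sst->filter_has_key(*hk) call
        if key in keys:
            running_min = min_timestamp
    return running_min, probes
```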

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17839
2024-03-18 13:26:49 +02:00
Avi Kivity
ed211cd0bf sstables: partition_index_cache: reindent
Fix up after e120ba3514.

Closes scylladb/scylladb#17847
2024-03-18 13:23:21 +02:00
Andrei Chekun
b6edf056ea Add sanity tests for multi dc
Fix writing cassandra-rackdc.properties with correct format data instead of yaml
Add a parameter to overwrite RF for specific DC
Add the possibility to connect cql to the specific node

In this PR 4 tests were added to test multi-DC functionality. One comes from the initial commit where the multi-DC possibility was introduced; however, this test was not committed. Three of them are migrations from dtest that will later be deleted. To be able to execute the migrated tests, additional functionality is added: the ability to connect cql to a specific node in the cluster instead of a pooled connection, and the possibility to overwrite the replication factor for a specific DC. To be able to use multi-DC in test.py, the issue with the incorrect format of the properties file is fixed in this PR.

Closes scylladb/scylladb#17503
2024-03-18 13:00:36 +02:00
Nadav Har'El
680e37c4af Merge 'schema_tables: unfreeze frozen_mutation:s gently' from Avi Kivity
With large schemas, unfreezing can stall, especially as it requires
a lot of memory. Switch to a gentle version that will not stall.

As a preparation step, we add unfreeze_gently() for a span of mutations.

Fixes #17841

Closes scylladb/scylladb#17842

* github.com:scylladb/scylladb:
  schema_tables: unfreeze frozen_mutation:s gently
  frozen_mutation: add unfreeze_gently(span<frozen_mutation>)
2024-03-18 12:56:44 +02:00
Kefu Chai
fe28aac440 test/perf: add fmt::formatter for perf_result_with_aio_writes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `perf_result_with_aio_writes`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17849
2024-03-18 12:53:39 +02:00
Botond Dénes
a4e8bea679 tools/scylla-nodetool: status: handle missing host_id
Newly joining nodes may not have a host id yet. Handle this and print a
"?" for these nodes, instead of the host-id.
Extend the existing test for joining node case (also rename it and add
comment).

Closes scylladb/scylladb#17853
2024-03-18 12:26:59 +02:00
Kefu Chai
384e9e9c7c build: cmake: put server deb packages under build/dist/$<CONFIG>/debian
this change is a follow up of ca7f7bf8e2, which changed the output path
to build/$<CONFIG>/debian. but what dist/docker/debian/build_docker.sh
expects is `build/dist/$config/debian/*.deb`, where `$config` is the
normalized mode, when the debian packages are built using CMake
generated rules, `$mode` is CMake configuration name, i.e., `$<CONFIG>`.
so, ca7f7bf8e2 made a mistake, as it does not match the expectation of
`build_docker.sh`.

in this change, this issue is addressed. so we use the same path
in both `dist/CMakeLists.txt` and `dist/docker/debian/build_docker.sh`.

apply the same change to `dist-server-rpm`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-18 14:21:39 +08:00
Avi Kivity
731b5c5120 schema_tables: unfreeze frozen_mutation:s gently
With large schemas, unfreezing can stall, especially as it requires
a lot of memory. Switch to a gentle version that will not stall.
2024-03-17 17:46:02 +02:00
Avi Kivity
a34edb0a93 frozen_mutation: add unfreeze_gently(span<frozen_mutation>)
While we have unfreeze(vector<frozen_mutation>), a gentle version is
preferred.
2024-03-17 17:45:30 +02:00
Kefu Chai
8811900602 build: cmake: do not link randomized_nemesis_test with replication.cc
test/raft/replication.cc defines a symbol named `tlogger`, while
test/raft/randomized_nemesis_test.cc also defines a symbol with
the same name. when linking the test with mold, it identified the ODR
violation.

in this change, we extract test-raft-helper out, so that
randomized_nemesis_test can selectively only link against this library.
this also matches with the behavior of the rules generated by `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17836
2024-03-17 17:01:47 +02:00
Kefu Chai
e1ae36ecfd test/boost: add formatter for BOOST_REQUIRE_EQUAL
in gossiping_property_file_snitch_test, we use
`BOOST_REQUIRE_EQUAL(dc_racks[i], dc_racks[0])` to check the equality
of two instances of `pair<sstring, sstring>`, like:
```c++
BOOST_REQUIRE_EQUAL(dc_racks[i], dc_racks[0])
```

since the standard library does not provide the formatter for printing
`std::pair<>`, we rely on the homebrew generic formatter to
print `std::pair<>`, which in turn uses operator<< to format the
elements in the `pair`, but we intend to remove this formatter
in future, as the last step of #13245 .

so in order to enable Boost.test to print out lhs and rhs when
`BOOST_REQUIRE_EQUAL` check fails, we are adding
`boost_test_print_type()` for `pair<sstring,sstring>`. the helper
function uses {fmt} to print the `pair<>`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17831
2024-03-17 16:58:39 +02:00
Kefu Chai
6244a2ae00 service:qos: add fmt::formatter for service_level_options::workload_type
this change prepares for the fmt::formatter based formatter used by
tests, which will use {fmt} to print the elements in a container,
so we need to define the formatter using fmt::formatter for these
elements. the operator<< for service_level_options::workload_type is
preserved, as the tests are still using it.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17837
2024-03-17 16:52:57 +02:00
Kefu Chai
7df3acd39c repair: add fmt::formatter for row_level_diff_detect_algorithm
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
row_level_diff_detect_algorithm. please note, we already have
`format_as()` overload for this type, but we cannot use it as a
fallback of the proper `fmt::formatter<>` specialization before
{fmt} v10. so before we update our CI to a distro with {fmt} v10,
`fmt::formatter<row_level_diff_detect_algorithm>` is still
needed.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17824
2024-03-16 19:12:49 +02:00
Botond Dénes
03c47bc30b tools/scylla-nodetool: status: handle nodes without load
Some nodes may not have a load yet. Handle this. Also add a test
covering this case.

Closes scylladb/scylladb#17823
2024-03-16 17:38:53 +02:00
Pavel Emelyanov
42a2dce4b6 test/lib: Eliminate variadic futures from template
The assert_that_failed(future) pair of helpers are templates with
variadic futures, but since they are gone in seastar, so should they in
test/lib

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17830
2024-03-16 17:37:25 +02:00
Kefu Chai
8bab51733f db: add fmt::formatter for db::functions::function
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `db::functions::function`.
please note, we use `std::ostream` as the parameter of
the polymorphic implementation of `function::print()`, so
without an intrusive change, we have to use `fmt::ostream_formatter`
or at least use similar technique to format the `function` instance
into an instance of `ostream` first. so instead of implementing
a "native" `fmt::formatter`, in this change, we just use
`fmt::ostream_formatter`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17832
2024-03-16 17:36:49 +02:00
Kefu Chai
23e9958ebb data_dictionary: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17826
2024-03-15 21:17:11 +03:00
Botond Dénes
ad9bad4700 tools/scylla-nodetool: {proxy,table}histograms: handle empty histograms
Empty histograms are missing some of the members that non-empty
histograms have. The code handling these histograms assumed all required
members are always present and thus errored out when receiving an empty
histogram.
Add tests for empty histograms and fix the code handling them to check
for the potentially missing members, instead of making assumptions.

Closes scylladb/scylladb#17816
2024-03-15 15:59:31 +03:00
Tomasz Grabiec
a233a699cc test: py: Add test for view replica pairing after replace
2024-03-15 13:20:08 +01:00
Tomasz Grabiec
6d50e93f10 raft, api: Add RESTful API to query current leader of a raft group
Example:

  $ curl -X GET "http://127.0.0.1:10000/raft/leader_host"
  "f7f57588-62de-4cac-9e4b-c62bfc458d91"

Accepts optional group_id param, defaults to group0.
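
A small client-side sketch of the call shown above, using only the Python
standard library. `leader_host_url` and `parse_leader` are hypothetical
helpers, and passing `group_id` as a query parameter is an assumption; the
message only says the parameter is optional:

```python
import json
from urllib.parse import urlencode


def leader_host_url(address="127.0.0.1", port=10000, group_id=None):
    """Build the URL of the leader_host endpoint from the example above."""
    url = f"http://{address}:{port}/raft/leader_host"
    if group_id is not None:
        # Assumption: group_id is sent as a query parameter.
        url += "?" + urlencode({"group_id": group_id})
    return url


def parse_leader(body):
    """The example response is a JSON string holding a host ID."""
    return json.loads(body)
```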
2024-03-15 13:20:08 +01:00
Tomasz Grabiec
6d24fdee75 test: test_tablets_removenode: Verify replacing when there is no spare node
The test is changed to be more strict. Verifies the case of replacing
when RF=N in which case tablet replicas have to be rebuilt using the
replacing node.

This would fail if tablets are drained as part of replace operation,
since replacing node is not yet a viable target for tablet migration.
2024-03-15 13:20:08 +01:00
Tomasz Grabiec
1d01b4ca20 doc: topology-on-raft: Document replace behavior with tablets
2024-03-15 13:20:08 +01:00
Tomasz Grabiec
1c71f44e63 tablets, raft topology: Rebuild tablets after replacing node is normal
This fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because new tablet replica allocation
will not be able to find a viable destination, as the replacing node
is not considered a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.

The solution taken here is to not drain tablet replicas from replaced
node during topology request but leave it to happen later after the
replaced node is left and replacing node is normal.

The replacing node waits for this draining to be complete on boot
before the node is considered booted.

Fixes #17025
2024-03-15 13:20:08 +01:00
Tomasz Grabiec
b2418fab39 tablets: load_balancer: Access node attributes via node struct
Reduces lookups into topology and decouples the algorithm more from
the topology object.
2024-03-15 11:22:34 +01:00
Tomasz Grabiec
9090050244 tablets: load_balancer: Extract ensure_node()
Will be called in another loop to populate the "nodes" map with left node.
2024-03-15 11:22:32 +01:00
Artsiom Mishuta
73ed4c0eb5 test.py: fix aiohttp usage issue in python 3.12
Fix aiohttp usage issue in python 3.12:
"Timeout context manager should be used inside a task"

This occurs due to UnixRESTClient created in one event loop (created
inside pytest) but used in another (created in rewritten event_loop
fixture), now it is fixed by updating UnixRESTClient object for every new
loop.
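
The general pattern of binding a client to the loop that uses it can be
sketched like this. `PerLoopClient` is a hypothetical helper, not the code in
test.py, and `factory` stands for whatever constructs the real client:

```python
import asyncio


class PerLoopClient:
    """Hypothetical holder that rebuilds its client for every new event loop."""

    def __init__(self, factory):
        self._factory = factory
        self._loop = None
        self._client = None

    def get(self):
        # Must be called from a coroutine, i.e. with a running loop.
        loop = asyncio.get_running_loop()
        if loop is not self._loop:
            # The cached client belongs to another loop: create a new one.
            self._client = self._factory()
            self._loop = loop
        return self._client
```

Within one loop `get()` keeps returning the same object; a coroutine running
in a different loop gets a fresh one.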

Closes scylladb/scylladb#17760
2024-03-15 11:17:29 +01:00
Tomasz Grabiec
9b656ec2aa mv: Switch to using host_id-based replica set
This is necessary to not break replica pairing between base and
view. After replacing a node, tablet replica set contains for a while
the replaced node which is in the left state. This node is not
returned by the IP-based get_natural_endpoints() so the replica
indexes would shift, changing the pairing with the view.

The host_id-based replica set always has stable indexes for replicas.
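
A toy illustration of the index shift described above (Python model with
hypothetical host names and helper functions; it is not the materialized
views code itself):

```python
def host_id_based_replicas(replica_set):
    """Keeps every replica, including nodes in the left state."""
    return list(replica_set)


def ip_based_replicas(replica_set, left_nodes):
    """Models an IP-based view that omits nodes in the left state."""
    return [host for host in replica_set if host not in left_nodes]


base_replicas = ["host-a", "host-b", "host-c"]
left_nodes = {"host-b"}  # replaced node, still present in the replica set
```

With these values "host-c" sits at index 2 in the host_id-based list but at
index 1 in the IP-based one, so pairing by index would pick a different view
replica.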
2024-03-15 11:05:29 +01:00
Tomasz Grabiec
888dc41d66 effective_replication_map: Introduce host_id-based get_replicas()
2024-03-15 11:05:29 +01:00
Tomasz Grabiec
61b3453552 raft topology: Keep nodes in the left state to topology
Those nodes will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those nodes' location (dc, rack) for two reasons:

 1) algorithms which work with replica sets filter nodes based on
 their location. For example materialized views code which pairs base
 replicas with view replicas filters by datacenter first.

 2) tablet scheduler needs to identify each node's location in order
 to make decisions about new replica placement.

It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.

Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacenter endpoints excludes them).

In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet
replica sets.

We load topology information only for left nodes which are actually
referenced by any tablet. To achieve that, topology loading code
queries system.tablet for the set of hosts. This set is then passed to
system.topology loading method which decides whether to load
replica_state for a left node or not.
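
The loading rule can be sketched as below (Python model). The dictionaries
stand in for the contents of the tablet and topology tables, `nodes_to_load`
is a hypothetical name, and `read_required_hosts` only borrows the name of
the function introduced in this series without reproducing its signature:

```python
def read_required_hosts(tablet_replicas):
    """Set of hosts referenced by any tablet.

    `tablet_replicas` maps a tablet name to its list of replica hosts.
    """
    required = set()
    for replicas in tablet_replicas.values():
        required.update(replicas)
    return required


def nodes_to_load(node_states, required_hosts):
    """Load every non-left node, and left nodes only if a tablet needs them."""
    return {
        host
        for host, state in node_states.items()
        if state != "left" or host in required_hosts
    }
```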
2024-03-15 11:05:29 +01:00
Tomasz Grabiec
f7851696fa tablets: Introduce read_required_hosts()
Will be used by topology loading code to determine which hosts are
needed in topology, even if they're in the left state. We want to load
only left nodes if they are referenced by any tablet, which may happen
temporarily until the replacement replica is rebuilt.
2024-03-15 11:05:29 +01:00
Botond Dénes
598e5aebfb test/cql-pytest: test_virtual_tables: add test for token_ring table
Just a simple sanity test for both vnodes and tablets.
2024-03-15 04:23:20 -04:00
Botond Dénes
279e496133 db/virtual_tables: token_ring_table: add tablet support
For keyspaces which use tablets, we describe each table separately.
2024-03-15 04:23:20 -04:00
Botond Dénes
61b6ac7ffe db/virtual_tables: token_ring_table: add table_name column
As the first clustering column. For vnode keyspaces, this will always be
"ALL", for tablet keyspaces, this will contain the name of the described
table.
2024-03-15 04:23:20 -04:00
Botond Dénes
fdef62c232 db/virtual_tables: token_ring_table: extract ring emit
Into a separate method. For vnodes there is a single ring per keyspace,
but for tablets, there is a separate ring for each table in the
keyspace. To accommodate both, we move the code emitting the ring into a
separate method, so execute() can just call it once per keyspace or once
per table, whichever appropriate.
2024-03-15 04:23:20 -04:00
Botond Dénes
a205752513 service/storage_service: describe_ring_for_table(): use topology to map hostid to ip
Do not use the internal host2ip() method. This relies on `_group0`, which
is only set on shard 0. Consequently, any call to this method, coming
from a shard other than shard 0, would crash ScyllaDB, as it
dereferences a nullptr.
2024-03-15 04:23:20 -04:00
Nadav Har'El
6cdb68f094 test/cql-pytest: remove unused function
Remove an unused function from test/cql-pytest/test_using_timeout.py.
Some linters can complain that this function used re.compile(), but
the "re" package was never imported. Since this function isn't used,
the right fix is to remove it - and not add the missing import.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17801
2024-03-15 09:56:30 +02:00
Kefu Chai
e1a9340cc1 partition_version: add fmt::formatter for partition_entry::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `partition_entry::printer`,
and drop its operator<< .

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17812
2024-03-15 09:52:27 +02:00
Kefu Chai
a0625261ef build: cmake: reword the comment for dev-headers
before this change, the comment was difficult to parse. let's update
it for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17814
2024-03-15 09:51:47 +02:00
Kefu Chai
640d573106 schema_mutations: add fmt::formatter for schema_mutations
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `schema_mutations`,
and drop its operator<< .

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17815
2024-03-15 09:49:56 +02:00
Kefu Chai
3edd530bd1 test/boost: add formatter for BOOST_REQUIRE_EQUAL
before this change, we rely on the homebrew generic formatter to
print unordered_set<>, which in turn uses operator<< to format the
elements in the `unordered_set`, but we intend to remove this formatter
in future, as the last step of #13245 .

so, to enable Boost.test to print out lhs and rhs when `BOOST_REQUIRE_EQUAL`
check fails, we are adding `boost_test_print_type()` for
`unordered_set<fruit>`. the helper function uses {fmt} to print the
`unordered_set<>`, so we are adding a fmt::formatter for `fruit`, the
operator<< for this type is dropped, as it is not used anymore.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17813
2024-03-15 09:40:22 +02:00
Benny Halevy
530d270828 api: /storage_service/tablets/balancing: fix incorrect operation summary
It was probably copy-pasted from /storage_service/tablets/move

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17811
2024-03-14 22:52:57 +01:00
Tomasz Grabiec
8c5d088928 Merge 'Drop tablets of dropped views and indices' from Benny Halevy
This series adds notification before dropping views and indices so that the
tablet_allocator can generate mutations to respectively drop all tablets associated with them from system.tablets.

Additional unit tests were added for these cases.

Note that one case is not yet tested: where a table is allowed to be dropped while having views that depend on it, when it is dropped from the alternator path.

This is tested indirectly by testing dropping a table with live secondary index as it follows the same notification path as views in this series.

Fixes #17627

Closes scylladb/scylladb#17773

* github.com:scylladb/scylladb:
  migration_manager: notify before_drop_column_family when dropping indices
  schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices
  migration_manager: notify before_drop_column_family before dropping views
  cql-pytest: test_tablets: add test_tablets_are_dropped_when_dropping_table
  tablet_allocator: on_before_drop_column_family: remove unused result variable
2024-03-14 22:52:29 +01:00
Raphael S. Carvalho
c46c2d436f sstables: Reduce cost for loading sstables with tablets
Loader was changed to quickly determine ownership after consuming
sharding metadata only. If it's not available, it falls back to
reading first and last keys from summary. The fallback is only there
for backward compatibility and it costs a lot more as we don't
skip to the end where keys are located in summary.

With tablets, sharding metadata is only first and last keys so
we can do it without sharder. So loader will be able to use it
instead of looking up keys in summary.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17805
2024-03-14 21:06:35 +01:00
Pavel Emelyanov
8ffb5f27c7 topology_coordinator: Clear tablet transition session after streaming
When jumping from streaming stage into cleanup_target, session must also
be cleared as pending replica may still process some incoming mutations
blocked in the pipeline. Deleting session prior to executing barrier
makes sure those mutations will not be applied.

fixes: #17682

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17800
2024-03-14 20:35:00 +01:00
Pavel Emelyanov
6a77f36519 doc: Add tablets migration state diagram
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17790
2024-03-14 20:29:21 +01:00
Benny Halevy
5bfca73b30 migration_manager: notify before_drop_column_family when dropping indices
Fixes #17627

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 20:19:12 +02:00
Benny Halevy
9cf6a2e510 schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices
When dropping indices, we don't need to go through
`create_view_for_index` in order to drop the index.
That actually creates a new schema for this view
which is used just for its metadata for generating mutations
dropping it.

Instead, use `find_schema` to lookup the current schema
for the dropped index.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 20:19:11 +02:00
Benny Halevy
358e92e645 migration_manager: notify before_drop_column_family before dropping views
Call the before_drop_column_family notifications
before dropping the views to allow the tablet_allocator
to delete the view's tablets.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 20:14:56 +02:00
Avi Kivity
5e28bf9b5c Merge 'Do not try to balance tablets on nodes which are known to be down' from Pavel Emelyanov
Tablet transition would get stuck anyway for such nodes, so it's not worth trying
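
A rough sketch of filtering migration targets through a skiplist (Python
model). `pick_target` is hypothetical, and choosing the least loaded
remaining node is an assumption made only to keep the example concrete; the
real load balancer is more involved:

```python
def pick_target(node_load, skiplist):
    """Pick the least loaded node that is not in the skiplist.

    `node_load` maps a node name to its tablet count; `skiplist` holds nodes
    known to be down. Returns None when no candidate is left.
    """
    candidates = {n: load for n, load in node_load.items() if n not in skiplist}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```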

refs: #16372 (not fixes, because there's also repair transitions with same problem)

Closes scylladb/scylladb#17796

* github.com:scylladb/scylladb:
  topology_coordinator: Skip dead nodes when balancing tablets
  test: Add test for load_balancer skiplist
  tablet_allocator: Add skiplist to load_balancer
2024-03-14 18:47:51 +02:00
Avi Kivity
0f188f2d9f Merge 'tools/scylla-nodetool: implement the status command' from Botond Dénes
The status command issues an extensive number of requests to the server. To be able to handle this more easily, the rest api mock server is refactored extensively to be more flexible, accepting expected requests out-of-order. While at it, the rest api mock server also moves away from a deprecated `aiohttp` feature: providing a custom router argument to the `aiohttp` app. This forces us to pre-register all API endpoints that any test currently uses, although due to some templating support, this is not as bad as it sounds. Still, this is an annoyance, but at this point we have implemented almost all commands, so this won't be much of a problem going forward.
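
Out-of-order matching of expected requests can be sketched roughly as
follows. `ExpectedRequests` is a hypothetical class, not the one in
rest_api_mock.py; it also strips a trailing `/` from paths, as one of the
commits in this series does:

```python
class ExpectedRequests:
    """Hypothetical tracker of requests a test expects the server to receive."""

    def __init__(self, expected):
        # Each expected request is a (method, path) tuple.
        self._pending = [(m, self._normalize(p)) for m, p in expected]

    @staticmethod
    def _normalize(path):
        # Drop a trailing slash so "/a/" and "/a" compare equal.
        return path.rstrip("/") or "/"

    def match(self, method, path):
        """Consume a matching expected request, regardless of its position."""
        request = (method, self._normalize(path))
        if request in self._pending:
            self._pending.remove(request)
            return True
        return False

    def unmatched(self):
        """Expected requests that were never received."""
        return list(self._pending)
```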

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#17547

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the status command
  test/nodetool: rest_api_mock.py: match requests out-of-order
  test/nodetool: rest_api_mock.py: remove trailing / from request paths
  test/nodetool: rest_api_mock.py: use static routes
  test/nodetool: check only non-exhausted requests
  tools/scylla-nodetool: repair: set the jobThreads request parameter
2024-03-14 18:42:54 +02:00
Kamil Braun
5ef47c42b3 Merge 'remove_rpc_client_with_ignored_topology: recreate rpc client earlier' from Petr Gusev
It's too late to call `remove_rpc_client_with_ignored_topology` on messaging service when a node becomes normal. Data plane requests can be routed to the node much earlier, at least when topology switches to `write_both_read_new`. The `remove_rpc_client_with_ignored_topology` function shuts down sockets and causes such requests to time out.

In this PR we move the `remove_rpc_client_with_ignored_topology` call to the earliest point possible when a node first appears in `token_metadata.topology`.

From the topology coordinator perspective this happens when a joining node moves to `node_state::bootstrapping` and the topology moves to `transition_state::join_group0`. In `sync_raft_topology_nodes` the node should be contained in transition_nodes. The successful `wait_for_ip` before entering `transition_state::join_group0` ensures that update_topology should find a node's IP and put it into the topology. The barrier in `commit_cdc_generation` will ensure that all nodes in the cluster are using the proper connection parameters.

Only outgoing connections are tracked by `remove_rpc_client_with_ignored_topology`, those created by the current node. This means we need to call `remove_rpc_client_with_ignored_topology` on each node of the cluster.

fixes scylladb/scylladb#17445

Closes scylladb/scylladb#17757

* github.com:scylladb/scylladb:
  test_remove_rpc_client_with_pending_requests: add a regression test
  remove_rpc_client_with_ignored_topology: call it earlier
  storage_service: decouple remove_rpc_client_with_ignored_topology from notify_joined
2024-03-14 17:20:59 +01:00
Yaniv Kaul
a2ac80340f Typo: pint -> print
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#17804
2024-03-14 15:50:35 +02:00
Wojciech Mitros
59d5bfa742 mv: fail base writes instead of dropping view updates when overloaded
Since 4c767c379c we can reach a situation
where we know that we have admitted too many expensive view update
operations and the mechanism of dropping the following view updates
can be triggered in a wider range of scenarios. Ideally, we would
want to fail whole requests on the coordinator level, but for now, we
change the behavior to failing just the base writes. This allows us
to avoid creating inconsistencies between base replicas and views
at the cost of introducing inconsistencies between different base
replicas. This, however, can be fixed by repair, in contrast to
base-view inconsistencies which we don't have a good method of fixing.

Fixes #17795

Closes scylladb/scylladb#17777
2024-03-14 15:11:45 +02:00
Aleksandra Martyniuk
43ef6e6ab9 test: fix regular compaction tasks check
Since 6b87778 regular compaction tasks are removed from task manager
immediately after they are finished.

test_regular_compaction_task lists compaction tasks and then requests
their statuses. Only one regular compaction task is guaranteed to still
be running at that time, the rest of them may finish before their status
is requested and so it will no longer be in task manager, causing the test
to fail.

Fix statuses check to consider the possibility of a regular compaction
task being removed from task manager.

Fixes: #17776.

Closes scylladb/scylladb#17784
2024-03-14 14:40:18 +02:00
Piotr Smaron
ad2d039e3d db: move all group 0 tables to schema commitlog
This is to have durability for the group0 tables.
But also because I need it specifically to make
`system.topology` & `system_schema.scylla_keyspaces`
mutations under a single raft command in https://github.com/scylladb/scylladb/pull/16723

Fixes: #15596

Closes scylladb/scylladb#17783
2024-03-14 13:33:30 +01:00
Piotr Dulikowski
2d9e78b09a gossiper: failure detector: don't handle directly removed live endpoints
Commit 0665d9c346 changed the gossiper
failure detector in the following way: when live endpoints change
and per-node failure detectors finish their loops, the main failure
detector calls gossiper::convict for those nodes which were alive when
the current iteration of the main FD started but now are not. This was
changed in order to make sure that nodes are marked as down, because
some other code in gossiper could concurrently remove nodes from
the live node lists without marking them properly.

This was committed around 3 years ago and the situation changed:

- After 75d1dd3a76
  the `endpoint_state::_is_alive` field was removed and liveness
  of a node is solely determined by its presence
  in the `gossiper::_live_endpoints` field.
- Currently, all gossiper code which modifies `_live_endpoints`
  takes care to trigger relevant callback. The only function which
  modifies the field but does not trigger notifications
  is `gossiper::evict_from_membership`, but it is either called
  after `gossiper::remove_endpoint` which triggers callbacks
  by itself, or when a node is already dead and there is no need
  to trigger callbacks.

So, it looks like the reasons it was introduced for are not relevant
anymore. What's more important though is that it is involved in a bug
described in scylladb/scylladb#17515. In short, the following sequence
of events may happen:

1. Failure detector for some remote node X decides that it was dead
   long enough and `convict`s it, causing live endpoints to be updated.
2. The gossiper main loop sends a successful echo to X and *decides*
  to mark it as alive.
3. At the same time, failure detectors for all nodes other than X finish
  and main failure detector continues; it notices that node X is
  not alive (because it was convicted in point 1.) and *decides*
  to convict it.
4. Actions planned in 2 and 3 run one after another, i.e. node is first
  marked as alive and then immediately as dead.

This causes `on_alive` callbacks to run first and then `on_dead`. The
second one is problematic as it closes RPC connections to node X - in
particular, if X is in the process of replacing another node with the
same IP then it may cause the replace operation to fail.

In order to simplify the code and fix the bug - remove the piece
of logic in question.

Fixes: scylladb/scylladb#17515

Closes scylladb/scylladb#17754
2024-03-14 13:29:17 +01:00
Botond Dénes
d6103dc1b6 tools/scylla-nodetool: snapshot: handle ks.tbl positional args correctly
Nodetool currently assumes that positional arguments are only keyspaces.
ks.tbl pairs are only provided when --kt-list or friends are used. This
is not the case however. So check positional args too, and if they look
like ks.tbl, handle them accordingly.

While at it, also make sure that alternator keyspace and table names
are handled correctly.
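
The positional-argument handling described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the C++ used by scylla-nodetool; the function name `split_snapshot_args` and the "contains a dot" rule are assumptions made for the example, and the sketch does not attempt the alternator-specific handling mentioned in the message.

```python
def split_snapshot_args(args):
    """Split positional args into plain keyspaces and (keyspace, table) pairs.

    Hypothetical helper: an argument of the form "ks.tbl" is treated as a
    keyspace/table pair, anything else is treated as a keyspace name.
    """
    keyspaces, pairs = [], []
    for arg in args:
        ks, dot, tbl = arg.partition(".")
        if dot and ks and tbl:
            pairs.append((ks, tbl))
        else:
            keyspaces.append(arg)
    return keyspaces, pairs
```

For example, `split_snapshot_args(["ks1", "ks2.tbl"])` separates `ks1` as a keyspace from the pair `("ks2", "tbl")`.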

Closes scylladb/scylladb#17480
2024-03-14 13:42:23 +02:00
Avi Kivity
dd76e1c834 Merge 'Simplify error_injection::inject_with_handler()' from Pavel Emelyanov
The method in question can have a shorter name that matches all other injections in this class, and can be non-template

Closes scylladb/scylladb#17734

* github.com:scylladb/scylladb:
  error_injection: De-template inject() with handler
  error_injection: Overload inject() instead of inject_with_handler()
2024-03-14 13:37:54 +02:00
Petr Gusev
2783985bb2 test_remove_rpc_client_with_pending_requests: add a regression test
This test reproduces the problem from scylladb/scylladb#17445.
It fails quite reliably without the fix from the previous
commit.

The test just bootstraps a new node while bombarding the cluster
with read requests.
2024-03-14 15:17:34 +04:00
Petr Gusev
398e14d6d0 remove_rpc_client_with_ignored_topology: call it earlier
In this commit we move the remove_rpc_client_with_ignored_topology
call to the earliest point possible - when a node first appears
in token_metadata.topology.

From the topology coordinator perspective this happens when a joining
node moves to node_state::bootstrapping and the topology moves to
transition_state::join_group0. In sync_raft_topology_nodes
the node should be contained in transition_nodes. The successful
wait_for_ip before entering transition_state::join_group0 ensures
that update_topology should find a node's IP and put it into the topology.
The barrier in commit_cdc_generation will ensure that all nodes
in the cluster are using the proper connection parameters.

Only outgoing connections are tracked by remove_rpc_client_with_ignored_topology,
those created by the current node. This means we need to call
remove_rpc_client_with_ignored_topology on each node of the cluster.

fixes scylladb/scylladb#17445
2024-03-14 15:10:09 +04:00
Petr Gusev
1b9f21314f storage_service: decouple remove_rpc_client_with_ignored_topology from
notify_joined

It's too late to call remove_rpc_client_with_ignored_topology on
messaging service when a node becomes normal. Data
plane requests can be routed to the node much earlier,
at least when topology switches to write_both_read_new.
The remove_rpc_client_with_ignored_topology function
shuts down sockets and causes such requests to time out.

We intend to call remove_rpc_client_with_ignored_topology
as soon as a node becomes part of token_metadata topology.
In this preparatory commit we refactor
storage_service::notify_joined. We remove the
remove_rpc_client_with_ignored_topology call from it and
call it separately from the two call sites of notify_joined.
2024-03-14 15:10:09 +04:00
Kefu Chai
ce17841860 tools/scylla-nodetool: print bpo::options_description with fmt::streamed
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, since boost::program_options::options_description is
defined by boost.program_options library, and it only provides the
operator<< overload. we're inclined not to specialize `fmt::formatter`
for it at this moment, because

* this class is not defined by the scylla project. we would have to
  find a home for this formatter.
* we are not likely to reuse the formatter in multiple places

so, in this change we just print it using `fmt::streamed`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17791
2024-03-14 10:44:32 +02:00
Pavel Emelyanov
33d258528e topology_coordinator: Skip dead nodes when balancing tablets
The coordinator can find out which nodes are marked as DOWN, thus when
calling tablets balancer it can feed it a skiplist

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-14 10:51:11 +03:00
Pavel Emelyanov
ee55e8442a test: Add test for load_balancer skiplist
The test is inspired by the test_load_balancing_with_empty_node one and
verifies that when a node is skiplisted, balancer doesn't put load on it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-14 10:50:21 +03:00
Pavel Emelyanov
b4dd732dab tablet_allocator: Add skiplist to load_balancer
Currently load balancer skips nodes only based on its "administrative"
state, i.e. whether it's drained/decommissioned/removed/etc. There's no
way to exclude any node from balancing decision based on anything else.
This patch adds this ability by adding skiplist argument to
balance_tablets() method. When a node is in it, it will not be
considered, as if it was removenode-d.
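
The effect of a skiplist on a balancing decision can be sketched as below. This is only an illustration in Python; the real load_balancer is C++ and far more involved, and `pick_target` with its "least loaded node" rule is a hypothetical stand-in, not the balance_tablets() logic.

```python
def pick_target(load_by_node, skiplist=frozenset()):
    """Return the least-loaded node that is not skiplisted, or None.

    load_by_node maps a node name to its current load (e.g. tablet count).
    Nodes in skiplist are ignored entirely, as if they were not in the cluster.
    """
    candidates = {n: load for n, load in load_by_node.items() if n not in skiplist}
    if not candidates:
        return None
    # Break ties by node name so the result is deterministic.
    return min(candidates, key=lambda n: (candidates[n], n))
```

With loads `{"a": 3, "b": 1, "c": 2}` and `"b"` skiplisted, the sketch picks `"c"` even though `"b"` carries less load.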

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-14 10:47:31 +03:00
Kefu Chai
926fe29ebd db: commitlog: add fmt::formatter for commitlog types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* db::commitlog::segment::cf_mark
* db::commitlog::segment_manager::named_file
* db::commitlog::segment_manager::dispose_mode
* db::commitlog::segment_manager::byte_flow<T>

please note, the formatter of `db::commitlog::segment` is not
included in this commit, as we are formatting it in the inline
definition of this class. so we cannot define the specialization
of `fmt::formatter` for this class before its callers -- we'd
either use `format_as()` provided by {fmt} v10, or use `fmt::streamed`.
either way, it's different from the theme of this commit, and we
will handle it in a separate commit.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17792
2024-03-14 09:28:12 +02:00
Botond Dénes
20d5c536b5 tools/scylla-nodetool: implement the status command
Contrary to Origin, the single-token case is not discriminated in the
native implementation, for two reasons:
* ScyllaDB doesn't ever run with a single token, it is even moving away
  from vnodes.
* Origin implemented the logic to detect single-token with a mistake: it
  compares the number of tokens to the number of DCs, not the number of
  nodes.

Another difference is that the native implementation doesn't request
ownership information when a keyspace argument was not provided -- it is
not printed anyway.
2024-03-14 03:27:04 -04:00
Botond Dénes
2d4f4cfad4 test/nodetool: rest_api_mock.py: match requests out-of-order
In the previous patch, we made matching requests to different endpoints
be matched out-of-order. In this patch we go one step further and make
matching requests to the same endpoint match out-of-order too.
With this, tests can register the expected requests in any order, not in
the same order as the nodetool-under-test is expected to send them. This
makes testing more flexible. Also, how requests are ordered is not
interesting from the correctness' POV anyway.
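
A rough sketch of order-independent matching, assuming expected requests are identified by method, path and query parameters. The class and method names here are hypothetical and are not taken from rest_api_mock.py.

```python
from collections import Counter


def _key(method, path, params):
    # Sort the params so that dict ordering does not affect matching.
    return (method, path, tuple(sorted((params or {}).items())))


class ExpectedRequests:
    """Multiset of expected requests; arrival order is irrelevant."""

    def __init__(self):
        self._pending = Counter()

    def expect(self, method, path, params=None):
        self._pending[_key(method, path, params)] += 1

    def match(self, method, path, params=None):
        """Consume one matching expectation; return False if there is none."""
        key = _key(method, path, params)
        if self._pending[key] == 0:
            return False
        self._pending[key] -= 1
        return True

    def unconsumed(self):
        return sorted(k for k, n in self._pending.items() if n > 0)
```

Two expectations on the same path with different parameters can then be matched in either order, and `unconsumed()` reports whatever was never requested.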
2024-03-14 03:27:04 -04:00
Botond Dénes
09a27f49ea test/nodetool: rest_api_mock.py: remove trailing / from request paths
The legacy nodetool likes to append an "/" to the request paths every
now and then, but not consistently. Unfortunately, request path matching
in the mock rest server and in aiohttp is quite sensitive to this
currently. Reduce friction by removing trailing "/" from paths in the
mock api, allowing paths to match each other even if one has a trailing
"/" but the other doesn't.
Unfortunately there is nothing we can do about the aiohttp part, so some
API endpoints have to be registered with a trailing "/".
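
The path normalization on the mock side might look like the following sketch. `normalize_path` is a hypothetical name, and keeping the bare root path `/` intact is an assumption of this example.

```python
def normalize_path(path):
    """Drop trailing slashes so "/foo/bar/" and "/foo/bar" compare equal."""
    stripped = path.rstrip("/")
    # Keep the root path as "/" instead of turning it into an empty string.
    return stripped or "/"
```

Comparing `normalize_path(expected)` with `normalize_path(received)` then makes the match insensitive to a trailing "/".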
2024-03-14 03:27:04 -04:00
Botond Dénes
5659f23b2a test/nodetool: rest_api_mock.py: use static routes
The mock server currently provides its own router to the aiohttp.web
app. The ability to provide custom routers, however, is deprecated and
can be removed at any point. So refactor the mock server to use the
built-in router. This requires some changes, because the built-in router
does not allow adding/removing routes once the server starts. However
the mock server only learns of the used routes when the tests run.
This unfortunately means that we have to statically register all
possible routes the tests will use. Fortunately, aiohttp has variable
route support (templated routes) and with this, we can get away with
just 9 statically registered routes, which is not too bad.

A (desired) side-effect of this refactoring is that now requests to
different routes do not have to arrive in order. This constraint of the
previous implementation proved to be not useful, and even made writing
certain tests awkward.
2024-03-14 03:27:04 -04:00
Botond Dénes
061bd89957 test/nodetool: check only non-exhausted requests
Refactor how the tests check for expected requests which were never
invoked. At the end of every test, the nodetool fixture requests all
unconsumed expected requests from the rest_api_mock.py and checks that
there is none. This mechanism has some interaction with requests which
have a "multiple" set: rest_api_mock.py allows registering requests with
different "multiple" requirements -- how many times a request is
expected to be invoked:
* ANY: [0, +inf)
* ONE: 1
* MULTIPLE: [1, +inf)

Requests are stored in a stack. When a request arrives, we pop off
requests from the top until we find a perfect match. We pop off
requests, iff: multiple == ANY || multiple == MULTIPLE and was hit at
least once.
This works as long as we don't have a multiple=ANY request at the
bottom of the stack which is never invoked. Or a multiple=MULTIPLE one.
This will get worse once we refactor requests to be not stored in a
stack.

So in this patch, we filter requests when collecting unexhausted ones,
dropping those which would be qualified to be popped from the stack.
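
The filtering rule can be sketched as follows, using the three "multiple" kinds listed above. The type and function names are hypothetical and the hit counter is an assumption about how invocations are tracked; this is not the rest_api_mock.py code.

```python
from dataclasses import dataclass
from enum import Enum


class Multiple(Enum):
    ANY = "any"            # expected [0, +inf) times
    ONE = "one"            # expected exactly once
    MULTIPLE = "multiple"  # expected [1, +inf) times


@dataclass
class ExpectedRequest:
    path: str
    multiple: Multiple = Multiple.ONE
    hits: int = 0

    def exhausted(self):
        """True if the expectation is satisfied and may be dropped."""
        if self.multiple is Multiple.ANY:
            return True  # zero hits is acceptable
        return self.hits >= 1


def unexhausted(requests):
    """Return only the expectations that were never satisfied."""
    return [r for r in requests if not r.exhausted()]
```

A never-invoked ANY request, or a MULTIPLE request that was hit at least once, no longer shows up as a leftover at the end of a test.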
2024-03-14 03:27:04 -04:00
Botond Dénes
be5a18c07d tools/scylla-nodetool: repair: set the jobThreads request parameter
Although ScyllaDB ignores this request parameter, the Java nodetool
sets it, so it is better to have the native one do the same for
symmetry. It makes testing easier.
Discovered with the more strict request matching introduced in the next
patches.
2024-03-14 03:26:13 -04:00
Benny Halevy
b4245bf46e cql-pytest: test_tablets: add test_tablets_are_dropped_when_dropping_table
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 09:01:30 +02:00
Asias He
9d41fb9bcd repair: Add hosts and ignore_nodes option support for tablet repair
It is not supported currently.

If a user passes the option, the request will be rejected with:

    The hosts option is not supported for tablet repair
    The ignore_nodes option is not supported for tablet repair

This option is useful to select nodes to repair.

Fixes: #17742

Tests: repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes
       repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes_errors
       repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_dc_host

Closes scylladb/scylladb#17767
2024-03-14 08:40:30 +02:00
Benny Halevy
b73aaee5e4 tablet_allocator: on_before_drop_column_family: remove unused result variable
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-14 08:34:02 +02:00
Avi Kivity
c1d8a1dda5 Merge 'Fix false-positive errors in scrub validate-mode' from Botond Dénes
The new MX-native validator, which validates the index in tandem with the data file, was discovered to print false-positive errors, related to range-tombstones and promoted-index positions.
This series fixes that. But first, it refactors the scrub-related tests. These are currently dominated by boiler-plate code. They are hard to read and hard to write. In the first half of the series, a new `scrub_test` is introduced, which moves all the boiler-plate to a central place, allowing the tests to focus on just the aspect of scrub that is tested.
Then, all the found bugs in validate are fixed and finally a new test, checking validate with valid sstable is introduced.

Fixes: #16326

Closes scylladb/scylladb#16327

* github.com:scylladb/scylladb:
  test/boost/sstable_compaction_test: add validation test with valid sstable
  sstablex/mx/reader: validate(): print trace message when finishing the PI block
  sstablex/mx/reader: validate(): make index-data PI position check message consistent
  sstablex/mx/reader: validate(): only load the next PI block if current is exhausted
  sstablex/mx/reader: validate(): reset the current PI block on partition-start
  sstablex/mx/reader: validate(): consume_range_tombstone(): check for finished clustering blocked
  sstablex/mx/reader: validate(): fix validator for range tombstone end bounds
  test/boost/sstable_compaction_test: drop write_corrupt_sstable() helper
  test/boost/sstable_compaction_test: fix indentation
  test/boost/sstable_compaction_test: use test_scrub_framework in test_scrub_quarantine_mode_test
  test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_segregate_mode_test
  test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_skip_mode_test
  test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_validate_mode_test
  test/boost/sstable_compaction_test: introduce scrub_test_framework
  test/lib/random_schema: add uncompatible_timestamp_generator()
2024-03-13 20:51:30 +02:00
Kefu Chai
15bea069a9 docs: use less slangy language
this is a follow-up change of 1519904fb9, to incorporate the comment
from Anna Stuchlik.

Signed-off-by: Anna Stuchlik <anna.stuchlik@scylladb.com>

Closes scylladb/scylladb#17671
2024-03-13 13:33:37 +02:00
Avi Kivity
4db4b2279c Merge 'tools/scylla-nodetool: implement the last batch of commands' from Botond Dénes
This PR implements the following new nodetool commands:
* netstats
* tablehistograms/cfhistograms
* proxyhistograms

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#17651

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the proxyhistograms command
  tools/scylla-nodetool: implement the tableshistograms command
  tools/scylla-nodetool: introduce buffer_samples
  utils/estimated_histogram: estimated_histogram: add constructor taking buckets
  tools/scylla-nodetool: implement the netstats command
  tools/scylla-nodetool: add correct units to file_size_printer
2024-03-13 12:46:11 +02:00
Avi Kivity
e120ba3514 sstables: partition_index_cache: evict entries within a page gently
When the partition_index_cache is evicted, we yield for preemption between
pages, but not within a page.

Commit 3b2890e1db ("sstables: Switch index_list to chunked_vector
to avoid large allocations") recognized that index pages can be large enough
to overflow a 128k alignment block (this was before the index cache and
index entries were not stored in LSA then). However, it did not go as far as
to gently free individual entries; either the problem was not recognized
or wasn't as bad.

As the referenced issue shows, a fairly large stall can happen when freeing
the page. The workload had a large number of tombstones, so index selectivity
was poor.

Fix by evicting individual rows gently.

The fix ignores the case where rows are still referenced: it is unlikely
that all index pages will be referenced, and in any case skipping over
a referenced page takes an insignificant amount of time, compared to freeing
a page.

Fixes #17605

Closes scylladb/scylladb#17606
2024-03-13 10:44:37 +01:00
Marcin Maliszkiewicz
7b60752e47 test: fix cql connection problem in test_auth_raft_command_split
This is a speculative fix as the problem is observed only on CI.
When run_async is called right after driver_connect and get_cql
it fails with ConnectionException('Host has been marked down or
removed').

If the approach proves to be successful we can start to deprecate
base get_cql in favor of get_ready_cql. It's better to have robust
testing helper libraries than try to take care of it in every test
case separately.

Fixes #17713

Closes scylladb/scylladb#17772
2024-03-13 10:36:51 +01:00
Pavel Emelyanov
4d83a8c12c topology_coordinator: Mark constant class methods with const
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17756
2024-03-13 10:23:39 +02:00
Pavel Emelyanov
2e982df898 test/tablets: Generalize repair history loading
Two repair test cases verify that repair generated enough rows in the
history table. Both use identical code for that, worth generalizing

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17761
2024-03-13 10:22:57 +02:00
Pavel Emelyanov
88a40b0dfa uuid: UUID_gen::get_UUID src argument is const pointer
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17762
2024-03-13 10:21:25 +02:00
Botond Dénes
53e3325845 Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* mutation_partition_v2::printer
* frozen_mutation::printer
* mutation

their operator<<:s are dropped.

Refs #13245

Closes scylladb/scylladb#17769

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for mutation
  mutation: add fmt::formatter for frozen_mutation::printer
  mutation: add fmt::formatter for mutation_partition_v2::printer
2024-03-13 10:13:09 +02:00
Pavel Emelyanov
488404e080 gms: Remove unused i_failure_detection_event_listener
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17765
2024-03-13 09:33:56 +02:00
Kefu Chai
fb4f48b4ed schema: add fmt::formatter for schema
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* column_definition
* column_mapping
* ordinal_column_id
* raw_view_info
* schema
* view_ptr

their operator<<:s are dropped. but operator<< for schema is preserved,
as we are still printing `seastar::lw_shared_ptr<const schema>` with
our homebrew generic formatter for `seastar::lw_shared_ptr<>`, which
uses operator<< to print the pointee.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17768
2024-03-13 09:29:00 +02:00
Kefu Chai
85c4034495 .git: skip redis/lolwut.cc when scanning spelling errors
codespell reports "Nees" should be "Needs" but "Nees" is the last
name of Georg Nees. so it is not a misspelling and should not be
fixed.

since the purpose of lolwut.cc is to display Redis version and
print a generative computer art. the one included by our version
was created by Georg Nees. since the LOLWUT command does not contain
business logic connected with scylladb, we don't lose a lot if we skip
it when scanning for spelling errors. so, in this change, let's
skip it, this should silence one more warning from the github
codespell workflow.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17770
2024-03-13 09:25:58 +02:00
Michał Chojnowski
75864e18a2 open-coredump.sh: respect http redirects
downloads.scylladb.com recently started redirecting from http to https
(via `301 Moved Permanently`).
This broke package downloading in open-coredump.sh.

To fix this, we have to instruct curl to follow redirects.

Closes scylladb/scylladb#17759
2024-03-13 08:57:04 +02:00
Pavel Emelyanov
d90db016bf treewide: Use partition_slice::is_reversed()
Continuation of cc56a971e8, more noisy places detected

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17763
2024-03-13 08:52:46 +02:00
Botond Dénes
a329cc34b7 tools/scylla-nodetool: implement the proxyhistograms command 2024-03-13 02:06:30 -04:00
Botond Dénes
a52eddc9c1 tools/scylla-nodetool: implement the tableshistograms command 2024-03-13 02:06:30 -04:00
Botond Dénes
151fb5a53b tools/scylla-nodetool: introduce buffer_samples
Based on Origin's org.apache.cassandra.tools.NodeProbe.BufferSamples.
To be used to compute quantiles of time latency histogram samples.
2024-03-13 02:06:30 -04:00
Botond Dénes
47ac7d70e4 utils/estimated_histogram: estimated_histogram: add constructor taking buckets
And bucket offsets. Allows constructing the histogram back from a json
format.
2024-03-13 02:06:30 -04:00
Botond Dénes
006bc84761 tools/scylla-nodetool: implement the netstats command 2024-03-13 02:06:10 -04:00
Botond Dénes
ec7e1a2e92 tools/scylla-nodetool: add correct units to file_size_printer
When printing human-readable file-sizes, the Java nodetool always uses
base-2 steps (1024) to arrive at the human-readable size, but it uses
the base-10 units (MB) and base-2 units (MiB) interchangeably.
Adapt file_size_printer to support both. Add a flag to control which is
used.
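
A minimal sketch of the behaviour described above: the value is always scaled in steps of 1024, and a flag only selects which unit labels are printed. The function name, the flag name and the two-decimal output format are assumptions of this example, written in Python rather than the tool's C++, and do not reproduce nodetool's exact output.

```python
def format_file_size(size, use_mib=False):
    """Format a byte count, always dividing by 1024 between units.

    use_mib=False prints base-10 style labels (KB, MB, ...),
    use_mib=True prints base-2 labels (KiB, MiB, ...).
    """
    if use_mib:
        units = ["bytes", "KiB", "MiB", "GiB", "TiB"]
    else:
        units = ["bytes", "KB", "MB", "GB", "TB"]
    value = float(size)
    i = 0
    while value >= 1024 and i < len(units) - 1:
        value /= 1024
        i += 1
    if i == 0:
        return f"{size} {units[0]}"
    return f"{value:.2f} {units[i]}"
```

Both label sets yield the same number for the same input; only the suffix differs.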
2024-03-13 02:05:22 -04:00
Kefu Chai
2d319fa789 mutation: add fmt::formatter for mutation
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for mutation. but its operator<<
is preserved, as we are still using our homebrew generic formatter
for printing `std::vector<mutation>`, and this formatter is using
operator<< for printing the elements in vector.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-13 11:07:42 +08:00
Kefu Chai
acd14f12f0 mutation: add fmt::formatter for frozen_mutation::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for frozen_mutation::printer,
and drop its operator.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-13 10:47:22 +08:00
Kefu Chai
94d25e02ad mutation: add fmt::formatter for mutation_partition_v2::printer
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for mutation_partition_v2::printer, and
drop its operator<<

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-13 10:47:09 +08:00
Asias He
f74053af40 repair: Add dc option support for tablet repair
This patch adds the dc option support for tablet repair. The management
tool can use this option to select nodes in specific data centers to run
repair.

Fixes: #17550
Tests: repair_additional_test.py::TestRepairAdditional::test_repair_option_dc

Closes scylladb/scylladb#17571
2024-03-12 22:19:50 +02:00
Ferenc Szili
1da5b3033e scylla-nodetool: check for missing keyspace argument on describering
Calling scylla-nodetool with option describering and omitting the keyspace
name argument results in a boost exception with the following error message:

error running operation: boost::wrapexcept<boost::bad_any_cast> (boost::bad_any_cast: failed conversion using boost::any_cast)

This change checks for the missing keyspace and outputs a more sensible
error message:

error processing arguments: keyspace must be specified

Closes scylladb/scylladb#17741
2024-03-12 21:19:11 +02:00
Avi Kivity
f410038296 Merge 'Use do_with_cql_env_thread() helper in storage proxy test' from Pavel Emelyanov
Just a cleanup -- replace do_with_cql_env + async with do_with_cql_env_thread

Closes scylladb/scylladb#17758

* github.com:scylladb/scylladb:
  test/storage_proxy: Restore indentation after previous patch
  test/storage_proxy: Use do_with_cql_env_thread()
2024-03-12 20:23:40 +02:00
Pavel Emelyanov
34477ad98e test/storage_proxy: Restore indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-12 19:10:44 +03:00
Pavel Emelyanov
fd112446c2 test/storage_proxy: Use do_with_cql_env_thread()
One of the test cases explicitly wraps itself into async, but there's a
convenience helper for that already.

Indentation is deliberately left broken

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-12 19:10:33 +03:00
Botond Dénes
2335f42b2b test/boost/sstable_compaction_test: add validation test with valid sstable
Add a positive test, as it turns out we had some false-positive
validation bugs in the validator and we need a regression test for this.
2024-03-12 11:05:18 -04:00
Botond Dénes
a19a2d76c9 sstablex/mx/reader: validate(): print trace message when finishing the PI block 2024-03-12 11:05:18 -04:00
Botond Dénes
677be168c4 sstablex/mx/reader: validate(): make index-data PI position check message consistent
The message says "index-data" but when printing the position, the data
position is printed first, causing confusion. Fix this and while at it,
also print the position of the partition start.
2024-03-12 11:05:18 -04:00
Botond Dénes
5bff7c40d3 sstablex/mx/reader: validate(): only load the next PI block if current is exhausted
The validate() consumes the content of partitions in a consume-loop.
Every time the consumer asks for a "break", the next PI block is loaded
and set on the validator, so it can validate that further clustering
elements are indeed from this block.
This loop assumed the consumer would only request interruption when the
current clustering block is finished. This is wrong, the consumer can
also request interruption when yielding is needed. When this is the
case, the next PI block doesn't have to be loaded yet, the current one
is not exhausted yet. Check this condition, before loading the next PI
block, to prevent false positive errors, due to mismatched PI block
and clustering elements from the sstable.
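
The condition described above can be sketched in Python (the reader itself is C++). This is a toy model, not the reader's code: `Break`, `blocks_loaded` and `check_exhausted` are hypothetical names, and a "break" stands for the consumer interrupting the consume-loop.

```python
from dataclasses import dataclass


@dataclass
class Break:
    """Why the (hypothetical) consumer interrupted the consume-loop."""
    block_finished: bool  # True: current PI block exhausted; False: only yielding


def blocks_loaded(breaks, check_exhausted):
    """Count how many times the loop advances to the next PI block."""
    loaded = 0
    for b in breaks:
        # The old loop advanced on every break; the fixed one checks first.
        if not check_exhausted or b.block_finished:
            loaded += 1
    return loaded


breaks = [Break(False), Break(True), Break(False), Break(True)]
old = blocks_loaded(breaks, check_exhausted=False)
new = blocks_loaded(breaks, check_exhausted=True)
```

With two yield-only interruptions in the sequence, the unchecked loop advances past blocks that are not exhausted yet, which is the source of the mismatch.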
2024-03-12 11:05:18 -04:00
Botond Dénes
e073df1dbb sstables/mx/reader: validate(): reset the current PI block on partition-start
It is possible that the next partition has no PI and thus there won't be
a new PI block to overwrite the old one. This will result in
false-positive messages about rows being outside of the finished PI
block.
2024-03-12 11:05:18 -04:00
Botond Dénes
2737899c21 sstables/mx/reader: validate(): consume_range_tombstone(): check for finished clustering block
Promoted index entries can be written on any clustering element,
including range tombstones. So the validating consumer also has to check
whether the current expected clustering block is finished, when
consuming a range tombstone. If it is, consumption has to be
interrupted, so that the outer-loop can load up the next promoted index
block, before moving on to the next clustering element.
2024-03-12 11:05:18 -04:00
Botond Dénes
f46b458f0d sstables/mx/reader: validate(): fix validator for range tombstone end bounds
For range tombstone end-bounds, the validate_fragment_order() should be
passed a null tombstone, not a disengaged optional. The latter means no
change in the current tombstone. This caused the end bound of range
tombstones to not make it to the validator and the latter complained
later on partition-end that the partition has an unclosed range tombstone.
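
The difference between "no change" and "null tombstone" can be illustrated with a toy validator in Python. This is only a sketch under assumptions: `FragmentOrderValidator`, `on_fragment` and `on_partition_end` are hypothetical names, `None` models a disengaged optional and an empty dict models a null tombstone.

```python
from typing import Optional


class FragmentOrderValidator:
    """Toy stand-in for the validator; tracks the active range tombstone."""

    def __init__(self):
        self.active = None  # no open range tombstone

    def on_fragment(self, new_tombstone: Optional[dict]):
        # None models a disengaged optional: "no change".
        # An empty dict models a null tombstone: "range tombstone closed".
        if new_tombstone is None:
            return
        self.active = new_tombstone or None

    def on_partition_end(self) -> bool:
        """Return True if nothing is left open at the end of the partition."""
        return self.active is None


buggy = FragmentOrderValidator()
buggy.on_fragment({"timestamp": 1})  # range tombstone start
buggy.on_fragment(None)              # end bound passed as "no change"

fixed = FragmentOrderValidator()
fixed.on_fragment({"timestamp": 1})  # range tombstone start
fixed.on_fragment({})                # end bound passed as a null tombstone
```

In the first case the end bound never reaches the validator, so it still sees an open range tombstone at partition end.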
2024-03-12 11:05:18 -04:00
Botond Dénes
8be97884ec test/boost/sstable_compaction_test: drop write_corrupt_sstable() helper
It is not used anymore.
2024-03-12 11:05:18 -04:00
Botond Dénes
da0f4d3a9f test/boost/sstable_compaction_test: fix indentation
2024-03-12 11:05:18 -04:00
Botond Dénes
c35092aff6 test/boost/sstable_compaction_test: use test_scrub_framework in test_scrub_quarantine_mode_test
The test becomes a lot shorter and it now uses random schema and random
data.
Indentation is left broken, to be fixed in a future patch.
2024-03-12 11:05:18 -04:00
Botond Dénes
3f76aad609 test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_segregate_mode_test
The test becomes a lot shorter and it now uses random schema and random
data.
Indentation is left broken, to be fixed in a future patch.
2024-03-12 11:05:18 -04:00
Botond Dénes
5237e8133b test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_skip_mode_test
The test becomes a lot shorter and it now uses random schema and random
data. The test is also split in two: one test for abort mode and one for
skip mode.
Indentation is left broken, to be fixed in a future patch.
2024-03-12 11:05:18 -04:00
Botond Dénes
76785baf43 test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_validate_mode_test
The test becomes a lot shorter and it now uses random schema and random
data.
Indentation is left broken, to be fixed in a future patch.
2024-03-12 11:05:18 -04:00
Botond Dénes
b6f0c4efa0 test/boost/sstable_compaction_test: introduce scrub_test_framework
Scrub tests require a lot of boilerplate code to work. This has a lot of
disadvantages:
* Tests are long
* The "meat" of the test is lost between all the boiler-plate, it is
  hard to glean what a test actually does
* Tests are hard to write, so we have only a few of them and they test
  multiple things.
* The boiler-plate differs slightly from test-to-test.

To solve this, this patch introduces a new class, `scrub_test_framework`,
which is a central place for all the boiler-plate code needed to write
scrub-related tests. In the next patches, we will migrate scrub related
tests to this class.
2024-03-12 11:05:18 -04:00
Botond Dénes
e412673c44 test/lib/random_schema: add uncompactible_timestamp_generator()
Guarantees that produced mutations will not be compactible.
2024-03-12 11:05:18 -04:00
Pavel Emelyanov
3a734facc7 view_builder: Complete build step early if reader produces nothing
Builder works in "steps". Each step runs for a given base table, when a
new view is created it either initiates a step or appends to currently
running step.

Running a step means reading mutations from local sstables reader and
applying them to all views that have jumped into this step so far. When a
view is added to the step it remembers the current token value the step
is on. When step receives end-of-stream it rewinds to minimal-token.
Rewinding is done by closing current reader and creating a new one. Each
time token is advanced, all the views that meet the new token value for
the second time (i.e. -- scan full round) are marked as built and are
removed from step. When no views are left on step, it finishes.

The above machinery can break when rewinding the end-of-stream reader.
The trick is that a running step silently assumes that if the reader
once produced some token (and there can be a view that remembered this
token as its starting one), then after rewinding the reader would
generate the same token or greater. With tablets, however, that's not
the case. When a node is decommissioned tablets are cleaned and all
sstables are removed. Rewinding a reader after that creates an empty reader
that produces no tokens from then on. Consequently, any build steps that
had captured tokens prior to cleanup would get stuck forever.

The fix is to check whether the mutation consumer advanced at least one step
forward after the rewind and, if not, to complete all the attached views.
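
A heavily simplified Python model of the step and of the fix, assuming all views join during the first pass. `run_step`, `reader_rounds` and `view_start_tokens` are hypothetical names for illustration; the real builder is C++ and works on mutation readers, not token lists.

```python
def run_step(reader_rounds, view_start_tokens):
    """Model one build step.

    reader_rounds: tokens produced by each successive reader (one per rewind).
    view_start_tokens: {view_name: token the step was on when the view joined}.
    Returns the set of views marked as built.
    """
    pending = dict(view_start_tokens)
    built = set()
    for round_no, tokens in enumerate(reader_rounds):
        if round_no > 0 and not tokens:
            # After a rewind the reader produced nothing: no view can ever
            # meet its start token again, so complete all attached views.
            built.update(pending)
            pending.clear()
            break
        for token in tokens:
            if round_no == 0:
                continue  # first pass: views are still scanning forward
            done = [v for v, start in pending.items() if token >= start]
            for view in done:
                built.add(view)
                del pending[view]
        if not pending:
            break
    return built


normal = run_step([[10, 20, 30], [10, 20, 30]], {"v1": 20})
after_cleanup = run_step([[10, 20, 30], []], {"v1": 20})
```

Without the empty-round check, the second call would never complete `v1`, which corresponds to the stuck build step described above.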

fixes: #17293

A similar thing should happen if the base table is truncated while views
are being built from it. Testing that trips a compaction assertion elsewhere
and needs more research.

refs: #17543

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17548
2024-03-12 14:58:47 +02:00
Kefu Chai
69f140eea6 test.py: s/summarize_tests/summarize_boost_tests/
summarize_tests() is only used to summarize boost tests, so reflect
this fact using its name. we will need to summarize the tests which
generate JUnit XML as well, so this change also prepares for a
following-up change to implement a new summarize helper.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17746
2024-03-12 14:49:01 +02:00
Pavel Emelyanov
def5fed619 api: Fix stats reported for row cache
There are three endpoints in the api/cache_service that report "metrics"
for the row cache. These are the values they return:

    - entries:  number of partitions
    - size:     number of partitions
    - capacity: used space

The size and capacity seem very inaccurate.

A comment says that in C* the size should be weighted, but Scylla doesn't
support weights for cache entries. Also, capacity is configurable via the
row_cache_size_in_mb config option or the set_row_cache_capacity_in_mb API
call, but Scylla supports neither.

This patch suggests changing the return values of the size and capacity endpoints.

Although the row cache doesn't support weights, it's natural to return
used_space in bytes as the value, which is closer to what "size"
means than the number of entries.

The capacity endpoint can return the total memory size, because this is what
Scylla really does -- row cache growth is only limited by other memory
consumers, not by configured limits.

fixes: #9418

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17724
2024-03-12 13:44:59 +02:00
Pavel Emelyanov
a755914265 test/cql_query_test: Use string_view by value
The test carries const std::string_view& around, but the type is a
lightweight class that can be copied around at the same cost as a
reference to it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17735
2024-03-12 13:44:04 +02:00
Kefu Chai
17fe4a6439 view_info: add fmt::formatter for view_info
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `view_info`, its operator<<
is dropped.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17745
2024-03-12 13:28:27 +02:00
Botond Dénes
f3735dc8e0 Merge 'utils: add fmt::formatter for utils types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* utils::human_readable_value
* std::strong_ordering
* std::weak_ordering
* std::partial_ordering
* utils::exception_container

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#17710

* github.com:scylladb/scylladb:
  utils/exception_container: add fmt::formatter for exception_container
  utils/human_readable: add fmt::formatter for human_readable_value
  utils: add fmt::formatter for std::strong_ordering and friends
2024-03-12 13:27:37 +02:00
Botond Dénes
8e90b856b5 Merge 'Extend test.py's ability to select test cases' from Pavel Emelyanov
This PR fixes comments left from #17481 , namely

- adds case selection to boost suite
- describes the case selection in documentation

Closes scylladb/scylladb#17721

* github.com:scylladb/scylladb:
  docs: Add info about the ability to run specific test case
  test.py: Support case selection for boost tests
2024-03-12 13:21:50 +02:00
Kefu Chai
9c1d517bcc data_dictionary: drop unused friend declaration
the corresponding implementation of operator<< was dropped in
a40d3fc25b, so there is no need to
keep this friend declaration anymore.

also, drop `include <ostream>`, as this header does not reference
any of the ostream types with the change above.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17743
2024-03-12 09:45:15 +02:00
Kefu Chai
af3b69a4d1 Update seastar submodule
* seastar 5d3ee980...a71bd96d (51):
  > util: add formatter for optimized_optional<>
  > build: search protobuf using package config
  > reactor: Move pieces of scollectd to scollectd
  > reactor: Remove write-only task_queue._current
  > Add missing include in tests/unit/rpc_test.cc
  > doc/io_tester.md: include request_type::unlink in the docs
  > doc/io-tester.md: update obsolete information in io_tester docs
  > io_tester/conf.yaml: include an example of request_type::unlink job
  > io_tester: implement request_type::unlink
  > reactor: Print correct errno on io_submit failure
  > src/core/reactor.cc: qualify metric function calls with "sm::"
  > build: add shard_id.hh to seastar library
  > thread: speed up thread creation in debug mode
  > include: add missing modules.hh import to shard_id.hh
  > prometheus: avoid ambiguity when calling MetricFamily.set_name()
  > util/log: add formatter for log_level
  > util/log: use string_view for log_level_names
  > perf: Calculate length of name column in perf tests
  > rpc_test: add a test for inter-compressor communication
  > rpc: in multi_algo_compressor_factory, propagate send_empty_frame
  > rpc: give compressors a way to send something over the connection
  > rpc: allow (and skip) empty compressed frames
  > metrics: change value_vector type to std::deque
  > HACKING.md: remove doc related to test_dist
  > test/unit: do not check if __cplusplus > 201703L
  > json_elements: s/foramted/formatted/
  > iostream: Refactor input_stream::read_exactly_part
  > add unit test to verify str.starts_with(str), str.ends_with(str) return true.
  > str.starts_with(str) and str.ends_with(str) should return true, just like std::string
  > rpc: Remove FrameType::header_and_buffer_type
  > rpc: Defuturize FrameType::return_type
  > rpc: Kill FrameType::get_size()
  > treewide: put std::invocable<> constraints in template param list
  > include: do not include unuser headers
  > rpc: fix a deadlock in connection::send()
  > iostream: Replace recursion by iteration in input_stream::read_exactly_part
  > core/bitops.hh: use std::integral when appropriate
  > treewide: include <concepts> instead of seastar/util/concepts.hh
  > abortable_fifo: fix the indent
  > treewide: expand `SEASTAR_CONCEPT` macro
  > util/concepts: always define SEASTAR_CONCEPT
  > file: Remove unused thread-pool arg from directory lister
  > seastar-json2code: collect required_query_params using a list
  > seastar-json2code: reduce the indent level
  > seastar-json2code: indent the enum and array elements
  > seastar-json2code: generate code for enum type using Template
  > seastar-json2code: extract add_operation() out
  > reactor: Re-ifdef SIGSEGV sigaction installing
  > reactor: Re-ifdef reactor::enable_timer()
  > reactor: Re-ifdef task_histogram_add_task()
  > reactor: Re-ifdef install_signal_handler_stack()

Closes scylladb/scylladb#17714
2024-03-12 09:19:28 +02:00
Botond Dénes
3a7364525f Merge 'test/alternator: improve metrics tests' from Nadav Har'El
This small series improves the Alternator tests for metrics:
1. Improves some comments in the test.
2. Restores a test that was previously hidden by two tests having the same name.
3. Adds tests for latency histogram metrics.

Closes scylladb/scylladb#17623

* github.com:scylladb/scylladb:
  test/alternator: tests for latency metrics
  test/alternator: improve comments and unhide hidden test
2024-03-12 09:13:17 +02:00
Kefu Chai
35fc065458 utils/exception_container: add fmt::formatter for exception_container
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `exception_container<..>`
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-12 14:53:55 +08:00
Kefu Chai
9300d7b80b utils/human_readable: add fmt::formatter for human_readable_value
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for `utils::human_readable_value`,
and drop its operator<<

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-12 14:53:55 +08:00
Kefu Chai
007d7f1355 utils: add fmt::formatter for std::strong_ordering and friends
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* std::strong_ordering
* std::weak_ordering
* std::partial_ordering

and their operator<<:s are moved to test/lib/test_utils.{hh,cc}, as they
are only used by Boost.test.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-12 14:53:55 +08:00
Tomasz Grabiec
47a66d0150 Merge 'Handle tablet migration failure in wrapping-up stages' from Pavel Emelyanov
There are four stages left to handle: cleanup, cleanup_target, end_migration and revert_migration. All are handling removed nodes already, so the PR just extends the test.

fixes: #16527

Closes scylladb/scylladb#17684

* github.com:scylladb/scylladb:
  test/tablets_migration: Test revert_migration failure handling
  test/tablets_migration: Test end_migration failure handling
  test/tablets_migration: Test cleanup_target failure handling
  test/tablets_migration: Test cleanup failure handling
  test/tablets_migration: Prepare for do_... stages
  test/tablets_migration: Add ability to removenode via any other node
  test/tablets_migration: Wrap migration stages failing code into a helper class
  storage_service: Add failure injection to crash cleanup_tablet
2024-03-12 00:20:56 +01:00
Botond Dénes
c6cff53771 reader_concurrency_semaphore: use variable reference for metrics
Instead of a functor, for those metrics that just return the value of an
existing member variable. This is ever so slightly more efficient than a
functor.

Closes scylladb/scylladb#17726
2024-03-11 20:47:04 +02:00
Mikołaj Grzebieluch
cb17b4ac59 docs: maintenance socket: add section about accessing maintenance socket
Closes scylladb/scylladb#17701
2024-03-11 20:25:00 +02:00
Asias He
ebc0ab94e5 repair: Add ranges option support for tablet repair
The management tool, e.g., scylla manager, needs the ranges option to
select which ranges to repair on a node to schedule repair jobs.

This patch adds ranges option support.

E.g.,

curl -X POST "http://127.0.0.1:10000/storage_service/repair_async/ks1?ranges=-4611686018427387905:-1,4611686018427387903:9223372036854775807"
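
For illustration only, the value of the `ranges` parameter in the example above can be split into token pairs like this. `parse_ranges` is a hypothetical helper, not Scylla's parser, and it assumes the `start:end,start:end` shape shown in the curl command.

```python
def parse_ranges(value):
    """Split 'start:end,start:end' into a list of (start, end) integer pairs."""
    ranges = []
    for item in value.split(","):
        start, end = item.split(":")
        ranges.append((int(start), int(end)))
    return ranges


ranges = parse_ranges(
    "-4611686018427387905:-1,4611686018427387903:9223372036854775807"
)
```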

Fixes: #17416
Tests: test_tablet_repair_ranges_selection

Closes scylladb/scylladb#17436
2024-03-11 20:03:12 +02:00
Nadav Har'El
d207962e40 test/alternator: tests for latency metrics
In test/alternator/test_metrics.py we had tests for the operation-count
metrics for different Alternator API operations, but not for the latency
histograms for these same operations. So this patch adds the missing
tests (and removes a TODO asking to do that).

Note that only a subset of the operations - PutItem, GetItem, DeleteItem,
UpdateItem, and GetRecords - currently have a latency histogram, and this
test verifies this. We have an issue (Refs #17616) about adding latency
histograms for more operations - at which point we will be able to expand
this test for the additional operations.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-11 19:26:59 +02:00
Nadav Har'El
970c2dc7a6 test/alternator: improve comments and unhide hidden test
The original goal of this patch was to improve comments in
test/alternator/test_metrics.py, but while doing that I discovered
that one of the test functions was hidden by a second test with
the same name! So this patch also renames the second test.

The test continues to work after this patch - the hidden test
was successful.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-03-11 19:26:59 +02:00
Pavel Emelyanov
0d5c25aef5 error_injection: De-template inject() with handler
The recently renamed inject_with_handler() was a template, but it can be
symmetrical to its peer that accepts void function as a callback, and
use std::function as its argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 19:32:21 +03:00
Pavel Emelyanov
1f44a374b8 error_injection: Overload inject() instead of inject_with_handler()
The inject_with_handler() method accepts a coroutine that can be called
with injection_handler. With such a function as an argument, there's no
need for a distinct inject_with_handler() name for the method; it can be
an overload of the existing inject()-s

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 19:30:19 +03:00
Botond Dénes
7d31093d4b Merge 'storage_service/ownership: handle requests when tablets are enabled' from Patryk Wróbel
Before this change, when a user tried to use the
'storage_service/ownership/{keyspace}' API with a
keyspace parameter that uses tablets, an internal
error was thrown. The code was calling a function
that is intended for vnodes: get_vnode_effective_replication_map().

This commit introduces graceful handling of such scenario and
extends the API to allow passing 'cf' parameter that denotes
table name.

Now, when keyspace uses tablets and cf parameter is not passed
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for keyspace that uses tablets,
but they can query ownership for a table in a given keyspace that uses tablets.
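
The request handling described above can be summarized with a small Python sketch. `check_ownership_request` and its message text are hypothetical; only the BAD_REQUEST status for a tablets keyspace without the 'cf' parameter comes from the description.

```python
def check_ownership_request(keyspace_uses_tablets, cf=None):
    """Return (http_status, message) for a hypothetical ownership request."""
    if keyspace_uses_tablets and cf is None:
        # Keyspace-wide ownership is not defined for tablets: ask for a table.
        return 400, "keyspace uses tablets; pass the 'cf' parameter to query a table"
    return 200, "ok"


vnodes_keyspace = check_ownership_request(keyspace_uses_tablets=False)
tablets_keyspace = check_ownership_request(keyspace_uses_tablets=True)
tablets_table = check_ownership_request(keyspace_uses_tablets=True, cf="t1")
```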

Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.

Fixes: https://github.com/scylladb/scylladb/issues/17342

Closes scylladb/scylladb#17405

* github.com:scylladb/scylladb:
  storage_service/ownership: discard get_ownership() requests when tablets enabled
  storage_service/ownership/{keyspace}: handle requests when tablets are enabled
  locator/effective_replication_map: make 'get_ranges(inet_address ep)' virtual
  locator/tablets: add tablet_map::get_sorted_tokens()
  pylib/rest_client.py: add ownership API to ScyllaRESTAPIClient
  rest_api/test_storage_service: add simplistic tests of ownership API for vnodes
2024-03-11 14:55:26 +02:00
Kefu Chai
50c6fc1141 scylla-gdb: use current_scheduling_group_ptr instead of task_queue._current
Seastar removed `task_queue::_current` in
258b11220d343d8c7ae1a2ab056fb5e202723cc8 . let's adapt scylla-gdb.py
accordingly. despite that `current_scheduling_group_ptr()` is an internal
API, it's been around for a while, and relatively stable. so let's use
it instead.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17720
2024-03-11 13:13:59 +02:00
Kamil Braun
65b4f754ff Merge 'gossiper: do_status_check: allow evicting dead nodes from membership with no host_id' from Benny Halevy
The short series allows do_status_check to handle down nodes that don't have HOST_ID application state.

Fixes #16936

Closes scylladb/scylladb#17024

* github.com:scylladb/scylladb:
  gossiper: do_status_check: fixup indentation
  gossiper: do_status_check: allow evicting dead nodes from membership with no host_id
  gossiper: print the host_id when endpoint state goes UP/DOWN
  gossiper: get_host_id: differentiate between no endpoint_state and no application_state
  gms: endpoint_state: add get_host_id
  gossiper: do_status_check: continue loop after evicting FatClient
2024-03-11 11:21:49 +01:00
Kefu Chai
e1dbfedcdb service: add fmt::formatter for service/storage_proxy.cc types
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for internal types in service/storage_proxy.cc.
please note, `service::storage_proxy::remote::read_verb` is extracted out of
the outer class, because, the class's implementation formats `read_verb` in this
class. so we have to put the formatter at the place where its callers can see.
that's why it is moved up and out of `service::storage_proxy::remote`.

some of the operator<<:s are preserved, as they are still being used by
the existing formatters, for instance, the one for
`seastar::shared_ptr<>`, which is used to print
`seastar::shared_ptr<service::paxos_response_handler>`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17708
2024-03-11 11:52:58 +02:00
Kefu Chai
1ab30fc306 clustering_bounds_comparator: add fmt::formatter for bound_{kind,view}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `bound_kind` and `bound_view`,
and drop the latter's operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17706
2024-03-11 11:37:48 +02:00
Botond Dénes
1e7180de57 Update tools/java submodule
* tools/java e4878ae7...d61296dc (1):
  > build.xml: update scylla-driver-core to 3.11.5.2

Closes scylladb/scylladb#17722
2024-03-11 11:36:29 +02:00
Amnon Heiman
8b43609920 alternator: Use summary for shard-level latencies.
Shard-level latencies generate a lot of metrics. This patch reduces the
number of latencies reported by Alternator while keeping the same
functionality.

On the shard level, summaries will be reported instead of histograms.
On the instance level, an aggregated histogram will be reported.

Summaries, histograms, and counters are marked with skip_when_empty.

Fixes #12230

Closes scylladb/scylladb#17581
2024-03-11 11:12:08 +02:00
Patryk Wrobel
9eb91b5526 storage_service/ownership: discard get_ownership() requests when tablets enabled
This change introduces logic that checks
whether tablets are enabled for any of the
keyspaces when get_ownership() is invoked.

Without it, the result would be calculated
based solely on sorted_tokens(), which is
invalid.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:52:25 +01:00
Patryk Wrobel
51da80da7d storage_service/ownership/{keyspace}: handle requests when tablets are enabled
Before this change, when a user tried to use the
'storage_service/ownership/{keyspace}' API with a
keyspace parameter that uses tablets, an internal
error was thrown. The code was calling a function
that is intended for vnodes: get_vnode_effective_replication_map().

This commit introduces graceful handling of such scenario and
extends the API to allow passing 'cf' parameter that denotes
table name.

Now, when keyspace uses tablets and cf parameter is not passed
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for keyspace that uses tablets,
but they can query ownership for a table in a given keyspace that uses tablets.

Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:52:23 +01:00
Patryk Wrobel
75aadeb32f locator/effective_replication_map: make 'get_ranges(inet_address ep)' virtual
Before this patch, the mentioned function was a specific
member of vnode_effective_replication_strategy class.
To allow its usage also when tablets are enabled it was
shifted to the base class - effective_replication_strategy
and made pure virtual to force the derived classes to
implement it.

It is used by 'storage_service::get_ranges_for_endpoint()'
that is used in calculation of effective ownership. Such
calculation needs to be performed also when tablets are
enabled.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:50:20 +01:00
Patryk Wrobel
3fff6bd407 locator/tablets: add tablet_map::get_sorted_tokens()
This change introduces a new member function that
returns a vector of sorted tokens where each pair of adjacent
elements depicts a range of tokens that belong to a tablet.

It will be used to produce the equivalent of sorted_tokens() of
vnodes when trying to use dht::describe_ownership() for tablets.
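
A minimal Python sketch of how adjacent elements of such a sorted vector pair up into ranges. `ranges_from_sorted_tokens` is a hypothetical helper; whether each bound is inclusive or exclusive is not stated here and is left open.

```python
def ranges_from_sorted_tokens(tokens):
    """Pair up adjacent tokens: each pair bounds one tablet's token range."""
    return list(zip(tokens, tokens[1:]))


tablet_ranges = ranges_from_sorted_tokens([-100, 0, 100])
```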

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:50:20 +01:00
Patryk Wrobel
a39a5b671e pylib/rest_client.py: add ownership API to ScyllaRESTAPIClient
This change adds a member function that can be used
to access 'storage_service/ownership' API.

It will be used by tests that need to access this API.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:50:20 +01:00
Patryk Wrobel
dea76c4763 rest_api/test_storage_service: add simplistic tests of ownership API for vnodes
This change is intended to introduce tests for vnodes for
the following API paths:
 - 'storage_service/ownership'
 - 'storage_service/ownership/{keyspace}'

In next patches the logic that is tested will be adjusted
to work correctly when tablets are enabled. This is a safety
net that ensures that the logic is not broken.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:50:20 +01:00
Kefu Chai
38ae52d5cd add fmt::formatter for reader_permit::state and reader_resources
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* reader_permit::state
* reader_resources

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17707
2024-03-11 09:55:51 +02:00
Kefu Chai
ca7b73f34e tools/scylla-nodetool: use constexpr for compile-time format check
instead of using fmt::runtime format string, use compile-time
format string, so that we can have compile-time format check provided
by {fmt}.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17709
2024-03-11 09:45:32 +02:00
Pavel Emelyanov
3453a934ba docs: Add info about the ability to run specific test case
The test.py usage is documented, the ability to run a specific test by
its name is described in doc. Extend it with the new ability to run
specific test case as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 09:10:20 +03:00
Pavel Emelyanov
3afbd21faa test.py: Support case selection for boost tests
Boost tests support case-by-case execution and always turn it on -- when
run, boost test is split into parallel-running sub-tests each with the
specific case name.

This patch tunes this, so that when a test is run like

   test.py boost/testname::casename

No parallel-execution happens, but instead just the needed casename is
run. Example of selection:

   test.py --mode=${mode} boost/bptree_test::test_cookie_find
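
The naming scheme used in the example can be split into its parts as follows. This is an illustrative Python sketch, not the code in test.py; `split_test_name` is a hypothetical helper.

```python
def split_test_name(arg):
    """Split 'suite/test::case' into (suite, test, case); case may be None."""
    path, sep, case = arg.partition("::")
    suite, _, test = path.partition("/")
    return suite, test, (case if sep else None)


with_case = split_test_name("boost/bptree_test::test_cookie_find")
whole_test = split_test_name("boost/bptree_test")
```

When the case part is absent, the whole test is selected and the usual per-case parallel split applies.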

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 09:09:10 +03:00
Pavel Emelyanov
feae470475 test/tablets_migration: Test revert_migration failure handling
This stage is also the error path that starts from write_both_read_old,
so check this failure in two steps -- first fail the latter stage in one
of the nodes, then fail the former in another.

For that one more node in the cluster is needed.

Also, to avoid name conflicts, the do_revert_migration pseudo stage name
is used.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 08:16:13 +03:00
Pavel Emelyanov
c3d96b1a86 test/tablets_migration: Test end_migration failure handling
This stage is pure barrier. Barriers already take ignored nodes into
account, and so does the fail-injector, so just wire the stage name into the
test.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 08:16:13 +03:00
Pavel Emelyanov
180446e7b8 test/tablets_migration: Test cleanup_target failure handling
This stage is an error path, so in order to fail it we need to fail some
other stage prior to that. This leads to the testing sequence of

1. fail streaming via source node
2. stop and remove source node to let state machine proceed
3. fail cleanup_target on the destination node
4. stop and remove destination node

First thing to note here, is that the test doesn't fail source node for
cleanup_target stage, symmetrically to how it does for cleanup stage.

Next, since we're removing two nodes, the cluster is equipped with more
nodes to have raft quorum.

Finally, since remove of source node doesn't finish until tablet
migration finishes, it's impossible to remove destination node via the
same node-0, so the 2nd removenode happens via node-3.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 08:16:13 +03:00
Pavel Emelyanov
724c79ecf6 test/tablets_migration: Test cleanup failure handling
The handling itself is already there -- if the leaving node is excluded
the cleanup stage resolves immediately. So just add code that
validates that.

Also, skip testing of pending replica failure during cleanup stage, as
it doesn't really participate in it any longer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 08:16:13 +03:00
Pavel Emelyanov
ccefb7f21f test/tablets_migration: Prepare for do_... stages
The tablets migration test is parametrized with stage name to inject
failure in. Internal class node_failer uses this parameter as is when
injecting a failure into scylla barrier handler.

Next patch will need to extend the test with revert_migration value and
add handling of this name to node_failer class. The node_failer class,
in turn, will want to instantiate two other instances of the same class
-- one to fail the write_both_read_old stage, and the other one to fail
the revert_migration barrier. So internally the class will need to
distinguish revert_migration as the full test parameter from
revert_migration as a barrier-only parameter.

This patch adds the ability to add a do_ prefix to the node_failer parameter to
tell the full test from barrier-only. When injecting a failure into scylla
the do_ prefix needs to be cut off, since scylla still needs to fail the
barrier named revert_migration, not do_revert_migration.
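
The prefix handling described above can be sketched as follows. This is only an illustration, not the test's actual code: the helper name `barrier_name` is hypothetical, and only the `do_` prefix and the `revert_migration` barrier name come from this message.

```python
def barrier_name(stage: str) -> str:
    """Return the barrier name scylla knows, without the test-only do_ prefix.

    Hypothetical helper: the real node_failer class may do this differently.
    """
    # str.removeprefix (Python 3.9+) returns the string unchanged
    # when it does not start with the given prefix.
    return stage.removeprefix("do_")


print(barrier_name("do_revert_migration"))
print(barrier_name("write_both_read_old"))
```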

Also split the long line while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 07:56:58 +03:00
Pavel Emelyanov
abbd22cb90 test/tablets_migration: Add ability to removenode via any other node
Currently the test calls removenode via node-0 in the cluster, which is
always alive. Next test case will need to call removenode on some other
node (more details in that patch later).

refs: #17681

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 07:56:55 +03:00
Pavel Emelyanov
5d3291f322 test/tablets_migration: Wrap migration stages failing code into a helper class
One of the next stages will need to use two of them at the same time and
it's going to be easier if the failing code is encapsulated.

No functional changes here, just large portions of code and local
variables are moved into class and its methods.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 07:56:55 +03:00
Pavel Emelyanov
82270e3ec4 storage_service: Add failure injection to crash cleanup_tablet
Will be needed by test that verifies how failures in tablets migration
stages are handled by state machine

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-11 07:56:55 +03:00
Benny Halevy
9804ce79d8 gossiper: do_status_check: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-10 20:17:00 +02:00
Benny Halevy
1375c4e6a3 gossiper: do_status_check: allow evicting dead nodes from membership with no host_id
Be more permissive about the presence of host_id
application state for dead and expired nodes in release mode,
so do not throw runtime_error in this case, but
rather consider them as non-normal token owners.
Instead, call on_internal_error_noexcept that will
log the internal error and a backtrace, and will abort
if abort-on-internal-error is set.

This was seen when replacing dead nodes,
without https://github.com/scylladb/scylladb/pull/15788

Fixes #16936

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-10 20:17:00 +02:00
Benny Halevy
f32efcb7a6 gossiper: print the host_id when endpoint state goes UP/DOWN
The host_id is now used in token_metadata
and in raft topology changes so print it
when the gossiper marks the node as UP/DOWN.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-10 20:17:00 +02:00
Benny Halevy
fbf85ee199 gossiper: get_host_id: differentiate between no endpoint_state and no application_state
Currently, we throw the same runtime_error:
`Host {} does not have HOST_ID application_state`
in both cases: when there is no endpoint_state
or when the endpoint_state has no HOST_ID
application state.

The latter case is unexpected, especially
after 8ba0decda5
(and also from the add_saved_endpoint path
after https://github.com/scylladb/scylladb/pull/15788
is merged), so throw a different error in each case
so we can tell them apart in the logs.
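
A rough Python model of the distinction, for illustration only. The gossiper itself is C++; the dictionary layout and the wording of the first error message are assumptions, and only the second message is quoted from this commit.

```python
def get_host_id(endpoint_states: dict, endpoint: str) -> str:
    """Look up HOST_ID for an endpoint, reporting the two failure modes separately."""
    state = endpoint_states.get(endpoint)
    if state is None:
        # Case 1: the endpoint is not known at all (message text is hypothetical).
        raise RuntimeError(f"Host {endpoint} does not have an endpoint_state")
    host_id = state.get("HOST_ID")
    if host_id is None:
        # Case 2: the endpoint is known but carries no HOST_ID application state.
        raise RuntimeError(f"Host {endpoint} does not have HOST_ID application_state")
    return host_id
```

With separate messages, a log line alone is enough to tell which of the two conditions was hit.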

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-10 20:16:49 +02:00
Benny Halevy
a9fb0cf3dc gms: endpoint_state: add get_host_id
A simpler getter to get the HOST_ID application state
from the endpoint_state.

Return a null host_id if the application state is not found.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-10 15:19:51 +02:00
Benny Halevy
234774295e gossiper: do_status_check: continue loop after evicting FatClient
We're seeing cases like #16936:
```
INFO  2024-01-23 02:14:19,915 [shard 0:strm] gossip - failure_detector_loop: Mark node 127.0.23.4 as DOWN
INFO  2024-01-23 02:14:19,915 [shard 0:strm] gossip - InetAddress 127.0.23.4 is now DOWN, status = BOOT
INFO  2024-01-23 02:14:27,913 [shard 0: gms] gossip - FatClient 127.0.23.4 has been silent for 30000ms, removing from gossip
INFO  2024-01-23 02:14:27,915 [shard 0: gms] gossip - Removed endpoint 127.0.23.4
WARN  2024-01-23 02:14:27,916 [shard 0: gms] gossip - === Gossip round FAIL: std::runtime_error (Host 127.0.23.4 does not have HOST_ID application_state)
```

Since the FatClient timeout handling already evicts the endpoint
from membership there is no need to check further if the
node is dead and expired, so just co_return.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-10 15:19:51 +02:00
Nadav Har'El
af90910687 Merge 'repair: add fmt::formatter for repair types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* repair_hash
* read_strategy
* streaming::stream_summary

and drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17711

* github.com:scylladb/scylladb:
  repair: add fmt::formatter for streaming::stream_summary
  repair: add fmt::formatter for read_strategy
  repair: add fmt::formatter for repair_hash
2024-03-10 12:15:15 +02:00
Kefu Chai
5687c289f4 repair: add fmt::formatter for streaming::stream_summary
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for streaming::stream_summary, and
drop its operator<<

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-09 23:43:32 +08:00
Kefu Chai
7be93084b3 repair: add fmt::formatter for read_strategy
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for read_strategy, and drop its
operator<<

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-09 23:42:19 +08:00
Kefu Chai
39ee8593cb repair: add fmt::formatter for repair_hash
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for repair_hash.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-09 23:41:58 +08:00
Botond Dénes
9f97d21339 Merge 'Enhance perf-simple-query test' from Pavel Emelyanov
While measuring #17149 with this test some changes were applied, here they are

- keep initial_tablets number in output json's parameters section
- disable auto compaction
- add control over the amount of sstables generated for --bypass-cache case

Closes scylladb/scylladb#17473

* github.com:scylladb/scylladb:
  perf_simple_query: Add --memtable-partitions option
  perf_simple_query: Disable auto compaction
  perf_simple_query: Keep number of initial tablets in output json
2024-03-08 15:21:04 +02:00
Kefu Chai
079d70145e raft: add fmt::formatter for raft tracker types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* raft::election_tracker
* raft::votes
* raft::vote_result

and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17670
2024-03-08 15:19:37 +02:00
Piotr Smaroń
44bbf2e57b test.py: improve readability of failures resulting in empty XML
Before the change, when a test failed because of some error
in the `cql_test_env.cc`, we were getting:
```
error: boost/virtual_table_test: failed to parse XML output '/home/piotrs/src/scylla2/testlog/debug/xml/boost.virtual_table_test.test_system_config_table_read.1.xunit.xml': no element found: line 1, column 0
```
After the change we're getting:
```
error: boost/virtual_table_test: Empty testcase XML output, possibly caused by a crash in the cql_test_env.cc, details: '/home/piotrs/src/scylla2/testlog/debug/xml/boost.virtual_table_test.test_system_config_table_read.1.xunit.xml': no element found: line 1, column 0
```

Closes scylladb/scylladb#17679
2024-03-08 15:17:12 +02:00
Kefu Chai
362a8a777c partition_snapshot_row_cursor: add fmt::format to this class
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`partition_snapshot_row_cursor`, and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17669
2024-03-08 15:15:43 +02:00
Botond Dénes
630be97d2f Merge 'tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"' from Kefu Chai
before this change, "ring" subcommand has two issues:

1. `--resolve-ip` option accepts a boolean argument, but this option
   should be a switch, which does not accept any argument at all
2. it always prints the endpoint no matter if `--resolve-ip` is
   specified or not. but it should print the resolved name, instead
   of an IP address if `--resolve-ip` is specified.

in this change, both issues are addressed. and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.
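
The difference between a boolean-valued option and a switch can be illustrated with Python's argparse. This is an analogy only: scylla-nodetool is written in C++ and does not use argparse, and the `endpoint_label` helper is hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(prog="ring-sketch")
# store_true makes the option a switch: it takes no argument,
# and is False unless the flag is present on the command line.
parser.add_argument("--resolve-ip", action="store_true",
                    help="print host names instead of IP addresses")


def endpoint_label(ip: str, hostname: str, resolve_ip: bool) -> str:
    """Pick what to print for a node (hypothetical helper)."""
    return hostname if resolve_ip else ip


args = parser.parse_args(["--resolve-ip"])
print(endpoint_label("127.0.0.1", "localhost", args.resolve_ip))
```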

Closes scylladb/scylladb#17553

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"
  test/nodetool: calc max_width from all_hosts
  test/nodetool: keep tokens as Host's member
  test/nodetool: remove unused import
2024-03-08 15:15:19 +02:00
Pavel Emelyanov
fc9fb03b90 cql3: Remove unused cf_name::operator<<
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17686
2024-03-08 15:14:52 +02:00
Nadav Har'El
ba585905e5 Update tools/java submodule
* tools/java 5e11ed17...e4878ae7 (2):
  > nodetool: fix a typo in error message
  > bin/cassandra-stress: Add extended version info

Closes scylladb/scylladb#17680
2024-03-08 15:14:21 +02:00
Kefu Chai
f5f5ff1d51 clustering_interval_set: add fmt::formatter for clustering_interval_set
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `clustering_interval_set`

their operator<<:s are dropped

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17593
2024-03-08 15:13:14 +02:00
Kefu Chai
9b5ec53355 tombstone_gc_options: add fmt::formatter for tombstone_gc_mode
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `tombstone_gc_mode`, and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17673
2024-03-08 15:12:00 +02:00
Kefu Chai
8ca672a02c test/pylib: return better error if self.create_server() raises
in `ScyllaServer::add_server()`, `self.create_server()` is called to
create a server, but if it raises, we would reference a local variable
of `server` which is not bound to any value, as `server` is not assigned
at that moment. if `ScyllaServer` is used by `ScyllaClusterManager`, we
would not be able to see the real exception apart from the error like

```
cannot access local variable 'server' where it is not associated with a
value
```

which is but the error from Python runtime.

in this change, `server` is always initialized, and we check for None,
before dereferencing it.
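
A minimal sketch of the pattern, assuming a simplified `add_server()` that takes its collaborators as callables. The function signature and the `stop()` cleanup are hypothetical and do not reflect the real test/pylib code.

```python
def add_server(create_server, start_server):
    """Create and start a server, cleaning up on failure."""
    server = None  # always bound, even if create_server() raises
    try:
        server = create_server()
        start_server(server)
        return server
    except Exception:
        # Without the initialization above, touching `server` here would
        # raise UnboundLocalError and hide the original exception.
        if server is not None:
            server.stop()
        raise
```

When `create_server()` raises, the caller now sees that exception rather than the Python runtime error about an unbound local variable.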

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17693
2024-03-08 15:10:27 +02:00
Kefu Chai
70ef7e63b5 tools: toolchain: prepare: do not bail out when checking for command
before this change, if `buildah` is not available in $PATH, this script
fails like:
```console
$ tools/toolchain/prepare --help
tools/toolchain/prepare: line 3: buildah: command not found
```

the error message never gets a chance to show up, as `set -e` in the
shebang line just lets bash quit.

after this change, we check for the existence of buildah, and bail out
if it is not available. so, on a machine without buildah around, we now
have:
```console
$ tools/toolchain/prepare  --help
install buildah 1.19.3 or later
```

the same applies to "reg".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17697
2024-03-08 15:09:21 +02:00
Botond Dénes
05307d0be9 Merge 'service: add fmt::formatter for service types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* service::fencing_token
* service::topology::transition_state
* service::node_state
* service::topology_request
* service::global_topology_request
* service::raft_topology_cmd::command
* service::paxos::proposal
* service::paxos::promise

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#17692

* github.com:scylladb/scylladb:
  service/paxos: add fmt::formatter for paxos::promise
  service/paxos: add fmt::formatter for paxos::proposal
  service: add fmt::formatter for topology_state_machine types
2024-03-08 15:06:07 +02:00
Botond Dénes
505f137cc9 Merge 'Make object_store suite use ManagerClient' from Pavel Emelyanov
The test cases in this suite need to start scylla with custom config options, restart it and call API on it. By the time the suite was created all this wasn't possible with any library facility, so the suite carries its version of managed_cluster class that piggy-backs cql-pytest scylla starting. Now test.py has pretty flexible manager that provides all the scylla cluster management object_store suite needs. This PR makes the suite use the manager client instead of the home-brew managed_cluster thing

refs: #16006
fixes: #16268

Closes scylladb/scylladb#17292

* github.com:scylladb/scylladb:
  test/object_store: Remove unused managed_cluster (and other stuff)
  test/object_store: Use tmpdir fixture in flush-retry case
  test/object_store: Turn flush-retry case to use ManagerClient
  test/object_store: Turn "misconfigured" case to use ManagerClient
  test/object_store: Turn garbage-collect case to use ManagerClient
  test/object_store: Turn basic case to use ManagerClient
  test/object_store: Prepare to work with ManagerClient
2024-03-08 15:04:46 +02:00
Tomasz Grabiec
85ae10f632 Merge 'Make it possible to run individual pytest cases with test.py' from Pavel Emelyanov
Today's test.py allows filtering tests to run with the `test.py --options name` syntax. The "name" argument is then considered to be some prefix, and when iterating tests only those whose name starts with that prefix are collected and executed. This has two troubles.

Minor: since it is prefix filtering, running e.g. topology_custom/test_tablets will run test_tablets _and_ test_tablets_migration from it. There's no way to exclude the latter from this selection. It's not common, but careful file names selection is welcome for better ~~user~~ testing experience.

Major: most of test files in topology and python suites contain many cases, some are extremely long. When the intent is to run a single, potentially fast, test case one needs to either wait or patch the test .py file by hand to somehow exclude unwanted test cases.

This PR adds the ability to run individual test case with test.py. The new syntax is `test.py --options name::case`. If the "::case" part is present two changes apply.

First, the test file selection is done by name match, not by prefix match. So running topology_custom/test_tablets will _not_ select test_tablets_migration from it.

Second, the "::case" part is appended to the pytest execution so that it collects and runs only the specified test case.
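
The two selection modes can be sketched as below. This is a simplified model, not the test.py implementation: the function name `select_tests` and the returned tuple shape are assumptions.

```python
def select_tests(all_tests, option):
    """Return (test_name, case) pairs selected by a 'name' or 'name::case' option."""
    name, sep, case = option.partition("::")
    if sep:
        # "name::case": exact file match, and the case is passed on to pytest.
        return [(t, case) for t in all_tests if t == name]
    # Plain "name": legacy prefix match, no specific case.
    return [(t, None) for t in all_tests if t.startswith(name)]


tests = ["topology_custom/test_tablets",
         "topology_custom/test_tablets_migration"]
print(select_tests(tests, "topology_custom/test_tablets"))
print(select_tests(tests, "topology_custom/test_tablets::test_some_case"))
```

Here `test_some_case` is a made-up case name used only to show the syntax.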

Closes scylladb/scylladb#17481

* github.com:scylladb/scylladb:
  test.py: Add test-case splitting in 'name' selection
  test.py: Add casename argument to PythonTest
2024-03-08 12:56:39 +01:00
Kamil Braun
ae954fb2ec test: unflake test_tablets_removenode
These tests are inserting data into RF=3 tables, but used the default
consistency level which is taken from the default execution profile
which is set to LOCAL_QUORUM. The tests would then read with CL=ONE, so
we cannot give a guarantee that some of the data won't be missed. Fix
this by inserting the data with CL=ALL. (Do it for all RF cases for
simplicity.)
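
The reasoning can be checked with the usual overlap rule: a read is guaranteed to see a completed write only if the number of replicas written plus the number of replicas read exceeds RF. The sketch below assumes a single datacenter, so that LOCAL_QUORUM behaves like QUORUM; the function names are illustrative.

```python
def replicas_required(cl: str, rf: int) -> int:
    """Number of replicas that must respond for a consistency level (single DC assumed)."""
    if cl == "ONE":
        return 1
    if cl in ("QUORUM", "LOCAL_QUORUM"):
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {cl}")


def read_sees_write(write_cl: str, read_cl: str, rf: int) -> bool:
    """True if every read replica set must overlap every write replica set."""
    return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf


print(read_sees_write("LOCAL_QUORUM", "ONE", 3))  # 2 + 1 is not greater than 3
print(read_sees_write("ALL", "ONE", 3))           # 3 + 1 is greater than 3
```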

Fixes scylladb/scylladb#17695

Closes scylladb/scylladb#17700
2024-03-08 12:47:47 +01:00
Benny Halevy
8456967012 tablets: read_tablet_mutations: unfreeze_gently
Use co_await unfreeze_gently in the loop body
unfreezing each partition mutation to prevent
reactor stalls when building group0 snapshot
with lots of tablets.
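
The idea of yielding inside a long loop can be illustrated with asyncio. This is an analogy only: the real code is a Seastar C++ coroutine, and both functions below are stand-ins that do not reflect Scylla's API.

```python
import asyncio


async def unfreeze_gently(frozen: bytes) -> str:
    """Stand-in for an expensive conversion that gives up the CPU while working."""
    await asyncio.sleep(0)  # let other tasks on the event loop run
    return frozen.decode()


async def read_tablet_mutations(frozen_partitions):
    """Convert every partition, yielding between items instead of in one long burst."""
    result = []
    for frozen in frozen_partitions:
        result.append(await unfreeze_gently(frozen))
    return result


print(asyncio.run(read_tablet_mutations([b"p1", b"p2"])))
```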

Fixes #15303

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17688
2024-03-08 10:52:39 +01:00
Yaron Kaikov
ad842e5ad7 [mergify] Fix wrong label and base branch for backport pr
This PR contains 2 fixes for mergify config file:
1) When opening a backport PR, the base branch should be `branch-x.y`

2) Once a commit is promoted, we should add the label
   `promoted-to-master`; in the 5.4 configuration we were using the wrong
label. Fix it.

Closes scylladb/scylladb#17698
2024-03-08 10:08:09 +01:00
Kamil Braun
76fb902858 test: unflake test_topology_remove_garbage_group0
The test is booting nodes, and then immediately starts shutting down
nodes and removing them from the cluster. The shutting down and
removing may happen before driver manages to connect to all nodes in the
cluster. In particular, the driver didn't yet connect to the last
bootstrapped node. Or it can even happen that the driver has connected,
but the control connection is established to the first node, and the
driver fetched topology from the first node when the first node didn't
yet consider the last node to be normal. So the driver decides to close
connection to the last node like this:
```
22:34:03.159 DEBUG> [control connection] Removing host not found in
   peers metadata: <Host: 127.42.90.14:9042 datacenter1>
```

Eventually, at the end of the test, only the last node remains, all
other nodes have been removed or stopped. But the driver does not have a
connection to that last node.

Fix this problem by ensuring that:
- all nodes see each other as NORMAL,
- the driver has connected to all nodes
at the beginning of the test, before we start shutting down and removing
nodes.

Fixes scylladb/scylladb#16373

Closes scylladb/scylladb#17676
2024-03-08 10:08:09 +01:00
Mikołaj Grzebieluch
a0915115c3 maintenance_socket: change log message to differentiate from regular CQL ports
Scylla-ccm uses function `wait_for_binary_interface` that waits for
scylla logs to print "Starting listening for CQL clients". If this log
is printed far before the regular cql_controller is initialized,
scylla-ccm assumes too early that node is initialized.
It can result in timeouts that throw errors, for example in the function
`watch_rest_for_alive`.

Closes scylladb/scylladb#17496
2024-03-08 10:08:09 +01:00
Nadav Har'El
ea53db379f Merge 'tools/scylla-nodetool: listsnapshot: make it compatible with origin' from Botond Dénes
The following incompatibilities were identified by `listsnapshots_test.py` in dtests:
* Command doesn't bail out when there are no snapshots, instead it prints meaningless empty report
* Formatting is incompatible

Both are fixed in this mini-series.

Closes scylladb/scylladb#17541

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
  tools/scylla-nodetool: listsnapshots: bail out if there are no snapshots
2024-03-08 10:08:09 +01:00
Kefu Chai
185b503b73 service/paxos: add fmt::formatter for paxos::promise
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `service::paxos::promise`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-08 14:26:58 +08:00
Kefu Chai
cb6c7bb9bf service/paxos: add fmt::formatter for paxos::proposal
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `service::paxos::proposal`,
but its operator<< is preserved, as it is still used by our generic
formatter for std::tuple<> which uses operator<< for printing the
elements in it, so operator<< of this class is indirectly used.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-08 14:26:58 +08:00
Kefu Chai
14cb48eb0a service: add fmt::formatter for topology_state_machine types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* service::fencing_token
* service::topology::transition_state
* service::node_state
* service::topology_request
* service::global_topology_request
* service::raft_topology_cmd::command

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-08 14:05:45 +08:00
Kefu Chai
de276901f2 tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"
before this change, "ring" subcommand has two issues:

1. `--resolve-ip` option accepts a boolean argument, but this option
   should be a switch, which does not accept any argument at all
2. it always prints the endpoint no matter if `--resolve-ip` is
   specified or not. but it should print the resolved name, instead
   of an IP address if `--resolve-ip` is specified.

in this change, both issues are addressed. and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:29:31 +08:00
Kefu Chai
d927ee8d8f test/nodetool: calc max_width from all_hosts
for better readability. as `token_to_endpoint` is but a derived
variable from `all_hosts`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:28:54 +08:00
Kefu Chai
4a748c7fb0 test/nodetool: keep tokens as Host's member
to be more consistent with the test_status.py.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:28:54 +08:00
Kefu Chai
aefc385786 test/nodetool: remove unused import
and add two empty lines in between global functions

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 22:28:54 +08:00
Botond Dénes
b69ee6bc27 Merge 'Fix load-and-stream for tablets' from Raphael "Raph" Carvalho
It might happen that multiple tablets co-habit the same shard, so we want load-and-stream to jump into a new streaming session for every tablet, such that the receiver will have the data properly segregated. That's a similar treatment we gave to repair. Today, load-and-stream fails due to sstables spanning more than 1 tablet in the receiver.

Synchronization with migration is done by taking replication map, so migrations cannot advance while streaming new data. A bug was fixed too, where data must be streamed to pending replicas too, to handle case where migration is ongoing and new data must reach both old and new replica set. A test was added stressing this synchronization path.

Another bug was fixed in sstable loading, which expected sharder to not be invalidated throughout the operation, but that breaks during migrations.

Fixes #17315.

Closes scylladb/scylladb#17449

* github.com:scylladb/scylladb:
  test: test_tablets: Add load-and-stream test
  sstables_loader: Stream to pending tablet replica if needed
  sstables_loader: Implement tablet based load-and-stream
  sstables_loader: Virtualize sstable_streamer for tablet
  sstables_loader: Avoid reallocations in vector
  sstable_loader: Decouple sstable streaming from selection
  sstables_loader: Introduce sstable_streamer
  Fix online SSTable loading with concurrent tablet migration
2024-03-07 14:18:30 +02:00
Nadav Har'El
19bcea6216 materialized views: fix rare failure caused by empty update
This one-line patch fixes a failure in the dtest

        lwt_schema_modification_test.py::TestLWTSchemaModification
        ::test_table_alter_delete

Where an update sometimes failed due to an internal server error, and the
log had the mysterious warning message:

        "std::logic_error (Empty materialized view updated)"

We've also seen this log-message in the past in another user's log, and
never understood what it meant.

It turns out that the error message was generated (and warning printed)
while building view updates for a base-table mutation, and noticing that
the base mutation contains an *empty* row - a row with no cells or
tombstone or anything whatsoever. This case was deemed (8 years ago,
in d5a61a8c48) unexpected and nonsensical,
and we threw an exception. But this case actually *can* happen - here is
how it happened in test_table_alter_delete - which is a test involving
a strange combination of materialized views, LWT and schema changes:

 1. A table has a materialized view, and also a regular column "int_col".
 2. A background thread repeatedly drops and re-creates this column
    int_col.
 3. Another thread deletes rows with LWT ("IF EXISTS").
 4. These LWT operations each reads the existing row, and because of
    repeated drop-and-recreate of the "int_col" column, sometimes this
    read notices that one node has a value for int_col and the other
    doesn't, and creates a read-repair mutation setting int_col (the
    difference between the two reads is just in this column).
 5. The node missing "int_col" receives this mutation which sets only
    int_col. It upgrade()s this mutation to its most recent schema,
    which doesn't have int_col, so it removes this column from the
    mutation row - and is left with a completely empty mutation row.
    This completely empty row is not useful, but upgrade() doesn't
    remove it.
 6. The view-update generation code sees this empty base-mutation row
    and fails it with this std::logic_error.
 7. The node which sent the read-repair mutation sees that the read
    repair failed, so it fails the read and therefore fails the LWT
    delete operation.
    It is this LWT operation which failed in the test, and caused
    the whole test to fail.

The fix is trivial: an empty base-table row mutation should simply be
*ignored* when generating view updates - it shouldn't cause any error.
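
In outline, the behavior change looks like the sketch below. The row representation (a dict with `key`, `cells` and `tombstone`) and the function name are hypothetical; the real view-update code is C++ and considerably more involved.

```python
def generate_view_updates(base_rows):
    """Produce one placeholder view update per non-empty base row."""
    updates = []
    for row in base_rows:
        if not row["cells"] and row["tombstone"] is None:
            # An empty row carries nothing to propagate: skip it
            # instead of raising std::logic_error as before.
            continue
        updates.append(("view_update", row["key"]))
    return updates


rows = [
    {"key": 1, "cells": {"v": 10}, "tombstone": None},
    {"key": 2, "cells": {}, "tombstone": None},  # empty after schema upgrade
]
print(generate_view_updates(rows))
```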

Before this patch, test_table_alter_delete used to fail in roughly
20% of the runs on my laptop. After this patch, I ran it 100 times
without a single failure.

Fixes #15228
Fixes #17549

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17607
2024-03-07 12:00:43 +02:00
Botond Dénes
09068d20ea tools/scylla-nodetool: scrub: make keyspace parameter optional
When no keyspace is provided, request all keyspaces from the server,
then scrub all of them. This is what the legacy nodetool does, for some
reason this was missed when re-implementing scrub.
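
The fallback can be sketched as follows, assuming a hypothetical `list_all_keyspaces` callable that stands in for the request to the server; this is not the nodetool code itself.

```python
def keyspaces_to_scrub(requested, list_all_keyspaces):
    """Return the keyspaces to scrub: the requested one, or all of them."""
    if requested is not None:
        return [requested]
    # No keyspace given: ask the server for every keyspace.
    return list(list_all_keyspaces())


print(keyspaces_to_scrub("ks1", lambda: ["ks1", "ks2"]))
print(keyspaces_to_scrub(None, lambda: ["ks1", "ks2"]))
```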

Closes scylladb/scylladb#17495
2024-03-07 11:15:46 +02:00
Tomasz Grabiec
ec6ed18b5c Merge 'Handle tablet migration failure in barrier stages' from Pavel Emelyanov
There are 4 barrier-only stages when migrating a tablet and the test needs to fail pending/leaving replica that handles it in order to validate how coordinator handles dead node. Failing the barrier is done by suspending it with injection code and stopping the node without waking it up. The main difficulty here is how to tell one barrier RPC call from another, because they don't have anything onboard that could tell which stage the barrier is run for. This PR suggests that barrier injection code looks directly into the system.tablets table for the transition stage, the stage is already there by the time barrier is about to ack itself over RPC.

refs: #16527

Closes scylladb/scylladb#17450

* github.com:scylladb/scylladb:
  topology.tablets_migration: Handle failed use_new
  topology.tablets_migration: Handle failed write_both_read_new
  topology.tablets_migration: Handle failed write_both_read_old
  topology.tablets_migration: Handle failed allow_write_both_read_old
  test/tablets_migration: Add conditional break-point into barrier handler
  replica: Add helper to read tablet transition stage
  topology_coordinator: Add action_failed() helper
2024-03-07 09:56:13 +01:00
Botond Dénes
5dfaa69bde tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
The author (me) tried to be clever and fix the formatting, but then he
realized this just means a lot of unnecessary fighting with tests. So
this patch makes the formatting compatible with that of the legacy
nodetool:
* Use compatible rounding and precision formatting
* Use incorrect unit (KB instead of KiB)
* Align numbers to the left
* Add trailing white-space to "Snapshot Details: "
2024-03-07 03:54:54 -05:00
Botond Dénes
80483ba732 tools/scylla-nodetool: listsnapshots: bail out if there are no snapshots
Print a message and exit, don't continue to output the snapshot table.
This is what the legacy nodetool does too.
2024-03-07 03:54:54 -05:00
Botond Dénes
ac15e4c109 tools/scylla-nodetool: repair: accept and ignore -full/--full and -j/--job-threads
These two parameters are not used by the native nodetool, because
ScyllaDB itself doesn't support them. These should be just ignored and
indeed there was a unit test checking that this is the case. However,
due to a mistake in the unit test, this was not actually tested and
nodetool complained when seeing these params.
This patch fixes both the test and the native nodetool.

Closes scylladb/scylladb#17477
2024-03-07 11:53:50 +03:00
Nadav Har'El
a36c8b28dd Merge 'scylla-gdb.py: fixes warnings raised by flake8' from Kefu Chai
this changeset addresses some warnings raised by flake8 in hope to improve the readability of this script in general.

Closes scylladb/scylladb#17668

* github.com:scylladb/scylladb:
  scylla-gdb: s/if not foo is None/if foo is not None/
  scylla-gdb.py: add space after keyword
  scylla-gdb.py: remove extraneous spaces
  scylla-gdb.py: use 2 empty lines between top-level funcs/classes
  scylla-gdb.py: replace <tab> with 4 spaces
  scylla-gdb: fix the indent
2024-03-07 10:41:15 +02:00
Botond Dénes
28639e6a59 Merge 'docs: trigger the docs-pages workflow on release branches' from Beni Peled
Currently, the github docs-pages workflow is triggered only when changes are merged to the master/enterprise branches, which means that in the case of changes to a release branch, for example, a fix to branch-5.4, or a branch-5.4>branch-2024.1 merge, the docs-pages workflow is not triggered and therefore the documentation is not updated with the new change.

In this change, I added the `branch-**` pattern, so changes to release branches will trigger the workflow.

Closes scylladb/scylladb#17281

* github.com:scylladb/scylladb:
  docs: always build from the default branch
  docs: trigger the docs-pages workflow on release branches
2024-03-07 10:01:50 +02:00
Botond Dénes
75fe2f5c3a Merge 'test: rest_api: fix tests to work with tablets' from Aleksandra Martyniuk
Fix test_compaction_task.py, test_repair_task.py and
test_storage_service.py to work with tablets.

Fixes: #17338.

Closes scylladb/scylladb#17474

* github.com:scylladb/scylladb:
  test: rest_api: enable tablets by default
  test: fix indentation and delete unused this_dc param
  test: rest_api: fix test_storage_service.py
  test: rest_api: fix test_repair_task.py
  test: rest_api: fix test_compaction_task.py
  test: rest_api: use skip_without_tablets fixture
  test: rest_api: add some tablet related fixtures
2024-03-07 10:00:09 +02:00
Asias He
83a28342ea service: Drop unused table param from session_topology_guard
The table param is not used. Dropping it so it can be used in places
where the table object is not available.

Closes scylladb/scylladb#17628
2024-03-07 09:34:40 +02:00
Israel Fruchter
6eb0509ff9 Update tools/cqlsh submodule
* tools/cqlsh b8d86b76...e5f5eafd (2):
  > dist/debian: fix the trailer line format
  > `COPY TO STDOUT` shouldn't put None where a function is expected

Fixes: scylladb/scylladb#17451

Closes scylladb/scylladb#17447
2024-03-07 09:33:36 +02:00
Michał Chojnowski
f9e97fa632 sstables: fix a use-after-free in key_view::explode()
key_view::explode() contains a blatant use-after-free:
unless the input is already linearized, it returns a view to a local temporary buffer.

This is rare, because partition keys are usually not large enough to be fragmented.
But for a sufficiently large key, this bug causes a corrupted partition_key down
the line.

Fixes #17625

Closes scylladb/scylladb#17626
2024-03-07 09:07:07 +02:00
Kefu Chai
7631605892 query-request: use default-generated operator==
instead of using the hand-crafted operator==, use the default-generated
one, which is equivalent to the former.

regarding the difference between global operator== and member operator==,
the default-generated operator in C++20 is now symmetric. so we don't
need to worry about the problem of `max_result_size` being lhs or rhs.
but neither do we need to worry about the implicit conversion, because
all constructors of `max_result_size` are marked explicit. so we don't
gain any advantage by making the operator== global instead of a member
operator.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17536
2024-03-07 09:02:42 +03:00
Kefu Chai
64e14d21db locator/tablets: add fmt::formatter for tablet_*
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* tablet_id
* tablet_replica
* tablet_metadata
* tablet_map

their operator<<:s are dropped

Refs scylladb/scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17504
2024-03-07 09:00:49 +03:00
Kefu Chai
6ef507e842 build: cmake: add table_check.cc to repair/CMakeLists.txt
in 5202bb9d, we introduced repair/table_check.cc, but we didn't
update repair/CMakeLists.txt accordingly. but the symbols defined
by this compilation unit are referenced by other source files when
building scylla.

so, in this change, we add this table_check.cc to the "repair"
target.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17517
2024-03-07 08:59:02 +03:00
Pavel Emelyanov
52a1b2c413 Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* position_range
* mutation_fragment
* range_tombstone_stream
* mutation_fragment_v2::printer

Refs #13245

Closes scylladb/scylladb#17521

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for position_range
  mutation: add fmt::formatter for mutation_fragment and range_tombstone_stream
  mutation: add fmt::formatter for mutation_fragment_v2::printer
2024-03-07 08:56:21 +03:00
Pavel Emelyanov
df6048adec topology.tablets_migration: Handle failed use_new
This stage doesn't need any special treatment, because we cannot revert
to old replicas and should proceed normally. The barrier itself won't
get stuck, because it already handles excluded/ignored nodes.

Just make the test validate it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
fb7428c560 topology.tablets_migration: Handle failed write_both_read_new
Two options here -- either revert to old replicas by jumping into
cleanup_target stage or proceed normally. The choice depends on which
replica set has fewer dead nodes.
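
The decision described above can be sketched roughly as follows. This is an
illustration only: `stage_after_failure`, its parameters and the "proceed"
return value are hypothetical names, and the tie-breaking rule is an
assumption that the message does not spell out.

```python
def stage_after_failure(old_replicas, new_replicas, dead_nodes):
    """Pick the next step for a failed write_both_read_new stage."""
    dead = set(dead_nodes)
    old_dead = len(dead & set(old_replicas))
    new_dead = len(dead & set(new_replicas))
    # Assumption: on a tie the migration keeps going forward.
    if old_dead < new_dead:
        return "cleanup_target"  # revert to the old replicas
    return "proceed"             # continue with the new replicas
```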

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
324eaaf873 topology.tablets_migration: Handle failed write_both_read_old
At this stage it can happen that target replica got some writes, so its
tablet needs to be cleaned up, so jump to cleanup_target stage.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
f81e0b2e88 topology.tablets_migration: Handle failed allow_write_both_read_old
This is an early stage, so just proceed to the existing revert_migration.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
5bb1597a30 test/tablets_migration: Add conditional break-point into barrier handler
There are several transition stages that are executed by the topology
coordinator with the help of barrier-and-drain raft commands. For the
test to stop and remove a node while handling this stage it must inject
a break-point into barrier handler, wait for it to happen and then stop
the node without resuming the break-point. Then removenode from the
cluster.

The break-point suspends barrier handling when a specific tablet is in
specific transition stage. Tablet ID and desired stage are configured
via injector parameters.

With today's error-injection facilities the way to suspend code
execution is with injecting a lambda that waits for a message from the
injection engine.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:26 +03:00
Pavel Emelyanov
f5264dc501 replica: Add helper to read tablet transition stage
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:47:25 +03:00
Kefu Chai
4f8b618be7 scylla-gdb: s/if not foo is None/if foo is not None/
more readable this way.
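
A small sketch of why the two spellings are interchangeable; `is_set` is a
hypothetical helper used only for illustration, not code from scylla-gdb.py.

```python
def is_set(foo):
    """Return True when foo is anything other than None."""
    # `not foo is None` parses as `not (foo is None)`, so both spellings
    # always agree; the second one simply reads more naturally.
    old_style = not foo is None
    new_style = foo is not None
    assert old_style == new_style
    return new_style
```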

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
643a6d5bda scylla-gdb.py: add space after keyword
it'd be more pythonic to just put an expression after `assert`,
instead of quoting it with a pair of parentheses. and there is no need
to add `;` after `break`.
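
For illustration, the style described above looks like this. The functions
are hypothetical examples, not taken from scylla-gdb.py.

```python
def check_positive(value):
    # A bare expression after `assert`, optionally followed by a message.
    assert value > 0, "value must be positive"
    return value


def first_negative(values):
    """Return the first negative element, or None if there is none."""
    found = None
    for v in values:
        if v < 0:
            found = v
            break  # no trailing `;` needed
    return found
```

As a general Python note (not something the commit claims): wrapping both the
condition and a message in one pair of parentheses builds a non-empty tuple,
which is always true, so the bare form is also the safer one.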

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
8c65f92f1f scylla-gdb.py: remove extraneous spaces
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
12c06c39c3 scylla-gdb.py: use 2 empty lines between top-level funcs/classes
and 1 empty line for nested functions/classes, to be more PEP8
compliant. for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
8e3b22c76a scylla-gdb.py: replace <tab> with 4 spaces
do not mix tab and spaces for indent

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Kefu Chai
c4b679fe3b scylla-gdb: fix the indent
indent should be multiple of 4 spaces.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-07 13:46:38 +08:00
Pavel Emelyanov
79b5a75ded topology_coordinator: Add action_failed() helper
It checks if the action holder holds a failed action.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-03-07 08:46:29 +03:00
Botond Dénes
8dd6fe75e7 Merge 'tools/scylla-nodetool: implement info ' from Kefu Chai
Refs #15588

Closes scylladb/scylladb#17498

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement info
  test/nodetool: move format_size into utils.py
2024-03-07 07:14:51 +02:00
Avi Kivity
c5f01349b1 Merge 'Add specialized tablet_sstable_set' from Benny Halevy
Make a specialized sstable_set for tablets
via tablet_storage_group_manager::make_sstable_set.

This sstable set takes a snapshot of the storage_groups
(compound) sstable_sets and maps the selected tokens
directly into the tablet compound_sstable_set.

This sstable_set provides much more efficient access
to the table's sstable sets as it takes advantage of the disjointness
of sstable sets between tablets/storage_groups, and making it is cheaper
than rebuilding a complete partitioned_sstable_set from all sstables in the table.

Fixes #16876

Cassandra-stress setup:
```
$ sudo cpupower frequency-set -g userspace
$ build/release/scylla (developer-mode options) --smp=16 --memory=8G --experimental-features=consistent-topology-changes --experimental-features=tablets
cqlsh> CREATE KEYSPACE keyspace1 WITH replication={'class':'NetworkTopologyStrategy', 'replication_factor':1} AND tablets={'initial':2048};
$ ./tools/java/tools/bin/cassandra-stress write no-warmup n=10000000 -pop 'seq=1...10000000' -rate threads=128
$ scylla-api-client system drop_sstable_caches POST
$ ./tools/java/tools/bin/cassandra-stress read no-warmup duration=60s -pop 'dist=uniform(1..10000000)' -rate threads=128
$ scylla-api-client system drop_sstable_caches POST
$ ./tools/java/tools/bin/cassandra-stress mixed no-warmup duration=60s -pop 'dist=uniform(1..10000000)' -rate threads=128
```

Baseline (0a7854ea4d) vs. fix (0c2c00f01b)

Throughput (op/s):
workload | baseline | fix
---------|----------|----------
write | 76,806 | 100,787
read | 34,330 | 106,099
mixed | 32,195 | 79,246
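
The relative change can be read off the table with a little arithmetic. The
sketch below is illustrative only: `THROUGHPUT` and `speedup` are names
introduced here, and the figures are copied from the table above.

```python
# Throughput figures (op/s) copied from the table: (baseline, fix).
THROUGHPUT = {
    "write": (76_806, 100_787),
    "read": (34_330, 106_099),
    "mixed": (32_195, 79_246),
}


def speedup(baseline, fix):
    """Return fix / baseline, i.e. how many times faster the fix is."""
    return fix / baseline


if __name__ == "__main__":
    for workload, (baseline, fix) in THROUGHPUT.items():
        print(f"{workload}: {speedup(baseline, fix):.2f}x")
```

This works out to roughly 1.31x for write, 3.09x for read and 2.46x for mixed.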

Closes scylladb/scylladb#17149

* github.com:scylladb/scylladb:
  table: tablet_storage_group_manager: make tablet_sstable_set
  storage_group_manager: add make_sstable_set
  tablet_storage_group_manager: handle_tablet_split_completion: pre-calc new_tablet_count
  table: tablet_storage_group_manager: storage_group_of: do not validate in release build mode
  table: move compaction_group_list and storage_group_vector to storage_group_manager
  compaction_group::table_state: get_group_id: become self-sufficient
  compaction_group, table: make_compound_sstable_set: declare as const
  tablet_storage_group_manager: precalculate my_host_id and _tablet_map
  table: coroutinize update_effective_replication_map
2024-03-06 23:59:39 +02:00
Botond Dénes
557d851191 tools/toolchain/README.md: mention the need of credentials for publishing images
Without this, the push will fail, complaining about bad permissions.

Closes scylladb/scylladb#17652
2024-03-06 15:58:24 +02:00
Kefu Chai
3e91b1382b tools/scylla-nodetool: always use compile-time format string
instead of passing fmt string as a plain `const char*`, pass it as
a consteval type, so that `fmt::format()` can perform compile-time
format check against it and the formatted params.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17656
2024-03-06 14:55:10 +02:00
Avi Kivity
3ab2088119 Merge 'build: cmake: use scylla build mode for rust profile name ' from Kefu Chai
before this change, we used the lower-case CMake build configuration
name for the rust profile names. but this was wrong, because the
profiles are named with the scylla build mode.

in this change, we translate the `$<CONFIG>` to scylla build mode,
and use it for the profile name and for the output directory of
the built library.

Closes scylladb/scylladb#17648

* github.com:scylladb/scylladb:
  build: cmake: use scylla build mode for rust profile name
  build: cmake: define per-config build mode
2024-03-06 13:46:20 +02:00
Botond Dénes
65b9e10543 repair: resolve start-up deadlock
Repairs have to obtain a permit to the reader concurrency semaphore on
each shard they have a presence on. This is prone to deadlocks:

node1                              node2
repair1_master (takes permit)      repair1_follower (waits on permit)
repair2_master (waits for permit)  repair2_follower (takes permit)

In lieu of strong central coordination, we solved this by making permits
evictable: repair2 can evict repair1's permit so it can obtain one
and make progress. This is not efficient as evicting a permit usually
means discarding already done work, but it prevents the deadlocks.
We recently discovered that there is a window when deadlocks can still
happen. The permit is made evictable when the disk reader is created.
This reader is an evictable one, which effectively makes the permit
evictable. But the permit is obtained when the repair control
structure -- repair meta -- is created. Between creating the repair meta
and reading the first row from disk, the deadlock is still possible. And
we know that what is possible, will happen (and did happen). Fix by
making the permit evictable as soon as the repair meta is created. This
is very clunky and we should have a better API for this (refs #17644),
but for now we go with this simple patch, to make it easy to backport.
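
A toy model of the scenario may help. It is a deliberately simplified sketch,
not the reader concurrency semaphore: each node has exactly one permit, and
`Semaphore`, `acquire` and `makes_progress` are hypothetical names.

```python
class Semaphore:
    """Toy reader-concurrency semaphore with a single permit."""

    def __init__(self):
        self.holder = None      # repair currently holding the permit
        self.evictable = False  # whether that holder may be evicted

    def acquire(self, repair, evictable):
        """Try to take the permit, evicting the current holder if allowed."""
        if self.holder is None or self.evictable:
            self.holder, self.evictable = repair, evictable
            return True
        return False  # the caller would have to wait


def makes_progress(evictable_on_creation):
    """Return True if some repair ends up holding permits on both nodes."""
    node1, node2 = Semaphore(), Semaphore()
    # State from the diagram: repair1 holds node1, repair2 holds node2.
    node1.acquire("repair1", evictable_on_creation)
    node2.acquire("repair2", evictable_on_creation)
    # Each repair now asks for the permit it is missing.
    for repair, missing in (("repair1", node2), ("repair2", node1)):
        if missing.acquire(repair, evictable_on_creation):
            return True  # this repair holds both permits and can run
    return False  # both are stuck waiting: deadlock
```

With `evictable_on_creation=False` both requests wait forever, which models the
window between creating the repair meta and the first disk read; with `True`
one repair evicts the other and proceeds.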

Refs: #17644
Fixes: #17591

Closes scylladb/scylladb#17646
2024-03-06 11:38:07 +02:00
Kamil Braun
19b816bb68 Merge 'Migrate system_auth to raft group0' from Marcin Maliszkiewicz
This patch series makes all auth writes serialized via raft. Reads stay
eventually consistent for performance reasons. To make transition to new
code easier data is stored in a newly created keyspace: system_auth_v2.

Internally the difference is that instead of executing CQL directly for
writes we generate mutations and then announce them via raft group0. Per
commit descriptions provide more implementation details.

Refs https://github.com/scylladb/scylladb/issues/16970
Fixes https://github.com/scylladb/scylladb/issues/11157

Closes scylladb/scylladb#16578

* github.com:scylladb/scylladb:
  test: extend auth-v2 migration test to catch stale static
  test: add auth-v2 migration test
  test: add auth-v2 snapshot transfer test
  test: auth: add tests for lost quorum and command splitting
  test: pylib: disconnect driver before re-connection
  test: adjust tests for auth-v2
  auth: implement auth-v2 migration
  auth: remove static from queries on auth-v2 path
  auth: coroutinize functions in password_authenticator
  auth: coroutinize functions in standard_role_manager
  auth: coroutinize functions in default_authorizer
  storage_service: add support for auth-v2 raft snapshots
  storage_service: extract getting mutations in raft snapshot to a common function
  auth: service: capture string_view by value
  alternator: add support for auth-v2
  auth: add auth-v2 write paths
  auth: add raft_group0_client as dependency
  cql3: auth: add a way to create mutations without executing
  cql3: run auth DML writes on shard 0 and with raft guard
  service: don't loose service_level_controller when bouncing client_state
  auth: put system_auth and users consts in legacy namespace
  cql3: parametrize keyspace name in auth related statements
  auth: parametrize keyspace name in roles metadata helpers
  auth: parametrize keyspace name in password_authenticator
  auth: parametrize keyspace name in standard_role_manager
  auth: remove redundant consts auth::meta::*::qualified_name
  auth: parametrize keyspace name in default_authorizer
  db: make all system_auth_v2 tables use schema commitlog
  db: add system_auth_v2 tables
  db: add system_auth_v2 keyspace
2024-03-06 10:11:33 +01:00
Botond Dénes
58265a7dc1 tools/utils: fix use-after-free when printing error message for unknown operation
When a tool application is invoked with an unknown operation, an error
message is printed, which includes all the known operations, with all
their aliases. This is collected in `std::vector<std::string_view>`. The
problem is that the vector containing alias names, is returned as a
value, so the code ends up creating views to temporaries.
Fix this by returning alias vector with const&.

Fixes: #17584

Closes scylladb/scylladb#17586
2024-03-06 10:42:02 +02:00
Pavel Emelyanov
ca8bfed8e6 topology_coordinator: Demote log level for advance_in_background() errors
The helper in question is supposed to spawn a background fiber with
tablet migration stage action and repeat it in case action fails (until
operator intervention, but that's another story). In case action fails
a message with ERROR level is logged about the failure.

This error confuses some tests that scan scylla log messages for
ERROR-s at the end, treat most of them (if not all) as critical and
fail. But this particular message is not in fact an error -- topology
coordinator would re-execute this action anyway, so let's demote the
message to be WARN instead.

refs: #17027

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17568
2024-03-06 10:39:00 +02:00
Botond Dénes
88a76245ba Merge 'Get metrics description' from Amnon Heiman
This series adds a Python script that searches the code for metrics definition and their description.
Because part of the code uses a nonstandard way of definition, it uses a configuration file to resolve parameter values.

The script supports the code that uses string format and string concatenation with variables.

The documentation team will use the results to both document the existing metrics and to get the metrics changes between releases.

Replaces #16328

Closes scylladb/scylladb#17479

* github.com:scylladb/scylladb:
  Adding scripts/metrics-config.yml
  Adding scripts/get_description.py to fetch metrics description
2024-03-06 10:37:35 +02:00
Kefu Chai
e248ab48db tools/scylla-nodetool: correct tablestats filtering
before this change, we failed to apply the filtering of tablestats
command in the right way:

1. `table_filter` failed to check if delimiter is npos before
   extracting the cf component from the specified table name.
2. the stats should not include the keyspaces which are not
   included by the filter.
3. the total number of tables in the stats report should contain
   all tables no matter whether they are filtered or not.

in this change, all the problems above are addressed. and the tests
are updated to cover these use cases.
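
The three points can be sketched as below. This is a Python illustration of the
intended behaviour, not the C++ code in scylla-nodetool: `split_table_name` and
`filter_stats` are hypothetical names, and treating an empty filter as "show
everything" is an assumption.

```python
def split_table_name(name):
    """Split 'ks.cf' into (ks, cf); a bare 'ks' yields (ks, None)."""
    pos = name.find(".")
    if pos == -1:  # analogous to checking for npos before slicing
        return name, None
    return name[:pos], name[pos + 1:]


def filter_stats(tables, specs):
    """Return (total_table_count, filtered_tables) for the filter specs.

    `tables` maps a keyspace name to the list of its table names.
    """
    total = sum(len(cfs) for cfs in tables.values())  # counts every table
    if not specs:  # assumption: no filter means nothing is hidden
        return total, {ks: list(cfs) for ks, cfs in tables.items()}
    wanted = [split_table_name(s) for s in specs]
    filtered = {}
    for ks, cfs in tables.items():
        kept = [cf for cf in cfs
                if any(w_ks == ks and (w_cf is None or w_cf == cf)
                       for w_ks, w_cf in wanted)]
        if kept:  # keyspaces without a matching table are left out
            filtered[ks] = kept
    return total, filtered
```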

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17468
2024-03-06 10:36:20 +02:00
Benny Halevy
0c2c00f01b table: tablet_storage_group_manager: make tablet_sstable_set
Make a specialized sstable_set for tablets
via tablet_storage_group_manager::make_sstable_set.

This sstable set takes a snapshot of the storage_groups
(compound) sstable_sets and maps the selected tokens
directly into the tablet compound_sstable_set.

Refs #16876

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:36 +02:00
Benny Halevy
0745865914 storage_group_manager: add make_sstable_set
Move the responsibility for preparing the table_set
covering all sstables in the table to the storage_group_manager
so it can specialize the sstable_set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:36 +02:00
Benny Halevy
3cee24c148 tablet_storage_group_manager: handle_tablet_split_completion: pre-calc new_tablet_count
Mini-cleanup of `new_tablet_count`, similar
to pre-calculating `old_tablet_count` once.

While at it, add some missing coding-style related spaces.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:36 +02:00
Benny Halevy
c65768dc24 table: tablet_storage_group_manager: storage_group_of: do not validate in release build mode
No validation is really required in release build.
Add `#ifndef SCYLLA_BUILD_MODE_RELEASE` before
adding another term to the logic in the next patch
that adds support for sparse allocation in a cloned
tablet_storage_group_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:36 +02:00
Benny Halevy
7f203f0551 table: move compaction_group_list and storage_group_vector to storage_group_manager
So the storage_group_manager can be used later by table_sstable_set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:35:33 +02:00
Tzach Livyatan
a245c0bb98 Docs: Remove 3rd party Rust Driver from the driver list
The 3rd party Rust driver https://github.com/AlexPikalov/cdrs is not maintained, and we have a better internal alternative.

Closes scylladb/scylladb#15815
2024-03-06 10:34:43 +02:00
Aleksandra Martyniuk
923ef3c8c8 repair: reuse table name from repair_range argument
Currently in shard_repair_task_impl::repair_range table name is
retrieved with database::find_column_family and in case of exception,
we return from the function.

But the table name is already kept in table_info passed to repair_range
as an argument. Let's reuse it. If a table is dropped, we will find it
out almost immediately after calling repair_cf_range_row_level and
handle it more adequately.

Closes scylladb/scylladb#17245
2024-03-06 10:34:21 +02:00
Botond Dénes
41424231f1 Merge 'compaction: reshape sstables within compaction groups' from Lakshmi Narayanan Sreethar
For tables using tablet based replication strategies, the sstables should be reshaped only within the compaction groups they belong to. The shard_reshaping_compaction_task_impl now groups the sstables based on their compaction groups before reshaping them.

Fixes https://github.com/scylladb/scylladb/issues/16966

Closes scylladb/scylladb#17395

* github.com:scylladb/scylladb:
  test/topology_custom: add testcase to verify reshape with tablets
  test/pylib/rest_client: add get_sstable_info, enable/disable_autocompaction
  replica/distributed_loader: enable reshape for sstables
  compaction: reshape sstables within compaction groups
  replica/table : add method to get compaction group id for an sstable
  compaction: reshape: update total reshaped size only on success
  compaction: simplify exception handling in shard_reshaping_compaction_task_impl::run
2024-03-06 10:33:56 +02:00
Botond Dénes
f164ed8bae Merge 'docs: fix the formattings in operating-scylla/nodetool-commands/info.rst' from Kefu Chai
couple minor formatting fixes.

Closes scylladb/scylladb#17518

* github.com:scylladb/scylladb:
  docs: remove leading space in table element
  docs: remove space in words
2024-03-06 10:33:21 +02:00
Tzach Livyatan
dafc83205b Docs: rename the select-from-mutation-fragments page name
Closes scylladb/scylladb#17456
2024-03-06 10:32:56 +02:00
David Garcia
d27d89fd34 docs: add collapsible for images
Introduces collapsible dropdowns for images reference docs. With this update, only the latest version's details will be displayed open by default. Information about previous versions will be hidden under dropdowns, which users can expand as needed. This enhancement aims to make pages shorter and easier to navigate.

Closes scylladb/scylladb#17492
2024-03-06 10:32:35 +02:00
Botond Dénes
dce42b2517 Merge 'tools/scylla-nodetool: fixes to address the test failure with dtest' from Kefu Chai
- use API endpoint of /storage_service/toppartition/
- only print out the specified samplings.
- print "\n" separator between samplings

Closes scylladb/scylladb#17574

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: print separator between samplings
  tools/scylla-nodetool: only print the specified sampling
  tools/scylla-nodetool: use /storage_service/toppartition/
2024-03-06 10:27:25 +02:00
David Garcia
847882b981 docs: add dynamic substitutions
This pull request adds dynamic substitutions for the following variables:

* `.. |CURRENT_VERSION| replace:: {current_version}`
* `.. |UBUNTU_SCYLLADB_LIST| replace:: scylla-{current_version}.list`
* `.. |CENTOS_SCYLLADB_REPO| replace:: scylla-{current_version}.repo`

As a result, it is no longer needed to update the "Installation on Linux" page manually after every new release.
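
A minimal sketch of how such substitutions could be rendered from a single
version string. The templates are the ones listed above; `render_substitutions`
is a hypothetical helper, and how the real docs build wires this into Sphinx is
not shown here.

```python
# Templates copied from the list above; {current_version} is the placeholder.
SUBSTITUTION_TEMPLATES = [
    ".. |CURRENT_VERSION| replace:: {current_version}",
    ".. |UBUNTU_SCYLLADB_LIST| replace:: scylla-{current_version}.list",
    ".. |CENTOS_SCYLLADB_REPO| replace:: scylla-{current_version}.repo",
]


def render_substitutions(current_version):
    """Return the reStructuredText substitution definitions for a version."""
    return "\n".join(
        template.format(current_version=current_version)
        for template in SUBSTITUTION_TEMPLATES
    )


if __name__ == "__main__":
    print(render_substitutions("5.4"))
```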

Closes scylladb/scylladb#17544
2024-03-06 10:25:57 +02:00
comsky
48ad1b3d20 Update stats-output.rst
I read this doc to learn how to use nodetool commands, and I eventually found some typos in the docs. 😄

Closes scylladb/scylladb#15771
2024-03-06 10:25:06 +02:00
Kefu Chai
7bb33a1f8d node_ops: add fmt::formatter for node_ops_cmd and node_ops_cmd_request
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* node_ops_cmd
* node_ops_cmd_request

their operator<<:s are dropped

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17505
2024-03-06 10:24:31 +02:00
Benny Halevy
dc10d02890 compaction_group::table_state: get_group_id: become self-sufficient
Printing the compaction_group group_id as "i/size"
where size is the total number of compaction_groups in
the table is convenient but it comes with a price
of a circular dependency on the table, as noted by
Aleksandra Martyniuk in c25827feb3 (r1511341251),
which can be triggered when hitting an error when adding the
compaction_group::table_state to the table's compaction_manager
within the table's constructor.

This patch just prints the _group_id member
resolving the dependency on the table.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:21:48 +02:00
Avi Kivity
6383aa1e3c docs: maintainer.md: add exceptions to the don't-commit-your-own-code rules
Submodule and toolchain updates aren't original code and so are exempt
from the don't-commit-own-code rule.

Closes scylladb/scylladb#17534
2024-03-06 10:19:46 +02:00
Tzach Livyatan
04b483e286 Docs: fix RF type in the consistency-calculator
Closes scylladb/scylladb#17557
2024-03-06 10:18:29 +02:00
Kefu Chai
d93b018bcf create-relocatable-package.py: add --debian-dir option
before this change, we assume that debian packaging directory is
always located under `build/debian/debian`. which is hardwired by
`configure.py`. but this might not hold anymore, if we want to
have a self-contained build, in the sense that different builds do
not share the same build directory. this could be a waste for the
non-multi-config build, but `configure.py` uses multi-config generator
when building with CMake. so in that case, all builds still share the
same $build_dir/debian/ directory.

in order to work with the out-of-source build, where the build
directory is not necessarily "build", a new option is added to
`create-relocatable-package.py`, this allows us to specify the directory
where "debian" artifacts are located.

Refs #15241

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17558
2024-03-06 10:18:00 +02:00
Kefu Chai
19e02de1aa transport/controller: remove unused struct definition
the removed struct definition is not used, so drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17537
2024-03-06 10:17:08 +02:00
Tzach Livyatan
1edce9f4b6 Improve the frozen vs. non-frozen doc section, removing falses claimes
Closes scylladb/scylladb#17556
2024-03-06 10:16:33 +02:00
Kefu Chai
4d4c0ddf31 build: cmake: exclude Seastar's tests from "all"
in 02de9f1833, we enable building Seastar testing for using the
testing facilities in scylla's own tests. but this brings in
Seastar's tests.

since scylladb's CI builds the "all" targets, and we are not
interested in running Seastar's tests when building scylladb,
let's exclude Seastar's tests from the "all" target.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17554
2024-03-06 10:15:45 +02:00
Benny Halevy
bfe13daed4 compaction_group, table: make_compound_sstable_set: declare as const
It does not modify the compaction_group/table respectively.
This is required by the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:15:34 +02:00
Benny Halevy
d7b1851449 tablet_storage_group_manager: precalculate my_host_id and _tablet_map
The node host_id never changes, so get it once,
when the object is constructed.

A pointer to the tablet_map is taken when constructed
using the effective_replication_map and it is
updated whenever the e_r_m changes, using a newly added
`update_effective_replication_map` method.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:15:34 +02:00
Benny Halevy
f2ff701489 table: coroutinize update_effective_replication_map
It's better to wait on deregistering the
old main compaction_groups:s in handle_tablet_split_completion
rather than leaving work in the background.
Especially since their respective storage_groups
are being destroyed by handle_tablet_split_completion.

handle_tablet_split_completion keeps a continuation chain
for all non-ready compaction_group stop fibers.
and returns it so that update_effective_replication_map
can await it, leaving no cleanup work in the background.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-06 10:15:34 +02:00
Konstantin Osipov
39d882ddca main: print pid (process id) at start
Print process id to the log at start.
It aids debugging/administering the instance if you have multiple
instances running on the same machine.

Closes scylladb/scylladb#17582
2024-03-06 10:14:22 +02:00
Kefu Chai
80d2981473 dist/docker: collect deb packages from different dir for CMake builds
CMake generates debian packages under build/$<CONFIG>/debian instead of
build/$mode/debian. so let's translate $mode to $<CONFIG> if
build.ninja is found under build/ directory, as configure.py puts
build.ninja under $top_srcdir, while CMake puts it under build/ .
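
The lookup rule can be sketched as follows. This is a Python illustration
rather than the actual packaging script; `debian_dir` is a hypothetical helper,
and the mode-to-config table is passed in by the caller because its contents
are not given here.

```python
from pathlib import Path


def debian_dir(top_srcdir, mode, mode_to_config):
    """Pick the directory holding the deb packages for a scylla build mode."""
    build = Path(top_srcdir) / "build"
    if (build / "build.ninja").exists():
        # CMake layout: build.ninja lives under build/, packages are per-config.
        return build / mode_to_config[mode] / "debian"
    # configure.py layout: packages are per-mode.
    return build / mode / "debian"
```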

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17592
2024-03-06 10:13:47 +02:00
Botond Dénes
d37ac1545b Merge 'build: cmake: fixes for debian packaging' from Kefu Chai
- changes to use build/$<CONFIG> for build directory
- add ${CMAKE_BINARY_DIR}/debian as a dep
- generate deb packages under build/$<CONFIG>/debian

Closes scylladb/scylladb#17560

* github.com:scylladb/scylladb:
  build: cmake: generate deb packages under build/$<CONFIG>/debian
  build: cmake: add ${CMAKE_BINARY_DIR}/debian as a dep
  build: cmake: use build/$<CONFIG>/ instead of build
  build: cmake: always pass absolute path for add_stripped()
2024-03-06 10:12:18 +02:00
Anna Stuchlik
a024c2d692 doc: remove Membership changes vs LWT page
This commit removes the redundant
"Cluster membership changes and LWT consistency" page.

The page is no longer useful because the Raft algorithm
serializes topology operations, which results in
consistent topology updates.

Closes scylladb/scylladb#17523
2024-03-06 10:10:01 +02:00
Kefu Chai
e8473d6d03 row_cache: add fmt::formatter for cache_entry
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for cache_entry, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17594
2024-03-06 10:08:11 +02:00
Botond Dénes
6f374aa7d6 Merge 'doc: update procedures following the introduction of Raft-based topology' from Anna Stuchlik
This PR updates the procedures that changed as a result of introducing Raft-based topology.

Refs https://github.com/scylladb/scylladb/issues/15934
Applied the updates from https://docs.google.com/document/d/1BgZaYtKHs2GZKAxudBZv4G7uwaXcRt2jM6TK9dctRQg/edit

In addition, it adds a placeholder for the 5.4-to-6.0 upgrade guide, as a file included in that guide, Enable Raft topology, is referenced from other places in the docs.

Closes scylladb/scylladb#17500

* github.com:scylladb/scylladb:
  doc: replace "Raft Topology" with "Consistent Topology"
  doc: (Raft topology) update Removenode
  doc: (Raft topology) update Upscale a Cluster
  doc:(Raft topology)update Membership Change Failures
  doc: doc: (Raft topology) update Replace Dead Node
  doc: (Raft topology) update Remove a Node
  doc: (Raft topology) update Add a New DC
  doc: (Raft topology) update Add a New Node
  doc: (Raft topology) update Create Cluster (EC2)
  doc: (Raft topology) update Create Cluster (n-DC)
  doc: (Raft topology) update Create Cluster (1DC)
  doc: include the quorum requirement file
  doc: add the quorum requirement file
  doc: add placeholder for Enable Raft topology page
2024-03-06 10:05:47 +02:00
Botond Dénes
c843f98769 Merge 'cql3: add fmt::formatter for cql3 types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* std::vector<data_type>
* column_identifier
* column_identifier_raw
* untyped_constant::type_class

and drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17538

* github.com:scylladb/scylladb:
  cql3: add fmt::formatter for expression::printer
  cql3: add fmt::formatter for raw_value{,_view}
  cql3: add fmt::formatter for std::vector<data_type>
  cql3: add fmt::formatter for untyped_constant::type_class
  cql3: add fmt::formatter for column_identifier{,_row}
2024-03-06 10:03:50 +02:00
Kefu Chai
1519904fb9 docs: quote CQL keywords
this "misspelling" was identified by codespell. actually, it's not
quite a misspelling, as "UPDATE" and "INSERT" are keywords in CQL,
and we intended to emphasize them. so, to make codespell more useful
and to preserve the intention, let's quote the keywords with backticks.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17391
2024-03-06 09:57:07 +02:00
Kefu Chai
51a789afc1 build: cmake: use scylla build mode for rust profile name
before this change, we used the lower-case CMake build configuration
name for the rust profile names. but this was wrong, because the
profiles are named with the scylla build mode.

in this change, we translate the $<CONFIG> to scylla build mode,
and use it for the profile name and for the output directory of
the built library.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-06 15:53:11 +08:00
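To see why lower-casing the CMake configuration name is not enough, here is a minimal sketch of the translation the commit above describes. The mapping table and both function names are hypothetical and assumed for illustration only; the commit message does not list the actual pairs.

```python
# Hypothetical mapping from CMake configuration names to Scylla build modes.
# These pairs are assumptions for illustration, not taken from the commit.
SCYLLA_BUILD_MODE = {
    "Debug": "debug",
    "RelWithDebInfo": "release",
    "Dev": "dev",
    "Sanitize": "sanitize",
    "Coverage": "coverage",
}


def rust_profile_old(config: str) -> str:
    """Old behavior as described: lower-case the CMake configuration name."""
    return config.lower()


def rust_profile_new(config: str) -> str:
    """New behavior as described: look up the build mode for the configuration."""
    return SCYLLA_BUILD_MODE[config]
```

Under this assumed table the two approaches agree for `Debug` but diverge for `RelWithDebInfo`, which is the kind of mismatch the commit fixes.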
Kefu Chai
0c1864eebd build: cmake: define per-config build mode
so that scylla_build_mode_$<CONFIG> can be referenced when necessary.
we use it to reference the build mode in the build system instead
of the CMake configuration name.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-06 15:53:11 +08:00
Kefu Chai
7e9b0d3d9e network_topology_strategy: use structured binding when appropriate
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17642
2024-03-06 09:52:20 +02:00
Botond Dénes
c370f42d8b Merge 'Automation of ScyllaDB backports - Phase #1: Master → OSS backports' from Yaron Kaikov
This PR includes 3 commits:

- **[actions] Add a check for backport labels**: As part of the Automation of ScyllaDB backports project, each PR should get either a `backport/none` or `backport/X.Y` label. Based on this label we will automatically open a backport PR for the relevant OSS release.

In this commit, I am adding a GitHub action to verify that such a label was added. This only applies to PRs with a base branch of `master` or `next`. For releases, we don't need this check.

- **Add Mergify (https://mergify.com/) configuration file**: In this PR we introduce the `.mergify.yml` configuration file, which
includes a set of rules that we will use for automating our backport
process.

For each supported OSS release (currently 5.2 and 5.4) we have an almost
identical configuration section which includes the four conditions that
must hold before we open a backport PR:
* PR should be closed
* PR should have the proper label, for example backport/5.4 (we can
  have multiple labels)
* Base branch should be `master`
* PR should be set with a `promoted` label - this condition will be set
  automatically once the commits are promoted to the `master` branch (passed
  gating)

Once all conditions are met, the Mergify bot will open a backport PR and
will assign it to the author of the original PR. Then CI will start
running, and only after it passes do we merge.

- **[action] Add promoted label when commits are in master**: In Scylla, we don't merge our PRs but use `./script/pull_github_pr.sh` to close the pull request, add a `closes scylladb/scylladb <PR number>` remark, and push the changes to the `next` branch.

One of the conditions for opening a backport PR is that all relevant commits are in `master` (passed gating). In this GitHub action, we go through the list of commits once a push is made to `master`, identify the relevant PR, and add the `promoted` label to it. This allows Mergify to start the backporting process.

Closes scylladb/scylladb#17365

* github.com:scylladb/scylladb:
  [action] Add promoted label when commits are in master
  Add mergify (https://mergify.com/) configuration file
  [actions] Add a check for backport labels
2024-03-06 09:50:30 +02:00
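The four backport conditions listed in the merge above can be read as a single predicate. The following is a rough sketch of that rule in Python, not the Mergify configuration itself; `PullRequest` and `backport_targets` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical stand-in for the PR attributes the rules look at."""
    closed: bool
    base: str
    labels: set = field(default_factory=set)


def backport_targets(pr, supported=("5.2", "5.4")):
    """Return the releases for which a backport PR would be opened."""
    # The PR must be closed, based on master, and marked as promoted.
    if not pr.closed or pr.base != "master" or "promoted" not in pr.labels:
        return []
    # One backport per matching backport/X.Y label; several may be present.
    return [v for v in supported if f"backport/{v}" in pr.labels]
```

A PR that carries `backport/5.4` but has not yet received `promoted` yields no targets, which matches the description that backporting only starts after gating.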
Dawid Medrek
b36becc1f3 db/hints: Fix too_many_in_flight_hints_for
The semantics of the function was accidentally
modified in 6e79d64. The consequence of the change
was that we didn't limit memory consumption:
the function always returned false for any node
different from the local node. The returned value
is used by storage_proxy to decide whether it
is able to store a hint or not.

This commit fixes the problem by taking other
nodes into consideration again.

Fixes #17636

Closes scylladb/scylladb#17639
2024-03-06 09:48:30 +02:00
Benny Halevy
08b0426318 scripts/open-coredump.sh: calculate MAIN_BRANCH before cloning repo
We need MAIN_BRANCH calculated earlier so we can use it
to checkout the right branch when cloning the src repo
(either `master` or `enterprise`, based on the detected `PRODUCT`)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17647
2024-03-06 09:46:30 +02:00
Avi Kivity
c32a4c8d5c build: docker: clean up after docker build
The `buildah commit` command doesn't remove the working container. These
accumulate in ~/.local/container/storage until something bad happens.

Fix by adding the `--rm` flag to remove the container and volume.

Closes scylladb/scylladb#17546
2024-03-06 09:41:36 +02:00
Kefu Chai
3d8ac06ee8 cql3: add fmt::formatter for expression::printer
before this change, we already have a `fmt::formatter` specialized for
`expression::printer`. but the formatter was implemented by

1. formatting the `printer` instance to an `ostringstream`, and
2. extracting a `std::string` from this `ostringstream`
3. formatting the `std::string` instance to the fmt context

this is convoluted and is not an optimal implementation. so,
in this change, it is reimplemented by formatting directly to
the context. its operator<< is also dropped in this change.
please note, to avoid adding the large chunk of code into the
.hh file, the implementation is put in the .cc file. but in order
to preserve the usage of `transformed(fmt::to_string<expression::printer>)`,
the `format()` function is defined as a template, and instantiated
explicitly for two use cases:

1. to format to `fmt::context`
2. to format using `fmt::to_string()`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-05 14:00:13 +08:00
Kefu Chai
fc774361e8 cql3: add fmt::formatter for raw_value{,_view}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* raw_value
* raw_value_view

`raw_value_view`'s operator<< is still being used by the generic
homebrew printer for vector<>, so it is preserved.

`raw_value`'s operator<< is still being used by the generic
homebrew printer for optional<>, so it's preserved as well.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-05 14:00:13 +08:00
Kamil Braun
0a7854ea4d Merge 'test: test_topology_ops: fix flakiness and reenable bg writes' from Patryk Jędrzejczak
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.

Since scylladb/scylladb#15924 is the last issue mentioned in
scylladb/scylladb#15962, this PR also reenables background
writes in `test_topology_ops` with tablets disabled. The test
doesn't pass with tablets and background writes because of
scylladb/scylladb#17025. We will reenable background writes
with tablets after fixing that issue.

Fixes scylladb/scylladb#15924
Fixes scylladb/scylladb#15962

Closes scylladb/scylladb#17585

* github.com:scylladb/scylladb:
  test: test_topology_ops: reenable background writes without tablets
  test: test_topology_ops: run with and without tablets
  test: topology: decrease the server's request timeouts
2024-03-04 20:57:24 +01:00
Patryk Jędrzejczak
f1d9248df9 test: wait for CDC generations publishing before checking CDC-topology consistency
Tests that verify upgrading to the raft-based topology
(`test_topology_upgrade`, `test_topology_recovery_basic`,
`test_topology_recovery_majority_loss`) have flaky
`check_system_topology_and_cdc_generations_v3_consistency` calls.
`assert topo_results[0] == topo_res` can fail because of different
`unpublished_cdc_generations` on different nodes.

The upgrade procedure creates a new CDC generation, which is later
published by the CDC generation publisher. However, this can happen
after the upgrade procedure finishes. In tests, if publishing
happens just before querying `system.topology` in
`check_system_topology_and_cdc_generations_v3_consistency`, we can
observe different `unpublished_cdc_generations` on different nodes.
It is an expected and temporary inconsistency.

For the same reasons,
`check_system_topology_and_cdc_generations_v3_consistency` can
fail after adding a new node.

To make the tests not flaky, we wait until the CDC generation
publisher finishes its job. Then, all nodes should always have
equal (and empty) `unpublished_cdc_generations`.

Fixes scylladb/scylladb#17587
Fixes scylladb/scylladb#17600
Fixes scylladb/scylladb#17621

Closes scylladb/scylladb#17622
2024-03-04 19:28:51 +02:00
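The fix described above amounts to polling until a condition holds before asserting on cluster state. A minimal sketch of such a helper follows; `wait_until` and the in-memory `unpublished` list are hypothetical stand-ins, not the helper or the `system.topology` query the test suite actually uses.

```python
import asyncio
import time


async def wait_until(predicate, timeout=30.0, period=0.1):
    """Poll an async predicate until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if await predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before the deadline")
        await asyncio.sleep(period)


async def demo():
    # Stand-in for unpublished_cdc_generations draining over time.
    unpublished = ["gen-1", "gen-2"]

    async def all_published():
        if unpublished:
            unpublished.pop()  # pretend the publisher made progress
        return not unpublished

    await wait_until(all_published, timeout=5.0, period=0.01)
    return unpublished
```

Only once the wait returns would a consistency check compare nodes, so a late publish can no longer make them disagree.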
Kamil Braun
ec1f574b3a test/pylib: util: silence exception from refresh_nodes
Driver's `refresh_nodes` function may throw an exception if we call it
in the middle of driver reconnecting. Silence it.

Fixes scylladb/scylladb#17616

Closes scylladb/scylladb#17620
2024-03-04 17:50:16 +02:00
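Silencing an exception from a best-effort call, as the commit above does, could look roughly like this. It is a sketch under assumptions: `refresh_nodes_quietly` is a hypothetical wrapper, and `refresh` stands for whatever callable triggers the driver's refresh.

```python
import logging

logger = logging.getLogger(__name__)


def refresh_nodes_quietly(refresh):
    """Call `refresh`; log and swallow any exception instead of propagating it."""
    try:
        refresh()
    except Exception as exc:
        # The refresh is best-effort here, so a failure mid-reconnect is ignored.
        logger.debug("ignoring failure while refreshing nodes: %s", exc)
        return False
    return True
```

Logging the swallowed exception at debug level keeps the failure visible when investigating a flaky run.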
Avi Kivity
e3de30f943 tools: toolchain: update frozen toolchain for python driver 3.26.7
Fixes scylladb/scylladb#16709
Fixes scylladb/scylladb#17353

Closes scylladb/scylladb#17604
2024-03-03 16:36:14 +02:00
Kefu Chai
4cc5fcde72 cql3: add fmt::formatter for std::vector<data_type>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for std::vector<data_type>,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-02 10:52:50 +08:00
Kefu Chai
ed6dc6e3b4 cql3: add fmt::formatter for untyped_constant::type_class
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for untyped_constant::type_class,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-02 10:52:50 +08:00
Kefu Chai
213d13a31c cql3: add fmt::formatter for column_identifier{,_row}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* column_identifier
* column_identifier_raw

and their operator<< overloads are dropped.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-02 10:52:50 +08:00
Marcin Maliszkiewicz
eb56ae3bb9 test: extend auth-v2 migration test to catch stale static 2024-03-01 16:31:04 +01:00
Marcin Maliszkiewicz
6c30dc6351 test: add auth-v2 migration test 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
53996e2557 test: add auth-v2 snapshot transfer test 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
4f65e173cf test: auth: add tests for lost quorum and command splitting
With auth-v2 we can log in even if quorum is lost. So the test
which checks that an error occurs in such a situation is deleted,
and the opposite test, which checks that logging in works, is
added.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
a5f81f0836 test: pylib: disconnect driver before re-connection 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
1badd09d45 test: adjust tests for auth-v2 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
ebb0ffeb6c auth: implement auth-v2 migration
During the raft topology upgrade procedure, data from the
system_auth keyspace will be migrated to system_auth_v2.

Migration works mostly on top of the CQL layer to minimize the
amount of new code introduced: it mostly executes SELECTs
on old tables and then INSERTs on new tables. Writes are
not executed as usual but rather announced via raft.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
a8175ce5c6 auth: remove static from queries on auth-v2 path
Because the keyspace is part of the query, the query should
change when we migrate from v1 to v2. Otherwise the code
would operate on the old keyspace if those statics
were already initialized.

Likewise, the keyspace name can no longer be a class
field initialized in the constructor, as it can change
during the class lifetime.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
ca488c5777 auth: coroutinize functions in password_authenticator
Affected functions are: create, create_default_if_missing,
authenticate, alter, drop
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
9f172f1843 auth: coroutinize functions in standard_role_manager
Affected functions are: find_record, create_default_role_if_missing,
create_or_replace, drop, modify_membership, query_all, get_attribute,
set_attribute, remove_attribute
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
896b474db0 auth: coroutinize functions in default_authorizer
Affected functions: authorize, list_all, revoke_all
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
5a6d4dbc37 storage_service: add support for auth-v2 raft snapshots
This patch adds new RPC for pulling snapshot of auth tables.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
c27a84d8e7 storage_service: extract getting mutations in raft snapshot to a common function 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
17572a0e44 auth: service: capture string_view by value
This doesn't seem to fix anything but typically
we capture string_view by value, so do it consistently
the same way.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
9cb1f111d5 alternator: add support for auth-v2
Alternator doesn't do any writes to auth
tables, so it's simply a change of keyspace
name.

Docs will be updated later, when auth-v2
is enabled as default.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
913a773b1a auth: add auth-v2 write paths
All auth modifications will now go via group0.
This is achieved by acquiring the group0 guard,
creating mutations without executing them, and
then announcing them.

Actually, the first guard is taken by the query processor;
it serves as a read barrier for query validations
(such as standard_role_manager::exists), otherwise
we could read older data. In principle this single
guard should be used for the entire query, but that is impossible
to achieve with the current code without a major refactor.

For read-before-write cases it's good to do the write with
the guard acquired before the read, so that no
modify operation is allowed in between.
Not doing it, however, doesn't make the implementation
worse than it currently is, so the most complex cases
were left with FIXME.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
7f204a6e80 auth: add raft_group0_client as dependency
Most auth classes need this to be able to announce
raft commands.

Usage added in subsequent commit.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
bd444ed6f1 cql3: auth: add a way to create mutations without executing
To make table modifications go via raft we need to publish
mutations. Currently many system tables (especially auth) use
CQL to generate table modifications. The added function is the missing
link which will allow a seamless transition of certain
system tables to raft.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
b482679857 cql3: run auth DML writes on shard 0 and with raft guard
Because we'll be doing group0 operations we need to run on shard 0. Additional benefit
is that with needs_guard set query_processor will also do automatic retries in case of
concurrent group0 operations.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
5607aa590e service: don't loose service_level_controller when bouncing client_state
When bounce_to_shard happens we need to fill client_state with
the sl_controller appropriate for the destination shard.

Before the patch, sl_controller was set to null after the bounce.
That was fine because it looks like it was never used in such a scenario.
With auth-v2 we need to bounce attach/detach service level statements
because they modify things via the auth subsystem, which needs to be called
on shard 0.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
e26e786340 auth: put system_auth and users consts in legacy namespace
This is done to clearly mark legacy (no longer used, once auth-v2
feature becomes default) code paths.
2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
661eec6e07 cql3: parametrize keyspace name in auth related statements 2024-03-01 16:25:11 +01:00
Marcin Maliszkiewicz
6728965869 auth: parametrize keyspace name in roles metadata helpers 2024-03-01 16:25:03 +01:00
Marcin Maliszkiewicz
f9b985b68c auth: parametrize keyspace name in password_authenticator
It's the same approach as done for standard_role_manager in
earlier commit.
2024-03-01 16:24:54 +01:00
Marcin Maliszkiewicz
1901b1c808 auth: parametrize keyspace name in standard_role_manager
It's the same approach as done for default_authorizer in
earlier commit.

Note that only non-legacy paths were changed. In particular,
legacy migrations and table creations will never be executed
in the new keyspace, as they will be managed by the system_auth_keyspace
implementation.

For now we add the keyspace name as a class member because it's a static
value anyway. But statics will be removed in future commits because
migration can occur and auth needs to switch the keyspace name at runtime.
2024-03-01 16:24:32 +01:00
Marcin Maliszkiewicz
12d7b40b34 auth: remove redundant consts auth::meta::*::qualified_name
Just follow the same pattern as in default_authorizer so
it's easy to track where system_auth keyspace is actually
used. It will also allow for easier parametrization.
2024-03-01 16:24:32 +01:00
Marcin Maliszkiewicz
ae2d8975b9 auth: parametrize keyspace name in default_authorizer
When adding group0 replication for auth we will change only
the write path and plan to reuse the read path. To avoid copying the code
or making the class hierarchy more complicated, default_authorizer's
read code will remain unchanged except for this parametrization,
which is needed as the group0 implementation uses a separate keyspace
(replication is defined on a keyspace level).

In subsequent commits the legacy write path code will be separated
and the new implementation placed in default_authorizer.

For now we add the keyspace name as a class member because it's a static
value anyway. But statics will be removed in future commits because
migration can occur and auth needs to switch the keyspace name at runtime.
2024-03-01 16:22:17 +01:00
Gleb Natapov
94cd235888 topology_coordinator: drop group0 guard while changing raft configuration
Changing config under the guard can cause a deadlock.

The guard holds _read_apply_mutex. The same lock is held by the group0
apply() function. It means that no entry can be applied while the guard
is held, and the raft apply fiber may even be sleeping, waiting for this lock
to be released. A configuration change, OTOH, waits for the config change
command to be committed before returning, but the way raft is implemented,
commit notifications are triggered from the apply fiber, which may
be stuck. Deadlock.

Drop and re-take guard around configuration changes.

Fixes scylladb/scylladb#17186
2024-03-01 11:20:15 +01:00
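The deadlock described above, and the drop-and-re-take fix, can be reproduced in miniature with a plain lock and an event. This is only an analogy using Python threads, under stated assumptions: `change_config`, `mutex` and `committed` are hypothetical stand-ins for the group0 guard, `_read_apply_mutex` and the commit notification, and a timeout is used so the stuck case returns instead of hanging.

```python
import threading


def change_config(drop_guard: bool, timeout: float = 0.5) -> bool:
    """Return True if the 'commit' was observed before the timeout."""
    mutex = threading.Lock()       # stands in for _read_apply_mutex
    committed = threading.Event()  # stands in for the commit notification

    def apply_fiber():
        # The apply fiber needs the mutex before it can apply and notify.
        with mutex:
            committed.set()

    mutex.acquire()                # the caller holds the guard
    worker = threading.Thread(target=apply_fiber, daemon=True)
    worker.start()
    if drop_guard:
        mutex.release()            # drop the guard before waiting
    ok = committed.wait(timeout)   # wait for the change to be committed
    if drop_guard:
        mutex.acquire()            # re-take the guard afterwards
    mutex.release()
    worker.join()
    return ok
```

With `drop_guard=False` the waiter holds the lock the notifier needs, so the wait can only end by timing out; with `drop_guard=True` the notification arrives.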
Marcin Maliszkiewicz
d3679de1d2 db: make all system_auth_v2 tables use schema commitlog 2024-03-01 10:40:29 +01:00
Marcin Maliszkiewicz
a706424825 db: add system_auth_v2 tables
Their schema is equivalent to legacy tables
in system_auth.
2024-03-01 10:40:29 +01:00
Marcin Maliszkiewicz
9144d8203b db: add system_auth_v2 keyspace
The new keyspace is added similarly to the system_schema keyspace:
it's registered via system_keyspace::make, which calls
all_tables to build its schema.

Dummy table 'roles' is added as keyspaces are being currently
registered by walking through their tables. Full table schemas
will be added in subsequent commits.

Change can be observed via cqlsh:

cassandra@cqlsh> describe keyspaces;

system_auth_v2  system_schema       system         system_distributed_everywhere
system_auth     system_distributed  system_traces

cassandra@cqlsh> describe keyspace system_auth_v2;

CREATE KEYSPACE system_auth_v2 WITH replication = {'class': 'LocalStrategy'}  AND durable_writes = true;

CREATE TABLE system_auth_v2.roles (
    role text PRIMARY KEY
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = 'comment'
    AND compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 604800
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
2024-03-01 10:40:29 +01:00
Kefu Chai
ca7f7bf8e2 build: cmake: generate deb packages under build/$<CONFIG>/debian
this follows the convention of configure.py, which puts
debian packages under build/$mode/debian.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-03-01 09:50:30 +08:00
Patryk Jędrzejczak
e7d4e080e9 test: test_topology_ops: reenable background writes without tablets
After fixing scylladb/scylladb#15924 in one of the previous
patches, we reenable background writes in `test_topology_ops`.

We also start background writes a bit later after adding all nodes.
Without this change and with tablets, the test fails with:
```
>       await cql.run_async(f"CREATE TABLE tbl (pk int PRIMARY KEY, v int)")
E       cassandra.protocol.ConfigurationException: <Error from server: code=2300
        [Query invalid because of configuration issue] message="Datacenter
        datacenter1 doesn't have enough nodes for replication_factor=3">
```

The change above makes the test a bit weaker, but we don't have to
worry about it. If adding nodes is bugged, other tests should
detect it.

Unfortunately, the test still doesn't pass with tablets and
background writes because of scylladb/scylladb#17025, so we keep
background writes disabled with tablets and leave FIXME.

Fixes scylladb/scylladb#15962
2024-02-29 18:37:41 +01:00
Patryk Jędrzejczak
90317c5ceb test: test_topology_ops: run with and without tablets
`test_topology_ops` is a valuable test that has uncovered many bugs.
It's worth running it with and without tablets.
2024-02-29 18:37:41 +01:00
Patryk Jędrzejczak
9dfb26428b test: topology: decrease the server's request timeouts
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.

A high server's request timeout can slow down the topology tests
(see the new comment in `make_scylla_conf`). We make the timeout
dependent on the testing mode to not slow down tests for no reason.

We don't touch the driver's request timeout. Decreasing it in some
modes would require too much effort for almost no improvement.

Fixes scylladb/scylladb#15924
2024-02-29 18:37:38 +01:00
Gleb Natapov
4ef57096bc topology coordinator: fix use after free after streaming failure
node.rs pointer can be freed while guard is released, so it cannot be
accessed during error processing. Save state locally.

Fixes #17577

Message-ID: <Zd9keSwiIC4v_EiF@scylladb.com>
2024-02-29 18:27:12 +02:00
Kamil Braun
57b14580f0 Merge 'move migration_request handling to shard0' from Gleb
The RPC is used by group0 now which is available only on shard0

Fixes scylladb/scylladb#17565

* 'gleb/migration-request-shard0' of github.com:scylladb/scylla-dev:
  raft_group0_client: assert that hold_read_apply_mutex is called on shard 0
  migration_manager: fix indentation after the previous patch.
  messaging_service: process migration_request rpc on shard 0
2024-02-29 15:13:16 +01:00
Anna Stuchlik
85cfc6059b doc: replace "Raft Topology" with "Consistent Topology"
This commit replaces "Raft-based Topology" with
"Consistent Topology Updates"
in the 5.4-to-6.0 upgrade guide and all the links to it.
2024-02-29 14:42:30 +01:00
Anna Stuchlik
9250e0d8e0 doc: (Raft topology) update Removenode
This commit updates the Nodetool Removenode page
with reference to the Raft-related topology.
Specifically, it removes outdated warnings, and
adds the information about banning removed and ignored
nodes from the cluster.
2024-02-29 14:40:19 +01:00
Anna Stuchlik
d59f38a6ad doc: (Raft topology) update Upscale a Cluster
This commit updates the Upscale a Cluster page
with reference to the Raft-related topology.
Specifically, it adds a note with the quorum requirement.
2024-02-29 14:40:11 +01:00
Anna Stuchlik
5bece99d4d doc:(Raft topology)update Membership Change Failures
This commit updates the Handling Cluster Membership Change Failures page
with reference to the Raft-related topology.
Specifically, it adds a note that the page only applies when
Raft-based topology is not enabled.
In addition, it removes the Raft-enabled option.
2024-02-29 14:38:45 +01:00
Anna Stuchlik
48dd7021a7 doc: doc: (Raft topology) update Replace Dead Node
This commit updates the Replace a Dead Node page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to replace the nodes one by one and the requirement to ensure
that the replaced node will never come back to the cluster.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
2024-02-29 14:38:45 +01:00
Anna Stuchlik
a390ce9e6b doc: (Raft topology) update Remove a Node
This commit updates the Remove a Node page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to remove the nodes one by one and the requirement to ensure
that the removed node will never come back to the cluster.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
2024-02-29 14:38:45 +01:00
Anna Stuchlik
59f890c0ef doc: (Raft topology) update Add a New DC
This commit updates the Add a New DC page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
2024-02-29 14:38:36 +01:00
Anna Stuchlik
5a3a720b82 doc: (Raft topology) update Add a New Node
This commit updates the Add a New Node (Out Scale) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
2024-02-29 14:35:03 +01:00
Anna Stuchlik
631fcebe12 doc: (Raft topology) update Create Cluster (EC2)
This commit updates the Create Cluster (EC2) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.

In addition, it updates the concept of the seed node.
2024-02-29 14:30:00 +01:00
Anna Stuchlik
b6b610c16e doc: (Raft topology) update Create Cluster (n-DC)
This commit updates the Create Cluster (Multi DC) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.

In addition, it updates the concept of the seed node.
2024-02-29 14:30:00 +01:00
Anna Stuchlik
cbf054f2b9 doc: (Raft topology) update Create Cluster (1DC)
This commit updates the Create Cluster (Single DC) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.

In addition, it updates the concept of the seed node.
2024-02-29 14:30:00 +01:00
Anna Stuchlik
57e0f15c7c doc: include the quorum requirement file
Include the file to avoid repetition.
2024-02-29 14:29:39 +01:00
Gleb Natapov
9847e272f9 raft_group0_client: assert that hold_read_apply_mutex is called on shard 0
group0 operations are valid on shard 0 only. Assert that.
2024-02-29 12:39:48 +02:00
Gleb Natapov
77907b97f1 migration_manager: fix indentation after the previous patch. 2024-02-29 12:39:48 +02:00
Gleb Natapov
4a3c79625f messaging_service: process migration_request rpc on shard 0
Commit 0c376043eb added access to the group0
semaphore, which can be done on shard0 only. Unlike all other group0 rpcs
(which are already always forwarded to shard0), migration_request is not,
since it is an rpc that was reused from non-raft days. The patch adds
the missing jump to shard0 before executing the rpc.
2024-02-29 12:39:48 +02:00
Petr Gusev
6afa80a443 sync_raft_topology_nodes: do no emit REMOVED_NODE on IP change
Calling notify_left for old ip on topology change in raft mode
was a regression. In gossiper mode it didn't occur. In gossiper
mode the function handle_state_normal was responsible for spotting
IP addresses that weren't managing any parts of the data, and
it would then initiate their removal by calling remove_endpoint.
This removal process did not include calling notify_left.
Actually, notify_left was only supposed to be called (via excise) by
'real' removal procedures - removenode and decommission.

The redundant notify_left caused trouble in the scylla python driver.
The driver could receive REMOVED_NODE and NEW_NODE notifications
at the same time and their handling routines could race with each other.

In this commit we fix the problem by not calling notify_left if
the remove_ip lambda was called from the ip change code path.
Also, we add a test which verifies that the driver log doesn't
mention the REMOVED_NODE notification.

Fixes scylladb/scylladb#17444

Closes scylladb/scylladb#17561
2024-02-29 10:18:20 +01:00
Kefu Chai
ce45f93caf tools/scylla-nodetool: print separator between samplings
instead of printing it out after samplings, we should print it
in between them, as toppartitions_test.py in dtest splits the
samplings using "\n\n". without this change, dtest would consider
the empty line as another sampling and then fail the test, as
the empty sampling does not match the expected regular expressions.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-29 16:17:44 +08:00
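The difference between a trailing separator and an in-between separator, as described in the commit above, shows up as soon as the output is split on `"\n\n"`. A small sketch, with hypothetical function names and placeholder sampling text rather than real nodetool output:

```python
def render_trailing(samplings):
    """Old behavior as described: emit the separator after every sampling."""
    return "".join(s + "\n\n" for s in samplings)


def render_between(samplings):
    """New behavior as described: emit the separator only between samplings."""
    return "\n\n".join(samplings)


samplings = ["sampling A", "sampling B"]
pieces_old = render_trailing(samplings).split("\n\n")
pieces_new = render_between(samplings).split("\n\n")
```

Here `pieces_old` ends with an extra empty string, which is the spurious "sampling" a consumer splitting on `"\n\n"` would trip over, while `pieces_new` contains exactly the two inputs.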
Kefu Chai
a53457f740 tools/scylla-nodetool: only print the specified sampling
before this change, we print all samplings returned by the API,
but this is not cassandra nodetool's behavior, which only
prints out the specified one. and the toppartitions_test.py
in dtest actually expects that the number of samplings should
match the one specified on the command line.

so, in this change, we only print out the specified samplings.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-29 16:17:44 +08:00
Kefu Chai
604c7440d2 tools/scylla-nodetool: use /storage_service/toppartition/
instead of using the endpoint of /storage_service/toppartition,
use /storage_service/toppartition/. otherwise the API server refuses
to return the expected result, as the former does not match any API endpoint.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-29 16:17:44 +08:00
Anna Stuchlik
b02f8a0759 doc: add the quorum requirement file 2024-02-28 13:21:11 +01:00
Botond Dénes
60e04e2c59 test/cql-pytest: test_select_from_mutation_fragments.py: move away from memtables
Memtables are fickle, they can be flushed when there is memory pressure,
if there is too much commitlog or if there is too much data in them.
The tests in test_select_from_mutation_fragments.py currently assume
data written is in the memtable. This is true most of the time but we
have seen some odd test failures that couldn't be understood.
To make the tests more robust, flush the data to the disk and read it
from the sstables. This means that some range scans need to filter to
read from just a single mutation source, but this does not influence
the tests.
2024-02-28 07:00:25 -05:00
Botond Dénes
c228e4d518 cql3: select_statement: mutation_fragments_select_statement: fix use-after-return
Don't capture stack variables by reference... it can (and will) explode
in your face.
2024-02-28 06:48:09 -05:00
Kefu Chai
9dbc30a385 build: cmake: add ${CMAKE_BINARY_DIR}/debian as a dep
create-relocatable-package.py packages debian packaging as well,
so we have to add it as a dependency for the targets which
use `create-relocatable-package.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-28 16:09:48 +08:00
Kefu Chai
a1cd019e50 build: cmake: use build/$<CONFIG>/ instead of build
with multi-config generator, the generated artifacts are located
under ${CMAKE_BINARY_DIR}/$<CONFIG>/ instead of ${CMAKE_BINARY_DIR}.
so update the paths referencing the built executables. and update
the `--build-dir` option of `create-relocatable-package.py` accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-28 16:09:48 +08:00
Kefu Chai
bf9a895c09 build: cmake: always pass absolute path for add_stripped()
before this change, we assumed that the $<TARGET_FILE:${name}>
is the path to the parameter passed to this function, but this was
wrong. it actually refers to the `TARGET` argument of the keyword
of this function. also, the path to the generated files should
be located under path like "build/Debug" instead of "build" if
multi-config generator is used. as multi-config builds share
the same `${CMAKE_BINARY_DIR}`.

in this change, instead of accepting a CMake target, we always
accept an absolute path, and use "${CMAKE_BINARY_DIR}/$<CONFIG>"
for the directory of the executable. this should work for the
multi-config generator, which is used by `configure.py` when
CMake is used to build the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-28 16:09:48 +08:00
Raphael S. Carvalho
305c63c629 test: test_tablets: Add load-and-stream test
stresses concurrent migration and stream.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 15:18:21 -03:00
Raphael S. Carvalho
771cbf9b79 sstables_loader: Stream to pending tablet replica if needed
Even though taking erm blocks migration, it cannot prevent
load-and-stream from starting while a migration is going on; erm
only prevents migration from advancing.

With tablets, new data will be streamed to pending replica too if
the write replica selector, in transition metadata, is set to both.
If migration is at a later stage where only new replica is written
to, then data is streamed only to new replica as selector is set
to next (== new replica set).

primary_replica_only flag is handled by only streaming to pending
if the primary replica is the one leaving through migration.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 15:17:05 -03:00
Avi Kivity
616eec2214 Merge ' test/topology_custom: test_read_repair.py: reduce run-time ' from Botond Dénes
This test needed a lot of data to ensure multiple pages when doing the read repair. This change adjusts two key configuration items, allowing for a drastic reduction of the data size and consequently a large reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa, reducing this to a too small value would start killing internal queries. Now, after said commit, this is no longer a concern, as this limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.

The latter configuration is a new one, added by the first patches of this series. It allows configuring the page-size in bytes, after which pages are cut. Previously this was a hard-coded constant: 1MB. This forced any tests which wanted to check paging, with pages cut on size, to work with large datasets. This was especially pronounced in the tests fixed in this PR, because this test works with tombstones which are tiny and a lot of them were needed to trigger paging based on the size.
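
To illustrate why a smaller byte limit lets the test use far less data, here is a rough model of size-based page cutting. It is only a sketch: `pages_needed` and the per-row size are hypothetical, and the real paging logic in Scylla is more involved.

```cpp
#include <cstddef>

// Hypothetical model: rows are appended to the current page until the
// accumulated size reaches `page_bytes`, at which point the page is cut.
constexpr std::size_t pages_needed(std::size_t rows,
                                   std::size_t row_bytes,
                                   std::size_t page_bytes) {
    std::size_t pages = 0;
    std::size_t in_page = 0;  // bytes accumulated in the current page
    for (std::size_t i = 0; i < rows; ++i) {
        in_page += row_bytes;
        if (in_page >= page_bytes) {  // limit reached: cut the page
            ++pages;
            in_page = 0;
        }
    }
    if (in_page > 0) {
        ++pages;  // last, partially filled page
    }
    return pages;
}
```

Under this model, and assuming 100-byte rows, 100 rows fit in a single page with a 1MB limit but span several pages with a 1KB limit, so a small dataset is enough to exercise paging.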

With these two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8

The runtime of the test consequently drops from 62 seconds to 13.5 seconds (dev mode, on my build machine).

Fixes: https://github.com/scylladb/scylladb/issues/15425
Fixes: https://github.com/scylladb/scylladb/issues/16899

Closes scylladb/scylladb#17529

* github.com:scylladb/scylladb:
  test/topology_custom: test_read_repair.py: reduce run-time
  replica/database: get_query_max_result_size(): use query_page_size_in_bytes
  replica/database: use include page-size in max-result-size
  query-request: max_result_size: add without_page_limit()
  db/config: introduce query_page_size_in_bytes
2024-02-27 18:54:38 +02:00
Aleksandra Martyniuk
9dcb5c76d6 test: rest_api: enable tablets by default
Enable tablets by default. Add --vnodes flag to test/rest_api/run
to run tests without tablets.
2024-02-27 17:46:30 +01:00
Aleksandra Martyniuk
92d87eb1f7 test: fix indentation and delete unused this_dc param 2024-02-27 17:37:31 +01:00
Aleksandra Martyniuk
9cca241ec6 test: rest_api: fix test_storage_service.py
Fix test_storage_service.py to work with tablets.

- test_describe_ring was failing because in storage_service/describe_ring
  table must be specified for keyspaces with tablets.
  Do not check the status if tablets are enabled. Add checks for
  specified table;
- test_storage_service_keyspace_cleanup_with_no_owned_ranges
  was failing because cleanup is disabled on keyspaces with tablets.
  Use test_keyspace_vnodes fixture to use keyspace with tablets disabled;
- test_storage_service_get_natural_endpoints required
  some minor type-related fixes.
2024-02-27 17:34:40 +01:00
Aleksandra Martyniuk
aee0257051 test: rest_api: fix test_repair_task.py
Injection set in test_repair_task_progress didn't consider the case
when repair::shard_repair_task_impl::ranges_size() == 1 which is
true for tablets.

Move the injection so that it is triggered before number of complete
ranges is increased.
2024-02-27 17:33:59 +01:00
Aleksandra Martyniuk
6210c210ff test: rest_api: fix test_compaction_task.py
Fix test_compaction_task.py to work with tablets.

Currently the tests fail because cleanup on keyspaces with tablets is
disabled, and reshape and reshard of keyspaces with tablets use
load_and_stream which isn't covered by tasks.

Use test_keyspace_vnodes for these tests to have a keyspace with
tablets disabled.
2024-02-27 17:32:24 +01:00
Aleksandra Martyniuk
a996ed8be9 test: rest_api: use skip_without_tablets fixture
Use skip_without_tablets in tests that can be run only with tablets
enabled. Delete xfails for these tests.
2024-02-27 17:12:04 +01:00
Aleksandra Martyniuk
1fbe76814e test: rest_api: add some tablet related fixtures
Add fixtures for checking if tablets are enabled or skipping a test
if they are/aren't enabled.
2024-02-27 17:11:57 +01:00
Raphael S. Carvalho
ab498489fe sstables_loader: Implement tablet based load-and-stream
Similar treatment to repair is given to load-and-stream.

Jumps into a new streaming session for every tablet, so we guarantee
data will be segregated into tablets co-habiting the same shard.

Fixes #17315.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 13:04:20 -03:00
Nadav Har'El
fc861742d7 cql: avoid undefined behavior in totimestamp() of extreme dates
This patch fixes a UBSAN-reported integer overflow during one of our
existing tests,

   test_native_functions.py::test_mintimeuuid_extreme_from_totimestamp

when attempting to convert an extreme "date" value, millions of years
in the past, into a "timestamp" value. When UBSAN crashing is enabled,
this test crashes before this patch, and succeeds after this patch.

The "date" CQL type is 32-bit count of *days* from the epoch, which can
span 2^31 days (5 million years) before or after the epoch. Meanwhile,
the "timestamp" type measures the number of milliseconds from the same
epoch, in 64 bits. Luckily (or intentionally), every "date", however
extreme, can be converted into a "timestamp": This is because 2^31 days
is 1.85e17 milliseconds, well below timestamp's limit of 2^63 milliseconds
(9.2e18).

But it turns out that our conversion function, date_to_time_point(),
used some boost::gregorian library code, which carried out these
calculations in **microsecond** resolution. The extra conversion to
microseconds wasn't just wasteful, it also caused an integer overflow
in the extreme case: 2^31 days is 1.85e20 microseconds, which does NOT
fit in a 64-bit integer. UBSAN notices this overflow, and complains
(plus, the conversion is incorrect).

The fix is to do the trivial conversion on our own (a day is, by
convention, exactly 86400 seconds - no fancy library is needed),
without the grace of Boost. The result is simpler, faster, correct
for the Pliocene-age dates, and fixes the UBSAN crash in the test.
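
The arithmetic above can be sketched as follows. This is an illustration under stated assumptions, not the patched date_to_time_point(): the helper names are hypothetical, and `days` is taken to be a signed day offset from the epoch.

```cpp
#include <cstdint>
#include <limits>

// A day is, by convention, exactly 86400 seconds.
constexpr std::int64_t millis_per_day = 86400LL * 1000;

// Hypothetical helper: convert a signed day offset from the epoch to
// milliseconds. Widening to 64 bits before multiplying keeps even the
// most extreme 32-bit day count in range (about 1.85e17 ms).
constexpr std::int64_t days_to_timestamp_ms(std::int32_t days) {
    return static_cast<std::int64_t>(days) * millis_per_day;
}

// Hypothetical helper: would the same value overflow int64 if expressed
// in microseconds? Checked by comparing against max/1000 instead of
// multiplying, so the check itself cannot overflow.
constexpr bool micros_would_overflow(std::int32_t days) {
    const std::int64_t ms = days_to_timestamp_ms(days);
    const std::int64_t limit = std::numeric_limits<std::int64_t>::max() / 1000;
    return ms > limit || ms < -limit;
}
```

For the most extreme day count the millisecond result fits comfortably, while the microsecond intermediate does not, which is the overflow UBSAN reported.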

Fixes #17516

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17527
2024-02-27 17:04:18 +02:00
Raphael S. Carvalho
b9158e36ef sstables_loader: Virtualize sstable_streamer for tablet
virtualization allows for tablet version of streaming.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:30:14 -03:00
Raphael S. Carvalho
3523cc8063 sstables_loader: Avoid reallocations in vector
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:28:11 -03:00
Raphael S. Carvalho
d1db17d490 sstable_loader: Decouple sstable streaming from selection
That will make it easy to introduce tablet-based load-and-stream.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:28:11 -03:00
Raphael S. Carvalho
0a41f2a11f sstables_loader: Introduce sstable_streamer
Will make it easier to implement tablet oriented variant.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:28:11 -03:00
Raphael S. Carvalho
21533aff0f Fix online SSTable loading with concurrent tablet migration
load-and-stream is currently the only method -- for tablets -- that
can load SSTables while the node is online.
Today, sstable_directory relies on replication map (erm) not being
invalidated during loading, and the assumption is broken with
concurrent tablet migration.
It causes load-and-stream to segfault.

The sstable loader needs the sharder from erm in order to compute
the owning shard.

To fix, let's use auto_refreshing_sharder, which refreshes sharder
every time table has replication map updated. So we guarantee any
user of sharder will find it alive throughout the lifetime of
sstable_directory.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:27:07 -03:00
Gleb Natapov
0c376043eb migration_manager: take group0 lock during raft snapshot taking
Group0 state machine access atomicity is guaranteed by a mutex in group0
client. Code that reads or writes the state needs to hold the lock. To
transfer schema part of the snapshot we used existing "migration
request" verb which did not follow the rule. Fix the code to take group0
lock before accessing schema in case the verb is called as part of
group0 snapshot transfer.

Fixes scylladb/scylladb#16821
2024-02-27 11:15:17 +01:00
Botond Dénes
5dc145a93f test/topology_custom: test_read_repair.py: reduce run-time
This test needed a lot of data to ensure multiple pages when doing the
read repair. This change adjusts two key configuration items, allowing
for a drastic reduction of the data size and consequently a large
reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa,
  reducing this to a too small value would start killing internal
  queries. Now, after said commit, this is no longer a concern, as this
  limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.

With these two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8

The runtime of the test consequently drops from 62 seconds to 13.5
seconds (dev mode, on my build machine).
2024-02-27 02:27:55 -05:00
Botond Dénes
7f3ca3a3d8 replica/database: get_query_max_result_size(): use query_page_size_in_bytes
As the page size for user queries, instead of the hard-coded constant
used before. For system queries, we keep using the previous constant.
2024-02-27 02:27:55 -05:00
Botond Dénes
8213e66815 replica/database: use include page-size in max-result-size
This patch changes get_unlimited_query_max_result_size():
* Also set the page-size field, not just the soft/hard limits
* Renames it to get_query_max_result_size()
* Update callers, specifically storage_proxy::get_max_result_size(),
  which now has a much simpler common return path and has to drop the
  page size on one rare return path.

This is a purely mechanical change, no behaviour is changed.
2024-02-27 02:27:55 -05:00
Botond Dénes
97615e0d9a query-request: max_result_size: add without_page_limit()
Returns an instance with the page_limit reset to 0. This converts a
max_results_size which is usable only with the
"page_size_and_safety_limit" feature, to one which can be used before
this feature.
To be used in the next patch.
2024-02-27 02:14:46 -05:00
Botond Dénes
5e37c1465f db/config: introduce query_page_size_in_bytes
Regulates the page size in bytes via config, instead of the currently
used hard-coded constant. Allows tests to configure lower limits so they
can work with smaller data-sets when testing paging related
functionality.
Not wired yet.
2024-02-27 02:14:45 -05:00
Kefu Chai
0fd85a98a9 mutation: add fmt::formatter for position_range
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `position_range`, and the
helpers for printing related types are dropped.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 20:15:57 +08:00
Kefu Chai
2f532b9ebc mutation: add fmt::formatter for mutation_fragment and range_tombstone_stream
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* mutation_fragment
* range_tombstone_stream

their operator<<:s are dropped

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 20:15:57 +08:00
Beni Peled
c06282b312 docs: always build from the default branch
In order to publish the docs-pages from release branches (see the other
commit), we need to make sure that docs is always built from the default
branch which contains the updated conf.py

Ref https://github.com/scylladb/scylladb/pull/17281
2024-02-26 11:48:38 +02:00
Beni Peled
f59f70fc58 docs: trigger the docs-pages workflow on release branches
Currently, the github docs-pages workflow is triggered only when changes
are merged to the master/enterprise branches, which means that in the
case of changes to a release branch, for example, a fix to branch-5.4,
or a branch-5.4>branch-2024.1 merge, the docs-pages workflow is not triggered and
therefore the documentation is not updated with the new change.

In this change, I added the `branch-**` pattern, so changes to release
branches will trigger the workflow.
2024-02-26 11:48:13 +02:00
Kefu Chai
1fe7a467e7 mutation: add fmt::formatter for mutation_fragment_v2::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for mutation_fragment_v2::printer

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 17:47:05 +08:00
Kefu Chai
3d6948c13e tools/scylla-nodetool: implement info
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 14:52:22 +08:00
Kefu Chai
4d8f74f301 test/nodetool: move format_size into utils.py
so that this helper can be shared across more tests. `test_info.py`
will be using it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 14:52:22 +08:00
Kefu Chai
cd228f4d6c docs: remove leading space in table element
otherwise sphinx would consider "Within which Data Center the"
as the "term" part of an entry in a definition list, and
"node is located" as the definition part of this entry.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 13:03:26 +08:00
Kefu Chai
d12655ff46 docs: remove space in words
* remove space in "Exceptions", otherwise it renders like "Except"
  "tions", which does not look right.
* remove space in "applicable".
* remove space in "Transport".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-26 13:03:26 +08:00
Kamil Braun
fd32e2ee10 Merge 'misc_services: fix data race from bad usage of get_next_version' from Piotr Dulikowski
The function `gms::version_generator::get_next_version()` can only be called from shard 0 as it uses a global, unsynchronized counter to issue versions. Notably, the function is used as a default argument for the constructor of `gms::versioned_value` which is used from shorthand constructors such as `versioned_value::cache_hitrates`, `versioned_value::schema` etc.

The `cache_hitrate_calculator` service runs a periodic job which updates the `CACHE_HITRATES` application state in the local gossiper state. Each time the job is scheduled, it runs on the next shard (it goes through shards in a round-robin fashion). The job uses the `versioned_value::cache_hitrates` shorthand to create a `versioned_value`, therefore risking a data race if it is not currently executing on shard 0.

The PR fixes the race by moving the call to `versioned_value::cache_hitrates` to shard 0. Additionally, in order to help detect similar issues in the future, a check is introduced to `get_next_version` which aborts the process if the function was called on a shard other than 0.

There is a possibility that it is a fix for #17493. Because `get_next_version` uses a simple incrementation to advance the global counter, a data race can occur if two shards call it concurrently and it may result in shard 0 returning the same or smaller value when called two times in a row. The following sequence of events is suspected to occur on node A:

1. Shard 1 calls `get_next_version()`, loads version `v - 1` from the global counter and stores in a register; the thread then is preempted,
2. Shard 0 executes `add_local_application_state()` which internally calls `get_next_version()`, loads `v - 1` then stores `v` and uses version `v` to update the application state,
3. Shard 0 executes `add_local_application_state()` again, increments version to `v + 1` and uses it to update the application state,
4. Gossip message handler runs, exchanging application states with node B. It sends its application state to B. Note that the max version of any of the local application states is `v + 1`,
5. Shard 1 resumes and stores version `v` in the global counter,
6. Shard 0 executes `add_local_application_state()` and updates the application state - again - with version `v + 1`.
7. After that, node B will never learn about the application state introduced in point 6. as gossip exchange only sends endpoint states with version larger than the previous observed max version, which was `v + 1` in point 4.
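
The suspected interleaving can be replayed deterministically. The sketch below is a single-threaded model of steps 1-6, assuming a plain load / add / store increment; `simulate_lost_update` and `issued_versions` are hypothetical names, and no real threads or Scylla code are involved.

```cpp
// Versions handed out to shard 0 in steps 2, 3 and 6.
struct issued_versions {
    int first;
    int second;
    int third;
};

// Hypothetical replay of the interleaving, starting with the global
// counter at v - 1.
constexpr issued_versions simulate_lost_update(int v) {
    int counter = v - 1;             // global, unsynchronized counter
    int shard1_register = counter;   // step 1: shard 1 loads v - 1, is preempted
    counter = counter + 1;           // step 2: shard 0 increments to v
    int first = counter;
    counter = counter + 1;           // step 3: shard 0 increments to v + 1
    int second = counter;
    counter = shard1_register + 1;   // step 5: shard 1 resumes, stores v
    counter = counter + 1;           // step 6: shard 0 increments to v + 1 again
    int third = counter;
    return issued_versions{first, second, third};
}
```

In this model the third version equals the second one, so an application state published in step 6 carries a version a peer has already observed as the maximum.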

Note that the above scenario was _not_ reproduced. However, I managed to observe a race condition by:

1. modifying Scylla to run update of `CACHE_HITRATES` much more frequently than usual,
2. putting an assertion in `add_local_application_state` which fails if the version returned by `get_next_version` was not larger than the previous returned value,
3. running a test which performs schema changes in a loop.

The assertion from the second point was triggered. While it's hard to tell how likely it is to occur without making updates of cache hitrates more frequent - not to mention the full theorized scenario - for now this is the best lead that we have, and the data race being fixed here is a real bug anyway.

Refs: #17493

Closes scylladb/scylladb#17499

* github.com:scylladb/scylladb:
  version_generator: check that get_next_version is called on shard 0
  misc_services: fix data race from bad usage of get_next_version
2024-02-25 19:35:34 +01:00
Gleb Natapov
59df47920b topology coordinator: fix use after free in rollback_to_normal state
node.rs pointer can be freed while guard is released, so it cannot be
accessed during error processing. Save state locally.

Fixes scylladb/scylladb#17402
CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/6993/

Message-ID: <ZdtJNJM056r4EZzz@scylladb.com>
2024-02-25 16:34:19 +02:00
Raphael S. Carvalho
f07c233ad5 Fix potential data resurrection when another compaction type does cleanup work
Since commit f1bbf70, many compaction types can do cleanup work, but turns out
we forgot to invalidate cache on their completion.

So if a node regains ownership of token that had partition deleted in its previous
owner (and tombstone is already gone), data can be resurrected.

Tablet is not affected, as it explicitly invalidates cache during migration
cleanup stage.

Scylla 5.4 is affected.

Fixes #17501.
Fixes #17452.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17502
2024-02-25 13:08:04 +02:00
Yaron Kaikov
493327afd8 [action] Add promoted label when commits are in master
In Scylla, we don't merge our PRs but use `./script/pull_github_pr.sh` to close the pull request, adding a `closes scylladb/scylladb` remark and pushing changes to the `next` branch.
One of the conditions for opening a backport PR is that all relevant commits are in master (passed gating). In this GitHub action, we go through the list of commits once a push is made to master, identify the relevant PR, and add the promoted label to it. This allows Mergify to start the backporting process.
2024-02-25 11:56:50 +02:00
Nadav Har'El
b4cef638ef Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define formatters for

* canonical_mutation
* atomic_cell_view
* atomic_cell
* atomic_cell_or_collection::printer

Refs #13245

Closes scylladb/scylladb#17506

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for canonical_mutation
  mutation: add fmt::formatter for atomic_cell_view and atomic_cell
  mutation: add fmt::formatter for atomic_cell_or_collection::printer
2024-02-25 09:48:56 +02:00
Kefu Chai
84ba624415 mutation: add fmt::formatter for canonical_mutation
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for canonical_mutation

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-25 12:48:13 +08:00
Kefu Chai
3625796222 mutation: add fmt::formatter for atomic_cell_view and atomic_cell
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* atomic_cell_view
* atomic_cell

and drop their operator<<:s.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-25 12:19:11 +08:00
Kefu Chai
b4fa32ec17 mutation: add fmt::formatter for atomic_cell_or_collection::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`atomic_cell_or_collection::printer`, and drop its operator<<.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-25 12:18:41 +08:00
Lakshmi Narayanan Sreethar
c7eab9329f test/topology_custom: add testcase to verify reshape with tablets
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 18:43:39 +05:30
Lakshmi Narayanan Sreethar
ed2d8529f3 test/pylib/rest_client: add get_sstable_info, enable/disable_autocompaction
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 18:43:39 +05:30
Lakshmi Narayanan Sreethar
7196d2fff4 replica/distributed_loader: enable reshape for sstables
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 18:43:39 +05:30
Lakshmi Narayanan Sreethar
83fecc2f1f compaction: reshape sstables within compaction groups
For tables using tablet based replication strategies, the sstables
should be reshaped only within the compaction groups they belong to.
Updated shard_reshaping_compaction_task_impl to group the sstables based
on their compaction groups before reshaping them within the groups.

Fixes #16966

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 18:43:39 +05:30
Piotr Dulikowski
54546e1530 version_generator: check that get_next_version is called on shard 0
The get_next_version function can only be safely called from shard 0,
but this constraint is not enforced in any way. As evidenced in the
previous commit, it is easy to accidentally call it from a non-zero
shard.

Introduce a runtime check to get_next_version which calls
on_fatal_internal_error if it detects that the function was called from
the wrong shard. This will let us detect cross-shard use issues in
runtime.
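
A minimal sketch of such a guard, under loud assumptions: the class name is hypothetical, the current shard is passed in as a parameter to keep the example self-contained (the real code would query the running shard), and a thrown exception stands in for on_fatal_internal_error.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a generator that may only be used on shard 0.
class version_generator_sketch {
    int _last = 0;  // unsynchronized counter, safe only on a single shard
public:
    constexpr int get_next_version(unsigned current_shard) {
        if (current_shard != 0) {
            // Stand-in for on_fatal_internal_error(): fail loudly instead
            // of silently racing on the counter.
            throw std::logic_error("get_next_version called on a non-zero shard");
        }
        return ++_last;
    }
};

// Usage example: two consecutive calls on shard 0 yield 1 and then 2.
constexpr int second_version_on_shard0() {
    version_generator_sketch gen{};
    gen.get_next_version(0);
    return gen.get_next_version(0);
}
```

Calling `get_next_version(1)` in this sketch throws, which makes a cross-shard caller visible at the call site rather than as a rare versioning anomaly later.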
2024-02-23 13:49:49 +01:00
Piotr Dulikowski
21d5d4e15c misc_services: fix data race from bad usage of get_next_version
The function `gms::version_generator::get_next_version()` can only be
called from shard 0 as it uses a global, unsynchronized counter to
issue versions. Notably, the function is used as a default argument for
the constructor of `gms::versioned_value` which is used from shorthand
constructors such as `versioned_value::cache_hitrates`,
`versioned_value::schema` etc.

The `cache_hitrate_calculator` service runs a periodic job which
updates the `CACHE_HITRATES` application state in the local gossiper
state. Each time the job is scheduled, it runs on the next shard (it
goes through shards in a round-robin fashion). The job uses the
`versioned_value::cache_hitrates` shorthand to create a
`versioned_value`, therefore risking a data race if it is not currently
executing on shard 0.

Fix the race by constructing the versioned value on shard 0.
2024-02-23 12:54:32 +01:00
Kefu Chai
496cf9a1d8 interval: add fmt::formatters for managed_bytes and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* wrapping_interval
* interval

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17488
2024-02-23 10:26:30 +02:00
Nadav Har'El
0aaa6b1a08 fmt: add formatter for mutation_fragment_v2::kind
Unfortunately, fmt v10 dropped support for operator<< formatters,
forcing us to replace the huge number of operator<< implementations
in our code by uglier and templated fmt::formatter implementations
to get Scylla to compile on modern distros (such as Fedora 39) :-(

Kefu has already started doing this migration, here is my small
contribution - the formatter for mutation_fragment_v2::kind.
This patch is needed to compile, for example,
build/dev/mutation/mutation_fragment_stream_validator.o.

I can't remove the old operator<< because it's still used by
the implementation of other operator<< functions. We can remove
all of them when we're done with this conversion. In the meantime,
I replaced the original implementation of operator<< by a trivial
implementation just passing the work to the new fmt::print support.

Refs #13245

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17432
2024-02-23 10:25:39 +02:00
Botond Dénes
c1267900c6 Merge 'sstables: add fmt::formatter for sstable types' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* bound_kind_m
* sstable_state
* indexable_element
* deletion_time

drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17490

* github.com:scylladb/scylladb:
  sstables: add fmt::formatter for deletion_time
  sstable: add fmt::formatter for indexable_element
  sstables: add fmt::foramtter for sstable_state
  sstables: add fmt::formatter for sstables::bound_kind_m
2024-02-23 10:09:26 +02:00
Botond Dénes
89efa89dd7 Merge 'test: add fmt::formatters' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for some types used in testing.

Refs #13245

Closes scylladb/scylladb#17485

* github.com:scylladb/scylladb:
  test/unit: add fmt::formatter for tree_test_key_base
  test: add printer for type for BOOST_REQUIRE_EQUAL
  test: add fmt::formatters
  test/perf: add fmt::formatters for scheduling_latency_measurer and perf_result
2024-02-23 09:32:39 +02:00
Botond Dénes
1f363a876e Merge 'utils: add fmt::formatter for occupancy_stats, managed_bytes and friends ' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* managed_bytes
* managed_bytes_view
* managed_bytes_opt
* occupancy_stats

and drop their operator<<:s

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#17462

* github.com:scylladb/scylladb:
  utils/managed_bytes: add fmt::formatters for managed_bytes and friends
  utils/logalloc: add fmt::formatter for occupancy_stats
2024-02-23 09:31:22 +02:00
Botond Dénes
d314ad2725 Merge 'sstables: close index_reader in has_partition_key' from Aleksandra Martyniuk
If index_reader isn't closed before it is destroyed, then ongoing
sstables reads won't be awaited and assertion will be triggered.

Close index_reader in has_partition_key before destroying it.

Fixes: #17232.

Closes scylladb/scylladb#17355

* github.com:scylladb/scylladb:
  test: add test to check if reader is closed
  sstables: close index_reader in has_partition_key
2024-02-23 09:27:55 +02:00
Kefu Chai
010fb5f323 tools/scylla-nodetool: make keyspace argument optional for "ring"
the "keyspace" argument of the "ring" command is optional. but before
this change, we considered it a mandatory option. it was wrong.

so, in this change, we make it optional, and print out the warning
message if the keyspace is not specified.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17472
2024-02-23 09:25:29 +02:00
Kefu Chai
6800810dba interval, multishard_mutation_query: fix typos in comments
these misspellings were identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17491
2024-02-23 09:06:24 +02:00
Botond Dénes
a08d9ba2a4 Merge 'tools/scylla-nodetool: fixes to address test failures with dtest' from Kefu Chai
* tighten the param check for toppartitions
* add an extra empty line inbetween reports

Closes scylladb/scylladb#17486

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: add an extra empty line inbetween reports
  tools/scylla-nodetool: tighten the param check for toppartitions
2024-02-23 09:05:30 +02:00
Botond Dénes
959d33ba39 Merge 'repair: streaming: handle no_such_column_family from remote node' from Aleksandra Martyniuk
RPC calls lose information about the type of returned exception.
Thus, if a table is dropped on receiver node, but it still exists
on a sender node and sender node streams the table's data, then
the whole operation fails.

To prevent that, add a method which synchronizes schema and then
checks, if the exception was caused by table drop. If so,
the exception is swallowed.

Use the method in streaming and repair to continue them when
the table is dropped in the meantime.

Fixes: #17028.
Fixes: #15370.
Fixes: #15598.

Closes scylladb/scylladb#17231

* github.com:scylladb/scylladb:
  repair: handle no_such_column_family from remote node gracefully
  test: test drop table on receiver side during streaming
  streaming: fix indentation
  streaming: handle no_such_column_family from remote node gracefully
  repair: add methods to skip dropped table
2024-02-23 08:25:45 +02:00
Kefu Chai
3574c22d73 test/nodetool/utils: print out unmatched output on test failure
it would be more helpful if the matcher could print out the unmatched
output on test failure. so, in this change, both stdout and stderr
are printed if they fail to match with the expected error.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17489
2024-02-23 08:20:30 +02:00
Botond Dénes
234aa99aaa Merge 'tools/scylla-nodetool: extract and use {yaml,json}_writers' from Kefu Chai
simpler this way.

Closes scylladb/scylladb#17437

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: use {yaml,json}_writers in compactionhistory_operation
  tools/scylla-nodetool: add {json,yaml}_writer
2024-02-23 08:13:07 +02:00
Kefu Chai
3a3f0d392f gms/versioned_value: impl operator<<(.., const gms::versioned_value) using fmt
less repetition this way. this is also a follow-up change of
cb781c0ff7.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17390
2024-02-23 08:11:03 +02:00
Kefu Chai
62abf89312 sstables: add fmt::formatter for deletion_time
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `sstables::deletion_time`,
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 13:56:32 +08:00
Kefu Chai
a5a757387a sstable: add fmt::formatter for indexable_element
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `sstables::indexable_element`,
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 13:56:28 +08:00
Kefu Chai
5754b9eb08 sstables: add fmt::formatter for sstable_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `sstables::sstable_state`,
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 13:55:49 +08:00
Kefu Chai
9a32029a8f sstables: add fmt::formatter for sstables::bound_kind_m
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `sstables::bound_kind_m`,
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 13:55:22 +08:00
Kefu Chai
67c69be3c6 tools/scylla-nodetool: add an extra empty line inbetween reports
before this change, `toppartitions` does not print an empty line
after an empty sampling warning message. but
dtest/toppartitions_test.py actually split sampling reports with
two newlines, so let's appease it. the output also looks better
this way, as the samplings for READS and WRITES are always visually
separated with an empty line.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 12:57:51 +08:00
Kefu Chai
381c389b56 tools/scylla-nodetool: tighten the param check for toppartitions
the test cases in `test_any_of_required_parameters_is_missing`
assume that we should pass either all positional arguments or
none of them, otherwise nodetool should fail. but `scylla nodetool`
accepted a partial set of positional arguments.

to be more consistent with the expected behavior, in this change,
we enforce the sanity check so that we only accept either all
positional args or none of them. the corresponding test is added.
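
a minimal Python sketch of the all-or-none rule described above. the actual check lives in the nodetool source and is not shown in this log, so `validate_positional` and its parameters are hypothetical names used only for illustration.

```python
def validate_positional(args, expected_count):
    """Accept either no positional args or exactly `expected_count` of them."""
    # Hypothetical helper: a partial set of positional args is rejected.
    if len(args) not in (0, expected_count):
        raise ValueError(
            f"expected 0 or {expected_count} positional arguments, got {len(args)}"
        )
    return list(args)
```

with this shape, passing one or two of three expected arguments raises, while passing none or all three is accepted.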

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 12:57:51 +08:00
Kefu Chai
3835ebfcdc utils/managed_bytes: add fmt::formatters for managed_bytes and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* managed_bytes
* managed_bytes_view
* managed_bytes_opt

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 11:32:41 +08:00
Kefu Chai
3d9054991b utils/logalloc: add fmt::formatter for occupancy_stats
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `occupancy_stats`, and
drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 11:32:41 +08:00
Avi Kivity
bf107dae84 test/unit: add fmt::formatter for tree_test_key_base
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for the classes derived from `tree_test_key_base`

(this change was extracted from a larger change at #15599)

Refs #13245
2024-02-23 10:52:12 +08:00
Kefu Chai
a70318e722 test: add printer for type for BOOST_REQUIRE_EQUAL
after dropping the operator<< for vector, we would not be able to
use BOOST_REQUIRE_EQUAL to compare vector<>. to be prepared for this,
let's define the printer for Boost.Test

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 10:52:12 +08:00
Kefu Chai
63396f780d test: add fmt::formatters
the operator<< for `cql3::expr::test_utils::mutation_column_value` is
preserved, as it is used by test/lib/expr_test_utils.cc, which prints
std::map<sstring, cql3::expr::test_utils::mutation_column_value> using
the homebrew generic formatter for std::map<>. and the formatter uses
operator<< for printing the elements in map.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 10:52:12 +08:00
Kefu Chai
2ccd9e695d test/perf: add fmt::formatters for scheduling_latency_measurer and perf_result
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* scheduling_latency_measurer
* perf_result

and drop their operator<<:s

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 10:17:50 +08:00
Lakshmi Narayanan Sreethar
c76871aa65 replica/table : add method to get compaction group id for an sstable
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 01:07:54 +05:30
Lakshmi Narayanan Sreethar
9fffd8905f compaction: reshape: update total reshaped size only on success
The total reshaped size should only be updated on reshape success and
not after reshape has failed due to some exception.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 01:07:54 +05:30
Lakshmi Narayanan Sreethar
4fb099659a compaction: simplify exception handling in shard_reshaping_compaction_task_impl::run
Catch and handle the exceptions directly instead of rethrowing and
catching again.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-23 01:07:54 +05:30
Pavel Emelyanov
5682e51a97 test.py: Add test-case splitting in 'name' selection
When filtering a test by 'name' consider that name can be in a
'test::case' format. If so, get the left part to be the filter and the
right part to be the case name to be passed down to test itself.

Later, when the pytest starts it then appends the case name (if not
None) to the pytest execution, thus making it run only the specified
test-case, not the whole test file.
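
The splitting described above could look roughly like the following sketch. The helper names `split_test_name` and `pytest_target` are hypothetical and are not taken from test.py; only the `file::case` form that pytest accepts as a node id is assumed.

```python
from typing import Optional, Tuple


def split_test_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'test::case' into (filter, case name); case is None when absent."""
    file_filter, sep, case = name.partition("::")
    return file_filter, (case if sep and case else None)


def pytest_target(path: str, casename: Optional[str]) -> str:
    """Append the case name to the pytest target only when one was given."""
    return f"{path}::{casename}" if casename is not None else path
```

A name without `::` keeps the old behaviour: the case name stays None and the whole test file is run.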

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 19:24:10 +03:00
Pavel Emelyanov
b64710b0c6 test.py: Add casename argument to PythonTest
And propagate it from add_test() helper. For now keep it None, next
patch will bring more sense to this place

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 19:23:06 +03:00
Amnon Heiman
8859b4d991 Adding scripts/metrics-config.yml
The scripts/metrics-config.yml is a configuration file used by
get_description.py. It covers the places in the code that use a
non-standard way of defining metrics.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-02-22 17:15:30 +02:00
Amnon Heiman
4e67a98a21 Adding scripts/get_description.py to fetch metrics description
The get_description script parses a C++ file and searches for metric
declarations and their descriptions.

It creates a pipe-delimited file with the metric name, metric family
name, description and location in the file.

To find all descriptions in all files:
find . -name "*.cc" -exec grep -l '::description' {} \; | xargs -i ./get_description.py {}

While many of the metrics are defined in the form of
_metrics.add_group("hints_manager", {
        sm::make_gauge("size_of_hints_in_progress", _stats.size_of_hints_in_progress,
                        sm::description("Size of hinted mutations that are scheduled to be written.")),

Some metric declarations use variables and string formatting.
The script uses a configuration file to translate parameters and
concatenations to the actual names.
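
A rough sketch of how the simple literal form shown above could be matched with regular expressions. This is not the code of get_description.py, which is not shown in this log; `extract_metrics`, the two patterns and the output column order are assumptions, and the group name stands in for the metric family name here.

```python
import re

# Matches sm::make_<kind>("<name>", ... sm::description("<text>")) when both
# the metric name and the description are plain string literals.
METRIC_RE = re.compile(
    r'sm::make_(\w+)\(\s*"([^"]+)".*?sm::description\(\s*"([^"]*)"\s*\)',
    re.DOTALL,
)
GROUP_RE = re.compile(r'add_group\(\s*"([^"]+)"')


def extract_metrics(source: str, filename: str):
    """Return pipe-delimited rows: name|group|description|file:line."""
    group = GROUP_RE.search(source)
    family = group.group(1) if group else ""
    rows = []
    for m in METRIC_RE.finditer(source):
        # 1-based line number of the start of the make_* call.
        line_no = source.count("\n", 0, m.start()) + 1
        rows.append("|".join([m.group(2), family, m.group(3), f"{filename}:{line_no}"]))
    return rows
```

Declarations that build names from variables or format strings would not match these patterns, which is the gap the configuration file is described as covering.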

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-02-22 17:06:26 +02:00
Anna Stuchlik
14a4fa16a8 doc: add placeholder for Enable Raft topology page
This commit adds a placeholder for the Enable Raft-based Topology page
in the 5.4-to-6.0 upgrade guide.
This page needs to be referenced from other pages in the docs.
2024-02-22 16:02:06 +01:00
Pavel Emelyanov
5afaa03241 test/object_store: Remove unused managed_cluster (and other stuff)
Now all test cases use pylib manager client to manipulate cluster
While at it -- drop more unused bits from suite .py files

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:40:25 +03:00
Kefu Chai
57c408ab5d alternator: add fmt::formatter for alternator::parsed::path
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `alternator::parsed::path`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17458
2024-02-22 16:40:01 +02:00
Pavel Emelyanov
95ed46e26a test/object_store: Use tmpdir fixture in flush-retry case
Now that the test case in question is not using ManagerCluster, there's
no point in using test_tempdir either and the temporary object-store
config can be generated in a generic temporary directory

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:39:30 +03:00
Pavel Emelyanov
252688fe0c test/object_store: Turn flush-retry case to use ManagerClient
In the middle, this test case needs to force the scylla server to reload its
configs. Currently the manager API requires that some existing config option
is provided as an argument, but in this test case scylla.yaml remains
intact. So it satisfies the API with a non-changing option.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:32:34 +03:00
Pavel Emelyanov
e742906f1f test/object_store: Turn "misconfigured" case to use ManagerClient
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:32:34 +03:00
Pavel Emelyanov
857b48f950 test/object_store: Turn garbage-collect case to use ManagerClient
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:32:34 +03:00
Pavel Emelyanov
d27b91cfb4 test/object_store: Turn basic case to use ManagerClient
This case is a bit tricky, as it needs to know where scylla's workdir
is, so it replaces the use of test_tempdir with the call to manager to
get server's workdir.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:32:34 +03:00
Avi Kivity
67f8dc5a7c Merge 'mutation: add fmt::formatter for clustering_row, row_tombstone and friends' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* row_tombstone
* row_marker
* deletable_row::printer
* row::printer
* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer

and drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17461

* github.com:scylladb/scylladb:
  mutation: add fmt::formatter for clustering_row and friends
  mutation: add fmt::formatter for row_tombstone and friends
2024-02-22 16:16:26 +02:00
Pavel Emelyanov
89d0704d9b test/object_store: Prepare to work with ManagerClient
This includes

- marking the suite as Topology
- import needed fixtures and options from topology conftest
- configuring the zero initial cluster size and anonymous auth
- marking all test cases as skipped, as they no longer work after above

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 17:02:05 +03:00
Aleksandra Martyniuk
4530be9e5b test: add test to check if reader is closed
Add test to check if reader is closed in sstable::has_partition_key.
2024-02-22 14:53:14 +01:00
Aleksandra Martyniuk
5227336a32 sstables: close index_reader in has_partition_key
If index_reader isn't closed before it is destroyed, then ongoing
sstables reads won't be awaited and assertion will be triggered.

Close index_reader in has_partition_key before destroying it.
2024-02-22 14:53:07 +01:00
Yaron Kaikov
6d07f7a0ea Add mergify (https://mergify.com/) configuration file
In this PR we introduce the .mergify.yml configuration file, which
includes a set of rules that we will use for automating our backport
process.
For each supported OSS release (currently 5.2 and 5.4) we have an almost
identical configuration section which includes the four conditions before
we open a backport pr:

* PR should be closed
* PR should have the proper label. for example: backport/5.4 (we can
have multiple labels)
* Base branch should be master
* PR should be set with a promoted label - this condition will be set
automatically once the commits are promoted to the master branch (passed
gating)

Once all conditions are met, the verify bot will open a backport PR and
will assign it to the author of the original PR. Then CI will start
running, and only after it passes do we merge.
2024-02-22 14:28:08 +02:00
Nadav Har'El
b0233c0833 Merge 'interval: rename nonwrapping_interval to interval' from Avi Kivity
Our interval template started life as `range`, and supported wrapping to follow Cassandra's convention of wrapping around the maximum token.

We later recognized that an interval type should usually be non-wrapping and split it into wrapping_range and nonwrapping_range, with `range` aliasing wrapping_range to preserve compatibility.

Even later, we realized the name was already taken by C++ ranges and so renamed it to `interval`. Given that intervals are usually non-wrapping, the default `interval` type is non-wrapping.

We can now simplify it further, recognizing that everyone assumes that an interval is non-wrapping and so doesn't need the nonwrapping_interval_designation. We just rename nonwrapping_interval to `interval` and remove the type alias.

Closes scylladb/scylladb#17455

* github.com:scylladb/scylladb:
  interval: rename nonwrapping_interval to interval
  interval: rename interval_test to wrapping_interval_test
2024-02-22 14:03:43 +02:00
Kefu Chai
8afdc503b8 cdc: s/string_view/std::string_view/
in af2553e8, we added formatters for cdc::image_mode and
cdc::delta_mode. but in that change, we failed to qualify `string_view`
with `std::` prefix. even though it compiles, it depends on a `using
std::string_view` or a more error-prone `using namespace std`,
neither of which should be relied on. so, in this change, we
add the `std::` prefix to `string_view`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17459
2024-02-22 13:49:19 +02:00
Avi Kivity
35b700a884 Merge 'compaction: add fmt::formatter for types' from Kefu Chai
* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode`
* `formatted_sstables_list`

Refs #13245

Closes scylladb/scylladb#17439

* github.com:scylladb/scylladb:
  compaction: add formatter for formatted_sstables_list
  compaction: add fmt::formatter for compaction_type and friends
2024-02-22 13:48:30 +02:00
Pavel Emelyanov
027282ee07 perf_simple_query: Add --memtable-partitions option
There's the --partitions one that specifies how many partitions the test
would generate before measuring. When --bypass-cache option is in use,
thus making the test always engage sstables readers, it makes sense to
add some control over sstables granularity. The new option suggests that
during population phase, memtable gets flushed every $this-number
partitions, not just once at the end (and unknown amount of times in the
middle because of dirty memory limit).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 14:44:17 +03:00
Pavel Emelyanov
fd4c2e607e perf_simple_query: Disable auto compaction
Usually a perf test doesn't expect that some activity runs in the
background without controls. Compaction is one of a kind, so it makes
sense to keep it off while running the measurement.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 14:43:23 +03:00
Pavel Emelyanov
74899f71de perf_simple_query: Keep number of initial tablets in output json
When producing the output json file, keep how many initial tablets were
requested (if at all) next to other workload parameters

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 14:42:39 +03:00
Kefu Chai
643c01fd80 locator: fix typo in comment -- s/slecting/selecting/
fix a typo

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17470
2024-02-22 13:28:18 +02:00
Avi Kivity
89f86962f5 Merge 'streaming: add fmt::formatter for stream_session_state and stream_request' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* `streaming::stream_request`,
* `stream_session_state`

and drop their operator<<:s

Refs #13245

Closes scylladb/scylladb#17464

* github.com:scylladb/scylladb:
  streaming: add fmt::formatter for streaming::stream_request
  streaming: add fmt::formatter for stream_session_state
2024-02-22 13:04:02 +02:00
Kefu Chai
5c0952ab59 compaction: add fmt::formatter for compaction_type and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode`

and drop their operator<<:s.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17441
2024-02-22 13:02:37 +02:00
Kamil Braun
3d15fecf12 Merge 'amend cluster_status_table virtual table to work with raft' from Gleb
The cluster_status_table virtual table has a status field for each node. In
gossiper mode the status is taken from the gossiper, but with raft the
states are different and are stored in the topology state machine. The
series fixes the code to check the current mode and take the status from
the correct place.

Refs scylladb/scylladb#16984

* 'gleb/cluster_status_table-v1' of github.com:scylladb/scylla-dev:
  gossiper: remove unused REMOVAL_COORDINATOR state
  virtual_tables: take node state from raft for cluster_status_table table if topology over raft is enabled
  virtual_tables: create result for  cluster_status_table read on shard 0
2024-02-22 11:47:57 +01:00
Kamil Braun
3ee56e1936 Merge 'raft topology: enable writes to previous CDC generations' from Patryk Jędrzejczak
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.

In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. This PR adjusts the Raft-based topology
to ensure all required generations are loaded into memory and their
data isn't cleared too early.

To load all required generations into memory, we replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
To ensure this set doesn't grow endlessly, we remove an entry from
this set together with the data in CDC_GENERATIONS_V3.

Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.
The new solution introduced in this PR is to clear data of the
generations that finished operating more than 24 hours ago.
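
A minimal Python sketch of the new clearing condition, under the assumption that a generation stops operating at the moment its successor starts. The function name `obsolete_generations` and the use of plain start timestamps are hypothetical; the topology coordinator's actual code is not shown in this log.

```python
from datetime import datetime, timedelta


def obsolete_generations(start_times, now, grace=timedelta(hours=24)):
    """Return start times of generations that stopped operating more than `grace` ago."""
    ordered = sorted(start_times)
    obsolete = []
    # Assumption: a generation stops operating when the next one starts,
    # so the newest generation is never a candidate.
    for start, next_start in zip(ordered, ordered[1:]):
        if now - next_start > grace:
            obsolete.append(start)
    return obsolete
```

Under the old rule, any generation other than the last committed one could be cleared once it was 24 hours old; in this sketch the age is instead measured from the point where the generation stopped accepting writes.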

Apart from the changes mentioned above, this PR hardens
`test_cdc_generation_clearing.py`.

Fixes scylladb/scylladb#16916
Fixes scylladb/scylladb#17184
Fixes scylladb/scylladb#17288

Closes scylladb/scylladb#17374

* github.com:scylladb/scylladb:
  test: harden test_cdc_generation_clearing
  test: test clean-up of committed_cdc_generations
  raft topology: clean committed_cdc_generations
  raft topology: clean only obsolete CDC generations' data
  storage_service: topology_state_load: load all committed CDC generations
  system_keyspace: load_topology_state: fix indentation
  raft topology: store committed CDC generations' IDs in the topology
2024-02-22 11:41:25 +01:00
Gleb Natapov
fe5853aacc storage_service: disable removenode --force in raft mode and deprecate it for gossiper mode
removenode --force is an unsafe operation and does not even make sense with
topology over raft. This patch disables it if raft is enabled and prints
a deprecation note otherwise. We already have a PR to remove it
(https://github.com/scylladb/scylladb/pull/15834), but it was decided
there that a deprecation period is needed for legacy use case.

Fixes: scylladb/scylladb#16293
2024-02-22 11:08:57 +01:00
Kefu Chai
37c6073fd5 mutation: add fmt::formatter for clustering_row and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer

and drop their operator<<:s

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-22 17:53:34 +08:00
Kefu Chai
9ee728dab9 scylla-gdb: use raw string when '\' is not used in an escape sequence
when '\' does not start an escape sequence, Python complains at seeing
it. but it continues anyway by considering '\' as a separate char.
but the warning message is still annoying:

```
scylla-gdb.py: 2417: SyntaxWarning: invalid escape sequence '\-'
  branches = (r" |-- ", " \-- ")
```

when sourcing this script.

so, let's mark these strings as raw strings.
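
a small illustration of why the raw string is equivalent here: both spellings below produce the same five characters, but only the non-raw, unescaped form `" \-- "` triggers the warning. the variable names are just for this example.

```python
# Escaping the backslash explicitly and using a raw string give the same value.
escaped = " \\-- "
raw = r" \-- "

# Both contain: space, backslash, dash, dash, space.
same_value = escaped == raw
length = len(raw)
```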

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17466
2024-02-22 09:03:26 +02:00
Kefu Chai
4ee2aee279 tools/scylla-nodetool: define operator<< for vector<sstring>
we already have generic operator<< based formatter for sequence-alike
ranges defined in `utils/to_string.hh`, but as a part of efforts to
address #13245, we will eventually drop the formatter.

to prepare for this change, we should create/find the alternatives
where the operator<< for printing the ranges is still used.
Boost::program_options is one of them. it prints the options' default
values using operator<< in its error message or usage. so in order
to keep it working, we define operator<< for `vector<sstring>` here.
if more types are required, we will need to generalize
this formatter. if there are more needs from different compiling
units, we might need to extract this helper into, for instance,
`utils/to_string.hh`. but we should do this after removing it.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17413
2024-02-22 09:01:04 +02:00
Kefu Chai
da7ffd4e73 tools/scylla-types: print using managed_bytes
instead of materializing the `managed_bytes_view` to a string and
printing it, print it directly to stdout. this change helps to deprecate
the `to_hex()` helpers; we should materialize a string only when necessary.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17463
2024-02-22 09:00:38 +02:00
Kefu Chai
f644ba9cdc streaming: add fmt::formatter for streaming::stream_request
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `streaming::stream_request`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-22 14:03:59 +08:00
Kefu Chai
618091f6f7 streaming: add fmt::formatter for stream_session_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`streaming::stream_session_state`, and drop its operator<<

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-22 14:03:59 +08:00
Kefu Chai
b61b5a8b5d mutation: add fmt::formatter for row_tombstone and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* row_tombstone
* row_marker
* deletable_row::printer
* row::printer

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-22 12:44:33 +08:00
Avi Kivity
51df8b9173 interval: rename nonwrapping_interval to interval
Our interval template started life as `range`, and supported
wrapping to follow Cassandra's convention of wrapping around the
maximum token.

We later recognized that an interval type should usually be non-wrapping
and split it into wrapping_range and nonwrapping_range, with `range`
aliasing wrapping_range to preserve compatibility.

Even later, we realized the name was already taken by C++ ranges and
so renamed it to `interval`. Given that intervals are usually non-wrapping,
the default `interval` type is non-wrapping.

We can now simplify it further, recognizing that everyone assumes
that an interval is non-wrapping and so doesn't need the
nonwrapping_interval_designation. We just rename nonwrapping_interval
to `interval` and remove the type alias.
2024-02-21 19:43:17 +02:00
Avi Kivity
e338f0e009 interval: rename interval_test to wrapping_interval_test
As preparation for reclaiming the name `interval` for nonwrapping_interval,
rename interval_test to wrapping_interval_test.
2024-02-21 19:38:53 +02:00
Avi Kivity
1df5697bd7 Merge 'Refine some api/column_family endpoints' from Pavel Emelyanov
Those that collect vectors with ks/cf names can reserve the vectors in advance. Also one of those can use range loop for shorter code

Closes scylladb/scylladb#17433

* github.com:scylladb/scylladb:
  api: Reserve vectors in advance
  api: Use range-loop to iterate keyspaces
2024-02-21 19:19:28 +02:00
Tomasz Grabiec
ef9e5e64a3 locator: token_metadata: Introduce topology barrier stall detector
When the topology barrier is blocked for longer than the configured threshold
(2s), stale versions are marked as stalled, and when they get released
they report a backtrace to the logs. This should help to identify what
was holding the token metadata pointer for too long.

Example log:

  token_metadata - topology version 30 held for 299.159 [s] past expiry, released at:  0x2397ae1 0x23a36b6 ...

Closes scylladb/scylladb#17427
2024-02-21 15:05:34 +02:00
Nadav Har'El
e02cfd0035 Merge 'query*.h: add fmt::formatter for types' from Kefu Chai
* query::specific_ranges
* query::partition_slice
* query::read_command
* query::forward_request
* query::forward_request::reduction_type
* query::forward_request::aggregation_info
* query::forward_result::printer
* query::result_set
* query::result_set_row
* query::result::printer

Refs #13245

Closes scylladb/scylladb#17440

* github.com:scylladb/scylladb:
  query-result.hh: add formatter for query::result::printer
  query-result-set: add formatter for query-result-set.hh types
  query-request: add formatter for query-request.hh types
2024-02-21 14:46:36 +02:00
Avi Kivity
4be70bfc2b Merge 'multishard_mutation_query: add tablets support' from Botond Dénes
When reading a list of ranges with tablets, we don't need a multishard reader. Instead, we intersect the range list with the local node's tablet ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets, multishard_mutation_query.cc is no longer on the hot paths: range scans
on tables with tablets fork off to a different code-path in the coordinator. The only code using multishard_mutation_query.cc are forced, replica-local scans, like those used by SELECT * FROM MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests, so we optimize for simplicity, not performance.

Fixes: #16484

Closes scylladb/scylladb#16802

* github.com:scylladb/scylladb:
  test/cql-pytest: remove skip_with_tablets fixture
  test/cql-pytest: test_select_from_mutation_fragments.py parameterize tests
  test/cql-pytest: test_select_from_mutation_fragments.py: remove skip_with_tablets
  multishard_mutation_query: add tablets support
  multishard_mutation_query: remove compaction-state from result-builder factory
  multishard_mutation_query: do_query(): return foreign_ptr<lw_shared_ptr<result>>
  mutation_query: reconcilable_result: add merge_disjoint()
  locator: introduce tablet_range_spliter
  dht/i_partitioner: to_partition_range(): don't assume input is fully inclusive
  interval: add before() overload which takes another interval
2024-02-21 13:40:55 +02:00
Botond Dénes
94dac43b2f tools/utils: configure tools to use the epoll reactor backend
The default AIO backend requires AIO blocks. On production systems, all
available AIO blocks could have been already taken by ScyllaDB. Even
though the tools only require a single unit, we have seen cases where
not even that is available, ScyllaDB having siphoned all of the available
blocks.
We could try to ensure all deployments have some spare blocks, but it is
just less friction to not have to deal with this problem at all, by just
using the epoll backend. We don't care about performance in the case of
the tools anyway, so long as they are not unreasonably slow. And since
these tools are replacing legacy tools written in Java, the bar is low.

Closes scylladb/scylladb#17438
2024-02-21 11:58:09 +02:00
Kefu Chai
1263494dd1 query-result.hh: add formatter for query::result::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for following types

* query::result::printer

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 17:57:18 +08:00
Kefu Chai
e5a930e7c6 query-result-set: add formatter for query-result-set.hh types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for following types

* query::result_set
* query::result_set_row

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 17:54:48 +08:00
Kefu Chai
4383ca431c query-request: add formatter for query-request.hh types
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for following types

* query::specific_ranges
* query::partition_slice
* query::read_command
* query::forward_request
* query::forward_request::reduction_type
* query::forward_request::aggregation_info
* query::forward_result::printer

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 17:54:41 +08:00
Kefu Chai
6408834e33 compaction: add formatter for formatted_sstables_list
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `formatted_sstables_list`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 17:45:45 +08:00
Kefu Chai
9969d88d82 compaction: add fmt::formatter for compaction_type and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode`

and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 17:45:40 +08:00
Kefu Chai
61308d51ef tools/scylla-nodetool: use {yaml,json}_writers in compactionhistory_operation
simpler this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 16:49:30 +08:00
Kefu Chai
e9e558534a tools/scylla-nodetool: add {json,yaml}_writer
so that we have less repetition when dumping the metrics. the repetition
is error-prone and not maintainable. also move them out into a separate
header, to keep the size of this source file in check -- it's now 3000 LOC. also,
by moving them out, we can reuse them in other subcommands without
moving them to the top of this source file.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-21 16:49:30 +08:00
Botond Dénes
ca585903b7 test/cql-pytest: remove skip_with_tablets fixture
All tests that used it are fixed, and we should not add any new tests
failing with tablets from now on, so remove.
2024-02-21 02:08:49 -05:00
Botond Dénes
8df82d4781 test/cql-pytest: test_select_from_mutation_fragments.py parameterize tests
To run with both vnodes and tablets. For this functionality, both
replication methods should be covered with tests, because it uses
different ways to produce partition lists, depending on the replication
method.

Also add scylla_only to those tests that were missing this fixture
before. All tests in this suite are scylla-only and with the
parameterization, this is even more apparent.
2024-02-21 02:08:49 -05:00
Botond Dénes
b09b949159 test/cql-pytest: test_select_from_mutation_fragments.py: remove skip_with_tablets
The underlying functionality was fixed, the tests should now pass with
tablets.
2024-02-21 02:08:49 -05:00
Botond Dénes
ce472b33b8 multishard_mutation_query: add tablets support
When reading a list of ranges with tablets, we don't need a multishard
reader. Instead, we intersect the range list with the local node's tablet
ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with
database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets,
multishard_mutation_query.cc is no longer on the hot paths, range scans
on tables with tablets fork off to a different code-path in the
coordinator. The only code using multishard_mutation_query.cc are
forced, replica-local scans, like those used by SELECT * FROM
MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests,
so we optimize for simplicity, not performance.
2024-02-21 02:08:48 -05:00
Botond Dénes
d160a179ee multishard_mutation_query: remove compaction-state from result-builder factory
This param was used by the query-result builder, to set the
last-position on end-of-stream. Instead, do this via a new ResultBuilder
method, maybe_set_last_position(), which is called from read_page(),
which has access to the compaction-state.
With this, the ResultBuilder can be created without a compaction-state
at hand. This will be important in the next patch.
2024-02-21 02:08:48 -05:00
Botond Dénes
95bc0cb1c0 multishard_mutation_query: do_query(): return foreign_ptr<lw_shared_ptr<result>>
Makes future patching easier.
2024-02-21 02:08:48 -05:00
Botond Dénes
35e6cbf42e mutation_query: reconcilable_result: add merge_disjoint()
Merging two disjoint reconcilable_result instances.
2024-02-21 02:08:48 -05:00
Botond Dénes
7bdd0c2cae locator: introduce tablet_range_spliter
Given a list of partition-ranges, yields the intersection of this
range-list with the tablet-ranges of the tablets located on the
given host.
This will be used in multishard_mutation_query.cc, to obtain the ranges
to read from the local node: given the read ranges, obtain the ranges
belonging to tablets that have replicas on the local node.
2024-02-21 02:08:48 -05:00
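The intersection described for tablet_range_spliter can be sketched roughly as below. This is only an illustration under simplifying assumptions: tokens are plain integers, ranges are half-open `(start, end)` pairs, and `split_ranges_for_host` is a hypothetical name, not the class added by the commit (which works on partition ranges with inclusiveness flags).

```python
from typing import Iterator, List, Tuple

# Half-open [start, end) over integer tokens (simplifying assumption).
Range = Tuple[int, int]


def split_ranges_for_host(
    read_ranges: List[Range],
    tablets: List[Tuple[Range, List[str]]],
    host: str,
) -> Iterator[Range]:
    """Yield the parts of read_ranges covered by tablets replicated on host.

    Both inputs are assumed to be sorted and internally non-overlapping.
    """
    # Keep only the ranges of tablets that have a replica on the given host.
    local = [rng for rng, replicas in tablets if host in replicas]
    for r_start, r_end in read_ranges:
        for t_start, t_end in local:
            start, end = max(r_start, t_start), min(r_end, t_end)
            if start < end:  # non-empty intersection
                yield (start, end)
```

Each yielded range could then be read on its own and the results merged, which is the sequential approach the multishard_mutation_query commit describes.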
Botond Dénes
4993d0e30a dht/i_partitioner: to_partition_range(): don't assume input is fully inclusive
Consider the inclusiveness of the token-range's start and end bounds and
copy the flag to the output bounds, instead of assuming they are always
inclusive.
2024-02-21 02:08:48 -05:00
Botond Dénes
239484f259 interval: add before() overload which takes another interval
The current point variant cannot take inclusiveness into account, when
said point comes from another interval bound.
This method had no tests at all, so add tests covering both overloads.
2024-02-21 02:08:48 -05:00
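Why a point-only before() loses information can be shown with a small sketch. The `Bound` and `Interval` types and both function names are hypothetical stand-ins, and which interval plays the role of "this" in the real overload is an assumption here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bound:
    value: int
    inclusive: bool = True


@dataclass(frozen=True)
class Interval:
    start: Optional[Bound]  # None means unbounded
    end: Optional[Bound]


def point_before(point: int, iv: Interval) -> bool:
    """True if point lies before the start of iv."""
    if iv.start is None:
        return False
    if point == iv.start.value:
        return not iv.start.inclusive
    return point < iv.start.value


def interval_before(other: Interval, iv: Interval) -> bool:
    """True if other ends before iv starts, i.e. they share no point."""
    if other.end is None or iv.start is None:
        return False
    if other.end.value == iv.start.value:
        # Touching bounds overlap only if both sides include the value.
        return not (other.end.inclusive and iv.start.inclusive)
    return other.end.value < iv.start.value
```

With `[0, 5)` and `[5, 9]`, passing the bare value 5 to the point variant reports "not before", while the interval variant sees the exclusive end bound and reports that the first interval is before the second.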
Avi Kivity
605bf6e221 range.hh: retire
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.

Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.

The unit tests are renamed and range.hh is deleted.

Closes scylladb/scylladb#17428
2024-02-21 00:24:25 +02:00
Wojciech Mitros
4c767c379c mv: adjust the overhead estimation for view updates
In order to avoid running out of memory, we can't
underestimate the memory used when processing a view
update. Particularly, we need to handle the remote
view updates well, because we may create many of them
at the same time in contrast to local updates which
are processed synchronously.

After investigating a coredump generated in a crash
caused by running out of memory due to these remote
view updates, we found that the current estimation
is much lower than what we observed in practice; we
identified overhead of up to 2288 bytes for each
remote view update. The overhead consists of:
- 512 bytes - a write_response_handler
- less than 512 bytes - excessive memory allocation
for the mutation in bytes_ostream
- 448 bytes - the apply_to_remote_endpoints coroutine
started in mutate_MV()
- 192 bytes - a continuation to the coroutine above
- 320 bytes - the coroutine in result_parallel_for_each
started in mutate_begin()
- 112 bytes - a continuation to the coroutine above
- 192 bytes - 5 unspecified allocations of 32, 32, 32,
48 and 48 bytes

This patch changes the previous overhead estimate
of 256 bytes to 2288 bytes, which should take into
account all allocations in the current version of the
code. It's worth noting that changes in the related
pieces of code may result in a different overhead.

The allocations seem to be mostly captures for the
background tasks. Coroutines seem to allocate extra,
however testing shows that replacing a coroutine with
continuations may result in generating a few smaller
futures/continuations with a larger total size.
Besides that, considering that we're waiting for
a response for each remote view update, we need the
relatively large write_response_handler, which also
includes the mutation in case we needed to reuse it.

The change should not majorly affect workloads with many
local updates because we don't keep many of them at
the same time anyway, and an added benefit of correct
memory utilization estimation is avoiding evictions
of other memory that would be otherwise necessary
to handle the excessive memory used by view updates.

Fixes #17364

Closes scylladb/scylladb#17420
2024-02-21 00:05:49 +02:00
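The 2288-byte figure is the sum of the components listed in the commit message above. A quick arithmetic check, with `estimated_memory` as a hypothetical helper that is not part of the patch:

```python
# Per-update overhead components from the commit message, in bytes.
OVERHEAD_COMPONENTS = {
    "write_response_handler": 512,
    "bytes_ostream excess allocation (upper bound)": 512,
    "apply_to_remote_endpoints coroutine": 448,
    "continuation to apply_to_remote_endpoints": 192,
    "result_parallel_for_each coroutine": 320,
    "continuation to result_parallel_for_each": 112,
    "unspecified allocations": 32 + 32 + 32 + 48 + 48,
}

PER_UPDATE_OVERHEAD = sum(OVERHEAD_COMPONENTS.values())


def estimated_memory(mutation_sizes, overhead=PER_UPDATE_OVERHEAD):
    """Hypothetical helper: bytes accounted for a batch of remote view updates."""
    return sum(size + overhead for size in mutation_sizes)
```

For two in-flight updates of 100 and 200 bytes, the old 256-byte estimate accounts for 812 bytes, while the new one accounts for 4876.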
Tomasz Grabiec
e63d8ae272 Merge 'Handle tablet migration failure while streaming' from Pavel Emelyanov
It can happen that a node is lost during tablet migration involving that node. Migration will be stuck, blocking the topology state machine. To recover from this, the current procedure is for the admin to execute nodetool removenode or replace the node. This marks the node as "ignored" and the tablet state machine can pick this up and abort the migration.

This PR implements the handling for streaming stage only and adds a test for it. Checking other stages needs more work with failure injection to inject failures into specific barrier.

To handle streaming failure two new stages are introduced -- cleanup_target and revert_migration. The former is to clean the pending replica that could receive some data by the time streaming stopped working, the latter is like end_migration, but doesn't commit the new_replicas into replicas field.

refs: #16527

Closes scylladb/scylladb#17360

* github.com:scylladb/scylladb:
  test/topology: Add checking error paths for failed migration
  topology.tablets_migration: Handle failed streaming
  topology.tablets_migration: Add cleanup_target transition stage
  topology.tablets_migration: Add revert_migration transition stage
  storage_service: Rewrap cleanup stage checking in cleanup_tablet()
  test/topology: Move helpers to get tablet replicas to pylib
2024-02-20 18:50:55 +01:00
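The failure path through the two new stages can be sketched as a tiny state machine. This is a simplified model, not the coordinator code: the tablet is a plain dict, `step` and `cleanup_rpc` are hypothetical names, barriers are not modelled, and a single `dead_nodes` set stands in for both "ignored" and "dead" nodes.

```python
def step(tablet: dict, dead_nodes: set, cleanup_rpc) -> dict:
    """Advance a failed tablet migration by one stage (simplified model)."""
    stage = tablet["stage"]
    if stage == "streaming":
        # Streaming cannot be retried if either side of the migration is gone.
        if tablet["pending"] in dead_nodes or tablet["leaving"] in dead_nodes:
            return {**tablet, "stage": "cleanup_target"}
        return tablet  # the normal streaming path is not modelled here
    if stage == "cleanup_target":
        # Clean the pending replica, which may already hold some data,
        # unless the pending node itself is dead.
        if tablet["pending"] not in dead_nodes:
            cleanup_rpc(tablet["pending"])
        return {**tablet, "stage": "revert_migration"}
    if stage == "revert_migration":
        # Like end_migration, but keep the old replicas and drop the transition.
        return {"replicas": tablet["replicas"]}
    return tablet
```

Calling `step` three times on a tablet whose pending node is dead ends with the original `replicas` list and no transition fields, and the cleanup RPC is never issued.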
Anna Stuchlik
37237407f6 doc: remove info about outdated versions
This PR removes information about outdated versions, including disclaimers and information when a given feature was added.
Now that the documentation is versioned, information about outdated versions is unnecessary (and makes the docs harder to read).

Fixes https://github.com/scylladb/scylladb/issues/12110

Closes scylladb/scylladb#17430
2024-02-20 19:32:13 +02:00
Pavel Emelyanov
ceac65be1e api: Reserve vectors in advance
Some endpoints in api/column_family fill vectors with data obtained from
database and return them back. Since the amount of data is known in
advance, it's good to reserve the vector.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 19:13:05 +03:00
Pavel Emelyanov
f3e58cb806 api: Use range-loop to iterate keyspaces
The code uses a standard for (;;) loop, but the range version is nicer

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 19:12:12 +03:00
Avi Kivity
93af3dd69b Merge 'Maintenance socket: set filesystem permissions to 660' from Mikołaj Grzebieluch
Set filesystem permissions for the maintenance socket to 660 (previously it was 755) to allow a scyllaadm's group to connect.
Split the logic of creating sockets into two separate functions, one for each case: when it is a regular cql controller or used by maintenance_socket.

Fixes https://github.com/scylladb/scylladb/issues/16487.

Closes scylladb/scylladb#17113

* github.com:scylladb/scylladb:
  maintenance_socket: add option to set owning group
  transport/controller: get rid of magic number for socket path's maximal length
  transport/controller: set unix_domain_socket_permissions for maintenance_socket
  transport/controller: pass unix_domain_socket_permissions to generic_server::listen
  transport/controller: split configuring sockets into separate functions
2024-02-20 15:09:54 +02:00
Botond Dénes
73a3a3faf3 Merge 'tools/scylla-nodetool: implement tablestats' from Kefu Chai
Refs #15588

Closes scylladb/scylladb#17387

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement tablestats
  utils/rjson: add templated streaming_writer::Write()
2024-02-20 14:46:07 +02:00
Botond Dénes
8c228bffc8 Merge 'repair: accelerate repair load_history time' from Xu Chang
Using `parallel_for_each_table` instead of `for_each_table_gently` on
`repair_service::load_history`, to reduce bootstrap time.
Using uuid_xor_to_uint32 when dispatching repair load_history to shards.

Ref: https://github.com/scylladb/scylladb/issues/16774

Closes scylladb/scylladb#16927

* github.com:scylladb/scylladb:
  repair: resolve load_history shard load skew
  repair: accelerate repair load_history time
2024-02-20 13:45:26 +02:00
Kefu Chai
b0bb3ab5b0 topology: print node* with node_printer
in da53854b66, we added formatter for printing a `node*`, and switched
to this formatter when printing `node*`. but we failed to update some
caller sites when migrating to the new formatter, where a
`unique_ptr<node>` is printed instead. this is not the behavior before
the change, and is not expected.

so, in this change, we explicitly instantiate `node_printer` instances
with the pointer held by `unique_ptr<node>`, to restore the behavior
before da53854b66.

this issue was identified when compiling the tree using {fmt} v10 and
compile-time format-string check enabled, which is yet to be upstreamed to
Seastar.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17418
2024-02-20 14:35:56 +03:00
Patryk Jędrzejczak
419354bc9f test: harden test_cdc_generation_clearing
In one of the previous patches, we fixed scylladb/scylladb#16916 as
a side effect. We removed
`system_keyspace::get_cdc_generations_cleanup_candidate`, which
contained the bug causing the issue.

Even though we didn't have to fix this issue directly, it showed us
that `test_cdc_generation_clearing` was too weak. If something went
wrong during/after the only clearing, the test still could pass
because the clearing was the last action in the test. In
scylladb/scylladb#16916, the CDC generation publisher was stuck
after the clearing because of a recurring error. The test wouldn't
detect it. Therefore, we harden the test by expecting two clearings
instead of one. If something goes wrong during the first clearing,
there is a high chance that the second clearing will fail. The new
test version wouldn't pass with the old bug in the code.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
2b724735d1 test: test clean-up of committed_cdc_generations
We extend `test_cdc_generation_clearing`. Now, it also tests the
clean-up of `TOPOLOGY.committed_cdc_generations` added in the
previous patch.

In the implementation, we harden the already existing
`check_system_topology_and_cdc_generations_v3_consistency`. After
the previous patch, data of every generation present in
`committed_cdc_generations` should be present in CDC_GENERATIONS_V3.
In other words, `committed_cdc_generations` should always be a
subset of a set containing generations in CDC_GENERATIONS_V3.
Before the previous patch, this wasn't true after the clearing, so
the new version of `test_cdc_generation_clearing` wouldn't pass
back then.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
7301d1317b raft topology: clean committed_cdc_generations
We clean `TOPOLOGY.committed_cdc_generations` from obsolete
generations to ensure this set doesn't grow endlessly. After this
patch, the following invariant will be true: if a generation is in
`committed_cdc_generations`, its data is in CDC_GENERATIONS_V3.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
b8aa74f539 raft topology: clean only obsolete CDC generations' data
Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.

The new solution is to clear data of the generations that
finished operating more than 24 hours ago. The rationale behind
it is in the new comment in
`topology_coordinator::clean_obsolete_cdc_generations`.

The previous solution used the clean-up candidate. After
introducing `committed_cdc_generations`, it became unneeded.
The last obsolete generation can be computed in
`topology_coordinator::clean_obsolete_cdc_generations`. Therefore,
we remove all the code that handles the clean-up candidate.

After changing how we clear CDC generations' data,
`test_current_cdc_generation_is_not_removed` became obsolete.
The tested feature is not present in the code anymore.

`test_dependency_on_timestamps` became the only test case covering
the CDC generation's data clearing. We adjust it after the changes.
2024-02-20 12:35:18 +01:00
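The difference between the old and the new clearing condition can be sketched as follows. Both function names are hypothetical, and the sketch assumes that a generation finishes operating at the moment its successor's timestamp is reached; the actual rationale lives in the code comment the commit refers to.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)


def old_rule(committed, now, retention=RETENTION):
    """Old condition: not the last committed generation and at least 24h old."""
    return [gen_id for gen_id, start in committed[:-1] if start <= now - retention]


def obsolete_generations(committed, now, retention=RETENTION):
    """New condition: the generation finished operating more than 24h ago.

    committed is a list of (generation_id, start_timestamp) sorted by
    timestamp. Assumption: a generation stops operating when the next
    one starts, so the last generation is never obsolete.
    """
    obsolete = []
    for (gen_id, _), (_, next_start) in zip(committed, committed[1:]):
        if next_start < now - retention:
            obsolete.append(gen_id)
    return obsolete
```

With generations started 72 hours, 48 hours and 1 hour ago, the old rule would clear the first two, although the second one stopped operating only an hour ago and could still receive writes; the new rule clears only the first.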
Patryk Jędrzejczak
8b214d02fb storage_service: topology_state_load: load all committed CDC generations
We load all committed CDC generations into `cdc::metadata`. Since
we have allowed sending writes to the previous generations in
scylladb/scylladb#17134, the committed generations may be necessary
to handle a correct request.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
18cff1aa6a system_keyspace: load_topology_state: fix indentation
Broken in the previous patch.
2024-02-20 12:35:18 +01:00
Patryk Jędrzejczak
e145e758eb raft topology: store committed CDC generations' IDs in the topology
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.

In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. We need to adjust the Raft-based topology to ensure
all required generations are loaded into memory and their data
isn't cleared too early.

This patch is the first step of the adjustment. We replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
This set is sorted by timestamps, just like
`unpublished_cdc_generations`.

This patch is mostly refactoring. The last generation in
`committed_cdc_generations` is the equivalent of the previous
`current_cdc_generation_{uuid, timestamp}`. The other generations
are irrelevant for now. They will be used in the following patches.

After introducing `committed_cdc_generations`, a newly committed
generation is also unpublished (it was current and unpublished
before the patch). We introduce `add_new_committed_cdc_generation`,
which updates both sets of generations so that we don't have to
call `add_committed_cdc_generation` and
`add_unpublished_cdc_generation` together. It's easy to forget
that both of them are necessary. Before this patch, there was
no call to `add_unpublished_cdc_generation` in
`topology_coordinator::build_coordinator_state`. It was a bug
reported in scylladb/scylladb#17288. This patch fixes it.

This patch also removes "the current generation" notion from the
Raft-based topology. For the Raft-based topology, the current
generation was the last committed generation. However, for the
`cdc::metadata`, it was the generation operating now. These two
generations could be different, which was confusing. For the
`cdc::metadata`, the current generation is relevant as it is
handled differently, but for the Raft-based topology, it isn't.
Therefore, we change only the Raft-based topology. The generation
called "current" is called "the last committed" from now.
2024-02-20 12:35:16 +01:00
Kefu Chai
c627d9134e tools/scylla-nodetool: implement tablestats
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-20 18:12:35 +08:00
Kefu Chai
a7a2cf64cc utils/rjson: add templated streaming_writer::Write()
so we can use it in a templated context.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-20 18:12:35 +08:00
Botond Dénes
050c6dcad7 api: storage_service/keyspaces: add replication filter
To allow filtering the returned keyspaces based on the replication they
use: tablets or vnodes.
The filter can be disabled by omitting the parameter or passing "all".
The default is "all".

Fixes: #16509

Closes scylladb/scylladb#17319
2024-02-20 09:04:41 +01:00
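The filter semantics can be sketched in a few lines. The parameter name `replication` and the exact accepted strings besides "all" are assumptions based on the commit text; the function is a stand-in, not the API handler.

```python
def filter_keyspaces(keyspaces, replication="all"):
    """Return keyspace names, optionally filtered by replication kind.

    keyspaces maps a keyspace name to "tablets" or "vnodes".
    Omitting the parameter behaves like passing "all".
    """
    if replication == "all":
        return sorted(keyspaces)
    if replication not in ("tablets", "vnodes"):
        raise ValueError(f"unknown replication filter: {replication!r}")
    return sorted(name for name, kind in keyspaces.items() if kind == replication)
```

For example, with `{"ks1": "tablets", "ks2": "vnodes"}` the default call returns both names, and `replication="tablets"` returns only `ks1`.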
Kefu Chai
57ede58a64 raft: add fmt::formatter for raft::fsm
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `raft::fsm`, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17414
2024-02-20 09:02:02 +02:00
Kefu Chai
acefde0735 mutation: add fmt::formatter for mutation_partition::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `mutation_partition::printer`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17419
2024-02-20 09:01:22 +02:00
Kefu Chai
0b13de52de sstable/mx: add fmt::formatter for cached_promoted_index::promoted_index_block
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`cached_promoted_index::promoted_index_block`, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17415
2024-02-20 09:00:32 +02:00
Botond Dénes
2a494b6c47 Merge 'test/nodetool: parameterize test_ring' from Kefu Chai
so we exercise the cases where state and status are not "normal" and "up".

turns out the MBean is able to cache some objects. so the requests retrieving datacenter and rack are now marked `ANY`.

* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requests in the raised `AssertionError`. this
  should help with debugging.

Fixes #17401

Closes scylladb/scylladb#17417

* github.com:scylladb/scylladb:
  test/nodetool: parameterize test_ring
  test/nodetool: fail a test only with leftover expected requests
2024-02-20 08:48:11 +02:00
Anna Stuchlik
69ead0142d doc: remove outdated/invalid entries from FAQ
This commit removes outdated or invalid
FAQ entries specified in https://github.com/scylladb/scylladb/issues/16631

In addition, the questions about Cassandra compatibility
are removed as they are already answered on the forum:
https://forum.scylladb.com/t/which-cassandra-version-is-scylladb-it-compatible-with/84

Also, the incorrect entry about the cache has been removed
and the correct answer is added to the forum.
Fixes https://github.com/scylladb/scylladb/issues/17003

The question about troubleshooting performance issues
has also been removed, as it's already covered on the Forum.

Also, it removes the Apache copyright entry,
which should not be added to the FAQ page.

Closes scylladb/scylladb#17200
2024-02-20 08:43:58 +02:00
Anna Stuchlik
4f8f183736 doc: remove SSTable2json from the docs
This commit removes the SSTable2json documentation,
as well as the links to the removed page.

In addition, it adds a redirection for that page
to prevent 404.

Fixes https://github.com/scylladb/scylladb/issues/17204

Closes scylladb/scylladb#17340
2024-02-20 08:43:27 +02:00
Kefu Chai
64f9d90f7b tools/scylla-nodetool: implement toppartitions
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17357
2024-02-20 08:16:43 +02:00
Pavel Emelyanov
1440eddc58 test/topology: Add checking error paths for failed migration
For now only fail streaming stage and check that migration doesn't get
stuck and doesn't make tablet appear on dead node.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:59:06 +03:00
Pavel Emelyanov
cb02297642 topology.tablets_migration: Handle failed streaming
In case pending or leaving replica is marked as ignored by operator,
streaming cannot be retried and should jump to "cleanup_target" stage
after a barrier.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:59:06 +03:00
Pavel Emelyanov
72f3b1d5fe topology.tablets_migration: Add cleanup_target transition stage
The new stage will be used to revert migration that fails at some
stages. The goal is to clean up the pending replica, which may have already
received some writes, by doing the cleanup RPC to the pending replica,
then jumping to "revert_migration" stage introduced earlier.

If pending node is dead, the call to cleanup RPC is skipped.

Coordinators use old replicas.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:59:06 +03:00
Pavel Emelyanov
ced5bf56eb topology.tablets_migration: Add revert_migration transition stage
It's like end_migration, but keeps the old replicas intact, just removing the
transition (including new replicas).

Coordinators use old replicas.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:53:36 +03:00
Pavel Emelyanov
a0a33e8be1 storage_service: Rewrap cleanup stage checking in cleanup_tablet()
Next patch will need to teach this code to handle new cleanup_target
stage, this change prepares this place for smoother patching

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:53:36 +03:00
Pavel Emelyanov
c06cbc391f test/topology: Move helpers to get tablet replicas to pylib
These are very useful and will be used across different test files soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:53:36 +03:00
Kefu Chai
3a94a7c1ff test/nodetool: parameterize test_ring
so we exercise the cases where state and status are not "normal" and "up".

turns out the MBean is able to cache some objects. so the requests
retrieving datacenter and rack are now marked `ANY`.

Fixes #17401
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-20 12:59:59 +08:00
Kefu Chai
3d8a6956fc test/nodetool: fail a test only with leftover expected requests
if there are unconsumed requests whose `multiple` is -1, we should
not consider them required: the test can consume them or not. if
it does not, we should not consider the test a failure just because
these requests are sitting at the end of queue.

so, in this change, we

* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requests in the raised `AssertionError`. this
  should help with debugging.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-20 12:59:59 +08:00
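The leftover check described above could look roughly like this. `ExpectedRequest` and `check_leftover` are hypothetical names for illustration, and the sketch assumes `ANY` is the -1 sentinel mentioned in the message.

```python
from dataclasses import dataclass

ANY = -1  # assumed sentinel: the request may be consumed any number of times


@dataclass
class ExpectedRequest:
    method: str
    path: str
    multiple: int = 1


def check_leftover(expected):
    """Fail only if required requests were left unconsumed."""
    # Requests marked ANY are optional, so leftovers of that kind are fine.
    unconsumed = [req for req in expected if req.multiple != ANY]
    if unconsumed:
        # Listing the leftovers in the error helps with debugging.
        raise AssertionError(f"unconsumed expected requests: {unconsumed}")
```

A queue that holds only `ANY` requests passes, while a single leftover required request raises an `AssertionError` naming it.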
Patryk Wrobel
82104b6f50 test_tablets: tablet count metric - remove assumption about tablets existence
The mentioned test failed on CI. It sets up two nodes and performs
operations related to creation and dropping of tables as well as
moving tablets. Locally, the issue was not visible - also, the test
was passing on CI in the majority of cases.

One of steps in the test case is intended to select the shard that
has some tablets on host_0 and then move them to (host_1, shard_3).
It contains also a precondition that requires the tablets count to
be greater than zero - to ensure that move_tablets operation really
moves tablets.

The error message in the failed CI run comes from the precondition
related to tablets count on (host0, src_shard) - it was zero.
This indicated that there were no tablets on entire host_0.

The following commit removes the assumption about the existence of
tablets on host_0. In case when there are no tablets there, the
procedure is rerun for host_1.

Now the logic is as follows:
 - find shard that has some tablets on host_0
 - if such shard does not exist, then find such shard on host_1
 - depending on the result of search set src/dest nodes
 - verify that reported tablet count metric is changed when
   move_tablet operation finishes

Refs: scylladb#17386

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17398
2024-02-19 21:26:08 +01:00
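The source selection logic listed above can be sketched as follows. The data layout and the names `pick_source` and `plan_move` are hypothetical; the real test queries the cluster for tablet replicas and metrics.

```python
def pick_source(tablet_counts):
    """Return (host, shard) of the first shard that owns tablets, or None.

    tablet_counts maps host -> {shard: tablet count}. host_0 is tried
    first, and host_1 only if host_0 has no tablets at all.
    """
    for host in ("host_0", "host_1"):
        for shard, count in sorted(tablet_counts.get(host, {}).items()):
            if count > 0:
                return host, shard
    return None


def plan_move(tablet_counts):
    """Choose source (host, shard) and destination host for the move."""
    src = pick_source(tablet_counts)
    if src is None:
        raise RuntimeError("no tablets on either host")
    src_host, _ = src
    dst_host = "host_1" if src_host == "host_0" else "host_0"
    return src, dst_host
```

When host_0 owns no tablets, the roles are swapped and tablets are moved from host_1 to host_0, so the metric check still exercises a real move.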
Kefu Chai
3c84f08b93 alternator: add formatter for attribute_path_map_node<update_expression::action>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`attribute_path_map_node<update_expression::action>`, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17270
2024-02-19 20:09:11 +02:00
Gleb Natapov
f00ea36f63 gossiper: remove unused REMOVAL_COORDINATOR state
This is leftover from 66ff072540
2024-02-19 15:01:33 +02:00
Gleb Natapov
461bba08cb virtual_tables: take node state from raft for cluster_status_table table if topology over raft is enabled
If topology over raft is enabled the most up-to-date node status is in
the topology state machine. Get it from there.
2024-02-19 15:01:33 +02:00
Gleb Natapov
eb6fa81714 virtual_tables: create result for cluster_status_table read on shard 0
Next patch will access data that is available only on shard 0 during
result creation.
2024-02-19 15:01:33 +02:00
Petr Gusev
f83df24108 test_decommission: fix log messages
Closes scylladb/scylladb#17396
2024-02-19 12:09:43 +02:00
Mikołaj Grzebieluch
182cfebe40 maintenance_socket: add option to set owning group
Option `maintenance-socket-group` sets the owning group of the maintenance socket.
If not set, the group will be the same as the user running the scylla node.
2024-02-19 10:21:00 +01:00
Kefu Chai
34cc245da5 gms: add formatter for read_context::dismantle_buffer_stats
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`read_context::dismantle_buffer_stats`, and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17389
2024-02-19 09:43:53 +02:00
Kefu Chai
fe8e37c5bd configure.py: remove -Wno-unused-command-line-argument
`-Wno-unused-command-line-argument` is used to disable the warning of
`-Wunused-command-line-argument`, which is in turn used to emit
warnings if any of the command line arguments passed to the compiler
driver is not used. see
https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-command-line-argument
but it seems we are not passing unused command line arguments to
the compiler anymore. so let's drop this option.

this change helps to

* reduce the discrepancies between the compiling options used by
  CMake-generated rules and those generated directly using
  `configure.py`
* reenable the warning so we are aware if any of the options
  is not used by compiler. this could be a sign that the option fails
  to serve its purpose.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17195
2024-02-19 09:42:31 +02:00
Botond Dénes
42a76ca568 Merge 'Improve printing of nodes and backtraces in topology' from Pavel Emelyanov
There's a bunch of debug- and trace-level logging of locator::node-s that also include current_backtrace(). Printing node is done via debug_format() helper that generates and returns an sstring to print. Backtrace printing is not very lightweight on its own because of backtrace collecting. Not to slow things down in info log level, which is default, all such prints are wrapped with explicit if-s about log-level being enabled or not.

This PR removes those level checks by introducing lazy_backtrace() helper and by providing a formatter for nodes that also results in lazy node format string calculation.

Closes scylladb/scylladb#17235

* github.com:scylladb/scylladb:
  topology: Restore indentation after previous patch
  topology: Drop if_enabled checks for logging
  topology: Add lazy_backtrace() helper
  topology: Add printer wrapper for node* and formatter for it
  topology: Expand formatter<locator::node>
2024-02-19 09:32:53 +02:00
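The idea behind lazy_backtrace() and the lazy node formatter, deferring expensive string building until a log line is actually emitted, can be illustrated with Python's standard logging module. This is an analogy under assumptions, not the C++ helper: the `Lazy` wrapper and this `lazy_backtrace` are hypothetical names.

```python
import logging
import traceback


class Lazy:
    """Wrap a callable so its result is computed only when formatted."""

    def __init__(self, fn):
        self._fn = fn

    def __str__(self):
        return self._fn()


def lazy_backtrace():
    """Return an object that collects the stack only if it gets printed."""
    return Lazy(lambda: "".join(traceback.format_stack()))


log = logging.getLogger("topology_sketch")

# No explicit "is debug enabled?" check is needed: with %s-style arguments
# the logger formats them only when the record is actually handled.
log.debug("node updated, at: %s", lazy_backtrace())
```

At the default log level the backtrace is never collected, which is the effect the explicit level checks were providing before they were dropped.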
Kefu Chai
47ec74ad1a tools/scylla-nodetool: implement ring
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17375
2024-02-19 09:30:01 +02:00
Anna Stuchlik
ef1468d5ec doc: remove Enterprise OS support from Open Source
With this commit:
- The information about ScyllaDB Enterprise OS support
  is removed from the Open Source documentation.
- The information about ScyllaDB Open Source OS support
  is moved to the os-support-info file in the _common folder.
- The os-support-info file is included in the os-support page
  using the scylladb_include_flag directive.

This update employs the solution we added with
https://github.com/scylladb/scylladb/pull/16753.
It allows content to be added dynamically to a page
depending on the opensource/enterprise flag.

Refs https://github.com/scylladb/scylladb/issues/15484

Closes scylladb/scylladb#17310
2024-02-18 22:09:06 +02:00
Petr Gusev
1d6caa42b9 join_cluster: move was_decommissioned check earlier
Before the patch if a decommissioned node tries
to restart, it calls _group0->discover_group0 first
in join_cluster, which hangs since decommissioned
nodes are banned and other nodes don't respond
to their discovering requests.

We fix the problem by checking was_decommissioned()
flag before calling discover_group0.

fixes scylladb/scylladb#17282

Closes scylladb/scylladb#17358
2024-02-18 22:07:28 +02:00
Kefu Chai
9d666f7d29 cmake: add -Wextra to compiling options
this matches what we have in configure.py

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17376
2024-02-18 19:21:54 +02:00
Kefu Chai
cb781c0ff7 gms: add formatter for gms::versioned_value
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `gms::versioned_value`. its
operator<< is preserved, as it's still being used by the homebrew
generic formatter for std::unordered_map<gms::application_state,
gms::versioned_value>, which is in turn used in gms/gossiper.cc.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17366
2024-02-18 19:21:54 +02:00
Avi Kivity
43f1c3df2e Merge 'repair: Update repair history for tablet repair' from Asias He
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.

Fixes: #17046
Tests: test_tablet_repair_history

Closes scylladb/scylladb#17047

* github.com:scylladb/scylladb:
  repair: Update repair history for tablet repair
  repair: Extract flush hints code
2024-02-18 19:21:54 +02:00
Kefu Chai
8fc4243cf6 configure.py: do not pass include cxx_ldflags in cxxflags
ldflags are passed to ld (the linker), while cxxflags are passed to the
C++ compiler. the compiler does not understand the ldflags. if we
pass ldflags to it, it complains if `-Wunused-command-line-argument` is
enabled.

in this change, we do not include the ldflags in cxxflags, this helps
us to enable the warning option of `-Wunused-command-line-argument`,
so we don't need to disable it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17328
2024-02-18 19:21:54 +02:00
Avi Kivity
d257cc5003 Merge 'scylla-nodetool: implement the repair command' from Botond Dénes
As usual, the new command is covered with tests, which pass with both the legacy and the new native implementation.

Refs: #15588

Closes scylladb/scylladb#17368

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the repair command
  test/nodetool: utils: add check_nodetool_fails_with_error_contains()
  test/nodetool: util: replace flags with custom matcher
2024-02-18 19:21:54 +02:00
Petr Gusev
4ef5d92f50 gossiping_property_file_snitch_test: modernize + fix potential race
This is mostly a refactoring commit to make the test
more readable, as a byproduct of
scylladb/scylladb#17369 investigation.

We add the check for specific type of exceptions that
can be thrown (bad_property_file_error).

We also fix the potential race - the test may write
to res from multiple cores with no locks.

Closes scylladb/scylladb#17371
2024-02-18 19:21:53 +02:00
Kefu Chai
4812a57f71 gms: add formatter for gms::gossip_*
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

- gms::gossip_digest
- gms::gossip_digest_ack
- gms::gossip_digest_syn

and drop their operator<<:s

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17379
2024-02-18 19:21:53 +02:00
Patryk Wrobel
3842bf18a7 storage_service/range_to_endpoint_map: allow API to properly handle tablets
This API endpoint was failing when tablets were enabled
because of usage of get_vnode_effective_replication_map().
Moreover, it was providing an error message that was not
user-friendly.

This change extends the handler to properly service the incoming requests.
Furthermore, it introduces two new test cases that verify the behavior of
storage_service/range_to_endpoint_map API. It also adjusts the test case
of this endpoint for vnodes to succeed when tablets are enabled by default.

The new logic is as follows:
 - when tablets are disabled then users may query endpoints
   for a keyspace or for a given table in a keyspace
 - when tablets are enabled then users have to provide
   table name, because effective replication map is per-table

When user does not provide table name when tablets are enabled
for a given keyspace, then BAD_REQUEST is returned with a
meaningful error message.
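
The decision logic above can be sketched in Python as follows. This is only an illustration of the rules listed in this message, not the C++ handler itself; the function name and the error text are hypothetical:

```python
from http import HTTPStatus
from typing import Optional, Tuple


def validate_request(keyspace_uses_tablets: bool,
                     table: Optional[str]) -> Tuple[HTTPStatus, str]:
    """Return (status, detail) for a range_to_endpoint_map request."""
    if keyspace_uses_tablets and not table:
        # With tablets the effective replication map is per-table, so a
        # keyspace-only query cannot be answered. The message text here
        # is illustrative only.
        return (HTTPStatus.BAD_REQUEST,
                "table name is required for keyspaces that use tablets")
    # Without tablets, keyspace-level and table-level queries are both valid.
    return HTTPStatus.OK, "table" if table else "keyspace"
```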

Fixes: scylladb#17343

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17372
2024-02-18 19:21:53 +02:00
Kefu Chai
808f4d72fb storage_service: fix typos in comment
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17377
2024-02-18 19:21:53 +02:00
Botond Dénes
b11213e547 tools/scylla-nodetool: implement the upgradesstables command
Refs: #15588

Closes scylladb/scylladb#17370
2024-02-18 19:21:53 +02:00
Kefu Chai
af2553e8bc cdc: add formatter for cdc::image_mode and cdc::delta_mode
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
cdc::image_mode and cdc::delta_mode, and drop their operator<<:s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17381
2024-02-18 19:21:53 +02:00
Avi Kivity
9bb4482ad0 Merge 'cdc: metadata: allow sending writes to the previous generations' from Patryk Jędrzejczak
Before this PR, writes to the previous CDC generations would
always be rejected. After this PR, they will be accepted if the
write's timestamp is greater than `now - generation_leeway`.
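
As a rough illustration of the acceptance rule, here is a minimal Python sketch. The function name is hypothetical and the leeway value in the usage lines is an arbitrary example, not the value Scylla uses:

```python
from datetime import datetime, timedelta, timezone


def accepts_write_to_previous_generation(write_ts: datetime,
                                         now: datetime,
                                         generation_leeway: timedelta) -> bool:
    """A write to a previous generation is accepted only if its timestamp
    is greater than `now - generation_leeway`."""
    return write_ts > now - generation_leeway


now = datetime(2024, 2, 18, 12, 0, 0, tzinfo=timezone.utc)
leeway = timedelta(seconds=5)  # example value, chosen for illustration
recent = accepts_write_to_previous_generation(now - timedelta(seconds=2), now, leeway)
too_old = accepts_write_to_previous_generation(now - timedelta(seconds=30), now, leeway)
```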

This change was proposed around 3 years ago. The motivation was
to improve user experience. If a client generates timestamps by
itself and its clock is desynchronized with the clock of the node
the client is connected to, there could be a period during
generation switching when writes fail. We didn't consider this
problem critical because the client could simply retry a failed
write with a higher timestamp. Eventually, it would succeed. This
approach is safe because these failed writes cannot have any side
effects. However, it can be inconvenient. Writing to previous
generations was proposed to improve it.

The idea was rejected 3 years ago. Recently, it turned out that
there is a case when the client cannot retry a write with the
increased timestamp. It happens when a table uses CDC and LWT,
which makes timestamps permanent. Once Paxos commits an entry
with a given timestamp, Scylla will keep trying to apply that entry
until it succeeds, with the same timestamp. Applying the entry
involves writing to the CDC log table. If it fails, we get stuck.
It's a major bug with no known perfect solution.

Allowing writes to previous generations for `generation_leeway` is
a probabilistic fix that should solve the problem in practice.

Apart from this change, this PR adds tests for it and updates
the documentation.

This PR is sufficient to enable writes to the previous generations
only in the gossiper-based topology. The Raft-based topology
needs some adjustments in loading and cleaning CDC generations.
These changes won't interfere with the changes introduced in this
PR, so they are left for a follow-up.

Fixes scylladb/scylladb#7251
Fixes scylladb/scylladb#15260

Closes scylladb/scylladb#17134

* github.com:scylladb/scylladb:
  docs: using-scylla: cdc: remove info about failing writes to old generations
  docs: dev: cdc: document writing to previous CDC generations
  test: add test_writes_to_previous_cdc_generations
  cdc: generation: allow increasing generation_leeway through error injection
  cdc: metadata: allow sending writes to the previous generations
2024-02-18 19:21:53 +02:00
Asias He
796044be1c repair: Update repair history for tablet repair
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.

Fixes: #17046
Tests: test_tablet_repair_history
2024-02-18 10:21:58 +08:00
Asias He
e43bc775d0 repair: Extract flush hints code
So it can be used by tablet repair as well.
2024-02-18 09:42:02 +08:00
Kefu Chai
50964c423e hints: host_filter: add formatter for hints::host_filter
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `hints::host_filter`. its
operator<< is preserved as it's still used by the homebrew generic
formatter for vector<>, which is in turn used by db/config.cc.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17347
2024-02-16 19:03:11 +03:00
Anna Stuchlik
e132ffdb60 doc: add missing redirections
This commit adds the missing redirections
to the pages whose source files were
previously stored in the install-scylla folder
and were moved to another location.

Closes scylladb/scylladb#17367
2024-02-16 14:09:26 +02:00
Kefu Chai
47fec0428a tools/scylla-nodetool: return 1 when viewbuild not succeeds
this change introduces a new exception which carries the status code
so that an operation can return a non-zero exit code without printing
any errors. this mimics the behavior of "viewbuildstatus" command of
C* nodetool.
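
The idea can be sketched in Python as below. The real tool is written in C++; the exception and function names here are hypothetical and only illustrate how an operation can signal a non-zero exit code without producing error output:

```python
class OperationFailedWithStatus(Exception):
    """Carries an exit code; deliberately has no message to print."""

    def __init__(self, exit_code: int):
        super().__init__()
        self.exit_code = exit_code


def viewbuildstatus(all_views_built: bool) -> None:
    # Hypothetical operation: signals failure through the exit code only.
    if not all_views_built:
        raise OperationFailedWithStatus(1)


def run(operation, *args) -> int:
    """Run an operation and translate the status exception into an exit code."""
    try:
        operation(*args)
    except OperationFailedWithStatus as e:
        return e.exit_code  # nothing is printed for this exception
    return 0
```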

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17359
2024-02-16 13:53:33 +02:00
Botond Dénes
8d8ea12862 tools/scylla-nodetool: implement the repair command 2024-02-16 04:42:08 -05:00
Botond Dénes
48e8435466 test/nodetool: utils: add check_nodetool_fails_with_error_contains()
Checks that at least one error snippet is contained in the error output.
2024-02-16 04:40:31 -05:00
Botond Dénes
190c9a7239 test/nodetool: util: replace flags with custom matcher
_do_check_nodetool_fails_with() currently has a `match_all` flag to
control how the match is checked. Now we need yet another way to control
how matching is done. Instead of adding yet another flag (and who knows
how many more), just replace the flag and the errors input with a matcher
functor, which gets the stdout and stderr and is delegated to do any
checks it wants. This method will scale much better going forward.
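
A minimal sketch of the matcher-functor approach, assuming a matcher is any callable taking stdout and stderr and returning a bool. The helper names below are hypothetical and are not the ones in test/nodetool:

```python
from typing import Callable, Iterable

Matcher = Callable[[str, str], bool]


def contains_all(snippets: Iterable[str]) -> Matcher:
    """Matcher that requires every snippet to appear in stderr."""
    expected = list(snippets)
    return lambda stdout, stderr: all(s in stderr for s in expected)


def contains_any(snippets: Iterable[str]) -> Matcher:
    """Matcher that requires at least one snippet to appear in stderr."""
    expected = list(snippets)
    return lambda stdout, stderr: any(s in stderr for s in expected)


def check_fails_with(stdout: str, stderr: str, matcher: Matcher) -> None:
    # The check itself no longer needs flags: the matcher decides.
    assert matcher(stdout, stderr), f"unexpected error output: {stderr!r}"
```

New matching strategies can then be added as new matcher factories without touching the checking function.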
2024-02-16 04:40:31 -05:00
Yaron Kaikov
44edb89f79 [actions] Add a check for backport labels
As part of the Automation of ScyllaDB backports project, each PR should get either a backport/none or backport/X.Y label.
Based on this label we will automatically open a backport PR for the relevant OSS release.
In this commit, I am adding a GitHub action to verify if such a label was added.
This only applies to PRs with a base branch of master or next. For releases, we don't need this check.
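
The rule can be sketched in Python as follows. This is not the GitHub action added by this commit; the function name is hypothetical, and the assumption that `X.Y` means two dot-separated numbers is mine:

```python
import re
from typing import Iterable

# Assumption: "backport/X.Y" means two dot-separated numbers.
BACKPORT_LABEL = re.compile(r"^backport/(none|\d+\.\d+)$")
CHECKED_BASE_BRANCHES = {"master", "next"}


def backport_label_ok(base_branch: str, labels: Iterable[str]) -> bool:
    """Return True if the PR passes the backport-label check."""
    if base_branch not in CHECKED_BASE_BRANCHES:
        # PRs against release branches are not checked.
        return True
    return any(BACKPORT_LABEL.match(label) for label in labels)
```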
2024-02-15 22:40:09 +02:00
Avi Kivity
eedb997568 Merge 'compaction: upgrade: handle keyspaces that use tablets' from Lakshmi Narayanan Sreethar
Tables in keyspaces governed by replication strategy that uses tablets, have separate effective_replication_maps. Update the upgrade compaction task to handle this when getting owned key ranges for a keyspace.

Fixes #16848

Closes scylladb/scylladb#17335

* github.com:scylladb/scylladb:
  compaction: upgrade: handle keyspaces that use tablets
  replica/database: add an optional variant to get_keyspace_local_ranges
2024-02-15 21:31:54 +02:00
Kefu Chai
f0b3068bcf build: cmake: disable unused-parameter, missing-field-initializers and deprecated-copy
-Wunused-parameter, -Wmissing-field-initializers and -Wdeprecated-copy
warning options are enabled by -Wextra. the tree fails to build with
these options enabled, before we address them if the warning are genuine
problems, let's disable them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17352
2024-02-15 21:27:44 +02:00
Kamil Braun
50ebce8acc Merge 'Purge old ip on change' from Petr Gusev
When a node changes IP address we need to remove its old IP from `system.peers` and gossiper.

We do this in `sync_raft_topology_nodes` when the new IP is saved into `system.peers` to avoid losing the mapping if the node crashes between deleting and saving the new IP. We also handle the possible duplicates in this case by dropping them on the read path when the node is restarted.

The PR also fixes the problem with old IPs getting resurrected when a node changes its IP address.
The following scenario is possible: a node `A` changes its IP from `ip1` to `ip2` with restart, other nodes are not yet aware of `ip2` so they keep gossiping `ip1`. After restart `A` receives `ip1` in a gossip message and calls `handle_major_state_change` since it considers it as a new node. Then the `on_join` event is called on the gossiper notification handlers; `raft_ip_address_updater` receives this event and reverts the IP of node `A` back to `ip1`.

To fix this we ensure that the new gossiper generation number is used when a node registers its IP address in `raft_address_map` at startup.

The `test_change_ip` is adjusted to ensure that the old IPs are properly removed in all cases, even if the node crashes.

Fixes #16886
Fixes #16691
Fixes #17199

Closes scylladb/scylladb#17162

* github.com:scylladb/scylladb:
  test_change_ip: improve the test
  raft_ip_address_updater: remove stale IPs from gossiper
  raft_address_map: add my ip with the new generation
  system_keyspace::update_peer_info: check ep and host_id are not empty
  system_keyspace::update_peer_info: make host_id an explicit parameter
  system_keyspace::update_peer_info: remove any_set flag optimisation
  system_keyspace: remove duplicate ips for host_id
  system_keyspace: peers table: use coroutines
  storage_service::raft_ip_address_updater: log gossiper event name
  raft topology: ip change: purge old IP
  on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes
2024-02-15 17:40:29 +01:00
Nadav Har'El
6873a4772f tablets: add warning on CREATE KEYSPACE
The CDC feature is not supported on a table that uses tablets
(Refs #16317), so if a user creates a keyspace with tablets enabled
they may be surprised later (perhaps much later) when they try to enable
CDC on the table and can't.

The LWT feature has always had issue #5251, but it has become potentially
more common with tablets.

So it was proposed that as long as we have missing features (like CDC or
LWT), every time a keyspace is created with tablets it should output a
warning (a bona-fide CQL warning, not a log message) that some features
are missing, and if you need them you should consider re-creating the
keyspace without tablets.

This patch does this. It was surprisingly hard and ugly to find a place
in the code that can check the tablet-ness of a keyspace while it is
still being created, but I think I found a reasonable solution.

The warning text in this patch is the following (obviously, it can
be improved later, as we perhaps find more missing features):

   "Tables in this keyspace will be replicated using tablets, and will
    not support the CDC feature (issue #16317) and LWT may suffer from
    issue #5251 more often. If you want to use CDC or LWT, please drop
    this keyspace and re-create it without tablets, by adding AND TABLETS
    = {'enabled': false} to the CREATE KEYSPACE statement."

This patch also includes a test - that checks that this warning is
indeed generated when a keyspace is created with tablets (either by default
or explicitly), and not generated if the keyspace is created without
tablets.

Obviously, this entire patch - the warning and its test - can be reverted
as soon as we support CDC (and all other features) on tablets.

Fixes #16807

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-02-15 15:51:47 +02:00
Nadav Har'El
29b42e47e5 test/cql-pytest: fix guardrail tests to not be sensitive to more warnings
The guardrail tests check that certain guardrails enable and disable
certain warnings.

These tests currently check for the *number* of warnings returned by a
request, assuming that without the guardrail there would be no warning.
But in the following patch we plan to add an additional warning on
keyspace creation (that warns about tablets missing some features).
So the tests should check for whether or not a *specific* warning is
returned - not the count.

I only modified tests which the change in the next patch will break.
Tests which use SimpleStrategy and will not get the extra warning,
are unmodified and continue to use the old approach of counting
warnings.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-02-15 15:08:08 +02:00
Lakshmi Narayanan Sreethar
7a98877798 compaction: upgrade: handle keyspaces that use tablets
Tables in keyspaces governed by replication strategy that uses tablets, have
separate effective_replication_maps. Update the upgrade compaction task to
handle this when getting owned key ranges for a keyspace.

Fixes #16848

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-15 17:47:39 +05:30
Lakshmi Narayanan Sreethar
8925a2c3cb replica/database: add an optional variant to get_keyspace_local_ranges
Add a new method database::maybe_get_keyspace_local_ranges that
optionally returns the owned ranges for the given keyspace if it has an
effective_replication_map for the entire keyspace.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-15 17:44:47 +05:30
Botond Dénes
22a5112bf1 tools/scylla-sstable-scripts: add keys.lua and largest-key.lua
I wrote these scripts to identify sstables with too large keys for a
recent investigation. I think they could be useful in the future,
certainly as further examples on how to write lua scripts for
scylla-sstable script.

Closes scylladb/scylladb#17000
2024-02-15 13:39:41 +02:00
Avi Kivity
5df5714331 Merge 'api: storage_service/natural_endpoints: add tablets support' from Botond Dénes
This API endpoint currently returns with status 500 if attempted to be called for a table which uses tablets. This series adds tablet support. No change in usage semantics is required, the endpoint already has a table parameter.
This endpoint is the backend of `nodetool getendpoints` which should now work, after this PR.

Fixes: #17313

Closes scylladb/scylladb#17316

* github.com:scylladb/scylladb:
  service/storage_service: get_natural_endpoints(): add tablets support
  replica/database: keyspace: add uses_tablets()
  service/storage_service: remove token overload of get_natural_endpoints()
2024-02-15 13:36:56 +02:00
Kefu Chai
caa20c491f storage_service: pass non-empty keyspace when performing cleanup_all
this change addresses the regression introduced by 5e0b3671, which
falls back to local cleanup in cleanup_all. but 5e0b3671 failed to
pass the keyspace to `shard_cleanup_keyspace_compaction_task_impl`
as its constructor parameter, that's why the test fails like
```
error executing POST request to http://localhost:10000/storage_service/cleanup_all with parameters {}: remote replied with status code 400 Bad Request:
Can't find a keyspace

```

where the string after "Can't find a keyspace" is empty.

in this change, the keyspace name of the keyspace to be cleaned is passed to
`shard_cleanup_keyspace_compaction_task_impl`.

we always enable the topology coordinator when performing testing,
that's why this issue does not pop up until the longevity test.

Fixes #17302
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17320
2024-02-15 13:17:45 +02:00
Aleksandra Martyniuk
cf36015591 repair: handle no_such_column_family from remote node gracefully
If no_such_column_family is thrown on remote node, then repair
operation fails as the type of exception cannot be determined.

Use repair::with_table_drop_silenced in repair to continue operation
if a table was dropped.
2024-02-15 12:06:47 +01:00
Aleksandra Martyniuk
2ea5d9b623 test: test drop table on receiver side during streaming 2024-02-15 12:06:47 +01:00
Aleksandra Martyniuk
b08f539427 streaming: fix indentation 2024-02-15 12:06:47 +01:00
Aleksandra Martyniuk
219e1eda09 streaming: handle no_such_column_family from remote node gracefully
If no_such_column_family is thrown on remote node, then streaming
operation fails as the type of exception cannot be determined.

Use repair::with_table_drop_silenced in streaming to continue
operation if a table was dropped.
2024-02-15 12:06:47 +01:00
Aleksandra Martyniuk
5202bb9d3c repair: add methods to skip dropped table
Schema propagation is async so one node can see the table while on
the other node it is already dropped. So, if the nodes stream
the table data, the latter node throws no_such_column_family.
The exception is propagated to the other node, but its type is lost,
so the operation fails on the other node.

Add method which waits until all raft changes are applied and then
checks whether given table exists.

Add a function which uses the above to determine whether an operation
failed because of a dropped table (e.g. on the remote node, so the exact
exception type is unknown). If so, the exception isn't rethrown.
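
The pattern can be sketched in Python like this. The real helper is C++ and waits for raft changes to be applied before checking; in this sketch that step is assumed to happen inside the hypothetical `table_exists` callback:

```python
def with_table_drop_silenced(operation, table_exists):
    """Run `operation`; swallow its failure only if the table was dropped.

    `table_exists` is a hypothetical callback that is assumed to wait for
    pending schema changes before reporting whether the table still exists.
    """
    try:
        return operation()
    except Exception:
        # The exact remote exception type may be unknown, so instead of
        # inspecting it we check whether the table is still there.
        if table_exists():
            raise  # table exists: this is a genuine failure
        return None  # table was dropped: skip it silently
```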
2024-02-15 12:06:42 +01:00
Botond Dénes
811e931b09 Merge 'tools/scylla-nodetool: implement compactionstats and viewbuildstatus' from Kefu Chai
Refs #15588

Closes scylladb/scylladb#17344

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement viewbuildstatus
  tools/scylla-nodetool: implement compactionstats
2024-02-15 12:44:05 +02:00
Petr Gusev
c4140678ba test_change_ip: improve the test
In this commit we refactor test_change_ip to improve
it in several ways:
  * We inject failure before old IP is removed and verify
    that after restart the node sees the proper peers - the
    new IP for node2 and old IP for node3, which is not restarted
    yet.
  * We introduce the lambda wait_proper_ips, which checks not only the
    system.peers table, but also gossiper and token_metadata.
  * We call this lambda for all nodes, not only the first node;
    this allows to validate that the node that has changed its
    IP has the proper IP of itself in the data structures above.

Note that we need to inject an additional delay ip-change-raft-sync-delay
before old IP is removed. Otherwise the problem stops reproducing - other
nodes remove the old IP before it's sent back to the just restarted node.
2024-02-15 13:26:02 +04:00
Petr Gusev
a068dba8c9 raft_ip_address_updater: remove stale IPs from gossiper
In the scenario described in the previous commit the
on_endpoint_change could be called with our previous IP.
We can easily detect this case - after add_or_update_entry
the IP for a given id in address_map hasn't changed. We
remove such IP from gossiper since it's not needed, and
makes the test in the next commit more natural - all old
IPs are removed from all subsystems.
2024-02-15 13:25:56 +04:00
Petr Gusev
4b33ba2894 raft_address_map: add my ip with the new generation
The following scenario is possible: a node A changes its IP
from ip1 to ip2 with restart, other nodes are not yet aware of ip2
so they keep gossiping ip1, after restart A receives
ip1 in a gossip message and calls handle_major_state_change
since it considers it as a new node. Then on_join event is
called on the gossiper notification handlers, we receive
such event in raft_ip_address_updater and revert the IP
of the node A back to ip1.

The essence of the problem is that we don't pass the proper
generation when we add ip2 as a local IP during initialization
when node A restarts, so the zero generation is used
in raft_address_map::add_or_update_entry and the gossiper
message overwrites ip2 with ip1.
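
The effect of the generation can be shown with a small Python model. This is a sketch under an assumption, namely that an existing entry is replaced only by an update with a strictly newer generation; the class below is hypothetical and much simpler than raft_address_map:

```python
class AddressMap:
    """Toy host_id -> IP map with generation-based conflict resolution."""

    def __init__(self):
        self._entries = {}  # host_id -> (ip, generation)

    def add_or_update_entry(self, host_id, ip, generation=0):
        current = self._entries.get(host_id)
        # Assumption: only a strictly newer generation replaces an entry.
        if current is None or generation > current[1]:
            self._entries[host_id] = (ip, generation)

    def find(self, host_id):
        entry = self._entries.get(host_id)
        return entry[0] if entry else None


# Local IP added with the zero generation: stale gossip wins.
buggy = AddressMap()
buggy.add_or_update_entry("A", "ip2")
buggy.add_or_update_entry("A", "ip1", generation=5)

# Local IP added with the new generation: stale gossip is ignored.
fixed = AddressMap()
fixed.add_or_update_entry("A", "ip2", generation=6)
fixed.add_or_update_entry("A", "ip1", generation=5)
```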

In this commit we fix this problem by passing the new generation.
To do that we move the increment_and_get_generation call
from join_token_ring to scylla_main, so that we have a new generation
value before init_address_map is called.

Also we remove the load_initial_raft_address_map function from
raft_group0 since it's redundant. The comment above its call site
says that it's needed to not miss gossiper updates, but
the function storage_service::init_address_map where raft_address_map
is now initialized is called before gossiper is started. This
function does both - it loads the previously persisted host_id<->IP
mappings from system.local and subscribes to gossiper notifications,
so there is no room for races.

Note that this problem is less likely to reproduce with the
'raft topology: ip change: purge old IP' commit - other
nodes remove the old IP before it's sent back to the
just restarted node. This is also the reason why this
problem doesn't occur in gossiper mode.

fixes scylladb/scylladb#17199
2024-02-15 13:21:04 +04:00
Petr Gusev
2bf75c1a4e system_keyspace::update_peer_info: check ep and host_id are not empty 2024-02-15 13:21:04 +04:00
Petr Gusev
86410d71d1 system_keyspace::update_peer_info: make host_id an explicit parameter
The host_id field should always be set, so it's more
appropriate to pass it as a separate parameter.

The function storage_service::get_peer_info_for_update
is updated. It shouldn't look for the host_id app
state in the passed map, instead the callers should
get the host_id on their own.
2024-02-15 13:21:04 +04:00
Petr Gusev
e0072f7cb3 system_keyspace::update_peer_info: remove any_set flag optimisation
This optimization never worked -- there were four usages of
the update_peer_info function and in all of them some of
the peer_info fields were set or should be set:
* sync_raft_topology_nodes/process_normal_node: e.g. tokens is set
* sync_raft_topology_nodes/process_transition_node: host_id is set
* handle_state_normal: tokens is set
* storage_service::on_change: get_peer_info_for_update could potentially
return a peer_info with all fields set to empty, but this shouldn't
be possible, host_id should always be set.

Moreover, there is a bug here: we extract host_id from the
states_ parameter, which represents the gossiper application
states that have been changed. This parameter contains host_id
only if a node changes its IP address, in all other cases host_id
is unset. This means we could end up with a record with empty
host_id, if it wasn't previously set by some other means.

We are going to fix this bug in the next commit.
2024-02-15 13:21:04 +04:00
Petr Gusev
4a14988735 system_keyspace: remove duplicate ips for host_id
When a node changes IP we call sync_raft_topology_nodes
from raft_ip_address_updater::on_endpoint_change with
the old IP value in prev_ip parameter.
It's possible that the node crashes right after
we insert a new IP for the host_id, but before we
remove the old IP. In this commit we fix the
possible inconsistency by removing the system.peers
record with old timestamp. This is what the new
peers_table_read_fixup function is responsible for.

We call this function in all system_keyspace methods
that read the system.peers table. The function
loads the table in memory, decides if some rows
are stale by comparing their timestamps and
removes them.

The new function also removes the records with no
host_id, so we no longer need the get_host_id function.
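
A minimal Python sketch of such a fixup, assuming each system.peers row carries an IP, an optional host_id and a write timestamp. The row type and function name are hypothetical and do not mirror the C++ implementation:

```python
from typing import List, NamedTuple, Optional, Tuple


class PeerRow(NamedTuple):
    ip: str
    host_id: Optional[str]
    timestamp: int


def peers_read_fixup(rows: List[PeerRow]) -> Tuple[List[PeerRow], List[PeerRow]]:
    """Split rows into (rows to keep, stale rows to delete)."""
    newest = {}  # host_id -> newest row seen so far
    stale = []
    for row in rows:
        if row.host_id is None:
            stale.append(row)  # records without host_id are removed
            continue
        current = newest.get(row.host_id)
        if current is None or row.timestamp > current.timestamp:
            if current is not None:
                stale.append(current)  # older duplicate for this host_id
            newest[row.host_id] = row
        else:
            stale.append(row)
    return list(newest.values()), stale
```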

We'll add a test for the problem this commit fixes
in the next commit.
2024-02-15 13:21:04 +04:00
Petr Gusev
fa8718085a system_keyspace: peers table: use coroutines
This is a refactoring commit with no observable
changes in behaviour.

We switch the functions to coroutines, it'll
be easy to work with them in this way in the
next commit. Also, we add more const-s
along the way.
2024-02-15 13:21:04 +04:00
Petr Gusev
00547d3f48 storage_service::raft_ip_address_updater: log gossiper event name
It's useful for debugging.
2024-02-15 13:20:54 +04:00
Petr Gusev
6955cfa419 raft topology: ip change: purge old IP
When a node changes IP address we need to
remove its old IP from system.peers and
gossiper.

We do this in sync_raft_topology_nodes when
the new IP is saved into system.peers to avoid
losing the mapping if the node crashes
between deleting and saving the new IP. In the
next commit we handle the possible duplicates
in this case by dropping them on the read path.

In subsequent commits, test_change_ip will be
adjusted to ensure that old IPs are removed.

fixes scylladb/scylladb#16886
fixes scylladb/scylladb#16691
2024-02-15 13:19:13 +04:00
Petr Gusev
a2c0384cd1 on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes
We introduce the helper 'ensure_alive' which takes a
coroutine lambda and returns a wrapper which
ensures the proper lifetime for it.
It works by moving the input lambda onto the heap and
keeping the ptr alive until the resulting future
is resolved.

We also move the holder acquired from _async_gate
to the 'then' lambda closure, since now these closures
will be kept alive during the lambda coroutine execution.

We'll be adding more code to this lambda in the subsequent
commits, it's easier to work with coroutines.
2024-02-15 13:13:44 +04:00
Kefu Chai
f9d19a61ff tools/scylla-nodetool: implement viewbuildstatus
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-15 16:54:16 +08:00
Nadav Har'El
28db187756 alternator, tablets: return error if enabling TTL with tablets
Alternator TTL doesn't yet work on tables using tablets (this is
issue #16567). Before this patch, it can be enabled on a table with
tablets, and the result is a lot of log spam and nothing will get expired.

So let's make the attempt to enable TTL on a table that uses tablets
into a clear error. The error message points to the issue, and also
suggests how to create a table that uses vnodes, not tablets.

This patch also adds a test that verifies that trying to enable TTL
with tablets is an error. Obviously, this test should be removed
once the issue is solved and TTL begins working with tablets.

Refs #16567
Refs #16807

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17306
2024-02-15 10:47:06 +02:00
Kefu Chai
4da9a62472 utils: managed_bytes: fix typo in comment
s/assigments/assignments/

this misspelling was identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17333
2024-02-15 10:37:25 +02:00
Kefu Chai
8e8b73fa82 dht: add formatter for partition_range_view and i_partition
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`partition_range_view` and `i_partition`, and drop their operator<<:s.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17331
2024-02-15 09:46:03 +02:00
Lakshmi Narayanan Sreethar
3b7b315f6a replica/database: quiesce compaction before closing system tables during shutdown
During shutdown, as all system tables are closed in parallel, there is a
possibility of a race condition between compaction stoppage and the
closure of the compaction_history table. So, quiesce all the compaction
tasks before attempting to close the tables.

Fixes #15721

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#17218
2024-02-15 09:44:16 +02:00
Nadav Har'El
b97ded5c4a test/topology: tests for setting tombstone_gc on materialized view
A user asked on the ScyllaDB forum several questions on whether
tombstone_gc works on materialized views. This patch includes two
tests that confirm the following:

1. The tombstone_gc may be set on a view - either during its creation
   with CREATE MATERIALIZED VIEW or later with ALTER MATERIALIZED VIEW.

2. The tombstone_gc setting is correctly shown - for both base tables
   and views - by the "DESC" statement.

3. The tombstone_gc setting is NOT inherited from a base table to a new
   view - if you want this option on a view, you need to set it
   separately.

Unfortunately, this test could not be a single-node cql-pytest because
we forbid tombstone_gc=repair when RF=1, and since recently, we forbid
setting RF>1 on a single-node setup. So the new tests are written in
the test/topology framework - which may run multiple tests against
a single three-node cluster.

To write tests over a shared cluster, we need functions which create
temporary keyspaces, tables and views, which are deleted automatically
as soon as a test ends. The test/topology framework was lacking such
functions, so this test includes them - currently inside the test
file, but if other people find them useful they can be moved to a more
central location.

The new functions, new_test_keyspace(), new_test_table() and
new_materialized_view() are inspired by the identically-named
functions in test/cql-pytest/util.py, but the implementation is
different: Importantly, the new functions here are *async*
context managers, used via "async with", to fit with the rest
of the asynchronous code used in the topology test framework.
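
As a rough illustration of the "async with" pattern described above, here
is a minimal sketch. It is not the code added by this patch: FakeSession and
its run_async() method are hypothetical stand-ins for a real CQL session,
and the keyspace naming scheme is an assumption.

```python
import asyncio
import itertools
from contextlib import asynccontextmanager

_counter = itertools.count()


class FakeSession:
    """Hypothetical stand-in for a CQL session; it only records statements."""

    def __init__(self):
        self.statements = []

    async def run_async(self, stmt):
        self.statements.append(stmt)


@asynccontextmanager
async def new_test_keyspace(cql, opts):
    # Use a unique name so that tests sharing one cluster do not collide.
    name = f"test_ks_{next(_counter)}"
    await cql.run_async(f"CREATE KEYSPACE {name} {opts}")
    try:
        yield name
    finally:
        # Runs even if the test body raises, so the keyspace never leaks.
        await cql.run_async(f"DROP KEYSPACE {name}")


async def demo():
    cql = FakeSession()
    opts = ("WITH replication = {'class': 'NetworkTopologyStrategy', "
            "'replication_factor': 3}")
    async with new_test_keyspace(cql, opts) as ks:
        await cql.run_async(f"CREATE TABLE {ks}.t (p int PRIMARY KEY)")
    return cql.statements


statements = asyncio.run(demo())
```

The try/finally around the yield is what makes the cleanup automatic: the
DROP is issued as soon as the "async with" block exits, whether or not the
test passed.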

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17345
2024-02-15 09:43:30 +02:00
Kefu Chai
bcb144ada3 configure.py: disable stack-use-after-scope check only when ASan is enabled
`-fno-sanitize-address-use-after-scope` is used to disable the check for
stack-use-after-scope bugs, but this check is only performed when ASan
is enabled. if we pass this option when ASan is not enabled, we'd get the
following warning, so let's apply it only when ASan is enabled.

```
clang-16: error: argument unused during compilation:
'-fno-sanitize-address-use-after-scope' [-Werror,-Wunused-command-line-argument]
```
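
The idea of passing the flag only alongside ASan can be sketched as below.
This is an illustration only: sanitizer_flags() is a hypothetical helper,
not a function from configure.py.

```python
def sanitizer_flags(sanitizers):
    """Build compiler flags for the given sanitizer names (hypothetical helper)."""
    flags = [f"-fsanitize={name}" for name in sanitizers]
    # The use-after-scope check belongs to ASan, so the flag that disables it
    # is only meaningful (and only accepted without a warning) with ASan.
    if "address" in sanitizers:
        flags.append("-fno-sanitize-address-use-after-scope")
    return flags
```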

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17329
2024-02-15 09:28:29 +02:00
Botond Dénes
ca13ff10ea service/storage_service: get_natural_endpoints(): add tablets support
Also add a unit test for this API endpoint, testing it with both tablets
and vnodes.
2024-02-15 02:07:18 -05:00
Botond Dénes
7f17d3bb0e replica/database: keyspace: add uses_tablets()
Mirroring table::uses_tablets(), provides a convenient and -- more
importantly -- easily discoverable way to determine whether the keyspace
uses tablets or not.
This information is of course already available via the abstract
replication strategy, but as seen in a few examples, this is not easily
discoverable and sometimes people resorted to enumerating the keyspace's
tables to be able to invoke table::uses_tablets().
2024-02-15 01:51:26 -05:00
Botond Dénes
0b2acf90ff service/storage_service: remove token overload of get_natural_endpoints()
This overload does not work with tablets because it only has keyspace
and token parameters. The only caller is the other overload, which also
has a table parameter, so it can be made to work with tablets. Inline
this overload into the other and remove it, in preparation to fixing
this method for tablets.
2024-02-15 01:51:25 -05:00
Kefu Chai
68795eb8fa tools/scylla-nodetool: implement gossipinfo
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17317
2024-02-15 08:41:39 +02:00
Kefu Chai
a7abaa457b tools/scylla-nodetool: implement compactionstats
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-15 12:29:10 +08:00
Anna Stuchlik
710d182654 doc: update Handling Node Failures to add topology
This commit updates the Handling Node Failures page
to specify that the quorum requirement refers to both
schema and topology updates.

Closes scylladb/scylladb#17321
2024-02-14 17:15:13 +01:00
Kamil Braun
7e9e10186f Merge 'change the way ignored nodes are handled by the topology coordinator' from Gleb
This series makes several changes to how ignored nodes list is treated
by the topology coordinator. First the series makes it global and not
part of a single topology operation, second it extends the list at the
time of removenode/replace invocation and third it bans all nodes in
the list from contacting the cluster ever again.

The main motivation is to have a way to unblock tablet migration in case
of a node failure. Tablet migration knows how to avoid nodes in ignored
nodes list and this patch series provides a way to extend it without
performing any topology operation (which is not possible while tablet
migration runs).

Fixes scylladb/scylladb#16108

* 'gleb/ignore-nodes-handling-v2' of github.com:scylladb/scylla-dev:
  test: add test for the new ignore nodes behaviour
  topology coordinator: cleanup node_state::decommissioning state handling code
  topology coordinator: ban ignored nodes just like we ban nodes that are left
  storage_service: topology coordinator: validate ignore dead nodes parameters in removenode/replace
  topology coordinator: add removed/replaced nodes to ignored_nodes list at the request invocation time
  topology coordinator: make ignored_nodes list global and permanent
  topology_coordinator: do not cancel rebuild just because some other nodes are dead
  topology coordinator: throw more specific error from wait_for_ip() function in case of a timeout
  raft_group0: add make_nonvoters function that can make multiple node non voters simultaneously
2024-02-14 16:36:01 +01:00
Marcin Maliszkiewicz
0b8b9381f4 auth: drop const from methods on write path
In a follow-up patch abort_source will be used
inside those methods. Current pattern is that abort_source
is passed everywhere as non const so it needs to be
executed in non const context.

Closes scylladb/scylladb#17312
2024-02-14 13:24:53 +01:00
Tzach Livyatan
902733cd7e Docs: rename doc page from REST to Admin REST API
Closes scylladb/scylladb#17334
2024-02-14 13:49:54 +02:00
Kefu Chai
d43c418f72 tools/scylla-nodetool: implement getendpoints
Refs #15588

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17332
2024-02-14 11:20:52 +02:00
Gleb Natapov
7802c206c7 test: add test for the new ignore nodes behaviour
The test checks that once a node is specified in ignored node list by
one topology operation the information is carried over to the next
operation as well.
2024-02-14 10:35:11 +02:00
Gleb Natapov
7ec9316774 topology coordinator: cleanup node_state::decommissioning state handling code
The code is shared between decommission and removenode and it has
scattered 'ifs' for different behaviours between those. Change it to
have only one 'if'.
2024-02-14 10:35:11 +02:00
Gleb Natapov
363af9e664 topology coordinator: ban ignored nodes just like we ban nodes that are left
Since a node that was at one point marked as dead, either via the
--ignore-dead-nodes parameter or by being a target for removenode or
replace, can no longer be made "undead", we need to make sure that it
cannot rejoin the cluster any longer. Do that by banning such nodes on the
messaging layer just like we do for nodes that have left.

Note that the removenode failure test had to be altered since it restarted
a node after removenode failure (which now will not work). Also, since
the check for liveness was removed from the topology coordinator (because
the node is already banned by then), the test case that triggers the
removed code is removed as well.
2024-02-14 10:35:06 +02:00
Kefu Chai
ab07fb25f5 scylla_raid_setup: reference xfsprog on the minimal 1024 block size
the quote of "The minimum block size for crc enabled filesystems is
1024" comes from the output of mkfs.xfs, let's quote the source for
better maintainability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17094
2024-02-14 08:44:14 +02:00
Michał Chojnowski
3d81138852 configure.py: don't modify modes in write_build_file()
The true motivation for this patch is a certain problem with configure.py
in scylla-enterprise, which can only be solved by moving the `extra_cxxflags`
lines before configure_seastar(). This patch does that by hoisting
get_extra_cxxflags() up to create_build_system().

But this patch makes sense even if we disregard the real motivation.
It's weird that a function called `write_build_file()` adds additional
build flags on its own.

Closes scylladb/scylladb#17189
2024-02-13 21:28:32 +02:00
Patryk Wrobel
a3fb44cbca Rename keyspace::get_effective_replication_map()
This commit renames keyspace::get_effective_replication_map()
to keyspace::get_vnode_effective_replication_map(). This change
is required to ease the analysis of the usage of this function.

When tablets are enabled, then this function shall not be used.
Instead of per-keyspace, per-table replication map should be used.
The rename was performed to distinguish between those two calls.
The next step will be an audit of usages of
keyspace::get_vnode_effective_replication_map().

Refs: scylladb#16626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17314
2024-02-13 20:22:02 +02:00
Nadav Har'El
5d4c60aee3 test/cql-pytest: avoid spurious guardrail warnings
All cql-pytest tests use one node, and unsurprisingly most use RF=1.
By default, as part of the "guardrails" feature, we print a warning
when creating a keyspace with RF=1. This warning gets printed on
every cql-pytest run, which creates a "boy who cried wolf" effect
whereby developers get used to seeing these warnings, and won't care
if new warnings start appearing.

The fix is easy - in run.py start Scylla with minimum-replication-factor-
warn-threshold set to -1 instead of the default 3.

Note that we do have cql-pytest tests for this guardrail, but those don't
rely on the default setting of this variable (they can't, cql-pytest
tests can also be run on a Scylla instance run manually by a developer).
Those tests temporarily set the threshold during the test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17274
2024-02-13 17:44:20 +02:00
Kefu Chai
b309e42195 collection_mutation: add formatter for collection_mutation_view::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
`collection_mutation_view::printer`, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17300
2024-02-13 17:42:25 +02:00
Botond Dénes
120442231f Merge 'row_cache: test cache consistency during multi-partition cache updates' from Michał Chojnowski
Adds a test reproducing https://github.com/scylladb/scylladb/issues/16759, and the instrumentation needed for it.

Closes scylladb/scylladb#17208

* github.com:scylladb/scylladb:
  row_cache_test: test cache consistency during memtable-to-cache merge
  row_cache: use preemption_source in update()
  utils: preempt: add preemption_source
2024-02-13 17:37:06 +02:00
Kefu Chai
54ed65bb50 mutation: s/statics/static content/
codespell reports that "statics" could be the misspelling of
"statistics". but "static" here means the static column(s). so
replace "statics" with more specific wording.

Refs #589
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17216
2024-02-13 17:33:21 +02:00
Kefu Chai
9b6a66826c api/storage_service: add more constness to http_context parameter
when we just want to perform read access to `http_context`, there
is no need to use a non-const reference. so let's add `const` specifier
to make this explicit. this should help with readability and
maintainability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17219
2024-02-13 17:32:45 +02:00
Lakshmi Narayanan Sreethar
f8f8d64982 test.py: support skipping multiple test patterns
Support skipping multiple patterns by allowing them to be passed via
multiple '--skip' arguments to test.py.

Example : `test.py --skip=topology --skip=sstables`
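
One way to accept a repeated option like this is argparse's "append"
action. The sketch below is only an illustration and not the actual test.py
code; the dest name skip_patterns and the substring matching in
should_skip() are assumptions.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of --skip into one list.
    parser.add_argument("--skip", action="append", dest="skip_patterns",
                        default=[], metavar="PATTERN")
    return parser.parse_args(argv)


def should_skip(test_name, skip_patterns):
    # Assumption: a pattern matches if it is a substring of the test name.
    return any(pattern in test_name for pattern in skip_patterns)


args = parse_args(["--skip=topology", "--skip=sstables"])
```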

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#17220
2024-02-13 17:32:03 +02:00
Kefu Chai
57d138b80f row_cache: s/fro/reader/
"fro" is the short of "from" but the value is an
`optimized_optional<flat_mutation_reader_v2>`. codespell considers
it a misspelling of "for" or "from". neither of them makes sense,
so let's change it to "reader" for better readability, also for
silencing the warning, so that genuine warnings can stand out.
this would help to make codespell more useful.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17221
2024-02-13 17:28:14 +02:00
Kefu Chai
c555af3cd8 raft: add formatter for raft::log
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `raft::log`, and drop its
operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17301
2024-02-13 17:17:57 +02:00
Anna Stuchlik
02cd84adbf doc: remove OSS-vs-Ent Matrix from OSS docs
This commit removes the Open Source vs. Enterprise matrix
from the Open Source documentation.

In addition, a redirection is added to prevent 404 in the OSS docs,
and the removed page is replaced with a link to the same page
in the Enterprise docs.

This commit must be reverted in enterprise.git, because
we want to keep the Matrix in the Enterprise docs.

Fixes https://github.com/scylladb/scylladb/issues/17289

Closes scylladb/scylladb#17295
2024-02-13 17:17:22 +02:00
Yaniv Kaul
d2ef100b60 Typos: more/less then -> more/less than
Fix repeated typos in comments: more then -> more than, less then -> less than

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#17303
2024-02-13 17:16:15 +02:00
Nadav Har'El
dce47a81b0 alternator, tablets: return error if enabling Streams with tablets
Alternator Streams doesn't yet work on tables using tablets (this is
issue #16317). Before this patch, an attempt to enable it results in
an unsightly InternalServerError, which isn't terrible - but we can
do better.

So in this patch, we make the attempt to enable Streams and tablets
together into a clear error. The error message points to the open issue,
and also suggests how to create a table that uses vnodes, not tablets.

Unfortunately, there are two slightly different code paths and error
messages for two cases: One case is the creation of a new table (where
the validation happens before the keyspace is actually created), and
the other case is an attempt to enable streams on an existing table
with an existing keyspace (which already might or might not be using
tablets).

This patch also adds a test that verifies that trying to enable Streams
with tablets is an error - in both cases (table creation and update).
Obviously, this test - and the validation code - should be removed once
the issue is solved and Alternator Streams begins working with tablets.

Fixes #16497
Refs #16807

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17311
2024-02-13 16:42:35 +02:00
Raphael S. Carvalho
54226dddf5 replica: Kill vnode-oriented cleanup handling for multiple compaction groups
With tablets, we don't use vnode-oriented sstable cleanup.
So let's just remove unused code and bail out silently if sharding is
tablet based. The reason for silence is that we don't want to break
tests that might be reused for tablets, and it's not a problem for
sstable cleanup to be ignored with tablets.
This approach is actually already used in the higher level code,
implementing the cleanup API.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17296
2024-02-13 16:35:15 +02:00
Gleb Natapov
8f7d2fd44b storage_service: topology coordinator: validate ignore dead nodes parameters in removenode/replace
Fail commands if provided nodes are not in the "normal" state.
2024-02-13 16:15:35 +02:00
Gleb Natapov
d062a04df0 topology coordinator: add removed/replaced nodes to ignored_nodes list at the request invocation time
To unblock tablet migration in case of a node failure we need a way to
dynamically extend a list of ignored_nodes while the migration is
happening. This patch does it by piggybacking on existing topology
operations that assume their target node is already dead. It adds the
target node to the now-global ignored_nodes list when the request is issued and,
for better HA, makes the nodes in ignored_nodes non voters.
2024-02-13 16:15:35 +02:00
Gleb Natapov
9b52dc4560 topology coordinator: make ignored_nodes list global and permanent
Currently ignored_nodes list is part of a request (removenode or
replace) and exists only while a request is handled. This patch
changes it to be global and exist outside of any request. Nodes stay
in the list until they are eventually removed and moved to the "left" state.
If a node is specified in the ignore-dead-nodes option for any command
it will be ignored for all other operations that support ignored_nodes
(like tablet migration).
2024-02-13 16:15:35 +02:00
Gleb Natapov
cbef807e69 topology_coordinator: do not cancel rebuild just because some other nodes are dead
Rebuild may not contact all the nodes, so it may succeed even while some
nodes are dead.
2024-02-13 16:15:35 +02:00
Gleb Natapov
0fe00e34ef topology coordinator: throw more specific error from wait_for_ip() function in case of a timeout
It will be easier to distinguish the failure reason.
2024-02-13 16:15:35 +02:00
Gleb Natapov
f21a3b4ca5 raft_group0: add make_nonvoters function that can make multiple node non voters simultaneously
2024-02-13 16:15:35 +02:00
Petr Gusev
3722ca0a41 sync_raft_topology_nodes: parallelize system_keyspace update functions
In sync_raft_topology_nodes we execute a system keyspace
update query for each node of the cluster. The system keyspace
tables use schema commitlog which by default enables use_o_dsync.
This means that each write to the commitlog is accompanied by fsync.
For large clusters this can incur hundreds of writes with fsyncs, which
is very expensive. For example, in #17039 for a moderate size cluster
of 50 nodes sync_raft_topology_nodes took almost 5 seconds.

In this commit we solve this problem by running all such update
queries in parallel. The commitlog should batch them and issue
only one write syscall to the OS.
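
The effect can be illustrated with a toy model. The sketch below is not
Scylla code: FakeCommitlog is a hypothetical batching log in which all
writes that are pending at the same time share one simulated sync, and it
only counts syncs rather than measuring time.

```python
import asyncio


class FakeCommitlog:
    """Toy model: writes pending at the same time share one simulated sync."""

    def __init__(self):
        self.pending = []
        self.sync_count = 0
        self._flush = None

    async def write(self, entry):
        self.pending.append(entry)
        if self._flush is None:
            self._flush = asyncio.create_task(self._do_flush())
        await self._flush

    async def _do_flush(self):
        # Yield once so that concurrently issued writes can join this batch.
        await asyncio.sleep(0)
        self.pending.clear()
        self.sync_count += 1
        self._flush = None


async def update_sequentially(log, nodes):
    for node in nodes:
        await log.write(node)  # each write waits for its own sync


async def update_in_parallel(log, nodes):
    await asyncio.gather(*(log.write(node) for node in nodes))


def count_syncs(update, node_count=50):
    log = FakeCommitlog()
    asyncio.run(update(log, range(node_count)))
    return log.sync_count
```

In this model, count_syncs(update_sequentially) pays one sync per node,
while count_syncs(update_in_parallel) pays a single sync for the whole batch.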

Closes scylladb/scylladb#17243
2024-02-13 14:44:48 +01:00
Piotr Dulikowski
314fd9a11f test: test_topology_recovery_basic: add missing driver reconnect
Unfortunately, scylladb/python-driver#230 is not fixed yet, so it is
necessary for the sake of our CI's stability to re-create the driver
session after all nodes in the cluster are restarted.

There is one place in test_topology_recovery_basic where all nodes are
restarted but the driver session is not re-created. Even though nodes
are not restarted at once but rather sequentially, we observed a failure
with similar symptoms in a CI run for scylla-enterprise.

Add the missing driver reconnect as a workaround for the issue.

Fixes: scylladb/scylladb#17277

Closes scylladb/scylladb#17278
2024-02-13 12:28:30 +01:00
David Garcia
f45d9d33f1 docs: remove liveness asterisks
Instead of adding an asterisk next to "liveness" linking to the glossary, we will temporarily replace them with a hyperlink pending the implementation of tooltip functionality.

Closes scylladb/scylladb#17244
2024-02-12 20:37:52 +02:00
Avi Kivity
b22db74e6a Regenerate frozen toolchain
For gnutls 3.8.3 and clang clang-16.0.6-4.

Fixes #17285.

Closes scylladb/scylladb#17287
2024-02-12 18:36:11 +02:00
Botond Dénes
3f2d7e8b25 tree: remove unnecessary yields around for_each_tablet()
Commit 904bafd069 consolidated the two
existing for_each_tablet() overloads, to the one which has a future<>
returning callback. It also added yields to the bodies of said
callbacks. This is unnecessary, the loop in for_each_tablet() already
has a yield per tablet, which should be enough to prevent stalls.

This patch is a follow-up to #17118

Closes scylladb/scylladb#17284
2024-02-12 17:10:25 +01:00
Kamil Braun
2e81f045cc Merge 'transport: controller: do_start_server: do not set_cql_read for maintenance port' from Benny Halevy
RPC is not ready yet at this point, so we should not set this application state yet.

Also, simplify add_local_application_state as it contains dead code
that will never generate an internal error after 1d07a596bf.

Fixes #16932

Closes scylladb/scylladb#17263

* github.com:scylladb/scylladb:
  gossiper: add_local_application_state: drop internal error
  transport: controller: do_start_server: do not set_cql_read for maintenance port
2024-02-12 13:26:45 +01:00
Pavel Emelyanov
2b1612aa04 main: Stop lifecycle notifier for real
It wasn't because of storage service, but now the latter is stopped (since
e6b34527c1), so the former can be stopped too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17251
2024-02-12 13:59:50 +02:00
Kefu Chai
7baee379de sstable/storage: pass fs::path to storage::create_links()
this change is a follow-up of 637dd730. the goal is to use
std::filesystem::path for manipulating paths, and to avoid the
converting between sstring and fs::path back and forth.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17257
2024-02-12 13:26:11 +02:00
Kefu Chai
7a5cb69e33 storage_service: s/format()/fmt::format/
in the same spirit of e84a0991, let's switch the callers who expect
std::string to fmt::format(). to minimize the impact and to reduce
the risk, the switch will be performed piecemeal.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17253
2024-02-12 13:24:21 +02:00
Pavel Emelyanov
b9721bd397 test/tablets: Decommissioning node below RF is not allowed
When a node is decommissioned, all tablet replicas need to be moved away
from it. In some cases it may not be possible. If the number of nodes in
the cluster equals the keyspace RF, one cannot decommission any node
because it's not possible to find nodes for every replica.

The new test case validates this constraint is satisfied.

refs: #16195

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17248
2024-02-12 13:21:47 +02:00
Nadav Har'El
21e7deafeb alternator, mv: fix case of two new key columns in GSI
A materialized view in CQL allows AT MOST ONE view key column that
wasn't a key column in the base table. This is because if there were
two or more of those, the "liveness" (timestamp, ttl) of these different
columns can change at every update, and it's not possible to pick what
liveness to use for the view row we create.

We made an exception for this rule for Alternator: DynamoDB's API allows
creating a GSI whose partition key and range key are both regular columns
in the base table, and we must support this. We claim that the fact that
Alternator allows neither TTL (Alternator's "TTL" is a different feature)
nor user-defined timestamps, does allow picking the liveness for the view
row we create. But we did it wrong!

We claimed in a comment - and implemented in the code before this patch -
that in Alternator we can assume that both GSI key columns will have the
*same* liveness, and in particular timestamp. But this is only true if
one modifies both columns together! In fact, in general it is not true:
We can have two non-key attributes 'a' and 'b' which are the GSI's key
columns, and we can modify *only* b, without modifying a, in which case
the timestamp of the view modification should be b's newer timestamp,
not a's older one. The existing code took a's timestamp, assuming it
will be the same as b's, which is incorrect. The result was that if
we repeatedly modify only b, all view updates will receive the same
timestamp (a's old timestamp), and a deletion will always win over
all the modifications. This patch includes a reproducing test written by
a user (@Zak-Kent) that demonstrates how after a view row is deleted
it doesn't get recreated - because all the modifications use the same
timestamp.

The fix is, as suggested above, to use the *higher* of the two
timestamps of both base-regular-column GSI key columns as the timestamp
for the new view rows or view row deletions. The reproducer that
failed before this patch passes with it. As usual, the reproducer
passes on AWS DynamoDB as well, proving that the test is correct and
should really work.
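
A deliberately simplified model of why the choice of timestamp matters is
sketched below. It is not the view-update code: the function names and the
concrete timestamps are made up for illustration, and the only rule it
relies on is that a tombstone beats a write with an equal or older
timestamp.

```python
def row_is_live(write_ts, tombstone_ts):
    # A write only survives a tombstone if it is strictly newer.
    return write_ts > tombstone_ts


def old_view_timestamp(ts_a, ts_b):
    # Behaviour before the fix: assume both columns share a's timestamp.
    return ts_a


def fixed_view_timestamp(ts_a, ts_b):
    # Behaviour after the fix: use the higher of the two timestamps.
    return max(ts_a, ts_b)


def recreated_row_survives(pick_timestamp):
    ts_a = 10                                  # 'a' is written once
    tombstone_ts = pick_timestamp(ts_a, 20)    # view row deleted after 'b' changed
    recreate_ts = pick_timestamp(ts_a, 40)     # 'b' later returns to its old value
    return row_is_live(recreate_ts, tombstone_ts)
```

With old_view_timestamp the deletion and the re-creation both get timestamp
10, so the deletion wins and the view row stays missing; with
fixed_view_timestamp the re-creation is newer and the row comes back.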

Fixes #17119

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17172
2024-02-12 13:17:29 +02:00
Nadav Har'El
341af86167 test/cql-pytest: reproducer for GROUP BY regression
This patch adds a simple reproducer for a regression in Scylla 5.4 caused
by commit 432cb02, breaking LIMIT support in GROUP BY.

Refs #17237

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17275
2024-02-12 13:09:52 +02:00
Kefu Chai
57df20eef8 configure.py: use un-deprecated module
PEP 632 deprecates distutils module, and it is removed from Python 3.12.
we are actually using the one vendored by setuptools, if we are using
3.12. so let's use shutil for finding ninja executable.
see https://peps.python.org/pep-0632/
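
For illustration, a lookup based on shutil.which() could look like the
sketch below. find_ninja() and its list of candidate names are assumptions
made for this example, not the code in configure.py.

```python
import shutil


def find_ninja(candidates=("ninja", "ninja-build")):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)  # returns None if the executable is not found
        if path is not None:
            return path
    return None
```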

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17271
2024-02-12 13:05:35 +02:00
Kamil Braun
7d73c40125 Merge 'test.py: tablets: Fix flakiness of test_tablet_missing_data_repair' from Tomasz Grabiec
Reimplements stop/start sequence using rolling_restart() which is safe
with regards to UP status propagation and not prone to sudden
connection drop which may cause later CQL queries to time out. It also
ensures that CQL is up on all the remaining nodes when the with_down
callback is executed.

The test was observed to fail in CI like this:

```
  cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.157.135.26:9042 datacenter1>: ConnectionException('Pool for 127.157.135.26:9042 is shutdown')})
  ...
      @pytest.mark.repair
      @pytest.mark.asyncio
      async def test_tablet_missing_data_repair(manager: ManagerClient):
  ...
          for idx in range(0,3):
              s = servers[idx].server_id
              await manager.server_stop_gracefully(s, timeout=120)
  >           await check()
```

Hopefully: Fixes #17107

Closes scylladb/scylladb#17252

* github.com:scylladb/scylladb:
  test: py: tablets: Fix flakiness of test_tablet_missing_data_repair
  test: pylib: manager_client: Wait for driver to catch up in rolling_restart()
  test: pylib: manager_client: Accept callback in rolling_restart() to execute with node down
2024-02-12 11:52:09 +01:00
Botond Dénes
f068d1a6fa query: do not kill unpaged queries when they reach the tombstone-limit
The reason we introduced the tombstone-limit
(query_tombstone_page_limit), was to allow paged queries to return
incomplete/empty pages in the face of large tombstone spans. This works
by cutting the page after the tombstone-limit amount of tombstones were
processed. If the read is unpaged, it is killed instead. This was a
mistake. First, it doesn't really make sense, the reason we introduced
the tombstone limit, was to allow paged queries to process large
tombstone-spans without timing out. It does not help unpaged queries.
Furthermore, the tombstone-limit can kill internal queries done on
behalf of user queries, because all our internal queries are unpaged.
This can cause denial of service.

So in this patch we disable the tombstone-limit for unpaged queries
altogether, they are allowed to continue even after having processed the
configured limit of tombstones.
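
The intended behaviour can be modelled roughly as follows. This is a toy
sketch, not the reader code: read_page() and its (key, is_tombstone) row
representation are hypothetical.

```python
def read_page(rows, tombstone_limit, paged):
    """Toy model of one read.

    rows is a list of (key, is_tombstone) pairs.
    Returns (live_keys, more_pages).
    """
    live_keys = []
    tombstones = 0
    for index, (key, is_tombstone) in enumerate(rows):
        if not is_tombstone:
            live_keys.append(key)
            continue
        tombstones += 1
        # Only paged reads are cut short; unpaged reads ignore the limit.
        if paged and tombstones >= tombstone_limit:
            return live_keys, index + 1 < len(rows)
    return live_keys, False


rows = [(i, True) for i in range(5)] + [(5, False)]
paged_result = read_page(rows, tombstone_limit=3, paged=True)
unpaged_result = read_page(rows, tombstone_limit=3, paged=False)
```

In this model the paged read returns an empty page and asks the client to
continue, while the unpaged read keeps going past the limit and returns the
live row.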

Fixes: #17241

Closes scylladb/scylladb#17242
2024-02-12 12:34:04 +02:00
Kefu Chai
9b85d1aebf configure.py, cmake: do not pass -Wignored-qualifiers explicitly
we recently added -Wextra to configure.py, and this option enables
a bunch of warning options, including `-Wignored-qualifiers`. so
there is no need to enable this specific warning anymore. this change
removes this option from both `configure.py` and the CMake building system.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17272
2024-02-12 12:32:00 +02:00
Avi Kivity
c14571af16 Update seastar submodule
Because Seastar now defaults to C++23, we downgrade it explicitly to
C++20.

* seastar 289ad5e593...5d3ee98073 (10):
  > Update supported C++ standards to C++23 and C++20 (dropping C++17)
  > docker: install clang-tools-18
  > http: add handler_base::verify_mandatory_params()
  > coroutine/exception: document return_exception_ptr()
  > http: use structured-binding when appropriate
  > test/http: Read full server response before sending next
  > doc/lambda-coroutine-fiasco: fix a syntax error
  > util/source_location-compat: use __cpp_consteval
  > Fix incorrect class name in documentation.
  > Add support for missing HTTP PATCH method.

Closes scylladb/scylladb#17268
2024-02-12 12:21:47 +02:00
Patryk Wrobel
9fccd968d3 test_tablets.py: implement test_tablet_count_metric_per_shard
This change introduces a new test that verifies the
functionality related to tablet_count metric.

It checks if tablet_count metric is correctly reported
and updated when new tables are created, when tables
are dropped and when `move_tablet` is executed.

Refs: scylladb#16131
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17165
2024-02-12 11:49:38 +02:00
Kefu Chai
54995fcac0 test/manual: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17255
2024-02-12 11:49:38 +02:00
Patryk Jędrzejczak
38e1ddb8bc docs: using-scylla: cdc: remove info about failing writes to old generations
In one of the previous patches, we have allowed writing to the
previous CDC generations for `generation_leeway`. This change has
made the information about failing writes to the previous
generation and the "rejecting writes to an old generation" example
obsolete so we remove them.

After the change, a write can only fail if its timestamp is distant
from the node's timestamp. We add the information about it.
2024-02-12 10:14:00 +01:00
Patryk Jędrzejczak
9b923f8b81 docs: dev: cdc: document writing to previous CDC generations
We update the dev documentation after allowing writes to the
previous CDC generations in one of the previous patches.
2024-02-12 10:14:00 +01:00
Patryk Jędrzejczak
e64162e8f6 test: add test_writes_to_previous_cdc_generations
In one of the previous patches, we allowed writing to the previous
CDC generations for `generation_leeway`. Now, we add tests for this
change.
2024-02-12 10:14:00 +01:00
Patryk Jędrzejczak
0470b721c2 cdc: generation: allow increasing generation_leeway through error injection
The increased `generation_leeway` is used in the next patch to
write a test. Since it's no longer a constant, we create a new
getter for it.
2024-02-12 10:14:00 +01:00
Patryk Jędrzejczak
330a37b5c9 cdc: metadata: allow sending writes to the previous generations
Before this patch, writes to the previous CDC generations would
always be rejected. After this patch, they will be accepted if
the write's timestamp is greater than `now - generation_leeway`.
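
The acceptance rule stated above can be written down as a small sketch.
This is an illustration rather than the actual cdc metadata code, and the
5-second leeway used in the example is a made-up value.

```python
from datetime import datetime, timedelta, timezone


def accepts_write_to_previous_generation(write_ts, now, generation_leeway):
    # Accept the write only if it is not older than the leeway window.
    return write_ts > now - generation_leeway


now = datetime(2024, 2, 12, 10, 0, 0, tzinfo=timezone.utc)
leeway = timedelta(seconds=5)  # hypothetical value, for illustration only

recent_ok = accepts_write_to_previous_generation(
    now - timedelta(seconds=2), now, leeway)
stale_ok = accepts_write_to_previous_generation(
    now - timedelta(seconds=30), now, leeway)
```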

This change was proposed around 3 years ago. The motivation was
to improve user experience. If a client generates timestamps by
itself and its clock is desynchronized with the clock of the node
the client is connected to, there could be a period during
generation switching when writes fail. We didn't consider this
problem critical because the client could simply retry a failed
write with a higher timestamp. Eventually, it would succeed. This
approach is safe because these failed writes cannot have any side
effects. However, it can be inconvenient. Writing to previous
generations was proposed to improve it.

The idea was rejected 3 years ago. Recently, it turned out that
there is a case when the client cannot retry a write with the
increased timestamp. It happens when a table uses CDC and LWT,
which makes timestamps permanent. Once Paxos commits an entry with
a given timestamp, Scylla will keep trying to apply that entry
until it succeeds, with the same timestamp. Applying the entry
involves writing to the CDC log table. If it fails, we get stuck.
It's a major bug with no known perfect solution.

Allowing writes to previous generations for `generation_leeway` is
a probabilistic fix that should solve the problem in practice.

Note that allowing writes only to the previous generation might
not be enough. With the Raft-based topology, it is possible to
add multiple nodes concurrently. Moreover, tablets make streaming
instant, which allows the topology coordinator to add multiple nodes
very quickly. So, creating generations with almost identical
timestamps is possible. Then, we could encounter the same bug but,
for example, for a generation before the previous generation.
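
The acceptance rule described above can be sketched as follows. This is a
minimal illustration, not the code from the patch: the helper name
`accepts_write_to_previous_generation` and the use of `std::chrono` are
assumptions; only the comparison `timestamp > now - generation_leeway`
comes from the message.

```cpp
#include <chrono>

using clock_type = std::chrono::system_clock;

// Hypothetical helper: returns true if a write aimed at a previous CDC
// generation is accepted under the rule "timestamp > now - leeway".
inline bool accepts_write_to_previous_generation(
        clock_type::time_point write_ts,
        clock_type::time_point now,
        clock_type::duration generation_leeway) {
    return write_ts > now - generation_leeway;
}
```

Under this rule a write is rejected only when its timestamp lags the node's
clock by at least the leeway.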
2024-02-12 10:14:00 +01:00
Asias He
a0e46a6b47 repair: Fix rpc::source and rpc::optional parameter order in rpc message
In a mixed cluster (5.4.1-20231231.3d22f42cf9c3 and
5.5.0~dev-20240119.b1ba904c4977), in the rolling upgrade test, we saw
repair never finishing.

The following was observed:

rpc - client 127.0.0.2:65273 msg_id 5524:  caught exception while
processing a message: std::out_of_range (deserialization buffer
underflow)

It turns out the repair rpc message was not compatible between the two
versions. Even with an rpc stream verb, new rpc parameters must come
after the rpc::source<> parameter. The rpc::source<> parameter is not
special: it does not have to be the last parameter.

For example, it should be:

void register_repair_get_row_diff_with_rpc_stream(
std::function<future<rpc::sink<repair_row_on_wire_with_cmd>> (
const rpc::client_info& cinfo, uint32_t repair_meta_id,
rpc::source<repair_hash_with_cmd> source, rpc::optional<shard_id> dst_cpu_id_opt)>&& func);

not:

void register_repair_get_row_diff_with_rpc_stream(
std::function<future<rpc::sink<repair_row_on_wire_with_cmd>> (
const rpc::client_info& cinfo, uint32_t repair_meta_id,
rpc::optional<shard_id> dst_cpu_id_opt, rpc::source<repair_hash_with_cmd> source)>&& func);
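
Why the order matters can be illustrated with a toy decoder. This is a
sketch under strong assumptions, not Seastar's serialization code: the
wire format (one 32-bit slot per parameter) and the names `toy_reader`,
`decode_appended` and `decode_misordered` are hypothetical. It shows only
that a trailing optional parameter decodes as empty for an old message,
while an optional placed earlier consumes data meant for a later mandatory
parameter.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy wire format: each parameter occupies one 32-bit slot.
class toy_reader {
public:
    explicit toy_reader(std::vector<std::uint32_t> slots) : _slots(std::move(slots)) {}

    // Mandatory parameter: throws if the sender did not write it.
    std::uint32_t read() {
        if (_pos >= _slots.size()) {
            throw std::out_of_range("deserialization buffer underflow");
        }
        return _slots[_pos++];
    }

    // Optional parameter: empty once the buffer is exhausted.
    std::optional<std::uint32_t> read_optional() {
        if (_pos >= _slots.size()) {
            return std::nullopt;
        }
        return _slots[_pos++];
    }

private:
    std::vector<std::uint32_t> _slots;
    std::size_t _pos = 0;
};

struct decoded {
    std::uint32_t meta_id;
    std::uint32_t source;
    std::optional<std::uint32_t> dst;
};

// Optional parameter appended last: messages from old senders still decode.
inline decoded decode_appended(toy_reader r) {
    auto meta = r.read();
    auto src = r.read();
    auto dst = r.read_optional();
    return {meta, src, dst};
}

// Optional parameter placed before the source: it swallows the source slot
// of an old message, and the mandatory read that follows underflows.
inline decoded decode_misordered(toy_reader r) {
    auto meta = r.read();
    auto dst = r.read_optional();
    auto src = r.read();
    return {meta, src, dst};
}
```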

Fixes #16941

Closes scylladb/scylladb#17156
2024-02-12 09:50:30 +02:00
Nadav Har'El
13e16475fa cql-pytest: fix skipping of tests on Cassandra or old Scylla
Recently we added a trick to allow running cql-pytests either with or
without tablets. A single fixture test_keyspace uses two separate
fixtures test_keyspace_tablets or test_keyspace_vnodes, as requested.

The problem is that even if test_keyspace doesn't use its
test_keyspace_tablets fixture (it doesn't, if the test isn't
parameterized to ask for tablets explicitly), it's still a fixture,
and it causes the test to be skipped. This causes every test to be
skipped when running on Cassandra or old Scylla which doesn't support
tablets.

The fix is simple - the internal fixture test_keyspace_tablets should
yield None instead of skipping. It is the caller, test_keyspace, which
now skips the test if tablets are requested but test_keyspace_tablets
is None.

Fixes #17266

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17267
2024-02-11 21:03:25 +02:00
Kefu Chai
f990ea9678 tools/scylla-nodetool: implement describecluster
Refs #15588
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17240
2024-02-11 20:21:07 +02:00
Avi Kivity
14bf09f447 Merge 'utils: managed_bytes: optimize memory usage for small buffers' from Michał Chojnowski
managed_bytes is implemented as a chain of blob_storage objects.
Each blob_storage contains 24 bytes of metadata. But in the most
common case -- when there is only a single element in the chain --
16 bytes of this metadata is trivial/unused.

This is a regrettable waste because managed_bytes is used for every
database cell in the memtables and cache. It means that every value
of size >= 7 bytes (smaller ones fit in the inline storage of
managed_bytes) receives 16 bytes of useless overhead.

To correct that, this series adds to managed_bytes an alternative storage
layout -- used for buffers small enough to fit in one fragment -- which only
stores the necessary minimum of metadata. (That is: a pointer to the parent,
to facilitate moving the storage during memory defragmentation).

This saves 16 bytes on every cell greater than 15 bytes. Which includes e.g.
every live cell with value bigger than 6 bytes, which likely applies to most cells.

Before:
```
$ build/release/scylla perf-simple-query --duration 10
median 218692.88 tps ( 61.1 allocs/op,  13.1 tasks/op,   41762 insns/op,        0 errors)
$ build/release/scylla perf-simple-query --duration 10 --write
median 173511.46 tps ( 58.3 allocs/op,  13.2 tasks/op,   53258 insns/op,        0 errors)
$ build/release/test/perf/mutation_footprint_test -c1 --row-count=20 --partition-count=100 --data-size=8 --column-count=16
 - in cache:     2580222
 - in memtable:  2549852
```

After:
```
$ build/release/scylla perf-simple-query --duration 10
median 218780.89 tps ( 61.1 allocs/op,  13.1 tasks/op,   41763 insns/op,        0 errors)
$ build/release/scylla perf-simple-query --duration 10 --write
median 173105.78 tps ( 58.3 allocs/op,  13.2 tasks/op,   52913 insns/op,        0 errors)
$ build/release/test/perf/mutation_footprint_test -c1 --row-count=20 --partition-count=100 --data-size=8 --column-count=16
 - in cache:     2068238
 - in memtable:  2037696
```
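
As a rough cross-check of the figures above, the footprint difference
divided by the number of cells should come out near 16 bytes. The sketch
below assumes that `--row-count` is counted per partition, so the test
writes 100 x 20 x 16 cells; that reading of the flags is an assumption,
not something the message states.

```cpp
#include <cstdint>

// Cell count under the assumption stated above: partitions * rows * columns.
constexpr std::int64_t cell_count = 100LL * 20 * 16;

// Average number of bytes saved per cell, given the footprint before and
// after the series.
constexpr double saved_per_cell(std::int64_t before, std::int64_t after) {
    return static_cast<double>(before - after) / cell_count;
}

constexpr double cache_saving = saved_per_cell(2580222, 2068238);    // about 16.0
constexpr double memtable_saving = saved_per_cell(2549852, 2037696); // about 16.0
```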

Closes scylladb/scylladb#14263

* github.com:scylladb/scylladb:
  utils: managed_bytes: optimize memory usage for small buffers
  utils: managed_bytes: rewrite managed_bytes methods in terms of managed_bytes_view
2024-02-11 16:43:40 +02:00
Kefu Chai
cfb2c2c758 db: add formatter for gc_clock::time_point
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `gc_clock::time_point`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17254
2024-02-11 16:39:25 +02:00
Kefu Chai
33224cc10b sstables/storage: avoid unnecessary type cast
the type of `_dir` was changed to fs::path back in 637dd730, there
is no need to cast `_dir` to fs::path anymore.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17256
2024-02-11 16:37:05 +02:00
Benny Halevy
2ed29e31db gms: inet_address: make constructors explicit
In particular, `inet_address(const sstring& addr)` is
dangerous, since a function like
`topology::get_datacenter(inet_address ep)`
might accidentally convert a `sstring` argument
into an `inet_address` (which would most likely
throw an obscure std::invalid_argument if the datacenter
name does not look like an inet_address).
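
The effect of `explicit` can be shown with a toy type. This is a minimal
sketch, not Scylla code: `address` and `describe` are hypothetical
stand-ins for `inet_address` and a function such as
`topology::get_datacenter`, and `std::string` stands in for `sstring`.

```cpp
#include <string>
#include <type_traits>

// Toy stand-in for an address type whose string constructor is explicit.
struct address {
    explicit address(const std::string& text) : text(text) {}
    std::string text;
};

// Toy stand-in for a function that takes an address by value.
inline std::string describe(address a) { return "address:" + a.text; }

// With `explicit`, a std::string no longer converts silently to address,
// so describe(some_string) fails to compile instead of throwing at runtime.
static_assert(!std::is_convertible<std::string, address>::value,
              "implicit conversion is rejected");
static_assert(std::is_constructible<address, std::string>::value,
              "explicit construction still works");
```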

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17260
2024-02-11 15:44:13 +02:00
Benny Halevy
136df58cbc data_value: delete data_value(T*) constructor
Currently, since the data_value(bool) ctor
is implicit, pointers of any kind are implicitly
convertible to data_value via intermediate conversion
to `bool`.

This is error prone, since it allows unsafe comparison
between e.g. an `sstring` with `some*` by implicit
conversion of both sides to `data_value`.

For example:
```
    sstring name = "dc1";
    struct X {
        sstring s;
    };
    X x(name);
    auto p = &x;
    if (name == p) {}
```
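
A deleted pointer constructor blocks this path, as the toy type below
shows. This is a sketch, not the actual `data_value` class: `value` and
`X` are hypothetical, and only the technique (an implicit `bool`
constructor plus a deleted `T*` overload) follows the message.

```cpp
#include <type_traits>

// Toy stand-in for a value type with an implicit bool constructor.
struct value {
    value(bool b) : flag(b) {}

    // Deleting the pointer overload stops pointers from reaching the bool
    // constructor through the built-in pointer-to-bool conversion.
    template <typename T>
    value(T*) = delete;

    bool flag;
};

struct X {};

static_assert(std::is_constructible<value, bool>::value, "bool is accepted");
static_assert(!std::is_constructible<value, X*>::value, "pointers are rejected");
static_assert(!std::is_convertible<X*, value>::value, "no implicit conversion");
```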

Refs #17261

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17262
2024-02-11 15:42:55 +02:00
Benny Halevy
f86a5072d6 gossiper: add_local_application_state: drop internal error
After 1d07a596bf that
dropped before_change notifications there is no sense
in getting the local endpoint_state_ptr twice: before
and after the notifications and call on_internal_error
if the state isn't found after the notifications.

Just throw the runtime_error if the endpoint state is not
found, otherwise, use it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-02-11 13:33:26 +02:00
Benny Halevy
ac83df4875 transport: controller: do_start_server: do not set_cql_ready for maintenance port
RPC is not ready yet at this point, so we should not
set this application state yet.

This is indicated by the following warning from
`gossiper::add_local_application_state`:
```
WARN  2024-01-22 23:40:53,978 [shard 0:stmt] gossip - Fail to apply application_state: std::runtime_error (endpoint_state_map does not contain endpoint = 127.227.191.13, application_states = {{RPC_READY -> Value(1,1)}})
```

That should really be an internal error, but
it can't because of this bug.

Fixes #16932

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-02-11 11:49:52 +02:00
Kefu Chai
d7a404e1ec alternator: add formatter for alternator::calculate_value_caller
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `alternator::calculate_value_caller`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17259
2024-02-11 11:49:46 +02:00
Michał Chojnowski
5a3e4a1cc0 utils: managed_bytes: optimize memory usage for small buffers
managed_bytes is implemented as a chain of blob_storage objects.
Each blob_storage contains 24 bytes of metadata. But in the most
common case -- when there is only a single element in the chain --
16 bytes of this metadata is trivial/unused.

This is a regrettable waste because managed_bytes is used for every
database cell in the memtables and cache. It means that every value
of size >= 7 bytes (smaller ones fit in the inline storage of
managed_bytes) receives 16 bytes of useless overhead.

To correct that, this patch adds to managed_bytes an alternative storage
layout -- used for buffers small enough to fit in one contiguous
fragment -- which only stores the necessary minimum of metadata.
(That is: a pointer to the parent, to facilitate moving the storage during
memory defragmentation).
2024-02-09 20:56:20 +01:00
Tomasz Grabiec
1eedc85990 test: py: tablets: Fix flakiness of test_tablet_missing_data_repair
Reimplement stop/start sequence using rolling_restart() which is safe
with regards to UP status propagation and not prone to sudden
connection drop which may cause later CQL queries to time out. It also
ensures that CQL is up on all the remaining nodes when the with_down
callback is executed.

Hopefully: Fixes #17107
2024-02-09 20:37:06 +01:00
Tomasz Grabiec
27ed2d94fc test: pylib: manager_client: Wait for driver to catch up in rolling_restart()
For sanity of the developers who want to execute CQL queries after
rolling restarts.
2024-02-09 20:35:41 +01:00
Tomasz Grabiec
3ce4ec796a test: pylib: manager_client: Accept callback in rolling_restart() to execute with node down
2024-02-09 20:35:41 +01:00
Pavel Emelyanov
7a710425f0 streaming: Open-code on-stack lambda
It just wraps one if, no benefit in keeping it this way

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17250
2024-02-09 20:31:09 +01:00
Petr Gusev
4554653ad9 storage_proxy: add a test for stop_remote
This patch adds a reproducer test for issue #16382.
See scylladb/seastar#2044 for details of the problem.

The test is enabled only in dev mode since it requires the
error injection mechanism. The patch adds a new injection
into storage_proxy::handle_read to simulate the problem
scenario - the node is shutting down and there are some
unfinished pending replica requests.

Closes scylladb/scylladb#16776
2024-02-09 17:23:13 +01:00
Michał Chojnowski
277a31f0ae utils: managed_bytes: rewrite managed_bytes methods in terms of managed_bytes_view
Some methods of managed_bytes contain the logic needed to read/write the
contents of managed_bytes, even though this logic is already present in
managed_bytes_{,mutable}_view.

Reimplementing those methods by using the views as intermediates allows us to
remove some code and makes the responsibilities cleaner -- after the change,
managed_bytes contains the logic of allocating and freeing the storage,
while views provide read/write access to the storage.

This change will simplify the next patch which changes the internals of
managed_bytes.
2024-02-09 17:00:33 +01:00
Botond Dénes
ba89b86913 Update tools/java submodule
* tools/java c75ce2c1...5e11ed17 (1):
  > bin/nodetool-wrapper: pass all args to nodetool for testings its ability
2024-02-09 16:34:47 +01:00
Raphael S. Carvalho
daa82f406c test_tablets: Enable table debug log in split test
If the test fails, it's helpful to see how split completion was
handled.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17236
2024-02-09 14:38:24 +02:00
Mikołaj Grzebieluch
38191144ac transport/controller: get rid of magic number for socket path's maximal length
Calculate `max_socket_length` from the size of the structure
representing the Unix domain socket address.
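
Deriving the limit from the address structure could look like the sketch
below. It is an illustration, not the code from the patch: the helper
`socket_path_fits` is hypothetical, and reserving one byte for the
trailing NUL is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <sys/un.h>

// Capacity of sockaddr_un::sun_path, minus one byte for the trailing NUL.
// Whether the real patch reserves that byte is an assumption of this sketch.
constexpr std::size_t max_socket_length = sizeof(sockaddr_un::sun_path) - 1;

// Hypothetical check that a socket path fits into sockaddr_un.
inline bool socket_path_fits(const std::string& path) {
    return path.size() <= max_socket_length;
}
```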
2024-02-09 12:32:37 +01:00
Mikołaj Grzebieluch
fffb732704 transport/controller: set unix_domain_socket_permissions for maintenance_socket
Set filesystem permissions for the maintenance socket to 660.

Fixes #16487
2024-02-09 12:32:26 +01:00
Botond Dénes
c7d9708092 Merge 'repair: delete table reference from repair related classes' from Aleksandra Martyniuk
row_level_repair and repair_meta keep a reference to a table.
If the table is dropped during repair, its object is destructed, leaving
a dangling reference.

Delete {row_level_repair,repair_meta}::_cf and replace their usages.

Fixes: #17233.

Closes scylladb/scylladb#17234

* github.com:scylladb/scylladb:
  repair: delete _cf from repair_meta
  repair: delete _cf from row_level_repair
2024-02-09 13:16:43 +02:00
Kamil Braun
e9e24f47ec Merge 'raft topology: implement upgrade and recovery procedure' from Piotr Dulikowski
This PR implements a procedure that upgrades existing clusters to use
raft-based topology operations. The procedure does not start
automatically, it must be triggered manually by the administrator after
making sure that no topology operations are currently running.

Upgrade is triggered by sending a `POST
/storage_service/raft_topology/upgrade` request. This causes the
topology coordinator to start, which drives the rest of the process: it
builds the `system.topology` state based on information observed in
gossip and tells all nodes to switch to raft mode. Then, the topology
coordinator runs normally.

Upgrade progress is tracked in a new static column `upgrade_state` in
`system.topology`.

The procedure also serves as an extension to the current recovery
procedure on raft. The current recovery procedure requires restarting
nodes in a special mode which disables raft, performing `nodetool
removenode` on the dead nodes, cleaning up some state on the nodes and
restarting them so that they automatically rebuild group 0. Raft
topology fits into the existing procedure by falling back to legacy topology
operations after disabling raft. After rebuilding group 0, upgrade
needs to be triggered again.

Because upgrade is manual and it might not be convenient for
administrators to run it right after upgrading the cluster, we allow the
cluster to operate in legacy topology operations mode until upgrade,
which includes allowing new nodes to join. In order to allow it, nodes
now ask the cluster about the mode they should use to join before
proceeding by using a new `JOIN_NODE_QUERY` RPC.

The procedure is explained in more detail in `topology-over-raft.md`.

Fixes: https://github.com/scylladb/scylladb/issues/15008

Closes scylladb/scylladb#17077

* github.com:scylladb/scylladb:
  test/topology_custom: upgrade/recovery tests for topology on raft
  cdc/generation_service: in legacy mode, fall back to raft tables
  system_keyspace: add read_cdc_generation_opt
  cdc/generation_service: turn off gossip notifications in raft topo mode
  cql_test_env: move raft_topology_change_enabled var earlier
  group0_state_machine: pull snapshot after raft topology feature enabled
  storage_service: disable persistent feature enabler on upgrade
  storage_service: replicate raft features to system.peers
  storage_service: gossip tokens and cdc generation in raft topology mode
  API: add api for triggering and monitoring topology-on-raft upgrade
  storage_service: infer which topology operations to use on startup
  storage_service: set the topology kind value based on group 0 state
  raft_group0: expose link to the upgrade doc in the header
  feature_service: fall back to checking legacy features on startup
  storage_service: add fiber for tracking the topology upgrade progress
  gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
  topology_coordinator: implement core upgrade logic
  topology_coordinator: extract top-level error handling logic
  storage_service: initialize discovery leader's state earlier
  topology_coordinator: allow for custom sharding info in prepare_and_broadcast_cdc_generation_data
  topology_coordinator: allow for custom sharding info in prepare_new_cdc_generation_data
  topology_coordinator: remove outdated fixme in prepare_new_cdc_generation_data
  topology_state_machine: introduce upgrade_state
  storage_service: disallow topology ops when upgrade is in progress
  raft_group0_client: add in_recovery method
  storage_service: introduce join_node_query verb
  raft_group0: make discover_group0 public
  raft_group0: filter current node's IP in discover_group0
  raft_group0: remove my_id arg from discover_group0
  storage_service: make _raft_topology_change_enabled more advanced
  docs: document raft topology upgrade and recovery
2024-02-09 11:54:53 +01:00
Kefu Chai
c1c96bbc16 api/storage_service: drop /storage_service/describe_ring/ API
per its description, "`/storage_service/describe_ring/`" returns the
token ranges of an arbitrary keyspace. actually, it returns the
first keyspace which is of non-local-vnode-based-strategy. this API
is not used by nodetool, neither is it exercised in dtest.
scylla-manager has a wrapper for this API though, but that wrapper
is not used anywhere.

in this change, this API is dropped.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17197
2024-02-09 12:49:21 +02:00
Pavel Emelyanov
309d34a147 topology: Restore indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Pavel Emelyanov
f7a13b9bb0 topology: Drop if_enabled checks for logging
Now all the logged arguments are lazily evaluated (node* format string
and backtrace) so the preliminary log-level checks are not needed.

indentation is deliberately left broken

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Pavel Emelyanov
c1ea6c8acf topology: Add lazy_backtrace() helper
This helper returns lazy_eval-ed current_backtrace(), so it will be
generated and printed only if logger is really going to do it with its
current log-level.
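
The idea of deferring an expensive argument until the logger needs it can
be sketched with the standard library alone. This is an illustration, not
the helper added by the patch: `lazy_string` and `toy_logger` are
hypothetical, and a real backtrace is replaced by a plain string.

```cpp
#include <functional>
#include <string>
#include <utility>

// Toy lazily evaluated value: the callable runs only when value() is called.
class lazy_string {
public:
    explicit lazy_string(std::function<std::string()> fn) : _fn(std::move(fn)) {}
    std::string value() const { return _fn(); }

private:
    std::function<std::string()> _fn;
};

// Toy logger: evaluates the lazy argument only if the level is enabled.
struct toy_logger {
    bool debug_enabled = false;
    std::string last_message;

    void debug(const std::string& prefix, const lazy_string& arg) {
        if (debug_enabled) {
            last_message = prefix + arg.value();
        }
    }
};
```

With such a wrapper the caller does not need to test the log level first,
because the costly work is skipped whenever the message is not emitted.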

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Pavel Emelyanov
da53854b66 topology: Add printer wrapper for node* and formatter for it
Currently, to print node information there's a debug_format(node*) helper
function that returns an sstring object. This patch adds a formatter
that's more flexible and convenient, and a node_printer wrapper, since
formatters cannot format non-void pointers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Pavel Emelyanov
aa0293f411 topology: Expand formatter<locator::node>
Equip it with :v specifier that turns verbose mode on and prints much
more data about the node. Main user will appear in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Kefu Chai
c07de1fad1 topology_coordinator: s/sate/state/
fix a typo in the logging message.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17201
2024-02-09 10:27:33 +01:00
Kefu Chai
876478b84f storage_service: allow concurrent tablet migration in tablets/move API
Currently it waits for topology state machine to be idle, so it allows
one tablet to be moved at a time. We should allow it to start migration
if the current transition state is

- topology::transition_state::tablet_migration or
- topology::transition_state::tablet_draining

to allow starting parallel tablet movement. That will be useful when
scripting a custom rebalancing algorithm.

in this change, we wait until the topology state machine is idle or
it is at either of the above two states.
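
The relaxed wait condition can be expressed as a small predicate. This is
a sketch, not the code from the change: the enum below is a toy subset,
`can_start_tablet_move` is a hypothetical name, and modelling the idle
state as an empty optional is an assumption.

```cpp
#include <optional>

// Toy subset of transition states; only the two named above matter here.
enum class transition_state { tablet_migration, tablet_draining, other };

// An empty optional models an idle topology state machine.
inline bool can_start_tablet_move(std::optional<transition_state> current) {
    if (!current) {
        return true; // idle
    }
    return *current == transition_state::tablet_migration
        || *current == transition_state::tablet_draining;
}
```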

Fixes #16437
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17203
2024-02-08 21:47:15 +01:00
Piotr Dulikowski
4d4976feb0 test/topology_custom: upgrade/recovery tests for topology on raft
Adds three tests for the new upgrade procedure:

- test_topology_upgrade - upgrades a cluster operating in legacy mode to
  use raft topology operations,
- test_topology_recovery_basic - performs recovery on a three-node
  cluster, no node removal is done,
- test_topology_majority_loss - simulates a majority loss scenario, i.e.
  removes two nodes out of three, performs recovery to rebuild the
  raft topology state and re-adds the two nodes.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
d04b3338ce cdc/generation_service: in legacy mode, fall back to raft tables
When a node enters recovery after being in raft topology mode, topology
operations switch back to legacy mode. We want CDC to keep working when
that happens, so we need for the legacy code to be able to access
generations created back in raft mode - so that the node can still
properly serve writes to CDC log tables.

In order to make this possible, modify the legacy logic to also look for
a cdc generation in raft tables, if it is not found in legacy tables.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
fb02453686 system_keyspace: add read_cdc_generation_opt
The `system_keyspace::read_cdc_generation` loads a cdc generation from
the system tables. One of its preconditions is that the generation
exists - this precondition is quite easy to satisfy in raft mode, and
the function was designed to be used solely in that mode.

In legacy mode however, in case when we revert from raft mode through
recovery, it might be necessary to use generations created in raft mode
for some time. In order to make the function useful as a fallback in
case lookup of a generation in legacy mode fails, introduce a relaxed
variant of `read_cdc_generation` which returns std::nullopt if the
generation does not exist.
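
A relaxed lookup with a fallback could be sketched as below. This is an
illustration only: the maps stand in for the legacy and raft system
tables, and `read_generation_opt` and `find_generation` are hypothetical
names, not the functions from the patch.

```cpp
#include <map>
#include <optional>
#include <string>

using generation_id = int;
using generation_data = std::string;
using generation_table = std::map<generation_id, generation_data>;

// Relaxed lookup: returns std::nullopt instead of failing when the
// generation is absent.
inline std::optional<generation_data> read_generation_opt(
        const generation_table& table, generation_id id) {
    auto it = table.find(id);
    if (it == table.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Legacy-mode lookup that falls back to the raft tables.
inline std::optional<generation_data> find_generation(
        const generation_table& legacy, const generation_table& raft,
        generation_id id) {
    if (auto found = read_generation_opt(legacy, id)) {
        return found;
    }
    return read_generation_opt(raft, id);
}
```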
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
77a8f5e3d6 cdc/generation_service: turn off gossip notifications in raft topo mode
In raft topology mode CDC information is propagated through group 0.
Prevent the generation service from reacting to gossiper notifications
after we made the switch to raft mode.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
29e286ee03 cql_test_env: move raft_topology_change_enabled var earlier
We will need to pass it to cdc::generation_service::config in the next
commit, so move it a bit earlier.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
07aba3abc4 group0_state_machine: pull snapshot after raft topology feature enabled
Pulling a snapshot of the raft topology is done via new rpc verb
(RAFT_PULL_TOPOLOGY_SNAPSHOT). If the recipient runs an older version of
scylla and does not understand the verb, sending it will result in an
error. We usually use cluster features to avoid such situations, but in
the case when a node joins the cluster, it doesn't have access to
features yet. Therefore, we need to enable pulling snapshots in two
situations:

- when the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature becomes enabled,
- in case when starting group 0 server when joining a cluster that uses
  raft-based topology.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
53932420f8 storage_service: disable persistent feature enabler on upgrade
When starting in legacy mode, a gossip event listener called persistent
feature enabler is registered. This listener marks a feature as enabled
when it notices, in gossip, that all nodes declare support for the
feature.

With raft-based topology, features are managed in group 0 instead and do
not rely on the persistent feature enabler at all. Make the listener
look at the raft_topology_change_enabled() method and prevent it from
enabling more features after that method starts returning true.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
4fdd3e014a storage_service: replicate raft features to system.peers
This is necessary for cluster features to work after we switch from raft
topology mode to legacy topology mode during recovery, because
information in system.peers is used during legacy cluster feature check
and when enabling features.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
08865a0bd7 storage_service: gossip tokens and cdc generation in raft topology mode
A mixed raft/legacy cluster can happen when entering recovery mode, i.e.
when the group 0 upgrade state is set to 0 and a rolling restart is
performed. Legacy nodes expect at least information about tokens,
otherwise an internal error occurs in the handle_state_normal function.
Therefore, make nodes that use raft topology behave well with respect to
other nodes.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
a672383c2a API: add api for triggering and monitoring topology-on-raft upgrade
Implements the /storage_service/raft_topology/upgrade route. The route
supports two methods: POST, which triggers the cluster-wide upgrade to
topology-on-raft, and GET which reports the status of the upgrade.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
0bfcf7d4c6 storage_service: infer which topology operations to use on startup
Adds a piece of logic to storage_service::join_cluster which chooses the
mode in which it will boot.

If the experimental raft topology flag is disabled, it will fall back to
legacy node operations.

When the node starts for the first time, it will perform group 0
discovery. If the node creates a cluster, it will start it in raft
topology mode. If it joins an existing one, it will ask the node chosen
by the discovery algorithm about which joining method to use.

If the node is already a part of the cluster, it will base its decision
on the group0 state.
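
The decision described above can be summarized in a short sketch. It is
not the logic from `storage_service::join_cluster`: every name in it is
hypothetical, and falling back to legacy mode when no answer is available
is an assumption made only to keep the example total.

```cpp
#include <optional>

enum class topology_mode { legacy, raft };

// Hypothetical inputs to the decision; none of these names come from the patch.
struct boot_context {
    bool raft_topology_flag_enabled;
    bool first_boot;
    bool creates_new_cluster;
    // Answer from the node chosen by discovery (first boot, existing cluster).
    std::optional<topology_mode> mode_reported_by_cluster;
    // Mode recorded in group 0 state (node is already a cluster member).
    std::optional<topology_mode> mode_from_group0_state;
};

inline topology_mode choose_topology_mode(const boot_context& ctx) {
    if (!ctx.raft_topology_flag_enabled) {
        return topology_mode::legacy;
    }
    if (ctx.first_boot) {
        if (ctx.creates_new_cluster) {
            return topology_mode::raft;
        }
        // Assumption: default to legacy if the cluster gave no answer.
        return ctx.mode_reported_by_cluster.value_or(topology_mode::legacy);
    }
    // Assumption: default to legacy if group 0 state has no recorded mode.
    return ctx.mode_from_group0_state.value_or(topology_mode::legacy);
}
```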
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
1e0aae8576 storage_service: set the topology kind value based on group 0 state
When booting for the first time, the node determines whether to use raft
mode or not by asking the cluster, or by going straight to raft mode
when it creates a new cluster by itself. This happens before joining
group 0. However, right after joining group 0, the `upgrade_state`
column from `system.topology` is supposed to control which operations
the node is supposed to be using.

In order to have a single source of control over the flag (either
storage_service code or group 0 code), the
`_manage_topology_change_kind_from_group0` flag is added which controls
whether the `_topology_change_kind_enabled` flag is controlled from
group 0 or not.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
5392bac85b raft_group0: expose link to the upgrade doc in the header
So that it can be referenced from other files.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
3513a07d8a feature_service: fall back to checking legacy features on startup
When checking features on startup (i.e. whether support for any feature
was revoked in an unsafe way), it might happen that upgrade to raft
topology didn't finish yet. In that case, instead of loading an empty
set of features - which supposedly represents the set of features that
were enabled until last boot - we should fall back to loading the set
from the legacy `enabled_features` key in `system.scylla_local`.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
d5a2837658 storage_service: add fiber for tracking the topology upgrade progress
The topology coordinator fiber is not started if a node starts in legacy
topology mode. We need to start the raft state monitor fiber after all
preconditions for starting upgrade to raft topology are met.

Add a fiber which is spawned only in legacy mode that will wait until:

- The schema-on-raft upgrade finishes,
- The SUPPORTS_CONSISTENT_CLUSTER_MANAGEMENT feature is enabled,
- The upgrade is triggered by the user.

and, after that, will spawn the raft state monitor fiber.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
2ecb8641b1 gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
All nodes supporting raft topology is a prerequisite
for starting upgrade to raft topology. The newly introduced feature will
track this prerequisite.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
a55797fd41 topology_coordinator: implement core upgrade logic
Implement topology coordinator's logic responsible for building the
group 0 state related to topology.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
b3369611bc topology_coordinator: extract top-level error handling logic
...to a separate method. It will be reused in another method that will
be introduced in the next commit.
2024-02-08 19:09:35 +01:00
Kefu Chai
082ad51b71 .git: skip *.svg when scanning spelling errors
codespell reports following warnings:
```
Error: ./docs/kb/flamegraph.svg:1: writen ==> written
Error: ./docs/kb/flamegraph.svg:1: writen ==> written
Error: ./docs/kb/flamegraph.svg:1: storag ==> storage
Error: ./docs/kb/flamegraph.svg:1: storag ==> storage
```

these misspellings come from the flamegraph, which can be viewed
at https://opensource.docs.scylladb.com/master/kb/flamegraph.html
they are very likely to be truncated function names displayed
in the frames. the spelling of these names is not the responsibility
of the author of the article, nor can we change them in a
meaningful way. so add it to the skip list.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17215
2024-02-08 19:46:54 +02:00
Kefu Chai
e84a09911a data_dictionary: use fmt::format() when appropriate
we have three format()s in our arsenal:

* seastar::format()
* fmt::format()
* std::format()

the first one is used most frequently. but it has two limitations:

1. it returns seastar::sstring instead of std::string. under some
   circumstances, the caller of the format() function actually
   expects std::string, in that case a deep copy is performed to
   construct an instance of std::string. this incurs unnecessary
   performance overhead. but this limitation is a by-design behavior.
2. it does not do compile-time format check. this can be improved
   at Seastar's end.

to address these two problems, we switch the callers who expect
std::string to fmt::format(). to minimize the impact and to reduce
the risk, the switch will be performed piecemeal.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17212
2024-02-08 19:44:56 +02:00
Kefu Chai
64c829da70 docs: reformat the state machine diagram using mermaid
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16620
2024-02-08 19:43:53 +02:00
Kefu Chai
3dfb0f86f1 db: add formatter for error_injection_at_startup
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `error_injection_at_startup`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17211
2024-02-08 19:40:48 +02:00
Piotr Dulikowski
09a6862f96 storage_service: initialize discovery leader's state earlier
Move it before the topology coordinator is started. This way, the
topology coordinator will see non-empty state when it is started, which
will allow us to assert that the topology coordinator is never started
for an empty system.topology table.
2024-02-08 18:05:02 +01:00
Piotr Dulikowski
61e2b2fd9f topology_coordinator: allow for custom sharding info in prepare_and_broadcast_cdc_generation_data
Extend the prepare_and_broadcast_cdc_generation_data function like we
did in the case of prepare_new_cdc_generation_data - the topology
coordinator state building process not only has to create a new
generation, but also broadcast it.
2024-02-08 18:05:02 +01:00
Piotr Dulikowski
0d9b88fd78 topology_coordinator: allow for custom sharding info in prepare_new_cdc_generation_data
During topology coordinator state build phase a new cdc generation will
be generated. We can reuse prepare_new_cdc_generation_data for that.
Currently, it always takes sharding information (shard count + ignore
msb) from the topology state machine - which won't be available yet at
the point of building the topology, so extend the function so that it
can accept a custom source of sharding information.
2024-02-08 18:05:02 +01:00
Piotr Dulikowski
573bb8dd98 topology_coordinator: remove outdated fixme in prepare_new_cdc_generation_data
The FIXME mentions that token metadata should return host ID for given
token (instead of, presumably, an IP) - but that is already the case, so
let's remove the fixme.
2024-02-08 18:05:02 +01:00
Piotr Dulikowski
32a2e24a0f topology_state_machine: introduce upgrade_state
`upgrade_state` is a static column which will be used to track the
progress of building the topology state machine.
2024-02-08 18:05:02 +01:00
Piotr Dulikowski
b8e4e04096 storage_service: disallow topology ops when upgrade is in progress
Forbid starting new topology changes while upgrade to topology on raft
is in progress. While this does not take into account any ongoing
topology operations, it makes sure that at the end of the upgrade no
node will try to perform any legacy topology operations.
2024-02-08 18:05:02 +01:00
Avi Kivity
f1e11a7060 Merge 'scylla-nodetool: implement the describering command' from Botond Dénes
On top of the capabilities of the java-nodetool command, tablet support is also implemented: in addition to the existing keyspace parameter, an optional table parameter is also accepted and forwarded to the REST API. For tablet keyspaces this is required to get a ring description.

The command comes with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588
Refs: https://github.com/scylladb/scylladb/issues/16846

Closes scylladb/scylladb#17163

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement describering
  tools/scylla-nodetool.cc: handle API request failures gracefully
  test/nodetool: util.py: add check_nodetool_fails_with_all()
2024-02-08 18:52:34 +02:00
Tomasz Grabiec
c06173b3a3 range_streamer, tablets: Do not keep token metadata around streaming
It holds back global token metadata barrier during streaming, which
limits parallelism of load balancing.

Topology transition is protected by the means of topology_guard.

Closes scylladb/scylladb#17230
2024-02-08 18:26:00 +02:00
Aleksandra Martyniuk
5f7263afb5 repair: delete _cf from repair_meta
repair_meta keeps a reference to a table. If the table is dropped
during repair, its object is destructed, leaving a dangling reference.

Delete repair_meta::_cf and replace its usages with appropriate
methods.
2024-02-08 17:01:41 +01:00
Aleksandra Martyniuk
36882e1c4a repair: delete _cf from row_level_repair
row_level_repair keeps a reference to a table. If the table is dropped
during repair, its object is destructed, leaving a dangling reference.

Delete row_level_repair::_cf and replace its usages with appropriate
methods.
2024-02-08 16:47:02 +01:00
Botond Dénes
8fcb4ed707 tools/scylla-nodetool: implement describering
Also implementing tablet support, which basically just means that a new
table parameter is also accepted and forwarded to the API, in addition
to the existing keyspace one.
2024-02-08 09:20:25 -05:00
Botond Dénes
2df2733ed1 tools/scylla-nodetool.cc: handle API request failures gracefully
Currently, error handling is done via catching
http::unexpected_status_error and re-throwing an std::runtime_error.
Turns out this no longer works, because this error will only be thrown
by the http client, if the request had an expected reply code set.
The scylla_rest_client doesn't set an expected reply code, so this
exception was never thrown for some time now.
Furthermore, even when the above worked, it was not too user-friendly as
the error message only included the reply-code, but not the reply
itself.

So in this patch this is fixed:
* The handling of http::unexpected_status_error is removed, we don't
  want to use this mechanism, because it yields very terse error
  messages.
* Instead, the status code of the request is checked explicitly and all
  cases where it is not 200 are handled.
* A new api_request_failed exception is added, which is thrown for all
  non-200 statuses with the extracted error message from the server (if
  any).
* This exception is caught by main, the error message is printed and
  scylla-nodetool returns with a new distinct error-code: 4.

With this, all cases where the request fails on ScyllaDB are handled and
we shouldn't hit cases where a nodetool command fails with some
obscure JSON parsing error, because the error reply has different JSON
schema than the expected happy-path reply.
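
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual scylla-nodetool code: `toy_reply`, `api_request_failed_sketch`, `checked_body()` and `run_command()` are hypothetical names, and the message format is an assumption. Only the explicit non-200 check and the exit code 4 come from the description above.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a REST reply; the real client types differ.
struct toy_reply {
    int status;        // HTTP status code
    std::string body;  // reply body, or the server's error message
};

// Hypothetical counterpart of the new exception: it carries the status and
// the message extracted from the server reply.
class api_request_failed_sketch : public std::runtime_error {
public:
    api_request_failed_sketch(int status, const std::string& message)
        : std::runtime_error("API request failed with status " + std::to_string(status) + ": " + message)
        , status(status) {}
    int status;
};

// Check the status explicitly instead of relying on the HTTP client to throw.
const std::string& checked_body(const toy_reply& reply) {
    if (reply.status != 200) {
        throw api_request_failed_sketch(reply.status, reply.body);
    }
    return reply.body;
}

// Roughly what main() would do: print the error and return a distinct code.
int run_command(const toy_reply& reply) {
    try {
        std::cout << checked_body(reply) << '\n';
        return 0;
    } catch (const api_request_failed_sketch& e) {
        std::cerr << e.what() << '\n';
        return 4;
    }
}
```

Because the status is checked before the body is interpreted, an error reply never reaches the happy-path JSON parsing.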
2024-02-08 09:20:25 -05:00
Botond Dénes
d4f7f23b98 test/nodetool: util.py: add check_nodetool_fails_with_all()
Similar to the existing check_nodetool_fails_with() but checks that all
error messages from expected_errors are contained in stderr.

While at it, use list as the typing hint, instead of typing.List.
2024-02-08 09:20:25 -05:00
Kefu Chai
e02958ad35 sstable: let make_entry_descriptor() accept a single fs::path
both of its callers are passing parent_path() and filename() to
it. so let the callee do this. simpler this way.
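
a rough sketch of the idea, using std::filesystem only. `toy_entry` and `make_toy_entry()` are hypothetical names for illustration; the real entry descriptor parses more than these two components.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical result type holding the two path components.
struct toy_entry {
    std::string dir;
    std::string name;
};

// Before: callers passed path.parent_path() and path.filename() separately.
// After: callers pass the full path and the callee splits it.
toy_entry make_toy_entry(const fs::path& sst_path) {
    return toy_entry{
        sst_path.parent_path().generic_string(),
        sst_path.filename().generic_string(),
    };
}
```

with this shape, a call site shrinks to `make_toy_entry(path)` and cannot pass a mismatched directory and file name.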

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17225
2024-02-08 16:44:16 +03:00
Kefu Chai
770baa806e streaming: ignore failures when streaming dropped tables
before this change, when performing `stream_transfer_task`, if an
exception is raised, we check if the table being streamed is still
around, if it is missing, we just skip the table as it should be
dropped during streaming, otherwise we consider it a failure, and
report it back to the peer. this behavior was introduced by
953af382.

but we perform the streaming on all shards in parallel, and if any
of the shards fail because of the dropped table, the exception is
thrown. and the current shard is not necessarily the one which
throws the exception. actually, current shard might be still
waiting for a write lock for removing the table from the database's
table metadata. in that case, we consider the streaming RPC call a
failure even if the table is already removed on some shard(s). and
the peer would fail to bootstrap because of streaming failure.

in this change, before catching all exceptions, we handle
`no_such_column_family`, and do not fail the streaming in that case.
please note, we don't touch other tables, so we can just assume that
`no_such_column_family` is thrown only if the table to be transferred
is missing. that's why `assert()` is added.
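
the handler ordering can be sketched as below. this is an illustration under assumptions, not the code in `stream_transfer_task`: `no_such_table_sketch`, `transfer_result` and `run_transfer()` are hypothetical stand-ins, and the real code is asynchronous.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the exception thrown when a table is missing.
struct no_such_table_sketch : std::runtime_error {
    explicit no_such_table_sketch(const std::string& id)
        : std::runtime_error("no such table: " + id), table_id(id) {}
    std::string table_id;
};

enum class transfer_result { ok, skipped_dropped_table, failed };

// Run one transfer; `transfer` is any callable that may throw.
template <typename Func>
transfer_result run_transfer(const std::string& table_id, Func&& transfer) {
    try {
        transfer();
        return transfer_result::ok;
    } catch (const no_such_table_sketch& e) {
        // Handled before the catch-all. Only the table being transferred is
        // touched, so the missing table must be this one.
        assert(e.table_id == table_id);
        return transfer_result::skipped_dropped_table;
    } catch (...) {
        // Anything else is still a failure to report to the peer.
        return transfer_result::failed;
    }
}
```

the specific handler does not depend on which shard observed the drop first, which is the case the old "is the table still around?" check missed.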

Fixes #15370
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17160
2024-02-08 14:07:22 +02:00
Amnon Heiman
f4e82174b2 replica/table.cc: Align the tablet's behavior with other metrics.
Due to the potentially large number of per-table metrics, ScyllaDB uses
configuration to determine what metrics will be reported.  The user can
decide if they want per-table-per-shard metrics, per-table-per-instance
metrics, or none.

This patch uses the same logic for tablet metrics registration.
It adds a new metrics group tablets with one metric inside it - count.
So, scylla_tablets_count will report the number of tablets per shard.

The existing per-table metrics will be reported aggregated or not like
the other per-table metrics.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes scylladb/scylladb#17182
2024-02-08 12:48:25 +01:00
xuchang
9b675d1fe4 repair: resolve load_history shard load skew
Use uuid_xor_to_uint32 instead of table_uuid's most_significant_bits
to reduce hash collisions when mapping tables to shards.
2024-02-08 18:18:01 +08:00
xuchang
ae422fdf69 repair: accelerate repair load_history time
Use `parallel_for_each_table` instead of `for_each_table_gently` in
`repair_service::load_history`, with a parallelism of 16 on each shard,
to reduce bootstrap time.
2024-02-08 18:18:01 +08:00
Kefu Chai
6eae678eb3 db: add formatter for gms::gossip_digest_ack2
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `gms::gossip_digest_ack2`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17153
2024-02-08 11:49:37 +02:00
Kefu Chai
07da9fd197 sstable: change sstable_touch_directory_io_check() to accept fs::path
this change is a follow-up of 637dd730. the goal is to use
std::filesystem::path for manipulating paths, and to avoid the
converting between sstring and fs::path back and forth.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17214
2024-02-08 10:01:47 +03:00
Kefu Chai
2c859bc310 sstables: let state_to_dir(sstable_state) return string_view
state_to_dir(sstable_state) translates the enum to the corresponding
directory component. and it returns a `seastar::sstring`. not all
the callers of this function expect a full-blown sstring instance,
on the contrary, quite a few of them just want a string-alike object
which represents the directory component, so they can use it, for
instance to compose a path, or just format the given `state` enum.

so to avoid the overhead of creating/destroying the `seastar::sstring`
instance, let's switch to `std::string_view`. with this change, we
will be able to implement the fmt::formatter for `sstable_state`
without the help of the formatter of sstring.
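
a minimal sketch of the pattern, assuming a toy enum: the variant names and directory strings in `toy_sstable_state` and `state_to_dir_sketch()` are assumptions for illustration, as is the `table_subdir()` caller.

```cpp
#include <string>
#include <string_view>

// Toy enum; variant names and directory strings are assumptions.
enum class toy_sstable_state { normal, staging, quarantine, upload };

// Returns a view over a string literal: no allocation, usable in constexpr.
constexpr std::string_view state_to_dir_sketch(toy_sstable_state state) {
    switch (state) {
    case toy_sstable_state::normal:     return "";
    case toy_sstable_state::staging:    return "staging";
    case toy_sstable_state::quarantine: return "quarantine";
    case toy_sstable_state::upload:     return "upload";
    }
    return "";
}

// A caller that composes a path allocates only once, for the final string.
std::string table_subdir(std::string_view table_dir, toy_sstable_state state) {
    std::string path(table_dir);
    const std::string_view dir = state_to_dir_sketch(state);
    if (!dir.empty()) {
        path += '/';
        path += dir;
    }
    return path;
}
```

the returned views point at string literals, so they stay valid for the lifetime of the program.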

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17213
2024-02-08 10:00:08 +03:00
Kurashkin Nikita
7ce9a3e9e5 cql: add limits for integer values when creating date type
Added a simple check that prevents entering int values that lead to
overflow when creating a date type.

Fixes #17066

Closes scylladb/scylladb#17102
2024-02-08 00:08:01 +02:00
Michał Chojnowski
f5e3a728e4 row_cache_test: test cache consistency during memtable-to-cache merge
A rather minimal reproducer for #16759. Not extensive.
2024-02-07 18:31:36 +01:00
Michał Chojnowski
bed20a2e37 row_cache: use preemption_source in update()
To facilitate testing the state of cache after the update is preempted
at various points, pass a preemption_source& to update() instead of
calling the reactor directly.

In release builds, the calls to preemption_source methods should compile
to the same direct reactor calls as today. Only in dev mode they should
add an extra branch. (However, the `preemption_source&` argument has
to be shoveled in any mode).
2024-02-07 18:31:36 +01:00
Michał Chojnowski
fabab2f46f utils: preempt: add preemption_source
While `preemption_check` can be passed to functions to control
their preemption points, there is no way to inspect the
state of the system after the preemption results in a yield.

`preemption_source` is a superset of `preemption_check`,
which also allows for customizing the yield, not just the preemption
check. An implementation passed by a test can hook the yield to
put the tested function to sleep, run some code, and then wake the
function up.

We use the preprocessor to minimize the impact on release builds.
Only dev-mode preemption_source is hookable. When it's used in other
modes, it should compile to direct reactor calls, as if it wasn't used.
2024-02-07 18:31:28 +01:00
Piotr Dulikowski
f6b303d589 raft_group0_client: add in_recovery method
It tells whether the current node currently operates in recovery mode or
not. It will be vital for storage_service in determining which topology
operations to use at startup.
2024-02-07 10:02:01 +01:00
Piotr Dulikowski
7601f40bf8 storage_service: introduce join_node_query verb
When a node joins an existing cluster, it will ask a node that already
belongs to the cluster about which topology operations to use when
joining.
2024-02-07 10:02:00 +01:00
Piotr Dulikowski
bab5d3bbe5 raft_group0: make discover_group0 public
The `discover_group0` function returns only after it either finds a node
that belongs to some group 0, or learns that the current node is
supposed to create a new one. It will be very helpful to storage_service
in determining which topology mode to use.
2024-02-07 10:00:16 +01:00
Piotr Dulikowski
367df7322e raft_group0: filter current node's IP in discover_group0
This was previously done by `setup_group0`, which always was an
(indirect) caller of `discover_group0`. As we want to make
`discover_group0` public, it's more convenient for the callers if the
called method takes care of sanitizing the argument.
2024-02-07 10:00:16 +01:00
Piotr Dulikowski
86e4a59d5b raft_group0: remove my_id arg from discover_group0
The goal is to make `discover_group0` public. The `my_id` argument was
always set to `this->load_my_id()`, so we can get rid of it and it will
make it more convenient to call `discover_group0` from the outside.
2024-02-07 10:00:16 +01:00
Piotr Dulikowski
4174a32d3f storage_service: make _raft_topology_change_enabled more advanced
Currently, nodes either operate in the topology-on-raft mode or legacy
mode, depending on whether the experimental topology on raft flag is
enabled. This also affects the way nodes join the cluster, as both modes
have different procedures.

We want to allow joining nodes in legacy mode until the cluster is
upgraded. Nodes should automatically choose the best method. Therefore,
the existing boolean _raft_topology_change_enabled flag is extended into
an enum with the following variants:

- unknown - the node has not yet decided in which mode it will operate
- legacy - the node uses legacy topology operations
- upgrading_to_raft - the node is upgrading to use raft topology
  operations
- raft - the node uses raft topology operations

Currently, only the `legacy` and `raft` variants are utilized, but this
will change in the commits that follow.

Additionally, the `_raft_experimental_topology` bool flag is introduced
which retains the meaning of the old `_raft_topology_change_enabled` but
has a more fitting name. It is explicitly needed in
`topology_state_load`.
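
A rough sketch of the bool-to-enum change, under assumptions: the type and function names (`topology_mode_sketch`, `mode_from_experimental_flag`) are hypothetical, and only the four variants and their meaning are taken from the list above.

```cpp
#include <string_view>

// Hypothetical enum with the four variants listed above.
enum class topology_mode_sketch {
    unknown,            // the node has not decided yet
    legacy,             // legacy topology operations
    upgrading_to_raft,  // upgrade to raft topology operations in progress
    raft,               // raft topology operations
};

// For now only `legacy` and `raft` are produced, mirroring the old boolean.
constexpr topology_mode_sketch mode_from_experimental_flag(bool raft_experimental_topology) {
    return raft_experimental_topology ? topology_mode_sketch::raft
                                      : topology_mode_sketch::legacy;
}

constexpr std::string_view to_string(topology_mode_sketch mode) {
    switch (mode) {
    case topology_mode_sketch::unknown:           return "unknown";
    case topology_mode_sketch::legacy:            return "legacy";
    case topology_mode_sketch::upgrading_to_raft: return "upgrading_to_raft";
    case topology_mode_sketch::raft:              return "raft";
    }
    return "invalid";
}

static_assert(mode_from_experimental_flag(false) == topology_mode_sketch::legacy);
```

An enum leaves room for the intermediate states that a boolean cannot express, so code that switches on it can be extended without changing its callers.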
2024-02-07 10:00:15 +01:00
Piotr Dulikowski
1104f8b00f docs: document raft topology upgrade and recovery 2024-02-07 09:54:54 +01:00
Botond Dénes
35da9551fb Merge 'storage_service: Add describe_ring support for tablet table' from Asias He
The table query param is added to get the describe_ring result for a
given table.

Both vnode tables and tablet tables can use this table param, so it is
easier for users to use.

If the table param is not provided by the user and the keyspace contains
a tablet table, the request will be rejected.

E.g.,
curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles"
curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1"

Refs #16509

Closes scylladb/scylladb#17118

* github.com:scylladb/scylladb:
  tablets: Convert to use the new version of for_each_tablet
  storage_service: Add describe_ring support for tablet table
  storage_service: Mark host2ip as const
  tablets: Add for_each_tablet_gently
2024-02-07 10:41:36 +02:00
Kefu Chai
b1e4513c2d dht: add formatter for dht::ring_position
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `dht::ring_position`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17194
2024-02-07 09:30:45 +02:00
Kefu Chai
75be212ab2 lang: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17193
2024-02-07 09:27:39 +02:00
Pavel Emelyanov
ca261f8916 utils: Mark chunked_vector::max_chunk_capacity with constexpr
It uses only compile-time constants to produce the value, so deserves
this marking
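
A small sketch of what the marking buys, assuming a toy container: `toy_chunked_vector`, its template parameter and the 128 KiB default are made up for illustration and are not the real chunked_vector definition.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Toy stand-in for a chunked container; names and the default limit are
// assumptions.
template <typename T, std::size_t MaxContiguousAllocation = 128 * 1024>
struct toy_chunked_vector {
    // Depends only on compile-time constants, so it can be constexpr and
    // used in static_assert or as an array bound.
    static constexpr std::size_t max_chunk_capacity() {
        return std::max(MaxContiguousAllocation / sizeof(T), std::size_t(1));
    }
};

// Evaluated by the compiler; no runtime call is involved.
static_assert(toy_chunked_vector<std::uint64_t>::max_chunk_capacity() == 16 * 1024);
```

Without constexpr, the same value would only be available at run time and the static_assert above would not compile.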

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17181
2024-02-07 09:22:23 +02:00
Raphael S. Carvalho
41a5c9eaec test: Reduce mem footprint of test_token_group_based_splitting_mutation_writer
Reduces footprint from hundreds of MB to a few MB.

Issue could be reproduced with:
./build/dev/test/boost/mutation_writer_test --run_test=test_token_group_based_splitting_mutation_writer -- -m 500M --smp 1 --random-seed 1848215131

Fixes #17076.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17187
2024-02-07 09:21:24 +02:00
Tomasz Grabiec
032c1a3d04 Merge 'tablets: Make sure topology has enough endpoints for RF' from Pavel Emelyanov
When creating a keyspace, scylla allows setting RF value smaller than there are nodes in the DC. With vnodes, when new nodes are bootstrapped, new tokens are inserted thus catching up with RF. With tablets, it's not the case as replica set remains unchanged.

With tablets, this is a good opportunity not to mimic the vnodes behavior and instead require as many nodes to be up and running as the requested RF. This patch implements this in a lazy manner: when creating a keyspace, RF can be any value, but when a new table is created the topology must meet the RF requirements. If they are not met, the user can bootstrap new nodes or ALTER KEYSPACE.

closes: #16529

Closes scylladb/scylladb#17079

* github.com:scylladb/scylladb:
  tablets: Make sure topology has enough endpoints for RF
  cql-pytest: Disable tablets when RF > nodes-in-DC
  test: Remove test that configures RF larger than the number of nodes
  keyspace_metadata: Include tablets property in DESCRIBE
2024-02-06 22:38:11 +01:00
Kefu Chai
f3845a7f3d sstable: replace "welp" with more descriptive words
although "welp" is more emotionally expressive, it is considered
a misspelling of "whelp" by codespell. that's why this comment stands
out. but from a non-native speaker's point of view, probably we can
use more descriptive words to explain what "welp" is for in plain
English.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17183
2024-02-06 16:31:18 +02:00
David Garcia
f14edf3543 docs: correct image sorting order for reference docs
This commit displays images in reference docs in the correct order. Prior to this fix, the images were listed as 4.0.0, 4.0.1, and 4.0.2, but they should be sorted in reverse order (4.0.2, 4.0.1, 4.0.0).

The changes made in this PR resolve the issue introduced in https://github.com/scylladb/scylladb/pull/16942 when common functions for Azure and GCP were extracted into a separate file without reversing the list as defined in the original extension: https://github.com/scylladb/scylladb/pull/16942/files#diff-b8f6253ea8fdcca681deb556ca61cd1f3feb3b7aeb7e856b145ef9b685aad460L185

Closes scylladb/scylladb#17185
2024-02-06 16:24:22 +02:00
Kamil Braun
c0c291b985 Merge 'raft topology: harden IP related tests' from Petr Gusev
In this PR we add the tests for two scenarios, related to the use of IPs in raft topology.

* When the replaced node transitions to the `LEFT` state we used to
  remove the IP of such node from gossiper. If we replace with same IP,
  this caused the IP of the new node to be removed from gossiper. This
  problem was fixed by #16820, this PR adds a regression test for it.
* When a node is restarted after decommissioning some other node, the
  restarting node tries to apply the raft log, this log contains a
  record about the decommissioned node, and we got stuck trying to resolve
  its IP. This was fixed by #16639 - we excluded IPs from the Raft log
  application code and moved it entirely to host_id-s. This PR adds a
  regression test for this case.

Closes scylladb/scylladb#15967
Closes scylladb/scylladb#14803

Closes scylladb/scylladb#17180

* github.com:scylladb/scylladb:
  test_topology_ops: check node restart after decommission
  test_replace_reuse_ip: check other servers see the IP
2024-02-06 14:28:06 +01:00
Nadav Har'El
14315fcbc3 mv: fix missing view deletions in some cases of range tombstones
For efficiency, if a base-table update generates many view updates that
go to the same partition, they are collected as one mutation. If this
mutation grows too big it can lead to memory exhaustion, so since
commit 7d214800d0 we split the output
mutation to mutations no longer than 100 rows (max_rows_for_view_updates)
each.

This patch fixes a bug where this split was done incorrectly when
the update involved range tombstones, a bug which was discovered by
a user in a real use case (#17117).

Range tombstones are read in two parts, a beginning and an end, and the
code could split the processing between these two parts, with the result
that some of the range tombstones in the update could be missed - and the
view could miss some deletions that happened in the base table.

This patch fixes the code in two places to avoid breaking up the
processing between range tombstones:

1. The counter "_op_count" that decides where to break the output mutation
   should only be incremented when adding rows to this output mutation.
   The existing code strangely incremented it on every read (!?) which
   resulted in the counter being incremented on every *input* fragment,
   and in particular could reach the limit 100 between two range
   tombstone pieces.

2. Moreover, the length of output was checked in the wrong place...
   The existing code could get to 100 rows, not check at that point,
   read the next input - half a range tombstone - and only *then*
   check that we reached 100 rows and stop. The fix is to calculate
   the number of rows in the right place - exactly when it's needed,
   not before the step.

The first change needs more justification: The old code, that incremented
_op_count on every input fragment and not just output fragments did not
fit the stated goal of its introduction - to avoid large allocations.
In one test it resulted in breaking up the output mutation to chunks of
25 rows instead of the intended 100 rows. But, maybe there was another
goal, to stop the iteration after 100 *input* rows and avoid the possibility
of stalls if there are no output rows? It turns out the answer is no -
we don't need this _op_count increment to avoid stalls: The function
build_some() uses `co_await on_results()` to run one step of processing
one input fragment - and `co_await` always checks for preemption.
I verified that indeed no stalls happen by using the existing test
test_long_skipped_view_update_delete_with_timestamp. It generates a
very long base update where all the view updates go to the same partition,
but all but the last few updates don't generate any view updates.
I confirmed that the fixed code loops over all these input rows without
increasing _op_count and without generating any view update yet, but it
does NOT stall.

This patch also includes two tests reproducing this bug and confirming
it's fixed, and also two additional tests for breaking up long deletions
that I wanted to make sure don't fail after this patch (they don't).

By the way, this fix would have also fixed issue #12297 - which we
fixed a year ago in a different way. That issue happened when the code
went through 100 input rows without generating *any* output rows,
and incorrectly concluded that there's no view update to send.
With this fix, the code no longer stops generating the view
update just because it saw 100 input rows - it would have waited
until it generated 100 output rows in the view update (or the
input is really done).

Fixes #17117

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#17164
2024-02-06 14:57:33 +02:00
Asias He
e7e1f4b01a streaming: Fix rpc::source and rpc::optional parameter order
The new rpc::optional parameter must come after any existing parameters,
including the rpc::source parameters, otherwise it will break
compatibility.

The regression was introduced in:

```
commit fd3c089ccc
Author: Tomasz Grabiec <tgrabiec@scylladb.com>
Date:   Thu Oct 26 00:35:19 2023 +0200

    service: range_streamer: Propagate topology_guard to receivers
```

We need to backport this patch ASAP before we release anything that
contains commit fd3c089ccc.

Refs: #16941
Fixes: #17175

Closes scylladb/scylladb#17176
2024-02-06 13:15:28 +01:00
Botond Dénes
a3d4131918 Merge 'Sanitize replication factor parsing by strategies' from Pavel Emelyanov
RF values appear as strings and strategies classes convert them to integers. This PR removes some duplication of efforts in converting code.

Closes scylladb/scylladb#17132

* github.com:scylladb/scylladb:
  network_topology_strategy: Do not walk list of datacenters twice
  replication_strategy: Do not convert string RF into int twise
  abstract_replication_strategy: Make validate_replication_factor return value
2024-02-06 13:26:31 +02:00
Kefu Chai
a40d3fc25b db: add formatter for data_dictionary::user_types_metadata
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `data_dictionary::user_types_metadata`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17140
2024-02-06 13:24:07 +02:00
Kefu Chai
97587a2ea4 test/boost: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17139
2024-02-06 13:22:16 +02:00
Kefu Chai
16e1700246 exceptions: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17152
2024-02-06 13:16:03 +02:00
Kefu Chai
3bca11668a db: add formatter for exceptions::exception_code
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `exceptions::exception_code`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17151
2024-02-06 13:15:08 +02:00
Pavel Emelyanov
93918eef62 ks_prop_defs: Remove preprocessor-guarded java code
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17166
2024-02-06 13:14:15 +02:00
Botond Dénes
53a11cba62 Merge 'types/types.cc: move stringstream content instead of copying it' from Patryk Wróbel
C++20 introduced a new overload of std::ostringstream::str() that is selected when the mentioned member function is called on r-value.

The new overload returns a string, that is move-constructed from the underlying string instead of being copy-constructed.

This change applies std::move() on stringstream objects before calling str() member function to avoid copying of the underlying buffer.

It also removes a helper function `inet_addr_type_impl::to_sstring()` - it was used only in two places. It was replaced with `fmt::to_string()`.
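
A minimal sketch of the technique, using only the standard library. `format_pair()` is a hypothetical helper, not code from types/types.cc.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper that builds a string through a stream.
std::string format_pair(int key, const std::string& value) {
    std::ostringstream out;
    out << key << '=' << value;
    // Calling str() on an rvalue selects the C++20 overload, which moves the
    // underlying buffer out instead of copying it. The stream must not be
    // used for its contents afterwards.
    return std::move(out).str();
}
```

With a pre-C++20 standard library the same line still compiles but falls back to the copying overload, so the change is safe either way.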

Closes scylladb/scylladb#16991

* github.com:scylladb/scylladb:
  use fmt::to_string() for seastar::net::inet_address
  types/types.cc: move stringstream content instead of copying it
2024-02-06 13:11:41 +02:00
Botond Dénes
619c3fdf32 Merge 'types: use {fmt} to format time and boolean' from Kefu Chai
so we can tighten our dependencies a little bit. there are only three places where we are using the `date` library. also, there is no need to reinvent the wheel if there is a ready-to-use one.

Closes scylladb/scylladb#17177

* github.com:scylladb/scylladb:
  types: use {fmt} to format boolean
  types: use {fmt} to format time
2024-02-06 13:10:39 +02:00
Kefu Chai
3dfe7c44f6 dht: add formatter for dht::sharder
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `dht::sharder`, and drop
its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17178
2024-02-06 13:06:46 +02:00
Kefu Chai
c38325db26 Update seastar submodule
* seastar 85359b28...289ad5e5 (19):
  > net/dpdk: use user-defined literal when appropriate
  > io_tester: Allow running on non-XFS fs
  > io: Apply rate-factor early
  > circular_buffer: make iterator default constructible
  > net/posix: add a way to change file permissions of unix domain socket
  > resource: move includes to the top of the source file
  > treewide: replace calls to future::get0() by calls to future::get()
  > core/future: add as_ready_future utility
  > build: do not expose -Wno-error=#warnings
  > coroutine: remove remnants of variadic futures
  > build: prevent gcc -Wstringop-overflow workaround from affecting clang
  > util/spinlock: use #warning instead of #warn
  > io_tester: encapsulate code into allocate_and_fill_buffer()
  > io_tester: make maybe_remove_file a function
  > future: remove tuples from get0_return_type
  > circular_buffer_fixed_capacity: use std::uninitialized_move() instead of open-coding
  > rpc/rpc_types: do not use integer literal in preprocessor macro
  > future: use "(T(...))" instead of "{T(...)}" in uninitialized_set()
  > net/posix: include used header

Closes scylladb/scylladb#17179
2024-02-06 13:05:33 +02:00
David Garcia
ad1c9ae452 docs: fix logging in images extensions
Adds a missing logging import in the file scylladb_common_images extension, which prevents the enterprise build from building.

Additionally, it standardizes logging handling across the extensions and removes "ami" references in Azure and GCP extensions.

Closes scylladb/scylladb#17137
2024-02-06 13:00:37 +02:00
Botond Dénes
ce3233112e Merge 'configure.py: add -Wextra to cflags' from Kefu Chai
also disable some more warnings which are failing the build after `-Wextra` is enabled. we can fix them on a case-by-case basis, if they are genuine issues. but before that, we just disable them.

the goal of this change is to reduce the discrepancies between the compile options used by CMake and those used by configure.py. the side effect is that we enable some more warnings enabled by `-Wextra`, for instance, `-Wsign-compare` is enabled now. for the full list of the enabled warnings when building with Clang, please see https://clang.llvm.org/docs/DiagnosticsReference.html#wextra.

Closes scylladb/scylladb#17131

* github.com:scylladb/scylladb:
  configure.py: add -Wextra to cflags
  test/tablets: do not compare signed and unsigned
2024-02-06 12:57:32 +02:00
Petr Gusev
646ca9515e test_topology_ops: check node restart after decommission
There used to be a problem with restarting a node after
decommissioning some other node - the restarting node
tries to apply the raft log, this log contains a record
about the decommissioned node, and we got stuck trying
to resolve its IP.

This was fixed in #16639 - we excluded IPs from
the Raft log application code and moved it entirely
to host_id-s.

In this commit we add a regression test
for this case. We move the decommission_node
call before server_stop/server_start. We need
to add one more server to retain majority when
the node is decommissioned, otherwise the topology
coordinator won't migrate from the stopped node
before replacing it, and we'll get an error.

closes #14803
2024-02-06 13:29:42 +04:00
Petr Gusev
aeed5c5fe3 test_replace_reuse_ip: check other servers see the IP
The replaced node transitions to LEFT state, and
we used to remove the IPs of such nodes from gossiper.
If we replace with same IP, this caused the IP of the
new node to be removed from gossiper.

This problem was fixed by #16820, this commit
adds a regression test for it.

closes #15967
2024-02-06 13:28:04 +04:00
Botond Dénes
115ee4e1f5 Merge 'doc: remove the OSS and Enterprise Features pages' from Anna Stuchlik
This PR removes the following pages:
- ScyllaDB Open Source Features
- ScyllaDB Enterprise Features

They were outdated, incomplete, and misleading. They were also redundant, as the per-release updates are added as Release Notes.

With this update, the features listed on the removed pages are added under the common page: ScyllaDB Features.

In addition, a reference to the Enterprise-only Features section is added.

Note: No redirections are added because no file paths or URLs are changed with this PR.

Fixes https://github.com/scylladb/scylladb/issues/13485

Refs https://github.com/scylladb/scylladb/issues/16496

(nobackport)

Closes scylladb/scylladb#17150

* github.com:scylladb/scylladb:
  Update docs/using-scylla/features.rst
  doc: remove the OSS and Enterprise Features pages
2024-02-06 08:17:18 +02:00
Botond Dénes
edb983d165 Merge 'doc: add the 5.4-to-2024.1 upgrade guide' from Anna Stuchlik
This PR:
- Adds the upgrade guide from ScyllaDB Open Source 5.4 to ScyllaDB Enterprise 2024.1. Note: The need to include the "Restore system tables" step in rollback has been confirmed; see https://github.com/scylladb/scylladb/issues/11907#issuecomment-1842657959.
- Removes the 5.1-to-2022.2 upgrade guide (unsupported versions).

Fixes https://github.com/scylladb/scylladb/issues/16445

Closes scylladb/scylladb#16887

* github.com:scylladb/scylladb:
  doc: fix the OSS version number
  doc: metric updates between 2024.1. and 5.4
  doc: remove the 5.1-to-2022.2 upgrade guide
  doc: add the 5.4-to-2024.1 upgrade guide
2024-02-06 08:16:05 +02:00
Kefu Chai
6f07d9edaa types: use {fmt} to format boolean
{fmt} has formatted booleans as "true" / "false" since v2.0.1, so there is no need to
reinvent the wheel.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-06 10:40:02 +08:00
Kefu Chai
be29556955 types: use {fmt} to format time
so we can tighten our dependencies a little bit. there are only
three places where we are using the `date` library. the outputs
of these two ways are identical:
see https://wandbox.org/permlink/Lo9NUrQNUEqyiMEa and https://godbolt.org/z/YEha9ah7v to compare their outputs.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-06 10:39:30 +08:00
Kefu Chai
02376250b5 storage_service: do not filter tablets tables manually
instead of filtering the keyspaces manually, let's reuse
`database::get_non_local_strategy_keyspaces_erms()`. less
repetition and more future-proof this way.

Fixes #16974
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17121
2024-02-05 21:28:35 +01:00
Anna Stuchlik
d6723134ab doc: fix the OSS version number
Replace "5.2" with "5.4", as this is
the 5.4-to-2024.1 upgrade guide.
2024-02-05 21:10:50 +01:00
Tomasz Grabiec
448e117e7d Merge 'service: validate replication strategy constraints in tablet-moving API' from Aleksandra Martyniuk
Validate replication strategy constraints in /storage_service/tablets/move API:
- replicas are not on the same node
- replicas don't move across DC (violates RF in each DC)
- availability is not reduced due to rack overloading

Add flag to force tablet move even though dc/rack constraints aren't fulfilled.

Test for the change: https://github.com/scylladb/scylla-dtest/pull/3911.

Fixes: #16379.

Closes scylladb/scylladb#16648

* github.com:scylladb/scylladb:
  api: service: add force param to move_tablet api
  service: validate replication strategy constraints
2024-02-05 20:07:21 +01:00
Avi Kivity
9dd76c1035 Merge 'db: add formatter for dht::ring_position_{ext,view}' from Kefu Chai
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `dht::ring_position_ext` and
`dht::ring_position_view`, and drop their operator<<.

Refs #13245

Closes scylladb/scylladb#17128

* github.com:scylladb/scylladb:
  db: add formatter for dht::ring_position_ext
  db: add formatter for dht::ring_position_view
2024-02-05 20:27:54 +02:00
Patryk Wrobel
cc186c1798 use fmt::to_string() for seastar::net::inet_address
This change removes inet_addr_type_impl::to_sstring()
and replaces its usages with fmt::to_string().
The removed helper performed an unneeded copy via
std::ostringstream::str().

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-02-05 16:56:40 +01:00
Patryk Wrobel
8c0d30cd88 types/types.cc: move stringstream content instead of copying it
C++20 introduced a new overload of std::ostringstream::str()
that is selected when the mentioned member function is called
on r-value.

The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.

This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.
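
A minimal sketch of the idiom described above, assuming a C++20 standard library. `render_copy` and `render_move` are hypothetical helpers used only for illustration; they are not functions from types/types.cc:

```cpp
#include <sstream>
#include <string>
#include <utility>

// Copies the underlying buffer: str() is called on an lvalue stream.
inline std::string render_copy(int value) {
    std::ostringstream os;
    os << "value=" << value;
    return os.str();
}

// With C++20, std::move() makes the call select the rvalue-qualified
// overload of str(), which moves the buffer out instead of copying it.
inline std::string render_move(int value) {
    std::ostringstream os;
    os << "value=" << value;
    return std::move(os).str();
}
```

Both helpers return the same text; only the number of buffer copies differs. The stream should not be read again after `std::move(os).str()`.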

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-02-05 16:35:27 +01:00
Kamil Braun
968d1e3e78 Merge 'raft topology: make rollback_to_normal a transition state' from Patryk Jędrzejczak
After changing `left_token_ring` from a node state to a transition
state in scylladb/scylladb#17009, we do the same for
`rollback_to_normal`. `rollback_to_normal` was created as a node
state because `left_token_ring` was a node state.

This change will allow us to distinguish a failed removenode from
a failed decommission in the `rollback_to_normal` handler.
Currently, we use the same logic for both of them, so it's not
required. However, this might change, as it has happened with the
decommission and the failed bootstrap/replace in the
`left_token_ring` state (scylladb/scylladb#16797). We are making
this change now because it would be much harder after branching.

Fixes scylladb/scylladb#17032

Closes scylladb/scylladb#17136

* github.com:scylladb/scylladb:
  docs: dev: topology-over-raft: align indentation
  docs: dev: topology-over-raft: document the rollback_to_normal state
  topology_coordinator: improve logs in rollback_to_normal handler
  raft topology: make rollback_to_normal a transition state
2024-02-05 16:30:20 +01:00
Anna Stuchlik
6d6c400b77 doc: metric updates between 2024.1. and 5.4
This commit adds the information about
metrics updates between these two versions.

Fixes https://github.com/scylladb/scylladb/issues/16446
2024-02-05 16:24:16 +01:00
Anna Stuchlik
1e9c7ab6d1 Update docs/using-scylla/features.rst
Co-authored-by: Tzach Livyatan <tzach.livyatan@gmail.com>
2024-02-05 14:44:31 +01:00
Mikołaj Grzebieluch
4cecda7ead transport/controller: pass unix_domain_socket_permissions to generic_server::listen 2024-02-05 14:22:03 +01:00
Mikołaj Grzebieluch
6b178f9a4a transport/controller: split configuring sockets into separate functions
TCP sockets and unix domain sockets don't share common listen options
excluding `socket_address`. For unix domain sockets, available options will be
expanded to cover also filesystem permissions and owner for the socket.
Storing listen options for both types of sockets in one structure would become messy.
For now, both use `listen_cfg`.

In a singular cql controller, only sockets of one type are created, thus it
can be easily split into two cases.
Isolate maintenance socket from `listen_cfg`.
2024-02-05 14:20:17 +01:00
Nadav Har'El
7888b23e9e Merge 'test/cql-pytest: re-enable disabled tests' from Botond Dénes
In a previous PR (https://github.com/scylladb/scylladb/pull/16840), we enabled tablets by default when running the cql-pytest suite. To handle tests which are failing with tablets enabled, we used a new fixture, `xfail_tablets` to mark these as xfail. This means that we effectively lost test coverage, as these tests can now freely fail and no-one will notice if this is due to a new regression. To restore test coverage, this PR re-enables all the previously disabled tests, by parametrizing each one of them to run with both vnodes and tablets, and targetedly mark as xfail, only the tablet variant. After these tests are fixed with tablets (or the underlying functionality they test is fixed to work with tablets), we will run them with both vnodes and tablets, because these tests apparently *do* care which replication method is used.

Together with https://github.com/scylladb/scylladb/pull/16802, this means all previously disabled tests are re-enabled and no coverage is lost.

Closes scylladb/scylladb#16945

* github.com:scylladb/scylladb:
  test/cql-pytest: conftest.py: remove xfail_tablets fixture
  test/cql-pytest: test_tombstone_limit.py: re-enable disabled tests
  test/cql-pytest: test_describe.py: re-enable disabled tests
  test/cql-pytest: test_cdc.py: re-enable disabled tests
  test/cql-pytest: add parameter support to test_keyspace
2024-02-05 14:12:57 +02:00
Asias He
904bafd069 tablets: Convert to use the new version of for_each_tablet
It is gentler than the old one.
2024-02-05 18:45:40 +08:00
Asias He
04773bd1df storage_service: Add describe_ring support for tablet table
The table query param is added to get the describe_ring result for a
given table.

Both vnode table and tablet table can use this table param, so it is
easier for users to use.

If the table param is not provided by user and the keyspace contains
tablet table, the request will be rejected.

E.g.,
curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles"
curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1"

Refs #16509
2024-02-05 18:11:07 +08:00
Pavel Emelyanov
45dbe38658 tablets: Make sure topology has enough endpoints for RF
When creating a keyspace, scylla allows setting RF value smaller than
there are nodes in the DC. With vnodes, when new nodes are bootstrapped,
new tokens are inserted thus catching up with RF. With tablets, it's not
the case as replica set remains unchanged.

With tablets, this is a good chance not to mimic the vnodes behavior and
to require as many nodes to be up and running as the requested RF. This
patch implements this in a lazy manner -- when creating a keyspace, RF
can be any value, but when a new table is created the topology must meet RF
requirements. If not met, user can bootstrap new nodes or ALTER KEYSPACE.
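
The lazy check described above could look roughly like the following sketch. `ensure_enough_endpoints` and both map parameters are hypothetical names chosen for illustration; the real check in the patch is not shown in this message:

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical check run when a table is created: every datacenter must
// have at least as many nodes as the replication factor requested for it.
inline void ensure_enough_endpoints(
        const std::map<std::string, std::size_t>& rf_by_dc,
        const std::map<std::string, std::size_t>& nodes_by_dc) {
    for (const auto& [dc, rf] : rf_by_dc) {
        auto it = nodes_by_dc.find(dc);
        // A datacenter missing from the topology counts as having no nodes.
        const std::size_t nodes = (it == nodes_by_dc.end()) ? 0 : it->second;
        if (nodes < rf) {
            throw std::runtime_error("datacenter " + dc + " has " +
                    std::to_string(nodes) + " node(s), but RF is " +
                    std::to_string(rf));
        }
    }
}
```

In this sketch a keyspace can still be created with any RF; the error only surfaces at table creation, at which point more nodes can be bootstrapped or the keyspace altered.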

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-05 12:50:04 +03:00
Pavel Emelyanov
8471d88576 cql-pytest: Disable tablets when RF > nodes-in-DC
All the cql-pytest-s run against a single scylla node, but the
new_random_keyspace() helper may request RF in the range of 1 through 6,
so tablets need to be explicitly disabled when the RF is too large.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-05 12:50:04 +03:00
Pavel Emelyanov
3b9ca29411 test: Remove test that configures RF larger than the number of nodes
This is going to be disabled soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-05 12:50:03 +03:00
Pavel Emelyanov
8910d37994 keyspace_metadata: Include tablets property in DESCRIBE
When tablets are enabled and a keyspace being described has them
explicitly disabled or non-automatic initial value of zero, include this
into the returned describe statement too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-05 12:49:20 +03:00
Benny Halevy
bd3ed168ab api/compaction_manager: stop_keyspace_compaction: prevent stack use-after-free
Since `t.parallel_foreach_table_state` may yield,
we should access `type` by reference when calling
`stop_compaction` since it is captured by the calling
lambda and gets lost when it returns if
`parallel_foreach_table_state` returns an unavailable
future.

Instead change all captures to `[&]` so we can access
the `type` variable held by the coroutine frame.

Fixes #16975

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17143
2024-02-05 09:32:08 +02:00
Asias He
ab560c1580 storage_service: Mark host2ip as const
So it can be used by another const function.
2024-02-05 13:42:08 +08:00
Asias He
fab0d33d08 tablets: Add for_each_tablet_gently
In this version, the callback returns a future<>, so it can yield itself
to avoid stalls in func itself.
2024-02-05 13:42:08 +08:00
Anna Stuchlik
f7afa6773f doc: remove the OSS and Enterprise Features pages
This commit removes the following pages:
- ScyllaDB Open Source Features
- ScyllaDB Enterprise Features

They were outdated, incomplete, and misleading.
They were also redundant, as the per-release
updates are added as Release Notes.

With this update, the features listed on the removed
pages are added under the common page: ScyllaDB Features.

Note: No redirections are added, because no file paths
or URLs are changed with this commit.

Fixes https://github.com/scylladb/scylladb/issues/13485

Refs https://github.com/scylladb/scylladb/issues/16496
2024-02-04 20:55:40 +01:00
Avi Kivity
784c2f8ad2 Merge 'treewide: replace calls to future::get0() by calls to future::get()' from Kefu Chai
get0() dates back from the days where Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.

Closes scylladb/scylladb#17130

* github.com:scylladb/scylladb:
  treewide: replace seastar::future::get0() with seastar::future::get()
  sstable: capture return value of get0() using auto
  utils: result_loop: define result_type with decayed type

[avi: add another one that snuck in while this was cooking]
2024-02-04 15:23:33 +02:00
Michał Chojnowski
ed98102c45 row_cache: update _prev_snapshot_pos even if apply_to_incomplete() is preempted
Commit e81fc1f095 accidentally broke the control
flow of row_cache::do_update().

Before that commit, the body of the loop was wrapped in a lambda.
Thus, to break out of the loop, `return` was used.

The bad commit removed the lambda, but didn't update the `return` accordingly.
Thus, since the commit, the statement doesn't just break out of the loop as
intended, but also skips the code after the loop, which updates `_prev_snapshot_pos`
to reflect the work done by the loop.
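
The control-flow mistake can be illustrated with a toy model. This is a sketch under stated assumptions, not the code of `row_cache::do_update()`: `progress`, `update_buggy` and `update_fixed` are hypothetical names, and `recorded_pos` merely plays the role of `_prev_snapshot_pos`:

```cpp
#include <cstddef>
#include <vector>

struct progress {
    std::size_t processed = 0;     // items handled by the loop
    std::size_t recorded_pos = 0;  // position published after the loop
};

// Buggy shape: `return` was correct while the loop body lived in a lambda,
// but once the lambda is gone it leaves the whole function.
inline progress update_buggy(const std::vector<bool>& preempted_at) {
    progress p;
    for (std::size_t i = 0; i < preempted_at.size(); ++i) {
        ++p.processed;
        if (preempted_at[i]) {
            return p;  // also skips the bookkeeping after the loop
        }
    }
    p.recorded_pos = p.processed;
    return p;
}

// Fixed shape: `break` leaves only the loop, so the bookkeeping still runs.
inline progress update_fixed(const std::vector<bool>& preempted_at) {
    progress p;
    for (std::size_t i = 0; i < preempted_at.size(); ++i) {
        ++p.processed;
        if (preempted_at[i]) {
            break;
        }
    }
    p.recorded_pos = p.processed;
    return p;
}
```

With a preemption at the third item, both variants process three items, but only the fixed one records that position.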

As a result, whenever `apply_to_incomplete()` (the `updater`) is preempted,
`do_update()` fails to update `_prev_snapshot_pos`. It remains in a
stale state, until `do_update()` runs again and either finishes or
is preempted outside of `updater`.

If we read a partition processed by `do_update()` but not covered by
`_prev_snapshot_pos`, we will read stale data (from the previous snapshot),
which will be remembered in the cache as the current data.

This results in outdated data being returned by the replica.
(And perhaps in something worse if range tombstones are involved.
I didn't investigate this possibility in depth).

Note: for queries with CL>1, occurrences of this bug are likely to be hidden
by reconciliation, because the reconciled query will only see stale data if
the queried partition is affected by the bug on *all* queried replicas
at the time of the query.

Fixes #16759

Closes scylladb/scylladb#17138
2024-02-04 11:17:41 +02:00
Aleksandra Martyniuk
89c683f51a api: service: add force param to move_tablet api
Force flag is added to /storage_service/tablets/move. If force is set
to true, replication strategy constraints regarding racks and dcs can
be broken.
2024-02-02 19:08:01 +01:00
Aleksandra Martyniuk
3b0fa7335a service: validate replication strategy constraints
Check whether tablet move meets replication strategy constraints, i.e.
replicas aren't on the same node, replicas don't move across DCs
or HA isn't reduced due to rack overloading. Throw if constraints
are broken.
2024-02-02 19:06:45 +01:00
Botond Dénes
017a574b16 tools: lua_sstable_consumer.cc: load os and math libs
The amount of standard Lua libraries loaded for the sstable-script was
limited, due to fears that some libraries (like the io library) could
expose methods, which if used from the script could interfere with
seastar's asynchronous architecture. So initially only the table and
string libraries were loaded.
This patch adds two more libraries to be loaded: math and os. The
former is self-explanatory and the latter contains methods to work with
date/time, obtain the values of environment variables as well as launch
external processes. None of these should interfere with seastar, on the
other hand the facilities they provide can come very handy for sstable
scripts.

Closes scylladb/scylladb#17126
2024-02-02 19:00:57 +03:00
Patryk Jędrzejczak
2687204c7f docs: dev: topology-over-raft: align indentation 2024-02-02 16:55:28 +01:00
Patryk Jędrzejczak
fdd3c3a280 docs: dev: topology-over-raft: document the rollback_to_normal state
In one of the previous patches, we changed the `rollback_to_normal`
state from a node state to a transition state. We document it
in this patch. The node state wasn't documented, so there is
nothing to remove.
2024-02-02 16:55:28 +01:00
Patryk Jędrzejczak
8d6a9730db topology_coordinator: improve logs in rollback_to_normal handler
After making `rollback_to_normal` a transition state, we can
distinguish a failed decommission from a failed bootstrap in the
`rollback_to_normal` handler. We use it to make logs more
descriptive.
2024-02-02 16:55:28 +01:00
Patryk Jędrzejczak
25b90f5554 raft topology: make rollback_to_normal a transition state
After changing `left_token_ring` from a node state to a transition
state in scylladb/scylladb#17009, we do the same for
`rollback_to_normal`. `rollback_to_normal` was created as a node
state because `left_token_ring` was a node state.

This change will allow us to distinguish a failed removenode from
a failed decommission in the `rollback_to_normal` handler.
Currently, we use the same logic for both of them, so it's not
required. However, this might change, as it has happened with the
decommission and the failed bootstrap/replace in the
`left_token_ring` state (scylladb/scylladb#16797). We are making
this change now because it would be much harder after branching.

The change also simplifies the code in
`topology_coordinator::rollback_current_topology_op`.

Moving the `rollback_to_normal` handler from
`handle_node_transition` to `handle_topology_transition` created a
large diff. There is only one change - adding
`auto node = get_node_to_work_on(std::move(guard));`.
2024-02-02 16:55:20 +01:00
Pavel Emelyanov
52e6398ad6 messaging: Add formatter for netw::msg_addr
As a part of ongoing "support fmt v10" effort

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17053
2024-02-02 15:20:40 +01:00
Kefu Chai
cd3c7a50ed scylla_raid_setup: drop unused import
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17095
2024-02-02 15:20:40 +01:00
Kefu Chai
e62b29bab7 tasks: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17125
2024-02-02 15:20:40 +01:00
Pavel Emelyanov
75bc702ae8 utils: Remove unused operator<< for file_lock object
The lock itself is only used by utils/directories code

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17051
2024-02-02 15:20:40 +01:00
Kefu Chai
792fa4441e docs: s/ontop/on top/
this misspelling is identified by codespell. ontop cannot be found
on merriam-webster, but "on top" can, see
https://www.merriam-webster.com/dictionary/on%20top, so let's
replace ontop with "on top".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17127
2024-02-02 15:20:40 +01:00
Botond Dénes
c9ab39af88 install-dependencies.sh: remove duplicate python3-pyudev package
It appeared in the list twice.

Closes scylladb/scylladb#17060
2024-02-02 15:20:40 +01:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Kefu Chai
deef78c796 sstable: capture return value of get0() using auto
instead of capturing the return value of `get0()` with a reference
type, use a plain type. as `get0()` returns a plain `T` while `get()`
returns a `T&&`, to avoid the value referenced by `T&&` gets destroyed
after the expression, let's use a plain `auto` instead of `auto&&`.
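
The lifetime concern can be sketched with a stand-in type. `toy_future` is a hypothetical class that only imitates a `get()` returning `T&&`, as described above; it is not Seastar's future:

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a future whose get() returns T&&.
template <typename T>
class toy_future {
    T _value;
public:
    explicit toy_future(T v) : _value(std::move(v)) {}
    T&& get() { return std::move(_value); }
};

inline toy_future<std::string> make_name() {
    return toy_future<std::string>("sstable-1");
}

inline std::string safe_capture() {
    // Plain `auto` move-constructs a std::string owned by this function,
    // so it outlives the temporary future.
    auto name = make_name().get();
    // `auto&& name = make_name().get();` would instead bind a reference
    // into the temporary future, which is destroyed at the end of that
    // statement, leaving the reference dangling.
    return name;
}
```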

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 22:12:18 +08:00
Kefu Chai
9fcca8f585 utils: result_loop: define result_type with decayed type
this change prepares for replacing `seastar::future::get0()` with
`seastar::future::get()`. the former's return type is a plain `T`,
while the latter is `T&&`. in this case `T` is
`boost::outcome::result<..>`. in order to extract its `error_type`,
we need to get its decayed type. since `std::remove_reference_t<T>`
also returns `T`, let's use it so it works with both `get0()` and `get()`.
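
A small sketch of why stripping the reference makes the member lookup work for both return types. `toy_result` and `error_of` are hypothetical names; the real code uses `boost::outcome::result<..>`:

```cpp
#include <type_traits>

// Hypothetical stand-in for a result type that exposes error_type.
struct toy_result {
    using error_type = int;
};

// Strip any reference before looking up the nested type, so the alias
// accepts both a plain T and a T&&.
template <typename T>
using error_of = typename std::remove_reference_t<T>::error_type;

static_assert(std::is_same_v<error_of<toy_result>, int>);
static_assert(std::is_same_v<error_of<toy_result&&>, int>);
```

Without `std::remove_reference_t`, `typename T::error_type` would be ill-formed when `T` is a reference type.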

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 22:12:18 +08:00
Kefu Chai
19025127c3 configure.py: add -Wextra to cflags
also disable some more warnings which are failing the build after
`-Wextra` is enabled. we can fix them on a case-by-case basis, if
they are genuine issues. but before that, we just disable them.

the goal of this change is to reduce the discrepancies between
the compile options used by CMake and those used by configure.py.
the side effect is that we enable some more warnings enabled by
`-Wextra`, for instance, `-Wsign-compare` is enabled now. for
the full list of the enabled warnings when building with Clang,
please see https://clang.llvm.org/docs/DiagnosticsReference.html#wextra.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 20:49:21 +08:00
Kefu Chai
aea6cd0b2d test/tablets: do not compare signed and unsigned
this change should silence following warning:

```
 test/boost/tablets_test.cc:1600:27: error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
19:47:04          for (int i = 0; i < smp::count * 20; i++) {
19:47:04                          ~ ^ ~~~~~~~~~~~~~~~
```
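
One common way to avoid this warning is to give the loop counter the same signedness as the bound. The sketch below assumes that approach; the actual change made to tablets_test.cc is not shown in this message, and `shard_count` is a hypothetical stand-in for `smp::count`:

```cpp
#include <cstddef>

// Hypothetical stand-in for seastar::smp::count, which is unsigned.
inline unsigned shard_count = 4;

inline std::size_t count_iterations() {
    std::size_t n = 0;
    // An unsigned counter keeps both operands of `<` unsigned, so
    // -Wsign-compare has nothing to report.
    for (unsigned i = 0; i < shard_count * 20; i++) {
        ++n;
    }
    return n;
}
```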

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 20:49:21 +08:00
Pavel Emelyanov
afda0f6ddf network_topology_strategy: Do not walk list of datacenters twice
The constructor of that class walks the provided options to get per-DC
replication factors. It does it twice -- first to populate the dc:rf
map, second to calculate the sum of provided RF values. The latter loop
can be optimized away.
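
A sketch of the single-pass shape, with hypothetical names (`dc_layout`, `build_layout`) that do not come from network_topology_strategy itself:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct dc_layout {
    std::map<std::string, std::size_t> rf_by_dc;  // the dc:rf map
    std::size_t total_rf = 0;                     // sum of all RF values
};

// Populates the map and accumulates the sum in the same walk over the
// options, instead of looping over the datacenters a second time.
inline dc_layout build_layout(
        const std::vector<std::pair<std::string, std::size_t>>& options) {
    dc_layout out;
    for (const auto& [dc, rf] : options) {
        // Count a datacenter only once, even if it is listed twice.
        if (out.rf_by_dc.emplace(dc, rf).second) {
            out.total_rf += rf;
        }
    }
    return out;
}
```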

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-02 14:39:24 +03:00
Pavel Emelyanov
06f9e7367c replication_strategy: Do not convert string RF into int twice
There are two replication strategy classes that validate string RF and
then convert it into integer. Since validation helper returns the parsed
value, it can be just used avoiding the 2nd conversion.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-02 14:38:17 +03:00
Pavel Emelyanov
a8cd3bc636 abstract_replication_strategy: Make validate_replication_factor return value
The helper in question checks if string RF is indeed an integer. Make
this helper return the "checked" integer value, because it does this
conversion. And rename it to parse_... to reflect what it now does. Next
patches will make use of this change.
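
The "validate and return the parsed value" shape might look like this sketch. `parse_rf` is a hypothetical name and `std::from_chars` is an assumption made for illustration; the helper's real name and implementation are not given in this message:

```cpp
#include <charconv>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>

// Checks that the text is a non-negative integer and returns the parsed
// value, so callers do not have to convert the string a second time.
inline int parse_rf(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    // Reject parse errors, trailing characters and negative values.
    if (ec != std::errc() || ptr != last || value < 0) {
        throw std::invalid_argument(
                "replication factor must be a non-negative integer: " +
                std::string(text));
    }
    return value;
}
```

A caller can then write `int rf = parse_rf(option_value);` and use `rf` directly.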

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-02 14:36:47 +03:00
Kefu Chai
e56e74df0a db: add formatter for dht::ring_position_ext
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `dht::ring_position_ext`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 18:37:56 +08:00
Kefu Chai
bb3ba81b15 db: add formatter for dht::ring_position_view
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `dht::ring_position_view`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-02 18:36:17 +08:00
Pavel Emelyanov
9450a03cdf data_dictionary: Add formatter for keyspace-metadata
Other than being fmt v10 compatible, it's also shorter and easier to
read, thanks to fmt::join() helper

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17115
2024-02-02 11:26:39 +02:00
Kefu Chai
c7a01b9eb4 transport: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17092
2024-02-02 11:20:24 +02:00
Lakshmi Narayanan Sreethar
e86965c272 compaction: run rewrite_sstables_compaction_task_executor tasks in maintenance group
Use maintenance group to run all the compaction tasks that use the
rewrite_sstables_compaction_task_executor.

Fixes #16699

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#17112
2024-02-02 11:18:49 +02:00
Pavel Emelyanov
b557dcbf5a cql3: Sanitize ALTER KEYSPACE check for non-local storages
This kills three birds with one stone

1. fixes broken indentation
2. re-uses new_options local variable
3. stops using string literal to check storage type

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17111
2024-02-02 11:13:29 +02:00
Botond Dénes
63d44712af Merge 'storage_service: Fix indentation for stream_ranges' from Asias He
This is a follow up of "storage_service: Run stream_ranges cmd in streaming group" to fix indentation and drop an unnecessary co_return.

Refs: #17090

Closes scylladb/scylladb#17114

* github.com:scylladb/scylladb:
  storage_service: Drop unnecessary co_return in raft_topology_cmd_handler
  storage_service: Fix indentation for stream_ranges
2024-02-02 11:12:52 +02:00
Kefu Chai
b45af994c2 locator/utils: remove stale comment
this comment has already served its purpose when rewriting
C* in C++. since we've re-implemented it, there is no need to keep it
around.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17120
2024-02-02 11:07:35 +02:00
Asias He
23a8b0552c storage_service: Drop unnecessary co_return in raft_topology_cmd_handler
It is introduced in "storage_service: Run stream_ranges cmd in streaming
group".

Refs: #17090
2024-02-02 08:20:06 +08:00
Asias He
732a9b5253 storage_service: Fix indentation for stream_ranges
Fixes the indentation introduced in "storage_service: Run
stream_ranges cmd in streaming group".

Refs: #17090
2024-02-02 08:20:03 +08:00
Pavel Emelyanov
66b859a29f gms: Remove unused operator<< for feature object
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17109
2024-02-01 19:00:46 +02:00
Kefu Chai
aad8035bed replica/database: use structured-bind when appropriate
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17104
2024-02-01 16:31:29 +02:00
Botond Dénes
dc8e13baed Merge 'Move some tablets tests from topology_custom to cql-pytest' from Pavel Emelyanov
The latter suite is now tablets-aware and tablets cases from the former one can happily work with a single shared scylla instance

Closes scylladb/scylladb#17101

* github.com:scylladb/scylladb:
  test/topology_custom: Remove test_tablets.py
  test/topology: Move test_tablet_change_initial_tablets
  test/topology: Move test_tablet_explicit_disabling
  test/topology: Move test_tablet_default_initialization
  test/topology: Move test_tablet_change_replication_strategy
  test/topology: Move test_tablet_change_replication_vnode_to_tablets
  cql-pytest: Add skip_without_tablets fixture
2024-02-01 16:28:43 +02:00
Kamil Braun
c911bf1a33 test_raft_snapshot_request: fix flakiness (again)
At the end of the test, we wait until a restarted node receives a
snapshot from the leader, and then verify that the log has been
truncated.

To check the snapshot, the test used the `system.raft_snapshots` table,
while the log is stored in `system.raft`.

Unfortunately, the two tables are not updated atomically when Raft
persists a snapshot (scylladb/scylladb#9603). We first update
`system.raft_snapshots`, then `system.raft` (see
`raft_sys_table_storage::store_snapshot_descriptor`). So after the wait
finishes, there's no guarantee the log has been truncated yet -- there's
a race between the test's last check and Scylla doing that last delete.

But we can check the snapshot using `system.raft` instead of
`system.raft_snapshots`, as `system.raft` has the latest ID. And since
1640f83fdc, storing that ID and truncating
the log in `system.raft` happens atomically.

Closes scylladb/scylladb#17106
2024-02-01 16:06:12 +02:00
Kefu Chai
946d281d39 exceptions: s/#warn/#warning/
`#warning` is a preprocessor directive in C/C++, while `#warn` is not. the
reason we haven't run into the build failure caused by this is likely
that we are only building on amd64/aarch64 with libstdc++ at the time
of writing.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17074
2024-02-01 14:50:17 +02:00
Botond Dénes
1a0300dba6 Merge 'compaction_manager: flush tables before cleanup' from Kefu Chai
according to the document "nodetool cleanup"

> Triggers removal of data that the node no longer owns

currently, scylla performs cleanup by rewriting the sstables. but
commitlog segments may still contain the mutations to the tables
which are dropped during sstable rewriting. when scylla server
restarts, the dirty mutations are replayed to the memtable. if
any of these dirty mutations change the tables that were cleaned up,
the stale data is reapplied. this would lead to data resurrection.

so, in this change we follow the same model as major compaction,
where we

1. force a new active segment,
2. flush the tables being cleaned up,
3. perform cleanup using compaction

Fixes #4734

Closes scylladb/scylladb#16757

* github.com:scylladb/scylladb:
  storage_service: fall back to local cleanup in cleanup_all
  compaction: format flush_mode without the helper
  compaction_manager: flush all tables before cleanup
  replica: table: pass do_flush to table::perform_cleanup_compaction()
  api, compaction: promote flush_mode
2024-02-01 13:47:45 +02:00
libo-sober
a341b870bc Remove unnecessary calculations in integrity_checked_file_impl::write_dma.
Use calculated `rbuf_end` in `std::mismatch` to reduce unnecessary calculations.

Closes scylladb/scylladb#16979
2024-02-01 13:42:59 +02:00
Botond Dénes
8debb6b98f Merge 'storage_service: Run stream_ranges cmd in streaming group' from Asias He
Otherwise it will inherit the rpc verb's scheduling group which is gossip. As a result, streaming runs in the wrong scheduling group.

Fixes #17090

Closes scylladb/scylladb#17097

* github.com:scylladb/scylladb:
  streaming: Verify stream consumer runs inside streaming group
  storage_service: Run stream_ranges cmd in streaming group
2024-02-01 13:18:26 +02:00
Patryk Wrobel
25324bbe50 cql_test_env.cc: remove dead code
This change removes an empty anonymous namespace,
which is dead code.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17099
2024-02-01 13:17:48 +02:00
Pavel Emelyanov
64cb3a6496 test/topology_custom: Remove test_tablets.py
It's now empty; all test cases have been moved to cql-pytest

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 13:59:51 +03:00
Pavel Emelyanov
3fbe93e45d test/topology: Move test_tablet_change_initial_tablets
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 13:59:51 +03:00
Pavel Emelyanov
480227fcad test/topology: Move test_tablet_explicit_disabling
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 13:59:51 +03:00
Pavel Emelyanov
45b0490100 test/topology: Move test_tablet_default_initialization
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 13:59:51 +03:00
Pavel Emelyanov
3258c56ca3 test/topology: Move test_tablet_change_replication_strategy
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 13:59:51 +03:00
Pavel Emelyanov
6f50cc2783 test/topology: Move test_tablet_change_replication_vnode_to_tablets
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 13:59:51 +03:00
Botond Dénes
b9af2efcb1 Merge 'directories: prevent inode cache fragmentation by orderly verifying data directory contents' from Lakshmi Narayanan Sreethar
During startup, the contents of the data directory are verified to ensure that they have the right owner and permissions. Verifying all the contents, which includes files that will be read and closed immediately, and files that will be held open for longer durations, together, can lead to memory fragmentation in the dentry/inode cache.

Mitigate this by updating the verification in such a way that these two sets of files are verified separately, ensuring their separation in the dentry/inode cache.

Fixes https://github.com/scylladb/scylladb/issues/14506

Closes scylladb/scylladb#16952

* github.com:scylladb/scylladb:
  directories: prevent inode cache fragmentation by orderly verifying data directory contents
  directories: skip verifying data directory contents during startup
  directories: co-routinize create_and_verify
2024-02-01 12:30:07 +02:00
Kefu Chai
4ec104e086 api: storage_service: correct a typo
s/a any keyspace/a given keyspace/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17098
2024-02-01 10:55:58 +02:00
Botond Dénes
2a4b991772 Merge 'Fix mintimeuuid() call that could crash Scylla' from Nadav Har'El
This PR fixes a bug where certain calls to the `mintimeuuid()` CQL function with large negative timestamps could crash Scylla. It turns out we already had protections in place against very positive timestamps, but very negative timestamps could still cause bugs.

The actual fix in this series is just a few lines, but the bigger effort was improving the test coverage in this area. I added tests for the "date" type (the original reproducer for this bug used totimestamp() which takes a date parameter), and also reproducers for this bug directly, without totimestamp() function, and one with that function.

Finally this PR also replaces the assert() which made this molehill-of-a-bug into a mountain, by a throw.

Fixes #17035

Closes scylladb/scylladb#17073

* github.com:scylladb/scylladb:
  utils: replace assert() by on_internal_error()
  utils: add on_internal_error with common logger
  utils: add a timeuuid minimum, like we had maximum
  test/cql-pytest: tests for "date" type
2024-02-01 10:48:48 +02:00
Patryk Wrobel
6e5a85c387 replica/table: add tablet count metric
This change introduces a new metric called tablet_count
that is recalculated during construction of table object
and on each call to table::update_effective_replication_map().

To get the count of tablets on the current shard, the tablet map
is traversed and for each tablet_id tablet_map::get_shard()
is called. Its return value is compared with this_shard_id().

The new metric is maintained and exposed only for tables
that use tablets.
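
The counting described above can be sketched roughly as below. `toy_tablet_map` is a hypothetical stand-in that only mimics the `get_shard()` lookup mentioned in this message; it is not the real `tablet_map` type, and the shard to compare against is passed in explicitly instead of calling `this_shard_id()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using shard_id = unsigned;

// Hypothetical stand-in for a tablet map: the index is the tablet id and
// the value is the shard that owns the tablet on this node.
struct toy_tablet_map {
    std::vector<shard_id> shard_of_tablet;
    std::size_t tablet_count() const { return shard_of_tablet.size(); }
    shard_id get_shard(std::size_t tablet_id) const { return shard_of_tablet[tablet_id]; }
};

// Walk every tablet and count the ones owned by `this_shard`.
inline std::size_t count_tablets_on_shard(const toy_tablet_map& tmap, shard_id this_shard) {
    std::size_t count = 0;
    for (std::size_t id = 0; id < tmap.tablet_count(); ++id) {
        if (tmap.get_shard(id) == this_shard) {
            ++count;
        }
    }
    return count;
}
```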

Refs: scylladb#16131
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17056
2024-02-01 10:46:53 +02:00
Asias He
2888c3086c utils: Add uuid_xor_to_uint32 helper
Convert the uuid to a uint32_t using xor.
It is useful to get a uint32_t number from the uuid.
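
One plausible way to fold a 128-bit UUID into 32 bits with xor is shown below. This is a sketch under the assumption that the UUID is available as two 64-bit halves; the name `xor_fold_to_uint32` is hypothetical and the actual `uuid_xor_to_uint32` helper may fold the bits differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: xor the two 64-bit halves together, then xor the
// upper and lower 32 bits of the result.
inline std::uint32_t xor_fold_to_uint32(std::uint64_t msb, std::uint64_t lsb) {
    const std::uint64_t folded64 = msb ^ lsb;  // 128 bits -> 64 bits
    return static_cast<std::uint32_t>(folded64 >> 32)
         ^ static_cast<std::uint32_t>(folded64);  // 64 bits -> 32 bits
}
```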

Refs: #16927

Closes scylladb/scylladb#17049
2024-02-01 10:27:55 +02:00
Botond Dénes
f5917b215f Merge 'replica, tablet_allocator: do not compare unsigned with signed' from Kefu Chai
this series addresses a couple of `-Wsign-compare` warnings surfaced in the tree.

Closes scylladb/scylladb#17091

* github.com:scylladb/scylladb:
  tablet_allocator: do not compare signed and unsigned
  replica: table: do not compare signed with unsigned
2024-02-01 10:26:04 +02:00
Kefu Chai
7a8e8c2ced db: add formatter for db::write_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `db::write_type`, and drop
its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17093
2024-02-01 10:22:45 +02:00
Kefu Chai
005d231f96 db: add formatter for gms::application_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `gms::application_state`,
but its operator<< is preserved, as it is still used by the generic
homebrew formatter for `std::unordered_map<>`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17096
2024-02-01 10:02:25 +02:00
Pavel Emelyanov
ab7ce3d1fa cql-pytest: Add skip_without_tablets fixture
It's the opposite of the skip_with_tablets fixture and thus also
depends on the scylla_only fixture

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-01 10:58:13 +03:00
Lakshmi Narayanan Sreethar
dbe758d309 directories: prevent inode cache fragmentation by orderly verifying data directory contents
During startup, the contents of the data directory are verified to ensure
that they have the right owner and permissions. Verifying all the
contents, which includes files that will be read and closed immediately,
and files that will be held open for longer durations, together, can
lead to memory fragmentation in the dentry/inode cache.

Prevent this by updating the verification in such a way that these two
sets of files are verified separately, ensuring their separation in
the dentry/inode cache.

Fixes #14506

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-01 12:20:23 +05:30
Lakshmi Narayanan Sreethar
74a4085426 directories: skip verifying data directory contents during startup
This is in preparation for a subsequent patch that will verify the
contents of the data directory in a specific order.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-01 11:54:59 +05:30
Lakshmi Narayanan Sreethar
2e3d2498f4 directories: co-routinize create_and_verify
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-02-01 11:41:10 +05:30
Kefu Chai
5e0b3671d3 storage_service: fall back to local cleanup in cleanup_all
before this change, if no keyspaces are specified,
scylla-nodetool just enumerates all non-local keyspaces, and
calls "/storage_service/keyspace_cleanup" on them one after another.
this is not quite efficient, as each of these RESTful API calls
forces a new active commitlog segment, and flushes all tables.
so, if the target node of this command has N non-local keyspaces,
it would repeat the steps above N times. this is not necessary.
and after a topology change, we would like to run a global
"nodetool cleanup" without specifying the keyspace, so this
is a typical use case which we do care about.

to address this performance issue, in this change, we improve
an existing RESTful API call "/storage_service/cleanup_all", so
if the topology coordinator is not enabled, we fall back to
a local cleanup to clean up all non-local keyspaces.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
4f90a875f6 compaction: format flush_mode without the helper
since flush_mode is moved out of major_compaction_task_impl, let's
drop the helper hosted in that class as well, and implement the
formatter without it.

please note, the `__builtin_unreachable()` is dropped. it should
not change the behavior of the formatter. we don't put it in the
`default` branch, in the hope that `-Wswitch` can warn us when
another enumerator of `flush_mode` is added but we fail to handle
it.
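
a minimal sketch of the pattern, assuming three enumerators for illustration (the enumerator names and the `to_string` helper are assumptions, not copied from the tree). since the `switch` has no `default` label, `-Wswitch` reports any enumerator that is added later but not handled here:

```cpp
#include <cassert>
#include <string_view>

// Illustrative enum; the enumerator names are an assumption.
enum class flush_mode { skip, compacted_tables, all_tables };

constexpr std::string_view to_string(flush_mode m) {
    // No `default` label on purpose: with -Wswitch the compiler warns when
    // a new enumerator is added but not handled below.
    switch (m) {
    case flush_mode::skip:
        return "skip";
    case flush_mode::compacted_tables:
        return "compacted_tables";
    case flush_mode::all_tables:
        return "all_tables";
    }
    // Only reachable with a value outside the enumerators listed above.
    return "(invalid)";
}
```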

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
b39cc01bb3 compaction_manager: flush all tables before cleanup
according to the document "nodetool cleanup"

> Triggers removal of data that the node no longer owns

currently, scylla performs cleanup by rewriting the sstables. but
commitlog segments may still contain the mutations to the tables
which are dropped during sstable rewriting. when scylla server
restarts, the dirty mutations are replayed to the memtable. if
any of these dirty mutations changes the tables cleaned up, the
stale data are reapplied. this would lead to data resurrection.

so, in this change we follow the same model of major compaction:

1. force new active segment,
2. flush all tables
3. perform cleanup using compaction, which rewrites the sstables
   of specified tables
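
the ordering of the three steps above can be sketched with a toy model. `toy_database` and its member functions are hypothetical and only record the order of operations; they are not the real database or compaction manager API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model that only records which step ran, and in what order.
struct toy_database {
    std::vector<std::string> steps;
    void force_new_active_segment() { steps.push_back("force_new_active_segment"); }
    void flush_all_tables() { steps.push_back("flush_all_tables"); }
    void rewrite_sstables(const std::string& table) { steps.push_back("cleanup:" + table); }
};

// Once the old commitlog segments are closed and every memtable is flushed,
// nothing left in the commitlog can replay data that cleanup is about to drop.
inline void cleanup_tables(toy_database& db, const std::vector<std::string>& tables) {
    db.force_new_active_segment();   // step 1
    db.flush_all_tables();           // step 2
    for (const auto& table : tables) {
        db.rewrite_sstables(table);  // step 3
    }
}
```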

because we already `flush()` all tables in
`cleanup_keyspace_compaction_task_impl::run()`, there is no need to
call `flush()` again, in `table::perform_cleanup_compaction()`, so
the `flush()` call is dropped in this function, and the tests using
this function are updated to call `flush()` manually to preserve
the existing behavior.

there are two callers of `cleanup_keyspace_compaction_task_impl`,

* one is `storage_service::sstable_cleanup_fiber()`, which listens
  for the events fired by topology_state_machine, which is in turn
  driven by, for instance, the "/storage_service/cleanup_all" API,
  which cleans up all keyspaces one after another.
* another is "/storage_service/keyspace_cleanup", which cleans up
  the specified keyspace.

in the first use case, we can force a new active segment only
once, so another parameter to the ctor of
`cleanup_keyspace_compaction_task_impl` is introduced to specify if
the `db.flush_all_tables()` call should be skipped.

please note, there are two possible optimizations,

1. force a new active segment only if the mutations in it touch the
   tables being cleaned up
2. after forcing a new active segment, only flush the (mem)tables
   mutated by the non-active segments

but let's leave them for follow-up changes. this change is a
minimal fix for the data resurrection issue.

Fixes #16757
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
34d80690fa replica: table: pass do_flush to table::perform_cleanup_compaction()
this parameter defaults to do_flush::yes, so the existing behavior is
preserved. and this change prepares for a change which flushes all
tables before performing cleanup on the tables on demand.

please note, we cannot pass compaction::flush_mode to this function,
as it is used by compaction/task_manager_module.hh, if we want to
share it by both database.hh and compaction/task_manager_module.hh,
we would have to find it a new home. so `table::do_flush` boolean
tag is reused instead.

Refs #16757

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
9afec2e3e7 api, compaction: promote flush_mode
so that this enum type can be shared by other task(s) as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Kefu Chai
110d2e52be tablet_allocator: do not compare signed and unsigned
`available_shards` could be negative when `resize_plan` is empty, and
the loop to build `resize_plan` stops at the next iteration after
`available_shards` is assigned with a negative number. so, instead of
making it an `unsigned`, let's just compare it using `std::cmp_less()`.

this change should silence the following warning:

```
/home/kefu/.local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -g -O0 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wignored-qualifiers -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT service/CMakeFiles/service.dir/Debug/tablet_allocator.cc.o -MF service/CMakeFiles/service.dir/Debug/tablet_allocator.cc.o.d -o service/CMakeFiles/service.dir/Debug/tablet_allocator.cc.o -c /home/kefu/dev/scylladb/service/tablet_allocator.cc
/home/kefu/dev/scylladb/service/tablet_allocator.cc:529:60: error: comparison of integers of different signs: 'long' and 'const size_t' (aka 'const unsigned long') [-Werror,-Wsign-compare]
  529 |             if (resize_plan.size() > 0 && available_shards < size_desc.shard_count) {
      |                                           ~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:01:19 +08:00
Kefu Chai
493a608417 replica: table: do not compare signed with unsigned
this change helps to silence the following warning:
```
/home/kefu/dev/scylladb/replica/table.cc:1952:26: error: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
 1952 |     for (auto id = 0; id < _storage_groups.size(); id++) {
      |                       ~~ ^ ~~~~~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:01:19 +08:00
Asias He
e1fc91bea9 streaming: Verify stream consumer runs inside streaming group
This will catch accidental scheduling group leaks.

Refs: #17090
2024-02-01 10:37:24 +08:00
Asias He
f103f75ed8 storage_service: Run stream_ranges cmd in streaming group
Otherwise it will inherit the rpc verb's scheduling group which is
gossip. As a result, streaming runs in the wrong scheduling
group.

Fixes #17090
2024-02-01 10:20:02 +08:00
Kamil Braun
b2c02d8268 Merge 'schema: column_mapping::{static,regular}_column_at(): use on_internal_error()' from Botond Dénes
Instead of std::out_of_range(). Accessing a non-existing column is a
serious bug and the backtrace coming with `on_internal_error()` can be
very valuable when debugging it. As can be the coredump that is possible
to trigger with `--abort-on-internal-error`.

This change follows another similar change to `schema::column_at()`.

This should help us get to the bottom of the mysterious repair failures
caused by invalid column access, seen in
https://github.com/scylladb/scylladb/issues/16821.

Refs: https://github.com/scylladb/scylladb/issues/16821

Closes scylladb/scylladb#17080

* github.com:scylladb/scylladb:
  schema: column_mapping::{static,regular}_column_at(): use on_internal_error()
  schema: column_mapping: move column accessors out-of-line
2024-01-31 16:29:15 +01:00
Nadav Har'El
458fd0c2f7 utils: replace assert() by on_internal_error()
In issue #17035 we had a situation where a certain input timestamp
could result in the create_time() utility function getting called on
a timestamp that cannot be represented as timeuuid, and this resulted
in an *assertion failure*, and a crash.

I guess we used an assertion because we believed that callers try to
avoid calling this function on excessively large timestamps, but
evidently, they didn't try hard enough and we got a crash.
The code in UUID_gen.hh changed a lot over the years and has become
very convoluted and it is almost impossible to understand all the
code paths that could lead to this assertion failure. So it's better
to replace this assertion with on_internal_error(), which by default
is just an exception - and also logs the backtrace of the failure.
Issue #17035 would have been much less serious if we had an exception
instead of an assert.

Refs #17035
Refs #7871, Refs #13970 (removes an assert)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-31 16:45:28 +02:00
Nadav Har'El
259811b6ec utils: add on_internal_error with common logger
Seastar's on_internal_error() is a useful replacement for assert()
but it requires each caller to supply a logger - which is often
inconvenient, especially when the caller is a header file.

So in this patch we introduce a utils::on_internal_error() function
which is the same as seastar::on_internal_error() (the former calls
the latter), except it uses a single logger instead of asking the caller
to pass a logger.

Refs #7871

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-31 16:45:09 +02:00
Patryk Wrobel
c6de20a608 replica/mutation_dump.cc: move stringstream content instead of copying it
C++20 introduced a new overload of std::stringstream::str()
that is selected when the mentioned member function is called
on r-value.

The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.

This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.

Moreover, it introduces usage of std::stringstream::view() when
checking if the stream contains some characters. It skips another
copy of the underlying string, because std::string_view is returned.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17084
2024-01-31 14:58:20 +02:00
Pavel Emelyanov
7c5c89ba8d Revert "Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel"
This reverts commit 370fbd346c, reversing
changes made to 0912d2a2c6.

This makes scylla-manager mis-interpret the data_file_directories
somehow, issue #17078
2024-01-31 15:08:14 +03:00
Avi Kivity
c8397f0287 Merge 'Implement tablet splitting' from Raphael "Raph" Carvalho
The motivation for tablet resizing is that we want to keep the average tablet size reasonable, such that load rebalancing can remain efficient. A tablet that is too large makes migration inefficient, which slows down the balancer.

If the avg size grows beyond the upper bound (split threshold), then the balancer decides to split. Split spans all tablets of a table, due to the power-of-two constraint.

Likewise, if the avg size decreases below the lower bound (merge threshold), then merge takes place in order to grow the avg size. Merge is not implemented yet, although this series lays the foundation for it to be implemented later on.

A resize decision can be revoked if the avg size changes and the decision is no longer needed. For example, let's say a table is being split and the avg size drops below the target size (which is 50% of the split threshold and 100% of the merge one). That means after split, the avg size would drop below the merge threshold, causing a merge after split, which is wasteful, so it's better to just cancel the split.
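
The threshold arithmetic above can be sketched as follows. All names here (`resize_thresholds`, `decide`, `should_revoke_split`) are hypothetical and the real load balancer logic may differ; the sketch only encodes the relations stated in this message: the target size is 50% of the split threshold and 100% of the merge threshold.

```cpp
#include <cassert>
#include <cstdint>

enum class resize_type { none, split, merge };

// Hypothetical helper deriving both thresholds from the target tablet size.
struct resize_thresholds {
    std::uint64_t target_size;
    // The target is 50% of the split threshold.
    std::uint64_t split_threshold() const { return target_size * 2; }
    // The target is 100% of the merge threshold.
    std::uint64_t merge_threshold() const { return target_size; }
};

// Decide based on the average tablet size of a table.
inline resize_type decide(std::uint64_t avg_size, const resize_thresholds& t) {
    if (avg_size > t.split_threshold()) {
        return resize_type::split;
    }
    if (avg_size < t.merge_threshold()) {
        return resize_type::merge;
    }
    return resize_type::none;
}

// An in-flight split is no longer wanted once the average drops below the
// target size, since the halved tablets would immediately qualify for merge.
inline bool should_revoke_split(std::uint64_t avg_size, const resize_thresholds& t) {
    return avg_size < t.target_size;
}
```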

Tablet metadata gains 2 new fields for managing this:
resize_type: resize decision type, can be either of "merge", "split", or "none".
resize_seq_number: a sequence number that works as the global identifier of the decision (monotonically increasing, increased by 1 on every new decision emitted by the coordinator).

A new RPC was implemented to pull stats from each table replica, such that the load balancer can calculate the avg tablet size and know the "split status" for a given table. Avg size is aggregated carefully while taking the RF of each DC into account (which might differ).
When a table is done splitting its storage, it loads (mirrors) the resize_seq_number from tablet metadata into its local state (in other words, "my split status is ready"). If a table is split ready, the coordinator will see that the table's seq number is the same as the one in tablet metadata. This helps to distinguish stale decisions from the latest one (in case decisions are revoked and re-emitted later on). Also, it's aggregated carefully, by taking the minimum among all replicas, so the coordinator will only update topology when all replicas are ready.
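
The "minimum among all replicas" aggregation can be illustrated with a short sketch. The function name and the plain vector of sequence numbers are hypothetical; the real stats travel over the RPC described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical check: the table is split ready only when the smallest
// sequence number reported by any replica matches the one in tablet metadata.
inline bool all_replicas_split_ready(const std::vector<std::uint64_t>& replica_seq_numbers,
                                     std::uint64_t metadata_seq_number) {
    if (replica_seq_numbers.empty()) {
        return false;  // no reports yet, so nothing can be finalized
    }
    const std::uint64_t min_seq =
        *std::min_element(replica_seq_numbers.begin(), replica_seq_numbers.end());
    return min_seq == metadata_seq_number;
}
```

Taking the minimum means a single lagging replica (still on an older decision) keeps the whole table from being reported as ready.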

When the load balancer emits a split decision, replicas listen for the need to split with a "split monitor" that is awakened once a table has its replication metadata updated and detects the need for split (i.e. the resize_type field is "split").
The split monitor will start splitting the compaction groups (using the mechanism introduced here: 081f30d149) for the table. And once the splitting work is completed, the table updates its local state as having completed split.

When the coordinator pulls the split status of all replicas for a table via RPC, the balancer can see whether that table is ready for "finalizing" the decision, which is about updating tablet metadata to split each tablet into two. Once table replicas have their replication metadata updated with the new tablet count, they can appropriately update their set of compaction groups (that were previously split in the preparation step).

Fixes #16536.

Closes scylladb/scylladb#16580

* github.com:scylladb/scylladb:
  test/topology_experimental_raft: Add tablet split test
  replica: Bypass reshape on boot with tablets temporarily
  replica: Fix table::compaction_group_for_sstable() for tablet streaming
  test/topology_experimental_raft: Disable load balancer in test fencing
  replica: Remap compaction groups when tablet split is finalized
  service: Split tablet map when split request is finalized
  replica: Update table split status if completed split compaction work
  storage_service: Implement split monitor
  topology_cordinator: Generate updates for resize decisions made by balancer
  load_balancer: Introduce metrics for resize decisions
  db: Make target tablet size a live-updateable config option
  load_balancer: Implement resize decisions
  service: Wire table_resize_plan into migration_plan
  service: Introduce table_resize_plan
  tablet_mutation_builder: Add set_resize_decision()
  topology_coordinator: Wire load stats into load balancer
  storage_service: Allow tablet split and migration to happen concurrently
  topology_coordinator: Periodically retrieve table_load_stats
  locator: Introduce topology::get_datacenter_nodes()
  storage_service: Implement table_load_stats RPC
  replica: Expose table_load_stats in table
  replica: Introduce storage_group::live_disk_space_used()
  locator: Introduce table_load_stats
  tablets: Add resize decision metadata to tablet metadata
  locator: Introduce resize_decision
2024-01-31 13:59:56 +02:00
Kefu Chai
bd71e0b794 tracing: add formatter for tracing::span_id
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `tracing::span_id`, and drop
its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17058
2024-01-31 13:43:46 +02:00
Kefu Chai
f5e3a2d98e test.py: add boost_tests() to suite
this change is a cleanup.

so it only returns tests, to be more symmetric with `junit_tests()`.
this allows us to drop the dummy `get_test_case()` in `PythonTestSuite`.
as only the BoostTest will be asked for `get_test_case()` after this
change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16961
2024-01-31 13:43:21 +02:00
Botond Dénes
181f68f248 Merge 'raft_group0: trigger snapshot if existing snapshot index is 0' from Kamil Braun
The persisted snapshot index may be 0 if the snapshot was created in
an older version of Scylla, which means snapshot transfer won't be
triggered to a bootstrapping node. Commands present in the log may not
cover all schema changes --- group 0 might have been created through the
upgrade procedure, on a cluster with existing schema. So a
deployment with index=0 snapshot is broken and we need to fix it. We can
use the new `raft::server::trigger_snapshot` API for that.

Also add a test.

Fixes scylladb/scylladb#16683

Closes scylladb/scylladb#17072

* github.com:scylladb/scylladb:
  test: add test for fixing a broken group 0 snapshot
  raft_group0: trigger snapshot if existing snapshot index is 0
2024-01-31 13:04:59 +02:00
Kefu Chai
843d74428d configure.py: s/-DBOOST_TEST_DYN_LINK/-DBOOST_ALL_DYN_LINK/
we add `-DBOOST_TEST_DYN_LINK` to the cflags when `--static-boost` is
not passed to `configure.py`. but we never pass this option to
`configure.py` in our CI/CD. also, we don't install `boost-static` in
`install-dependencies.sh`, so the linker always uses the boost shared
libraries when building scylla and other executables in this project.
this fact has been verified with the latest master HEAD, after building
scylla from `build.ninja` which was in turn created using `configure.py`.

Seastar::seastar_testing exposes `Boost::dynamic_linking` in its public
interface, and `Boost::dynamic_linking` exposes `-DBOOST_ALL_DYN_LINK`
as one of its cflags.

so, when building tests using CMake, the tests are compiled with
`-DBOOST_ALL_DYN_LINK`, while when building tests using `configure.py`,
they are compiled with `-DBOOST_TEST_DYN_LINK`. the former is exposed
by `Boost::dynamic_linking`, the latter is hardwired using
`configure.py`. but the net results are identical. it would be better
to use identical cflags on these two building systems. so, let's use
`-DBOOST_ALL_DYN_LINK` in `configure.py` also. furthermore, this is what
non-static-boost implies.

please note, we don't consume the cflags exposed by
`seastar-testing.pc`, so they don't override the ones we set using
`configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17070
2024-01-31 12:21:31 +02:00
Botond Dénes
ecf654ea11 schema: column_mapping::{static,regular}_column_at(): use on_internal_error()
Instead of std::out_of_range(). Accessing a non-existing column is a
serious bug and the backtrace coming with on_internal_error() can be
very valuable when debugging it. As can be the coredump that is possible
to trigger with --abort-on-internal-error.

This change follows another similar change to schema::column_at().
2024-01-31 05:12:33 -05:00
Botond Dénes
03ed9f77ff schema: column_mapping: move column accessors out-of-line
To facilitate further patching.
2024-01-31 05:06:34 -05:00
Lakshmi Narayanan Sreethar
b5e1097858 build: cmake: include raft.cc in api library
When building with cmake, include the raft source files introduced by
commit 617e0913 as sources for api library target.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#17075
2024-01-31 11:39:41 +02:00
Nadav Har'El
827c20467c utils: add a timeuuid minimum, like we had maximum
Our time-handling code in UUID_gen.hh is very fragile for very large
timestamps, because the different types - such as Cassandra "timestamp"
and Timeuuid use very different resolution and ranges.

In issue #17035 we discovered a situation where a certain CQL
"timestamp"-type value could cause an assertion-failure and a crash
in the create_time() function that creates a timeuuid - because that
timestamp didn't fit the place we have in timeuuid.

We already added in the past a limit, UUID_UNIXTIME_MAX, beyond which
we refuse timestamps, to avoid these assertion failures. However, we
missed the possibility of *negative* timestamps (which are allowed in
CQL), and indeed a negative timestamp (or a timestamp which was "wrapped"
to a negative value) is what caused issue #17035.

So this patch adds a second limit, UUID_UNIXTIME_MIN - limiting the
most negative timestamp that we support to well below the area which
causes problems, and adds tests that reproduce #17035 and that we
didn't break anything else (e.g., negative timestamps are still
allowed - just not extremely negative timestamps).
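
A rough sketch of a two-sided range check is shown below. The limit values and the names `toy_unixtime_min_ms`, `toy_unixtime_max_ms` and `checked_timestamp_ms` are hypothetical placeholders, not the real UUID_UNIXTIME_MIN and UUID_UNIXTIME_MAX values, and the sketch throws a standard exception where the real code reports the error in its own way.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical limits in milliseconds since the Unix epoch. The real
// constants in UUID_gen.hh have different values.
constexpr std::int64_t toy_unixtime_min_ms = -1'000'000'000'000;
constexpr std::int64_t toy_unixtime_max_ms = 1'000'000'000'000;

// Reject timestamps outside [min, max] with an exception instead of letting
// them reach code that would assert on them.
inline std::int64_t checked_timestamp_ms(std::int64_t ts_ms) {
    if (ts_ms < toy_unixtime_min_ms || ts_ms > toy_unixtime_max_ms) {
        throw std::invalid_argument("timestamp outside the supported timeuuid range");
    }
    return ts_ms;
}
```

Moderately negative timestamps still pass the check; only values beyond either limit are refused.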

Fixes #17035.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-31 11:32:26 +02:00
Kamil Braun
bb22e06a9e Merge 'abort failed rebuild instead of retrying it forever' from Gleb
Add error handling to rebuild instead of retrying it until it succeeds.

* 'gleb/rebuild-fail-v2' of github.com:scylladb/scylla-dev:
  test: add test for rebuild failure
  test: add expected_error to rebuild_node operation
  topology_coordinator: Propagate rebuild failure to the initiator
2024-01-31 10:07:28 +01:00
Nadav Har'El
47955642d9 test/cql-pytest: tests for "date" type
This patch adds a few simple tests for the values of the "date" column
type, how it can be initialized from strings or integers, and what
those values mean.

Two of the tests reproduce issue #17066, where validation is missing
for values that don't fit in a 32-bit unsigned integer.

Refs #17066

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-31 10:58:02 +02:00
Patryk Wrobel
1b6ab65c51 reader_concurrency_semaphore.cc: move stringstream content instead of copying it
C++20 introduced a new overload of std::stringstream::str()
that is selected when the mentioned member function is called
on r-value.

The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.

This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17064
2024-01-31 09:31:50 +02:00
Botond Dénes
f8d3070559 Merge 'Fix flakiness in test_raft_snapshot_request' from Kamil Braun
Add workaround for scylladb/python-driver#295.

Also, an assert made at the end of the test was false; it is fixed, with
an appropriate comment added.

Closes scylladb/scylladb#17071

* github.com:scylladb/scylladb:
  test_raft_snapshot_request: fix flakiness
  test: topology/util: update comment for `reconnect_driver`
2024-01-31 09:30:27 +02:00
Pavel Emelyanov
84ddc37130 utils: Coroutinize disk_sanity()
It's pretty hairy in its future-promises form, with coroutines it's
much easier to read

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17052
2024-01-31 09:20:21 +02:00
Kefu Chai
8a9f13c187 redis: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17057
2024-01-31 09:17:18 +02:00
Kefu Chai
b931d93668 treewide: fix misspellings in code comments
these misspellings are identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17004
2024-01-31 09:16:10 +02:00
Kamil Braun
57d5aa5a68 test: add test for fixing a broken group 0 snapshot
In a cluster with group 0 with snapshot at index 0 (such group 0 might
be established in a 5.2 cluster, then preserved once it upgrades to 5.4
or later), no snapshot transfer will be triggered when a node is
bootstrapped. As a result, the new node might not obtain the full schema,
or might obtain an incorrect schema, as in scylladb/scylladb#16683.

Simulate this scenario in a test case using the RECOVERY mode and error
injections. Check that the newly added logic for creating a new snapshot
if such situation is detected helps in this case.
2024-01-30 16:44:01 +01:00
Kamil Braun
98d75c65af raft_group0: trigger snapshot if existing snapshot index is 0
The persisted snapshot index may be 0 if the snapshot was created in
an older version of Scylla, which means snapshot transfer won't be
triggered to a bootstrapping node. Commands present in the log may not
cover all schema changes --- group 0 might have been created through the
upgrade procedure, on a cluster with existing schema. So a
deployment with index=0 snapshot is broken and we need to fix it. We can
use the new `raft::server::trigger_snapshot` API for that.
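
The check described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the real group 0 code: `fake_raft_server` and `fix_broken_snapshot` are hypothetical names, and the stub only models a snapshot index; the real `raft::server::trigger_snapshot` API is asynchronous and does more.

```cpp
#include <cstdint>

// Hypothetical stand-in for a Raft server; only the indexes are modeled.
struct fake_raft_server {
    std::uint64_t snapshot_index = 0; // 0 means "no usable snapshot"
    std::uint64_t last_applied = 0;   // index of the last applied command

    // Stand-in for raft::server::trigger_snapshot (assumption: the new
    // snapshot is taken at the last applied index).
    void trigger_snapshot() { snapshot_index = last_applied; }
};

// Returns true if a snapshot had to be triggered.
bool fix_broken_snapshot(fake_raft_server& srv) {
    if (srv.snapshot_index != 0) {
        return false; // a proper snapshot already exists, nothing to fix
    }
    srv.trigger_snapshot();
    return true;
}
```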

Fixes scylladb/scylladb#16683
2024-01-30 16:35:54 +01:00
Kamil Braun
74bf60a8ca test_raft_snapshot_request: fix flakiness
Add workaround for scylladb/python-driver#295.

Also, an assert made at the end of the test was false; it is fixed, with
an appropriate comment added.
2024-01-30 16:21:24 +01:00
Kamil Braun
39339b9f70 test: topology/util: update comment for reconnect_driver
The issues mentioned in the comment before are already fixed.
Unfortunately, there is another, opposite issue which this function can
be used for. The previous issue was about the existing driver session
not reconnecting. The current issue is about the existing driver session
reconnecting too much... (and in the middle of queries.)
2024-01-30 15:36:48 +01:00
Piotr Smaroń
35ba037724 config: fix a typo in --role-manager's description
Closes scylladb/scylladb#17063
2024-01-30 16:13:33 +02:00
Kamil Braun
cf3f26dc94 test_maintenance_mode: fix flakiness
Wait until CQL is available and nodes see each other before trying to
perform a query.

Closes scylladb/scylladb#17059
2024-01-30 14:11:14 +02:00
Gleb Natapov
8b50613465 test: add test for rebuild failure 2024-01-30 11:04:19 +02:00
Gleb Natapov
d62204e758 test: add expected_error to rebuild_node operation 2024-01-30 11:04:19 +02:00
Gleb Natapov
51c40034f5 topology_coordinator: Propagate rebuild failure to the initiator
Do not retry rebuild endlessly, but report the error instead.
2024-01-30 11:04:19 +02:00
Kefu Chai
90c0e83f9a thrift: remove unused namespace definition
thrift_transport is never used, so drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17050
2024-01-30 09:16:47 +02:00
Michał Chojnowski
904bb25987 test: test_tablet_cleanup: wait for servers to see each other before multi-node queries
Waiting for CQL connections is not enough. For the queries to succeed,
nodes must see each other. We have to wait for this, otherwise the test
will be flaky.

Fixes #17029

Closes scylladb/scylladb#17040
2024-01-30 08:56:01 +02:00
Tomasz Grabiec
36f218c83b Merge 'main: refuse startup when tablet resharding is required' from Botond Dénes
We do not support tablet resharding yet. All tablet-related code assumes that the (host_id, shard) tablet replica is always valid. Violating this leads to undefined behaviour: errors in the tablet load balancer and potential crashes.
Avoid this by refusing to start if the need for resharding is detected. Be as lenient as possible: check all tablets with a replica on this node, and only refuse startup if at least one tablet has an invalid replica shard.

Startup will fail as:

    ERROR 2024-01-26 07:03:06,931 [shard 0:main] init - Startup failed: std::runtime_error (Detected a tablet with invalid replica shard, reducing shard count with tablet-enabled tables is not yet supported. Replace the node instead.)

Refs: #16739
Fixes: #16843

Closes scylladb/scylladb#17008

* github.com:scylladb/scylladb:
  test/topolgy_experimental_raft: test_tablets.py: add test for resharding
  test/pylib: manager[_client]: add update_cmdline()
  main: refuse startup when tablet resharding is required
  locator: tablets: add check_tablet_replica_shards()
2024-01-29 23:39:41 +01:00
Pavel Emelyanov
370fbd346c Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel
`db::config` is a class that is used in many places across the code base. When it is changed, its clients' code needs to be recompiled. It represents the configuration of the database. Some fields of the configuration that describe the location of directories may be empty. In such cases the `db::config::setup_directories()` function is called, and it modifies the provided configuration. Such modification is not good: it is better to keep `db::config` intact.

This PR:
 - extends the public interface of utils::directories class to provide required directory paths to the users
 - removes 'db::config::setup_directories()' to avoid altering the fields of configuration object
 - replaces usages of db::config object with utils::directories object in places that require obtaining paths to dirs

Fixes: scylladb#5626

Closes scylladb/scylladb#16787

* github.com:scylladb/scylladb:
  utils/directories: make utils::directories::set an internal type
  db::config: keep dir paths unchanged
  cql_transport/controler: use utils::directories to get paths of dirs
  service/storage_proxy: use utils::directories to get paths of dirs
  api/storage_service.cc: use utils::directories to get paths of dirs
  tools/scylla-sstable.cc: use utils::directories to get paths
  db/commitlog: do not use db::config to get dirs
  Use utils::directories to get dirs paths in replica::database
  Allow utils::directories to provide paths to dirs
  Clean-up of utils::directories
2024-01-29 18:01:15 +03:00
Kamil Braun
0912d2a2c6 Merge 'raft topology: make left_token_ring a transition state' from Patryk Jędrzejczak
When a node is in the `left_token_ring` state, we don't know how
it has ended up in this state. We cannot distinguish a node that
has finished decommissioning from a node that has failed bootstrap.

The main problem it causes is that we incorrectly send the
`barrier_and_drain` command to a node that has failed
bootstrapping or replacing. We must do it for a node that has
finished decommissioning because it could still coordinate
requests. However, since we cannot distinguish nodes in the
`left_token_ring` state, we must send the command to all of them.
This issue appeared in scylladb/scylladb#16797 and this PR is
a follow-up that fixes it.

The solution is changing `left_token_ring` from a node state
to a transition state.

Fixes scylladb/scylladb#16944

Closes scylladb/scylladb#17009

* github.com:scylladb/scylladb:
  docs: dev: topology-over-raft: document the left_token_ring state
  topology_coordinator: adjust reason string in left_token_ring handler
  raft topology: make left_token_ring a transition state
  topology_coordinator: rollback_current_topology_op: remove unused exclude_nodes
2024-01-29 15:29:01 +01:00
Kefu Chai
819fc95a67 reader: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17036
2024-01-29 16:21:42 +02:00
Kefu Chai
43094d2023 db: add formatter for db::read_repair_decision
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `db::read_repair_decision`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17033
2024-01-29 15:43:51 +02:00
Botond Dénes
d202d32f81 Merge 'Add an API to trigger snapshot in Raft servers' from Kamil Braun
This allows the user of `raft::server` to cause it to create a snapshot
and truncate the Raft log (leaving no trailing entries; in the future we
may extend the API to specify number of trailing entries left if
needed). In a later commit we'll add a REST endpoint to Scylla to
trigger group 0 snapshots.

One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.

In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44)
but a problem remains with these existing deployments coming from 5.2,
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).

Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).

The PR adds the API to `raft::server` and a HTTP endpoint that uses it.

In a follow-up PR, we plan to modify group 0 server startup logic to automatically
call this API if it sees that no snapshot is present yet (to automatically
fix the aforementioned 5.2 deployments once they upgrade.)

Closes scylladb/scylladb#16816

* github.com:scylladb/scylladb:
  raft: remove `empty()` from `fsm_output`
  test: add test for manual triggering of Raft snapshots
  api: add HTTP endpoint to trigger Raft snapshots
  raft: server: add `trigger_snapshot` API
  raft: server: track last persisted snapshot descriptor index
  raft: server: framework for handling server requests
  raft: server: inline `poll_fsm_output`
  raft: server: fix indentation
  raft: server: move `io_fiber`'s processing of `batch` to a separate function
  raft: move `poll_output()` from `fsm` to `server`
  raft: move `_sm_events` from `fsm` to `server`
  raft: fsm: remove constructor used only in tests
  raft: fsm: move trace message from `poll_output` to `has_output`
  raft: fsm: extract `has_output()`
  raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor`
  raft: server: pass `*_aborted` to `set_exception` call
2024-01-29 15:06:04 +02:00
Beni Peled
8009170d3a docs: update the installation instructions with the new gpg 2024 key
Closes scylladb/scylladb#17019
2024-01-29 14:37:25 +02:00
Kefu Chai
6f55d68dd9 .git: add more skip words
these words are either

* shortened words: strategy => strat, read_from_primary => fro
* or acronyms: node_or_data => nd

before we rename them with better names, let's just add them to the
ignore word list.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17002
2024-01-29 14:37:03 +02:00
Patryk Wrobel
781a6a5071 utils/directories: make utils::directories::set an internal type
Previously, utils::directories::set could have been used by
clients of utils::directories class to provide dirs for creation.
Due to moving the responsibility for providing paths of dirs from
db::config to utils::directories, such usage is no longer the case.

This change:
 - defines utils::directories::set in utils/directories.cc to disallow
   its usage by the clients of utils::directories
 - makes utils::directories::create_and_verify() member function
   private; now it is used only by the internals of the class
 - introduces a new member function to utils::directories called
   create_and_verify_sharded_directory() to limit the functionality
   provided to clients

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:20:41 +01:00
Patryk Wrobel
dc8d5ffaf6 db::config: keep dir paths unchanged
This change is intended to ensure, that
db::config fields related to directories
are not changed. To achieve that a member
function called setup_directories() is
removed.

The responsibility for directories paths
has been moved to utils::directories,
which may generate default paths if the
configuration does not provide a specific
value.

Fixes: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:20:41 +01:00
Patryk Wrobel
0f3b00f9ad cql_transport/controler: use utils::directories to get paths of dirs
This change replaces usage of db::config with
usage of utils::directories to get paths of
directories in cql_transport/controler.

Refs: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:20:38 +01:00
Patryk Wrobel
f08768e767 service/storage_proxy: use utils::directories to get paths of dirs
This change replaces usage of db::config with
usage of utils::directories to get paths of
directories in service/storage_proxy.

Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
5ac3d0f135 api/storage_service.cc: use utils::directories to get paths of dirs
This change replaces usage of db::config with usage
of utils::directories in api/storage_service.cc in
order to get the paths of directories.

Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
51fa108df7 tools/scylla-sstable.cc: use utils::directories to get paths
This change replaces usage of db::config with usage
of utils::directories to get paths of directories
in tools/scylla-sstable.cc.

Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
804afffb11 db/commitlog: do not use db::config to get dirs
This change removes usage of db::config to
get path of commitlog_directory. Instead, it
introduces a new parameter to directly pass
the path to db::commitlog::config::from_db_config().

Refs: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
9483d149af Use utils::directories to get dirs paths in replica::database
This change replaces the usage of db::config with
usage of utils::directories to get dirs paths in
replica::database class.

Moreover, it adjusts tests that require construction
of replica::database - its constructor has been
changed to accept utils::directories object.

Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
1cd676e438 Allow utils::directories to provide paths to dirs
This change extends utils::directories class in
the following way:
 - adds new member variables that correspond to
   fields from db::config that describe paths
   of directories
 - introduces a public interface to retrieve the
   values of the new members
 - allows construction of utils::directories
   object based on db::config to setup internal
   member variables related to paths to dirs

The new members of utils::directories are overridden
when the provided values are empty. The way of setting
paths is taken from db::config.

To ensure that the new logic works correctly
`utils_directories_test` has been created.
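
A rough sketch of the idea, assuming a trimmed-down configuration: `config`, `directories` and `get_commitlog_dir` below are simplified stand-ins rather than the real class interfaces, and the `<work_dir>/commitlog` default is an assumption made for the example. The point is that the default is computed inside the directories object while the configuration is taken by const reference and left untouched.

```cpp
#include <string>

// Hypothetical, trimmed-down stand-in for db::config.
struct config {
    std::string work_dir;
    std::string commitlog_dir; // may be left empty by the user
};

// Simplified stand-in for utils::directories.
class directories {
    std::string _commitlog_dir;
public:
    // The config is only read; an empty value is replaced by a default
    // derived from work_dir (assumed layout: <work_dir>/commitlog).
    explicit directories(const config& cfg)
        : _commitlog_dir(cfg.commitlog_dir.empty()
              ? cfg.work_dir + "/commitlog"
              : cfg.commitlog_dir) {}

    const std::string& get_commitlog_dir() const { return _commitlog_dir; }
};
```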

Refs: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Patryk Wrobel
1b0ccaf4f2 Clean-up of utils::directories
This change is intended to clean-up files in which
utils::directories class is defined to ease further
extensions.

The preparation consists of:
 - removal of `using namespace` from directories.hh to
   avoid namespace pollution in files, that include this
   header
 - explicit inclusion of headers, that were missing or
   were implicitly included to ensure that directories.hh
   is self-sufficient
 - defining directories::set class outside of its parent
   to improve readability

Refs: scylladb#5626

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-29 13:11:33 +01:00
Botond Dénes
fd66ce1591 test/topolgy_experimental_raft: test_tablets.py: add test for resharding
Check that scylla refuses to start when the shard count is reduced.
2024-01-29 07:04:33 -05:00
Botond Dénes
a7a5aada2a test/pylib: manager[_client]: add update_cmdline()
Similar to the existing update_config(). Updates the command-line
arguments of the specified nodes, merging the new options into the
existing ones. Needs a restart to take effect.
2024-01-29 07:04:33 -05:00
Botond Dénes
8a439fc2a8 main: refuse startup when tablet resharding is required
We do not support tablet resharding yet. All tablet-related code assumes
that the (host_id, shard) tablet replica is always valid. Violating this
leads to undefined behaviour: errors in the tablet load balancer and
potential crashes.
Avoid this by refusing to start if the need for resharding is detected.
Be as lenient as possible: check all tablets with a replica on this node,
and only refuse startup if at least one tablet has an invalid replica
shard.

Startup will fail as:

    ERROR 2024-01-26 07:03:06,931 [shard 0:main] init - Startup failed: std::runtime_error (Detected a tablet with invalid replica shard, reducing shard count with tablet-enabled tables is not yet supported. Replace the node instead.)
2024-01-29 07:04:33 -05:00
Botond Dénes
95b6aeebae locator: tablets: add check_tablet_replica_shards()
Checks that all tablets with a replica on this node have a valid
replica shard (< smp::count).
Will be used to check whether the node can start-up with the current
shard-count.
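
The condition being checked can be sketched as below. The types are simplified assumptions (`host_id` as an integer, a tablet as a plain vector of replicas) and `replica_shards_valid` is a hypothetical name; the real function operates on the tablet metadata and `smp::count`.

```cpp
#include <cstdint>
#include <vector>

using host_id = std::uint64_t; // simplified; an assumption for this sketch

struct tablet_replica {
    host_id host;
    unsigned shard;
};

using tablet_replica_set = std::vector<tablet_replica>;

// Returns false if any tablet has a replica on `this_host` whose shard
// does not exist with the current shard count (shard >= shard_count).
// Replicas on other hosts are ignored.
bool replica_shards_valid(const std::vector<tablet_replica_set>& tablets,
                          host_id this_host, unsigned shard_count) {
    for (const auto& replicas : tablets) {
        for (const auto& r : replicas) {
            if (r.host == this_host && r.shard >= shard_count) {
                return false;
            }
        }
    }
    return true;
}
```

A caller at startup would refuse to continue when this returns false, which corresponds to the "reducing shard count" case.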
2024-01-29 07:04:33 -05:00
Patryk Jędrzejczak
7c10cae6c4 docs: dev: topology-over-raft: document the left_token_ring state
In one of the previous patches, we changed the `left_token_ring`
state from a node state to a transition state. We document it
in this patch. The node state wasn't documented, so there is
nothing to remove.
2024-01-29 10:39:07 +01:00
Patryk Jędrzejczak
9b2d1a20a3 topology_coordinator: adjust reason string in left_token_ring handler
We were using the "finished decommission node" reason string for a
failed bootstrap and replace.
2024-01-29 10:39:07 +01:00
Patryk Jędrzejczak
b0eef50b2e raft topology: make left_token_ring a transition state
A node can be in the `left_token_ring` state after:
- a finished decommission,
- a failed bootstrap,
- a failed replace.

When a node is in the `left_token_ring` state, we don't know how
it has ended up in this state. We cannot distinguish a node that
has finished decommissioning from a node that has failed bootstrap.

The main problem it causes is that we incorrectly send the
`barrier_and_drain` command to a node that has failed
bootstrapping or replacing. We must do it for a node that has
finished decommissioning because it could still coordinate
requests. However, since we cannot distinguish nodes in the
`left_token_ring` state, we must send the command to all of them.
This issue appeared in scylladb/scylladb#16797 and this patch is
a follow-up that fixes it.

The solution is changing `left_token_ring` from a node state
to a transition state.

Regarding implementation, most of the changes are simple
refactoring. The less obvious are:
- Before this patch, in `system_keyspace::left_topology_state`, we
had to keep the ignored nodes' IDs for replace to ensure that the
replacing node will have access to it after moving to the
`left_token_ring` state, which happens when replace fails. We
don't need this workaround anymore. When we enter the new
`left_token_ring` transition state, the new node will still be in
the `decommissioning` state, so it won't lose its request param.
- Before this patch, a decommissioning node lost its tokens
while moving to the `left_token_ring` state. After the patch, it
loses tokens while still being in the `decommissioning` state. We
ensure that all `decommissioning` handlers correctly handle a node
that lost its tokens.

Moving the `left_token_ring` handler from `handle_node_transition`
to `handle_topology_transition` created a large diff. There are
only three changes:
- adding `auto node = get_node_to_work_on(std::move(guard));`,
- adding `builder.del_transition_state()`,
- changing error logged when `global_token_metadata_barrier` fails.
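
Why the split into node state and transition state helps can be shown with a small sketch. The enums below are heavily reduced and `needs_barrier_and_drain` is a hypothetical helper, not a function from the coordinator; it only encodes the rule stated above, that the command is required for a node that finished decommissioning and not for one that failed bootstrap or replace.

```cpp
// Reduced, illustrative versions of the two kinds of state.
enum class node_state { bootstrapping, replacing, decommissioning };
enum class transition_state { none, left_token_ring };

// With left_token_ring as a transition state, the node keeps its own
// state, so the coordinator can tell the three cases apart.
bool needs_barrier_and_drain(transition_state ts, node_state ns) {
    return ts == transition_state::left_token_ring
        && ns == node_state::decommissioning;
}
```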
2024-01-29 10:39:07 +01:00
Patryk Jędrzejczak
12eb0738cf topology_coordinator: rollback_current_topology_op: remove unused exclude_nodes
The `exclude_nodes` variable was unused, but it wasn't a bug.
The `left_token_ring` and `rollback_to_normal` handlers correctly
compute excluded nodes on their own.
2024-01-29 10:39:06 +01:00
Kefu Chai
0cbf8f75f0 db: add formatter for dht::decorated_key and repair_sync_boundary
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for dht::decorated_key and
repair_sync_boundary.

please note, before this change, repair_sync_boundary was using
the operator<< based formatter of `dht::decorated_key`, so we are
updating both of them in a single commit.

because we still use the homebrew generic formatter of vector<>
to format vector<repair_sync_boundary> and vector<dht::decorated_key>,
their operator<< are preserved.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16994
2024-01-29 11:11:41 +02:00
Tzach Livyatan
06a9a925a5 Update link to sizing / pricing calc
Closes scylladb/scylladb#17015
2024-01-29 11:07:20 +02:00
Kefu Chai
b5ff098f28 thrift: add formatter for cassandra::ConsistencyLevel::type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
cassandra::ConsistencyLevel::type.
please note, the operator<< for `cassandra::ConsistencyLevel::type`
is generated using `thrift` command line tool, which does not emit
specialization for fmt::formatter yet, so we need to use
`fmt::ostream_formatter` to implement the formatter for this type.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17013
2024-01-29 10:10:35 +02:00
Pavel Emelyanov
3abdb3c7ee tablets: Remove tablet_aware_replication_strategy::parse_initial_tablets
It's now unused; the string with initial tablets is parsed elsewhere

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17010
2024-01-29 10:03:38 +02:00
Kefu Chai
912c588975 thrift: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17012
2024-01-29 10:02:30 +02:00
Kefu Chai
abb12979f8 raft: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17011
2024-01-29 10:00:56 +02:00
Kefu Chai
8f38bd5376 commitlog: add formatter for db::replay_position
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `db::replay_position`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17014
2024-01-29 09:59:30 +02:00
Botond Dénes
d3c1be9107 Merge 'alternator: enable tablets by default if experimental feature is enabled' from Nadav Har'El
This series does a similar change to Alternator as was done recently to CQL:

1. If the "tablets" experimental feature is enabled, new Alternator tables will use tablets automatically, without requiring an option on each new table. A default choice of initial_tablets is used. These choices can still be overridden per-table if the user wants to.
2. In particular, all test/alternator tests will also automatically run with tablets enabled
3. However, some tests will fail on tablets because they use features that haven't yet been implemented with tablets - namely Alternator Streams (Refs #16317) and Alternator TTL (Refs #16567). These tests will - until those features are implemented with tablets - continue to be run without tablets.
4. An option is added to the test/alternator/run to allow developers to manually run tests without tablets enabled, if they wish to (this option will be useful in the short term, and can be removed later).

Fixes #16355

Closes scylladb/scylladb#16900

* github.com:scylladb/scylladb:
  test/alternator: add "--vnodes" option to run script
  alternator: use tablets by default, if available
  test/alternator: run some tests without tablets
2024-01-29 09:22:13 +02:00
Kefu Chai
cb5453d534 .git: only allow codespell to run on master branch
so that non-master branches are not read by 3rd-party tools unless
they are audited.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16999
2024-01-29 09:04:20 +02:00
Kefu Chai
f96d25a0a7 tool: check for existence of keyspace before getting it
in general, user should save output of `DESC foo.bar` to a file,
and pass the path to the file as the argument of `--schema-file`
option of `scylla sstable` commands. the CQL statement generated
from `DESC` command always includes the keyspace name of the table.
but if the user creates the CQL statement manually and omits
the keyspace name, he/she would hit the following assertion failure
```
scylla: cql3/statements/cf_statement.cc:49: virtual const sstring &cql3::statements::raw::cf_statement::keyspace() const: Assertion `_cf_name->has_keyspace()' failed.
```
this is not a great user experience.

so, in this change, we check for the existence of keyspace before
looking it up, and throw a runtime error with a better error message.
so when the CQL statement does not have the keyspace name, the new
error message would look like:
```
error processing arguments: could not load schema via schema-file: std::runtime_error (tools::do_load_schemas(): CQL statement does not have keyspace specified)
```

since this check is only performed by `do_load_schemas()` which
care about the existence of keyspace, and it only expects the
CQL statement to create table/keyspace/type, we just override the
new `has_keyspace()` method of the corresponding types derived
from `cf_statement`.
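
The check-before-lookup idea can be sketched as follows. `cf_name` here is a simplified stand-in and `keyspace_of` is a hypothetical helper; only the error text is taken from the message above.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Simplified stand-in for a parsed table name.
struct cf_name {
    std::optional<std::string> keyspace; // absent if the statement omits it
    std::string table;

    bool has_keyspace() const { return keyspace.has_value(); }
};

// Checks for the keyspace first and reports a readable error instead of
// tripping an assertion when it is missing.
const std::string& keyspace_of(const cf_name& name) {
    if (!name.has_keyspace()) {
        throw std::runtime_error("CQL statement does not have keyspace specified");
    }
    return *name.keyspace;
}
```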

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16981
2024-01-29 09:02:01 +02:00
Anna Stuchlik
dfa88ccc28 doc: document nodetool resetlocalschema
This adds the documentation for the nodetool resetlocalschema
command.
The syntax description is based on the description for Cassandra
and the ScyllaDB help for nodetool.

Fixes https://github.com/scylladb/scylladb/issues/16286

Closes scylladb/scylladb#16790
2024-01-28 21:09:02 +01:00
Kefu Chai
fe3bc00045 topology_coordinator: fix misspellings in log
these misspellings are identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17006
2024-01-26 16:50:39 +02:00
Dawid Medrek
b92fb3537a main: Postpone start-up of hint manager
In this commit, we postpone the start-up
of the hint manager until we obtain information
about other nodes in the cluster.

When we start the hint managers, one of the
things that happen is creating endpoint
managers -- structures managed by
db::hints::manager. Whether we create
an instance of endpoint manager depends on
the value returned by host_filter::can_hint_for,
which, in turn, may depend on the current state
of locator::topology.

If locator::topology is incomplete, some endpoint
managers may not be started even though they
should (because the target node IS part of the
cluster and we SHOULD send hints to it if there
are some).

A situation like that can happen because we
start the hint managers too early. This commit
aims to solve that problem. We only start
the hint managers when we've gathered information
about the other nodes in the cluster and created
the locator::topology using it.

Hinted Handoff is not negatively affected by these
changes since in between the previous point of
starting the hint managers and the current one,
all of the mutations performed by
service::storage_proxy target the local node, so
no hints would need to be generated anyway.

Fixes scylladb/scylladb#11870
Closes scylladb/scylladb#16511
2024-01-26 12:49:40 +01:00
Botond Dénes
c6fd4dffbb Merge 'Remove anonymous namespaces from headers' from Patryk Wróbel
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.

This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.

This PR removes unnamed namespaces from header files.
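
As an illustration only (the names are invented and this is not the code touched by the PR), a helper that used to live in an unnamed namespace in a header can be moved into a named namespace and marked `inline`, so that all translation units share one definition:

```cpp
// Contents of a hypothetical header, shown inline for brevity.
#include <cstddef>

// Before (problematic in a header): every including translation unit
// would get its own private copy with internal linkage.
//
//   namespace { std::size_t default_batch_size() { return 128; } }

// After: a named namespace plus `inline` gives a single entity with
// external linkage that can safely be defined in a header.
namespace config_defaults {

inline std::size_t default_batch_size() { return 128; }

} // namespace config_defaults
```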

References:

- [CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous) namespace in a header"](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#sf21-dont-use-an-unnamed-anonymous-namespace-in-a-header)

- [SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace in a header file"](https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL59-CPP.+Do+not+define+an+unnamed+namespace+in+a+header+file)

Closes scylladb/scylladb#16998

* github.com:scylladb/scylladb:
  utils/config_file_impl.hh: remove anonymous namespace from header
  mutation/mutation.hh: remove anonymous namespace from header
2024-01-26 13:20:17 +02:00
Kefu Chai
a9d781d70f test/nodetool: only test "storage_service/cleanup_all" with scylla
this RESTful API is a scylla specific extension and is only used
by scylla-nodetool. currently, the java-based nodetool does not use
it at all, so mark it with "scylla_only".

one can verify this change with:
```
pytest --mode=debug --nodetool=cassandra test_cleanup.py::test_cleanup
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17001
2024-01-26 13:19:15 +02:00
Botond Dénes
582ddc70ec Merge 'test/nodetool: return a randomized address if not running with unshare' from Kefu Chai
we should allow users to run nodetool tests without `test.py`. but there
is a good chance that the host could be reused by multiple tests or
multiple users who could be using port 12345. by randomizing the IP and
port, they would have a better chance to complete the test without running
into a port-already-in-use problem.
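
The actual fixture is written in Python; the following is only a sketch of the randomization idea. `random_loopback_endpoint` is a hypothetical name, and the octet and port ranges are assumptions chosen for the example.

```cpp
#include <random>
#include <string>
#include <utility>

// Picks a random address in 127.0.0.0/8 and a random port, so that
// concurrent runs on the same host are unlikely to collide.
std::pair<std::string, int> random_loopback_endpoint(std::mt19937& rng) {
    std::uniform_int_distribution<int> octet(1, 254);       // assumed range
    std::uniform_int_distribution<int> port(10000, 60000);  // assumed range

    const int b = octet(rng);
    const int c = octet(rng);
    const int d = octet(rng);
    std::string ip = "127." + std::to_string(b) + "." +
                     std::to_string(c) + "." + std::to_string(d);
    return {ip, port(rng)};
}
```

Randomization lowers the chance of a clash but does not rule it out; a caller that needs certainty would still have to handle a bind failure.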

Closes scylladb/scylladb#16996

* github.com:scylladb/scylladb:
  test/nodetool: return a randomized address if not running with unshare
  test/nodetool: return an address from loopback_network fixture
2024-01-26 13:15:58 +02:00
Kefu Chai
9ee6c00c84 docs: fix misspellings
these misspellings are identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17005
2024-01-26 13:14:21 +02:00
Kefu Chai
72cec22932 repair: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16993
2024-01-26 13:12:38 +02:00
Kamil Braun
4f736894e1 Merge 'Add maintenance mode' from Mikołaj Grzebieluch
In this mode, the node is not reachable from the outside, i.e.
* it refuses all incoming RPC connections,
* it does not join the cluster, thus
  * all group0 operations are disabled (e.g. schema changes),
  * all cluster-wide operations are disabled for this node (e.g. repair),
  * other nodes see this node as dead,
  * cannot read or write data from/to other nodes,
* it does not open Alternator and Redis transport ports and the TCP CQL port.

The only way to make CQL queries is to use the maintenance socket. The node serves only local data.

To start the node in maintenance mode, use the `--maintenance-mode true` flag or set `maintenance_mode: true` in the configuration file.

REST API works as usual, but some routes are disabled:
* authorization_cache
* failure_detector
* hinted_hand_off_manager

This PR also updates the maintenance socket documentation:
* add cqlsh usage to the documentation
* update the documentation to use `WhiteListRoundRobinPolicy`

Fixes #5489.

Closes scylladb/scylladb#15346

* github.com:scylladb/scylladb:
  test.py: add test for maintenance mode
  test.py: generalize usage of cluster_con
  test.py: when connecting to node in maintenance mode use maintenance socket
  docs: add maintenance mode documentation
  main: add maintenance mode
  main: move some REST routes initialization before joining group0
  message_service: add sanity check that rpc connections are not created in the maintenance mode
  raft_group0_client: disable group0 operations in the maintenance mode
  service/storage_service: add start_maintenance_mode() method
  storage_service: add MAINTENANCE option to mode enum
  service/maintenance_mode: add maintenance_mode_enabled bool class
  service/maintenance_mode: move maintenance_socket_enabled definition to separate file
  db/config: add maintenance mode flag
  docs: add cqlsh usage to maintenance socket documentation
  docs: update maintenance socket documentation to use WhiteListRoundRobinPolicy
2024-01-26 11:02:34 +01:00
Botond Dénes
f94acc2eb4 test/cql-pytest: conftest.py: remove xfail_tablets fixture
No test uses it and going forward we should not add tests which do not
work with tablets.
2024-01-26 04:02:40 -05:00
Botond Dénes
dcaf308a59 test/cql-pytest: test_tombstone_limit.py: re-enable disabled tests
The tests in this file, that are related to partition-scans are failing
with tablets, and were hence disabled with xfail_tablets. This means we
are losing test coverage, so parametrize these tests to run with both
vnodes and tablets, and targetedly mark as xfail only when running with
tablets.
2024-01-26 04:02:40 -05:00
Botond Dénes
3527d0aaed test/cql-pytest: test_describe.py: re-enable disabled tests
This test file has two tests disabled:
* test_desc_cluster - due to #16789
* test_whitespaces_in_table_options - due to #16317

They are disabled via xfail, because they do not work with tablets. This
means we lose test coverage of the respective functionality.
This patch re-enables the two tests, by parametrizing them to run with
both vnodes and tablets:
* test_desc_cluster - when run with tablets, endpoint info is not
  validated. The test is still useful because it checks that DESC
  CLUSTER doesn't break with tablets. A FIXME with a link to #16789
  is left.
* test_whitespaces_in_table_options - marked xfail when run with
  tablets, but not when run with vnodes, thus we re-gain the test
  coverage.
2024-01-26 04:02:40 -05:00
Botond Dénes
a3b75e863b test/cql-pytest: test_cdc.py: re-enable disabled tests
The tests in this file are currently all marked with xfail_tablets,
because tablets are not enabled by default in the cql-pytest suite and
CDC doesn't currently work with tablets at all. This however means that
the CDC functionality loses test coverage. So instead of a blanket
xfail, parametrize these tests to run with both vnodes and tablets, and
add a targeted xfail for the tablets parameter. This way no coverage
is lost, the tests are still running with vnodes (and will fail if
regressions are introduced), and they are allowed to xfail with tablets
enabled.

We could simply make these tests only run with vnodes for now. But
looking forward, after the CDC functionality is fixed to work with
tablets, we want to verify that it works with both vnodes and tablets.
So we run the test with both and leave the xfail as a reminder that a
fix is required.
2024-01-26 04:02:40 -05:00
Botond Dénes
631f7c99f5 test/cql-pytest: add parameter support to test_keyspace
Tests can now request to be run against both tablets and vnodes, via:

    @pytest.mark.parametrize("test_keyspace", ["tablets", "vnodes"], indirect=True)

This will set request.param for the test_keyspace fixture, which can
create the keyspace according to the requested parameter. This way,
tests can conveniently opt-in to be run against both replication
methods.
When not parameterized like this, the test_keyspace fixture will create
a keyspace as before -- with tablets, if support is enabled.
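
A minimal sketch of how such an indirectly parametrized fixture can work. This is not the actual conftest.py code: the `choose_replication` helper and its `tablets_supported` argument are hypothetical, and the fixture here returns only the chosen replication method instead of creating a keyspace.

```python
import pytest


def choose_replication(param, tablets_supported=True):
    """Hypothetical helper: decide which replication method the keyspace uses."""
    if param is None:
        # Not parametrized: behave as before, i.e. tablets if support is enabled.
        return "tablets" if tablets_supported else "vnodes"
    if param not in ("tablets", "vnodes"):
        raise ValueError(f"unknown test_keyspace parameter: {param!r}")
    return param


@pytest.fixture
def test_keyspace(request):
    # request.param is only set when a test parametrizes this fixture with
    # indirect=True; tests that simply use the fixture get the default.
    return choose_replication(getattr(request, "param", None))


@pytest.mark.parametrize("test_keyspace", ["tablets", "vnodes"], indirect=True)
def test_runs_against_both(test_keyspace):
    # pytest runs this test twice, once per parameter value.
    assert test_keyspace in ("tablets", "vnodes")
```

With `indirect=True`, pytest passes each listed value to the fixture through `request.param` rather than to the test directly, which is what lets the fixture decide how to build the keyspace.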
2024-01-26 04:02:40 -05:00
Kefu Chai
637dd73079 sstable/storage: use fs::path to represent _dir and _temp_dir
they are directories, and we are concatenating strings to build the paths
to the sstable components. so it would be more elegant to use fs::path
for manipulating paths.

this change was inspired by the discussion on passing the relative
path to sstable to `scylla sstables`, where we use the
`path::parent_path()` as the dir of sstable, and then concatenate
it with the filename component. but if the `parent_path()` method
returns an empty string, we end up with a path like
"/me-42-big-TOC.txt", which is not reachable. what we should be
reading is "me-42-big-TOC.txt". so, we would be better off either
using `fs::path` or enforcing the absolute path.

since we are already using "/" as separator, and concatenating strings,
this is an opportunity to switch over to `fs::path` to address
the problem and to avoid the string concatenating.
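
The failure mode described above can be illustrated with a short Python analogue. This is not the C++ code touched by the change: `pathlib.PurePosixPath` stands in for `fs::path`, and the variable names are hypothetical.

```python
from pathlib import PurePosixPath

filename = "me-42-big-TOC.txt"
sstable_dir = ""  # an empty parent directory, as for a bare relative filename

# Concatenating with "/" by hand turns the relative name into an absolute path.
joined_by_hand = sstable_dir + "/" + filename

# A path type treats the empty directory as "no prefix" and keeps it relative.
joined_as_path = str(PurePosixPath(sstable_dir) / filename)
```

Here `joined_by_hand` is "/me-42-big-TOC.txt", while `joined_as_path` stays "me-42-big-TOC.txt".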

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16982
2024-01-26 09:54:41 +02:00
Patryk Wrobel
6faa178f10 utils/config_file_impl.hh: remove anonymous namespace from header
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.

This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.

This change aligns the code with the following guidelines:
 - CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous)
                       namespace in a header"
 - SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace
                  in a header file"

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-26 08:44:44 +01:00
Patryk Wrobel
c218333afb cql3/type_json.cc: move stringstream content instead of copying it
C++20 introduced a new overload of std::ostringstream::str()
that is selected when the mentioned member function is called
on an rvalue.

The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.

This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16990
2024-01-26 09:41:09 +02:00
Kefu Chai
36e81f93d2 .git: do not apply codespell to licenses
we should keep the licenses as they are, even with misspellings.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16992
2024-01-26 09:39:27 +02:00
Patryk Wrobel
ba488b10ec mutation/mutation.hh: remove anonymous namespace from header
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.

This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.

This change aligns the code with the following guidelines:
 - CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous)
                       namespace in a header"
 - SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace
                  in a header file"

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-26 08:38:39 +01:00
Kefu Chai
01727a5399 test/nodetool: return a randomized address if not running with unshare
we should allow users to run nodetool tests without `test.py`. but there
is a good chance that the host could be reused by multiple tests or
multiple users who could be using port 12345. by randomizing the IP and
port, they would have better chance to complete the test without running
into used port problem.
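
A rough sketch of the idea, not the fixture's actual code: the function name and the port range are assumptions. It also assumes a host where any address in 127.0.0.0/8 can be bound, which is the default on Linux.

```python
import random


def random_loopback_address(rng=random):
    """Return a (host, port) pair in 127.0.0.0/8 with an unprivileged port."""
    host = "127.{}.{}.{}".format(
        rng.randint(0, 255), rng.randint(0, 255), rng.randint(1, 254)
    )
    # Hypothetical range: high ports, to stay away from well-known services.
    port = rng.randint(10000, 65535)
    return host, port
```

Randomizing both the IP and the port makes a collision between two concurrent runs on the same host unlikely, but it does not rule one out.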

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-26 13:32:47 +08:00
Kefu Chai
358d30fd29 test/nodetool: return an address from loopback_network fixture
* rename "maybe_setup_loopback_network" to "server_address"
* return an address from the fixture

this change prepares for bringing back the randomized IP and port,
in case users run this test without test.py, by randomizing the
IP and port, they would have better chance to complete the test
without running into used port problem.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-26 13:20:37 +08:00
Raphael S. Carvalho
3b14c5b84a test/topology_experimental_raft: Add tablet split test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
90c9a5d7af replica: Bypass reshape on boot with tablets temporarily
Without it, table loading fails as reshape mixes sstables from
different tablets together, and now we have a guard for that:

Unable to load SSTable ...-big-Data.db that belongs to tablets 1 and 31,

The fix is about making reshape compaction group aware.
It will be fixed, but not now.

Refs #16966.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
2cb8a824ec replica: Fix table::compaction_group_for_sstable() for tablet streaming
It might happen that sstable being streamed during migration is not
split yet, therefore it should be added to the main compaction group,
allowing the streaming stage to start split work on it, and not
fool the coordinator thinking it can proceed with split execution
which would cause problems.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
4245ad333a test/topology_experimental_raft: Disable load balancer in test fencing
This is easier to reproduce after changes in load balancer, to
emit resize decisions, which in turn results in topology version
being incremented, and that might race with fencing tests that
manipulate the topology version manually.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
85020861fc replica: Remap compaction groups when tablet split is finalized
When coordinator executes split, i.e. commits the new tablet map with
each tablet split into two, all replicas must then proceed with
remapping of compaction groups that were previously split.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
bf6f692f60 service: Split tablet map when split request is finalized
When load balancer emits finalize request, the coordinator will
now react to it by splitting each tablet in the current tablet
map and then committing the new map.

There can be no active migration while we do it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
9342792173 replica: Update table split status if completed split compaction work
The table replica will say to coordinator that its split status
is ready by loading the sequence number from tablet metadata
into its local state, which is pulled periodically by the
coordinator via RPC.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
cfa8200da5 storage_service: Implement split monitor
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:43 -03:00
Raphael S. Carvalho
e0de3dd844 topology_coordinator: Generate updates for resize decisions made by balancer
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:58:40 -03:00
Raphael S. Carvalho
3ef792c4e8 load_balancer: Introduce metrics for resize decisions
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
638e6e30cb db: Make target tablet size a live-updateable config option
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
7ed5b44d52 load_balancer: Implement resize decisions
This implements the ability in load balancer to emit split or merge
requests, cancel ongoing ones if they're no longer needed, and
also finalize those that are ready for the topology changes.

That's all based on average tablet size, collected by coordinator
from all nodes, and split and merge thresholds.
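
The decision rule can be sketched roughly as follows. This is an illustration only, not the load balancer's code: the function names and the string results are hypothetical, and the thresholds are plain parameters because their real values are not stated here.

```python
def resize_decision(avg_tablet_size, split_threshold, merge_threshold):
    """Pick "split", "merge" or "none" from the average tablet size."""
    if merge_threshold >= split_threshold:
        raise ValueError("merge threshold must be below split threshold")
    if avg_tablet_size > split_threshold:
        return "split"
    if avg_tablet_size < merge_threshold:
        return "merge"
    return "none"


def should_cancel(ongoing, avg_tablet_size, split_threshold, merge_threshold):
    """An ongoing request is no longer needed once the rule stops asking for it."""
    wanted = resize_decision(avg_tablet_size, split_threshold, merge_threshold)
    return ongoing != "none" and wanted != ongoing
```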

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
8f7f74c490 service: Wire table_resize_plan into migration_plan
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
8d283b2593 service: Introduce table_resize_plan
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
ed2138a35a tablet_mutation_builder: Add set_resize_decision()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
490d109055 topology_coordinator: Wire load stats into load balancer
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
ce353bf47c storage_service: Allow tablet split and migration to happen concurrently
Lack of synchronization could lead the coordinator to think that a
pending replica in migration has split ready status, when in reality
it escaped the check if it happens that the leaving replica escaped
the split ready check, after the status has already been pulled at
destination by coordinator.

Example:
1) Coordinator pulls split status (ready) from destination replica
2) Migration sends a non-split tablet into destination
3) Coordinator pulls split status (ready) from source after
transition stage of migration moved to cleanup (so there's no
longer a leaving replica in it).
4) Migration completes, but compaction group is not split yet.
Coordinator thinks destination is ready.

To solve it, streaming now guarantees that pending replica is
split before returning, so migration can only advance to next
stage after the pending replica is split, if and only if
there's a split request emitted.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
2209c7440c topology_coordinator: Periodically retrieve table_load_stats
This implements the fiber that aggregates per-table stats that will
be fed into load balancer to make resize decisions (split,
merge, or revoke ongoing ones).

Initially, the stats will be refreshed every 60s, but the idea
is that eventually we make the frequency table based, where
the size of each table is taken into account.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
489a527e20 locator: Introduce topology::get_datacenter_nodes()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
9519a0c9e4 storage_service: Implement table_load_stats RPC
This implements the RPC for collecting table stats.

Since both leaving and pending replica can be accounted during
tablet migration, the RPC handler will look at tablet transition
info and account for only the leaving or the pending replica, based on the
tablet migration stage. Replicas that are not leaving or
pending, of course, don't contribute to the anomaly in the
reported size.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
4684615927 replica: Expose table_load_stats in table
This is the table replica state that coordinator will aggregate
from all nodes and feed into the load balancer.

A tablet filter is added to not double account migrating tablets,
so only one of pending or leaving tablet replica will be accounted
based on current migration stage. More details can be found in
the patch that will implement the filter.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
beef9c9f70 replica: Introduce storage_group::live_disk_space_used()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
6c74fc4b82 locator: Introduce table_load_stats
This is per table stats that will be aggregated from all nodes, by
the coordinator, in order to help load balancer make resize
decisions.

size_in_bytes is the total aggregated table size, so coordinator
becomes responsible for taking into account RF of each DC and
also tablet count, for computing an accurate average size.

split_ready_seq_number is the minimum sequence number among all
replicas. If coordinator sees all replicas store the seq number
of current split, then it knows all replicas are ready for the
next stage in the split process.
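
A sketch of how a coordinator might combine these two fields, under stated assumptions: only the field names size_in_bytes and split_ready_seq_number come from this patch; the class, the helper functions, and the exact averaging formula (total size divided by tablet count times the sum of per-DC RF) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableLoadStats:
    size_in_bytes: int = 0
    split_ready_seq_number: Optional[int] = None  # None: nothing reported yet


def aggregate(per_node_stats):
    """Sum the sizes and keep the minimum sequence number across reports."""
    total = TableLoadStats()
    for stats in per_node_stats:
        total.size_in_bytes += stats.size_in_bytes
        if total.split_ready_seq_number is None:
            total.split_ready_seq_number = stats.split_ready_seq_number
        else:
            total.split_ready_seq_number = min(
                total.split_ready_seq_number, stats.split_ready_seq_number
            )
    return total


def average_tablet_size(stats, tablet_count, rf_per_dc):
    """Assumed formula: every tablet has sum(RF) replicas across all DCs."""
    replicas = tablet_count * sum(rf_per_dc.values())
    return stats.size_in_bytes / replicas if replicas else 0.0


def all_replicas_split_ready(stats, current_split_seq):
    # The minimum equals the current sequence number only if no replica lags.
    return stats.split_ready_seq_number == current_split_seq
```

Tracking the minimum is what makes a single number sufficient: one lagging replica is enough to hold the whole table back from the next stage.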

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:08 -03:00
Raphael S. Carvalho
0d5ba1ee4b tablets: Add resize decision metadata to tablet metadata
The new metadata describes the ongoing resize operation (can be either
of merge, split or none) that spans tablets of a given table.
That's managed by group0, so down nodes will be able to see the
decision when they come back up and see the changes to the
metadata.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:36:06 -03:00
Raphael S. Carvalho
57582ac9c4 locator: Introduce resize_decision
resize_decision is the metadata that says whether the tablets of a table
need a split, a merge, or neither. That will be recorded in tablet metadata,
and therefore stored in group0.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-01-25 18:31:12 -03:00
Avi Kivity
03313d359e Merge ' db: commitlog_replayer: ignore mutations affected by (tablet) cleanups ' from Michał Chojnowski
To avoid data resurrection, mutations deleted by cleanup operations should be skipped during commitlog replay.

This series implements the above for tablet cleanups, by using a new system table which holds records of cleanup operations.

Fixes #16752

Closes scylladb/scylladb#16888

* github.com:scylladb/scylladb:
  test: test_tablets: add a test for cleanup after migration
  test: pylib: add ScyllaCluster.wipe_sstables
  test: boost: add commitlog_cleanup_test
  db: commitlog_replayer: ignore mutations affected by (tablet) cleanups
  replica: table: garbage-collect irrelevant system.commitlog_cleanups records
  db: commitlog: add min_position()
  replica: table: populate system.commitlog_cleanups on tablet cleanup
  db: system_keyspace: add system.commitlog_cleanups
  replica: table: refresh compound sstable set after tablet cleanup
2024-01-25 20:51:03 +02:00
Patryk Wrobel
a858daf038 service/client_state.cc: remove redundant copying
db::schema_tables::all_table_names() returns std::vector<sstring>.
Usage of range-for loop without reference results in copying each
of the elements of the traversed container. Such copying is redundant.

This change introduces usage of const reference to avoid copying.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16983
2024-01-25 20:35:05 +02:00
Kamil Braun
543ad0987a Merge 'raft topology: send barrier_and_drain to a decommissioning node' from Patryk Jędrzejczak
We didn't send the `barrier_and_drain` command to a
decommissioning node that could still be coordinating requests. It
could happen that a decommissioning node sent a request with an
old topology version after normal nodes received the new fence
version. Then, the request would fail on replicas with the stale
topology exception.

This PR fixes this problem by modifying `exec_global_command`.
From now on, it sends `barrier_and_drain` to a decommissioning
node.

We also stop filtering stale topology exceptions in
`test_topology_ops`. We added this filter after detecting the bug
fixed by this PR.

Fixes scylladb/scylladb#15804
Fixes scylladb/scylladb#16579
Fixes scylladb/scylladb#16642

Closes scylladb/scylladb#16797

* github.com:scylladb/scylladb:
  test: test_topology_ops: remove failed mutations filter
  raft topology: send barrier_and_drain to a decommissioning node
  raft topology: ensure at most one transitioning node
2024-01-25 16:09:02 +01:00
Kefu Chai
ee28cf2285 test.py: s/defalt/default/
this typo was identified by codespell

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16980
2024-01-25 16:54:07 +02:00
Botond Dénes
6d5ee6d48a Merge 'test/nodetool: run nodetool tests using "unshare"' from Kefu Chai
before this change, we use a random address when launching
rest_api_mock server, but there are chances that the randomly
picked address conflicts with an already-used address on the
host. and the subprocess fails right away with the returncode of
1 upon this failure, but we just continue on and check the readiness
of the already-dead server. actually, we've seen test failures
caused by the EADDRINUSE failure, and when we checked the readiness
of the rest_api_mock by sending HTTP request and reading the
response, what we had is not a JSON encoded response but a webpage,
which was likely the one returned by a minio server.

in this change, we

* specify the "launcher" option of nodetool
  test suite to "unshare", so that all its tests are launched
  in separated namespaces.
* do not use a random address for the mock server, as the network
  namespaces are separated.

Fixes #16542

Closes scylladb/scylladb#16773

* github.com:scylladb/scylladb:
  test/nodetool: run nodetool tests using "unshare"
  test.py: add "launcher" option support
2024-01-25 16:53:49 +02:00
Mikołaj Grzebieluch
763911af5b test.py: add test for maintenance mode
The test checks that in maintenance mode server A is not available for other
nodes and for clients. It is possible to connect by the maintenance socket
to server A and perform local CQL operations.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
ca35e352f5 test.py: generalize usage of cluster_con
Add option to pass load_balancing policy.
Change hosts type to list of IPs or cassandra.Endpoint.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
77a656bfd6 test.py: when connecting to node in maintenance mode use maintenance socket
A node in maintenance mode does not have the regular CQL port open.
To connect to the node, the scylla cluster needs to use the node's maintenance socket.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
9c07a189e8 docs: add maintenance mode documentation 2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
0bdbd6e8f5 main: add maintenance mode
In maintenance mode:
* Group0 doesn't start and the node doesn't join the token ring to behave as a dead
node to others,
* Group0 operations are disabled and result in an error,
* Only the maintenance socket listens for CQL requests,
* The storage service initialises token_metadata with the local node as the only node
on the token ring.

Maintenance mode is enabled by passing the --maintenance-mode flag.

Maintenance mode starts before the group0 is initialised.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
617adde9c9 main: move some REST routes initialization before joining group0
Move REST endpoints that don't need connection with other nodes, before joining the group0.
This way, they can be initialized in the maintenance mode.

Move `snapshot_ctl` along with routes because of snapshots API and tasks API.
Its constructor is a noop, so it is safe to move it.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
d8de209dcf message_service: add sanity check that rpc connections are not created in the maintenance mode
In maintenance mode, a node shouldn't be able to communicate with other nodes.

To make sure this does not happen, the sanity check is added.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
c08266cfe5 raft_group0_client: disable group0 operations in the maintenance mode
In maintenance mode, the node doesn't communicate with other nodes, so it doesn't
start or apply group0 operations. Users can still try to start it, e.g. change
the schema, and the node can't allow it.

Init _upgrade_state with recovery in the maintenance mode.
Throw an error if the group0 operation is started in maintenance mode.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
97641f646a service/storage_service: add start_maintenance_mode() method
In the maintenance mode, other nodes won't be available thus we disabled joining
the token ring and the token metadata won't be populated with the local node's endpoint.
When a CQL query is executed it checks the `token_metadata` structure and fails if it is empty.

Add a method that initialises `token_metadata` with the local node as the only node in the token ring.
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
c530756837 storage_service: add MAINTENANCE option to mode enum
join_cluster and start_maintenance_mode are incompatible.
To make sure that only one is called when the node starts, add the MAINTENANCE option.

start_maintenance_mode sets _operation_mode to MAINTENANCE.
join_cluster sets _operation_mode to STARTING.

set_mode will result in an internal error if:
* it tries to set MAINTENANCE mode when the _operation_mode is other than NONE,
  i.e. start_maintenance_mode is called after join_cluster (or it is called during
  the drain, but it also shouldn't happen).
* it tries to set STARTING mode when the mode is set to MAINTENANCE,
  i.e. join_cluster is called after start_maintenance_mode.
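
The two rules above can be modelled with a small state check. This is a Python sketch of the logic only, not the storage_service code: the class, the exception type, and the reduced set of modes are hypothetical.

```python
from enum import Enum, auto


class Mode(Enum):
    # Only a subset of modes, enough to show the two checks.
    NONE = auto()
    STARTING = auto()
    MAINTENANCE = auto()


class InternalError(RuntimeError):
    pass


class StorageService:
    def __init__(self):
        self._operation_mode = Mode.NONE

    def set_mode(self, new_mode):
        if new_mode is Mode.MAINTENANCE and self._operation_mode is not Mode.NONE:
            raise InternalError("cannot enter MAINTENANCE from a mode other than NONE")
        if new_mode is Mode.STARTING and self._operation_mode is Mode.MAINTENANCE:
            raise InternalError("cannot enter STARTING while in MAINTENANCE")
        self._operation_mode = new_mode

    def join_cluster(self):
        self.set_mode(Mode.STARTING)

    def start_maintenance_mode(self):
        self.set_mode(Mode.MAINTENANCE)
```

Whichever of the two entry points runs first succeeds, and the second one then hits the check in set_mode.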
2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
d4c22fc86c service/maintenance_mode: add maintenance_mode_enabled bool class 2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
8b2f0e38d9 service/maintenance_mode: move maintenance_socket_enabled definition to separate file 2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
e6a83b9819 db/config: add maintenance mode flag 2024-01-25 15:27:53 +01:00
Mikołaj Grzebieluch
81ef9fc91e docs: add cqlsh usage to maintenance socket documentation
After https://github.com/scylladb/scylla-cqlsh/pull/67, the user can use
cqlsh to connect to the node by maintenance socket.
2024-01-25 15:27:53 +01:00
Botond Dénes
c67698ea06 compaction/compaction_manager: perform_cleanup(): hold the compaction gate
While the cleanup is ongoing. Otherwise, a concurrent table drop might
trigger a use-after-free, as we have seen in dtests recently.

Fixes: #16770

Closes scylladb/scylladb#16874
2024-01-25 14:52:50 +01:00
Mikołaj Grzebieluch
2c34d9fcd8 docs: update maintenance socket documentation to use WhiteListRoundRobinPolicy
After https://github.com/scylladb/python-driver/pull/287, the user can use
WhiteListRoundRobinPolicy to connect to the node by maintenance socket.
2024-01-25 14:52:24 +01:00
Pavel Emelyanov
bf3cae4992 Merge 'tests: utils: error injection: print time duration instead of count' from Kefu Chai
before this change, we always cast the wait duration to millisecond,
even if it could be using a higher resolution. actually
`std::chrono::steady_clock` is using `nanosecond` for its duration,
so if we inject a deadline using `steady_clock`, we could be awakened
earlier due to the narrowing of the duration type caused by the
duration_cast.

in this change, we just use the duration as it is. this should allow
the caller to use the resolution provided by Seastar without losing
the precision. the tests are updated to print the time duration
instead of count to provide information with a higher resolution.
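
The precision loss can be shown with plain integer arithmetic. This is a Python analogue of the narrowing cast, not the C++ code; the names and the 2.7 ms example value are made up for illustration.

```python
NS_PER_MS = 1_000_000


def to_whole_milliseconds(duration_ns):
    # Integer division mirrors the narrowing cast for non-negative durations:
    # the sub-millisecond remainder is dropped.
    return duration_ns // NS_PER_MS


deadline_ns = 2_700_000  # 2.7 ms expressed in nanoseconds
truncated_ms = to_whole_milliseconds(deadline_ns)
woken_early_by_ns = deadline_ns - truncated_ms * NS_PER_MS
```

A waiter that sleeps for `truncated_ms` wakes up `woken_early_by_ns` nanoseconds before the requested deadline, which is the early wakeup described above.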

Fixes #15902

Closes scylladb/scylladb#16264

* github.com:scylladb/scylladb:
  tests: utils: error injection: print time duration instead of count
  error_injection: do not cast to milliseconds when injecting timeout
2024-01-25 16:13:27 +03:00
Avi Kivity
69d597075a Merge 'tablets: Add support for removenode and replace handling' from Tomasz Grabiec
New tablet replicas are allocated and rebuilt synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.

Tablet scheduler is responsible for scheduling tablet rebuilding transition which
changes the replicas set. The infrastructure for handling decommission
in tablet scheduler is reused for this.

Scheduling is done incrementally, respecting per-shard load
limits. Rebuilding transitions are recognized by load calculation to
affect all tablet replicas.

New kind of tablet transition is introduced called "rebuild" which
adds new tablet replica and rebuilds it from existing replicas. Other
than that, the transition goes through the same stages as regular
migration to ensure safe synchronization with request coordinators.

In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.

Fixes https://github.com/scylladb/scylladb/issues/16690.

Closes scylladb/scylladb#16894

* github.com:scylladb/scylladb:
  tests: tablets: Add tests for removenode and replace
  tablets: Add support for removenode and replace handling
  topology_coordinator: tablets: Do not fail in a tight loop
  topology_coordinator: tablets: Avoid warnings about ignored failed future
  storage_service, topology: Track excluded state in locator::topology
  raft topology: Introduce param-less topology::get_excluded_nodes()
  raft topology: Move get_excluded_nodes() to topology
  tablets: load_balancer: Generalize load tracking
  tablets: Introduce get_migration_streaming_info() which works on migration request
  tablets: Move migration_to_transition_info() to tablets.hh
  tablets: Extract get_new_replicas() which works on migration request
  tablets: Move tablet_migration_info to tablets.hh
  tablets: Store transition kind per tablet
2024-01-25 14:49:43 +02:00
Patryk Jędrzejczak
b348014745 test: test_topology_ops: remove failed mutations filter
We added this filter after detecting a bug in the Raft-based
topology. We weren't sending `barrier_and_drain` commands to a
decommissioning node that could still be coordinating requests.
It could cause stale topology exceptions on replicas if the
decommissioning node sent a request with an old topology version
after normal nodes received the new fence version.

This bug has been fixed in the previous commit, so we remove the
filter.
2024-01-25 13:42:48 +01:00
Patryk Jędrzejczak
9aebd6dd96 raft topology: send barrier_and_drain to a decommissioning node
Before this patch, we didn't send the `barrier_and_drain` command
to a decommissioning node that could still be coordinating
requests. It could happen that a decommissioning node sent
a request with an old topology version after normal nodes received
the new fence version. Then, the request would fail on replicas
with the stale topology exception.

We fix this problem by modifying `exec_global_command`. From now
on, it sends `barrier_and_drain` to a decommissioning node, which
can also be in the `left_token_ring` state.
2024-01-25 13:42:48 +01:00
Patryk Jędrzejczak
378cbd0b70 raft topology: ensure at most one transitioning node
We add a sanity check to ensure at most one transitioning node at
a time. If there is more, something must have gone wrong.

In the future, we might implement concurrent topology operations.
Then, we will remove this sanity check.

We also extend the comment describing `transition_nodes` so that
it better explains why we use a map and how it should be handled.
2024-01-25 13:42:46 +01:00
Alexander Turetskiy
c1ae5425f7 DROP TYPE IF EXISTS should work on non-existent keyspace
DROP TYPE IF EXISTS should pass and do nothing on a non-existent keyspace

fixes #9082

Closes scylladb/scylladb#16504
2024-01-25 14:28:43 +02:00
Kefu Chai
b1431f08f7 test/nodetool: run nodetool tests using "unshare"
before this change, we use a random address when launching
rest_api_mock server, but there are chances that the randomly
picked address conflicts with an already-used address on the
host. and the subprocess fails right away with the returncode of
1 upon this failure, but we just continue on and check the readiness
of the already-dead server. actually, we've seen test failures
caused by the EADDRINUSE failure, and when we checked the readiness
of the rest_api_mock by sending HTTP request and reading the
response, what we had is not a JSON encoded response but a webpage,
which was likely the one returned by a minio server.

in this change, we

* specify the "launcher" option of nodetool
  test suite to "unshare", so that all its tests are launched
  in separate namespaces.
* use a random fixed address for the mock server, as the network
  namespaces are not shared anymore
* add an option in `nodetool/conftest.py`, so that it can optionally
  set up the lo network interface when it is launched in a separate
  new network namespace.

Fixes #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-25 20:28:36 +08:00
Kefu Chai
35b3c51f40 test.py: add "launcher" option support
before this change, all "tool" test suites use "pytest" to launch their
tests. but some of the tests might need a dedicated namespace so they
do not interfere with each other. fortunately, "unshare(1)" allows us
to run a program in new namespaces.

in this change, we add a "launcher" option to "tool" test suites. so
that these tests can run with the specified "launcher" instead of using
"pytest". if "launcher" is not specified, its default value of
"pytest" is used.

Refs #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-25 20:28:01 +08:00
Kurashkin Nikita
d90eeb5c4f cql3:statement_restrictions.cc: multi-column relation null check
Before this patch, we received the internal server error
"Attempted to create key component from empty optional" when null was used in
multi-column relations.
This patch adds a null check for each element of each tuple in the
expression and generates an invalid request error if it finds such an element.

Modified cassandra test and added a new one that checks the occurrence of null values in tuples.
Added a test that checks whether the wrong number of items is entered in tuples.

Fixes #13217

Closes scylladb/scylladb#16415
2024-01-25 14:17:43 +02:00
Botond Dénes
5df4ad2e48 test/cql-pytest: test_tools.py: fix flaky schema load failure test
The test TestScyllaSsstableSchemaLoading.test_fail_schema_autodetect was
observed to be flaky. Sometimes failing on local setups, but not in CI.
As it turns out, this is because, when run via test.py, the test's
working directory is the root directory of scylla.git. In this case,
scylla-sstable will find and read conf/scylla.yaml. After having done
so, it will try to look in the default data directory
(/var/lib/scylla/data) for the schema tables. If the local machine
happens to have a scylla data-dir set up at the above-mentioned location,
it will read the schema tables and will succeed in finding the tested
table (which is a system table, so it is always present). This will fail
the test, as the test expects the opposite -- the table not being found.

The solution is to change the test's working directory to the random
temporary work dir, so that the local environment doesn't interfere with
it.

Fixes: #16828

Closes scylladb/scylladb#16837
2024-01-25 15:14:16 +03:00
Botond Dénes
b341aa8f6d Merge 'api/api.hh: improve usage of standard containers' from Patryk Wróbel
This PR contains improvements related to usage of std::vector and looping over containers in the range-for loop.

It is advised to use `std::vector::reserve()` to avoid unneeded memory allocations when the total size is known beforehand.

When looping over a container that stores non-trivial types usage of const reference is advised to avoid redundant copies.

Closes scylladb/scylladb#16978

* github.com:scylladb/scylladb:
  api/api.hh: use const reference when looping over container
  api/api.hh: use std::vector::reserve() when the total size is known
2024-01-25 13:22:48 +02:00
Kamil Braun
994a2ea5c3 Merge 'Call left/joined notifiers when topology coordinator is enabled' from Gleb
The gossiper topology change code calls left/joined notifiers when a
node leaves or joins the cluster. This code is not executed in topology coordinator
mode, so the coordinator needs to call those notifiers by itself. The
series adds the calls.

Fixes scylladb/scylladb#15841

* 'gleb/raft-topo-notifications-v1' of github.com:scylladb/scylla-dev:
  storage service: topology coordinator: call notify_joined() when a node joins a cluster
  storage service: topology coordinator: call notify_left() when a node leaves a cluster
  storage_service: drop redundant check from notify_joined()
2024-01-25 12:12:53 +01:00
Kefu Chai
1d33a68dd7 tests: utils: error injection: print time duration instead of count
instead of casting / comparing the count of duration unit, let's just
compare the durations, so that boost.test is able to print the duration
in a more informative and user friendly way (line wrapped)

test/boost/error_injection_test.cc(167): fatal error:
    in "test_inject_future_disabled":
      critical check wait_time > sleep_msec has failed [23839ns <= 10ms]

Refs #15902
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-25 19:10:24 +08:00
Kefu Chai
8a5689e7a7 error_injection: do not cast to milliseconds when injecting timeout
before this change, we always cast the wait duration to millisecond,
even if it could be using a higher resolution. actually
`std::chrono::steady_clock` is using `nanosecond` for its duration,
so if we inject a deadline using `steady_clock`, we could be awakened
earlier due to the narrowing of the duration type caused by the
duration_cast.

in this change, we just use the duration as it is. this should allow
the caller to use the resolution provided by Seastar without losing
the precision.
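
A minimal sketch of the narrowing described above, using only the
standard library. The helper names `narrowed_wait` and `lost_by_narrowing`
are hypothetical and are not part of the error injection code; they only
show how `duration_cast` to milliseconds truncates a nanosecond duration.

```cpp
#include <chrono>

// Hypothetical helper mirroring the old behaviour: the requested wait is
// narrowed to whole milliseconds. duration_cast truncates toward zero.
std::chrono::milliseconds narrowed_wait(std::chrono::nanoseconds requested) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(requested);
}

// The part of the requested wait that is dropped by the narrowing. A caller
// that sleeps for narrowed_wait() wakes up this much too early.
std::chrono::nanoseconds lost_by_narrowing(std::chrono::nanoseconds requested) {
    return requested - narrowed_wait(requested);
}
```

For a requested wait of 10999999ns the narrowed wait is 10ms, so almost a
full millisecond is lost. Passing the duration through unchanged avoids it.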

Fixes #15902

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-25 19:10:24 +08:00
Gleb Natapov
adf70aae15 storage service: topology coordinator: call notify_joined() when a node joins a cluster
When the topology coordinator is used for topology changes the gossiper
based code that calls notify_joined() is not called. The coordinator needs
to call it itself. But it needs to call it only once when node becomes
normal. For that the patch changes state loading code to remember the
old set of nodes in normal state to check if a node that is normal after
new state is loaded was not in the normal state before.
2024-01-25 12:28:08 +02:00
Botond Dénes
c9f247f3e8 Merge 'sstables: writer: don't block topology changes while writing sstables' from Avi Kivity
The sstable writer held the effective_replication_map_ptr while writing
sstables, which is both a layering violation and slows down tablet load
balancing. It was needed in order to ensure the sharder was stable. But
it turns out that sharding metadata is unnecessary for tablets, so just
skip the whole thing when writing an sstable for tablets.

Closes scylladb/scylladb#16953

* github.com:scylladb/scylladb:
  sstables: writer: don't require effective_replication_map for sharding metadata
  schema: provide method to get sharder, iff it is static
2024-01-25 12:12:01 +02:00
Botond Dénes
8e82df6fb6 Merge 'coverage libraries: bug fixes' from Eliran Sinvani
This mini-series contains two bug fixes that were found as part of testing coverage reporting in CI:
ref: https://github.com/scylladb/scylladb/pull/16895

1. The html-fixup, which is triggered when using `test/pylib/coverage_utils.py lcov-tools genhtml...`, rendered incorrect links when there were multiple links in the same line.
2. For files that contained `,` in their name, the output was simply wrong and resulted in lcov not being able to find such files for the purpose of filtering or generating reports.

The aforementioned draft PR served as a testing bed for finding and fixing those bugs.

Closes scylladb/scylladb#16977

* github.com:scylladb/scylladb:
  lcov_utils.py: support sourcefiles that contains commas in their name
  coreage_utils.py: make regular expression lazy in  html-fixup
2024-01-25 11:46:15 +02:00
Kefu Chai
0fbfc96619 db: add formatter for schema_tables::table_kind
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for db::schema_tables::table_kind,
and its operator<<() is still used by the homebrew generic formatter
for std::map<>, so it is preserved.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16972
2024-01-25 11:33:13 +03:00
Kefu Chai
ffb5ad494f api: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16973
2024-01-25 11:28:02 +03:00
Patryk Wrobel
cdfe0c1c35 api/api.hh: use const reference when looping over container
When a reference is not used in the range-for loop, then
each element of a container is copied. Such copying
is not a problem for scalar types. However, in the case
of non-trivial types it may cause unneeded overhead.

This change replaces copying with const references
to avoid copying of types like seastar::sstring etc.
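
The difference can be sketched as follows. `tracked_string` and both
functions are hypothetical and exist only for this illustration (they are
not taken from api.hh); the type counts its copies so the effect of the
loop variable's declaration becomes visible.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustration only: counts how many times a tracked_string is copied.
int copy_count = 0;

struct tracked_string {
    std::string value;
    explicit tracked_string(std::string v) : value(std::move(v)) {}
    tracked_string(const tracked_string& other) : value(other.value) { ++copy_count; }
};

// `auto item` copies every element into the loop variable.
std::size_t total_length_by_value(const std::vector<tracked_string>& items) {
    std::size_t total = 0;
    for (auto item : items) {
        total += item.value.size();
    }
    return total;
}

// `const auto& item` binds to the element in place, so nothing is copied.
std::size_t total_length_by_const_ref(const std::vector<tracked_string>& items) {
    std::size_t total = 0;
    for (const auto& item : items) {
        total += item.value.size();
    }
    return total;
}
```

Both functions return the same total; only the by-value variant increases
`copy_count`, once per element.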

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-25 09:20:35 +01:00
Patryk Wrobel
1ca71f2532 api/api.hh: use std::vector::reserve() when the total size is known
When growing via push_back(), std::vector may need to reallocate
its internal block of memory due to not enough space. It is advised
to allocate the required space before appending elements if the
size is known beforehand.

This change introduces usage of std::vector::reserve() in api.hh
to ensure that push_back() does not cause reallocations.
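
A small sketch of the effect, not taken from api.hh: the hypothetical
function `count_capacity_changes` appends `n` elements and counts how often
the capacity changes, which stands in for the number of reallocations.

```cpp
#include <cstddef>
#include <vector>

// Illustration only: append n ints and count how many times the vector's
// capacity changes while doing so. Each change implies a reallocation.
std::size_t count_capacity_changes(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // one allocation up front, the size is known
    }
    std::size_t changes = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++changes;
            last_capacity = v.capacity();
        }
    }
    return changes;
}
```

With `reserve_first` set, the count is zero, because `reserve(n)` guarantees
room for `n` elements. Without it the vector has to grow while it is filled.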

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-01-25 08:50:19 +01:00
Eliran Sinvani
d27283918f lcov_utils.py: support sourcefiles that contains commas in their name
As part of the parsing, every line of an lcov file was modeled as
INFO_TYPE:field[,field]...
However specifically for info type "SF" which represents the source file
there can only be one field.
This caused files that are using ',' in their names to be cut down up to
the first ',' and as a result not handled correctly by lcov_utils.py
especially when rewriting a file.
This patch adds a special handling for the "SF" INFO_TYPE.
ref : `man geninfo`
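
The special case can be sketched like this. lcov_utils.py itself is
Python and its code is not shown here, so the C++ function `parse_record`
below is a hypothetical stand-in that only mirrors the rule from this
message: "SF" carries exactly one field, other record types are split
on ','.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustration only: split one lcov line into its type and its fields.
std::pair<std::string, std::vector<std::string>> parse_record(const std::string& line) {
    const auto colon = line.find(':');
    if (colon == std::string::npos) {
        // Lines without ':' (for example "end_of_record") have no fields.
        return {line, std::vector<std::string>{}};
    }
    const std::string type = line.substr(0, colon);
    const std::string rest = line.substr(colon + 1);
    std::vector<std::string> fields;
    if (type == "SF") {
        // A source file name is a single field, even if it contains ','.
        fields.push_back(rest);
        return {type, fields};
    }
    std::size_t start = 0;
    while (true) {
        const auto comma = rest.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(rest.substr(start));
            break;
        }
        fields.push_back(rest.substr(start, comma - start));
        start = comma + 1;
    }
    return {type, fields};
}
```

With this rule, `SF:/src/a,b.cc` keeps its whole path as one field, while
`DA:12,3` is still split into two fields.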

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-25 09:30:52 +02:00
Eliran Sinvani
11eb9f5bb2 coreage_utils.py: make regular expression lazy in html-fixup
The html-fixup procedure was created because of a bug in genhtml (`man
genhtml` for details about what genhtml is). The bug is that genhtml
doesn't account for file names that contain illegal URL characters (ref:
https://stackoverflow.com/a/1547940/2669716). html-fixup converts those
characters to the %<octet> notation (i.e. the space character becomes %20
etc.). However, the regular expression used to detect links was greedy,
which didn't account for multiple links in the same line. This was
discovered while browsing one of the reports and noticing that the links
that are meant to alternate between the code view and the function view of a
source got scrambled and unusable after html-fixup.
This change makes the regex that is used to detect links lazy so it can
handle multiple links in the same line in an html file correctly.
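
The difference between the two kinds of quantifier can be sketched as
follows. The real tool is written in Python and its pattern is not quoted
in this message, so both patterns and the function `extract_hrefs` are
assumptions made for the illustration, written with `std::regex`.

```cpp
#include <regex>
#include <string>
#include <vector>

// Illustration only: collect the href targets found in one line of HTML,
// using either a lazy (.*?) or a greedy (.*) capture group.
std::vector<std::string> extract_hrefs(const std::string& line, bool lazy) {
    const std::regex pattern(lazy ? R"re(href="(.*?)")re" : R"re(href="(.*)")re");
    std::vector<std::string> targets;
    for (std::sregex_iterator it(line.begin(), line.end(), pattern), end; it != end; ++it) {
        targets.push_back((*it)[1].str());
    }
    return targets;
}
```

On a line with two links, the greedy pattern yields a single match that runs
from the first `href="` to the last quote of the line, while the lazy pattern
yields one match per link.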

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-25 09:30:42 +02:00
Nadav Har'El
69a68e35dd Merge 'scylla-sstable: add support for loading schema of views and indexes' from Botond Dénes
Loading schemas of views and indexes was not supported, with either `--schema-file`, or when loading schema from schema sstables.
This PR addresses both:
* When loading schema from CQL (file), `CREATE MATERIALIZED VIEW` and `CREATE INDEX` statements are now also processed correctly.
* When loading schema from schema tables, `system_schema.views` is also processed, when the table has no corresponding entry in `system_schema.tables`.

Tests are also added.

Fixes: #16492

Closes scylladb/scylladb#16517

* github.com:scylladb/scylladb:
  test/cql-pytest: test_tools.py: add schema-loading tests for MV/SI
  test/cql-pytest: test_tools.py: extract some fixture logic to functions
  test/cql-pytest: test_tools.py: extract common schema-loading facilities into base-class
  tools/schema_loader: load_schema_from_schema_tables(): add support for MV/SI schemas
  tools/schema_loader: load_one_schema_from_file(): add support for view/index schemas
  test/boost/schema_loader_test: add test for mvs and indexes
  tools/schema_loader: load_schemas(): implement parsing views/indexes from CQL
  replica/database: extract existing_index_names and get_available_index_name
  tools/schema_loader: make real_db.tables the only source of truth on existing tables
  tools/schema_loader: table(): store const keyspace&
  tools/schema_loader: make database,keyspace,table non-movable
  cql3/statements/create_index_statement: build_index_schema(): include index metadata in returned value
  cql3/statements/create_index_statement: make build_index_schema() public
  cql3/statements/create_index_statement: relax some method's dependence on qp
  cql3/statements/create_view_statement: make prepare_view() public
2024-01-24 23:36:54 +02:00
Nadav Har'El
df6c9828ef Merge 'Add protobuf and Native histogram support' from Amnon Heiman
Native histograms (also known as sparse histograms) are an experimental Prometheus feature.
They use protobuf as the reporting layer.
Native histograms hold the benefits of high resolution at a lower resource cost.

This series allows sending histograms in a native histogram format over protobuf.
By default, protobuf support is disabled. To use protobuf with native histograms, the command line flag prometheus_allow_protobuf should be set to true, and the Prometheus server should send the accept header with protobuf.

Fixes #12931

Closes scylladb/scylladb#16737

* github.com:scylladb/scylladb:
  main.cc: Add prometheus_allow_protobuf command line
  histogram_metrics_helper: support native histogram
  config: Add prometheus_allow_protobuf flag
2024-01-24 21:24:50 +02:00
Michał Chojnowski
f0eadc734e test: test_tablets: add a test for cleanup after migration
Reproduces the problems fixed by earlier commits in the series.
2024-01-24 19:36:29 +01:00
Botond Dénes
7bb3ed7f23 docs/operating-scylla: scylla-sstable.rst: fix checksum list
Add empty line before list of different checksums in
validate-checksums's description. Otherwise the list is not rendered.

Closes scylladb/scylladb#16401
2024-01-24 16:34:13 +01:00
Kefu Chai
a9851cf834 test.py: replace "$foo is False" with "not $foo"
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16960
2024-01-24 15:21:53 +02:00
Kefu Chai
add74ec8ee mutation_writer: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16958
2024-01-24 15:20:02 +02:00
Kefu Chai
c978d1b3f8 config: s/re-use/reuse/
this misspelling is identified by codespell.
per m-w, reuse is a word per-se, and we don't need the hyphen for
addressing the ambiguity in the use cases, like, recover and re-cover.
see also https://www.merriam-webster.com/dictionary/reuse

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16962
2024-01-24 15:19:03 +02:00
Kefu Chai
8c39aba820 tools/scylla-sstable: use canonical path for sst_path
we deduce the paths to other SSTable components from the one
specified from the command line, for instance, if
/a/b/c/me-really-big-Data.db is fed to `scylla sstable`, the tool
would try to read /a/b/c/me-really-big-TOC.txt for the list of
other components. this works fine if the full path is specified
in the command line.

but if a relative path is specified, like, "me-really-big-Data.db",
this does not work anymore. before this change, the tool
would be reading "/me-really-big-TOC.txt", which does not exist
under most circumstances. while $PWD/me-really-big-TOC.txt should
exist if the SSTable is sane.

after this change, we always convert the specified path to
its canonical representation, no matter whether it is relative or absolute.
this enables us to get the correct parent path when trying
to read, for instance, the TOC component.
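
a rough sketch of the idea with `std::filesystem`. the function
`sibling_component` is hypothetical and not the tool's code, and it uses
`std::filesystem::absolute()` as a stand-in for the conversion, since that
call does not require the file to exist. the conversion used by the actual
change may differ.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Illustration only: locate another component next to the given SSTable file.
// A bare relative name such as "me-really-big-Data.db" has an empty
// parent_path(), so the path is made absolute before taking its parent.
fs::path sibling_component(const fs::path& sst_path, const std::string& component_file) {
    return fs::absolute(sst_path).parent_path() / component_file;
}
```

for a relative input the result is located under the current working
directory, and for an absolute input the original directory is preserved.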

Fixes #16955
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16964
2024-01-24 13:28:40 +02:00
Michał Chojnowski
b88a0eb9ab test: pylib: add ScyllaCluster.wipe_sstables
Add a method which wipes sstables files for a particular table on a particular
stopped node.
2024-01-24 11:52:49 +01:00
Michał Chojnowski
94cdfcaa94 test: boost: add commitlog_cleanup_test
Adds a test for the commitlog cleanup functionality added
earlier in the series.
2024-01-24 10:37:39 +01:00
Michał Chojnowski
a246bb39ef db: commitlog_replayer: ignore mutations affected by (tablet) cleanups
To avoid data resurrection, mutations deleted by cleanup operations
have to be skipped during commitlog replay.

This patch implements this, based on the metadata recorded on cleanup
operations into system.commitlog_cleanups.
2024-01-24 10:37:39 +01:00
Michał Chojnowski
f458a1bf3e replica: table: garbage-collect irrelevant system.commitlog_cleanups records
Currently, rows in system.commitlog_cleanups are only dropped on node restart,
so the table can accumulate an unbounded number of records.

This probably isn't a problem in practice, because tablet cleanups aren't that
frequent, but this patch adds a countermeasure anyway.

This patch makes the choice to delete the unneeded records right when new records
are added. This isn't ideal -- it would be more natural if the unneeded records
were deleted as soon as they become unneeded -- but it does the job with a
minimal amount of code.
2024-01-24 10:37:38 +01:00
Michał Chojnowski
05ff32ebf9 db: commitlog: add min_position()
Add a helper function which returns the minimum replay position
across all existing or future commitlog segments.
Only positions greater or equal to it can be replayed on the next reboot.

We will use this helper in a future patch to garbage collect some cleanup
metadata which refers to replay positions.
2024-01-24 10:37:38 +01:00
Michał Chojnowski
a10650959c replica: table: populate system.commitlog_cleanups on tablet cleanup
To avoid data resurrection after cleanup, we have to filter out the
cleaned mutations during commitlog replay.

In this patch, we get tablet cleanup to record the affected set of mutations
to system.commitlog_cleanups. In a later patch, we will use these records
for filtering during commitlog replay.
2024-01-24 10:37:38 +01:00
Michał Chojnowski
7c5a8894be db: system_keyspace: add system.commitlog_cleanups
Add a system table which will hold records of cleanup operations,
for the purpose of filtering commitlog replays to avoid data
resurrections.
2024-01-24 10:37:38 +01:00
Michał Chojnowski
8bfd078c54 replica: table: refresh compound sstable set after tablet cleanup
If the compound set isn't refreshed, readers will keep seeing
the dataset as it was before the cleanup, which is a bug.
2024-01-24 10:37:38 +01:00
Kefu Chai
207fe93b90 utils: add formatter for rjson::value
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for rjson::value, and drop its
operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16956
2024-01-24 10:30:52 +02:00
Gleb Natapov
b97ff54a41 storage service: topology coordinator: call notify_left() when a node leaves a cluster
When the topology coordinator is used for topology changes the gossiper
based code that calls notify_left() is not called. The coordinator needs
to call it itself.
2024-01-24 10:21:01 +02:00
Gleb Natapov
5459a8b9a5 storage_service: drop redundant check from notify_joined()
notify_joined() is called from handle_state_normal only, so there is no
point checking that the state is normal inside the function as well.
2024-01-24 10:17:12 +02:00
Avi Kivity
8ee75ae8f4 sstables: writer: don't require effective_replication_map for sharding metadata
Currently, we pass an effective_replication_map_ptr to sstable_writer,
so that we can get a stable dht::sharder for writing the sharding metadata.
This is needed because with tablets, the sharder can change dynamically.

However, this is both bad and unnecessary:
 - bad: holding on to an effective_replication_map_ptr is a barrier
   for topology operations, preventing tablet migrations (etc) while
   an sstable is being written
 - unnecessary: tablets don't require sharding metadata at all, since
   two tablets cannot overlap (unlike two sstables from different shards in
   the same node). So the first/last key is sufficient to determine the
   shard/tablet ownership.

Given that, just pass the sharder for vnode sstables, and don't generate
sharding metadata for tablet sstables.
2024-01-23 22:23:08 +02:00
Avi Kivity
b88f422a53 schema: provide method to get sharder, iff it is static
The current get_sharder() method only allows getting a static sharder
(since a dynamic sharder needs additional protection). However, it
chooses to abort if someone attempts to get a dynamic sharder.

In one case, it's useful to get a sharder only if it's static, so
provide a method to do that. This is for providing sstable sharding
metadata, which isn't useful with tablets.
2024-01-23 22:20:59 +02:00
Kamil Braun
05643208a8 Merge 'raft topology: move the topology coordinator to a dedicated file' from Piotr Dulikowski
The `topology_coordinator` is a large class (>1000 loc) which resides in
an even larger source file (storage_service.cc, ~7800 loc). This PR
moves the topology_coordinator class out of the storage_service.cc file
in order to improve modularity and recompilation times during
development.

As a first step, the `topology_mutation_builder` and
`topology_node_mutation_builder` classes are also moved from
storage_service.cc to their own, new header/source files as they are an
important abstraction used both by the topology coordinator code and
some other code in storage_service.cc that won't be moved.

Then, the `topology_coordinator` is moved out. The
`topology_coordinator` class is completely hidden in the new
topology_coordinator.cc file and can only be started and waited on to
finish via the new `run_topology_coordinator` function.

Fixes: scylladb/scylladb#16605

Closes scylladb/scylladb#16609

* github.com:scylladb/scylladb:
  service: move topology coordinator to a separate file
  storage_service: introduce run_topology_coordinator function
  service: move topology mutation builder out of storage_service
  storage_service: detemplate topology_node_mutation_builder::set
2024-01-23 20:02:06 +01:00
Kefu Chai
f86a5ae87a streaming: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16947
2024-01-23 19:38:30 +02:00
Kefu Chai
d493f949ca cql3: add formatter for cql3::statements::statement_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
cql3::statements::statement_type. and its operator<<() is dropped.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16948
2024-01-23 19:36:24 +02:00
Piotr Dulikowski
c3c3f5c1c8 service: move topology coordinator to a separate file
The topology coordinator is a large class that sits in an even larger
storage_service.cc file. For the sake of code modularization and
reducing recompilation time, move the topology coordinator outside
storage_service.cc.

The topology_coordinator class is moved to the new
topology_coordinator.cc unchanged. Along with it, the following items
are moved:

- wait_for_ip function - it's used both by storage_service and
  topology_coordinator, so in order for the new topology_coordinator.cc
  not to depend on storage service, it is moved to the new file,
- raft_topology logger - for the same reason as wait_for_ip,
- run_topology_coordinator - serves as the main interface for the
  topology coordinator. The topology coordinator class is not exposed at
  all, it's only possible to start the coordinator and wait until it
  shuts down itself via that function.
2024-01-23 17:51:10 +01:00
Avi Kivity
4a57b67634 docs: add a rough diagram of module interaction
It is incomplete and maybe inaccurate, but it is a start.

Closes scylladb/scylladb#16903
2024-01-23 18:08:48 +02:00
Kamil Braun
1824c12975 raft: remove empty() from fsm_output
Nobody remembered to keep this function up to date when adding stuff to
`fsm_output`.

Turns out that it's not being used by any Raft logic but only in some
tests. That use case can now be replaced with `fsm::has_output()` which
is also being used by `raft::server` code.
2024-01-23 16:48:28 +01:00
Kamil Braun
bf6d5309ca test: add test for manual triggering of Raft snapshots
2024-01-23 16:48:28 +01:00
Kamil Braun
617e09137d api: add HTTP endpoint to trigger Raft snapshots
This uses the `trigger_snapshot()` API added in previous commit on a
server running for the given Raft group.

It can be used for example in tests or in the context of disaster
recovery (ref scylladb/scylladb#16683).
2024-01-23 16:48:28 +01:00
Kamil Braun
0eda7a2619 raft: server: add trigger_snapshot API
This allows the user of `raft::server` to ask it to create a snapshot
and truncate the Raft log. In a later commit we'll add a REST endpoint
to Scylla to trigger group 0 snapshots.

One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.

In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44)
but a problem remains with these existing deployments coming from 5.2,
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).

Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).
2024-01-23 16:48:28 +01:00
David Garcia
77822fc51d chore: add azure and gcp images extensions
Closes scylladb/scylladb#16942
2024-01-23 16:06:40 +02:00
Botond Dénes
e79ea91990 Merge 'Extend query tracing information' from Michał Jadwiszczak
This little patch adds:
- authenticated user to "Processing a statement" tracing log
- name of a semaphore to reader concurrency semaphore logs

The purpose of this patch is to be able to verify parts of query execution to track down issues with service levels.

```
cassandra@cqlsh> select * from ks1.t where a = 1;

 a | b
---+---

(0 rows)

Tracing session: ea7e5ce0-b9f5-11ee-b123-b0816809f2c0

 activity                                                                                                                                     | timestamp                  | source    | source_elapsed | client
----------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                                           Execute CQL3 query | 2024-01-23 14:47:14.734000 | 127.0.0.1 |              0 | 127.0.0.1
                                                                                                         Parsing a statement [shard 1/sl:sl1] | 2024-01-23 14:47:14.734126 | 127.0.0.1 |              3 | 127.0.0.1
                                                                    Processing a statement for authenticated user: cassandra [shard 1/sl:sl1] | 2024-01-23 14:47:14.734279 | 127.0.0.1 |            156 | 127.0.0.1
      Creating read executor for token -4069959284402364209 with all: {127.0.0.2} targets: {127.0.0.2} repair decision: NONE [shard 1/sl:sl1] | 2024-01-23 14:47:14.737348 | 127.0.0.1 |           3225 | 127.0.0.1
   Creating never_speculating_read_executor - speculative retry is disabled or there are no extra replicas to speculate with [shard 1/sl:sl1] | 2024-01-23 14:47:14.737351 | 127.0.0.1 |           3228 | 127.0.0.1
                                                                                  read_data: sending a message to /127.0.0.2 [shard 1/sl:sl1] | 2024-01-23 14:47:14.737358 | 127.0.0.1 |           3236 | 127.0.0.1
                                                                                 read_data: message received from /127.0.0.1 [shard 1/sl:sl1] | 2024-01-23 14:47:14.737593 | 127.0.0.2 |             16 | 127.0.0.1
                                                        Start querying singular range {{-4069959284402364209, 000400000001}} [shard 0/sl:sl1] | 2024-01-23 14:47:14.737676 | 127.0.0.2 |             24 | 127.0.0.1
                                                                  [reader concurrency semaphore sl:sl1] admitted immediately [shard 0/sl:sl1] | 2024-01-23 14:47:14.737684 | 127.0.0.2 |             31 | 127.0.0.1
                                                                        [reader concurrency semaphore sl:sl1] executing read [shard 0/sl:sl1] | 2024-01-23 14:47:14.737688 | 127.0.0.2 |             35 | 127.0.0.1
                                    Querying cache for range {{-4069959284402364209, 000400000001}} and slice {(-inf, +inf)} [shard 0/sl:sl1] | 2024-01-23 14:47:14.737715 | 127.0.0.2 |             63 | 127.0.0.1
 Page stats: 0 partition(s), 0 static row(s) (0 live, 0 dead), 0 clustering row(s) (0 live, 0 dead) and 0 range tombstone(s) [shard 0/sl:sl1] | 2024-01-23 14:47:14.737724 | 127.0.0.2 |             72 | 127.0.0.1
                                                                                                            Querying is done [shard 0/sl:sl1] | 2024-01-23 14:47:14.737731 | 127.0.0.2 |             79 | 127.0.0.1
                                                                read_data handling is done, sending a response to /127.0.0.1 [shard 1/sl:sl1] | 2024-01-23 14:47:14.738321 | 127.0.0.2 |            743 | 127.0.0.1
                                                                                     read_data: got response from /127.0.0.2 [shard 1/sl:sl1] | 2024-01-23 14:47:14.739148 | 127.0.0.1 |           5026 | 127.0.0.1
                                                                                        Done processing - preparing a result [shard 1/sl:sl1] | 2024-01-23 14:47:14.739196 | 127.0.0.1 |           5074 | 127.0.0.1
                                                                                                                             Request complete | 2024-01-23 14:47:14.739087 | 127.0.0.1 |           5087 | 127.0.0.1

```

Closes scylladb/scylladb#16920

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: add name of semaphore in tracing messages
  cql3:query_processor: add logged user to query tracing info
2024-01-23 16:06:16 +02:00
Piotr Dulikowski
4ad6b6563b storage_service: introduce run_topology_coordinator function
Extracts a part of the logic of the raft_state_monitor_fiber method into
a separate function. It will be moved to a separate file in the next
commit along with the topology coordinator, and will serve as the only
way of interaction with the topology coordinator while the class itself
will remain hidden.

The topology_coordinator class is now directly constructed on the stack
(or rather in the coroutine frame), the indirection via shared_ptr is no
longer needed.
2024-01-23 14:09:12 +01:00
Patryk Wrobel
f15880dc48 compaction_group::stop(): always call compaction_manager.remove()
Before introduction of PR#15524 the removal had always been invoked
via finally() continuation. In spite of making flush() noexcept, the
mentioned PR modified the logic. If flush() returns exceptional future,
then the removal is not performed.

This change restores the old behavior - removal operation is always called.
From now on, the logic of compaction_group::stop() is as follows:
 - firstly, it waits for completion of flush() via
   seastar::coroutine::as_future() to avoid premature exception
 - then it executes compaction_manager.remove()
 - in the end it inspects the future returned from flush()
   to re-throw the exception if the operation failed
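
The ordering above can be sketched as a rough Python/asyncio analogue. This is only an illustration of the control flow, not the actual C++ change (which uses seastar::coroutine::as_future()); the `stop`, `flush` and `remove` names below are hypothetical stand-ins.

```python
import asyncio

async def stop(flush, remove):
    # Hypothetical analogue: 'flush' and 'remove' are async callables
    # standing in for flush() and compaction_manager.remove().
    flush_error = None
    try:
        # Wait for flush() but capture a failure instead of raising right away.
        await flush()
    except Exception as exc:
        flush_error = exc
    # The removal always runs, whether or not flush() failed.
    await remove()
    # Only now re-raise the flush failure, if there was one.
    if flush_error is not None:
        raise flush_error
```

With this shape, a failing flush still propagates its error to the caller, but only after the removal step has run.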

Fixed: scylladb#16751

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16940
2024-01-23 14:56:27 +02:00
Botond Dénes
78ec96f5f3 Merge 'alternator: allow empty tag value' from Nadav Har'El
Alternator incorrectly refuses an empty tag value for TagResource, but DynamoDB does allow this case and it's useful (note that an empty tag key is rightly forbidden). So this short series fixes this case, and adds additional tests for TagResource which covers this case and other cases we forgot to cover in tests.

Fixes #16904.

Closes scylladb/scylladb#16910

* github.com:scylladb/scylladb:
  test/alternator: add more tests for TagResource
  alternator: allow empty tag value
2024-01-23 13:53:30 +02:00
Botond Dénes
26d814d8be Merge 'Configure initial tablets count scaling' from Pavel Emelyanov
There are currently two ways to "request" the number of initial tablets for a table:

1. specify it explicitly when creating a keyspace
2. let scylla calculate it on its own

Both are not very nice. The former doesn't take cluster layout into consideration. The latter does, but starts with one tablet per shard, which can be too low if the amount of data grows rapidly.

Here's a (maybe temporary) proposal to facilitate at least perf tests -- the --tablets-initial-scale-factor option that enhances option number two above by multiplying the calculated number of tablets by the configured number. This is what we currently do to run perf tests by patching scylla; with the option it is going to be more convenient.
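
As a minimal sketch of the arithmetic only: the function below assumes the baseline of one tablet per shard described above and an integer factor. The function name and signature are hypothetical, and the real allocator may apply further adjustments that are not shown here.

```python
def initial_tablet_count(shard_count: int, scale_factor: int = 1) -> int:
    # Hypothetical helper, not the allocator's real API.
    # Baseline from the text: one tablet per shard in the cluster.
    calculated = shard_count
    # The proposed option multiplies the calculated count by the configured factor.
    return calculated * scale_factor
```

For example, under these assumptions a cluster with 24 shards in total and a factor of 4 would start with 96 tablets instead of 24.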

Closes scylladb/scylladb#16919

* github.com:scylladb/scylladb:
  config: Add --tablets-initial-scale-factor
  tablet_allocator: Add initial tablets scale to config
  tablet_allocator: Add config
2024-01-23 13:25:12 +02:00
Amnon Heiman
50b3078916 main.cc: Add prometheus_allow_protobuf command line
This patch adds the prometheus_allow_protobuf command line support.

When set to true, Prometheus will accept protobuf requests and will
reply with protobuf protocol.
This will also enable the experimental Prometheus Native Histograms.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-01-23 13:12:34 +02:00
Amnon Heiman
95d1146fea histogram_metrics_helper: support native histogram
approx_exponential_histogram uses logic similar to that of the Prometheus
native histogram. To allow Prometheus to send its data in a native histogram
format, it needs to report the schema and min id (id of the first bucket).

This patch updates to_metrics_histogram to set those optional parameters,
leaving it to Prometheus to decide in what format the histogram will
be reported.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-01-23 13:12:34 +02:00
Amnon Heiman
fc9bd2de03 config: Add prometheus_allow_protobuf flag
Native histograms (also known as sparse histograms) are an experimental
Prometheus feature. They use protobuf as the reporting layer.  The
prometheus_allow_protobuf flag allows the user to enable protobuf
protocol. When this flag is set to true, and the Prometheus server sends
in the request that it accepts protobuf, the result will be in protobuf
protocol.
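
The decision described above can be sketched as follows. This is a simplified illustration, not the Seastar implementation: `choose_format` is a hypothetical helper, and a real server would parse the Accept header properly rather than use a substring check.

```python
def choose_format(allow_protobuf: bool, accept_header: str) -> str:
    # Hypothetical helper: simplified substring check on the Accept header.
    wants_protobuf = "application/vnd.google.protobuf" in accept_header
    # Protobuf is used only when the flag is set AND the client asked for it.
    if allow_protobuf and wants_protobuf:
        return "protobuf"
    return "text"
```

Both conditions must hold: with the flag off, a protobuf-capable client still receives the text format.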

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2024-01-23 13:12:07 +02:00
Piotr Dulikowski
79c3ed7fdb service: move topology mutation builder out of storage_service
The topology_mutation_builder, topology_node_mutation_builder and
topology_request_tracking_mutation_builder are currently used by
storage service - mainly, but not exclusively, by the topology
coordinator logic. As we are going to extract the topology coordinator
to a separate file, we need to move the builders to their own file as
well so that they will be accessible both by the topology coordinator
and the storage service.
2024-01-23 11:17:46 +01:00
Piotr Dulikowski
6f11651222 storage_service: detemplate topology_node_mutation_builder::set
One of the overloads of `topology_node_mutation_builder::set` is a
template which takes a std::set of things that convert to a sstring.
This was done to support sets of strings of different types (e.g.
sstring, string_view) but it turns out that only sstring is used at the
moment.

De-template the method as it is unnecessary for it to be a template.
Moreover, the `topology_node_mutation_builder` is going to be moved in
the next commit of the PR to a separate file, so not having template
methods makes the task simpler.
2024-01-23 11:17:46 +01:00
Nadav Har'El
830e52008d test/alternator: add more tests for TagResource
Issue #16904 discovered that Alternator refuses to allow an empty tag
value while it's useful (and DynamoDB allows it). This brought to my
attention that our test coverage of the TagResource operation was lacking.
So this patch adds more tests for some corner cases of TagResource which
we missed, including the allowed lengths of tag keys and values.

These tests reproduce #16904 (the case of empty tag value) and also #16908
(allowing and correctly counting unicode letters), and also add
regression testing to cases which we already handled correctly.

As usual, all the new tests also pass on DynamoDB.

Refs #16904
Refs #16908

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 11:55:22 +02:00
Nadav Har'El
08b26269d8 alternator: allow empty tag value
The existing code incorrectly forbade setting a tag on a table to an empty
string value, but this is allowed by DynamoDB and is useful, so we fix it
in this patch.

While at it, improve the error-checking code for tag parameters to
cleanly detect more cases (like missing or non-string keys or values).

The following patch is a test that fails before this patch (because
it fails to insert a tag with an empty value) and passes after it.

Fixes #16904.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 11:26:08 +02:00
Michał Jadwiszczak
49544c47a1 reader_concurrency_semaphore: add name of semaphore in tracing messages 2024-01-23 10:25:34 +01:00
Michał Jadwiszczak
aac90c1f92 cql3:query_processor: add logged user to query tracing info 2024-01-23 10:25:34 +01:00
Nadav Har'El
4d6b286345 test/alternator: add "--vnodes" option to run script
test/cql-pytest/run.py was recently modified to add the "tablets"
experimental feature, so test/alternator/run now also runs Scylla by
default with tablets enabled.

This is the correct default going forward, but in the short term it
would be nice to also have an option to easily do a manual test run
*without* tablets.

So this patch adds a "--vnodes" option to the test/alternator/run script.
This option causes "run" to run Scylla without enabling the "tablets"
experimental feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 10:53:23 +02:00
Nadav Har'El
c496d60716 alternator: use tablets by default, if available
Before this patch, Alternator tables did not use tablets even if this
feature was available - tablets had to be manually enabled per table
by using a tag. But recently we changed CQL to enable tablets by default
on all keyspaces (when the experimental "tablets" option is turned on),
so this patch does the same for Alternator tables:

1. When the "tablets" experimental feature is on, new Alternator tables
   will use tablets instead of vnodes. They will use the default choice
   of initial_tablets.

2. The same tag that in the past could be used to enable tablets on a
   specific table, now can be used to disable tablets or change the
   default initial_tablets for a specific table at creation time.

Fixes #16355

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 10:53:23 +02:00
Nadav Har'El
36f14f89df test/alternator: run some tests without tablets
If an Alternator table uses tablets (we'll turn this on in a following
patch), some tests are known to fail because of features not yet
supported with tablets, namely:

  Refs #16317 - Support Alternator Streams with tablets (CDC)
  Refs #16567 - Support Alternator TTL with tablets

This patch changes all tests failing on tablets due to one of these two
known issues to explicitly ask to disable tablets when creating their
test table. This means that at least we continue to test these two
features (Streams and TTL) even if they don't yet work with tablets.

We'll need to remember to remove this override when tablet support
for CDC and Alternator TTL arrives. I left a comment in the right
places in the code with the relevant issue numbers, to remind us what
to change when we fix those issues.

This patch also adds xfail_tablets and skip_tablets fixtures that can
be used to xfail or skip tests when running with tablets - but we
don't use them yet - and may never use them, but since I already wrote
this code it won't hurt having it, just in case. When running without
tablets, or against an older Scylla or on DynamoDB, the tests with
these marks are run normally.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-23 10:46:48 +02:00
Botond Dénes
08cf5ccd23 Merge 'Fix test_tablet_missing_data_repair' from Asias He
This PR fixes test_tablet_missing_data_repair and enables the test again.

If a node is not UP yet, repair in the test will be a partial repair. The partial repair will not repair all the data, which causes the check of rows after repair to fail. Check that nodes see each other as UP before repair.

Closes scylladb/scylladb#16930

* github.com:scylladb/scylladb:
  test: Enable test_tablet_missing_data_repair again
  test: Wait for nodes to be up when repair
  test: Check repair status in ScyllaRESTAPIClient
2024-01-23 10:38:13 +02:00
Anna Stuchlik
9076a944c5 doc: improve the ScyllaDB for Developers page
This commit improves the developer-oriented section
of the core documentation:

- Added links to the developer sections in the new
  Get Started guide (Develop with ScyllaDB and
  Tutorials and Example Projects) for ease of access.

- Replaced the outdated Learn to Use ScyllaDB page with
  a link to the up-to-date page in the Get Started guide.
  This involves removing the learn.rst file and adding
  an appropriate redirection.

- Removed the Apache Copyrights, as this page does not
  need it.

- Removed the Features panel box as there was only one
  feature listed, which looked weird. Also, we are in
  the process of removing the Features section.

Closes scylladb/scylladb#16800
2024-01-23 10:06:31 +02:00
Kefu Chai
ac473eca91 utils:: add formatter for enum_option
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for enum_option<>. since its
operator<<() is still used by the homebrew generic formatter for
formatting vector<>, operator<<() is preserved.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16917
2024-01-23 10:03:51 +02:00
Kefu Chai
91a93b125b utils:: add formatter for cql3::authorized_prepared_statements_cache_key
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for
cql3::authorized_prepared_statements_cache_key, and remove its
operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16924
2024-01-23 09:13:14 +02:00
Kefu Chai
76b9e4f4f4 locator: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16914
2024-01-23 09:12:23 +02:00
Asias He
99e3d2ce72 test: Enable test_tablet_missing_data_repair again
Fixes #16859
2024-01-23 15:02:02 +08:00
Kefu Chai
db77587309 tracing: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16925
2024-01-23 08:57:11 +02:00
Kefu Chai
26004071b3 configure.py: reenable -Wnarrowing
it seems that the tree builds just fine with this warning enabled.
and narrowing is a potentially unsafe numeric conversion. so let's
enable this warning option.

this change also helps to reduce the difference between the rules
generated by configure.py and those generated by CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16929
2024-01-23 08:49:25 +02:00
Kefu Chai
5005e0a156 configure.py: s/--std=/-std/
neither clang nor gcc supports the --std flag, they support -std=
though. see https://clang.llvm.org/cxx_status.html and
https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
so, let's use the -std=gnu++20 for the C++20 standard with GNU
extensions.

this change also helps to reduce the difference between the rules
generated by `configure.py` and those generated by CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16928
2024-01-23 08:48:05 +02:00
Asias He
7c230f17cc test: Wait for nodes to be up when repair
If a node is not UP yet, repair in the test will be a partial repair.
Check nodes see each other as UP before repair.
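
A rough sketch of such a check, assuming a hypothetical `get_views` callable that returns, for each node, the set of nodes it currently sees as UP. The actual test uses the test framework's own helpers, which are not shown here.

```python
import time

def all_up(views):
    # views maps each node to the set of nodes it currently sees as UP.
    nodes = set(views)
    # Every node must see all the other nodes as UP.
    return all(nodes - {node} <= seen for node, seen in views.items())

def wait_all_up(get_views, timeout=60.0, interval=0.5):
    # Poll until every node sees every other node as UP, or give up.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if all_up(get_views()):
            return True
        time.sleep(interval)
    return False
```

Repair would then be started only after `wait_all_up` returns True.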

Fixes #16859
2024-01-23 11:10:08 +08:00
Asias He
57a4e5594d test: Check repair status in ScyllaRESTAPIClient
Raise an exception in case the repair is not successful.
2024-01-23 11:10:08 +08:00
Tomasz Grabiec
06c42681bd tests: tablets: Add tests for removenode and replace 2024-01-23 01:19:42 +01:00
Tomasz Grabiec
e5dcf03b88 tablets: Add support for removenode and replace handling
New tablet replicas are allocated synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.

Tablet scheduler is responsible for scheduling tablet transition which
changes the replicas set. The infrastructure for handling decommission
in tablet scheduler is reused for this.

Scheduling is done incrementally, respecting per-shard load
limits. Rebuilding transitions are recognized by load calculation to
affect all tablet replicas.

New kind of tablet transition is introduced called "rebuild" which
adds new tablet replica and rebuilds it from existing replicas. Other
than that, the transition goes through the same stages as regular
migration to ensure safe synchronization with request coordinators.

In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.

Fixes #16690.
2024-01-23 01:19:42 +01:00
Tomasz Grabiec
bdd5bdae14 topology_coordinator: tablets: Do not fail in a tight loop
If streaming or cleanup RPC fails, we would retry immediately. That
fills the logs with errors. Throttle them by sleeping on error before
the same action is retried.
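
The throttling idea can be sketched in Python/asyncio as below. It is an analogue of the pattern only: the `retry_throttled` name, the fixed delay and the optional attempt limit are assumptions for illustration, not the coordinator's actual code.

```python
import asyncio

async def retry_throttled(action, delay_seconds=1.0, max_attempts=None):
    # Hypothetical helper: 'action' is an async callable that may fail.
    attempt = 0
    while True:
        attempt += 1
        try:
            return await action()
        except Exception:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # Sleep before retrying so repeated failures do not flood the logs.
            await asyncio.sleep(delay_seconds)
```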
2024-01-23 01:19:42 +01:00
Tomasz Grabiec
a3f6682ba2 topology_coordinator: tablets: Avoid warnings about ignored failed future 2024-01-23 01:18:10 +01:00
Tomasz Grabiec
5fccee3a13 storage_service, topology: Track excluded state in locator::topology
Will be used by tablet load balancer to avoid excluded nodes in
scheduling.
2024-01-23 01:12:58 +01:00
Tomasz Grabiec
d59db94f3c raft topology: Introduce param-less topology::get_excluded_nodes()
Picks up currently excluded nodes. Will be used during tablet rebuild
on removenode.
2024-01-23 01:12:58 +01:00
Tomasz Grabiec
d053c5ef1e raft topology: Move get_excluded_nodes() to topology
Will be accessed outside topology coordinator from tablet rebuild handler.
2024-01-23 01:12:58 +01:00
Tomasz Grabiec
92f01674f2 tablets: load_balancer: Generalize load tracking
This patch removes some duplication of logic and implicit assumptions
by creating clear algebra for load impact calculation and its
application to state of the load balancer.

Will make adding new kinds of tablet transitions with different impact
on load much easier.
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
649ca0e46c tablets: Introduce get_migration_streaming_info() which works on migration request
Will be used by tablet load balancer to compute impact on load of
planned migrations. Currently, the logic is hard coded in the load
balancer and may get out of sync with the logic we have in
get_migration_streaming_info() for already running tablet transitions.

The logic will become more complex for rebuild transition, so use
shared code to compute it.
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
6dc56fd80b tablets: Move migration_to_transition_info() to tablets.hh 2024-01-23 01:12:57 +01:00
Tomasz Grabiec
1df256221c tablets: Extract get_new_replicas() which works on migration request
Now we have a single place which translates tablet migration request to new
replicas.

Will be reused in other places.
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
ae382196f1 tablets: Move tablet_migration_info to tablets.hh
Will add methods which operate on it to tablets.hh where they belong.
2024-01-23 01:12:57 +01:00
Tomasz Grabiec
4a06ffb43c tablets: Store transition kind per tablet
Will be used to distinguish regular migration from rebuild, repair and
RF change.
2024-01-23 01:12:57 +01:00
Pavel Emelyanov
d1d4620af8 config: Add --tablets-initial-scale-factor
Previous patch taught tablets allocator to multiply the initial tablets
count by some value. This patch makes this factor configurable

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-22 19:18:18 +03:00
Pavel Emelyanov
eb3b237e05 tablet_allocator: Add initial tablets scale to config
When allocating tablets for a table for the first time, their initial count
is calculated so that each shard in a cluster gets one tablet. It may
happen that more than one initial tablet per shard is better, e.g. perf
tests typically rely on that.

It's possible to specify the initial tablets count when creating a
keyspace, but this number doesn't take the cluster topology into
consideration and may also be not very nice.

As a temporary solution (e.g. for perf tests) we may add a configurable
that scales the initial number of calculated tablets by some factor

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-22 19:14:45 +03:00
Pavel Emelyanov
f57b194db0 tablet_allocator: Add config
Tablet allocator is a sharded service, that starts in main, it's worth
equipping it with a config. Next patches will fill it with some payload

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-22 19:13:58 +03:00
Kamil Braun
3268be3860 raft: server: track last persisted snapshot descriptor index
Also introduce a condition variable notified whenever this index is
updated.

Will be used in following commits.
2024-01-22 16:48:08 +01:00
Kamil Braun
1e786d9d64 raft: server: framework for handling server requests
Add data structures and modify `io_fiber` code to prepare it for
handling requests generated by the `server`, not just `fsm`.
Used in later commits.
2024-01-22 16:47:34 +01:00
Kefu Chai
33794eca19 database: wait until commitlog are reclaimed in flush_all_tables()
this change addresses the possible data resurrection after
"nodetool compact" and "nodetool flush" commands. and prepare for
the fix of a similar data resurrection issue after "nodetool cleanup".

active commitlog segments are recycled in the background once they are
discarded.

and there is a chance that we could have data resurrection even after
"nodetool cleanup", because the mutations in commitlog's active segments
could change the tables which are supposed to be removed by
"nodetool cleanup", so as a solution to address this problem in the
pre-tablets era, we force new active segments of commitlog, and flush the
involved memtables. since the active segments are discarded in the
background, the completion of the "nodetool cleanup" does not guarantee
that these mutations won't be applied to memtable when server restarts,
if it is killed right away.

the same applies to "force_flush", "force_compaction" and
"force_keyspace_compaction" API calls which are used by nodetool as
well. quote from Benny's comment

> If major compaction doesn't wait for the commitlog deletion it is
> also exposed to data resurrection since theoretically it could purge
> tombstones based on the assumption that commitlog would not resurrect
> data that they might shadow, BUT on a crash/restart scenario commitlog
> replay would happen since the commitlog segments weren't deleted -
> breaking the contract with compaction.

so to ensure that the active segments are reclaimed upon completion of
"nodetool cleanup", "nodetool compact" and "nodetool flush" commands,
let's wait for pending deletes in `database::flush_all_tables()`, so the
caller waits until the reclamation of deleted active segments completes.

Refs #4734
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16915
2024-01-22 17:31:57 +02:00
David Garcia
f3eeba8cc6 docs: parse config.cc properties as rst text
This enhancement formats descriptions in config.cc using the standard markup language reStructuredText (RST).

By doing so, it improves the rendering of these descriptions in the documentation, allowing you to use various directives like admonitions, code blocks, ordered lists, and more.

Closes scylladb/scylladb#16311
2024-01-22 16:40:18 +02:00
Botond Dénes
a48881801a replica/tablets: drop keyspace_name from system.tablets partition-key
The name of the keyspace being part of the partition key is not useful,
the table_id already uniquely identifies the table. The keyspace name
being part of the key, means that code wanting to interact with this
table, often has to resolve the table id, just to be able to provide the
keyspace name. This is counter productive, so make the keyspace_name
just a static column instead, just like table_name already is.

Fixes: #16377

Closes scylladb/scylladb#16881
2024-01-22 13:12:02 +01:00
Petr Gusev
6a4176c84f Update seastar submodule
* seastar 8b9ae36b...85359b28 (4):
  > rpc: extend the use_gate until request processing is finished

Fixes scylladb/scylladb#16382

  > scripts: Remove build.sh
  > build: do not install FindProtobuf.cmake
  > net: add missing include

Closes scylladb/scylladb#16883
2024-01-22 11:29:50 +01:00
Kamil Braun
1007ac4956 Merge 'sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes' from Petr Gusev
Before the patch we called `gossiper.remove_endpoint` for IP-s of the
left nodes. The problem is that in replace-with-same-ip scenario we
called `gossiper.remove_endpoint` for IP which is used by the new,
replacing node. The `gossiper.remove_endpoint` method puts the IP into
quarantine, which means gossiper will ignore all events about this IP
for `quarantine_delay` (one minute by default). If we immediately
replace just replaced node with the same IP again, the bootstrap will
fail since the gossiper events are blocked for this IP, and we won't be
able to resolve an IP for the new host_id.

Another problem was that we called gossiper.remove_endpoint method,
which doesn't remove an endpoint from `_endpoint_state_map`, only from
live and unreachable lists. This means the IP will keep circulating in
the gossiper message exchange between cluster nodes until full cluster
restart.

This patch fixes both of these problems. First, we rely on the fact that
when topology coordinator moves the `being_replaced` node to the left
state, the IP of the `replacing` node is known to all nodes. This means
before removing an IP from the gossiper we can check if this IP is
currently used by another node in the current raft topology. This is
done by constructing the `used_ips` map based on normal and transition
nodes. This map is cached to avoid quadratic behaviour.

Second, we call `gossiper.force_remove_endpoint`, not
`gossiper.remove_endpoint`. This function removes an IP from
`_endpoint_state_map`, as well as from live and unreachable lists.
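
The IP check described above can be sketched as follows. This is a simplified illustration: the function name and the host_id-to-IP dictionaries are hypothetical, and a set is used here where the patch describes a cached `used_ips` map.

```python
def ips_to_force_remove(left_nodes, normal_nodes, transition_nodes):
    # Each argument maps host_id -> IP (hypothetical representation).
    # IPs still in use by normal or transitioning nodes must be kept.
    used_ips = set(normal_nodes.values()) | set(transition_nodes.values())
    # Only IPs of left nodes that nobody else uses are safe to remove.
    return [ip for ip in left_nodes.values() if ip not in used_ips]
```

In a replace-with-same-IP scenario, the left node's IP also appears among the normal or transitioning nodes, so it is not removed from the gossiper.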

Closes scylladb/scylladb#16820

* github.com:scylladb/scylladb:
  get_peer_info_for_update: update only required fields in raft topology mode
  get_peer_info_for_update: introduce set_field lambda
  storage_service::on_change: fix indent
  storage_service::on_change: skip handle_state functions in raft topology mode
  test_replace_different_ip: check old IP is removed from gossiper
  test_replace: check two replace with same IP one after another
  storage_service: sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes
2024-01-22 11:25:55 +01:00
Botond Dénes
742bc1bd11 test/topology_experimental_raft: test_tablet.py: disable flaky test
Skip test_tablet_missing_data_repair, it is failing a lot breaking
promotion and CI. Can't revert because the PR introducing it was already
piled on. So disable while investigated.

Refs: #16859

Closes scylladb/scylladb#16879
2024-01-22 11:49:05 +02:00
Avi Kivity
9e8b65f587 chunked_vector: remove range constructor
Standard containers don't have constructors that take ranges;
instead people use boost::copy_range or C++23 std::ranges::to.

Make the API more uniform by removing this special constructor.

The only caller, in a test, is adjusted.

Closes scylladb/scylladb#16905
2024-01-22 10:26:15 +02:00
Lakshmi Narayanan Sreethar
a1867986e7 test.py: deduce correct path for unit tests when built with cmake
Fix the path deduction for unit test executables when the source code is
built with cmake.

Fixes #16906

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16907
2024-01-22 10:03:44 +02:00
Nadav Har'El
0bef50ef0c cql-pytest: add "--vnodes" option to "run" script
Running test/cql-pytest/run now defaults to enabling the "tablets"
experimental feature when running Scylla - and tests detect this and
use this feature as appropriate. This is the correct default going
forward, but in the short term it would be nice to also have an
option to easily do a manual test run *without* tablets.

So this patch adds a "--vnodes" option to the test/cql-pytest/run
script. This option causes "run" to run Scylla without enabling the
"tablets" experimental feature.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16896
2024-01-22 09:35:11 +02:00
Anna Stuchlik
a462b914cb doc: add 2024.1 to the OSS vs. Enterprise matrix
This commit adds the information that
ScyllaDB Enterprise 2024.1 is based
on ScyllaDB Open Source 5.4
to the OSS vs. Enterprise matrix.

Closes scylladb/scylladb#16880
2024-01-22 09:25:08 +02:00
Kefu Chai
9550f29d22 cql3: add formatter for cql3::prepared_cache_key_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for cql3::prepared_cache_key_type
and cql3::prepared_cache_key_type::cache_key_type, and remove
their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16901
2024-01-21 19:12:59 +02:00
Avi Kivity
3092e3a5dc Merge 'doc: improvements to the Create Cluster page' from Anna Stuchlik
This PR:
- Removes the redundant information about previous versions from the Create Cluster page.
- Fixes language mistakes on that page, and replaces "Scylla" with "ScyllaDB".

(nobackport)

Closes scylladb/scylladb#16885

* github.com:scylladb/scylladb:
  doc: fix the language on the Create Cluster page
  doc: remove redundant info about old versions
2024-01-21 18:18:32 +02:00
Avi Kivity
5810396ba1 Merge 'Invalidate prepared statements for views when their schema changes.' from Eliran Sinvani
When a base table is altered, so are the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for row lifetime (virtual
columns)).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect, as of today, is that prepared statements for views
which changed due to a base table change will be invalidated.

Fixes https://github.com/scylladb/scylladb/issues/16392

This series also adds a test which fails without this fix and passes when the fix is applied.

Closes scylladb/scylladb#16897

* github.com:scylladb/scylladb:
  Add test for mv prepared statements invalidation on base alter
  query processor: treat view changes at least as table changes
2024-01-21 17:43:49 +02:00
Kefu Chai
d1dd71fbd7 mutation: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16889
2024-01-21 16:58:26 +02:00
Kefu Chai
1ce58595aa dht: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16891
2024-01-21 16:56:16 +02:00
Kefu Chai
45c4f2039b cql3: add formatter for cql3::ut_name
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for cql3::ut_name, and remove
their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16890
2024-01-21 16:53:05 +02:00
Kefu Chai
f916286b25 index: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16892
2024-01-21 16:52:25 +02:00
Kefu Chai
ce076b5ae3 gossiping_property_file_snitch: drop unused using namespace
we don't use any symbol in this namespace, in this function, so drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16893
2024-01-21 16:48:37 +02:00
Eliran Sinvani
0e5a8cad62 Add test for mv prepared statements invalidation on base alter
Issue #16392 describes a bug where, when a base table is altered, its
materialized views' prepared statements are not invalidated, which in turn
causes them to return missing data.
This test reproduces this bug and serves as a regression test for this
problem.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-21 15:44:06 +02:00
Eliran Sinvani
5e33d9346b query processor: treat view changes at least as table changes
When a base table is altered, so are the views that might refer to the
added column (this includes "SELECT *" views and also views that might
need to use this column for row lifetime, i.e. virtual columns).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect, as of today, is that prepared statements for views
which changed due to a base table change will be invalidated.
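
The delegation described above can be pictured with a small Python toy model. This is only a sketch, not ScyllaDB's C++ query processor: the class, the cache layout, and the `prepare` helper are hypothetical, and only the names `on_update_view` and `on_update_column_family` come from this commit message.

```python
# Toy model of the delegation described above; not the real query processor.
class QueryProcessorModel:
    def __init__(self):
        # Hypothetical cache: (keyspace, table) -> list of prepared statement texts.
        self.prepared = {}

    def prepare(self, ks, table, statement):
        self.prepared.setdefault((ks, table), []).append(statement)

    def on_update_column_family(self, ks, table):
        # Drop every prepared statement that refers to the changed table.
        self.prepared.pop((ks, table), None)

    def on_update_view(self, ks, view):
        # Previously a no-op; a view is a table, so treat it at least as one.
        self.on_update_column_family(ks, view)


qp = QueryProcessorModel()
qp.prepare("ks", "mv", "SELECT * FROM ks.mv WHERE k = ?")
qp.on_update_view("ks", "mv")
print(qp.prepared)
```

In this model the view's statements are gone after the view-update notification, so a client would have to prepare them again against the new schema.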

Fixes #16392

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-21 15:40:54 +02:00
Anna Stuchlik
652cf1fa70 doc: remove the 5.1-to-2022.2 upgrade guide
This commit removes the 5.1-to-2022.2 upgrade
guide - the upgrade guide for versions we
no longer support.
We should remove it while adding the 5.4-to-2024.1
upgrade guide (the previous commit).
2024-01-19 18:33:08 +01:00
Anna Stuchlik
3c17fca363 doc: add the 5.4-to-2024.1 upgrade guide
This commit adds the upgrade guide from
ScyllaDB Open Source 5.4 to ScyllaDB
Enterprise 2024.1.

The need to include the "Restore system tables"
step in rollback has been confirmed; see
https://github.com/scylladb/scylladb/issues/11907#issuecomment-1842657959

Fixes https://github.com/scylladb/scylladb/issues/16445
2024-01-19 18:23:37 +01:00
Petr Gusev
5de970e430 get_peer_info_for_update: update only required fields in raft topology mode
Some fields of system.peers table are updated
through raft, so we don't need to read them from gossiper.

The goal of the patch is to declare explicitly
which code is responsible for which fields.
In particular, in raft topology mode we don't
need to update raft-managed fields since
it's done in topology_state_load and
raft_ip_address_updater.
2024-01-19 20:37:12 +04:00
Petr Gusev
f51f843b67 get_peer_info_for_update: introduce set_field lambda
This is a refactoring commit. In the next commit
we'll add a parameter to this unified lambda and
this is easy to do if we have only one lambda and
not three.
2024-01-19 20:37:12 +04:00
Petr Gusev
37063e2432 storage_service::on_change: fix indent 2024-01-19 20:37:12 +04:00
Petr Gusev
8e6b569de5 storage_service::on_change: skip handle_state functions in raft topology mode
We don't need them in raft topology mode since the token_metadata
update happens in topology_state_load function. We lift the
_raft_topology_change_enabled checks from those functions to on_change.
2024-01-19 20:37:12 +04:00
Petr Gusev
1e00889842 test_replace_different_ip: check old IP is removed from gossiper
In this commit we modify the existing
test_replace_different_ip. We add the check that the old
IP is not contained in alive or down lists, which
means it's completely wiped from gossiper. This test is failing
without the force_remove_endpoint fix from
a previous commit. We also check that the state of
local system.peers table is correct.
2024-01-19 20:36:52 +04:00
Anna Stuchlik
d345a893d6 doc: fix the language on the Create Cluster page
This commit fixes language mistakes on
the Create Cluster page, and replaces
"Scylla" with "ScyllaDB".
2024-01-19 17:21:12 +01:00
Anna Stuchlik
af669dd7ae doc: remove reduntant info about old versions
This commit removes the information about
old versions, which is redundant in the
upcoming version.
2024-01-19 17:06:34 +01:00
Anna Stuchlik
b1ba904c49 doc: remove upgrade for unsupported versions
This commit removes the upgrade guides
from ScyllaDB Open Source to Enterprise
for versions we no longer support.

In addition, it removes a link to
one of the removed pages from
the Troubleshooting section (the link is
redundant).

Closes scylladb/scylladb#16249
2024-01-19 15:59:35 +02:00
Mikołaj Grzebieluch
c589793a9e test.py: test_maintenance_socket: remove pytest.xfail
Issue https://github.com/scylladb/python-driver/issues/278 was fixed in
https://github.com/scylladb/python-driver/pull/279.

Closes scylladb/scylladb#16873
2024-01-19 14:54:15 +01:00
Botond Dénes
b50d9bb802 Merge 'Add code coverage support' from Eliran Sinvani
This mini-set includes code coverage support for ScyllaDB, it provides:
1. Support for building ScyllaDB with coverage support.
2. Utilities for processing coverage profiling data
3. test.py support for generating and processing coverage profiling data into lcov trace files, which can later be used to produce HTML or textual coverage reports.

Refs #16323

Closes scylladb/scylladb#16784

* github.com:scylladb/scylladb:
  Add code coverage documentation
  test.py: support code coverage
  code coverage: Add libraries for coverage handling
  test.py: support --coverage and --coverage-mode
  configure.py support coverage profiles on standrad build modes
2024-01-19 15:27:44 +02:00
Pavel Emelyanov
e62114214f Merge 'More logging for Raft-based topology' from Kamil Braun
Currently if topology coordinator gets stuck in a CI test run it's hard to debug this (e.g. scylladb/scylladb#16708). We can add a lot of logging inside topology coordinator code to aid debugging, without spamming the logs -- these are relatively rare control plane events.

Closes scylladb/scylladb#16749

* github.com:scylladb/scylladb:
  test/pylib: scylla_cluster: enable raft_topology=debug level by default
  raft topology: increase level of some TRACE messages
  raft topology: log when entering transition states
  raft topology: don't include null ID in exclude_nodes
  raft topology: INFO log when executing global commands and updating topology state
  storage_service: separate logger for raft topology
2024-01-19 16:19:44 +03:00
Nadav Har'El
debf6753c7 Merge 'test/cql-pytest: run tests with tablets' from Botond Dénes
Add `--experimental-features=tablets` to both `test/cql-pytest/suite.yaml` and `test/cql-pytest/run.py`, so tablets are enabled. Detect tablet support in `conftest.py` and add an xfail and a skip marker to mark tests that fail/crash with tablets. These are expected to be fixed soon.

Some tests checking things around alter-keyspace had to force-disable tablets on the created keyspace, because tablets interfere with the test (a keyspace with tablets cannot have simple strategy, for example).
Tablets were also interfering with `test_keyspace.py:test_storage_options_local`, because it expects `system_schema.scylla_keyspaces` to have no entries for a keyspace with local storage, but such an entry exists if tablets are enabled. Adjust the test to account for this.

Closes scylladb/scylladb#16840

* github.com:scylladb/scylladb:
  test/cql-pytest: run.py,suite.yaml: enable tablets by default
  test/cql-pytest: sprinkle xfail_tablets and skip_with_tablets as needed
  test/cql-pytest: disable tablets for some keyspace-altering tests
  test/cql-pytest: test_keyspace.py: test_storage_options_local(): fix for tablets
  test/cql-pytest: fix test_tablets.py to set initial_tablets correctly
  test/cql-pytest: add tablet detection logic and fixtures
  test/cql-pytest: extract is_scylla check into util.py
2024-01-19 13:38:56 +02:00
Kamil Braun
cc039498c6 Update tools/cqlsh submodule
* tools/cqlsh 426fa0ea...b8d86b76 (8):
  > Make cqlsh work with unix domain sockets

Fixes scylladb/scylladb#16489

  > Bump python-driver version
  > dist/debian: add trailer line
  > dist/debian: wrap long line
  > Draft: explicit build-time packge dependencies
  > stop retruning status_code=2 on schema disagreement
  > Fix minor typos in the code
  > Dockerfile: apt-get update and apt-get upgrade to get latest OS packages
2024-01-19 11:23:22 +01:00
Botond Dénes
04881b3915 test/cql-pytest: run.py,suite.yaml: enable tablets by default
All the preparations are done, the tests can now run with tablets.
2024-01-19 03:46:38 -05:00
Botond Dénes
075be5a04a test/cql-pytest: sprinkle xfail_tablets and skip_with_tablets as needed
For tests that cover functionality which doesn't yet work with tablets.
These tests, and the respective functionality they test, are expected to
be fixed soon, and then these fixtures will be removed.
2024-01-19 03:46:38 -05:00
Botond Dénes
6e6bee4368 test/cql-pytest: disable tablets for some keyspace-altering tests
When tablets are enabled on a keyspace, it cannot be altered to use the
simple replication strategy anymore.
These tests are testing exactly that, so disable tablets in the
initial keyspace create statements.
2024-01-19 03:46:38 -05:00
Botond Dénes
5f11aa940d test/cql-pytest: test_keyspace.py: test_storage_options_local(): fix for tablets
This test expects a keyspace with the local storage option to not have a
row in system_schema.scylla_keyspaces. With tablets enabled by default,
this won't be the case. Adjust the test to check for the specific
storage-related columns instead.
2024-01-19 03:46:38 -05:00
Nadav Har'El
f92d2b4928 test/cql-pytest: fix test_tablets.py to set initial_tablets correctly
Recently, in commit 49026dc319, the
way to choose the number of tablets in a new keyspace changed.
This broke the test we had for a memory leak when many tablets were
used, which saw the old syntax wasn't recognized and assumed Scylla
is running without tablet support - so the test was skipped.

Let's fix the syntax. After this patch the test passes if the tablets
experimental feature is enabled, and only skipped if it isn't.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-19 03:46:38 -05:00
Botond Dénes
2119faf7fe test/cql-pytest: add tablet detection logic and fixtures
Add keyspace_has_tablets() utility function, which, given a keyspace,
returns whether it is using tablets or not.
In addition, 3 new fixtures are added:
* has_tablets - does scylla have tablets by default?
* xfail_tablets - the test is marked xfail, when tablets are enabled by
  default.
* skip_with_tablets - the test is skipped when tablets are enabled by
  default, because it might crash with tablets.

We expect the latter two to be removed soon(ish), as we make all tests,
and the functionality they test, work with tablets.
2024-01-19 03:46:38 -05:00
Botond Dénes
6e53264bc3 test/cql-pytest: extract is_scylla check into util.py
This logic is currently in the scylla_only fixture, but we want to
re-use this in other utility functions in the next patches too.
2024-01-19 03:46:38 -05:00
Petr Gusev
070de5c551 test_replace: check two replace with same IP one after another
This is a test case for the problem, described in the
previous commit. Before that fix the second replace
failed since it couldn't resolve an IP for the new host_id.
2024-01-19 12:24:04 +04:00
Petr Gusev
30b2e5838c storage_service: sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes
Before the patch we called gossiper.remove_endpoint for the IPs
of the left nodes. The problem is that in the replace-with-same-ip
scenario we called gossiper.remove_endpoint for an IP which is
used by the new, replacing node. The gossiper.remove_endpoint
method puts the IP into quarantine, which means gossiper will
ignore all events about this IP for quarantine_delay (one minute by
default). If we immediately replace the just-replaced node with
the same IP again, the bootstrap will fail since the gossiper
events are blocked for this IP, and we won't be able to
resolve an IP for the new host_id.

Another problem was that we called gossiper.remove_endpoint
method, which doesn't remove an endpoint from _endpoint_state_map,
only from live and unreachable lists. This means the IP
will keep circulating in the gossiper message exchange between cluster
nodes until full cluster restart.

This patch fixes both of these problems. First, we rely on
the fact that when topology coordinator moves the being_replaced
node to the left state, the IP of the replacing node is known to all nodes.
This means before removing an IP from the gossiper we can check if
this IP is currently used by another node in the current raft topology.
This is done by constructing the used_ips map based on normal and
transition nodes. This map is cached to avoid quadratic behaviour.

Second, we call gossiper.force_remove_endpoint, not
gossiper.remove_endpoint. This function removes an IP from
_endpoint_state_map, as well as from live and unreachable lists.

The tests for both of these improvements will be added in subsequent
commits.
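
The first fix can be sketched in Python as follows. This is a toy model under stated assumptions, not the storage_service code: the `Node` type, the state names, and the `ips_to_force_remove` helper are hypothetical, and a set stands in for the used_ips map mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    host_id: str
    ip: str
    state: str  # "normal", "transition" or "left" in this toy model


def ips_to_force_remove(nodes):
    # IPs still owned by a normal or transitioning node must stay in gossiper.
    used_ips = {n.ip for n in nodes if n.state in ("normal", "transition")}
    # used_ips is built once and reused for every left node,
    # which avoids a quadratic scan over the topology.
    return sorted({n.ip for n in nodes if n.state == "left" and n.ip not in used_ips})


nodes = [
    Node("old-a", "10.0.0.1", "left"),        # replaced with the same IP
    Node("new-a", "10.0.0.1", "transition"),  # replacing node reuses the IP
    Node("old-b", "10.0.0.2", "left"),        # replaced with a different IP
    Node("new-b", "10.0.0.3", "normal"),
]
print(ips_to_force_remove(nodes))
```

Only the IP that no live node uses is selected for removal; the IP shared with the replacing node is left alone, so it is never quarantined.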
2024-01-19 12:24:04 +04:00
Kefu Chai
0dbb0ed09f api: storage_service: correct a typo
s/trough/through/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16870
2024-01-19 10:21:41 +02:00
Kefu Chai
5c0484cb02 db: add formatter for db::operation_type
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for db::operation_type, and
remove their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16832
2024-01-19 10:16:41 +02:00
Kefu Chai
2d2cd5fa3a repair: do not compare unsigned with signed
this change should silence the warning like

```
/home/kefu/dev/scylladb/repair/repair.cc:222:23: error: comparison of integers of different signs: 'int' and 'size_type' (aka 'unsigned long') [-Werror,-Wsign-compare]
  222 |     for (int i = 0; i < all.size(); i++) {
      |                     ~ ^ ~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16867
2024-01-19 08:52:02 +02:00
Kefu Chai
21d55abe8b unimplemented: add format_as() for unimplemented::cause
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we replace operator<< with format_as() for
unimplemented::cause, so that we don't rely on the deprecated behavior,
and neither do we create a full-blown fmt::formatter. as in
fmt v10, format_as() can be used in place of fmt::formatter,
while in fmt v9, format_as() is only allowed to return an integer.
so, to be future-proof, and to be simpler, format_as() is used.
we can even replace `format_as(c)` with `c`, once fmt v10 is
available in future.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16866
2024-01-19 08:38:30 +02:00
Botond Dénes
70252ee36f Merge 'auth: do not include unused headers' from Kefu Chai
these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning.

Closes scylladb/scylladb#16868

* github.com:scylladb/scylladb:
  auth: do not include unused headers
  locator: Handle replication factor of 0 for initial_tablets calculations
  table: add_sstable_and_update_cache: trigger compaction only in compaction group
  compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
  compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
2024-01-19 08:30:11 +02:00
Kefu Chai
263e2fabae auth: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-19 10:49:17 +08:00
Avi Kivity
d65ce16cf6 Merge 'Prevent empty compaction tasks in cleanup, upgrade sstables, and add_sstable' from Benny Halevy
This short series prevents the creation of compaction tasks when we know in advance that they have nothing to do.
This is possible in the cleanup path by:
- improving the detection of candidates for cleanup by skipping sstables that require cleanup but are already being compacted
- checking that the list of sstables selected for cleanup isn't empty before creating the cleanup task

For upgrade sstables, and generally when rewriting all sstables: launch the task only if the list of candidate sstables isn't empty.

For regular compaction, when triggered via `table::add_sstable_and_update_cache`, we currently trigger compaction (by calling `submit`) on all compaction groups while the sstable is added only to one of them.
Also, it is typically called for maintenance sstables that are awaiting offstrategy compaction, in which case we can skip calling `submit` entirely since the caller triggers offstrategy compaction at a later stage.

Refs scylladb/scylladb#15673
Refs scylladb/scylladb#16694
Fixes scylladb/scylladb#16803

Closes scylladb/scylladb#16808

* github.com:scylladb/scylladb:
  table: add_sstable_and_update_cache: trigger compaction only in compaction group
  compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
  compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
2024-01-18 19:47:33 +02:00
Pavel Emelyanov
8595d64d01 locator: Handle replication factor of 0 for initial_tablets calculations
When calculating per-DC tablets the formula is shards_in_dc / rf_in_dc,
but the denominator in it can be configured to be literally zero and the
division doesn't work.

Fix by assuming zero tablets for dcs with zero rf
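
A minimal Python sketch of the guarded formula, assuming integer arithmetic. The function name and the rounding direction are assumptions of this sketch; the commit message only gives the formula shards_in_dc / rf_in_dc and the zero-rf rule.

```python
def initial_tablets_for_dc(shards_in_dc: int, rf_in_dc: int) -> int:
    # A DC with replication factor 0 holds no replicas, so assume zero
    # tablets instead of dividing by zero.
    if rf_in_dc == 0:
        return 0
    # Rounding down is an assumption of this sketch.
    return shards_in_dc // rf_in_dc


print(initial_tablets_for_dc(12, 3))
print(initial_tablets_for_dc(8, 0))
```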

fixes: #16844

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16861
2024-01-18 19:42:08 +02:00
Kamil Braun
8d9b0a6538 raft: server: inline poll_fsm_output 2024-01-18 18:09:13 +01:00
Kamil Braun
754a7b54e4 raft: server: fix indentation 2024-01-18 18:09:11 +01:00
Kamil Braun
527780987b raft: server: move io_fiber's processing of batch to a separate function 2024-01-18 18:09:02 +01:00
Kamil Braun
3e6b4910a6 raft: move poll_output() from fsm to server
`server` was the only user of this function and it can now be
implemented using `fsm`'s public interface.

In later commits we'll extend the logic of `io_fiber` to also subscribe
to other events, triggered by `server` API calls, not only to outputs
from `fsm`.
2024-01-18 18:07:52 +01:00
Kamil Braun
95b6a60428 raft: move _sm_events from fsm to server
In later commits we will use it to wake up `io_fiber` directly from
`raft::server` based on events generated by `raft::server` itself -- not
only from events generated by `raft::fsm`.

`raft::fsm` still obtains a reference to the condition variable so it
can keep signaling it.
2024-01-18 18:07:44 +01:00
Kamil Braun
a83e04279e raft: fsm: remove constructor used only in tests
This constructor does not provide persisted commit index. It was only
used in tests, so move it there, to the helper `fsm_debug` which
inherits from `fsm`.

Test cases which used `fsm` directly instead of `fsm_debug` were
modified to use `fsm_debug` so they can access the constructor.
`fsm_debug` doesn't change the behavior of `fsm`, only adds some helper
members. This will be useful in following commits too.
2024-01-18 18:07:17 +01:00
Kamil Braun
689d59fccd raft: fsm: move trace message from poll_output to has_output
In a later commit we'll move `poll_output` out of `fsm` and it won't
have access to internals logged by this message (`_log.stable_idx()`).

Besides, having it in `has_output` gives a more detailed trace. In
particular we can now see values such as `stable_idx` and `last_idx`
from the moment of returning a new fsm output, not only when poll
started waiting for it (a lot of time can pass between these two
events).
2024-01-18 18:06:55 +01:00
Kamil Braun
f6d43779af raft: fsm: extract has_output()
Also use the more efficient coroutine-specific
`condition_variable::when` instead of `wait`.
2024-01-18 18:06:27 +01:00
Kamil Braun
dccfd09d83 raft: pass max_trailing_entries through fsm_output to store_snapshot_descriptor
This parameter says how many entries at most should be left trailing
before the snapshot index. There are multiple places where this
decision is made:
- in `applier_fiber` when the server locally decides to take a snapshot
  due to log size pressure; this applies to the in-memory log
- in `fsm::step` when the server received an `install_snapshot` message
  from the leader; this also applies to the in-memory log
- and in `io_fiber` when calling `store_snapshot_descriptor`; this
  applies to the on-disk log.

The logic of how many entries should be left trailing is calculated
twice:
- first, in `applier_fiber` or in `fsm::step` when truncating the
  in-memory log
- and then again as the snapshot descriptor is being persisted.

The logic is to take `_config.snapshot_trailing` for locally generated
snapshots (coming from `applier_fiber`) and `0` for remote snapshots
(from `fsm::step`).

But there is already an error injection that changes the behavior of
`applier_fiber` to leave `0` trailing entries. However, this doesn't
affect the following `store_snapshot_descriptor` call which still uses
`_config.snapshot_trailing`. So if the server got restarted, the entries
which were truncated in-memory would get "revived" from disk.
Fortunately, this is test-only code.

However in future commits we'd like to change the logic of
`applier_fiber` even further. So instead of having a separate
calculation of trailing entries inside `io_fiber`, it's better for it to
use the number that was already calculated once. This number is passed to
`fsm::apply_snapshot` (by `applier_fiber` or `fsm::step`) and can then
be received by `io_fiber` from `fsm_output` to use it inside
`store_snapshot_descriptor`.
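
The idea of computing the number once and carrying it through the output can be sketched in Python. This is a toy model, not the raft C++ code: the `FsmOutput` and `FsmModel` classes, the integer-only log, and the exact truncation rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FsmOutput:
    snapshot_idx: Optional[int] = None
    max_trailing_entries: int = 0


class FsmModel:
    def __init__(self):
        self._pending = FsmOutput()

    def apply_snapshot(self, snapshot_idx: int, max_trailing_entries: int):
        # The caller decides the number of trailing entries exactly once.
        self._pending = FsmOutput(snapshot_idx, max_trailing_entries)

    def get_output(self) -> FsmOutput:
        out, self._pending = self._pending, FsmOutput()
        return out


def store_snapshot_descriptor(log, snapshot_idx, max_trailing_entries):
    # Assumed rule: keep at most `max_trailing_entries` entries up to the
    # snapshot index, plus everything after it.
    first_kept = max(snapshot_idx - max_trailing_entries + 1, 1)
    return [idx for idx in log if idx >= first_kept]


fsm = FsmModel()
fsm.apply_snapshot(snapshot_idx=10, max_trailing_entries=0)
out = fsm.get_output()
on_disk = store_snapshot_descriptor(list(range(1, 13)), out.snapshot_idx, out.max_trailing_entries)
print(on_disk)
```

Because the persisting step reads the value from the output instead of recomputing it from configuration, the in-memory and on-disk logs are truncated with the same number.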
2024-01-18 18:05:45 +01:00
Kamil Braun
40cd91cff7 raft: server: pass *_aborted to set_exception call
This looks like a minor oversight: in `server_impl::abort` there are
multiple calls to `set_exception` on the different promises, and only one
of them did not receive `*_aborted`.
2024-01-18 18:05:18 +01:00
Kefu Chai
09a688d325 sstables: do not use lambda when not necessary
before this change, we always reference the return value of
`make_reader()`, and the return value's type `flat_mutation_reader_v2`
is movable, so we can just pass it by moving away from it.

in this change, instead of using a lambda, let's just have the
return value of it. simpler this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16835
2024-01-18 15:54:49 +02:00
Kefu Chai
a1dcddd300 utils: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16833
2024-01-18 12:50:06 +02:00
Asias He
d3efb3ab6f storage_service: Set session id for raft_rebuild
Raft rebuild is broken because the session id is not set.

The following was seen when running rebuild

stream_session - [Stream #8cfca940-afc9-11ee-b6f1-30b8f78c1451]
stream_transfer_task: Fail to send to 127.0.70.1:0:
seastar::rpc::remote_verb_error (Session not found:
00000000-0000-0000-0000-000000000000)

with raft topology, e.g.,

scylla --enable-repair-based-node-ops 0 --consistent-cluster-management true --experimental-features consistent-topology-changes

Fix by setting the session id.

Fixes #16741

Closes scylladb/scylladb#16814
2024-01-18 12:47:20 +02:00
Kamil Braun
e4918c0d31 test/pylib: scylla_cluster: enable raft_topology=debug level by default
To help debugging test.py failures in CI.
2024-01-18 11:24:16 +01:00
Kamil Braun
52e67ca121 raft topology: increase level of some TRACE messages
Increased them to DEBUG level, and in one case to WARN (inside an
exception handler).

The selected messages are still relatively rare (per-node per-transition
control plane events, plus events such as fibers sleeping and waking up)
although more low level. They are also small messages. Messages that are
large such as those which print all tokens of nodes or large mutations
are left on TRACE level.

The plan is to enable DEBUG level logging in test.py tests for
raft_topology, while not spamming the logs completely such as by
printing large mutations.
2024-01-18 11:24:16 +01:00
Kamil Braun
92e6604127 raft topology: log when entering transition states
Those are rare control plane events, but might be useful when debugging
problems with topology coordinator (e.g. where it got stuck).
2024-01-18 11:24:15 +01:00
Kamil Braun
aeb53ea31d raft topology: don't include null ID in exclude_nodes
Observed with newly added logs:
```
raft topology - executing global topology command barrier_and_drain, excluded nodes: {00000000-0000-0000-0000-000000000000}
```
2024-01-18 11:24:15 +01:00
Kamil Braun
ae25f703c4 raft topology: INFO log when executing global commands and updating topology state
Those are rare control plane events, but useful for debugging e.g.  if
topology coordinator gets stuck at some point.
2024-01-18 11:24:15 +01:00
Kamil Braun
71957b4320 storage_service: separate logger for raft topology
Allows selectively enabling higher logging levels for just raft-topology
related things, without doing it for the entire storage_service (which
includes things like gossiper callbacks).

Also gets rid of the redundant "raft topology:" prefix which was also
not included everywhere.
2024-01-18 11:24:14 +01:00
Eliran Sinvani
32d8dadf1a Add code coverage documentation
Add `docs/dev/code-coverage.md` with explanations about how to work with
the different tools added for coverage reporting and cli options added
to `configure.py` and `test.py`

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Eliran Sinvani
c7dff1b81b test.py: support code coverage
test.py already supports the routing of coverage data into a
predetermined folder under the `tmpdir` logs folder. This patch extends
that and leverages the code coverage processing libraries to produce
test coverage lcov files and a coverage summary at the end of the run.
The reason for not generating the full report (which can be achieved
with a one liner through the `coverage_utils.py` cli) is that it is
assumed that unit testing is not necessarily the "last stop" in the
testing process and it might need to be joined with other coverage
information that is created at other testing stages (for example dtest).

The result of this patch is that when running test.py with one of the
coverage options (`--coverage` / `--coverage-mode`) it will perform
another step of processing and aggregating the profiling information
created.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Eliran Sinvani
00a55abdd6 code coverage: Add libraries for coverage handling
Coverage handling is divided into 3 steps:
1. Generation of profiling data from a run of an instrumented file
   (which this patch doesn't cover)
2. Processing of profiling data, which involves indexing the profile and
   producing the data in some format that can be manipulated and
   unified.
3. Generate some reporting based on this data.

This patch aims to deal with the last two steps by providing a
cli and a library to this end.
This patch adds two libraries:
1. `coverage_utils.py` which is a library for manipulating coverage
   data, it also contains a cli for the (assumed) most common operations
   that are needed in order to eventually generate coverage reporting.
2. `lcov_utils.py` - which is a library to deal with lcov format data,
   which is a textual form containing source-dependent coverage data.
   An example of such manipulation is the `coverage diff` operation,
   which produces a set-like difference: cov_a - cov_b = diff,
   where diff is an lcov-formatted file containing coverage data for code
   covered in cov_a that is not covered at all in cov_b.
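
The set-like difference can be sketched in Python. This does not use the `lcov_utils.py` API, which is not shown here; the in-memory layout (source file -> line number -> hit count) and the `coverage_diff` function are assumptions made for illustration.

```python
def coverage_diff(cov_a, cov_b):
    """Return lines covered in cov_a that are not covered at all in cov_b."""
    diff = {}
    for source, lines_a in cov_a.items():
        lines_b = cov_b.get(source, {})
        only_a = {
            line: hits
            for line, hits in lines_a.items()
            if hits > 0 and lines_b.get(line, 0) == 0
        }
        if only_a:
            diff[source] = only_a
    return diff


cov_a = {"repair/repair.cc": {10: 3, 11: 1, 12: 0}}
cov_b = {"repair/repair.cc": {10: 5, 11: 0}}
print(coverage_diff(cov_a, cov_b))
```

A line with zero hits in cov_a never appears in the result, and a line missing from cov_b is treated the same as one with zero hits there.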

The main goal of the libraries and cli is to provide a unified way to handle
coverage data in a way that can be easily scriptable and extensible.

This will pave the way for automating the coverage reporting and
processing in test.py and in jenkins pipelines (for example to also
process dtest or sct coverage reporting)

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Eliran Sinvani
f4b6c9074a test.py: support --coverage and --coverage-mode
We aim to support code coverage reporting as part of our development
process, to this end, we will need the ability to "route" the dumped
profiles from scylla and unit test to a predetermined location.
We can consider profile data as logged data that should persist after
tests have been run.

For this we add two supported options to test.py:
--coverage - which means that all suites on all modes will participate in
             coverage.
--coverage-mode - which can be used to "turn on" coverage support only
                  for some of the modes in this run.

The strategy chosen is to save the profile data in
`tmpdir`/mode/coverage/%m.profraw (ref:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program)
This means that for every suite the profiling data of each object is
going to be merged into the same file (llvm claims to lock the file so
concurrency is fine).
More resolution than the suite level seems to not give us anything
useful (at least not at the moment). Moreover, it can also be achieved
by running a single test.
Data at the suite level will help us detect suites that don't generate
coverage data at all, and either fix this or skip generating the profiles
for them.
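
A minimal sketch of how such a profile path could be handed to an instrumented binary, assuming it is passed through Clang's `LLVM_PROFILE_FILE` environment variable. The `coverage_env` helper is hypothetical; how test.py actually wires this up is not shown in this message.

```python
import os


def coverage_env(tmpdir: str, mode: str, base_env=None) -> dict:
    # %m is expanded by the LLVM profile runtime into a per-binary signature,
    # so repeated runs of the same binary merge into one .profraw file.
    env = dict(os.environ if base_env is None else base_env)
    env["LLVM_PROFILE_FILE"] = os.path.join(tmpdir, mode, "coverage", "%m.profraw")
    return env


env = coverage_env("testlog", "dev", base_env={})
print(env["LLVM_PROFILE_FILE"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching a test binary.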

Also added support for a 'coverage' parameter in the `suite.yaml` file,
which can be used to disable coverage for a specific suite. This
parameter defaults to True, but if a suite is known to not generate
profiles, or the suite's profile data is not needed or obfuscates the result,
it can be set to false in order to cancel profile routing and
processing for this suite.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Eliran Sinvani
759d70deee configure.py support coverage profiles on standrad build modes
We already have a dedicated coverage build, however, this build is
dedicated mostly for coverage in boost and standalone unit tests.
This added configuration option will compile every configured
build mode with coverage profiling support (excluding 'coverage' mode).
It also does targeted profiling that is narrowed down only to ScyllaDB
code and doesn't instrument seastar and testing code, this should give
a more accurate coverage reporting and also impact performance less, as
one example, the reactor loop in seastar will not be profiled (along
with everything else).
The targeted profiling is done with the help of the newly added
`coverage_sources.list` file which excludes all seastar sub directories
from the profiling.
Also an extra measure is taken to make sure that the seastar
library will not be linked with the coverage framework
(so it will not dump confusing empty profiles).
Some of the seastar headers are still going to be included in the
profile since they are indirectly included by profiled source files.
In order to remove them from the final report, a processing step on the
resulting profile will need to take place.

A note about expected performance impact:
It is expected to have minimal impact on performance since the
instrumentation adds counter increments without locking.
Ref: https://clang.llvm.org/docs/UsersManual.html#cmdoption-fprofile-update
This means that the numbers themselves are less reliable, but all covered
lines are guaranteed to have a non-zero value.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Kefu Chai
f5d1836a45 types: fix indent
f344e130 failed to get the indent right, so fix it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16834
2024-01-18 09:14:39 +02:00
Botond Dénes
8087bc72f0 Merge 'Basic tablet repair support' from Asias He
This patch adds basic tablet repair support.

Below is an example showing how tablet repair works. The `nodetool
repair -pr` cmd was performed on all the nodes, which makes sure no duplicate
repair work will be performed and each tablet will be repaired exactly once.

Three nodes in the cluster. RF = 2. 16 initial tablets.

Tablets:
```
cqlsh> SELECT  * FROM system.tablets;

 keyspace_name | table_id                             | last_token           | table_name | tablet_count | new_replicas | replicas                                                                               | session | stage
---------------+--------------------------------------+----------------------+------------+--------------+--------------+----------------------------------------------------------------------------------------+---------+-------
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -8070450532247928833 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 6), (2dd3808d-6601-4483-b081-adf41ef094e5, 5)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -6917529027641081857 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 0), (19caaeb3-d754-4704-a998-840df53eb54c, 5)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -5764607523034234881 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 2), (2dd3808d-6601-4483-b081-adf41ef094e5, 3)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -4611686018427387905 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 5), (2dd3808d-6601-4483-b081-adf41ef094e5, 4)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -3458764513820540929 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 1), (951cb5bc-5749-481a-9645-4dd0f624f24a, 0)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -2305843009213693953 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 7), (2dd3808d-6601-4483-b081-adf41ef094e5, 1)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -1152921504606846977 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 7), (951cb5bc-5749-481a-9645-4dd0f624f24a, 1)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |                   -1 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 2), (2dd3808d-6601-4483-b081-adf41ef094e5, 7)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  1152921504606846975 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 6), (19caaeb3-d754-4704-a998-840df53eb54c, 2)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  2305843009213693951 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 5), (951cb5bc-5749-481a-9645-4dd0f624f24a, 7)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  3458764513820540927 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 1), (19caaeb3-d754-4704-a998-840df53eb54c, 3)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  4611686018427387903 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 7), (951cb5bc-5749-481a-9645-4dd0f624f24a, 1)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  5764607523034234879 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 6), (2dd3808d-6601-4483-b081-adf41ef094e5, 2)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  6917529027641081855 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 5), (951cb5bc-5749-481a-9645-4dd0f624f24a, 3)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  8070450532247928831 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 0), (19caaeb3-d754-4704-a998-840df53eb54c, 7)] |    null |  null
```
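
The `last_token` values above are consistent with splitting the signed
64-bit token space into 16 equal ranges. The sketch below reproduces them.
It is an observation about this listing, offered under that assumption, and
not a description of how Scylla computes tablet boundaries; `last_token` is
a hypothetical helper.

```python
TOKEN_BITS = 64

def last_token(tablet_id, tablet_count):
    """Last token owned by tablet_id when the token space is split evenly."""
    span = (1 << TOKEN_BITS) // tablet_count   # width of each tablet's range
    return (tablet_id + 1) * span - (1 << (TOKEN_BITS - 1)) - 1

boundaries = [last_token(i, 16) for i in range(16)]
for tablet_id, token in enumerate(boundaries):
    print(tablet_id, token)
```

Under this model, tablet `i` covers `(last_token(i - 1), last_token(i)]`,
which agrees with the `range=` fields in the repair logs.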

node1:
```
$nodetool repair -p 7199 -pr ks1 standard1
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: starting user-requested repair for keyspace ks1, repair id 6, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 1 out of 5 tablets: table=ks1.standard1 tablet_id=2 range=(-6917529027641081857,-5764607523034234881] replicas={19caaeb3-d754-4704-a998-840df53eb54c:2, 2dd3808d-6601-4483-b081-adf41ef094e5:3} primary_replica_only=true
[shard 2:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07399633 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7174440}, {127.0.0.2, 7174440}}, row_from_disk_nr={{127.0.0.1, 15330}, {127.0.0.2, 15330}}, row_from_disk_bytes_per_sec={{127.0.0.1, 92.4651}, {127.0.0.2, 92.4651}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 207172}, {127.0.0.2, 207172}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 2 out of 5 tablets: table=ks1.standard1 tablet_id=4 range=(-4611686018427387905,-3458764513820540929] replicas={19caaeb3-d754-4704-a998-840df53eb54c:1, 951cb5bc-5749-481a-9645-4dd0f624f24a:0} primary_replica_only=true
[shard 1:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07302664 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7195032}, {127.0.0.3, 7195032}}, row_from_disk_nr={{127.0.0.1, 15374}, {127.0.0.3, 15374}}, row_from_disk_bytes_per_sec={{127.0.0.1, 93.9618}, {127.0.0.3, 93.9618}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 210526}, {127.0.0.3, 210526}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 3 out of 5 tablets: table=ks1.standard1 tablet_id=6 range=(-2305843009213693953,-1152921504606846977] replicas={19caaeb3-d754-4704-a998-840df53eb54c:7, 951cb5bc-5749-481a-9645-4dd0f624f24a:1} primary_replica_only=true
[shard 7:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06781354 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7095816}, {127.0.0.3, 7095816}}, row_from_disk_nr={{127.0.0.1, 15162}, {127.0.0.3, 15162}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.7898}, {127.0.0.3, 99.7898}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 223584}, {127.0.0.3, 223584}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 4 out of 5 tablets: table=ks1.standard1 tablet_id=12 range=(4611686018427387903,5764607523034234879] replicas={19caaeb3-d754-4704-a998-840df53eb54c:6, 2dd3808d-6601-4483-b081-adf41ef094e5:2} primary_replica_only=true
[shard 6:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06793772 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7150572}, {127.0.0.2, 7150572}}, row_from_disk_nr={{127.0.0.1, 15279}, {127.0.0.2, 15279}}, row_from_disk_bytes_per_sec={{127.0.0.1, 100.376}, {127.0.0.2, 100.376}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 224897}, {127.0.0.2, 224897}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 5 out of 5 tablets: table=ks1.standard1 tablet_id=13 range=(5764607523034234879,6917529027641081855] replicas={19caaeb3-d754-4704-a998-840df53eb54c:5, 951cb5bc-5749-481a-9645-4dd0f624f24a:3} primary_replica_only=true
[shard 5:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.068579935 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7129512}, {127.0.0.3, 7129512}}, row_from_disk_nr={{127.0.0.1, 15234}, {127.0.0.3, 15234}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.1432}, {127.0.0.3, 99.1432}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 222135}, {127.0.0.3, 222135}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=6 duration=0.352379s
```
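
The throughput figures in the `stats` lines can be checked against the byte
and row counts. A minimal sketch, assuming the rates are simply the counts
divided by `duration` (with 1 MiB = 2^20 bytes); the numbers are copied from
the first `stats` line above:

```python
MIB = 1 << 20

row_from_disk_bytes = 7174440   # bytes read from disk on each replica
row_from_disk_nr = 15330        # rows read from disk on each replica
duration = 0.07399633           # seconds

mib_per_sec = row_from_disk_bytes / duration / MIB
rows_per_sec = row_from_disk_nr / duration

print(f"{mib_per_sec:.4f} MiB/s, {rows_per_sec:.0f} Rows/s")
```

Under that assumption the result agrees, to within rounding, with the logged
`row_from_disk_bytes_per_sec` and `row_from_disk_rows_per_sec` values.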

node2:
```
$nodetool repair -p 7200 -pr ks1 standard1
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: starting user-requested repair for keyspace ks1, repair id 1, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 1 out of 6 tablets: table=ks1.standard1 tablet_id=1 range=(-8070450532247928833,-6917529027641081857] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:0, 19caaeb3-d754-4704-a998-840df53eb54c:5} primary_replica_only=true
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07016466 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7212816}, {127.0.0.2, 7212816}}, row_from_disk_nr={{127.0.0.1, 15412}, {127.0.0.2, 15412}}, row_from_disk_bytes_per_sec={{127.0.0.1, 98.0362}, {127.0.0.2, 98.0362}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 219655}, {127.0.0.2, 219655}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 2 out of 6 tablets: table=ks1.standard1 tablet_id=9 range=(1152921504606846975,2305843009213693951] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:5, 951cb5bc-5749-481a-9645-4dd0f624f24a:7} primary_replica_only=true
[shard 5:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07180758 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7236216}, {127.0.0.3, 7236216}}, row_from_disk_nr={{127.0.0.2, 15462}, {127.0.0.3, 15462}}, row_from_disk_bytes_per_sec={{127.0.0.2, 96.104}, {127.0.0.3, 96.104}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 215325}, {127.0.0.3, 215325}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 3 out of 6 tablets: table=ks1.standard1 tablet_id=10 range=(2305843009213693951,3458764513820540927] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:1, 19caaeb3-d754-4704-a998-840df53eb54c:3} primary_replica_only=true
[shard 1:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06772773 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7039188}, {127.0.0.2, 7039188}}, row_from_disk_nr={{127.0.0.1, 15041}, {127.0.0.2, 15041}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.1188}, {127.0.0.2, 99.1188}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 222080}, {127.0.0.2, 222080}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 4 out of 6 tablets: table=ks1.standard1 tablet_id=11 range=(3458764513820540927,4611686018427387903] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:7, 951cb5bc-5749-481a-9645-4dd0f624f24a:1} primary_replica_only=true
[shard 7:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07025768 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7229664}, {127.0.0.3, 7229664}}, row_from_disk_nr={{127.0.0.2, 15448}, {127.0.0.3, 15448}}, row_from_disk_bytes_per_sec={{127.0.0.2, 98.1351}, {127.0.0.3, 98.1351}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 219876}, {127.0.0.3, 219876}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 5 out of 6 tablets: table=ks1.standard1 tablet_id=14 range=(6917529027641081855,8070450532247928831] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:0, 19caaeb3-d754-4704-a998-840df53eb54c:7} primary_replica_only=true
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0719635 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7225452}, {127.0.0.2, 7225452}}, row_from_disk_nr={{127.0.0.1, 15439}, {127.0.0.2, 15439}}, row_from_disk_bytes_per_sec={{127.0.0.1, 95.7531}, {127.0.0.2, 95.7531}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 214539}, {127.0.0.2, 214539}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 6 out of 6 tablets: table=ks1.standard1 tablet_id=15 range=(8070450532247928831,9223372036854775807] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:4, 19caaeb3-d754-4704-a998-840df53eb54c:3} primary_replica_only=true
[shard 4:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0691715 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7122960}, {127.0.0.2, 7122960}}, row_from_disk_nr={{127.0.0.1, 15220}, {127.0.0.2, 15220}}, row_from_disk_bytes_per_sec={{127.0.0.1, 98.2049}, {127.0.0.2, 98.2049}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 220033}, {127.0.0.2, 220033}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=1 duration=0.42178s
```

node3:
```
$nodetool repair -p 7300 -pr ks1 standard1
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: starting user-requested repair for keyspace ks1, repair id 1, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 1 out of 5 tablets: table=ks1.standard1 tablet_id=0 range=(minimum token,-8070450532247928833] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:6, 2dd3808d-6601-4483-b081-adf41ef094e5:5} primary_replica_only=true
[shard 6:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07126866 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7133256}, {127.0.0.3, 7133256}}, row_from_disk_nr={{127.0.0.2, 15242}, {127.0.0.3, 15242}}, row_from_disk_bytes_per_sec={{127.0.0.2, 95.4529}, {127.0.0.3, 95.4529}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 213867}, {127.0.0.3, 213867}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 2 out of 5 tablets: table=ks1.standard1 tablet_id=3 range=(-5764607523034234881,-4611686018427387905] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:5, 2dd3808d-6601-4483-b081-adf41ef094e5:4} primary_replica_only=true
[shard 5:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0701025 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7138404}, {127.0.0.3, 7138404}}, row_from_disk_nr={{127.0.0.2, 15253}, {127.0.0.3, 15253}}, row_from_disk_bytes_per_sec={{127.0.0.2, 97.1108}, {127.0.0.3, 97.1108}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 217581}, {127.0.0.3, 217581}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 3 out of 5 tablets: table=ks1.standard1 tablet_id=5 range=(-3458764513820540929,-2305843009213693953] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:7, 2dd3808d-6601-4483-b081-adf41ef094e5:1} primary_replica_only=true
[shard 7:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06859512 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7171632}, {127.0.0.3, 7171632}}, row_from_disk_nr={{127.0.0.2, 15324}, {127.0.0.3, 15324}}, row_from_disk_bytes_per_sec={{127.0.0.2, 99.7068}, {127.0.0.3, 99.7068}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 223398}, {127.0.0.3, 223398}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 4 out of 5 tablets: table=ks1.standard1 tablet_id=7 range=(-1152921504606846977,-1] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:2, 2dd3808d-6601-4483-b081-adf41ef094e5:7} primary_replica_only=true
[shard 2:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06975318 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7105176}, {127.0.0.3, 7105176}}, row_from_disk_nr={{127.0.0.2, 15182}, {127.0.0.3, 15182}}, row_from_disk_bytes_per_sec={{127.0.0.2, 97.1429}, {127.0.0.3, 97.1429}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 217653}, {127.0.0.3, 217653}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 5 out of 5 tablets: table=ks1.standard1 tablet_id=8 range=(-1,1152921504606846975] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:6, 19caaeb3-d754-4704-a998-840df53eb54c:2} primary_replica_only=true
[shard 6:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.070810474 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7023276}, {127.0.0.3, 7023276}}, row_from_disk_nr={{127.0.0.1, 15007}, {127.0.0.3, 15007}}, row_from_disk_bytes_per_sec={{127.0.0.1, 94.5894}, {127.0.0.3, 94.5894}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 211932}, {127.0.0.3, 211932}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=1 duration=0.351395s
```
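
The three logs together account for every tablet exactly once. A small
check, with the `tablet_id` values transcribed by hand from the
`Repair N out of M tablets` lines above (the node labels are only dictionary
keys for illustration):

```python
from collections import Counter

# tablet_id values taken from each node's repair log
repaired = {
    "node1": [2, 4, 6, 12, 13],
    "node2": [1, 9, 10, 11, 14, 15],
    "node3": [0, 3, 5, 7, 8],
}

counts = Counter(t for tablets in repaired.values() for t in tablets)
missing = sorted(set(range(16)) - set(counts))          # tablets nobody repaired
duplicated = sorted(t for t, n in counts.items() if n > 1)  # tablets repaired twice

print("missing:", missing)
print("duplicated:", duplicated)
```

In every logged case in this example, the node that repairs a tablet is the
first entry in that tablet's `replicas` list.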

Fixes #16599

Closes scylladb/scylladb#16600

* github.com:scylladb/scylladb:
  test: Add test_tablet_missing_data_repair
  test: Add test_tablet_repair
  test: Allow timeout in server_stop_gracefully
  test: Increase STOP_TIMEOUT_SECONDS
  repair: Wire tablet repair with the user repair request
  repair: Pass raft_address_map to repair service
  repair: Add host2ip_t type
  repair: Add finished user-requested log for vnode table too
  repair: Log error in the rpc_stream_handler
  repair: Make row_level repair work with tablet
  repair: Add get_dst_shard_id
  repair: Add shard to repair_node_state
  repair: Add shard map to repair_neighbors
2024-01-18 09:13:00 +02:00
Asias He
1399dc0ff2 test: Add test_tablet_missing_data_repair
The test verifies that repair brings the missing rows to the owner.

- Shut down part of the nodes in the cluster
- Insert data
- Start all nodes
- Run repair
- Shut down part of the nodes
- Check all data is present
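
The steps above can be sketched with a toy in-memory model. This is not the
real test and does not use the Scylla test harness; `Node`, `write`, and
`repair` are hypothetical stand-ins, and repair is modeled as a simple union
of the rows held by live nodes.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.rows = {}          # key -> value stored on this node

def write(nodes, key, value):
    """Write to every node that is currently up."""
    for n in nodes:
        if n.up:
            n.rows[key] = value

def repair(nodes):
    """Toy repair: every live node ends up with the union of live nodes' rows."""
    merged = {}
    for n in nodes:
        if n.up:
            merged.update(n.rows)
    for n in nodes:
        if n.up:
            n.rows = dict(merged)

nodes = [Node("n1"), Node("n2"), Node("n3")]

nodes[1].up = nodes[2].up = False        # shut down part of the nodes
for k in range(10):
    write(nodes, k, f"v{k}")             # insert data (only n1 receives it)
for n in nodes:
    n.up = True                          # start all nodes
repair(nodes)                            # run repair
nodes[0].up = False                      # shut down the node that had the data
survivors = [n for n in nodes if n.up]
all_present = all(len(n.rows) == 10 for n in survivors)
print("all data present:", all_present)
```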
2024-01-18 08:49:06 +08:00
Asias He
bfe5894a9f test: Add test_tablet_repair
A basic repair test that verifies tablet repair works.
2024-01-18 08:49:06 +08:00
Asias He
39912d7bed test: Allow timeout in server_stop_gracefully
The default is 60s. Sometimes it takes more than 60s to stop a node for
some reason.
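
A minimal sketch of a graceful stop with a caller-supplied timeout, using
only the Python standard library. `stop_gracefully` and its `timeout`
parameter are hypothetical names for illustration and are not the test
framework's actual `server_stop_gracefully` implementation; escalating to a
hard kill after the timeout is likewise an assumption.

```python
import subprocess
import sys

def stop_gracefully(proc, timeout=60.0):
    """Ask proc to terminate; kill it if it has not exited within timeout seconds.

    Returns True if the process exited after the terminate request alone.
    """
    proc.terminate()                      # SIGTERM on POSIX
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()                       # escalate to a hard kill
        proc.wait()
        return False

# Stand-in for a server process: sleeps until it is told to stop.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(600)"])
graceful = stop_gracefully(child, timeout=120.0)
print("graceful:", graceful, "exit code:", child.returncode)
```

Passing the timeout as an argument lets a slow test raise the limit without
touching the default.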
2024-01-18 08:49:06 +08:00
Asias He
276b04a572 test: Increase STOP_TIMEOUT_SECONDS
Stopping scylla was observed to take more than 60s to finish in
some cases. Increase the hard-coded stop timeout.
2024-01-18 08:49:06 +08:00
Asias He
54239514af repair: Wire tablet repair with the user repair request
Currently, only the table and primary replica selection options are
supported.

Reject repair request if the repair options are not supported yet.

With this patch, users can repair tablet tables by running

    nodetool repair -pr myks mytable

on each node in the cluster, so that each tablet will be repaired only
once, without duplicate work.

Below is an example showing how tablet repair works. The `nodetool
repair -pr` command was run on all the nodes. The cluster has three
nodes, RF = 2, and 16 initial tablets.

Tablets:

cqlsh> SELECT  * FROM system.tablets;

 keyspace_name | table_id                             | last_token           | table_name | tablet_count | new_replicas | replicas                                                                               | session | stage
---------------+--------------------------------------+----------------------+------------+--------------+--------------+----------------------------------------------------------------------------------------+---------+-------
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -8070450532247928833 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 6), (2dd3808d-6601-4483-b081-adf41ef094e5, 5)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -6917529027641081857 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 0), (19caaeb3-d754-4704-a998-840df53eb54c, 5)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -5764607523034234881 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 2), (2dd3808d-6601-4483-b081-adf41ef094e5, 3)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -4611686018427387905 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 5), (2dd3808d-6601-4483-b081-adf41ef094e5, 4)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -3458764513820540929 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 1), (951cb5bc-5749-481a-9645-4dd0f624f24a, 0)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -2305843009213693953 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 7), (2dd3808d-6601-4483-b081-adf41ef094e5, 1)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 | -1152921504606846977 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 7), (951cb5bc-5749-481a-9645-4dd0f624f24a, 1)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |                   -1 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 2), (2dd3808d-6601-4483-b081-adf41ef094e5, 7)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  1152921504606846975 |  standard1 |           16 |         null | [(951cb5bc-5749-481a-9645-4dd0f624f24a, 6), (19caaeb3-d754-4704-a998-840df53eb54c, 2)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  2305843009213693951 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 5), (951cb5bc-5749-481a-9645-4dd0f624f24a, 7)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  3458764513820540927 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 1), (19caaeb3-d754-4704-a998-840df53eb54c, 3)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  4611686018427387903 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 7), (951cb5bc-5749-481a-9645-4dd0f624f24a, 1)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  5764607523034234879 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 6), (2dd3808d-6601-4483-b081-adf41ef094e5, 2)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  6917529027641081855 |  standard1 |           16 |         null | [(19caaeb3-d754-4704-a998-840df53eb54c, 5), (951cb5bc-5749-481a-9645-4dd0f624f24a, 3)] |    null |  null
           ks1 | 3ffadad0-a552-11ee-bc15-66412bbb6978 |  8070450532247928831 |  standard1 |           16 |         null | [(2dd3808d-6601-4483-b081-adf41ef094e5, 0), (19caaeb3-d754-4704-a998-840df53eb54c, 7)] |    null |  null

node1:
$nodetool repair -p 7199 -pr ks1 standard1
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: starting user-requested repair for keyspace ks1, repair id 6, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 1 out of 5 tablets: table=ks1.standard1 tablet_id=2 range=(-6917529027641081857,-5764607523034234881] replicas={19caaeb3-d754-4704-a998-840df53eb54c:2, 2dd3808d-6601-4483-b081-adf41ef094e5:3} primary_replica_only=true
[shard 2:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07399633 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7174440}, {127.0.0.2, 7174440}}, row_from_disk_nr={{127.0.0.1, 15330}, {127.0.0.2, 15330}}, row_from_disk_bytes_per_sec={{127.0.0.1, 92.4651}, {127.0.0.2, 92.4651}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 207172}, {127.0.0.2, 207172}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 2 out of 5 tablets: table=ks1.standard1 tablet_id=4 range=(-4611686018427387905,-3458764513820540929] replicas={19caaeb3-d754-4704-a998-840df53eb54c:1, 951cb5bc-5749-481a-9645-4dd0f624f24a:0} primary_replica_only=true
[shard 1:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07302664 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7195032}, {127.0.0.3, 7195032}}, row_from_disk_nr={{127.0.0.1, 15374}, {127.0.0.3, 15374}}, row_from_disk_bytes_per_sec={{127.0.0.1, 93.9618}, {127.0.0.3, 93.9618}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 210526}, {127.0.0.3, 210526}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 3 out of 5 tablets: table=ks1.standard1 tablet_id=6 range=(-2305843009213693953,-1152921504606846977] replicas={19caaeb3-d754-4704-a998-840df53eb54c:7, 951cb5bc-5749-481a-9645-4dd0f624f24a:1} primary_replica_only=true
[shard 7:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06781354 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7095816}, {127.0.0.3, 7095816}}, row_from_disk_nr={{127.0.0.1, 15162}, {127.0.0.3, 15162}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.7898}, {127.0.0.3, 99.7898}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 223584}, {127.0.0.3, 223584}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 4 out of 5 tablets: table=ks1.standard1 tablet_id=12 range=(4611686018427387903,5764607523034234879] replicas={19caaeb3-d754-4704-a998-840df53eb54c:6, 2dd3808d-6601-4483-b081-adf41ef094e5:2} primary_replica_only=true
[shard 6:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06793772 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7150572}, {127.0.0.2, 7150572}}, row_from_disk_nr={{127.0.0.1, 15279}, {127.0.0.2, 15279}}, row_from_disk_bytes_per_sec={{127.0.0.1, 100.376}, {127.0.0.2, 100.376}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 224897}, {127.0.0.2, 224897}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c] Repair 5 out of 5 tablets: table=ks1.standard1 tablet_id=13 range=(5764607523034234879,6917529027641081855] replicas={19caaeb3-d754-4704-a998-840df53eb54c:5, 951cb5bc-5749-481a-9645-4dd0f624f24a:3} primary_replica_only=true
[shard 5:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.068579935 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7129512}, {127.0.0.3, 7129512}}, row_from_disk_nr={{127.0.0.1, 15234}, {127.0.0.3, 15234}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.1432}, {127.0.0.3, 99.1432}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 222135}, {127.0.0.3, 222135}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[f7ac8fb6-8e49-4b31-8c7d-0d493064977c]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=6 duration=0.352379s

node2:
$nodetool repair -p 7200 -pr ks1 standard1
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: starting user-requested repair for keyspace ks1, repair id 1, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 1 out of 6 tablets: table=ks1.standard1 tablet_id=1 range=(-8070450532247928833,-6917529027641081857] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:0, 19caaeb3-d754-4704-a998-840df53eb54c:5} primary_replica_only=true
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07016466 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7212816}, {127.0.0.2, 7212816}}, row_from_disk_nr={{127.0.0.1, 15412}, {127.0.0.2, 15412}}, row_from_disk_bytes_per_sec={{127.0.0.1, 98.0362}, {127.0.0.2, 98.0362}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 219655}, {127.0.0.2, 219655}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 2 out of 6 tablets: table=ks1.standard1 tablet_id=9 range=(1152921504606846975,2305843009213693951] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:5, 951cb5bc-5749-481a-9645-4dd0f624f24a:7} primary_replica_only=true
[shard 5:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07180758 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7236216}, {127.0.0.3, 7236216}}, row_from_disk_nr={{127.0.0.2, 15462}, {127.0.0.3, 15462}}, row_from_disk_bytes_per_sec={{127.0.0.2, 96.104}, {127.0.0.3, 96.104}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 215325}, {127.0.0.3, 215325}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 3 out of 6 tablets: table=ks1.standard1 tablet_id=10 range=(2305843009213693951,3458764513820540927] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:1, 19caaeb3-d754-4704-a998-840df53eb54c:3} primary_replica_only=true
[shard 1:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06772773 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7039188}, {127.0.0.2, 7039188}}, row_from_disk_nr={{127.0.0.1, 15041}, {127.0.0.2, 15041}}, row_from_disk_bytes_per_sec={{127.0.0.1, 99.1188}, {127.0.0.2, 99.1188}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 222080}, {127.0.0.2, 222080}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 4 out of 6 tablets: table=ks1.standard1 tablet_id=11 range=(3458764513820540927,4611686018427387903] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:7, 951cb5bc-5749-481a-9645-4dd0f624f24a:1} primary_replica_only=true
[shard 7:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07025768 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7229664}, {127.0.0.3, 7229664}}, row_from_disk_nr={{127.0.0.2, 15448}, {127.0.0.3, 15448}}, row_from_disk_bytes_per_sec={{127.0.0.2, 98.1351}, {127.0.0.3, 98.1351}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 219876}, {127.0.0.3, 219876}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 5 out of 6 tablets: table=ks1.standard1 tablet_id=14 range=(6917529027641081855,8070450532247928831] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:0, 19caaeb3-d754-4704-a998-840df53eb54c:7} primary_replica_only=true
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0719635 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7225452}, {127.0.0.2, 7225452}}, row_from_disk_nr={{127.0.0.1, 15439}, {127.0.0.2, 15439}}, row_from_disk_bytes_per_sec={{127.0.0.1, 95.7531}, {127.0.0.2, 95.7531}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 214539}, {127.0.0.2, 214539}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e] Repair 6 out of 6 tablets: table=ks1.standard1 tablet_id=15 range=(8070450532247928831,9223372036854775807] replicas={2dd3808d-6601-4483-b081-adf41ef094e5:4, 19caaeb3-d754-4704-a998-840df53eb54c:3} primary_replica_only=true
[shard 4:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0691715 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7122960}, {127.0.0.2, 7122960}}, row_from_disk_nr={{127.0.0.1, 15220}, {127.0.0.2, 15220}}, row_from_disk_bytes_per_sec={{127.0.0.1, 98.2049}, {127.0.0.2, 98.2049}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 220033}, {127.0.0.2, 220033}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[5c805f0c-4ff2-4c5c-88df-bb318d559e0e]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=1 duration=0.42178s

node3:
$nodetool repair -p 7300 -pr ks1 standard1
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: starting user-requested repair for keyspace ks1, repair id 1, options {{trace -> false}, {primaryRange -> true}, {columnFamilies -> standard1}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 1 out of 5 tablets: table=ks1.standard1 tablet_id=0 range=(minimum token,-8070450532247928833] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:6, 2dd3808d-6601-4483-b081-adf41ef094e5:5} primary_replica_only=true
[shard 6:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.07126866 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7133256}, {127.0.0.3, 7133256}}, row_from_disk_nr={{127.0.0.2, 15242}, {127.0.0.3, 15242}}, row_from_disk_bytes_per_sec={{127.0.0.2, 95.4529}, {127.0.0.3, 95.4529}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 213867}, {127.0.0.3, 213867}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 2 out of 5 tablets: table=ks1.standard1 tablet_id=3 range=(-5764607523034234881,-4611686018427387905] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:5, 2dd3808d-6601-4483-b081-adf41ef094e5:4} primary_replica_only=true
[shard 5:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.0701025 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7138404}, {127.0.0.3, 7138404}}, row_from_disk_nr={{127.0.0.2, 15253}, {127.0.0.3, 15253}}, row_from_disk_bytes_per_sec={{127.0.0.2, 97.1108}, {127.0.0.3, 97.1108}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 217581}, {127.0.0.3, 217581}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 3 out of 5 tablets: table=ks1.standard1 tablet_id=5 range=(-3458764513820540929,-2305843009213693953] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:7, 2dd3808d-6601-4483-b081-adf41ef094e5:1} primary_replica_only=true
[shard 7:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06859512 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7171632}, {127.0.0.3, 7171632}}, row_from_disk_nr={{127.0.0.2, 15324}, {127.0.0.3, 15324}}, row_from_disk_bytes_per_sec={{127.0.0.2, 99.7068}, {127.0.0.3, 99.7068}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 223398}, {127.0.0.3, 223398}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 4 out of 5 tablets: table=ks1.standard1 tablet_id=7 range=(-1152921504606846977,-1] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:2, 2dd3808d-6601-4483-b081-adf41ef094e5:7} primary_replica_only=true
[shard 2:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.06975318 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.2, 7105176}, {127.0.0.3, 7105176}}, row_from_disk_nr={{127.0.0.2, 15182}, {127.0.0.3, 15182}}, row_from_disk_bytes_per_sec={{127.0.0.2, 97.1429}, {127.0.0.3, 97.1429}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.2, 217653}, {127.0.0.3, 217653}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6] Repair 5 out of 5 tablets: table=ks1.standard1 tablet_id=8 range=(-1,1152921504606846975] replicas={951cb5bc-5749-481a-9645-4dd0f624f24a:6, 19caaeb3-d754-4704-a998-840df53eb54c:2} primary_replica_only=true
[shard 6:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: stats: repair_reason=repair, keyspace=ks1, tables={standard1}, ranges_nr=1, round_nr=2, round_nr_fast_path_already_synced=2, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=6, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.070810474 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.0.1, 7023276}, {127.0.0.3, 7023276}}, row_from_disk_nr={{127.0.0.1, 15007}, {127.0.0.3, 15007}}, row_from_disk_bytes_per_sec={{127.0.0.1, 94.5894}, {127.0.0.3, 94.5894}} MiB/s, row_from_disk_rows_per_sec={{127.0.0.1, 211932}, {127.0.0.3, 211932}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
[shard 0:strm] repair - repair[350b97f3-f06e-470f-9164-43997a4f82a6]: Finished user-requested repair for tablet keyspace=ks1 tables={standard1} repair_id=1 duration=0.351395s

Fixes #16599
2024-01-18 08:49:06 +08:00
Asias He
93028f4848 repair: Pass raft_address_map to repair service
It is needed to translate hostid to ip address.
2024-01-18 08:49:06 +08:00
Asias He
194e870996 repair: Add host2ip_t type
It is used to translate hostid to ip address in repair code.
2024-01-18 08:49:06 +08:00
Asias He
637b8e4f51 repair: Add finished user-requested log for vnode table too 2024-01-18 08:49:06 +08:00
Asias He
b24f6fbc92 repair: Log error in the rpc_stream_handler
It is useful for debugging when the handler goes wrong. In addition to sending
the error back to the peer, log the error as well.
2024-01-18 08:49:06 +08:00
Asias He
fd774862be repair: Make row_level repair work with tablet
Since a given tablet belongs to a single shard on both repair master and repair
followers, row level repair code needs to be changed to work on a single
shard for a given tablet. In order to tell the repair followers which
shard to work on, a dst_cpu_id value is passed over rpc from the repair
master.
2024-01-18 08:49:06 +08:00
Asias He
e1f68ea64a repair: Add get_dst_shard_id
A helper to get the dst shard id on the repair follower.

If the repair master specifies the shard id for the follower, use it.
Otherwise, the follower chooses one itself.
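
The selection rule above can be sketched as follows. This is a minimal
illustration in Python, not the actual C++ helper: the signature, the
`fallback` callback, the use of `None` for "not specified", and the range
check are all assumptions made for the example.

```python
from typing import Callable, Optional


def get_dst_shard_id(master_shard: Optional[int],
                     shard_count: int,
                     fallback: Callable[[], int]) -> int:
    """Pick the shard a repair follower should run on (illustrative only)."""
    if master_shard is not None:
        # The repair master named a shard: honor it, but reject values
        # that do not exist on this node (assumed behavior).
        if not 0 <= master_shard < shard_count:
            raise ValueError(
                f"shard {master_shard} out of range 0..{shard_count - 1}")
        return master_shard
    # No shard was specified: the follower chooses one itself.
    return fallback()
```

For example, `get_dst_shard_id(5, 8, lambda: 0)` returns the shard named by the
master, while `get_dst_shard_id(None, 8, lambda: 3)` falls back to the
follower's own choice.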
2024-01-18 08:49:06 +08:00
Asias He
2e8c6ebfca repair: Add shard to repair_node_state
It is used to specify the shard id that repair instance runs on.
2024-01-18 08:49:06 +08:00
Asias He
16349be37e repair: Add shard map to repair_neighbors
It is used to specify the shard id that repair instance should run repair
on.
2024-01-18 08:49:06 +08:00
Avi Kivity
394ef13901 build: regenerate frozen toolchain for tablets-aware Python driver
Pull in scylla-driver 3.26.5, which supports tablets.

Closes scylladb/scylladb#16829
2024-01-17 22:47:36 +02:00
Kefu Chai
0ae81446ef ./: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16766
2024-01-17 16:30:14 +02:00
Kamil Braun
787b24cd24 Merge 'raft topology: join: shut down a node on error in response handler' from Patryk Jędrzejczak
If the joining node fails while handling the response from the
topology coordinator, it hangs even though it knows the join
operation has failed. Therefore, we ensure it shuts down in
this patch.

Additionally, we ensure that if the first join request response
was a rejection or the node failed while handling it, the
following acceptances by the (possibly different) coordinator
don't succeed. The node considers the join operation as failed.
We shouldn't add it to the cluster.

Fixes scylladb/scylladb#16333

Closes scylladb/scylladb#16650

* github.com:scylladb/scylladb:
  topology_coordinator: clarify warnings
  raft topology: join: allow only the first response to be a successful acceptance
  storage_service: join_node_response_handler: fix indentation
  raft topology: join: shut down a node on error in response handler
2024-01-17 14:55:26 +01:00
Botond Dénes
f22fc88a64 Merge 'Configure service levels interval' from Michał Jadwiszczak
The service level controller updates itself at a fixed interval. However, the interval is hardcoded in main to 10 seconds, which leads to long sleeps in some of the tests.

This patch moves this value to the `service_levels_interval_ms` command line option and sets it to 0.5s in cql-pytest.

Closes scylladb/scylladb#16394

* github.com:scylladb/scylladb:
  test:cql-pytest: change service levels intervals in tests
  configure service levels interval
2024-01-17 12:24:49 +02:00
Benny Halevy
0d937f3974 table: add_sstable_and_update_cache: trigger compaction only in compaction group
There is no need to trigger compaction in all compaction
groups when an sstable is added to only one of them.

And with that level of control, if the caller passes
sstables::offstrategy::yes, we know it will
trigger offstrategy compaction later on so there
is no need to trigger compaction at all
for this sstable at this time.
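
The decision described above can be sketched as a small function. This is a
hedged illustration in Python rather than the real C++ code; the function name
and the integer group ids are hypothetical.

```python
from typing import List


def compaction_groups_to_trigger(target_group: int,
                                 offstrategy: bool) -> List[int]:
    """Which compaction groups to trigger after adding one sstable (sketch)."""
    if offstrategy:
        # Off-strategy compaction will be triggered later by the caller,
        # so nothing needs to be triggered for this sstable now.
        return []
    # Only the group that received the sstable needs a compaction check.
    return [target_group]
```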

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-01-17 12:13:17 +02:00
Benny Halevy
51a46aa83b compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
Prevent the creation of a compaction task when
the list of sstables is known to be empty ahead
of time.

Refs scylladb/scylladb#16694
Fixes scylladb/scylladb#16803

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-01-17 11:53:39 +02:00
Benny Halevy
bd1d65ec38 compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
3b424e391b introduced a loop
in `perform_cleanup` that waits until all sstables that require
cleanup are cleaned up.

However, with f1bbf705f9,
an sstable that is_eligible_for_compaction (i.e. it
is not in staging, awaiting view update generation),
may already be compacted by e.g. regular compaction.
And so perform_cleanup should interrupt that
by calling try_perform_cleanup, since the latter
reevaluates `update_sstable_cleanup_state` with
compaction disabled - that stops ongoing compactions.

Refs scylladb/scylladb#15673

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-01-17 11:53:39 +02:00
David Garcia
f555a2cb05 docs: dynamic include based on flag
docs: extend include options

Closes scylladb/scylladb#16753
2024-01-17 09:33:40 +02:00
Calle Wilund
af0772d605 commitlog: Add wait_for_pending_deletes
Refs #16757

Allows waiting for all previous and pending segment deletes to finish.
Useful if a caller of `discard_completed_segments` (i.e. a memtable
flush target) not only wants to ensure segments are clean and released,
but also thoroughly deleted/recycled, and hence no threat of resurrecting
data on crash+restart.

Test included.

Closes scylladb/scylladb#16801
2024-01-17 09:30:55 +02:00
Kefu Chai
84a9d2fa45 add formatter for auth::role_or_anonymous
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for auth::role_or_anonymous,
and remove their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16812
2024-01-17 09:28:13 +02:00
Kefu Chai
3f0fbdcd86 replica: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16810
2024-01-17 09:27:09 +02:00
Tomasz Grabiec
3d76aefb98 Merge "Enhance topology request status tracking" from Gleb
Currently to figure out if a topology request is complete a submitter
checks the topology state and tries to figure out from that the status
of the request. This is not exact. Let's look at rebuild handling for
instance. To figure out if request is completed the code waits for
request object to disappear from the topology, but if another rebuild
starts between the end of the previous one and the code noticing that
it completed, the code will continue waiting for the next rebuild.
Another problem is that in case of operation failure there is no way to
pass an error back to the initiator.

This series solves those problems by assigning an id for each request and
tracking the status of each request in a separate table. The initiator
can query the request status from the table and see if the request was
completed successfully or if it failed with an error, which is also
available from the table.

The schema for the table is:

    CREATE TABLE system.topology_requests (
        id timeuuid PRIMARY KEY,

        initiating_host uuid,
        start_time timestamp,

        done boolean,
        error text,
        end_time timestamp,
    );

and all entries have TTL of one month.
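
The per-request tracking idea can be sketched with a toy in-memory model.
This is only an illustration in Python: the `TopologyRequests` class and its
method names are hypothetical, and TTL handling and the remaining columns are
left out.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RequestStatus:
    # Mirrors the done/error columns of the schema above.
    done: bool = False
    error: Optional[str] = None


class TopologyRequests:
    """Toy in-memory stand-in for system.topology_requests (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[uuid.UUID, RequestStatus] = {}

    def submit(self) -> uuid.UUID:
        # uuid1 is time-based, similar in spirit to the timeuuid key.
        request_id = uuid.uuid1()
        self._rows[request_id] = RequestStatus()
        return request_id

    def complete(self, request_id: uuid.UUID,
                 error: Optional[str] = None) -> None:
        row = self._rows[request_id]
        row.done = True
        row.error = error

    def check(self, request_id: uuid.UUID) -> bool:
        """Return True once the request is done; raise if it failed."""
        row = self._rows[request_id]
        if row.done and row.error:
            raise RuntimeError(row.error)
        return row.done
```

Because every request has its own id, an initiator that polls `check()` for
the first rebuild is not affected by a second rebuild that starts right after
it.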
2024-01-17 00:37:19 +01:00
Benny Halevy
d6071945c8 compaction, table: ignore foreign sstables replay_position
The sstables replay_position in stats_metadata is
valid only on the originating node and shard.

Therefore, validate the originating host and shard
before using it in compaction or table truncate.
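
A minimal sketch of the validation, written in Python for illustration. The
`SstableStats` type and its field names are hypothetical and only model the
origin check described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SstableStats:
    # Hypothetical subset of an sstable's metadata.
    origin_host: str
    origin_shard: int
    replay_position: int


def usable_replay_position(stats: SstableStats,
                           local_host: str,
                           local_shard: int) -> Optional[int]:
    """Return the replay position only if the sstable was written here."""
    if stats.origin_host == local_host and stats.origin_shard == local_shard:
        return stats.replay_position
    # Foreign sstable: its replay position is not valid on this
    # host and shard, so it is ignored.
    return None
```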

Fixes #10080

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16550
2024-01-16 18:45:59 +02:00
Benny Halevy
7a7a1db86b sstables_loader: load_new_sstables: auto-enable load-and-stream for tablets
And call on_internal_error if process_upload_dir
is called for a tablets-enabled keyspace, as it isn't
supported at the moment (maybe it could be in the future
if we make sure that the sstables are confined to tablets
boundaries).

Refs #12775
Fixes #16743

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16788
2024-01-16 18:43:52 +02:00
Gleb Natapov
9a7243d71a storage_service: topology coordinator: Consolidate some mutation builder code 2024-01-16 17:02:54 +02:00
Gleb Natapov
a145a73136 storage_service: topology coordinator: make topology operation rollback error more informative
Include an error which caused the rollback.
2024-01-16 17:02:54 +02:00
Gleb Natapov
bf91eb37f2 storage_service: topology coordinator: make topology operation cancellation error more informative
Include the list of nodes that were down when cancellation happened.
2024-01-16 17:02:54 +02:00
Gleb Natapov
8beb399b72 storage_service: topology coordinator: consolidate some code in cancel_all_requests
There is a code duplication that can be avoided.
2024-01-16 17:02:54 +02:00
Gleb Natapov
fba6877b3e storage_service: topology coordinator: TTL topology request table
To prevent the topology_requests table from growing, set a TTL on all writes
so that they expire after a month.
2024-01-16 17:02:54 +02:00
Gleb Natapov
d576ed31dc storage_service: topology request: drop explicit shutdown rpc
Now that we have explicit status for each request we may use it to
replace shutdown notification rpc. During a decommission, in
left_token_ring state, we set done to true after the metadata barrier
that waits for all requests to the decommissioning node to complete,
and notify the decommissioning node with a regular barrier. At this
point the node will see that the request is complete and exit.
2024-01-16 17:02:54 +02:00
Gleb Natapov
84197ff735 storage_service: topology coordinator: check topology operation completion using status in topology_requests table
Instead of trying to guess if a request completed by looking into the
topology state (which can sometimes be error prone), look at the
request status in the new topology_requests table. If the request failed,
report the reason for the failure from the table.
2024-01-16 17:02:54 +02:00
Kefu Chai
0092700ad1 memtable: add formatter for replica::{memtable,memtable_entry}
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for replica::memtable and
replica::memtable_entry, and remove their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16793
2024-01-16 16:46:52 +02:00
Kefu Chai
2dbf044b91 cql3: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16791
2024-01-16 16:43:17 +02:00
Avi Kivity
a9844ed69a Merge 'view: revert cleanup filter that doesn't work with tablets' from Nadav Har'El
The goal of this PR is to fix Scylla so that the dtest test_mvs_populating_from_existing_data, which starts to fail when enabling tablets, will pass.

The main fix (the second patch) is reverting code which doesn't work with tablets, and I explain why I think this code was not necessary in the first place.

Fixes #16598

Closes scylladb/scylladb#16670

* github.com:scylladb/scylladb:
  view: revert cleanup filter that doesn't work with tablets
  mv: sleep a bit before view-update-generator restart
2024-01-16 16:42:20 +02:00
Gleb Natapov
1c18476385 storage_service: topology coordinator: update topology_requests table with requests progress
Make topology coordinator update request's status in topology_requests table as it changes.
2024-01-16 15:35:18 +02:00
Benny Halevy
e277ec6aef force_keyspace_cleanup: skip keyspaces that do not require or support cleanup
Local keyspaces do not need cleanup, and
keyspaces configured with tablets, whose
replication strategy is per-table, do not support
cleanup.

In both cases, just skip their cleanup via the api.

Fixes #16738

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16785
2024-01-16 15:01:49 +03:00
Gleb Natapov
1ce1c5001d topology coordinator: add topology_requests table to group0 snapshot
Since the table is updated through raft's group0 state machine its
content needs to be part of the snapshot.
2024-01-16 13:57:27 +02:00
Gleb Natapov
584551f849 topology coordinator: add request_id to the topology state machine
Provide a unique ID for each topology request and store it in the topology
state machine. It will be used to index the new topology requests table in
order to retrieve request status.
2024-01-16 13:57:27 +02:00
Gleb Natapov
ecb8778950 system keyspace: introduce local table to store topology requests status
The table has the following schema and will be managed by raft:

CREATE TABLE system.topology_requests (
    id timeuuid PRIMARY KEY,

    initiating_host uuid,
    start_time timestamp,

    done boolean,
    error text,
    end_time timestamp,
);

In case of a request completing with an error, the "error" field will be non-empty when "done" is set to true.
2024-01-16 13:57:16 +02:00
Tomasz Grabiec
49026dc319 Merge 'Turn on tablets on keyspace by default when the feature is enabled' from Pavel Emelyanov
To enable tablets replication one needs to turn on the (experimental) feature and specify the `initial_tablets: N` option when creating a keyspace. We want tablets to become the default in the future and allow users to explicitly opt out if they want to.

This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is

* `CREATE KEYSPACE ...` will turn tablets on or off based on cluster feature being enabled/disabled
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }` will turn tablets off regardless of the cluster feature
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }` will try to enable tablets with default configuration
* `CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }` is now the replacement for `REPLICATION = { ... 'initial_tablets': <int> }` thing
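
The four cases above can be summarized in a small decision function. This is
a sketch in Python, not the cql3 implementation: the function name and the
dictionary form of the TABLETS map are hypothetical, and raising an error when
tablets are requested while the feature is disabled is an assumption.

```python
from typing import Mapping, Optional, Union

TabletsOptions = Mapping[str, Union[bool, int]]


def tablets_enabled(feature_on: bool,
                    options: Optional[TabletsOptions]) -> bool:
    """Decide whether a new keyspace uses tablets (illustrative only)."""
    if options is None:
        # No TABLETS map: follow the cluster feature.
        return feature_on
    if options.get("enabled") is False:
        # Explicit opt-out always wins.
        return False
    # 'enabled': true or 'initial': <int> asks for tablets explicitly.
    if not feature_on:
        # Assumption: an explicit request without the feature is an error.
        raise ValueError("tablets requested but the cluster feature is disabled")
    return True
```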

fixes: #16319

Closes scylladb/scylladb#16364

* github.com:scylladb/scylladb:
  code: Enable tablets if cluster feature is enabled
  test: Turn off tablets feature by default
  test: Move test_tablet_drain_failure_during_decommission to another suite
  test/tablets: Enable tables for real on test keyspace
  test/tablets: Make timestamp local
  cql3: Add feature service to as_ks_metadata_update()
  cql3: Add feature service to ks_prop_defs::as_ks_metadata()
  cql3: Add feature service to get_keyspace_metadata()
  cql: Add tablets on/off switch to CREATE KEYSPACE
  cql: Move initial_tablets from REPLICATION to TABLETS in DDL
  network_topology_strategy: Estimate initial_tablets if 0 is set
2024-01-16 00:15:10 +01:00
Avi Kivity
5e70dd1dbe database: don't allow keyspace objects to be copied
keyspace objects are heavyweight and copies are immediately out-of-date,
so copying them is bad.

Fix by deleting the copy constructor and copy assignment operator. One
call site is fixed. This call site is safe since it's only used
for accessing a few attributes (introduced in f70c4127c6).

Closes scylladb/scylladb#16782
2024-01-15 21:48:32 +01:00
Botond Dénes
204d3284fa readers/multishard: evictable_reader::fast_forward_to(): close reader on exception
When the reader is currently paused, it is resumed, fast-forwarded, then
paused again. The fast forwarding part can throw and this will lead to
destroying the reader without it being closed first.
Add a try-catch surrounding this part in the code. Also mark
`maybe_pause()` and `do_pause()` as noexcept, to make it clear why
that part doesn't need to be in the try-catch.
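
The close-on-exception pattern can be sketched as follows. This is a Python
illustration with a hypothetical `Reader` stand-in, not the multishard reader
code. Resuming and pausing are omitted because they are assumed not to throw.

```python
class Reader:
    """Hypothetical stand-in for the underlying reader."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.closed = False
        self.position = 0

    def fast_forward_to(self, position: int) -> None:
        if self.fail:
            raise OSError("fast-forward failed")
        self.position = position

    def close(self) -> None:
        self.closed = True


def fast_forward_paused(reader: Reader, position: int) -> None:
    """Fast-forward a resumed reader; close it if forwarding throws."""
    try:
        reader.fast_forward_to(position)
    except Exception:
        # Close before the exception propagates, so the reader is never
        # destroyed while still open.
        reader.close()
        raise
    # Pausing would follow here, outside the try block, since it is
    # assumed not to throw.
```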

Fixes: #16606

Closes scylladb/scylladb#16630
2024-01-15 20:55:55 +01:00
Kefu Chai
e5300f3e21 topology_state_machine: add formatter for service::cleanup_status
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for service::cleanup_status,
and remove its operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16778
2024-01-15 21:31:42 +02:00
Anna Stuchlik
af1405e517 doc: remove support for CentOS 7
This commit removes support for CentOS 7
from the docs.

The change applies to version 5.4, so it
must be backported to branch-5.4.

Refs https://github.com/scylladb/scylla-enterprise/issues/3502

In addition, this commit removes the information
about Amazon Linux and Oracle Linux, which was added
without a request and without clarity over which versions
should be documented.

Closes scylladb/scylladb#16279
2024-01-15 15:37:29 +02:00
Anna Stuchlik
bca39b2a93 doc: remove Serverless from the Drivers page
This commit removes the information about ScyllaDB Cloud Serverless,
which is no longer valid.

Closes scylladb/scylladb#16700
2024-01-15 15:36:51 +02:00
Botond Dénes
66bef6e961 cql3: cluster_describe_statement: don't produce range ownership for tablet keyspaces
Tablet keyspaces have per-table range ownership, which cannot currently
be expressed in a DESC CLUSTER statement, which describes range
ownership in the current keyspace (if set). Until we figure out how to
represent range ownership (tablets) of all tables of a keyspace, we
disable range ownership for tablet keyspaces.

Fixes: #16483

Closes scylladb/scylladb#16713
2024-01-15 14:03:54 +01:00
Patryk Wrobel
aec0db1b96 cql_auth_query_test.cc: do not rely on templated operator<<
This change is intended to remove the dependency on
operator<<(std::ostream&, const std::unordered_set<seastar::sstring>&)
from test/boost/cql_auth_query_test.cc.

It prepares the test for removal of the templated helpers.
Such removal is one of the goals of the referenced issue that is linked below.

Refs: #13245

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16758
2024-01-15 13:30:05 +02:00
Kefu Chai
ece2bd2f6e service: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16764
2024-01-15 13:29:33 +02:00
Kefu Chai
fc97d91f1a auth: add fmt::format for auth::resource and friends
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we

* define a formatter for `auth::resource` and friends,
* update their callers of `operator<<` to use `fmt::print()`.
* drop `operator<<`, as they are not used anymore.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16765
2024-01-15 13:26:39 +02:00
Kefu Chai
f344e13066 types: add formatter for data_value
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for data_value, but
its operator<<() is preserved as we are still using the generic
homebrew formatter for formatting std::vector, which in turn uses
operator<< of the element type.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16767
2024-01-15 13:18:23 +02:00
Kefu Chai
218334eaf5 test/nodetool: use build/$CMAKE_BUILD_TYPE when appropriate
because the CMake-generated build.ninja is located under build/,
and it puts the `scylla` executable at build/$CMAKE_BUILD_TYPE/scylla,
instead of at build/$scylla_build_mode/scylla, so let's adapt to this
change accordingly.

we will promote this change to a shared place if we have similar
needs in other tests as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16775
2024-01-15 12:52:35 +02:00
Pavel Emelyanov
dd892b0d8a code: Enable tablets if cluster feature is enabled
If the TABLETS map is missing in the CREATE KEYSPACE statement the
tablets are anyway enabled if the respective cluster feature is enabled.

To opt keyspaces out one may use the TABLETS = { 'enabled': false } syntax.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
4838eeb201 test: Turn off tablets feature by default
Next patches will make the per-keyspace initial_tablets option really
optional and turn tablets ON when the feature is ON. This will break all
other tests' assumptions that they are testing vnodes replication. So
turn the feature off by default; tests that do need tablets will need to
explicitly enable this feature on their own.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
ae7da54f88 test: Move test_tablet_drain_failure_during_decommission to another suite
In its current location it will be started with 3 pre-created scylla
nodes with default features ON. Next patch will exclude `tablets` from
the default list, so the test needs to create servers on its own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
46b36d8c07 test/tablets: Enable tables for real on test keyspace
When started, cql_test_env creates a test keyspace. Some tablets test
cases create a table in this keyspace, but misuse the whole feature. The
thing is that while the tablets feature is ON in those test cases, the
keyspace itself does _not_ have the initial_tablets option and thus
tablets are not enabled for the ks' table for real. Currently test cases
work just because this table is only used as a transparent table ID
placeholder. If tablets were turned on for the keyspace, several test
cases would break for two reasons.

First, the tablets map will no longer be empty on test start.

Second, applying changes to tablet metadata may not be visible, because
the test case uses a "random" timestamp that can be less than the initial
metadata mutations' timestamp.

This patch fixes all three places:

1. enables tablets for the test keyspace
2. removes assumption that the initial metadata is empty
3. uses large enough timestamp for subsequent mutations

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
2376b699e0 test/tablets: Make timestamp local
Just to make next patching simpler

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
f3a69bfaca cql3: Add feature service to as_ks_metadata_update()
To call prepare_options() with tablets feature state later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
4dede19e4f cql3: Add feature service to ks_prop_defs::as_ks_metadata()
To call prepare_options() with tablets feature state later

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
267770bf0f cql3: Add feature service to get_keyspace_metadata()
To be passed down to ks_prop_defs::as_ks_metadata()

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:12 +03:00
Pavel Emelyanov
6cb3055059 cql: Add tablets on/off switch to CREATE KEYSPACE
Now the user can do

  CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }

to turn tablets off. It will be useful in the future to opt a keyspace
out of tablets once they are turned on by default based on cluster
features only.

Also one can do just

  CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }

and let Scylla select the initial tablets value on its own

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:12:11 +03:00
Pavel Emelyanov
941f6d8fca cql: Move initial_tablets from REPLICATION to TABLETS in DDL
This patch changes the syntax of enabling tablets from

  CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }

to be

  CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }

and updates all tests accordingly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:04:48 +03:00
Pavel Emelyanov
4c4a9679d8 network_topology_strategy: Estimate initial_tablets if 0 is set
If the user configured zero initial tablets (spoiler: or this value was set
automagically when enabling tablets behind the scenes) we still need
some value to start with and this patch calculates one.

The math is based on topology and RF so that all shards are covered:

initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters)

The estimation is done when a table is created, not when the keyspace is
created. For that, the keyspace is configured with zero initial tablets,
and at table-creation time the zero is converted into an auto-estimated value.
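
A minimal Python sketch of the estimate above, for illustration only. The
function name and the dictionary inputs are hypothetical, and rounding the
per-DC ratio up, skipping datacenters with RF 0, and falling back to 1 are
assumptions that the commit message does not state.

```python
def estimate_initial_tablets(shards_per_dc, rf_per_dc):
    """Return the largest per-DC value of shards / RF.

    shards_per_dc: dict mapping DC name -> total number of shards in that DC
    rf_per_dc:     dict mapping DC name -> replication factor in that DC
    """
    estimates = [
        -(-shards // rf_per_dc[dc])  # ceiling division (rounding up is an assumption)
        for dc, shards in shards_per_dc.items()
        if rf_per_dc.get(dc, 0) > 0  # skipping DCs without replicas is an assumption
    ]
    # Falling back to 1 when no DC qualifies is an assumption as well.
    return max(estimates, default=1)
```

For example, two datacenters with 24 shards at RF=3 and 16 shards at RF=2
both yield 8, so the estimate would be 8 tablets.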

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-01-15 13:04:48 +03:00
Kamil Braun
423234841e Merge 'add automatic sstable cleanup to the topology coordinator' from Gleb
For correctness sstable cleanup has to run between (some) topology
changes.  Sometimes even a failed topology change may require running
the cleanup.  The series introduces automatic sstable cleanup step to the
topology change coordinator. Unlike other operations it is not represented
as a global transition state, but done by each node independently which
allows cleanup to run without locking the topology state machine so
tablet code can run in parallel with the cleanup.

It is done by having a cleanup state flag for each node in the
topology. The flag is a tri state: "clean" - the node is clean, "needed"
- cleanup is needed (but not running), "running" - cleanup is running. No
topology operation can proceed if there is a node in "running" state, but
some operations can proceed even if there are nodes in "needed" state. If
the coordinator needs to perform a topology operation that cannot run while
there are nodes that need cleanup the coordinator will start one
automatically and continue only after cleanup completes. There is also a
possibility to kick cleanup manually through the new RAFT API call.
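
The gating rule described above can be sketched in Python as follows. This
is only an illustration: `CleanupState` and `can_start_operation` are
hypothetical names, and which operations require clean nodes is passed in
as a flag rather than derived from the real coordinator logic.

```python
from enum import Enum


class CleanupState(Enum):
    CLEAN = "clean"      # the node is clean
    NEEDED = "needed"    # cleanup is needed but not running
    RUNNING = "running"  # cleanup is running


def can_start_operation(node_states, requires_clean_nodes):
    """Decide whether a topology operation may start right now.

    node_states: iterable with one CleanupState per node
    requires_clean_nodes: True if the operation cannot run while some
        node still needs cleanup (hypothetical flag)
    """
    states = list(node_states)
    # Nothing proceeds while any node is running cleanup.
    if any(s is CleanupState.RUNNING for s in states):
        return False
    # Some operations may proceed with "needed" nodes, others may not.
    if requires_clean_nodes and any(s is CleanupState.NEEDED for s in states):
        return False
    return True
```

When the function returns False only because of "needed" nodes, the
coordinator described above would start the cleanup itself and retry once
it completes.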

* 'cleanup-needed-v8' of https://github.com/gleb-cloudius/scylla:
  test: add test for automatic cleanup procedure
  test: add test for topology requests queue management
  storage_service: topology coordinator: add error injection point to be able to pause the topology coordinator
  storage_service: topology coordinator: add logging to removenode and decommission
  storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator
  storage_service: topology coordinator: manage cluster cleanup as part of the topology management
  storage_service: topology coordinator: provide a version of get_excluded_nodes that does not need node_to_work_on as a parameter
  test: use servers_see_each_other when needed
  test: add servers_see_each_other helper
  storage_service: topology coordinator: make topology coordinator lifecycle subscriber
  system_keyspace: raft topology: load ignore nodes parameter together with removenode topology request
  storage_service: topology coordinator: introduce sstable cleanup fiber
  storage_proxy: allow to wait for all ongoing writes
  storage_service: topology coordinator: mark nodes as needing cleanup when required
  storage_service: add mark_nodes_as_cleanup_needed function
  vnode_effective_replication_map: add get_all_pending_nodes() function
  vnode_effective_replication_map: pre calculate dirty endpoints during topology change
  raft topology: add cleanup state to the topology state machine
2024-01-14 18:54:02 +01:00
Gleb Natapov
f8b90aeb14 test: add test for automatic cleanup procedure
The test runs two bootstraps and checks that there is no cleanup
in between.  Then it runs a decommission and checks that cleanup runs
automatically and then it runs one more decommission and checks that no
cleanup runs again.  Second part checks manual cleanup triggering. It
adds a node, triggers cleanup through the REST API, checks that it runs,
decommissions a node and checks that the cleanup did not run again.
2024-01-14 15:45:53 +02:00
Gleb Natapov
5882855669 test: add test for topology requests queue management
This test creates a 5 node cluster with 2 down nodes (A and B). After
that it creates a queue of 3 topology operations: bootstrap, removenode
A and removenode B with ignore_nodes=A. It checks that all operations
manage to complete.  Then it downs one node and creates a queue with
two requests: bootstrap and decommission. Since neither can proceed, both
should be canceled.
2024-01-14 15:45:53 +02:00
Gleb Natapov
ba7aa0d582 storage_service: topology coordinator: add error injection point to be able to pause the topology coordinator 2024-01-14 15:45:53 +02:00
Gleb Natapov
1afc891bd5 storage_service: topology coordinator: add logging to removenode and decommission
Add some useful logging to removenode and decommission to be used by
tests later.
2024-01-14 15:45:53 +02:00
Gleb Natapov
97ab3f6622 storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator
Introduce new REST API "/storage_service/cleanup_all"
that, when triggered, instructs the topology coordinator to initiate
cluster wide cleanup on all dirty nodes. It is done by introducing new
global command "global_topology_request::cleanup".
2024-01-14 15:45:53 +02:00
Gleb Natapov
0adb3904d8 storage_service: topology coordinator: manage cluster cleanup as part of the topology management
Sometimes it is unsafe to start a new topology operation before cleanup
runs on dirty nodes. This patch detects the situation when the topology
operation to be executed cannot be run safely until all dirty nodes do
cleanup and initiates the cleanup automatically. It also waits for
cleanup to complete before proceeding with the topology operation.

There can be a situation where a node that needs cleanup dies and will
never clear the flag. In this case, if a topology operation that wants to
run next does not have this node in its ignore node list, it may get stuck
forever. To fix this the patch also introduces "liveness aware"
request queue management: we do not simply choose _a_ request to run next,
but go over the queue and find requests that can proceed considering
the nodes liveness situation. If there are multiple requests eligible to
run the patch introduces the order based on the operation type: replace,
join, remove, leave, rebuild. The order is such so to not trigger cleanup
needlessly.
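
A simplified Python model of the queue selection described above. The
request layout (`op`, `target`, `ignore_nodes`) and the eligibility rule
(every dead node must be either the request's target or in its ignore
list) are assumptions made for illustration; the real coordinator may
apply different checks. Only the operation order is taken from the text.

```python
# Operation order from the commit message: earlier entries run first.
ORDER = ["replace", "join", "remove", "leave", "rebuild"]


def pick_next_request(queue, dead_nodes):
    """Return the request to run next, or None if none can proceed.

    queue: list of dicts with keys "op", optional "target" and
           optional "ignore_nodes" (hypothetical layout)
    dead_nodes: set of nodes currently seen as down
    """
    def eligible(req):
        excluded = set(req.get("ignore_nodes", ())) | {req.get("target")}
        # Assumed rule: a request can proceed only if it does not depend
        # on any dead node.
        return set(dead_nodes) <= excluded

    candidates = [r for r in queue if eligible(r)]
    return min(candidates, key=lambda r: ORDER.index(r["op"]), default=None)
```

With nodes A and B down and a queue of bootstrap, removenode A and
removenode B with ignore_nodes=A, this model picks removenode B first,
after which the other two requests become eligible in turn.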
2024-01-14 15:45:50 +02:00
Nadav Har'El
2d04070120 Update seastar submodule
* seastar 0ffed835...8b9ae36b (4):
  > net/posix: Track ap-server ports conflict

Fixes #16720

  > include/seastar/core: do not include unused header
  > build: expose flag like -std=c++20 via seastar.pc
  > src: include used headers for C++ modules build

Closes scylladb/scylladb#16769
2024-01-14 14:51:11 +02:00
Gleb Natapov
c9b7bd5a33 storage_service: topology coordinator: provide a version of get_excluded_nodes that does not need node_to_work_on as a parameter
Needed by the next patch.
2024-01-14 14:44:07 +02:00
Gleb Natapov
0e68073b22 test: use servers_see_each_other when needed
In the next patch we want to abort topology operations if there are not
enough live nodes to perform them. This will break tests that do a
topology operation right after restarting a node since a topology
coordinator may still not see the restarted node as alive. Fix all those
tests to wait between restart and a topology operation until UP state
propagates.
2024-01-14 14:44:07 +02:00
Gleb Natapov
455ffaf5d8 test: add servers_see_each_other helper
The helper makes sure that all nodes in the cluster see each other as
alive.
2024-01-14 14:44:07 +02:00
Gleb Natapov
067267ff76 storage_service: topology coordinator: make topology coordinator lifecycle subscriber
We want to change the coordinator to consider node liveness when
processing the topology operation queue. If there are not enough live
nodes to process any of the ops we want to cancel them. For that to work
we need to be able to kick the coordinator if liveness situation
changes.
2024-01-14 14:44:07 +02:00
Gleb Natapov
a4ac64a652 system_keyspace: raft topology: load ignore nodes parameter together with removenode topology request
Next patch will need ignore nodes list while processing removenode
request. Load it.
2024-01-14 14:44:07 +02:00
Gleb Natapov
f70c4127c6 storage_service: topology coordinator: introduce sstable cleanup fiber
Introduce a fiber that waits on a topology event and when it sees that
the node it runs on needs to perform sstable cleanup it initiates one
for each non tablet, non local table and resets "cleanup" flag back to
"clean" in the topology.
2024-01-14 14:44:07 +02:00
Gleb Natapov
5b246920ae storage_proxy: allow to wait for all ongoing writes
We want to be able to wait for all writes started through the storage
proxy before a fence is advanced. Add phased_barrier that is entered
on each local write operation before checking the fence to do so. A
write will be either tracked by the phased_barrier or fenced. This will
be needed to wait for all non fenced local writes to complete before
starting a cleanup.
2024-01-14 14:44:07 +02:00
Gleb Natapov
b2ba77978c storage_service: topology coordinator: mark nodes as needing cleanup when required
A cleanup needs to run when a node loses ownership of a range (during
bootstrap) or if a range movement to a normal node failed (removenode,
decommission failure). Mark all dirty nodes as "cleanup needed" in those cases.
2024-01-14 14:43:59 +02:00
Gleb Natapov
dbededb1a6 storage_service: add mark_nodes_as_cleanup_needed function
The function creates a mutation that sets cleanup to "needed" for each
normal node that, according to the erm, has data it does not own after
successful or unsuccessful topology operation.
2024-01-14 14:43:33 +02:00
Gleb Natapov
23a27ccc24 vnode_effective_replication_map: add get_all_pending_nodes() function
Add a function that returns all nodes that have had vnodes moved to them
during a topology change operation. Needed to know which nodes need to
do cleanup in case of failed topology change operation.
2024-01-14 14:37:16 +02:00
Gleb Natapov
a8f11852da vnode_effective_replication_map: pre calculate dirty endpoints during topology change
Some topology change operations cause some nodes to lose ranges. This
information is needed to know which nodes need to do cleanup after
topology operation completes. Pre calculate it during erm creation.
2024-01-14 14:11:19 +02:00
Gleb Natapov
cc54796e23 raft topology: add cleanup state to the topology state machine
The patch adds cleanup state to the persistent and in memory state and
handles the loading. The state can be "clean" which means no cleanup
needed, "needed" which means the node is dirty and needs to run cleanup
at some point, "running" which means that the node is running cleanup
right now and when it completes the state will be reset to "clean".
2024-01-14 13:30:54 +02:00
Nadav Har'El
1bcaeb89c7 view: revert cleanup filter that doesn't work with tablets
This patch reverts commit 10f8f13b90 from
November 2022. That commit added to the "view update generator", the code
which builds view updates for staging sstables, a filter that ignores
ranges that do not belong to this node. However,

1. I believe this filter was never necessary, because the view update
   code already silently ignores base updates which do not belong to
   this replica (see get_view_natural_endpoint()). After all, the view
   update needs to know that this replica is the Nth owner of the base
   update to send its update to the Nth view replica, but if no such
   N exists, no view update is sent.

2. The code introduced for that filter used a per-keyspace replication
   map, which was ok for vnodes but no longer works for tablets, and
   causes the operation using it to fail.

3. The filter was used every time the "view update generator" was used,
   regardless of whether any cleanup is necessary or not, so every
   such operation would fail with tablets. So for example the dtest
   test_mvs_populating_from_existing_data fails with tablets:
     * This test has view building in parallel with automatic tablet
       movement.
     * Tablet movement is streaming.
     * When streaming happens before view building has finished, the
       streamed sstables get "view update generator" run on them.
       This causes the problematic code to be called.

Before this patch, the dtest test_mvs_populating_from_existing_data
fails when tablets are enabled. After this patch, it passes.

Fixes #16598

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-14 13:24:44 +02:00
Nadav Har'El
0fe40f729e mv: sleep a bit before view-update-generator restart
The "view update generator" is responsible for generating view updates
for staging sstables (such as coming from repair). If the processing
fails, the code retries - immediately. If there is some persistent bug,
such as issue #16598, we will have a tight loop of error messages,
potentially a gigabyte of identical messages every second.

In this patch we simply add a sleep of one second after view update
generation fails before retrying. We can still get many identical
error messages if there is some bug, but not more than one per second.
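
The retry pattern can be sketched in Python as below. This is not the
actual generator code (which is C++); `run_with_retry`, its parameters and
the log text are hypothetical, and the injectable `sleep` and `log`
arguments exist only to make the sketch easy to exercise.

```python
import time


def run_with_retry(step, retry_delay_s=1.0, sleep=time.sleep, log=print):
    """Run step() until it succeeds, pausing between failed attempts.

    Without the pause, a persistent failure would make this loop spin
    and log as fast as the CPU allows.
    """
    while True:
        try:
            return step()
        except Exception as exc:
            log(f"processing failed: {exc}; retrying in {retry_delay_s}s")
            sleep(retry_delay_s)  # at most one error message per delay period
```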

Refs #16598.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-01-14 13:13:52 +02:00
Kamil Braun
4e18f8b453 Merge 'topology_state_load: stop waiting for IP-s' from Petr Gusev
The loop in the `id2ip` lambda causes problems if we are applying an old raft
log that contains long-gone nodes. In this case, we may never receive
the `IP` for a node and get stuck in the loop forever. In this series we
replace the loop with an if - we just don't update the `host_id <-> ip`
mapping in the `token_metadata.topology` if we don't have an `IP` yet.

The PR moves `host_id -> IP` resolution to the data plane, now it
happens each time the IP-based methods of `erm` are called. We need this
because IPs may not be known at the time the erm is built. The overhead
of `raft_address_map` lookup is added to each data plane request, but it
should be negligible. In this PR `erm/resolve_endpoints` continues to
treat missing IP for `host_id` as `internal_error`, but we plan to relax
this in the follow-up (see this PR first comment).

Closes scylladb/scylladb#16639

* github.com:scylladb/scylladb:
  raft ips: rename gossiper_state_change_subscriber_proxy -> raft_ip_address_updater
  gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes
  storage_service: topology_state_load: remove IP waiting loop
  storage_service: sync_raft_topology_nodes: add target_node parameter
  storage_service: sync_raft_topology_nodes: move loops to the end
  storage_service: sync_raft_topology_nodes: rename extract process_left_node and process_transition_node
  storage_service: sync_raft_topology_nodes: rename add_normal_node -> process_normal_node
  storage_service: sync_raft_topology_nodes: move update_topology up
  storage_service: topology_state_load: remove clone_async/clear_gently overhead
  storage_service: fix indentation
  storage_service: extract sync_raft_topology_nodes
  storage_service: topology_state_load: move remove_endpoint into mutate_token_metadata
  address_map: move gossiper subscription logic into storage_service
  topology_coordinator: exec_global_command: small refactor, use contains + reformat
  storage_service: wait_for_ip for new nodes
  storage_service.idl.hh: fix raft_topology_cmd.command declaration
  erm: for_each_natural_endpoint_until: use is_vnode == true
  erm: switch the internal data structures to host_id-s
  erm: has_pending_ranges: switch to host_id
2024-01-12 18:46:51 +01:00
Petr Gusev
e24bee545b raft ips: rename gossiper_state_change_subscriber_proxy -> raft_ip_address_updater 2024-01-12 18:29:22 +04:00
Petr Gusev
6e7bbc94f4 gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes
When a node changes its IP we need to store the mapping in
system.peers and update token_metadata.topology and erm
in-memory data structures.

The test_change_ip was improved to verify this new
behaviour. Before this patch the test didn't check
that IPs used for data requests are updated on
IP change. In this commit we add the read/write check.
It fails on insert with 'node unavailable'
error without the fix.
2024-01-12 18:28:57 +04:00
Petr Gusev
6d6e1ba8fb storage_service: topology_state_load: remove IP waiting loop
The loop causes problems if we are applying an old
raft log that contains long-gone nodes. In this case, we may
never receive the IP for a node and get stuck in the loop forever.

The idea of the patch is to replace the loop with an
if - we just don't update the host_id <-> ip mapping
in the token_metadata.topology if we don't have an IP yet.
When we get the mapping later, we'll call
sync_raft_topology_nodes again from
gossiper_state_change_subscriber_proxy.
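
The idea can be illustrated with a small Python sketch. The function and
argument names are hypothetical and plain dictionaries stand in for
raft_address_map and token_metadata.topology; the point is only that a
missing IP is skipped instead of being waited for.

```python
def sync_topology_nodes(host_ids, address_map, topology):
    """Record host_id -> IP for every host whose IP is already known.

    Hosts without an IP are skipped rather than waited for, so replaying
    an old log with long-gone nodes cannot block. Returns the skipped
    host ids so the caller can sync them again once an IP shows up.
    """
    skipped = []
    for host_id in host_ids:
        ip = address_map.get(host_id)
        if ip is None:  # an "if" instead of a waiting loop
            skipped.append(host_id)
            continue
        topology[host_id] = ip
    return skipped
```

A later notification for one of the skipped hosts would simply call the
function again with that single host id.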
2024-01-12 15:37:50 +04:00
Petr Gusev
260874c860 storage_service: sync_raft_topology_nodes: add target_node parameter
If it's set, instead of going over all the nodes in raft topology,
the function will update only the specified node. This parameter
will be used in the next commit, in the call to sync_raft_topology_nodes
from gossiper_state_change_subscriber_proxy.
2024-01-12 15:37:50 +04:00
Petr Gusev
a9d58c3db5 storage_service: sync_raft_topology_nodes: move loops to the end 2024-01-12 15:37:50 +04:00
Petr Gusev
d1bce3651b storage_service: sync_raft_topology_nodes: rename extract process_left_node and process_transition_node 2024-01-12 15:37:50 +04:00
Petr Gusev
aa37b6cfd3 storage_service: sync_raft_topology_nodes: rename add_normal_node -> process_normal_node 2024-01-12 15:37:50 +04:00
Petr Gusev
a508d7ffc5 storage_service: sync_raft_topology_nodes: move update_topology up
In this and the following commits we prepare sync_raft_topology_nodes
to handle target_node parameter - the single host_id which should be
updated.
2024-01-12 15:37:50 +04:00
Petr Gusev
1b12f4b292 storage_service: topology_state_load: remove clone_async/clear_gently overhead
Before the patch we used to clone the entire token_metadata
and topology only to immediately drop everything in clear_gently.
This is a sheer waste.
2024-01-12 15:37:50 +04:00
Petr Gusev
1531e5e063 storage_service: fix indentation 2024-01-12 15:37:50 +04:00
Petr Gusev
9c50637f28 storage_service: extract sync_raft_topology_nodes
In the following commits we need part of the
topology_state_load logic to be applied
from gossiper_state_change_subscriber_proxy.
In this commit we extract this logic into a
new function sync_raft_topology_nodes.
2024-01-12 15:37:50 +04:00
Petr Gusev
9679b49cf4 storage_service: topology_state_load: move remove_endpoint into mutate_token_metadata
In the next commit we extract the loops by nodes into
a new function, in this commit we just move them
closer to each other.

Now the remove_endpoint function might be called under
token_metadata_lock (mutate_token_metadata takes it).
It's not a problem since gossiper event handlers in
raft_topology mode don't modify token_metadata so
we won't get a deadlock.
2024-01-12 15:37:50 +04:00
Petr Gusev
15b8e565ed address_map: move gossiper subscription logic into storage_service
We are going to remove the IP waiting loop from topology_state_load
in subsequent commits. An IP for a given host_id may change
after this function has been called by raft. This means we need
to subscribe to the gossiper notifications and call it later
with a new id<->ip mapping.

In this preparatory commit we move the existing address_map
update logic into storage_service so that in later commits
we can enhance it with topology_state_load call.
2024-01-12 15:37:50 +04:00
Petr Gusev
743be190f9 topology_coordinator: exec_global_command: small refactor, use contains + reformat 2024-01-12 15:37:50 +04:00
Petr Gusev
db1f0d5889 storage_service: wait_for_ip for new nodes
When a new node joins the cluster we need to be sure that its IP
is known to all other nodes. In this patch we do this by waiting
for the IP to appear in raft_address_map.

A new raft_topology_cmd::command::wait_for_ip command is added.
It's run on all nodes of the cluster before we put the topology
into transition state. This applies both to new and replacing nodes.
It's important to run wait_for_ip before moving to
topology::transition_state::join_group0 since in this state
node IPs are already used to populate pending nodes in erm.
2024-01-12 15:37:46 +04:00
Michał Jadwiszczak
013487e1e1 test:cql-pytest: change service levels intervals in tests
Set the interval to 0.5s to reduce required sleep time.
2024-01-12 10:28:28 +01:00
Michał Jadwiszczak
f6a464ad81 configure service levels interval
So far the service levels interval, responsible for updating SL configuration,
was hardcoded in main.
Now it's extracted to `service_levels_interval_ms` option.
2024-01-12 10:28:24 +01:00
Kefu Chai
a0e5c14c55 alternator: not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16736
2024-01-12 10:53:32 +02:00
Botond Dénes
5f44ae8371 Merge 'Add more logging for gossiper::lock_endpoint and storage_service::handle_state_normal' from Kamil Braun
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.

Thus the plan is to continue debugging using the longevity test, but we need
more logs. To check whether `handle_state_normal` was called and which branches
were taken, include some INFO level logs there. Also, detect deadlocks inside
`gossiper::lock_endpoint` by reporting an error message if `lock_endpoint`
waits for the lock for too long.

Ref: scylladb/scylladb#16668

Closes scylladb/scylladb#16733

* github.com:scylladb/scylladb:
  gossiper: report error when waiting too long for endpoint lock
  gossiper: store source_location instead of string in endpoint_permit
  storage_service: more verbose logging in handle_state_normal
2024-01-12 10:51:21 +02:00
Lakshmi Narayanan Sreethar
cd9e027047 types: fix ambiguity in align_up call
Compilation fails with recent boost versions (>=1.79.0) due to an
ambiguity with the align_up function call. Fix that by adding type
inference to the function call.

Fixes #16746

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16747
2024-01-12 10:50:31 +02:00
Kefu Chai
344ea25ed8 db: add fmt::format for db::consistency_level
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we

* define a formatter for `db::consistency_level`
* drop its `operator<<`, as it is not used anymore

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16755
2024-01-12 10:49:00 +02:00
Patryk Wrobel
87545e40c7 test/boost/auth_resource_test.cc: do not rely on templated operator<<
This change is intended to remove the dependency on
operator<<(std::ostream&, const std::unordered_set<T>&)
from auth_resource_test.cc.

It prepares the test for removal of the templated helpers
from utils/to_string.hh, which is one of the goals of the
referenced issue that is linked below.

Refs: #13245

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16754
2024-01-12 10:48:01 +02:00
Petr Gusev
802da1e7a5 storage_service.idl.hh: fix raft_topology_cmd.command declaration
Make IDL correspond to the declaration of
raft_topology_cmd::command in topology_state_machine.hh.
2024-01-12 12:23:22 +04:00
Petr Gusev
41c15814e6 erm: for_each_natural_endpoint_until: use is_vnode == true
This is an optimisation - for_each_natural_endpoint_until is
called only for vnode tokens, we don't need to run the
binary search for it in tm.first_token.

Also the function is made private since it's only used
in erm itself.
2024-01-12 12:23:22 +04:00
Petr Gusev
07f2ec63c7 erm: switch the internal data structures to host_id-s
Before this patch the host_id -> IP mapping was done
in calculate_effective_replication_map. This function
is called from mutate_token_metadata, which means we
have to have an IP for each host_id in topology_state_load,
otherwise we get an error. We are going to remove
the IP waiting loop from topology_state_load, so
we need to get rid of IPs resolution from
calculate_effective_replication_map.

In this patch we move the host_id -> IP resolution to
the data plane. When a write or read request is sent
the target endpoints are requested from erm through
get_natural_endpoints_without_node_being_replaced,
get_pending_endpoints and get_endpoints_for_reading
methods and this is where the IP resolution
will now occur.
2024-01-12 12:23:22 +04:00
Petr Gusev
1928dc73a8 erm: has_pending_ranges: switch to host_id
In the next patches we are going to change erm data structures
(replication_map and ring_mapping) from IP to host_id. Having
locator::host_id instead of IP in has_pending_ranges arguments
makes this transition easier.
2024-01-12 12:23:19 +04:00
Botond Dénes
b69f7126c3 Update tools/java submodule
* tools/java 24e51259...c75ce2c1 (1):
  > Update JNA dependency to 5.14.0
2024-01-12 09:47:20 +02:00
Benny Halevy
3e938dbb5a storage_service: get rid of handle_state_moving declaration
The implementation was already removed in e64613154f

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16742
2024-01-12 09:38:23 +02:00
Nadav Har'El
5c7e029012 test/cql-pytest: add reproducer for task-tracking memory leak
This patch adds a reproducer test for the memory leak described in
issue #16493: If a table is repeatedly created and dropped, memory
is leaked by task tracking. Although this "leak" can be temporary
if task_ttl_in_seconds is properly configured, it may still use too
much memory if tables are too frequently created and dropped.
The test here shows that (before #16493 was fixed) as little as
100 tables created and deleted can cause Scylla to run out of
memory.

The problem is severely exacerbated when tablets are used which is
why the test here uses tablets. Before the fix for #16493 (a Seastar
patch, scylladb/seastar#2023), this test of 100 iterations always
failed (with test/cql-pytest/run's default memory allowance).
After the fix, the test doesn't fail in 100 iterations - and even
if increased manually to 10,000 iterations it doesn't fail.

The new test uses the initial_tablets feature, so requires Scylla to be
run with the "tablets" experimental option turned on. This is not
currently the default of test.py or test/cql-pytest/run, so I turned
it on manually to check this test. I also checked that the test is
correctly skipped if tablets are not turned on.

Refs #16493

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16717
2024-01-12 09:37:32 +02:00
Botond Dénes
63b266e94c Merge ' db: Make the "me" sstable format mandatory' from Kefu Chai
The `me` sstable format includes an important feature of storing the `host_id` of the local node when writing sstables.
This is crucial for validating the sstable's `replay_position` in stats metadata as it is valid only on the originating node and shard (#10080), therefore we would like to make the `me` format mandatory.

in this series, the `sstable_format` option is deprecated, and the default sstable format is bumped up from `mc` to `md`, so that a cluster composed of nodes with this change should always use `me` as the sstable format. if a node with this change joins a 5.x cluster which is still using `md` because its nodes are configured as such, this node will also use `md`, unless the other node(s) change their `sstable_format` setting to `me`.

Fixes #16551

Closes scylladb/scylladb#16716

* github.com:scylladb/scylladb:
  db/config.cc: do not respect sstable_format option
  feature_service: abort if sstable_format < md
  db, sstable: bump up default sstable format to "md"
2024-01-12 09:33:08 +02:00
Kamil Braun
cf646022cb gossiper: report error when waiting too long for endpoint lock
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.

One hypothesis is that `gossiper` is stuck on `lock_endpoint`. We dealt
with gossiper deadlocks in the past (e.g. scylladb/scylladb#7127).

Modify the code so it reports an error if `lock_endpoint` waits for the
lock for more than a minute. When the issue reproduces again in
longevity, we will see if `lock_endpoint` got stuck.
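
A minimal sketch of the reporting idea, written in Python with asyncio rather than the Seastar C++ that the patch touches. The name `lock_with_report`, its parameters and the log text are hypothetical; only the "report after a minute, keep waiting" behavior comes from the description above.

```python
import asyncio
import logging

logger = logging.getLogger("gossip_sketch")  # hypothetical logger name


async def lock_with_report(lock: asyncio.Lock, endpoint: str,
                           warn_after: float = 60.0) -> bool:
    """Acquire `lock`; log an error if the wait exceeds `warn_after` seconds.

    The wait is not cancelled: after reporting we keep waiting for the lock.
    Returns True if the slow-wait error was reported, False otherwise.
    """
    acquire = asyncio.ensure_future(lock.acquire())
    done, _pending = await asyncio.wait({acquire}, timeout=warn_after)
    reported = not done
    if reported:
        logger.error("waited more than %s s for the lock on endpoint %s",
                     warn_after, endpoint)
    await acquire  # returns at once if the acquisition already finished
    return reported
```

The point of this shape is that a stuck lock becomes visible in the logs without changing who eventually gets the lock.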
2024-01-11 17:29:25 +01:00
Kefu Chai
7abd263ee6 db/config.cc: do not respect sstable_format option
"me" sstable format includes an important feature of storing the
`host_id` of the local node when writing sstables. This is crucial
for validating the sstable's `replay_position` in stats metadata as
it is valid only on the originating node and shard (#10080), therefore
we would like to make the `me` format mandatory.

before making `me` mandatory, we need to stop handling `sstable_format`
option if it is "md".

in this change

- gms/feature_service: do not disable `ME_SSTABLE_FORMAT` even if
  `sstable_format` is configured with "md". and in that case, instead,
  a warning is printed in the logging message to note that
  this setting is not valid anymore.
- docs/architecture/sstable: note that "me" is used by default now.

after this change, "sstable_format" will only accept "me" if it's
explicitly configured. when a server with this change joins a
cluster, it uses "md" if any of the nodes in the cluster still has
`sstable_format` set to "md". practically, this change makes "me" mandatory
in a 6.x cluster, assuming this change will be included in 6.x
releases.

Fixes #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 22:43:05 +08:00
Kefu Chai
bece3eff0c feature_service: abort if sstable_format < md
sstable_format comes from scylla.yaml or from the command line
arguments, and we gate scylla from unallowed sstable formats lower
than `md` when parsing the configuration, and scylla bails out
at seeing the unallowed sstable format like:

```
terminate called after throwing an instance of 'std::invalid_argument'
  what():  Invalid value for sstable_format: got ka which is not inside the set of allowed values md, me
Aborted (core dumped)
```

scylla errors out way before `feature_config_from_db_config()`
gets called -- it throws in `bpo::notify(configuration)`,
way before `func` is evaluated in `app_template::run_deprecated()`.

so, in this change, we do not handle these values anymore, and
consider it a bug if we run into any of them.

Refs #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 22:43:05 +08:00
Kefu Chai
54d49c04e0 db, sstable: bump up default sstable format to "md"
before this change, we default to the "mc" sstable format, and
switch to "md" if the cluster agrees on using it, and to
"me" if the cluster agrees on using this. the cluster feature
is used to get the consensus across the members in the cluster,
if any of the existing nodes in the cluster has its `sstable_format`
configured to, for instance, "mc", then the cluster is stuck with
"mc".

but we disabled "mc" sstable format back in 3d345609, the first LTS
release including that change was scylla v5.2.0. which means, the
cluster of the last major version Scylla should be using "md" or
"me". per our document on upgrade, see docs/upgrade/index.rst,

> You should perform the upgrades consecutively - to each
> successive X.Y version, without skipping any major or minor version.
>
> Before you upgrade to the next version, the whole cluster (each
> node) must be upgraded to the previous version.

we can assume that, a 6.x node will only join a cluster
with 5.x or 6.x nodes. (joining a 7.x cluster should work, but
this is not relevant to this change). in both cases, since
5.x and up scylla can only be configured with "md" `sstable_format`,
there is no need to switch from "mc" to "md" anymore. so we can
ditch the code supporting it.

Refs #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 22:43:05 +08:00
Avi Kivity
f0d6330204 build: add crypto++ to dependencies
We depend on the crypto++ library (see utils/hashers.hh) but don't list
it in install-dependencies.sh. Currently this works because Seastar's
install-dependencies.sh installs it, but that's going away in [1]. List
crypto++ directly to keep install-dependencies.sh working.

Regenerating the frozen toolchain is unnecessary since we're re-adding
an existing dependency.

[1] 6bdef1e431

Closes scylladb/scylladb#16563
2024-01-11 16:26:20 +02:00
Patryk Jędrzejczak
e99d03a21e topology_coordinator: clarify warnings
It was unclear where the error messages ended if they consisted of
multiple sentences.
2024-01-11 14:19:42 +01:00
Patryk Jędrzejczak
b4b170047b raft topology: join: allow only the first response to be a successful acceptance
The joining node might receive more than one join response (see
the comment at the beginning of `join_node_response_handler`).

If the first response was a rejection or it was an acceptance but
the joining node failed while handling it, the following
acceptances by the coordinator shouldn't succeed. The joining
node considers the join operation as failed.

Currently, we always immediately return from non-first response
handler calls. However, if the response is an acceptance, and the
first response wasn't a successfully handled acceptance, we need
to throw an exception to ensure the topology coordinator moves
the node to the left state. We do it in this patch. We throw the
exception set while handling the first response. It explains why
we are failing the current acceptance.

We don't want to throw the exception on rejection. The topology
coordinator will move the node to the left state anyway. Also,
failing the rejection with an error message containing "the
topology coordinator rejected request to join the cluster" (from
the previous rejection) would be very confusing.
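
The rule for non-first responses can be sketched as a small state holder. This is a Python illustration under assumptions, not the code in `join_node_response_handler`: the class, its fields and the `on_accept` callback are hypothetical, and raising on a first-response rejection is a simplification.

```python
class JoinResponseState:
    """Remembers how the first join response was handled (hypothetical names)."""

    def __init__(self) -> None:
        self.first_seen = False
        self.first_error = None  # exception raised while handling the first response

    def handle(self, accepted: bool, on_accept) -> None:
        if self.first_seen:
            # Non-first response: fail a later acceptance if the first response
            # was not a successfully handled acceptance; stay silent on rejection.
            if accepted and self.first_error is not None:
                raise self.first_error
            return
        self.first_seen = True
        try:
            if not accepted:
                raise RuntimeError(
                    "the topology coordinator rejected request to join the cluster")
            on_accept()
        except Exception as exc:
            self.first_error = exc  # kept so later acceptances can be failed with it
            raise
```

A later rejection returns quietly, which avoids repeating the misleading "rejected request" text described above.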
2024-01-11 14:19:42 +01:00
Patryk Jędrzejczak
f3a08757af storage_service: join_node_response_handler: fix indentation
Broken in the previous patch.
2024-01-11 14:19:42 +01:00
Patryk Jędrzejczak
ddfd9c3173 raft topology: join: shut down a node on error in response handler
If the joining node fails while handling the response from the
topology coordinator, it hangs even though it knows the join
operation has failed. Therefore, we ensure it shuts down in
this patch.

We rethrow the caught exception to ensure the topology coordinator
knows the RPC has failed. In case of rejection, it does not matter
because the coordinator behaves the same way in both cases: RPC
success and RPC failure. It transitions the rejected node to the
left state. However, in case of acceptance, this only happens if
the RPC fails. Otherwise, the coordinator continues handling the
request.

On abort, one of the two events happens first:
- the new catch statement catches `abort_requested_exception` and
sets it on `_join_node_response_done`,
- `co_await _ss._join_node_response_done.get_shared_future(as);`
in `join_node_rpc_handshaker::post_server_start` resolves with
`abort_requested_exception` after triggering `as`. In both cases,
`join_node_rpc_handshaker::post_server_start` throws
`abort_requested_exception`. Therefore, we don't need a separate
catch statement for `abort_requested_exception` in
`join_node_response_handler`.
2024-01-11 14:19:37 +01:00
Botond Dénes
697ebef149 Merge 'tasks: compaction: drop regular compaction tasks after they are finished' from Aleksandra Martyniuk
Make compaction tasks internal. Drop all internal tasks without parents
immediately after they are done.

Fixes: #16735
Refs: #16694.

Closes scylladb/scylladb#16698

* github.com:scylladb/scylladb:
  compaction: make regular compaction tasks internal
  tasks: don't keep internal root tasks after they complete
2024-01-11 12:10:44 +02:00
Nadav Har'El
5762170526 main: fix "starting {}" messages
The supervisor::notify() function expects a single string - not a
format and parameters. Calls we have in main.cc like

    supervisor::notify("starting {}", what);

end up printing the silly message "starting {}". The second parameter
"what" is converted to a bool, also having an unintended consequence
for telling notify we're "ready".

This patch fixes it to call fmt::format, as intended.
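
The pitfall can be illustrated with a Python analogue. The `notify` function below is a hypothetical stand-in, not the real `supervisor::notify()`; it only assumes, as described above, a notifier that takes one finished string plus a boolean "ready" flag.

```python
def notify(msg: str, ready: bool = False) -> str:
    """Hypothetical notifier: expects an already formatted message."""
    return ("READY: " if ready else "") + msg


what = "compaction manager"

# Buggy call: the placeholder is never filled in, and `what` lands in the
# `ready` parameter, where a non-empty string counts as true.
buggy = notify("starting {}", what)

# Fixed call: format first, then pass the single resulting string.
fixed = notify("starting {}".format(what))
```

Here `buggy` keeps the literal `{}` and is wrongly flagged as ready, while `fixed` carries the intended text.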

Fixes #16728

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16729
2024-01-11 11:43:07 +02:00
Botond Dénes
ac69473bac Merge 'utils/pretty_printers: add "I" specifier support' from Kefu Chai
this is to mimic the formatting of `human_readable_value`, and to prepare for consolidating these two formatters, so we don't have two pretty printers in the tree.

Closes scylladb/scylladb#16726

* github.com:scylladb/scylladb:
  utils/pretty_printers: add "I" specifier support
  utils/pretty_printers: use the formatting of to_hr_size()
2024-01-11 10:54:14 +02:00
Kefu Chai
0c2ef5de54 test/unit/bptree_validation: use "{}" for formatting test_data
before this change, "{:d}" is used for formatting `test_data` in
bptree_stress_test.cc. but the "d" specifier is only used for
formatting integers, not for formatting `test_data` or generic
data types, so this fails when the test is compiled with {fmt} v10,
like:

```
In file included from /home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:20:
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:294:35: error: call to consteval function 'fmt::basic_format_string<char, test_data &, test_data &>::basic_format_string<char[31], 0>' is not a constant expression
  294 |             fmt::print(std::cout, "Iterator broken, {:d} != {:d}\n", val, *_fwd);
      |                                   ^
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:267:20: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::forward_check' requested here
  267 |             return forward_check();
      |                    ^
/home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:92:35: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::step' requested here
   92 |                         if (!itc->step()) {
      |                                   ^
/usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression
 2322 |       if (!in(arg_type, set)) throw_format_error("invalid format specifier");
      |                               ^
```

in this change, instead of specifying "{:d}", let's just use "{}",
which works for both integer and `test_data`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16727
2024-01-11 10:53:33 +02:00
Kefu Chai
6c06751640 cdc: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16725
2024-01-11 09:13:37 +02:00
Kefu Chai
5874652967 cql3: define format_as() for formatting cql3::cql3_type
in the same spirit of 724a6e26, format_as() is defined for
cql3::cql3_type. despite that this is not used yet by fmt v9,
where we still have FMT_DEPRECATED_OSTREAM, this prepares us for
fmt v10.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16232
2024-01-11 09:07:18 +02:00
Botond Dénes
3d1667c720 Update ./tools/java submodule
* ./tools/java e106b500...24e51259 (1):
  > build.xml: update io.airlift to 0.9
2024-01-11 08:55:51 +02:00
Lakshmi Narayanan Sreethar
76f0d5e35b reader_permit: store schema_ptr instead of raw schema pointer
Store schema_ptr in reader permit instead of storing a const pointer to
schema to ensure that the schema doesn't get changed elsewhere when the
permit is holding on to it. Also update the constructors and all the
relevant callers to pass down schema_ptr instead of a raw pointer.

Fixes #16180

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16658
2024-01-11 08:37:56 +02:00
Kefu Chai
f11a53856d utils/pretty_printers: add "I" specifier support
this is to mimic the formatting of `human_readable_value`, and
to prepare for consolidating these two formatters, so we don't have
two pretty printers in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 14:33:47 +08:00
Patryk Wrobel
f4e311e871 cql3: add formatter for cql3::expr::oper_t
This change introduces a specialization of fmt::formatter
for cql3::expr::oper_t. This enables the usage of this
type with FMTv10, which dropped the default generated formatter.

Usage of cql3::expr::oper_t without the defined formatter
resulted in compilation error when compiled with FMTv10.

Refs: #13245

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16719
2024-01-11 08:33:35 +02:00
Kefu Chai
7d627b328f utils/pretty_printers: use the formatting of to_hr_size()
keep the precision of 4 digits, for instance, so that we format
"8191" as "8191" instead of as "8 Ki". this is modeled after
the behavior of `to_hr_size()`. for better user experience.
and also prepares to consolidate these two formatters.

tests are updated to exercise both IEC and SI notations.
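
A rough Python sketch of the "keep four digits" rule, for illustration only. The function name, the suffix tables, the truncating division and the exact switch-over limits (8192 for IEC, 10000 for SI) are assumptions; only the "8191" versus "8 Ki" example comes from the message above.

```python
IEC_SUFFIXES = ["", "Ki", "Mi", "Gi", "Ti", "Pi"]  # assumed suffix spelling
SI_SUFFIXES = ["", "k", "M", "G", "T", "P"]        # assumed suffix spelling


def human_readable(n: int, iec: bool = True) -> str:
    """Format `n`, moving to the next unit only once four digits are exceeded."""
    scale, limit = (1024, 8192) if iec else (1000, 10000)
    suffixes = IEC_SUFFIXES if iec else SI_SUFFIXES
    idx = 0
    while n >= limit and idx < len(suffixes) - 1:
        n //= scale  # truncating division; the real formatter may round
        idx += 1
    return f"{n} {suffixes[idx]}" if suffixes[idx] else str(n)
```

Under these assumptions `human_readable(8191)` stays unscaled, and the value only moves to the `Ki` unit at 8192.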

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 14:33:03 +08:00
Kefu Chai
8c4576f55d api: storage_service: correct the descriptions of two APIs
this change is about the documentation of the RESTful API of
storage_service. we define the API using the Swagger 2.0 format and
generate the API document from the definitions, so it would be great
if the document matched the API.

in this change, the descriptions are made more accurate, since the
keyspace is not queried but mutated.

from the code perspective, it is purely cosmetic, as we don't read the
description fields or verify them in our tests.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16637
2024-01-11 08:28:14 +02:00
Kamil Braun
6e39c2ffde gossiper: store source_location instead of string in endpoint_permit
The original code extracted only the function_name from the
source_location for logging. We'll use more information from the
source_location in later commits.
2024-01-10 17:02:52 +01:00
Kamil Braun
664349a10f storage_service: more verbose logging in handle_state_normal
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.

Improve the INFO level logging in handle_state_normal to aid debugging
in the future.

The amount of logs is still constant per-node. Even though some log
messages report all tokens owned by a node, handle_state_normal calls
are still rare. The most "spammy" situation is when a node starts and
calls handle_state_normal for every other node in the cluster, but it is
a once-per-startup event.
2024-01-10 16:39:55 +01:00
Patryk Wrobel
a64eb92369 utils: specialize fmt::formatter for utils::tagged_integer
This change introduces a specialization of fmt::formatter
for utils::tagged_integer. This enables the usage of this
type with FMTv10, which dropped the default generated formatter.

Usage of utils::tagged_integer without the defined formatter
resulted in compilation error when compiled with FMTv10.

Refs: #13245

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16715
2024-01-10 18:32:43 +03:00
Nadav Har'El
083868508c Update seastar submodule
* seastar 70349b74...0ffed835 (15):
  > http/client: include used header files
  > treewide: s/format/fmt::format/ when appropriate
  > shared_future: shared_state::run_and_dispose(): release reserve of _peers

Fixes #16493

  > metrics_tester - A demo app to test metrics
  > build: silence the warning of -Winclude-angled-in-module-purview
  > estimated_histogram.hh: Support native histograms
  > prometheus.cc: Clean the pick representation code
  > prometheus.cc add native histogram
  > memory: fix the indentation.
  > metrics_types.hh: add optional native histogram information
  > memory: include used header
  > prometheus.cc: Add filter, aggregate by label and skip_when_empty
  > src/proto/metrics2.proto: newer proto buf definition
  > print: deprecate format_separated()
  > reactor: use fmt::join() when appropriate

Closes scylladb/scylladb#16712
2024-01-10 14:02:04 +02:00
Nadav Har'El
39dd2a2690 cql-pytest: translated Cassandra's test for LWT with static column
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertUpdateIfConditionStaticsTest.java into our
cql-pytest framework.

This test file checks various LWT conditional updates which involve
static columns or UDTs (there is a separate test file for LWT conditional
updates that do not involve static columns).

This test did not uncover any new bugs, but demonstrates yet again
several places where we intentionally deviated from Cassandra's behavior,
forcing me to add "is_scylla" checks in many of the checks to allow
them to pass on both Scylla and Cassandra. These deviations are known,
intentional and some are documented in docs/kb/lwt-differences.rst but
not all, so it's worth listing here the ones re-discovered by this test:

    1. On a successful conditional write, Cassandra returns just True, Scylla
       also returns the old contents of the row. This difference is officially
       documented in docs/kb/lwt-differences.rst.

    2. On a batch request, Scylla always returns a row per statement,
       Cassandra doesn't - it often returns just a single failed row,
       or just True if the whole batch succeeded. This difference is
       officially documented in docs/kb/lwt-differences.rst.

    3. In a DELETE statement with a condition, in the returned row
       Cassandra lists the deleted column first - while Scylla lists
       the static column first (as in any other row). This difference
       is probably inconsequential, because columns also have names
       so their order in the response usually doesn't matter.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16643
2024-01-10 12:14:06 +02:00
Nadav Har'El
b1a441ba56 test/cql-pytest: correct xfail status of timestamp parser
The recently-added test test_fromjson_timestamp_submilli demonstrated a
difference between Scylla's and Cassandra's parsing timestamps in JSON:
Trying to use too many (more than 3) digits of precision is forbidden
in Scylla, but ignored in Cassandra. So we marked the test "xfail",
suggesting we think it's a Scylla bug that should be fixed in the future.

However, it turns out that we already had a different test,
test_type_timestamp_from_string_overprecise, which showed the same
difference in a different context (without JSON). In that older test,
the decision was to consider this a Cassandra bug, not Scylla bug -
because Cassandra seemingly allows the sub-millisecond timestamp but
in reality drops the extra precision.

So we need to be consistent in the tests - this is either a Scylla bug
or a Cassandra bug, we can't make one choice in one test and another
in a different test :-) So let's accept our older decision, and consider
Scylla's behavior the correct one in this case.
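
The two behaviors can be contrasted with a small Python sketch. It is an illustration under assumptions, not either database's parser: the function name and the `strict` flag are hypothetical, and modeling "drops the extra precision" as truncation is a guess.

```python
def fraction_to_millis(digits: str, strict: bool = True) -> int:
    """Convert the fractional-second digits of a timestamp to milliseconds.

    strict=True  rejects more than 3 digits (the behavior kept as correct).
    strict=False silently drops the extra digits (assumed truncation).
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a digit string: {digits!r}")
    if len(digits) > 3:
        if strict:
            raise ValueError("timestamp has more than millisecond precision")
        digits = digits[:3]
    return int(digits.ljust(3, "0"))  # "12" means 120 ms
```

The strict variant makes the loss of precision an explicit error instead of a silent change to the stored value.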

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16586
2024-01-10 12:12:26 +02:00
Kefu Chai
eb9216ef11 compaction: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16707
2024-01-10 11:07:36 +02:00
Kefu Chai
317af97e41 test/pylib: shutdown unix RESTful client
when stopping the ManagerClient, it would be better to close
all connected connectors, otherwise aiohttp complains like:

```
13:57:53.763 ERROR> Unclosed connector
connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x7f939d2ca5f0>, 96672.211256817)]']
connector: <aiohttp.connector.UnixConnector object at 0x7f939d2da890>
```

this warning message is printed to the console, and it is distracting
when testing manually.

so, in this change, let's close the client connecting to unix domain
socket.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16675
2024-01-10 11:07:14 +02:00
Kefu Chai
f61f6c27e3 gms: add formatter for gms::endpoint_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for gms::endpoint_state, and
update the callers of `operator<<` to use `fmt::print()`.
but we cannot drop `operator<<` yet, as we are still using the
templated operator<< and templated fmt::formatter to print containers
in scylla and in seastar -- they are still using `operator<<`
under the hood.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16705
2024-01-10 09:16:23 +02:00
Sylwia Szunejko
eabe97bcd0 transport: remove additional options from TABLETS_ROUTING_V1
Closes scylladb/scylladb#16701
2024-01-10 09:00:25 +02:00
Botond Dénes
5981900dca Update tools/jmx submodule
* tools/jmx 80ce5996...3257897a (1):
  > scylla-apiclient: drop hk2-locator dependency
2024-01-10 08:53:20 +02:00
Kefu Chai
34b03867b2 tools: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16673
2024-01-10 08:44:09 +02:00
Kefu Chai
0dc7db54d1 build: cmake: add "unit_test_list" target
this target is used by test.py for enumerating unit tests

* test/CMakeLists.txt: append executable's full path to
  `scylla_tests`. add `unit_test_list` target printing
  `scylla_tests`, please note, `cmake -E echo` does not
  support the `-e` option of `echo`, and ninja does not
  support command line with newline in it, we have to use
  `echo` to print the list of tests.
* test/{boost,raft,unit}/CMakeLists.txt: set scylla_tests
  only if $PWD/suite.yaml exists. we could hardwire this
  logic in these files, as it is known that this file
  exists in these directories, but it is still written this way
  so that it serves as a comment explaining why we update
  scylla_tests here but not in the other places where we
  also use the `add_scylla_test()` function: the reason is
  simply that suite.yaml exists here.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16702
2024-01-10 08:43:04 +02:00
Botond Dénes
4aba445ef6 Merge 'test.py: adapt to cmake building system' from Kefu Chai
in this series, we adapt to cmake building system by mapping scylla build mode to `CMAKE_BUILD_TYPE` and by using `build/build.ninja` if it exists, as `configure.py` generates `build.ninja` in `build` when using CMake for creating `build.ninja`.

Closes scylladb/scylladb#16703

* github.com:scylladb/scylladb:
  test.py: build using build/build.ninja when it exists
  test.py: extract ninja()
  test.py: extract path_to()
  test.py: define all_modes as a dict of mode:CMAKE_BUILD_TYPE
2024-01-10 08:39:33 +02:00
Kefu Chai
382a5e2d0c test.py: build using build/build.ninja when it exists
CMake puts `build.ninja` under `build`, so use it if it exists, and
fall back to current directory otherwise.
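
A minimal sketch of the fallback, assuming a helper that only builds the command line. The name `ninja_command` and its parameters are hypothetical and not necessarily what test.py uses; `-C` is ninja's regular "change to directory before building" option.

```python
import os


def ninja_command(target: str, source_root: str = ".") -> list:
    """Return the ninja invocation for `target`.

    Prefer <source_root>/build/build.ninja (the CMake layout) when it exists,
    otherwise fall back to build.ninja in the source root.
    """
    build_dir = os.path.join(source_root, "build")
    if os.path.exists(os.path.join(build_dir, "build.ninja")):
        return ["ninja", "-C", build_dir, target]
    return ["ninja", "-C", source_root, target]
```

The returned list can be handed to `subprocess.run()` as is.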

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-10 10:01:02 +08:00
Kefu Chai
6674e87842 test.py: extract ninja()
use ninja() to build target using `ninja`. since CMake puts
`build.ninja` under "build", while `configure.py` puts it under
the root source directory, this change prepares us for a follow-up
change to build with build/build.ninja.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-10 10:01:02 +08:00
Kefu Chai
5fda822c4e test.py: extract path_to()
use path_to() to find the path to the directory under build directory.

this change helps to find the executables built using CMake as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-10 10:01:02 +08:00
Kefu Chai
0b11ae9fe6 test.py: define all_modes as a dict of mode:CMAKE_BUILD_TYPE
because scylla build mode and CMAKE_BUILD_TYPE are not identical,
let's define `all_modes` as a dict so we can look it up.
this change prepares for a follow-up commit which adds a path
resolver which supports both build system generators: the plain
`configure.py` and CMake driven by `configure.py`.
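
A sketch of how such a dict and a path resolver could fit together. The mapping values, the `cmake` flag and the directory layout are assumptions made for illustration; the authoritative names live in test.py and the CMake files.

```python
import os

# Scylla build mode -> CMAKE_BUILD_TYPE (values assumed for this sketch).
ALL_MODES = {
    "debug": "Debug",
    "release": "RelWithDebInfo",
    "dev": "Dev",
}


def path_to(mode: str, *components: str, cmake: bool = False,
            build_root: str = "build") -> str:
    """Return the path of a build artifact for `mode`.

    With cmake=True the per-mode directory is named after CMAKE_BUILD_TYPE,
    otherwise after the scylla build mode itself.
    """
    if mode not in ALL_MODES:
        raise ValueError(f"unknown build mode: {mode}")
    subdir = ALL_MODES[mode] if cmake else mode
    return os.path.join(build_root, subdir, *components)
```

Keeping the mapping in one dict means the set of valid modes and their CMake names cannot drift apart.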

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-10 10:01:02 +08:00
Botond Dénes
f4f724921c load_meter: get_load_map(): don't unconditionally dereference _lb
Said method has a check on `_lb` not being null, before accessing it.
However, since 0e5754a, there was an unconditional access, adding an
entry for the local node. Move this inside the if, so it is covered by
the null-check. The only caller is the API (probably nodetool); the
worst that can happen is that they get a completely empty load-map if
they call too early during startup.

Fixes: #16617

Closes scylladb/scylladb#16659
2024-01-09 16:02:12 +03:00
Aleksandra Martyniuk
6b87778ef2 compaction: make regular compaction tasks internal
Regular compaction tasks are internal.

Adjust test_compaction_task accordingly: modify test_regular_compaction_task,
delete test_running_compaction_task_abort (relying on regular compaction),
whose checks are already achieved by test_not_created_compaction_task_abort.
Rename the latter.
2024-01-09 13:13:54 +01:00
Aleksandra Martyniuk
6b2b384c83 tasks: don't keep internal root tasks after they complete
2024-01-09 13:13:54 +01:00
Pavel Emelyanov
cdf5124003 Merge 'tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()' from Botond Dénes
The default error handler throws an exception, which means scylla-sstable will exit with exception if there is any problem in the configuration. Not even ScyllaDB itself is this harsh -- it will just log a warning for most errors. A tool should be much more lenient. So this patch passes an error handler which just logs all errors with debug level.
If reading an sstable fails, the user is expected to investigate by turning debug-level logging on. When they do so, they will see any problems while reading the configuration (if it is relevant, e.g. when using EAR).

Fixes: #16538

Closes scylladb/scylladb#16657

* github.com:scylladb/scylladb:
  tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()
  tools/scylla-sstable: allow always passing --scylla-yaml-file option
2024-01-09 14:28:49 +03:00
Kefu Chai
b91eb89ffa gms: heart_beat_state: add formatter for gms::heart_beat_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for gms::heart_beat_state, and
remove its operator<<(). the only caller site of its operator<< is
updated to use `fmt::print()`

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16652
2024-01-09 11:52:40 +02:00
Kefu Chai
cca786e847 gms: endpoint_state: fix a typo in comment
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16653
2024-01-09 11:51:49 +02:00
Kefu Chai
c1beba1f7d utils: config_file: throw bpo::invalid_option_value() when seeing invalid option
before this change, `std::invalid_argument` is thrown by
`bpo::notify(configuration)` in `app_template::run_deprecated()` when
invalid option is passed in via command line. `utils::named_value`
throws `std::invalid_argument` if the given value is not listed in
`_allowed_values`. but we don't handle `std::invalid_argument` in
`app_template::run_deprecated()`. so the application aborts with
unhandled exception if the specified argument is not allowed.

in this change, we convert the `std::invalid_argument` to a
derived class of `bpo::error` in the customized notify handler,
so that it can be handled in `app_template::run_deprecated()`.

because `named_value::operator()` is also used elsewhere, we
should not throw a bpo::error there. so its exception type
is preserved.

Fixes #16687
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16688
2024-01-09 11:49:06 +02:00
Kefu Chai
a6152cb87b sstables: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16666
2024-01-09 11:45:44 +02:00
Kefu Chai
be364d30fd db: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16664
2024-01-09 11:44:19 +02:00
Aleksandra Martyniuk
6f13e55187 tasks: call release_resources when task is finished
Call task_manager::task::impl::release_resources when task is finished
instead of putting the responsibility on user.

Closes scylladb/scylladb#16660
2024-01-09 11:41:54 +02:00
Pavel Emelyanov
cfeff893c6 network_topology_strategy: Print map of dc:rf pairs in one go
The strategy constructor prints the dc:rf at the end making the sstring
for it by hand. Modern fmt-based logger can format unordered_map-s on
its own. The message would look slightly different though:

  Configured datacenter replicas are: foo:1 bar:2

into

  Configured datacenter replicas are: {"foo": 1, "bar": 2}

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16443
2024-01-09 11:30:49 +02:00
Kamil Braun
d93074e87e cql3: don't parallelize select aggregates to local tables
We've observed errors during shutdown like the following:
```
ERROR 2023-12-26 17:36:17,413 [shard 0:main] raft - [088f01a3-a18b-4821-b027-9f49e55c1926] applier fiber stopped because of the error: std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (forward_service is shutting down)
INFO  2023-12-26 17:36:17,413 [shard 0:strm] storage_service - raft_state_monitor_fiber aborted with raft::stopped_error (Raft instance is stopped)
ERROR 2023-12-26 17:36:17,413 [shard 0:strm] storage_service - raft topology: failed to fence previous coordinator raft::stopped_error (Raft instance is stopped, reason: "background error, std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (forward_service is shutting down)")
```

some CQL statement execution was trying to use `forward_service` during
shutdown.

It turns out that the statement is in
`system_keyspace::load_topology_state`:
```
auto gen_rows = co_await execute_cql(
    format("SELECT count(range_end) as cnt FROM {}.{} WHERE key = '{}' AND id = ?",
           NAME, CDC_GENERATIONS_V3, cdc::CDC_GENERATIONS_V3_KEY),
    gen_uuid);
```
It's querying a table in the `system` keyspace.

Pushing local table queries through `forward_service` doesn't make sense
as the data is not distributed. Excluding local tables from this logic
also fixes the shutdown error.

Fixes scylladb/scylladb#16570

Closes scylladb/scylladb#16662
2024-01-08 14:44:22 -05:00
Kamil Braun
d4f4b58f3a Merge 'topology_coordinator: reject removenode if the removed node is alive' from Patryk Jędrzejczak
The removenode operation is defined to succeed only if the node
being removed is dead. Currently, we reject this operation on the
initiator side (in `storage_service::raft_removenode`) when the
failure detector considers the node being removed alive. However,
it is possible that even if the initiator considers the node dead,
the topology coordinator will consider it alive when handling the
topology request. For example, the topology coordinator can use
a bigger failure detector timeout, or the node being removed can
suddenly resurrect.

This PR makes the topology coordinator reject removenode if the
node being removed is considered alive. It also adds
`test_remove_alive_node` that verifies this change.

Fixes scylladb/scylladb#16109

Closes scylladb/scylladb#16584

* github.com:scylladb/scylladb:
  test: add test_remove_alive_node
  topology_coordinator: reject removenode if the removed node is alive
  test: ManagerClient: remove unused wait_for_host_down
  test: remove_node: wait until the node being removed is dead
2024-01-08 12:39:23 +01:00
Kamil Braun
d11e824802 Merge 'storage_service: make all Raft-based operations abortable' from Patryk Jędrzejczak
During a shutdown, we call `storage_service::stop_transport` first.
We may try to apply a Raft command after that, or still be in the
process of applying a command. In such a case, the shutdown
process will hang because Raft retries replicating a command until
it succeeds even in the case of a network error. It will stop when
a corresponding abort source is set. However, if we pass `nullptr`
to a function like `add_entry`, it won't stop. The shutdown
process will hang forever.

We fix all places that incorrectly pass `nullptr`. These shutdown
hangs are not only theoretical. The incorrect `add_entry` call in
`update_topology_state` caused scylladb/scylladb#16435.

Additionally, we remove the default `nullptr` values in all member
functions of `server` and `raft_group0_client` to avoid similar bugs
in the future.

Fixes scylladb/scylladb#16435

Closes scylladb/scylladb#16663

* github.com:scylladb/scylladb:
  server, raft_group0_client: remove the default nullptr values
  storage_service: make all Raft-based operations abortable
2024-01-08 11:30:56 +01:00
Botond Dénes
9119bcbd67 tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()
The default error handler throws an exception, which means
scylla-sstable will exit with exception if there is any problem in the
configuration. Not even ScyllaDB itself is this harsh -- it will just
log a warning for most errors. A tool should be much more lenient. So
this patch passes an error handler which just logs all errors with debug
level.
If reading an sstable fails, the user is expected to investigate turning
debug-level logging on. When they do so, they will see any problems
while reading the configuration (if it is relevant, e.g. when using EAR).

Fixes: #16538
2024-01-08 02:18:15 -05:00
Botond Dénes
16791a63c9 tools/scylla-sstable: allow always passing --scylla-yaml-file option
Currently, if multiple schema sources are provided, the tool complains
about ambiguity, over which to consider. One of these options is
--scylla-yaml-file. However, we want to allow passing this option any
time, otherwise encrypted sstables cannot be read. So relax the multiple
schema source check to also allow this option to be used even when e.g.
--schema-file was used as the schema source.
2024-01-08 02:18:12 -05:00
Nadav Har'El
61395a3658 Update tools/java submodule
* tools/java b7ebfd38...e106b500 (3):
  > build.xml: update scylla-driver-core to 3.11.5.1
  > Use ReplicaOrdering.NEUTRAL in TokenAwarePolicy to respect RackAwareness
  > treewide: update "guava" package

Refs https://github.com/scylladb/scylladb/pull/16491
Refs https://github.com/scylladb/scylla-tools-java/pull/372
2024-01-07 15:12:15 +02:00
Patryk Jędrzejczak
df2034ebd7 server, raft_group0_client: remove the default nullptr values
The previous commit has fixed 5 bugs of the same type - incorrectly
passing the default nullptr to one of the changed functions. At
least some of these bugs wouldn't appear if there was no default
value. It's much harder to make this kind of a bug if you have to
write "nullptr". It's also much easier to detect it in review.

Moreover, these default values are rarely used outside tests.
Keeping them is just not worth the time spent on debugging.
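
The effect of dropping a default can be illustrated outside C++ as well. The
following is only a Python analogy, assuming a hypothetical `add_entry` and
`AbortSource` that are stand-ins and not the real Raft API: once the
parameter has no default, "not abortable" has to be spelled out by the caller.

```python
from typing import Optional


class AbortSource:
    """Hypothetical stand-in for an abort source."""

    def __init__(self) -> None:
        self.aborted = False


def add_entry(command: str, abort_source: Optional[AbortSource]) -> str:
    # No default value: a caller who wants "not abortable" must write None,
    # which is easy to spot in review.
    if abort_source is None:
        return f"{command}: retries until it succeeds"
    return f"{command}: abortable"


try:
    add_entry("cmd")  # forgetting the argument is now an error
except TypeError as e:
    print("rejected:", e)

print(add_entry("cmd", AbortSource()))
```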
2024-01-05 18:45:50 +01:00
Patryk Jędrzejczak
3d4af4ecf1 storage_service: make all Raft-based operations abortable
During a shutdown, we call `storage_service::stop_transport` first.
We may try to apply a Raft command after that, or still be in the
process of applying a command. In such a case, the shutdown
process will hang because Raft retries replicating a command until
it succeeds even in the case of a network error. It will stop when
a corresponding abort source is set. However, if we pass `nullptr`
to a function like `add_entry`, it won't stop. The shutdown
process will hang forever.

We fix all places that incorrectly pass `nullptr`. These shutdown
hangs are not only theoretical. The incorrect `add_entry` call in
`update_topology_state` caused scylladb/scylladb#16435.
2024-01-05 18:45:20 +01:00
Kefu Chai
7e84e03f52 gms: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

because of the removal of `#include "unimplemented.hh"`,
`service/migration_manager.cc` misses the definition of
`unimplemented::cause::VALIDATION`, so include the header where it is
used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16654
2024-01-05 13:37:08 +02:00
Nadav Har'El
94580df1c5 test/alternator: fix flaky test in test_filter_expression.py
The test test_filter_expression.py::test_filter_expression_precedence
is flaky - and can fail very rarely (so far we've only actually seen it
fail once). The problem is that the test generates items with random
clustering keys, chosen as an integer between 1 and 1 million, and there
is a chance (roughly 2/10,000) that two of the 20 items happen to have the
same key, so one of the items is "lost" and the comparison we do to the
expected truth fails.

The solution is to just use sequential keys, not random keys.
There is nothing to gain in this test by using random keys.

To make this test bug easy to reproduce, I temporarily changed
random_i()'s range from 1,000,000 to 3, and saw the test failing every
single run before this patch. After this patch - no longer using
random_i() for the keys - the test doesn't fail any more.
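
The quoted odds can be checked with a short birthday-problem calculation.
This is a sketch that assumes the keys are drawn uniformly and independently;
`collision_probability` is a helper written for this note, not part of the
test suite.

```python
from math import prod


def collision_probability(n_items: int, key_range: int) -> float:
    """Probability that at least two of n_items uniform random keys collide."""
    if n_items > key_range:
        return 1.0  # pigeonhole: a collision is certain
    p_all_distinct = prod((key_range - i) / key_range for i in range(n_items))
    return 1.0 - p_all_distinct


print(collision_probability(20, 1_000_000))  # roughly 1.9e-4, i.e. about 2/10,000
print(collision_probability(20, 3))          # 1.0, matching the "fails every run" repro
```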

Fixes #16647

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16649
2024-01-04 21:36:40 +02:00
Kamil Braun
bf068dd023 Merge handle error in cdc generation propagation during bootstrap from Gleb
Bootstrap cannot proceed if cdc generation propagation to all nodes
fails, so the patch series handles the error by rolling the ongoing
topology operation back.

* 'gleb/raft-cdc-failure' of github.com:scylladb/scylla-dev:
  test: add test to check failure handling in cdc generation commit
  storage_service: topology coordinator: rollback on failure to commit cdc generation
2024-01-04 15:38:51 +01:00
Kamil Braun
f942bf4a1f Merge 'Do not update endpoint state via gossiper::add_saved_endpoint once it was updated via gossip' from Benny Halevy
Currently, `add_saved_endpoint` is called from two paths: one is when
loading states from system.peers in the join path (join_cluster,
join_token_ring), when `_raft_topology_change_enabled` is false, and the
other is from `storage_service::topology_state_load` when raft topology
changes are enabled.

In the latter path, from `topology_state_load`, `add_saved_endpoint` is
called only if the endpoint_state does not exist yet.  However, this is
checked without acquiring the endpoint_lock and so it races with the
gossiper, and once `add_saved_endpoint` acquires the lock, the endpoint
state may already be populated.

Since `add_saved_endpoint` applies local information about the endpoint
state (e.g. tokens, dc, rack), it uses the local heart_beat_version,
with generation=0 to update the endpoint states, and that is
incompatible with changes applied via gossip that will carry the
endpoint's generation and version, determining the state's update order.

This change makes sure that the endpoint state is never updated in
`add_saved_endpoint` if it has non-zero generation.  An internal error
exception is thrown if non-zero generation is found, and in the only
call site that might reach that state, in
`storage_service::topology_state_load`, the caller acquires the
endpoint_lock for checking for the existence of the endpoint_state,
calling `add_saved_endpoint` under the lock only if the endpoint_state
does not exist.
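
The check-then-act pattern described above can be sketched as follows. This
is a simplified Python model, not the gossiper code: `endpoint_lock`,
`endpoint_state` and the dictionary layout are assumptions made for
illustration, and a single lock stands in for the per-endpoint lock.

```python
import threading

endpoint_lock = threading.Lock()  # stand-in for the per-endpoint lock
endpoint_state = {}               # endpoint -> {"generation": int, ...}


def add_saved_endpoint(ep, saved):
    """Must be called with endpoint_lock held."""
    existing = endpoint_state.get(ep)
    if existing is not None and existing["generation"] != 0:
        # State learned via gossip must never be overwritten by local data.
        raise RuntimeError(f"internal error: {ep} already has gossip state")
    endpoint_state[ep] = {"generation": 0, **saved}


def topology_state_load(ep, saved):
    # Check and update under the same lock, so gossip cannot slip in between.
    with endpoint_lock:
        if ep in endpoint_state:
            return False
        add_saved_endpoint(ep, saved)
        return True
```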

Fixes #16429

Closes scylladb/scylladb#16432

* github.com:scylladb/scylladb:
  gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
  storage_service: topology_state_load: lock endpoint for add_saved_endpoint
  raft_group_registry: move on_alive error injection to gossiper
2024-01-04 14:47:10 +01:00
qiulijuan2
7fa2c33ba1 replica: remove duplicated function calling
set_skip_when_empty is called twice on the column_family_row_hits metric in replica/table.cc

fix: #16582

Signed-off-by: qiulijuan2<qiulijuan2_yewu@cmss.chinamobile.com>

Closes scylladb/scylladb#16581
2024-01-04 15:04:31 +02:00
Kefu Chai
ee28a1cf4b build: enable -Wimplicit-int-float-conversion
a209ae15 addresses the last -Wimplicit-int-float-conversion warning
in the tree, so we now have the luxury of enabling this warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16640
2024-01-04 12:45:23 +02:00
Botond Dénes
9f0bd62d78 test/cql-pytest: test_tools.py: add schema-loading tests for MV/SI 2024-01-04 03:20:17 -05:00
Botond Dénes
58d5339baa test/cql-pytest: test_tools.py: extract some fixture logic to functions
Namely, the fixture for preparing an sstable and the fixture for
producing a reference dump (from an sstable). In the next patch we will
add more similar fixtures, this patch enables them to share their core
logic, without repeating code.
2024-01-04 03:20:17 -05:00
Botond Dénes
f7d59b3af0 test/cql-pytest: test_tools.py: extract common schema-loading facilities into base-class
In the next patch, we want to add schema-load tests specific to views
and indexes. Best to place these into a separate class, so extract the
to-be-shared parts into a common base-class.
2024-01-04 03:20:17 -05:00
Botond Dénes
bea21657ec tools/schema_loader: load_schema_from_schema_tables(): add support for MV/SI schemas
The table information of MVs (either user-created, or those backing a
secondary index) is stored in system_schema.views, not
system_schema.tables. So load this table when system_schema.tables has
no entries for the looked-up table. Base table schema is not loaded.
2024-01-04 03:20:17 -05:00
Botond Dénes
79a006d6a8 tools/schema_loader: load_one_schema_from_file(): add support for view/index schemas
The underlying infrastructure (`load_schemas()`) already supports
loading views and indexes, extend this to said method.
When loading a view/index, expect `load_schemas()` to return two
schemas. The first is the base schema, the second is the view/index
schema (this is validated). Only the latter is returned.
2024-01-04 03:20:17 -05:00
Botond Dénes
276bb16013 test/boost/schema_loader_test: add test for mvs and indexes 2024-01-04 03:20:17 -05:00
Botond Dénes
f5d4c1216e tools/schema_loader: load_schemas(): implement parsing views/indexes from CQL
Add support for processing cql3::statement::create_view_statement and
cql3::statement::create_index_statement statements. The CQL text
(usually a file) has to provide the definition of the base table,
before the definition of the views/indexes.
2024-01-04 03:20:17 -05:00
Botond Dénes
94aac35169 replica/database: extract existing_index_names and get_available_index_name
To standalone functions in index/secondary_index_manager.{hh,cc}. This
way, alternative data dictionary implementations (in
tools/schema_loader.cc), can also re-use this code without having to
instantiate a database or resorting to copy-paste.

The functions are slightly changed: there are some additional params
added to cover for things not internally available in the database
object. const sstring& is converted to std::string_view.
2024-01-04 03:20:17 -05:00
Kefu Chai
cf932888de Update seastar submodule
* seastar e0d515b6...70349b74 (33):
  > util/log: drop unused function
  > util/log, rpc, core: use compile-time formatting with fmtlib >= 8.0
  > Fix edge case in memory sampler at OOM
  > exp/geo distribution benchmark
  > Additional allocation tests
  > Remove null pointer check on free hot path
  > Optimize final part of allocation hot path
  > Optimize zero size checking in allocator
  > memory: Optimize free fast path
  > memory: Optimize small alloc alloation path
  > memory: Limit alloc_sites size
  > memory: Add general comment about sampling strategy
  > memory: Use probabilistic sampler
  > util: Adapt memory sampler to seastar
  > util: Import Android Memory Sampler
  > memory: Use separate small pool for tracking sampled allocations
  > memory: Support enabling memory profiling at runtime
  > util/source_location-compat: mark `source_location::current()` consteval
  > build: use new behavior defined by CMP0155 when building C++ modules
  > circleci: build with C++20 modules enabled
  > seastar.cc: replace cryptopp with gnutls when building seastar modules
  > alien: include used header
  > seastar.cc: include used headers in the global purview
  > docker: install clang-tools-17
  > net/tcp: generate a random src_port hashed to current shard if smp::count > 1
  > net, websocket: replace Crypto++ calls with GnuTLS
  > README-DPDK.md: point user to DPDK's quick start guide
  > reactor: print fatal error using logger as well
  > Avoid ping-pong in spinlock::lock
  > memory: Add allocator perf tests
  > memory: Add a basic sized deletion test
  > Prometheus: Disable Prometheus protobuf with a configuration
  > treewide: bring back prometheus protobuf support
* test/manual/sstable_scan_footprint_test: update to adapt to the
  breaking change of "memory: Use probabilistic sampler" in seastar

Closes scylladb/scylladb#16610
2024-01-04 09:36:53 +02:00
Kefu Chai
47d8edc0fc test.py: s/asyncio.get_event_loop()/asyncio.get_running_loop()/
the latter raises a RuntimeError if there is no running event loop,
while the former gets one from the default policy in this case.
in the use cases in test.py, there is always a running event loop,
when `asyncio.get_event_loop()` gets called. so let's use
the preferred `asyncio.get_running_loop()`.

see https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_event_loop
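
a small standalone illustration of the difference, independent of test.py
(the function names below are made up for this example):

```python
import asyncio


async def main() -> bool:
    # Inside a coroutine there is always a running loop.
    loop = asyncio.get_running_loop()
    return loop.is_running()


def called_outside_loop() -> str:
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return "no running event loop"
    return "unexpectedly found a loop"


print(called_outside_loop())
print(asyncio.run(main()))
```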

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16398
2024-01-04 08:39:49 +02:00
Botond Dénes
d9c30833ea tools/schema_loader: make real_db.tables the only source of truth on existing tables
Currently, we have `real_db.tables` and `schemas`, the former containing
system tables needed to parse statements, and the latter accumulating
user tables parsed from CQL. This will be error-prone to maintain with
view/index support, so ditch `schemas` and instead add a `user` flag to
`table` and accumulate all tables in `real_db.tables`.
At the end, just return the schemas of all user tables.
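
The resulting shape can be sketched in a few lines. This is a Python model of
the idea only; the `Table` class and `user_schemas` helper are illustrative
names, not the C++ types in tools/schema_loader.cc.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    user: bool  # False for system tables needed only to parse statements


def user_schemas(tables):
    # One list holds everything; the flag decides what is returned.
    return [t.name for t in tables if t.user]


tables = [Table("system_schema.tables", user=False), Table("ks.tbl", user=True)]
print(user_schemas(tables))
```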
2024-01-04 01:32:10 -05:00
Botond Dénes
ef3d143886 tools/schema_loader: table(): store const keyspace&
No need for mutable reference, const ref makes life easier, because some
lookup APIs of data_dictionary::database return const keyspace& only.
2024-01-04 01:32:10 -05:00
Botond Dénes
1003508066 tools/schema_loader: make database,keyspace,table non-movable
These types contain self-references. Make sure they are not moved, not
even accidentally.
2024-01-04 01:32:10 -05:00
Botond Dénes
1f7b03672c cql3/statements/create_index_statement: build_index_schema(): include index metadata in returned value
Scylla's schema tables code determines which index was added, by diffing
index definitions with previous ones. This is clunky to use in
tools/schema_loader.cc, so also return the index metadata for the newly
created index.
2024-01-04 01:32:10 -05:00
Botond Dénes
94dbb7cb29 cql3/statements/create_index_statement: make build_index_schema() public
tools/schema_loader.cc wants it.
2024-01-04 01:32:10 -05:00
Botond Dénes
039d41f5d4 cql3/statements/create_index_statement: relax some method's dependence on qp
The method `validate_while_executing()` and its only caller,
`build_index_schema()`, only use the query processor to get db from it.
So replace qp parameter with db one, relaxing requirements w.r.t.
callers.
2024-01-04 01:32:10 -05:00
Botond Dénes
5f42c2c7c4 cql3/statements/create_view_statement: make prepare_view() public
tools/schema_loader.cc wants to use it.
2024-01-04 01:32:10 -05:00
Kefu Chai
50cf62e186 build: cmake: do not link against Boost::dynamic_linking
Boost::dynamic_linking was introduced as a compatibility target
which adds "BOOST_ALL_DYN_LINK" macro on Win32 platform. but since
Scylla only runs on Linux, there is no need to link against this
library.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16544
2024-01-04 08:06:19 +02:00
Lakshmi Narayanan Sreethar
1d6eaf2985 compaction manager: remove: cleanup _compaction_state on exceptions
If for some reason an exception is thrown in compaction_manager::remove,
it might leave behind stale table pointers in _compaction_state. Fix
that by setting up a deferred action to perform the cleanup.
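
The pattern is the usual "register the cleanup first, then do the risky
work". A rough Python equivalent, assuming a simplified `compaction_state`
map and a hypothetical `stop_ongoing` callback rather than the real
compaction_manager members:

```python
from contextlib import ExitStack

compaction_state = {}  # table name -> per-table state


def remove(table, stop_ongoing):
    with ExitStack() as deferred:
        # Runs on normal exit and also when an exception propagates.
        deferred.callback(compaction_state.pop, table, None)
        stop_ongoing(table)


def failing_stop(table):
    raise RuntimeError("stop failed")


compaction_state["ks.t"] = object()
try:
    remove("ks.t", failing_stop)
except RuntimeError:
    pass
print("ks.t" in compaction_state)  # False: no stale entry is left behind
```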

Fixes #16635

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16632
2024-01-03 22:03:24 +02:00
Benny Halevy
9e8998109f gossiper: get_*_members_synchronized: acquire endpoint update semaphore
To ensure that the value they return is synchronized on all shards.

This got broken recently by 147f30caff.

Refs https://github.com/scylladb/scylladb/pull/16597#discussion_r1440445432

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16629
2024-01-03 17:41:46 +01:00
Michał Chojnowski
a209ae1573 cql3: type_json: fix an edge case in float-to-int conversion
Refer to the added comment for details.

This problem was found by a compiler warning, and I'm fixing
it mainly to silence the warning. I didn't give any thought
to its effects in practice.

Fixes #13077

Closes scylladb/scylladb#16625

[avi: changed Refs to Fixes]
2024-01-03 17:59:01 +02:00
Kefu Chai
2ad532df43 test: randomized_nemesis_test: move std::variant formatter up
we format `std::variant<std::monostate, seastar::timed_out_error,
raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown,
raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>`
in this source file. and currently, we format `std::variant<..>` using
the default-generated `fmt::formatter` from `operator<<`, so in order to
format it using {fmt}'s compile-time check enabled, we have to make the
`operator<<` overload for `std::variant<...>` visible from the caller
sites which format `std::variant<...>` using {fmt}.

in this change, the `operator<<` for `std::variant<...>` is moved
from the middle of the source file to the top of it, so that it can
be found when the compiler looks up for a matched `fmt::formatter`
for `std::variant<...>`.

please note, we cannot use the `fmt::formatter` provided by `fmt/std.h`,
as its specialization for `std::variant` requires that all the types
of the variant are `is_formattable`. but the default generated formatter
for type `T` is not considered as the proof that `T` is formattable.

this should address the FTBFS with the latest seastar like:

```
 /usr/include/fmt/core.h:2743:12: error: call to deleted constructor of 'conditional_t<has_formatter<mapped_type, context>::value, formatter<mapped_type, char_type>, fallback_formatter<stripped_type, char_type>>' (aka 'fmt::detail::fallback_formatter<std::variant<std::monostate, seastar::timed_out_error, raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown, raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>>')
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16616
2024-01-03 16:38:25 +01:00
Kefu Chai
2c394e3f6f tablets: remove unused #includes
the removed #include headers are not used, so let's drop their
`#include`s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16619
2024-01-03 15:30:40 +01:00
Avi Kivity
20531872a7 Merge 'test: randomized_nemesis_test: add formatter for append_entry' from Kefu Chai
we are using `seastar::format()` to format `append_entry` in
`append_reg_model`, so we have to provide a `fmt::formatter` for
these callers which format `append_entry`.

although fmt v9 defines the formatter when FMT_DEPRECATED_OSTREAM is
set, fmt v10 no longer provides it. so this change prepares us
for fmt v10.

Refs https://github.com/scylladb/scylladb/issues/13245

Closes scylladb/scylladb#16614

* github.com:scylladb/scylladb:
  test: randomized_nemesis_test: add formatter for append_entry
  test: randomized_nemesis_test: move append_reg_model::entry out
2024-01-03 15:06:33 +02:00
Kefu Chai
dde8f694f6 build: cmake: use # for line comment
it was a copy-pasta error introduced by 2508d339. the copyright
blob was copied from a C++ source file, but the block comment syntax
of the CMake language is different from that of the C++ language.

let's use the line comment of CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16615
2024-01-03 15:05:00 +02:00
Tomasz Grabiec
715e062d4a Merge 'table, memtable: share log structured allocator statistics across all tablets in a table' from Avi Kivity
In 7d5e22b43b ("replica: memtable: don't forget memtable
memory allocation statistics") we taught memtable_list to remember
learned memory allocation reserves so a new memtable inherits these
statistics from an older memtable. Share it now further across tablets
that belong to the same table as well. This helps the statistics be more
accurate for tablets that are migrated in, as they can share existing
tablet's memory allocation history.

Closes scylladb/scylladb#16571

* github.com:scylladb/scylladb:
  table, memtable: share log-structured allocator statistics across all memtables in a table
  memtable: consolidate _read_section, _allocating_section in a struct
2024-01-03 14:03:40 +01:00
Avi Kivity
b8a0e3543e docs: ddl: document the initial_tablets replication strategy option
While the feature is experimental, this makes it easier to experiment
with it.

An example is provided.

Closes scylladb/scylladb#16193
2024-01-03 13:49:30 +01:00
Benny Halevy
147f30caff gossiper: mutate_live_and_unreachable_endpoints: make exception safe
Change the mutate_live_and_unreachable_endpoints procedure
so that the called `func` would mutate a cloned
`live_and_unreachable_endpoints` object in place.

Those are replicated to temporary copies on all shards
using `foreign<unique_ptr<>>` so that they would be
automatically freed on exception.

Only after all copies are made, they are applied
on all gossiper shards in a noexcept loop
and finally, an `on_success` function is called
to apply further side effects if everything else
was replicated successfully.

The latter is still susceptible to exceptions,
but we can live with those as long as `_live_endpoints`
and `_unreachable_endpoints` are synchronized on all shards.

With that, the read-only methods:
`get_live_members_synchronized` and
`get_unreachable_members_synchronized`
become trivial and they just return the required data
from shard 0.
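
The prepare-then-commit structure can be modelled in a few lines. This is a
Python sketch under simplifying assumptions: a plain list stands in for the
gossiper shards, `copy.deepcopy` for the cross-shard replication, and
`mutate_replicated` is an illustrative name.

```python
import copy


def mutate_replicated(shards, func, on_success=None):
    """Mutate a clone, replicate it, then swap it in on every shard."""
    updated = copy.deepcopy(shards[0])
    func(updated)  # may raise: nothing has been modified yet
    copies = [copy.deepcopy(updated) for _ in shards]  # may raise as well
    for i, c in enumerate(copies):  # plain assignments only
        shards[i] = c
    if on_success is not None:
        on_success()  # side effects run after all shards agree


def new_state():
    return {"live": {"n1"}, "unreachable": set()}


def failing(state):
    state["live"].add("n2")
    raise RuntimeError("injected failure")


shards = [new_state() for _ in range(3)]
try:
    mutate_replicated(shards, failing)
except RuntimeError:
    pass
print(all(s == new_state() for s in shards))  # True: the failure left no trace

mutate_replicated(shards, lambda s: s["live"].add("n2"))
print(all(s["live"] == {"n1", "n2"} for s in shards))  # True
```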

Fixes #15089

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16597
2024-01-03 14:46:10 +02:00
Benny Halevy
fadcef01f5 database: setup_scylla_memory_diagnostics_producer: replace infinity sign with unlimited string
The infinity unicode sign used for dumping read concurrency semaphore
state, `∞` may be misrendered.
For example: https://jenkins.scylladb.com/job/scylla-master/job/dtest-release/451/artifact/logs-full.release.011/1703288463175_materialized_views_test.py%3A%3ATestMaterializedViews%3A%3Atest_add_dc_during_mv_insert/node1.log
```
  Read Concurrency Semaphores:
    user: 0/100, 1K/9M, queued: 0
    streaming: 0/10, 0B/9M, queued: 0
    system: 0/10, 0B/9M, queued: 0
    compaction: 0/∞, 0B/∞
```

Instead, just print the word `unlimited`.

This was introduced in 34c213f9bb

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16534
2024-01-03 14:46:10 +02:00
Kefu Chai
3e4159fece repair: remove unused #include
remove the unused #include headers from repair.hh, as they are not
directly used. after this change, task_manager_module.hh fails to
have access to stream_reason, so include it where it is used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16618
2024-01-03 14:46:10 +02:00
Kefu Chai
1f4b5126f6 build: cmake: add comment explaining CMAKE_CXX_FLAGS_RELWITHDEBINFO
to clarify why we need to set this flagset instead of appending to it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16546
2024-01-03 14:46:10 +02:00
Kefu Chai
3ef0345b7f test/nodetool: log response from mock server when handling JSONDecodeError
it's observed that the mock server could return something not decodable
as JSON. so let's print out the response in the logging message in this case.
this should help us to understand the test failure better if it surfaces again.

Refs #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16543
2024-01-03 14:46:10 +02:00
Kefu Chai
0484ac46af test: randomized_nemesis_test: add formatter for append_entry
we are using `seastar::format()` to format `append_entry` in
`append_reg_model`, so we have to provide a `fmt::formatter` for
these callers which format `append_entry`.

although fmt v9 defines the formatter when FMT_DEPRECATED_OSTREAM is
set, fmt v10 no longer provides it. so this change prepares us
for fmt v10.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-03 08:38:43 +08:00
Kefu Chai
32e55731ab test: randomized_nemesis_test: move append_reg_model::entry out
this change prepares for adding fmt::formatter for append_entry.
as we are using its formatter in the inline member functions of
`append_reg_model`. but its `fmt::formatter` can only be specialized out of
this class. and we don't have access to `format_as()` yet in {fmt} 9.1.0
which is shipped along with fedora38, which is in turn used for
our base build image.

so, in this change, `append_reg_model::entry` is extracted and renamed
to `append_entry`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-03 08:38:43 +08:00
Sylwia Szunejko
91a5a41313 add a way to negotiate generation of the tablet info for drivers
Tablets metadata is quite expensive to generate (each data_value is
an allocation), so an old driver (without support for tablets) will
generate huge amounts of such notifications. This commit adds a way
to negotiate generation of the notification: a new driver will ask
for them, and an old driver won't get them. It uses the
OPTIONS/SUPPORTED/STARTUP protocol described in native_protocol_v4.spec.
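
The handshake can be pictured roughly as below. This Python sketch is only a
model of the negotiation flow: the extension name `EXAMPLE_TABLETS_EXT`, the
payload key and all function names are hypothetical and are not taken from
the protocol or from the patch.

```python
TABLETS_EXT = "EXAMPLE_TABLETS_EXT"  # hypothetical extension name


def supported() -> dict:
    """Server reply to OPTIONS: advertise the extension."""
    return {"CQL_VERSION": ["3.0.0"], TABLETS_EXT: [""]}


def negotiate(startup_options: dict) -> bool:
    """Server side of STARTUP: enabled only if the driver asked for it."""
    return TABLETS_EXT in startup_options


def custom_payload(negotiated: bool, make_tablet_info) -> dict:
    # The expensive generation is skipped for drivers that did not ask.
    return {"example-tablet-info": make_tablet_info()} if negotiated else {}


# A new driver echoes the extension back in STARTUP; an old one does not.
new_driver = {"CQL_VERSION": "3.0.0", TABLETS_EXT: ""}
old_driver = {"CQL_VERSION": "3.0.0"}
print(custom_payload(negotiate(new_driver), lambda: b"..."))
print(custom_payload(negotiate(old_driver), lambda: b"..."))
```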

Closes scylladb/scylladb#16611
2024-01-02 20:00:50 +02:00
Kefu Chai
2508d33946 build: cmake: add Findcryptopp.cmake
seastar dropped the dependency to Crypto++, and it also removed
Findcryptopp.cmake from its `cmake` directory. but scylladb still
depends on this library. and it has been using the `Findcryptopp.cmake`
in seastar submodule for finding it.

after the removal of this file, scylladb would not be able to
use it anymore. so, we have to provide our own `Findcryptopp.cmake`.

Findcryptopp.cmake is copied from the Seastar project. So its
date of copyright is preserved. and it was licensed under Apache 2.0,
since we are creating a derivative work from it. let's relicense
it under Apache 2.0 and AGPL 3.0 or later.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16601
2024-01-02 19:09:50 +02:00
Kefu Chai
34259a03d0 treewide: use consteval string as format string when formatting log message
seastar::logger is using the compile-time format checking by default if
compiled using {fmt} 8.0 and up. and it requires the format string to be
consteval string, otherwise we have to use `fmt::runtime()` explicitly.

so, to adapt to the change, let's use the consteval string when formatting
logging messages.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16612
2024-01-02 19:08:47 +02:00
Kefu Chai
64a227fba0 alternator/auth: remove unused #include
in `alternator/auth.cc`, none of the symbols in "query" namespace
provided by the removed headers is used, so there is no
need to include this header file.

the same applies to other removed header files.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16603
2024-01-02 17:50:59 +02:00
Kamil Braun
949658590f Merge 'raft topology: do not update token metadata in on_alive and on_remove' from Patryk Jędrzejczak
In the Raft-based topology, we should never update token metadata
through gossip notifications. `storage_service::on_alive` and
`storage_service::on_remove` do it, so we ignore their parts that
touch token metadata.

Additionally, we improve some logs in other places where we ignore
the function because of using the Raft-based topology.

Fixes scylladb/scylladb#15732

Closes scylladb/scylladb#16528

* github.com:scylladb/scylladb:
  storage_service: handle_state_left, handle_state_normal: improve logs
  raft topology: do not update token metadata in on_alive and on_remove
2024-01-02 16:08:50 +01:00
Kefu Chai
dd496afff3 mutation: add formatter for {atomic_cell_view,atomic_cell}::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for `atomic_cell_view::printer`
and `atomic_cell::printer` respectively, and remove their operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16602
2024-01-02 16:14:42 +02:00
Kamil Braun
7f6955b883 Merge 'test: make use of concurrent bootstrap' from Patryk Jędrzejczak
In #16102, we added a test for concurrent bootstrap in the raft-based
topology. This test was running in CI for some time and
never failed. Now, we can believe that concurrent bootstrap is not
bugged or at least the probability of a failure is very low. Therefore,
we can safely make use of it in all tests using the raft-based topology.

This PR:
- makes all initial servers start concurrently in topology tests,
- replaces all multiple `server_add` calls with a single `servers_add`
  call in tests using the raft-based topology,
- removes no longer needed `test_concurrent_bootstrap`.

The changes listed above:
- make running tests a bit faster due to concurrent bootstraps,
- make multiple tests test concurrent bootstrap previously tested by
  a single test.

Fixes scylladb/scylladb#15423

Closes scylladb/scylladb#16384

* github.com:scylladb/scylladb:
  test: test_different_group0_ids: fix comments
  test: remove test_concurrent_bootstrap
  test: replace multiple server_add calls with servers_add
  test: ScyllaCluster: start all initial servers concurrently
  test: ManagerClient: servers_add: specify consistent-topology-changes assumption
2024-01-02 15:11:18 +01:00
Sylwia Szunejko
467d466f7e put all tablet info into one field of custom_payload and update docs
Previously, the tablet information was sent to the drivers
in two pieces within the custom_payload. We had information
about the replicas under the `tablet_replicas` key and token range
information under `token_range`. These names were quite generic
and might have caused problems for other custom_payload users.
Additionally, dividing the information into two pieces raised
the question of what to do if one key is present while the other
is missing.

This commit changes the serialization mechanism to pack all information
under one specific name, `tablets-routing-v1`.

From: Sylwia Szunejko <sylwia.szunejko@scylladb.com>

Closes scylladb/scylladb#16148
2024-01-02 14:35:37 +02:00
Patryk Jędrzejczak
215534d527 test: test_different_group0_ids: fix comments
The test disables consistent topology changes, not cluster
management.
2024-01-02 12:19:33 +01:00
Patryk Jędrzejczak
466723a74f test: remove test_concurrent_bootstrap
This test only adds 3 nodes concurrently to the empty cluster.
After making many other tests use ManagerClient.servers_add, it
serves no purpose.

We had added this test before we decided to use
ManagerClient.servers_add in many tests to avoid multiple failures
in CI if it turned out that the concurrent bootstrap is flaky with
high frequency there. This test was running in CI for some time and
never failed. Now, we can be reasonably confident that concurrent bootstrap is not
buggy or at least that the probability of a failure is very low.
2024-01-02 12:19:33 +01:00
Patryk Jędrzejczak
a8513bd41b test: replace multiple server_add calls with servers_add
ManagerClient.servers_add can be used in every test that uses
consistent topology changes. We replace all multiple server_add
calls in such tests with a single servers_add call to make these
tests faster and simplify their code. Additionally, these
servers_add calls will test concurrent bootstraps for free.
2024-01-02 12:19:33 +01:00
Patryk Jędrzejczak
debf1db3ef test: ScyllaCluster: start all initial servers concurrently
Starting all initial servers concurrently makes tests in suites
with initial_size > 1 run a bit faster. Additionally, these tests
test concurrent bootstraps for free.

add_servers can be called only if the cluster uses consistent
topology changes. We can use this function unconditionally in
install_and_start because every suite uses consistent topology
changes by default. The only way to not use it is by adding all
servers with a config that contains experimental_features without
consistent-topology-changes.
2024-01-02 12:19:33 +01:00
Patryk Jędrzejczak
16b0eeb3d6 test: ManagerClient: servers_add: specify consistent-topology-changes assumption
ManagerClient.servers_add can be called only if the cluster uses
consistent topology changes. We add this specification to the
leading comment.
2024-01-02 12:19:31 +01:00
Kefu Chai
f4bd86384b install.sh: use a temporary file when packaging scylla.yaml
we create a default `scylla.yaml` on the fly in `install.sh`. but
the path to the temporary file holding the default yaml file is
hardwired to `/tmp/scylla.yaml`. this works fine if we only have a
single `install.sh` at a certain time point. but if we have multiple
`install.sh` processes running in parallel, these packaging jobs could
step on each other when they create and remove the `scylla.yaml`.

in this change, because of a limitation of `installconfig` (it always treats
the "dest" parameter as a directory), `mktemp` is used to create a
parent directory for the temporary file.
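
The race described above comes from every run sharing one fixed path. The actual fix lives in the shell script `install.sh` and uses `mktemp`; the following is only a minimal Python sketch of the same idea, where the helper name `write_default_yaml`, the directory prefix, and the file contents are hypothetical.

```python
import os
import tempfile


def write_default_yaml(contents: str) -> str:
    """Write contents to scylla.yaml inside a fresh private directory.

    Hypothetical helper: each call gets its own parent directory, so
    parallel runs never create or remove each other's file.
    """
    tmpdir = tempfile.mkdtemp(prefix="scylla-install-")
    path = os.path.join(tmpdir, "scylla.yaml")
    with open(path, "w") as f:
        f.write(contents)
    return path


first = write_default_yaml("# default config\n")
second = write_default_yaml("# default config\n")
print(first != second)
```

The file keeps its expected name (`scylla.yaml`) while the unique part moves into the parent directory, which matches the constraint that the destination is always treated as a directory. The caller is expected to remove the directory when done.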

Fixes #16591
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16592
2024-01-01 21:50:29 +02:00
Kefu Chai
48b8544a63 .git: add skip more words and directories
we use "ue" as short for "update_expressions". until we change
our minds and use a more readable name, let's add "ue" to the
"ignore_word_list" option of codespell.

also, use the absolute path in the "skip" option, as absolute paths
are also used by codespell's own github workflow. we are still
observing the codespell github workflow showing misspelling errors
in our "test/" directory even though we have it listed in "skip", so this
change should silence them as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16593
2024-01-01 14:32:16 +02:00
Avi Kivity
8ba0decda5 Merge 'System.peers: enforce host_id' from Benny Halevy
The HOST_ID is already written to system.peers since inception pretty much (See https://github.com/scylladb/scylladb/pull/16376#discussion_r1429248185 for details).

However, it is written to the table using an individual CQL query and so it is not set atomically with other columns.
If scylla crashes or even hits an exception before updating the host_id, then system.peers might be left in an inconsistent state, and in particular without a HOST_ID value.

This series makes sure that HOST_ID is written to system.peers and uses it to "seal" the record by upserting it in a single CQL BATCH query when adding the state for new nodes.

On the read side, skip rows that have no HOST_ID state in system.peers, assuming they are incomplete, i.e. scylla got an exception or crashed while writing them, so they can't be trusted.

With that change we can assume that endpoint state loaded from system.peers will always have a valid host_id.
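
The read-side rule can be sketched as a simple filter. This is an illustrative Python sketch only, not the C++ code in `system_keyspace`; the function name `load_trusted_peers` and the dictionary row layout are assumptions made for the example.

```python
def load_trusted_peers(rows):
    """Return {peer address: host_id}, skipping rows that have no host_id."""
    peers = {}
    for row in rows:
        host_id = row.get("host_id")
        if host_id is None:
            # Incomplete row: the write was interrupted before the record
            # was sealed with a host_id, so it cannot be trusted.
            continue
        peers[row["peer"]] = host_id
    return peers


rows = [
    {"peer": "10.0.0.1", "host_id": "a1", "tokens": [1, 2]},
    {"peer": "10.0.0.2", "tokens": [3]},  # never sealed with a host_id
]
print(load_trusted_peers(rows))
```

Because the writer inserts `host_id` together with the other columns, a row without it can only come from an interrupted write, which is why dropping it on load is safe.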

Refs https://github.com/scylladb/scylladb/pull/15903

Closes scylladb/scylladb#16376

* github.com:scylladb/scylladb:
  gms: endpoint_state: change application_state_map to std::unordered_map
  system_keyspace: update_peer_info: drop single-column overloads
  storage_service: drop do_update_system_peers_table
  storage_service: on_change: fixup indentation
  endpoint_state subscriptions: batch on_change notification
  everywhere: drop before_change subscription
  system_keyspace: load_tokens/peers/host_ids: enforce presence of host_id
  system_keyspace: drop update_tokens(endpoint, tokens) overload
  storage_service: seal peer info with host_id
  storage_service: update_peer_info: pass peer_info to sys_ks
  gms: endpoint_state: define application_state_map
  system_keyspace: update_peer_info: use struct peer_info for all optional values
  query_processor: execute_internal: support unset values
  types: add data_value_list
  system_keyspace: get rid of update_cached_values
  storage_service: do not update peer info for this node
2023-12-31 21:22:04 +02:00
Benny Halevy
cdd5605d81 gms: endpoint_state: change application_state_map to std::unordered_map
State changes are processed as a batch and
there is no reason to maintain them as an ordered map.
Instead, use a std::unordered_map that is more efficient.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
c520fc23f0 system_keyspace: update_peer_info: drop single-column overloads
They are no longer used.
Instead, all callers now pass peer_info.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
0e5a666e6f storage_service: drop do_update_system_peers_table
It is no longer used after previous patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
13d395fa6a storage_service: on_change: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
ad8a9104d8 endpoint_state subscriptions: batch on_change notification
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.

In particular, this allows storage_service::on_change
to update_peer_info once for all changed states.
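
The effect of batching can be illustrated with a toy subscriber. This Python sketch is not the gossiper code; the class `PeerInfoUpdater`, its counters, and the state names used as keys are hypothetical.

```python
class PeerInfoUpdater:
    """Toy subscriber that counts how many times it persists peer info."""

    def __init__(self):
        self.writes = 0
        self.stored = {}

    def on_change(self, endpoint, changed):
        # One call may carry many changed states; each call costs one write.
        self.stored.setdefault(endpoint, {}).update(changed)
        self.writes += 1


changed = {"DC": "dc1", "RACK": "r1", "HOST_ID": "a1"}

# Old style: one notification per application state.
per_state = PeerInfoUpdater()
for key, value in changed.items():
    per_state.on_change("10.0.0.2", {key: value})

# New style: one notification carrying the whole map.
batched = PeerInfoUpdater()
batched.on_change("10.0.0.2", changed)

print(per_state.writes, batched.writes)
```

Both subscribers end up with the same stored state, but the batched one performs a single update, and that update sees all changed columns at once.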

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
1d07a596bf everywhere: drop before_change subscription
None of the subscribers is doing anything before_change.
This is done before changing `on_change` in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
7670f60b83 system_keyspace: load_tokens/peers/host_ids: enforce presence of host_id
Skip rows that have no host_id to make
sure the node state we load always has a valid host_id.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
74159bb5ae system_keyspace: drop update_tokens(endpoint, tokens) overload
It is unused now after the previous patch
to update_peer_info in one call.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
2075c85b70 storage_service: seal peer info with host_id
When adding a peer via update_peer_info,
insert all columns in a single query
using system_keyspace::peer_info.
This ensures that `host_id` is inserted along with all
other app states, so we can rely on it
when loading the peer info after restart.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
eb4cd388ce storage_service: update_peer_info: pass peer_info to sys_ks
Use the newly added system_keyspace::peer_info
to pass a struct of all optional system.peers members
to system_keyspace::update_peer_info.

Add `get_peer_info_for_update` to construct said struct
from the endpoint state.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
5abf556399 gms: endpoint_state: define application_state_map
Have a central definition for the map held
in the endpoint_state (before changing it to
std::unordered_map).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
b2735d47f7 system_keyspace: update_peer_info: use struct peer_info for all optional values
Define struct peer_info holding optional values
for all system.peers columns, allowing the caller to
update any column.

Pass the values as std::vector<std::optional<data_value>>
to query_processor::execute_internal.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:30 +02:00
Benny Halevy
6123dc6b09 query_processor: execute_internal: support unset values
Add overloads for execute_internal and friends
accepting a vector of optional<data_value>.

The caller can pass nullopt for any unset value.
The vector of optionals is translated internally to
`cql3::raw_value_vector_with_unset` by `make_internal_options`.

This path will be called by system_keyspace::update_peer_info
for updating a subset of the system.peers columns.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:21:35 +02:00
Benny Halevy
328ce23c78 types: add data_value_list
data_value_list is a wrapper around std::initializer_list<data_value>.
Use it for passing values to `cql3::query_processor::execute_internal`
and friends.

A following patch will add a std::variant for data_value_or_unset
and extend data_value_list to support unset values.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:17:27 +02:00
Benny Halevy
3cba079b26 gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
Currently, when loading peers' endpoint state from system.peers,
add_saved_endpoint is called.
The first instance of the endpoint state is created with the default
heart_beat_state, with both generation and version set to zero.
However, if add_saved_endpoint finds an existing instance of the
endpoint state, it reuses it, but it updates its heart_beat_state
with the local heart_beat_state() rather than keeping the existing
heart_beat_state, as it should.

This is a problem since it may confuse updates over gossip
later on via do_apply_state_locally that compares the remote
generation vs. the local generation, so they must stem from
the same root that is the endpoint itself.
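
The intended behavior can be sketched as follows. This is a simplified Python model rather than the gossiper implementation: the dataclasses and the dictionary of states are assumptions, and only the name `add_saved_endpoint` is taken from the message above.

```python
from dataclasses import dataclass, field


@dataclass
class HeartBeatState:
    generation: int = 0
    version: int = 0


@dataclass
class EndpointState:
    heart_beat: HeartBeatState = field(default_factory=HeartBeatState)


def add_saved_endpoint(states, endpoint):
    """Create a default state only if none exists; otherwise keep it as is."""
    existing = states.get(endpoint)
    if existing is not None:
        # Keep the heartbeat already learned for this endpoint instead of
        # overwriting it with a locally constructed default.
        return existing
    states[endpoint] = EndpointState()
    return states[endpoint]


states = {"10.0.0.1": EndpointState(HeartBeatState(generation=1700000000, version=7))}
kept = add_saved_endpoint(states, "10.0.0.1")
fresh = add_saved_endpoint(states, "10.0.0.2")
print(kept.heart_beat.generation, fresh.heart_beat.generation)
```

Keeping the existing generation means later comparisons between remote and local generations are made against a value that originated from the endpoint itself.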

Fixes #16429

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 16:48:57 +02:00
Benny Halevy
3099c5b8ab storage_service: topology_state_load: lock endpoint for add_saved_endpoint
`topology_state_load` currently calls `add_saved_endpoint`
only if it finds no endpoint_state_ptr for the endpoint.
However, this is done before locking the endpoint
and the endpoint state could be inserted concurrently.

To prevent that, a permit_id parameter was added to
`add_saved_endpoint` allowing the caller to call it
while the endpoint is locked.  With that, `topology_state_load`
locks the endpoint and checks the existence of the endpoint state
under the lock, before calling `add_saved_endpoint`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 16:48:57 +02:00
Benny Halevy
db434e8cb5 raft_group_registry: move on_alive error injection to gossiper
Move the `raft_group_registry::on_alive` error injection point
to `gossiper::real_mark_alive` so it can delay marking the endpoint as
alive, and calling the `on_alive` callback, but without holding
the endpoint_lock.

Note that the entry for this endpoint in `_pending_mark_alive_endpoints`
still blocks marking it as alive until real_mark_alive completes.

Fixes #16506

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 15:28:54 +02:00
Konstantin Osipov
246da8884a test.py: override SCYLLA_* env keys
test.py inherits its env from the user, which is the right thing:
some python modules, e.g. logging, do accept env-based configuration.

However, test.py also starts subprocesses, i.e. tests, which start
scylladb instances. And when the instance is started without an explicit
configuration file, SCYLLA_CONF from user environment can be used.

If this scylla.conf contains funny parameters, e.g. unsupported
configuration options, the tests may break in an unexpected way.

Avoid this by resetting the respective env keys in test.py.
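
One way to keep user settings from leaking into test subprocesses is to build the child environment explicitly. The sketch below is an assumption about the general approach, not the code in test.py: the helper name `scrubbed_env` is hypothetical, and whether the real change removes the keys or overrides them with safe values is not stated here.

```python
def scrubbed_env(env, prefix="SCYLLA_"):
    """Return a copy of env without the keys that start with prefix."""
    return {k: v for k, v in env.items() if not k.startswith(prefix)}


user_env = {"PATH": "/usr/bin", "SCYLLA_CONF": "/home/me/conf", "LANG": "C"}
child_env = scrubbed_env(user_env)
print(sorted(child_env))
```

The resulting dictionary can be passed as the `env` argument of `subprocess.Popen`, so other variables (for example ones that configure Python modules) are still inherited.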

Fixes gh-16583

Closes scylladb/scylladb#16577
2023-12-31 13:02:49 +02:00
Benny Halevy
85b3232086 system_keyspace: get rid of update_cached_values
It's a no-op.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 10:10:51 +02:00
Benny Halevy
f64ecc2edf storage_service: do not update peer info for this node
system_keyspace had a hack to skip update_peer_info
for the local node, and then to remove an entry for
the local node in system.peers if `update_tokens(endpoint, ...)`
was called for this node.

This change unhacks system_keyspace by considering
update of system.peers with the local address as
an internal error and fixing the call sites that do that.

Fixes #16425

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 10:10:51 +02:00
Patryk Jędrzejczak
da37e82fb9 test: add test_remove_alive_node
We add a test for the Raft-based topology's new feature - rejecting
the removenode operation on the topology coordinator side if the
node being removed is considered alive by the failure detector.

Additionally, the test tests a case when the removenode operation
is rejected on the initiator side.
2023-12-29 17:12:46 +01:00
Patryk Jędrzejczak
bd5ee04c18 topology_coordinator: reject removenode if the removed node is alive
The removenode operation is defined to succeed only if the node
being removed is dead. Currently, we reject this operation on the
initiator side (in storage_service::raft_removenode) when the
failure detector considers the node being removed alive. However,
it is possible that even if the initiator considers the node dead,
the topology coordinator will consider it alive when handling the
topology request. For example, the topology coordinator can use
a bigger failure detector timeout, or the node being removed can
suddenly resurrect. This patch adds a check on the topology
coordinator side.

Note that the only goal of this change is to improve the user
experience. The topology coordinator does not rely on the gossiper
to ensure correctness.
2023-12-29 17:12:46 +01:00
Patryk Jędrzejczak
cf955094c1 test: ManagerClient: remove unused wait_for_host_down
The previous commit removed the only call to wait_for_host_down.
Moreover, this function is identical to server_not_sees_other_server.
We can safely remove it.
2023-12-29 17:12:46 +01:00
Patryk Jędrzejczak
7038a033f2 test: remove_node: wait until the node being removed is dead
In the following commits, we make the topology coordinator reject
removenode requests if the node being removed is considered alive
by the gossiper. Before making this change, we need to adapt the
testing framework so that we don't have flaky removenode operations
that fail because the node being removed hasn't been marked as dead
yet. We achieve this by waiting until all other running nodes see
the node being removed as dead in all removenode operations.

Some tests are simplified after this change because they don't have
to call server_not_sees_other_server anymore.
2023-12-29 17:12:45 +01:00
Patryk Jędrzejczak
6ffacae0c7 storage_service: handle_state_left, handle_state_normal: improve logs
We log the information about ignoring the `handle_state_left`
function after logging the general entry information. It is better
to know what exactly is being ignored during debugging.

We also add the `permit_id` info to the log. All other functions
called through gossip notifications log it.
2023-12-29 15:10:56 +01:00
Patryk Jędrzejczak
3e551ef485 raft topology: do not update token metadata in on_alive and on_remove
In the Raft-based topology, we should never update token metadata
through gossip notifications. `storage_service::on_alive` and
`storage_service::on_remove` do it, so we ignore their parts that
touch token metadata.

There are other functions in storage_service called through gossip
notifications that are not ignored in the Raft-based topology.
However, we don't have to or cannot ignore them. We cannot ignore
`on_join` and `on_change` because they update the PEERS table used
by drivers. The rest of those functions don't have to be ignored.
These are:
- `before_change` - it does nothing,
- `on_dead` and `on_restart` - they only remove the RPC client and
  send notifications,
- `handle_state_bootstrap` and `handle_state_removed` - they are
  never called in the Raft-based topology.
2023-12-29 15:10:35 +01:00
Patryk Jędrzejczak
f1dea4bc8a storage_proxy: do not fence reads and writes to local tables
Fencing is necessary only for reads and writes to non-local tables.
Moreover, fencing a read or write to a local table can cause an
error on the bootstrapping node. It is explained in the comment
in storage_proxy::get_fence.

A scenario described in the comment has been reported in
scylladb/scylladb#16423. A write to the local RAFT table failed
because of fencing, and it killed server_impl::io_fiber.

Fixes scylladb/scylladb#16423

Closes scylladb/scylladb#16525
2023-12-28 19:34:27 +02:00
Nadav Har'El
91636f6d21 test/cql-pytest: reproducer of slightly too strict parser of timestamp
Scylla refuses the timestamp format "2014-01-01 12:15:45.0000000Z" that
has 6 digits of precision for the fractional second, and only allows
3 digits of precision. This restriction makes sense - after all CQL
timestamp columns (note - this is NOT "using timestamp"!) only have
millisecond precision. Nevertheless, Cassandra does not have this
restriction and does allow these over-precise timestamps. In this patch
we add a test that demonstrates this difference.

Curiously, in the past Scylla *generated* this forbidden timestamp
format when outputting the timestamp to a string (e.g. toJson()),
which it then couldn't read back! This was issue #16575.
Today Scylla no longer generates this forbidden timestamp format.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16576
2023-12-28 19:01:25 +02:00
Takuya ASADA
7275b614aa scylla_util.py: wait for apt operation on other processes
apt_install() / apt_uninstall() may fail if a background process, such as
unattended-upgrades, is running an apt operation.

To avoid this, we need to add two things:

1. For apt-get install / remove, we need to pass the option "DPkg::Lock::Timeout=-1"
to wait for the dpkg lock.

2. For apt-get update, there is no option to wait for the cache lock.
Therefore, we need to implement a retry loop that waits for apt-get update
to succeed.
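
The two measures above can be sketched as follows. This is not the code from scylla_util.py: the helper names, the attempt count, and the delay are assumptions, and the command runner is injected so the example runs without apt.

```python
import time


def apt_install_cmd(pkg):
    # Ask apt to wait indefinitely for the dpkg lock instead of failing.
    return ["apt-get", "-o", "DPkg::Lock::Timeout=-1", "install", "-y", pkg]


def retry_until_success(run, cmd, attempts=30, delay=10.0, sleep=time.sleep):
    """Call run(cmd) until it returns 0; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(f"{' '.join(cmd)} failed after {attempts} attempts")


# Fake runner: "apt-get update" fails twice (lock held), then succeeds.
results = iter([100, 100, 0])
used = retry_until_success(lambda cmd: next(results),
                           ["apt-get", "update"],
                           sleep=lambda seconds: None)
print(used)
```

In real use the runner could be `subprocess.call`, which returns the exit status of the command. The retry loop is only needed for `apt-get update`; install and remove rely on the lock timeout option.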

Fixes #16537

Closes scylladb/scylladb#16561
2023-12-28 19:00:36 +02:00
Takuya ASADA
331d9ce788 install.sh: fix scylla-server.service failure on nonroot mode
On 3da346a86d, we moved
AmbientCapabilities to scylla-server.service, but it causes "Operation
not permitted" on nonroot mode.
This is because a nonroot user does not have enough privileges to set
capabilities, so we need to disable the parameter in nonroot mode.

Closes scylladb/scylladb#16574
2023-12-27 20:52:17 +02:00
Avi Kivity
6394854f04 Merge 'Some cleanups in tests for tablets + MV ' from Nadav Har'El
This small series improves two things in the multi-node tests for tablet support in materialized views:

1. The test for Alternator LSI, which "sometimes" could reproduce the bug by creating 10-node cluster with a random tablet distribution, is replaced by a reliable 2-node cluster which controls the tablet distribution. The new test also confirms that tablets are actually enabled in Alternator (reviewers of the original test noted it would be easy to pass the test if tablets were accidentally not enabled... :-)).
2. Simplify the tablet lookup code in the test to not go through a "table id", and look up the table's (or view's) name directly (requires a full-table scan of the tablets table, but that's entirely reasonable in a test).

The third patch in this series also fixes a comment typo discovered in a previous review.

Closes scylladb/scylladb#16440

* github.com:scylladb/scylladb:
  materialized views: fix typo in comment
  test_mv_tablets: simplify lookup of tablets
  alternator, tablets: improve Alternator LSI tablets test
2023-12-27 20:18:14 +02:00
Gleb Natapov
e31f6893af storage_service: topology coordinator: fix accessing outdated node in case of barrier failure
When a metadata barrier fails, the guard is released and the node becomes
outdated. The failure handling path needs to re-take the guard and re-create
the node before continuing.

Fixes: #16568

Message-ID: <ZYxEm+SaBeFcRT8E@scylladb.com>
2023-12-27 18:40:10 +02:00
Avi Kivity
3ce0576a31 Merge 'Sanitize keyspace_metadata creation' from Pavel Emelyanov
The number of arguments needed to create a ks metadata object is pretty large, and the object is created in many different ways across the code. This set simplifies it for the most typical patterns.

closes: #16447
closes: #16449

Closes scylladb/scylladb#16565

* github.com:scylladb/scylladb:
  schema_tables: Use new_keyspace() sugar
  keyspace_metadata: Drop vector-of-schemas argument from new_keyspace()
  keyspace_metadata: Add default value for new_keyspace's durable_writes
  keyspace_metadata: Pack constructors with default arguments
2023-12-27 17:15:04 +02:00
Botond Dénes
1647b29cba tools/schema_loader: add db::config parameter to all load methods
So that a single centrally managed db::config instance can be shared by
all code requiring it, instead of creating local instances where needed.
This is required to load schema from encrypted schema-tables, and it
also helps memory consumption a bit (db::config consumes a lot of
memory).

Fixes: #16480

Closes scylladb/scylladb#16495
2023-12-27 16:28:38 +02:00
Nadav Har'El
e6dc9bca0d Merge 'Profile dumping rest api support' from Eliran Sinvani
This change is motivated by wanting to have code coverage reporting support.
Currently the only way to get a profile dump in ScyllaDB is stopping it with SIGTERM, however, this doesn't
suit all cases, more specifically:
1. In dtest, when some of the tests intentionally abruptly kill a node
2. In test.py, where we would like to distinguish (at least for now), graceful shutdown of ScyllaDB testing and
teardown procedures (which currently kills the nodes).

This mini series adds two changes:
1. It adds the support for profile dumping in ScyllaDB with rest api ('/system/dump_profile')
2. It adds the support for this API in test.py and also adds a call for it as part of the node stop procedure in a permissive way that will not fail the teardown or test if the call doesn't succeed for whatever reason - after this change, all current
test.py suites except for pylib_test (expected) dump profiles if instrumented and will be able to participate in coverage
reporting.

Refs #16323

Closes scylladb/scylladb#16557

* github.com:scylladb/scylladb:
  test.py: Dump coverage profile before killing a node
  rest api: Add an api for profile dumping
2023-12-27 12:06:39 +02:00
Eliran Sinvani
e49b3ffc89 test.py: Dump coverage profile before killing a node
Up until now the only way to get a coverage profile was to shut down the
ScyllaDB nodes gracefully (using SIGTERM), this means that the coverage
profile was lost for every node that was killed abruptly (SIGKILL).
This in turn would have required us to shut down all nodes
gracefully, which is not something we set out to do.
Here we use the rest API for dumping the coverage profile which will
cause the most minimal impact possible on the test runs.
If the dumping fails (because the node doesn't support the API or because of
a real error in dumping) we ignore it as it is not part of the system we
would like to test.
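
The "dump if possible, then kill regardless" behavior can be sketched like this. It is a hedged illustration, not the test.py code: `stop_node` and its callable parameters are hypothetical, and the HTTP request to the profile-dumping endpoint is deliberately left behind the `dump_profile` callable.

```python
def stop_node(dump_profile, kill, log=print):
    """Try to dump the coverage profile, then kill the node regardless."""
    dumped = True
    try:
        dump_profile()
    except Exception as exc:
        # Best effort: a failed dump must never fail the teardown or the test.
        dumped = False
        log(f"profile dump skipped: {exc}")
    kill()
    return dumped


events = []


def failing_dump():
    raise ConnectionError("node does not support the dump API")


ok = stop_node(failing_dump, lambda: events.append("killed"), log=events.append)
print(ok, events)
```

The node is killed in both the success and the failure path, so the test behavior stays the same whether or not the binary is instrumented.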

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-12-27 07:17:26 +02:00
Eliran Sinvani
4c60804c4c rest api: Add an api for profile dumping
As part of code coverage support we need to work with dumped profiles
for ScyllaDB executables.
Those profiles are created on two occasions:
1. When an application exits normally (which will trigger
   __llvm_dump_profile registered in the exit hooks).
2. For ScyllaDB commit d7b524cf10 introduced a manual call to
   __llvm_dump_profile upon receiving a SIGTERM signal.

This commit adds a third option, a rest API to dump the profile.
In addition the target file is logged and the counters are reset, which
enables incremental dumping of the profile.
Except for logging, if the executable is not instrumented, this API call
becomes a no-op so it bears minimal risk in keeping it in our releases.
Specifically for code coverage, the gain will be that we will not be
required to change the entire test run to shut down clusters gracefully
and this will cause minimal effect to the actual test behavior.

The change was tested by manually triggering the API both with and
without instrumentation as well as re-triggering it with write
permissions for the profile file disabled (to test fault tolerance).

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-12-27 07:06:54 +02:00
Avi Kivity
2a76065e3d table, memtable: share log-structured allocator statistics across all memtables in a table
The log-structured allocator collects allocation statistics (which it
uses to manage memory reserves) in some objects kept in
memtable_table_shared_data. Right now, this object is local to memtable_list,
which itself is local to a tablet replica. Move it to table scope so
different tablets in the shard share the statistics. This helps a
newly-migrated tablet adjust more quickly.
2023-12-26 21:24:51 +02:00
Avi Kivity
02111d6754 memtable: consolidate _read_section, _allocating_section in a struct
Those two members are passed from memtable_list to memtable. Since we
wish to pass them from table, it becomes awkward to pass them as two
separate variables as their contents are specific to memtable internals.

Wrap them in a name that indicates their role (being table-wide shared
data for memtables) and pass them as a unit.
2023-12-26 21:11:48 +02:00
Nadav Har'El
fc71c34597 Merge 'select statement: verify EXECUTE permissions only for non native functions' from Eliran Sinvani
Commit 62458b8e4f introduced the enforcement of EXECUTE permissions of functions in cql select. However, according to the reference in #12869, the permissions should be enforced only on UDFs and UDAs.
The code does not distinguish between the two so the permissions are also unintentionally enforced on native functions. This commit introduces the distinction and only enforces the permissions on non native functions.

Fixes #16526

Manually verified (before and after change) with the reproducer supplied in #16526 and also with the `min` and `max` native functions.
Also added a test that checks for regression on native functions execution and verified that it fails on authorization before
the fix and passes after the fix.

Closes scylladb/scylladb#16556

* github.com:scylladb/scylladb:
  test.py: Add test for native functions permissions
  select statement: verify EXECUTE permissions only for non native functions
2023-12-26 18:14:21 +02:00
Gleb Natapov
74d17719db test: add test to check failure handling in cdc generation commit
2023-12-26 16:01:34 +02:00
Gleb Natapov
21063b80fb storage_service: topology coordinator: rollback on failure to commit cdc generation
If the coordinator fails to notify all nodes about the new cdc generation
during bootstrap it cannot proceed with booting since it can cause data
loss with cdc. Roll back the topology operation if a failure happens
during this state.
2023-12-26 15:58:15 +02:00
Pavel Emelyanov
129196db98 schema_tables: Use new_keyspace() sugar
The create_keyspace_from_schema_partition code creates ks metadata
without schemas and user-types. There's new_keyspace() convenience
helper for such cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-26 13:26:58 +03:00
Pavel Emelyanov
a1ad2571fc keyspace_metadata: Drop vector-of-schemas argument from new_keyspace()
It's only testing code that wants to call new_keyspace with existing
schemas, all the other callers either construct the ks metadata
directly, or use convenience new_keyspace with explicitly empty schemas.
By and large it's nicer if new_keyspace() doesn't require this
argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-26 13:00:44 +03:00
Pavel Emelyanov
ffdafe4024 keyspace_metadata: Add default value for new_keyspace's durable_writes
Almost all callers call new_keyspace with durable writes ON, so it's
worth having a default value for it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-26 11:47:37 +03:00
Pavel Emelyanov
9ab0065796 keyspace_metadata: Pack constructors with default arguments
There's a cascade of keyspace_metadata constructors each adding one
default argument to the previous one. All this can be expressed shorter
with the help of native default arguments

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-26 11:41:01 +03:00
Eliran Sinvani
a336550041 test.py: Add test for native functions permissions
Native functions (non UDF/UDA functions), should be usable even if a
user is not granted EXECUTE permissions on them.

This is a regression test that was added following:
https://github.com/scylladb/scylladb/issues/16526

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-12-26 10:27:04 +02:00
Eliran Sinvani
cac79977d6 select statement: verify EXECUTE permissions only for non native functions
Commit 62458b8e4f introduced the
enforcement of EXECUTE permissions of functions in cql select. However,
according to the reference in #12869, the permissions should be enforced
only on UDFs and UDAs.
The code does not distinguish between the two so the permissions are
also unintentionally enforced on native functions.
This commit introduces the distinction and only enforces the permissions
on non native functions.

Fixes #16526

Manually verified (before and after change) with the reproducer
supplied in #16526 and also with the `min` and `max` native
functions.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-12-26 10:27:04 +02:00
Avi Kivity
3968fc11bf Merge 'cql: fix regression in SELECT * GROUP BY' from Nadav Har'El
This short series fixes a regression from Scylla 5.2 to Scylla 5.4 in "SELECT * GROUP BY" - this query was supposed to return just a single row from each partition (the first one in clustering order), but after the expression rewrite started to wrongly return all rows.

The series also includes a regression test that verifies that this query doesn't work correctly before this series, but works with this patch - and also works as expected in Scylla 5.2 and in Cassandra.

Fixes #16531.

Closes scylladb/scylladb#16559

* github.com:scylladb/scylladb:
  test/cql-pytest: check that most aggregators don't take "*"
  cql-pytest: add reproducer for GROUP BY regression
  cql: fix regression in SELECT * GROUP BY
2023-12-25 19:53:55 +02:00
Avi Kivity
3da346a86d Merge 'Drop CentOS7 specific codes' from Takuya ASADA
Since we decided to drop CentOS7 support from the latest version of Scylla, we can now drop CentOS7-specific code from the packaging and setup scripts.

Related scylladb/scylla-enterprise#3502

Closes scylladb/scylladb#16365

* github.com:scylladb/scylladb:
  scylla-server.service: switch deprecated PermissionsStartsOnly to ExecStartPre=+
  dist: drop legacy control group parameters
  scylla-server.slice: Drop workaround for MemorySwapMax=0 bug
  dist: move AmbientCapabilities to scylla-server.service
  Revert "scylla_setup: add warning for CentOS7 default kernel"

[avi: CentOS 7 reached EOL in June 2024]
2023-12-25 18:25:05 +02:00
Kefu Chai
68c98d2203 build: cmake: link against boost static when --static-boost is specified
`--static-boost` is an option provided by `configure.py`. this option is
not used by our CI or building scripts. but in order to be compatible
with the existing behavior of `configure.py`, let's support this option
when building with CMake.

`Boost_USE_STATIC_LIBS` is a cmake variable supported by CMake's
FindBoost and Boost's own `BoostConfig.cmake`. see
https://cmake.org/cmake/help/latest/module/FindBoost.html#other-variables

by default boost is linked via its shared libraries. by setting
this variable, we link boost's static libraries.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16545
2023-12-25 18:23:49 +02:00
Avi Kivity
da022ca4e8 Merge 'build: cmake: add "mode_list" target ' from Kefu Chai
scylla uses build modes like "debug" and "release" to differentiate
different build modes. while we intend to use the typical build
configurations / build types used by CMake like "Debug" and
"RelWithDebInfo" for naming CMAKE_CONFIGURATION_TYPES and
CMAKE_BUILD_TYPE. the former is used for naming the build directory and
for the preprocessor macro named "SCYLLA_BUILD_MODE".

`test.py` and scylladb's CI are designed based on the naming of build
directory. in which, `test.py` lists the build modes using the dedicated
build target named `list_modes`, which is added by `configure.py`.

so, in this change, the target is added to CMake as well. the variables
of "scylla_build_mode" defined by the per-mode configuration are
collected and printed by the `list_modes`.

because, by default, CMake generates a target for each build
configuration when a multi-config generator is used. but we only want to
print the build mode for a single time when "list_modes" is built. so
a "BYPRODUCTS" is deliberately added for the target, and the path of
this "BYPRODUCTS" is named without the "$<CONFIG>" in its path.

Closes scylladb/scylladb#16532

* github.com:scylladb/scylladb:
  build: cmake: add "mode_list" target
  build: cmake: define scylla_build_mode
2023-12-25 18:20:34 +02:00
Kefu Chai
4a817f8a2a data_dictionary: use insert_or_assign() when appropriate
when compiling with clang-18 in "release" mode, `assert()` is optimized out.
so `i` is not used. and clang complains like:

```
/home/kefu/dev/scylladb/data_dictionary/user_types_metadata.hh:29:14: error: unused variable 'i' [-Werror,-Wunused-variable]
   29 |         auto i = _user_types.find(type->_name);
      |              ^
```

in this change, we use `i` as the hint for the insertion, for two
reasons:

- silence the warning.
- avoid looking up the same key in the unordered_map twice.

`type` is not moved away when being passed to `insert_or_assign()`,
because otherwise, `type->_name` could be referencing a moved-away
shared_ptr, because the order in which a function's arguments are
evaluated is unspecified. since `type` is a shared_ptr, the overhead is
negligible.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16530
2023-12-25 18:18:20 +02:00
Takuya ASADA
0b894a7cac locator::ec2_snitch: change retry logic to exponential backoff
Since Amazon recommends using exponential backoff when retrying AWS API
calls, we should switch the retry logic in ec2_snitch to it.

see https://docs.aws.amazon.com/general/latest/gr/api-retries.html
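
As an illustration of the technique only, here is a minimal Python sketch of
retrying with exponential backoff. It is not the ec2_snitch code (which is
C++); `retry_with_backoff` and all of its parameters and default values are
assumptions made for this example.

```python
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # wait base_delay, then 2x, 4x, ... capped at max_delay
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

The `sleep` parameter is injectable so that the delays can be observed in a
test without actually waiting.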

Related with #12160

Closes scylladb/scylladb#13442
2023-12-25 18:17:23 +02:00
Yaron Kaikov
8917947f29 build_docker: Add description and summary labels
Adding description and summary labels to our docker images, as requested
by @tzach and @mykaul.

Closes scylladb/scylladb#16419
2023-12-25 18:14:56 +02:00
Pavel Emelyanov
ac3dd4bf5d test: Coroutinize some secondary_index_test cases
Now they are long then-chains that are hard to read

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16547
2023-12-25 18:08:19 +02:00
Nadav Har'El
55317666c6 test/cql-pytest: check that most aggregators don't take "*"
Although you can "SELECT COUNT(*)", this has special handling in the CQL
parser (it is converted into a special row-counting request) and you can't
give "*" to other aggregators - e.g., "SELECT SUM(*)". This patch includes
a simple test that confirms this.

I wanted to check this in relation to the previous patch, which did,
sort of, a "SELECT $$first$$(*)" - a syntax which this test shows
wouldn't have actually worked if we tried it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-25 17:53:42 +02:00
Nadav Har'El
e2773b4a3a cql-pytest: add reproducer for GROUP BY regression
test/cql-pytest/test_group_by.py has tests that verify that requests
like

   SELECT p,c1,c2,v FROM tbl WHERE p=0 GROUP BY p

work as expected - the "GROUP BY p" means in this case that we should
only return the first row in the p=0 partition.

As a user discovered, it turns out that the almost identical request:

   SELECT * FROM tbl WHERE p=0 GROUP BY p

doesn't work the same - before the fix in the previous patch, it
erroneously returned all rows in p=0, not just the first one.
The test in this patch demonstrates this - it fails on Scylla 5.4,
passes on Scylla 5.2 and on Cassandra - and passes when the fix
from the previous patch is used.

This patch includes another tiny test, to check the interaction of GROUP BY
with filtering. This second test passes on Scylla - but I want it in
anyway because it is yet another interaction that might break (the
user that reported #16531 also had filtering, and I was worried it might
have been related).

Refs #16531

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-25 17:53:42 +02:00
Nadav Har'El
1aea2136c8 cql: fix regression in SELECT * GROUP BY
Recently, the expression-rewrite effort changed the way that GROUP BY is
implemented. Usually GROUP BY involves an aggregation function (e.g., if
you want a separate SUM per partition). But there's also a query like

   SELECT p, c1, c2, v FROM tbl GROUP BY p

This query is supposed to return one row - the *first* row in clustering
order - per group (in this case, partition). The expression rewrite
re-implemented this feature by introducing a new internal aggregator,
first(), which returns the first aggregated value. The above query is
rewritten into:

   SELECT first(p), first(c1), first(c2), first(v) FROM tbl GROUP BY p

This case works correctly, and we even have a regression test for it.
But unfortunately the rewrite broke the following query:

   SELECT * FROM tbl GROUP BY p

Note the "*" instead of the explicit list of columns.
In our implementation, a selection of "*" looks like an empty
selection, and it didn't get the "first()" treatment and it remained
a "SELECT *" - and wrongly returned all rows instead of just the first
one in each partition. This was a regression - it worked correctly in
Scylla 5.2 (and also in Cassandra) - see the next patch for a
regression test.

In this patch we fix this regression. When there is a GROUP BY, the "*"
is rewritten to the appropriate list of all visible columns and then
gets the first() treatment, so it will return only the first row as
expected. The next patch will be a test that confirms the bug and its
fix.
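
The expected semantics can be modelled in a few lines of Python. This is a
sketch of the behaviour only, not of the C++ expression rewrite;
`select_star_group_by` and its arguments are hypothetical, and the rows are
assumed to arrive already sorted by partition and clustering order.

```python
from itertools import groupby
from operator import itemgetter


def select_star_group_by(rows, group_col, visible_columns):
    """Model SELECT * ... GROUP BY group_col after the fix.

    "*" is expanded to visible_columns and every column gets first()
    semantics, so only the first row of each group is returned.
    """
    result = []
    for _, group in groupby(rows, key=itemgetter(group_col)):
        first_row = next(group)  # first row in clustering order
        result.append({col: first_row[col] for col in visible_columns})
    return result


rows = [
    {"p": 0, "c1": 0, "c2": 0, "v": 10},
    {"p": 0, "c1": 0, "c2": 1, "v": 11},
    {"p": 1, "c1": 5, "c2": 0, "v": 20},
]
# One row per partition: the (p=0, c2=0) row and the p=1 row.
result = select_star_group_by(rows, "p", ["p", "c1", "c2", "v"])
```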

Fixes #16531

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-25 17:52:57 +02:00
Avi Kivity
a7efaca878 Merge 'Move initial_tablets to system_schema.scylla_keyspaces' from Pavel Emelyanov
Right now the initial_tablets is kept as a replication strategy option in the legacy system_schema.keyspaces table. However, r.s. options are all considered to be replication factors, not anything else. Other than being confusing, this also makes it impossible to extend keyspace configuration with non-integer tablets-related values.

This PR moves the initial_tablets into scylla-specific part of the schema. This opens a way to more ~~ugly~~ flexible ways of configuring tablets for keyspace, in particular it should be possible to use boolean on/off switch in CREATE KEYSPACE or some other trick we find appropriate.

Most of what this PR does is extend the arguments passed around keyspace_metadata and abstract_replication_strategy. The essence of the change is in the last patches
* schema_tables: Relax extract_scylla_specific_ks_info() check
* locator,schema: Move initial tablets from r.s. options to params

refs: #16319
refs: #16364

Closes scylladb/scylladb#16555

* github.com:scylladb/scylladb:
  test: Add sanity tests for tablets initialization and altering
  locator,schema: Move initial tablets from r.s. options to params
  schema_tables: Relax extract_scylla_specific_ks_info() check
  locator: Keep optional initial_tablets on r.s. params
  ks_prop_defs: Add initial_tablets& arg to prepare_options()
  keyspace_metadata: Carry optional<initial_tablets> on board
  locator: Pass abstract_replication_strategy& into validate_tablet_options()
  locator: Carry r.s. params into process_tablet_options()
  locator: Call create_replication_strategy() with r.s. params
  locator: Wrap replication_strategy_config_options into replication_strategy_params
  locator: Use local members in ..._replication_strategy constructors
2023-12-25 17:44:10 +02:00
Pavel Emelyanov
1d2c871219 test: Add sanity tests for tablets initialization and altering
Check that the initial_tablets appears in system_schema.scylla_keyspaces
if turned on explicitly

Check that it's possible to change initial_tablets with ALTER KEYSPACE

Check that changing r.s. from simple to network-topology doesn't
activate tablets

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:09:01 +03:00
Pavel Emelyanov
c43501d973 locator,schema: Move initial tablets from r.s. options to params
The option is kept in DDL, but is _not_ stored in
system_schema.keyspaces. Instead, it's removed from the provided options
and kept in scylla_keyspaces table in its own column. All the places
that had optional initial_tablets disengaged now set this value up the
way they find appropriate.
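
The split described above can be pictured with a small Python model. It is
only a sketch under stated assumptions: `split_replication_options` is a
hypothetical helper, not a function from the tree, and the real code works
on C++ option maps rather than Python dicts.

```python
def split_replication_options(options):
    """Separate `initial_tablets` from the legacy replication options.

    Returns (legacy_options, initial_tablets); initial_tablets is None
    when the option was not given.
    """
    legacy = dict(options)  # do not mutate the caller's mapping
    raw = legacy.pop("initial_tablets", None)
    initial_tablets = int(raw) if raw is not None else None
    return legacy, initial_tablets
```

In this model the first element is what would go to the legacy keyspaces
table, and the second is what would be stored in its own column.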

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:07:10 +03:00
Pavel Emelyanov
30e7273658 schema_tables: Relax extract_scylla_specific_ks_info() check
Nowadays reading scylla-specific info from schema happens under
respective schema feature. However (at least in raft case) when a new
node joins the cluster merging schema for the first time may happen
_before_ features are merged and enabled. Thus merging schema can go the
wrong way by erroneously skipping the scylla-specific info.

On the other hand, if system_schema.scylla_keyspaces is there it's
there, there's no reason _not_ to pick this data up in that case.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:05:01 +03:00
Pavel Emelyanov
562fcf0c19 locator: Keep optional initial_tablets on r.s. params
Now all the callers have it at hand (spoiler: not yet initialized, but
still) so the params can also have it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:02:41 +03:00
Pavel Emelyanov
2d480a2093 ks_prop_defs: Add initial_tablets& arg to prepare_options()
The prepare_options() method is in charge of pre-tuning the replication
strategy CQL parameters so that real keyspace and r.s. creation code
doesn't see some of those. The "initial_tablets" option is going to be
removed from the real options and be placed into scylla-specific part of
the schema. So the prepare_options() will need to modify both -- the
legacy options _and_ the (soon to be separate) initial_tablets thing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:00:50 +03:00
Pavel Emelyanov
a67c535539 keyspace_metadata: Carry optional<initial_tablets> on board
The object in question fully describes the keyspace to be created and,
among other things, contains replication strategy options. Next patches
move the "initial_tablets" option out of those options and keep it
separately, so the ks metadata should also carry this option separately.

This patch is _just_ extending the metadata creation API, in fact the
new field is unused (write-only) so all the places that need to provide
this data keep it disengaged and are explicitly marked with FIXME
comment. Next patches will fix that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:58:05 +03:00
Pavel Emelyanov
45f4276de6 locator: Pass abstract_replication_strategy& into validate_tablet_options()
It will need to check if the r.s. in question had been marked as
per-table one in next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:56:49 +03:00
Pavel Emelyanov
bf824d79d9 locator: Carry r.s. params into process_tablet_options()
The latter method is the one that will need extended params in next
patches. It's called from network_topology_strategy() constructor which
already has params at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:56:02 +03:00
Pavel Emelyanov
a943bd927b locator: Call create_replication_strategy() with r.s. params
Previous patch added params to r.s. classes' constructors, but callers
don't construct those directly, instead they use the create_r.s.()
wrapper. This patch adds params to the wrapper too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:54:59 +03:00
Pavel Emelyanov
f88ba0bf5a locator: Wrap replication_strategy_config_options into replication_strategy_params
When a replication strategy class is created, the caller passes a const reference
to the config options, which is, in turn, a map<string, string>. In the
future r.s. classes will need to get "scylla specific" info along with
legacy options and this patch prepares for that by passing more generic
params argument into constructor. Currently the only inhabitant of the
new params is the legacy options.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:53:03 +03:00
Pavel Emelyanov
ecbafd81f2 locator: Use local members in ..._replication_strategy constructors
The `config_options` arg had been used to initialize `_config_options`
field of the base abstract_replication_strategy class, so it's more
idiomatic to use the latter. Also it makes next patches simpler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 15:51:51 +03:00
Pavel Emelyanov
f621afa3ec database: Copy storage options too when updating keyspace metadata
When altering a keyspace several keyspace_metadata objects are created
along the way. The last one, that is then kept on the keyspace_metadata
object, forgets to get its copy of storage options thus transparently
converting to LOCAL type.

The bug surfaces itself when altering replication strategy class for
S3-backed storage -- the 2nd attempt fails, because after the 1st one
the keyspace_metadata gets LOCAL storage options and changing storage
options is not allowed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16524
2023-12-25 13:31:15 +02:00
Benny Halevy
060b16f987 view: apply_to_remote_endpoints: fix use-after-free
b815aa021c added a yield before
the trace point, causing the moved `frozen_mutation_and_schema`
(and `inet_address_vector_topology_change`) to drop out of scope
and be destroyed, as the rvalue-referenced objects aren't moved
onto the coroutine frame.

This change passes them by value rather than by rvalue-reference
so they will be stored in the coroutine frame.

Fixes #16540

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16541
2023-12-24 21:43:48 +02:00
Botond Dénes
da033343b7 tools/schema_loader: read_schema_table_mutation(): close the reader
The reader used to read the sstables was not closed. This could
sometimes trigger an abort(), because the reader was destroyed, without
it being closed first.
Why only sometimes? This is due to two factors:
* read_mutation_from_flat_mutation_reader() - the method used to extract
  a mutation from the reader, uses consume(), which does not trigger
  `set_close_is_required()` (#16520). Due to this, the top-level
  combined reader did not complain when destroyed without close.
* The combined reader closes underlying readers who have no more data
  for the current range. If the circumstances are just right, all
  underlying readers are closed, before the combined reader is
  destroyed. Looks like this is what happens most of the time.

This bug was discovered in SCT testing. After fixing #16520, all
invocations of `scylla-sstable`, which use this code would trigger the
abort, without this patch. So no further testing is required.

Fixes: #16519

Closes scylladb/scylladb#16521
2023-12-24 17:21:32 +02:00
Nadav Har'El
6640278aa7 materialized views: fix typo in comment
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-24 10:12:44 +02:00
Nadav Har'El
f9f20e779c test_mv_tablets: simplify lookup of tablets
The tests looked up a table's tablets in an elaborate two-stage search -
first find the table's "id", and then look up this id in the list of
tablets. It is much simpler to just look up the table's name directly
in the list of tablets - although this name is not a key, an ALLOW
FILTERING search is good enough for a test.

As a bonus, with the new technique we don't care if the given name
is the name of a table or a view, further simplifying the test.

This is just a test code cleanup - there is no functional change in
the test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-24 10:12:44 +02:00
Nadav Har'El
cdd5b19f12 alternator, tablets: improve Alternator LSI tablets test
The test test_tablet_alternator_lsi_consistency, checking that Alternator
LSI allow strongly-consistent reads even with tablets, used a large
cluster (10 nodes), to improve the chance of reaching an "unlucky" tablet
placement - and even then only failed in about half the runs without
the code fixed.

In this patch, we rewrite the test using a much more reliable approach:
We start only two nodes, and force the base's tablet onto one node, and
the view's tablet onto the second node. This ensures with 100% certainty
that the view update is remote, and the new test fails every single time
before the code fix (I reverted the fix to verify) - and passes after it.

The new test is not only more reliable, it's also significantly faster
because it doesn't need to start a 10-node cluster.

We can also remove the tag that excluded this test from debug build
mode tests because the 10-node boot was too slow.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-24 10:11:43 +02:00
Kefu Chai
2bec6751d3 build: cmake: add "mode_list" target
scylla uses build modes like "debug" and "release" to differentiate
different build modes. while we intend to use the typical build
configurations / build types used by CMake like "Debug" and
"RelWithDebInfo" for naming CMAKE_CONFIGURATION_TYPES and
CMAKE_BUILD_TYPE. the former is used for naming the build directory and
for the preprocessor macro named "SCYLLA_BUILD_MODE".

`test.py` and scylladb's CI are designed based on the naming of build
directory. in which, `test.py` lists the build modes using the dedicated
build target named `list_modes`, which is added by `configure.py`.

so, in this change, the target is added to CMake as well. the variables
of "scylla_build_mode" defined by the per-mode configuration are
collected and printed by the `list_modes`.

because, by default, CMake generates a target for each build
configuration when a multi-config generator is used. but we only want to
print the build mode for a single time when "list_modes" is built. so
a "BYPRODUCTS" is deliberately added for the target, and the path of
this "BYPRODUCTS" is named without the "$<CONFIG>" in its path.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-24 12:35:02 +08:00
Kefu Chai
79943e0516 build: cmake: define scylla_build_mode
scylla uses build modes like "debug" and "release" to differentiate
different build modes. while we intend to use the typical build
configurations / build types used by CMake like "Debug" and
"RelWithDebInfo" for naming CMAKE_CONFIGURATION_TYPES and
CMAKE_BUILD_TYPE. the former is used for naming the build directory and
for the preprocessor macro named "SCYLLA_BUILD_MODE".

`test.py` and scylladb's CI are designed based on the naming of build
directory. in which, `test.py` lists the build modes using the dedicated
build target named "list_modes", which is added by `configure.py`.

so, in this change, to prepare for adding the target,
"scylla_build_mode" is defined, so we can reuse it in a following-up
change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-24 12:28:23 +08:00
Tomasz Grabiec
2590274f95 Merge 'Don't allow ALTER KEYSPACE to change replication strategy vnode/per-table flavor' from Pavel Emelyanov
This switch is currently possible, but results in an unsupported keyspace state

Closes scylladb/scylladb#16513

* github.com:scylladb/scylladb:
  test: Add a test that switching between vnodes and tablets is banned
  cql3/statements: Don't allow switching between vnode and per-table replication strategies
  cql3/statements: Keep local keyspace variable in alter_keyspace_statement::validate
2023-12-22 17:22:36 +01:00
Kefu Chai
642652efab test/cql-pytest/test_tools.py: test shard-of with a single partition
test_scylla_sstable_shard_of takes lots of time preparing the keys for a
certain shard. with the debug build, it takes 3 minutes to complete the
test.

so in order to test the "shard-of" subcommand in a more efficient way,
in this change, we improve the test in two ways:

1. cache the output of `scylla types shardof`, so we can avoid the
   overhead of running a seastar application repeatedly for the
   same keys.
2. reduce the number of partitions from 42 to 1. as the number of
   partitions in an sstable does not matter when testing the
   output of "shard-of" command of a certain sstable. because,
   the sstable is always generated by a certain shard.
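
The caching idea in item 1 can be sketched with `functools.lru_cache`. This
is an illustration, not the code in test_tools.py: `run_shardof_tool` is a
hypothetical stand-in that does not spawn the real tool, and its return
value is a placeholder rather than the real sharding algorithm.

```python
import functools


def run_shardof_tool(key):
    """Stand-in for an expensive external call made once per key."""
    return len(key) % 2  # placeholder result, not the real shard


@functools.lru_cache(maxsize=None)
def shard_of(key):
    """Return the cached result for `key`, calling the tool only on a miss."""
    return run_shardof_tool(key)
```

Repeated calls with the same key are then served from the cache, which can
be confirmed with `shard_of.cache_info()`.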

before this change, with pytest-profiling:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/3    0.000    0.000  181.950   60.650 runner.py:219(call_and_report)
      4/3    0.000    0.000  181.948   60.649 runner.py:247(call_runtest_hook)
      4/3    0.000    0.000  181.948   60.649 runner.py:318(from_call)
      4/3    0.000    0.000  181.948   60.649 runner.py:262(<lambda>)
    44/11    0.000    0.000  181.935   16.540 _hooks.py:427(__call__)
    43/11    0.000    0.000  181.935   16.540 _manager.py:103(_hookexec)
    43/11    0.000    0.000  181.935   16.540 _callers.py:30(_multicall)
      361    0.001    0.000  181.531    0.503 contextlib.py:141(__exit__)
   782/81    0.001    0.000  177.578    2.192 {built-in method builtins.next}
     1044    0.006    0.000   92.452    0.089 base_events.py:1894(_run_once)
       11    0.000    0.000   91.129    8.284 fixtures.py:686(<lambda>)
    17/11    0.000    0.000   91.129    8.284 fixtures.py:1025(finish)
        4    0.000    0.000   91.128   22.782 fixtures.py:913(_teardown_yield_fixture)
      2/1    0.000    0.000   91.055   91.055 runner.py:111(pytest_runtest_protocol)
      2/1    0.000    0.000   91.055   91.055 runner.py:119(runtestprotocol)
        2    0.000    0.000   91.052   45.526 conftest.py:50(cql)
        2    0.000    0.000   91.040   45.520 util.py:161(cql_session)
        1    0.000    0.000   91.040   91.040 runner.py:180(pytest_runtest_teardown)
        1    0.000    0.000   91.040   91.040 runner.py:509(teardown_exact)
     1945    0.002    0.000   90.722    0.047 events.py:82(_run)
```

after this change:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/3    0.000    0.000    8.271    2.757 runner.py:219(call_and_report)
    44/11    0.000    0.000    8.270    0.752 _hooks.py:427(__call__)
    44/11    0.000    0.000    8.270    0.752 _manager.py:103(_hookexec)
    44/11    0.000    0.000    8.270    0.752 _callers.py:30(_multicall)
      4/3    0.000    0.000    8.269    2.756 runner.py:247(call_runtest_hook)
      4/3    0.000    0.000    8.269    2.756 runner.py:318(from_call)
      4/3    0.000    0.000    8.269    2.756 runner.py:262(<lambda>)
       48    0.000    0.000    8.269    0.172 {method 'send' of 'generator' objects}
       27    0.000    0.000    5.671    0.210 contextlib.py:141(__exit__)
       11    0.000    0.000    4.297    0.391 fixtures.py:686(<lambda>)
      2/1    0.000    0.000    4.228    4.228 runner.py:111(pytest_runtest_protocol)
      2/1    0.000    0.000    4.228    4.228 runner.py:119(runtestprotocol)
        2    0.000    0.000    4.213    2.106 capture.py:877(pytest_runtest_teardown)
        1    0.000    0.000    4.213    4.213 runner.py:180(pytest_runtest_teardown)
        1    0.000    0.000    4.213    4.213 runner.py:509(teardown_exact)
        2    0.000    0.000    3.628    1.814 capture.py:872(pytest_runtest_call)
        1    0.000    0.000    3.627    3.627 runner.py:160(pytest_runtest_call)
        1    0.000    0.000    3.627    3.627 python.py:1797(runtest)
   114/81    0.001    0.000    3.505    0.043 {built-in method builtins.next}
       15    0.784    0.052    3.183    0.212 subprocess.py:417(check_output)
```

Fixes #16516
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16523
2023-12-22 15:20:03 +02:00
Petr Gusev
c05fd8c018 storage_service: node_ops_cmd_handler: decommission rollback, ignore the node if's already removed
This is a regression after #15903. Before these changes
del_leaving_endpoint took IP as a parameter and did nothing
if it was called with a non-existent IP.

The problem was revealed by the dtest test_remove_garbage_members_from_group0_after_abort_decommission[Announcing_that_I_have_left_the_ring-]. The test was
flaky as in most cases the node died before the
gossiper notification reached all the other nodes. To make
it fail consistently and reproduce the problem one
can move the info log 'Announcing that I have' after
the sleep and add additional sleep after it in
storage_service::leave_ring function.

Fixes #16466

Closes scylladb/scylladb#16508
2023-12-22 12:42:38 +01:00
Avi Kivity
6f6170aae7 Update seastar submodule
* seastar ae8449e04f...e0d515b6cf (18):
  > reactor: poll less frequently in debug mode
  > build: s/exec_program/execute_process/
  > Merge 'httpd: support temporary redirect from inside async reply' from Noah Watkins
  > Merge 'core: enable seastar to run multiple times in a single process' from Kefu Chai
  > rpc/rpc_types: add formatter for rpc::optional<T>
  > memory: do not set_reclaim_hook if cpu_mem_ptr is not set
  > circleci: do not set disable dpdk explicitly
  > fair_queue: Do not pop unplugged class immediately
  > build: install Finducontext.cmake and FindSystem-SDT.cmake
  > treewide: include used headers
  > build: define SEASTAR_COROUTINES_ENABLED for Seastar module
  > seastar.cc: include "core/prefault.hh"
  > build: enable build C++20 modules with GCC 14
  > build: replace seastar_supports_flag() with check_cxx_compiler_flag()
  > Merge 'build: cleanups configure.py to be more PEP8 compatible' from Kefu Chai
  > circleci: build with dpdk enabled
  > build: add "--enable-cxx-modules" option to configure.py
  > build: use a different *_CMAKE_API for CMake 3.27

Closes scylladb/scylladb#16500
2023-12-22 12:58:39 +02:00
Tzach Livyatan
45ffa5221e Improve nodetool scrub definition
fix #16505

Closes scylladb/scylladb#16518
2023-12-22 12:09:58 +02:00
Tomasz Grabiec
9c7e5f6277 Merge 'Fix secondary index feature with tablets' from Nadav Har'El
Before this series, materialized views already worked correctly on keyspaces with tablets, but secondary indexes did not. The goal of this series is to make CQL secondary indexes fully supported on tablets:

1. First we need to make CREATE INDEX work with tablets (it didn't before this series). Fixes #16396.
2. Then we need to keep the promise that our documentation makes - that **local** secondary index should be synchronously updated - Fixes #16371.

As you can see in the patches below, and as was expected already in the design phase, the code changes needed to make indexes support tablets were minimal. But writing reliable tests for these issues was the biggest effort that went into this series.

Closes scylladb/scylladb#16436

* github.com:scylladb/scylladb:
  secondary-index, tablets: ensure that LSI are synchronous
  test: add missing "tags" schema extension to cql_test_env
  mv, test: fix delay_before_remote_view_update injection point
  secondary index: fix view creation when using tablets
2023-12-21 23:37:00 +01:00
Botond Dénes
1ce07c6f27 test/cql-pytest: test_select_from_mutation_fragments: bump timeout for test_many_partitions
The test test_many_partitions is very slow, as it tests a slow scan over
a lot of partitions. This was observed to time out on the slower ARM
machines, making the test flaky. To prevent this, create an
extra-patient cql connection with a 10 minutes timeout for the scan
itself.
This is a follow-up to fb9379edf1, which
attempted to fix this, but didn't patch all the places doing slow scans.
This patch fixes the other scan, the one actually observed to time-out
in CI.

Fixes: #16145

Closes scylladb/scylladb#16370
2023-12-21 19:55:06 +02:00
Pavel Emelyanov
a03755d6d7 test: Add a test that switching between vnodes and tablets is banned
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-21 19:57:55 +03:00
Pavel Emelyanov
4de433ac23 cql3/statements: Don't allow switching between vnode and per-table replication strategies
When ALTER-ing a keyspace one may as well change its vnode/tablet
flavor, which is not currently supported, so prohibit this change
explicitly

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-21 19:57:00 +03:00
Pavel Emelyanov
299219833b cql3/statements: Keep local keyspace variable in alter_keyspace_statement::validate
For the convenience of the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-21 19:56:18 +03:00
Nadav Har'El
79011eeb24 Merge 'virtual_tables, schema_registry: fix use after free related to schema registry' from Avi Kivity
Both virtual tables and schema registry contain thread_local caches that are destroyed
at thread exit. After a Seastar change[1], these destructions can happen after the reactor
is destroyed, triggering a use-after-free.

Fix by scoping the destruction so it takes place earlier.

[1] 101b245ed7

Closes scylladb/scylladb#16510

* github.com:scylladb/scylladb:
  schema_registry, database: flush entries when no longer in use
  virtual_tables: scope virtual tables registry in system_keyspace
2023-12-21 17:10:25 +02:00
Avi Kivity
c00b376a3e schema_registry, database: flush entries when no longer in use
The schema registry disarms internal timers when it is destroyed.
This accesses the Seastar reactor. However, after [1] we don't have ordering
between the reactor destruction and the thread_local registry destruction.

Fix this by flushing all entries when the database is destroyed. The
database object is fundamental so it's unlikely we'll have anything
using the registry after it's gone.

[1] 101b245ed7
2023-12-21 17:00:41 +02:00
Michał Chojnowski
d7b524cf10 main: add a call to LLVM profile dump before exit
Scylla skips exit hooks so we have to manually trigger the data dump to disk
from the LLVM profiling instrumentation runtime which we need in order
to support code coverage.
We use a weak symbol to get the address of the profile dump function. This
is legal: the function is a public interface of the instrumentation runtime.

Closes scylladb/scylladb#16430
2023-12-21 16:48:42 +02:00
Avi Kivity
2853f79f96 virtual_tables: scope virtual tables registry in system_keyspace
Virtual tables are kept in a thread_local registry for deduplication
purposes. The problem is that thread_local variables are destroyed late,
possibly after the schema registry and the reactor are destroyed.
Currently this isn't a problem, but after a seastar change to
destroy the reactor after termination [1], things break.

Fix by moving the registry to system_keyspace. system_keyspace was chosen
since it was the birthplace of virtual tables.

Pimpl is used to avoid increasing dependencies.

[1] 101b245ed7
2023-12-21 16:19:42 +02:00
Nadav Har'El
a41140f569 Merge 'scylla-sstable: handle attempt to load schema for non-existent tables more gracefully' from Botond Dénes
In other words, print more user-friendly messages, and avoid crashing.
Specifically:
* Don't crash when attempting to load schema tables from configured data-dir, while configuration does not have any configured data-directories.
* Detect the case where schema mutations have no rows for the current table -- the keyspace exists, but the table doesn't.
* Add negative tests for schema-loading.

Fixes: https://github.com/scylladb/scylladb/issues/16459

Closes scylladb/scylladb#16494

* github.com:scylladb/scylladb:
  test/cql-pytest: test_tools.py: add test for failed schema loadig
  tools/scylla-sstable: use at() instead of operator [] when obtaining data dirs
  tools/schema_loader: also check for empty table/column mutations
  tools/schema_loader: log more details when loading schema from schema tables
2023-12-21 15:40:51 +02:00
Kefu Chai
6018e0fea7 database: log when done with truncating
truncating is an unusual operation, and we write a logging message
at INFO level when the truncate op starts. it would be great if
we could have a matching logging message indicating the end of truncate
on the server side. this would help with investigating the TRUNCATE
timeout spotted on the client. at least we can rule out the problem
happening while the server is performing truncate.

Refs #15610
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16247
2023-12-21 13:59:09 +02:00
Raphael S. Carvalho
5e55954f27 replica: Make the storage snapshot survive concurrent compactions
Consider this:
1) file streaming takes storage snapshot = list of sstables
2) concurrent compaction unlinks some of those sstables from file system
3) file streaming tries to send unlinked sstables, but files other
than data and index cannot be read as only data and index have file
descriptors opened

To fix it, the snapshot now returns a set of files, one per sstable
component, for each sstable.
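
Why holding an open file per component helps can be shown with a small POSIX sketch. This is Python for brevity and is not Scylla's code; the component file name is made up for illustration. A file that is unlinked while a descriptor is still open stays readable through that descriptor:

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Open a file, unlink it, then read it through the still-open descriptor."""
    directory = tempfile.mkdtemp()
    # Hypothetical component name, chosen only for illustration.
    path = os.path.join(directory, "example-Statistics.db")
    with open(path, "wb") as f:
        f.write(payload)

    fd = os.open(path, os.O_RDONLY)  # the "snapshot" holds this descriptor
    try:
        os.unlink(path)  # stands in for a concurrent compaction removing the file
        assert not os.path.exists(path)
        return os.read(fd, len(payload))  # still readable on POSIX systems
    finally:
        os.close(fd)
        os.rmdir(directory)


print(read_after_unlink(b"sstable component"))
```

The same reasoning applies to every sstable component, which is why a snapshot that keeps one open file per component is not affected by a later unlink.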

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#16476
2023-12-21 12:50:28 +02:00
Botond Dénes
e6147c1853 Merge 'Some cleanup in compaction group' from Raphael "Raph" Carvalho
Closes scylladb/scylladb#16448

* github.com:scylladb/scylladb:
  replica: Fix indentation
  replica: Kill unused calculate_disk_space_used_for()
2023-12-21 12:48:38 +02:00
Nadav Har'El
a613a3cad2 secondary-index, tablets: ensure that LSI are synchronous
CQL Local Secondary Index is a Scylla-only extension to Cassandra's
secondary index API where the index is separate per partition.
Scylla's documentation guarantees that:

  "As of Scylla Open Source 4.0, updates for local secondary indexes are
   performed synchronously. When updates are synchronous, the client
   acknowledges the write operation only after both the base table
   modification and the view update are written."

This happened automatically with vnodes, because the base table and the
view have the same partition key, so base and view replicas are co-located,
and the view update is always local and therefore done synchronously.

But with tablets, this does NOT happen automatically - the base and view
tablets may be located on different nodes, and the view update may be
remote, and NOT synchronous.

So in this patch we explicitly mark the view as synchronous_update when
building the view for an LSI.

The bigger part of this patch is to add a test which reliably fails
before this patch, and passes after it. The test creates a two-node
cluster and a table with LSI, and pins the base's tablets to one node
and the view's to the second node, forcing the view updates to be
remote. It also uses an injection point to make the view update slower.
The test then writes to the base and immediately tries to use the index
to read. Before this patch, the read doesn't find the new data (contrary
to the guarantee in the documentation). After this patch, the read
does find the new data - because the write waited for the index to
be updated.

Fixes #16371

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Nadav Har'El
7c5092cb8f test: add missing "tags" schema extension to cql_test_env
One of the unfortunate anti-features of cql_test_env (the framework used
in our CQL tests that are written in C++) is that it needs to repeat
various bizarre initialization steps done in main.cc, otherwise various
requests work incorrectly. One of these steps that main.cc does is to initialize
various "schema extensions" which some of the Scylla features need to work
correctly.

We remembered to initialize some schema extensions in cql_test_env, but
forgot others. The one I will need in the following patch is the "tags"
extension, which we need to mark materialized views used by local
secondary indexes as "synchronous_updates" - without this patch the LSI
tests in secondary_index_test.cc will crash.

In addition to adding the missing extension, this patch also replaces
the segmentation-fault crash when it's missing (caused by a dynamic
cast failure) by a clearer on_internal_error() - so if we ever have
this bug again, it will be easier to debug.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Nadav Har'El
b815aa021c mv, test: fix delay_before_remote_view_update injection point
The "delay_before_remote_view_update" is a recently-added injection
point which should add a delay before remote view updates, but NOT
force the writer to wait for it (whether the writer waits for it or
not depends on whether the view is configured as synchronous or not).

Unfortunately, the delay was added at the WRONG place, which caused
it to sometimes be done even on asynchronous views, breaking (with
false-negative) the tests that need this delay to reproduce bugs of
missing synchronous updates (Refs #16371).

The fix here is even simpler than the (wrong) old code - we just add
the sleep to the existing function apply_to_remote_endpoints() instead
of making the caller even more complex.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Nadav Har'El
8181e28731 secondary index: fix view creation when using tablets
In commit 88a5ddabce, we fixed materialized
view creation to support tablets. We added to the function called to
create materialized views in CQL, prepare_new_view_announcement()
a missing call to the on_before_create_column_family() notifier that
creates tablets for this new view.

Unfortunately, we have the same problem when creating a secondary index,
because it does not use prepare_new_view_announcement(), and instead uses
a generic function to "update" the base table, which in some cases ends
up creating new views when a new index is requested. In this path, the
notifier did not get called, so we must add it here too.
Unfortunately, the notifiers must run in a Seastar thread, which means
that yet another function now needs to run in a Seastar thread.

Before this patch, creating a secondary index in a table using tablets
fails with "Tablet map not found for table <uuid>". With this patch,
it works.

The patch also includes tests for creating a regular and local secondary
index. Both tests fail (with the aforementioned error) before this
patch, and pass with it.

Fixes #16396

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-21 11:44:50 +02:00
Raphael S. Carvalho
ee203f846e test: Fix segfault when running offstrategy test
The observer, which references table_for_test, must of course not
outlive table_for_test. The observer can be called later, after the
last input sstable is removed from the sstable manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#16428
2023-12-20 19:04:41 +02:00
David Garcia
9af6c7e40b docs: add myst parser
Closes scylladb/scylladb#16316
2023-12-20 19:04:41 +02:00
Raphael S. Carvalho
d1e6dfadea sstables: Harden estimate_droppable_tombstone_ratio() interface
The interface is fragile because the user may incorrectly use the
wrong "gc before". Given that sstable knows how to properly calculate
"gc before", let's do it in estimate__d__t__r(), leaving no room
for mistakes.

sstable_run's variant was also changed to conform to new interface,
allowing ICS to properly estimate droppable ratio, using GC before
that is calculated using each sstable's range. That's important for
upcoming tablets, as we want to query only the range that belongs
to a particular tablet in the repair history table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#15931
2023-12-20 19:04:41 +02:00
Botond Dénes
758d9cf005 Merge 'build: cmake: map 'release' to 'RelWithDebInfo'' from Kefu Chai
this preserves the existing behavior of `configure.py` in the CMake
generated `build.ninja`.

* configure.py: map 'release' to 'RelWithDebInfo'
* cmake: rename cmake/mode.Release.cmake to cmake/mode.RelWithDebInfo.cmake
* CMakeLists.txt: s/Release/RelWithDebInfo/

Closes scylladb/scylladb#16479

* github.com:scylladb/scylladb:
  build: cmake: map 'release' to 'RelWithDebInfo'
  build: define BuildType for enclosing build_by_default
2023-12-20 19:04:40 +02:00
Pavel Emelyanov
5866d265c3 Merge ' tools/utils: tool_app_template: handle the case of no args ' from Botond Dénes
Currently, `tool_app_template::run_async()` crashes when invoked with empty argv (with just `argv[0]` populated). This can happen if the tool app is invoked without any further args, e.g. just invoking `scylla nodetool`. The crash happens because of the unconditional dereferencing of `argv[1]` to get the current operation.

To fix, add an early-exit for this case, just printing a usage message and exiting with exit code 2.

Fixes: #16451

Closes scylladb/scylladb#16456

* github.com:scylladb/scylladb:
  test: add regression tests for invoking tools with no args
  tools/utils: tool_app_template: handle the case of no args
  tools/utils: tool_app_template: remove "scylla-" prefix from app name
2023-12-20 19:04:40 +02:00
Kamil Braun
6fcaec75db Merge 'Add maintenance socket' from Mikołaj Grzebieluch
It enables interaction with the node through CQL protocol without authentication. It gives full-permission access.
The maintenance socket is available by Unix domain socket with file permissions `755`, thus it is not accessible from outside of the node and from other POSIX groups on the node.
It is created before the node joins the cluster.

To set up the maintenance socket, use the `maintenance-socket` option when starting the node.

* If set to `ignore` maintenance socket will not be created.
* If set to `workdir` maintenance socket will be created in `<node's workdir>/cql.m`.
* Otherwise maintenance socket will be created in the specified path.

The default value is `ignore`.

* With python driver

```python
from cassandra.cluster import Cluster
from cassandra.connection import UnixSocketEndPoint
from cassandra.policies import HostFilterPolicy, RoundRobinPolicy

socket = "<node's workdir>/cql.m"
cluster = Cluster([UnixSocketEndPoint(socket)],
                  # Driver tries to connect to other nodes in the cluster, so we need to filter them out.
                  load_balancing_policy=HostFilterPolicy(RoundRobinPolicy(), lambda h: h.address == socket))
session = cluster.connect()
```

Merge note: apparently cqlsh does not support unix domain sockets; it
will have to be fixed in a follow-up.

Closes scylladb/scylladb#16172

* github.com:scylladb/scylladb:
  test.py: add maintenance socket test
  test.py: enable maintenance socket in tests by default
  docs: add maintenance socket documentation
  main: add maintenance socket
  main: refactor initialization of cql controller and auth service
  auth/service: don't create system_auth keyspace when used by maintenance socket
  cql_controller: maintenance socket: fix indentation
  cql_controller: add option to start maintenance socket
  db/config: add maintenance_socket_enabled bool class
  auth: add maintenance_socket_role_manager
  db/config: add maintenance_socket variable
2023-12-20 19:04:40 +02:00
Botond Dénes
5ef0d16eb3 test/cql-pytest: test_tools.py: add test for failed schema loadig 2023-12-20 10:31:03 -05:00
Botond Dénes
3e0058a594 tools/scylla-sstable: use at() instead of operator [] when obtaining data dirs
The configuration is not guaranteed to have any, so use the safe
variant, to simply abort the schema load attempt, instead of crashing
the tool.
2023-12-20 10:31:03 -05:00
Botond Dénes
208d2e890e tools/schema_loader: also check for empty table/column mutations
system_schema.tables and system_schema.columns must have content for
every existing table. To detect a failed load of a table, before
attempting to invoke `db::schema_tables::create_table_from_mutations()`,
we check for the mutations read from these two tables, to not be
disengaged. There is another failure scenario however. The mutations are
not null, but do not have any clustering rows. This currently results in
a cryptic error message, about failing to lookup a row in a result-set.
This happens when the looked-up keyspace exists, but the table doesn't.
Add this to the check, so we get a human-readable error message when
this happens.
2023-12-20 10:31:00 -05:00
Botond Dénes
81e5033902 tools/schema_loader: log more details when loading schema from schema tables
Currently, there is no visibility at all into what happens when
attempting to load schema from schema tables. If it fails, we are left
guessing on what went wrong.
Add a logger and add various debug/trace logs to help following the
process and identify what went wrong.
2023-12-20 10:30:21 -05:00
Nadav Har'El
7ee55dd03e cdc, tablets: don't allow enabling CDC with tablets
We do not yet support enabling CDC in a keyspace that uses tablets
(Refs #16317). But the problem is that today, if this is attempted,
we get a nasty failure: the CDC code creates the extra CDC log table,
it doesn't get tablets, and Raft gets surprised and croaks with a
message like:

    Raft instance is stopped, reason: "background error,
    std::_Nested_exception<raft::state_machine_error> (State machine error at
    raft/server.cc:1230): std::runtime_error (Tablet map not found for
    table 48ca1620-9ea5-11ee-bd7c-22730ed96b85)

After Raft croaks, Scylla never recovers until it is rebooted.

In this patch, we replace this disaster by a graceful error - a CREATE
TABLE or ALTER TABLE operation with CDC enabled will fail in a clear way,
allowing Scylla to continue operating normally after this failed request.

This fix is important for allowing us to run tests on Scylla with
tablets, and although CDC tests will fail as expected, they won't
fail the other tests that follow (Refs #16473).

Fixes #16318

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16474
2023-12-20 10:06:34 +01:00
Kamil Braun
ffb6ae917f Merge 'Add support for tablets in Alternator' from Nadav Har'El
The pull request adds support for tablets in Alternator, and particularly focuses on getting Alternator's GSI and LSI (i.e., materialized views) to work.

After this series, support for tablets in Alternator _mostly_ works, but not completely:
1. CDC doesn't yet work with tablets, and Alternator needs to provide CDC (known as "DynamoDB Streams").
2. Alternator's TTL feature was not tested with tablets, and probably doesn't work because it assumes the replication map belongs to a keyspace.

Because of these reasons, Alternator does not yet use tablets by default and it needs to be enabled explicitly by adding an experimental tag to the new table. This will allow us to test Alternator with tablets even before it is ready for the limelight.

Fixes #16203
Fixes #16313

Closes scylladb/scylladb#16353

* github.com:scylladb/scylladb:
  mv, tablets, alternator: test for Alternator LSI with tablets
  mv: coroutinize wait code for remote view updates
  mv, test: add injection point to delay remove view update
  alternator: explicitly request synchronous updates for LSI
  alternator: fix view creation when using tablets
  alternator: add experimental method to create a table with tablets
2023-12-20 10:00:31 +01:00
Kamil Braun
1f6460972b Merge 'Fix crash on table drop concurrent with streaming ' from Tomasz Grabiec
The observed crash was in the following piece on "cf" access:

        if (*table_is_dropped) {
            sslog.info("[Stream #{}] Skipped streaming the dropped table {}.{}", si->plan_id, si->cf.schema()->ks_name(), si->cf.schema()->cf_name());

Fixes #16181

Also, add a test case which reproduces the problem by doing table drop during tablet migration. But note that the problem is not tablet-specific.

Closes scylladb/scylladb#16341

* github.com:scylladb/scylladb:
  test: tablets: Add test case which tests table drop concurrent with migration
  tests: tablets: Do read barrier in get_tablet_replicas()
  streaming: Keep table by shared ptr to avoid crash on table drop
2023-12-20 09:57:06 +01:00
Kefu Chai
db9e314965 treewide: apply codespell to the comments in source code
for fewer spelling errors in comments.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16408
2023-12-20 10:25:03 +02:00
Kefu Chai
fafe9d9c38 build: cmake: map 'release' to 'RelWithDebInfo'
this preserves the existing behavior of `configure.py` in the CMake
generated `build.ninja`.

* configure.py: map 'release' to 'RelWithDebInfo'
* cmake: rename cmake/mode.Release.cmake to cmake/mode.RelWithDebInfo.cmake
* CMakeLists.txt: s/Release/RelWithDebInfo/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-20 15:07:43 +08:00
Kefu Chai
72dcb2466d build: define BuildType for enclosing build_by_default
in existing `modes` defined in `configure.py`, "release" is mapped to
"RelWithDebInfo". this behavior matches that of seastar's
`configure.py`, where we also map "release" build mode to
"RelWithDebInfo" CMAKE_BUILD_TYPE.

but in scylladb's existing cmake settings, it maps "release" to
"Release", even though "Release" is listed as one of the typical
CMAKE_BUILD_TYPE values.

so, in this change, to prepare for the mapping, `BuildType` is
introduced to map a build mode to its related settings. the
building settings are still kept in `cmake.${CMAKE_BUILD_TYPE}.cmake`,
but the other settings, like if a build type should be enabled or
its mappings, are stored in `BuildType` in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-20 15:07:43 +08:00
Nadav Har'El
2e031f2d8e mv, tablets, alternator: test for Alternator LSI with tablets
This patch adds a test (in the topology test framework) for issue #16313 -
the bug where Alternator LSI must use synchronous view updates but didn't.
This test fails with high probability (around 50%) before the previous patch,
which fixed this bug - and passes consistently after the patch (I ran it
100 times and it didn't fail even once).

This is the first test in the topology framework that uses the DynamoDB
API and not CQL. This required a couple of tiny convenience functions,
which are introduced in the only test file that uses them - but if we
want we can later move them out to a library file.

Unfortunately, the standard AWS SDK for Python - boto3 - is *not*
asynchronous, so this test is also not really asynchronous, and will
block the event loop while making requests to Alternator. However,
for now it doesn't matter (we do NOT run multiple tests in the same
event loop), and if it ever matters, I mentioned in a comment a couple
of options for what we can do.

Because this test uses a 10-node cluster, it is skipped in debug-mode
runs. In a later patch we will replace it by a more efficient - and
more reliable - 2-node test.

Refs #16313

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-19 15:41:15 +02:00
Avi Kivity
15acceb69f Merge 'commitlog_test::test_commitlog_reader: handle segment_truncation' from Calle Wilund
Fixes #16312

This test replays a segment before it might be closed or even fully flushed, thus it can (with the new semantics) generate a segment_truncation exception if hitting eof earlier than expected. (Note: test does not use pre-allocated segments).

(The first patch coroutinizes the test to make for a nicer, easier fix.)

Closes scylladb/scylladb#16368

* github.com:scylladb/scylladb:
  commitlog_test::test_commitlog_reader: handle segment_truncation
  commitlog_test: coroutinize test_commitlog_reader
2023-12-19 15:33:38 +02:00
Botond Dénes
6abdced7b9 test: add regression tests for invoking tools with no args
This was recently found to produce a crash. Add a simple regression
test, to make sure future changes don't re-introduce problems with this
rarely used code-path.
2023-12-19 04:08:48 -05:00
Botond Dénes
76492407ab tools/utils: tool_app_template: handle the case of no args
Currently, tool_app_template::run_async() crashes when invoked with
empty argv (with just argv[0] populated). This can happen if the tool
app is invoked without any further args, e.g. just invoking `scylla
nodetool`. The crash happens because of the unconditional dereferencing of
argv[1] to get the current operation.
To fix, add an early-exit for this case, just printing a usage message
and exiting with exit code 2.
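
The early-exit pattern described above can be sketched as follows. This is a Python illustration only, not the C++ code of `tool_app_template`; the function name `run_tool` and the usage text are hypothetical:

```python
import sys
from typing import Sequence


def run_tool(argv: Sequence[str]) -> int:
    """Return the exit code for a tool invoked with the given argv."""
    app_name = argv[0] if argv else "tool"
    if len(argv) < 2:
        # No operation was given: print usage instead of indexing argv[1].
        print(f"Usage: {app_name} OPERATION [OPTIONS]", file=sys.stderr)
        return 2
    operation = argv[1]  # safe: the length was checked above
    print(f"running operation {operation!r}")
    return 0
```

Calling `run_tool(["nodetool"])` takes the usage branch and returns 2, while `run_tool(["nodetool", "status"])` reaches the operation lookup.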
2023-12-19 04:08:33 -05:00
Botond Dénes
975c11a54b tools/utils: tool_app_template: remove "scylla-" prefix from app name
In other words, have all tools pass their name without the "scylla-"
prefix to `tool_app_template::config::name`. E.g., replace
"scylla-nodetool" with just "nodetool".
Patch all usages to re-add the prefix if needed.

The app name is just more flexible this way, some users might want the
name without the "scylla-" prefix (in the next patch).
2023-12-19 04:04:57 -05:00
Botond Dénes
ce317d50bc bytes.hh: correct spelling of delimiter and delimited
Pointed out by the new spellcheck workflow.

Closes scylladb/scylladb#16450
2023-12-18 20:46:21 +02:00
Mikołaj Grzebieluch
ef10b497e1 test.py: add maintenance socket test
Test that when connecting to the maintenance socket, the user has superuser permissions,
even if the authentication is enabled on the regular port.
2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
e327478bb5 test.py: enable maintenance socket in tests by default 2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
21b3ba4927 docs: add maintenance socket documentation 2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
f96d30c2b5 main: add maintenance socket
Add initialization of maintenance_auth_service and cql_maintenance_server_ctl.

Create maintenance socket which enables interaction with the node through
CQL protocol without authentication. The maintenance port is available
by Unix domain socket. It gives full-permission access.
It is created before the node joins the cluster.
2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
16ab2c28e4 main: refactor initialization of cql controller and auth service
Move initialization of cql controller and auth service to functions.
It will make it easier to create a new cql controller with a separate auth service,
for example for the maintenance socket.

Make it possible to initialize new services before joining group0.
2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
999be1d14b auth/service: don't create system_auth keyspace when used by maintenance socket
The maintenance socket is created before joining the cluster. When maintenance auth service
is started it creates system_auth keyspace if it's missing. It is not synchronized
with other nodes, because this node hasn't joined the group0 yet. Thus a node has
a mismatched schema and is unable to join the cluster.

The maintenance socket doesn't use role management, thus the problem is solved
by not creating system_auth keyspace when maintenance auth service is created.

The logic of the regular CQL port's auth service won't be changed. A new separate
auth service will be created for the maintenance socket.
2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
2b9a88d17a cql_controller: maintenance socket: fix indentation 2023-12-18 17:58:13 +01:00
Mikołaj Grzebieluch
ac61d0f695 cql_controller: add option to start maintenance socket
Add an option to listen on the maintenance socket. It is set up on a Unix domain socket
and the metrics are disabled.
This enables having an independent authentication mechanism for this socket.

To start the maintenance socket, a new cql_controller has to be created
with
`db::maintenance_socket_enabled::yes` argument.

Creating maintenance socket will raise an exception if
* the path is longer than 107 chars (due to linux limits),
* a file or a directory already exists in the path.

The indentation is fixed in the next commit.
2023-12-18 17:58:13 +01:00
Tomasz Grabiec
84ea8b32b2 test: tablets: Restart cluster in a graceful manner to avoid connection drop in the middle of request serving
After restarting each node, we should wait for other nodes to notice
the node is UP before restarting the next server. Otherwise, the next
node we restart may not send the shutdown notification to the
previously restarted node, if it still sees it as down when we
initiate its shutdown. In this case, the node will learn about the
restart from gossip later, possible when we already started CQL
requests. When a node learns that some node restarted while it
considers it as UP, it will close connections to that node. This will
fail RPC sent to that node, which will cause CQL request to time-out.
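
The restart order described above can be sketched as a small helper. This is a hedged illustration, not the test framework's API: `restart` and `sees_up` are hypothetical callables supplied by the caller, and the timeout values are arbitrary:

```python
import time
from typing import Callable, Sequence


def rolling_restart(
    nodes: Sequence[str],
    restart: Callable[[str], None],
    sees_up: Callable[[str, str], bool],
    timeout: float = 60.0,
    poll_interval: float = 0.1,
) -> None:
    """Restart nodes one by one, waiting until every peer sees each node as UP."""
    for node in nodes:
        restart(node)
        others = [n for n in nodes if n != node]
        deadline = time.monotonic() + timeout
        # Do not move on to the next node until all peers noticed this one.
        while not all(sees_up(observer, node) for observer in others):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{node} not seen as UP by all peers")
            time.sleep(poll_interval)
```

Waiting inside the loop is the point: the next node is only shut down once its peers already consider the previous one UP.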

Fixes #14746

Closes scylladb/scylladb#16010
2023-12-18 16:22:02 +01:00
Raphael S. Carvalho
63e4d6c965 test: Enable debug compaction logging for sstable_compaction_test
It will make it easier to understand obscure issues like
https://github.com/scylladb/scylladb/issues/13280.

Refs #13280.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#16426
2023-12-18 16:57:46 +03:00
Kefu Chai
db16048761 test/pylib: avoid using asyncio.get_event_loop()
asyncio.get_event_loop() returns the current event loop. but if there
is none, the result of `get_event_loop_policy().get_event_loop()` is
returned. but this behavior is deprecated since Python 3.12, so let's
use asyncio.run() as recommended by
https://docs.python.org/3/library/asyncio-eventloop.html.
asyncio.run() was introduced by Python 3.7, so we should be able to
use it.

this change should silence the warning when running this script
as a stand-alone script with Python 3.12.
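
A minimal sketch of the recommended pattern, with a placeholder coroutine named `main` (the real script's entry point is not shown in this message):

```python
import asyncio


async def main() -> int:
    # Placeholder for the script's real asynchronous work.
    await asyncio.sleep(0)
    return 42


if __name__ == "__main__":
    # asyncio.run() creates a new event loop, runs main() to completion
    # and closes the loop, so no call to get_event_loop() is needed.
    result = asyncio.run(main())
    print(result)
```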

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16385
2023-12-18 16:47:31 +03:00
Raphael S. Carvalho
5fa69b8a67 replica: Fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-18 10:23:22 -03:00
Raphael S. Carvalho
8a9784d29c replica: Kill unused calculate_disk_space_used_for()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-18 10:22:19 -03:00
Avi Kivity
cd88f9eb76 Update tools/java submodule (native nodetool)
* tools/java 3963c3abf7...b7ebfd38ef (1):
  > Merge 'Add nodetool interposer script' from Botond Dénes
2023-12-18 14:50:25 +02:00
Mikołaj Grzebieluch
cf43787295 db/config: add maintenance_socket_enabled bool class 2023-12-18 11:42:40 +01:00
Mikołaj Grzebieluch
11a2748d7f auth: add maintenance_socket_role_manager
Add `maintenance_socket_role_manager` which will disable all operations
associated with roles to not depend on system_auth keyspace, which may
not yet be created when the maintenance socket starts listening.
2023-12-18 11:42:40 +01:00
Mikołaj Grzebieluch
e682e362a3 db/config: add maintenance_socket variable
If set to "ignore", maintenance socket will be disabled.
If set to "workdir", maintenance socket will be opened on <scylla's
workdir>/cql.m.
Otherwise it will be opened on path provided by maintenance_socket
variable.

It is set by default to 'ignore'.
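
The three cases above amount to a small resolution rule. A Python sketch of it follows; the function name and signature are hypothetical and only mirror the description, not the C++ option handling:

```python
import os
from typing import Optional


def resolve_maintenance_socket(value: str, workdir: str) -> Optional[str]:
    """Map the maintenance_socket setting to a socket path, or None if disabled."""
    if value == "ignore":
        return None  # maintenance socket is disabled
    if value == "workdir":
        return os.path.join(workdir, "cql.m")
    return value  # any other value is taken as an explicit path
```

For example, `resolve_maintenance_socket("workdir", "/var/lib/scylla")` yields a path ending in `cql.m` inside that directory, where `/var/lib/scylla` is just an example workdir.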
2023-12-18 11:42:05 +01:00
Kamil Braun
3b108f2e31 Merge 'db: config: make consistent_cluster_management mandatory' from Patryk Jędrzejczak
We make `consistent_cluster_management` mandatory in 5.5. This
option will be always unused and assumed to be true.

Additionally, we make `override_decommission` deprecated, as this option
has been supported only with `consistent_cluster_management=false`.

Making `consistent_cluster_management` mandatory also simplifies
the code. Branches that execute only with
`consistent_cluster_management` disabled are removed.

We also update documentation by removing information irrelevant in 5.5.

Fixes scylladb/scylladb#15854

Note about upgrades: this PR does not introduce any more limitations
to the upgrade procedure than there are already. As in
scylladb/scylladb#16254, we can upgrade from the first version of Scylla
that supports the schema commitlog feature, i.e. from 5.1 (or
corresponding Enterprise release) or later. Assuming this PR ends up in
5.5, the documented upgrade support is from 5.4. For corresponding
Enterprise release, it's from 2023.x (based on 5.2), so all requirements
are met.

Closes scylladb/scylladb#16334

* github.com:scylladb/scylladb:
  docs: update after making consistent_cluster_management mandatory
  system_keyspace, main, cql_test_env: fix indendations
  db: config: make consistent_cluster_management mandatory
  test: boost: schema_change_test: replace disable_raft_schema_config
  db: config: make override_decommission deprecated
  db: config: make force_schema_commit_log deprecated
2023-12-18 09:44:52 +01:00
Botond Dénes
a6200e99e6 Merge 'Handle S3 partial read overflows' from Pavel Emelyanov
The test case that validates upload-sink works does this by getting several random ranges from the uploaded object and checks that the content is what it should be. The range boundaries are generated like this:

```
    uint64_t len = random(1, chunk_size);
    uint64_t offset = random(file_size) - len;
```

The 2nd line is not correct: if the random number happens to be less than `len`, the offset becomes "negative", i.e. a very large 64-bit unsigned value.

Next, this offset:len pair gets into the s3 client's get_object_contiguous() helper, which in turn converts it into the http range header's bytes-specifier format, which is "first_byte-last_byte". The math here is

```
    first_byte = offset;
    last_byte = offset + len - 1;
```

Here the overflow of the offset thing results in underflow of the last_byte -- it becomes less than the first_byte. According to RFC this range-specifier is invalid and (!) can be ignored by the server. This is what minio does -- it ignores invalid range and returns back full object.
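
The wrap-around can be reproduced in a few lines of Python by masking to 64 bits. This is a sketch under assumptions: `U64_MASK`, `range_header` and `safe_range` are names made up here, and `safe_range` is only one possible way to keep the range inside the object, not necessarily the fix applied in the test:

```python
import random

U64_MASK = (1 << 64) - 1  # emulates uint64_t arithmetic


def range_header(offset: int, length: int) -> str:
    """Build a bytes range specifier the way the message describes."""
    first_byte = offset
    last_byte = (offset + length - 1) & U64_MASK
    return f"bytes={first_byte}-{last_byte}"


def safe_range(file_size: int, chunk_size: int, rng: random.Random):
    """Pick the length first, then an offset that keeps the range in bounds."""
    length = rng.randint(1, min(chunk_size, file_size))
    offset = rng.randint(0, file_size - length)
    return offset, length


# A random position of 3 with len 10 gives offset 3 - 10, which wraps around.
wrapped_offset = (3 - 10) & U64_MASK
print(range_header(wrapped_offset, 10))  # last_byte ends up below first_byte
```

With the wrapped offset, `first_byte` is close to 2^64 while `last_byte` is 2, which is the invalid specifier the server is allowed to ignore.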

But that's not all. When returning an object portion the http request status code is PartialContent, but when the range is ignored and the full object is returned, the status is OK. This makes the s3 client's request fail with unexpected_status_error in the middle of the test. Then the object is removed with a deferred action and the actual error is printed into logs. At the end of the day the logs look as if deletion of an object failed with OK status %)
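
The wrap-around described above can be reproduced in isolation. The following is a minimal sketch, not the actual test or client code: `to_byte_range`, `buggy_offset` and `safe_offset` are hypothetical helper names, and the modulo-based fix is only one possible way to keep the offset in bounds (it assumes len <= file_size).

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: converts offset:len into the inclusive
// first_byte/last_byte pair of an HTTP Range bytes-specifier.
inline std::pair<std::uint64_t, std::uint64_t> to_byte_range(std::uint64_t offset, std::uint64_t len) {
    return {offset, offset + len - 1};
}

// Mirrors `random(file_size) - len`: wraps modulo 2^64 when rnd < len.
inline std::uint64_t buggy_offset(std::uint64_t rnd, std::uint64_t len) {
    return rnd - len;
}

// One possible fix (an assumption, not necessarily what the patch does):
// fold the random value into [0, file_size - len], so that
// offset + len never exceeds file_size. Requires len <= file_size.
inline std::uint64_t safe_offset(std::uint64_t rnd, std::uint64_t file_size, std::uint64_t len) {
    return rnd % (file_size - len + 1);
}
```

With rnd = 3 and len = 10 the buggy offset is 2^64 - 7, and the resulting last_byte wraps around to 2, i.e. below first_byte, which is exactly the invalid range-specifier that the server is allowed to ignore.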

fixes: #16133

Closes scylladb/scylladb#16324

* github.com:scylladb/scylladb:
  test/s3: Avoid object range overflow
  s3/client: Handle GET-with-Range overflows correctly
2023-12-18 10:00:32 +02:00
Avi Kivity
081f30d149 Merge 'Add support to tablet storage splitting' from Raphael "Raph" Carvalho
Support for splitting tablet storage is added.
Until now, tablet storage was composed of a single compaction group, i.e. a group of sstables eligible to be compacted together.

For splitting, tablet storage can now be composed of multiple compaction groups, main, left and right.

Main group stores sstables that require splitting, whereas left and right groups store sstables that were already split according to the tablet's token range.

After table storage is put in splitting mode, new writes will only go to either left or right group, depending on the token.

When all main groups completed splitting their sstables, then coordinator can proceed with tablet metadata changes.
The coordination part is not implemented yet. Only the storage part. The former will come next and will be wired into the latter.

Missing:
- splitting monitor (verify whether coordinator asked for splitting and acts accordingly) (will come next)

Closes scylladb/scylladb#16158

* github.com:scylladb/scylladb:
  replica: Introduce storage group splitting
  replica: Add storage_group::memtable_count()
  replica: Add compaction_group::empty()
  replica: Rename compaction_group_manager to storage_group_manager
  replica: Introduce concept of storage group
  compaction: Add splitting compaction task to manager
  compaction: Prepare rewrite_sstables_compaction_task_executor to be reused for splitting
  compaction: remove scrub-specific code from rewrite_sstables_compaction_task_executor
  replica: Allow uncompacted SSTables to be moved into a new set
  compaction: Add splitting compaction
  flat_mutation_reader: Allow interposer consumers to be stacked
  mutation_writer: Introduce token-group-based mutation segregator
  locator: Introduce tablet_map::get_tablet_id_and_range_side(token)
2023-12-17 21:12:01 +02:00
Nadav Har'El
37b5c03865 mv: coroutinize wait code for remote view updates
In the previous patch we added a delay injection point (for testing)
in the view update code. Because the code was using continuation style,
this resulted in increased indentation and ugly repetition of captures.

So in this patch we coroutinize the code that waits for remote view
updates, making it simpler, shorter, and less indented.

Note that this function still uses continuations in one place:
The remote view update is still composed of two steps that need
to happen one after another, but we don't necessarily need to wait
for them to happen. This is easiest to do with chaining continuations,
and then either waiting or not waiting for the resulting future.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 20:15:08 +02:00
Nadav Har'El
bf6848d277 mv, test: add injection point to delay remove view update
It's difficult to write a test (as we plan to do in the next patch)
that verifies that synchronous view updates are indeed synchronous, i.e.,
that write with CL=QUORUM on the base-table write returns only after
CL=QUORUM was also achieved in the view table. The difficulty is that in a
fast test machine, even if the synchronous-view-update is completely buggy,
it's likely that by the time the test reads from the view, all view updates
will have been completed anyway.

So in this patch we introduce an injection point, for testing, named
"delay_before_remote_view_update", which adds a delay before the base
replica sends its update to the remote view replica (in case the view
replica is indeed remote). As usual, this injection point isn't
configurable - when enabled it adds a fixed (0.5 second) delay, on all
view updates on all tables.

The existing code used continuation-style Seastar programming, and the
addition of the injection point in this patch made it even uglier, so
in the next patch we will coroutine-ize this code.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 20:15:08 +02:00
Nadav Har'El
2c0b472f44 alternator: explicitly request synchronous updates for LSI
DynamoDB's *local* secondary index (LSI) allows strongly-consistent
reads from the materialized view, which must be able to read what was
previously written to the base. To support this, we need the view to
use the "synchronous_updates".

Previously, with vnodes, there was no need for using this option
explicitly, because an LSI has the same partition key as the base table
so the base and view replicas are the same, and the local writes are
done synchronously. But with tablets, this changes - there is no longer
a guarantee that the base and view tablets are located on the same node.
So to restore the strong consistency of LSIs when tablets are enabled,
this patch explicitly adds the "synchronous_updates" option to views
created by Alternator LSIs. We do *not* add this option for GSIs - those
do not support strongly-consistent reads.

This fix was tested by a test that will be introduced in the following
patches. The test showed that before this patch, it was possible that
reading with ConsistentRead=True from an LSI right after the base was
written would miss the new changes, but after this patch, it always
sees the new data in the LSI.

Fixes #16313.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 20:14:59 +02:00
Nadav Har'El
d11f5e9625 alternator: fix view creation when using tablets
In commit 88a5ddabce, we fixed materialized
view creation to support tablets. We added to the function called to
create materialized views in CQL, prepare_new_view_announcement()
a missing call to the on_before_create_column_family() notifier that
creates tablets for this new view.

We have the same problem in Alternator when creating a view (GSI or LSI).
The Alternator code does not use prepare_new_view_announcement(), and
instead uses the lower-level function add_table_or_view_to_schema_mutation()
so it didn't get the call to the notifier, so we must add it here too.

Before this patch, creating an Alternator table with tablets (which has
become possible after the previous patch) fails with "Tablet map not found
for table <uuid>". With this patch, it works.

A test for materialized views in Alternator will come in a following
patch, and will test everything together - the CreateTable tag to use
tablets (from the previous patch), the LSI/GSI creation (fixed in this patch)
and the correct consistency of the LSI (fixed in the next patch).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 19:55:36 +02:00
Nadav Har'El
8e356d8c31 alternator: add experimental method to create a table with tablets
As explained in issue #16203, we cannot yet enable tablets on Alternator
keyspaces by default, because support for some of the features that
Alternator needs, such as CDC, is not yet available.
Nevertheless, to start testing Alternator integration with tablets,
we want to provide a way to enable tablets in Alternator for tests.

In this patch we add support for a tag, 'experimental:initial_tablets',
which if added on a table during creation, uses tablets for its keyspace.
The value of this tag is a numeric string, and it is exactly analogous
to the 'initial_tablets' property we have in CQL's NetworkTopologyStrategy.

We name this tag with the "experimental:" prefix to emphasize that it
is experimental, and the way to enable or disable tablets will probably
change later.

The new tag only has effect when added while *creating* a table.
Adding, deleting or changing it later on an existing table will have
no effect.

A later patch will have tests that use this tag to test Alternator with
tablets.

Refs #16203.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-12-17 19:55:30 +02:00
Kefu Chai
e436856cf7 token_metadata: pass node id when formatting it
before this change, we use the format string of
"Can't replace node {} with itself", but fail to include the host id as one of seastar::format()'s arguments. this fails the compile-time check of fmt, which is not merged yet. so, if we really run into this problem, {fmt} would throw before the intended runtime_error is raised -- currently, seastar::log formats the logging messages at runtime, this is not intended.

in this change, we pass `existing_node`, so it can be formatted, and the
intended error message can be printed in log.

Refs 11a4908683
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16342
2023-12-17 19:54:09 +02:00
Evgeniy Naydanov
10eebe3c66 test: use different IP addresses for listen and RPC addresses
Scylla can be configured to use different IPs for the internode communication
and client connections.  This test allocates and configures unique IP addresses
for the client connections (`rpc_address`) for a 2-node cluster.

Two scenarios tested:
  1) Change RPC IPs sequentially
  2) Change RPC IPs simultaneously

Closes scylladb/scylladb#15965
2023-12-17 18:00:09 +02:00
Raphael S. Carvalho
546b31846a replica: Introduce storage group splitting
This introduces the ability to split a storage group.
The main compaction group is split into left and right groups.

set_split() is used to set the storage group to splitting mode, which
will create left and right compaction groups. Incoming writes will
now be placed into memtable of either left or right groups.

split() is used to complete the splitting of a group. It only
returns when all preexisting data is split. That means main
compaction group will be empty and all the data will be stored
in either left or right group.
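
The two-phase flow above can be illustrated with a toy model. This is only a sketch under loud assumptions: the class below is not the real replica code, tokens are plain integers, the split point `mid` is passed in by hand, and split() moves data synchronously, whereas in the patch the preexisting data is split by compaction.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

using token = std::int64_t;

// Toy stand-in for a compaction group: just a bag of tokens.
struct compaction_group {
    std::vector<token> data;
    bool empty() const { return data.empty(); }
};

// Toy stand-in for a storage group (hypothetical, simplified API).
class storage_group {
    compaction_group _main;
    std::optional<compaction_group> _left, _right;
    token _mid;  // tokens <= _mid go left, the rest go right (assumption)
public:
    explicit storage_group(token mid) : _mid(mid) {}

    bool splitting() const { return _left.has_value(); }

    // Enter splitting mode: create left and right groups.
    void set_split() {
        if (!splitting()) {
            _left.emplace();
            _right.emplace();
        }
    }

    // Before splitting mode writes land in main, afterwards in left or right.
    void write(token t) {
        if (splitting()) {
            side(t).data.push_back(t);
        } else {
            _main.data.push_back(t);
        }
    }

    // Complete the split: returns only once main is empty.
    void split() {
        set_split();
        for (token t : _main.data) {
            side(t).data.push_back(t);
        }
        _main.data.clear();
    }

    const compaction_group& main_group() const { return _main; }
    const compaction_group& left() const { return *_left; }    // valid after set_split()
    const compaction_group& right() const { return *_right; }  // valid after set_split()
private:
    compaction_group& side(token t) { return t <= _mid ? *_left : *_right; }
};
```

After set_split(), new writes bypass the main group; after split(), main_group().empty() holds and every token lives in either left() or right().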

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 12:02:01 -03:00
Raphael S. Carvalho
3c5b00ea04 replica: Add storage_group::memtable_count()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
e5a9299696 replica: Add compaction_group::empty()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
213b2f1382 replica: Rename compaction_group_manager to storage_group_manager
That's to reflect the fact that the manager now works with
storage groups instead.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
15de1cdcbc replica: Introduce concept of storage group
Storage group is the storage of tablets. This new concept is helpful
for tablet splitting, where the storage of tablet will be split
in multiple compaction groups, where each can be compacted
independently.

The reason for not going with arena concept is that it added
complexity, and it felt much more elegant to keep compaction
group unchanged which at the end of the day abstracts the concept
of a set of sstables that can be compacted and operated
independently.

When splitting, the storage group for a tablet may therefore own
multiple compaction groups, left, right, and main, where main
keeps the data that needs splitting. When splitting completes,
only left and right compaction groups will be populated.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
dd1a6d6309 compaction: Add splitting compaction task to manager
The task for splitting compaction will run until all sstables
in the main set are split. The only exceptions are shutdown
or user has explicitly asked for abort.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
f87161e556 compaction: Prepare rewrite_sstables_compaction_task_executor to be reused for splitting
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
c96938c49b compaction: remove scrub-specific code from rewrite_sstables_compaction_task_executor
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
55bcfba4de replica: Allow uncompacted SSTables to be moved into a new set
With off-strategy, we allow sstables to be moved into a new sstable
set even if they didn't undergo reshape compaction.
That's done by specifying a sstable is present both in input and
output, with the completion desc.

We want to do the same with other compaction types.
Think for example of split compaction: compaction manager may decide
a sstable doesn't need splitting, yet it wants that sstable to be
moved into a new sstable set.

Theoretically, we could introduce new code to do this movement,
but more code means increased maintenance burden and higher chances
of bugs. It makes sense to reuse the compaction completion path,
as we do today with off-strategy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:09 -03:00
Raphael S. Carvalho
b1c5d5dd4e compaction: Add splitting compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:40:08 -03:00
Raphael S. Carvalho
3dcb800a96 flat_mutation_reader: Allow interposer consumers to be stacked
reader_consumer_v2 being a noncopyable_function imposes a restriction
when stacking one interposer consumer on top of another.

Think for example of a token-based segregator on top of a timestamp
based one.

To achieve that, the interposer consumer creator must be reentrant,
such that the consumer can be created on each "channel", but today
the creator becomes unusable after first usage.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:26:32 -03:00
Raphael S. Carvalho
c8668b90e3 mutation_writer: Introduce token-group-based mutation segregator
Token group is an abstraction that allows us to easily segregate a
mutation stream into buckets. Groups share the same properties as
compaction groups. Groups follow the ring order and they don't
overlap each other. Groups are defined according to a classifier,
which returns an id given a token. It's expected that the classifier
returns ids in monotonically increasing order.

The reasons for this abstraction are:
1) we don't want to make segregator aware of compaction groups
2) splitting happens before tablet metadata is changed, so the
segregator will have to classify based on whether the token
belongs to left (group id 0) or right (group id 1) side of
the range to be split.

The reason for not extending sstable writer instead, is that
today, writer consumer can only tell producer to switch to a
new writer, when consuming the end of a partition, but that
would be too late for us, as we have to decide to move to
a new writer at partition start instead.

It will be wired into compaction when it happens in split mode.
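
The idea can be sketched with a toy segregator. Everything below is an assumption made for illustration: tokens are plain integers, a "partition" is just its token, and `segregate` and `split_at` are hypothetical names, not the mutation_writer API.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using token = std::int64_t;
// A classifier maps a token to a group id; ids are expected to be
// monotonically increasing along the ring order.
using classifier = std::function<std::size_t(token)>;

// Segregates a token-ordered stream into buckets. The decision to switch
// to a new bucket is taken at partition start, i.e. before the partition
// is written anywhere. Buckets are created only for groups that received
// data, so the bucket index is not necessarily the group id.
inline std::vector<std::vector<token>> segregate(const std::vector<token>& sorted_stream,
                                                 const classifier& classify) {
    std::vector<std::vector<token>> buckets;
    std::size_t current_id = 0;
    for (token t : sorted_stream) {
        const std::size_t id = classify(t);
        if (buckets.empty() || id != current_id) {
            buckets.emplace_back();
            current_id = id;
        }
        buckets.back().push_back(t);
    }
    return buckets;
}

// Classifier for a split: left side is group id 0, right side is group id 1.
inline classifier split_at(token mid) {
    return [mid](token t) -> std::size_t { return t <= mid ? 0 : 1; };
}
```

For example, segregating the stream -5, 0, 3, 7, 9 with split_at(3) yields two buckets, the first holding -5, 0, 3 and the second holding 7, 9.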

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:26:32 -03:00
Raphael S. Carvalho
bcbba9a5e3 locator: Introduce tablet_map::get_tablet_id_and_range_side(token)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-12-17 11:26:32 -03:00
Kefu Chai
c36945dea2 tasks: include used headers
when compiling with Clang-18 + libstdc++-13, the tree fails to build:
```
/home/kefu/dev/scylladb/tasks/task_manager.hh:45:36: error: no template named 'list' in namespace 'std'
   45 |     using foreign_task_list = std::list<foreign_task_ptr>;
      |                               ~~~~~^
```
so let's include the used header

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16433
2023-12-17 15:28:02 +02:00
Kefu Chai
81d5c4e661 db/system_keyspace: explicitly instantiate used template
future<std::optional<utils::UUID>>
system_keyspace::get_scylla_local_param_as<utils::UUID>(const sstring&)
is used by db/schema_tables.cc. so let's instantiate this template
explicitly.
otherwise we'd have following link failure:

```
: && /home/kefu/.local/bin/clang++ -ffunction-sections -fdata-sections -O3 -g -gz -Xlinker --build-id=sha1 -fuse-ld=lld -dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2 -Xlinker --gc-sections CMakeFiles/scylla_version.dir/Release/release.cc.o CMakeFiles/scylla.dir/Release/main.cc.o -o Release/scylla  Release/libscylla-main.a  api/Release/libapi.a  alternator/Release/libalternator.a  db/Release/libdb.a  cdc/Release/libcdc.a  compaction/Release/libcompaction.a  cql3/Release/libcql3.a  data_dictionary/Release/libdata_dictionary.a  gms/Release/libgms.a  index/Release/libindex.a  lang/Release/liblang.a  message/Release/libmessage.a  mutation/Release/libmutation.a  mutation_writer/Release/libmutation_writer.a  raft/Release/libraft.a  readers/Release/libreaders.a  redis/Release/libredis.a  repair/Release/librepair.a  replica/Release/libreplica.a  schema/Release/libschema.a  service/Release/libservice.a  sstables/Release/libsstables.a  streaming/Release/libstreaming.a  test/perf/Release/libtest-perf.a  thrift/Release/libthrift.a  tools/Release/libtools.a  transport/Release/libtransport.a  types/Release/libtypes.a  utils/Release/libutils.a  seastar/Release/libseastar.a  /usr/lib64/libboost_program_options.so.1.81.0  test/lib/Release/libtest-lib.a  Release/libscylla-main.a  -Xlinker --push-state -Xlinker --whole-archive  auth/Release/libscylla_auth.a  -Xlinker --pop-state  /usr/lib64/libcrypt.so  cdc/Release/libcdc.a  compaction/Release/libcompaction.a  mutation_writer/Release/libmutation_writer.a  
-Xlinker --push-state -Xlinker --whole-archive  dht/Release/libscylla_dht.a  -Xlinker --pop-state  index/Release/libindex.a  -Xlinker --push-state -Xlinker --whole-archive  locator/Release/libscylla_locator.a  -Xlinker --pop-state  message/Release/libmessage.a  gms/Release/libgms.a  sstables/Release/libsstables.a  readers/Release/libreaders.a  schema/Release/libschema.a  -Xlinker --push-state -Xlinker --whole-archive  tracing/Release/libscylla_tracing.a  -Xlinker --pop-state  service/Release/libservice.a  node_ops/Release/libnode_ops.a  service/Release/libservice.a  node_ops/Release/libnode_ops.a  raft/Release/libraft.a  repair/Release/librepair.a  streaming/Release/libstreaming.a  replica/Release/libreplica.a  /usr/lib64/libabsl_raw_hash_set.so.2308.0.0  /usr/lib64/libabsl_hash.so.2308.0.0  /usr/lib64/libabsl_city.so.2308.0.0  /usr/lib64/libabsl_bad_variant_access.so.2308.0.0  /usr/lib64/libabsl_low_level_hash.so.2308.0.0  /usr/lib64/libabsl_bad_optional_access.so.2308.0.0  /usr/lib64/libabsl_hashtablez_sampler.so.2308.0.0  /usr/lib64/libabsl_exponential_biased.so.2308.0.0  /usr/lib64/libabsl_synchronization.so.2308.0.0  /usr/lib64/libabsl_graphcycles_internal.so.2308.0.0  /usr/lib64/libabsl_kernel_timeout_internal.so.2308.0.0  /usr/lib64/libabsl_stacktrace.so.2308.0.0  /usr/lib64/libabsl_symbolize.so.2308.0.0  /usr/lib64/libabsl_malloc_internal.so.2308.0.0  /usr/lib64/libabsl_debugging_internal.so.2308.0.0  /usr/lib64/libabsl_demangle_internal.so.2308.0.0  /usr/lib64/libabsl_time.so.2308.0.0  /usr/lib64/libabsl_strings.so.2308.0.0  /usr/lib64/libabsl_int128.so.2308.0.0  /usr/lib64/libabsl_strings_internal.so.2308.0.0  /usr/lib64/libabsl_string_view.so.2308.0.0  /usr/lib64/libabsl_throw_delegate.so.2308.0.0  /usr/lib64/libabsl_base.so.2308.0.0  /usr/lib64/libabsl_spinlock_wait.so.2308.0.0  /usr/lib64/libabsl_civil_time.so.2308.0.0  /usr/lib64/libabsl_time_zone.so.2308.0.0  /usr/lib64/libabsl_raw_logging_internal.so.2308.0.0  
/usr/lib64/libabsl_log_severity.so.2308.0.0  -lsystemd  /usr/lib64/libz.so  /usr/lib64/libdeflate.so  types/Release/libtypes.a  utils/Release/libutils.a  /usr/lib64/libcryptopp.so  /usr/lib64/libboost_regex.so.1.81.0  /usr/lib64/libicui18n.so  /usr/lib64/libicuuc.so  /usr/lib64/libboost_unit_test_framework.so.1.81.0  seastar/Release/libseastar_perf_testing.a  /usr/lib64/libjsoncpp.so.1.9.5  interface/Release/libinterface.a  /usr/lib64/libthrift.so  db/Release/libdb.a  data_dictionary/Release/libdata_dictionary.a  cql3/Release/libcql3.a  transport/Release/libtransport.a  cql3/Release/libcql3.a  transport/Release/libtransport.a  lang/Release/liblang.a  /usr/lib64/liblua-5.4.so  -lm  rust/Release/libwasmtime_bindings.a  rust/librust_combined.a  /usr/lib64/libsnappy.so.1.1.10  mutation/Release/libmutation.a  seastar/Release/libseastar.a  /usr/lib64/libboost_program_options.so  /usr/lib64/libboost_thread.so  /usr/lib64/libboost_chrono.so  /usr/lib64/libboost_atomic.so  /usr/lib64/libcares.so  /usr/lib64/libcryptopp.so  /usr/lib64/libfmt.so.10.0.0  /usr/lib64/liblz4.so  -ldl  /usr/lib64/libgnutls.so  -latomic  /usr/lib64/libsctp.so  /usr/lib64/libyaml-cpp.so  /usr/lib64/libhwloc.so  //usr/lib64/liburing.so  /usr/lib64/libnuma.so  /usr/lib64/libxxhash.so && :
ld.lld: error: undefined symbol: seastar::future<std::optional<utils::UUID>> db::system_keyspace::get_scylla_local_param_as<utils::UUID>(seastar::basic_sstring<char, unsigned int, 15u, true> const&)
>>> referenced by schema_tables.cc:981 (./build/./db/schema_tables.cc:981)
>>>               schema_tables.cc.o:(db::schema_tables::merge_schema(seastar::sharded<db::system_keyspace>&, seastar::sharded<service::storage_proxy>&, gms::feature_service&, std::vector<mutation, std::allocator<mutation>>, bool)::$_1::operator()()) in archive db/Release/libdb.a
>>> referenced by schema_tables.cc:981 (./build/./db/schema_tables.cc:981)
>>>               schema_tables.cc.o:(db::schema_tables::recalculate_schema_version(seastar::sharded<db::system_keyspace>&, seastar::sharded<service::storage_proxy>&, gms::feature_service&)::$_0::operator()() const) in archive db/Release/libdb.a
>>> referenced by schema_tables.cc:981 (./build/./db/schema_tables.cc:981)
>>>               schema_tables.cc.o:(db::schema_tables::merge_schema(seastar::sharded<db::system_keyspace>&, seastar::sharded<service::storage_proxy>&, gms::feature_service&, std::vector<mutation, std::allocator<mutation>>, bool)::$_1::operator()() (.resume)) in archive db/Release/libdb.a
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```

it seems that, without the explicit instantiation, clang-18
just inlines the body of the instantiated template function at the
caller site.
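
For reference, an explicit instantiation definition looks like the sketch below. The function is a hypothetical stand-in with a made-up body, collapsed into a single file; in the real tree the declaration would live in a header and the definition plus the instantiation in a .cc file.

```cpp
#include <optional>
#include <string>

// "Header" part: declaration only, visible to all callers.
template <typename T>
std::optional<T> get_param_as(const std::string& key);

// "Source" part: the definition, visible only in this translation unit.
// The body is a placeholder (assumption): it returns the key length.
template <typename T>
std::optional<T> get_param_as(const std::string& key) {
    if (key.empty()) {
        return std::nullopt;
    }
    return T(key.size());
}

// Explicit instantiation definition: forces the symbol for T = int to be
// emitted here, so other translation units that only see the declaration
// can link against it.
template std::optional<int> get_param_as<int>(const std::string&);
```

Without the last line, a caller in another translation unit would reference a symbol that no object file is guaranteed to define, which is the kind of link failure shown above.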

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16434
2023-12-17 15:12:05 +02:00
Wojciech Mitros
629ea63922 rust: update dependencies
The currently used versions of the "time" and "rustix" dependencies
had minor security vulnerabilities.
In this patch:
- the "rustix" crate is updated
- the "chrono" crate that we depend on was not compatible
with the version of the "time" crate that had fixes, so
we updated the "chrono" crate, which actually removed the
dependency on "time" completely.
Both updates were performed using "cargo update" on the
relevant package and the corresponding version.

Fixes #15772

Closes scylladb/scylladb#16378
2023-12-17 13:20:25 +02:00
Kefu Chai
10a11c2886 token_metadata: pass node id when formatting it
before this change, we use the format string of
"Can't replace node {} with itself", but fail to include the host id as one of seastar::format()'s arguments. this fails the compile-time check of fmt, which is not merged yet. so, if we really run into this problem, {fmt} would throw before the intended runtime_error is raised -- currently, seastar::log formats the logging messages at runtime, this is not intended.

in this change, we pass `existing_node`, so it can be formatted, and the
intended error message can be printed in log.

Refs 11a4908683
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16422
2023-12-15 16:43:44 +01:00
Kefu Chai
273ee36bee tools/scylla-sstable: add scylla sstable shard-of command
when migrating to the uuid-based identifiers, the mapping from the
integer-based generation to the shard-id is preserved. we used to have
"gen % smp_count" for calculating the shard which is responsible to host
a given sstable. despite that this is not a documented behavior, this is
handy when we try to correlate an sstable to a shard, typically when
looking at a performance issue.
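
the legacy mapping mentioned above is a one-liner. the sketch below only restates "gen % smp_count"; the function name is hypothetical and this is not the code of the new subcommand.

```cpp
#include <cstdint>

// Legacy, undocumented mapping from an integer sstable generation to the
// shard that hosts it: gen % smp_count. smp_count must be non-zero.
inline unsigned legacy_shard_of(std::uint64_t generation, unsigned smp_count) {
    return static_cast<unsigned>(generation % smp_count);
}
```

for instance, with 4 shards an sstable with generation 10 maps to shard 2.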

in this change, a new subcommand is added to expose the connection
between the sstable and its "owner" shards.

Fixes #16343
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16345
2023-12-15 11:36:45 +02:00
Kefu Chai
fa3efe6166 .git: use ssh/key or token for auth
enable checkout action to get authenticated if the action needs to
clone a non-public repo.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16421
2023-12-15 11:34:50 +02:00
Kamil Braun
6a4106edf3 migration_manager: don't attach empty system.scylla_local mutation in migration request handler
In effb9fb3cb migration request handler
(called when a node requests schema pull) was extended with a
`system.scylla_local` mutation:
```
        cm.emplace_back(co_await self._sys_ks.local().get_group0_schema_version());
```

This mutation is empty if the GROUP0_SCHEMA_VERSIONING feature is
disabled.

Nevertheless, it turned out to cause problems during upgrades.
The following scenario shows the problem:

We upgrade from 5.2 to enterprise version with the aforementioned patch.
In 5.2, `system.scylla_local` does not use schema commitlog.
After the first node upgrades to the enterprise version, it immediately
on boot creates a new enterprise-only table
(`system_replicated_keys.encrypted_keys`) -- the specific table is not
important, only the fact that a schema change is performed.
This happens before the restarting node notices other nodes being UP, so
the schema change is not immediately pushed to the other nodes.
Instead, soon after boot, the other non-upgraded nodes pull the schema
from the upgraded node.
The upgraded node attaches a `system.scylla_local` mutation to the
vector of returned mutations.
The non-upgraded nodes try to apply this vector of mutations. Because
some of these mutations are for tables that already use schema
commitlog, while the `system.scylla_local` table does not use schema
commitlog, this triggers the following error (even though the mutation
is empty):
```
    Cannot apply atomically across commitlog domains: system.scylla_local, system_schema.keyspaces
```

Fortunately, the fix is simple -- instead of attaching an empty
mutation, do not attach a mutation at all if the handler of migration
request notices that group0_schema_version is not present.

Note that group0_schema_version is only present if the
GROUP0_SCHEMA_VERSIONING feature is enabled, which happens only after
the whole upgrade finishes.
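
The shape of the fix can be sketched as follows. This is a simplified model, not the handler itself: `mutation` is reduced to a table name, `build_reply` is a hypothetical function, and the optional argument stands in for "group0_schema_version is present".

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a mutation: only the target table name (assumption).
struct mutation {
    std::string table;
};

// Builds the vector of mutations returned to a node that pulls schema.
// The system.scylla_local mutation is attached only when it exists;
// nothing, not even an empty mutation, is attached otherwise.
inline std::vector<mutation> build_reply(std::vector<mutation> schema_mutations,
                                         std::optional<mutation> group0_schema_version) {
    if (group0_schema_version) {
        schema_mutations.push_back(std::move(*group0_schema_version));
    }
    return schema_mutations;
}
```

With the feature disabled the reply contains only schema-table mutations, so a non-upgraded receiver never sees a mutation from a different commitlog domain.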

Refs: scylladb/scylladb#16414

Not using "Fixes" because the issue will only be fixed once this PR is
merged to `master` and the commit is cherry-picked onto next-enterprise.

Closes scylladb/scylladb#16416
2023-12-14 22:58:13 +01:00
Avi Kivity
2b8392b8b8 Merge 'database, reader_concurrency_semaphore: deduplicate reader_concurrency_semaphore metrics ' from Botond Dénes
Reduce code duplication by defining each metric just once, instead of three times, by having the semaphore register metrics by itself. This also makes the lifecycle of metrics contained in that of the semaphore. This is important on enterprise where semaphores are added and removed, together with service levels.
We don't want all semaphores to export metrics, so a new parameter is introduced and all call-sites make a call whether they opt-in or not.

Fixes: https://github.com/scylladb/scylladb/issues/16402

Closes scylladb/scylladb#16383

* github.com:scylladb/scylladb:
  database, reader_concurrency_sempaphore: deduplicate reader_concurrency_sempaphore metrics
  reader_concurrency_semaphore: add register_metrics constructor parameter
  sstables: name sstables_manager
2023-12-14 18:26:24 +02:00
Patryk Jędrzejczak
f23f8628b7 docs: update after making consistent_cluster_management mandatory
We remove Raft documentation irrelevant in 5.5.

One of the changes is removing a part of the "Enabling Raft" section
in raft.rst. Since Raft is mandatory in 5.5, the only way to enable
it in this version is by performing a rolling upgrade from 5.4. We
only need to have this case well-documented. In particular, we
remove information that also appears in the upgrade guides like
verifying schema synchronization.

Similarly, we remove a sentence from the "Manual Recovery Procedure"
section in handling-node-failures.rst because it mentions enabling
Raft manually, which is impossible in 5.5.

The rest of the changes are just removing information about
checking or setting consistent_cluster_management, which has become
unused.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
dced4bb924 system_keyspace, main, cql_test_env: fix indendations
Broken in the previous patch.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
5ebfbf42bc db: config: make consistent_cluster_management mandatory
Code that executed only when consistent_cluster_management=false is
removed. In particular, after this patch:
- raft_group0 and raft_group_registry are always enabled,
- raft_group0::status_for_monitoring::disabled becomes unused,
- topology tests can only run with consistent_cluster_management.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
7dd7ec8996 test: boost: schema_change_test: replace disable_raft_schema_config
In the following commits, we make consistent cluster management
mandatory. This will make disable_raft_schema_config unusable,
so we need to get rid of it. However, we don't want to remove
tests that use it.

The idea is to use the Raft RECOVERY mode instead of disabling
consistent cluster management directly.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
a54f9052fc db: config: make override_decommission deprecated
The override_decommission option is supported only when
consistent_cluster_management is disabled. In the following commit,
we make consistent_cluster_management mandatory, which makes
override_decommission unusable.
2023-12-14 16:54:04 +01:00
Patryk Jędrzejczak
571db3c983 db: config: make force_schema_commit_log deprecated
In scylladb/scylladb#16254, we made force_schema_commit_log unused.
After this change, if someone passes this option as the command line
argument, the boot fails. This behavior is undesired. We only want
this option to be ignored. We can achieve this effect by making it
deprecated.
2023-12-14 16:53:46 +01:00
Paweł Zakrzewski
5af066578a doc: Offer replication_factor=3 as the default in the examples
The goal is to make the available defaults safe for future use, as they
are often taken from existing config files or documentation verbatim.

Referenced issue: #14290

Closes scylladb/scylladb#15947
2023-12-14 16:14:01 +01:00
Piotr Dulikowski
c0cf3e398a raft_rpc: use compat source location instead of std one
The std::source_location is broken on some versions of clang. In order
to be able to use its functionality in code, seastar defines
seastar::compat::source_location, which is a typedef over
std::source_location if the latter works, or a custom, dummy
implementation if the std type doesn't work. Therefore, sometimes
seastar::compat::source_location == std::source_location, but not
always.

In service/raft/raft_rpc.cc, both std source location and compat source
location are used and std source location sometimes passed as an
argument to compat source location, breaking builds on older toolchains.
Fix this by switching the code there to only use compat source location.

Fixes: scylladb/scylladb#16336

Closes scylladb/scylladb#16337
2023-12-14 16:14:01 +01:00
Kefu Chai
764d1e01da locator: include used headers
* exceptions/exceptions.hh is not used
* std::set is not used, while std::unordered_set is used

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16406
2023-12-14 16:14:01 +01:00
Kefu Chai
37868e5fdc tools: fix spelling errors in user-facing messages
they are identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16409
2023-12-13 21:39:46 +02:00
Kefu Chai
caa0230e5d test/cql-pytest: use raw string when appropriate
we use "\w" to represent a character class in a regular expression. see
https://docs.python.org/3/library/re.html. but "\" should be
escaped as well. CPython accepts "\w" after trying and failing to
interpret it as an escape sequence, and leaves "\w" as it is,
but it complains.

in this change, we use raw string to avoid escaping "\" in
the regular expression.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16405
2023-12-13 21:14:32 +02:00
Israel Fruchter
514ef48d75 docker: put cqlsh configuration in correct place
we have always put the cqlsh configuration into `~/.cqlshrc`.
according to a commit from 8 years ago [1], this path is deprecated.

commit [2] actually removed this path from the cqlsh code.

as part of moving to scylla-cqlsh, we got [2], and didn't
notice until the first release with it.

this change writes the configuration into `~/.cassandra/cqlshrc`,
as this is the default place cqlsh looks.

[1]: 13ea8a6669/bin/cqlsh.py (L264)
[2]: 2024ea4796
Fixes: scylladb/scylladb#16329

Closes scylladb/scylladb#16340
2023-12-13 18:40:52 +02:00
Kamil Braun
26cbd28883 Merge 'token_metadata: switch to host_id' from Petr Gusev
In this PR we refactor `token_metadata` to use `locator::host_id` instead of `gms::inet_address` for node identification in its internal data structures. Main motivation for these changes is to make raft state machine deterministic. The use of IPs is a problem since they are distributed through gossiper and can't be used reliably. One specific scenario is outlined [in this comment](https://github.com/scylladb/scylladb/pull/13655#issuecomment-1521389804) - `storage_service::topology_state_load` can't resolve host_id to IP when we are applying old raft log entries, containing host_id-s of the long-gone nodes.

The refactoring is structured as follows:
  * Turn `token_metadata` into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a `get_new()` method, which can be used to access the new version.
  * Go over all places which write to the old version and make the corresponding writes to the new version through `get_new()`. When this stage is finished we can use any version of the `token_metadata` for reading.
  * Go over all the places which read `token_metadata` and switch them to the new version.
  * Make `host_id`-based `token_metadata` default, drop `inet_address`-based version, change `token_metadata` back to non-template.

This series [depends](1745a1551a) on RPC sender `host_id` being present in RPC `client_info` for `bootstrap` and `replace` node_ops commands. This feature was added in [this commit](95c726a8df) and released in `5.4`. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially first to `5.4` (or the corresponding Enterprise version) then to the version with these changes (`5.5` or `6.0`) should be fine. If for some reason they upgrade from a version without `host_id` in RPC `client_info` to the version with these changes and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with an error `Coordinator host_id not found` if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can finish the upgrade first to version 5.4 or later, or start bootstrap/replace with an upgraded node. Note that removenode and decommission do not depend on coordinator host_id so they can be started in the middle of upgrade from any node.

Closes scylladb/scylladb#15903

* github.com:scylladb/scylladb:
  topology: remove_endpoint: remove inet_address overload
  token_metadata: topology: cleanup add_or_update_endpoint
  token_metadata: add_replacing_endpoint: forbid replacing node with itself
  topology: drop key_kind, host_id is now the primary key
  dc_rack_fn: make it non-template
  token_metadata: drop the template
  shared_token_metadata: switch to the new token_metadata
  gossiper: use new token_metadata
  database: get_token_metadata -> new token_metadata
  erm: switch to the new token_metadata
  storage_service: get_token_metadata -> token_metadata2
  storage_service: get_token_to_endpoint_map: use new token_metadata
  api/token_metadata: switch to new version
  storage_service::on_change: switch to new token_metadata
  cdc: switch to token_metadata2
  calculate_natural_endpoints: fix indentation
  calculate_natural_endpoints: switch to token_metadata2
  storage_service: get_changed_ranges_for_leaving: use new token_metadata
  decommission_with_repair, removenode_with_repair -> new token_metadata
  rebuild_with_repair, replace_with_repair: use new token_metadata
  bootstrap: use new token_metadata
  tablets: switch to token_metadata2
  calculate_effective_replication_map: use new token_metadata
  calculate_natural_endpoints: fix formatting
  abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
  network_topology_strategy_test: update new token_metadata
  storage_service: on_alive: update new token_metadata
  storage_service: handle_state_bootstrap: update new token_metadata
  storage_service: snitch_reconfigured: update new token_metadata
  storage_service: leave_ring: update new token_metadata
  storage_service: node_ops_cmd_handler: update new token_metadata
  storage_service: node_ops_cmd_handler: add coordinator_host_id
  storage_service: bootstrap: update new token_metadata
  storage_service: join_token_ring: update new token_metadata
  storage_service: excise: update new token_metadata
  storage_service: join_cluster: update new token_metadata
  storage_service: on_remove: update new token_metadata
  storage_service: handle_state_normal: fill new token_metadata
  storage_service: topology_state_load: fill new token_metadata
  storage_service: adjust update_topology_change_info to update new token_metadata
  topology: set self host_id on the new topology
  locator::topology: allow being_replaced and replacing nodes to have the same IP
  token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
  token_metadata: get_host_id: exception -> on_internal_error
  token_metadata: add get_all_ips method
  token_metadata: support host_id-based version
  token_metadata: make it a template with NodeId=inet_address/host_id NodeId is used in all internal token_metadata data structures, that previously used inet_address. We choose topology::key_kind based on the value of the template parameter.
  locator: make dc_rack_fn a template
  locator/topology: add key_kind parameter
  token_metadata: topology_change_info: change field types to token_metadata_ptr
  token_metadata: drop unused method get_endpoint_to_token_map_for_reading
2023-12-13 16:35:52 +01:00
Avi Kivity
7fce057cda database, reader_concurrency_semaphore: deduplicate reader_concurrency_semaphore metrics
reader_concurrency_semaphore metrics are triplicated: each metric is registered
for the streaming, user, and system classes.

To fix, just move the metrics registration from database to
reader_concurrency_semaphore, so each reader_concurrency_semaphore
instantiated will register its metrics (if its creator asked for it).

Adjust the names given to reader_concurrency_semaphore so we don't
change the labels.

scylla-gdb is adjusted to support the new names.
2023-12-13 09:16:18 -05:00
Nadav Har'El
89d311ec23 tablet, mv: fix doc on implicit synchronous update
The document docs/cql/cql-extensions.md documents Scylla's extension
of *synchronous* view updates, and mentioned a few cases where view
updates are synchronous even if synchronous updates are not requested
explicitly. But with tablets, these statements and examples are no
longer correct - with tablets, base and view tablets may find
themselves migrated to entirely different nodes. So in this patch
we correct the statements that are no longer accurate.

Note that after this patch we still have in this document, and in
other documents, similar promises about CQL *local secondary indexes*.
Either the documentation or the implementation needs to change in
that case too, but we'll do it in a separate patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16369
2023-12-13 14:58:06 +02:00
Botond Dénes
e1b30f50be reader_concurrency_semaphore: add register_metrics constructor parameter
To be used in the next patch to control whether the semaphore registers
and exports metrics or not. We want to move metric registration to the
semaphore but we don't want all semaphores to export metrics. The
decision on whether a semaphore should or shouldn't export metrics
should be made on a case-by-case basis so this new parameter has no
default value (except for the for_tests constructor).
2023-12-13 06:25:45 -05:00
Avi Kivity
814f3eb6b5 sstables: name sstables_manager
Soon, the reader_concurrency_semaphore will require a unique
and meaningful name in order to label its metrics. To prepare
for that, name sstable_manager instances. This will be used
to generate a name for sstable_manager's reader_concurrency_semaphore.
2023-12-13 04:40:33 -05:00
Kefu Chai
5ea3af067d .git: add codespell workflow
to identify misspelling in the code.

The GitHub actions in this workflow run codespell when a new pull
request is created targeting master or enterprise branch. Errors
will be annotated in the pull request. A new entry along with the
existing tests like build, unit test and dtest will be added to the
"checks" shown in github PR web UI. one can follow the "Details" to
find the details of the errors.

unfortunately, this check checks all text files unless they
are explicitly skipped, not just the new ones added / changed in the
PR under test. in other words, if there are 42 misspelling
errors in master, and you are adding a new one in your PR,
this workflow shows all of the 43 errors -- both the old
and new ones.

the misspelling in the code hurts the user experience and some
time developer's experience, but the text files under test/cql
can be sensitive to the text, sometimes, a tiny editing could
break the test, so it is added to the skip list.

So far, since there are lots of errors identified by the tool,
before we address all of them, the identified problems are only
annotated; they are not considered errors. so, they don't
fail the check.

and in this change `only_warn` is set, so the check does not
fail even if there are misspellings. this prevents the distractions
before all problems are addressed. we can remove this setting in
future, once we either fix all the misspellings or add the ignore
words or skip files. but either way, the check is not considered
as blockers for merging the tested PR, even if this check fails --
the check failure is just represented for information purpose, unless
we make it a required in the github settings for the target
branch.

if we want to change this, we can configure it in github's Branch
protection rule on a per-branch basis, to make this check a
must-pass.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16285
2023-12-13 10:53:09 +02:00
Aleksandra Martyniuk
9b9ea1193c tasks: keep task's children in list
If std::vector is resized its iterators and references may
get invalidated. While task_manager::task::impl::_children's
iterators are avoided throughout the code, references to its
elements are being used.

Since the children vector does not need random access to its
elements, change its type to std::list<foreign_task_ptr>, whose
iterators and references aren't invalidated on element insertion.

Fixes: #16380.

Closes scylladb/scylladb#16381
2023-12-13 10:47:27 +02:00
Yaniv Kaul
0b0a3ee7fc Typos: fix typos in code
Last batch, hopefully. Using codespell, went over the docs and fixed some typos.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#16388
2023-12-13 10:45:21 +02:00
Botond Dénes
57f5ac03e1 Merge 'scripts/coverage.py: cleanups' from Kefu Chai
various cleanups in `scripts/coverage.py`. they do not change the behavior of this script in the happy path.

Closes scylladb/scylladb#16399

* github.com:scylladb/scylladb:
  scripts/coverage.py: s/exit/sys.exit/
  scripts/coverage.py: do not inherit Value from argparse.Action
  scripts/coverage.py: use `is not None`
  scripts/coverage.py: correct the formatted string in error message
  scripts/coverage.py: do not use f-string when nothing to format
  scripts/coverage.py: use raw string to avoid escaping "\"
2023-12-13 10:25:44 +02:00
Kefu Chai
1b57ba44eb scripts/coverage.py: s/exit/sys.exit/
the former is supposed to be used in "the interactive interpreter
shell and should not be used in programs.". this function
prints out its argument, and the exit code is 1. so just
print the error message using sys.exit()

see also
https://docs.python.org/3/library/sys.html#sys.exit and
https://docs.python.org/3/library/constants.html#exit
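
A minimal sketch of the behavior described above, not the code from scripts/coverage.py: passing a string to `sys.exit()` raises `SystemExit` carrying that message, and if nothing catches it, the interpreter prints the message to stderr and exits with status 1. The `fail` helper and its message are hypothetical.

```python
import sys


def fail(message):
    # sys.exit() with a string argument raises SystemExit(message); if nothing
    # catches it, the interpreter prints the message to stderr and exits with 1.
    sys.exit(message)


try:
    fail("error: no input files given")
except SystemExit as e:
    # e.code holds the message itself; the process exit status would be 1.
    caught = e.code
```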

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-13 10:50:00 +08:00
Kefu Chai
7600b68d5c scripts/coverage.py: do not inherit Value from argparse.Action
as Value is not an argparse.Action, and it is not passed as the argument
of the "action" parameter. neither does it implement the `__call__`
function. so just derive it from object.
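
For contrast, a sketch of what a genuine `argparse.Action` subclass looks like next to a plain holder class. `Upper` and `PlainValue` are hypothetical names for illustration; the real `Value` class in scripts/coverage.py is not shown in this message.

```python
import argparse


class Upper(argparse.Action):
    # A real Action implements __call__ and is passed via the "action" parameter.
    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, values.upper())


class PlainValue:
    # A plain holder does neither of those things, so it has no reason
    # to inherit from argparse.Action.
    def __init__(self, value):
        self.value = value


parser = argparse.ArgumentParser()
parser.add_argument("--name", action=Upper)
args = parser.parse_args(["--name", "scylla"])
holder = PlainValue(args.name)
```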

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-13 10:41:52 +08:00
Kefu Chai
9c112dacc4 scripts/coverage.py: use is not None
`is not None` is the more idiomatic Python way to check if an
expression evaluates to not None. and it is more readable.
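
The message does not show the expression that was replaced, so the following is only an illustration of why the identity check is preferred: it separates "no value given" from falsy values such as 0. The `describe` function is hypothetical.

```python
def describe(limit=None):
    # `is not None` is an identity check: 0 and "" are falsy but are not None.
    if limit is not None:
        return f"limit={limit}"
    return "no limit"


results = [describe(), describe(0), describe(10)]
```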

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-13 10:41:52 +08:00
Kefu Chai
0d15fc57d5 scripts/coverage.py: correct the formatted string in error message
the formatted string should be `basename`. `input_file` is not defined
in that context.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-13 10:41:52 +08:00
Kefu Chai
bc94b7bc04 scripts/coverage.py: do not use f-string when nothing to format
there is no string interpolation in this case, so drop the "f" prefix.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-13 10:41:52 +08:00
Kefu Chai
c3c715236d scripts/coverage.py: use raw string to avoid escaping "\"
we use "\." to escape "." in a regular expression. but "\" should
be escaped as well, CPython accepts "\." after trying to find
an escaped character of "\."  but failed, and leave "\." as it is.
but it complains:

```
/home/kefu/dev/scylladb/scripts/coverage.py:107: SyntaxWarning: invalid escape sequence '\.'
  input_file_re_str = f"(.+)\.profraw(\.{__DISTINCT_ID_RE})?"
```

in this change, we use raw string to avoid escaping "\" in
the regular expression.
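
A sketch of the raw-string form, assuming the fix combines the `r` and `f` prefixes. `DISTINCT_ID_RE` and its pattern are stand-ins, since the real `__DISTINCT_ID_RE` is not shown here, and the file name is made up.

```python
import re

# Hypothetical stand-in; the real __DISTINCT_ID_RE is not shown in this message.
DISTINCT_ID_RE = r"[0-9a-f]+"

# The r prefix keeps "\." as a literal backslash followed by a dot, so the regex
# engine, not the Python string parser, interprets the escape. The f prefix
# still interpolates {DISTINCT_ID_RE}.
input_file_re_str = rf"(.+)\.profraw(\.{DISTINCT_ID_RE})?"

match = re.fullmatch(input_file_re_str, "unit_test.profraw.1a2b")
```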

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-13 10:41:52 +08:00
Tomasz Grabiec
cdc53d0a49 test: tablets: Add test case which tests table drop concurrent with migration 2023-12-13 00:06:56 +01:00
Avi Kivity
1f7c049791 Update tools/java submodule (minor security fixes)
* tools/java 29fe44da84...3963c3abf7 (2):
  > Revert "build: update `guava` dependency"
  > Merge "Update Netty , Guava and Logback dependencies" from Yaron Kaikov

    Ref scylladb/scylla-tools-java#363
    Ref scylladb/scylla-tools-java#364
2023-12-12 22:23:20 +02:00
Avi Kivity
c3d679e31e Merge 'sstables, utils: do not include unused header' from Kefu Chai
do not include unused header

Closes scylladb/scylladb#16386

* github.com:scylladb/scylladb:
  utils: bit_cast: drop unused #includes
  sstables: writer: do not include unused header
2023-12-12 22:22:36 +02:00
Avi Kivity
22b77edef3 Merge 'scylla-nodetool: implement the scrub command' from Botond Dénes
On top of the capabilities of the java-nodetool command, the following additional functionality is implemented:
* Expose quarantine-mode option of the scrub_keyspace REST API
* Exit with error and print a message, when scrub finishes with abort or validation_errors return code

The command comes with tests and all tests pass with both the new and the current nodetool implementations.

Refs: #15588
Refs: #16208

Closes scylladb/scylladb#16391

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the scrub command
  test/nodetool: rest_api_mock.py: add missing "f" to error message f string
  api: extract scrub_status into its own header
2023-12-12 22:22:35 +02:00
Petr Gusev
9d93a518ac topology: remove_endpoint: remove inet_address overload
The overload was used only in tests.
2023-12-12 23:19:54 +04:00
Petr Gusev
fbf507b1ba token_metadata: topology: cleanup add_or_update_endpoint
Make host_id parameter non-optional and
move it to the beginning of the arguments list.

Delete unused overloads of add_or_update_endpoint.

Delete unused overload of token_metadata::update_topology
with inet_address argument.
2023-12-12 23:19:54 +04:00
Petr Gusev
11a4908683 token_metadata: add_replacing_endpoint: forbid replacing node with itself
This used to work before in replace-with-same-ip scenario, but
with host_id-s it's no longer relevant.

base_token_metadata has been removed from topology_change_info
because the conditions needed for its creation
are no longer met.
2023-12-12 23:19:54 +04:00
Petr Gusev
3b59919a9c topology: drop key_kind, host_id is now the primary key 2023-12-12 23:19:54 +04:00
Petr Gusev
8c551f9104 dc_rack_fn: make it non-template 2023-12-12 23:19:54 +04:00
Petr Gusev
7b55ccbd8e token_metadata: drop the template
Replace token_metadata2 ->token_metadata,
make token_metadata back non-template.

No behavior changes, just compilation fixes.
2023-12-12 23:19:54 +04:00
Petr Gusev
799f747c8f shared_token_metadata: switch to the new token_metadata 2023-12-12 23:19:54 +04:00
Petr Gusev
c7314aa8e2 gossiper: use new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
e50dbef3e2 database: get_token_metadata -> new token_metadata
database::get_token_metadata() is switched to token_metadata2.

get_all_ips method is added to the host_id-based token_metadata, since
it's convenient and will be used in several places. It returns all current
nodes converted to inet_address by means of the topology
contained within token_metadata.

hint_sender::can_send: if the node has already left the
cluster we may not find its host_id. This case is handled
in the same way as if it's not a normal token owner - we
simply send a hint to all replicas.
2023-12-12 23:19:53 +04:00
Petr Gusev
11cc21d0a9 erm: switch to the new token_metadata
In this commit we replace token_metadata with token_metadata2
in the erm interface and field types. To accommodate the change
some of strategy-related methods are also updated.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
309e08e597 storage_service: get_token_metadata -> token_metadata2
In this commit we change the return type of
storage_service::get_token_metadata_ptr() to
token_metadata2_ptr and fix whatever breaks.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
f53f34f989 storage_service: get_token_to_endpoint_map: use new token_metadata
The token_metadata::get_normal_and_bootstrapping_token_to_endpoint_map
method was used only here. It's inlined in this
commit since it's too specific and incurs the overhead
of creating an intermediate map.
2023-12-12 23:19:53 +04:00
Petr Gusev
0e4c90dca6 api/token_metadata: switch to new version 2023-12-12 23:19:53 +04:00
Petr Gusev
b2d3dc33e2 storage_service::on_change: switch to new token_metadata
The check *ep == endpoint is needed when a node
changes its IP - on_change can be called by the
gossiper for old IP as part of its removal, after
handle_state_normal has already been called for
the new one. Without the check, the
do_update_system_peers_table call overwrites the IP
back to its old value.

Previously token_metadata used endpoint as the key
and the *ep == endpoint condition followed from the
is_normal_token_owner check. Now with host_id-s we have
an additional layer of indirection, and we need
*ep == endpoint check to get the same end condition.

This case was revealed by the dtest
update_cluster_layout_tests.py::TestUpdateClusterLayout::test_change_node_ip
2023-12-12 23:19:53 +04:00
Petr Gusev
7eb7863635 cdc: switch to token_metadata2
Change the token_metadata type to token_metadata2 in
the signatures of CDC-related methods in
storage_service and cdc/generation. Use
get_new_strong to get a pointer to the new host_id-based
token_metadata from the inet_address-based one,
living in the shared_token_metadata.

The starting point of the patch is in
storage_service::handle_global_request. We change the
tmptr type to token_metadata2 and propagate the change
down the call chains. This includes token-related methods
of the boot_strapper class.
2023-12-12 23:19:53 +04:00
Petr Gusev
b2fb650098 calculate_natural_endpoints: fix indentation 2023-12-12 23:19:53 +04:00
Petr Gusev
80ccbc0d53 calculate_natural_endpoints: switch to token_metadata2
All usages of calculate_natural_endpoints are migrated,
now we can change its interface to take token_metadata2
instead of token_metadata.
2023-12-12 23:19:53 +04:00
Petr Gusev
933acb0f72 storage_service: get_changed_ranges_for_leaving: use new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
7c7dbe3779 decommission_with_repair, removenode_with_repair -> new token_metadata
Just mechanical changes to the new token_metadata.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
ef534ac876 rebuild_with_repair, replace_with_repair: use new token_metadata
Just mechanical changes to the new token_metadata.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
93263bf9e7 bootstrap: use new token_metadata
Just mechanical changes to the new token_metadata.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
d9283bd025 tablets: switch to token_metadata2
locator_topology_test, network_topology_strategy_test and
tablets_test are fully switched to the host_id-based token_metadata,
meaning they no longer populate the old token_metadata.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
f5038f6c72 calculate_effective_replication_map: use new token_metadata
In this commit we switch the function
calculate_effective_replication_map to use the new
token_metadata. We do this by employing our new helper
calculate_natural_ips function. We can't use this helper for
current_endpoints/target_endpoints though,
since in that case we won't add the IP to the
pending_endpoints in the replace-with-same-ip scenario.

The token_metadata_test is migrated to host_ids in the same
commit to make it pass. Other tests work because they fill
both versions of the token_metadata, but for this test it was
simpler to just migrate it straight away. The test constructs
the old token_metadata over the new token_metadata,
this means only the get_new() method will work on it. That's
why we also need to switch some other functions
(maybe_remove_node_being_replaced, do_get_natural_endpoints,
get_replication_factor) to the new version in the same commit.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Petr Gusev
fe3c543c4e calculate_natural_endpoints: fix formatting 2023-12-12 23:19:53 +04:00
Petr Gusev
d5b4b02b28 abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
We've updated all the places where token_metadata
is mutated, and now we can progress to the next stage
of the refactoring - gradually switching the read
code paths.

The calculate_natural_endpoints function
is at the core of all of them. It decides to what nodes
the given token should be replicated to for the given
token_metadata. It has a lot of usages in various contexts,
we can't switch them all in one commit, so instead we
allowed the function to behave in both ways. If
use_host_id parameter is false, the function uses the provided
token_metadata as is and returns endpoint_set as a result.
If it's true, it uses get_new() on the provided token_metadata
and returns host_id_set as a result.

The scope of the whole refactoring is limited to the erm data
structure, its interface will be kept inet_address based for now.
This means we'll often need to resolve host_ids to inet_address-es
as soon as we get a result from calculate_natural_endpoints.
A new calculate_natural_ips function is added for convenience.
It uses the new token_metadata and immediately resolves
returned host_id-s to inet_address-es.

The auxiliary declarations natural_ep_type, set_type, vector_type,
get_self_id, select_tm are introduced only for the sake of
migration, they will be removed later.
2023-12-12 23:19:53 +04:00
Petr Gusev
1960436d93 network_topology_strategy_test: update new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
90234861ac storage_service: on_alive: update new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
5c04a47d6f storage_service: handle_state_bootstrap: update new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
4e03ba3ede storage_service: snitch_reconfigured: update new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
0aab20d3fe storage_service: leave_ring: update new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
278c832285 storage_service: node_ops_cmd_handler: update new token_metadata 2023-12-12 23:19:53 +04:00
Petr Gusev
1745a1551a storage_service: node_ops_cmd_handler: add coordinator_host_id
We'll need it in the next commits to address
replacing and bootstrapping nodes by id.

We assume this change will be shipped in 6.0 with upgrade
from 5.4, where host_id already exists in client_info.
We don't support upgrade between non-adjacent versions.
2023-12-12 23:19:48 +04:00
Botond Dénes
47450ae4db tools/scylla-nodetool: implement the scrub command
On top of the capabilities of the java-nodetool command, the following
additional functionality is implemented:
* Expose quarantine-mode option of the scrub_keyspace REST API
* Exit with error and print a message, when scrub finishes with abort or
  validation_errors return code
2023-12-12 09:39:58 -05:00
Botond Dénes
892683cace test/nodetool: rest_api_mock.py: add missing "f" to error message f string 2023-12-12 09:33:39 -05:00
Botond Dénes
8064d17f78 api: extract scrub_status into its own header
So it can be shared with scylla-nodetool code.
2023-12-12 09:33:39 -05:00
Petr Gusev
2794b14a80 storage_service: bootstrap: update new token_metadata 2023-12-12 17:27:25 +04:00
Petr Gusev
c20c8c653c storage_service: join_token_ring: update new token_metadata 2023-12-12 17:27:25 +04:00
Petr Gusev
fde20bddc0 storage_service: excise: update new token_metadata
excise is called from handle_state_left, the endpoint
may have already been removed from tm by then -
test_raft_upgrade_majority_loss fails if we use
unconditional tmptr->get_new()->get_host_id
instead of get_host_id_if_known
2023-12-12 17:27:25 +04:00
Petr Gusev
23811486d8 storage_service: join_cluster: update new token_metadata 2023-12-12 17:27:25 +04:00
Petr Gusev
711aaa0e29 storage_service: on_remove: update new token_metadata 2023-12-12 17:27:25 +04:00
Petr Gusev
6412cd64f1 storage_service: handle_state_normal: fill new token_metadata 2023-12-12 17:27:15 +04:00
Kefu Chai
c485644303 utils: bit_cast: drop unused #includes
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-12 21:09:51 +08:00
Kefu Chai
af0ba3d648 sstables: writer: do not include unused header
the helpers in bit_cast.hh are not used, so drop this #include.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-12 21:09:51 +08:00
Tomasz Grabiec
9b0d9e7c6b tests: tablets: Do read barrier in get_tablet_replicas()
In order for the call to see all prior changes to group0. Also, we
should query on the host on which we executed the barrier.

I hope this will reduce flakiness observed in CI runs on
https://github.com/scylladb/scylladb/pull/16341 where the expected
tablet replica didn't match the one returned by get_tablet_replica()
after tablet movement, possibly because the node is still behind
group0 changes.
2023-12-12 12:46:39 +01:00
Botond Dénes
493b6bc65f Merge 'Guard tables in compaction tasks' from Benny Halevy
Currently, if a compaction function enters the table
or compaction_group async_gate, we can't stop it
on the table/compaction_group stop path as they co_await
their respective async_gate.close().

This series introduces a table_ptr smart pointer that guards
the table object by entering its async_gate, and
it also defers awaiting the gate.close future
till after stopping ongoing compaction so that
closing the gate will prevent starting new compactions
while ongoing compaction can be stopped and finally
awaiting the close() future will wait for them to
unwind and exit the gate after being stopped.
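
The ordering described above can be sketched with a toy gate in Python's asyncio. This is only an analogy under stated assumptions, not the seastar or scylla code: `Gate`, `GateClosedError`, `compaction` and `stop_table` are all hypothetical names.

```python
import asyncio


class GateClosedError(Exception):
    """Raised when entering a gate that has started closing."""


class Gate:
    """Toy stand-in for an async gate: counts holders, refuses entry once closed."""

    def __init__(self):
        self._holders = 0
        self._closed = False
        self._drained = asyncio.Event()
        self._drained.set()

    def enter(self):
        if self._closed:
            raise GateClosedError()
        self._holders += 1
        self._drained.clear()

    def leave(self):
        self._holders -= 1
        if self._holders == 0:
            self._drained.set()

    def close(self):
        # Closing takes effect immediately; the returned awaitable completes
        # once every holder has left, and may be awaited later.
        self._closed = True
        return self._drained.wait()


async def compaction(gate, stop_requested, log):
    gate.enter()                      # hold the table for the task's lifetime
    try:
        await stop_requested.wait()   # "work" until asked to stop
        log.append("compaction stopped")
    finally:
        gate.leave()


async def stop_table(gate, stop_requested, log):
    closed = gate.close()             # 1. no new compaction can enter
    stop_requested.set()              # 2. stop the ongoing ones
    await closed                      # 3. only now wait for them to unwind
    log.append("table stopped")


async def main():
    gate, stop_requested, log = Gate(), asyncio.Event(), []
    task = asyncio.create_task(compaction(gate, stop_requested, log))
    await asyncio.sleep(0)            # let the compaction enter the gate
    await stop_table(gate, stop_requested, log)
    await task
    try:
        gate.enter()
    except GateClosedError:
        log.append("late compaction rejected")
    return log


result = asyncio.run(main())
```

Awaiting the close before requesting the stop would wait on a holder that nothing has asked to stop, which is the situation the series avoids.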

Fixes #16305

Closes scylladb/scylladb#16351

* github.com:scylladb/scylladb:
  compaction: run_on_table: skip compaction also on gate_closed_exception
  compaction: run_on_table: hold table
  table: add table_holder and hold method
  table: stop: allow compactions to be stopped while closing async_gate
2023-12-12 12:50:17 +02:00
Botond Dénes
885a807c71 Merge 'api: storage_service: api for starting async compaction' from Aleksandra Martyniuk
For all compaction types which can be started with api, add an asynchronous version of api, which returns task_id of the corresponding task manager task. With the task_id a user can check task status, abort, or wait for it, using task manager api.

Closes scylladb/scylladb#15092

* github.com:scylladb/scylladb:
  test: use async api in test_not_created_compaction_task_abort
  test: test compaction task started asynchronously
  api: tasks: api for starting async compaction
  api: compaction: pass pointer to top level compaction tasks
2023-12-12 12:06:52 +02:00
Asias He
5f20e33e15 api: Reject unsupported http api options for repair
If an option is not supported, reject the request instead of silently
ignoring the unsupported options.

This prevents the user from thinking an option is supported when it is
in fact ignored by scylla core.
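
A minimal sketch of the reject-instead-of-ignore approach, assuming the options arrive as a mapping. The option names and `check_repair_options` are placeholders and do not reflect the actual repair API.

```python
# Placeholder option names, for illustration only.
SUPPORTED_OPTIONS = frozenset({"option_a", "option_b"})


def check_repair_options(options):
    # Fail loudly on anything unknown rather than silently dropping it.
    unsupported = sorted(set(options) - SUPPORTED_OPTIONS)
    if unsupported:
        raise ValueError("unsupported repair options: " + ", ".join(unsupported))
    return options


accepted = check_repair_options({"option_a": "1"})
try:
    check_repair_options({"option_a": "1", "bogus": "x"})
    rejected = None
except ValueError as e:
    rejected = str(e)
```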

Fixes #16299

Closes scylladb/scylladb#16300
2023-12-12 09:18:00 +02:00
Benny Halevy
7843025a53 compaction: run_on_table: skip compaction also on gate_closed_exception
Similar to the no_such_column_family error,
gate_closed_exception indicates that the table
is stopped and we should skip compaction on it
gracefully.

Fixes #16305

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-12 08:46:37 +02:00
Benny Halevy
92c718c60a compaction: run_on_table: hold table
To ensure the table will not be dropped while
the compaction task is ongoing.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-12 08:45:59 +02:00
Benny Halevy
cddcf3ad0c table: add table_holder and hold method
A smart pointer that guards the table object
while it's being used by async functions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-12 08:43:49 +02:00
Benny Halevy
c8768f9102 table: stop: allow compactions to be stopped while closing async_gate
To make sure a table object is kept valid throughout the lifetime
of compaction, a following patch will enter the table's
_async_gate when the compaction task starts.

This change defers awaiting the gate.close future
till after stopping ongoing compaction so that
closing the gate will prevent starting new compactions
while ongoing compaction can be stopped and finally
awaiting the close() future will wait for them to
unwind and exit the gate after being stopped.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-12 08:31:50 +02:00
Anna Stuchlik
ff2457157d doc: add the 5.4-to-5.5 upgrade guide
This commit adds the upgrade guide from version
5.4 to 5.5.
Also, it removes all previous OSS guides not related
to version 5.5.

The guide includes the required Raft-related
information.

NOTE: The content of the guide must be further
verified closer to the release. I'm making
these updates now to avoid errors and warnings
related to outdated upgrade guides in other PRs,
and to include the Raft information.

Closes scylladb/scylladb#16350
2023-12-11 16:58:43 +01:00
Botond Dénes
3c125891f4 Update ./tools/java submodule
* ./tools/java 26f5f71c...29fe44da (3):
  > tools: catch and print UnsupportedOperationException
  > tools/SSTableMetadataViewer: continue if sstable does not exist
  > throw more informative error when fail to parse sstable generation

Fixes: scylladb/scylla-tools-java#360
2023-12-11 17:08:01 +02:00
Tomasz Grabiec
a33d45f889 streaming: Keep table by shared ptr to avoid crash on table drop
The observed crash was in the following piece on "cf" access:

    if (*table_is_dropped) {
        sslog.info("[Stream #{}] Skipped streaming the dropped table {}.{}", si->plan_id, si->cf.schema()->ks_name(), si->cf.schema()->cf_name());

Fixes #16181
2023-12-11 14:58:04 +01:00
Calle Wilund
b34366957e commitlog_test::test_commitlog_reader: handle segment_truncation
Fixes #16312

This test replays a segment before it might be closed or even fully flushed,
thus it can (with the new semantics) generate a segment_truncation exception
if hitting eof earlier than expected. (Note: test does not use pre-allocated
segments).
2023-12-11 11:53:12 +00:00
Calle Wilund
d85c0ea26f commitlog_test: coroutinize test_commitlog_reader
To make it easier to read and modify.
2023-12-11 11:47:48 +00:00
Takuya ASADA
7c38aff368 scylla_swap_setup: fix AttributeError
In dffadabb94 we mistakenly added
"if args.overwrite_unit_file", but the option comes from an unmerged
patch.
So we need to drop this to fix the script error.

Fixes #16331

Closes scylladb/scylladb#16358
2023-12-11 13:41:00 +02:00
Tomasz Grabiec
effb9fb3cb Merge 'Don't calculate hashes for schema versions in Raft mode' from Kamil Braun
When performing a schema change through group 0, extend the schema mutations with a version that's persisted and then used by the nodes in the cluster in place of the old schema digest, which becomes horribly slow as we perform more and more schema changes (#7620).

If the change is a table create or alter, also extend the mutations with a version for this table to be used for `schema::version()`s instead of having each node calculate a hash which is susceptible to bugs (#13957).

When performing a schema change in Raft RECOVERY mode we also extend schema mutations which forces nodes to revert to the old way of calculating schema versions when necessary.

We can only introduce these extensions if all of the cluster understands them, so protect this code by a new cluster/schema feature, `GROUP0_SCHEMA_VERSIONING`.

Fixes: #7620
Fixes: #13957

---

This is a reincarnation of PR scylladb/scylladb#15331. The previous PR was reverted due to a bug it unmasked; the bug has now been fixed (scylladb/scylladb#16139). Some refactors from the previous PR were already merged separately, so this one is a bit smaller.

I have checked with @Lorak-mmk's reproducer (https://github.com/Lorak-mmk/udt_schema_change_reproducer -- many thanks for it!) that the originally exposed bug is no longer reproducing on this PR, and that it can still be reproduced if I revert the aforementioned fix on top of this PR.

Closes scylladb/scylladb#16242

* github.com:scylladb/scylladb:
  docs: describe group 0 schema versioning in raft docs
  test: add test for group 0 schema versioning
  feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode
  schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0
  migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations
  schema_tables: use schema version from group 0 if present
  migration_manager: store `group0_schema_version` in `scylla_local` during schema changes
  system_keyspace: make `get/set_scylla_local_param` public
  feature_service: add `GROUP0_SCHEMA_VERSIONING` feature
2023-12-11 12:17:57 +01:00
Eliran Sinvani
befd910a06 install-dependencies.sh : Add packages for supporting code coverage
As part of code coverage we need some additional packages to be able
to process the code coverage data and to provide
some meaningful information in logs.
Here we add the following packages:
fedora packages:
----------------
lcov - A package of utilities to manipulate lcov traces and generate
       coverage html reports

fedora python3 packages:
------------------------
The following packages are added into fedora_packages and not the
python3_packages since we don't need them to be packaged into
scylla-python3 package but we only require them for the build
environment.

python3-unidiff - A python library for working with patch files, this is
                  required in order to generate "patch coverage" reports.
python3-humanfriendly - A python library to format some quantities into
                        human-readable strings (time spans, sizes, etc...)
                        we use it to print meaningful logs that tracks
                        the volume and time it takes to process coverage
                        data so we can better debug and optimize it in the
                        future.
python3-jinja3 - This is a template based generator that will eventually
                 allow us to consolidate and rearrange several reports into one so we
                 can publish a single report "site" for all of the coverage information.
                 For example, include both the coverage report and the
                 patch report in a tab based site.

pip packages:
-------------
treelib - A tree data structure that supports also pretty printing of
          the tree data. We use it to log the coverage processing steps in
          order to have debugging capabilities in the future.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes scylladb/scylladb#16330

[avi: regenerate toolchain]

Closes scylladb/scylladb#16357
2023-12-11 13:12:05 +02:00
Aleksandra Martyniuk
31977a1cde test: use async api in test_not_created_compaction_task_abort
2023-12-11 11:39:41 +01:00
Aleksandra Martyniuk
68f6886d50 test: test compaction task started asynchronously
Check whether task id returned by asynchronous api is correct and
whether tasks of proper type are created.
2023-12-11 11:39:41 +01:00
Aleksandra Martyniuk
b485897704 api: tasks: api for starting async compaction
For all compaction types which can be started with api, add an asynchronous
version of api, which returns task_id of the corresponding task manager
task. With the task_id a user can check task status, abort, or wait for it,
using task manager api.
2023-12-11 11:39:33 +01:00
Takuya ASADA
cc90ff1646 scylla-server.service: switch deprecated PermissionsStartsOnly to ExecStartPre=+
Since we dropped CentOS7 support, now we can switch from
"PermissionsStartsOnly=True" to "ExecStartPre=+".

Fixes scylladb/scylla-enterprise#1067
2023-12-11 19:38:28 +09:00
Takuya ASADA
6f1fff58ba dist: drop legacy control group parameters
Since we dropped CentOS7 support, now we can drop legacy control group
parameters which are deprecated in systemd v252.
2023-12-11 19:38:28 +09:00
Takuya ASADA
dcb5fd6fce scylla-server.slice: Drop workaround for MemorySwapMax=0 bug
It was a workaround for https://github.com/systemd/systemd/issues/8363,
but the bug was fixed at
906bdbf5e7
and merged from systemd v239-8.
Since we dropped CentOS7 support, we don't need the workaround
anymore.
2023-12-11 19:38:28 +09:00
Takuya ASADA
6d7cb97645 dist: move AmbientCapabilities to scylla-server.service
Since we dropped CentOS7 support, we can now always use AmbientCapabilities
without a systemd version check, so we can move it from capabilities.conf
to scylla-server.service.
However, we still cannot hardcode CAP_PERFMON since it is too new and
only newer kernels support it, so keep it in scylla_post_install.sh
2023-12-11 19:38:28 +09:00
Takuya ASADA
1dc4feb68d Revert "scylla_setup: add warning for CentOS7 default kernel"
This reverts commit 85339d1820.
2023-12-11 19:38:28 +09:00
Aleksandra Martyniuk
ceec5577d8 api: compaction: pass pointer to top level compaction tasks
As a preparation for asynchronous compaction api, from which we
cannot take values by reference, top level compaction tasks get
pointers which need to be set to nullptr when they are not needed
(like in async api).
2023-12-11 11:36:10 +01:00
Nadav Har'El
12f0007ede Merge 'Skip auto snapshot for non-local storages' from Pavel Emelyanov
When a table is truncated or dropped it can be auto-snapshotted if the respective config option is set (by default it is). Non-local storages don't implement snapshotting yet and emit on_internal_error() in that case, aborting the whole process. It's better to skip the snapshot with a warning instead.

Closes scylladb/scylladb#16220

* github.com:scylladb/scylladb:
  database: Do not auto snapshot non-local storages' tables
  database: Simplify snapshot booleans in truncate_table_on_all_shards()
2023-12-11 12:13:48 +02:00
Petr Gusev
b6fbbe28aa storage_service: topology_state_load: fill new token_metadata
For each inet_address-based modification of token_metadata we
make a corresponding host_id-based change in token_metadata->get_new().

The _gossiper.add_saved_endpoint logic is switched to the new token_metadata.
2023-12-11 12:51:34 +04:00
Piotr Dulikowski
e7e1c4e63c storage_service: adjust update_topology_change_info to update new token_metadata
Both versions of the token_metadata need to be updated. For
the new version we provide a dc_rack_fn function which looks
for dc_rack by host_id in topology_state_machine if raft
topology is on. Otherwise, it looks for IP for the given
host_id and falls back to the gossiper-based function
get_dc_rack_for.
2023-12-11 12:51:34 +04:00
Petr Gusev
66c30e4f8e topology: set self host_id on the new topology
With this commit, we begin the next stage of the
refactoring - updating the new version of the token_metadata
in all places where the old version is currently being updated.

In this commit we assign host_id of this node, both in main.cc
and in boost tests.
2023-12-11 12:51:34 +04:00
Petr Gusev
e4253776a1 locator::topology: allow being_replaced and replacing nodes to have the same IP
When we're replacing a node with the same IP address, we want
the following behavior:
  * host_id -> IP mapping should work and return the same IP address for two
  different host_ids - old and new.
  * the IP -> host_id mapping should return the host_id of the old (replaced)
  host.
This variant is most convenient for preserving the current behavior
of the code, especially the functions maybe_remove_node_being_replaced,
erm::get_natural_endpoints_without_node_being_replaced,
erm::get_pending_endpoints. The 'being_replaced' node will be properly removed in
maybe_remove_node_being_replaced and 'replacing' node will be added to
the pending_endpoints.
2023-12-11 12:51:34 +04:00
Petr Gusev
5a1418fdba token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
This commit fixes an inconsistency in method names.
For the endpoint -> host_id conversion there are two methods:
get_host_id, which calls on_internal_error if the endpoint is
not found, and get_host_id_if_known, which returns null.
For the opposite conversion there was only one method,
get_endpoint_for_host_id, and it returned null. In this commit
we change it to call on_internal_error if it can't find the
argument, and add another method,
get_endpoint_for_host_id_if_known, which returns null in this case.

We can't use get_endpoint_for_host_id/get_host_id
in host_id_or_endpoint::resolve since it's called
from storage_service::parse_node_list
-> token_metadata::parse_host_id_and_endpoint,
and exceptions are caught and handled in
`storage_service::parse_node_list`.
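
A minimal sketch of the naming convention, not the real token_metadata
code. The class `endpoint_map` and the string-based `host_id` and
`inet_address` aliases are hypothetical stand-ins, and std::logic_error
stands in for on_internal_error:

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for the real types.
using host_id = std::string;
using inet_address = std::string;

class endpoint_map {
    std::unordered_map<host_id, inet_address> _by_host_id;
public:
    void add(host_id id, inet_address ip) {
        _by_host_id[std::move(id)] = std::move(ip);
    }

    // "_if_known" flavour: a missing entry is an expected outcome.
    std::optional<inet_address> get_endpoint_for_host_id_if_known(const host_id& id) const {
        auto it = _by_host_id.find(id);
        if (it == _by_host_id.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Plain flavour: a missing entry is a bug in the caller.
    // std::logic_error is only a stand-in for on_internal_error.
    inet_address get_endpoint_for_host_id(const host_id& id) const {
        auto ip = get_endpoint_for_host_id_if_known(id);
        if (!ip) {
            throw std::logic_error("endpoint for host_id " + id + " not found");
        }
        return *ip;
    }
};

// Usage example with made-up identifiers.
inline void example_endpoint_map() {
    endpoint_map m;
    m.add("host-1", "10.0.0.1");
    assert(m.get_endpoint_for_host_id("host-1") == "10.0.0.1");
    assert(!m.get_endpoint_for_host_id_if_known("host-2").has_value());
}
```

Code that may legitimately see an unknown host_id and wants to handle it
itself would call the `_if_known` variant.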
2023-12-11 12:51:34 +04:00
Petr Gusev
08b47d645a token_metadata: get_host_id: exception -> on_internal_error
It's a bug to use get_host_id on a non-existent endpoint,
so on_internal_error is more appropriate. Also, it's
easier to debug since it provides a backtrace.

If a missing inet_address is expected, get_host_id_if_known
should be used instead. We update one such case in
storage_service::force_remove_completion. Other
usages of get_host_id are correct.
2023-12-11 12:51:34 +04:00
Petr Gusev
39bbe5f457 token_metadata: add get_all_ips method
This is convenient for migrating code that uses
get_all_endpoints.
2023-12-11 12:51:34 +04:00
Petr Gusev
9edf0709e6 token_metadata: support host_id-based version
In this commit we enhance token_metadata with a pointer to the
new host_id-based generic_token_metadata specialisation (token_metadata2).
The idea is that in the following commits we'll go over all token_metadata
modifications and make the corresponding modifications to its new
host_id-based alternative.

The pointer to token_metadata2 is stored in the
generic_token_metadata::_new_value field. The pointer can be
mutable, immutable, or absent altogether (std::monostate).
It's mutable if this generic_token_metadata owns it, meaning
it was created using the generic_token_metadata(config cfg)
constructor. It's immutable if the
generic_token_metadata(lw_shared_ptr<const token_metadata2> new_value);
constructor was used. This means this old token_metadata is a wrapper for
new token_metadata and we can only use the get_new() method on it. The field
_new_value is empty for the new host_id-based token_metadata version.

The generic_token_metadata(std::unique_ptr<token_metadata_impl<NodeId>> impl, token_metadata2 new_value);
constructor is used for clone methods. We clone both versions,
and we need to pass a cloned token_metadata2 into constructor.

There are two overloads of get_new, for mutable and immutable
generic_token_metadata. Both of them throw an exception if
they can't get the appropriate pointer. There is also a
get_new_strong method, which returns an immutable owning
pointer. This is convenient since a lot of APIs want an
owning pointer. We can't make the get_new/get_new_strong API
simpler and use get_new_strong everywhere since it mutates the
original generic_token_metadata by incrementing the reference
counter, and this causes races when it's passed between
shards in replicate_to_all_cores.
2023-12-11 12:51:34 +04:00
Petr Gusev
63f64f3303 token_metadata: make it a template with NodeId=inet_address/host_id
NodeId is used in all internal token_metadata data structures, that
previously used inet_address. We choose topology::key_kind based
on the value of the template parameter.

generic_token_metadata::update_topology overload with host_id
parameter is added to make update_topology_change_info work,
it now uses NodeId as a parameter type.

topology::remove_endpoint(host_id) is added to make
generic_token_metadata::remove_endpoint(NodeId) work.

pending_endpoints_for and endpoints_for_reading are just removed - they
are not used and not implemented. The declarations were left by mistake
from a refactoring in which these methods were moved to erm.

generic_token_metadata_base is extracted to contain declarations, common
to both token_metadata versions.

Templates are explicitly instantiated inside token_metadata.cc, since
implementation part is also a template and it's not exposed to the header.

There are no other behavioral changes in this commit, just syntax
fixes to make token_metadata a template.
2023-12-11 12:51:34 +04:00
Petr Gusev
c9fbe3d377 locator: make dc_rack_fn a template
In the next commits token_metadata will be
made a template with NodeId=inet_address|host_id
parameter. This parameter will be passed to dc_rack_fn
function, so it also should be made a template.
2023-12-11 12:51:33 +04:00
Piotr Dulikowski
5227b71363 locator/topology: add key_kind parameter
For the host_id-based token_metadata we want host_id
to be the main node key, meaning it should be used
in add_or_update_endpoint to find the node to update.
For the inet_address-based token_metadata version
we want to retain the old behaviour during transition period.

In this commit we introduce key_kind parameter and use
key_kind::inet_address in all current topology usages.
Later we'll use key_kind::host_id for the new token_metadata.

In the last commits of the series, when the new token_metadata
version is used everywhere, we will remove key_kind enum.
2023-12-11 12:51:33 +04:00
Petr Gusev
2f137776c3 token_metadata: topology_change_info: change field types to token_metadata_ptr
In subsequent commits we'll need the following api for token_metadata:
  token_metadata(token_metadata2_ptr);
  get_new() -> token_metadata2*
where token_metadata2 is the new version of token_metadata,
based on host_id.

In other words:
* token_metadata knows the new version of itself and returns a pointer
to it through get_new()
* token_metadata can be constructed based solely on the new version,
without its own implementation. In this case the only method we can
use on it is get_new.

This allows passing token_metadata2 to APIs with token_metadata in the method
signature, if these APIs are known to only use the get_new method on the
passed token_metadata.

And back to topology_change_info - if we got it from the new token_metadata
we want to be able to construct token_metadata from token_metadata2 contained
in it, and this requires it to be a ptr, not value.
2023-12-11 12:51:33 +04:00
Petr Gusev
f21f23483c token_metadata: drop unused method get_endpoint_to_token_map_for_reading
2023-12-11 12:51:22 +04:00
Alexander Turetskiy
f30b5473ab cql: Reject empty options while altering a keyspace
Reject ALTER KEYSPACE requests for NetworkTopologyStrategy when
replication options are missing.

Also reject CREATE KEYSPACE with no replication factor options.
Cassandra has a default_keyspace_rf configuration that may allow such
CREATE KEYSPACE commands, but Scylla doesn't have this option (refs #16028).

fixes #10036

Closes scylladb/scylladb#16221
2023-12-10 17:44:35 +02:00
Kefu Chai
818343b57d build: build session.cc in CMake building system
this source file was added in d3d83869. so let's update cmake
as well.

sessions_tests was added in the same commit, so add it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16344
2023-12-09 22:14:47 +02:00
Avi Kivity
d62a5fc60b Merge 'tools/scylla-nodetool: implement additional commands, part 5/N ' from Botond Dénes
This PR implements the following new nodetool commands:
* decommission
* rebuild
* removenode
* getlogginglevels
* setlogginglevel
* move
* refresh

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#16348

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the refresh command
  tools/scylla-nodetool: implement the move command
  tools/scylla-nodetool: implement setlogginglevel command
  tools/sclla-sstable: implement the getlogginglevels command
  tools/scylla-nodetool: implement the removenode command
  tools/scylla-nodetool: implement the rebuild command
  tools/scylla-nodetool: implement the decommission command
2023-12-09 21:47:22 +02:00
Pavel Emelyanov
5e69415387 guardrails: Do not validate initial_tablets as replication factor
When checking replication strategy options the code assumes (and it's
stated in the preceding code comment) that all options are replication
factors. This is no longer the case: the initial_tablets option is not
a replication factor and should be skipped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16335
2023-12-09 15:56:41 +02:00
Kamil Braun
3352d9bccc docs: describe group 0 schema versioning in raft docs
2023-12-08 17:46:31 +01:00
Kamil Braun
30fc36f8d2 test: add test for group 0 schema versioning
Perform schema changes while mixing nodes in RECOVERY mode with nodes in
group 0 mode:
- schema changes originating from RECOVERY node use
  digest-based schema versioning.
- schema changes originating from group 0
  nodes use persisted versions committed through group 0.

Verify that schema versions are in sync after each schema change, and
that each schema change results in a different version.

Also add a simple upgrade test, performing a schema change before we
enable Raft (which also enables the new versioning feature) in the
entire cluster, then once upgrade is finished.

One important upgrade test is missing, which we should add to dtest:
create a cluster in Raft mode but in a Scylla version that doesn't
understand GROUP0_SCHEMA_VERSIONING. Then start upgrading to a version
that has this patchset. Perform schema changes while the cluster is
mixed, both on non-upgraded and on upgraded nodes. Such a test is
especially important because we're adding a new column to the
`system.scylla_local` table (which we then redact from the schema
definition when we see that the feature is disabled).
2023-12-08 17:46:31 +01:00
Kamil Braun
7dad31c78f feature_service: enable GROUP0_SCHEMA_VERSIONING in Raft mode
As promised in earlier commits:
Fixes: #7620
Fixes: #13957

Also modify two test cases in `schema_change_test` which depend on
the digest calculation method in their checks. Details are explained in
the comments.
2023-12-08 17:46:31 +01:00
Kamil Braun
522540da40 schema_tables: don't delete version cell from scylla_tables mutations from group 0
As explained in the previous commit, we use the new
`committed_by_group0` flag attached to each row of a `scylla_tables`
mutation to decide whether the `version` cell needs to be deleted or
not.

The rest of #13957 is solved by pre-existing code -- if the `version`
column is present in the mutation, we don't calculate a hash for
`schema::version()`, but take the value from the column:

```
table_schema_version schema_mutations::digest(db::schema_features sf)
const {
    if (_scylla_tables) {
        auto rs = query::result_set(*_scylla_tables);
        if (!rs.empty()) {
            auto&& row = rs.row(0);
            auto val = row.get<utils::UUID>("version");
            if (val) {
                return table_schema_version(*val);
            }
        }
    }

    ...
```

The issue will therefore be fixed once we enable
`GROUP0_SCHEMA_VERSIONING`.
2023-12-08 17:46:31 +01:00
Kamil Braun
defcf9915c migration_manager: add committed_by_group0 flag to system.scylla_tables mutations
As described in #13957, when creating or altering a table in group 0
mode, we don't want each node to calculate `schema::version()`s
independently using a hash algorithm. Instead, we want all nodes to
use a single version for that table, committed by the group 0 command.

There's even a column ready for this in `system.scylla_tables` --
`version`. This column is currently being set for system tables, but
it's not being used for user tables.

Similarly to what we did with global schema version in earlier commits,
the obvious thing to do would be to include a live cell for the `version`
column in the `system.scylla_tables` mutation when we perform the schema
change in Raft mode, and to include a tombstone when performing it
outside of Raft mode, for the RECOVERY case.

But it's not that simple because as it turns out, we're *already*
sending a `version` live cell (and also a tombstone, with timestamp
decremented by 1) in all `system.scylla_tables` mutations. But then we
delete that cell when doing schema merge (which begs the question
why were we sending it in the first place? but I digress):
```
        // We must force recalculation of schema version after the merge, since the resulting
        // schema may be a mix of the old and new schemas.
        delete_schema_version(mutation);
```
the above function removes the `version` cell from the mutation.

So we need another way of distinguishing the cases of schema change
originating from group 0 vs outside group 0 (e.g. RECOVERY).

The method I chose is to extend `system.scylla_tables` with a boolean
column, `committed_by_group0`, and extend schema mutations to set
this column.

In the next commit we'll decide whether or not the `version` cell should
be deleted based on the value of this new column.
2023-12-08 17:46:31 +01:00
Kamil Braun
87b2c8a041 schema_tables: use schema version from group 0 if present
As promised in the previous commit, if we persisted a schema version
through a group 0 command, use it after a schema merge instead of
calculating a digest.

Ref: #7620

The above issue will be fixed once we enable the
`GROUP0_SCHEMA_VERSIONING` feature.
2023-12-08 17:46:31 +01:00
Kamil Braun
3db8ac80cb migration_manager: store group0_schema_version in scylla_local during schema changes
We extend schema mutations with an additional mutation to the
`system.scylla_local` table which:
- in Raft mode, stores a UUID under the `group0_schema_version` key.
- outside Raft mode, stores a tombstone under that key.

As we will see in later commits, nodes will use this after applying
schema mutations. If the key is absent or has a tombstone, they'll
calculate the global schema digest on their own -- using the old way. If
the key is present, they'll take the schema version from there.

The Raft-mode schema version is equal to the group 0 state ID of this
schema command.

The tombstone is necessary for the case of performing a schema change in
RECOVERY mode. It will force a revert to the old digest-based way.

Note that extending schema mutations with a `system.scylla_local`
mutation is possible thanks to earlier commits which moved
`system.scylla_local` to schema commitlog, so all mutations in the
schema mutations vector still go to the same commitlog domain.

Also, since we introduce a replicated tombstone to
`system.scylla_local`, we need to set GC grace to nonzero. We set it to
`schema_gc_grace`, which makes sense given the use case.
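
A minimal sketch of the version selection described above, assuming a
simplified model of the `group0_schema_version` entry. The struct
`group0_version_cell`, the function names, and the string-based version
values are hypothetical and are not the actual schema_tables code:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the `group0_schema_version` entry in
// system.scylla_local: absent (nullopt), a tombstone, or a live value.
struct group0_version_cell {
    bool tombstone = false;  // written outside Raft mode, e.g. in RECOVERY
    std::string uuid;        // group 0 state ID of the schema command, when live
};

// Stand-in for the old digest-based calculation.
inline std::string calculate_digest(const std::string& schema) {
    return "digest(" + schema + ")";
}

// Absent key or tombstone: fall back to the digest.
// Live key: use the persisted version.
inline std::string pick_schema_version(const std::optional<group0_version_cell>& cell,
                                       const std::string& schema) {
    if (!cell || cell->tombstone) {
        return calculate_digest(schema);
    }
    return cell->uuid;
}

// Usage example covering the three cases.
inline void example_pick_schema_version() {
    const std::string schema = "s";
    assert(pick_schema_version(std::nullopt, schema) == "digest(s)");
    assert(pick_schema_version(group0_version_cell{true, ""}, schema) == "digest(s)");
    assert(pick_schema_version(group0_version_cell{false, "state-id-1"}, schema) == "state-id-1");
}
```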
2023-12-08 17:45:41 +01:00
Botond Dénes
496459165e tools/scylla-nodetool: implement the refresh command
2023-12-08 08:58:16 -05:00
Botond Dénes
ad148a9dbc tools/scylla-nodetool: implement the move command
In the java nodetool, this command ends up calling an API endpoint which
just throws an exception saying moving tokens is not supported. So in
the native implementation we just throw an exception to the same effect
in scylla-nodetool itself.
2023-12-08 08:29:39 -05:00
Botond Dénes
58d3850da1 tools/scylla-nodetool: implement setlogginglevel command
2023-12-08 08:18:56 -05:00
Botond Dénes
3a8590e1af tools/sclla-sstable: implement the getlogginglevels command
2023-12-08 07:32:45 -05:00
Botond Dénes
c35ed794de tools/scylla-nodetool: implement the removenode command
2023-12-08 07:32:31 -05:00
Botond Dénes
9a484cb145 tools/scylla-nodetool: implement the rebuild command
2023-12-08 07:05:30 -05:00
Botond Dénes
ea62f7c848 tools/scylla-nodetool: implement the decommission command
2023-12-08 06:14:36 -05:00
Kefu Chai
893f319004 sstables: add formatter for index_consume_entry_context_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, in order to enable the code in the header to
access the formatter without being moved down after the full specialization's
definition, we

* move the enum definition out of the class and before the
  class,
* rename the enum's name from state to index_consume_entry_context_state
* define a formatter for index_consume_entry_context_state
* remove its operator<<().

as fmt v10 is able to use `format_as()` as a fallback, the formatter
full specialization is guarded with `#if FMT_VERSION < 10'00'00`. we
will remove it after we start building with fmt v10.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16204
2023-12-08 12:45:38 +02:00
Kurashkin Nikita
c071cd92b5 cql3:statement_restrictions.cc add more conditions to prevent "allow filtering" error to pop up in delete/update statements
Modified Cassandra tests to check for Scylla's error messages
Fixes #12474

Closes scylladb/scylladb#15811
2023-12-07 21:25:18 +02:00
Avi Kivity
9c0f05efa1 Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec
Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later.

This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted.

The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained.

The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was.

This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas.

Closes scylladb/scylladb#15847

* github.com:scylladb/scylladb:
  test: tablets: Add test for failed streaming being fenced away
  error_injection: Introduce poll_for_message()
  error_injection: Make is_enabled() public
  api: Add API to kill connection to a particular host
  range_streamer: Do not block topology change barriers around streaming
  range_streamer, tablets: Do not keep token metadata around streaming
  tablets: Fail gracefully when migrating tablet has no pending replica
  storage_service, api: Add API to disable tablet balancing
  storage_service, api: Add API to migrate a tablet
  storage_service, raft topology: Run streaming under session topology guard
  storage_service, tablets: Use session to guard tablet streaming
  tablets: Add per-tablet session id field to tablet metadata
  service: range_streamer: Propagate topology_guard to receivers
  streaming: Always close the rpc::sink
  storage_service: Introduce concept of a topology_guard
  storage_service: Introduce session concept
  tablets: Fix topology_metadata_guard holding on to the old erm
  docs: Document the topology_guard mechanism
2023-12-07 16:29:02 +02:00
Avi Kivity
4b1ef00dbb Merge 'File stream for tablet preparation' from Asias He
This series adds preparation patches for file stream tablet implementation in enterprise branch. It minimizes the differences between those two branches.

Closes scylladb/scylladb#16297

* github.com:scylladb/scylladb:
  messaging_service: Introduce STREAM_BLOB and TABLET_STREAM_FILES verb
  compaction_group_for_token: Handle minimum_token and maximum_token token
  serializer: Add temporary_buffer support
  cql_test_env: Allow messaging_service to start listen
2023-12-07 16:26:22 +02:00
Pavel Emelyanov
3eaadfcd4a database: Do not auto snapshot non-local storages' tables
Snapshotting is not yet supported for those (see #13025) and
auto-snapshot would step on internal error. Skip it and print a warning
into logs

fixes #16078

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-07 13:47:12 +03:00
Avi Kivity
ed2a9b8750 Merge 'Commitlog: Fix reading/writing position calculations and allocation size checks' from Calle Wilund
Fixes #16298

The adjusted buffer position calculation in buffer_position(), introduced in https://github.com/scylladb/scylladb/pull/15494
was in fact broken. It calculated (like previously) a "position" based on diff between
underlying buffer size and ostream size() (i.e. avail), then adjusted this according to
sector overhead rules.

However, the underlying buffer size is in unadjusted terms, and the ostream is adjusted.
The two cannot be compared as such, which means the "positions" we get here are borked.

Luckily for us (sarcasm), the position calculation in replayer made a similar error,
in that it adjusts up current position by one sector overhead too much, leading to us
more or less getting the same, erroneous results in both ends.

However, when/iff one needs to adjust the segment file format further, one might very
quickly realize that this does not work well if, say, one needs to be able to safely
read some extra bytes before first chunk in a segment. Conversely, trying to adjust
this also exposes a latent potential error in the skip mechanism, manifesting here.

Issue fixed by keeping track of the initial ostream capacity for segment buffer, and
use this for position calculation, and in the case of replayer, move file pos adjustment
from read_data() to subroutine (shared with skipping), that better takes data stream
position vs. file position adjustment. In implementation terms, we first inc the
"data stream" pos (i.e. pos in data without overhead), then adjust for overhead.

Also fix replayer::skip, so that we handle the buffer/pos relation correctly now.

Added test for initial entry position, as well as data replay consistency for single
entry_writer paths.

Fixes #16301

The calculation on whether data may be added is based on position vs. size of incoming data.
However, it did not take sector overhead into account, which led us to writing past allowed
segment end, which in turn also leads to metrics overflows.

Closes scylladb/scylladb#16302

* github.com:scylladb/scylladb:
  commitlog: Fix allocation size check to take sector overhead into account.
  commitlog: Fix commitlog_segment::buffer_position() calculation and replay counterpart
2023-12-07 12:27:54 +02:00
Pavel Emelyanov
44c076472c database: Simplify snapshot booleans in truncate_table_on_all_shards()
There are three of them in this function -- with_snapshot argument,
auto_snapshot local copy of db::config option and the should_snapshot
local variable that's && of the above two. The code can go with just one

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-07 13:06:28 +03:00
Botond Dénes
fb9379edf1 test/cql-pytest: test_select_from_mutation_fragments: bump timeout for slow test
The test test_many_partitions is very slow, as it tests a slow scan over
a lot of partitions. This was observed to time out on the slower ARM
machines, making the test flaky. To prevent this, create an
extra-patient cql connection with a 10 minutes timeout for the scan
itself.

Fixes: #16145

Closes scylladb/scylladb#16303
2023-12-07 11:55:53 +02:00
Yaniv Kaul
862909ee4f Typos: fix typos in documentation
Using codespell, went over the docs and fixed some typos.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#16275
2023-12-07 11:10:17 +02:00
Anna Stuchlik
8b01cb7fb8 doc: set 5.4 as the latest stable version
This commit updates the configuration for
ScyllaDB documentation so that:
- 5.4 is the latest version.
- 5.4 is removed from the list of unstable versions.

It must be merged when ScyllaDB 5.4 is released.

No backport is required.

Closes scylladb/scylladb#16308
2023-12-07 10:04:26 +02:00
Pavel Emelyanov
76705b6ba2 test/s3: Avoid object range overflow
There's a test case that validates uploading sink by getting random
portions of the uploaded object. The portions are generated as

   len = random % chunk_size
   off = random % file_size - len

The latter may apparently render negative value which will translate
into huuuuge 64-bit range offset which, in turn, would result in invalid
http range specifier and getting object part fails with status OK
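
A minimal sketch of the failure mode, written in Python rather than the
test's own language. The `2**64` modulus mimics unsigned 64-bit arithmetic;
the function names, the sample values and the `safe_range` variant are
illustrative assumptions, not the code from this patch. It assumes
file_size > 0.

```python
U64 = 2**64  # modulus mimicking unsigned 64-bit arithmetic


def buggy_range(rnd_len, rnd_off, chunk_size, file_size):
    """Reproduce the original formula; the subtraction may wrap around."""
    length = rnd_len % chunk_size
    offset = (rnd_off % file_size - length) % U64
    return offset, length


def safe_range(rnd_len, rnd_off, chunk_size, file_size):
    """One possible fix: choose the offset so offset + length fits the file."""
    length = rnd_len % min(chunk_size, file_size)
    offset = rnd_off % (file_size - length + 1)
    return offset, length
```

With rnd_off being a multiple of file_size, buggy_range() computes
0 - length and wraps to an offset close to 2**64, while safe_range() keeps
the whole portion inside the object.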

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-07 10:54:54 +03:00
Pavel Emelyanov
3e9309caf4 s3/client: Handle GET-with-Range overflows correctly
The get_object_contiguous() accepts optional range argument in a form of
offset:length and then converts it into first_byte:last_byte pair to
satisfy http's Range header range-specifier.

If the last_byte, which is offset + length - 1, overflows 64-bits the
range specifier becomes invalid. According to RFC9110 servers may ignore
invalid ranges if they want to and this is what minio does.

The result is pretty interesting. Since the range is specified, client
expects PartialContent response, but since the range is ignored by server
the result is OK, as if the full object was requested. So instead of
some sane "overflow" error, the get_object_contiguous() fails with
status "success".

The fix is in pre-checking provided ranges and failing early
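
The pre-check can be sketched as follows. This is an illustration in
Python, not the client's code: `range_header` and the exception types are
hypothetical names, and only the `bytes=first-last` form of the Range
header is taken from the HTTP specification.

```python
U64_MAX = 2**64 - 1  # largest value a 64-bit unsigned offset can hold


def range_header(offset, length):
    """Build a Range header value from offset:length, failing early on overflow."""
    if length <= 0:
        raise ValueError("range length must be positive")
    last_byte = offset + length - 1
    if offset > U64_MAX or last_byte > U64_MAX:
        # Refuse locally instead of sending a specifier the server may ignore
        raise OverflowError("last byte of the range does not fit in 64 bits")
    return f"bytes={offset}-{last_byte}"
```

Failing before the request is sent means the caller sees an overflow error
rather than an unexpected full-object response.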

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-07 10:50:55 +03:00
Calle Wilund
dba39b47bd commitlog: Fix allocation size check to take sector overhead into account.
Fixes #16301

The calculation on whether data may be added is based on position vs. size of incoming data.
However, it did not take sector overhead into account, which led us to writing past allowed
segment end, which in turn also leads to metrics overflows.
2023-12-07 07:36:27 +00:00
Calle Wilund
0d35c96ef4 commitlog: Fix commitlog_segment::buffer_position() calculation and replay counterpart
Fixes #16298

The adjusted buffer position calculation in buffer_position(), introduced in #15494
was in fact broken. It calculated (like previously) a "position" based on diff between
underlying buffer size and ostream size() (i.e. avail), then adjusted this according to
sector overhead rules.

However, the underlying buffer size is in unadjusted terms, and the ostream is adjusted.
The two cannot be compared as such, which means the "positions" we get here are borked.

Luckily for us (sarcasm), the position calculation in replayer made a similar error,
in that it adjusts up current position by one sector overhead too much, leading to us
more or less getting the same, erroneous results in both ends.

However, when/iff one needs to adjust the segment file format further, one might very
quickly realize that this does not work well if, say, one needs to be able to safely
read some extra bytes before first chunk in a segment. Conversely, trying to adjust
this also exposes a latent potential error in the skip mechanism, manifesting here.

Issue fixed by keeping track of the initial ostream capacity for segment buffer, and
use this for position calculation, and in the case of replayer, move file pos adjustment
from read_data() to subroutine (shared with skipping), that better takes data stream
position vs. file position adjustment. In implementation terms, we first inc the
"data stream" pos (i.e. pos in data without overhead), then adjust for overhead.

Also fix replayer::skip, so that we handle the buffer/pos relation correctly now.

Added test for initial entry position, as well as data replay consistency for single
entry_writer paths.
2023-12-07 07:36:27 +00:00
Asias He
6beadab9e6 messaging_service: Introduce STREAM_BLOB and TABLET_STREAM_FILES verb
They will be used to implement file stream for tablet in the future. Reserve
the verb ID.
2023-12-07 14:54:12 +08:00
Asias He
67cfa12c7d compaction_group_for_token: Handle minimum_token and maximum_token token
The following error was seen:

[shard 0] table - compaction_group_for_token: compaction_group idx=0 range=(minimum
token,-6917529027641081857] does not contain token=minimum token

Since minimum_token or maximum_token will not be inside a token range, skip
the in-token-range check for them.
2023-12-07 14:54:12 +08:00
Asias He
974b28a750 serializer: Add temporary_buffer support
It will be used by file stream for tablet.
2023-12-07 09:46:37 +08:00
Asias He
faaf58f62c cql_test_env: Allow messaging_service to start listen
This is needed for rpc calls to work in the tests. With this patch, by
default, messaging_service does not listen as it was before.

This is useful for file stream for tablet test.
2023-12-07 09:46:36 +08:00
Avi Kivity
92d61def57 Merge 'scylla_swap_setup: run error check before allocating swap and increase swap allocation speed' from Takuya ASADA
This patch fixes the error check and speeds up swap allocation.

Following patches are included:
 - scylla_swap_setup: run error check before allocating swap
   avoid creating the swapfile before running the error check
 - scylla_swap_setup: use fallocate on ext4
   this increases swap allocation speed on ext4

Closes scylladb/scylladb#12668

* github.com:scylladb/scylladb:
  scylla_swap_setup: use fallocate on ext4
  scylla_swap_setup: run error check before allocating swap
2023-12-06 21:40:10 +02:00
Avi Kivity
55dacb8480 Merge 'Generalize atomic sstables deletion' from Pavel Emelyanov
The current implementation starts in sstables_manager that gets the deletion function from storage which, in turn, should atomically do sst.unlink() over a list of sstables (s3 driver is still not atomic though #13567).

This PR generalizes the atomic deletion inside sstables_manager method and removes the atomic deletor function that nobody liked when it was introduced (#13562)

Closes scylladb/scylladb#16290

* github.com:scylladb/scylladb:
  sstables/storage: Drop atomic deleter
  sstables/storage: Reimplement atomic deletion in sstables_manager
  sstables/storage: Add prepare/complete skaffold for atomic deletion
2023-12-06 19:48:07 +02:00
Tomasz Grabiec
7d0f4c10a2 test: tablets: Add test for failed streaming being fenced away 2023-12-06 18:37:01 +01:00
Tomasz Grabiec
083a0279a9 error_injection: Introduce poll_for_message()
To allow more complex waiting, which involves other exit conditions.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
ce0dc9e940 error_injection: Make is_enabled() public 2023-12-06 18:36:17 +01:00
Tomasz Grabiec
733eb21601 api: Add API to kill connection to a particular host
For testing failure scenarios.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
9dac0febce range_streamer: Do not block topology change barriers around streaming
Streaming was keeping effective_replication_map_ptr around the whole
process, which blocks topology change barriers.

This will inhibit progress of tablet load balancer or concurrent
migrations, resulting in worse performance.

Fix by switching to the most recent erm on sharder
calls. multishard_writer calls shard_of() for each new partition.

A better way would be to switch immediately when topology version
changes, but this is left for later.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
c228f2c940 range_streamer, tablets: Do not keep token metadata around streaming
It holds back global token metadata barrier during streaming, which
limits parallelism of load balancing.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
7a59acf248 tablets: Fail gracefully when migrating tablet has no pending replica
Before the patch we SIGSEGV trying to access pending replica in this
case. Fail early instead.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
d1c1b59236 storage_service, api: Add API to disable tablet balancing
Load balancing needs to be disabled before making a series of manual
migrations so that we don't fight with the load balancer.

Also will be used in tests to ensure tablets stick to expected locations.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
1f57d1ea28 storage_service, api: Add API to migrate a tablet
Will be used in tests, or for hot fixes in production.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
31c995332c storage_service, raft topology: Run streaming under session topology guard
Prevents stale streaming operation from running beyond topology
operation they were started in. After the session field is cleared, or
changed to something else, the old topology_guard used by streaming is
interrupted and fenced and the next barrier will join with any
remaining work.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
080169cad6 storage_service, tablets: Use session to guard tablet streaming 2023-12-06 18:36:17 +01:00
Tomasz Grabiec
5381792401 tablets: Add per-tablet session id field to tablet metadata
range_streamer will pick it up when creating topology_guard.

It's materialized in memory only for migrating tablets in
tablet_transition_info.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
fd3c089ccc service: range_streamer: Propagate topology_guard to receivers 2023-12-06 18:36:16 +01:00
Tomasz Grabiec
063095ea50 streaming: Always close the rpc::sink
rpc::sink::~sink aborts if not closed. There is a try/catch clause
which ensures that close() is called, but there was code after sink is
created which is not covered by it. Move sink construction past that
code.
2023-12-06 18:35:41 +01:00
Nadav Har'El
300e549267 tablets, mv: disable self-pairing when tablets are used
A write to a base table can generate one or more writes to a materialized
view. The write to RF base replicas need to cause writes to RF view
replicas. Our MV implementation, based on Cassandra's implementation,
does this via "pairing": Each one of the base replicas involved in this
write sends each view update to exactly one view replica. The function
get_view_natural_endpoint() tells a base replica which of the view
replicas it should send the update to.

The standard pairing is based on the ring order: The first owner of the
base token sends to the first owner of the view token, the second to the
second, and so on. However, the existing code also uses an optimization
we call self-pairing: If a single node is both a base replica and a view
replica, the pairing is modified so this node sends the update to itself.
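
The two pairing modes can be sketched as below. This is a simplified model
in Python, not the body of get_view_natural_endpoint(): the function name,
the plain-list representation of replicas in ring order, and the way shared
nodes are dropped before pairing the rest are assumptions made for
illustration.

```python
def paired_view_replica(me, base_replicas, view_replicas, self_pairing):
    """Return the view replica that base replica `me` sends its update to.

    Both lists are assumed to be in ring order and of equal length (RF).
    """
    base = list(base_replicas)
    view = list(view_replicas)
    if self_pairing:
        if me in view:
            return me  # a node holding both roles updates itself
        # Nodes paired with themselves drop out; the rest pair in order
        shared = set(base) & set(view)
        base = [n for n in base if n not in shared]
        view = [n for n in view if n not in shared]
    return view[base.index(me)]
```

For base replicas [A, B, C] and view replicas [C, D, E], plain ring-order
pairing gives A->C, B->D, C->E, while self-pairing gives C->C, A->D, B->E.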

This patch *disables* the self-pairing optimization in keyspaces that
use tablets:

The self-pairing optimization can cause the pairing to change after
token ranges are moved between nodes, so it can break base-view consistency
in some edge cases, leading to "ghost rows". With tablets, these range
movements become even more frequent - they can happen even if the
cluster doesn't grow.  This is why we want to solve this problem for tablets.

For backward compatibility and to avoid sudden inconsistencies emerging
during upgrades, we decided to continue using the self-pairing optimization
for keyspaces that are *not* using tablets (i.e., using vnodes).

Currently, we don't introduce a "CREATE MATERIALIZED VIEW" option to
override these defaults - i.e., we don't provide a way to disable
self-pairing with vnodes or to enable it with tablets. We could introduce
such a schema flag later, if we ever want to (and I'm not sure we want to).

It's important to note, that in some cases, this change has implications
on when view updates become synchronous, in the tablets case.
For example:

  * If we have 3 nodes and RF=3, with the self-pairing optimization each
    node is paired with itself, the view update is local, and is
    implicitly synchronous (without requiring a "synchronous_updates"
    flag).
  * In the same setup with tablets, without the self-pairing optimization
    (due to this patch), this is not guaranteed. Some view updates may not
    be synchronous, i.e., the base write will not wait for the view
    write. If the user really wants synchronous updates, they should
    be requested explicitly, with the "synchronous_updates" view option.

Fixes #16260.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16272
2023-12-06 17:11:17 +02:00
Kefu Chai
f483309165 compaction, api: drop unused functions
run_on_existing_tables() is not used at all. and we have two of them.
in this change, let's drop them.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16304
2023-12-06 14:31:08 +02:00
Takuya ASADA
f90c10260f scylla_post_install.sh: Add CAP_PERFMON to AmbientCapabilities
Add CAP_PERFMON to AmbientCapabilities in capabilities.conf, to enable
perf_event based stall detector in Seastar.

However, on Debian/Ubuntu CAP_PERFMON with non-root user does not work
because it sets kernel.perf_event_paranoid=4 which disallows all non-root
user access.
(On Debian it sets kernel.perf_event_paranoid=3)
So we need to configure kernel.perf_event_paranoid=2 on these distros.
see: https://askubuntu.com/questions/1400874/what-does-perf-paranoia-level-four-do

Also, CAP_PERFMON is only available on linux-5.8+, older kernel does not
have this capability.
To enable older kernel environment such as CentOS7, we need to configure
kernel.perf_event_paranoid=1 to allow non-root user access even without
the capability.
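
The rules above can be summarized as a small decision function. This is a
sketch in Python, not the logic of scylla_post_install.sh: the function
name, the distro strings and the tuple form of the kernel version are
hypothetical, and returning None stands for "leave the sysctl untouched".

```python
def perf_event_paranoid_target(distro, kernel_version):
    """Pick the kernel.perf_event_paranoid value described above, if any.

    distro is a lowercase id such as "ubuntu"; kernel_version is a
    (major, minor) tuple.
    """
    if kernel_version < (5, 8):
        # No CAP_PERFMON on this kernel: allow non-root access via the sysctl
        return 1
    if distro in ("debian", "ubuntu"):
        # These distros default to 3 or 4, which blocks non-root users
        return 2
    return None  # CAP_PERFMON alone is expected to be enough
```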

Fixes #15743

Closes scylladb/scylladb#16070
2023-12-06 13:53:08 +02:00
Avi Kivity
3e8f37f0a4 Update seastar submodule
* seastar 55a821524d...ae8449e04f (22):
  > Revert "Merge 'reactor: merge pollfn on I/O paths into a single one' from Kefu Chai"
  > http/exception: Make unexpected status message more informative
  > docker: bump up to clang {16,17} and gcc {12,13}
  > doc: replace space (0xA0) in unicode with ASCII space (0x20)
  > file: Remove reactor class friendship
  > dpdk: adjust for poller in internal namespace
  > http: make_requests accept optional expected
  > Merge 'future: future_state_base: assert owner shard in debug mode' from Benny Halevy
  > Merge 'Keep pollers in internal/poll.hh' from Pavel Emelyanov
  > sharded: access instance promise only on instance shard
  > test: network_interface_test: add tests for format and parse
  > Merge 'reactor: merge pollfn on I/O paths into a single one' from Kefu Chai
  > reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc (v2)
  > reactor: set local_engine after it is fully initialized
  > build: do not error when running into GCC BZ-1017852
  > Merge 'shared_future: make available() immediate after set_value()' from Piotr Dulikowski
  > tls: add format_as(subject_alt_name_type) overload
  > tls: linearize small packets on send
  > shared_future: remove unused #include
  > shared_ptr: add fmt::formatter for shared_ptr types
  > lazy: add fmt::formatter for lazy_eval types
  > Merge 'file: use unbuffered generator in experimental_list_directory()' from Kefu Chai

Closes scylladb/scylladb#16274
2023-12-06 13:24:53 +02:00
Kamil Braun
9b73bff752 docs: raft: mention unavailability for topology changes under quorum loss
Closes scylladb/scylladb#16307
2023-12-06 13:18:28 +02:00
Botond Dénes
56c3515751 Merge 'doc: fix Rust Driver release information' from Anna Stuchlik
This PR removes the incorrect information that the ScyllaDB Rust Driver is not GA.

In addition, it replaces "Scylla" with "ScyllaDB".

Fixes https://github.com/scylladb/scylladb/issues/16178

(nobackport)

Closes scylladb/scylladb#16199

* github.com:scylladb/scylladb:
  doc: remove the "preview" label from Rust driver
  doc: fix Rust Driver release information
2023-12-06 08:59:49 +02:00
Botond Dénes
d2a88cd8de Merge 'Typos: fix typos in code' from Yaniv Kaul
Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255

Closes scylladb/scylladb#16289

* github.com:scylladb/scylladb:
  Update unified/build_unified.sh
  Update main.cc
  Update dist/common/scripts/scylla-housekeeping
  Typos: fix typos in code
2023-12-06 07:36:41 +02:00
Avi Kivity
12f160045b Merge 'Get rid of fb_utilities' from Benny Halevy
utils::fb_utilities is a global in-memory registry for storing and retrieving broadcast_address and broadcast_rpc_address.
As part of the effort to get rid of all global state, this series gets rid of fb_utilities.
This will eventually allow e.g. cql_test_env to instantiate multiple scylla server nodes, each serving on its own address.

Closes scylladb/scylladb#16250

* github.com:scylladb/scylladb:
  treewide: get rid of now unused fb_utilities
  tracing: use locator::topology rather than fb_utilities
  streaming: use locator::topology rather than fb_utilities
  raft: use locator::topology/messaging rather than fb_utilities
  storage_service: use locator::topology rather than fb_utilities
  storage_proxy: use locator::topology rather than fb_utilities
  service_level_controller: use locator::topology rather than fb_utilities
  misc_services: use locator::topology rather than fb_utilities
  migration_manager: use messaging rather than fb_utilities
  forward_service: use messaging rather than fb_utilities
  messaging_service: accept broadcast_addr in config rather than via fb_utilities
  messaging_service: move listen_address and port getters inline
  test: manual: modernize message test
  table: use gossiper rather than fb_utilities
  repair: use locator::topology rather than fb_utilities
  dht/range_streamer: use locator::topology rather than fb_utilities
  db/view: use locator::topology rather than fb_utilities
  database: use locator::topology rather than fb_utilities
  db/system_keyspace: use topology via db rather than fb_utilities
  db/system_keyspace: save_local_info: get broadcast addresses from caller
  db/hints/manager: use locator::topology rather than fb_utilities
  db/consistency_level: use locator::topology rather than fb_utilities
  api: use locator::topology rather than fb_utilities
  alternator: ttl: use locator::topology rather than fb_utilities
  gossiper: use locator::topology rather than fb_utilities
  gossiper: add get_this_endpoint_state_ptr
  test: lib: cql_test_env: pass broadcast_address in cql_test_config
  init: get_seeds_from_db_config: accept broadcast_address
  locator: replication strategies: use locator::topology rather than fb_utilities
  locator: topology: add helpers to retrieve this host_id and address
  snitch: pass broadcast_address in snitch_config
  snitch: add optional get_broadcast_address method
  locator: ec2_multi_region_snitch: keep local public address as member
  ec2_multi_region_snitch: reindent load_config
  ec2_multi_region_snitch: coroutinize load_config
  ec2_snitch: reindent load_config
  ec2_snitch: coroutinize load_config
  thrift: thrift_validation: use std::numeric_limits rather than fb_utilities
2023-12-05 19:40:14 +02:00
Eliran Sinvani
d1aaca893c install-dependencies.sh: Complete the pip install logic
install-dependencies.sh includes a list of pip packages that the build
environment requires.
This functionality was added in
729d0feef0, however, the actual use of the
list is missing and instead the `pip install` commands are hard coded
into the logic.

This change completes the transition to pip-packages list.
It includes also modifying the `pip_packages` array to include a
constraint (if needed) for every package.

Fixes #16269

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes scylladb/scylladb#16282
2023-12-05 16:35:31 +02:00
Benny Halevy
0bcce35abd treewide: get rid of now unused fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 16:22:49 +02:00
Benny Halevy
f8a957898b tracing: use locator::topology rather than fb_utilities
Get my_address via query_processor->proxy and pass it
to all static make_ methods, instead of getting it from
utils::fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 16:22:15 +02:00
Benny Halevy
6f7de427f0 streaming: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 16:12:11 +02:00
Anna Stuchlik
409e20e5ab doc: enabling experimental Raft-managed topology
This commit adds a short paragraph to the Raft
page to explain how to enable consistent
topology updates with Raft - an experimental
feature in version 5.4.

The paragraph should satisfy the requirements
for version 5.4. The Raft page will be
rewritten in the next release when consistent
topology changes with Raft will be GA.

Fixes https://github.com/scylladb/scylladb/issues/15080

Requires backport to branch-5.4.

Closes scylladb/scylladb#16273
2023-12-05 14:49:17 +01:00
Pavel Emelyanov
b9abd504be sstables/storage: Drop atomic deleter
Now the deleter function is not in use and can be dropped

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 16:47:52 +03:00
Pavel Emelyanov
604279f064 sstables/storage: Reimplement atomic deletion in sstables_manager
Right now the atomic deletion is called on manager, but it gets the
actual deletion function from storage and off-loads the deletion to it.
This patch makes the manager fully responsible for the deletion by
implementing the sequence of

    auto ctx = storage.prepare()
    for sst in sstables:
        sst.unlink()
    storage.complete(ctx)

Storage implementations provide the prepare/complete methods. The
filesystem storage does it via deletion log and the s3 storage is still
not atomic :(
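
A runnable model of that sequence, in Python. Everything here is
hypothetical and only mirrors the shape described above: `LoggingStorage`
imitates a storage that brackets the deletion with a log, sstables are
plain strings, and unlink() lives on the storage object for brevity.

```python
class LoggingStorage:
    """Hypothetical storage that brackets a batch deletion with a log."""

    def __init__(self):
        self.events = []

    def prepare(self, sstables):
        # Record what is about to be deleted and hand back an opaque context
        self.events.append(("log-written", tuple(sstables)))
        return {"pending": list(sstables)}

    def unlink(self, sst):
        self.events.append(("unlink", sst))

    def complete(self, ctx):
        # Deletion finished, the log is no longer needed
        self.events.append(("log-removed", tuple(ctx["pending"])))


def delete_atomically(storage, sstables):
    """Manager-side sequence: prepare once, unlink each sstable, complete."""
    ctx = storage.prepare(sstables)
    for sst in sstables:
        storage.unlink(sst)
    storage.complete(ctx)
```

Passing the whole list in one call produces a single log entry for the
batch instead of one per sstable.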

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 16:46:01 +03:00
Pavel Emelyanov
4ecf4c4a6a sstables/storage: Add prepare/complete skaffold for atomic deletion
The atomic deletion is going to look like

    auto ctx = storage.prepare()
    for sst in sstables:
        sst.unlink()
    storage.complete(ctx)

and this patch prepares the class storage for that by extending it with
prepare and complete methods. The opaque ctx object is also here

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 16:44:13 +03:00
Yaniv Kaul
fef565482c Update unified/build_unified.sh
fix sentence overall
2023-12-05 15:23:38 +02:00
Yaniv Kaul
8f97429b16 Update main.cc
fix sentence overall, not just the typo
2023-12-05 15:21:48 +02:00
Yaniv Kaul
f2b810a16a Update dist/common/scripts/scylla-housekeeping
cobvert -> convert
2023-12-05 15:20:35 +02:00
Yaniv Kaul
ae2ab6000a Typos: fix typos in code
Fixes some more typos as found by codespell run on the code.
In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255
2023-12-05 15:18:11 +02:00
Tomasz Grabiec
0e42fe4c3c storage_service: Introduce concept of a topology_guard
topology_guard is used to track distributed operations started by the
topology change coordinator, e.g. streaming, to make sure that those
operations have no side effects after topology change coordinator
moved to the next migration stage, of a given tablet or of the whole
ring.

topology_guard can be sent over the wire in the form of
frozen_topology_guard. It can be materialized again on the other
side. While in transit, it doesn't block the coordinator barriers. But
if the coordinator moved on, materialization of the guard will
fail. So tracking safety is preserved.

In this patch, the guard implementation is based on tracking work
under global sessions, but the concept is flexible and other
mechanisms can be used without changing user code.
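
The "materialization fails once the coordinator moved on" property can be
modelled as below. This is a toy sketch in Python, not the storage_service
code: `SessionRegistry`, `StaleSessionError` and the use of a bare session
id as the frozen guard are assumptions made for illustration.

```python
class StaleSessionError(Exception):
    """Raised when a guard refers to a session that was already closed."""


class SessionRegistry:
    """Hypothetical per-node registry of open sessions."""

    def __init__(self):
        self._open = set()

    def open(self, session_id):
        self._open.add(session_id)

    def close(self, session_id):
        self._open.discard(session_id)

    def materialize(self, frozen_guard):
        """Turn a frozen guard (here just a session id) into a live guard."""
        if frozen_guard not in self._open:
            raise StaleSessionError(f"session {frozen_guard} is closed")
        return ("guard", frozen_guard)
```

A receiver that gets a frozen guard after the session was closed cannot
start new work under it, which is what keeps stale operations fenced.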
2023-12-05 14:09:35 +01:00
Tomasz Grabiec
d3d83869ce storage_service: Introduce session concept 2023-12-05 14:09:34 +01:00
Tomasz Grabiec
2d4cd9c574 tablets: Fix topology_metadata_guard holding on to the old erm
Since abort callbacks are fired synchronously, we must change the
table's erm before we do that so that the callbacks obtain the new
erm.

Otherwise, we will block barriers.
2023-12-05 14:09:34 +01:00
Tomasz Grabiec
6cd310fc1a docs: Document the topology_guard mechanism 2023-12-05 14:09:34 +01:00
Botond Dénes
5fb0d667cb tools/scylla-sstable: always read scylla.yaml
Currently, scylla.yaml is read conditionally, if either the user
provided `--scylla-yaml-file` command line parameter, or if deducing the
data dir location from the sstable path failed.
We want the scylla.yaml file to be always read, so that when working
with encrypted file (enterprise), scylla-sstable can pick up the
configuration for the encryption.
This patch makes scylla-sstable always attempt to read the scylla-yaml
file, whether the user provided a location for it or not. When not, the
default location is used (also considering the `SCYLLA_CONF` and
`SCYLLA_HOME` environment variables).
Failing to find the scylla.yaml file is not considered an error. The
rationale is that the user will discover this if they attempt to do an
operation that requires this anyway.
There is a debug-level log about whether it was successfully read or
not.
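
A sketch of how the default location could be resolved, in Python. The
precedence shown (`SCYLLA_CONF` first, then `SCYLLA_HOME` with a `conf`
subdirectory, then a relative `conf` directory) is an assumption made for
illustration, as is the function name; consult the tool itself for the
exact rules.

```python
import os
import posixpath


def default_scylla_yaml(environ=None):
    """Guess the scylla.yaml path from the environment (assumed precedence)."""
    env = os.environ if environ is None else environ
    if "SCYLLA_CONF" in env:
        conf_dir = env["SCYLLA_CONF"]
    elif "SCYLLA_HOME" in env:
        conf_dir = posixpath.join(env["SCYLLA_HOME"], "conf")
    else:
        conf_dir = "conf"  # assumed fallback, relative to the working directory
    return posixpath.join(conf_dir, "scylla.yaml")
```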

Fixes: #16132

Closes scylladb/scylladb#16174
2023-12-05 15:06:29 +02:00
Kefu Chai
2ebdc40b0b docs: add Deprecated to value_status_count
despite that the "value_status_count" is not rendered/used yet,
it'd be better to keep it in sync with the code.

since 5fd30578d7 added
"Deprecated" to `value_status` enum, let's update the sphinx
extension accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16236
2023-12-05 14:52:13 +02:00
Avi Kivity
4498979b14 Merge 'When discarding table's sstables, delete them in one atomic batch' from Pavel Emelyanov
The table::discard_sstables() removes sstables attached to a table. For that it tries to atomically delete _each_ suitable sstable, which is a bit heavyweight -- each atomic deletion operation results in a deletion log file written. This PR deletes all table's sstables in one atomic batch. While at it, the body of the discard_sstables is simplified not to allocate the "pruner" object. The latter is possible after the method had become coroutine

Closes scylladb/scylladb#16202

* github.com:scylladb/scylladb:
  discard_sstables: Atomically delete all sstables
  discard_sstables: Indentation and formatting fix after previous patch
  discard_sstable: Open-code local prune() lambda
  discard_sstables: Do not allocate pruner
2023-12-05 14:17:06 +02:00
Kamil Braun
1763c65662 system_keyspace: make get/set_scylla_local_param public
We'll use it outside `system_keyspace` code in later commit.
2023-12-05 13:03:29 +01:00
Kamil Braun
07984215a3 feature_service: add GROUP0_SCHEMA_VERSIONING feature
This feature, when enabled, will modify how schema versions
are calculated and stored.

- In group 0 mode, schema versions are persisted by the group 0 command
  that performs the schema change, then reused by each node instead of
  being calculated as a digest (hash) by each node independently.
- In RECOVERY mode or before Raft upgrade procedure finishes, when we
  perform a schema change, we revert to the old digest-based way, taking
  into account the possibility of having performed group0-mode schema
  changes (that used persistent versions). As we will see in future
  commits, this will be done by storing additional flags and tombstones
  in system tables.

By "schema versions" we mean both the UUIDs returned from
`schema::version()` and the "global" schema version (the one we gossip
as `application_state::SCHEMA`).

For now, in this commit, the feature is always disabled. Once all
necessary code is setup in following commits, we will enable it together
with Raft.
2023-12-05 13:03:28 +01:00
Benny Halevy
6c00c9a45d raft: use locator::topology/messaging rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 13:26:46 +02:00
Benny Halevy
b3bede8141 storage_service: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 13:23:27 +02:00
Kamil Braun
52ae6b8738 Merge 'fix shutdown order between group0 and storage service' from Gleb
Storage service uses group0 internally, but group0 is created long after
storage service is initialized and passed to it using ss::set_group0()
function. What it means is that during shutdown group0 is destroyed
before ss::stop() is called and thus storage service is left with a
dangling reference. Fix it by introducing a function that cancels all
group0 operations and waits for background fibers to complete. For that
we need separate abort source for group0 operation which the patch
series also introduces.

* 'gleb/group0-ss-shutdown' of github.com:scylladb/scylla-dev:
  storage_service: topology coordinator: ignore abort_requested_exception in background fibers
  storage_service: fix de-initialization order between storage service and group0_service
2023-12-05 11:20:52 +01:00
Kefu Chai
e88bd9c5bd gms/inet_address: pass sstring param by std::move()
less overhead this way. the caller of lookup() always passes
a rvalue reference. and seastar::dns::get_host_by_name() actually
moves away from the parameter, so let's pass by std::move() for
slightly better performance, and to match the expectation of
the underlying seastar API.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16280
2023-12-05 12:05:21 +03:00
Benny Halevy
a529097d96 storage_proxy: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 10:44:13 +02:00
Benny Halevy
0b310c471c service_level_controller: use locator::topology rather than fb_utilities
Expose cql3::query_processor in auth::service
to get to the topology via storage_proxy.replica::database

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 10:17:47 +02:00
Pavel Emelyanov
9bbbe7a99f discard_sstables: Atomically delete all sstables
When collected sstables are deleted each is passed into
sstables_manager.delete_atomically(). For on-disk sstables this creates
a deletion log for each removed sstable, which is quite an overkill. The
atomic deletion callback already accepts vector of shared sstables, so
it's simpler (and a bit faster) to remove them all in a batch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 11:14:23 +03:00
Pavel Emelyanov
96bc530a57 discard_sstables: Indentation and formatting fix after previous patch
By "formatting" fix I mean -- remove the temporary on-stack references
that were left for the ease of patching

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 11:13:40 +03:00
Pavel Emelyanov
6d135fea43 discard_sstable: Open-code local prune() lambda
The lambda in question was the struct pruner method and was left there
for the ease of patching. Now, when this lambda is only called once
inside the function it is declared in, it can be open-coded into the
place where it's called

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 11:13:40 +03:00
Pavel Emelyanov
68cb2e66fc discard_sstables: Do not allocate pruner
This allocation remained from the pre-coroutine times of the method. Now
the contents of pruner -- reference to table, vector and replay_position
can reside on coroutine frame

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-05 11:13:40 +03:00
Benny Halevy
0e5754adc6 misc_services: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 10:01:36 +02:00
Benny Halevy
d49d10dbdb migration_manager: use messaging rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:48:33 +02:00
Benny Halevy
860b2d38c6 forward_service: use messaging rather than fb_utilities
Use _forwarder._messaging to get to the broadcast address
rather than the global fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:48:12 +02:00
Benny Halevy
984a576405 messaging_service: accept broadcast_addr in config rather than via fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:46:25 +02:00
Benny Halevy
586f35bb55 messaging_service: move listen_address and port getters inline
And make them const noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:44:41 +02:00
Benny Halevy
eabd4570da test: manual: modernize message test
Basically, make it work (great) again.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:44:26 +02:00
Benny Halevy
f9acc90926 table: use gossiper rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:43:47 +02:00
Benny Halevy
6826d87052 repair: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:09:06 +02:00
Benny Halevy
e1239e63bf dht/range_streamer: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:01:31 +02:00
Benny Halevy
63b556123b db/view: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:55:46 +02:00
Benny Halevy
f40bb7c583 database: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
64145388c9 db/system_keyspace: use topology via db rather than fb_utilities
So not to rely on fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
4bb4d673c3 db/system_keyspace: save_local_info: get broadcast addresses from caller
So not to rely on fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
6e79d647e6 db/hints/manager: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
4c20b84680 db/consistency_level: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
e5d3c6741f api: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
03fe674314 alternator: ttl: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
f3e0358563 gossiper: use locator::topology rather than fb_utilities
And add `get_endpoint_state_ptr` for this_node.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
25754f843b gossiper: add get_this_endpoint_state_ptr
Returns this node's endpoint_state_ptr.
With this entry point, the caller doesn't need to
get_broadcast_address.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
21ace44f03 test: lib: cql_test_env: pass broadcast_address in cql_test_config
For getting rid of fb_utilities.

In the future, that could be used to instantiate
multiple scylla node instances.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
3c846d3801 init: get_seeds_from_db_config: accept broadcast_address
Pass the broadcast_address from main to get_seeds_from_db_config
rather than getting it from fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
4d461fc788 locator: replication strategies: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
86716b2048 locator: topology: add helpers to retrieve this host_id and address
And respective `is_me()` predicates,
to prepare for getting rid of fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
52412087b7 snitch: pass broadcast_address in snitch_config
To untangle snitch from fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
94fc8e2a9a snitch: add optional get_broadcast_address method
and set broadcast_address / broadcast_rpc_address in main
to remove this dependency of snitch on fb_utilities.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
1d0e71308b locator: ec2_multi_region_snitch: keep local public address as member
To be used in the next patch to retrieve the broadcast_address.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
90af71ffa7 ec2_multi_region_snitch: reindent load_config
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
fecb597ad6 ec2_multi_region_snitch: coroutinize load_config
Now that ec2_snitch::load_config is a coroutine
there's no need for a seastar thread here either.

Refs #16241

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
cb7e096a59 ec2_snitch: reindent load_config
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
1c1a048d3f ec2_snitch: coroutinize load_config
Fixes #16241

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:48 +02:00
Benny Halevy
9e1dd78539 thrift: thrift_validation: use std::numeric_limits rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:48 +02:00
Kefu Chai
50332f796e script/base36-uuid.py: interpret timestamp with Gregorian calendar
UUID v1 uses an epoch derived from the Gregorian calendar, but
base36-uuid.py interprets the timestamp with the UNIX epoch time.
that's why it prints a UUID like

```console
$ ./scripts/base36-uuid.py -d 3gbi_0mhs_4sjf42oac6rxqdsnyx
date = 2411-02-16 16:05:52
decimicro_seconds = 0x7ad550
lsb = 0xafe141a195fe0d59
```

even though this UUID was generated on Nov 30, 2023. so in this change,
we shift the time with the timestamp of UNIX epoch derived from
the Gregorian calendar's day 0. so, after this change, we have:

```console
$ ./scripts/base36-uuid.py -d 3gbi_0mhs_4sjf42oac6rxqdsnyx
date = 2023-11-30 16:05:52
decimicro_seconds = 0x7ad550
lsb = 0xafe141a195fe0d59
```

see https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.4
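
The epoch shift described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the actual code of base36-uuid.py; the helper name `uuid1_to_datetime` is hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUID v1 timestamps count 100-nanosecond intervals since 1582-10-15,
# the start of the Gregorian calendar (RFC 4122, section 4.1.4).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def uuid1_to_datetime(u: uuid.UUID) -> datetime:
    """Convert the 60-bit timestamp of a version-1 UUID to a UTC datetime."""
    # u.time is in units of 100 ns; integer-divide by 10 to get microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
```

Interpreting the same 60-bit value as an offset from 1970-01-01 instead would push the date roughly 388 years into the future, which matches the 2411 date in the first console output.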

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16235
2023-12-05 07:39:34 +02:00
Anna Stuchlik
97244eb68e doc: add metric upgrade info to the 5.4 upgrade
This commit adds the information about metrics
update to the 5.2-to-5.4 upgrade guide.

Fixes https://github.com/scylladb/scylladb/issues/15966

Closes scylladb/scylladb#16161
2023-12-05 07:36:29 +02:00
Kefu Chai
3608d9be97 gms/inet_address: remove unused '#include'
neither <iomanip> nor "utils/to_string.hh" is used in
`gms/inet_address.cc`, so let's remove their "#include"s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16281
2023-12-05 08:30:03 +03:00
Kurashkin Nikita
1438e531f8 cql3: statement_restrictions: cartesian product size error message fix.
This commit fixes:
1. The error message will be specific about what type of keys
exceeds the limit (e.g. clustering keys or partition keys).
2. The error message will be more general about what causes it, cartesian product
or simple list.
3. The error message will advise using the --max-partition-key-restrictions-per-query
or --max-clustering-key-restrictions-per-query configuration options to
override the current (100) limit.

Fixes #15627

Closes scylladb/scylladb#16226
2023-12-05 07:27:03 +02:00
Kefu Chai
a03be17da7 test/boost/sstable_generation_test: s/LE/LT/ when appropriate
in 7a1fbb38, a new test is added to an existing test for
comparing the UUIDs with different time stamps, but we should tighten
the test a little bit to reflect the intention of the test:

 the timestamp of "2023-11-24 23:41:56" should be less than
 "2023-11-24 23:41:57".

in this change, we replace LE with LT to correct it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16245
2023-12-05 08:25:04 +03:00
Anna Stuchlik
1e80bdb440 doc: fix rollback in the 4.6-to-5.0 upgrade guide
This commit fixes the rollback procedure in
the 4.6-to-5.0 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
  is fixed.
- The "Gracefully shutdown ScyllaDB" command
  is fixed.

In addition, there are the following updates
to be in sync with the tests:

- The "Backup the configuration file" step is
  extended to include a command to backup
  the packages.
- The Rollback procedure is extended to restore
  the backup packages.
- The Reinstallation section is fixed for RHEL.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4, branch-5.2, and branch-5.1

Closes scylladb/scylladb#16155
2023-12-05 07:17:49 +02:00
Anna Stuchlik
52c2698978 doc: fix rollback for RHEL (install) in 5.4
This commit fixes the installation command
in the Rollback section for RHEL/Centos
in the 5.2-5.4 upgrade guide.

It's a follow-up to https://github.com/scylladb/scylladb/pull/16114
where the command was not updated.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4.

Closes scylladb/scylladb#16156
2023-12-05 07:17:14 +02:00
Anna Stuchlik
91cddb606f doc: fix rollback in the 5.1-to-5.2 upgrade guide
This commit fixes the rollback procedure in
the 5.1-to-5.2 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
  is fixed.
- The "Gracefully shutdown ScyllaDB" command
  is fixed.

In addition, there are the following updates
to be in sync with the tests:

- The "Backup the configuration file" step is
  extended to include a command to backup
  the packages.
- The Rollback procedure is extended to restore
  the backup packages.
- The Reinstallation section is fixed for RHEL.

Also, I've removed the rollback
section for images, as it's not correct or
relevant.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4 and branch-5.2.

Closes scylladb/scylladb#16152
2023-12-05 07:16:44 +02:00
Anna Stuchlik
7ad0b92559 doc: fix rollback in the 5.0-to-5.1 upgrade guide
This commit fixes the rollback procedure in
the 5.0-to-5.1 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
  is fixed.
- The "Gracefully shutdown ScyllaDB" command
  is fixed.

In addition, there are the following updates
to be in sync with the tests:

- The "Backup the configuration file" step is
  extended to include a command to backup
  the packages.
- The Rollback procedure is extended to restore
  the backup packages.
- The Reinstallation section is fixed for RHEL.

Also, I've removed the rollback
section for images, as it's not correct or
relevant.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4, branch-5.2, and branch-5.1

Closes scylladb/scylladb#16154
2023-12-05 07:15:41 +02:00
Patryk Jędrzejczak
c8ee7d4499 db: make schema commitlog feature mandatory
Using consistent cluster management and not using schema commitlog
ends with a bad configuration throw during bootstrap. Soon, we
will make consistent cluster management mandatory. This forces us
to also make schema commitlog mandatory, which we do in this patch.

A booting node decides to use schema commitlog if at least one of
the two statements below is true:
- the node has `force_schema_commitlog=true` config,
- the node knows that the cluster supports the `SCHEMA_COMMITLOG`
  cluster feature.
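
The decision rule above, and what this patch turns it into, can be sketched as a pair of predicates. This is only an illustration in Python; the function names are hypothetical and do not correspond to identifiers in the Scylla code base.

```python
def uses_schema_commitlog_before(force_schema_commitlog: bool,
                                 cluster_supports_schema_commitlog: bool) -> bool:
    """Pre-patch rule: either condition is enough to enable schema commitlog."""
    return force_schema_commitlog or cluster_supports_schema_commitlog

def uses_schema_commitlog_after(force_schema_commitlog: bool,
                                cluster_supports_schema_commitlog: bool) -> bool:
    """Post-patch rule: schema commitlog is mandatory, both inputs are ignored."""
    return True
```

The two functions differ only when both inputs are false.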

The `SCHEMA_COMMITLOG` cluster feature has been added in version
5.1. This patch is supposed to be a part of version 6.0. We don't
support a direct upgrade from 5.1 to 6.0 because it skips two
versions - 5.2 and 5.4. So, in a supported upgrade we can assume
that the version which we upgrade from has schema commitlog. This
means that we don't need to check the `SCHEMA_COMMITLOG` feature
during an upgrade.

The reasoning above also applies to Scylla Enterprise. Version
2024.2 will be based on 6.0. Probably, we will only support
an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even
if we support an upgrade from 2023.x, this patch won't break
anything because 2023.1 is based on 5.2, which has schema
commitlog. Upgrades from 2022.x definitely won't be supported.

When we populate a new cluster, we can use the
`force_schema_commitlog=true` config to use schema commitlog
unconditionally. Then, the cluster feature check is irrelevant.
This check could fail because we initiate schema commitlog before
we learn about the features. The `force_schema_commitlog=true`
config is especially useful when we want to use consistent cluster
management. Failing feature checks would lead to crashes during
initial bootstraps. Moreover, there is no point in creating a new
cluster with `consistent_cluster_management=true` and
`force_schema_commitlog=false`. It would just cause some initial
bootstraps to fail, and after successful restarts, the result would
be the same as if we used `force_schema_commitlog=true` from the
start.

In conclusion, we can unconditionally use schema commitlog without
any checks in 6.0 because we can always safely upgrade a cluster
and start a new cluster.

Apart from making schema commitlog mandatory, this patch adds two
changes that are its consequences:
- making the unneeded `force_schema_commitlog` config unused,
- deprecating the `SCHEMA_COMMITLOG` feature, which is always
  assumed to be true.

Closes scylladb/scylladb#16254
2023-12-04 21:02:16 +02:00
Calle Wilund
75a8be5b87 commitlog.hh: Fix numeric constant for file format version 3 to be actual '3'
Fixes #16277

When the PR for 'tagged pages' was submitted for RFC, it was assumed that PR #12849
(compression) would be committed first. The latter introduced v3 format, and the
format in #12849 (tagged pages) was assumed to have to be bumped to 4.

This ended up not being the case, and I missed that the code went in with file format
tag numeric value being '4' (and constant named v3).

While not detrimental, it is confusing, and should be changed asap (before anything
depends on files with the tag applied).

Closes scylladb/scylladb#16278
2023-12-04 21:01:44 +02:00
Calle Wilund
e94070db64 commitlog_test: Add test for commit log replay skip past EOF
Refs #15269

Unit test to check that trying to skip past EOF in a borked segment
will not crash the process. file_data_input_impl asserts iff caller
tries this.
2023-12-04 20:50:42 +02:00
Takuya ASADA
6eb9344cb3 dist: introduce scylla-tune-sched.service to tune kernel scheduler
On /usr/lib/sysctl.d/99-scylla-sched.conf, we have some sysctl settings to
tune the scheduler for lower latency.
This is mostly to prevent softirq threads processing tcp and reactor threads
from injecting latency into each other.
However, these parameters are moved to debugfs from linux-5.13+, so we lost
scheduler tuning on recent kernels.

To support tuning on recent kernels, let's add a new service that can
configure both sysctl and debugfs.
The service is named scylla-tune-sched.service.
The service will be unconditionally enabled when installed; on older kernels
it will tune via sysctl, on recent kernels it will tune via debugfs.
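
The choice between the two mechanisms can be sketched as a kernel version check. This is a hedged illustration only: the function name is hypothetical, and the real service may detect the available interface differently (for example by probing for the debugfs files).

```python
import re

def sched_tuning_backend(kernel_release: str) -> str:
    """Pick the tuning interface for a kernel release string such as
    '5.15.0-91-generic' (the format printed by `uname -r`)."""
    m = re.match(r"(\d+)\.(\d+)", kernel_release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {kernel_release!r}")
    version = (int(m.group(1)), int(m.group(2)))
    # The scheduler tunables moved from sysctl to debugfs in Linux 5.13.
    return "debugfs" if version >= (5, 13) else "sysctl"
```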

Fixes #16077

Closes scylladb/scylladb#16122
2023-12-04 19:29:46 +02:00
Kefu Chai
3ffd8737e4 gms/inet_address: format gms::inet_address via net::inet_address
in 4ea6e06c, we specialized fmt::formatter<gms::inet_address> using
the formatter of bytes if the underlying address is an IPv6 address.
this breaks the tests with JMX which expected the shortened form of
the text representation of the IPv6 address.

in this change, instead of reinventing the wheel, let's reuse the
existing formatter of net::inet_address, which is able to handle
both IPv4 and IPv6 addresses, also it follows
https://datatracker.ietf.org/doc/html/rfc5952 by compressing the
consecutive zeros.

since this new formatter is a thin wrapper of seastar::net::inet_address,
the corresponding unit test will be added to Seastar.

Refs #16039
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16267
2023-12-04 19:24:00 +02:00
Kefu Chai
28906725df repair: add formatter for row_level_diff_detect_algorithm
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for
row_level_diff_detect_algorithm. but its operator<<() is preserved,
as we are still using our homebrew generic formatter for
std::vector, and this formatter is still using operator<< for formatting
the elements in the vector.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16248
2023-12-04 18:59:52 +02:00
Yaniv Kaul
21cce458d8 test: alternator: fix typo passs instead of pass in test_gsi.py
Fix a typo.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#16258
2023-12-04 18:58:31 +02:00
Avi Kivity
c1d0baf11a Merge 'build: add an option to create building system with CMake' from Kefu Chai
as part of the efforts to migrate to the CMake-based building system,
this change enables `configure.py` to optionally create
`build.ninja` with CMake.

in this change, we add a new option named `--use-cmake` to
`configure.py` so we can create `build.ninja`. please note,
instead of using the "Ninja" generator used by Seastar's
`configure.py` script, we use "Ninja Multi-Config" generator
along with `CMAKE_CROSS_CONFIGS` setting in this project.
so that we can generate a `build.ninja` which is capable of
building the same artifacts with multiple configurations.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15916

* github.com:scylladb/scylladb:
  build: cmake: add compatibility target of dev-headers
  build: add an option to use CMake as the build build system
2023-12-04 18:51:24 +02:00
Kefu Chai
3a8a3100af raft: add formatter for raft::logical_clock::time_point
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we

* define a formatter for logical_clock::time_point, as fmt does not
  provide formatter for this time_point, as it is not a part of the
  standard library
* remove operator<<() for logical_clock::time_point, as its sole
  purpose is to generate the corresponding fmt::formatter when
  FMT_DEPRECATED_OSTREAM is defined.
* remove operator<<() for logical_clock::duration, as fmt provides
  a default implementation for formatting
  std::chrono::nanoseconds already, which uses `int64_t` as its rep
  template parameter as well.
* include "fmt/chrono.h" so that the source files including this
  header can access the formatter without including it
  themselves. this preserves the behavior we had
  before removal of "operator<<()".

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16263
2023-12-04 18:32:03 +02:00
Nadav Har'El
4505a86f46 tablets, mv: fix base-view pairing to consider base replication map
In the view update code, the function get_view_natural_endpoint()
determines which view replica this base replica should send an update
to. It currently gets the *view* table's replication map (i.e., the map
from view tokens to lists of replicas holding the token), but assumes
that this is also the *base* table's replication map.

This assumption was true with vnodes, but is no longer true with
tablets - the base table's replication map can be completely different
from the view table's. By looking at the wrong mapping,
get_view_natural_endpoint() can believe that this node isn't really
a base-replica and drop the view update. Alternatively, it can think
it is a base replica - but use the wrong base-view pairing and create
base-view inconsistencies.

This patch solves this bug - get_view_natural_endpoint() now gets two
separate replication maps - the base's and the view's. The callers
need to remember what the base table was (in some cases they didn't
care at the point of the call), and pass it to the function call.
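
The effect of looking at the wrong map can be illustrated with a toy model. This is a deliberately simplified sketch, not the real get_view_natural_endpoint(): it assumes the base and view replica lists have equal length and pairs them by index, and the name `paired_view_replica` is hypothetical.

```python
from typing import Optional, Sequence

def paired_view_replica(me: str,
                        base_replicas: Sequence[str],
                        view_replicas: Sequence[str]) -> Optional[str]:
    """Return the view replica paired with this base replica, or None if
    this node is not a base replica (the update is then dropped)."""
    if me not in base_replicas:
        return None
    # Simplified pairing rule (an assumption): same position in both lists.
    return view_replicas[base_replicas.index(me)]

# With tablets and RF=1 the base row may live on n1 while the view row
# lives on n4. Passing the view's replicas as if they were the base's
# reproduces the bug: n1 concludes it is not a base replica.
buggy = paired_view_replica("n1", base_replicas=["n4"], view_replicas=["n4"])
fixed = paired_view_replica("n1", base_replicas=["n1"], view_replicas=["n4"])
```

In this model `buggy` is `None` (the view update is lost), while `fixed` is `"n4"`.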

This patch also includes a simple test that reproduces the bug, and
confirms it is fixed: The test has a 6-node cluster using tablets
and a base table with RF=1, and writes one row to it. Before this
patch, the code usually gets confused, thinking the base replica
isn't a replica and loses the view update. With this patch, the
view update works.

Fixes #16227.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16228
2023-12-04 16:38:54 +02:00
Avi Kivity
60af2f3cb2 Merge 'New commitlog file format using tagged pages' from Calle Wilund
Prototype implementation of format suggested/requested by @avikivity:

Divides segments into disk-write-alignment sized pages, each tagged with segment ID + CRC of data content.
When read, we both verify sector integrity (CRC) to detect corruption, as well as matching ID read with expected one.

If the latter mismatches we have a prematurely terminated segment (read truncation), which, depending on whether the CL is
written in batch or periodic mode, as well as explicit sync, can mean data loss.

Note: all-zero pages are treated as kosher, both to align with newly allocated segments, as well as fully terminated (zero-page) ones.
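
The page checks described above can be sketched as follows. The page size, trailer layout, and field widths here are assumptions chosen for illustration, not the actual on-disk format; the function names are hypothetical.

```python
import struct
import zlib

PAGE_SIZE = 4096                     # assumed disk-write alignment
TRAILER = struct.Struct("<QI")       # assumed tag: 64-bit segment ID, 32-bit CRC
PAYLOAD_SIZE = PAGE_SIZE - TRAILER.size

def make_page(segment_id: int, payload: bytes) -> bytes:
    """Build one tagged page: zero-padded payload followed by (ID, CRC)."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAYLOAD_SIZE, b"\0")
    return body + TRAILER.pack(segment_id, zlib.crc32(body))

def check_page(page: bytes, expected_segment_id: int) -> str:
    """Classify a page as 'empty', 'corrupt', 'truncated' or 'ok'."""
    if len(page) != PAGE_SIZE:
        raise ValueError("not a full page")
    if page == bytes(PAGE_SIZE):
        return "empty"               # all-zero pages are accepted as valid
    body, tag = page[:PAYLOAD_SIZE], page[PAYLOAD_SIZE:]
    segment_id, crc = TRAILER.unpack(tag)
    if zlib.crc32(body) != crc:
        return "corrupt"             # data does not match its checksum
    if segment_id != expected_segment_id:
        return "truncated"           # intact page left over from another segment
    return "ok"
```

An ID mismatch on an otherwise intact page is what distinguishes a prematurely terminated segment (for example a recycled file holding stale pages) from corruption.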

Note: This is a preview/RFC - the rest of the file format is not modified. At least parts of entry CRC could probably be removed, but I have not done so yet (needs some thinking).

Note: Some slight abstraction breaks in impl. and probably less than maximal efficiency.

v2:
* Removed entry CRC:s in file format.
* Added docs on format v3
* Added one more test for recycling-truncation

v3:
* Fixed typos in size calc and docs
* Changed sect metadata order
* Explicit iter type

Closes scylladb/scylladb#15494

* github.com:scylladb/scylladb:
  commitlog_test: Add test for replaying large-ish mutation
  commitlog_test: Add additional test for segmnent truncation
  docs: Add docs on commitlog format 3
  commitlog: Remove entry CRC from file format
  commitlog: Implement new format using CRC:ed sectors
  commitlog: Add iterator adaptor for doing buffer splitting into sub-page ranges
  fragmented_temporary_buffer: Add const iterator access to underlying buffers
  commitlog_replayer: differentiate between truncated file and corrupt entries
2023-12-04 13:31:13 +01:00
Avi Kivity
8fa2e3ad2a Merge 'Remove sstables::remove_by_toc_name()' from Pavel Emelyanov
The helper in question complicates the logic of sstable_directory::process() by making garbage collection differently for sstables deleted "atomically" and deleted "one-by-one". Also, the code that deletes sstables one-by-one and uses remove_by_toc_name() renders excessive TOC file reading, because there's sstable object at hand and it had all_components() ready for use.

Surprisingly, there was no test for the deletion-log functionality. This PR adds one. The test passes before the g.c. and regular unlink fix, and (of course) continues passing after it.

Closes scylladb/scylladb#16240

* github.com:scylladb/scylladb:
  sstables: Drop remove_by_name()
  sstables/fs_storage: Wipe by recognized+unrecognized components
  sstable_directory: Enlight deletion log replay
  sstables: Split remove_by_toc_name()
  test: Add test case to validate deletion log work
  sstable_directory: Close dir on exception
  sstable_directory: Fix indentation after previous patch
  sstable_directory: Coroutinize delete_with_pending_deletion_log()
  test: Sstable on_delete() is not necessarily in a thread
  sstable_directory: Split delete_with_pending_deletion_log()
2023-12-03 17:29:34 +02:00
Wojciech Mitros
a8c9451fb2 commitlog: add max disk size api
Currently, the max size of commitlog is obtained either from the
config parameter commitlog_total_space_in_mb or, when the parameter
is -1, from the total memory allocated for Scylla.
To facilitate testing of the behavior of commitlog hard limit,
expose the value of commitlog max_disk_size in a dedicated API.

Closes scylladb/scylladb#16020
2023-12-03 17:16:58 +02:00
Kefu Chai
39b2ee9751 dist/redhat: avoid mixed use of spaces and tabs
rpmlint complains about "mixed-use-of-spaces-and-tabs". and it
does not look good in the editor. so let's replace tabs with spaces.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16246
2023-12-03 17:11:03 +02:00
Nadav Har'El
59ff27ea4a Merge 'Typos: fix typos in comments' from Yaniv Kaul
Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255

Closes scylladb/scylladb#16257

* github.com:scylladb/scylladb:
  Update service/topology_state_machine.hh
  Update raft/tracker.hh
  Update db/view/view.cc
  Typos: fix typos in comments
2023-12-03 11:23:51 +02:00
Yaniv Kaul
030d421931 Update service/topology_state_machine.hh 2023-12-03 10:08:11 +02:00
Yaniv Kaul
7c4b742583 Update raft/tracker.hh 2023-12-03 10:07:55 +02:00
Yaniv Kaul
2b73793a39 Update db/view/view.cc 2023-12-03 10:07:45 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Kamil Braun
01e54f5b12 Merge 'test: delete topology_raft_disabled suite' from Patryk Jędrzejczak
This PR is a necessary step to fix #15854 -- making consistent
cluster management mandatory on master.

Before making consistent cluster management mandatory, we have
to get rid of all tests that depend on the
`consistent_cluster_management=false` config. These are the tests
in the `topology_raft_disabled` suite.

There's the internal Raft upgrade procedure, which is the bulk of the
upgrade logic. Then, there are two thin "layers" around it that
invoke it underneath: recovery procedure and
enable-raft-in-the-cluster procedure. We're getting rid of the
second one by making Raft always enabled, so we naturally have to
get rid of tests that depend on it. The idea is to replace every
necessary enable-raft-in-the-cluster procedure in these tests with
the recovery procedure. Then, we will still be testing the internal
Raft upgrade procedure in the in-tree tests. The
enable-raft-in-the-cluster procedure is already tested by QA tests,
so we don't need to worry about these changes.

Unfortunately, we cannot adapt `test_raft_upgrade_no_schema`.
After making consistent cluster management mandatory on master,
schema commitlog will also become mandatory because
`consistent_cluster_management: True`,
`force_schema_commit_log: False`
is considered a bad configuration. These changes will make
`test_raft_upgrade_no_schema` unimplementable in the Scylla repo.
Therefore, we remove this test. If we want to keep it, we must
rewrite it as an upgrade dtest.

After making all tests in `topology_raft_disabled` use consistent
cluster management, there is no point in keeping this suite.
Therefore, we delete it and move all the tests to `topology_custom`.

Closes scylladb/scylladb#16192

* github.com:scylladb/scylladb:
  test: delete topology_raft_disabled suite
  test: topology_raft_disabled: move tests to topology_custom suite
  test: topology_raft_disabled: move utils to topology suite
  test: topology_raft_disabled: use consistent cluster management
  test: topology_raft_disabled: add new util functions
  test: topology_raft_disabled: delete test_raft_upgrade_no_schema
2023-12-01 17:11:32 +01:00
Pavel Emelyanov
17fd558df8 sstables: Drop remove_by_name()
It was used by deletion log replay and by storage wipe, now it's unused

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
4405a625f6 sstables/fs_storage: Wipe by recognized+unrecognized components
Currently wiping fs-backed sstable happens via reading and parsing its
TOC file back. Then the three-step process goes:

- move TOC -> TOC.tmp
- remove components (obtained from TOC.tmp)
- remove TOC.tmp

However, wiping sstable happens in one of two cases -- the sstable was
loaded from the TOC file _or_ sstable had evaluated the needed
components and generated TOC file. With that, the 2nd step can be made
without reading the TOC file, just by looking at all components sitting
on the sstable

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
de931702ec sstable_directory: Enlight deletion log replay
Garbage collection of sstables is scattered between two stages -- g.c.
per-se and the regular processing.

The former stage collects deletion logs and for each log found goes
ahead and deletes the full sstable with the standard sequence:

- move TOC -> TOC.tmp
- remove components
- remove TOC.tmp

The latter stage picks up partially unlinked sstables that didn't go via
atomic deletion with the log. This comes as

- collect all components
  - keep TOC's and TOC.tmp's in separate lists
  - attach other components to TOC/TOC.tmp by generation value
- for all TOC.tmp's get all attached components and remove them
- continue loading TOC's with attached components

That said, replaying deletion log can be as light as just the first step
out of the above sequence -- just move TOC to TOC.tmp. After that the
regular processing would pick the remaining components and clean them

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
5ff5946520 sstables: Split remove_by_toc_name()
The helper consists of three phases:

- move TOC -> TOC.tmp
- remove components listed in TOC
- remove TOC.tmp

The first step is needed separately by the next patch
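
The three phases can be sketched roughly as below. This is only an
illustration in Python, not the actual C++ helper: the file naming
(`me-1-big-` prefix, `TOC.txt`, `TOC.txt.tmp`) and the function names
`move_toc_to_tmp()` and `demo()` are assumptions made for the example.

```python
import os
import tempfile


def move_toc_to_tmp(directory, toc_name):
    """Step 1: rename TOC -> TOC.tmp so the sstable is marked as being deleted."""
    tmp_name = toc_name + ".tmp"
    os.rename(os.path.join(directory, toc_name), os.path.join(directory, tmp_name))
    return tmp_name


def remove_by_toc_name(directory, prefix):
    """Run all three steps for the sstable whose files start with `prefix`."""
    tmp_name = move_toc_to_tmp(directory, prefix + "TOC.txt")
    tmp_path = os.path.join(directory, tmp_name)
    with open(tmp_path) as toc:
        components = [line.strip() for line in toc if line.strip()]
    # Step 2: remove every component listed in the TOC, except the TOC itself,
    # which now lives under its temporary name.
    for component in components:
        if component == "TOC.txt":
            continue
        path = os.path.join(directory, prefix + component)
        if os.path.exists(path):
            os.remove(path)
    # Step 3: remove TOC.tmp last, so an interrupted run can be resumed.
    os.remove(tmp_path)


def demo():
    """Create a fake sstable, delete it, and return what is left in the directory."""
    with tempfile.TemporaryDirectory() as directory:
        prefix = "me-1-big-"  # hypothetical sstable file name prefix
        for component in ("Data.db", "Index.db"):
            open(os.path.join(directory, prefix + component), "w").close()
        with open(os.path.join(directory, prefix + "TOC.txt"), "w") as toc:
            toc.write("TOC.txt\nData.db\nIndex.db\n")
        remove_by_toc_name(directory, prefix)
        return sorted(os.listdir(directory))
```

Keeping step 1 as its own function mirrors the split: a caller that only
needs to mark the sstable as deleted can call it alone.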

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
b10ca96e07 test: Add test case to validate deletion log work
The test sequence is

- create several sstables
- create deletion log for a sub-set of them
- partially unlink smaller sub-sub-set
- make sstable directory do the processing with g.c.
- check that the sstables loaded do NOT include the deleted ones

The .throw_on_missing_toc bit set additionally validates that the
directory doesn't contain garbage not attached to any other TOCs

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 18:20:20 +03:00
Pavel Emelyanov
fcf080b63b sstable_directory: Close dir on exception
When committing the deletion log creation its containing directory is
sync-ed via opened file. This place is not exception safe and directory
can be left unclosed

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:38 +03:00
Pavel Emelyanov
bb167dcca5 sstable_directory: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:38 +03:00
Pavel Emelyanov
28b1289d4b sstable_directory: Coroutinize delete_with_pending_deletion_log()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:38 +03:00
Pavel Emelyanov
92f0aa04d0 test: Sstable on_delete() is not necessarily in a thread
One of the test cases injects an observer into sstable->unlink() method
via its _on_delete() callback. The test's callback assumes that it runs
in an async context, but it's a happy coincidence, because deletion via
the deletion log runs so. Next patch is changing it and the test case
will no longer work. But since it's a test case it can just directly
call a libc function for its needs

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:38 +03:00
Pavel Emelyanov
ed043e5762 sstable_directory: Split delete_with_pending_deletion_log()
The helper consists of three parts -- prepare the deletion log, unlink
sstables and drop the deletion log. For testing the first part is needed
as a separate step, so here's this split.

It renders two nested async contexts, but it will change soon.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-01 15:00:37 +03:00
Nadav Har'El
bae6f3387f CODEOWNERS: remove some entries
The ".github/CODEOWNERS" is used by github to recommend reviewers for
pull requests depending on the directories touched in the pull request.
Github ignores entries in that file for people who are not **maintainers**. Since
Jan is no longer a Scylla maintainer, I remove his entries in the list.

Additionally, I am removing *myself* from *some* of the directories.
For many years, it was an (unwritten) policy that experienced Scylla
developers are expected to help in reviewing pieces of the code they
are familiar with - even if they no longer work on that code today.
But as ScyllaDB the company grew, this is no longer true. The policy
is now that experienced developers are requested to review only code in
their own or their team's area of responsibility - experienced developers
should help review *designs* of other parts, but not the actual code.
For this reason I'm removing my name from various directories.
I can still help review such code if asked specifically - but I will no
longer be the "default" reviewer for such code.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16239
2023-11-30 20:29:05 +02:00
Tomasz Grabiec
c64ae7b733 scripts: Introduce tablet-mon.py
Closes scylladb/scylladb#15512
2023-11-30 19:15:36 +02:00
Nadav Har'El
49860952f9 Merge 'LIST EFFECTIVE SERVICE LEVEL statement' from Michał Jadwiszczak
Add `LIST EFFECTIVE SERVICE LEVEL` statement to display which service level each effective service level option comes from.

Example:
There are 2 roles: role1 and role2. Role1 is assigned with sl1 (timeout = 2s, workload_type = interactive) and role2 is assigned with sl2 (timeout = 10s, workload_type = batch).
Then, if we grant role1 to role2, the user with role2 will have 2s timeout (from sl1) and batch workload type (from sl2).

```
> LIST EFFECTIVE SERVICE LEVEL OF role2;

 service_level_option | effective_service_level | value
----------------------+-------------------------+-------------
        workload_type |                     sl2 |       batch
              timeout |                     sl1 |          2s
```
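
A rough model of how the table above could be derived, as a Python sketch.
The merge rule used here (smallest timeout wins, `batch` preferred over
`interactive`) is an assumption chosen to match the example, not taken from
the implementation; `effective_service_level()` and `WORKLOAD_PRIORITY` are
hypothetical names.

```python
# Assumed priority of workload types when service levels are combined.
WORKLOAD_PRIORITY = {"interactive": 1, "batch": 2}


def effective_service_level(levels):
    """Return {option: (source service level, value)} for the given levels."""
    # Assumption: the shortest timeout among all attached levels is effective.
    timeout_source = min(levels, key=lambda name: levels[name]["timeout"])
    # Assumption: the workload type with the highest priority is effective.
    workload_source = max(
        levels, key=lambda name: WORKLOAD_PRIORITY[levels[name]["workload_type"]]
    )
    return {
        "workload_type": (workload_source, levels[workload_source]["workload_type"]),
        "timeout": (timeout_source, levels[timeout_source]["timeout"]),
    }


# Service levels reachable from role2 once role1 is granted to it
# (timeouts in seconds).
ROLE2_LEVELS = {
    "sl1": {"timeout": 2, "workload_type": "interactive"},
    "sl2": {"timeout": 10, "workload_type": "batch"},
}
```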

Fixes: https://github.com/scylladb/scylladb/issues/15604

Closes scylladb/scylladb#14431

* github.com:scylladb/scylladb:
  cql-pytest: add `LIST EFFECTIVE SERVICE LEVEL OF` test
  docs: add `LIST EFFECTIVE SERVICE LEVEL` statement docs
  cql3:statements: add `LIST EFFECTIVE SERVICE LEVEL` statement
  service:qos: add option to include effective names to SLO
2023-11-30 18:12:52 +02:00
Gleb Natapov
3ddc1458ee storage_service: topology coordinator: ignore abort_requested_exception in background fibers
The exception may be thrown by "event" CV during shutdown.
2023-11-30 17:52:40 +02:00
Gleb Natapov
8ed8b151da storage_service: fix de-initialization order between storage service and group0_service
Storage service uses group0 internally, but group0 is created long after
storage service is initialized and passed to it using ss::set_group0()
function. But what it means is that during shutdown group0 is destroyed
before ss::stop() is called and thus storage service is left with a
dangling reference. Fix it by introducing a function that cancels all
group0 operations and waits for background fibers to complete. For that
we need separate abort source for group0 operation which the patch also
introduces.
2023-11-30 17:52:38 +02:00
Patryk Jędrzejczak
77c4ee92e5 test: delete topology_raft_disabled suite
After moving all tests out of topology_raft_disabled, we can safely
remove this suite.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
ba990d90bb test: topology_raft_disabled: move tests to topology_custom suite
We move the remaining tests in topology_raft_disabled to
topology_custom. We choose topology_custom because these tests
cannot use consistent topology changes.

We need to modify these tests a bit because we cannot pass
RandomTables to a test case function if the initial cluster size
equals 0. RandomTables.__init__ requires manager.cql to be present.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
659ac9c7f5 test: topology_raft_disabled: move utils to topology suite
We move all used util functions from topology_raft_disabled to
topology before we remove topology_raft_disabled. After this
change, util.py in topology will be the single util file for all
topology tests.

Some util functions in topology_raft_disabled aren't used anymore.
We don't move such functions and remove them instead.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
684b070b20 test: topology_raft_disabled: use consistent cluster management
Soon, we will make consistent cluster management mandatory on
master. Before this, we have to change all tests in the
topology_raft_disabled suite so that they do not depend on the
consistent_cluster_management=false config.

Adapting test_raft_upgrade_majority_loss is simple. We only have
to get rid of the initial upgrade. This initial upgrade didn't
test anything. Every test in topology_raft_disabled had to do it
at the beginning because of consistent_cluster_management=false.

Adapting test_raft_upgrade_basic and test_raft_upgrade_stuck is
more difficult. It requires changing the initial upgrade to
clearing Raft data in RECOVERY mode on all servers and restarting
them. Then, the servers will run the same upgrade procedure as
before.

After changing the tests, we also update their names appropriately.

test_raft_upgrade_stuck becomes a bit slower, so we remove the
comment about running time. Also, one TODO was fixed in the process
of rewriting the test. This fix forced us to skip the test in the
release mode since we cannot update the list of error injections
through manager.server_update_config in this mode.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
1059fece19 test: topology_raft_disabled: add new util functions
They are shorter and more readable than long CQL queries. We use
them even more in the following commit.
2023-11-30 15:50:22 +01:00
Patryk Jędrzejczak
7e43ebf88e test: topology_raft_disabled: delete test_raft_upgrade_no_schema
After making consistent cluster management mandatory on master,
schema commitlog will also become mandatory because
consistent_cluster_management: True,
force_schema_commit_log: False
is considered a bad configuration. These changes will make
test_raft_upgrade_no_schema unimplementable in the Scylla repo, so
we remove it.

If we want to keep this test, we must rewrite it as an upgrade
dtest.
2023-11-30 15:50:21 +01:00
Kefu Chai
7a1fbb38f9 sstable: order uuid-based generation as timeuuid
under most circumstances, we don't care about the ordering of the sstable
identifiers, as they are just identifiers. so, as long as they can be
compared, we are good. but we have tests which expect that the sstables
can be ordered by the time they are created. for instance,
sstable_run_based_compaction_test has this expectation.

before this change, we compare two UUID-based generations by its
(MSB, LSB) lexicographically. but UUID v1 put the lower bits of
the timestamp at the higher bits of MSB, so the ordering of the
"time" in timeuuid is not preserved when comparing the UUID-based
generations. this breaks the test of sstable_run_based_compaction_test,
which feeds the sstables to be compacted in a set, and the set is
ordered with the generation of the sstables.

after this change, we consider the UUID-based generation as
a timeuuid when comparing them.
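
the difference between the two orderings can be reproduced with a small
Python sketch. it is not the C++ comparator: `make_timeuuid()`,
`msb_lsb_key()` and `timeuuid_key()` are hypothetical helpers, and breaking
timestamp ties on the LSB is an assumption.

```python
import uuid

MASK64 = (1 << 64) - 1


def make_timeuuid(timestamp, node=0x0123456789AB, clock_seq=0):
    """Build a version 1 UUID from a 60-bit timestamp."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def timestamp_of(u):
    """Reassemble the 60-bit timestamp from the three time fields."""
    return ((u.time_hi_version & 0x0FFF) << 48) | (u.time_mid << 32) | u.time_low


def msb_lsb_key(u):
    """Old ordering: compare (MSB, LSB) lexicographically."""
    return (u.int >> 64, u.int & MASK64)


def timeuuid_key(u):
    """New ordering: compare by timestamp first."""
    return (timestamp_of(u), u.int & MASK64)


# time_low sits in the top 32 bits of the MSB, so a timestamp that
# carries into time_mid looks "smaller" when compared as (MSB, LSB).
older = make_timeuuid(0x7FFFFFFF)
newer = make_timeuuid(0x100000000)
by_msb_lsb = sorted([older, newer], key=msb_lsb_key)
by_time = sorted([older, newer], key=timeuuid_key)
```

here `by_msb_lsb` puts `newer` first, while `by_time` keeps creation order.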

Fixes #16215
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16238
2023-11-30 14:50:44 +02:00
Michał Jadwiszczak
e3515cfc1b cql-pytest: add LIST EFFECTIVE SERVICE LEVEL OF test 2023-11-30 13:07:20 +01:00
Michał Jadwiszczak
e1d86f9afb docs: add LIST EFFECTIVE SERVICE LEVEL statement docs 2023-11-30 13:07:20 +01:00
Michał Jadwiszczak
2438965b6a cql3:statements: add LIST EFFECTIVE SERVICE LEVEL statement
Add statement to print effective service level of a specified role.
2023-11-30 13:07:20 +01:00
Michał Jadwiszczak
1b08338fe7 service:qos: add option to include effective names to SLO
Allow including `slo_effective_names` in `service_level_options`
to be able to determine which service level a specific option value comes from.
2023-11-30 13:07:20 +01:00
Yaron Kaikov
7ce6962141 build_docker.sh: Upgrade package during creation and remove sshd service
When scanning our latest docker image using `trivy` (command: `trivy
image docker.io/scylladb/scylla-nightly:latest`), it shows we have OS
packages which are out of date.

Also removing `openssh-server` and `openssh-client` since we don't use
it for our docker images

Fixes: https://github.com/scylladb/scylladb/issues/16222

Closes scylladb/scylladb#16224
2023-11-30 14:00:15 +02:00
Botond Dénes
d6d9751dd8 tools/scylla-sstable: validate,validate-checksums: print JSON last
Said commands print errors as they validate the sstables. Currently this
intermingles with the regular JSON output of these commands, resulting
in ugly and confusing output.
This is not a problem for scripted use, as logs go to stderr while the
JSON goes to stdout, but it is a problem for human users.
Solve this by outputting the JSON into a std::stringstream and printing
it in one go at the very end. This means JSON is accumulated in a memory
buffer, but these commands don't output a lot of JSON, so this shouldn't
be a problem.
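
The idea, sketched in Python rather than with std::stringstream: errors are
logged as soon as they are found, while the JSON is accumulated in a buffer
and written once at the end. `validate_all()` and its parameters are
hypothetical names used only for this illustration.

```python
import io
import json


def validate_all(names, is_valid, log, out):
    """Log errors as they are found; emit the JSON in one go at the end."""
    buf = io.StringIO()
    buf.write('{"sstables": {')
    for i, name in enumerate(names):
        ok = is_valid(name)
        if not ok:
            # Errors go to the log immediately, as before.
            log.write(f"error: {name} failed validation\n")
        sep = ", " if i else ""
        # JSON fragments only go to the in-memory buffer.
        buf.write(f'{sep}{json.dumps(name)}: {{"valid": {json.dumps(ok)}}}')
    buf.write("}}\n")
    # Single write at the very end, so the JSON is never interleaved with logs.
    out.write(buf.getvalue())
```

When `log` and `out` end up on the same terminal, all error lines come
first and the JSON document follows as one uninterrupted block.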

Closes scylladb/scylladb#16216
2023-11-30 09:53:47 +03:00
Piotr Smaroń
5fd30578d7 config: introduce value_status::Deprecated
Current mechanism to deprecate config options is implemented in a hacky
way in `main.cpp` and doesn't account for existing
`db::config/boost::po` API controlling lifetime of config options, hence
it's being replaced in this PR by adding yet another `value_status`
enumerator: `Deprecated`, so that deprecation of config options is
controlled in one place in `config.cc`,i.e. when specifying config options.
Motivation: https://docs.google.com/document/d/18urPG7qeb7z7WPpMYI2V_lCOkM5YGKsEU78SDJmt8bM/edit?usp=sharing

With this change, if a `Deprecated` config option is specified as
1. a command line parameter, scylla will run and log:
```
WARN  2023-11-25 23:37:22,623 [shard 0:main] init - background-writer-scheduling-quota option ignored (deprecated)
```
(Previously it was only a message printed to standard output, not a
scylla log of warn level).
2. an option in `scylla.yaml`, scylla will run and log:
```
WARN  2023-11-27 23:55:13,534 [shard 0:main] init - Option is deprecated : background_writer_scheduling_quota
```

Fixes #15887
Incorporates dropped https://github.com/scylladb/scylladb/pull/15928

Closes scylladb/scylladb#16184
2023-11-30 08:52:57 +03:00
Avi Kivity
8e9d3af431 Merge 'Commitlog: complete prerequisites and enforce hard limit by default' from Eliran Sinvani
This miniset completes the prerequisites for enabling commitlog hard limit by default.
Namely, start flushing and evacuating segments halfway to the limit in order to never hit it under normal circumstances.
It is worth mentioning that hitting the limit is an exceptional condition whose root cause needs to be resolved; however,
once we do hit the limit, the performance impact inflicted by this enforcement is irrelevant.

Tests: unit tests.
          LWT write test (#9331)
Whitebox testing was performed by @wmitros. The test aimed at putting as much pressure as possible on the commitlog segments by using a write pattern that rewrites the partitions in the memtable, keeping it at ~85% occupancy so the dirty memory manager will not kick in. The test compared 3 configurations:
1. The default configuration
2. Hard limit on (without changing the flush threshold)
3. the changes in this PR applied.
The last exhibited the "best" behavior in terms of metrics: its graphs were the flattest and least jagged of the three.

Closes scylladb/scylladb#10974

* github.com:scylladb/scylladb:
  commitlog: enforce commitlog size hard limit by default
  commitlog: set flush threshold to half of the  limit size
  commitlog: unfold flush threshold assignment
2023-11-29 20:55:53 +02:00
Kamil Braun
8a14839a00 Merge 'handle more failures during topology operations' from Gleb
This series adds handling for more failures during a topology operation
(we already handle a failure during streaming). Here we add handling of
tablet draining errors by aborting the operation and handling of errors
after streaming where an operation cannot be aborted any longer. If the
error happens when rollback is no longer possible we wait for ring delay
and proceed to the next step. Each individual patch that adds the sleep
has an explanation what the consequences of the patch are.

* 'gleb/topology-coordinator-failures' of github.com:scylladb/scylla-dev:
  test: add test to check errro handling during tablet draining
  test: fix test_topology_streaming_failure test to not grep the whole file
  storage_service: add error injection into the tablet migration code
  storage_service: topology coordinator: rollback on handle_tablet_migration failure during tablet_draining stage
  storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state
  storage_service: topology coordinator: do not retry the metadata barrier forever in left_token_ring state
  storage_service: topology coordinator: return a node that is being removed from get_excluded_nodes
  storage_service: topology_coordinator: use new rollback_to_normal state in the rollback procedure
  storage_service: topology coordinator: add rollback_to_normal node state
  storage_service: topology coordinator: put fence version into the raft state
  storage_service: topology coordinator: do fencing even if draining failed
2023-11-29 19:02:35 +01:00
Avi Kivity
cccd2e7fa7 Merge 'Generalize sstables TOC file reading' from Pavel Emelyanov
TOC file is read and parsed in several places in the code. All do it differently, and it's worth generalizing this place.
To make it happen also fix the S3 readable_file so that it could be used inside file_input_stream.

Closes scylladb/scylladb#16175

* github.com:scylladb/scylladb:
  sstable: Generalize toc file read and parse
  s3/client: Don't GET object contents on out-of-bound reads
  s3/client: Cache stats on readable_file
2023-11-29 19:18:31 +02:00
Nadav Har'El
62f89d49e5 tablets, mv: fix on_internal_error on write to base table
The situation before this patch is that when tablets are enabled for
a keyspace, we can create a materialized view but later any write to
the base table fails with an on_internal_error(), saying that:

     "Tried to obtain per-keyspace effective replication map of test
      but it's per-table."

Indeed, with tablets, the replication is different for each table - it's
not the same for the entire keyspace.

So this patch changes the view update code to take the replication
map from the specific base table, not the keyspace.

This is good enough to get materialized-views reads and writes working
in a simple single-node case, as the included test demonstrates (the
test fails with on_internal_error() before this patch, and passes
afterwards).

But this fix is not perfect - the base-view pairing code really needs
to consider not only the base table's replication map, but also the
view table's replication map - as those can be different. We'll fix
this remaining problem as a followup in a separate patch - it will
require a substantially more elaborate test to reproduce the need
for the different mapping and to verify that fix.

Fixes #16209.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16211
2023-11-29 15:29:17 +01:00
Anna Stuchlik
ce6b15af34 doc: remove the "preview" label from Rust driver 2023-11-29 15:01:31 +01:00
Avi Kivity
cd732b1364 Update seastar submodule
* seastar 830ce8673...55a821524 (34):
  > Revert "reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc"
  > epoll: Avoid spinning on aborted connections
Fixes #12774
Fixes #7753
Fixes #13337
  > Merge 'Sanitize test-only reactor facilities' from Pavel Emelyanov
  > test/unit: fix fmt version check
  > reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc
  > build: add spaces before () and after commands
  > reactor: use zero-initialization to initialize io_uring_params
  > Merge 'build: do not return a non-false condition if the option is off ' from Kefu Chai
  > memory: do not use variable length array
  > build: use tri_state_option() to link against Sanitizers
  > build: do not define SEASTAR_TYPE_ERASE_MORE on all builds
  > Revert "shared_future: make available() immediate after set_value()"
  > test_runner: do not throw when seastar.app fails to start
  > Merge 'Address issue where Seastar faults in toeplitz hash when reassembling fragment' from John Hester
  > defer, closeable: do not use [[nodiscard(str)]]
  > Merge 'build: generate config-specific rules using generator expressions' from Kefu Chai
  > treewide: use *_v and *_t for better readability
  > build: use different names for .pc files for each build mode
  > perftune.py: skip discovering IRQs for iSCSI disks
  > io-tester: explicit use uint64_t for boost::irange(...)
  > gate: correct the typo in doxygen comment
  > shared_future: make available() immediate after set_value()
  > smp: drop unused templates
  > include fmt/ostream.h to make headers self-sufficient
  > Support ccache in ./configure.py
  > rpc_tester: Disable -Wuninitialized when including boost.accumulators
  > file: construct directory_entry with aggregated ctor
  > file: s/ino64_t/ino_t/, s/off64_t/off_t/
  > sstring_test: include fmt/std.h only if fmtlib >= 10.0.0
  > file: do not include coroutine headers if coroutine is disabled
  > fair_queue::unregister_priority_class:fix assertion
  > Merge 'Generalize `net::udp_channel` into `net::datagram_channel`' from Michał Sala
  > Merge 'Add file::list_directory() that co_yields entries' from Pavel Emelyanov
  > http/file_handler: remove unnecessary cast

Closes scylladb/scylladb#16201
2023-11-29 14:34:30 +02:00
Kefu Chai
c40da20092 utils/pretty_printers: stop using undocumented fmt api
format_parse_context::on_error() is an undocumented API in fmt v9
and in fmt v10, see

- https://fmt.dev/9.1.0/api.html#_CPPv4I0EN3fmt16basic_format_argE
- https://fmt.dev/10.0.0/api.html#_CPPv4I0EN3fmt26basic_format_parse_contextE

although this API was once used in its documentation for fmt v10.0.0, see
https://fmt.dev/10.0.0/api.html#formatting-user-defined-types. it's
still, well, undocumented.

so, to have better compatibility, let's use the documented API in place
of undocumented one. please note, `throw_format_error()` was still
not a public API before 10.1.0, so before that release we have to
throw `fmt::format_error` explicitly. so we cannot use it yet during
the transitional period.

because the class of `fmt::format_error` is defined in `fmt/format.h`,
we need to include this header for using it.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16212
2023-11-29 12:49:04 +02:00
Pavel Emelyanov
0da37d5fa6 sstable: Generalize toc file read and parse
There are several places where TOC file is parsed into a vector of
components -- sstable::read_toc(), remove_by_toc_name() and
remove_by_registry_entry(). All three deserve some generalization.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-29 12:09:52 +03:00
Pavel Emelyanov
c5d85bdf79 s3/client: Don't GET object contents on out-of-bound reads
If S3 readable file is used inside file input stream, the latter may
call its read methods with position that is above file size. In that
case server replies with generic http error and the fact that the range
was invalid is encoded into reply body's xml.

It's not great to catch this via a wrong reply status exception and xml
parsing, all the more so since we can know in advance that the read is
out-of-bound.
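
A minimal sketch of the check, in Python instead of the actual C++ client.
`CachedSizeFile`, `fetch_size` and `fetch_range` are hypothetical names; the
only point is that a cached object size lets an out-of-bound read be
answered locally, without a GET.

```python
class CachedSizeFile:
    """Readable file over a remote object whose size is fetched once."""

    def __init__(self, fetch_size, fetch_range):
        self._fetch_size = fetch_size    # callable: () -> int
        self._fetch_range = fetch_range  # callable: (pos, length) -> bytes
        self._size = None

    def size(self):
        # The object is immutable, so the size is cached after the first call.
        if self._size is None:
            self._size = self._fetch_size()
        return self._size

    def read(self, pos, length):
        # Out-of-bound read: answer locally instead of asking the server
        # and parsing its error reply.
        if pos >= self.size():
            return b""
        return self._fetch_range(pos, min(length, self.size() - pos))
```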

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-29 12:09:52 +03:00
Pavel Emelyanov
339182287f s3/client: Cache stats on readable_file
S3-based sstables components are immutable, so every time stat is called
there's no need to ping server again.

But the main intention of this patch is to provide stats for read calls
in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-29 12:06:54 +03:00
Calle Wilund
3b70fde3cd commitlog: Make named_files in delete_segments have updated size
Fixes #16207

commitlog::delete_segments deletes (or recycles) segments replayed.
The actual file size here is added to footprint so actual delete then
can determine iff things should be recycled or removed.
However, we build a pending delete list of named_files, and the files
we added did not have size set. Bad. Actual deletion then treated files
as zero-byte sized, i.e. footprint calculations borked.

Simple fix is just filling in the size of the objects when adding.
Added unit test for the problem.

Closes scylladb/scylladb#16210
2023-11-29 09:58:47 +02:00
Yaron Kaikov
c3ee53f3be test.py: enable xml validation
Following https://github.com/scylladb/scylladb/issues/4774#issuecomment-1752089862

Adding back xml validation

Closes: https://github.com/scylladb/scylla-pkg/issues/3441

Closes scylladb/scylladb#16198
2023-11-29 09:02:36 +02:00
Botond Dénes
3ed6925673 Merge 'Major compaction: flush commitlog by forcing new active segment and flushing all tables' from Benny Halevy
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6

However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).

Flushing all tables in the database releases
all references to commitlog segments and thereby
maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.

However, flushing all tables too frequently might
result in tiny sstables.  Since when flushing all
keyspaces using `nodetool flush` the `force_keyspace_compaction`
api is invoked for each keyspace successively, we need a mechanism
to prevent too frequent flushes by major compaction.

Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).

In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.
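
The decision can be pictured with the following Python sketch. It is not the
compaction manager code: `MajorCompactionFlushPolicy` and the returned mode
strings are hypothetical, and the exact way the interval is tracked is an
assumption.

```python
import time


class MajorCompactionFlushPolicy:
    """Decide how to flush before a major compaction."""

    def __init__(self, interval_seconds=24 * 60 * 60, clock=time.monotonic):
        self._interval = interval_seconds
        self._clock = clock
        self._last_flush_all = None

    def flush_mode(self):
        now = self._clock()
        due = (self._last_flush_all is None
               or now - self._last_flush_all >= self._interval)
        if due:
            # Enough time has passed: flush all tables in the database.
            self._last_flush_all = now
            return "flush_all_tables"
        # Too soon: fall back to flushing only the tables being compacted.
        return "flush_each_table_in_keyspace"
```

With this shape, a sequence of per-keyspace major compactions issued back to
back triggers at most one flush of all tables per interval.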

Fixes scylladb/scylladb#15777

Closes scylladb/scylladb#15820

* github.com:scylladb/scylladb:
  docs: nodetool: flush: enrich examples
  docs: nodetool: compact: fix example
  api: add /storage_service/compact
  api: add /storage_service/flush
  compaction_manager: flush_all_tables before major compaction
  database: add flush_all_tables
  api: compaction: add flush_memtables option
  test/nodetool: jmx: fix path to scripts/scylla-jmx
  scylla-nodetool, docs: improve optional params documentation
2023-11-29 08:48:40 +02:00
Kefu Chai
65994b1e83 build: cmake: add compatibility target of dev-headers
our CI builds "dev-headers" as a gating check. but the target names
generated by CMake's Ninja Multi-Config generator do not follow
this naming convention. we could have headers:Dev, but still, it's
different from what we are using, before completely switching to
CMake, let's keep this backward compatibility by adding a target
with the same name.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-29 10:08:59 +08:00
Kefu Chai
2d284f4749 build: add an option to use CMake as the build build system
as part of the efforts to migrate to the CMake-based building system,
this change enables `configure.py` to optionally create
`build.ninja` with CMake.

in this change, we add a new option named `--use-cmake` to
`configure.py` so we can create `build.ninja`. please note,
instead of using the "Ninja" generator used by Seastar's
`configure.py` script, we use "Ninja Multi-Config" generator
along with `CMAKE_CROSS_CONFIGS` setting in this project.
so that we can generate a `build.ninja` which is capable of
building the same artifacts with multiple configurations.

Fixes #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-29 10:08:59 +08:00
Nadav Har'El
88a5ddabce tablets, mv: create tablets for a new materialized view
Before this patch, trying to create a materialized view when tablets
are enabled for a keyspace results in a failure: "Tablet map not found
for table <uuid>", with uuid referring to the new view.

When a table schema is created, the handler on_before_create_column_family()
is called - and this function creates the tablet map for the new table.
The bug was that we forgot to do the same when creating a materialized
view - which is also a bona-fide table.

In this patch we call on_before_create_column_family() also when
creating the materialized view. I decided *not* to create a new
callback (e.g., on_before_create_view()) and rather call the existing
on_before_create_column_family() callback - after all, a view is
a column family too.

This patch also includes a test for this issue, which fails to create
the view before this patch, and passes with the patch. The test is
in the test/topology_experimental_raft suite, which runs Scylla with
the tablets experimental feature, and will also allow me to create
tests that need multiple nodes. However, the first test added here
only needs a single node to reproduce the bug and validate its fix.

Fixes #16194.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16205
2023-11-28 21:54:32 +01:00
Kamil Braun
3582095b79 schema_tables: use smaller timestamp for base mutations included with view update
When a view schema is changed, the schema change command also includes
mutations for the corresponding base table; these mutations don't modify
the base schema but are included in case the receiver of view
mutations somehow didn't receive base mutations yet (this may in theory
happen outside Raft mode).

There are situations where the schema change command contains both
mutations that describe the current state of the base table -- included
by a view update, as explained above -- and mutations that want to
modify the base table. Such situation arises, for example, when we
update a user-defined type which is referenced by both a view and its
corresponding base table. This triggers a schema change of the view,
which generates mutations to modify the view and includes mutations of
the current base schema, and at the same time it triggers a schema
change of the base, which generates mutations to modify the base.

These two sets of mutations are conflicting with each other. One set
wants to preserve the current state of the base table while the other
wants to modify it. And the two sets of mutations are generated using
the same timestamp, which means that conflict resolution between them is
made on a per-mutation-cell basis, comparing the values in each cell and
taking the "larger" one (meaning of "larger" depends on the type of each
cell).

Fortunately, this conflict is currently benign -- or at least there is
no known situation where it causes problems.

Unfortunately, it started causing problems when I attempted to implement
group 0 schema versioning (PR scylladb/scylladb#15331), where instead of
calculating table versions as hashes of schema mutations, we would send
versions as part of schema change command. These versions would be
stored inside the `system_schema.scylla_tables` table, `version` column,
and sent as part of schema change mutations.

And then the conflict showed. One set of mutations wanted to preserve
the old value of `version` column while the other wanted to update it.
It turned out that sometimes the old `version` prevailed, because the
`version` column in `system_schema.scylla_tables` uses UUID-based
comparison (not timeuuid-based comparison). This manifested as issue
scylladb/scylladb#15530.

To prevent this, the idea in this commit is simple: when generating
mutations for the base table as part of corresponding view update, do
not use the provided timestamp directly -- instead, decrement it by one.
This way, if the schema change command contains mutations that want to
modify the base table, these modifying mutations will win all conflicts
based on the timestamp alone (they are using the same provided
timestamp, but not decremented).
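
A toy model of the conflict, in Python. Cells are `(timestamp, value)`
pairs and `reconcile()` picks the higher timestamp, falling back to the
larger value on a tie. The UUIDs and the helper names are made up for the
example; the real reconciliation code is not shown here.

```python
import uuid


def reconcile(cell_a, cell_b):
    """Higher timestamp wins; on equal timestamps the larger value wins."""
    # Tuple comparison implements exactly this rule for (timestamp, value).
    return max(cell_a, cell_b)


# Hypothetical `version` values: the old one compares larger as a plain UUID.
OLD_VERSION = uuid.UUID("ffffffff-0000-4000-8000-000000000000")
NEW_VERSION = uuid.UUID("00000000-0000-4000-8000-000000000001")


def winning_version(base_timestamp_offset, timestamp=1000):
    """Return the `version` that survives for a given decrement."""
    # Mutation included by the view update: preserves the current base state.
    preserving = (timestamp - base_timestamp_offset, OLD_VERSION)
    # Mutation that actually wants to modify the base table.
    modifying = (timestamp, NEW_VERSION)
    return reconcile(preserving, modifying)[1]
```

With an offset of 0 (same timestamp) the old version wins the tie-break;
with an offset of 1 the modifying mutation wins on timestamp alone.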

One could argue that the choice of this timestamp is anyway arbitrary.
The original purpose of including base mutations during view update was
to ensure that a node which somehow missed the base mutations, gets them
when applying the view. But in that case, the "most correct" solution
should have been to use the *original* base mutations -- i.e. the ones
that we have on disk -- instead of generating new mutations for the base
with a refreshed timestamp. The base mutations that we have on disk have
smaller timestamps already (since these mutations are from the past,
when the base was last modified or created), so the conflict would also
not happen in this case.

But that solution would require doing a disk read, and we can avoid the
read while still fixing the conflict by using an intermediate solution:
regenerating the mutations but with `timestamp - 1`.
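
The tie-break described above can be illustrated with a minimal sketch. It is a toy model, not the actual mutation code: the `Cell` class, the `reconcile` helper and the byte values are all made up for illustration, and values are compared bytewise rather than per column type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    timestamp: int
    value: bytes  # compared bytewise here; real cells compare per column type

def reconcile(a: Cell, b: Cell) -> Cell:
    """Higher timestamp wins; on a timestamp tie the "larger" value wins."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b

ts = 1000
old_version = b"\xff-old"   # happens to compare larger than the new value
new_version = b"\x01-new"

# Same timestamp: the tie is broken by value, so the stale version can win.
tied = reconcile(Cell(ts, old_version), Cell(ts, new_version))

# Base mutations regenerated for the view update use ts - 1, so the
# mutations that actually modify the base win on timestamp alone.
fixed = reconcile(Cell(ts - 1, old_version), Cell(ts, new_version))
```

In this model `tied` keeps the old value while `fixed` takes the new one, which is the effect the decrement is meant to achieve.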

Ref: scylladb/scylladb#15530

Closes scylladb/scylladb#16139
2023-11-28 21:51:18 +01:00
Benny Halevy
310ff20e1e docs: nodetool: flush: enrich examples
Provide 3 examples, like in the nodetool/compact page:
global, per-keyspace, per-table.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:48:22 +02:00
Benny Halevy
d32b90155a docs: nodetool: compact: fix example
It looks like `nodetool compact standard1` is meant
to show how to compact a specified table, not a keyspace.
Note that the previous example line is for a keyspace.
So fix the table compaction example to:
`nodetool compact keyspace1 standard1`

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:45:20 +02:00
Benny Halevy
b12b142232 api: add /storage_service/compact
For major compacting all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool compact` translates to
a sequence of `/storage_service/keyspace_compaction` calls).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
1b576f358b api: add /storage_service/flush
For flushing all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool flush` translates to
a sequence of `/storage_service/keyspace_flush` calls).
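
A rough way to picture the difference, as a toy counter only: `CommitlogModel`, both helper functions and the keyspace names are hypothetical and do not correspond to real classes.

```python
class CommitlogModel:
    """Hypothetical counter; not the real commitlog class."""
    def __init__(self):
        self.segment_switches = 0

    def force_new_active_segment(self):
        self.segment_switches += 1

def flush_per_keyspace(commitlog, keyspaces):
    # Models a sequence of /storage_service/keyspace_flush calls.
    for _ in keyspaces:
        commitlog.force_new_active_segment()

def flush_all(commitlog):
    # Models a single /storage_service/flush call.
    commitlog.force_new_active_segment()

keyspaces = ["ks1", "ks2", "ks3"]   # made-up keyspace names
per_keyspace, global_flush = CommitlogModel(), CommitlogModel()
flush_per_keyspace(per_keyspace, keyspaces)
flush_all(global_flush)
```

The per-keyspace path switches segments once per keyspace, the global path once in total.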

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
66ba983fe0 compaction_manager: flush_all_tables before major compaction
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6

However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).

Flushing all tables in the database releases
all references to commitlog segments and thereby
maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.

However, flushing all tables too frequently might
result in tiny sstables. Since, when flushing all
keyspaces using `nodetool flush`, the `force_keyspace_compaction`
api is invoked for each keyspace successively, we need a mechanism
to prevent too-frequent flushes by major compaction.

Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).

In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.
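
The interval check can be sketched roughly as follows. This is an assumption about the shape of the logic, not the actual implementation: the class, its method and the returned strings are hypothetical, and only the 24-hour default comes from the text above.

```python
import time

class MajorCompactionFlushPolicy:
    """Hypothetical model: flush everything at most once per interval."""
    def __init__(self, interval_seconds: float, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock
        self.last_flush_all = None  # time of the last database-wide flush

    def before_major_compaction(self) -> str:
        now = self.clock()
        if self.last_flush_all is None or now - self.last_flush_all >= self.interval:
            self.last_flush_all = now
            return "flush_all_tables"
        # Flushed recently: fall back to flushing only the tables being compacted.
        return "flush_each_table"

# Usage with a fake clock, so the example does not have to wait 24 hours.
now = [0.0]
policy = MajorCompactionFlushPolicy(24 * 3600, clock=lambda: now[0])
first = policy.before_major_compaction()
now[0] = 10.0
second = policy.before_major_compaction()
```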

Fixes scylladb/scylladb#15777

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
be763bea34 database: add flush_all_tables
Flushes all tables after forcing force_new_active_segment
of the commitlog to make sure all commitlog segments can
get recycled.

Otherwise, due to "false sharing", rarely-written tables
might inhibit recycling of the commitlog segments they reference.

After f42eb4d1ce,
that won't allow compaction to purge some tombstones based on
the min_gc_time.

To be used in the next patch by major compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
1fd85bd37b api: compaction: add flush_memtables option
When flushing is done externally, e.g. by running
`nodetool flush` prior to `nodetool compact`,
flush_memtables=false can be passed to skip flushing
of tables right before they are major-compacted.

This is useful to prevent creation of small sstables
due to excessive memtable flushing.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
7f860d612a test/nodetool: jmx: fix path to scripts/scylla-jmx
The current implementation makes no sense.

Like `nodetool_path`, base the default `jmx_path`
on the assumption that the test is run using, e.g.
```
(cd test/nodetool; pytest --nodetool=cassandra test_compact.py)
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
9324363e55 scylla-nodetool, docs: improve optional params documentation
Document the behavior if no keyspace is specified
or no table(s) are specified for a given keyspace.

Fixes scylladb/scylladb#16032

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Anna Stuchlik
bfe19c0ed2 doc: add experimental support for object storage
This commit adds information on how to enable
object storage for a keyspace.

The "Keyspace storage options" section already
existed in the doc, but it was not valid as
the support was only added in version 5.4

The scope of this commit:
- Update the "Keyspace storage options" section.
- Add the information about object storage support
  to the Data Definition> CREATE KEYSPACE section
  * Marked as "Experimental".
  * Excluded from the Enterprise docs with the
    .. only:: opensource directive.

This commit must be backported to branch-5.4,
as support for object storage was added
in version 5.4.

Closes scylladb/scylladb#16081
2023-11-28 14:27:01 +02:00
Anna Stuchlik
37f20f2628 doc: fix Rust Driver release information
This PR removes the incorrect information that
the ScyllaDB Rust Driver is not GA.

In addition, it replaces "Scylla" with "ScyllaDB".

Fixes https://github.com/scylladb/scylladb/issues/16178
2023-11-28 10:32:08 +01:00
Botond Dénes
f46cdce9d3 Merge 'Make memtable flush tolerate misconfigured S3 storage' from Pavel Emelyanov
Currently, if a memtable is flushed into misconfigured S3 storage, the flush fails and aborts the whole scylla process. That's not very elegant. First, because upon restart, garbage-collecting non-sealed sstables would fail again. Second, because re-configuring an endpoint can be done at runtime: scylla re-reads this config upon a HUP signal.

Flushing memtable restarts when seeing ENOSPC/EDQUOT errors from on-disk sstables. This PR extends this to handle misconfigured S3 endpoints as well.

fixes: #13745

Closes scylladb/scylladb#15635

* github.com:scylladb/scylladb:
  test: Add object_store test to validate config reloading works
  test: Add config update facility to test cluster
  test: Make S3_Server export config file as pathlib.Path
  config: Make object storage config updateable_value_source
  memtable: Extend list of checking codes
  sstables/storage/s3: Fix missing TOC status check
  s3/client: Map http exceptions into storage_io_error
  exceptions: Extend storage_io_error construction options
2023-11-28 09:33:37 +02:00
Botond Dénes
3ccf1e020b Merge ' compaction: abort compaction tasks' from Aleksandra Martyniuk
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.

Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.

Closes scylladb/scylladb#16177

* github.com:scylladb/scylladb:
  test: test abort of compaction task that isn't started yet
  test: test running compaction task abort
  tasks: fail if a task was aborted
  compaction: abort task manager compaction tasks
2023-11-28 09:08:04 +02:00
Pavel Emelyanov
1efddc228d sstable: Do not nest io-check wrappers into each other
When sealing an sstable on local storage, the storage driver performs
several flushes on a directory file that is opened via checked-file.
The flush calls are wrapped with sstable_write_io_check, but that's
excessive: the checked file wraps flushes with io-checks on its own.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16173
2023-11-27 15:53:02 +02:00
Kefu Chai
724a6e26f3 cql3: define format_as() for formatting cql3::cql3_type::raw
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

to define a formatter which can be used by raw class and its derived
classes, we have to put the full template specialization before the
call sites. also, please note, the forward declaration is not enough,
as the compile-time formatter check of fmt requires the definition of
formatter. but since fmt v10 also enables us to use `format_as()` to
format a certain type with the return value of `format_as()`, we define
`format_as()` instead, which fulfills our needs.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16125
2023-11-27 15:28:19 +02:00
Kefu Chai
0b69a1badc transport: cast unaligned<T> to T for formatting it
fmt v10 does not cast unaligned<T> to T when formatting it;
instead, it insists on finding a matching fmt::formatter<> specialization for it.
that's why we have FTBFS when printing
these packed<T> variables with fmt v10.

in this change, we just cast them to the underlying types before
formatting them. because seastar::unaligned<T> does not provide
a method for accessing the raw value, neither does it provide
a type alias of the type of the underlying raw value, we have
to cast to the type without deducing it from the printed value.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16167
2023-11-27 15:26:13 +02:00
Gleb Natapov
e68e998b15 test: add test to check error handling during tablet draining
The test checks that the topology operation is aborted if an error
happens during tablet migration stage.
2023-11-27 15:06:52 +02:00
Gleb Natapov
b1c0b57acf test: fix test_topology_streaming_failure test to not grep the whole file
A cluster can be reused between tests, so let's grep only the part of the
log that is relevant for the test itself.
2023-11-27 15:05:21 +02:00
Petr Gusev
dca28417b2 storage_service: drop unused method handle_state_replacing_update_pending_ranges
2023-11-27 12:37:26 +01:00
Tomasz Grabiec
ae5220478c tablets: Release group0 guard when waiting for streaming to finish
This bug manifested as delays in DDL statement execution, which had to
wait until streaming is finished so that the topology change
coordinator releases the guard.

The reason is that topology change coordinator didn't release the
group0 guard if there is no work to do with active migrations, and
awaits the condition variable without leaving the scope.

Fixes #16182

Closes scylladb/scylladb#16183
2023-11-27 12:24:27 +01:00
Gleb Natapov
c83ff5a0dd storage_service: add error injection into the tablet migration code
2023-11-27 13:09:58 +02:00
Gleb Natapov
4ebdddc31b storage_service: topology coordinator: rollback on handle_tablet_migration failure during tablet_draining stage
During remove or decommission, as a first step, tablets are drained from
the leaving node. Theoretically, this step may fail. Roll back the
topology operation if that happens. Since some tablets may stay in the
migration state, the topology needs to go to the tablet_migration state.
Let's always do it, since it should be safe even if there are no ongoing
tablet migrations.
2023-11-27 13:09:58 +02:00
Nadav Har'El
8d040325ab cql: fix SELECT toJson() or SELECT JSON of time column
The implementation of "SELECT TOJSON(t)" or "SELECT JSON t" for a column
of type "time" forgot to put the time string in quotes. The result was
invalid JSON. This patch is a one-liner fixing this bug.
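
To see why the missing quotes matter, here is a small check with Python's standard `json` module. The sample time value and the exact shape of the output strings are assumptions made for illustration; they are not copied from real server output.

```python
import json

time_text = "08:12:54.123456789"        # made-up sample value

unquoted = '{"t": %s}' % time_text      # assumed shape of the buggy output
quoted = '{"t": "%s"}' % time_text      # assumed shape of the fixed output

def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

results = {"unquoted": is_valid_json(unquoted), "quoted": is_valid_json(quoted)}
```

A JSON parser rejects the unquoted form and accepts the quoted one, which is the kind of check the Python tests effectively perform and the C++ test did not.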

This patch also removes the "xfail" marker from one xfailing test
for this issue which now starts to pass. We also add a second test for
this issue - the existing test was for "SELECT TOJSON(t)", and the second
test shows that "SELECT JSON t" had exactly the same bug - and both are
fixed by the same patch.

We also had a test translated from Cassandra which exposed this bug,
but that test continues to fail because of other bugs, so we just
need to update the xfail string.

The patch also fixes one C++ test, test/boost/json_cql_query_test.cc,
which enshrined the *wrong* behavior - JSON output that isn't even
valid JSON - and had to be fixed. Unlike the Python tests, the C++ test
can't be run against Cassandra, and doesn't even run a JSON parser
on the output, which explains how it came to enshrine wrong output
instead of helping to discover the bug.

Fixes #7988

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#16121
2023-11-27 10:03:04 +02:00
Anna Stuchlik
24d5dbd66f doc: replace the OSS-only link on the Raft page
This commit replaces the link to the OSS-only page
(the 5.2-to-5.4 upgrade guide not present in
the Enterprise docs) on the Raft page.

While providing the link to the specific upgrade
guide is more user-friendly, it causes build failures
of the Enterprise documentation. I've replaced
it with the link to the general Upgrade section.

The ".. only:: opensource" directive used to wrap
the OSS-only content correctly excludes the content
from the Enterprise docs - but it doesn't prevent
build warnings.

This commit must be backported to branch-5.4 to
prevent errors in all versions.

Closes scylladb/scylladb#16176
2023-11-27 08:52:58 +02:00
Kefu Chai
c937827308 mutation_query: add formatter for reconcilable_result::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for
reconcilable_result::printer, and remove its operator<<().

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16186
2023-11-26 20:20:50 +02:00
Konstantin Osipov
f0aa325187 test: provide overview of the contents of test/ directory
Fixes #16080

Closes scylladb/scylladb#16088
2023-11-26 15:51:07 +02:00
Marcin Maliszkiewicz
81be3e0935 test/alternator/run: port -h and --omit-scylla-output options from cql-pytest
Closes scylladb/scylladb#16171
2023-11-26 13:52:01 +02:00
Botond Dénes
fe7c81ea30 Update ./tools/jmx and ./tools/java submodules
* ./tools/jmx 05bb7b68...80ce5996 (4):
  > StorageService: Normalize endpoint inetaddress strings to java form

Fixes #16039

  > ColumnFamilyStore: only quote table names if necessary
  > APIBuilder: allow quoted scope names
  > ColumnFamilyStore: don't fail if there is a table with ":" in its name

Fixes #16153

* ./tools/java 10480342...26f5f71c (1):
  > NodeProbe: allow addressing table name with colon in it

Also needed for #16153

Closes scylladb/scylladb#16146
2023-11-26 13:35:38 +02:00
Kefu Chai
ba3dce3815 build: do escape "\" in regular string
in Python, a raw string is created using 'r' or 'R' prefix. when
creating the regex using Python string, sometimes, we have to use
"\" to escape the parenthesis so the tools like "sed" can consider
the parenthesis as a capture group. but "\" is also used to escape
strings in Python, in order to put "\" as it is, we use "\" instead
of escaping "\" with "\\" which is obscure. when generating rules,
we use multiple-lines string and do not want to have an empty line
at the beginning of the string so added "\" continuation mark.

but we failed to escape some of the "\" in the string, and just put
"\(". Python accepts it after failing to find a matching
escape sequence for it, and interprets it as "\\(", but this should still
be considered a misuse or an oversight. with python's warnings enabled,
one is able to see its complaints.

in this change, we escape the "\" properly.
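
The behavior can be reproduced in isolation with a short sketch. The helper name and the sample strings are made up for illustration; the category of the warning differs between Python versions, so the helper matches on the message text only.

```python
import warnings

# A raw string and an explicitly escaped string are the same value.
assert r"\(" == "\\("
assert len(r"\(") == 2

def invalid_escape_warnings(source: str) -> list:
    """Compile `source` and return any warnings about bad escape sequences."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w for w in caught if "invalid escape sequence" in str(w.message)]

bad = invalid_escape_warnings('s = "\\("')     # source text is: s = "\("
good = invalid_escape_warnings('s = "\\\\("')  # source text is: s = "\\("
```

Here `bad` holds the complaint about the unescaped "\(" while `good` stays empty.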

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16179
2023-11-26 13:34:10 +02:00
Kefu Chai
3053d63c7f main: notify systemd that the service is ready
this change addresses a regression introduced by
f4626f6b8e, which stopped notifying
systemd with the status that scylla is READY. without the
notification, systemd would wait in vain for the readiness of
scylla.

Refs f4626f6b8e

Fixes #16159
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16166
2023-11-26 10:38:53 +02:00
Aleksandra Martyniuk
9c2c964b8e test: test abort of compaction task that isn't started yet
Test whether a task whose parent was aborted has a proper status.
2023-11-24 19:25:27 +01:00
Aleksandra Martyniuk
8639eae0ce test: test running compaction task abort
Test whether a task which is aborted while running has a proper status.
2023-11-24 19:25:20 +01:00
Botond Dénes
a472700309 Merge 'Minor fixes and refactors' from Kamil Braun
- remove some code that is obsolete in newer Scylla versions,
- fix some minor bugs. These bugs appear to be benign, there are no known issues caused by them, but fixing them is a good idea nevertheless,
- refactor some code for better maintainability.

Parts of this PR were extracted from https://github.com/scylladb/scylladb/pull/15331 (which was merged but later reverted), parts of it are new.

Closes scylladb/scylladb#16162

* github.com:scylladb/scylladb:
  test/pylib: log_browsing: fix type hint
  migration_manager: take `abort_source&` in get_schema_for_read/write
  migration_manager: inline merge_schema_in_background
  migration_manager: remove unused merge_schema_from overload
  migration_manager: assume `canonical_mutation` support
  migration_manager: add `std::move` to avoid a copy
  schema_tables: refactor `scylla_tables(schema_features)`
  schema_tables: pass `reload` flag when calling `merge_schema` cross-shard
  system_keyspace: fix outdated comment
2023-11-24 17:34:21 +02:00
Patryk Jędrzejczak
15d3ed4357 test: topology: update run_first lists
`run_first` lists in `suite.yaml` files provide a simple way to
shorten the tests' average running time by running the slowest
tests first.

We update these lists, since they got outdated over time:
- `test_topology_ip` was renamed to `test_replace`
   and changed suite,
- `test_tablets` changed suite,
- new slow tests were added:
  - `test_cluster_features`,
  - `test_raft_cluster_features`,
  - `test_raft_ignore_nodes`,
  - `test_read_repair`.

Closes scylladb/scylladb#16104
2023-11-24 16:18:30 +01:00
Aleksandra Martyniuk
c74b3ec596 tasks: fail if a task was aborted
run() method of task_manager::task::impl does not have to throw when
a task is aborted with task manager api. Thus, a user will see that
the task finished successfully which makes it inconsistent.

Finish a task with a failure if it was aborted with task manager api.
2023-11-24 15:45:00 +01:00
Aleksandra Martyniuk
aa7bba2d8b compaction: abort task manager compaction tasks
Set top level compaction tasks as abortable.

Compaction tasks which have no children, i.e. compaction task
executors, have the abort method overridden to stop compaction data.
2023-11-24 15:44:34 +01:00
Kefu Chai
ca31dab9d2 sstable: drop repaired_at related code
before we support incremental repair, there is no point in having the
code path setting / getting it. even worse, it causes confusion.

so, in this change, we

* just set the field to 0,
* drop the corresponding field in metadata_collector, as we never
  update it.
* add a comment to explain why this variable is initialized to 0

Fixes #16098
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16169
2023-11-24 15:12:25 +02:00
Botond Dénes
697cf41b9b Merge 'repair: Introduce small table optimization' from Asias He
repair: Introduce small table optimization

*) Problem:

We have seen in the field it takes longer than expected to repair system tables
like system_auth which has a tiny amount of data but is replicated to all nodes
in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The
main reason for the slowness is that even if the amount of data is small,
repair has to walk through all the token ranges, that is num_tokens *
number_of_nodes_in_the_cluster. The overhead of the repair protocol for each
token range dominates due to the small amount of data per token range. Another
reason is the high network latency between DCs makes the RPC calls used to
repair consume more time.

*) Solution:

To solve this problem, a small table optimization for repair is introduced in
this patch. A new repair option is added to turn on this optimization.

- The user does not need to specify a token range to repair. It will repair all token
ranges automatically.

- Users only need to send the repair rest api to one of the nodes in the
cluster. It can be any of the nodes in the cluster.

- It does not require the RF to be configured to replicate to all nodes in the
cluster. This means it can work with any tables as long as the amount of data
is low, e.g., less than 100MiB per node.

*) Performance:

1)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}

Before:
```
repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1537, round_nr=4612,
round_nr_fast_path_already_synced=4611,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds,
tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2,
0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848},
{127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0},
{127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s,
tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={}
```

After:
```
repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds,
tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178},
{127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1},
{127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286},
{127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286},
{127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026},
{127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={},
rx_row_nr_peer={}
```

The time to finish repair difference = 1.5648403 seconds / 0.001459798 seconds = 1072X

2)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}

Same test as above except 5ms delay is added to simulate multiple dc
network latency:

The time to repair is reduced from 333s to 0.2s.

333.26758 s / 0.22625381s = 1472.98

3)

3 DCs, each DC has 3 nodes, 9 nodes in the cluster. RF = {dc1: 3, dc2: 3, dc3: 3}, 10 ms network latency

Before:

```
repair - repair[86124a4a-fd26-42ea-a078-437ca9e372df]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=2305, round_nr=6916,
round_nr_fast_path_already_synced=6915,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=276630, tx_hashes_nr=0, rx_hashes_nr=8, duration=986.34015
seconds, tx_row_nr=7, rx_row_nr=0, tx_row_bytes=1246, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0},
{127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_nr={{127.0.57.1, 1},
{127.0.57.2, 1}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}},
row_from_disk_bytes_per_sec={{127.0.57.1, 1.72105e-07}, {127.0.57.2,
1.72105e-07}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 0.00101385},
{127.0.57.2, 0.00101385}, {127.0.57.3, 0}, {127.0.57.4, 0},
{127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0},
{127.0.57.9, 0}} Rows/s, tx_row_nr_peer={{127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}}, rx_row_nr_peer={}
```

After:

```
repair - repair[07ebd571-63cb-4ef6-9465-6e5f1e98f04f]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=1, round_nr=4,
round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=128, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.6052915
seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
178}, {127.0.57.4, 178}, {127.0.57.5, 178}, {127.0.57.6, 178},
{127.0.57.7, 178}, {127.0.57.8, 178}, {127.0.57.9, 178}},
row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}},
row_from_disk_bytes_per_sec={{127.0.57.1, 0.00037793}, {127.0.57.2,
0.00037793}, {127.0.57.3, 0.00037793}, {127.0.57.4, 0.00037793},
{127.0.57.5, 0.00037793}, {127.0.57.6, 0.00037793}, {127.0.57.7,
0.00037793}, {127.0.57.8, 0.00037793}, {127.0.57.9, 0.00037793}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 2.22634},
{127.0.57.2, 2.22634}, {127.0.57.3, 2.22634}, {127.0.57.4,
2.22634}, {127.0.57.5, 2.22634}, {127.0.57.6, 2.22634},
{127.0.57.7, 2.22634}, {127.0.57.8, 2.22634}, {127.0.57.9,
2.22634}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
```

The time to repair is reduced from 986s (16 minutes) to 1.6s
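
The ratios for the three runs can be recomputed from the durations quoted above. This is only arithmetic on the reported numbers; the dictionary keys are descriptive labels added here.

```python
# (before_seconds, after_seconds) copied from the log excerpts and text above.
runs = {
    "6 nodes, no added latency": (1.5648403, 0.001459798),
    "6 nodes, 5 ms latency": (333.26758, 0.22625381),
    "9 nodes, 10 ms latency": (986.34015, 1.6052915),
}

speedups = {name: before / after for name, (before, after) in runs.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.0f}X")
```

This gives roughly 1072X and 1473X for the two 6-node runs and roughly 614X for the 9-node run.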

*) Summary

So, a more than 1000X difference is observed for this common usage of
system table repair procedure.

Fixes #16011
Refs  #15159

Closes scylladb/scylladb#15974

* github.com:scylladb/scylladb:
  repair: Introduce small table optimization
  repair: Convert put_row_diff_with_rpc_stream to use coroutine
2023-11-24 15:11:42 +02:00
Kamil Braun
1f56962591 Merge 'test: topology: test concurrent bootstrap' from Patryk Jędrzejczak
We add a test for concurrent bootstrap in the raft-based topology.

Additionally, we extend the testing framework with a new function -
`ManagerClient.servers_add`. It allows adding multiple servers
concurrently to a cluster.

This PR is the first step to fix #15423. After merging it, if the new test
doesn't fail for some time in CI, we can:
- use `ManagerClient.servers_add` in other tests wherever possible,
- start initial servers concurrently in all suites with
  `initial_size > 0`.

Closes scylladb/scylladb#16102

* github.com:scylladb/scylladb:
  test: topology: add test_concurrent_bootstrap
  test: ManagerClient: introduce servers_add
  test: ManagerClient: introduce _create_server_add_data
2023-11-24 12:41:05 +01:00
Kefu Chai
f99223919a compaction: add formatter for map<timestamp_type, vector<shared_sstable>>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for
map<timestamp_type, vector<shared_sstable>>. since the operator<<
for this type is only used in the .cc file, and the only use case
of it is to provide the formatter for fmt, so the operator<< based
formatter is removed in this change.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16163
2023-11-24 11:56:28 +02:00
Kamil Braun
5acfcd8ef5 Merge 'raft: send group0 RPCs only if the destination group0 server is seen as alive' from Piotr Dulikowski
In topology on raft mode, the events "new node starts its group0 server"
and "new node is added to group0 configuration" are not synchronized
with each other. Therefore it might happen that the cluster starts
sending commands to the new node before the node starts its server. This
might lead to harmless, but ugly messages like:

    INFO  2023-09-27 15:42:42,611 [shard 0:stat] rpc - client
    127.0.0.1:56352 msg_id 2:  exception "Raft group
    b8542540-5d3b-11ee-99b8-1052801f2975 not found" in no_wait handler
    ignored

In order to solve this, the failure detector verb is extended to report
information about whether group0 is alive. The raft rpc layer will drop
messages to nodes whose group0 is not seen as alive.
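
The idea can be sketched as a toy model. Everything here is hypothetical (class names, the `send` method, the node names), and whether the real rpc layer raises an error or drops silently in each case is not claimed; the sketch only shows a liveness check performed before sending.

```python
class DestinationNotAlive(Exception):
    """Hypothetical stand-in for destination_not_alive_error."""

class RaftRpcModel:
    def __init__(self, group0_alive):
        # group0_alive: callable(server_id) -> bool, fed by the failure detector
        self.group0_alive = group0_alive
        self.sent = []

    def send(self, dest, message):
        if not self.group0_alive(dest):
            # Drop locally instead of provoking "Raft group ... not found" remotely.
            raise DestinationNotAlive(dest)
        self.sent.append((dest, message))

alive = {"node1"}                      # node2 has not started group0 yet
rpc = RaftRpcModel(lambda dest: dest in alive)
rpc.send("node1", "append_entries")
try:
    rpc.send("node2", "append_entries")
    dropped = False
except DestinationNotAlive:
    dropped = True
```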

Tested by adding a delay before group0 is started on the joining node,
running all topology tests and grepping for the aforementioned log
messages.

Fixes: scylladb/scylladb#15853
Fixes: scylladb/scylladb#15167

Closes scylladb/scylladb#16071

* github.com:scylladb/scylladb:
  raft: rpc: introduce destination_not_alive_error
  raft: rpc: drop RPCs if the destination is not alive
  raft: pass raft::failure_detector to raft_rpc
  raft: transfer information about group0 liveness in direct_fd_ping
  raft: add server::is_alive
2023-11-24 10:34:05 +01:00
Patryk Jędrzejczak
a8d06aa9fd test: topology: add test_concurrent_bootstrap
We add a test for concurrent bootstrap support in the raft-based
topology.

The plan is to make this test temporary. In the future, we will:
- use ManagerClient.servers_add in other tests wherever possible,
- start initial servers concurrently in all suites with
  initial_size > 0.
So, this test will not test anything unique.

We could make the changes proposed above now instead of adding
this small test. However, if we did that and it turned out that
concurrent bootstrap is flaky in CI, we would make almost every CI
run fail with many failures. We want to avoid such a situation.
Running only this test for some time in CI will reduce the risk
and make investigating any potential failures easier.
2023-11-24 09:39:01 +01:00
Patryk Jędrzejczak
cd7b282db6 test: ManagerClient: introduce servers_add
We add a new function - servers_add - that allows adding multiple
servers concurrently to a cluster. It makes use of a concurrent
bootstrap now supported in the raft-based topology.

servers_add doesn't have the replace_cfg parameter. The reason is
that we don't support concurrent replace operations, at least for
now.

There is an implementation detail in ScyllaCluster.add_servers. We
cannot simply do multiple calls to add_server concurrently. If we
did that in an empty cluster, every node would take itself as the
only seed and start a new cluster. To solve this, we introduce a
new field - initial_seed. It is used to choose one of the servers
as a seed for all servers added concurrently to an empty cluster.

Note that the add_server calls in asyncio.gather in add_servers
cannot race with each other when setting initial_seed because
there is only one thread.
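
A minimal sketch of why the seed selection does not race, assuming a simplified model: `ClusterModel` and its methods are made up for illustration and are not the real ScyllaCluster code.

```python
import asyncio

class ClusterModel:
    """Hypothetical stand-in for the test cluster; not the real ScyllaCluster."""
    def __init__(self):
        self.initial_seed = None
        self.servers = {}   # ip -> seed the server was started with

    async def add_server(self, ip: str) -> None:
        # No await between the check and the assignment, so no other
        # coroutine can run in between on the single-threaded event loop.
        if self.initial_seed is None:
            self.initial_seed = ip
        seed = self.initial_seed
        await asyncio.sleep(0)  # simulate the slow part of starting a server
        self.servers[ip] = seed

    async def add_servers(self, ips):
        await asyncio.gather(*(self.add_server(ip) for ip in ips))

ips = ["127.0.0.1", "127.0.0.2", "127.0.0.3"]
cluster = ClusterModel()
asyncio.run(cluster.add_servers(ips))
```

All servers end up with the same seed, because a coroutine only yields control at an `await`.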

In the future, we will also start all initial servers concurrently
in ScyllaCluster.install_and_start. The changes in this commit were
designed in a way that will make changing install_and_start easy.
2023-11-24 09:39:01 +01:00
Patryk Jędrzejczak
aca90e6640 test: ManagerClient: introduce _create_server_add_data
We introduce this function to avoid code duplication. After the
following commits, it will also be used in the new
ManagerClient.servers_add function.
2023-11-24 09:39:01 +01:00
Botond Dénes
c47a63835e Merge 'test/sstable_compaction_test: check every sstable replaced sstable ' from Kefu Chai
before this change, in sstable_run_based_compaction_test, we check
every 4 sstables, to verify that we close the sstable to be replaced
in a batch of 4.

since the integer-based generation identifier is monotonically
incremental, we can assume that the identifiers of sstables are like
0, 1, 2, 3, .... so if the compaction consumes sstable in a
batch of 4, the identifier of the first one in the batch should
always be the multiple of 4. unfortunately, this test does not work
if we use uuid-based identifier.

but if we take a closer look at how we create the dataset, we can
establish the following facts:

1. the `compaction_descriptor` returned by
   `sstable_run_based_compaction_strategy_for_tests` never
   sets `owned_ranges` in the returned descriptor
2. in `compaction::setup_sstable_reader`, `mutation_reader::forward::no`
   is used, if `_owned_ranges_checker` is empty
3. `mutation_reader_merger` respects the `fwd_mr` passed to its
   ctor, so it closes current sstable immediately when the underlying
   mutation reader reaches the end of stream.

in other words, we close every sstable once it is fully consumed in
sstable_compaction_test. and the reason why the existing test passes
is that we just sample the sstables whose generation id is a multiple
of 4. what happens when we perform compaction in this test is:

1. replace 5 with 33, closing 5
2. replace 6 with 34, closing 6
3. replace 7 with 35, closing 7
4. replace 8 with 36, closing 8   << let's check here.. good, go on!
5. replace 13 with 37, closing 13
...
8. replace 16 with 40, closing 16 << let's check here.. also, good, go on!

so, in this change, we just check all old sstables, to verify that
we close each of them once it is fully consumed.

Fixes https://github.com/scylladb/scylladb/issues/16073

Closes scylladb/scylladb#16074

* github.com:scylladb/scylladb:
  test/sstable_compaction_test: check every sstable replaced sstable
  test/sstable_compaction_test: s/old_sstables.front()/old_sstable/
2023-11-24 07:25:28 +02:00
Kamil Braun
35bb025f99 test/pylib: log_browsing: fix type hint 2023-11-23 17:23:47 +01:00
Kamil Braun
819f542ee6 migration_manager: take abort_source& in get_schema_for_read/write
No callsite needed the `nullptr` case, so we can convert pointer to
reference.
2023-11-23 17:23:47 +01:00
Kamil Braun
ddfe4f65a8 migration_manager: inline merge_schema_in_background
There was only one use site of this template.
2023-11-23 17:23:47 +01:00
Kamil Braun
42f6c5c2db migration_manager: remove unused merge_schema_from overload
The `frozen_mutation` version is now dead code.
2023-11-23 17:23:47 +01:00
Kamil Braun
8f5c2c88b8 migration_manager: assume canonical_mutation support
Support for `canonical_mutation`s was added way back in Scylla 3.2. A
lot of code in `migration_manager` is still checking whether the old
`frozen_mutations` are received or need to be sent.

We no longer need this code, since we don't support version skips during
upgrade (and certainly not upgrades like 3.2->5.4).

Leave sanity checks in place, but otherwise delete the
`frozen_mutation` branches.
2023-11-23 17:23:47 +01:00
Kamil Braun
0479e5529a migration_manager: add std::move to avoid a copy 2023-11-23 17:23:47 +01:00
Kamil Braun
269a189526 schema_tables: refactor scylla_tables(schema_features)
The `scylla_tables` function gives a different schema definition
for the `system_schema.scylla_tables` table, depending on whether
certain schema features are enabled or not.

The way it was implemented, we had to write `θ(2^n)` amount
of code and comments to handle `n` features.

Refactor it so that the amount of code we have to write to handle `n`
features is `θ(n)`.
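
The difference between the two approaches can be sketched roughly as below. The feature and column names are invented for illustration and do not reflect the real `scylla_tables` definition; the point is only that each feature contributes one independent step.

```python
# Hypothetical feature -> extra column mapping; the names are illustrative only.
FEATURE_COLUMNS = {
    "feature_a": "column_a",
    "feature_b": "column_b",
    "feature_c": "column_c",
}

BASE_COLUMNS = ["keyspace_name", "table_name"]


def scylla_tables_columns(enabled_features: set[str]) -> list[str]:
    """Build the column list with one step per feature.

    Adding a feature means adding one mapping entry, instead of one
    hand-written definition per combination of features.
    """
    columns = list(BASE_COLUMNS)
    for feature, column in FEATURE_COLUMNS.items():
        if feature in enabled_features:
            columns.append(column)
    return columns


print(scylla_tables_columns({"feature_a", "feature_c"}))
```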
2023-11-23 17:23:47 +01:00
Raphael S. Carvalho
157a5c4b1b treewide: Avoid using namespace sstables in header to avoid conflicts
That's needed for compaction_group.hh to be included in headers.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-11-23 17:36:57 +02:00
Kamil Braun
c3257bf546 Revert "test: cql_test_env: Interrupt all components on cql_test_env teardown"
This reverts commit 93ee7b7df9.

It's causing assertion failures when shutting down `cql_test_env` in
boost unit tests: scylladb/scylladb#16144
2023-11-23 15:32:13 +01:00
Gleb Natapov
7267376eac storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state
Handle the barrier failure by sleeping for a "ring delay" and
continuing. The purpose of the barrier is to wait for all reads to
the old replica set to complete and to fence the remaining requests. If the
barrier fails, we give the fence some time to propagate and continue with
the topology change. If the fence did not propagate we may have stale reads,
but this is no worse than what we have with gossiper.
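
A minimal sketch of the "try once, then wait and continue" pattern, assuming a hypothetical `barrier` coroutine and a `ring_delay` given in seconds; the real coordinator code is C++ and differs in detail.

```python
import asyncio


async def barrier_or_wait(barrier, ring_delay: float) -> bool:
    """Try the barrier once; on failure sleep for ring_delay and carry on.

    Returns True if the barrier succeeded, False if we fell back to waiting.
    """
    try:
        await barrier()
        return True
    except Exception:
        # Do not retry forever: give the fence time to propagate instead.
        await asyncio.sleep(ring_delay)
        return False


async def failing_barrier():
    raise RuntimeError("a node did not respond")


async def passing_barrier():
    return None


results = (
    asyncio.run(barrier_or_wait(passing_barrier, ring_delay=0.01)),
    asyncio.run(barrier_or_wait(failing_barrier, ring_delay=0.01)),
)
print(results)
```

Either way the caller proceeds with the topology change; the return value only records which path was taken.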
2023-11-23 15:30:10 +02:00
Gleb Natapov
7ea8fa459c storage_service: topology coordinator: do not retry the metadata barrier forever in left_token_ring state
Handle the barrier failure by sleeping for a "ring delay" and
continuing. The purpose of the barrier is to wait for unfinished writes
to the decommissioned node to complete. If the barrier fails, we give them some time
to complete and then proceed with node decommission. The worst thing
that may happen is that some write will fail because the node will be
shut down.
2023-11-23 15:30:10 +02:00
Gleb Natapov
11b7ee32ec storage_service: topology coordinator: return a node that is being removed from get_excluded_nodes
A node that is being removed is dead, so there is no need to talk to it.
2023-11-23 15:30:10 +02:00
Gleb Natapov
4c76b8b59f storage_service: topology_coordinator: use new rollback_to_normal state in the rollback procedure
Go through the rollback_to_normal state when the node needs to move to
normal during the rollback and update the fence in this state before moving
the node to normal. This guarantees that the fence update will not
be missed. Note that when a node moves to the left state it already passes
through left_token_ring, which guarantees the same.
2023-11-23 15:29:36 +02:00
Gleb Natapov
95dd0e453d storage_service: topology coordinator: add rollback_to_normal node state
When a topology coordinator rolls back from an unsuccessful topology operation it
advances the fence (which is now in the raft state) after moving to normal
state. We do not want this to fail (only a majority of nodes is needed for
it not to), but currently it may fail in case the coordinator moves
to another node after changing the rollback node's state to normal, but
before updating the fence. To solve that, the rollback operation needs to
go through a new rollback_to_normal state that will do the fencing
before moving to normal. This patch introduces that state, but does not use
it yet.
2023-11-23 15:27:28 +02:00
Kamil Braun
5223d32fab schema_tables: pass reload flag when calling merge_schema cross-shard
In 0c86abab4d `merge_schema` obtained a new flag, `reload`.

Unfortunately, the flag was assigned a default value, which I think is
almost always a bad idea, and indeed it was in this case. When
`merge_schema` is called on shard different than 0, it recursively calls
itself on shard 0. That recursive call forgot to pass the `reload` flag.

Fix this.
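
The bug pattern is easy to reproduce in any language with default arguments. The sketch below is a Python analogue with hypothetical function names, not the actual C++ `merge_schema` code.

```python
calls = []


def merge_schema_buggy(shard: int, reload: bool = False) -> None:
    if shard != 0:
        # Bug pattern: the default silently replaces the caller's value.
        return merge_schema_buggy(0)
    calls.append(("buggy", reload))


def merge_schema_fixed(shard: int, reload: bool) -> None:
    if shard != 0:
        # No default: forgetting the argument here would raise a TypeError.
        return merge_schema_fixed(0, reload)
    calls.append(("fixed", reload))


merge_schema_buggy(3, reload=True)
merge_schema_fixed(3, reload=True)
print(calls)
```

Dropping the default turns a silently wrong value into an error at the call site, which is the argument made above against defaulted flags.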
2023-11-23 14:06:40 +01:00
Kamil Braun
de3607810d system_keyspace: fix outdated comment 2023-11-23 14:06:27 +01:00
Piotr Dulikowski
c58ff554d8 raft: rpc: introduce destination_not_alive_error
Add a new destination_not_alive_error, thrown from two-way RPCs in case
when the RPC is not issued because the destination is not reported as
alive by the failure detector.

In snapshot transfer code, lower the verbosity of the message printed in
case it fails on the new error. This is done to prevent flakiness in the
CI - in case of slow runs, nodes might get spuriously marked as dead if
they are busy, and a message with the "error" verbosity can cause some
tests to fail.
2023-11-23 11:14:28 +01:00
Kamil Braun
03ecc8457c Merge 'raft topology: reject replace if the node being replaced is not dead' from Patryk Jędrzejczak
The replace operation is defined to succeed only if the node being
replaced is dead. We should reject this operation when the failure
detector considers the node being replaced alive.

Apart from adding this change, this PR adds a test case -
`test_replacing_alive_node_fails` - that verifies it. A few testing
framework adjustments were necessary to implement this test and
to avoid flakiness in other tests that use the replace operation after
the change. From now on, we need to ensure that all nodes see the
node being replaced as dead before starting the replace. Otherwise,
the check added in this PR could reject the replace.

Additionally, this PR changes the replace procedure in a way that
if the replacing node reuses the IP of the node being replaced, other
nodes can see it as alive only after the topology coordinator accepts
its join request. The replacing node may become alive before the
topology coordinator checks if the node being replaced is dead. If
that happens and the replacing node reuses the IP of the node being
replaced, the topology coordinator cannot know which of these two
nodes is alive and whether it should reject the join request.

Fixes #15863

Closes scylladb/scylladb#15926

* github.com:scylladb/scylladb:
  test: add test_replacing_alive_node_fails
  raft topology: reject replace if the node being replaced is not dead
  raft topology: add the gossiper ref to topology_coordinator
  test: test_cluster_features: stop gracefully before replace
  test: decrease failure_detector_timeout_in_ms in replace tests
  test: move test_replace to topology_custom
  test: server_add: wait until the node being replaced is dead
  test: server_add: add support for expected errors
  raft topology: join: delay advertising replacing node if it reuses IP
  raft topology: join: fix a condition in validate_joining_node
2023-11-23 10:31:59 +01:00
Kefu Chai
55103f4a6b hints: move formatter of db::hints::sync_point to test
the operator<<() based formatter is only used in its test, so
let's move it to where it is used.
we can always bring it back later if it is required in other places.
but better off implementing it as a fmt::formatter<> then.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16142
2023-11-23 11:22:31 +02:00
Kefu Chai
a9c1a435ec result_message: add formatter for result_message::rows
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for
`cql_transport::messages::result_message::rows`

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16143
2023-11-23 11:12:55 +02:00
Kefu Chai
6749d963ed config: define formatter for db::seed_provider_type
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.

in this change, we define a formatter for db::seed_provider_type.

please note, we are still formatting vector<db::seed_provider_type>
with the helper provided by seastar/core/sstring.hh, which uses
operator<<() to print the elements in the vector being printed.
so we have to keep the operator<< formatter before disabling
the generic formatter for vector<T>.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16138
2023-11-23 11:04:35 +02:00
Kefu Chai
ef76c4566b gossiper: do not use {:d} fmt specifier when formatting generation_number
generation_number's type is `generation_type`, which in turn is a
`utils::tagged_integer<struct generation_type_tag, int32_t>`,
which fmtlib formats using ostream_formatter backed by
operator<<. but `ostream_formatter` does not support format
specifiers, so {:d} does not apply to this type. when compiling with fmtlib
v10, it rejects the format specifier (the error is attached at the end
of the commit message).

so in this change, we just drop the format specifier. since fmtlib prints
`int32_t` as a decimal integer, dropping {:d} does not
change the behavior.

```
/home/kefu/dev/scylladb/gms/gossiper.cc:1798:35: error: call to consteval function 'fmt::basic_format_string<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int> &, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int> &>::basic_format_string<char[48], 0>' is not a constant expression
 1798 |                 auto err = format("Remote generation {:d} != local generation {:d}", remote_gen, local_gen);
      |                                   ^
/usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression
 2322 |       if (!in(arg_type, set)) throw_format_error("invalid format specifier");
      |                               ^
/usr/include/fmt/core.h:2395:14: note: in call to 'parse_presentation_type.operator()(1, 510)'
 2395 |       return parse_presentation_type(pres::dec, integral_set);
      |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2706:9: note: in call to 'parse_format_specs<char>(&"Remote generation {:d} != local generation {:d}"[20], &"Remote generation {:d} != local generation {:d}"[47], formatter<mapped_type, char_type>().formatter::specs_, checker(s).context_, 13)'
 2706 |         detail::parse_format_specs(ctx.begin(), ctx.end(), specs_, ctx, type);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2561:10: note: in call to 'formatter<mapped_type, char_type>().parse<fmt::detail::compile_parse_context<char>>(checker(s).context_)'
 2561 |   return formatter<mapped_type, char_type>().parse(ctx);
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2647:39: note: in call to 'parse_format_specs<utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, fmt::detail::compile_parse_context<char>>(checker(s).context_)'
 2647 |     return id >= 0 && id < num_args ? parse_funcs_[id](context_) : begin;
      |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2485:15: note: in call to 'handler.on_format_specs(0, &"Remote generation {:d} != local generation {:d}"[20], &"Remote generation {:d} != local generation {:d}"[47])'
 2485 |       begin = handler.on_format_specs(adapter.arg_id, begin + 1, end);
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2541:13: note: in call to 'parse_replacement_field<char, fmt::detail::format_string_checker<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>> &>(&"Remote generation {:d} != local generation {:d}"[19], &"Remote generation {:d} != local generation {:d}"[47], checker(s))'
 2541 |     begin = parse_replacement_field(p, end, handler);
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2769:7: note: in call to 'parse_format_string<true, char, fmt::detail::format_string_checker<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>>>({&"Remote generation {:d} != local generation {:d}"[0], 47}, checker(s))'
 2769 |       detail::parse_format_string<true>(str_, checker(s));
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/gms/gossiper.cc:1798:35: note: in call to 'basic_format_string<char[48], 0>("Remote generation {:d} != local generation {:d}")'
 1798 |                 auto err = format("Remote generation {:d} != local generation {:d}", remote_gen, local_gen);
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16126
2023-11-23 11:02:44 +02:00
Tzach Livyatan
225f0ff5aa Remove i3i from EC2 recommended EC2 instance types list
There is no reason to prefer i3i over i4i.

Closes scylladb/scylladb#16141
2023-11-23 10:09:34 +02:00
Kefu Chai
0e3f6186cb build: disable enum-constexpr-conversion
Clang-18 starts to complain when a constexpr value is cast to an
enum and the value is out of the range of the enum values. in this
case, boost intentionally casts the out-of-range values to the
enum type, so silence this warning for now.
since `lexical_cast.hpp` is included in multiple places in the
source tree, this warning is disabled globally.

the warning looks like:

```
In file included from /home/kefu/dev/scylladb/types/types.cc:9:
In file included from /usr/include/boost/lexical_cast.hpp:32:
In file included from /usr/include/boost/lexical_cast/try_lexical_convert.hpp:43:
In file included from /usr/include/boost/lexical_cast/detail/converter_numeric.hpp:36:
In file included from /usr/include/boost/numeric/conversion/cast.hpp:33:
In file included from /usr/include/boost/numeric/conversion/converter.hpp:13:
In file included from /usr/include/boost/numeric/conversion/conversion_traits.hpp:13:
In file included from /usr/include/boost/numeric/conversion/detail/conversion_traits.hpp:18:
In file included from /usr/include/boost/numeric/conversion/detail/int_float_mixture.hpp:19:
In file included from /usr/include/boost/mpl/integral_c.hpp:32:
/usr/include/boost/mpl/aux_/integral_wrapper.hpp:73:31: error: integer value -1 is outside the valid range of values [0, 3] for the enumeration type 'udt_builtin_mixture_enum' [-Wenum-constexpr-conversion]
   73 |     typedef AUX_WRAPPER_INST( BOOST_MPL_AUX_STATIC_CAST(AUX_WRAPPER_VALUE_TYPE, (value - 1)) ) prior;
      |                               ^
/usr/include/boost/mpl/aux_/static_cast.hpp:24:47: note: expanded from macro 'BOOST_MPL_AUX_STATIC_CAST'
   24 | #   define BOOST_MPL_AUX_STATIC_CAST(T, expr) static_cast<T>(expr)
      |                                               ^
In file included from /home/kefu/dev/scylladb/types/types.cc:9:
In file included from /usr/include/boost/lexical_cast.hpp:32:
In file included from /usr/include/boost/lexical_cast/try_lexical_convert.hpp:43:
In file included from /usr/include/boost/lexical_cast/detail/converter_numeric.hpp:36:
In file included from /usr/include/boost/numeric/conversion/cast.hpp:33:
In file included from /usr/include/boost/numeric/conversion/converter.hpp:13:
In file included from /usr/include/boost/numeric/conversion/conversion_traits.hpp:13:
In file included from /usr/include/boost/numeric/conversion/detail/conversion_traits.hpp:18:
In file included from /usr/include/boost/numeric/conversion/detail/int_float_mixture.hpp:19:
In file included from /usr/include/boost/mpl/integral_c.hpp:32:
/usr/include/boost/mpl/aux_/integral_wrapper.hpp:73:31: error: integer value -1 is outside the valid range of values [0, 3] for the enumeration type 'int_float_mixture_enum' [-Wenum-constexpr-conversion]
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16082
2023-11-23 10:08:56 +02:00
Kefu Chai
d28598763d build: s/-Wignore-qualifiers/-Wignored-qualifiers/
this was a typo introduced by 781b7de5, which intended to add
-Wignored-qualifiers to the compiling options, but ended up
adding -Wignore-qualifiers.

in this change, the typo is corrected.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16124
2023-11-23 09:47:35 +02:00
Pavel Emelyanov
2f7f4ebb74 raft_state_machine: Check system.topology presence before trying to find it
The write_mutations_to_database() decides if it needs to flush the
database by checking if the mutations came to system.topology table and
performing some more checks if they did. Overall this looks like

    auto topo_schema = db.find_schema(system.topology)
    if (target_schema != topo_schema)
        return false;

    // extra checks go here

However, the system.topology table exists only if the feature named
CONSISTENT_TOPOLOGY_CHANGES is enabled via commandline. If it's not, the
call to db.find_schema(system.topology) throws and the whole attempt to
write mutations throws too, stopping the raft state machine.

Since the intention is to check if the target schema is the topology
table, the presence of this table should be checked first.
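
A minimal sketch of the "check presence first" ordering. The `Database` class, `has_schema`, and `NoSuchTable` below are hypothetical stand-ins written in Python; they are not the real database API.

```python
class NoSuchTable(Exception):
    pass


class Database:
    """Hypothetical schema registry; not the real database class."""

    def __init__(self, schemas: dict[str, object]) -> None:
        self._schemas = schemas

    def has_schema(self, name: str) -> bool:
        return name in self._schemas

    def find_schema(self, name: str) -> object:
        try:
            return self._schemas[name]
        except KeyError:
            raise NoSuchTable(name) from None


def targets_topology_table(db: Database, target_schema: object) -> bool:
    # Check presence first, so a missing table means "no" instead of an exception.
    if not db.has_schema("system.topology"):
        return False
    return target_schema is db.find_schema("system.topology")


other = object()
print(targets_topology_table(Database({"ks.other": other}), other))
```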

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16089
2023-11-23 09:35:43 +02:00
Takuya ASADA
c9d77699e1 scylla_setup: stop listing virtual devices on the NIC prompt
Currently, the NIC prompt in scylla_setup shows virtual devices such as
VLAN devices and bridge devices, but perftune.py does not support them.
To avoid errors while running scylla_setup, we should stop listing
these devices in the NIC prompt.

closes #6757

Closes scylladb/scylladb#15958
2023-11-23 10:27:09 +03:00
Piotr Dulikowski
ab42932ba4 raft: rpc: drop RPCs if the destination is not alive
If the failure detector sees the destination as dead, there is no point in
sending the RPC, so drop it silently.

This only affects two-way RPCs and "request" one-way RPCs. The one-way
RPCs used as responses to other one-way RPCs are not affected.
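
A rough sketch of the one-way case, assuming a hypothetical `RpcSender` whose `is_alive` callback stands in for the failure detector. The node and message names are placeholders.

```python
class RpcSender:
    """Hypothetical sender; `is_alive` stands in for the failure detector."""

    def __init__(self, is_alive) -> None:
        self._is_alive = is_alive
        self.sent: list[tuple[str, str]] = []

    def send_one_way(self, dest: str, message: str) -> bool:
        if not self._is_alive(dest):
            return False  # dropped silently, nothing goes on the wire
        self.sent.append((dest, message))
        return True


alive = {"node-a"}
sender = RpcSender(lambda dest: dest in alive)
sender.send_one_way("node-a", "request")
sender.send_one_way("node-b", "request")
print(sender.sent)
```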
2023-11-23 00:34:22 +01:00
Piotr Dulikowski
3e32ee2d36 raft: pass raft::failure_detector to raft_rpc
In following commits, raft_rpc will drop outgoing messages if the
destination is not seen as alive by the failure detector.
2023-11-23 00:34:22 +01:00
Piotr Dulikowski
a8ee4d543a raft: transfer information about group0 liveness in direct_fd_ping
Add a new variant of the reply to the direct_fd_ping which specifies
whether the local group0 is alive or not, and start actively using it.

There is no need to introduce a cluster feature. Due to how our
serialization framework works, nodes which do not recognize the new
variant will treat it as the existing std::monostate. The std::monostate
means "the node and group0 is alive"; nodes before the changes in this
commit would send a std::monostate anyway, so this is completely
transparent for the old nodes.
2023-11-23 00:34:22 +01:00
Piotr Dulikowski
a1ebfcf006 raft: add server::is_alive
Add a method which reports whether given raft server is running.

In following commits, the information about whether the local raft
group 0 is running or not will be included in the response to the
failure detector ping, and the is_alive method will be used there.
2023-11-23 00:34:22 +01:00
Avi Kivity
00d82c0d54 Update tools/java submodule
* tools/java 8485bef333...1048034277 (1):
  > resolver: download sigar artifact only for Linux / AMD64
2023-11-22 18:02:04 +02:00
Kefu Chai
cfcd34ba64 cql3: test_assignment: define formatter for assignment_testable
add fmt formatter for `assignment_testable`.

this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `assignment_testable` without the help of `operator<<`.

since we are still printing the shared_ptr<assignment_testable> using
operator<<(.., const assignment_testable&), we cannot drop this operator
yet.

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16127
2023-11-22 17:44:07 +02:00
Tomasz Grabiec
b06a0078fb Merge 'Support for sending tablet info to the drivers' from Sylwia Szunejko
Drivers need tablet info in order to be tablet-aware. For the best performance, we want to send this info lazily, only when it is needed.

The info is sent when a driver directs a query for a specific tablet to the wrong node/shard, so that the driver can use that information for every subsequent query. In that case, we send the RESULT message with additional information about the tablet (replicas and token range) in custom_payload.

A mechanism for sending custom_payload is added.

Sending custom_payload was tested using a three-node cluster and cqlsh queries. I used RF=1 so that choosing the wrong node was testable.

I also manually tested it with the python-driver and confirmed that the tablet info can be deserialized properly.

Automatic tests added.

Closes scylladb/scylladb#15410

* github.com:scylladb/scylladb:
  docs: add documentation about sending tablet info to protocol extensions
  Add tests for sending tablet info
  cql3: send tablet if wrong node/shard is used during modification statement
  cql3: send tablet if wrong node/shard is used during select statement
  locator: add function to check locality
  locator: add function to check if host is local
  transport: add function to add tablet info to the result_message
  transport: add support for setting custom payload
2023-11-22 17:44:07 +02:00
Botond Dénes
0ae1335daa Revert "Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk"
This reverts commit 11cafd2fc8, reversing
changes made to 2bae14f743.

Reverting because this series causes frequent CI failures, and the
proposed quickfix causes other failures of its own.

Fixes: #16113
2023-11-22 17:44:07 +02:00
Kefu Chai
48340380dd scylla-sstable: print "validate" result in JSON
instead of printing the result of the "validate" subcommand in a
free-style plain text, let's print it using JSON. for two reasons:

1. it is simpler to consume the output with other tools and tests.
2. more consistent with other commands.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16105
2023-11-22 17:44:07 +02:00
Botond Dénes
8c5f5b7722 service/migration_manager: only reload schema when enabling disabled features
Instead of unconditionally reloading schema when enabling any schema
feature, only create a listener if the feature was disabled in the
first place, so that we don't trigger a schema reload for each
schema feature on node restarts. In this case, the node will start with
all these features enabled already.
This prevents unnecessary work on restarts.
Fixes: #16112

Closes scylladb/scylladb#16118
2023-11-22 17:44:07 +02:00
Kefu Chai
ca1828c718 scylla-sstable: print "validate-checksum" result in JSON
instead of printing the result of the "validate-checksums" subcommand
as a logging message, let's print it using JSON. for four reasons:

1. it is simpler to consume the output with other tools and tests.
2. more consistent with other commands.
3. the logging system is used for auditing the behavior and for debugging
   purposes, not for building a user-facing command line interface.
4. the behavior should match the corresponding document. and
   in docs/operating-scylla/admin-tools/scylla-sstable.rst, we claim
   that `validate-checksums` subcommand prints a dict of

   ```
   $ROOT := { "$sstable_path": Bool, ... }
   ```
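
For illustration, a document of that shape could be produced like this. The helper name and the sstable paths are placeholders; this is not the tool's actual implementation, which is written in C++.

```python
import json


def format_checksum_results(results: dict[str, bool]) -> str:
    """Render { "$sstable_path": Bool, ... } as a JSON document."""
    return json.dumps(results, indent=4, sort_keys=True)


# The paths below are placeholders, not real sstable files.
output = format_checksum_results({
    "/path/to/sstable-1-Data.db": True,
    "/path/to/sstable-2-Data.db": False,
})
print(output)
```

A test or another tool can then read the result back with `json.loads` instead of scraping log lines.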

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16106
2023-11-22 17:44:07 +02:00
Kefu Chai
43fd63e28c clocks-impl: format time_point using fmt
instead of relying on the operator<<() of an opaque type, use fmtlib
to print a timepoint for better support of new fmtlib which dropped
the default-generated formatter for types with operator<<().

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16116
2023-11-22 17:44:07 +02:00
Nadav Har'El
242a4b23c0 Merge 'tests: Skip unnecessary sleeps in cql_test_env teardown' from Tomasz Grabiec
This PR contains two patches which get rid of unnecessary sleeps on cql_test_env teardown greatly reducing run time of tests.

Reduces run time of `build/dev/test/boost/schema_change_test` from 90s to 6s.

Closes scylladb/scylladb#16111

* github.com:scylladb/scylladb:
  test: cql_test_env: Interrupt all components on cql_test_env teardown
  tests: cql_test_env: Skip gossip shutdown sleep
2023-11-22 17:44:07 +02:00
Anna Stuchlik
3751acce42 doc: fix rollback in the 5.2-to-5.4 upgrade guide
This commit fixes the rollback procedure in
the 5.2-to-5.4 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
  is fixed.
- The "Gracefully shutdown ScyllaDB" command
  is fixed.

In addition, there are the following updates
to be in sync with the tests:

- The "Backup the configuration file" step is
  extended to include a command to backup
  the packages.
- The Rollback procedure is extended to restore
  the backup packages.
- The Reinstallation section is fixed for RHEL.

Also, I've removed the optional step to enable
consistent schema management from the list of
steps - the appropriate section has already
been removed, but it remained in the procedure
description, which was misleading.

Refs https://github.com/scylladb/scylladb/issues/11907

This commit must be backported to branch-5.4

Closes scylladb/scylladb#16114
2023-11-22 17:44:07 +02:00
Takuya ASADA
b97df92d76 scylla_setup: stop aborting on old kernel warning when non-interactive mode
In non-interactive mode, the RHEL/CentOS 7 old kernel check causes
"Setup aborted", which is not what we want.
We should keep the warning but proceed with the setup, so the default value of the kernel
check should be True, since the default is automatically applied in
non-interactive mode.
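
A small sketch of why the default matters, using a hypothetical `confirm` helper rather than the real scylla_setup code.

```python
def confirm(prompt: str, default: bool, interactive: bool, ask=input) -> bool:
    """Return the user's answer, or the default when running non-interactively."""
    if not interactive:
        return default
    hint = "YES/no" if default else "yes/NO"
    answer = ask(f"{prompt} [{hint}] ").strip().lower()
    if not answer:
        return default
    return answer in ("y", "yes")


# With default=True, a non-interactive run warns and continues instead of aborting.
proceed = confirm("Old kernel detected, continue anyway?", default=True, interactive=False)
print(proceed)
```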

Fixes #16045

Closes scylladb/scylladb#16100
2023-11-22 17:44:07 +02:00
Botond Dénes
b1a76ebb93 Merge 'Sanitize storage service init/deinit sequences' from Pavel Emelyanov
Currently storage service starts too early and its initialization is split into several steps. This PR makes storage service start "late enough" and makes its initialization (minimally required before joining cluster) happen in one place.

refs: #2795
refs: #2737

Closes scylladb/scylladb#16103

* github.com:scylladb/scylladb:
  storage_service: Drop (un)init_messaging_service_part() pair
  storage_service: Init/Deinit RPC handlers in constructor/stop
  storage_service: Dont capture container() on RPC handler
  storage_service: Use storage_service::_sys_dist_ks in some places
  storage_service: Add explicit dependency on system dist. keyspace
  storage_service: Turn query processor pointer into reference
  storage_service: Add explicit query_processor dependency
  main: Start storage service later
2023-11-22 17:44:07 +02:00
sylwiaszunejko
ac51c417ea docs: add documentation about sending tablet info to protocol extensions 2023-11-22 09:23:43 +01:00
sylwiaszunejko
207d673ad6 Add tests for sending tablet info 2023-11-22 09:23:43 +01:00
sylwiaszunejko
cea4c40685 cql3: send tablet if wrong node/shard is used during modification statement 2023-11-22 09:23:43 +01:00
sylwiaszunejko
54f22927a3 cql3: send tablet if wrong node/shard is used during select statement 2023-11-22 09:23:43 +01:00
sylwiaszunejko
954d51389c locator: add function to check locality 2023-11-22 09:23:43 +01:00
Eliran Sinvani
bfa839ce92 commitlog: enforce commitlog size hard limit by default
Since the commitlog size hard limit is a failsafe mechanism,
we don't expect to ever hit it. If we do hit the limit, it means
that we have an exceptional condition in the system. Hence, the
impact of enforcing the commitlog hard limit is irrelevant.
Here we enforce the limit by default.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-11-22 08:48:28 +02:00
Eliran Sinvani
63d62a7db2 commitlog: set flush threshold to half of the limit size
Once we enable commitlog hard limit by default, we would like
to have some room in case flushing memtables takes some time
to catch up. This threshold is half the limit.
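
As a worked example of the arithmetic, assuming sizes in bytes; the function name and the 8 GiB limit are chosen only for illustration.

```python
def flush_threshold(hard_limit_bytes: int) -> int:
    """Start flushing memtables once the commitlog reaches half the hard limit."""
    return hard_limit_bytes // 2


limit = 8 * 1024**3  # an 8 GiB hard limit, chosen only for illustration
threshold = flush_threshold(limit)
print(threshold)
```

The remaining half of the limit is the headroom available while memtable flushing catches up.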

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-11-22 08:48:28 +02:00
Eliran Sinvani
d2a8651bce commitlog: unfold flush threshold assignment
This commit is only a cosmetic change. It is meant to
make the flush threshold assignment more readable and
comprehensible so future changes are easier to review.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-11-22 08:48:28 +02:00
sylwiaszunejko
a0c8531875 locator: add function to check if host is local 2023-11-21 15:15:20 +01:00
sylwiaszunejko
93420353f4 transport: add function to add tablet info to the result_message 2023-11-21 15:15:20 +01:00
sylwiaszunejko
75b3dbf7ea transport: add support for setting custom payload
A custom payload can now be added to response_message.
If it is set, it will be sent to client and the custom_payload
flag will be set.

write_string_bytes_map method is added to response class
and a missing custom_payload flag is added to
cql_frame_flags.
2023-11-21 15:09:36 +01:00
Pavel Emelyanov
74329e5aee test: Add object_store test to validate config reloading works
The test case is

- start scylla with broken object storage endpoint config
- create and populate s3-backed keyspace
- try flushing it (API call would hang, so do it in the background)
- wait for a few seconds, then fix the config
- wait for the flush to finish and stop scylla
- start scylla again and check that the keyspace is properly populated

Nice side effect of this test is that once flush fails (due to broken
config) it tries to remove the not-yet-sealed sstables and (!) fails
again, for the same reason. So during the restart there happen to be
several sstables in "creating" state with no stored objects, so this
additionally tests one more g.c. corner case

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
26f8202651 test: Add config update facility to test cluster
The Cluster wrapper used by object_store test already has the ability to
access cluster via CQL and via API. Add the sugar to make the cluster
re-read its scylla.yaml and other configs

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
4a531e4129 test: Make S3_Server export config file as pathlib.Path
The pylib minio server does that already. A test case added by the next
patch would need to have both cases as path, not as string

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
210b01a5ce config: Make object storage config updateable_value_source
Now it's a plain updateable_value, but without the ..._source object the
updateable_value is just a no-op value holder. In order for the
observers to operate there must be the value source, updating it would
update the attached updateable values _and_ notify the observers.

In order for the config to be the u.v._source, config entries should be
comparable to each other, thus the <=> operator for it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
9eb96a03f0 memtable: Extend list of checking codes
When flushing an sstable there can be errors that are not fatal and
shouldn't cause the whole scylla to die. Currently only ENOSPC and
EDQUOT are considered as such, but there's one more possibility --
access denied errors.

Those can happen, for example, if datadir is chmod/chown-ed by mistake
or intentionally while scylla is running (doing it pre-start time won't
trigger the issue as distributed loader checks permissions of datadir on
boot). Another option to step on "access denied" error is to flush
memtable on S3 storage with broken configuration.

Anyway, seeing the access denied error is also a good reason not to
crash, but print a warning in logs and retry in a hope that the node
administrator fixed things.
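
As a rough illustration of the classification described above, here is a minimal Python sketch. The names `NON_FATAL_FLUSH_ERRNOS` and `is_non_fatal_flush_error` are hypothetical and are not taken from the Scylla sources; the exact set of codes the patch checks may differ.

```python
import errno

# Hypothetical set for illustration: the two codes named above plus
# "access denied". The real patch may check a different set.
NON_FATAL_FLUSH_ERRNOS = {errno.ENOSPC, errno.EDQUOT, errno.EACCES}


def is_non_fatal_flush_error(exc: BaseException) -> bool:
    """Return True if a flush failure should be logged and retried."""
    return isinstance(exc, OSError) and exc.errno in NON_FATAL_FLUSH_ERRNOS
```

A caller would log a warning and retry when this returns True, and treat any other failure as fatal.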

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
a34dae8c37 sstables/storage/s3: Fix missing TOC status check
When TOC file is missing while garbage collecting the S3 server would
resolve with storage_io_error(ENOENT) nowadays

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
855626f7de s3/client: Map http exceptions into storage_io_error
When an http request resolves with an exception it makes sense to translate
the network exception into a storage exception, so that upper layers see
it as some sort of IO error rather than, suddenly, an http one.

The translation is, for now, pretty simple:

- 404 and 3xx -> ENOENT
- 403(forbidden) and 401(unauthorized) -> EACCES
- anything else -> EIO
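
The mapping above can be sketched in a few lines of Python. The function name `http_status_to_errno` is made up for this illustration, and the real client is C++; only the three rules come from the list.

```python
import errno


def http_status_to_errno(status: int) -> int:
    """Map an HTTP status code to an errno value per the three rules above."""
    if status == 404 or 300 <= status < 400:
        return errno.ENOENT
    if status in (401, 403):
        return errno.EACCES  # access-denied code, as spelled in errno.h
    return errno.EIO
```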

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Patryk Jędrzejczak
566176bcd1 test: add test_replacing_alive_node_fails
We add a test for the Raft-based topology's new feature - rejecting
the replace operation if the node being replaced is considered
alive by the failure detector.

This test is not so fast, and it does not test any critical paths
so we run it only in dev mode.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
bf7a67224c raft topology: reject replace if the node being replaced is not dead
The replace operation is defined to succeed only if the node being
replaced is dead. We should reject this operation when the failure
detector considers the node being replaced alive.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
94ffdb4792 raft topology: add the gossiper ref to topology_coordinator
It is used in the following commit.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
8605cdd9cd test: test_cluster_features: stop gracefully before replace
In one of the previous commits, we have made
ManagerClient.server_add wait until all running nodes see the node
being replaced as dead. Unfortunately, the waiting time is around
20 s if we stop the node being replaced ungracefully. We change the
stop procedure to graceful to not slow down the test.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
206a446a02 test: decrease failure_detector_timeout_in_ms in replace tests
In one of the previous commits, we have made
ManagerClient.server_add wait until all running nodes see the node
being replaced as dead. Unfortunately, the waiting time can be
around 20 s if we stop the node being replaced ungracefully. 20 s
is the default value of the failure detector timeout.

We don't want to slow down the replace operations this much for no
good reason. We could use server_stop_gracefully instead of
server_stop everywhere, but we should have at least a few replace
tests with server_stop. For now, test_replace and
test_raft_ignore_nodes will be these tests. To keep them reasonably
fast, we decrease the failure_detector_timeout_in_ms value on all
initial servers.

We also skip test_replace in debug mode to avoid flakiness due to
low failure_detector_timeout_in_ms (test_raft_ignore_nodes is
already skipped).
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
7062ff145e test: move test_replace to topology_custom
In the following commit, we make all servers in test_replace use
failure-detector-timeout-in-ms = 2000. Therefore, we need
test_replace to be in a suite with initial_size equal to 0.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
9775b1c12d test: server_add: wait until the node being replaced is dead
In the following commits, we make the topology coordinator reject
join requests if the node being replaced is considered alive by the
gossiper. Before making this change, we need to adapt the testing
framework so that we don't have flaky replace operations that fail
because the node being replaced hasn't been marked as dead yet. We
achieve this by waiting until all other running nodes see the node
being replaced as dead in all replace operations.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
18ed89f760 test: server_add: add support for expected errors
After this change, if we try to add a server and it fails with an
expected error, the add_server function will not throw. Also, the
server will be correctly installed and stopped.

Two issues are motivating this feature.

The first one is that if we want to add a server while expecting
an error, we have to do it in two steps:
- call server_add with the start parameter set to False,
- call server_start with the expected_error parameter.
It is quite inconvenient.

The second one is that we want to be able to test the replace
operation when it is considered incorrect, for example when we try
to replace an alive node. To do this, we would have to remove
some assertions from ScyllaCluster.add_server. However, we should
not remove them because they give us clear information when we
write an incorrect test. After adding the expected_error parameter,
we can ignore these assertions only when we expect an error. In
this way, we enable testing failing replace operations without
sacrificing the testing framework's protection.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
ee45a1c430 raft topology: join: delay advertising replacing node if it reuses IP
After this change, other nodes can see the replacing node as alive
only after the topology coordinator accepts its join request.

In the following commits, we make the topology coordinator reject
join requests if the node being replaced is considered alive by the
gossiper. However, the replacing node may become alive before the
topology coordinator does the validation. If the replacing node
reuses the IP of the node being replaced, the topology coordinator
cannot know which of these two nodes is alive and whether it should
reject the join request.

The gossiper-based topology also delays the replacing node from
advertising itself if it reuses the IP. To achieve the same effect
in raft-based topology, we only need to move the definition of
replacing_a_node_with_same_ip. However, there is code that puts
bootstrap tokens of the node being replaced into the gossiper
state, and it depends on replacing_a_node_with_same_ip and
replacing_a_node_with_diff_ip being always false in the raft-based
topology mode. We prevent it from breaking by changing the
condition.
2023-11-21 12:39:16 +01:00
Patryk Jędrzejczak
c0e4b8e9c0 raft topology: join: fix a condition in validate_joining_node
It was incorrect. node.rs->state evaluated to node_state::none
for both join and replace.
2023-11-21 12:39:13 +01:00
Tomasz Grabiec
93ee7b7df9 test: cql_test_env: Interrupt all components on cql_test_env teardown
This should interrupt all sleeps in component teardown.

Before this patch, there was a 1s sleep on gossiper shutdown, which I
don't know where it comes from. After the patch there is no such
sleep.
2023-11-21 12:22:32 +01:00
Tomasz Grabiec
7f3a74efab tests: cql_test_env: Skip gossip shutdown sleep
Removes unnecessary 2s sleep on each cql test env teardown.
2023-11-21 12:22:24 +01:00
Pavel Emelyanov
0e9428ab4a exceptions: Extend storage_io_error construction options
To make it possible to construct it with plain errno value and a string

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 13:37:52 +03:00
Calle Wilund
33fba28265 commitlog_test: Add test for replaying large-ish mutation
(i.e. cross several normal-sized buffers).
2023-11-21 08:50:57 +00:00
Calle Wilund
0d41769daa commitlog_test: Add additional test for segment truncation
Emulate replay of a non-sealed segment, verifying we don't get
data beyond termination point, as well as the correct exception.
2023-11-21 08:50:57 +00:00
Calle Wilund
57a4645c81 docs: Add docs on commitlog format 3 2023-11-21 08:50:57 +00:00
Calle Wilund
6b66daabfc commitlog: Remove entry CRC from file format
Since CRC is already handled by disk blocks, we can remove some of the
entry CRC:ing, both simplifying code and making at least that part of
both write and read faster.
2023-11-21 08:50:57 +00:00
Calle Wilund
e29bf6f9e8 commitlog: Implement new format using CRC:ed sectors
Breaks the file into individually tagged + crc:ed pages.
Each page (sized as disk write alignment) gets a trailing
12-byte metadata, including CRC of the first page-12 bytes,
and the ID of the segment being written.

When reading, each page read is CRC:ed and checked to be part
of the expected segment by comparing ID:s. If crc is broken,
we have broken data. If crc is ok, but ID does not match, we
have a prematurely terminated segment (truncated), which, depending
on whether we use batch mode or not, implies data loss.
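
A minimal Python sketch of the page layout and read-side check described above. The 4096-byte page size, the trailer layout (8-byte segment ID followed by a 4-byte CRC) and the use of `zlib.crc32` are assumptions made for illustration; the message only states that the trailer is 12 bytes and holds the CRC of the first page-12 bytes plus the segment ID.

```python
import struct
import zlib

PAGE_SIZE = 4096                 # assumed disk write alignment
TRAILER = struct.Struct("<QI")   # assumed layout: 8-byte segment ID + 4-byte CRC
PAYLOAD_SIZE = PAGE_SIZE - TRAILER.size  # the "first page-12 bytes"


def seal_page(payload: bytes, segment_id: int) -> bytes:
    """Pad payload to page-12 bytes and append the (segment ID, CRC) trailer."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAYLOAD_SIZE, b"\0")
    return body + TRAILER.pack(segment_id, zlib.crc32(body))


def check_page(page: bytes, expected_segment_id: int) -> str:
    """Classify a page as 'ok', 'corrupt' (bad CRC) or 'truncated' (other ID)."""
    if len(page) != PAGE_SIZE:
        raise ValueError("not a full page")
    body, trailer = page[:PAYLOAD_SIZE], page[PAYLOAD_SIZE:]
    segment_id, crc = TRAILER.unpack(trailer)
    if zlib.crc32(body) != crc:
        return "corrupt"      # broken data
    if segment_id != expected_segment_id:
        return "truncated"    # valid page, but left over from another segment
    return "ok"
```

The CRC is checked before the ID, so a page that fails both checks is reported as corrupt, matching the order in the description.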
2023-11-21 08:50:54 +00:00
Calle Wilund
18e79d730e commitlog: Add iterator adaptor for doing buffer splitting into sub-page ranges
With somewhat less overhead than creating 100+ temporary_buffer proxies
2023-11-21 08:42:33 +00:00
Calle Wilund
560364d278 fragmented_temporary_buffer: Add const iterator access to underlying buffers
Breaks abstraction a bit, but some (me) might need something like it...
2023-11-21 08:42:33 +00:00
Calle Wilund
862f4f2ed3 commitlog_replayer: differentiate between truncated file and corrupt entries
Refs #11845

When replaying, differentiate between the two cases for failure we have:
 - A broken actual entry - i.e. entry header/data does not hold up to
   crc scrutiny
 - Truncated file - i.e. a chunk header is broken or unreadable. This can
   be due to either "corruption" (i.e. borked write, post-corruption, hw
   whatever), or simply an unterminated segment.

The difference is that the former is recoverable, the latter is not.
We now signal and report the two separately. The end result for a user
is not much different, in either case they imply data loss and the
need for repair. But there is some value in differentiating which
of the two we encountered.

Modifies and adds test cases.
2023-11-21 08:42:33 +00:00
Botond Dénes
65e42e4166 Merge 'mutation_query: properly send range tombstones in reverse queries' from Michał Chojnowski
reconcilable_result_builder passes range tombstone changes to _rt_assembler
using table schema, not query schema.
This means that a tombstone with bounds (a; b), where a < b in query schema
but a > b in table schema, will not be emitted from mutation_query.

This is a very serious bug, because it means that such tombstones in reverse
queries are not reconciled with data from other replicas.
If *any* queried replica has a row, but not the range tombstone which deleted
the row, the reconciled result will contain the deleted row.

In particular, range deletes performed while a replica is down will not
later be visible to reverse queries which select this replica, regardless of the
consistency level.

As far as I can see, this doesn't result in any persistent data loss.
Only in that some data might appear resurrected to reverse queries,
until the relevant range tombstone is fully repaired.

This series fixes the bug and adds a minimal reproducer test.

Fixes #10598

Closes scylladb/scylladb#16003

* github.com:scylladb/scylladb:
  mutation_query_test: test that range tombstones are sent in reverse queries
  mutation_query: properly send range tombstones in reverse queries
2023-11-21 09:19:14 +02:00
Kefu Chai
691f7f6edb util: do not use variable length array
vla (variable length array) is an extension in GCC and Clang, and
it is not part of the C++ standard.

so let's avoid using it if possible, for better standard compliance.
it's also more consistent with other places where we calculate the size
of an array of T in the same source file.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16084
2023-11-20 23:02:41 +02:00
Nadav Har'El
0fd10690d4 Merge 'When creating S3-backed keyspace, check the endpoint instantly' from Pavel Emelyanov
Currently CREATE KEYSPACE ... WITH STORAGE = { 'type' = 'S3' ... } will create keyspace even if the backend configuration is "invalid" in the sense that the requested endpoint is not known to scylla via object_storage.yaml config file. The first time after that when this misconfiguration will reveal itself is when flushing a memtable (see #15635), but it's good to know the endpoint is not configured earlier than that.

fixes: #15074

Closes scylladb/scylladb#16038

* github.com:scylladb/scylladb:
  test: Add validation of misconfigured storage creation
  sstables: Throw early if endpoint for keyspace is not configured
  replica: Move storage options validation to sstables manager
  test/cql-pytest/test_keyspaces: Move DESCRIBE case to object store
  sstables: Add has_endpoint_client() helper to manager
2023-11-20 21:12:48 +02:00
Kefu Chai
9a3c7cd768 build: cmake: drop Seastar_OptimizationLevel_*
in this change,

* all `Seastar_OptimizationLevel_*` are dropped.
* mode.Sanitize.cmake:
    s/CMAKE_CXX_FLAGS_COVERAGE/CMAKE_CXX_FLAGS_SANITIZE/
* mode.Dev.cmake:
    s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/

Seastar_OptimizationLevel_* variables have nothing to do with
Seastar, and they introduce unnecessary indirection. the function
of `update_cxx_flags()` already requires an option name for this
parameter, so there is no need to have a name for it.

the cached entry of `Seastar_OptimizationLevel_DEBUG` is also
dropped, if we really need to have knobs which can be configured
by user, we should define them in a more formal way. at this
moment, this is not necessary. so drop it along with this
variable.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16059
2023-11-20 19:26:54 +02:00
Botond Dénes
6e9850067b Merge 'Make test-only write_memtable_to_sstable() overloads shorter' from Pavel Emelyanov
There are three of them, one is used by core, another by tests and the third one passes arguments between those two. And the ..._for_tests() helper in test utils. This PR leaves only one for tests out of three.

Closes scylladb/scylladb#16068

* github.com:scylladb/scylladb:
  tests: Shorten the write_memtable_to_sstable_for_test()
  replica: Squash two write_memtable_to_sstable()
  replica: Coroutinize one of write_memtable_to_sstable() overloads
2023-11-20 16:05:06 +02:00
Pavel Emelyanov
9b16c298e9 test: Add validation of misconfigured storage creation
An attempt to create a non-local keyspace with an unknown endpoint
should raise a configuration exception.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 15:25:58 +03:00
Pavel Emelyanov
2bf1e2a294 sstables: Throw early if endpoint for keyspace is not configured
When a keyspace is created it initializes the storage for it and
initialization of S3 storage is a good place to check if the endpoint
for the storage is configured at all.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 15:25:58 +03:00
Pavel Emelyanov
f2a99ad30a replica: Move storage options validation to sstables manager
Currently the cql statement .validate() callback is responsible for
checking if the non-local storage options are allowed with the
respective feature. Next patch will need to extend this check to also
validate the details of the provided storage options, but doing it at
cql level doesn't seem correct -- it's "too far" from query processor
down to sstables manager.

Good news is that there's a lower-level validation of the new keyspace,
namely the database::validate_new_keyspace() call. Move the storage
options validation into sstables manager, while at it, reimplement it
as a visitor to facilitate further extensions and plug the new
validation to the aforementioned database::validate_new_keyspace().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 15:24:59 +03:00
Botond Dénes
f53961248d gms,service: add a feature to protect the usage of allow_mutation_read_page_without_live_row
allow_mutation_read_page_without_live_row is a new option in the
partition_slice::option option set. In a mixed cluster, old nodes
possibly don't know this new option, so its usage must be protected by a
cluster feature. This patch does just that.

Fixes: #15795

Closes scylladb/scylladb#15890
2023-11-20 13:03:55 +01:00
Botond Dénes
935065fd8d Update tools/java submodule
* tools/java b776096d...8485bef3 (2):
  > dist: Require jre-11-headless in from rpm
  > dist: remove duplicated java-headless from "Requires"
2023-11-20 13:55:55 +02:00
Pavel Emelyanov
b31b51ae90 test/cql-pytest/test_keyspaces: Move DESCRIBE case to object store
We're going to ban creation of a keyspace with S3 type in case the
requested endpoint is not configured. The problem is that this test case
of cql-pytest needs such keyspace to be created and in order to provide
the object storage configuration we'd need to touch the generic scylla
cluster management which is overkill for a generic cql-pytest case.

Simpler solution is to make object_store test suite perform all the
S3-related checks, including the way DESCRIBE for S3-backed ks works.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 14:31:08 +03:00
Pavel Emelyanov
2c31cd7817 sstables: Add has_endpoint_client() helper to manager
It's the get_endpoint_client() peer that only checks the client
presence. To be used by next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 14:31:08 +03:00
Pavel Emelyanov
8ae751a3ff tests: Shorten the write_memtable_to_sstable_for_test()
The wrapper just calls the test-only core write_memtable_to_sstable()
overload, tests can do it on their own.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 14:27:57 +03:00
Pavel Emelyanov
1d7d2dff50 replica: Squash two write_memtable_to_sstable()
There are three of them and one acts purely as arguments passer between
other two.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 14:27:57 +03:00
Pavel Emelyanov
e9826858a9 replica: Coroutinize one of write_memtable_to_sstable() overloads
Simpler to read and patch further this way

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 14:27:57 +03:00
Pavel Emelyanov
f4626f6b8e storage_service: Drop (un)init_messaging_service_part() pair
It's no longer needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:59:08 +03:00
Pavel Emelyanov
c42c13e658 storage_service: Init/Deinit RPC handlers in constructor/stop
All the services that need to register RPC handlers do it in service
constructor or .start() method. Unregistration happens in .stop().
Storage service explicitly (de)initializes its RPC handlers in dedicated
calls, but there's no point in that. The handlers' accessibility is
determined by messaging service start_listen/shutdown, handlers
themselves can be registered any time before it and unregistered any
time after it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:57:07 +03:00
Pavel Emelyanov
40cb9dd66f storage_service: Don't capture container() on RPC handler
The handlers are about to be initialized from inside storage_service
constructor. At that time container() is not yet available and its
it's invalid to capture it on handlers' lambda. Fortunately, there's only one
handler that does it, other handlers capture 'this' and call container()
explicitly. This patch fixes the remaining one to do the same.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:55:56 +03:00
Pavel Emelyanov
cc76f03f63 storage_service: Use storage_service::_sys_dist_ks in some places
The main goal here is to drop sys.dist.ks argument from the
init_messaging_service call to make future patching simpler. While doing
it it turned out that the argument was needed to be passed all the way
down to the mark_existing_views_as_built(), so this patch also drops
this argument from this whole call-trace.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:53:55 +03:00
Pavel Emelyanov
4df5af931a storage_service: Add explicit dependency on system dist. keyspace
This effectively reverts bc051387c5 (storage_service: Remove sys_dist_ks
from storage_service dependencies) since now storage service needs the
sys. dist. ks not only at cluster join time. Next patch will make more use
of it as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:52:42 +03:00
Pavel Emelyanov
a7f23930cb storage_service: Turn query processor pointer into reference
It's non-nullptr all the time after previous patch and can be a
reference instead

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:52:04 +03:00
Pavel Emelyanov
e59544674a storage_service: Add explicit query_processor dependency
It's now set via a dedicated call that happens after query processor is
started. Now query processor is started before storage service and the
latter can get the q.p. local reference via constructor.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:51:09 +03:00
Pavel Emelyanov
6ee8e7a031 main: Start storage service later
The storage service is top-level service which depends on many other
services. Recently (see d42685d0cb storage_service: Load tablet
metadata on boot and from group0 changes) it also got implicit
dependency on query processor, but it still starts too early for
explicit reference on q.p.

This patch moves storage service start to later times. This is possible
because storage service is not explicitly needed by any other component
start/init in between its old and new start places. Also, cql_test_env
starts storage service "that late" too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-20 13:48:30 +03:00
Nadav Har'El
5752dc875b Merge 'Materialize_views: don't construct global_schema_ptr from views schemas that lacks base information' from Eliran Sinvani
This miniset addresses two potential conversions to `global_schema_ptr` of incomplete materialized views schemas.
One of them was completely unnecessary and is also a "chicken and egg" problem, where in the sync schema procedure itself a view schema was converted to `global_schema_ptr` solely for the purposes of logging. This can create a "hiccup" in the materialized views updates if they are coming from a node with a different mv schema.
The reason why a synced schema can sometimes have no base info is the deactivation and reactivation of the schema inside the `schema_registry`, which doesn't restore the base information due to lack of context.
When a schema is synced the problem becomes easy since we can just use the latest base information from the database.

Fixes #14011

Closes scylladb/scylladb#14861

* github.com:scylladb/scylladb:
  migration manager: fix incomplete mv schemas returned from get_schema_for_write
  migration_manager: do not globalize potentially incomplete schema
2023-11-20 11:54:01 +02:00
Pavel Emelyanov
3471f30b58 view_update_generator: Unplug from database later
Patch 967ebacaa4 (view_update_generator: Move abort kicking to
do_abort()) moved unplugging v.u.g from database from .stop() to
.do_abort(). The latter call happens very early on stop -- once scylla
receives SIGINT. However, database may still need v.u.g. plugged to
flush views.

This patch moves unplug to later, namely to .stop() method of v.u.g.
which happens after database is drained and should no longer continue
view updates.

fixes: #16001

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16091
2023-11-20 11:47:55 +02:00
Botond Dénes
fd11eeeaa3 Merge 'dist/redhat: drop unnecessary variables and tags' from Kefu Chai
this is a cleanup in `scylla.spec`.

Closes scylladb/scylladb#16097

* github.com:scylladb/scylladb:
  dist/redhat: group sub-package preambles together
  dist/redhat: drop unused `defines` variable
  dist/redhat: remove tags for subpackage which are same as main preamble
2023-11-20 11:46:56 +02:00
Asias He
c605220bb3 repair: Introduce small table optimization
*) Problem:

We have seen in the field it takes longer than expected to repair system tables
like system_auth which has a tiny amount of data but is replicated to all nodes
in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The
main reason for the slowness is that even if the amount of data is small,
repair has to walk through all the token ranges, that is num_tokens *
number_of_nodes_in_the_cluster. The overhead of the repair protocol for each
token range dominates due to the small amount of data per token range. Another
reason is the high network latency between DCs makes the RPC calls used to
repair consume more time.

*) Solution:

To solve this problem, a small table optimization for repair is introduced in
this patch. A new repair option is added to turn on this optimization.

- The user does not need to specify token ranges to repair. It will repair all token
ranges automatically.

- Users only need to send the repair REST API request to one of the nodes in the
cluster. It can be any of the nodes in the cluster.

- It does not require the RF to be configured to replicate to all nodes in the
cluster. This means it can work with any tables as long as the amount of data
is low, e.g., less than 100MiB per node.

*) Performance:

1)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}

Before:
```
repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1537, round_nr=4612,
round_nr_fast_path_already_synced=4611,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds,
tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2,
0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848},
{127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0},
{127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s,
tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={}
```

After:
```
repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds,
tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178},
{127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1},
{127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286},
{127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286},
{127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026},
{127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={},
rx_row_nr_peer={}
```

The time to finish repair difference = 1.5648403 seconds / 0.001459798 seconds = 1072X

2)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}

Same test as above except 5ms delay is added to simulate multiple dc
network latency:

The time to repair is reduced from 333s to 0.2s.

333.26758 s / 0.22625381s = 1472.98

3)

3 DCs, each DC has 3 nodes, 9 nodes in the cluster. RF = {dc1: 3, dc2: 3, dc3: 3},
10 ms network latency

Before:

```
repair - repair[86124a4a-fd26-42ea-a078-437ca9e372df]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=2305, round_nr=6916,
round_nr_fast_path_already_synced=6915,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=276630, tx_hashes_nr=0, rx_hashes_nr=8, duration=986.34015
seconds, tx_row_nr=7, rx_row_nr=0, tx_row_bytes=1246, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0},
{127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_nr={{127.0.57.1, 1},
{127.0.57.2, 1}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}},
row_from_disk_bytes_per_sec={{127.0.57.1, 1.72105e-07}, {127.0.57.2,
1.72105e-07}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 0.00101385},
{127.0.57.2, 0.00101385}, {127.0.57.3, 0}, {127.0.57.4, 0},
{127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0},
{127.0.57.9, 0}} Rows/s, tx_row_nr_peer={{127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}}, rx_row_nr_peer={}
```

After:

```
repair - repair[07ebd571-63cb-4ef6-9465-6e5f1e98f04f]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=1, round_nr=4,
round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=128, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.6052915
seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
178}, {127.0.57.4, 178}, {127.0.57.5, 178}, {127.0.57.6, 178},
{127.0.57.7, 178}, {127.0.57.8, 178}, {127.0.57.9, 178}},
row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}},
row_from_disk_bytes_per_sec={{127.0.57.1, 0.00037793}, {127.0.57.2,
0.00037793}, {127.0.57.3, 0.00037793}, {127.0.57.4, 0.00037793},
{127.0.57.5, 0.00037793}, {127.0.57.6, 0.00037793}, {127.0.57.7,
0.00037793}, {127.0.57.8, 0.00037793}, {127.0.57.9, 0.00037793}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 2.22634},
{127.0.57.2, 2.22634}, {127.0.57.3, 2.22634}, {127.0.57.4,
2.22634}, {127.0.57.5, 2.22634}, {127.0.57.6, 2.22634},
{127.0.57.7, 2.22634}, {127.0.57.8, 2.22634}, {127.0.57.9,
2.22634}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
```

The time to repair is reduced from 986s (16 minutes) to 1.6s
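
As a quick cross-check of the three measurements, the speedup is simply the ratio of the duration before the change to the duration after it. A minimal Python sketch; the `speedup` helper and the case labels are illustrative only, and the durations are copied from the figures and logs above:

```python
def speedup(before_s: float, after_s: float) -> float:
    """Return how many times shorter the 'after' duration is."""
    return before_s / after_s

# (before, after) durations in seconds, as reported in this message.
cases = {
    "case 1": (1.5648403, 0.001459798),
    "case 2 (5 ms delay)": (333.26758, 0.22625381),
    "case 3 (10 ms latency)": (986.34015, 1.6052915),
}

for name, (before_s, after_s) in cases.items():
    print(f"{name}: {speedup(before_s, after_s):.2f}X")
```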

*) Summary

So, a more than 1000X difference is observed for this common usage of
the system table repair procedure.

Fixes #16011
Refs  #15159
2023-11-20 15:11:16 +08:00
Kefu Chai
71f352896d dist/redhat: group sub-package preambles together
group sections like `%build` and `%install` together, to improve
the readability of the spec recipe.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-20 12:19:33 +08:00
Kefu Chai
3f108629b9 dist/redhat: drop unused defines variable
this variable was introduced in 6d7d0231. back then, we were still
building the binaries in .spec, but we've switched to the relocatable
package now, so there is no need to keep these compilation-related
flags anymore.

in this change, the `defines` variable is dropped.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-20 12:19:33 +08:00
Kefu Chai
d69b4838ea dist/redhat: remove tags for subpackage which are same as main preamble
this is a cleanup.

if a subpackage is licensed under a different license from the one
specified in the main preamble, we need to use a distinct License
tag on a per-subpackage basis. but if it is licensed with the
identical license, it is not necessary. since all three
subpackages of "*-{server, conf, kernel-conf}" are licensed under
AGPLv3, there is no need to repeat the "License:" tag in their
own preamble section.

the same applies to the "URL" tag.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-20 12:19:33 +08:00
Eliran Sinvani
63631257db migration manager: fix incomplete mv schemas returned from
get_schema_for_write

Sometimes a view registry can get deactivated inside the schema
registry, this happens due to deactivating and reactivating the registry
entry which doesn't rebuild the base table information in the view.
This error is later caught when trying to convert the schema into a
`global_schema_ptr`, however, the real bug here is that not all schemas
returned from `get_schema_for_write` are suitable for write because the
mv schemas can be incomplete.
This commit changes the aforementioned function in order to fix the bug.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-11-20 06:07:20 +02:00
Piotr Grabowski
321459ec51 install-dependencies.sh: update node_exporter to 1.7.0
Update node_exporter to 1.7.0.

The previous version (1.6.1) was flagged by security scanners (such as
Trivy) with HIGH-severity CVE-2023-39325. 1.7.0 release fixed that
problem.

[Botond: regenerate frozen toolchain]

Fixes #16085

Closes scylladb/scylladb#16086

Closes scylladb/scylladb#16090
2023-11-19 18:15:44 +02:00
Calle Wilund
6ffb482bf3 Commitlog replayer: Range-check skip call
Fixes #15269

If segment being replayed is corrupted/truncated we can attempt skipping
completely bogus byte amounts, which can cause an assert (i.e. crash) in
file_data_source_impl. This is not a crash-level error, so ensure we
range check the distance in the reader.

v2: Add to corrupt_size if trying to skip more than available. The amount added is "wrong", but at least will
    ensure we log the fact that things are broken
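
The range check described above can be sketched roughly as follows. This is not the replayer's code: `checked_skip` and its parameters are hypothetical, and counting the excess as the corrupt amount is an assumption made for the sketch.

```python
from typing import Tuple

def checked_skip(pos: int, requested: int, file_size: int) -> Tuple[int, int]:
    """Return (new_pos, corrupt_bytes) for a skip of `requested` bytes.

    Hypothetical helper: rather than letting an out-of-range skip reach the
    data source and trip an assert, clamp it to the end of the segment and
    report the excess so that the corruption is logged.
    """
    available = max(file_size - pos, 0)
    if requested <= available:
        return pos + requested, 0
    # Trying to skip more than is available: stop at the end of the data
    # and account for the part that could not be skipped (assumed metric).
    return pos + available, requested - available

print(checked_skip(pos=10, requested=5, file_size=100))
print(checked_skip(pos=90, requested=50, file_size=100))
```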

Closes scylladb/scylladb#15270
2023-11-19 17:44:55 +02:00
Gleb Natapov
6edbf4b663 storage_service: topology coordinator: put fence version into the raft state
Currently when the coordinator decides to move the fence it issues an
RPC to each node and each node locally advances fence version. This is
fine if there are no failures or failures are handled by retrying
fencing, but if we want to allow topology changes to progress even in
the presence of barrier failures it is easier to store the fence version
in the raft state. The nodes that missed fence rpc may easily catch up
to the latest fence version by simply executing a raft barrier.
2023-11-19 15:28:08 +02:00
Eliran Sinvani
562403b82f migration_manager: do not globalize potentially incomplete schema
There was a case where maybe sync function of a materialized view could
fail to sync if the view version was old. This is because adding the
base information to the view is only relevant until the record is
synced. This triggers an internal error in the `global_schema_ptr`
constructor.
The conversion to global pointer in that case was solely for logging
purposes so instead, we pass the pieces of information needed for the
logging itself.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-11-19 14:13:01 +02:00
Botond Dénes
eb674128ca Merge 'treewide: do not mark return value const if this has no effect ' from Kefu Chai
this change is a cleanup to add `-Wignored-qualifiers` when building the tree.

marking a return value `const` has no effect when it is returned by value.
these `const` specifiers are useless, so let's drop them.

and, if we compile the tree with `-Wignored-qualifiers`, the compiler
would warn like:

```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
  245 |     const index_metadata_kind kind() const;
      |     ^~~~~
```
so this change also silences the above warnings.

Closes scylladb/scylladb#16083

* github.com:scylladb/scylladb:
  build: enable -Wignore-qualifiers
  treewide: do not mark return value const if this has no effect
2023-11-17 15:35:20 +02:00
Kefu Chai
781b7de502 build: enable -Wignore-qualifiers
`-Wignored-qualifiers` is included by -Wextra, but we are not there yet.
with this change, we can keep changes that introduce -Wignored-qualifiers
warnings out of the repo before applying `-Wextra`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-17 17:49:47 +08:00
Kefu Chai
15bfa09454 treewide: do not mark return value const if this has no effect
this change is a cleanup.

marking a return value `const` has no effect when it is returned by value.
these `const` specifiers are useless, so let's drop them.

and, if we compile the tree with `-Wignored-qualifiers`, the compiler
would warn like:

```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
  245 |     const index_metadata_kind kind() const;
      |     ^~~~~
```
so this change also silences the above warnings.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-17 17:46:19 +08:00
Tomasz Grabiec
6bcf3ac86c Merge 'Fix a few rare bugs in row cache' from Michał Chojnowski
This is a loose collection of fixes to rare row cache bugs flushed out by running test_concurrent_reads_and_eviction several million times. See individual commits for details.

Fixes #15483

Closes scylladb/scylladb#15945

* github.com:scylladb/scylladb:
  partition_version: fix violation of "older versions are evicted first" during schema upgrades
  cache_flat_mutation_reader: fix a broken iterator validity guarantee in ensure_population_lower_bound()
  cache_flat_mutation_reader: fix a continuity loss in maybe_update_continuity()
  cache_flat_mutation_reader: fix continuity losses during cache population races with reverse reads
  partition_snapshot_row_cursor: fix a continuity loss in ensure_entry_in_latest() with reverse reads
  cache_flat_mutation_reader: fix some cache mispopulations with reverse reads
  cache_flat_mutation_reader: fix a logic bug in ensure_population_lower_bound() with reverse reads
  cache_flat_mutation_reader: never make an unlinked last dummy continuous
2023-11-16 23:48:17 +01:00
Michał Chojnowski
9ccd4ea416 partition_version: fix violation of "older versions are evicted first" during schema upgrades
A schema upgrade appends a MVCC version B after an existing version A.

The last dummy in B is added to the front of LRU,
so it will be evicted after the entries in A.

This alone doesn't quite violate the "older versions are evicted first" rule,
because the new last dummy carries no information. But apply_monotonically
generally assumes that entries on the same position have the obvious
eviction order, even if they carry no information. Thus, after the merge,
the rule can become broken.

The proposed fix is as follows:

- In the case where A is merged into B, the merged last dummy
  inherits the link of A.
- The merging of B into anything is prevented until its merge with A is finished.

This is relatively hacky, because it still involves a state that
goes against some natural expectations granted by the "older versions..."
rule. A less hacky fix would be to ensure that the new dummy is inserted
into a proper place in the eviction order to begin with.

Or, better yet, we could eliminate the rule altogether.
Aside from being very hard to maintain, it also prevents the introduction
of any eviction algorithm other than LRU.
2023-11-16 19:01:18 +01:00
Michał Chojnowski
2aac8690c7 cache_flat_mutation_reader: fix a broken iterator validity guarantee in ensure_population_lower_bound()
ensure_population_lower_bound() guarantees that _last_row is valid or null.

However, it fails to provide this guarantee in the special rare case when
`_population_range_starts_before_all_rows == true` and _last_row is non-null.

(This can happen in practice if there is a dummy at before_all_clustering_rows
and eviction makes the `(before_all_clustering_rows, ...)` interval
discontinuous. When the interval is read in this state, _last_row will point to
the dummy, while _population_range_starts_before_all_rows will still be true.)

In this special case, `ensure_population_lower_bound()` does not refresh
`_last_row`, so it can be non-null but invalid after the call.
If it is accessed in this state, undefined behaviour occurs.
This was observed to happen in a test,
in the `read_from_underlying() -- maybe_drop_last_entry()` codepath.

The proposed fix is to make the meaning of _population_range_starts_before_all_rows
closer to its real intention. Namely: it's supposed to handle the special case of a
left-open interval, not the case of an interval starting at -inf.
2023-11-16 19:01:18 +01:00
Michał Chojnowski
0dcf91491e cache_flat_mutation_reader: fix a continuity loss in maybe_update_continuity()
To reflect the final range tombstone change in the populated range,
maybe_update_continuity() might insert a dummy at `before_key(_next_row.table_position())`.

But the relevant logic breaks down in the special case when that position is
equal to `_last_row.position()`. The code treats the dummy as a part of
the (_last_row, _next_row) range, but this is wrong in the special case.

This can lead to inconsistent state. For example, `_last_row` can be wrongly made
continuous, or its range tombstone can be wrongly nulled.

The proposed fix is to only modify the dummy if it was actually inserted.
If it had been inserted beforehand (which is true in the special case, because
of the `ensure_population_lower_bound()` call earlier), then it's already in a
valid state and doesn't need changes.
2023-11-16 19:01:18 +01:00
Michał Chojnowski
6601c778dd cache_flat_mutation_reader: fix continuity losses during cache population races with reverse reads
Cache population routines insert new row entries.

In non-reverse reads, the new entries (except for the lower bound of the query
range) are filled with the correct continuity and range tombstones immediately
after insertion, because that information has already arrived from underlying
when the entries are inserted.

But in reverse reads, it's the interval *after* the newly-inserted entry
that's made continuous. The continuity information in the new entries isn't
filled. When two population routines race, the one which comes later can
punch holes in the continuity left by the first routine, which can break
the "older versions are evicted first" rule and revert the affected
interval to an older version.

To fix this, we must make sure that inserting new row entries doesn't
change the total continuity of the version.
2023-11-16 19:01:18 +01:00
Michał Chojnowski
47299d6b06 partition_snapshot_row_cursor: fix a continuity loss in ensure_entry_in_latest() with reverse reads
The FIXME comment claims that setting continuity isn't very important in this
place, but in fact this is just wrong.

If two calls to read_from_underlying() get into a race, the one which finishes
later can call ensure_entry_in_latest() on a position which lies inside a
continuous interval in the newest version. If we don't take care to preserve
the total continuity of the version, this can punch a hole in the continuity of the
newest version, potentially reverting the affected interval to an older version.

Fix that.
2023-11-16 19:01:18 +01:00
Michał Chojnowski
b5988fb389 cache_flat_mutation_reader: fix some cache mispopulations with reverse reads
`_last_row` is in table schema, but it is sometimes compared with positions in
query schema. This leads to unexpected behaviour when reverse reads
are used.
The previous patch fixed one such case, which was affecting correctness.

As far as I can tell, the three cases affected by this patch aren't
a correctness problem, but can cause some intervals to fail to be made continuous.
(And they won't be cached even if the same read is repeated many times).
2023-11-16 19:01:18 +01:00
Michał Chojnowski
f9eb64b8e0 cache_flat_mutation_reader: fix a logic bug in ensure_population_lower_bound() with reverse reads
`_last_row` is in table schema, while `cur.position()` is in query schema
(which is either equal to table schema, or its reverse).

Thus, the comparison affected by this patch doesn't work as intended.
In reverse reads, the check will pass even if `_last_row` has the same key,
but opposite bound weight to `cur`, which will lead to the dummy being inserted
at the wrong position, which can e.g. wrongly extend a range tombstone.

Fix that.
2023-11-16 19:01:18 +01:00
Michał Chojnowski
ec364c3580 cache_flat_mutation_reader: never make an unlinked last dummy continuous
It is illegal for an unlinked last dummy to be continuous,
(this is how last dummies respect the "older versions are evicted first" rule),
but it is technically possible for an unlinked last dummy to be
made continuous by read_from_underlying. This commit fixes that.

Found by row_cache_test.

The bug is very unlikely to happen in practice because the relevant
rows_entry is bumped in LRU before read_from_underlying starts.
For the bug to manifest, the entry has to fall down to the end of the
LRU list and be evicted before read_from_underlying() ends.
Usually it takes several minutes for an entry to fall out of LRU,
and read_from_underlying takes maybe a few hundred milliseconds.

And even if the above happened, there still needs to appear a new
version, which needs to have its continuous last dummy evicted
before it's merged.
2023-11-16 19:01:18 +01:00
Anna Stuchlik
ca22de4843 doc: mark the link to upgrade guide as OSS-only
This commit adds the .. only:: opensource directive
to the Raft page to exclude the link to the 5.2-to-5.4
upgrade guide from the Enterprise documentation.

The Raft page belongs to both OSS and Enterprise
documentation sets, while the upgrade guide
is OSS-only. This causes documentation build
issues in the Enterprise repository, for example,
https://github.com/scylladb/scylla-enterprise/pull/3242.

As a rule, all OSS-only links should be provided
by using the .. only:: opensource directive.

This commit must be backported to branch-5.4
to prevent errors in the documentation for
ScyllaDB Enterprise 2024.1

(backport)

Closes scylladb/scylladb#16064
2023-11-16 10:36:27 +02:00
Kefu Chai
687ba9cacc test/sstable_compaction_test: check every sstable replaced sstable
before this change, in sstable_run_based_compaction_test, we check
every 4 sstables, to verify that we close the sstable to be replaced
in a batch of 4.

since the integer-based generation identifier is monotonically
incremental, we can assume that the identifiers of sstables are like
0, 1, 2, 3, .... so if the compaction consumes sstable in a
batch of 4, the identifier of the first one in the batch should
always be the multiple of 4. unfortunately, this test does not work
if we use uuid-based identifier.

but if we take a closer look at how we create the dataset, we can
observe the following facts:

1. the `compaction_descriptor` returned by
   `sstable_run_based_compaction_strategy_for_tests` never
   set `owned_ranges` in the returned descriptor
2. in `compaction::setup_sstable_reader`, `mutation_reader::forward::no`
   is used, if `_owned_ranges_checker` is empty
3. `mutation_reader_merger` respects the `fwd_mr` passed to its
   ctor, so it closes current sstable immediately when the underlying
   mutation reader reaches the end of stream.

in other words, we close every sstable once it is fully consumed in
sstable_compaction_test. and the reason why the existing test passes
is that we just sample the sstables whose generation id is a multiple
of 4. what happens when we perform compaction in this test is:

1. replace 5 with 33, closing 5
2. replace 6 with 34, closing 6
3. replace 7 with 35, closing 7
4. replace 8 with 36, closing 8   << let's check here.. good, go on!
5. replace 13 with 37, closing 13
...
8. replace 16 with 40, closing 16 << let's check here.. also, good, go on!

so, in this change, we just check all old sstables, to verify that
we close each of them once it is fully consumed.
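
The difference between the two checks can be sketched as below. The helper names and the UUID-like strings are made up for illustration; the integer generations are taken from the first four steps listed above.

```python
def sampled_generations(generations, batch=4):
    """Old check: only generations that are a multiple of `batch` are inspected."""
    return [g for g in generations if g % batch == 0]

def every_sstable_closed(old_sstables, closed):
    """New check: every replaced sstable must be closed, whatever its identifier."""
    return all(sst in closed for sst in old_sstables)

# Integer generations from the first four steps listed above.
print(sampled_generations([5, 6, 7, 8]))

# UUID-like identifiers (made up here) cannot be sampled by modulo,
# but they can still be checked one by one.
old = ["3fa0-aaaa", "3fa0-bbbb"]
print(every_sstable_closed(old, closed=set(old)))
```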

Fixes #16073
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-16 16:21:46 +08:00
Kefu Chai
18792fe059 test/sstable_compaction_test: s/old_sstables.front()/old_sstable/
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-16 16:21:40 +08:00
Botond Dénes
323e34e1ed Update tools/java submodule
* tools/java 97c49094...b776096d (2):
  > build: take care of old libthrift [PART 2/2]
  > build: take care of old libthrift [PART 1/2]
2023-11-16 10:14:38 +02:00
Kefu Chai
12f4f9f481 build: cmake: link against cryptopp::cryptopp
instead of linking against cryptopp, we should link against
cryptopp::cryptopp. the latter is the target exposed by
Findcryptopp.cmake, while the former is but a library name which
is not even exposed by any find_package() call.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16060
2023-11-15 17:14:04 +02:00
Anna Stuchlik
e8129d9a0c doc: remove DateTieredCompactionStrategy
This commit removes support for DateTieredCompactionStrategy
from the documentation.

Support for DTCS was removed in 5.4, so this commit
must be backported to branch-5.4.

Refs https://github.com/scylladb/scylladb/issues/15869#issuecomment-1784181274

The information is already added to the 5.2-to-5.4
upgrade guide: https://github.com/scylladb/scylladb/pull/15988

(backport)

Closes scylladb/scylladb#16061
2023-11-15 15:39:57 +02:00
Pavel Emelyanov
f4fd5c7207 s3/client: Tag pieces of jumbo uploader
The jumbo sink is there to upload files that can be potentially larger
than 50Gb (10000*5Mb). For that the sink uploads a set of so called
"pieces" -- files up to 50Gb each -- then uses the copy-upload API call
to squash the pieces together. After copying, the piece is removed. In
case of a crash while uploading, pieces remain in the bucket forever,
which is not great.

This patch tags pieces with 'kind=piece' tag in order to tell pieces
from regular objects. This can be used, for example, by setting up a
tag-based lifecycle policy to collect dangling pieces eventually.
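
As an illustration of such a policy, the sketch below builds an S3 lifecycle configuration that expires objects tagged `kind=piece`. The rule ID and the 7-day expiration are arbitrary choices made for this example and are not part of the patch.

```python
import json

def piece_expiration_rule(days: int) -> dict:
    """Build a lifecycle rule that expires objects tagged kind=piece."""
    return {
        "ID": "expire-dangling-pieces",  # hypothetical rule name
        "Filter": {"Tag": {"Key": "kind", "Value": "piece"}},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }

# 7 days is an arbitrary value chosen for the example.
lifecycle = {"Rules": [piece_expiration_rule(days=7)]}
print(json.dumps(lifecycle, indent=2))
```

A structure of this shape is what a client call such as boto3's `put_bucket_lifecycle_configuration` takes as its `LifecycleConfiguration` argument.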

fixes: #13670

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16023
2023-11-15 15:32:30 +02:00
Kefu Chai
6a753f9f06 build: cmake: define SCYLLA_BUILD_MODE=dev for Dev mode
it was a typo in b234c839. so let's correct it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16063
2023-11-15 13:17:30 +02:00
Kefu Chai
972b852e0a build: cmake: explain the build dependencies in check-headers
developer might notice that when he/she builds 'check-headers',
the whole tree is built. so let's explain this behavior.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16062
2023-11-15 13:16:01 +02:00
Botond Dénes
ba17ae2ab6 Merge 'Fix tests in test/cql-pytest/ that fail on Cassandra' from Nadav Har'El
As a general rule, tests in test/cql-pytest shouldn't just pass on Scylla - they also should not fail on Cassandra; A test that fails on Cassandra may indicate that the test is wrong, or that Scylla's behavior is wrong and the test just enshrines that wrong behavior. Each time we see a test fail on Cassandra we need to check if this is not the case. We also have special markers scylla_only and cassandra_bug to put on tests that we know _should_ fail on Cassandra because it is missing some Scylla-only feature or there is a bug in Cassandra, respectively. Such tests will be xfailed/skipped when running on Cassandra, and not report failures.

Unfortunately, over time several more tests got into our suite that did not pass on Cassandra. In this series I went over all of them, and fixed each to pass - or be skipped - on Cassandra, in a way that each patch explains.

Fixes #16027

Closes scylladb/scylladb#16033

* github.com:scylladb/scylladb:
  test/cql-pytest: fix test_describe.py to not fail on Cassandra
  test/cql-pytest: fix select_single_column_relation_test.py to not fail on Cassandra
  test/cql-pytest: fix compact_storage_test.py to not fail on Cassandra
  test/cql-pytest: fix test_secondary_index.py to not fail on Cassandra
  test/cql-pytest: fix test_materialized_view.py to not fail on Cassandra
  test/cql-pytest: fix test_keyspace.py to not fail on Cassandra
  test/cql-pytest: test_guardrail_replication_strategy.py is Scylla-only
  test/cql-pytest: partial fix for test_compaction_strategy_validation.py on Cassandra
  test/cql-pytest: fix test_filtering.py to not fail on Cassandra
2023-11-15 09:13:09 +02:00
Nadav Har'El
8964cce04c test/cql-pytest: fix test_describe.py to not fail on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).

Some of the tests checked on Cassandra things that don't exist there
(namely local secondary indexes) and could skip that part. Other tests
need to be skipped completely ("scylla_only") because they rely on a
Scylla-only feature. We have a bit too many of those in this file, but
I don't want to fix this now.

Yet another test found a real bug in Cassandra 4.1.1 (CASSANDRA-17918)
but passes in Cassandra 4.1.2 and up, so there's nothing to fix except
a comment about the situation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:40:30 +02:00
Nadav Har'El
6802dca6b5 test/cql-pytest: fix select_single_column_relation_test.py to not fail on Cassandra
In commit 52bbc1065c, we started to allow "IN NULL" - it started to
match nothing instead of being an error as it is in Cassandra. The
commit *incorrectly* "fixed" the existing translated Cassandra unit test
to match the new behavior - but after this "fix" the test started to
fail on Cassandra.

The appropriate fix is just to comment out this part of the test and
not do it. It's a small point where we deliberately decided to deviate
from Cassandra's behavior, so the test it had for this behavior is
irrelevant.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
d8997d49e7 test/cql-pytest: fix compact_storage_test.py to not fail on Cassandra
Some error-message checks in this test file (which was translated in
the past from Cassandra) try operations which actually have two errors,
and expect to see one error message - but recent Cassandra prints
the other one. This caused several tests to fail when running on
Cassandra 4.1. Both messages are fine, so let's accept both.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
a7f5eb3621 test/cql-pytest: fix test_secondary_index.py to not fail on Cassandra
Fixed two tests which failed when running on Cassandra:

One test waited for a secondary index to appear, but in Cassandra, the
index can be broken (cause a read failure) for a short while and we
need to wait through this failure as well and not fail the entire test.

Another test was for local secondary index, which is a Scylla-only
feature, but we forgot the "scylla_only" tag.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
92f591dc38 test/cql-pytest: fix test_materialized_view.py to not fail on Cassandra
The test function test_mv_synchronous_updates checks the
synchronous_updates feature, which is a ScyllaDB extension and
doesn't exist in Cassandra. So it should be marked with "scylla_only"
so that it doesn't fail when running the tests on Cassandra.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
301189ee28 test/cql-pytest: fix test_keyspace.py to not fail on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).

When testing some invalid cases of ALTER TABLE, the test required
that you cannot choose SimpleStrategy without specifying a
replication_factor. As explained in Refs #16028, this isn't true
in Cassandra 4.1 and up - it now has a default value for
replication_factor and it's no longer required.

So in this patch we split that part of the test to a separate test
function and mark it scylla_only.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
2b67cd3921 test/cql-pytest: test_guardrail_replication_strategy.py is Scylla-only
The tests in test/cql-pytest/test_guardrail_replication_strategy.py
are for a Scylla-only feature that doesn't exist in Cassandra, so
obviously they all fail on Cassandra. Let's mark them all as
scylla_only.

We use an autouse fixture to automatically mark all tests in this file
as scylla-only, instead of marking each one separately.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
c4d3e08987 test/cql-pytest: partial fix for test_compaction_strategy_validation.py on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).

This patch is only a partial fix - it fixes trivial differences in error
messages, but some potentially-real differences remain so three of the
tests still fail:

1. Trying to set tombstone_threshold to 5.5 is an error in ScyllaDB
   ("must be between 0.0 and 1.0") but allowed in Cassandra.

2. Trying to set bucket_low to 0.0 is an error in ScyllaDB, giving the
   wrong-looking error message "must be between 0.0 and 1.0" (so 0.0 should
   have been fine?!) but allowed in Cassandra.

3. Trying to set timestamp_resolution to SECONDS is an error in ScyllaDB
   ("invalid timestamp resolution SECONDS") but allowed in Cassandra.
   I don't think anybody wants to actually use "SECONDS", but it seems
   legal in Cassandra, so do we need to support it?

The patch also simplifies the test to use cql-pytest's util.py, instead
of cassandra_tests/porting.py. The latter was meant to make porting
existing Cassandra tests easier - not for writing new ones - and made
using a regular expression for testing error messages harder so I
switched to using pytest.raises() whose "match=" accepts a regular
expression.
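
A rough sketch of why a regular expression helps here: `pytest.raises(match=...)` applies `re.search` to the exception text, so one alternation can accept differently worded messages. Only the first alternative below is quoted from this message; the second is a made-up stand-in, and `message_accepted` is a hypothetical helper.

```python
import re

# First alternative: the ScyllaDB wording quoted above.
# Second alternative: a made-up stand-in for another server's wording.
ACCEPTED = r"must be between 0\.0 and 1\.0|is not a valid threshold"

def message_accepted(message: str) -> bool:
    """Mimic the re.search() check that pytest.raises(match=...) performs."""
    return re.search(ACCEPTED, message) is not None

print(message_accepted("tombstone_threshold must be between 0.0 and 1.0"))
print(message_accepted("5.5 is not a valid threshold"))
print(message_accepted("some unrelated failure"))
```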

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
8e51ebd8a0 test/cql-pytest: fix test_filtering.py to not fail on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).

It turns out that when the token() function is used with incorrect
parameters (it needs to be passed all partition-key columns), the
error message is different in ScyllaDB and Cassandra. Both are
reasonable error messages, so if we insist on checking the error
message - we should allow both.

Also the same test called its second partition-key column "ck". This
is confusing, because we usually use the name "ck" to refer to a clustering
key. So just for clarity, we change this name to "pk2". This is not a
functional change in the test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-14 21:27:12 +02:00
Nadav Har'El
64d1d5cf62 Merge 'Fix partition estimation with TWCS tables during streaming' from Raphael "Raph" Carvalho
TWCS tables require partition estimation adjustment as incoming streaming data can be segregated into the time windows.

Turns out we had two problems in this area that lead to suboptimal bloom filters.

1) With off-strategy enabled, data segregation is postponed, but partition estimation was adjusted as if segregation wasn't postponed. Solved by not adjusting estimation if segregation is postponed.
2) With off-strategy disabled, data segregation is not postponed, but streaming didn't feed any metadata into partition estimation procedure, meaning it had to assume the max windows input data can be segregated into (100). Solved by using schema's default TTL for a precise estimation of window count.
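
The two fixes can be sketched roughly as follows. This is not the streaming code: the function names, the division of the estimate by the window count, and the cap at 100 windows are assumptions made for illustration; only the fallback of 100 windows and the use of the default TTL come from the description above.

```python
import math

MAX_WINDOWS = 100  # fallback window count mentioned above

def estimated_window_count(default_ttl_s: int, window_size_s: int) -> int:
    """Estimate how many time windows the incoming data can span."""
    if default_ttl_s <= 0 or window_size_s <= 0:
        return MAX_WINDOWS  # no usable metadata: assume the maximum
    # Capping at MAX_WINDOWS is an assumption of this sketch.
    return min(MAX_WINDOWS, max(1, math.ceil(default_ttl_s / window_size_s)))

def partitions_per_sstable(total: int, default_ttl_s: int, window_size_s: int,
                           segregation_postponed: bool) -> int:
    """Adjust the partition estimate used to size the bloom filter."""
    if segregation_postponed:
        return total  # fix 1: no adjustment when segregation is postponed
    windows = estimated_window_count(default_ttl_s, window_size_s)
    return math.ceil(total / windows)  # fix 2: window count derived from TTL

day = 86400
print(partitions_per_sstable(1000, 10 * day, day, segregation_postponed=False))
print(partitions_per_sstable(1000, 10 * day, day, segregation_postponed=True))
```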

For the future, we want to dynamically size filters (see https://github.com/scylladb/scylladb/issues/2024), especially for TWCS that might have SSTables that are left uncompacted until they're fully expired, meaning that the system won't heal itself in a timely manner through compaction on a SSTable that had partition estimation really wrong.

Fixes https://github.com/scylladb/scylladb/issues/15704.

Closes scylladb/scylladb#15938

* github.com:scylladb/scylladb:
  streaming: Improve partition estimation with TWCS
  streaming: Don't adjust partition estimate if segregation is postponed
2023-11-14 20:41:36 +02:00
Kefu Chai
d49ea833fd scylla-sstable: reject duplicate sstable names
before this change, `load_sstables()` fills the output sstables vector
by indexing it with the sstable's path. but if there are duplicated
items in the given sstable_names, the returned vector would have uninitialized
shared_sstable instance(s) in it. if we feed such sstables to the
operation funcs, they would segfault when dereferencing the empty
lw_shared_ptr.

in this change, we error out if duplicated sstable names are specified
in the command line.

an alternative is to tolerate this usage by initializing the sstables
vector with a back_inserter, as we always return a dictionary with the
sstable's name as the key, but it might be desirable from user's
perspective to preserve the order, like OrderedDict in Python. so
let's preserve the ordering of the sstables in the command line.

this should address the problem of the segfault if we pass duplicated
sstable paths to this tool.
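
For illustration only, the check described above can be sketched in Python. The real tool is written in C++; the function name `check_unique_sstable_names` and the error text are hypothetical and not taken from the patch.

```python
def check_unique_sstable_names(sstable_names):
    """Return the names in their original order, or raise on a duplicate.

    Hypothetical helper: it only mirrors the idea of erroring out early
    while preserving the command-line ordering.
    """
    seen = set()
    for name in sstable_names:
        if name in seen:
            # Reject the input instead of leaving an empty slot behind.
            raise ValueError(f"duplicate sstable name: {name}")
        seen.add(name)
    return list(sstable_names)
```

The order of the input is kept as-is, which matches the stated preference for preserving the ordering given on the command line.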

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16048
2023-11-14 19:37:14 +02:00
Botond Dénes
11cafd2fc8 Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.

Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.

Closes scylladb/scylladb#16050

* github.com:scylladb/scylladb:
  test: test abort of compaction task that isn't started yet
  test: test running compaction task abort
  tasks: fail if a task was aborted
  compaction: abort task manager compaction tasks
2023-11-14 14:55:17 +02:00
Kefu Chai
2bae14f743 dist: let scylla-server.service Wants var-lib-systemd-coredump
without adding `WantedBy=scylla-server.service` in
var-lib-systemd-coredump, if we start `scylla-server.service`,
it does not necessarily start `var-lib-systemd-coredump`
even if the latter is installed.

with `WantedBy=scylla-server.service` in var-lib-systemd-coredump,
if we start `scylla-server.service`, var-lib-systemd-coredump
will be started also. and `Before=scylla-server.service` ensures
that, before `scylla-server.service` is started,
var-lib-systemd-coredump is already ready.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15984
2023-11-14 14:54:39 +02:00
Michał Jadwiszczak
0083ddd7a0 generic_server: use mutable reference in for_each_gently
Make `generic_server::gentle_iterator` a mutable iterator to allow
`for_each_gently` to make changes to the connections.

Fixes: #16035

Closes scylladb/scylladb#16036
2023-11-14 14:25:22 +02:00
Pavel Emelyanov
a87b5cfbec test/object_store: Generalize test table creation
Both existing test cases, and the upcoming third one, create the very
same ks.cf pair with the very same sequence of steps. Generalize them.

For the basic test case also tune up the way "expected" rows are
calculated -- now they are SELECT-ed right after insertion and the size
is checked to be non zero. Not _exactly_ the same check, but it's good
enough for basic testing purposes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15986
2023-11-14 13:55:02 +02:00
Takuya ASADA
338a9492c9 scylla_post_install.sh: detect RHEL correctly
$ID_LIKE = "rhel" works only on RHEL compatible OSes, not for RHEL
itself.
To detect RHEL correctly, we also need to check $ID = "rhel".
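
A rough Python sketch of the combined check follows. The actual script is a shell script; `is_rhel_family` is a hypothetical name, and treating `ID_LIKE` as a space-separated list is an assumption based on the os-release format rather than on the patch itself.

```python
def is_rhel_family(os_release):
    """Return True for RHEL itself and for RHEL-compatible distributions.

    `os_release` is a dict of already-parsed /etc/os-release fields.
    """
    # ID_LIKE may hold several space-separated identifiers.
    ids = os_release.get("ID_LIKE", "").split()
    # RHEL itself announces "rhel" in ID, not in ID_LIKE.
    ids.append(os_release.get("ID", ""))
    return "rhel" in ids
```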

Fixes #16040

Closes scylladb/scylladb#16041
2023-11-14 13:53:35 +02:00
Kefu Chai
5a6c5320de test/sstable_compaction_test: use BOOST_REQUIRE_EQUAL when appropriate
Boost.Test prints the LHS and RHS when the predicate statement passed
to BOOST_REQUIRE_EQUAL() macro evaluates to false. so the error message
printed by Boost would be more developer friendly when the test fails.

in this test, we replace some BOOST_REQUIRE() with BOOST_REQUIRE_EQUAL()
when appropriate.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16047
2023-11-14 13:51:47 +02:00
Botond Dénes
f63645ceab Merge 'test/cql-pytest: fix test_permissions.py to not fail on Cassandra' from Nadav Har'El
This short series fixes test/cql-pytest/test_permissions.py to stop failing on Cassandra.

The second patch fixes these failures (and explains why). The first patch is a new test for UDFs, which helped me prove that one of the test_permissions.py failures in Cassandra is a Cassandra bug - some esoteric error path that prints the right message when no permissions are involved, becomes wrong when permissions are added.

Fixes #15969

Closes scylladb/scylladb#15979

* github.com:scylladb/scylladb:
  test/cql-pytest: fix test_permissions.py to not fail on Cassandra
  test/cql-pytest: add test for DROP FUNCTION
2023-11-14 13:50:51 +02:00
Gleb Natapov
f04e890690 storage_service: topology coordinator: do fencing even if draining failed
Token metadata barrier consists of two steps. First old requests are
drained and then requests that are not drained are fenced. But currently
if draining fails then fencing is not done. This is fine if the
barrier's failure is handled by retrying, but we want to start handling
errors differently. In fact during topology operation rollback we
already do not retry a failed barrier.

The patch fixes the metadata barrier to do fencing even if draining
failed.
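
As a minimal sketch of the intended ordering (not the actual C++ coordinator code), the two steps can be modelled in Python with `try`/`finally`. The callables `drain` and `fence` are hypothetical stand-ins, and re-raising the drain error after fencing is an assumption.

```python
def token_metadata_barrier(drain, fence):
    """Run the drain step, then always run the fence step.

    If drain() raises, fence() still runs and the drain error is
    propagated afterwards (an assumption of this sketch).
    """
    try:
        drain()
    finally:
        fence()
```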
2023-11-14 13:06:41 +02:00
Aleksandra Martyniuk
6af581301b test: test abort of compaction task that isn't started yet
Test whether a task whose parent was aborted has a proper status.
2023-11-14 10:36:38 +01:00
Botond Dénes
a66ec1d3c1 Merge 'Drop compaction_manager_test' from Pavel Emelyanov
This is continuation of a34c8dc4 (Drop compaction_manager_for_testing).

There's one more wrapper over compaction_manager to access its private fields. All such access was recently moved to sstables::test_env's compaction manager, now it's time to drop the remaining legacy wrapper class.

Closes scylladb/scylladb#16017

* github.com:scylladb/scylladb:
  test/utils: Drop compaction_manager_test
  test/utils: Get compaction manager from test_env
  test/sstables: Introduce test_env_compaction_manager::perform_compaction()
  test/env: Add sstables::test_env& to compaction_manager_test::run()
  test/utils: Add sstables::test_env& to compact_sstables()
  test/utils: Simplify and unify compaction_manager_test::run()
  test/utils: Squash two compact_sstables() helpers
  test/compaction: Use shorter compact_sstables() helper
  test/utils: Keep test task compaction gate on task itself
  test/utils: Move compaction_manager_test::propagate_replacement()
2023-11-14 11:25:17 +02:00
Kamil Braun
9212bdc6b1 migration_manager: more verbose logging for schema versions
We're observing nodes getting stuck during bootstrap inside
`storage_service::wait_for_ring_to_settle()`, which periodically checks
`migration_manager::have_schema_agreement()` until it becomes `true`:
scylladb/scylladb#15393.

There is no obvious reason why that happens -- according to the nodes'
logs, their latest in-memory schema version is the same.

So either the gossiped schema version is for some reason different
(perhaps there is a race in publishing `application_state::SCHEMA`) or
missing entirely.

Alternatively, `wait_for_ring_to_settle` is leaving the
`have_schema_agreement` loop and getting stuck in
`update_topology_change_info` trying to acquire a lock.

Modify logging inside `have_schema_agreement` so details about missing
schema or version mismatch are logged on INFO level, and an INFO level
message is printed before we return `true`. To prevent logs from getting
spammed, rate-limit the periodic messages to once every 5 seconds. This
will still show the reason in our tests which allow the node to hang for
many minutes before timing out. Also these schema agreement checks are
done on relatively rare occasions such as bootstrap, so the additional
logs should not be harmful.
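
The rate limiting described above can be sketched as follows. This is an illustrative Python model, not the logger used in the patch; the class name `RateLimitedLog` and the injectable `clock` parameter are assumptions made to keep the example testable.

```python
import time


class RateLimitedLog:
    """Emit a message at most once per `interval_seconds`."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self._interval = interval_seconds
        self._clock = clock
        self._last_emitted = None

    def maybe_log(self, emit, message):
        """Call emit(message) unless a message was emitted too recently."""
        now = self._clock()
        if self._last_emitted is not None and now - self._last_emitted < self._interval:
            return False  # suppressed: still inside the quiet interval
        self._last_emitted = now
        emit(message)
        return True
```

With an interval of 5 seconds, a loop that polls every second would print roughly one message out of five.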

Furthermore, when publishing schema version to gossip, log it on INFO
level. This is happening at most once per schema change so it's a rare
message. If there's a race in publishing schema versions, this should
allow us to observe it.

Ref: scylladb/scylladb#15393

Closes scylladb/scylladb#16021
2023-11-14 11:24:47 +02:00
Alexey Novikov
bd73536b33 When add duration field to UDT check whether this UDT is used in some clustering key
Having values of the duration type is not allowed for clustering
columns, because duration can't be ordered. This is correctly validated
when creating a table but is not validated when we alter the type.

Fixes #12913

Closes scylladb/scylladb#16022
2023-11-14 11:23:05 +02:00
Botond Dénes
4968f50ff7 Merge 'auth: fix error message when consistency level is not met' from Paweł Zakrzewski
Propagate `exceptions::unavailable_exception` error message to the client such as cqlsh.

Fixes #2339

Closes scylladb/scylladb#15922

* github.com:scylladb/scylladb:
  test: add the auth_cluster test suite
  auth: fix error message when consistency level is not met
2023-11-14 11:22:38 +02:00
Kefu Chai
4f361b73c4 build: cmake: consolidate the setting of cxx_flags
before this change, we define the CMAKE_CXX_FLAGS_${CONFIG} directly.
and some of the configurations are supposed to generate debugging info with
"-g -gz" options, but they failed to include these options in the cxx
flags.

in this change:

* a macro named `update_cxx_flags` is introduced to set this option.
* this macro also sets -O option

instead of using a function, this facility is implemented as a macro so
that we can update the CMAKE_CXX_FLAGS_${CONFIG} without setting
this variable with awkward syntax like

```cmake
set(${flags} "${${flags}}" PARENT_SCOPE)
```

this mirrors the behavior in configure.py in the sense that the latter
sets the option on a per-mode basis, and translates the option into
compiler options.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16043
2023-11-14 11:21:52 +02:00
Kefu Chai
a846291ce8 build: cmake: define SCYLLA_BUILD_MODE for Release build
this macro definition was dropped in 2b961d8e3f by accident.
in this change, let's bring it back. this macro is always necessary,
as it is checked in scylla source.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16044
2023-11-14 11:21:33 +02:00
Tomasz Grabiec
dc6a0b2c35 gossiper: Elevate logging level for node restart events
They cause connection drops, which is a significant disruptive
event. We should log it so that we can know that this is the cause of
the problems it may cause, like requests timing out. Connection drop
will cause coordinator-side requests to time out in the absence of
speculation.

Refs #14746

Closes scylladb/scylladb#16018
2023-11-14 11:21:13 +02:00
Kefu Chai
58f3ced4d6 scylla-gdb: raise if no tasks are found
the "task" fixture is supposed to return a task for test, if it
fails to do so, it would be an issue not directly related to
the test. so let's fail it early.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16042
2023-11-14 11:12:43 +02:00
Botond Dénes
22381441b0 migration_manager: also reload schema on enabling digest_insensitive_to_expiry
Currently, when said feature is enabled, we recalculate the schema
digest. But this feature also influences how table versions are
calculated, so it has to trigger a recalculation of all table versions,
so that we can guarantee correct versions.
Before, this used to happen by happy accident. Another feature --
table_digest_insensitive_to_expiry -- used to take care of this, by
triggering a table version recalculation. However this feature only takes
effect if digest_insensitive_to_expiry is also enabled. This used to be
the case incidentally, by the time the reload triggered by
table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was
already enabled. But this was not guaranteed whatsoever and as we've
recently seen, any change to the feature list, which changes the order
in which features are enabled, can cause this intricate balance to
break.
This patch makes digest_insensitive_to_expiry also kick off a schema
reload, to eliminate our dependence on (unguaranteed) feature order, and
to guarantee that table schemas have a correct version after all features
are enabled. In fact, all schema feature notification handlers now kick
off a full schema reload, to ensure bugs like this don't creep in, in
the future.

Fixes: #16004

Closes scylladb/scylladb#16013
2023-11-13 23:32:20 +02:00
Aleksandra Martyniuk
a63a6dcd93 test: test running compaction task abort
Test whether a task which is aborted while running has a proper status.
2023-11-13 16:06:36 +01:00
Aleksandra Martyniuk
2a9ee59cc4 tasks: fail if a task was aborted
run() method of task_manager::task::impl does not have to throw when
a task is aborted with task manager api. Thus, a user will see that
the task finished successfully which makes it inconsistent.

Finish a task with a failure if it was aborted with task manager api.
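
A toy Python model of this rule is shown below. The class `Task` and its state names are hypothetical and only illustrate the idea that a body returning normally after an abort must still end in a failed state.

```python
class Task:
    """Toy task whose final state reflects an abort request."""

    def __init__(self, body):
        self._body = body
        self.aborted = False
        self.state = "created"

    def abort(self):
        self.aborted = True

    def run(self):
        self.state = "running"
        self._body(self)  # the body may return normally even if aborted
        # An aborted task is reported as failed, never as done.
        self.state = "failed" if self.aborted else "done"
        return self.state
```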
2023-11-13 16:06:20 +01:00
Aleksandra Martyniuk
599d6ebd52 compaction: abort task manager compaction tasks
Set top level compaction tasks as abortable.

Compaction tasks which have no children, i.e. compaction task
executors, have abort method overridden to stop compaction data.
2023-11-13 15:46:58 +01:00
Kamil Braun
d24b305712 Merge 'raft topology: join: do not time out waiting for the node to be joined' from Patryk Jędrzejczak
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.

Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.

Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.

Additionally, after removing the timeout, we adjust the topology
coordinator. We make it try sending the response (both acceptance
and rejection) only once since we do not care if it fails anymore. We
only need to ensure that the joining node is moved to the left state
if sending fails.

Fixes #15865

Closes scylladb/scylladb#15944

* github.com:scylladb/scylladb:
  raft topology: fix indentation
  raft topology: join: try sending the response only once
  raft topology: join: do not time out waiting for the node to be joined
  group 0: group0_handshaker: add the abort_source parameter to post_server_start
2023-11-13 15:02:27 +01:00
Paweł Zakrzewski
a0dcc154c1 test: add the auth_cluster test suite
This commit adds the auth_cluster test suite to test a custom scenario
involving password authentication:
- create a cluster of 2 nodes with password authentication
- down one node
- the other node should refuse login stating that it couldn't reach
  QUORUM

References ScyllaDB OSS #2339
2023-11-13 14:04:28 +01:00
Paweł Zakrzewski
400aa2e932 auth: fix error message when consistency level is not met
Propagate `exceptions::unavailable_exception` error message to the
client such as cqlsh.

Fixes #2339
2023-11-13 14:04:23 +01:00
Takuya ASADA
85339d1820 scylla_setup: add warning for CentOS7 default kernel
Since CentOS7 default kernel is too old, has performance issues and also
has some bugs, we have been recommending the kernel-ml kernel.
Let's check kernel version in scylla_setup and print warning if the
kernel is CentOS7 default one.

related #7365

Closes scylladb/scylladb#15705
2023-11-13 13:47:06 +02:00
Botond Dénes
2b11a02b67 Merge 'Improvements to gossiper shadow round' from Kamil Braun
Remove `fall_back_to_syn_msg` which is not necessary in newer Scylla versions.
Fix the calculation of `nodes_down` which could count a single node multiple times.
Make shadow round mandatory during bootstrap and replace -- these operations are unsafe to do without checking features first, which are obtained during the shadow round (outside raft-topology mode).
Finally, during node restart, allow the shadow round to be skipped when getting `timeout_error`s from contact points, not only when getting `closed_error`s (during restart it's best-effort anyway, and in general it's impossible to distinguish between a dead node and a partitioned node).
More details in commit messages.

Ref: https://github.com/scylladb/scylladb/issues/15675

Closes scylladb/scylladb#15941

* github.com:scylladb/scylladb:
  gossiper: do_shadow_round: increment `nodes_down` in case of timeout
  gossiper: do_shadow_round: fix `nodes_down` calculation
  storage_service: make shadow round mandatory during bootstrap/replace
  gossiper: do_shadow_round: remove default value for nodes param
  gossiper: do_shadow_round: remove `fall_back_to_syn_msg`
2023-11-13 13:37:13 +02:00
Botond Dénes
dfd7981fa7 api/storage_service: start/stop native transport in the statement sg
Currently, it is started/stopped in the streaming/maintenance sg, which
is what the API itself runs in.
Starting the native transport in the streaming sg, will lead to severely
degraded performance, as the streaming sg has significantly less
CPU/disk shares and reader concurrency semaphore resources.
Furthermore, it will lead to multi-paged reads possibly switching
between scheduling groups mid-way, triggering an internal error.

To fix, use `with_scheduling_group()` for both starting and stopping
native transport. Technically, it is only strictly necessary for
starting, but I added it for stop as well for consistency.

Also apply the same treatment to RPC (Thrift). Although no one uses it,
best to fix it, just to be on the safe side.

I think we need a more systematic approach for solving this once and for
all, like passing the scheduling group to the protocol server and have
it switch to it internally. This allows the server to always run on the
correct scheduling group, not depending on the caller to remember using
it. However, I think this is best done in a follow-up, to keep this
critical patch small and easily backportable.

Fixes: #15485

Closes scylladb/scylladb#16019
2023-11-13 14:08:01 +03:00
Anna Stuchlik
8a4a8f077a doc: document full support for RBNO
This commit updates the Repair-Based Node
Operations page. In particular:
- Information about RBNO enabled for all
  node operations is added (before 5.4, RBNO
  was enabled for the replace operation, while
  it was experimental for others).
- The content is rewritten to remove redundant
  information about previous versions.

The improvement is part of the 5.4 release.
This commit must be backported to branch-5.4

Closes scylladb/scylladb#16015
2023-11-13 13:06:15 +02:00
Pavel Emelyanov
492b842929 messaging_service: Define metrics domain for client connections
Recent seastar update included RPC metrics (scylladb/seastar#1753). The
reported metrics groups together sockets based on their "metrics_domain"
configuration option. This patch makes use of this domain to make scylla
metrics sane.

The domain as this patch defines it includes two strings:

First, the datacenter the server lives in. This is because grouping
metrics for connections to different datacenters makes little sense for
several reasons. For example -- packet delays _will_ differ for local-DC
vs cross-DC traffic and mixing those latencies together is pointless.
Another example -- the amount of traffic may also differ for local- vs
cross-DC connections e.g. because of different usage of encryption and/or
compression.

Second, each verb-idx gets its own domain. That's to be able to analyze
e.g. query-related traffic from gossiper one. For that the existing
isolation cookie is taken as is.

Note, that the metrics is _not_ per-server node. So e.g. two gossiper
connections to two different nodes (in one DC) will belong to the same
domain and thus their stats will be summed when reported.
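
The grouping can be pictured with a small Python sketch. The `"dc:cookie"` string format, the dict keys and the `sent_bytes` counter are all hypothetical; the patch only states that the domain combines the datacenter with the isolation cookie.

```python
from collections import defaultdict


def metrics_domain(datacenter, isolation_cookie):
    """Build a domain key from the two components named above."""
    return f"{datacenter}:{isolation_cookie}"


def sum_by_domain(connections):
    """Sum a per-connection counter over all connections sharing a domain."""
    totals = defaultdict(int)
    for conn in connections:
        key = metrics_domain(conn["dc"], conn["isolation_cookie"])
        totals[key] += conn["sent_bytes"]
    return dict(totals)
```

Two gossip connections to different nodes in the same datacenter end up under one key, so their counters are added together, which is the behavior the note above describes.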

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15785
2023-11-13 11:13:20 +01:00
Pavel Emelyanov
f4696f21a8 test/utils: Drop compaction_manager_test
This class only provides a .run() method which allocates a task and
calls sstables::test_env::perform_compaction(). This can be done in a
helper method, no need for the whole class for it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
b68f9c32bb test/utils: Get compaction manager from test_env
This is just to reduce churn in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
9fd270566a test/sstables: Introduce test_env_compaction_manager::perform_compaction()
Take it from compaction_manager_test::run() which is a simplified rewrite
of the compaction_manager::perform_compaction().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
0160265c7d test/env: Add sstables::test_env& to compaction_manager_test::run()
Continuation of the previous patch that will also be used further.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
393c066f3e test/utils: Add sstables::test_env& to compact_sstables()
Will be used in next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
ca18db4a71 test/utils: Simplify and unify compaction_manager_test::run()
The method is the simplified rewrite of the compaction_manager's
perform_compaction() one, but it does task registration and
unregistration the hard way. Keep it shorter and simpler resembling the
compaction_manager's prototype.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
9a9e1fdd7d test/utils: Squash two compact_sstables() helpers
Now the one sitting in utils is only called from its peer in compaction
test. Things get simpler if they get merged.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
69657a2a97 test/compaction: Use shorter compact_sstables() helper
There are several of them spread between the test and utils. One of the
test cases can use its local shorter overload for brevity. Also this
makes one of the next patches shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
59943267c2 test/utils: Keep test task compaction gate on task itself
They both have the same scope, but keeping it on the task frees the
caller from the need to mess with its private fields. For now it's not a
problem, but it will be critical in one of the next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Pavel Emelyanov
aec3fc493a test/utils: Move compaction_manager_test::propagate_replacement()
The purpose of this method is to turn public the private
compaction_manager method of the same name. The caller of this method is
having sstable_test_env at hand with its test_env_compaction_manager, so
the de-private-isation call can be moved.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-13 11:44:51 +03:00
Kefu Chai
efd65aebb2 build: cmake: add check-header target
to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.

we generate a rule for each .hh file to create a corresponding
.cc and then compile it, in order to verify the self-containedness of
that header. so the number of rules is quite large. to avoid the
unnecessary overhead, the check-header target is enabled only if
`Scylla_CHECK_HEADERS` option is enabled.
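
To illustrate the idea (the real rules are generated by CMake), here is a hedged Python sketch that produces one tiny translation unit per header. The `<header>.cc` naming and the function name are hypothetical.

```python
def header_check_sources(headers):
    """Map each .hh header to a one-line source that includes only it.

    Compiling each generated source on its own verifies that the header
    pulls in everything it needs.
    """
    return {
        f"{header}.cc": f'#include "{header}"\n'
        for header in headers
        if header.endswith(".hh")
    }
```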

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15913
2023-11-13 10:27:06 +02:00
Avi Kivity
7b08886e8d Update tools/java submodule (dependencies update)
* tools/java 86a200e324...97c490947c (1):
  > Merge 'build: update several dependencies' from Piotr Grabowski

Ref https://github.com/scylladb/scylla-tools-java/issues/348
Ref https://github.com/scylladb/scylla-tools-java/issues/349
Ref https://github.com/scylladb/scylla-tools-java/issues/350
2023-11-12 18:17:04 +02:00
Nadav Har'El
7f34006ce2 test/cql-pytest: fix test_permissions.py to not fail on Cassandra
We shouldn't have cql-pytest tests that report failure when run on
Cassandra (with test/cql-pytest/run-cassandra): A test that passes
on Scylla but fails on Cassandra indicates a *difference* between
Scylla's behavior and Cassandra's, and this difference should always
be investigated:

 1. It can be a Scylla bug, which should be fixed immediately
    or reported as a bug and the test changed to fail on Scylla ("xfail").

 2. It can be a minor difference in Scylla's and Cassandra's
    behavior where both can be accepted. In this case the test should
    be modified to accept both behaviors, and a comment added to
    explain why we decided to do that.

 3. It can be a Cassandra bug which causes a correct test to fail.
    This case should not be taken lightly, and a serious effort
    is needed to be convinced that this is really a Cassandra bug
    and not our misunderstanding of what Cassandra does. In
    this case the test should be marked "cassandra_bug" and a
    detailed comment should explain why.

 4. Or it can be an outright bug in the test that caused it to fail
    on Cassandra.

This test had most of these cases :-) There was a test bug in one place
(in a Cassandra-specific Java UDF), a minor and (arguably) acceptable
difference between the error codes returned by Scylla and Cassandra
in one case, and two minor Cassandra bugs (in the error path). All
of these are fixed here, and after this patch test/cql-pytest/run-cassandra
no longer fails on this file.

Fixes #15969

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-12 17:14:09 +02:00
Nadav Har'El
0ecf84e83e test/cql-pytest: add test for DROP FUNCTION
We already have in test/cql-pytest various tests for UDF in the bigger
context of UDA (test_uda.py), WASM (test_wasm.py) and permissions, but
somehow we never had a file for simple tests only for UDF, so we
add one here, test/cql-pytest/test_udf.py

We add a test for checking something which was already assumed in
test_permissions.py - that it is possible to create two different
UDFs with the same name and different parameters, and then you must
specify the parameters when you want to DROP one of them. The test
confirms that ScyllaDB's and Cassandra's behavior is identical in
this, as hoped.

To allow the test to run on both ScyllaDB and Cassandra, it needs to
support both Lua (for ScyllaDB) or Java (for Cassandra), and we introduce
a fixture to make it easier to support both. This fixture can later
be used in more tests added to this file.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2023-11-12 17:14:08 +02:00
Tomasz Grabiec
457d170078 Merge 'Multishard mutation query test fix misses expectations' from Botond Dénes
There are two tests, test_read_all and test_read_with_partition_row_limits, which assert on every page, as well as at the end, that there are no misses whatsoever. This is incorrect, because it is possible that on a given page, not all shards participate and thus there won't be a saved reader on every shard. On the subsequent page, a shard without a reader may produce a miss. This is fine. Refine the asserts to check that we have at most as many misses as we have shards without readers on them.
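
The refined expectation amounts to a simple bound, sketched here in Python. The function names and the way shards are identified are hypothetical; the actual test is a C++ Boost test.

```python
def max_allowed_misses(shard_count, shards_with_saved_reader):
    """Shards without a saved reader may each produce one miss."""
    return shard_count - len(set(shards_with_saved_reader))


def misses_within_expectation(misses, shard_count, shards_with_saved_reader):
    """True when the observed misses do not exceed the allowed bound."""
    return misses <= max_allowed_misses(shard_count, shards_with_saved_reader)
```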

Fixes: https://github.com/scylladb/scylladb/issues/14087

Closes scylladb/scylladb#15806

* github.com:scylladb/scylladb:
  test/boost/multishard_mutation_query_test: fix querier cache misses expectations
  test/lib/test_utils: add require_* variants for all comparators
2023-11-12 13:15:29 +01:00
Benny Halevy
68a7bbe582 compaction_manager: perform_cleanup: ignore condition_variable_timed_out
The polling loop was intended to ignore
`condition_variable_timed_out` and check for progress
using a longer `max_idle_duration` timeout in the loop.

Fixes #15669

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#15671
2023-11-12 13:53:51 +02:00
Patryk Jędrzejczak
2d7bfeb3fa raft topology: fix indentation
Broken in the previous commit.
2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak
e94c7cff28 raft topology: join: try sending the response only once
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response.
In the previous commit, we have made the operator responsible for
shutting down the joining node if the topology coordinator fails
to deliver a response by removing the timeout. In this commit, we
adjust the topology coordinator. We make it try sending the
response (both acceptance and rejection) only once since we do not
care if it fails anymore. We only need to ensure that the joining
node is moved to the left state if sending fails.
2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak
4ffa692cb3 raft topology: join: do not time out waiting for the node to be joined
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.

Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.

Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.

This change additionally fixes the TODO in
raft_group0::join_group0.
2023-11-10 12:36:37 +01:00
Patryk Jędrzejczak
5f36e1d7f2 group 0: group0_handshaker: add the abort_source parameter to post_server_start
Used in the following commit to enable the clean shutdown of a
node that does not receive the join rejection from the topology
coordinator.
2023-11-10 12:35:38 +01:00
Anna Stuchlik
8d618bbfc6 doc: update cqlsh compatibility with Python
This commit updates the cqlsh compatibility
with Python to Python 3.

In addition it:
- Replaces "Cassandra" with "ScyllaDB" in
  the description of cqlsh.
  The previous description was outdated, as
  we no longer can talk about using cqlsh
  released with Cassandra.
- Replaces occurrences of "Scylla" with "ScyllaDB".
- Adds additional locations of cqlsh (Docker Hub
  and PyPI), as well as the link to the scylla-cqlsh
  repository.

Closes scylladb/scylladb#16016
2023-11-10 09:19:41 +02:00
Avi Kivity
d8bf8f0f43 Merge 'Do not create directories in datadir for S3-backed sstables' from Pavel Emelyanov
After 146e49d0dd (Rewrap keyspace population loop) the datadir layout is no longer needed by sstables boot-time loader and finally directories can be omitted for S3-backed keyspaces. Tables of that keyspace don't touch/remove their datadirs either (snapshots still don't work for S3)

fixes: #13020

Closes scylladb/scylladb#16007

* github.com:scylladb/scylladb:
  test/object_store: Check that keyspace directory doesn't appear
  sstables/storage: Do storage init/destroy based on storage options
  replica/{ks|cf}: Move storage init/destroy to sstables manager
  database: Add get_sstables_manager(bool_class is_system) method
2023-11-09 20:35:13 +02:00
Kamil Braun
3bcee6a981 Revert "Merge 'Change all tests to shut down gracefully on shutdown' from Eliran Sinvani"
This reverts commit 7c7baf71d5.

If `stop_gracefully` times out during test teardown phase, it crashes
the test framework reporting multiple errors, for example:
```
12:35:52  /jenkins/workspace/scylla-master/next/scylla/test/pylib/artifact_registry.py:41: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
12:35:52    self.exit_artifacts = {}
12:35:52  RuntimeWarning: Enable tracemalloc to get the object allocation traceback
12:35:52  Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:52  Traceback (most recent call last):
12:35:52    File "/usr/lib64/python3.11/asyncio/tasks.py", line 500, in wait_for
12:35:52      return fut.result()
12:35:52             ^^^^^^^^^^^^
12:35:52    File "/usr/lib64/python3.11/asyncio/subprocess.py", line 137, in wait
12:35:52      return await self._transport._wait()
12:35:52             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12:35:52    File "/usr/lib64/python3.11/asyncio/base_subprocess.py", line 230, in _wait
12:35:52      return await waiter
12:35:52             ^^^^^^^^^^^^
12:35:52  asyncio.exceptions.CancelledError
12:35:52
12:35:52  The above exception was the direct cause of the following exception:
12:35:52
12:35:52  Traceback (most recent call last):
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 521, in stop_gracefully
12:35:52      await asyncio.wait_for(wait_task, timeout=STOP_TIMEOUT_SECONDS)
12:35:52    File "/usr/lib64/python3.11/asyncio/tasks.py", line 502, in wait_for
12:35:52      raise exceptions.TimeoutError() from exc
12:35:52  TimeoutError
12:35:52
12:35:52  During handling of the above exception, another exception occurred:
12:35:52
12:35:52  Traceback (most recent call last):
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1615, in workaround_python26789
12:35:52      code = await main()
12:35:52             ^^^^^^^^^^^^
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1582, in main
12:35:52      await run_all_tests(signaled, options)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1359, in run_all_tests
12:35:52      await reap(done, pending, signaled)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1342, in reap
12:35:52      result = coro.result()
12:35:52               ^^^^^^^^^^^^^
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 201, in run
12:35:52      await test.run(options)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 957, in run
12:35:52      async with get_cluster_manager(self.mode + '/' + self.uname, self.suite.clusters, test_path) as manager:
12:35:52    File "/usr/lib64/python3.11/contextlib.py", line 211, in __aexit__
12:35:52      await anext(self.gen)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1330, in get_cluster_manager
12:35:52      await manager.stop()
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1024, in stop
12:35:52      await self.clusters.put(self.cluster, is_dirty=True)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/pool.py", line 104, in put
12:35:52      await self.destroy(obj)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 368, in recycle_cluster
12:35:52      await cluster.stop_gracefully()
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 689, in stop_gracefully
12:35:52      await asyncio.gather(*(server.stop_gracefully() for server in self.running.values()))
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 527, in stop_gracefully
12:35:52      raise RuntimeError(
12:35:52  RuntimeError: Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:58  sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.uninstall' was never awaited
12:35:58  sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
```
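
The failure mode in the trace above can be reproduced in miniature. The following is a minimal sketch, not the code from test/pylib: `FakeServer`, `stop_cluster` and `run_teardown` are hypothetical names, and the timeout is shortened so the example finishes quickly. It shows how a single slow server turns into a `RuntimeError` that escapes `asyncio.gather()` and, unless the caller handles it, the whole teardown.

```python
import asyncio

# Shortened for the sketch; the log above shows a 60s limit.
STOP_TIMEOUT_SECONDS = 0.05


class FakeServer:
    """Hypothetical stand-in for a server whose shutdown takes `shutdown_delay` seconds."""

    def __init__(self, name, shutdown_delay):
        self.name = name
        self.shutdown_delay = shutdown_delay

    async def stop_gracefully(self):
        wait_task = asyncio.sleep(self.shutdown_delay)
        try:
            await asyncio.wait_for(wait_task, timeout=STOP_TIMEOUT_SECONDS)
        except asyncio.TimeoutError as exc:
            # Same shape as the error in the log: the timeout is re-raised
            # as a RuntimeError chained to the original exception.
            raise RuntimeError(
                f"Stopping server {self.name} gracefully took longer than "
                f"{STOP_TIMEOUT_SECONDS}s") from exc


async def stop_cluster(servers):
    # Without return_exceptions=True, the first failure propagates to the caller.
    await asyncio.gather(*(server.stop_gracefully() for server in servers))


def run_teardown(servers):
    """Return "ok" on clean teardown, or the error text that would otherwise escape."""
    try:
        asyncio.run(stop_cluster(servers))
    except RuntimeError as exc:
        return str(exc)
    return "ok"
```

In this sketch the caller catches the error; in the trace above nothing on the teardown path does, which is why the exception reaches the top of test.py.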
2023-11-09 12:30:35 +01:00
Gleb Natapov
2dd8152c8b storage_service: topology coordinator: log rollback event before changing node's state
The test for the rollback relies on the log to be there after operation
fails, but if node's state is changed before the log the operation may
fail before the log is printed.

Fixes scylladb/scylladb#15980

Message-ID: <ZUuwoq65SJcS+yTH@scylladb.com>
2023-11-09 12:11:58 +01:00
Botond Dénes
d8b6771eb8 Merge 'doc: add CQL Reference for Materialized Views and remove irrelevant version information' from Anna Stuchlik
This PR is a follow-up to https://github.com/scylladb/scylladb/pull/15742#issuecomment-1766888218.
It adds CQL Reference for Materialized Views to the Materialized Views page.

In addition, it removes the irrelevant information about when the feature was added and replaces "Scylla" with "ScyllaDB".

(nobackport)

Closes scylladb/scylladb#15855

* github.com:scylladb/scylladb:
  doc: remove versions from Materialized Views
  doc: add CQL Reference for Materialized Views
2023-11-09 10:43:11 +01:00
Botond Dénes
1cccc86813 Revert "Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk"
This reverts commit 2860d43309, reversing
changes made to a3621dbd3e.

Reverting because rest_api.test_compaction_task started failing after
this was merged.

Fixes: #16005
2023-11-09 10:43:11 +01:00
Eliran Sinvani
c5956957f3 use_statement: Convert an exception to a future exception
The use statement execution code can throw if the keyspace
doesn't exist. This can be a problem for code that will use
execute in a fiber, since the exception will break the fiber even
if `then_wrapped` is used.

Fixes #14449

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes scylladb/scylladb#14394
2023-11-09 10:43:11 +01:00
Pavel Emelyanov
7e1017c7d8 test/object_store: Check that keyspace directory doesn't appear
When creating an S3-backed keyspace, its storage dir shouldn't be created.
Also it shouldn't be "resurrected" by boot-time loader of existing
keyspaces.

For extra confidence, check that the system keyspace's directory does
exist where the test expects keyspaces' directories to appear.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Pavel Emelyanov
f6eae191ff sstables/storage: Do storage init/destroy based on storage options
It's only local storage type that needs directories touch/remove, S3
storage initialization is for now a no-op, maybe some day soon it will
appear.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Pavel Emelyanov
11b704e8b8 replica/{ks|cf}: Move storage init/destroy to sstables manager
It's the manager that knows about storages and it should init/destroy
it. Also the "upload" and "staging" paths are about to be hidden in
sstables/ code, this code move also facilitates that.

The indentation in storage.cc is deliberately broken to make next patch
look nicer (spoiler: it won't have to shift those lines right).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Pavel Emelyanov
68cf26587c database: Add get_sstables_manager(bool_class is_system) method
There's one place that does this selection, soon there will appear
another, so it's worth having a convenience helper getter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-08 20:23:16 +03:00
Michał Chojnowski
206e313c60 mutation_query_test: test that range tombstones are sent in reverse queries
Reproducer for #10598.
2023-11-08 14:54:48 +01:00
Michał Chojnowski
002357e238 mutation_query: properly send range tombstones in reverse queries
reconcilable_result_builder passes range tombstone changes to _rt_assembler
using table schema, not query schema.
This means that a tombstone with bounds (a; b), where a < b in query schema
but a > b in table schema, is not emitted from mutation_query.

This is a very serious bug, because it means that such tombstones in reverse
queries are not reconciled with data from other replicas.
If *any* queried replica has a row, but not the range tombstone which deleted
the row, the reconciled result will contain the deleted row.

In particular, range deletes performed while a replica is down, will not
later be visible to reverse queries which select this replica, regardless of the
consistency level.

As far as I can see, this doesn't result in any persistent data loss.
Only in that some data might appear resurrected to reverse queries,
until the relevant range tombstone is fully repaired.
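
As a toy illustration of the ordering problem described above (a sketch only: `make_less`, `bounds_are_ordered` and the integer keys are hypothetical, and the real `_rt_assembler` logic is not reproduced here), consider a check that accepts a tombstone only when its bounds are ordered under some comparator. The same bounds pass under the reversed (query) ordering and fail under the table ordering.

```python
def make_less(reversed_clustering):
    """Return a 'less than' predicate for clustering keys.

    A reverse query sees clustering keys in the opposite order, so its
    comparator is the mirror image of the table's.
    """
    if reversed_clustering:
        return lambda x, y: x > y
    return lambda x, y: x < y


def bounds_are_ordered(start, end, less):
    """Hypothetical validity check: start must sort before end."""
    return less(start, end)


def tombstone_survives(start, end, reversed_clustering):
    return bounds_are_ordered(start, end, make_less(reversed_clustering))


# In a reverse query, a tombstone covering keys between 3 and 7 arrives as (7; 3).
REVERSED_BOUNDS = (7, 3)
kept_with_query_schema = tombstone_survives(*REVERSED_BOUNDS, reversed_clustering=True)
kept_with_table_schema = tombstone_survives(*REVERSED_BOUNDS, reversed_clustering=False)
```

Here `kept_with_query_schema` is true and `kept_with_table_schema` is false, mirroring the case where using the table schema causes the tombstone to be lost.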
2023-11-08 14:54:48 +01:00
Nadav Har'El
6453f41ca9 Merge 'schema: add whitespaces to values of table options' from Michał Jadwiszczak
Add a space after each colon and comma (if they don't have any after) in values of table option which are json objects (`caching`, `tombstone_gc` and `cdc`).
This improves readability and matches client-side describe format.
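
A minimal sketch of the formatting rule, assuming the option value is valid JSON with double-quoted strings (the function name `add_option_whitespace` is hypothetical, and the actual patch may well use plain string manipulation instead of a JSON round trip):

```python
import json


def add_option_whitespace(value):
    """Re-serialize a JSON object so ':' and ',' are each followed by one space."""
    # json.dumps() without an indent uses ', ' and ': ' as separators by default.
    return json.dumps(json.loads(value), ensure_ascii=False)


compact = '{"keys":"ALL","rows_per_partition":"ALL"}'
spaced = add_option_whitespace(compact)
```

Applying the function to an already-spaced value leaves it unchanged, which matches the "if they don't have any after" condition.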

Fixes: #14895

Closes scylladb/scylladb#15900

* github.com:scylladb/scylladb:
  cql-pytest:test_describe: add test for whitespaces in json objects
  schema: add whitespace to description of  table options
2023-11-08 15:26:49 +02:00
Anna Stuchlik
ca0f5f39b5 doc: fix info about Raft in 5.4 upgrade guide
This commit fixes the information about
Raft-based consistent cluster management
in the 5.2-to-5.4 upgrade guide.

This is a follow-up to https://github.com/scylladb/scylladb/pull/15880 and must be backported to branch-5.4.

In addition, it adds information about removing
DateTieredCompactionStrategy to the 5.2-to-5.4
upgrade guide, including the guideline to
migrate to TimeWindowCompactionStrategy.

Closes scylladb/scylladb#15988
2023-11-08 13:21:53 +01:00
Kamil Braun
3036a80334 docs: mention Raft getting enabled when upgrading to 5.4
Fixes: scylladb/scylladb#15952

Closes scylladb/scylladb#16000
2023-11-08 14:18:29 +02:00
Raphael S. Carvalho
b551f4abd2 streaming: Improve partition estimation with TWCS
When off-strategy is disabled, data segregation is not postponed,
meaning that getting partition estimate right is important to
decrease filter's false positives. With streaming, we don't
have min and max timestamps at destination, well, we could have
extended the RPC verb to send them, but turns out we can deduce
easily the amount of windows using default TTL. Given partitioner
random nature, it's not absurd to assume that a given range being
streamed may overlap with all windows, meaning that each range
will yield one sstable for each window when segregating incoming
data. Today, we assume the worst of 100 windows (which is the
max amount of sstables the input data can be segregated into)
due to the lack of metadata for estimating the window count.
But given that users are recommended to target a max of ~20
windows, it means partition estimate is being downsized 5x more
than needed. Let's improve it by using default TTL when
estimating window count, so even on absence of timestamp
metadata, the partition estimation won't be way off.
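
The arithmetic behind the "5x" figure can be sketched as follows. This is an illustration under stated assumptions, not the formula from the patch: the function names are hypothetical, and the only values taken from the message are the cap of 100 windows and the recommended target of about 20.

```python
import math

MAX_WINDOWS = 100  # worst case mentioned in the commit message


def estimate_window_count(default_ttl_s, window_size_s):
    """Estimate how many time windows live data can span, using the default TTL."""
    if default_ttl_s <= 0 or window_size_s <= 0:
        # No usable metadata: fall back to the worst case.
        return MAX_WINDOWS
    return min(MAX_WINDOWS, max(1, math.ceil(default_ttl_s / window_size_s)))


def per_sstable_partition_estimate(total_partitions, windows):
    """Each streamed range may yield one sstable per window."""
    return max(1, total_partitions // windows)


DAY = 24 * 60 * 60
windows = estimate_window_count(default_ttl_s=20 * DAY, window_size_s=DAY)
with_ttl = per_sstable_partition_estimate(1_000_000, windows)
worst_case = per_sstable_partition_estimate(1_000_000, MAX_WINDOWS)
```

With a 20-day TTL and 1-day windows, the TTL-based estimate per sstable is five times larger than the worst-case one, so the filter is sized closer to what is actually needed.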

Fixes #15704.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-11-08 12:10:03 +02:00
Kamil Braun
f094e23d84 system_keyspace: use system memory for system.raft table
`system.raft` was using the "user memory pool", i.e. the
`dirty_memory_manager` for this table was set to
`database::_dirty_memory_manager` (instead of
`database::_system_dirty_memory_manager`).

This meant that if a write workload caused memory pressure on the user
memory pool, internal `system.raft` writes would have to wait for
memtables of user tables to get flushed before the write would proceed.

This was observed in SCT longevity tests which ran a heavy workload on
the cluster and concurrently, schema changes (which underneath use the
`system.raft` table). Raft would often get stuck waiting many seconds
for user memtables to get flushed. More details in issue #15622.
Experiments showed that moving Raft to system memory fixed this
particular issue, bringing the waits to reasonable levels.

Currently `system.raft` stores only one group, group 0, which is
internally used for cluster metadata operations (schema and topology
changes) -- so it makes sense to use system memory.

In the future we'd like to have other groups, for strongly consistent
tables. These groups should use the user memory pool. It means we won't
be able to use `system.raft` for them -- we'll just have to use a
separate table.

Fixes: scylladb/scylladb#15622

Closes scylladb/scylladb#15972
2023-11-08 11:21:14 +02:00
Nadav Har'El
284534f489 Merge 'Nodetool additional commands 4/N' from Botond Dénes
This PR implements the following new nodetool commands:
* snapshot
* drain
* flush
* disableautocompaction
* enableautocompaction

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#15939

* github.com:scylladb/scylladb:
  test/nodetool: add README.md
  tools/scylla-nodetool: implement enableautocompaction command
  tools/scylla-nodetool: implement disableautocompaction command
  tools/scylla-nodetool: implement the flush command
  tools/scylla-nodetool: extract keyspace/table parsing
  tools/scylla-nodetool: implement the drain command
  tools/scylla-nodetool: implement the snapshot command
  test/nodetool: add support for matching approximate query parameters
  utils/http: make dns_connection_factory::initialize() static
2023-11-08 11:18:35 +02:00
Kefu Chai
cf70970226 build: cmake: use $<CONFIG:cfgs> when appropriate
since CMake 3.19, we are able to use $<CONFIG:cfgs> instead of
the more cumbersome $<IN_LIST:$<CONFIG>,foo;bar> expression for
checking if a config is in a list of configurations.
and since the minimal required CMake of scylla is 3.27, let's
use $<CONFIG:cfgs> when possible.

see also https://cmake.org/cmake/help/git-stage/manual/cmake-generator-expressions.7.html#configuration-expressions

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15989
2023-11-08 08:50:44 +02:00
Nadav Har'El
3729ea8bfd cql-pytest: translate Cassandra's test for CREATE operations
This is a translation of Cassandra's CQL unit test source file
validation/operations/CreateTest.java into our cql-pytest framework.

The 15 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for several known issues:

Refs #6442: Always print all schema parameters (including default values)
Refs #8001: Documented unit "µs" not supported for assigning a duration
            type.
Refs #8892: Add an option for default RF for new keyspaces.
Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
            for compression settings by default

Unfortunately, I also had to comment out - and not translate - several
tests which weren't real "CQL tests" (tests that use only the CQL driver),
and instead relied on Cassandra's Java implementation details:

1. Tests for CREATE TRIGGER were commented out because testing them
   in Cassandra requires adding a Java class for the test. We're also
   not likely to ever add this feature to Scylla (Refs #2205).

2. Similarly, tests for CEP-11 (Pluggable memtable implementations)
   used internal Java APIs instead of CQL, and it is also unlikely
   we'll ever implement it in a way compatible with Cassandra because
   of its Java reliance.

3. One test for data center names used internal Cassandra Java APIs, not
   CQL to create mock data centers and snitches.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#15791
2023-11-08 08:46:27 +02:00
Botond Dénes
2860d43309 Merge 'compaction: abort compaction tasks' from Aleksandra Martyniuk
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.

Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.

Closes scylladb/scylladb#15083

* github.com:scylladb/scylladb:
  test: test abort of compaction task that isn't started yet
  test: test running compaction task abort
  tasks: fail if a task was aborted
  compaction: abort task manager compaction tasks
2023-11-08 08:45:16 +02:00
Asias He
194507dffa repair: Convert put_row_diff_with_rpc_stream to use coroutine
It will be easier to add more logic to this function.
2023-11-08 13:52:34 +08:00
Nadav Har'El
a3621dbd3e Merge 'Alternator: Support new ReturnValuesOnConditionCheckFailure feature' from Marcin Maliszkiewicz
alternator: add support for ReturnValuesOnConditionCheckFailure feature

As announced in https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-dynamodb-cost-failed-conditional-writes/, DynamoDB added a new option for write operations (PutItem, UpdateItem, or DeleteItem), ReturnValuesOnConditionCheckFailure, which if set to ALL_OLD returns the current value of the item - but only if a condition check failed.

Fixes https://github.com/scylladb/scylladb/issues/14481

Closes scylladb/scylladb#15125

* github.com:scylladb/scylladb:
  alternator: add support for ReturnValuesOnConditionCheckFailure feature
  alternator: add ability to send additional fields in api_error
2023-11-07 23:19:51 +02:00
Takuya ASADA
a4aeef2eb0 scylla_util.py: run apt-get update before apt-get install if necessary
Unlike yum, "apt-get install" may fail because the package cache is outdated.
Let's check package cache mtime and run "apt-get update" if it's too old.
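
The staleness check can be sketched as below. This is only an illustration: `cache_is_stale`, the path argument and the age threshold are hypothetical, and the message does not say which file or limit scylla_util.py actually uses.

```python
import os
import time


def cache_is_stale(cache_path, max_age_s, now=None):
    """Return True if the file at cache_path is missing or older than max_age_s."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(cache_path)
    except OSError:
        # A missing cache is treated as stale, so an update would be triggered.
        return True
    return now - mtime > max_age_s
```

A caller would run "apt-get update" only when the function returns True, and then proceed with "apt-get install".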

Fixes #4059

Closes scylladb/scylladb#15960
2023-11-07 20:40:16 +02:00
Wojciech Mitros
ab743271f1 test: increase timeout for lua UDF execution
When running on a particularly slow setup, for example on
an ARM machine in debug mode, the execution time of even
a small Lua UDF that we're using in tests may exceed our
default limits.
To avoid timeout errors, the limit in tests is now increased
to a value that won't be exceeded in any reasonable scenario
(for the current set of tested UDFs), while not making the
test take an excessive amount of time in case of an error in
the UDF execution.

Fixes #15977

Closes scylladb/scylladb#15983
2023-11-07 20:28:28 +02:00
Kamil Braun
07e9522d6c Merge 'raft topology: handle abort exceptions better in fence_previous_coordinator' from Piotr Dulikowski
When topology coordinator tries to fence the previous coordinator it
performs a group0 operation. The current topology coordinator might be
aborted in the meantime, which will result in a `raft::request_aborted`
exception being thrown. After the fix to scylladb/scylladb#15728 was
merged, the exception is caught, but then `sleep_abortable` is called
which immediately throws `abort_requested_exception` as it uses the same
abort source as the group0 operation. The `fence_previous_coordinator`
function which does all those things is not supposed to throw
exceptions, if it does - it causes `raft_state_monitor_fiber` to exit,
completely disabling the topology coordinator functionality on that
node.

Modify the code in the following way:

- Catch `abort_requested_exception` thrown from `sleep_abortable` and
  exit the function if it happens. In addition to the described issue,
it will also handle the case when abort is requested while
`sleep_abortable` happens,
- Catch `raft::request_aborted` thrown from group0 operation, log the
  exception with lower verbosity and exit the function explicitly.

Finally, wrap both `fence_previous_coordinator` and `run` functions in a
`try` block with `on_fatal_internal_error` in the catch handler in order
to implement the behavior that adding `noexcept` was originally supposed
to introduce.

Fixes: scylladb/scylladb#15747

Closes scylladb/scylladb#15948

* github.com:scylladb/scylladb:
  raft topology: catch and abort on exceptions from topology_coordinator::run
  Revert "storage_service: raft topology: mark topology_coordinator::run function as noexcept"
  raft topology: don't print an error when fencing previous coordinator is aborted
  raft topology: handle abort exceptions from sleeping in fence_previous_coordinator
2023-11-07 17:17:49 +01:00
Botond Dénes
60ea940f9e Merge 'docs: render options with role' from Kefu Chai
this series tries to

1. render options with role. so the options can be cross referenced and defined.
2. move the formatting out of the content. so the representation can be defined in a more flexible way.

Closes scylladb/scylladb#15860

* github.com:scylladb/scylladb:
  docs: add divider using CSS
  docs: extract _clean_description as a filter
  docs: render option with role
  docs: parse source files right into rst
2023-11-07 17:01:22 +02:00
Botond Dénes
3088453a09 test/nodetool: add README.md 2023-11-07 09:49:56 -05:00
Botond Dénes
7ff7cdc86a tools/scylla-nodetool: implement enableautocompaction command 2023-11-07 09:49:56 -05:00
Botond Dénes
0e0401a5c5 tools/scylla-nodetool: implement disableautocompaction command 2023-11-07 09:49:56 -05:00
Botond Dénes
f5083f66f5 tools/scylla-nodetool: implement the flush command 2023-11-07 09:49:56 -05:00
Botond Dénes
f082cc8273 tools/scylla-nodetool: extract keyspace/table parsing
Having to extract 1 keyspace and N tables from the command-line is
proving to be a common pattern among commands. Extract this into a
method, so the boiler-plate can be shared. Add a forward-looking
overload as well, which will be used in the next patch.
2023-11-07 09:49:56 -05:00
Botond Dénes
ec5b24550a tools/scylla-nodetool: implement the drain command 2023-11-07 09:49:56 -05:00
Botond Dénes
598dbd100d tools/scylla-nodetool: implement the snapshot command 2023-11-07 09:49:56 -05:00
Benny Halevy
6a628dd9a6 docs: operating-scylla: nodetool: improve documentation for {en,dis}ableautocompaction
Fixes scylladb/scylladb#15554

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#15950
2023-11-07 14:05:55 +02:00
Kamil Braun
e64613154f Merge 'cleanup no longer used gossiper states' from Gleb
Remove no longer used gossiper states that are not needed even for
compatibility any longer.

* 'remove_unused_states' of github.com:scylladb/scylla-dev:
  gossip: remove unused HIBERNATE gossiper status
  gossip: remove unused STATUS_MOVING state
2023-11-07 11:48:04 +01:00
Botond Dénes
07c7109eb6 test/nodetool: add support for matching approximate query parameters
Match parameters within some delta of the expected value. Useful when
nodetool generates a timestamp whose exact value cannot be predicted
ahead of time.
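
A minimal sketch of such matching, assuming query parameters arrive as a dict of strings. The function `query_params_match` and its `exact` / `approximate` arguments are hypothetical and do not reflect the actual test/nodetool API.

```python
def query_params_match(actual, exact=None, approximate=None):
    """Compare request parameters against expectations.

    exact:       {name: value} that must match exactly.
    approximate: {name: (expected_number, delta)} that must match within delta.
    """
    exact = exact or {}
    approximate = approximate or {}
    if set(actual) != set(exact) | set(approximate):
        return False
    for name, expected in exact.items():
        if actual[name] != expected:
            return False
    for name, (expected, delta) in approximate.items():
        try:
            value = float(actual[name])
        except (TypeError, ValueError):
            return False
        if abs(value - expected) > delta:
            return False
    return True


# A snapshot-style request whose tag is a millisecond timestamp.
request = {"kn": "ks1", "tag": "1699350000123"}
loose = query_params_match(request, exact={"kn": "ks1"},
                           approximate={"tag": (1699350000000, 5000)})
strict = query_params_match(request, exact={"kn": "ks1"},
                            approximate={"tag": (1699350000000, 100)})
```

Here `loose` is true because the tag is 123 ms away from the expected value, within the 5000 ms delta, while `strict` is false with a 100 ms delta.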
2023-11-07 04:58:41 -05:00
Botond Dénes
b61822900b utils/http: make dns_connection_factory::initialize() static
Said method can out-live the factory instance. This was not a problem
because the method takes care to keep everything it needs from `this` alive, by
copying it to the coroutine stack. However, the fact that this method
can out-live the instance is not obvious, and an unsuspecting developer
(me) added a new member (_logger) which was not kept alive.
This can cause a use-after-free in the factory. Fix by making
initialize() static, forcing the instance to pass all parameters
explicitly, and add a comment explaining that this method can out-live
the instance.
2023-11-07 04:39:33 -05:00
Pavel Emelyanov
9443253f3d Merge 'api: failure_detector: invoke on shard 0' from Kamil Braun
These APIs may return stale or simply incorrect data on shards
other than 0. Newer versions of Scylla are better at maintaining
cross-shard consistency, but we need a simple fix that can be easily and
without risk backported to older versions; this is the fix.

Add a simple test to check that the `failure_detector/endpoints`
API returns nonzero generation.

Fixes: scylladb/scylladb#15816

Closes scylladb/scylladb#15970

* github.com:scylladb/scylladb:
  test: rest_api: test that generation is nonzero in `failure_detector/endpoints`
  api: failure_detector: fix indentation
  api: failure_detector: invoke on shard 0
2023-11-07 11:54:27 +03:00
Botond Dénes
76ab66ca1f Merge 'Support state change for S3-backed sstables' from Pavel Emelyanov
The sstable currently can move between normal, staging and quarantine states at runtime. For S3-backed sstables the state change means maintaining the state itself in the ownership table and updating it accordingly.

There's also the upload facility that's implemented as state change too, but this PR doesn't support this part.

fixes: #13017

Closes scylladb/scylladb#15829

* github.com:scylladb/scylladb:
  test: Make test_sstables_excluding_staging_correctness run over s3 too
  sstables,s3: Support state change (without generation change)
  system_keyspace: Add state field to system.sstables
  sstable_directory: Tune up sstables entries processing comment
  system_keyspace: Tune up status change trace message
  sstables: Add state string to state enum class convert
2023-11-07 10:45:41 +02:00
Botond Dénes
74f68a472f Merge 'doc: add the upgrade guide from 5.2 to 5.4' from Anna Stuchlik
This PR adds the 5.2-5.4 upgrade guide.
In addition, it removes the redundant upgrade guide from 5.2 to 5.3 (as 5.3 was skipped), as well as some mentions of version 5.3.

This PR must be backported to branch-5.4.

Closes scylladb/scylladb#15880

* github.com:scylladb/scylladb:
  doc: add the upgrade guide from 5.2 to 5.4
  doc: remove version "5.3" from the docs
  doc: remove the 5.2-to-5.3 upgrade guide
2023-11-07 10:35:33 +02:00
David Garcia
afaeb30930 docs: add dynamic version on aws images extension
Closes scylladb/scylladb#15940
2023-11-07 10:30:23 +02:00
Takuya ASADA
2e7552a0ca dist/redhat: drop rpm conflict with ABRT, add systemd conflict instead
Currently, "yum install scylla" causes conflict when ABRT is installed.

To avoid this behavior and keep using systemd-coredump for scylla
coredump, let's drop "Conflicts: abrt" from rpm and
add "Conflicts=abrt-ccpp.service" to systemd unit.

Fixes #892

Closes scylladb/scylladb#15691
2023-11-07 10:30:23 +02:00
Botond Dénes
2f0284f30d Merge 'build: cmake: configure all available config types' from Kefu Chai
in this series, instead of assuming that we always have only one single `CMAKE_BUILD_TYPE`, we configure all available configurations, to be better prepared for the multi-config support.

Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15933

* github.com:scylladb/scylladb:
  build: cmake: set compile options with generator expression
  build: cmake: configure all available config types
  build: cmake: set per-mode stack usage threshold
  build: cmake: drop build_mode
  build: cmake: check for config type if multi-config is used
2023-11-07 09:45:57 +02:00
Botond Dénes
7679152209 Merge 'Sanitize usage of make_sstable_easy+make_memtable in tests' from Pavel Emelyanov
The helper makes sstable, writes mutations into it and loads one. Internally it uses the make_memtable() helper that prepares a memtable out of a vector of mutations. There are many test cases that don't use these facilities generating some code duplication.

The make_sstable() wrapper around make_sstable_easy() is removed along the way.

Closes scylladb/scylladb#15930

* github.com:scylladb/scylladb:
  tests: Use make_sstable_easy() where appropriate
  sstable_conforms_to_mutation_source_test: Open-code the make_sstable() helper
  sstable_mutation_test: Use make_sstable_easy() instead of make_sstable()
  tests: Make use of make_memtable() helper
  tests: Drop as_mutation_source helper
  test/sstable_utils: Hide assertion-related manipulations into branch
2023-11-07 09:29:30 +02:00
Kefu Chai
882e7eca25 build: cmake: set compile options with generator expression
instead of using a single compile option for all modes, use per-mode
compile options. this change keeps us away from using `CMAKE_BUILD_TYPE`
directly, and prepares us for the multi-config generator support.

because we only apply these settings in the configurations where
sanitizers are used, there is no need to check if these option can be
accepted by the compiler. if this turns out to be a problem, we can
always add the check back on a per-mode basis.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-07 10:35:20 +08:00
Kefu Chai
61a542ffd0 build: cmake: configure all available config types
if `CMAKE_CONFIGURATION_TYPES` is set, it implies that the
multi-config generator is used, in this case, we include all
available build types instead of only the one specified by
`CMAKE_BUILD_TYPE`, which is typically used by non-multi-config
generators.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-07 10:14:33 +08:00
Kefu Chai
6fcff51cf1 build: cmake: set per-mode stack usage threshold
instead of setting a single stack usage threshold, set per-mode
stack usage threshold. this prepares for the support of
multi-config generator.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-07 10:13:50 +08:00
Kefu Chai
23bb644314 build: cmake: drop build_mode
there is no benefit having this variable. and it introduces
another layer of indirection. so drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-07 10:10:59 +08:00
Kefu Chai
7369e2e3df build: cmake: check for config type if multi-config is used
we should not set_property() on a non-existent property. if a multi-config
generator is used, `CMAKE_BUILD_TYPE` is not added as a cached entry at all.

Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-07 10:10:59 +08:00
Paweł Zakrzewski
9e240c2dc8 test/cql-pytest: Verify that GRANT ALTER ALL allows changing the superuser password
This is a test for #14277. We do want to match Cassandra's behavior,
which means that a user who is granted ALTER ALL is able to change
the password of a superuser.

Closes scylladb/scylladb#15961
2023-11-06 18:39:53 +01:00
Takuya ASADA
a23278308f dist: fix local-fs.target dependency
systemd man page says:

systemd-fstab-generator(3) automatically adds dependencies of type Before= to
all mount units that refer to local mount points for this target unit.

So "Before=local-fs.target" is the correct dependency for local mount
points, but we currently specify "After=local-fs.target", it should be
fixed.

Also replaced "WantedBy=multi-user.target" with "WantedBy=local-fs.target",
since .mount units are not related to multi-user but depend on local
filesystems.

Fixes #8761

Closes scylladb/scylladb#15647
2023-11-06 18:39:53 +01:00
Kefu Chai
d78ccab337 test/s3: add --keep-tmp option to preserve the tmp dir
before this change, the tempdir is always nuked no matter if the
test succeeds. but sometimes, it would be important to check
scylla's sstables after the test finishes.

so, in this change, an option named `--keep-tmp` is added so
we can optionally preserve the temp directory. this option is off
by default.
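
The behaviour can be sketched with a small context manager. This is an illustration only: `managed_tempdir` and its `keep_tmp` argument are hypothetical names, and how the test suite actually wires `--keep-tmp` into its fixtures is not shown in this message.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def managed_tempdir(keep_tmp=False):
    """Yield a fresh temp directory; remove it on exit unless keep_tmp is set."""
    path = tempfile.mkdtemp()
    try:
        yield path
    finally:
        if not keep_tmp:
            shutil.rmtree(path, ignore_errors=True)


with managed_tempdir() as removed_dir:
    pass  # the test would run here

with managed_tempdir(keep_tmp=True) as kept_dir:
    pass  # sstables written here stay available for inspection
```

With the default, the directory is gone once the block exits; with `keep_tmp=True` it is left in place and must be cleaned up by hand.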

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15949
2023-11-06 18:39:53 +01:00
Anna Stuchlik
3756705520 doc: add OS support in version 5.4
This commit adds OS support information
in version 5.4 (removing the non-released
version 5.3).

In particular, it adds support for Oracle Linux
and Amazon Linux.

Also, it removes support for outdated versions.

Closes scylladb/scylladb#15923
2023-11-06 18:39:53 +01:00
Anna Stuchlik
1e0cbfe522 doc: update package installation in version 5.4
This commit updates the package installation
instructions in version 5.4.
- It updates the variables to include "5.4"
  as the version name.
- It adds the information for the newly supported
  Rocky/RHEL 9 - a new EPEL download link is required.

Closes scylladb/scylladb#15963
2023-11-06 18:39:53 +01:00
Pavel Emelyanov
bcec9c4ffc Merge 'test/object_store: PEP8 compliant cleanups' from Kefu Chai
this series applies fixes to make the test more PEP8 compliant. the goal is to improve the readability and maintainability.

Closes scylladb/scylladb#15946

* github.com:scylladb/scylladb:
  test/object_store: wrap line which is too long
  test/object_store: use pattern matching to capture variable in loop
  test/object_store: remove space after and before '{' and '}'
  test/object_store: add an empty line before nested function definition
  test/object_store: use two empty lines in-between global functions
2023-11-06 18:39:53 +01:00
Benny Halevy
0064fc55b0 interval: make default ctor and make_open_ended_both_sides constexpr
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#15955
2023-11-06 18:39:53 +01:00
Kefu Chai
39340d23e5 storage_service: avoid using non-constexpr as format string
in order to use compile-time format check, we would need to use
compile-time constexpr for the format string. despite that we
might be able to find a way to tell if an expression is compile-time
constexpr in C++20, it'd be much simpler to always use a
known-to-be-constexpr format string. this would help us to eventually
migrate to the compile-time format check in seastar's logging subsystem.

so, in this change, instead of feeding `seastar::logger::info()` and
friends with a non-constexpr format string, let's just use "{}" for
printing it, or mark the format string with `constexpr` instead of
`const`. as the former tells the compiler it is a variable that
can be evaluated at compile-time, while the latter just informs the
compiler that the variable is not mutable after it is initialized.

This change also helps to address the compilation failure with the
not-yet-merged compile-time format check patch in Seastar:

```
/usr/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -Og -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result "-Wno-error=#warnings" -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT service/CMakeFiles/service.dir/storage_service.cc.o -MF service/CMakeFiles/service.dir/storage_service.cc.o.d -o service/CMakeFiles/service.dir/storage_service.cc.o -c /home/kefu/dev/scylladb/service/storage_service.cc
/home/kefu/dev/scylladb/service/storage_service.cc:2460:18: error: call to consteval function 'seastar::logger::format_info<>::format_info<const char *, 0>' is not a constant expression
    slogger.info(str.c_str());
                 ^
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15959
2023-11-06 18:39:53 +01:00
Kamil Braun
315c69cec2 test: rest_api: test that generation is nonzero in failure_detector/endpoints 2023-11-06 18:03:34 +01:00
Kamil Braun
eb6943b852 api: failure_detector: fix indentation 2023-11-06 17:12:17 +01:00
Kamil Braun
a89c69007e api: failure_detector: invoke on shard 0
These APIs may return stale or simply incorrect data on shards
other than 0. Newer versions of Scylla are better at maintaining
cross-shard consistency, but we need a simple fix that can be backported
easily and without risk to older versions; this is the fix.

Fixes: scylladb/scylladb#15816
2023-11-06 17:03:38 +01:00
Piotr Dulikowski
85516c9155 raft topology: catch and abort on exceptions from topology_coordinator::run
The `topology_coordinator::run` function is supposed to handle all of the
exceptions internally. Assert, at runtime, that this is the case by
wrapping the `run` invocation with a try..catch; in case of an
exception, step down as a leader first and then abort.
2023-11-06 15:25:38 +01:00
Anna Stuchlik
a6fd4cccf2 doc: add the upgrade guide from 5.2 to 5.4
This commit adds the upgrade guide from
version 5.2 to 5.4.
Version 5.3 was never released.

This commit must be backported to branch-5.4.
2023-11-06 14:48:26 +01:00
Piotr Dulikowski
843f02eb5d Revert "storage_service: raft topology: mark topology_coordinator::run function as noexcept"
This reverts commit dcaaa74cd4. The
`noexcept` specifier that it added is only relevant to the function and
not the coroutine returned from that function. This was not the
intention and it looks confusing now, so remove it.
2023-11-06 12:00:42 +01:00
Piotr Dulikowski
41c2dac250 raft topology: don't print an error when fencing previous coordinator is aborted
An attempt to fence the previous coordinator may fail because the
current coordinator is aborted. It's not a critical error and it can
happen during normal operations, so lower the verbosity used to print a
message about this error to 'debug'.

Return from the function immediately in that case - the sleep_aborted
that happens as a next step would fail on abort_requested_exception
anyway, so make it more explicit.
2023-11-06 12:00:42 +01:00
Piotr Dulikowski
1408b7cfa8 raft topology: handle abort exceptions from sleeping in fence_previous_coordinator
The fence_previous_coordinator function has a retry loop: if it fails to
perform a group0 operation, it will try again after a 1 second delay.
However, if the topology coordinator is aborted while it waits, an
exception will be thrown and will be propagated out of the function. The
function is supposed to handle all exceptions internally, so this is not
desired.

Fix this by catching the abort_requested_exception and returning from
the function if the exception is caught.
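
The retry-and-abort behaviour described above can be sketched in Python. This is only an illustration under stated assumptions, not the actual C++ code: `AbortRequested`, `Group0Error`, `abortable_sleep` and the `group0_op` callback are hypothetical stand-ins, and an `asyncio.Event` plays the role of the abort source.

```python
import asyncio


class AbortRequested(Exception):
    """Hypothetical stand-in for abort_requested_exception."""


class Group0Error(Exception):
    """Hypothetical stand-in for a failed group0 operation."""


async def abortable_sleep(seconds, abort_event):
    """Sleep for `seconds`, raising AbortRequested if abort_event is set first."""
    try:
        await asyncio.wait_for(abort_event.wait(), timeout=seconds)
    except asyncio.TimeoutError:
        return  # full delay elapsed, no abort was requested
    raise AbortRequested()


async def fence_previous_coordinator(group0_op, abort_event, delay=1.0):
    """Retry group0_op until it succeeds; return False if aborted while waiting."""
    while True:
        try:
            await group0_op()
            return True
        except Group0Error:
            pass  # fall through to the delay and try again
        try:
            await abortable_sleep(delay, abort_event)
        except AbortRequested:
            return False  # swallowed here rather than propagated to the caller
```

The point of the sketch is the second `try`: the abort raised by the sleep is caught inside the function, so callers never see it.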
2023-11-06 12:00:41 +01:00
Michał Jadwiszczak
213e39a937 cql-pytest:test_describe: add test for whitespaces in json objects 2023-11-06 10:37:10 +01:00
Kamil Braun
15b441550b gossiper: do_shadow_round: increment nodes_down in case of timeout
Previously we would only increment `nodes_down` when getting
`rpc::closed_error`. Distinguishing between that and timeout is
unreliable. Consider:
1. if a node is dead but we can reach the IP, we'd get `closed_error`
2. if we cannot reach the IP (there's a network partition), the RPC
   would hang so we'd get `timeout_error`
3. if the node is both dead and the IP is unreachable, we'd get
   `timeout_error`

And there are probably other more complex scenarios as well. In general,
it is impossible to distinguish a dead node from a partitioned node in
asynchronous networks, and whether we end up with `closed_error` or
`timeout_error` is an implementation detail of the underlying protocol
that we use.

The fact that `nodes_down` was not incremented for timeouts would
prevent a node from starting if it cannot reach isolated IPs (whether or
not there were dead or alive nodes behind those IPs). This was observed
in a Jepsen test: https://github.com/scylladb/scylladb/issues/15675.

Note that `nodes_down` is only used to skip shadow round outside
bootstrap/replace, i.e. during restarts, where the shadow round was
"best effort" anyway (not mandatory). During bootstrap/replace it is now
mandatory.

Also fix grammar in the error message.
2023-11-06 10:28:08 +01:00
Kamil Braun
897cb6510e gossiper: do_shadow_round: fix nodes_down calculation
During shadow round we would calculate the number of nodes from which we
got `rpc::closed_error` using `nodes_counter`, and if the counter
reached the size of all contact points passed to shadow round, we would
skip the shadow round (and after the previous commit, we do it only in
the case of restart, not during bootstrap/replace which is unsafe).

However, shadow round might have multiple loops, and `nodes_down` was
initialized to `0` before the loop, then reused. So the same node might
be counted multiple times in `nodes_down`, and we might incorrectly
enter the skipping branch. Or we might go over `nodes.size()` and never
finish the loop.

Fix this by initializing `nodes_down = 0` inside the loop.
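
A minimal Python sketch of the counting problem, assuming a hypothetical `probe(node)` callback that returns `"ok"`, `"down"` or `"pending"` (the real code is C++ and structured differently). Because the counter is reset at the top of every round, a node that is down in several rounds is still counted once per round.

```python
def shadow_round(nodes, probe, max_rounds):
    """Return 'done', 'skipped' or 'gave_up'.

    probe(node) is a hypothetical callback returning 'ok', 'down' or 'pending'.
    """
    for _ in range(max_rounds):
        nodes_down = 0  # reset every round, so a node is counted at most once
        responded = False
        for node in nodes:
            result = probe(node)
            if result == "ok":
                responded = True
            elif result == "down":
                nodes_down += 1
        if responded:
            return "done"
        if nodes_down == len(nodes):
            return "skipped"  # every contact point is down in this round
    return "gave_up"
```

If `nodes_down = 0` were moved above the outer loop, one node that is down in two consecutive rounds would be enough to reach `len(nodes)` for a two-node list and take the skipping branch by mistake.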
2023-11-06 10:28:07 +01:00
Kamil Braun
b03fa87551 storage_service: make shadow round mandatory during bootstrap/replace
It is unsafe to bootstrap or perform replace without performing the
shadow round, which is used to obtain features from the existing cluster
and verify that we support all enabled features.

Before this patch, I could easily produce the following scenario:
1. bootstrap first node in the cluster
2. shut it down
3. start bootstrapping second node, pointing to the first as seed
4. the second node skips shadow round because it gets
   `rpc::closed_error` when trying to connect to first node.
5. the node then passes the feature check (!) and proceeds to the next
   step, where it waits for nodes to show up in gossiper
6. we now restart the first node, and the second node finishes bootstrap

The shadow round must be mandatory during bootstrap/replace, which is
what this patch does.

On restart it can remain optional as it was until now. In fact it should
be completely unnecessary during restart, but since we did it until now
(as best-effort), we can keep doing it.
2023-11-06 10:28:07 +01:00
Kamil Braun
7e9e84200c gossiper: do_shadow_round: remove default value for nodes param 2023-11-06 10:28:07 +01:00
Kamil Braun
108aae09c5 gossiper: do_shadow_round: remove fall_back_to_syn_msg
If during shadow round we learned that a contact node does not
understand the GET_ENDPOINT_STATES verb, we'd fall back to old shadow
round method (using gossiper SYN messages).

The verb was added a long time ago and it ended up in Scylla 4.3 and
2021.1. So in newer versions we can make it mandatory, as we don't
support skipping major versions during upgrades. Even if someone
attempted to, they would just get an error and they can retry bootstrap
after finishing upgrade.
2023-11-06 10:28:07 +01:00
Botond Dénes
2e1562d889 Merge 'dht: i_partitioner cleanup' from Benny Halevy
This series refactors the `dht/i_partitioner.hh` header file
and cleans up its usage so as to reduce the dependencies on it,
since it carries a lot of baggage that is rarely required in other header files.

Closes scylladb/scylladb#15954

* github.com:scylladb/scylladb:
  everywhere: reduce dependencies on i_partitioner.hh
  locator: resolve the dependency of token_metadata.hh on token_range_splitter.hh
  cdc: cdc_partitioner: remove extraneous partition_key_view fwd declaration
  dht: reduce dependency on i_partitioner.hh
  dht: fold compatible_ring_position in ring_position.hh
  dht: refactor i_partitioner.hh
  dht: move token_comperator to token.{cc,hh}
  dht/i_partitioner: include i_partitioner_fwd.hh
2023-11-06 10:34:38 +02:00
Kefu Chai
2b961d8e3f build: cmake: define per-mode compile definition
instead of setting for a single CMAKE_BUILD_TYPE, set the compilation
definitions for each build configuration.

this prepares for the multi-config generator.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15943
2023-11-06 10:34:38 +02:00
Kefu Chai
f2693752f1 build: cmake: avoid referencing CMAKE_BUILD_TYPE
use generator-expression instead, so that the value can be evaluated
when generating the build system. this prepares for the multi-config
support.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15942
2023-11-06 10:34:38 +02:00
Botond Dénes
7c7baf71d5 Merge 'Change all tests to shut down gracefully on shutdown' from Eliran Sinvani
The purpose of this mini series is to move all tests (that use the infrastructure to create a Scylla cluster) to shut down gracefully
on teardown.
One benefit is that the shutdown sequence for the cluster will be tested better, however it is not the main purpose of this change. The main purpose of this change is to pave the way for coverage reporting on all tests and not only the ones that
have standalone executables.

Full test runs are only slightly impacted by this change (~2.4% increase in runtime):

Without graceful shutdown
```
time ./test.py --mode dev
Found 2966 tests.
================================================================================
[N/TOTAL]   SUITE    MODE   RESULT   TEST
------------------------------------------------------------------------------
[2966/2966] topology_experimental_raft   dev   [ PASS ] topology_experimental_raft.test_raft_cluster_features.1
------------------------------------------------------------------------------
CPU utilization: 13.1%

real    4m50.587s
user    13m58.358s
sys     6m55.975s
```

With graceful shutdown
```
time ./test.py --mode dev
Found 2966 tests.
================================================================================
[N/TOTAL]   SUITE    MODE   RESULT   TEST
------------------------------------------------------------------------------
[2966/2966] topology_experimental_raft   dev   [ PASS ] topology_experimental_raft.test_raft_cluster_features.1
------------------------------------------------------------------------------
CPU utilization: 12.6%

real    4m57.637s
user    13m56.864s
sys     6m46.657s
```

Closes scylladb/scylladb#15851

* github.com:scylladb/scylladb:
  test.py: move to a gracefull temination of nodes on teardown
  test.py: Use stop lock also in the graceful version
2023-11-06 10:34:38 +02:00
Benny Halevy
a1acf6854b everywhere: reduce dependencies on i_partitioner.hh
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:47:44 +02:00
Benny Halevy
6de1cc2993 locator: resolve the dependency of token_metadata.hh on token_range_splitter.hh
define token_metadata_ptr in token_metadata_fwd.hh
So that the declaration of `make_splitter` can be moved
to token_range_splitter.hh, where it belongs,
and so token_metadata.hh won't have to include it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:29 +02:00
Benny Halevy
182e5381d8 cdc: cdc_partitioner: remove extraneous partition_key_view fwd declaration
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:29 +02:00
Benny Halevy
4b184e950a dht: reduce dependency on i_partitioner.hh
include only the required header files where needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:29 +02:00
Benny Halevy
aa70e3a536 dht: fold compatible_ring_position in ring_position.hh
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:29 +02:00
Benny Halevy
28b5482403 dht: refactor i_partitioner.hh
Extract decorated_key.hh and ring_position.hh
out of i_partitioner.hh so they can be included
selectively, since i_partitioner.hh contains too much
baggage that is not always needed in full.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:27 +02:00
Benny Halevy
232918eef0 dht: move token_comperator to token.{cc,hh}
Move the `token_comparator` definition and
implementation to token.{hh,cc}, respectively
since they are independent of i_partitioner.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:15 +02:00
Benny Halevy
8309cf743e dht/i_partitioner: include i_partitioner_fwd.hh
Rather than repeating the same declarations in i_partitioner.hh

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-05 20:01:14 +02:00
Kefu Chai
08f8796cf0 test/object_store: wrap line which is too long
to be compliant to PEP8, see
https://peps.python.org/pep-0008/#blank-lines

also easier to read with smaller screen and/or large fonts.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 21:29:31 +08:00
Kefu Chai
5c0e4df624 test/object_store: use pattern matching to capture variable in loop
instead of referencing the elements in tuple with their indexes, use
pattern matching to capture them. for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 21:29:31 +08:00
Kefu Chai
6208a05c40 test/object_store: remove space after and before '{' and '}'
to be compliant with PEP8, see
https://peps.python.org/pep-0008/#whitespace-in-expressions-and-statements

for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 21:29:31 +08:00
Kefu Chai
231938f739 test/object_store: add an empty line before nested function definition
to be compliant to PEP8, see
https://peps.python.org/pep-0008/#blank-lines

for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 21:29:31 +08:00
Kefu Chai
38d5e7cae2 test/object_store: use two empty lines in-between global functions
to be compliant to PEP8, see
https://peps.python.org/pep-0008/#blank-lines

for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 21:29:31 +08:00
Michał Jadwiszczak
cbfbcffc75 schema: add whitespace to description of table options
Values of `caching`, `tombstone_gc` and `cdc` are JSON objects, but they
were printed without any whitespace. This commit adds a space after
colons (:) and commas (,), so the values are more readable and match
the format of the old client-side describe.
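
The difference in spacing can be shown with Python's `json` module. This only illustrates the two layouts; the option value below is hypothetical, and the actual change is in the server's C++ code.

```python
import json

# Hypothetical option value; the keys are for illustration only.
caching = {"keys": "ALL", "rows_per_partition": "ALL"}

# No whitespace after ':' and ',' (the old output style).
compact = json.dumps(caching, separators=(",", ":"))

# One space after ':' and ',' (the style this commit moves to).
spaced = json.dumps(caching, separators=(", ", ": "))
```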
2023-11-04 12:30:19 +01:00
Kefu Chai
ff12f1f678 docs: add divider using CSS
instead of hardwiring the formatting in the html code, do this using
CSS, more flexible this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 00:22:34 +08:00
Kefu Chai
1694a7addc docs: extract _clean_description as a filter
would be better to split the parser from the formatter. in future,
we can apply more filters on top of the existing one.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 00:22:34 +08:00
Kefu Chai
9ddc639237 docs: render option with role
so we can cross-reference them with the syntax like

:confval:`alternator_timeout_in_ms`.

or even render an option like:

.. confval:: alternator_timeout_in_ms

in order to make the headerlink of the option visible,
a new CSS rule is added.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 00:22:34 +08:00
Kefu Chai
53dfb5661d docs: parse source files right into rst
so we can render the rst without writing a temporary YAML.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-11-04 00:22:33 +08:00
Kamil Braun
6cc5bcae80 test: test_topology_ops: disable background writes
Recently, in a3ba4b3109, this test was
extended with a background task that continuously performs CQL writes.

This turned out to be very valuable and detected a couple of bugs,
including:
https://github.com/scylladb/scylladb/issues/15924
https://github.com/scylladb/scylladb/issues/15935

Unfortunately this causes CI to be flaky.
Until these bugs are fixed, we disable the background writes to unflake
CI.

Closes scylladb/scylladb#15937
2023-11-03 16:52:10 +02:00
Raphael S. Carvalho
cca85f5454 streaming: Don't adjust partition estimate if segregation is postponed
When off-strategy is enabled, data segregation is postponed to when
off-strategy runs. Turns out we're adjusting partition estimate even
when segregation is postponed, meaning that sstables in maintenance
set will have smaller filters than they should otherwise have.
This condition is transient as the system eventually heals this
through compactions. But note that with TWCS, the problem of inefficient
filters may persist for a long time as sstables written into older
windows may stay around for a significant amount of time.
In the future, we're planning to make this less fragile by dynamically
resizing filters on sstable write completion.
The problem aforementioned is solved by skipping adjustment when
segregation is postponed (i.e. off-strategy is enabled).

Refs #15704.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-11-03 16:22:07 +02:00
Asias He
2b2302d373 streaming: Ignore dropped table on both sides
It is possible the sender and receiver of streaming nodes have different
views on if a table is dropped or not.

For example:
- n1, n2 and n3 in the cluster

- n4 started to join the cluster and stream data from n1, n2, n3

- a table was dropped

- n4 failed to write data from n2 to sstable because a table was dropped

- n4 ended the streaming

- n2 checked if the table was present and would ignore the error if the table was dropped

- however n2 found the table was still present and was not dropped

- n2 marked the streaming as failed

This will fail the streaming when a table is dropped. We want streaming to
ignore such dropped tables.

In this patch, a status code is sent back to the sender to notify the
table is dropped so the sender could ignore the dropped table.
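
A rough Python sketch of the idea, under the assumption that the receiver's reply can be modelled as a small enum. The names `StreamStatus`, `receiver_status` and `sender_should_fail` are hypothetical and do not come from the patch.

```python
from enum import Enum


class StreamStatus(Enum):
    OK = 0
    TABLE_DROPPED = 1  # hypothetical name for the new status code
    FAILED = 2


def receiver_status(table_exists, write_ok):
    """Receiver side: report why a write did not happen."""
    if not table_exists:
        return StreamStatus.TABLE_DROPPED
    return StreamStatus.OK if write_ok else StreamStatus.FAILED


def sender_should_fail(status):
    """Sender side: trust the receiver's view instead of re-checking locally."""
    return status is StreamStatus.FAILED
```

With the status carried in the reply, the sender no longer depends on its own, possibly stale, view of whether the table still exists.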

Fixes #15370

Closes scylladb/scylladb#15912
2023-11-03 13:38:48 +02:00
David Garcia
84e073d0ec docs: update theme 1.6
Closes scylladb/scylladb#15782
2023-11-03 09:45:16 +01:00
Piotr Dulikowski
70f4f8d799 test/pylib: increase control connection timeout in cql_is_up
After starting the associated node, ScyllaServer waits until the node
starts serving CQL requests. It does that by periodically trying to
establish a python driver session to the node.

During session establishment, the driver tries to fetch some metadata
from the system tables, and uses a pretty short timeout to do so (by
default it's 2 seconds). When running tests in debug mode, this timeout
can prove to be too short and may prevent the testing framework from
noticing that the node came up.

Fix the problem by increasing the timeout. Currently, after the session
is established, a query is sent in order to further verify that the
session works and it uses a very generous timeout of 1000 seconds to do
so - use the same timeout for internal queries in the python driver.

Fixes: scylladb/scylladb#15898

Closes scylladb/scylladb#15929
2023-11-03 09:32:11 +01:00
Kefu Chai
5b7feb8b95 build: s/create_building_system/create_build_system/
as build system is more correct in this context.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15932
2023-11-03 09:37:44 +02:00
Pavel Emelyanov
3173336e97 tests: Use make_sstable_easy() where appropriate
There are two test cases out there that make an sstable, write it and then
load it, but make_sstable_easy() is for that, so use it there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-02 19:32:43 +03:00
Pavel Emelyanov
cc89acff67 sstable_conforms_to_mutation_source_test: Open-code the make_sstable()
helper

This test case is pretty special in the sense that it uses custom path
for tempdir to create, write and load sstable to/from. It's better to
open-code the make_sstable() helper into the test case rather than
encourage callers to use custom tempdirs. "Good" test cases can use
make_sstable_easy() for the same purposes (in fact they already do).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-02 19:30:54 +03:00
Pavel Emelyanov
7f6423bc35 sstable_mutation_test: Use make_sstable_easy() instead of make_sstable()
The latter is only used in the former test case and doesn't provide
extra value.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-02 19:30:02 +03:00
Pavel Emelyanov
eeee58def8 tests: Make use of make_memtable() helper
There's one in the utils that creates lw_shared_ptr<memtable> and
applies provided vector of mutations into it. Lots of other test cases
do literally the same by hand.

The make_memtable() assumes that the caller is sitting in the seastar
thread, and all the test cases that can benefit from it already are.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-02 19:28:35 +03:00
Pavel Emelyanov
c1824324bd tests: Drop as_mutation_source helper
It does nothing but call the sstable method of the same name. Callers
can do it on their own, the method is public.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-02 19:27:59 +03:00
Pavel Emelyanov
3ff32a2ca5 test/sstable_utils: Hide assertion-related manipulations into branch
The make_sstable_containing() can validate the applied mutations are
produced by the resulting sstable if the caller asks for it. To do so
the mutations are merged prior to checking and this merging should only
happen if validation is requested, otherwise it just makes no sense.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-02 19:26:46 +03:00
Kamil Braun
8179296f56 Merge 'retry automatic announcements of the schema changes on concurrent operation' from Patryk Jędrzejczak
The follow-up to #15594.

We retry every automatic `migration_manager::announce` if
`group0_concurrent_modification` occurs. Concurrent operations can
happen during concurrent bootstrap in Raft-based topology, so we need
this change to enable support for concurrent bootstrap.

This PR adds retry loops in 4 places:
- `service::create_keyspace_if_missing`,
- `system_distributed_keyspace::start`,
- `redis::create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (used for creating the `system_traces` keyspace).
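
The shape of such a retry loop can be sketched in Python. This is a simplified illustration, not the C++ code from the PR: `ConcurrentModification` stands in for `group0_concurrent_modification`, `prepare_and_announce` is a hypothetical callback, and the `max_attempts` bound is an assumption of the sketch.

```python
class ConcurrentModification(Exception):
    """Hypothetical stand-in for group0_concurrent_modification."""


def announce_with_retry(prepare_and_announce, max_attempts=10):
    """Call prepare_and_announce() until it stops raising ConcurrentModification.

    Each attempt redoes the whole read-modify-announce sequence, so it works
    from fresh state. max_attempts is an assumption of this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return prepare_and_announce()
        except ConcurrentModification:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
```

The important design point is that the callback rebuilds its mutations on every attempt; retrying only the final announce step would reuse state that the concurrent operation has already invalidated.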

Fixes #15435

Closes scylladb/scylladb#15613

* github.com:scylladb/scylladb:
  table_helper: fix indentation
  table_helper: retry in setup_keyspace on concurrent operation
  table_helper: add logger
  redis/keyspace_utils: fix indentation
  redis: retry creating defualt databases on concurrent operation
  db/system_distributed_keyspace: fix indentation
  db/system_distributed_keyspace: retry start on concurrent operation
  auth/service: retry creating system_auth on concurrent operation
2023-11-02 17:24:52 +01:00
Kamil Braun
5cf18b18b2 Merge 'raft: topology: outside topology-on-raft mode, make sure not to use its RPCs' from Piotr Dulikowski
Topology on raft is still an experimental feature. The RPC verbs
introduced in that mode shouldn't be used when it's disabled, otherwise
we lose the right to make breaking changes to those verbs.

First, make sure that the aforementioned verbs are not sent outside the
mode. It turns out that `raft_pull_topology_snapshot` could be sent
outside topology-on-raft mode - after the PR, it no longer can.

Second, topology-on-raft mode verbs are now not registered at all on the
receiving side when the mode is disabled.

Additionally tested by running `topology/` tests with
`consistent_cluster_management: True` but with experimental features
disabled.

Fixes: scylladb/scylladb#15862

Closes scylladb/scylladb#15917

* github.com:scylladb/scylladb:
  storage_service: fix indentation
  raft: topology: only register verbs in topology-on-raft mode
  raft: topology: only pull topology snapshot in topology-on-raft mode
2023-11-02 16:44:18 +01:00
Kefu Chai
798eede61a build: cmake: update 3rd party library deps where it is found
move the code which updates the third-party library closer to where
the library is found. for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15915
2023-11-02 17:20:57 +02:00
Kefu Chai
0421db2471 build: cmake: enable Seastar_UNUSED_RESULT_ERROR
this mirrors what we already have in `configure.py`.

so that Seastar can report [[nodiscard]] violations as error.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15914
2023-11-02 17:19:31 +02:00
Patryk Jędrzejczak
dacec6374d table_helper: fix indentation
Broken in the previous commit.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
e10036babe table_helper: retry in setup_keyspace on concurrent operation
Currently, table_helper::setup_keyspace is used only for starting
the system_traces keyspace. We need to handle concurrent group 0
operations possible during concurrent bootstrap in the Raft-based
topology.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
e2894a081a table_helper: add logger
It will be used in the next commit to log information when
a concurrent group 0 modification occurs.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
3e8a307cd4 redis/keyspace_utils: fix indentation
Broken in the previous commit.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
24aa5bf72c redis: retry creating defualt databases on concurrent operation
A concurrent group 0 operation in
create_keyspace_if_not_exists_impl can happen during concurrent
bootstrap in the Raft-based topology.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
0357636f16 db/system_distributed_keyspace: fix indentation
Broken in the previous commit.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
813c7a582c db/system_distributed_keyspace: retry start on concurrent operation
A concurrent group 0 operation in
system_distributed_keyspace::start can happen during concurrent
bootstrap in the Raft-based topology.
2023-11-02 14:21:15 +01:00
Patryk Jędrzejczak
dfba0b9e9b auth/service: retry creating system_auth on concurrent operation
A concurrent group 0 operation in
service::create_keyspace_if_missing can happen during concurrent
bootstrap in the Raft-based topology.
2023-11-02 14:21:15 +01:00
Pavel Emelyanov
1a44f362b2 pytest: Do not try to guess which scylla binary user wants to run
When running some pytest-based tests they start scylla binary by hand
instead of relying on test.py's "clusters". In automatic run (e.g. via
test.py itself) the correct scylla binary is the one pointed to by
SCYLLA environment, but when run from shell via pytest directly it tries
to be smart and looks at build/*/scylla binaries picking the one with
the greatest mtime.

That guess is not very nice, because if the developer switches between
build modes with configure.py and rebuilds binaries, binaries from
"older" or "previous" builds stay on the way and confuse the guessing
code. It's better to be explicit.
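
One way to be explicit, sketched in Python. The helper name `find_scylla_binary` is hypothetical, and the sketch assumes the `SCYLLA` environment variable is the only source of the path; the actual patch may resolve the binary differently.

```python
import os


def find_scylla_binary(environ=os.environ):
    """Return the path named by SCYLLA, or fail with a clear message."""
    path = environ.get("SCYLLA")
    if not path:
        raise RuntimeError("set the SCYLLA environment variable to the scylla binary to run")
    if not os.access(path, os.X_OK):
        raise RuntimeError(f"SCYLLA={path!r} is not executable")
    return path
```

Failing loudly when the variable is missing avoids silently picking a stale binary from another build mode.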

refs: #15679

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15684
2023-11-02 12:34:49 +02:00
Kamil Braun
0846d324d7 Merge 'rollback topology operation on streaming failure' from Gleb
This patch series adds error handling for streaming failure during
topology operations instead of an infinite retry. If streaming fails the
operation is rolled back: bootstrap/replace nodes move to left and
decommissioned/remove nodes move back to normal state.

* 'gleb/streaming-failure-rollback-v4' of github.com:scylladb/scylla-dev:
  raft: make sure that all operation forwarded to a leader are completed before destroying raft server
  storage_service: raft topology: remove code duplication from global_tablet_token_metadata_barrier
  tests: add tests for streaming failure in bootstrap/replace/remove/decomission
  test/pylib: do not stop node if decommission failed with an expected error
  storage_service: raft topology: fix typo in "decommission" everywhere
  storage_service: raft topology: add streaming error injection
  storage_service: raft topology: do not increase topology version during CDC repair
  storage_service: raft topology: rollback topology operation on streaming failure.
  storage_service: raft topology: load request parameters in left_token_ring state as well
  storage_service: raft topology: do not report term_changed_error during global_token_metadata_barrier as an error
  storage_service: raft topology: change global_token_metadata_barrier error handling to try/catch
  storage_service: raft topology: make global_token_metadata_barrier node independent
  storage_service: raft topology: split get_excluded_nodes from exec_global_command
  storage_service: raft topology: drop unused include_local and do_retake parameters from exec_global_command which are always true
  storage_service: raft topology: simplify streaming RPC failure handling
2023-11-02 10:15:45 +01:00
Kamil Braun
ae58e39743 Merge 'reduce announcements of the automatic schema changes' from Patryk Jędrzejczak
There are some schema modifications performed automatically (during
bootstrap, upgrade etc.) by Scylla that are announced by multiple calls
to `migration_manager::announce` even though they are logically one
change. Precisely, they appear in:
- `system_distributed_keyspace::start`,
- `redis::create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (for the `system_traces` keyspace).

All these places contain a FIXME telling us to `announce` only once.
There are a few reasons for this:
- calling `migration_manager::announce` with Raft is quite expensive --
  taking a `read_barrier` is necessary, and that requires contacting a
leader, which then must contact a quorum,
- we must implement a retrying mechanism for every automatic `announce`
  if `group0_concurrent_modification` occurs to enable support for
concurrent bootstrap in Raft-based topology. Doing it before the FIXMEs
mentioned above would be harder, and fixing the FIXMEs later would also
be harder.

This PR fixes the first two FIXMEs and improves the situation with the
last one by reducing the number of the `announce` calls to two.
Unfortunately, reducing this number to one requires a big refactor. We
can do it as a follow-up to a new, more specific issue. Also, we leave a
new FIXME.

Fixing the first two FIXMEs required enabling the announcement of a
keyspace together with its tables. Until now, the code responsible for
preparing mutations for a new table could assume the existence of the
keyspace. This assumption wasn't necessary, but removing it required
some refactoring.

Fixes scylladb/scylladb#15437

Closes scylladb/scylladb#15897

* github.com:scylladb/scylladb:
  table_helper: announce twice in setup_keyspace
  table_helper: refactor setup_table
  redis: create_keyspace_if_not_exists_impl: fix indentation
  redis: announce once in create_keyspace_if_not_exists_impl
  db: system_distributed_keyspace: fix indentation
  db: system_distributed_keyspace: announce once in start
  tablet_allocator: update on_before_create_column_family
  migration_listener: add parameter to on_before_create_column_family
  alternator: executor: use new prepare_new_column_family_announcement
  alternator: executor: introduce create_keyspace_metadata
  migration_manager: add new prepare_new_column_family_announcement
2023-11-02 09:32:35 +01:00
Piotr Dulikowski
6d15f0283e storage_service: fix indentation
It was broken by the previous commit.
2023-11-02 07:39:27 +01:00
Piotr Dulikowski
190d549bd5 raft: topology: only register verbs in topology-on-raft mode
Verbs related to topology on raft should not be sent outside the
topology on raft mode - and, after the previous commit, they aren't.

Make sure not to register handlers for those verbs if topology on raft
mode is not enabled.
2023-11-02 07:39:27 +01:00
Piotr Dulikowski
8727634e9c raft: topology: only pull topology snapshot in topology-on-raft mode
Currently, during group0 snapshot transfer, the node pulling
the snapshot will send the `raft_pull_topology_snapshot` verb even if
the cluster is not in topology-on-raft mode. The RPC handler returns an
empty snapshot in that case. However, using the verb outside topology on
raft causes problems:

- It can cause issues during rolling upgrade as the snapshot transfer
  will keep failing on the upgraded nodes until the leader node is
  upgraded,
- Topology changes on raft are still experimental, and using the RPC
  outside experimental mode will prevent us from doing breaking changes
  to it.

Solve the issue by passing the "topology changes on raft enabled" flag
to group0_state_machine and send the RPC only in topology on raft mode.
2023-11-02 07:39:27 +01:00
Yaniv Kaul
c662fe6444 Debian based Dockerfile: do not install 'suggested' packages
We can opt out from installing suggested packages. Mainly those related to Java and friends that we do not seem to need.

Fixes: #15579

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#15580
2023-11-01 17:16:18 +02:00
Botond Dénes
a34c8dc485 Merge 'Drop compaction_manager_for_testing' from Pavel Emelyanov
There's such a wrapper class in test_services. After #15889 this class resembles the test_env_compaction_manager and can be replaced with it. However, two users of the former wrapper class need it just to construct a table object, and the way they do it is a re-implementation of the table_for_tests class.

This PR patches the test cases to make use of table_for_tests and removes the compaction_manager_for_testing that becomes unused after it.

Closes scylladb/scylladb#15909

* github.com:scylladb/scylladb:
  test_services: Ditch compaction_manager_for_testing
  test/sstable_compaction_test: Make use of make_table_for_tests()
  test/sstable_3_x_test: Make use of make_table_for_tests()
  table_for_tests: Add const operator-> overload
  sstable_test_env: Add test_env_compaction_manager() getter
  sstable_test_env: Tune up maybe_start_compaction_manager() method
  test/sstable_compaction_test: Remove unused tracker allocation
2023-11-01 16:08:34 +02:00
Botond Dénes
665a5cb322 Update tools/jmx submodule
* tools/jmx 8d15342e...05bb7b68 (4):
  > README: replace 0xA0 (NBSP) character with space
  > scylla-apiclient: update Guava dependency
  > scylla-apiclient: update snakeyaml dependency
  > scylla-apiclient: update Jackson dependencies

[Botond: regenerate frozen toolchain]
2023-11-01 08:08:37 -04:00
Pavel Emelyanov
787c6576fe test_services: Ditch compaction_manager_for_testing
Now this wrapper is unused, all (both) test cases that needed it were
patched to use make_table_for_tests().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:17 +03:00
Pavel Emelyanov
731a82869a test/sstable_compaction_test: Make use of make_table_for_tests()
The max_ongoing_compaction_test test case constructs table object by
hand. For that it needs tracker, compaction manager and stats. Similarly
to previous patch, the test_env::make_table_for_tests() helper does
exactly that, so the test case can be simplified as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:17 +03:00
Pavel Emelyanov
5b3b8c2176 test/sstable_3_x_test: Make use of make_table_for_tests()
The compacted_sstable_reader() helper constructs table object and all
its "dependencies" by hand. The test_env::make_table_for_tests() helper
does the same, so the test code can be simplified.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:17 +03:00
Pavel Emelyanov
9b8f03bdb0 table_for_tests: Add const operator-> overload
Will be used later in boost transformation lambda

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:17 +03:00
Pavel Emelyanov
3021fb7b6c sstable_test_env: Add test_env_compaction_manager() getter
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:17 +03:00
Pavel Emelyanov
19b524d0f3 sstable_test_env: Tune up maybe_start_compaction_manager() method
Make it public and add `bool enable` flag so that test cases could start
the compaction manager (to call make_table_for_tests() later) but keep
it disabled for their testing purposes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:17 +03:00
Pavel Emelyanov
3f354c07a3 test/sstable_compaction_test: Remove unused tracker allocation
The sstable_run_based_compaction_test case allocates the tracker but
doesn't use it. Probably was left after the case was patched to use
make_table_for_tests() helper.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-01 14:18:12 +03:00
Kefu Chai
ef023dae44 s3: use rapidxml/rapidxml.hpp as a fallback
on debian derivatives librapidxml-dev installs rapidxml.h as
rapidxml/rapidxml.hpp, so let's use it as a fallback.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15814
2023-11-01 10:25:40 +03:00
Kefu Chai
7253369ad9 SCYLLA-VERSION-GEN: respect --date-stamp
before this change the argument passed to --date-stamp option is
ignored, as we don't reference the date-stamp specified with this option
at all. instead, we always overwrite it with the output of
`date --utc +%Y%m%d`, if we are going to reference this value.

so, in this change instead of unconditionally overwriting it, we
keep its value intact if it is already set.
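
The fixed behaviour can be sketched as follows. SCYLLA-VERSION-GEN itself is a shell script; this is only a Python rendering of the logic described above, and `resolve_date_stamp` is a hypothetical name, not something taken from the script.

```python
from datetime import datetime, timezone


def resolve_date_stamp(date_stamp=None):
    """Return the caller-supplied date stamp, or today's UTC date as YYYYMMDD."""
    if date_stamp:
        # Already set (e.g. via --date-stamp): keep it intact.
        return date_stamp
    # Python equivalent of `date --utc +%Y%m%d`.
    return datetime.now(timezone.utc).strftime("%Y%m%d")
```

The regression amounted to always taking the second branch, regardless of what the caller passed.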

the change which introduced this regression was 839d8f40e6

Fixes #15894
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15895
2023-11-01 10:24:04 +03:00
Avi Kivity
fcd86d993d Merge 'Put table_for_tests on a diet' from Pavel Emelyanov
The object in question is used to facilitate creation of table objects for compaction tests. Currently the table_for_test carries a bunch of auxiliary objects that are needed for table creation, such as stats of all sorts and table state. However, there's also some "infrastructure" stuff onboard namely:

- reader concurrency semaphore
- cache tracker
- task manager
- compaction manager

And those four are excessive because all the tests in question run inside the sstables::test_env that has most of it.

This PR removes the mentioned objects from table_for_tests and re-uses those from test_env. While at it, it also removes the table::config object from table_for_tests so that it looks more like the core code that creates a table.

Closes scylladb/scylladb#15889

* github.com:scylladb/scylladb:
  table_for_tests: Use test_env's compaction manager
  sstables::test_env: Carry compaction manager on board
  table_for_tests: Stop table on stop
  table_for_tests: Get compaction manager from table
  table_for_tests: Ditch on-board concurrency semaphore
  table_for_tests: Require config argument to make table
  table_for_tests: Create table config locally
  table_for_tests: Get concurrency semaphore from table
  table_for_tests: Get table directory from table itself
  table_for_tests: Reuse cache tracker from sstables manager
  table_for_tests: Remove unused constructor
  tests: Split the compaction backlog test case
  sstable_test_env: Coroutinize and move to .cc test_env::stop()
2023-10-31 18:03:07 +02:00
Piotr Smaroń
8c464b2ddb guardrails: restrict replication strategy (RS)
Replacing `restrict_replication_simplestrategy` config option with
2 config options: `replication_strategy_{warn,fail}_list`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on replication strategy when creating/altering a keyspace.
The reason to replace rather than extend the `restrict_replication_simplestrategy` config
option is that it was not used and we wanted to generalize it.
Only soft guardrail is enabled by default and it is set to SimpleStrategy,
which means that we'll generate a CQL warning whenever replication strategy
is set to SimpleStrategy. For new cloud deployments we'll move
SimpleStrategy from warn to the fail list.
Guardrails violations will be tracked by metrics.
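
A minimal sketch of the soft/hard limit logic described above, in Python rather than the C++ of the actual patch. Only the two option names come from the commit message; the function name, the return convention, and the message texts are assumptions made for illustration.

```python
def check_replication_strategy(strategy, warn_list, fail_list):
    """Apply the guardrail to one CREATE/ALTER KEYSPACE request.

    Returns a list of warnings to attach to the CQL response, or raises
    ValueError if the strategy is on the fail list (statement not executed).
    """
    if strategy in fail_list:
        # Hard limit: refuse to execute the statement.
        raise ValueError(f"replication strategy {strategy} is not allowed")
    if strategy in warn_list:
        # Soft limit: execute, but warn the client.
        return [f"replication strategy {strategy} is discouraged"]
    return []


# Defaults as described in the commit message: only the soft guardrail is set.
replication_strategy_warn_list = ["SimpleStrategy"]
replication_strategy_fail_list = []
```

Moving `SimpleStrategy` from the warn list to the fail list, as planned for new cloud deployments, turns the warning into a rejection without any code change.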

Resolves #5224
Refs #8892 (the replication strategy part, not the RF part)

Closes scylladb/scylladb#15399
2023-10-31 18:34:41 +03:00
Botond Dénes
287f05ad26 Merge 'scylla-sstable/tools: Use semi-properly initiated db::config + extensions to allow encrypted sstables' from Calle Wilund
Refs https://github.com/scylladb/scylla-enterprise/issues/3461
Refs https://github.com/scylladb/scylla-enterprise/issues/3210

Adds a tool-app global db::config + extensions to each tool invocation + configurable init.
Uses this in scylla-sstables, allowing both enterprise-only configs to be read, as well as (almost all)
encrypted sstables.

Note: Do not backport to enterprise before https://github.com/scylladb/scylla-enterprise/pull/3473 is merged, otherwise tools will break there.

Closes scylladb/scylladb#15615

* github.com:scylladb/scylladb:
  scylla-sstable: Use tool-global config + extensions
  tools: Add db config + extensions to tool app run
2023-10-31 14:21:57 +02:00
Pavel Emelyanov
b974d8ca1b stream_session: Do not print benign exceptions with error level
Handler of STREAM_MUTATION_FRAGMENTS verb creates and starts reader. The
resulting future is then checked for being exceptional and an error
message is printed in logs.

However, if reader fails because of socket being closed by peer, the
error looks excessive. In that case the exception is just regular
handling of the socket/stream closure and can be demoted down to debug
level.

fixes: #15891

Similar cherry-picking of log level exists in e.g. storage proxy, see
for example 56bd9b5d (service: storage_proxy: do not report abort
    requests in handle_write )

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15892
2023-10-31 14:21:22 +02:00
Gleb Natapov
15a34f650d gossip: remove unused HIBERNATE gossiper status
The status has not been used since 2ec1f719de,
which is included in scylla-4.6.0. We cannot have a mixed cluster with a
version that old, so the new version should not carry the compatibility
burden.
2023-10-31 14:08:38 +02:00
Gleb Natapov
35a1ac1a9a gossip: remove unused STATUS_MOVING state
Moving operation was removed by 4a0b561376
and since then the state is unused. Even back then it worked only for
the case of one token so it is safe to say we never used it. Let's
remove the remains of the code instead of carrying it forever.
2023-10-31 13:54:46 +02:00
Kefu Chai
2cd804b8e5 build: cmake: do not hardwire build_reloc.sh arguments
before this change, we feed `build_reloc.sh` with hardwired arguments
when building python3 submodule. but this is not flexible, and hurts
the maintainability.

in this change, we mirror the behavior of `configure.py`, and collect
the arguments from the output of `install-dependencies.sh`, and feed
the collected argument to `build_reloc.sh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15885
2023-10-31 13:27:12 +02:00
Botond Dénes
90a8489809 repair/repair.cc: do_repair_ranges(): prevent stalls when skipping ranges
We have observed do_repair_ranges() receiving tens of thousands of
ranges to repair on occasion. do_repair_ranges() repairs all ranges in
parallel, with parallel_for_each(). This is normally fine, as the lambda
inside parallel_for_each() takes a semaphore and this will result in
limited concurrency.
However, in some instances, it is possible that most of these ranges are
skipped. In this case the lambda will become synchronous, only logging a
message. This can cause stalls because there are no opportunities to
yield. Solve this by adding an explicit yield.
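
The stall and its fix can be illustrated with an asyncio analogy. This is not Seastar code and it processes ranges sequentially rather than with parallel_for_each(); `repair_ranges`, `should_skip`, and `repair_one` are hypothetical names. The point is only that the skip path must contain a yield of its own.

```python
import asyncio


async def repair_ranges(ranges, should_skip, repair_one):
    """Repair every range, yielding to the event loop even for skipped ones."""
    repaired = []
    for r in ranges:
        if should_skip(r):
            # Without this explicit yield, a long run of skipped ranges
            # executes synchronously and starves every other task.
            await asyncio.sleep(0)
            continue
        await repair_one(r)  # real work suspends, so it yields naturally
        repaired.append(r)
    return repaired
```

When every range is repaired, the awaits inside `repair_one` give other tasks a chance to run; the problem only appears when most ranges take the skip branch.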

Fixes: #14330

Closes scylladb/scylladb#15879
2023-10-31 13:24:54 +02:00
Avi Kivity
ef7db6df99 Merge 'schema_tables: turn view schema fixing code into a sanity check' from Kamil Braun
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.

The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).

The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.

The main motivation of this PR is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.

The migration code is now turned into a sanity check: if users
try something crazy, they will get an error instead of silent data
corruption.

Closes scylladb/scylladb#15695

* github.com:scylladb/scylladb:
  view: remove unused `_backing_secondary_index`
  schema_tables: turn view schema fixing code into a sanity check
  schema_tables: make comment more precise
  feature_service: make COMPUTED_COLUMNS feature unconditionally true
2023-10-31 13:23:19 +02:00
Kefu Chai
e853d7bb4b build: cmake: add Scylla_DATE_STAMP option
to be compatible with `configure.py` which allows us to optionally
specify the --date-stamp option for SCYLLA-VERSION-GEN. this option
is used by our CI workflow.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15896
2023-10-31 13:21:30 +02:00
Eliran Sinvani
2a45fed0cf test.py: move to a graceful termination of nodes on teardown
This change moves existing suites which create a cluster through the
testing infra to be stopped and uninstalled gracefully.
The motivation, besides the obvious advantage of testing our stop
sequence, is that it will pave the way for applying code coverage support
to all tests (not only standalone unit and boost test executables).

testing:
	Ran all tests 10 times in a row in dev mode.
	Ran all tests once in release mode
	Ran all tests once in debug mode

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-10-31 13:12:49 +02:00
Eliran Sinvani
62ec1fe8e0 test.py: Use stop lock also in the graceful version
An already known race (see: https://github.com/scylladb/scylladb/issues/15755)
has been found once again as part of moving all tests to stop all nodes
gracefully on teardown.
The solution was to add the lock acquisition also to `stop_gracefully`.
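
The general pattern can be sketched with asyncio. The `Node` class below is hypothetical and is not the real test.py server class; the details of the actual race are in the linked issue. The sketch only shows why both stop paths need to take the same lock.

```python
import asyncio


class Node:
    """Hypothetical test-harness node with two ways of stopping."""

    def __init__(self):
        self.stop_lock = asyncio.Lock()
        self.running = True
        self.shutdown_calls = 0

    async def _shutdown(self):
        if not self.running:
            return  # already stopped by a concurrent caller
        await asyncio.sleep(0)  # simulate waiting for the process to exit
        self.running = False
        self.shutdown_calls += 1

    async def stop(self):
        async with self.stop_lock:
            await self._shutdown()

    async def stop_gracefully(self):
        # Same lock as stop(): the two paths can no longer interleave
        # between the `running` check and the state update.
        async with self.stop_lock:
            await self._shutdown()
```

Without the lock in `stop_gracefully`, both callers could pass the `running` check before either of them finishes, and the shutdown would run twice.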

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-10-31 13:12:49 +02:00
Patryk Jędrzejczak
ba5275a6ae table_helper: announce twice in setup_keyspace
We refactor table_helper::setup_keyspace so that it calls
migration_manager::announce at most twice. We achieve it by
announcing all tables at once.

The number of announcements should further be reduced to one, but
it requires a big refactor. The CQL code used in
parse_new_cf_statement assumes the keyspace has already been
created. We cannot have such an assumption if we want to announce
a keyspace and its tables together. However, we shouldn't touch
the CQL code as it would impact user requests, too.

One solution is using schema_builder instead of the CQL statements
to create tables in table_helper.

Another approach is removing table_helper completely. It is used
only for the system_traces keyspace, which Scylla creates
automatically. We could refactor the way Scylla handles this
keyspace and make table_helper unneeded.
2023-10-31 12:08:04 +01:00
Patryk Jędrzejczak
bf15d5f7bb table_helper: refactor setup_table
In the following commit, we reduce migration_manager::announce
calls in table_helper::setup_keyspace by announcing all tables
together. To do it, we cannot use table_helper::setup_table
anymore, which announces a single table itself. However, the new
code still has to translate CQL statements, so we extract it to the
new parse_new_cf_statement function to avoid duplication.
2023-10-31 12:08:04 +01:00
Patryk Jędrzejczak
4dd5d8e5be redis: create_keyspace_if_not_exists_impl: fix indentation
Broken in the previous commit.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
3be7215163 redis: announce once in create_keyspace_if_not_exists_impl
We refactor create_keyspace_if_not_exists_impl so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
df199eec11 db: system_distributed_keyspace: fix indentation
Broken in the previous commit.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
91ff8007b3 db: system_distributed_keyspace: announce once in start
We refactor system_distributed_keyspace::start so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.

We remove a catch expression together with the FIXME from
get_updated_service_levels (add_new_columns_if_missing before the
patch) because we cannot treat the service_levels update
differently anymore.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
5027c5f1e5 tablet_allocator: update on_before_create_column_family
After adding the keyspace_metadata parameter to
migration_listener::on_before_create_column_family,
tablet_allocator doesn't need to load it from the database.

This change is necessary before merging migration_manager::announce
calls in the following commit.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
a762179972 migration_listener: add parameter to on_before_create_column_family
After adding the new prepare_new_column_family_announcement that
doesn't assume the existence of a keyspace, we also need to get
rid of the same assumption in all on_before_create_column_family
calls. After all, they may be initiated before creating the
keyspace. However, some listeners require keyspace_metadata, so we
pass it as a new parameter.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
a2e48b1a5b alternator: executor: use new prepare_new_column_family_announcement
We can use the new prepare_new_column_family_announcement function
that doesn't assume the existence of the keyspace instead of the
previous work-around.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
4ad2d895a3 alternator: executor: introduce create_keyspace_metadata
We need to store a new keyspace's keyspace_metadata as a local
variable in create_table_on_shard0. In the following commit, we
use it to call the new prepare_new_column_family_announcement
function.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
fb2703de50 migration_manager: add new prepare_new_column_family_announcement
In the following commits, we reduce the number of the
migration_manager::announce calls by merging some of them in a way
that logically makes sense. Some of these merges are similar --
we announce a new keyspace and its tables together. However,
we cannot use the current prepare_new_column_family_announcement
there because it assumes that the keyspace has already been created
(when it loads the keyspace from the database). Luckily, this
assumption is not necessary as this function only needs
keyspace_metadata. Instead of loading it from the database, we can
pass it as a parameter.
2023-10-31 12:08:03 +01:00
Kefu Chai
9dd5af7fef alternator: avoid using the deprecated API
this change silences following compiling warning due to using the
deprecated API by using the recommended API in place of the deprecated
one:

```
/home/kefu/dev/scylladb/alternator/server.cc:569:27: warning: 'set_tls_credentials' is deprecated: use listen(socket_address addr, server_credentials_ptr credentials) [-Wdeprecated-declarations]
            _https_server.set_tls_credentials(creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
                          ^
/home/kefu/dev/scylladb/seastar/include/seastar/http/httpd.hh:186:7: note: 'set_tls_credentials' has been explicitly marked deprecated here
    [[deprecated("use listen(socket_address addr, server_credentials_ptr credentials)")]]
      ^
1 warning generated.
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15884
2023-10-31 12:05:58 +03:00
Botond Dénes
4a0f16474f Merge 'row_cache: abort on external_updater::execute errors' from Benny Halevy
Currently the cache updaters aren't exception safe,
yet they are intended to be.

Instead of allowing exceptions from
`external_updater::execute` to escape `row_cache::update`,
abort using `on_fatal_internal_error`.

Future changes should harden all `execute` implementations
to effectively make them `noexcept`, then the pure virtual
definition can be made `noexcept` to cement that.

Fixes scylladb/scylladb#15576

Closes scylladb/scylladb#15577

* github.com:scylladb/scylladb:
  row_cache: abort on external_updater::execute errors
  row_cache: do_update: simplify _prev_snapshot_pos setup
2023-10-31 10:07:01 +02:00
Pavel Emelyanov
4db80ed61f table_for_tests: Use test_env's compaction manager
Now that sstables::test_env provides the compaction manager
instance, table_for_tests can start using it and drop its own
compaction manager and the sidecar task_manager.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:42:19 +03:00
Pavel Emelyanov
2c78b46c78 sstables::test_env: Carry compaction manager on board
Most of the test cases that use sstables::test_env do not mess with
table objects, they only need sstables. However, compaction test cases
do need table objects and, respectively, a compaction manager instance.
Today those test cases create compaction manager instance for each table
they create, but that's a bit heavyweight and doesn't work the way core
code works. This patch prepares the sstables::test_env to provide
compaction manager on demand by starting it as soon as it's asked to
create table object.

For now this compaction manager is unused, but it will be in next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:39:54 +03:00
Pavel Emelyanov
b96d39e63a table_for_tests: Stop table on stop
Next patches will stop using compaction manager from table_for_tests in
favor of external one (spoiler: the one from sstables::test_env), thus
the compaction manager would outlive the table_for_tests object and
the table object wrapped by it. So in order for the table_for_tests to
stop correctly, it also needs to stop the wrapped table.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:38:03 +03:00
Pavel Emelyanov
e71409df38 table_for_tests: Get compaction manager from table
There's table_for_tests::get_compaction_manager() helper that's
excessive as compaction manager reference can be provided by the wrapped
table object itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:37:22 +03:00
Pavel Emelyanov
ac45aae0c4 table_for_tests: Ditch on-board concurrency semaphore
It's not used any longer and can be removed. This makes table_for_tests
stopping code a bit shorter as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:36:59 +03:00
Pavel Emelyanov
21998296a7 table_for_tests: Require config argument to make table
This is the continuation of the previous patch. Make the caller of
table_for_tests constructor provide the table::config. This makes the
table_for_tests constructor shorter and more self-contained.

Also, the caller now needs to provide the reference to reader
concurrency semaphore, and that's good news, because the only caller for
today is the sstables::test_env that does have it. This makes the
semaphore sitting on table_for_tests itself unused and it will be
removed eventually.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:34:59 +03:00
Pavel Emelyanov
5ab1af3804 table_for_tests: Create table config locally
The table_for_tests keeps a copy of table::config on board. That's not
"idiomatic" as table config is a temporary object that should only be
needed while creating table object. Fortunately, the copy of config on
table_for_tests is no longer needed and it can be made temporary.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:33:29 +03:00
Pavel Emelyanov
76e57cc805 table_for_tests: Get concurrency semaphore from table
Making compaction permit needs a semaphore. Current code gets it from
the table_for_tests, but the very same semaphore reference sits on the
table. So get it from table, as the core code does. This will allow
removing the dedicated semaphore from table_for_tests in the future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:32:32 +03:00
Pavel Emelyanov
35f7ada949 table_for_tests: Get table directory from table itself
Making sstable for a table needs passing table directory as an argument.
Current table_for_tests's helper gets the directory from table config,
but the very same path sits on the table itself. This makes testing code
to construct sstable look closer to the core code and is also the
prerequisite for removing the table config from table_for_tests in the
future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:30:59 +03:00
Pavel Emelyanov
769d9f17eb table_for_tests: Reuse cache tracker from sstables manager
When making table object it needs the cache tracker reference. The
table_for_tests keeps one on board, but the very same object already
sits on the sstables manager which has public getter.

This makes the table_for_tests's cache tracker object not needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:29:49 +03:00
Pavel Emelyanov
89e253c77e table_for_tests: Remove unused constructor
No code constructs it with just sstables manager argument.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:29:29 +03:00
Pavel Emelyanov
cba8f633f1 tests: Split the compaction backlog test case
To improve parallelism of embedded test sub-cases.
By coincidence, indentation fix is not required.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:27:57 +03:00
Pavel Emelyanov
8d704f2532 sstable_test_env: Coroutinize and move to .cc test_env::stop()
It's going to get larger, so better to move.
Also when coroutinized it's going to be easier to extend.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-31 09:26:58 +03:00
Kefu Chai
89a75967b1 build: ignore FileExistsError when creating compile_commands.json
before this change, we only check the existence of compile_commands.json
before creating a symlink to build/*/compile_commands.json. but there are
chances that multiple ninja tasks are calling into `configure.py` for
updating `build.ninja`: this does not break the process, as the last one
wins: we just unconditionally `mv build.ninja.new build.ninja` for
updating this file. but this could break the build of
`compile_commands.json`: we create a symlink with Python, and if it
fails the Python script errors out.

in this change, we just ignore the `FileExistsError` when creating
the symlink to `compile_commands.json`. because, if this symlink exists,
we've achieved the goal, and should not consider it a failure.
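
a minimal sketch of the pattern, assuming nothing about the actual `configure.py` code beyond what is described above; `link_compile_commands` and its parameters are hypothetical names:

```python
import os


def link_compile_commands(target, link_name="compile_commands.json"):
    """Create link_name -> target, tolerating a concurrent creator."""
    if os.path.exists(link_name):
        return
    try:
        os.symlink(target, link_name)
    except FileExistsError:
        # Another configure run created the link between the existence
        # check and the symlink() call; the goal is achieved either way.
        pass
```

the existence check alone is not enough because it and `os.symlink()` are two separate steps, so a second process can create the link in between.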

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15870
2023-10-30 23:47:48 +02:00
Anna Stuchlik
d4b1e8441a doc: add the latest AWS image info to Installation
This commit adds the AWS image information for
the latest patch release to the Launch on AWS
page in the installation section.

This is a follow-up PR required to finalize
the AWS installation docs and should be
backported to branch-5.4.

Related:
https://github.com/scylladb/scylladb/pull/14153
https://github.com/scylladb/scylladb/pull/15651

Closes scylladb/scylladb#15867
2023-10-30 23:41:23 +02:00
Avi Kivity
949e9f1205 Merge 'Nodetool additional commands 3/N' from Botond Dénes
This PR implements the following new nodetool commands:
* cleanup
* clearsnapshot
* listsnapshots

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#15843

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the listsnapshots command
  tools/scylla-nodetool: implement clearsnapshot command
  tools/scylla-nodetool: implement the cleanup command
  test/nodetool: rest_api_mock: add more options for multiple requests
  tools/scylla-nodetool: log responses with trace level
2023-10-30 21:53:36 +02:00
Avi Kivity
5a7d15a666 Update seastar submodule
* seastar 17183ed4e4...830ce86738 (6):
  > coroutine: fix use-after-free in parallel_for_each
  > build: do not provide zlib as an ingredient
  > http: do not use req.content_length as both input parameter
  > io_tester: disable -Wuninitialized when including boost.accumulators
  > scheduling: revise the doxygen comment of create_scheduling_group()
  > Merge 'Added ability to configure different credentials per HTTP listeners' from Michał Maślanka

Closes scylladb/scylladb#15871
2023-10-30 21:39:12 +02:00
Avi Kivity
03a801b61b Merge 'Nodetools docs improvements 1/N' from Botond Dénes
While working on https://github.com/scylladb/scylladb/issues/15588, I noticed problems with the existing documentation, when comparing it with the actual code.
This PR contains fixes for nodetool compact, stop and scrub.

Closes scylladb/scylladb#15636

* github.com:scylladb/scylladb:
  docs: nodetool compact: remove common arguments
  docs: nodetool stop: fix compaction types and examples
  docs: nodetool compact: remove unsupported partition option
2023-10-30 20:17:14 +02:00
Pavel Emelyanov
c88de8f91e test/compaction: Use shorter make_table_for_tests() overload
There's one that doesn't need tempdir path argument since it gets one
from the env onboard tempdir anyway

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15825
2023-10-30 20:16:29 +02:00
Paweł Zakrzewski
384427bd02 doc: Replace instances of SimpleStrategy with NetworkTopologyStrategy
The goal is to make the available defaults safe for future use, as they
are often taken from existing config files or documentation verbatim.

Referenced issue: #14290

Closes scylladb/scylladb#15856
2023-10-30 20:15:48 +02:00
Pavel Emelyanov
7fa7a9495d task_manager: Don't leave task_ttl uninitialized
When task_manager is constructed without config (tests) its task_ttl is
left uninitialized (i.e. -- random number gets in there). This results
in tasks hanging around being registered for infinite amount of time
making long-living task manager look hung.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15859
2023-10-30 20:15:05 +02:00
Kefu Chai
d01b9f95a0 build: cmake: disable sanitize-address-use-after-scope only when needed
we enable sanitizers only in the Debug and Sanitize build modes. if we
pass `-fno-sanitize-address-use-after-scope` to the compiler while the
sanitizers are not enabled, Clang complains like:

```
clang-16: error: argument unused during compilation: '-fno-sanitize-address-use-after-scope' [-Werror,-Wunused-command-line-argument]
```

this breaks the build on the build modes where sanitizers are not
enabled.

so, in this change, we only disable the sanitize-address-use-after-scope
sanitizer if the sanitizers are enabled.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15868
2023-10-30 20:14:12 +02:00
Anna Stuchlik
9f85b1dc38 doc: remove version "5.3" from the docs
Version 5.3 was never released. This commit
removes mentions of the version from the docs.
2023-10-30 15:56:53 +01:00
Anna Stuchlik
8723f71a3d doc: remove the 5.2-to-5.3 upgrade guide
Version 5.3 was never released, so the upgrade
guide must be removed.
2023-10-30 15:47:23 +01:00
Marcin Maliszkiewicz
3992d1c2ce alternator: add support for ReturnValuesOnConditionCheckFailure feature
As announced in https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-dynamodb-cost-failed-conditional-writes/,
DynamoDB added a new option for write operations (PutItem, UpdateItem, or DeleteItem),
ReturnValuesOnConditionCheckFailure, which if set to ALL_OLD returns the
current value of the item - but only if a condition check failed.
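
The behaviour described above can be modelled with a small in-memory sketch. This is not Alternator's code: `ConditionalCheckFailed`, `put_item` and the dict-based table are hypothetical names used only for illustration, and `NONE` as the default value is an assumption taken from the DynamoDB API rather than from this commit.

```python
class ConditionalCheckFailed(Exception):
    """Hypothetical error type; `item` models the extra field in the error."""

    def __init__(self, item=None):
        super().__init__("The conditional request failed")
        self.item = item


def put_item(table, key, new_item, condition,
             return_values_on_condition_check_failure="NONE"):
    """Store new_item under key if condition(current_item) holds."""
    current = table.get(key)
    if not condition(current):
        # The current item is attached only when the caller asked for it.
        wants_old = return_values_on_condition_check_failure == "ALL_OLD"
        old = dict(current) if (wants_old and current is not None) else None
        raise ConditionalCheckFailed(item=old)
    table[key] = new_item


table = {"k1": {"id": "k1", "v": 1}}
returned = None
try:
    # The condition "item must not exist yet" fails because k1 is present.
    put_item(table, "k1", {"id": "k1", "v": 2},
             condition=lambda cur: cur is None,
             return_values_on_condition_check_failure="ALL_OLD")
except ConditionalCheckFailed as e:
    returned = e.item
```

The failed write leaves the table untouched, and the caller gets the current item without issuing a second read.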

Fixes https://github.com/scylladb/scylladb/issues/14481
2023-10-30 15:33:56 +01:00
Marcin Maliszkiewicz
b4c77a373d alternator: add ability to send additional fields in api_error
While it may not be explicitly documented, DynamoDB sometimes enriches error
messages with additional fields. For instance, when ConditionalCheckFailedException
occurs while ReturnValuesOnConditionCheckFailure is set, it adds an Item object;
similarly, for TransactionCanceledException it adds a CancellationReasons object.
There may be more cases like this, so a generic JSON field is added to our error class.

The change will be used by future commit implementing ReturnValuesOnConditionCheckFailure
feature.
2023-10-30 15:13:06 +01:00
Calle Wilund
b9e57583f3 scylla-sstable: Use tool-global config + extensions
Uses a single db::config + extensions, allowing both handling
of enterprise-only scylla.yaml keys, as well as loading sstables
utilizing extension in that universe.
2023-10-30 10:22:12 +00:00
Calle Wilund
6de4e7af21 tools: Add db config + extensions to tool app run
Initializes extensions for tools runs, allowing potentially more interaction
with, say, sstables in some versions of scylla.
2023-10-30 10:20:53 +00:00
Avi Kivity
d450a145ce Revert "Merge 'reduce announcements of the automatic schema changes ' from Patryk Jędrzejczak"
This reverts commit 4b80130b0b, reversing
changes made to a5519c7c1f. It's suspected
of causing dtest failures due to a bug in coroutine::parallel_for_each.
2023-10-29 18:32:06 +02:00
Wojciech Mitros
f08e7aad61 test: account for multiple flushes of commitlog segments
Currently, when we calculate the number of deactivated segments
in test_commitlog_delete_when_over_disk_limit, we only count the
segments that were active during the first flush. However, during
the test, there may have been more than one flush, and a segment
could have been created between them. This segment would sometimes
get deactivated and even destroyed, and as a result, the count of
destroyed segments would appear larger than the count of deactivated
ones.

This patch fixes this behavior by accounting for all segments that
were active during any flush instead of just segments active during
the first flush.
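
The accounting change can be illustrated with a toy model, assuming each flush is represented by the set of segment ids active at that moment. The function name and the data are made up for illustration and are not taken from the test.

```python
def active_segments_across_flushes(flushes):
    """Return the ids of all segments that were active during any flush."""
    seen = set()
    for active_during_flush in flushes:
        seen.update(active_during_flush)
    return seen


# Segment 3 was created between the first and the second flush.
flushes = [{1, 2}, {2, 3}]
destroyed = {1, 3}

first_only = set(flushes[0])
all_flushes = active_segments_across_flushes(flushes)

# Counting only the first flush misses segment 3, so the destroyed
# segments look more numerous than the deactivated ones.
missed = destroyed - first_only
```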

Fixes #10527

Closes scylladb/scylladb#14610
2023-10-29 18:30:32 +02:00
Michał Chojnowski
93ea3d41d8 position_in_partition: make operator= exception-safe
The copy assignment operator of _ck can throw
after _type and _bound_weight have already been changed.
This leaves position_in_partition in an inconsistent state,
potentially leading to various weird symptoms.

The problem was witnessed by test_exception_safety_of_reads.
Specifically: in cache_flat_mutation_reader::add_to_buffer,
which requires the assignment to _lower_bound to be exception-safe.

The easy fix is to perform the only potentially-throwing step first.
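
The ordering idea can be sketched in Python, although the real code is C++. `Position`, its fields and `Unclonable` are stand-ins invented for this illustration; the only point carried over from the commit is that the step which may fail runs before any field is modified.

```python
import copy


class Position:
    """Toy stand-in with three fields; not the real class."""

    def __init__(self, type_, bound_weight, ck):
        self.type_ = type_
        self.bound_weight = bound_weight
        self.ck = ck

    def assign(self, other):
        # The only step that may raise goes first: nothing is modified yet.
        new_ck = copy.deepcopy(other.ck)
        # The remaining steps cannot raise, so the update is all-or-nothing.
        self.ck = new_ck
        self.type_ = other.type_
        self.bound_weight = other.bound_weight


class Unclonable:
    """Simulates a key whose copy fails."""

    def __deepcopy__(self, memo):
        raise MemoryError("simulated allocation failure")


p = Position("row", 0, ["a"])
try:
    p.assign(Position("range", 1, Unclonable()))
except MemoryError:
    pass  # p must still hold its original, consistent state
```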

Fixes #15822

Closes scylladb/scylladb#15864
2023-10-29 18:30:32 +02:00
Andrii Patsula
5807ef0bb7 test: Verify server exit code during graceful process shutdown.
Currently, it's possible for a test to pass even if the server crashes
during a graceful shutdown. Additionally, the server may crash in the
middle of a test, resulting in a test failure with an inaccurate
description.  This commit updates the test framework to monitor the
server's return code and throw an exception in the event of an abnormal
server shutdown.
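
A minimal sketch of such a check, assuming the server is managed through `subprocess.Popen`. The helper names are hypothetical, and a short-lived Python child process stands in for the server; the actual framework code may differ.

```python
import subprocess
import sys


def wait_and_verify(proc, timeout=30):
    """Wait for proc to exit and raise if it did not exit cleanly."""
    code = proc.wait(timeout=timeout)
    if code != 0:
        # On POSIX a negative code means the process was killed by a signal.
        raise RuntimeError(f"server exited abnormally with code {code}")
    return code


def spawn(exit_code):
    # A child interpreter that exits immediately stands in for the server.
    return subprocess.Popen(
        [sys.executable, "-c", f"import sys; sys.exit({exit_code})"])


clean = wait_and_verify(spawn(0))
try:
    wait_and_verify(spawn(3))
    crashed = False
except RuntimeError:
    crashed = True
```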

Fixes scylladb/scylla#15365

Closes scylladb/scylladb#15660
2023-10-29 18:30:32 +02:00
Kefu Chai
2be5a86a14 test/pylib: unset the env variables set by MinIoServer
before this change, when running object_store tests with `pytest`
directly, an instance of MinIoServer is started as a function-scope
fixture, but the environment variables set by it stay with the
process, even after the fixture is torn down. So, when the 2nd test
in the same process checks these environment variables, it is
under the impression that there is already an S3 server running,
thinks it is driven by `test.py`, and hence tries to reuse the S3 server.
But the MinIoServer instance has already been torn down by then, when
the first test completed.

So the test is likely to fail when the Scylla instance tries
to read the missing conf file previously created by the MinIoServer.

after this change, the environmental variables are reset, so they
won't be seen by the succeeding tests in the same pytest session.
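
One way to get this behaviour is a context manager that restores the environment on exit. This is only a sketch of the idea, not the MinIoServer code; `scoped_env` and the variable name `EXAMPLE_S3_ADDRESS` are made up.

```python
import os
from contextlib import contextmanager


@contextmanager
def scoped_env(**variables):
    """Set environment variables and restore the previous state on exit."""
    saved = {name: os.environ.get(name) for name in variables}
    os.environ.update(variables)
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)  # was not set before: unset it
            else:
                os.environ[name] = old      # was set before: put it back


with scoped_env(EXAMPLE_S3_ADDRESS="127.0.0.1"):
    inside = os.environ.get("EXAMPLE_S3_ADDRESS")
after = os.environ.get("EXAMPLE_S3_ADDRESS")
```

Because the cleanup sits in a `finally` block, the variables are restored even if the body of the `with` statement raises.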

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15779
2023-10-29 18:30:32 +02:00
Botond Dénes
132ae92c75 Merge 'build: extract code fragments into functions' from Kefu Chai
this series is one of the steps to remove global statements in `configure.py`.

not only is the script more structured this way, this also allows us to quickly identify the parts which should/can be reused when migrating to a CMake-based build system.

Refs #15379

Closes scylladb/scylladb#15818

* github.com:scylladb/scylladb:
  build: move the code with side effects into a single function
  build: create outdir when outdir is explictly used
  build: group the code with side effects together
  build: do not rely on updating global with a dict
  build: extract generate_version() out
  build: extract get_release_cxxflags() out
  build: extract get_extra_cxxflags() out
  build: move thrift_libs to where it is used
  build: move pkg closer to where it is used
  build: remove unused variable
  build: move variable closer to where it is used
2023-10-29 18:30:32 +02:00
Avi Kivity
e349a2657c Merge 'Allow running perf-simple-query with tablets' from Tomasz Grabiec
Usage:

```
build/dev/scylla perf-simple-query --tablets
```

Closes scylladb/scylladb#15656

* github.com:scylladb/scylladb:
  perf_simple_query: Allow running with tablets
  tests: cql_test_env: Allow creating keyspace with tablets
  tests: cql_test_env: Register storage_service in migration notifier
  test: cql_test_env: Initialize node state in topology
2023-10-29 18:30:32 +02:00
Aleksandr Bykov
6b991b4791 doc: add note about run test.py with toolchain/dbuild
test.py tests can be run with toolchain/dbuild, and in this case
there is no need to execute ./install-dependencies.sh.

Closes scylladb/scylladb#15837
2023-10-29 18:30:32 +02:00
Kefu Chai
3a6e359328 build: cmake: add token_metadata.cc to api
`token_metadata.cc` moved into api in e4c0a4d34d, let's update CMake
accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15857
2023-10-29 18:30:32 +02:00
Kefu Chai
8819865c8d build: cmake: correct the variable names in mode.Dev.cmake
it was a copy-pasta error.

- s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/
- s/Seastar_OptimizationLevel_RELEASE/Seastar_OptimizationLevel_DEV/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15849
2023-10-29 18:30:32 +02:00
Kamil Braun
1c0ae2e7ef Merge 'raft topology: assign tokens after join node response rpc' from Piotr Dulikowski
Currently, when the topology coordinator accepts a node, it moves it to bootstrap state and assigns tokens to it (either new ones during bootstrap, or the replaced node's tokens). Only then it contacts the joining node to tell it about the decision and let it perform a read barrier.

However, this means that the tokens are inserted too early. After inserting the tokens the cluster is free to route write requests to it, but it might not have learned about all of the schema yet.

Fix the issue by inserting the tokens later, after completing the join node response RPC which forces the receiving node to perform a read barrier.

Refs: scylladb/scylladb#15686
Fixes: scylladb/scylladb#15738

Closes scylladb/scylladb#15724

* github.com:scylladb/scylladb:
  test: test_topology_ops: continuously write during the test
  raft topology: assign tokens after join node response rpc
  storage_service: fix indentation after previous commit
  raft topology: loosen assumptions about transition nodes having tokens
2023-10-29 18:30:32 +02:00
Marcin Maliszkiewicz
020a9c931b db: view: run local materialized view mutations on a separate smp service group
When a base write triggers an mv write that needs to be sent to another
shard, it used the same service group, and we could end up with a
deadlock.

This fix also affects alternator's secondary indexes.

Testing was done using a not yet committed framework for easy alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I've changed the hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:

./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000

Without the patch, when scylla is overloaded (i.e. the number of scheduled futures is close to max_nonlocal_requests), after a couple of seconds
scylla hangs, cpu usage drops to zero, and no progress is made. We can confirm we're hitting this issue by seeing under gdb:

p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0

With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency,
but I think it's hitting a different limit, as there wasn't any depleted
smp service group semaphore and it was also happening on non-mv loads.

Fixes https://github.com/scylladb/scylladb/issues/15844

Closes scylladb/scylladb#15845
2023-10-29 18:30:32 +02:00
Patryk Jędrzejczak
a6236072ee raft topology: join_node_request_handler: wait until first node becomes normal
We need to wait until the first node becomes normal in
`join_node_request_handler` to ensure that joining nodes are not
handled as the first node in the cluster.

If we placed a join request before the first node becomes normal,
the topology coordinator would incorrectly skip the join node
handshake in `handle_node_transition` (`case node_state::none`).
It would happen because the topology coordinator decides whether
a node is the first in the cluster by checking if there are no
normal nodes. Therefore, we must ensure at least one normal node
when the topology coordinator handles a join request for a
non-first node.

We change the previous check because it can return true if there
are no normal nodes. `topology::is_empty` would also return false
if the first node was still new or in transition.

Additionally, calling `join_node_request_handler` before the first
node sets itself as normal is frequent during concurrent bootstrap,
so we remove "unlikely" from the comment.

Fixes: scylladb/scylladb#15807

Closes scylladb/scylladb#15775
2023-10-29 18:30:32 +02:00
Botond Dénes
16ce212c31 tools/scylla-nodetool: implement the listsnapshots command
The output is changed slightly, compared to the current nodetool:
* Number columns are aligned to the right
* Number columns don't have decimal places
* There are no trailing whitespaces
2023-10-27 01:26:54 -04:00
Botond Dénes
27854a50be tools/scylla-nodetool: implement clearsnapshot command
2023-10-27 01:26:54 -04:00
Botond Dénes
b32ee54ba0 tools/scylla-nodetool: implement the cleanup command
The --jobs command-line argument is accepted but ignored, just like the
current nodetool does.
2023-10-27 01:26:53 -04:00
Botond Dénes
7e3a78d73d test/nodetool: rest_api_mock: add more options for multiple requests
Change the current bool multiple param to a weak enum, allowing for a
third value: ANY, which allows for 0 matches too.
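
A sketch of what such a three-valued parameter could look like, assuming `enum.IntEnum` is what "weak enum" refers to. Only `ANY` is named in the commit message; `ONE`, `MULTIPLE` and `match_count_ok` are hypothetical names for illustration.

```python
import enum


class Multiple(enum.IntEnum):
    ONE = 0       # exactly one matching request
    MULTIPLE = 1  # one or more matching requests
    ANY = 2       # zero or more matching requests


def match_count_ok(multiple, count):
    """Check whether the number of matched requests satisfies the policy."""
    if multiple == Multiple.ONE:
        return count == 1
    if multiple == Multiple.MULTIPLE:
        return count >= 1
    return count >= 0
```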
2023-10-26 08:31:12 -04:00
Botond Dénes
b878dcc1c3 tools/scylla-nodetool: log responses with trace level
With this, both requests and responses to/from the remote are logged
when trace-level logging is enabled. This should greatly simplify
debugging any problems.
2023-10-26 08:28:37 -04:00
Anna Stuchlik
eb57c3bc22 doc: remove versions from Materialized Views
This commit removes irrelevant information
about versions from the Materialized Views
page (CQL Reference).
In addition, it replaces "Scylla" with
"ScyllaDB" on MV-related pages.
2023-10-26 12:08:13 +02:00
Anna Stuchlik
29bd044db3 doc: add CQL Reference for Materialized Views
This commit adds CQL Reference for Materialized
Views to the Materialized Views page.
2023-10-26 11:47:22 +02:00
Kefu Chai
227136ddf5 main.cc: specify shortname for scheduling groups
so, for instance, the logging message looks like:
```
INFO  2023-10-24 15:19:37,290 [shard 0:strm] storage_service - entering STARTING mode
```
instead of
```
INFO  2023-10-24 15:19:37,290 [shard 0:stre] storage_service - entering STARTING mode
```

Fixes #15267
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15821
2023-10-26 10:52:05 +03:00
Kefu Chai
d43afd576e cql3/restrictions/statement_restrictions: s/allow filtering/ALLOW FILTERING/
use the capitalized "ALLOW FILTERING" in the error message, because the
error message is a part of the user interface, it would be better to
keep it aligned with our documentation, where "ALLOW FILTERING" is used.

so, in this change, the lower-cased "allow filtering" error message is
changed to "ALLOW FILTERING", and the tests are updated accordingly.

see also a0ffbf3291

Refs #14321
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15718
2023-10-26 10:00:37 +03:00
Kefu Chai
bfd99fad7f build: move the code with side effects into a single function
so that we can optionally utilize CMake for generating the building
system instead.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
85cc9073c9 build: create outdir when outdir is explictly used
actually we've created outdir when using it as the parent directory
of `tempfile.tempdir`, but there are many places where we use
`tempfile.tempdir` for, for instance, testing the compiler flags,
and these tests will be removed once we migrate to CMake, so they
do not really matter when reviewing the change which migrates to
CMake.

the point of this change is to help the reviewer understand the major
changes performed by the migration.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
6c7cc927b5 build: group the code with side effects together
so we can move them into a single function

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
a375ce2ac1 build: do not rely on updating global with a dict
we use `globals().update(vars(args))` for updating the global variables
with a dict in `args`, this is convenient, but it hurts the readability.
let's reference the parsed options explicitly.
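
The difference between the two styles can be shown with a small `argparse` sketch. The options `--mode` and `--out` and the `describe` helper are illustrative assumptions, not the actual `configure.py` code.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--mode", default="dev")
parser.add_argument("--out", dest="buildfile", default="build.ninja")
args = parser.parse_args(["--mode", "release"])

# Implicit style: every parsed option silently becomes a module-level name.
#   globals().update(vars(args))
#   print(mode)   # the reader cannot tell where `mode` comes from


# Explicit style: the origin of each value is visible at the point of use.
def describe(args):
    return f"mode={args.mode} buildfile={args.buildfile}"


summary = describe(args)
```

The explicit form also lets linters and editors resolve the names, which the `globals()` update defeats.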

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
a25a153e9f build: extract generate_version() out
so we do fewer things with side effects in the global scope.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
cb6531b1a8 build: extract get_release_cxxflags() out
prepare for the change to read the SCYLLA-*-FILE files in functions
instead of in the global scope.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
ec7ac3c750 build: extract get_extra_cxxflags() out
on top of per-mode cxxflags, we apply more of them based on settings
and building environment. to reduce the statements in global scope,
let's extract the related code into a function.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
8646e6c5d1 build: move thrift_libs to where it is used
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 11:47:38 +08:00
Kefu Chai
8b76f2a835 build: move pkg closer to where it is used
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 11:47:37 +08:00
Kefu Chai
ea6bf6b908 build: remove unused variable
`optional_packages` was introduced in 8b0a26f06d, but we don't
offer the alternative versions of libsystemd anymore, and this
variable is not used in `configure.py`, so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 11:47:37 +08:00
Kefu Chai
846218a8bc build: move variable closer to where it is used
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 11:47:37 +08:00
Yaniv Kaul
600822379d Docs: small typo in cql extensions page
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#15840
2023-10-25 17:27:04 +03:00
Botond Dénes
5d1e9d8c46 Merge 'Sanitize API -> token_metadata dependency' from Pavel Emelyanov
This is the continuation for 19fc01be23

Registering API handlers for services needs to

* use only the required service (sharded<> one if needed)
* get the service to handle requests via argument, not from http context (http context, in turn, is going to not depend on anything)

There are several endpoints scattered over storage_service and snitch that use token metadata and topology. This PR makes those endpoints work the described way and drop the api::ctx -> token_metadata dependency.

Closes scylladb/scylladb#15831

* github.com:scylladb/scylladb:
  api: Remove http::context -> token_metadata dependency
  api: Pass shared_token_metadata instead of storage_service
  api: Move snitch endpoints that use token metadata only
  api: Move storage_service endpoints that use token metadata only
2023-10-25 17:19:39 +03:00
Anna Stuchlik
ad29ba4cad doc: add info about encrypted tables to Backup
This commit updates the introduction of the Backup Your Data page to include information about encryption.

Fixes https://github.com/scylladb/scylladb/issues/15573

Closes scylladb/scylladb#15612
2023-10-25 17:15:15 +03:00
Avi Kivity
782c6a208a Merge 'cql3: mutation_fragments_select_statement: keep erm alive for duration of the query' from Botond Dénes
Said statement keeps a reference to erm indirectly, via a topology node pointer, but doesn't keep erm alive. This can result in use-after-free. Furthermore, it allows for vnodes being pulled from under the query's feet, as it is running.
To prevent this, keep the erm alive for the duration of the query.
Also, use `host_id` instead of `node`, the node pointer is not needed really, as the statement only uses the host id from it.

Fixes: #15802

Closes scylladb/scylladb#15808

* github.com:scylladb/scylladb:
  cql3: mutation_fragments_select_statement: use host_id instead of node
  cql3: mutation_fragments_select_statement: pin erm reference
2023-10-25 15:03:07 +03:00
Gleb Natapov
9f6e93c144 raft: make sure that all operation forwarded to a leader are completed before destroying raft server
Hold a gate around all operations that are forwarded to a leader to be
able to wait for them during server::abort() otherwise the abort() may
complete while those operations are still running which may cause use
after free.
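
The gate pattern can be sketched with `asyncio`. This is an analogy, not the Raft server's C++ code: `Gate`, `forwarded_op` and the sleep standing in for the RPC are all illustrative assumptions.

```python
import asyncio


class Gate:
    """Counts in-flight operations; close() waits until all have left."""

    def __init__(self):
        self._count = 0
        self._closed = False
        self._idle = asyncio.Event()
        self._idle.set()

    def enter(self):
        if self._closed:
            raise RuntimeError("gate closed")
        self._count += 1
        self._idle.clear()

    def leave(self):
        self._count -= 1
        if self._count == 0:
            self._idle.set()

    async def close(self):
        self._closed = True       # refuse new operations
        await self._idle.wait()   # wait for the running ones to finish


async def forwarded_op(gate, log):
    gate.enter()
    try:
        await asyncio.sleep(0.01)  # stands in for the RPC to the leader
        log.append("op done")
    finally:
        gate.leave()


async def main():
    gate, log = Gate(), []
    task = asyncio.ensure_future(forwarded_op(gate, log))
    await asyncio.sleep(0)  # let the operation enter the gate
    await gate.close()      # an abort would wait here
    log.append("aborted")
    await task
    return log


events = asyncio.run(main())
```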
2023-10-25 13:29:36 +03:00
Gleb Natapov
ba044b769a storage_service: raft topology: remove code duplication from global_tablet_token_metadata_barrier
global_token_metadata_barrier and global_tablet_token_metadata_barrier
are doing practically the same thing now. Call the former from the
latter.
2023-10-25 13:29:36 +03:00
Gleb Natapov
72419f1a61 tests: add tests for streaming failure in bootstrap/replace/remove/decomission
2023-10-25 13:29:36 +03:00
Gleb Natapov
b072ddd8a7 test/pylib: do not stop node if decommission failed with an expected error
2023-10-25 13:03:57 +03:00
Gleb Natapov
cee7aab32c storage_service: raft topology: fix typo in "decommission" everywhere
2023-10-25 13:03:57 +03:00
Gleb Natapov
0201304096 storage_service: raft topology: add streaming error injection
Add error injection into the stream_ranges topology command.
2023-10-25 13:03:57 +03:00
Gleb Natapov
ba217d9341 storage_service: raft topology: do not increase topology version during CDC repair
The CDC repair operation does not change the topology, but it goes through
the same state as bootstrap, which does. Distinguish between the two cases and
increment the topology version only in the case of bootstrap.
2023-10-25 13:03:56 +03:00
Gleb Natapov
8e393ea750 storage_service: raft topology: rollback topology operation on streaming failure.
Currently, if streaming fails during a topology operation, the streaming
is retried until it succeeds. If it never succeeds, it is
retried forever. There is no way to stop the topology operation.

This patch introduces a rollback mechanism on streaming failure. If
streaming fails during bootstrap/replace, the bootstrapping/replacing node
is moved to the left_token_ring state (and then the left state)
and the operation has to be restarted after removing the data directory. If
streaming fails during decommission/remove, the node is moved back to
normal and the operation needs to be restarted after the cause of the
failure is eliminated.
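
The rollback rules above can be summarised as a lookup table. The state and operation names follow the commit message, while the table and the function are hypothetical and only illustrate the transitions.

```python
# States the affected node moves through after a streaming failure.
ROLLBACK_PATH = {
    "bootstrap": ["left_token_ring", "left"],
    "replace": ["left_token_ring", "left"],
    "decommission": ["normal"],
    "remove": ["normal"],
}


def rollback_path(operation):
    """Return the states the node goes through when streaming fails."""
    return list(ROLLBACK_PATH[operation])
```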
2023-10-25 13:03:55 +03:00
Gleb Natapov
0a8c3e5c78 storage_service: raft topology: load request parameters in left_token_ring state as well
Next patch will want to access request parameters in left_token_ring for
failure recovery purposes.
2023-10-25 12:56:27 +03:00
Gleb Natapov
49b6153d27 storage_service: raft topology: do not report term_changed_error during global_token_metadata_barrier as an error
Term change is not an error. Do not report it as such.
2023-10-25 12:56:27 +03:00
Gleb Natapov
5b760572df storage_service: raft topology: change global_token_metadata_barrier error handling to try/catch
Currently we get a future and check whether it has failed, but with
coroutines the complication is not needed. And since we want to filter
out some errors in the next patch, try/catch will be more
effective.
2023-10-25 12:56:27 +03:00
Gleb Natapov
466fe35474 storage_service: raft topology: make global_token_metadata_barrier node independent
We want to use global_token_metadata_barrier without the node, so make
it accept guard and excluded nodes directly.
2023-10-25 12:56:26 +03:00
Gleb Natapov
a49ae3ff87 storage_service: raft topology: split get_excluded_nodes from exec_global_command
Will be used later.
2023-10-25 12:56:26 +03:00
Gleb Natapov
897a7e599a storage_service: raft topology: drop unused include_local and do_retake parameters from exec_global_command which are always true
2023-10-25 12:56:26 +03:00
Gleb Natapov
7f1aa41e86 storage_service: raft topology: simplify streaming RPC failure handling
Currently streaming failure handling is different for "removing" and all
other operations. Unify them in one try/catch.
2023-10-25 12:56:26 +03:00
Piotr Dulikowski
a3ba4b3109 test: test_topology_ops: continuously write during the test
In order to detect issues where requests are routed incorrectly during
topology changes, modify the test_topology_ops test so that it runs a
background process that continuously writes while the test performs
topology changes in the cluster.

At the end of the test check whether:

- All writes were successful (we only require CL=LOCAL_ONE)
- Whether there are any errors from the replica side logic in the nodes'
  logs (which happen e.g. when node receives writes before learning
  about the schema)
2023-10-25 11:50:17 +02:00
Piotr Dulikowski
63aa9332aa raft topology: assign tokens after join node response rpc
Currently, when the topology coordinator accepts a node, it moves it to
bootstrap state and assigns tokens to it (either new ones during
bootstrap, or the replaced node's tokens). Only then it contacts the
joining node to tell it about the decision and let it perform a read
barrier.

However, this means that the tokens are inserted too early. After
inserting the tokens the cluster is free to route write requests to it,
but it might not have learned about all of the schema yet.

Fix the issue by inserting the tokens later, after completing the join
node response RPC which forces the receiving node to perform a read
barrier.
2023-10-25 11:50:17 +02:00
Piotr Dulikowski
46fce4cff3 storage_service: fix indentation after previous commit
2023-10-25 11:50:17 +02:00
Piotr Dulikowski
2d161676c7 raft topology: loosen assumptions about transition nodes having tokens
In later commits, tokens for a joining/replacing node will not be
inserted when the node enters `bootstrapping`/`replacing` state but at
some later step of the procedure. Loosen some of the assumptions in
`storage_service::topology_state_load` and
`system_keyspace::load_topology_state` appropriately.
2023-10-25 11:50:17 +02:00
Anna Stuchlik
e223624e2e doc: fix the Reference page layout
This commit fixes the layout of the Reference
page. Previously, the toctree level was "2",
which made the page hard to navigate.
This PR changes the level to "1".

In addition, the capitalization of page
titles is fixed.

This is a follow-up PR to the ones that
created and updated the Reference section.
It must be backported to branch-5.4.

Closes scylladb/scylladb#15830
2023-10-25 12:15:27 +03:00
Botond Dénes
ceb866fa2e Merge 'Make s3 upload sink PUT small objects' from Pavel Emelyanov
When the upload sink is flushed, it may notice that the upload has not yet been started and fall back to a plain PUT in that case. This makes uploading small files much cheaper, because a multipart upload would take 3 API calls (start, part, complete) in this case.
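
The fallback can be sketched with a fake client that only records API calls. `FakeS3Client`, `UploadSink` and the tiny `part_size` are assumptions made for illustration; the real sink is C++ and more involved.

```python
class FakeS3Client:
    """Records API calls instead of talking to a server."""

    def __init__(self):
        self.calls = []

    def put_object(self, data):
        self.calls.append("put")

    def start_multipart(self):
        self.calls.append("start")

    def upload_part(self, data):
        self.calls.append("part")

    def complete_multipart(self):
        self.calls.append("complete")


class UploadSink:
    def __init__(self, client, part_size):
        self.client = client
        self.part_size = part_size
        self.buffer = b""
        self.started = False

    def write(self, data):
        self.buffer += data
        # Start the multipart upload lazily, once a full part is buffered.
        while len(self.buffer) >= self.part_size:
            if not self.started:
                self.client.start_multipart()
                self.started = True
            self.client.upload_part(self.buffer[:self.part_size])
            self.buffer = self.buffer[self.part_size:]

    def flush(self):
        if not self.started:
            # Nothing was uploaded yet: one plain PUT is enough.
            self.client.put_object(self.buffer)
        else:
            if self.buffer:
                self.client.upload_part(self.buffer)
            self.client.complete_multipart()
        self.buffer = b""


small = FakeS3Client()
sink = UploadSink(small, part_size=8)
sink.write(b"abc")
sink.flush()

large = FakeS3Client()
sink = UploadSink(large, part_size=8)
sink.write(b"0123456789")
sink.flush()
```

In this model the small object costs a single call, while the larger one goes through the usual start, part and complete sequence.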

fixes: #13014

Closes scylladb/scylladb#15824

* github.com:scylladb/scylladb:
  test: Add s3_client test for upload PUT fallback
  s3/client: Add PUT fallback to upload sink
2023-10-25 10:03:46 +03:00
Pavel Emelyanov
cb63d303f0 test: Make test_sstables_excluding_staging_correctness run over s3 too
This test checks the way sstable is moved and lives in staging state.
Now it passes on S3 as well

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 19:12:37 +03:00
Pavel Emelyanov
d827068d01 sstables,s3: Support state change (without generation change)
Now that system.sstables has the state field, it can be changed
(UPDATEd). However, changing the state AND the generation still
won't work, because generation is the clustering key of the table in
question and cannot simply be changed. This, nonetheless, is OK, as
generation changes with state only when moving an sstable from the upload
dir into normal/staging, and this is a separate issue for S3 (#13018). For
now, changing only the state is OK.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 19:12:37 +03:00
Pavel Emelyanov
ca5d3d217f system_keyspace: Add state field to system.sstables
The state is one of <empty>(normal)/staging/quarantine. Currently, when
an sstable is moved to a non-normal state, the s3 backend state_change() call
throws, so such sstables do not appear. The next patches are going to
change that, and the new field in system.sstables is needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 19:12:37 +03:00
Pavel Emelyanov
295936c1d3 sstable_directory: Tune up sstables entries processing comment
In fact, this FIXME had been fixed by 2c9ec6bc (sstable_directory:
Garbage collect S3 sstables on reboot) and is no longer valid. However,
it's still good to know if GC failed or misbehaved, so replace the
comment with a warning.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 19:12:37 +03:00
Pavel Emelyanov
e4162227ff system_keyspace: Tune up status change trace message
A very similar message tracing the state change is about to be added, so
it's good to be able to tell them apart.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 19:12:37 +03:00
Pavel Emelyanov
63758d19ce sstables: Add state string to state enum class convert
There's the backward converter already out there. Next code will need to
convert string representation of the state back to the internal type.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 19:12:37 +03:00
Pavel Emelyanov
8e1ff745fa api: Remove http::context -> token_metadata dependency
Now the token metadata usage is fine grained by the relevant endpoint
handlers only.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 17:49:05 +03:00
Pavel Emelyanov
be9ea0c647 api: Pass shared_token_metadata instead of storage_service
The token metadata endpoints need token metadata, not storage service

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 17:48:27 +03:00
Pavel Emelyanov
c23193bed0 api: Move snitch endpoints that use token metadata only
Snitch is now a service that speaks for the local node only. In order to
get the dc/rack of peers in the cluster, one needs to use topology which, in
turn, lives on token metadata. This patch moves the dc/rack getters to
api/token_metadata.cc next to other t.m. related endpoints.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 17:47:18 +03:00
Pavel Emelyanov
e4c0a4d34d api: Move storage_service endpoints that use token metadata only
A few of them don't need the storage service for anything
but getting token metadata from it. Move them to their own .cc/.hh units.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 17:44:53 +03:00
Botond Dénes
6c90d166cc Merge 'build: cmake: avoid using large amount stack of when compiling parser ' from Kefu Chai
this mirrors what we have in `configure.py`, to build the CqlParser with `-O1`
and disable `-fsanitize-address-use-after-scope` when compiling CqlParser.cc
in order to prevent the compiler from emitting code which uses large amount of stack
space at the runtime.

Closes scylladb/scylladb#15819

* github.com:scylladb/scylladb:
  build: cmake: avoid using large amount stack of when compiling parser
  build: cmake: s/COMPILE_FLAGS/COMPILE_OPTIONS/
2023-10-24 16:19:51 +03:00
Nadav Har'El
4b80130b0b Merge 'reduce announcements of the automatic schema changes ' from Patryk Jędrzejczak
There are some schema modifications performed automatically (during bootstrap, upgrade etc.) by Scylla that are announced by multiple calls to `migration_manager::announce` even though they are logically one change. Precisely, they appear in:
- `system_distributed_keyspace::start`,
- `redis:create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (for the `system_traces` keyspace).

All these places contain a FIXME telling us to `announce` only once. There are a few reasons for this:
- calling `migration_manager::announce` with Raft is quite expensive -- taking a `read_barrier` is necessary, and that requires contacting a leader, which then must contact a quorum,
- we must implement a retrying mechanism for every automatic `announce` if `group0_concurrent_modification` occurs to enable support for concurrent bootstrap in Raft-based topology. Doing it before the FIXMEs mentioned above would be harder, and fixing the FIXMEs later would also be harder.

This PR fixes the first two FIXMEs and improves the situation with the last one by reducing the number of the `announce` calls to two. Unfortunately, reducing this number to one requires a big refactor. We can do it as a follow-up to a new, more specific issue. Also, we leave a new FIXME.

Fixing the first two FIXMEs required enabling the announcement of a keyspace together with its tables. Until now, the code responsible for preparing mutations for a new table could assume the existence of the keyspace. This assumption wasn't necessary, but removing it required some refactoring.

Fixes #15437

Closes scylladb/scylladb#15594

* github.com:scylladb/scylladb:
  table_helper: announce twice in setup_keyspace
  table_helper: refactor setup_table
  redis: create_keyspace_if_not_exists_impl: fix indentation
  redis: announce once in create_keyspace_if_not_exists_impl
  db: system_distributed_keyspace: fix indentation
  db: system_distributed_keyspace: announce once in start
  tablet_allocator: update on_before_create_column_family
  migration_listener: add parameter to on_before_create_column_family
  alternator: executor: use new prepare_new_column_family_announcement
  alternator: executor: introduce create_keyspace_metadata
  migration_manager: add new prepare_new_column_family_announcement
2023-10-24 15:42:48 +03:00
David Garcia
a5519c7c1f docs: update cofig params design
Closes scylladb/scylladb#15827
2023-10-24 15:41:56 +03:00
Kefu Chai
f8104b92f8 build: cmake: detect rapidxml
we use rapidxml for parsing XML, so let's detect it before using it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15813
2023-10-24 15:12:04 +03:00
Pavel Emelyanov
caa3e751f7 test: Add s3_client test for upload PUT fallback
The test case creates a non-jumbo upload sink, puts some bytes into it,
and then flushes. To make sure the fallback actually took place, the
multipart memory tracker semaphore is broken in advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 15:03:53 +03:00
Kamil Braun
db49ccccb0 view: remove unused _backing_secondary_index
This boolean was only used for a sanity check which was replaced with a
stronger sanity check in the previous commit that doesn't require the
boolean.
2023-10-24 13:33:36 +02:00
Kamil Braun
3976808b12 schema_tables: turn view schema fixing code into a sanity check
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.

The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).

The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.

The main motivation of this patch is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.

The migration code is now turned into a sanity check: if users
try something crazy, they will get an error instead of silent data
corruption.
2023-10-24 13:33:35 +02:00
Kamil Braun
f02ac9a9e7 schema_tables: make comment more precise
`maybe_fix_legacy_secondary_index_mv_schema` function has this piece of
code:

```
// If the first clustering key part of a view is a column with name not found in base schema,
// it implies it might be backing an index created before computed columns were introduced,
// and as such it must be recreated properly.
if (!base_schema->columns_by_name().contains(first_view_ck.name())) {
    schema_builder builder{schema_ptr(v)};
    builder.mark_column_computed(first_view_ck.name(), std::make_unique<legacy_token_column_computation>());
    if (preserve_version) {
        builder.with_version(v->version());
    }
    return view_ptr(builder.build());
}
```

The comment uses the phrase "it might be".
However, the code inside the `if` assumes that it "must be": once we
determined that the first column in this materialized view does not have
a corresponding name in the base table, we set it to be computed using
`legacy_token_column_computation`, so we assumed that the column was
indeed storing the token. Doing that for a column which is not the token
column would be a small disaster.

Assuming that the code is correct, we can make the comment more precise.

I checked the documentation and I don't see any other way how we could
have such a column other than the token column which is internally
created by Scylla when creating a secondary index (for example, it is
forbidden to use an alias in select statement when creating materialized
views, which I checked experimentally).
2023-10-24 13:30:13 +02:00
Kamil Braun
5397524875 feature_service: make COMPUTED_COLUMNS feature unconditionally true
The feature is assumed to be true, it was introduced in 2019.
It's still advertised in gossip, but it's assumed to always be present.

The `schema_feature` enum class still contains `COMPUTED_COLUMNS`,
and the `all_tables` function in schema_tables.cc still checks for the
schema feature when deciding if `computed_columns()` table should be
included. This is necessary because digest calculation tests contain
many digests calculated with the feature disabled, if we wanted to make
it unconditional in the schema_tables code we'd have to regenerate
almost all digests in the tests. It is simpler to leave the possibility
for the tests to disable the feature.
2023-10-24 13:30:13 +02:00
Kamil Braun
2a21029ff5 Merge 'make topology_coordinator::run noexcept' from Gleb
The topology coordinator should handle failures internally for as long as
it remains the coordinator. The raft state monitor is in no better
position to handle any errors thrown by it; all it can do is restart
the coordinator. The series makes topology_coordinator::run handle all
errors internally and marks the function as noexcept so that error
handling complexity does not leak into the raft state monitor.

* 'gleb/15728-fix' of github.com:scylladb/scylla-dev:
  storage_service: raft topology: mark topology_coordinator::run function as noexcept
  storage_service: raft topology: do not throw error from fence_previous_coordinator()
2023-10-24 12:16:36 +02:00
Kefu Chai
4abcec9296 test: add __repr__ for MinIoServer and S3_Server
it is printed when pytest passes it down as a fixture as part of
the logging message. it would help with debugging an object_store test.
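
A minimal sketch of the idea, not the code from this commit: the class
name `FakeObjectServer` and its fields are hypothetical stand-ins for a
server fixture, used only to show how a `__repr__` makes a logged
fixture readable.

```python
class FakeObjectServer:
    """Hypothetical stand-in for an object-storage test server fixture."""

    def __init__(self, address, port, bucket):
        self.address = address
        self.port = port
        self.bucket = bucket

    def __repr__(self):
        # Show the fields that matter when reading a test log, instead of
        # the default "<FakeObjectServer object at 0x...>".
        return (f"{type(self).__name__}"
                f"(address={self.address!r}, port={self.port}, "
                f"bucket={self.bucket!r})")


server = FakeObjectServer("127.0.0.1", 9000, "testbucket")
print(f"starting test with fixture {server!r}")
```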

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15817
2023-10-24 12:35:49 +03:00
Pavel Emelyanov
63f2bdca01 s3/client: Add PUT fallback to upload sink
When the non-jumbo sink is flushed and notices that the real upload is
not started yet, it may just go ahead and PUT the buffers into the
object with the single request.

For the jumbo sink the fallback is not implemented as it likely doesn't
make any sense -- jumbo sinks are unlikely to produce less than 5Mb of
data so it's going to be dead code anyway.
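
The fallback can be sketched roughly as follows. This is a toy model
under stated assumptions, not the s3 client code: `RecordingClient`,
`UploadSink` and the tiny `part_size` are hypothetical and exist only to
show the decision taken at flush time.

```python
class RecordingClient:
    """Hypothetical client that only records which requests were issued."""

    def __init__(self):
        self.calls = []

    def put_object(self, data):
        self.calls.append(("PUT", len(data)))

    def start_multipart(self):
        self.calls.append(("CREATE_MULTIPART", 0))

    def upload_part(self, data):
        self.calls.append(("UPLOAD_PART", len(data)))

    def complete_multipart(self):
        self.calls.append(("COMPLETE_MULTIPART", 0))


class UploadSink:
    """Hypothetical sink: buffers data and starts a multipart upload lazily."""

    def __init__(self, client, part_size):
        self.client = client
        self.part_size = part_size
        self.buffer = bytearray()
        self.multipart_started = False

    def write(self, data):
        self.buffer += data
        # Only start the real (multipart) upload once a full part is buffered.
        while len(self.buffer) >= self.part_size:
            if not self.multipart_started:
                self.client.start_multipart()
                self.multipart_started = True
            part = bytes(self.buffer[:self.part_size])
            del self.buffer[:self.part_size]
            self.client.upload_part(part)

    def flush(self):
        if not self.multipart_started:
            # Fallback: the upload never started, so one PUT is enough.
            self.client.put_object(bytes(self.buffer))
        else:
            if self.buffer:
                self.client.upload_part(bytes(self.buffer))
            self.client.complete_multipart()
        self.buffer.clear()


small_client = RecordingClient()
small = UploadSink(small_client, part_size=8)  # toy threshold, not 5Mb
small.write(b"abc")
small.flush()

large_client = RecordingClient()
large = UploadSink(large_client, part_size=8)
large.write(b"x" * 20)
large.flush()
```

With the small write, only a single PUT is recorded; the larger write
goes through the multipart calls.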

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 10:59:46 +03:00
Gleb Natapov
dcaaa74cd4 storage_service: raft topology: mark topology_coordinator::run function as noexcept
The function handled all exceptions internally. By making it noexcept we
make sure that the caller (raft_state_monitor_fiber) does not need to
handle any exceptions from the topology coordinator fiber.
2023-10-24 10:58:45 +03:00
Gleb Natapov
65bf5877e7 storage_service: raft topology: do not throw error from fence_previous_coordinator()
Throwing an error kills the topology coordinator monitor fiber. Instead we
retry the operation until it succeeds or the node loses its leadership.
This is fine because a quorum is needed for the operation to succeed, and
if the quorum is not available the node should relinquish its leadership.
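
A rough sketch of the retry policy described above, assuming a
hypothetical helper `retry_while_leader`; the real coordinator is C++
and would also wait between attempts, which this toy loop omits.

```python
def retry_while_leader(operation, is_leader, on_error=None):
    """Retry `operation` until it succeeds or `is_leader()` turns false.

    Returns (succeeded, attempts). Hypothetical helper, for illustration.
    """
    attempts = 0
    while is_leader():
        attempts += 1
        try:
            operation()
            return True, attempts
        except Exception as exc:  # sketch only: real code would be narrower
            if on_error is not None:
                on_error(exc)
    return False, attempts


# Case 1: the operation fails twice, then succeeds while still leader.
failures_left = [2]

def flaky_fence():
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise RuntimeError("quorum not available")

succeeded = retry_while_leader(flaky_fence, is_leader=lambda: True)

# Case 2: the operation never succeeds and leadership is lost.
terms = iter([True, True, False])

def always_fails():
    raise RuntimeError("quorum not available")

gave_up = retry_while_leader(always_fails, is_leader=lambda: next(terms))
```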

Fixes #15728
2023-10-24 10:57:48 +03:00
Botond Dénes
23898581d5 cql3: mutation_fragments_select_statement: use host_id instead of node
The statement only uses the node to get its host_id later. Simpler to
obtain and store only the host_id in the first place.
2023-10-24 03:12:58 -04:00
Botond Dénes
3cb1669340 cql3: mutation_fragments_select_statement: pin erm reference
This query bypasses the usual read-path in storage-proxy and therefore
also misses the erm pinning done by storage-proxy. To avoid a vnode
being pulled from under its feet, do the erm pinning in the statement
itself.
2023-10-24 03:12:36 -04:00
Botond Dénes
0cba973972 Update tools/java submodule
* tools/java 3c09ab97...86a200e3 (1):
  > cassandra-stress: add storage options
2023-10-24 09:41:36 +03:00
Kefu Chai
9347b61d3b build: cmake: avoid using large amount stack of when compiling parser
this mirrors what we have in `configure.py`: build the CqlParser with -O1
and disable sanitize-address-use-after-scope when compiling CqlParser.cc,
in order to prevent the compiler from emitting code which uses a large amount
of stack at runtime.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-24 12:40:20 +08:00
Kefu Chai
3da02e1bf4 build: cmake: s/COMPILE_FLAGS/COMPILE_OPTIONS/
according to
https://cmake.org/cmake/help/latest/prop_sf/COMPILE_FLAGS.html,
COMPILE_FLAGS has been superseded by COMPILE_OPTIONS. so let's
replace the former with the latter.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-24 12:40:20 +08:00
Pavel Emelyanov
7c580b4bd4 Merge 'sstable: switch to uuid identifier for naming S3 sstable objects' from Kefu Chai
before this change, we create a new UUID for a new sstable managed by the s3_storage, and we use the string representation of UUID defined by RFC4122 like "0aa490de-7a85-46e2-8f90-38b8f496d53b" for naming the objects stored on s3_storage. but this representation is not what we are using for storing sstables on local filesystem when the option of "uuid_sstable_identifiers_enabled" is enabled. instead, we are using a base36-based representation which is shorter.

to be consistent with the naming of the sstables created for local filesystem, and more importantly, to simplify the interaction between the local copy of sstables and those stored on object storage, we should use the same string representation of the sstable identifier.

so, in this change:

1. instead of creating a new UUID, just reuse the generation of the sstable for the object's key.
2. do not store the uuid in the sstable_registry system table. As we already have the generation of the sstable for the same purpose.
3. switch the sstable identifier representation from the one defined by the RFC4122 (implemented by fmt::formatter<utils::UUID>) to the base36-based one (implemented by fmt::formatter<sstables::generation_type>)
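
To illustrate only why a base36 form is shorter than the RFC4122 string:
the sketch below encodes the UUID's 128-bit integer value in base36.
This is an assumption made for illustration and is NOT the layout that
`sstables::generation_type` actually uses; `to_base36` and `from_base36`
are hypothetical helpers.

```python
import uuid

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(value):
    """Encode a non-negative integer using digits 0-9 and a-z."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return "0"
    digits = []
    while value:
        value, remainder = divmod(value, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def from_base36(text):
    """Decode a base36 string back into an integer."""
    return int(text, 36)


identifier = uuid.UUID("0aa490de-7a85-46e2-8f90-38b8f496d53b")
canonical = str(identifier)             # RFC4122 text form, 36 characters
compact = to_base36(identifier.int)     # at most 25 characters for 128 bits
restored = uuid.UUID(int=from_base36(compact))
print(len(canonical), len(compact))
```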

Fixes #14175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#14406

* github.com:scylladb/scylladb:
  sstable: remove _remote_prefix from s3_storage
  sstable: switch to uuid identifier for naming S3 sstable objects
2023-10-23 21:05:13 +03:00
Pavel Emelyanov
d7031de538 Merge 'test/pylib: extract the env variable related functions out' from Kefu Chai
this series extracts the env variable related functions out and removes unused `import`s for better readability.

Closes scylladb/scylladb#15796

* github.com:scylladb/scylladb:
  test/pylib: remove duplicated imports
  test/pylib: extract the env variable printing into MinIoServer
  test/pylib: extract _set_environ() out
2023-10-23 21:03:03 +03:00
Aleksandra Martyniuk
0c6a3f568a compaction: delete default_compaction_progress_monitor
default_compaction_progress_monitor returns a reference to a static
object. So, it should be read-only, but its users need to modify it.

Delete default_compaction_progress_monitor and use one's own
compaction_progress_monitor instance where it's needed.

Closes scylladb/scylladb#15800
2023-10-23 16:03:34 +03:00
Anna Stuchlik
55ee999f89 doc: enable publishing docs for branch-5.4
This commit enables publishing documentation
from branch-5.4. The docs will be published
as UNSTABLE (the warning about version 5.4
being unstable will be displayed).

Closes scylladb/scylladb#15762
2023-10-23 15:47:01 +03:00
Botond Dénes
8180f61147 test/boost/multishard_mutation_query_test: fix querier cache misses expectations
There are two tests, test_read_all and
test_read_with_partition_row_limits, which assert on every page as well
as at the end that there are no misses whatsoever. This is incorrect,
because it is possible that on a given page, not all shards participate
and thus there won't be a saved reader on every shard. On the subsequent
page, a shard without a reader may produce a miss. This is fine.
Refine the asserts to check that there are at most as many misses as
there are shards without readers.
2023-10-23 08:07:14 -04:00
Botond Dénes
0a34f29ea5 test/lib/test_utils: add require_* variants for all comparators
Not just equal. This allows for better error messages, printing both
values and the failed relation operator, instead of a generic fail
message.
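
The idea can be sketched like this. The real helpers live in the C++
test library; the Python names below (`require`, `require_equal`,
`require_less_equal`) are hypothetical and only show how a failure
message can carry both values and the failed relation.

```python
import operator

# Map each relation's printable form to the function that evaluates it.
_RELATIONS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def require(lhs, relation, rhs):
    """Fail with a message naming both operands and the relation."""
    if not _RELATIONS[relation](lhs, rhs):
        raise AssertionError(f"check failed: {lhs!r} {relation} {rhs!r}")


def require_equal(lhs, rhs):
    require(lhs, "==", rhs)


def require_less_equal(lhs, rhs):
    require(lhs, "<=", rhs)


require_equal(2, 2)          # passes silently
try:
    require_less_equal(5, 3)
except AssertionError as exc:
    message = str(exc)
print(message)
```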
2023-10-23 07:52:38 -04:00
Avi Kivity
ee9cc450d4 logalloc: report increases of reserves
The log-structured allocator maintains memory reserves so that
operations using log-structured allocator memory can have some
working memory and can allocate. The reserves start small and are
increased if allocation failures are encountered. Before starting
an operation, the allocator first frees memory to satisfy the reserves.

One problem is that if the reserves are set to a high value and
we encounter a stall, then, first, we have no idea what value
the reserves are set to, and second, we have no idea what operation
caused the reserves to be increased.

We fix this problem by promoting the log reports of reserve increases
from DEBUG level to INFO level and by attaching a stack trace to
those reports. This isn't optimal since the messages are used
for debugging, not for informing the user about anything important
for the operation of the node, but I see no other way to obtain
the information.

Ref #13930.

Closes scylladb/scylladb#15153
2023-10-23 13:37:50 +02:00
Tomasz Grabiec
4af585ec0e Merge 'row_cache: make_reader_opt(): make make_context() reentrant ' from Botond Dénes
Said method is called in an allocating section, which will re-try the enclosed lambda on allocation failure. `read_context()` however moves the permit parameter so on the second and later calls, the permit will be in a moved-from state, triggering a `nullptr` dereference and therefore a segfault.

We already have a unit test (`test_exception_safety_of_reads` in `row_cache_test.cc`) which was supposed to cover this, but:
* It only tests range scans, not single partition reads, which is a separate path.
* Turns out allocation failure tests are again silently broken (no error is injected at all). This is because `test/lib/memtable_snapshot_source.hh` creates a critical alloc section which accidentally covers the entire duration of tests using it.

Fixes: #15578

Closes scylladb/scylladb#15614

* github.com:scylladb/scylladb:
  test/boost/row_cache_test: test_exception_safety_of_reads: also cover single-partition reads
  test/lib/memtable_snapshot_source: disable critical alloc section while waiting
  row_cache: make_reader_opt(): make make_context() reentrant
2023-10-23 11:22:13 +02:00
Raphael S. Carvalho
ea6c281b9f replica: Fix major compaction semantics by performing off-strategy first
Major compaction semantics is that all data of a table will be compacted
together, so user can expect e.g. a recently introduced tombstone to be
compacted with the data it shadows.
Today, it can happen that all data in maintenance set won't be included
for major, until they're promoted into main set by off-strategy.
So user might be left wondering why major is not having the expected
effect.
To fix this, let's perform off-strategy first, so data in maintenance
set will be made available by major. A similar approach is done for
data in memtable, so flush is performed before major starts.
The only exception will be data in staging, which cannot be compacted
until view building is done with it, to avoid inconsistency in view
replicas.
The serialization of reshape jobs in the compaction manager guarantees
correctness if there's an ongoing off-strategy on behalf of the
table.

Fixes #11915.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#15792
2023-10-23 11:32:03 +03:00
Nadav Har'El
e7dd0ec033 test/cql-pytest: reproduce incompatibility with same-name bind marks
This patch adds a reproducer for a minor incompatibility between Scylla's
and Cassandra's handling of a prepared statement when a bind marker with
the same name is used more than once, e.g.,
```
SELECT * FROM tbl WHERE p=:x AND c=:x
```
It turns out that Scylla tells the driver that there is only one bind
marker, :x, whereas Cassandra tells the driver that there are two bind
markers, both named :x. This makes no difference if the user passes
a map `{'x': 3}`, but if the user passes a tuple, Scylla accepts only
`(3,)` (assigning both bind markers the same value) and Cassandra
accepts only `(3,3)`.
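
The difference can be modelled with a toy function. This is not driver
or server code: `bind_positional` and its `dedupe_names` flag are
hypothetical, with `dedupe_names=True` standing in for the behaviour
described for Scylla and `False` for the one described for Cassandra.

```python
def bind_positional(marker_names, values, dedupe_names):
    """Map positional values onto bind markers.

    marker_names lists the markers in order of appearance, e.g. ["x", "x"].
    Returns one value per marker occurrence, or raises ValueError when the
    number of values does not match what the server reported.
    """
    if dedupe_names:
        # Each distinct name is reported once, in first-seen order.
        reported = list(dict.fromkeys(marker_names))
    else:
        # Every occurrence is reported, even if the name repeats.
        reported = list(marker_names)
    if len(values) != len(reported):
        raise ValueError(
            f"expected {len(reported)} values, got {len(values)}")
    if dedupe_names:
        by_name = dict(zip(reported, values))
        return [by_name[name] for name in marker_names]
    return list(values)


markers = ["x", "x"]  # SELECT * FROM tbl WHERE p=:x AND c=:x
print(bind_positional(markers, (3,), dedupe_names=True))
print(bind_positional(markers, (3, 3), dedupe_names=False))
```

In this model each behaviour rejects the tuple that the other one
accepts, which is the incompatibility the test reproduces.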

The test added in this patch demonstrates this incompatibility.
It fails on Scylla, passes on Cassandra, and is marked "xfail".

Refs #15559

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#15564
2023-10-23 11:19:15 +03:00
Aleksandra Martyniuk
a1271d2d5c repair: throw more detailed exception
Exception thrown from row_level_repair::run does not show the root
cause of a failure making it harder to debug.

Add the internal exception contents to runtime_error message.

After the change the log will mention the real cause (last line), e.g.:

repair - repair[92db0739-584b-4097-b6e2-e71a66e40325]: 33 out of 132 ranges failed,
keyspace=system_distributed, tables={cdc_streams_descriptions_v2, cdc_generation_timestamps,
view_build_status, service_levels}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false,
failed_because=seastar::nested_exception: std::runtime_error (Failed to repair for keyspace=system_distributed,
cf=cdc_streams_descriptions_v2, range=(8720988750842579417,+inf))
(while cleaning up after seastar::abort_requested_exception (abort requested))

Closes scylladb/scylladb#15770
2023-10-23 11:15:25 +03:00
Botond Dénes
950a1ff22c Merge 'doc: improve the docs for handling failures' from Anna Stuchlik
This PR improves how failure handling is documented and how accessible that documentation is to the user.
- The Handling Failures section is moved from Raft to Troubleshooting.
- Two new topics about failure are added to Troubleshooting with a link to the Handling Failures page (Failure to Add, Remove, or Replace a Node, Failure to Update the Schema).
- A note is added to the add/remove/replace node procedures to indicate that a quorum is required.

See individual commits for more details.

Fixes https://github.com/scylladb/scylladb/issues/13149

Closes scylladb/scylladb#15628

* github.com:scylladb/scylladb:
  doc: add a note about Raft
  doc: add the quorum requirement to procedures
  doc: add more failure info to Troubleshooting
  doc: move Handling Failures to Troubleshooting
2023-10-23 11:09:28 +03:00
Kefu Chai
5a17a02abb build: cmake: add -ffile-prefix-map option
this mirrors what we already have in configure.py.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15798
2023-10-23 10:26:21 +03:00
Botond Dénes
940c2d1138 Merge 'build: cmake: use add_compile_options() and add_link_options() when appropriate ' from Kefu Chai
instead of appending the options to the CMake variables, use the command to do this. simpler this way. and the bonus is that the options are de-duplicated.

Closes scylladb/scylladb#15797

* github.com:scylladb/scylladb:
  build: cmake: use add_link_options() when appropriate
  build: cmake: use add_compile_options() when appropriate
2023-10-23 09:58:10 +03:00
Botond Dénes
c960c2cdbf Merge 'build: extract code fragments into functions' from Kefu Chai
this series is one of the steps to remove global statements in `configure.py`.

not only is the script more structured this way, it also lets us quickly identify the parts which should/can be reused when migrating to a CMake-based build system.

Refs #15379

Closes scylladb/scylladb#15780

* github.com:scylladb/scylladb:
  build: update modeval using a dict
  build: pass args.test_repeat and args.test_timeout explicitly
  build: pull in jsoncpp using "pkgs"
  build: build: extract code fragments into functions
2023-10-23 09:42:37 +03:00
Kefu Chai
0080b15939 build: cmake: use add_link_options() when appropriate
instead of appending to CMAKE_EXE_LINKER_FLAGS*, use
add_link_options() to add more options. as CMAKE_EXE_LINKER_FLAGS*
is a string, and typically set by user, let's use add_link_options()
instead.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 12:06:42 +08:00
Kefu Chai
686adec52e build: cmake: use add_compile_options() when appropriate
instead of appending to CMAKE_CXX_FLAGS, use add_compile_options()
to add more options. as CMAKE_CXX_FLAGS is a string, and typically
set by user, let's use add_compile_options() instead, the options
added by this command will be added before CMAKE_CXX_FLAGS, and
will have lower priority.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 12:06:42 +08:00
Kefu Chai
8756838b16 test/pylib: remove duplicated imports
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 10:36:05 +08:00
Kefu Chai
6b84bc50c3 test/pylib: extract the env variable printing into MinIoServer
less repetition this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 10:36:05 +08:00
Kefu Chai
02cad8f85b test/pylib: extract _set_environ() out
will add _unset_environ() later. extracting this helper out helps
with the readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 10:36:05 +08:00
Kefu Chai
b36cef6f1a sstable: remove _remote_prefix from s3_storage
since we use the sstable.generation() for the remote prefix of
the key of the object for storing the sstable component, there is
no need to set remote_prefix beforehand.

since `s3_storage::ensure_remote_prefix()` and
`system_keyspace::sstables_registry_lookup_entry()` are not used
anymore, they are removed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 10:08:22 +08:00
Kefu Chai
af8bc8ba63 sstable: switch to uuid identifier for naming S3 sstable objects
before this change, we create a new UUID for a new sstable managed
by the s3_storage, and we use the string representation of UUID
defined by RFC4122 like "0aa490de-7a85-46e2-8f90-38b8f496d53b" for
naming the objects stored on s3_storage. but this representation is
not what we are using for storing sstables on local filesystem when
the option of "uuid_sstable_identifiers_enabled" is enabled. instead,
we are using a base36-based representation which is shorter.

to be consistent with the naming of the sstables created for local
filesystem, and more importantly, to simplify the interaction between
the local copy of sstables and those stored on object storage, we should
use the same string representation of the sstable identifier.

so, in this change:

1. instead of creating a new UUID, just reuse the generation of the
   sstable for the object's key.
2. do not store the uuid in the sstable_registry system table. As
   we already have the generation of the sstable for the same purpose.
3. switch the sstable identifier representation from the one defined
   by the RFC4122 (implemented by fmt::formatter<utils::UUID>) to the
   base36-based one (implemented by
   fmt::formatter<sstables::generation_type>)
4. enable the `uuid_sstable_identifiers` cluster feature if it is
   enabled in the `test_env_config`, so that the sstable manager
   can use the uuid-based identifier when creating a new identifier
   for an sstable.
5. throw if the generation of sstable is not UUID-based when
   accessing / manipulating an sstable with S3 storage backend. as
   the S3 storage backend now relies on this option. as, otherwise
   we'd have sstables with key like s3://bucket/number/basename, which
   is just unable to serve as a unique id for sstable if the bucket is
   shared across multiple tables.

Fixes #14175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-23 10:08:22 +08:00
Avi Kivity
f181ac033a Merge 'tools/nodetool: implement additional commands, part 2/N' from Botond Dénes
The following new commands are implemented:
* stop
* compactionhistory

All are associated with tests. All tests (both old and new) pass with both the scylla-native and the cassandra nodetool implementation.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#15649

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement compactionhistory command
  tools/scylla-nodetool: implement stop command
  mutation/json: extract generic streaming writer into utils/rjson.hh
  test/nodetool: rest_api_mock.py: add support for error responses
2023-10-21 00:11:42 +03:00
Botond Dénes
19fc01be23 Merge 'Sanitize API -> task_manager dependency' from Pavel Emelyanov
This is the continuation of 8c03eeb85d

Registering API handlers for services needs to

* get the service to handle requests via argument, not from http context (http context, in turn, is going not to depend on anything)
* unset the handlers on stop so that the service is not used after it's stopped (and before API server is stopped)

This series makes the task manager handlers work this way

Closes scylladb/scylladb#15764

* github.com:scylladb/scylladb:
  api: Unset task_manager test API handlers
  api: Unset task_manager API handlers
  api: Remove ctx->task_manager dependency
  api: Use task_manager& argument in test API handlers
  api: Push sharded<task_manager>& down the test API set calls
  api: Use task_manager& argument in API handlers
  api: Push sharded<task_manager>& down the API set calls
2023-10-20 18:07:20 +03:00
Botond Dénes
4b57c2bf18 tools/scylla-nodetool: implement compactionhistory command 2023-10-20 10:55:38 -04:00
Botond Dénes
f811a63e1b docs: nodetool compact: remove common arguments
These are already documented in the nodetool index page. The list in the
nodetool index page is less informative, so copy the list from nodetool
compact over there.
2023-10-20 10:16:42 -04:00
Botond Dénes
397f67990f docs: nodetool stop: fix compaction types and examples
Nodetool doesn't recognize RESHARD, even though ScyllaDB supports
stopping RESHARD compaction.
Remove VALIDATE from the list - ScyllaDB doesn't support it.
Add a note about the unimplemented --id option.
Fix the examples, they are broken.
Fix the entry in the nodetool command list, the command is called
`stop`, not `stop compaction`.
2023-10-20 10:15:47 -04:00
Botond Dénes
70ba6b94c3 docs: nodetool compact: remove unsupported partition option
This option is supported by neither the nodetool frontend nor
ScyllaDB itself. Remove it.
Also improve the wording on the unsupported options.
2023-10-20 10:15:44 -04:00
Botond Dénes
a212ddc5b1 tools/scylla-nodetool: implement stop command 2023-10-20 10:04:56 -04:00
Botond Dénes
9231454acd mutation/json: extract generic streaming writer into utils/rjson.hh
This writer is generally useful, not just for writing mutations as json.
Make it generally available as well.
2023-10-20 10:04:56 -04:00
Botond Dénes
6db2698786 test/nodetool: rest_api_mock.py: add support for error responses 2023-10-20 10:04:56 -04:00
Kefu Chai
9f62bfa961 build: update modeval using a dict
instead of updating `modes` with global statements, update it in
a function. for better readability. and to reduce the statements in
global scope.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-20 21:37:07 +08:00
Botond Dénes
ad90bb8d87 replica/database: remove "streaming" from dirty memory metric description
We haven't had streaming memtables for a while now.

Closes scylladb/scylladb#15638
2023-10-20 13:09:57 +03:00
Kefu Chai
c240c70278 build: pass args.test_repeat and args.test_timeout explicitly
for better readability.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-20 16:53:16 +08:00
Kefu Chai
c2cd11a8b3 build: pull in jsoncpp using "pkgs"
this change adds "jsoncpp" dependency using "pkgs". simpler this
way. it also helps to remove more global statements.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-20 16:53:16 +08:00
Kefu Chai
890113a9cf build: build: extract code fragments into functions
this change extracts `get_warnings_options()` out. it helps to
remove more global statements.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-20 16:53:16 +08:00
Patryk Jędrzejczak
fbcd667030 replica: keyspace::create_replication_strategy: remove a redundant parameter
The options parameter is redundant. We always use
`_metadata->strategy_options()` and
`keyspace::create_replication_strategy` already assumes that
`_metadata` is set by using its other fields.

Closes scylladb/scylladb#15776
2023-10-20 10:20:49 +03:00
Botond Dénes
460bc7d8e1 test/boost/row_cache_test: test_exception_safety_of_reads: also cover single-partition reads
The test currently only covers scans. Single partition reads have a
different code-path, make sure it is also covered.
2023-10-20 03:16:57 -04:00
Botond Dénes
ffefa623f4 test/lib/memtable_snapshot_source: disable critical alloc section while waiting
memtable_snapshot_source starts a background fiber in its constructor,
which compacts LSA memory in a loop. The loop body is covered by a
critical alloc section. It also contains a wait on a condition variable
and in its present form the critical section also covers the wait,
effectively turning off allocation failure injection for any test using
the memtable_snapshot_source.
This patch disables the critical alloc section while the loop waits on
the condition variable.
2023-10-20 03:16:57 -04:00
Botond Dénes
92966d935a row_cache: make_reader_opt(): make make_context() reentrant
Said lambda currently moves the permit parameter, so on the second and
later calls it will possibly run into use-after-move. This can happen if
the allocating section below fails and is re-tried.
2023-10-20 03:16:57 -04:00
Kefu Chai
11d7cadf0d install-dependencies.sh: drop java deps
the java related build dependencies are installed by

* tools/java/install-dependencies.sh
* tools/jmx/install-dependencies.sh

respectively. and the parent `install-dependencies.sh` always
invokes these scripts, so there is no need to repeat them in the
parent `install-dependencies.sh` anymore.

in addition to deduping the build deps, this change also helps to
reduce the size of build dependencies. as by default, `dnf`
installs the weak deps, unless `--setopt=install_weak_deps=False`
is passed to it, so this change also helps to reduce the traffic
and footprint of the installed packages for building scylla.

see also 9dddad27bf

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15473
2023-10-20 09:43:28 +03:00
Kamil Braun
059d647ee5 test/pylib: scylla_cluster: protect ScyllaCluster.stop with a lock
test.py calls `uninstall()` and `stop()` concurrently from exit
artifacts, and `uninstall()` internally calls `stop()`. This leads to
premature releasing of IP addresses from `uninstall()` (returning IPs to
the pool) while the servers using those IPs are still stopping. Then a
server might obtain that IP from the pool and fail to start due to
"Address already in use".

Put a lock around the body of `stop()` to prevent that.
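
The locking pattern can be sketched as follows. This is a minimal illustration, not the actual test/pylib code: `ToyCluster`, `_stop_lock` and the `events` list are hypothetical names, and the sleep merely stands in for stopping the servers.

```python
import asyncio


class ToyCluster:
    """Hypothetical stand-in for ScyllaCluster; only the locking matters here."""

    def __init__(self):
        self._stop_lock = asyncio.Lock()
        self.is_running = True
        self.events = []

    async def stop(self):
        # The whole body runs under the lock, so a concurrent caller waits
        # until the servers have fully stopped.
        async with self._stop_lock:
            if not self.is_running:
                return
            self.events.append("stopping")
            await asyncio.sleep(0.01)  # stands in for stopping the servers
            self.is_running = False
            self.events.append("stopped")

    async def uninstall(self):
        await self.stop()
        # IPs go back to the pool only after stop() has finished.
        self.events.append("ips released")


async def demo():
    cluster = ToyCluster()
    # Mimic the two exit artifacts running concurrently.
    await asyncio.gather(cluster.uninstall(), cluster.stop())
    return cluster.events
```

Whichever caller takes the lock first, the other one waits and then returns early, so in this sketch the IPs are never released while a stop is still in progress.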

Fixes: scylladb/scylladb#15755

Closes scylladb/scylladb#15763
2023-10-20 09:30:37 +03:00
Kefu Chai
80c656a08b types: use more readable error message when serializing non-ASCII string
before this change, we print

marshaling error: Value not compatible with type org.apache.cassandra.db.marshal.AsciiType: '...'

but the wording is not quite user friendly: it mirrors the
underlying implementation, and a user would have difficulty understanding
"marshaling" and/or "org.apache.cassandra.db.marshal.AsciiType"
when reading this error message.

so, in this change

1. change the error message to:
     Invalid ASCII character in string literal: '...'
   which should be more straightforward, and easier to digest.
2. update the test accordingly

please note, the quoted non-ASCII string is preserved instead of
being printed in hex, as otherwise user would not be able to map it
with his/her input.
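
As a rough illustration of the new behaviour, here is a Python sketch (the real change is in the C++ type code). `validate_ascii_literal` is a hypothetical helper; only the message text is taken from this commit.

```python
def validate_ascii_literal(text):
    """Return text unchanged if it is pure ASCII, otherwise raise ValueError."""
    if not text.isascii():
        # The offending string is echoed as-is (not hex-encoded) so the user
        # can match it against the original input.
        raise ValueError(f"Invalid ASCII character in string literal: '{text}'")
    return text


def error_for(text):
    """Return the error message produced for text, or None if it is valid."""
    try:
        validate_ascii_literal(text)
    except ValueError as e:
        return str(e)
    return None
```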

Refs #14320
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15678
2023-10-20 09:25:44 +03:00
Pavel Emelyanov
0c69a312db Update seastar submodule
* seastar bab1625c...17183ed4 (73):
  > thread_pool: Reference reactor, not point to
  > sstring: inherit publicly from string_view formatter
  > circleci: use conditional steps
  > weak_ptr: include used header
  > build: disable the -Wunused-* warnings for checkheaders
  > resource: move variable into smaller lexical scope
  > resource: use structured binding when appropriate
  > httpd: Added server and client addresses to request structure
  > io_queue: do not dereference moved-away shared pointer
  > treewide: explicitly define ctor and assignment operator
  > memory: use `err` for the error string
  > doc: Add document describing all the math behind IO scheduler
  > io_queue: Add flow-rate based self slowdown backlink
  > io_queue: Make main throttler uncapped
  > io_queue: Add queue-wide metrics
  > io_queue: Introduce "flow monitor"
  > io_queue: Count total number of dispatched and completed requests so far
  > io_queue: Introduce io_group::io_latency_goal()
  > tests: test the vector overload for when_all_succeed
  > core: add a vector overload to when_all_succeed
  > loop: Fix iterator_range_estimate_vector_capacity for random iters
  > loop: Add test for iterator_range_estimate_vector_capacity
  > core/posix return old behaviour using non-portable pthread_attr_setaffinity_np when present
  > memory: s/throw()/noexcept/
  > build: enable -Wdeprecated compiler option
  > reactor: mark kernel_completion's dtor protected
  > tests: always wait for promise
  > http, json, net: define-generated copy ctor for polymorphic types
  > treewide: do not define constexpr static out-of-line
  > reactor: do not define dtor of kernel_completion
  > http/exception: stop using dynamic exception specification
  > metrics: replace vector with deque
  > metrics: change metadata vector to deque
  > utils/backtrace.hh: make simple_backtrace formattable
  > reactor: Unfriend disk_config_params
  > reactor: Move add_to_flush_poller() to internal namespace
  > reactor: Unfriend a bunch of sched group template calls
  > rpc_test: Test rpc send glitches
  > net: Implement batch flush support for existing sockets
  > iostream: Configure batch flushes if sink can do it
  > net: Added remote address accessors
  > circleci: update the image to CircleCI "standard" image
  > build: do not add header check target if no headers to check
  > build: pass target name to seastar_check_self_contained
  > build: detect glibc features using CMake
  > build: extract bits checking libc into CheckLibc.cmake
  > http/exception: add formatter for httpd::base_exception
  > http/client: Mark write_body() const
  > http/client: Introduce request::_bytes_written
  > http/client: Mark maybe_wait_for_continue() const
  > http/client: Mark send_request_head() const
  > http/client: Detach setup_request()
  > http/api_docs: copy in api_docs's copy constructor
  > script: do not inherit from object
  > scripts: addr2line: change StdinBacktraceIterator to a function
  > scripts: addr2line: use yield instead defining a class
  > tests: skip tests that require backtrace if execinfo.h is not found
  > backtrace: check for existence of execinfo.h
  > core: use ino_t and off_t as glibc sets these to 64bit if 64bit api is used
  > core: add sleep_abortable instantiation for manual_clock
  > tls: Return EPIPE exception when writing to shutdown socket
  > http/client: Don't cache connection if server advertises it
  > http/client: Mark connection as "keep in cache"
  > core: fix strerror_r usage from glibc extension
  > reactor: access sigevent.sigev_notify_thread_id with a macro
  > posix: use pthread_setaffinity_np instead of pthread_attr_setaffinity_np
  > reactor: replace __mode_t with mode_t
  > reactor: change sys/poll.h to posix poll.h
  > rpc: Add unit test for per-domain metrics
  > rpc: Report client connections metrics
  > rpc: Count dead client stats
  > rpc: Add seastar::rpc::metrics
  > rpc: Make public queues length getters

io-scheduler fixes
refs: #15312
refs: #11805

http client fixes
refs: #13736
refs: #15509

rpc fixes
refs: #15462

Closes scylladb/scylladb#15774
2023-10-19 20:52:37 +03:00
Tomasz Grabiec
899ecaffcd test: tablets: Enable for verbose logging in test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode
To help diagnose #14746 where we experience timeouts due to connection
dropping.

Closes scylladb/scylladb#15773
2023-10-19 16:58:53 +03:00
Raphael S. Carvalho
fded314e46 sstables: Fix update of tombstone GC settings to have immediate effect
After "repair: Get rid of the gc_grace_seconds", the sstable's schema (mode,
gc period if applicable, etc) is used to estimate the amount of droppable
data (or determine full expiration = max_deletion_time < gc_before).
It could happen that the user switched from timeout to repair mode, but
sstables still use the old mode, even though the user asked for a new one.
Another example is when you adjust the value of the grace period to prevent
data resurrection if repair won't be able to run in a timely manner.
The problem persists until all sstables using old GC settings are recompacted
or node is restarted.
To fix this, we have to feed latest schema into sstable procedures used
for expiration purposes.
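
A toy model of why the schema passed in matters. It covers only the timeout mode and assumes `gc_before = now - gc_grace_seconds`; `ToySchema` and the numbers are hypothetical and not taken from the patch.

```python
from dataclasses import dataclass


@dataclass
class ToySchema:
    gc_grace_seconds: int


def gc_before(schema, now):
    # Timeout mode only: tombstones older than the grace period are droppable.
    return now - schema.gc_grace_seconds


def fully_expired(max_deletion_time, schema, now):
    # Mirrors the condition quoted above: max_deletion_time < gc_before.
    return max_deletion_time < gc_before(schema, now)


now = 1000
sstable_max_deletion_time = 500
stale_schema = ToySchema(gc_grace_seconds=100)   # captured when the sstable was written
latest_schema = ToySchema(gc_grace_seconds=600)  # after the user raised the grace period

expired_with_stale = fully_expired(sstable_max_deletion_time, stale_schema, now)
expired_with_latest = fully_expired(sstable_max_deletion_time, latest_schema, now)
```

With the stale schema the sstable looks fully expired, while the latest schema says its data must still be kept.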

Fixes #15643.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#15746
2023-10-19 16:27:59 +03:00
Kefu Chai
a6e68d8309 build: cmake: move message/* into message/CMakeLists.txt
messaging_service.cc depends on idl, but many source files in
scylla-main do not depend on idl, so let's

* move "message/*" into its own directory and add an inter-library
  dependency between it and the "idl" library.
* rename the target of "message" under test/manual to "message_test"
  to avoid the name collision

this should address the compilation failure of
```
FAILED: CMakeFiles/scylla-main.dir/message/messaging_service.cc.o
/usr/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_BROKEN_SOURCE_LOCATION -DSEASTAR_DEBUG -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -march=westmere  -Og -g -gz -std=gnu++20 -fvisibility=hidden -U_FORTIFY_SOURCE -Wno-error=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT CMakeFiles/scylla-main.dir/message/messaging_service.cc.o -MF CMakeFiles/scylla-main.dir/message/messaging_service.cc.o.d -o CMakeFiles/scylla-main.dir/message/messaging_service.cc.o -c /home/kefu/dev/scylladb/message/messaging_service.cc
/home/kefu/dev/scylladb/message/messaging_service.cc:81:10: fatal error: 'idl/join_node.dist.hh' file not found
         ^~~~~~~~~~~~~~~~~~~~~~~
```
where the compiler failed to find the included `idl/join_node.dist.hh`,
which is exposed by the idl library as part of its public interface.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15657
2023-10-19 13:33:29 +03:00
Botond Dénes
60145d9526 Merge 'build: extract code fragments into functions' from Kefu Chai
this series is one of the steps to remove global statements in `configure.py`.

not only is the script more structured this way, it also allows us to quickly identify the parts which should/can be reused when migrating to a CMake-based build system.

Refs #15379

Closes scylladb/scylladb#15668

* github.com:scylladb/scylladb:
  build: move check for NIX_CC into dynamic_linker_option()
  build: extract dynamic_linker_option(): out
  build: move `headers` into write_build_file()
2023-10-19 13:31:33 +03:00
Avi Kivity
39966e0eb1 Merge 'build: cmake: pass -dynamic-linker to ld' from Kefu Chai
to match the behavior of `configure.py`.

Closes scylladb/scylladb#15667

* github.com:scylladb/scylladb:
  build: cmake: pass -dynamic-linker to ld
  build: cmake: set CMAKE_EXE_LINKER_FLAGS in mode.common.cmake
2023-10-19 13:15:47 +03:00
Aleksandra Martyniuk
56221f2161 test: test abort of compaction task that isn't started yet
Test whether a task whose parent was aborted has a proper status.
2023-10-19 10:47:20 +02:00
Aleksandra Martyniuk
520d9db92d test: test running compaction task abort
Test whether a task which is aborted while running has a proper status.
2023-10-19 10:47:20 +02:00
Aleksandra Martyniuk
b91064bd2a tasks: fail if a task was aborted
run() method of task_manager::task::impl does not have to throw when
a task is aborted with task manager api. Thus, a user will see that
the task finished successfully which makes it inconsistent.

Finish a task with a failure if it was aborted with task manager api.
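
A toy sketch of the rule, assuming a task object that records an abort request. `ToyTask` and its state names are hypothetical and much simpler than task_manager::task::impl.

```python
class ToyTask:
    """Hypothetical task: body is a callable that receives the task."""

    def __init__(self, body):
        self._body = body
        self.abort_requested = False
        self.state = "created"

    def abort(self):
        self.abort_requested = True

    def start(self):
        self.state = "running"
        try:
            self._body(self)
        except Exception:
            self.state = "failed"
            return
        # Returning normally is not enough: an aborted task must not be
        # reported as finished successfully.
        self.state = "failed" if self.abort_requested else "done"


def run_and_get_state(body):
    task = ToyTask(body)
    task.start()
    return task.state
```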
2023-10-19 10:47:20 +02:00
Aleksandra Martyniuk
0681795417 compaction: abort task manager compaction tasks
Set top level compaction tasks as abortable.

Compaction tasks which have no children, i.e. compaction task
executors, have abort method overridden to stop compaction data.
2023-10-19 10:47:17 +02:00
Jan Ciolek
c256cca6f1 cql3/expr: add more comments in expression.hh
`expression` is a std::variant with 16 different variants
that represent different types of AST nodes.

Let's add documentation that explains what each of these
16 types represents. For people who are not familiar with expression
code it might not be clear what each of them does, so let's add
clear descriptions for all of them.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes scylladb/scylladb#15767
2023-10-19 10:56:38 +03:00
Kefu Chai
b105be220b build: cmake: add join_node.idl.hh to CMake
we add a new verb in 7cbe5e3af8, so
let's update the CMake-based building system accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15658
2023-10-19 10:19:16 +03:00
Nikita Kurashkin
2a7932efa1 alternator: fix DeleteTable return values to match DynamoDB's
It seems that Scylla has more values returned by DeleteTable operation than DynamoDB.
In this patch I added a table status check when generating output.
If we delete the table, values KeySchema, AttributeDefinitions and CreationDateTime won't be returned.
The test has also been modified to check that these attributes are not returned.
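
The status check can be pictured with this small Python sketch. It assumes the deleted table is reported with status "DELETING" and uses a simplified set of fields; `table_description` is a hypothetical helper, not Alternator's code.

```python
def table_description(name, status, key_schema, attribute_definitions, creation_date_time):
    """Build a simplified TableDescription-like dict."""
    description = {"TableName": name, "TableStatus": status}
    # Assumption: a table being deleted is reported with status "DELETING",
    # and the three fields below are omitted for it.
    if status != "DELETING":
        description["KeySchema"] = key_schema
        description["AttributeDefinitions"] = attribute_definitions
        description["CreationDateTime"] = creation_date_time
    return description


key_schema = [{"AttributeName": "p", "KeyType": "HASH"}]
attribute_definitions = [{"AttributeName": "p", "AttributeType": "S"}]

active = table_description("t", "ACTIVE", key_schema, attribute_definitions, 1697700000)
deleting = table_description("t", "DELETING", key_schema, attribute_definitions, 1697700000)
```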

Fixes scylladb#14132

Closes scylladb/scylladb#15707
2023-10-19 09:34:16 +03:00
Pavel Emelyanov
ec94cc9538 Merge 'test: set use_uuid to true by default in sstables::test_env ' from Kefu Chai
this series

1. let sstable tests using test_env use uuid-based sstable identifiers by default
2. let the tests that require integer-based identifiers keep using them

this should enable us to perform the s3 related test after enforcing the uuid-based identifier for s3 backend, otherwise the s3 related test would fail as it also utilizes `test_env`.

Closes scylladb/scylladb#14553

* github.com:scylladb/scylladb:
  test: set use_uuid to true by default in sstables::test_env
  test: enable test to set uuid_sstable_identifiers
2023-10-19 09:09:38 +03:00
Pavel Emelyanov
0981661f8b api: Unset task_manager test API handlers
So that the task_manager reference is not used when it shouldn't on stop

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:56:24 +03:00
Pavel Emelyanov
2d543af78e api: Unset task_manager API handlers
So that the task_manager reference is not used when it shouldn't on stop

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:56:01 +03:00
Pavel Emelyanov
0632ad50f3 api: Remove ctx->task_manager dependency
Now the task manager's API (and test API) use the argument and this
explicit dependency is no longer required

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:55:27 +03:00
Pavel Emelyanov
572c880d97 api: Use task_manager& argument in test API handlers
Now it's there and can be used. This will allow removing the
ctx->task_manager dependency soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:55:13 +03:00
Pavel Emelyanov
0396ce7977 api: Push sharded<task_manager>& down the test API set calls
This is to make it possible to use this reference instead of the ctx.tm
one by the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:54:53 +03:00
Pavel Emelyanov
ef1d2b2c86 api: Use task_manager& argument in API handlers
Now it's there and can be used. This will allow removing the
ctx->task_manager dependency soon

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:54:24 +03:00
Pavel Emelyanov
14e10e7db4 api: Push sharded<task_manager>& down the API set calls
This is to make it possible to use this reference instead of the ctx.tm
one by the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-18 18:52:46 +03:00
Avi Kivity
7d5e22b43b replica: memtable: don't forget memtable memory allocation statistics
A memtable object contains two logalloc::allocating_section members
that track memory allocation requirements during reads and writes.
Because these are local to the memtable, each time we seal a memtable
and create a new one, these statistics are forgotten. As a result
we may have to re-learn the typical size of reads and writes, incurring
a small performance penalty.

The solution is to move the allocating_section object to the memtable_list
container. The workload is the same across all memtables of the same
table, so we don't lose discrimination here.

The performance penalty may increase later if we start logging changes to
memory reserve thresholds, including a backtrace, so this reduces the
odds of incurring such a penalty.

Closes scylladb/scylladb#15737
2023-10-18 17:43:33 +02:00
Kefu Chai
c8cb70918b sstable: drop unused parse() overload for deletion_time
`deletion_time` is a part of the `partition_header`, which is in turn
a part of `partition`. and `data_file` is a sequence of `partition`.
`data_file` represents *-Data.db component of an SSTable.
see docs/architecture/sstable3/sstables-3-data-file-format.rst.
we always parse the data component via `flat_mutation_reader_v2`, which is in turn
implemented with mx/reader.cc or kl/reader.cc depending on
the version of SSTable to be read.

in other words, we decode `deletion_time` in mx/reader.cc or
kl/reader.cc, not in sstable.cc. so let's drop the overload
parse() for deletion_time. it's not necessary and more importantly,
confusing.

Refs #15116
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15756
2023-10-18 18:41:56 +03:00
Avi Kivity
f3dc01c85e Merge 'Enlight sstable_directory construction' from Pavel Emelyanov
Currently distributed_loader starts sharded<sstable_directory> with four sharded parameters. That's quite bulky and can be made much shorter.

Closes scylladb/scylladb#15653

* github.com:scylladb/scylladb:
  distributed_loader: Remove explicit sharded<erms>
  distributed_loader: Brush up start_subdir()
  sstable_directory: Add enlightened construction
  table: Add global_table_ptr::as_sharded_parameter()
2023-10-18 16:42:04 +03:00
Anna Stuchlik
274cf7a93a doc:remove upgrade guides for unsupported versions
This commit:
- Removes upgrade guides for versions older than 5.0.
  The oldest one is from version 4.6 to 5.0.
- Adds the redirections for the removed pages.

Closes scylladb/scylladb#15709
2023-10-18 15:12:26 +03:00
Kefu Chai
f69a44bb37 test/object_store: redirect to STDOUT and STDERR
pytest changes the test's sys.stdout and sys.stderr to the
captured fds when it captures the outputs of the test. so we
are not able to get the STDOUT_FILENO and STDERR_FILENO in C
by querying `sys.stdout.fileno()` and `sys.stderr.fileno()`.
their return values are not 1 and 2 anymore, unless pytest
is started with "-s".

so, to ensure that we always redirect the child process's
outputs to the log file, we need to use 1 and 2 for accessing
the well-known fds, which are the ones used by the child
process when it writes to stdout and stderr.

this change should address the problem that the log file is
always empty, unless "-s" is specified.
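
A minimal POSIX-only sketch of the idea, not the code from test/object_store: `run_with_log` and `demo` are hypothetical helpers, and the redirection is done with `os.dup2` in a `preexec_fn` purely for illustration.

```python
import os
import subprocess
import sys
import tempfile

STDOUT_FILENO = 1
STDERR_FILENO = 2


def run_with_log(cmd, log_path):
    """Run cmd with its stdout and stderr redirected to log_path (POSIX only)."""
    with open(log_path, "wb") as log:
        def redirect():
            # Runs in the child between fork and exec. sys.stdout.fileno()
            # may not be 1 under pytest capture, so use 1 and 2 directly.
            os.dup2(log.fileno(), STDOUT_FILENO)
            os.dup2(log.fileno(), STDERR_FILENO)

        return subprocess.run(cmd, preexec_fn=redirect).returncode


def demo():
    child = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "child.log")
        run_with_log([sys.executable, "-c", child], log_path)
        with open(log_path, "rb") as log:
            return log.read()
```

Because the fd numbers are hard-coded, the child's output lands in the log file whether or not the parent's `sys.stdout` has been replaced.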

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15560
2023-10-18 14:54:01 +03:00
Yaron Kaikov
b340bd6d9e release: prepare for 5.5.0-dev 2023-10-18 14:40:06 +03:00
Botond Dénes
f7e269ccb8 Merge 'Progress of compaction executors' from Aleksandra Martyniuk
compaction_read_monitor_generator is an existing mechanism
for monitoring progress of sstables reading during compaction.
In this change information gathered by compaction_read_monitor_generator
is utilized by task manager compaction tasks of the lowest level,
i.e. compaction executors, to calculate task progress.

compaction_read_monitor_generator has a flag, which decides whether
monitored changes will be registered by compaction_backlog_tracker.
This allows us to pass the generator to all compaction readers without
impacting the backlog.

Task executors have access to compaction_read_monitor_generator_wrapper,
which protects the internals of  compaction_read_monitor_generator
and provides only the necessary functionality.
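
One way to picture how progress could be derived from read monitors is the toy model below. The class names and the bytes-read/total-bytes accounting are assumptions for illustration; they do not mirror the C++ interfaces.

```python
class ToyReadMonitor:
    """Tracks how many bytes of one sstable have been read so far."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.bytes_read = 0

    def on_read(self, n):
        self.bytes_read = min(self.total_bytes, self.bytes_read + n)


class ToyProgressMonitor:
    """Aggregates the monitors of all sstables read by one compaction."""

    def __init__(self):
        self._monitors = []

    def track(self, total_bytes):
        monitor = ToyReadMonitor(total_bytes)
        self._monitors.append(monitor)
        return monitor

    def progress(self):
        # Returns (completed, total) in bytes.
        completed = sum(m.bytes_read for m in self._monitors)
        total = sum(m.total_bytes for m in self._monitors)
        return completed, total


progress_monitor = ToyProgressMonitor()
first = progress_monitor.track(100)
second = progress_monitor.track(300)
first.on_read(100)
second.on_read(50)
```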

Closes scylladb/scylladb#14878

* github.com:scylladb/scylladb:
  compaction: add get_progress method to compaction_task_impl
  compaction: find total compaction size
  compaction: sstables: monitor validation scrub with compaction_read_generator
  compaction: keep compaction_progress_monitor in compaction_task_executor
  compaction: use read monitor generator for all compactions
  compaction: add compaction_progress_monitor
  compaction: add flag to compaction_read_monitor_generator
2023-10-18 12:19:51 +03:00
Kamil Braun
c1486fee40 Merge 'commitlog: drop truncation_records after replay' from Petr Gusev
This is a follow-up for #15279 and it fixes two problems.

First, we restore flushes on writes for the tables that were switched to the schema commitlog if `SCHEMA_COMMITLOG` feature is not yet enabled. Otherwise durability is not guaranteed.

Second, we address the problem with truncation records, which could refer to the old commitlog if any of the switched tables were truncated in the past. If the node crashes later, and we replay schema commitlog, we may skip some mutations since their `replay_position`s will be smaller than the `replay_position`s stored for the old commitlog in the `truncated` table.

It turned out that this problem exists even if we don't switch commitlogs for tables. If the node was rebooted the segment ids will start from some small number - they use `steady_clock` which is usually bound to boot time. This means that if the node crashed we may skip the mutations because their RPs will be smaller than the last truncation record RP.

To address this problem we delete truncation records as soon as commitlog is replayed. We also include a test which demonstrates the problem.
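
The failure mode can be reproduced with a toy model in which replay positions are plain integers and a write is skipped when its position is not greater than the truncation record. Both simplifications are assumptions; real replay positions are not single integers.

```python
def replay(commitlog, truncation_rp):
    """Return the writes that survive replay.

    commitlog is a list of (replay_position, write) pairs. Writes whose
    position is not greater than the truncation record are skipped.
    """
    if truncation_rp is None:
        return [write for _, write in commitlog]
    return [write for rp, write in commitlog if rp > truncation_rp]


# The table was truncated before the reboot; the record holds RP=1000.
stale_truncation_rp = 1000
# After the reboot, positions start again from a small number.
commitlog_after_reboot = [(1, "row-a"), (2, "row-b")]

lost = replay(commitlog_after_reboot, stale_truncation_rp)
kept = replay(commitlog_after_reboot, None)  # record dropped after replay
```

With the stale record both writes are skipped; once the record has been dropped after the earlier replay, both are retained.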

Fixes #15354

Closes scylladb/scylladb#15532

* github.com:scylladb/scylladb:
  add test_commitlog
  system.truncated: Remove replay_position data from truncated on start
  main.cc: flush only local memtables when replaying schema commitlog
  main.cc: drop redundant supervisor::notify
  system_keyspace: flush if schema commitlog is not available
2023-10-18 11:14:31 +02:00
Gleb Natapov
f80fff3484 gossip: remove unused STATUS_LEAVING gossiper status
The status is no longer used. The function that referenced it was
removed by 5a96751534 and it had already been unused
for a while back then.

Message-Id: <ZS92mcGE9Ke5DfXB@scylladb.com>
2023-10-18 11:13:14 +02:00
Botond Dénes
7f81957437 Merge 'Initialize datadir for system and non-system keyspaces the same way' from Pavel Emelyanov
When populating system keyspace the sstable_directory forgets to create upload/ subdir in the tables' datadir because of the way it's invoked from distributed loader. For non-system keyspaces directories are created in table::init_storage() which is self-contained and just creates the whole layout regardless of what.

This PR makes system keyspace's tables use table::init_storage() as well so that the datadir layout is the same for all on-disk tables.

Test included.

fixes: #15708
closes: scylladb/scylla-manager#3603

Closes scylladb/scylladb#15723

* github.com:scylladb/scylladb:
  test: Add test for datadir/ layout
  sstable_directory: Indentation fix after previous patch
  db,sstables: Move storage init for system keyspace to table creation
2023-10-18 12:12:19 +03:00
David Garcia
51466dcb23 docs: add latest option to aws_images extension
rollback only latest

Closes scylladb/scylladb#15651
2023-10-18 11:43:21 +03:00
Petr Gusev
a0aee54f2c add test_commitlog
Check that commitlog provides durability in case
of a node reboot:
* truncate table T, truncation_record RP=1000;
* clean shutdown node/reboot machine/restart node, now RP=~0
since segment ids count from boot time;
* write some data to T; crash/restart
* check data is retained
2023-10-17 18:16:50 +04:00
Calle Wilund
6fbd210679 system.truncated: Remove replay_position data from truncated on start
Once we've started clean, and all replaying is done, the commit log
replay positions stored in truncation records are invalid. We should exorcise
them as soon as possible. Note that we cannot remove truncation data
completely though, since the time stamps stored are used by things like
batch log to determine if it should use or discard old batch data.
2023-10-17 18:16:48 +04:00
Petr Gusev
dde36b5d9d main.cc: flush only local memtables when replaying schema commitlog
Schema commitlog can be used only on shard 0, so it's redundant
to flush any other memtables.
2023-10-17 18:15:51 +04:00
Petr Gusev
54dd7cf1da main.cc: drop redundant supervisor::notify
Later in the code we have 'replaying schema commit log',
which duplicates this one. Also,
maybe_init_schema_commitlog may skip schema commitlog
initialization if the SCHEMA_COMMITLOG feature is
not yet supported by the cluster, so this notification
can be misleading.
2023-10-17 18:15:49 +04:00
Petr Gusev
c89ead55ff system_keyspace: flush if schema commitlog is not available
In PR #15279 we removed flushes when writing to a number
of tables from the system keyspace. This was made possible
by switching these tables to the schema commitlog.
Schema commitlog is enabled only when the SCHEMA_COMMITLOG
feature is supported by all nodes in the cluster. Before that
these tables will use the regular commitlog, which is not
durable because it uses db::commitlog::sync_mode::PERIODIC. This
means that we may lose data if a node crashes during upgrade
to the version with schema commitlog.

In this commit we fix this problem by restoring flushes
after writes to the tables if the schema commitlog
is not enabled yet.

The patch also contains a test that demonstrates the
problem. We need flush_schema_tables_after_modification
option since otherwise schema changes are not durable
and node fails after restart.
2023-10-17 18:14:27 +04:00
Pavel Emelyanov
d59cd662f8 test: Add test for datadir/ layout
The test checks that

- for non-system keyspace datadir and its staging/ and upload/ subdirs
  are created when the table is created _and_ that the directory is
  re-populated on boot in case it was explicitly removed

- for system non-virtual tables it checks that the same directory layout
  is created on boot

- for system virtual tables it checks that the directory layout doesn't
  exist
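
A sketch of the layout check for the non-system case, using a temporary directory instead of a running node. `init_storage` here is a hypothetical Python stand-in for table::init_storage(), and the `ks/table-1234` path is made up.

```python
import shutil
import tempfile
from pathlib import Path

SUBDIRS = ("staging", "upload")


def init_storage(datadir):
    """Create the table directory and its subdirs if they are missing."""
    datadir.mkdir(parents=True, exist_ok=True)
    for name in SUBDIRS:
        (datadir / name).mkdir(exist_ok=True)


def layout_is_complete(datadir):
    return datadir.is_dir() and all((datadir / n).is_dir() for n in SUBDIRS)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        datadir = Path(tmp) / "ks" / "table-1234"
        init_storage(datadir)                  # table creation
        created = layout_is_complete(datadir)
        shutil.rmtree(datadir / "upload")      # explicit removal
        broken = layout_is_complete(datadir)
        init_storage(datadir)                  # re-populated on "boot"
        return created, broken, layout_is_complete(datadir)
```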

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-16 16:26:48 +03:00
Pavel Emelyanov
c3b3e5b107 sstable_directory: Indentation fix after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-16 16:26:37 +03:00
Pavel Emelyanov
059d7c795e db,sstables: Move storage init for system keyspace to table creation
User and system keyspaces are created and populated slightly
differently.

System keyspace is created via system_keyspace::make() which eventually
calls add_column_family(). Then it's populated via
init_system_keyspace() which calls sstable_directory::prepare() which,
in turn, optionally creates directories in datadir/ or checks the
directory permissions if it exists.

User keyspaces are created with the help of
add_column_family_and_make_directory() call which calls the
add_column_family() mentioned above _and_ calls table::init_storage() to
create directories. When it's populated with init_non_system_keyspaces()
it also calls sstable_directory::prepare() which notices that the
directory exists and then checks the permissions.

As a result, sstable_directory::prepare() initializes storage for system
keyspace only and there's a BUG (#15708) that the upload/ subdir is not
created.

This patch creates the directories for _all_ keyspaces with
table::init_storage(). The change only touches system keyspace by moving
the creation of directories from sstable_directory::prepare() into
system_keyspace::make().

Indentation is deliberately left broken.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-16 16:19:25 +03:00
Patryk Jędrzejczak
7810e8d860 table_helper: announce twice in setup_keyspace
We refactor table_helper::setup_keyspace so that it calls
migration_manager::announce at most twice. We achieve it by
announcing all tables at once.
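
The batching idea in a toy form. `ToyMigrationManager` and the string "mutations" are hypothetical placeholders for migration_manager and real schema mutations; the table names are only sample input.

```python
class ToyMigrationManager:
    """Records every announce() call so the number of calls can be counted."""

    def __init__(self):
        self.announcements = []

    def announce(self, mutations):
        self.announcements.append(list(mutations))


def setup_keyspace(mm, keyspace, tables):
    # First announcement: the keyspace itself.
    mm.announce([f"create keyspace {keyspace}"])
    # Second announcement: all tables at once instead of one call per table.
    mm.announce([f"create table {keyspace}.{table}" for table in tables])


mm = ToyMigrationManager()
setup_keyspace(mm, "system_traces", ["sessions", "events"])
```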

The number of announcements should further be reduced to one, but
it requires a big refactor. The CQL code used in
parse_new_cf_statement assumes the keyspace has already been
created. We cannot have such an assumption if we want to announce
a keyspace and its tables together. However, we shouldn't touch
the CQL code as it would impact user requests, too.

One solution is using schema_builder instead of the CQL statements
to create tables in table_helper.

Another approach is removing table_helper completely. It is used
only for the system_traces keyspace, which Scylla creates
automatically. We could refactor the way Scylla handles this
keyspace and make table_helper unneeded.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
2b4e1e0f9c table_helper: refactor setup_table
In the following commit, we reduce migration_manager::announce
calls in table_helper::setup_keyspace by announcing all tables
together. To do it, we cannot use table_helper::setup_table
anymore, which announces a single table itself. However, the new
code still has to translate CQL statements, so we extract it to the
new parse_new_cf_statement function to avoid duplication.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
fad71029f0 redis: create_keyspace_if_not_exists_impl: fix indentation
Broken in the previous commit.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
a3044d1f46 redis: announce once in create_keyspace_if_not_exists_impl
We refactor create_keyspace_if_not_exists_impl so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
98d067e77d db: system_distributed_keyspace: fix indentation
Broken in the previous commit.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
5ebc0e8617 db: system_distributed_keyspace: announce once in start
We refactor system_distributed_keyspace::start so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.

We remove a catch expression together with the FIXME from
get_updated_service_levels (add_new_columns_if_missing before the
patch) because we cannot treat the service_levels update
differently anymore.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
449b4c79c2 tablet_allocator: update on_before_create_column_family
After adding the keyspace_metadata parameter to
migration_listener::on_before_create_column_family,
tablet_allocator doesn't need to load it from the database.

This change is necessary before merging migration_manager::announce
calls in the following commit.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
7653059369 migration_listener: add parameter to on_before_create_column_family
After adding the new prepare_new_column_family_announcement that
doesn't assume the existence of a keyspace, we also need to get
rid of the same assumption in all on_before_create_column_family
calls. After all, they may be initiated before creating the
keyspace. However, some listeners require keyspace_metadata, so we
pass it as a new parameter.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
96d9e768c4 alternator: executor: use new prepare_new_column_family_announcement
We can use the new prepare_new_column_family_announcement function
that doesn't assume the existence of the keyspace instead of the
previous work-around.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
fcd092473c alternator: executor: introduce create_keyspace_metadata
We need to store a new keyspace's keyspace_metadata as a local
variable in create_table_on_shard0. In the following commit, we
use it to call the new prepare_new_column_family_announcement
function.
2023-10-16 14:59:53 +02:00
Patryk Jędrzejczak
7e6017d62d migration_manager: add new prepare_new_column_family_announcement
In the following commits, we reduce the number of the
migration_manager::announce calls by merging some of them in a way
that logically makes sense. Some of these merges are similar --
we announce a new keyspace and its tables together. However,
we cannot use the current prepare_new_column_family_announcement
there because it assumes that the keyspace has already been created
(when it loads the keyspace from the database). Luckily, this
assumption is not necessary as this function only needs
keyspace_metadata. Instead of loading it from the database, we can
pass it as a parameter.
2023-10-16 14:59:53 +02:00
Aleksandra Martyniuk
198119f737 compaction: add get_progress method to compaction_task_impl
compaction_task_impl::get_progress is used by the lowest level
compaction tasks whose progress can be taken from
compaction_progress_monitor.
2023-10-12 17:16:05 +02:00
Aleksandra Martyniuk
39e96c6521 compaction: find total compaction size 2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
7b3e0ab1f2 compaction: sstables: monitor validation scrub with compaction_read_generator
Validation scrub bypasses the usual compaction machinery, though it
still needs to be tracked with compaction_progress_monitor so that
we could reach its progress from compaction task executor.

Track sstable scrub in validate mode with read monitors.
2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
3553556708 compaction: keep compaction_progress_monitor in compaction_task_executor
Keep compaction_progress_monitor in compaction_task_executor and pass a reference
to it further, so that the compaction progress could be retrieved out of it.
2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
37da5a0638 compaction: use read monitor generator for all compactions
Compaction read monitor generators are used in all compaction types.
Classes which did not use _monitor_generator so far, create it with
_use_backlog_tracker set to no, not to impact backlog tracker.
2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
22bf3c03df compaction: add compaction_progress_monitor
In the following patches compaction_read_monitor_generator will be used
to find progress of compaction_task_executor's. To avoid unnecessary life
prolongation and exposing internals of the class out of compaction.cc,
compaction_progress_monitor is created.

Compaction class keeps a reference to the compaction_progress_monitor.
Inheriting classes which actually use compaction_read_monitor_generator,
need to set it with set_generator method.
2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
b852ad25bf compaction: add flag to compaction_read_monitor_generator
Following patches will use compaction_read_monitor_generator
to track progress of all types of compaction. Some of them should
not be registered in compaction_backlog_tracker.

_use_backlog_tracker flag, which is by default set to true, is
added to compaction_read_monitor_generator and passed to all
compaction_read_monitors created by this generator.
2023-10-12 17:03:46 +02:00
Kefu Chai
e76a02abc5 build: move check for NIX_CC into dynamic_linker_option()
`employ_ld_trickery` is only used by `dynamic_linker_option()`, so
move it into this function.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-09 11:11:57 +08:00
Kefu Chai
e85fc9f8be build: extract dynamic_linker_option(): out
this change helps to remove more global statements.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-09 11:11:57 +08:00
Kefu Chai
21b61e8f0a build: move headers into write_build_file()
`headers` is only used in this function, so move it closer to where
it is used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-09 11:11:57 +08:00
Kefu Chai
b3e5c8c348 build: cmake: pass -dynamic-linker to ld
to match the behavior of `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-09 11:07:13 +08:00
Kefu Chai
ce46f7b91b build: cmake: set CMAKE_EXE_LINKER_FLAGS in mode.common.cmake
so that CMakeLists.txt is less cluttered. as we will append
`--dynamic-linker` option to the LDFLAGS.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-09 11:07:13 +08:00
Kefu Chai
1efd0d9a92 test: set use_uuid to true by default in sstables::test_env
for better coverage of uuid-based sstable identifier. since this
option is enabled by default, this also aligns our tests with the
default behavior of scylladb.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-07 18:56:47 +08:00
Kefu Chai
50c8619ed9 test: enable test to set uuid_sstable_identifiers
some of the tests still rely on the integer-based sstable
identifier, so let's add a method to test_env so that these tests
can opt out. we will change the default setting
of sstables::test_env to use the uuid-based sstable identifier in the
next commit. this change does not change the existing behavior;
it just adds a new knob to test_env_config and lets the tests
that rely on the integer-based identifier customize test_env_config
to disable use_uuid.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-07 18:56:47 +08:00
Tomasz Grabiec
7862ffbd14 perf_simple_query: Allow running with tablets 2023-10-06 23:49:15 +02:00
Tomasz Grabiec
0edb39715d tests: cql_test_env: Allow creating keyspace with tablets 2023-10-06 23:49:15 +02:00
Tomasz Grabiec
0ff10c72de tests: cql_test_env: Register storage_service in migration notifier
The procedure in main already does this.

Processing of tablet metadata on schema changes relies on
this. Without this, creating a tablet-based table will fail on missing
tablet map in token metadata because the listener in storage service
does not fire.
2023-10-06 23:49:15 +02:00
Tomasz Grabiec
3c0d723ad4 test: cql_test_env: Initialize node state in topology
This is necessary for using tablets with cql_test_env in tools like
perf-simple-query.

Otherwise, the test will fail with:

  Shard count not known for node c06a7e7f-ee6c-44e5-9257-09cdc5b2bb10

The existing tablets_test works because it creates its own topology
bypassing the one in storage_service.
2023-10-06 23:49:15 +02:00
Anna Stuchlik
5d3584faa5 doc: add a note about Raft
This commit adds a note to specify
that the information on the Handling
Failures page only refers to clusters
with Raft enabled.
Also, the comment is included to remove
the note in future versions.
2023-10-06 16:04:43 +02:00
Pavel Emelyanov
e485c854b2 distributed_loader: Remove explicit sharded<erms>
The sharded replication map was needed to provide sharded for sstable
directory. Now it gets sharded via table reference and thus the erms
thing becomes unused

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-06 15:57:45 +03:00
Pavel Emelyanov
c2eb1ae543 distributed_loader: Brush up start_subdir()
Drop some local references to class members and line-up arguments to
starting distributed sstable directory. Purely a clean up patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-06 15:57:03 +03:00
Pavel Emelyanov
795dcf2ead sstable_directory: Add enlightened construction
The existing constructor is pretty heavyweight for the distributed
loader to use -- it needs to pass it 4 sharded parameters which looks
pretty bulky in the text editor. However, 5 constructor arguments are
obtained directly from the table, so the dist. loader code with global
table pointer at hand can pass _it_ as sharded parameter and let the
sstable directory extract what it needs.

Sad news is that sstable_directory cannot be switched to just use table
reference. Tools code doesn't have table at hand, but needs the
facilities sstable_directory provides

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-06 15:54:51 +03:00
Pavel Emelyanov
e004469827 table: Add global_table_ptr::as_sharded_parameter()
The method returns seastar::sharded_parameter<> for the global table
that evaluates into local table reference

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-06 15:53:57 +03:00
Anna Stuchlik
eb5a9c535a doc: add the quorum requirement to procedures
This commit adds a note to the docs for
cluster management that a quorum is
required to add, remove, or replace a node,
and update the schema.
2023-10-04 13:16:21 +02:00
Anna Stuchlik
bf25b5fe76 doc: add more failure info to Troubleshooting
This commit adds new pages with reference to
Handling Node Failures to Troubleshooting.
The pages are:
- Failure to Add, Remove, or Replace a Node
  (in the Cluster section)
- Failure to Update the Schema
  (in the Data Modeling section)
2023-10-04 12:44:26 +02:00
Anna Stuchlik
8c4f9379d5 doc: move Handling Failures to Troubleshooting
This commit moves the content of the Handling
Failures section on the Raft page to the new
Handling Node Failures page in the Troubleshooting
section.

Background:
When Raft was experimental, the Handling Failures
section was only applicable to clusters
where Raft was explicitly enabled.
Now that Raft is the default, the information
about handling failures is relevant to
all users.
2023-10-04 12:23:33 +02:00
Benny Halevy
bec489409e row_cache: abort on external_updater::execute errors
Currently the cache updaters aren't exception safe
yet they are intended to be.

Instead of allowing exceptions from
`external_updater::execute` escape `row_cache::update`,
abort using `on_fatal_internal_error`.

Future changes should harden all `execute` implementations
to effectively make them `noexcept`, then the pure virtual
definition can be made `noexcept` to cement that.

Fixes scylladb/scylladb#15576

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-28 09:11:04 +03:00
Benny Halevy
80bba3d4b7 row_cache: do_update: simplify _prev_snapshot_pos setup
ring_position::min() is noexcept since 6d7ae4ead1
So no need to call it outside of the critical noexcept block.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-28 08:21:30 +03:00
Takuya ASADA
ea61b14f27 scylla_swap_setup: use fallocate on ext4
We stopped using fallocate for allocating swap since it does not work on
xfs (#6650).
However, dd is much slower than fallocate since it writes data to the
file. Let's use fallocate when the filesystem is ext4, since there it
actually works and is faster.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2023-02-01 01:58:13 +09:00
Takuya ASADA
dffadabb94 scylla_swap_setup: run error check before allocating swap
We should run the error checks before running dd; otherwise a failed
check will leave a swapfile on disk without completing swap setup.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2023-02-01 01:58:13 +09:00
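The two scylla_swap_setup commits above describe an ordering (validate first, allocate second) and a choice of allocator (fallocate on ext4, dd elsewhere). A minimal sketch of that decision follows. The helper name `plan_swap_allocation` and its parameters are hypothetical and are not taken from the script; the function only builds the command line and does not run it.

```python
def plan_swap_allocation(fstype: str, swapfile: str, size_mib: int) -> list[str]:
    """Return the command that would allocate the swap file (hypothetical helper)."""
    if size_mib <= 0:
        # Run the error checks before touching the disk, so that a failed
        # setup never leaves a half-created swap file behind.
        raise ValueError("swap size must be positive")
    if fstype == "ext4":
        # fallocate reserves blocks without writing data, so it is fast.
        return ["fallocate", "-l", f"{size_mib}M", swapfile]
    # On other filesystems (for example xfs) fall back to writing zeroes with dd.
    return ["dd", "if=/dev/zero", f"of={swapfile}", "bs=1M", f"count={size_mib}"]
```

Keeping the decision in a pure function like this makes the ext4/xfs branching easy to check without creating any files.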
2346 changed files with 121249 additions and 50890 deletions

225
.clang-format Normal file

@@ -0,0 +1,225 @@
---
Language: Cpp
AccessModifierOffset: -4
AlignAfterOpenBracket: Align
AlignArrayOfStructures: None
AlignConsecutiveAssignments:
  Enabled: false
  AcrossEmptyLines: false
  AcrossComments: false
  AlignCompound: false
  PadOperators: true
AlignConsecutiveBitFields:
  Enabled: false
  AcrossEmptyLines: false
  AcrossComments: false
  AlignCompound: false
  PadOperators: false
AlignConsecutiveDeclarations:
  Enabled: false
  AcrossEmptyLines: false
  AcrossComments: false
  AlignCompound: false
  PadOperators: false
AlignConsecutiveMacros:
  Enabled: false
  AcrossEmptyLines: false
  AcrossComments: false
  AlignCompound: false
  PadOperators: false
AlignConsecutiveShortCaseStatements:
  Enabled: false
  AcrossEmptyLines: false
  AcrossComments: false
  AlignCaseColons: false
AlignEscapedNewlines: Right
AlignOperands: Align
AlignTrailingComments:
  Kind: Always
  OverEmptyLines: 0
AllowAllArgumentsOnNextLine: true
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: Never
AllowShortCaseLabelsOnASingleLine: false
AllowShortEnumsOnASingleLine: true
AllowShortFunctionsOnASingleLine: InlineOnly
AllowShortIfStatementsOnASingleLine: Never
AllowShortLambdasOnASingleLine: All
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: Yes
AttributeMacros:
- __capability
BinPackArguments: false
BinPackParameters: false
BitFieldColonSpacing: Both
BraceWrapping:
  AfterCaseLabel: false
  AfterClass: false
  AfterControlStatement: Never
  AfterEnum: false
  AfterExternBlock: false
  AfterFunction: false
  AfterNamespace: false
  AfterObjCDeclaration: false
  AfterStruct: false
  AfterUnion: false
  BeforeCatch: false
  BeforeElse: false
  BeforeLambdaBody: false
  BeforeWhile: false
  IndentBraces: false
  SplitEmptyFunction: true
  SplitEmptyRecord: true
  SplitEmptyNamespace: true
BreakAfterAttributes: Never
BreakAfterJavaFieldAnnotations: false
BreakArrays: true
BreakBeforeBinaryOperators: None
BreakBeforeConceptDeclarations: Always
BreakBeforeBraces: Attach
BreakBeforeInlineASMColon: OnlyMultiline
BreakBeforeTernaryOperators: true
BreakConstructorInitializers: BeforeComma
BreakInheritanceList: BeforeColon
BreakStringLiterals: true
ColumnLimit: 160
CommentPragmas: '^ IWYU pragma:'
CompactNamespaces: false
ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
Cpp11BracedListStyle: true
DerivePointerAlignment: false
DisableFormat: false
EmptyLineAfterAccessModifier: Never
EmptyLineBeforeAccessModifier: LogicalBlock
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: true
ForEachMacros:
- foreach
- Q_FOREACH
- BOOST_FOREACH
IfMacros:
- KJ_IF_MAYBE
IncludeBlocks: Preserve
IncludeCategories:
  - Regex: '^"(llvm|llvm-c|clang|clang-c)/'
    Priority: 2
    SortPriority: 0
    CaseSensitive: false
  - Regex: '^(<|"(gtest|gmock|isl|json)/)'
    Priority: 3
    SortPriority: 0
    CaseSensitive: false
  - Regex: '.*'
    Priority: 1
    SortPriority: 0
    CaseSensitive: false
IncludeIsMainRegex: '(Test)?$'
IncludeIsMainSourceRegex: ''
IndentAccessModifiers: false
IndentCaseBlocks: false
IndentCaseLabels: false
IndentExternBlock: AfterExternBlock
IndentGotoLabels: true
IndentPPDirectives: None
IndentRequiresClause: true
IndentWidth: 4
IndentWrappedFunctionNames: false
InsertBraces: false
InsertNewlineAtEOF: true
InsertTrailingCommas: None
IntegerLiteralSeparator:
  Binary: 0
  BinaryMinDigits: 0
  Decimal: 0
  DecimalMinDigits: 0
  Hex: 0
  HexMinDigits: 0
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: true
KeepEmptyLinesAtEOF: false
LambdaBodyIndentation: Signature
LineEnding: DeriveLF
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 2
NamespaceIndentation: None
PackConstructorInitializers: NextLine
PenaltyBreakAssignment: 2
PenaltyBreakBeforeFirstCallParameter: 19
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakOpenParenthesis: 0
PenaltyBreakString: 1000
PenaltyBreakTemplateDeclaration: 10
PenaltyExcessCharacter: 1000000
PenaltyIndentedWhitespace: 0
PenaltyReturnTypeOnItsOwnLine: 60
PointerAlignment: Left
PPIndentWidth: -1
QualifierAlignment: Leave
ReferenceAlignment: Pointer
ReflowComments: true
RemoveBracesLLVM: false
RemoveParentheses: Leave
RemoveSemicolon: false
RequiresClausePosition: OwnLine
RequiresExpressionIndentation: OuterScope
SeparateDefinitionBlocks: Leave
ShortNamespaceLines: 1
SortIncludes: CaseSensitive
SortJavaStaticImport: Before
SortUsingDeclarations: LexicographicNumeric
SpaceAfterCStyleCast: false
SpaceAfterLogicalNot: false
SpaceAfterTemplateKeyword: true
SpaceAroundPointerQualifiers: Default
SpaceBeforeAssignmentOperators: true
SpaceBeforeCaseColon: false
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: true
SpaceBeforeInheritanceColon: true
SpaceBeforeJsonColon: false
SpaceBeforeParens: ControlStatements
SpaceBeforeParensOptions:
  AfterControlStatements: true
  AfterForeachMacros: true
  AfterFunctionDefinitionName: false
  AfterFunctionDeclarationName: false
  AfterIfMacros: true
  AfterOverloadedOperator: false
  AfterRequiresInClause: false
  AfterRequiresInExpression: false
  BeforeNonEmptyParentheses: false
SpaceBeforeRangeBasedForLoopColon: true
SpaceBeforeSquareBrackets: false
SpaceInEmptyBlock: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: Never
SpacesInContainerLiterals: true
SpacesInLineCommentPrefix:
  Minimum: 1
  Maximum: -1
SpacesInParens: Never
SpacesInParensOptions:
  InCStyleCasts: false
  InConditionalStatements: false
  InEmptyParentheses: false
  Other: false
SpacesInSquareBrackets: false
Standard: Latest
TabWidth: 8
UseTab: Never
VerilogBreakBetweenInstancePorts: true
WhitespaceSensitiveMacros:
- BOOST_PP_STRINGIZE
- CF_SWIFT_NAME
- NS_SWIFT_NAME
- PP_STRINGIZE
- STRINGIZE
...

1
.gitattributes vendored

@@ -1,3 +1,4 @@
*.cc diff=cpp
*.hh diff=cpp
*.svg binary
docs/_static/api/js/* binary

56
.github/CODEOWNERS vendored

@@ -1,5 +1,5 @@
# AUTH
auth/* @elcallio @vladzcloudius
auth/* @nuivall @ptrsmrn @KrzaQ
# CACHE
row_cache* @tgrabiec
@@ -7,9 +7,9 @@ row_cache* @tgrabiec
test/boost/mvcc* @tgrabiec
# CDC
cdc/* @kbr- @elcallio @piodul @jul-stas
test/cql/cdc_* @kbr- @elcallio @piodul @jul-stas
test/boost/cdc_* @kbr- @elcallio @piodul @jul-stas
cdc/* @kbr-scylla @elcallio @piodul
test/cql/cdc_* @kbr-scylla @elcallio @piodul
test/boost/cdc_* @kbr-scylla @elcallio @piodul
# COMMITLOG / BATCHLOG
db/commitlog/* @elcallio @eliransin
@@ -19,24 +19,24 @@ db/batch* @elcallio
service/storage_proxy* @gleb-cloudius
# COMPACTION
compaction/* @raphaelsc @nyh
compaction/* @raphaelsc
# CQL TRANSPORT LAYER
transport/*
# CQL QUERY LANGUAGE
cql3/* @tgrabiec @cvybhu @nyh
cql3/* @tgrabiec @nuivall @ptrsmrn @KrzaQ
# COUNTERS
counters* @jul-stas
tests/counter_test* @jul-stas
counters* @nuivall @ptrsmrn @KrzaQ
tests/counter_test* @nuivall @ptrsmrn @KrzaQ
# DOCS
docs/* @annastuchlik @tzach
docs/alternator @annastuchlik @tzach @nyh @havaker @nuivall
docs/alternator @annastuchlik @tzach @nyh @nuivall @ptrsmrn @KrzaQ
# GOSSIP
gms/* @tgrabiec @asias
gms/* @tgrabiec @asias @kbr-scylla
# DOCKER
dist/docker/*
@@ -45,44 +45,44 @@ dist/docker/*
utils/logalloc* @tgrabiec
# MATERIALIZED VIEWS
db/view/* @nyh @cvybhu @piodul
cql3/statements/*view* @nyh @cvybhu @piodul
test/boost/view_* @nyh @cvybhu @piodul
db/view/* @nyh @piodul
cql3/statements/*view* @nyh @piodul
test/boost/view_* @nyh @piodul
# PACKAGING
dist/* @syuu1228
# REPAIR
repair/* @tgrabiec @asias @nyh
repair/* @tgrabiec @asias
# SCHEMA MANAGEMENT
db/schema_tables* @tgrabiec @nyh
db/legacy_schema_migrator* @tgrabiec @nyh
service/migration* @tgrabiec @nyh
schema* @tgrabiec @nyh
db/schema_tables* @tgrabiec
db/legacy_schema_migrator* @tgrabiec
service/migration* @tgrabiec
schema* @tgrabiec
# SECONDARY INDEXES
index/* @nyh @cvybhu @piodul
cql3/statements/*index* @nyh @cvybhu @piodul
test/boost/*index* @nyh @cvybhu @piodul
index/* @nyh @piodul
cql3/statements/*index* @nyh @piodul
test/boost/*index* @nyh @piodul
# SSTABLES
sstables/* @tgrabiec @raphaelsc @nyh
sstables/* @tgrabiec @raphaelsc
# STREAMING
streaming/* @tgrabiec @asias
service/storage_service.* @tgrabiec @asias
# ALTERNATOR
alternator/* @nyh @havaker @nuivall
test/alternator/* @nyh @havaker @nuivall
alternator/* @nyh @nuivall @ptrsmrn @KrzaQ
test/alternator/* @nyh @nuivall @ptrsmrn @KrzaQ
# HINTED HANDOFF
db/hints/* @piodul @vladzcloudius @eliransin
# REDIS
redis/* @nyh @syuu1228
test/redis/* @nyh @syuu1228
redis/* @syuu1228
test/redis/* @syuu1228
# READERS
reader_* @denesb
@@ -94,8 +94,8 @@ test/boost/querier_cache_test.cc @denesb
test/cql-pytest/* @nyh
# RAFT
raft/* @kbr- @gleb-cloudius @kostja
test/raft/* @kbr- @gleb-cloudius @kostja
raft/* @kbr-scylla @gleb-cloudius @kostja
test/raft/* @kbr-scylla @gleb-cloudius @kostja
# HEAT-WEIGHTED LOAD BALANCING
db/heat_load_balance.* @nyh @gleb-cloudius

20
.github/clang-include-cleaner.json vendored Normal file

@@ -0,0 +1,20 @@
{
  "problemMatcher": [
    {
      "owner": "clang-include-cleaner",
      "severity": "error",
      "pattern": [
        {
          "regexp": "^([^\\-\\+].*)$",
          "file": 1
        },
        {
          "regexp": "^(-\\s+[^\\s]+)\\s+@Line:(\\d+)$",
          "line": 2,
          "message": 1,
          "loop": true
        }
      ]
    }
  ]
}
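The matcher above uses two patterns: the first captures a file name from any line that does not start with `-` or `+`, and the second loops over the `- <header> @Line:N` entries that follow it. A rough Python sketch of that two-stage matching is shown below. The sample report lines and the `parse_report` helper are assumptions made for illustration; they are not taken from the workflow or from the tool's documented output.

```python
import re

# The two regular expressions from the matcher, after JSON unescaping.
FILE_RE = re.compile(r"^([^\-\+].*)$")
ENTRY_RE = re.compile(r"^(-\s+[^\s]+)\s+@Line:(\d+)$")


def parse_report(lines):
    """Return (file, line, message) tuples, mimicking the looped matcher."""
    current_file = None
    results = []
    for line in lines:
        entry = ENTRY_RE.match(line)
        if entry and current_file is not None:
            results.append((current_file, int(entry.group(2)), entry.group(1)))
            continue
        header = FILE_RE.match(line)
        if header:
            # A line that does not start with '-' or '+' names the file.
            current_file = header.group(1)
    return results


# Hypothetical report used only to exercise the patterns.
sample = [
    "utils/example.cc",
    "- <vector> @Line:12",
    '- "foo/bar.hh" @Line:30',
]
```

Each entry is reported against the most recent file line, which is what `"loop": true` on the second pattern expresses.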

18
.github/clang-matcher.json vendored Normal file

@@ -0,0 +1,18 @@
{
  "problemMatcher": [
    {
      "owner": "clang",
      "pattern": [
        {
          "regexp": "^([^:]+):(\\d+):(\\d+):\\s+(warning|error):\\s+(.*?)\\s+\\[(.*?)\\]$",
          "file": 1,
          "line": 2,
          "column": 3,
          "severity": 4,
          "message": 5,
          "code": 6
        }
      ]
    }
  ]
}
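The single pattern in this matcher maps six capture groups to file, line, column, severity, message, and code. The sketch below applies the same expression in Python to show which part of a diagnostic lands in which field. The `parse_diagnostic` helper and the sample diagnostic are illustrative assumptions, not part of the repository.

```python
import re

# The matcher's regular expression, after JSON unescaping.
CLANG_RE = re.compile(
    r"^([^:]+):(\d+):(\d+):\s+(warning|error):\s+(.*?)\s+\[(.*?)\]$"
)


def parse_diagnostic(line):
    """Split one diagnostic line into the fields named by the matcher."""
    m = CLANG_RE.match(line)
    if m is None:
        return None
    file, lineno, column, severity, message, code = m.groups()
    return {
        "file": file,
        "line": int(lineno),
        "column": int(column),
        "severity": severity,
        "message": message,
        "code": code,
    }


# Hypothetical diagnostic used only to exercise the pattern.
sample = "db/example.cc:42:9: warning: unused variable 'x' [-Wunused-variable]"
```

Lines without a trailing `[...]` code, such as `note:` lines, do not match and are ignored by this pattern.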

92
.github/mergify.yml vendored Normal file

@@ -0,0 +1,92 @@
pull_request_rules:
  - name: put PR in draft if conflicts
    conditions:
      - label = conflicts
      - author = mergify[bot]
      - head ~= ^mergify/
    actions:
      edit:
        draft: true
  - name: Delete mergify backport branch
    conditions:
      - base~=branch-
      - or:
        - merged
        - closed
    actions:
      delete_head_branch:
  - name: Automate backport pull request 6.1
    conditions:
      - or:
        - closed
        - merged
      - or:
        - base=master
        - base=next
      - label=backport/6.1 # The PR must have this label to trigger the backport
      - label=promoted-to-master
    actions:
      copy:
        title: "[Backport 6.1] {{ title }}"
        body: |
          {{ body }}
          {% for c in commits %}
          (cherry picked from commit {{ c.sha }})
          {% endfor %}
          Refs #{{number}}
        branches:
          - branch-6.1
        assignees:
          - "{{ author }}"
  - name: Automate backport pull request 5.4
    conditions:
      - or:
        - closed
        - merged
      - or:
        - base=master
        - base=next
      - label=backport/5.4 # The PR must have this label to trigger the backport
      - label=promoted-to-master
    actions:
      copy:
        title: "[Backport 5.4] {{ title }}"
        body: |
          {{ body }}
          {% for c in commits %}
          (cherry picked from commit {{ c.sha }})
          {% endfor %}
          Refs #{{number}}
        branches:
          - branch-5.4
        assignees:
          - "{{ author }}"
  - name: Automate backport pull request 6.0
    conditions:
      - or:
        - closed
        - merged
      - or:
        - base=master
        - base=next
      - label=backport/6.0 # The PR must have this label to trigger the backport
      - label=promoted-to-master
    actions:
      copy:
        title: "[Backport 6.0] {{ title }}"
        body: |
          {{ body }}
          {% for c in commits %}
          (cherry picked from commit {{ c.sha }})
          {% endfor %}
          Refs #{{number}}
        branches:
          - branch-6.0
        assignees:
          - "{{ author }}"

1
.github/pull_request_template.md vendored Normal file

@@ -0,0 +1 @@
**Please replace this line with justification for the backport/\* labels added to this PR**

186
.github/scripts/auto-backport.py vendored Executable file

@@ -0,0 +1,186 @@
#!/usr/bin/env python3

import argparse
import os
import re
import sys
import tempfile
import logging

from github import Github, GithubException
from git import Repo, GitCommandError

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
try:
    github_token = os.environ["GITHUB_TOKEN"]
except KeyError:
    print("Please set the 'GITHUB_TOKEN' environment variable")
    sys.exit(1)


def is_pull_request():
    return '--pull-request' in sys.argv[1:]


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--repo', type=str, required=True, help='Github repository name')
    parser.add_argument('--base-branch', type=str, default='refs/heads/master', help='Base branch')
    parser.add_argument('--commits', default=None, type=str, help='Range of promoted commits.')
    parser.add_argument('--pull-request', type=int, help='Pull request number to be backported')
    parser.add_argument('--head-commit', type=str, required=is_pull_request(), help='The HEAD of target branch after the pull request specified by --pull-request is merged')
    return parser.parse_args()


def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr_title, commits, is_draft=False):
    pr_body = f'{pr.body}\n\n'
    for commit in commits:
        pr_body += f'- (cherry picked from commit {commit})\n\n'
    pr_body += f'Parent PR: #{pr.number}'
    try:
        backport_pr = repo.create_pull(
            title=backport_pr_title,
            body=pr_body,
            head=f'scylladbbot:{new_branch_name}',
            base=base_branch_name,
            draft=is_draft
        )
        logging.info(f"Pull request created: {backport_pr.html_url}")
        backport_pr.add_to_assignees(pr.user)
        if is_draft:
            backport_pr.add_to_labels("conflicts")
            pr_comment = f"@{pr.user} - This PR was marked as draft because it has conflicts\n"
            pr_comment += "Please resolve them and mark this PR as ready for review"
            backport_pr.create_issue_comment(pr_comment)
        logging.info(f"Assigned PR to original author: {pr.user}")
        return backport_pr
    except GithubException as e:
        if 'A pull request already exists' in str(e):
            logging.warning(f'A pull request already exists for {pr.user}:{new_branch_name}')
        else:
            logging.error(f'Failed to create PR: {e}')


def get_pr_commits(repo, pr, stable_branch, start_commit=None):
    commits = []
    if pr.merged:
        merge_commit = repo.get_commit(pr.merge_commit_sha)
        if len(merge_commit.parents) > 1:  # Check if this merge commit includes multiple commits
            commits.append(pr.merge_commit_sha)
        else:
            if start_commit:
                promoted_commits = repo.compare(start_commit, stable_branch).commits
            else:
                promoted_commits = repo.get_commits(sha=stable_branch)
            for commit in pr.get_commits():
                for promoted_commit in promoted_commits:
                    commit_title = commit.commit.message.splitlines()[0]
                    # In Scylla-pkg and scylla-dtest, for example,
                    # we don't create a merge commit for a PR with multiple commits,
                    # according to the GitHub API, the last commit will be the merge commit,
                    # which is not what we need when backporting (we need all the commits).
                    # So here, we are validating the correct SHA for each commit so we can cherry-pick
                    if promoted_commit.commit.message.startswith(commit_title):
                        commits.append(promoted_commit.sha)
    elif pr.state == 'closed':
        events = pr.get_issue_events()
        for event in events:
            if event.event == 'closed':
                commits.append(event.commit_id)
    return commits


def create_pr_comment_and_remove_label(pr, comment_body):
    labels = pr.get_labels()
    pattern = re.compile(r"backport/\d+\.\d+$")
    for label in labels:
        if pattern.match(label.name):
            print(f"Removing label: {label.name}")
            comment_body += f'- {label.name}\n'
            pr.remove_from_labels(label)
    pr.create_issue_comment(comment_body)


def backport(repo, pr, version, commits, backport_base_branch):
    new_branch_name = f'backport/{pr.number}/to-{version}'
    backport_pr_title = f'[Backport {version}] {pr.title}'
    repo_url = f'https://scylladbbot:{github_token}@github.com/{repo.full_name}.git'
    fork_repo = f'https://scylladbbot:{github_token}@github.com/scylladbbot/{repo.name}.git'
    with (tempfile.TemporaryDirectory() as local_repo_path):
        try:
            repo_local = Repo.clone_from(repo_url, local_repo_path, branch=backport_base_branch)
            repo_local.git.checkout(b=new_branch_name)
            is_draft = False
            for commit in commits:
                try:
                    repo_local.git.cherry_pick(commit, '-m1', '-x')
                except GitCommandError as e:
                    logging.warning(f'Cherry-pick conflict on commit {commit}: {e}')
                    is_draft = True
                    repo_local.git.add(A=True)
                    repo_local.git.cherry_pick('--continue')
            if not repo.private and not repo.has_in_collaborators(pr.user.login):
                repo.add_to_collaborators(pr.user.login, permission="push")
                comment = f':warning: @{pr.user.login} you have been added as collaborator to scylladbbot fork '
                comment += f'Please check your inbox and approve the invitation, once it is done, please add the backport labels again'
                create_pr_comment_and_remove_label(pr, comment)
                return
            repo_local.git.push(fork_repo, new_branch_name, force=True)
            create_pull_request(repo, new_branch_name, backport_base_branch, pr, backport_pr_title, commits,
                                is_draft=is_draft)
        except GitCommandError as e:
            logging.warning(f"GitCommandError: {e}")


def main():
    args = parse_args()
    base_branch = args.base_branch.split('/')[2]
    promoted_label = 'promoted-to-master'
    repo_name = args.repo
    if 'scylla-enterprise' in args.repo:
        promoted_label = 'promoted-to-enterprise'
    stable_branch = base_branch
    backport_branch = 'branch-'

    backport_label_pattern = re.compile(r'backport/\d+\.\d+$')

    g = Github(github_token)
    repo = g.get_repo(repo_name)
    closed_prs = []
    start_commit = None

    if args.commits:
        start_commit, end_commit = args.commits.split('..')
        commits = repo.compare(start_commit, end_commit).commits
        for commit in commits:
            match = re.search(rf"Closes .*#([0-9]+)", commit.commit.message, re.IGNORECASE)
            if match:
                pr_number = int(match.group(1))
                pr = repo.get_pull(pr_number)
                closed_prs.append(pr)
    if args.pull_request:
        start_commit = args.head_commit
        pr = repo.get_pull(args.pull_request)
        closed_prs = [pr]

    for pr in closed_prs:
        labels = [label.name for label in pr.labels]
        backport_labels = [label for label in labels if backport_label_pattern.match(label)]
        if promoted_label not in labels:
            print(f'no {promoted_label} label: {pr.number}')
            continue
        if not backport_labels:
            print(f'no backport label: {pr.number}')
            continue
        commits = get_pr_commits(repo, pr, stable_branch, start_commit)
        logging.info(f"Found PR #{pr.number} with commit {commits} and the following labels: {backport_labels}")
        for backport_label in backport_labels:
            version = backport_label.replace('backport/', '')
            backport_base_branch = backport_label.replace('backport/', backport_branch)
            backport(repo, pr, version, commits, backport_base_branch)


if __name__ == "__main__":
    main()
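The script above relies on two small pieces of text matching: `backport/X.Y` labels are turned into a version and a `branch-X.Y` base branch, and a `Closes ...#N` trailer in a commit message identifies the pull request. A minimal sketch of just those two steps, without any GitHub access, could look like this. The helper names `backport_targets` and `closed_pr_number` are hypothetical; the regular expressions and the `replace` calls mirror the ones in the script.

```python
import re

BACKPORT_LABEL = re.compile(r"backport/\d+\.\d+$")
CLOSES = re.compile(r"Closes .*#([0-9]+)", re.IGNORECASE)


def backport_targets(labels):
    """Map labels such as 'backport/6.1' to (version, base branch) pairs."""
    targets = []
    for label in labels:
        if BACKPORT_LABEL.match(label):
            version = label.replace("backport/", "")
            base_branch = label.replace("backport/", "branch-")
            targets.append((version, base_branch))
    return targets


def closed_pr_number(commit_message):
    """Return the PR number from the first 'Closes ...#N' line, or None."""
    match = CLOSES.search(commit_message)
    return int(match.group(1)) if match else None
```

Labels that do not end in a `major.minor` version, such as `backport/none` in this sketch's tests, are skipped, which is why only versioned labels produce backport branches.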

View File

@@ -1,8 +1,9 @@
from github import Github
import argparse
import re
import sys
import os
from github import Github
from github.GithubException import UnknownObjectException
try:
github_token = os.environ["GITHUB_TOKEN"]
@@ -15,13 +16,8 @@ def parser():
parser = argparse.ArgumentParser()
parser.add_argument('--repository', type=str, required=True,
help='Github repository name (e.g., scylladb/scylladb)')
parser.add_argument('--commit_before_merge', type=str, required=True, help='Git commit ID to start labeling from ('
'newest commit).')
parser.add_argument('--commit_after_merge', type=str, required=True,
help='Git commit ID to end labeling at (oldest '
'commit, exclusive).')
parser.add_argument('--update_issue', type=bool, default=False, help='Set True to update issues when backport was '
'done')
parser.add_argument('--commits', type=str, required=True, help='Range of promoted commits.')
parser.add_argument('--label', type=str, default='promoted-to-master', help='Label to use')
parser.add_argument('--ref', type=str, required=True, help='PR target branch')
return parser.parse_args()
@@ -52,10 +48,11 @@ def main():
target_branch = re.search(r'branch-(\d+\.\d+)', args.ref)
g = Github(github_token)
repo = g.get_repo(args.repository, lazy=False)
commits = repo.compare(head=args.commit_after_merge, base=args.commit_before_merge)
start_commit, end_commit = args.commits.split('..')
commits = repo.compare(start_commit, end_commit).commits
processed_prs = set()
# Print commit information
for commit in commits.commits:
for commit in commits:
print(f'Commit sha is: {commit.sha}')
match = pr_pattern.search(commit.commit.message)
if match:
@@ -65,21 +62,24 @@ def main():
if target_branch:
pr = repo.get_pull(pr_number)
branch_name = target_branch[1]
refs_pr = re.findall(r'Refs (?:#|https.*?)(\d+)', pr.body)
refs_pr = re.findall(r'Parent PR: (?:#|https.*?)(\d+)', pr.body)
if refs_pr:
print(f'branch-{target_branch.group(1)}, pr number is: {pr_number}')
# 1. change the backport label of the parent PR to note that
# we've merge the corresponding backport PR
# we've merged the corresponding backport PR
# 2. close the backport PR and leave a comment on it to note
# that it has been merged with a certain git commit,
# that it has been merged with a certain git commit.
ref_pr_number = refs_pr[0]
mark_backport_done(repo, ref_pr_number, branch_name)
comment = f'Closed via {commit.sha}'
add_comment_and_close_pr(pr, comment)
else:
print(f'master branch, pr number is: {pr_number}')
pr = repo.get_pull(pr_number)
pr.add_to_labels('promoted-to-master')
try:
pr = repo.get_pull(pr_number)
pr.add_to_labels('promoted-to-master')
print(f'master branch, pr number is: {pr_number}')
except UnknownObjectException:
print(f'{pr_number} is not a PR but an issue, no need to add label')
processed_prs.add(pr_number)

95
.github/scripts/sync_labels.py vendored Executable file

@@ -0,0 +1,95 @@
#!/usr/bin/env python3
import argparse
import os
import sys
from github import Github
import re
try:
github_token = os.environ["GITHUB_TOKEN"]
except KeyError:
print("Please set the 'GITHUB_TOKEN' environment variable")
sys.exit(1)
def parser():
parse = argparse.ArgumentParser()
parse.add_argument('--repo', type=str, required=True, help='Github repository name (e.g., scylladb/scylladb)')
parse.add_argument('--number', type=int, required=True, help='Pull request or issue number to sync labels from')
parse.add_argument('--label', type=str, default=None, help='Label to add/remove from an issue or PR')
parse.add_argument('--is_issue', action='store_true', help='Determined if label change is in Issue or not')
parse.add_argument('--action', type=str, choices=['opened', 'labeled', 'unlabeled'], required=True, help='Sync labels action')
return parse.parse_args()
def copy_labels_from_linked_issues(repo, pr_number):
pr = repo.get_pull(pr_number)
if pr.body:
linked_issue_numbers = set(re.findall(r'Fixes:? (?:#|https.*?/issues/)(\d+)', pr.body))
for issue_number in linked_issue_numbers:
try:
issue = repo.get_issue(int(issue_number))
for label in issue.labels:
pr.add_to_labels(label.name)
print(f"Labels from issue #{issue_number} copied to PR #{pr_number}")
except Exception as e:
print(f"Error processing issue #{issue_number}: {e}")
def get_linked_pr_from_issue_number(repo, number):
linked_prs = []
for pr in repo.get_pulls(state='all', base='master'):
if pr.body and f'{number}' in pr.body:
linked_prs.append(pr.number)
break
else:
continue
return linked_prs
def get_linked_issues_based_on_pr_body(repo, number):
pr = repo.get_pull(number)
repo_name = repo.full_name
pattern = rf"(?:fix(?:|es|ed)|resolve(?:|d|s))\s*:?\s*(?:(?:(?:{repo_name})?#)|https://github\.com/{repo_name}/issues/)(\d+)"
issue_number_from_pr_body = []
if pr.body is None:
return issue_number_from_pr_body
matches = re.findall(pattern, pr.body, re.IGNORECASE)
if matches:
for match in matches:
issue_number_from_pr_body.append(match)
print(f"Found issue number: {match}")
return issue_number_from_pr_body
def sync_labels(repo, number, label, action, is_issue=False):
    if is_issue:
        linked_prs_or_issues = get_linked_pr_from_issue_number(repo, number)
    else:
        linked_prs_or_issues = get_linked_issues_based_on_pr_body(repo, number)
    for pr_or_issue_number in linked_prs_or_issues:
        if is_issue:
            target = repo.get_issue(pr_or_issue_number)
        else:
            target = repo.get_issue(int(pr_or_issue_number))
        if action == 'labeled':
            target.add_to_labels(label)
            print(f"Label '{label}' successfully added.")
        elif action == 'unlabeled':
            target.remove_from_labels(label)
            print(f"Label '{label}' successfully removed.")
        elif action == 'opened':
            copy_labels_from_linked_issues(repo, number)
        else:
            print("Invalid action. Use 'labeled', 'unlabeled' or 'opened'.")
def main():
    args = parser()
    github = Github(github_token)
    repo = github.get_repo(args.repo)
    sync_labels(repo, args.number, args.label, args.action, args.is_issue)
if __name__ == "__main__":
    main()

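Note: the issue-reference pattern in `get_linked_issues_based_on_pr_body` can be tried in isolation. The following is a minimal sketch, assuming the repository name `scylladb/scylladb` and a made-up PR body; `find_linked_issues` is a hypothetical helper for illustration and is not part of `sync_labels.py`.

```python
import re


def find_linked_issues(body, repo_name="scylladb/scylladb"):
    """Return the issue numbers (as strings) referenced by fix/resolve keywords."""
    if body is None:
        return []
    # Same pattern as in the script: keyword, optional colon, then either
    # "#N", "<repo>#N" or a full issue URL for the given repository.
    pattern = (
        rf"(?:fix(?:|es|ed)|resolve(?:|d|s))\s*:?\s*"
        rf"(?:(?:(?:{repo_name})?#)|https://github\.com/{repo_name}/issues/)(\d+)"
    )
    return re.findall(pattern, body, re.IGNORECASE)


# Made-up PR body: two closing references and one plain mention.
sample_body = (
    "Fixes: #24556\n"
    "resolves https://github.com/scylladb/scylladb/issues/24504\n"
    "Refs #111\n"
)
print(find_linked_issues(sample_body))
```

Only references introduced by a fix/resolve keyword are returned, so the `Refs #111` line is ignored. The numbers come back as strings, which is why `sync_labels` converts them with `int()` before calling `repo.get_issue`.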

@@ -5,9 +5,10 @@ on:
branches:
- master
- branch-*.*
env:
DEFAULT_BRANCH: 'master'
- enterprise
pull_request_target:
types: [labeled]
branches: [master, next, enterprise]
jobs:
check-commit:
@@ -20,17 +21,51 @@ jobs:
env:
GITHUB_CONTEXT: ${{ toJson(github) }}
run: echo "$GITHUB_CONTEXT"
- name: Set Default Branch
id: set_branch
run: |
if [[ "${{ github.repository }}" == *enterprise* ]]; then
echo "DEFAULT_BRANCH=enterprise" >> $GITHUB_ENV
else
echo "DEFAULT_BRANCH=master" >> $GITHUB_ENV
fi
- name: Checkout repository
uses: actions/checkout@v4
with:
repository: ${{ github.repository }}
ref: ${{ env.DEFAULT_BRANCH }}
token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
fetch-depth: 0 # Fetch all history for all tags and branches
- name: Set up Git identity
run: |
git config --global user.name "GitHub Action"
git config --global user.email "action@github.com"
git config --global merge.conflictstyle diff3
- name: Install dependencies
run: sudo apt-get install -y python3-github
run: sudo apt-get install -y python3-github python3-git
- name: Run python script
if: github.event_name == 'push'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: python .github/scripts/label_promoted_commits.py --commit_before_merge ${{ github.event.before }} --commit_after_merge ${{ github.event.after }} --repository ${{ github.repository }} --ref ${{ github.ref }}
GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
run: python .github/scripts/label_promoted_commits.py --commits ${{ github.event.before }}..${{ github.sha }} --repository ${{ github.repository }} --ref ${{ github.ref }}
- name: Run auto-backport.py when promotion completed
if: ${{ github.event_name == 'push' && github.ref == format('refs/heads/{0}', env.DEFAULT_BRANCH) }}
env:
GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --commits ${{ github.event.before }}..${{ github.sha }}
- name: Check if label starts with 'backport/' and contains digits
id: check_label
run: |
label_name="${{ github.event.label.name }}"
if [[ "$label_name" =~ ^backport/[0-9]+\.[0-9]+$ ]]; then
echo "Label matches backport/X.X pattern."
echo "backport_label=true" >> $GITHUB_OUTPUT
else
echo "Label does not match the required pattern."
echo "backport_label=false" >> $GITHUB_OUTPUT
fi
- name: Run auto-backport.py when label was added
if: ${{ github.event_name == 'pull_request_target' && steps.check_label.outputs.backport_label == 'true' && github.event.pull_request.state == 'closed' }}
env:
GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --pull-request ${{ github.event.pull_request.number }} --head-commit ${{ github.event.pull_request.base.sha }}

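Note: the "Check if label starts with 'backport/'" step accepts only labels of the form `backport/X.Y`. A rough Python equivalent of that shell test is sketched below; `is_backport_label` is a hypothetical name, and the sample labels other than the `backport/X.Y` form are made up for illustration.

```python
import re

# Same shape as the bash test: ^backport/[0-9]+\.[0-9]+$
BACKPORT_LABEL = re.compile(r"backport/[0-9]+\.[0-9]+")


def is_backport_label(label_name):
    """True when the whole label is 'backport/<digits>.<digits>'."""
    return BACKPORT_LABEL.fullmatch(label_name) is not None


for name in ("backport/6.2", "backport/2024.1", "backport/none", "backport/6.2-done"):
    print(f"{name}: {is_backport_label(name)}")
```

`fullmatch` is used so that trailing text after the version, as in the last sample, is rejected just as the anchored shell pattern rejects it.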

@@ -0,0 +1,33 @@
name: Fixes validation for backport PR
on:
  pull_request:
    types: [opened, reopened, edited]
    branches: [branch-*]
jobs:
  check-fixes-prefix:
    runs-on: ubuntu-latest
    steps:
      - name: Check PR body for "Fixes" prefix patterns
        uses: actions/github-script@v7
        with:
          script: |
            const body = context.payload.pull_request.body;
            const repo = context.payload.repository.full_name;
            // Regular expression pattern to check for "Fixes" prefix
            // Adjusted to dynamically insert the repository full name
            const pattern = `Fixes:? (?:#|${repo.replace('/', '\\/')}#|https://github\\.com/${repo.replace('/', '\\/')}/issues/)(\\d+)`;
            const regex = new RegExp(pattern);
            if (!regex.test(body)) {
              const error = "PR body does not contain a valid 'Fixes' reference.";
              core.setFailed(error);
              await github.rest.issues.createComment({
                issue_number: context.issue.number,
                owner: context.repo.owner,
                repo: context.repo.repo,
                body: `:warning: ${error}`
              });
            }

39
.github/workflows/build-scylla.yaml vendored Normal file

@@ -0,0 +1,39 @@
name: Build Scylla
on:
  workflow_call:
    inputs:
      build_mode:
        description: 'the build mode'
        type: string
        required: true
    outputs:
      md5sum:
        description: 'the md5sum for scylla executable'
        value: ${{ jobs.build.outputs.md5sum }}
jobs:
  read-toolchain:
    uses: ./.github/workflows/read-toolchain.yaml
  build:
    if: github.repository == 'scylladb/scylladb'
    needs:
      - read-toolchain
    runs-on: ubuntu-latest
    container: ${{ needs.read-toolchain.outputs.image }}
    outputs:
      md5sum: ${{ steps.checksum.outputs.md5sum }}
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - name: Generate the building system
        run: |
          git config --global --add safe.directory $GITHUB_WORKSPACE
          ./configure.py --mode ${{ inputs.build_mode }} --with scylla
      - run: |
          ninja build/${{ inputs.build_mode }}/scylla
      - id: checksum
        run: |
          checksum=$(md5sum build/${{ inputs.build_mode }}/scylla | cut -c -32)
          echo "md5sum=$checksum" >> $GITHUB_OUTPUT

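Note: the `checksum` step keeps the first 32 characters of the `md5sum` output, that is, the hex digest without the file name. As a sketch of the same computation in Python (the helper name `md5_hex` and the temporary file are illustrative assumptions, not part of the workflow):

```python
import hashlib
import os
import tempfile


def md5_hex(path, chunk_size=1 << 20):
    """Return the 32-character hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Stand-in for build/<mode>/scylla: a small temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
checksum = md5_hex(tmp.name)
os.unlink(tmp.name)

# Same "name=value" form that the step appends to $GITHUB_OUTPUT.
print(f"md5sum={checksum}")
```

Reading in chunks keeps memory use flat, which matters for an executable as large as a release build.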
66
.github/workflows/clang-nightly.yaml vendored Normal file

@@ -0,0 +1,66 @@
name: clang-nightly
on:
  schedule:
    # only at 5AM Saturday
    - cron: '0 5 * * SAT'
env:
  # use the development branch explicitly
  CLANG_VERSION: 20
  BUILD_DIR: build
permissions: {}
# cancel the in-progress run upon a repush
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  clang-dev:
    name: Build with clang nightly
    if: github.repository == 'scylladb/scylladb'
    runs-on: ubuntu-latest
    container: fedora:40
    strategy:
      matrix:
        build_type:
          - Debug
          - RelWithDebInfo
          - Dev
    steps:
      - run: |
          sudo dnf -y install git
      - uses: actions/checkout@v4
        with:
          submodules: true
      - name: Install build dependencies
        run: |
          # use the copr repo for llvm snapshot builds, see
          # https://copr.fedorainfracloud.org/coprs/g/fedora-llvm-team/llvm-snapshots/
          sudo dnf -y install 'dnf-command(copr)'
          sudo dnf copr enable -y @fedora-llvm-team/llvm-snapshots
          # do not install java dependencies, which are not used here
          sed -i.orig \
            -e '/tools\/.*\/install-dependencies.sh/d' \
            -e 's/(minio_download_jobs)/(true)/' \
            ./install-dependencies.sh
          sudo ./install-dependencies.sh
          sudo dnf -y install lld
      - name: Generate the building system
        run: |
          cmake \
            -DCMAKE_BUILD_TYPE=${{ matrix.build_type }} \
            -DCMAKE_C_COMPILER=clang-$CLANG_VERSION \
            -DCMAKE_CXX_COMPILER=clang++-$CLANG_VERSION \
            -G Ninja \
            -B $BUILD_DIR \
            -S .
      # see https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md
      - run: |
          echo "::add-matcher::.github/clang-matcher.json"
      - run: |
          cmake --build $BUILD_DIR --target scylla
      - run: |
          echo "::remove-matcher owner=clang::"

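Note: the schedule `0 5 * * SAT` fires at 05:00 on Saturdays, and GitHub evaluates workflow schedules in UTC. The sketch below computes the next such instant from a given time; `next_saturday_5am_utc` is a hypothetical helper written only to illustrate the cron expression and handles just this one schedule.

```python
from datetime import datetime, timedelta, timezone


def next_saturday_5am_utc(now):
    """Next instant matching cron '0 5 * * SAT' strictly after `now` (UTC)."""
    candidate = now.replace(hour=5, minute=0, second=0, microsecond=0)
    # datetime.weekday(): Monday is 0, so Saturday is 5.
    days_ahead = (5 - candidate.weekday()) % 7
    candidate += timedelta(days=days_ahead)
    if candidate <= now:
        # Already past this week's slot; move to next Saturday.
        candidate += timedelta(days=7)
    return candidate


now = datetime(2025, 6, 29, 14, 40, tzinfo=timezone.utc)  # a Sunday
print(next_saturday_5am_utc(now).isoformat())
```

The same reasoning applies to the other scheduled workflows in this change, with the weekday or the daily slot adjusted accordingly.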
64
.github/workflows/clang-tidy.yaml vendored Normal file

@@ -0,0 +1,64 @@
name: clang-tidy
on:
  pull_request:
    branches:
      - master
    paths-ignore:
      - '**/*.rst'
      - '**/*.md'
      - 'docs/**'
      - '.github/**'
  workflow_dispatch:
env:
  BUILD_TYPE: RelWithDebInfo
  BUILD_DIR: build
  CLANG_TIDY_CHECKS: '-*,bugprone-use-after-move'
permissions: {}
# cancel the in-progress run upon a repush
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  read-toolchain:
    uses: ./.github/workflows/read-toolchain.yaml
  clang-tidy:
    name: Run clang-tidy
    needs:
      - read-toolchain
    runs-on: ubuntu-latest
    container: ${{ needs.read-toolchain.outputs.image }}
    steps:
      - env:
          IMAGE: ${{ needs.read-toolchain.outputs.image }}
        run: |
          echo ${{ needs.read-toolchain.outputs.image }}
      - uses: actions/checkout@v4
        with:
          submodules: true
      - run: |
          sudo dnf -y install clang-tools-extra
      - name: Generate the building system
        run: |
          cmake \
            -DCMAKE_BUILD_TYPE=$BUILD_TYPE \
            -DCMAKE_C_COMPILER=clang \
            -DScylla_USE_LINKER=ld.lld \
            -DCMAKE_CXX_COMPILER=clang++ \
            -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
            -DCMAKE_CXX_CLANG_TIDY="clang-tidy;--checks=$CLANG_TIDY_CHECKS" \
            -G Ninja \
            -B $BUILD_DIR \
            -S .
      # see https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md
      - run: |
          echo "::add-matcher::.github/clang-matcher.json"
      - name: Build with clang-tidy enabled
        run: |
          cmake --build $BUILD_DIR --target scylla
      - run: |
          echo "::remove-matcher owner=clang::"

17
.github/workflows/codespell.yaml vendored Normal file

@@ -0,0 +1,17 @@
name: codespell
on:
  pull_request:
    branches:
      - master
permissions: {}
jobs:
  codespell:
    name: Check for spelling errors
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: codespell-project/actions-codespell@master
        with:
          only_warn: 1
          ignore_words_list: "ans,datas,fo,ser,ue,crate,nd,reenable,strat,stap,te,raison"
          skip: "./.git,./build,./tools,*.js,*.lock,./test,./licenses,./redis/lolwut.cc,*.svg"


@@ -1,17 +0,0 @@
name: "Docs / Amplify enhanced"
on: issue_comment
jobs:
build:
runs-on: ubuntu-latest
if: ${{ github.event.issue.pull_request }}
steps:
- name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Amplify enhanced
env:
TOKEN: ${{ secrets.GITHUB_TOKEN }}
uses: scylladb/sphinx-scylladb-theme/.github/actions/amplify-enhanced@master


@@ -4,12 +4,14 @@ name: "Docs / Publish"
env:
FLAG: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'opensource' }}
DEFAULT_BRANCH: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'master' }}
on:
push:
branches:
- 'master'
- 'enterprise'
- 'branch-**'
paths:
- "docs/**"
workflow_dispatch:
@@ -19,14 +21,15 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
ref: ${{ env.DEFAULT_BRANCH }}
persist-credentials: false
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v3
uses: actions/setup-python@v5
with:
python-version: 3.7
python-version: "3.10"
- name: Set up env
run: make -C docs FLAG="${{ env.FLAG }}" setupenv
- name: Build docs


@@ -12,20 +12,21 @@ on:
- enterprise
paths:
- "docs/**"
- "db/config.hh"
- "db/config.cc"
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
persist-credentials: false
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v3
uses: actions/setup-python@v5
with:
python-version: 3.7
python-version: "3.10"
- name: Set up env
run: make -C docs FLAG="${{ env.FLAG }}" setupenv
- name: Build docs

80
.github/workflows/iwyu.yaml vendored Normal file

@@ -0,0 +1,80 @@
name: iwyu
on:
  pull_request:
    branches:
      - master
env:
  BUILD_TYPE: RelWithDebInfo
  BUILD_DIR: build
  CLEANER_OUTPUT_PATH: build/clang-include-cleaner.log
  CLEANER_DIRS: test/unit exceptions alternator api auth cdc compaction
permissions: {}
# cancel the in-progress run upon a repush
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  read-toolchain:
    uses: ./.github/workflows/read-toolchain.yaml
  clang-include-cleaner:
    name: "Analyze #includes in source files"
    needs:
      - read-toolchain
    runs-on: ubuntu-latest
    container: ${{ needs.read-toolchain.outputs.image }}
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: true
      - run: |
          sudo dnf -y install clang-tools-extra
      - name: Generate compilation database
        run: |
          cmake \
            -DCMAKE_BUILD_TYPE=$BUILD_TYPE \
            -DCMAKE_C_COMPILER=clang \
            -DCMAKE_CXX_COMPILER=clang++ \
            -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
            -G Ninja \
            -B $BUILD_DIR \
            -S .
      - name: Build headers
        run: |
          swagger_targets=''
          for f in api/api-doc/*.json; do
            if test "${f#*.}" = json; then
              name=$(basename "$f" .json)
              if test $name != swagger20_header; then
                swagger_targets+=" scylla_swagger_gen_$name"
              fi
            fi
          done
          cmake \
            --build build \
            --target seastar_http_request_parser \
            --target idl-sources \
            --target $swagger_targets
      - run: |
          echo "::add-matcher::.github/clang-include-cleaner.json"
      - name: clang-include-cleaner
        run: |
          for d in $CLEANER_DIRS; do
            # group the -name tests so that -exec applies to both *.cc and *.hh
            find $d \( -name '*.cc' -o -name '*.hh' \) \
              -exec echo {} \; \
              -exec clang-include-cleaner \
                --ignore-headers=seastarx.hh \
                --print=changes \
                -p $BUILD_DIR \
                {} \; | tee --append $CLEANER_OUTPUT_PATH
          done
      - run: |
          echo "::remove-matcher owner=clang-include-cleaner::"
      - uses: actions/upload-artifact@v4
        with:
          name: Logs (clang-include-cleaner)
          path: "./${{ env.CLEANER_OUTPUT_PATH }}"

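Note: the "Build headers" step derives one `scylla_swagger_gen_<name>` target per `api/api-doc/*.json` file, skipping `swagger20_header` and any file whose name has more than one dot. The sketch below mirrors that shell loop in Python so the filtering is easier to follow; `swagger_targets` is a hypothetical function and the file names in the example are assumptions chosen for illustration.

```python
from pathlib import PurePosixPath


def swagger_targets(json_paths):
    """Mirror the shell loop: one CMake target per plain '<name>.json' file."""
    targets = []
    for path in json_paths:
        # Equivalent of "${f#*.}": everything after the first dot in the path.
        _, _, after_first_dot = path.partition(".")
        if after_first_dot != "json":
            continue
        # Equivalent of: basename "$f" .json
        name = PurePosixPath(path).name[: -len(".json")]
        if name != "swagger20_header":
            targets.append(f"scylla_swagger_gen_{name}")
    return targets


example_paths = [
    "api/api-doc/storage_service.json",
    "api/api-doc/swagger20_header.json",
    "api/api-doc/example.def.json",
]
print(swagger_targets(example_paths))
```

With these inputs only the first path yields a target: the header file is excluded by name, and the `.def.json` file fails the single-dot test.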

@@ -0,0 +1,27 @@
name: Mark PR as Ready When Conflicts Label is Removed
on:
  pull_request_target:
    types:
      - unlabeled
env:
  DEFAULT_BRANCH: 'master'
jobs:
  mark-ready:
    if: github.event.label.name == 'conflicts'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          repository: ${{ github.repository }}
          ref: ${{ env.DEFAULT_BRANCH }}
          token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
          fetch-depth: 1
      - name: Mark pull request as ready for review
        run: gh pr ready "${{ github.event.pull_request.number }}"
        env:
          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}


@@ -0,0 +1,22 @@
name: PR require backport label
on:
  pull_request:
    types: [opened, labeled, unlabeled, synchronize]
    branches:
      - master
      - next
jobs:
  label:
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    permissions:
      issues: write
      pull-requests: write
    steps:
      - uses: mheap/github-action-required-labels@v5
        with:
          mode: minimum
          count: 1
          labels: "backport/none\nbackport/\\d.\\d"
          use_regex: true
          add_comment: false

23
.github/workflows/read-toolchain.yaml vendored Normal file

@@ -0,0 +1,23 @@
name: Read Toolchain
on:
  workflow_call:
    outputs:
      image:
        description: "the toolchain docker image"
        value: ${{ jobs.read-toolchain.outputs.image }}
jobs:
  read-toolchain:
    runs-on: ubuntu-latest
    outputs:
      image: ${{ steps.read.outputs.image }}
    steps:
      - uses: actions/checkout@v4
        with:
          sparse-checkout: tools/toolchain/image
          sparse-checkout-cone-mode: false
      - id: read
        run: |
          image=$(cat tools/toolchain/image)
          echo "image=$image" >> $GITHUB_OUTPUT

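Note: the `read` step publishes its result by appending an `image=<value>` line to the file named by `$GITHUB_OUTPUT`, which is how a step exposes outputs to the job and, through `workflow_call`, to the calling workflow. A minimal sketch of that mechanism follows, using a temporary file in place of the runner-provided one; `set_output` is a hypothetical helper, and the image tag is borrowed from the seastar workflow in this change purely as sample data.

```python
import os
import tempfile


def set_output(name, value, output_path):
    """Append a 'name=value' line, the form read back as a step output."""
    with open(output_path, "a", encoding="utf-8") as out:
        out.write(f"{name}={value}\n")


# Stand-in for the file that the runner exposes through $GITHUB_OUTPUT.
fd, output_path = tempfile.mkstemp()
os.close(fd)

# Sample value; the workflow reads the real one from tools/toolchain/image.
set_output("image", "scylladb/scylla-toolchain:fedora-40-20240621", output_path)

with open(output_path, encoding="utf-8") as f:
    written = f.read()
os.unlink(output_path)
print(written, end="")
```

Appending rather than overwriting matters because several steps in a job may write to the same output file.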

@@ -0,0 +1,35 @@
name: Check Reproducible Build
on:
  schedule:
    # 5AM every Friday
    - cron: '0 5 * * FRI'
permissions: {}
env:
  BUILD_MODE: release
jobs:
  build-a:
    uses: ./.github/workflows/build-scylla.yaml
    with:
      build_mode: release
  build-b:
    uses: ./.github/workflows/build-scylla.yaml
    with:
      build_mode: release
  compare-checksum:
    if: github.repository == 'scylladb/scylladb'
    runs-on: ubuntu-latest
    needs:
      - build-a
      - build-b
    steps:
      - env:
          CHECKSUM_A: ${{needs.build-a.outputs.md5sum}}
          CHECKSUM_B: ${{needs.build-b.outputs.md5sum}}
        run: |
          if [ "$CHECKSUM_A" != "$CHECKSUM_B" ]; then \
            echo "::error::mismatched checksums: $CHECKSUM_A != $CHECKSUM_B"; \
            exit 1; \
          fi

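Note: the `compare-checksum` job fails when the two independent builds produce different digests, and reports the mismatch through the `::error::` workflow command. The logic amounts to the following sketch; `compare_checksums` is a hypothetical function and the digests are made-up sample values.

```python
def compare_checksums(checksum_a, checksum_b):
    """Return (exit_code, message), mirroring the shell step's behaviour."""
    if checksum_a != checksum_b:
        # "::error::" marks the line as an error annotation in the job log.
        return 1, f"::error::mismatched checksums: {checksum_a} != {checksum_b}"
    return 0, ""


# Made-up digests for illustration.
print(compare_checksums("5d41402a", "5d41402a"))
print(compare_checksums("5d41402a", "7d793037"))
```

A non-zero exit code is what makes the job, and therefore the scheduled run, show up as failed.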
50
.github/workflows/seastar.yaml vendored Normal file

@@ -0,0 +1,50 @@
name: Build with the latest Seastar
on:
  schedule:
    # 5AM every day
    - cron: '0 5 * * *'
permissions: {}
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
env:
  BUILD_DIR: build
jobs:
  build-with-the-latest-seastar:
    runs-on: ubuntu-latest
    # be consistent with tools/toolchain/image
    container: scylladb/scylla-toolchain:fedora-40-20240621
    strategy:
      matrix:
        build_type:
          - Debug
          - RelWithDebInfo
          - Dev
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: true
      - run: |
          rm -rf seastar
      - uses: actions/checkout@v4
        with:
          repository: scylladb/seastar
          submodules: true
          path: seastar
      - name: Generate the building system
        run: |
          git config --global --add safe.directory $GITHUB_WORKSPACE
          cmake \
            -DCMAKE_BUILD_TYPE=${{ matrix.build_type }} \
            -DCMAKE_C_COMPILER=clang \
            -DCMAKE_CXX_COMPILER=clang++ \
            -G Ninja \
            -B $BUILD_DIR \
            -S .
      - run: |
          cmake --build $BUILD_DIR --target scylla

49
.github/workflows/sync-labels.yaml vendored Normal file

@@ -0,0 +1,49 @@
name: Sync labels
on:
  pull_request_target:
    types: [opened, labeled, unlabeled]
    branches: [master, next]
  issues:
    types: [labeled, unlabeled]
jobs:
  label-sync:
    if: ${{ github.repository == 'scylladb/scylladb' }}
    name: Synchronize labels between PR and the issue(s) fixed by it
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      issues: write
    steps:
      - name: Dump GitHub context
        env:
          GITHUB_CONTEXT: ${{ toJson(github) }}
        run: echo "$GITHUB_CONTEXT"
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          sparse-checkout: |
            .github/scripts/sync_labels.py
          sparse-checkout-cone-mode: false
      - name: Install dependencies
        run: sudo apt-get install -y python3-github
      - name: Pull request opened event
        if: ${{ github.event.action == 'opened' }}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.number }} --action ${{ github.event.action }}
      - name: Pull request labeled or unlabeled event
        if: github.event_name == 'pull_request_target' && startsWith(github.event.label.name, 'backport/')
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.number }} --action ${{ github.event.action }} --label ${{ github.event.label.name }}
      - name: Issue labeled or unlabeled event
        if: github.event_name == 'issues' && startsWith(github.event.label.name, 'backport/')
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.issue.number }} --action ${{ github.event.action }} --is_issue --label ${{ github.event.label.name }}

6
.gitignore vendored

@@ -3,6 +3,8 @@
.settings
build
build.ninja
cmake-build-*
build.ninja.new
cscope.*
/debian/
dist/ami/files/*.rpm
@@ -12,13 +14,14 @@ dist/ami/scylla_deploy.sh
Cql.tokens
.kdev4
*.kdev4
.idea
CMakeLists.txt.user
.cache
.tox
*.egg-info
__pycache__
CMakeLists.txt.user
.gdbinit
resources
/resources
.pytest_cache
/expressions.tokens
tags
@@ -30,3 +33,4 @@ compile_commands.json
.ccls-cache/
.mypy_cache
.envrc
clang_build

6
.gitmodules vendored

@@ -6,9 +6,9 @@
path = swagger-ui
url = ../scylla-swagger-ui
ignore = dirty
[submodule "scylla-jmx"]
path = tools/jmx
url = ../scylla-jmx
[submodule "abseil"]
path = abseil
url = ../abseil-cpp
[submodule "scylla-tools"]
path = tools/java
url = ../scylla-tools-java


@@ -2,56 +2,106 @@ cmake_minimum_required(VERSION 3.27)
project(scylla)
include(CTest)
list(APPEND CMAKE_MODULE_PATH
${CMAKE_CURRENT_SOURCE_DIR}/cmake
${CMAKE_CURRENT_SOURCE_DIR}/seastar/cmake)
# Set the possible values of build type for cmake-gui
set(scylla_build_types
"Debug" "Release" "Dev" "Sanitize" "Coverage")
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS
${scylla_build_types})
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "Release" CACHE
STRING "Choose the type of build." FORCE)
message(WARNING "CMAKE_BUILD_TYPE not specified, Using 'Release'")
elseif(NOT CMAKE_BUILD_TYPE IN_LIST scylla_build_types)
message(FATAL_ERROR "Unknown CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}. "
"Following types are supported: ${scylla_build_types}")
endif()
string(TOUPPER "${CMAKE_BUILD_TYPE}" build_mode)
include(mode.${build_mode})
"Debug" "RelWithDebInfo" "Dev" "Sanitize" "Coverage")
if(DEFINED CMAKE_BUILD_TYPE)
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS
${scylla_build_types})
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "RelWithDebInfo" CACHE
STRING "Choose the type of build." FORCE)
message(WARNING "CMAKE_BUILD_TYPE not specified, Using 'RelWithDebInfo'")
elseif(NOT CMAKE_BUILD_TYPE IN_LIST scylla_build_types)
message(FATAL_ERROR "Unknown CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}. "
"Following types are supported: ${scylla_build_types}")
endif()
endif(DEFINED CMAKE_BUILD_TYPE)
include(mode.common)
add_compile_definitions(
${Seastar_DEFINITIONS_${build_mode}}
FMT_DEPRECATED_OSTREAM)
if(CMAKE_CONFIGURATION_TYPES)
foreach(config ${CMAKE_CONFIGURATION_TYPES})
include(mode.${config})
list(APPEND scylla_build_modes ${scylla_build_mode_${config}})
endforeach()
add_custom_target(mode_list
COMMAND ${CMAKE_COMMAND} -E echo "$<JOIN:${scylla_build_modes}, >"
COMMENT "List configured modes"
BYPRODUCTS mode-list.phony.stamp
COMMAND_EXPAND_LISTS)
else()
include(mode.${CMAKE_BUILD_TYPE})
add_custom_target(mode_list
${CMAKE_COMMAND} -E echo "${scylla_build_mode}"
COMMENT "List configured modes")
endif()
include(limit_jobs)
# Configure Seastar compile options to align with Scylla
set(CMAKE_CXX_STANDARD "20" CACHE INTERNAL "")
set(CMAKE_CXX_STANDARD "23" CACHE INTERNAL "")
set(CMAKE_CXX_EXTENSIONS ON CACHE INTERNAL "")
set(CMAKE_CXX_SCAN_FOR_MODULES OFF CACHE INTERNAL "")
set(CMAKE_CXX_VISIBILITY_PRESET hidden)
set(Seastar_TESTING ON CACHE BOOL "" FORCE)
set(Seastar_API_LEVEL 7 CACHE STRING "" FORCE)
set(Seastar_DEPRECATED_OSTREAM_FORMATTERS OFF CACHE BOOL "" FORCE)
set(Seastar_APPS ON CACHE BOOL "" FORCE)
set(Seastar_EXCLUDE_APPS_FROM_ALL ON CACHE BOOL "" FORCE)
set(Seastar_EXCLUDE_TESTS_FROM_ALL ON CACHE BOOL "" FORCE)
set(Seastar_IO_URING OFF CACHE BOOL "" FORCE)
set(Seastar_SCHEDULING_GROUPS_COUNT 16 CACHE STRING "" FORCE)
set(Seastar_UNUSED_RESULT_ERROR ON CACHE BOOL "" FORCE)
add_subdirectory(seastar)
set(ABSL_PROPAGATE_CXX_STD ON CACHE BOOL "" FORCE)
find_package(Sanitizers QUIET)
set(sanitizer_cxx_flags
$<$<CONFIG:Debug,Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_COMPILE_OPTIONS>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_COMPILE_OPTIONS>>)
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set(ABSL_GCC_FLAGS ${sanitizer_cxx_flags})
elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
set(ABSL_LLVM_FLAGS ${sanitizer_cxx_flags})
endif()
set(ABSL_DEFAULT_LINKOPTS
$<$<CONFIG:Debug,Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_LINK_LIBRARIES>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_LINK_LIBRARIES>>)
add_subdirectory(abseil)
add_library(absl-headers INTERFACE)
target_include_directories(absl-headers SYSTEM INTERFACE
"${PROJECT_SOURCE_DIR}/abseil")
add_library(absl::headers ALIAS absl-headers)
# Exclude absl::strerror from the default "all" target since it's not
# used in Scylla build and, moreover, makes use of deprecated glibc APIs,
# such as sys_nerr, which are not exposed from "stdio.h" since glibc 2.32,
# which happens to be the case for recent Fedora distribution versions.
#
# Need to use the internal "absl_strerror" target name instead of namespaced
# variant because `set_target_properties` does not understand the latter form,
# unfortunately.
set_target_properties(absl_strerror PROPERTIES EXCLUDE_FROM_ALL TRUE)
# System libraries dependencies
find_package(Boost REQUIRED
COMPONENTS filesystem program_options system thread regex unit_test_framework)
target_link_libraries(Boost::regex
INTERFACE
ICU::i18n
ICU::uc)
find_package(Lua REQUIRED)
find_package(ZLIB REQUIRED)
find_package(ICU COMPONENTS uc i18n REQUIRED)
find_package(absl COMPONENTS hash raw_hash_set REQUIRED)
find_package(fmt 10.0.0 REQUIRED)
find_package(libdeflate REQUIRED)
find_package(libxcrypt REQUIRED)
find_package(Snappy REQUIRED)
find_package(RapidJSON REQUIRED)
find_package(Thrift REQUIRED)
find_package(xxHash REQUIRED)
find_package(zstd REQUIRED)
set(scylla_gen_build_dir "${CMAKE_BINARY_DIR}/gen")
file(MAKE_DIRECTORY "${scylla_gen_build_dir}")
@@ -59,6 +109,14 @@ file(MAKE_DIRECTORY "${scylla_gen_build_dir}")
include(add_version_library)
generate_scylla_version()
add_library(scylla-zstd STATIC
zstd.cc)
target_link_libraries(scylla-zstd
PRIVATE
db
Seastar::seastar
zstd::libzstd)
add_library(scylla-main STATIC)
target_sources(scylla-main
PRIVATE
@@ -78,9 +136,9 @@ target_sources(scylla-main
debug.cc
init.cc
keys.cc
message/messaging_service.cc
multishard_mutation_query.cc
mutation_query.cc
node_ops/task_manager_module.cc
partition_slice_builder.cc
querier.cc
query.cc
@@ -94,21 +152,52 @@ target_sources(scylla-main
serializer.cc
sstables_loader.cc
table_helper.cc
tasks/task_handler.cc
tasks/task_manager.cc
timeout_config.cc
unimplemented.cc
validation.cc
vint-serialization.cc
zstd.cc)
vint-serialization.cc)
target_link_libraries(scylla-main
PRIVATE
"$<LINK_LIBRARY:WHOLE_ARCHIVE,scylla-zstd>"
db
absl::headers
absl::btree
absl::hash
absl::raw_hash_set
Seastar::seastar
Snappy::snappy
systemd
ZLIB::ZLIB)
option(Scylla_CHECK_HEADERS
"Add check-headers target for checking the self-containedness of headers")
if(Scylla_CHECK_HEADERS)
add_custom_target(check-headers)
# Compatibility target used by CI, which builds "check-headers" only for
# the "Dev" mode.
# Our CI currently builds "dev-headers" using ninja without specifying a
# build mode; "dev" is a prefix encoded in the target name for the
# underlying "headers" target. CMake targets do not follow this convention:
# "check-headers" is built for all configurations defined by
# "CMAKE_DEFAULT_CONFIGS", although we only need it for the "Dev"
# configuration. Therefore, until the CI is updated to build
# "check-headers:Dev", add a target that builds "check-headers" only for
# the Dev configuration and does nothing for other configurations.
add_custom_target(dev-headers
COMMAND ${CMAKE_COMMAND}
"$<IF:$<CONFIG:Dev>,--build;${CMAKE_BINARY_DIR};--config;$<CONFIG>;--target;check-headers,-E;echo;skipping;dev-headers;in;$<CONFIG>>"
COMMAND_EXPAND_LISTS)
endif()
include(check_headers)
check_headers(check-headers scylla-main
GLOB ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
add_custom_target(compiler-training)
add_subdirectory(api)
add_subdirectory(alternator)
add_subdirectory(db)
@@ -121,9 +210,9 @@ add_subdirectory(dht)
add_subdirectory(gms)
add_subdirectory(idl)
add_subdirectory(index)
add_subdirectory(interface)
add_subdirectory(lang)
add_subdirectory(locator)
add_subdirectory(message)
add_subdirectory(mutation)
add_subdirectory(mutation_writer)
add_subdirectory(node_ops)
@@ -138,7 +227,6 @@ add_subdirectory(service)
add_subdirectory(sstables)
add_subdirectory(streaming)
add_subdirectory(test)
add_subdirectory(thrift)
add_subdirectory(tools)
add_subdirectory(tracing)
add_subdirectory(transport)
@@ -165,6 +253,7 @@ target_link_libraries(scylla PRIVATE
index
lang
locator
message
mutation
mutation_writer
raft
@@ -178,52 +267,24 @@ target_link_libraries(scylla PRIVATE
sstables
streaming
test-perf
thrift
tools
tracing
transport
types
utils)
target_link_libraries(Boost::regex
INTERFACE
ICU::i18n
ICU::uc)
target_link_libraries(scylla PRIVATE
seastar
absl::headers
Boost::program_options)
# Force SHA1 build-id generation
set(default_linker_flags "-Wl,--build-id=sha1")
include(CheckLinkerFlag)
set(Scylla_USE_LINKER
""
CACHE
STRING
"Use specified linker instead of the default one")
if(Scylla_USE_LINKER)
set(linkers "${Scylla_USE_LINKER}")
else()
set(linkers "lld" "gold")
endif()
foreach(linker ${linkers})
set(linker_flag "-fuse-ld=${linker}")
check_linker_flag(CXX ${linker_flag} "CXX_LINKER_HAVE_${linker}")
if(CXX_LINKER_HAVE_${linker})
string(APPEND default_linker_flags " ${linker_flag}")
break()
elseif(Scylla_USE_LINKER)
message(FATAL_ERROR "${Scylla_USE_LINKER} is not supported.")
endif()
endforeach()
set(CMAKE_EXE_LINKER_FLAGS "${default_linker_flags}" CACHE INTERNAL "")
# TODO: patch dynamic linker to match configure.py behavior
target_include_directories(scylla PRIVATE
"${CMAKE_CURRENT_SOURCE_DIR}"
"${scylla_gen_build_dir}")
add_custom_target(maybe-scylla
DEPENDS $<$<CONFIG:Dev>:$<TARGET_FILE:scylla>>)
add_dependencies(compiler-training
maybe-scylla)
add_subdirectory(dist)


@@ -19,18 +19,18 @@ $ git submodule update --init --recursive
### Dependencies
Scylla is fairly fussy about its build environment, requiring a very recent
version of the C++20 compiler and numerous tools and libraries to build.
version of the C++23 compiler and numerous tools and libraries to build.
Run `./install-dependencies.sh` (as root) to use your Linux distribution's
package manager to install the appropriate packages on your build machine.
However, this will only work on very recent distributions. For example,
currently Fedora users must upgrade to Fedora 32 otherwise the C++ compiler
will be too old, and not support the new C++20 standard that Scylla uses.
will be too old, and not support the new C++23 standard that Scylla uses.
Alternatively, to avoid having to upgrade your build machine or install
various packages on it, we provide another option - the **frozen toolchain**.
This is a script, `./tools/toolchain/dbuild`, that can execute build or run
commands inside a Docker image that contains exactly the right build tools and
commands inside a container that contains exactly the right build tools and
libraries. The `dbuild` technique is useful for beginners, but is also the way
in which ScyllaDB produces official releases, so it is highly recommended.
@@ -43,6 +43,12 @@ $ ./tools/toolchain/dbuild ninja build/release/scylla
$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1
```
Note: do not mix environments - either perform all your work with dbuild, or natively on the host.
Note2: you can get to an interactive shell within dbuild by running it without any parameters:
```bash
$ ./tools/toolchain/dbuild
```
### Build system
**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native
@@ -116,6 +122,13 @@ Run all tests through the test execution wrapper with
$ ./test.py --mode={debug,release}
```
or, if you are using `dbuild`, you need to build the code and the tests and then you can run them at will:
```bash
$ ./tools/toolchain/dbuild ninja {debug,release,dev}-build
$ ./tools/toolchain/dbuild ./test.py --mode {debug,release,dev}
```
The `--name` argument can be specified to run a particular test.
Alternatively, you can execute the test executable directly. For example,
@@ -199,7 +212,7 @@ The `scylla.yaml` file in the repository by default writes all database data to
Scylla has a number of requirements for the file-system and operating system to operate ideally and at peak performance. However, during development, these requirements can be relaxed with the `--developer-mode` flag.
Additionally, when running on under-powered platforms like portable laptops, the `--overprovisined` flag is useful.
Additionally, when running on under-powered platforms like portable laptops, the `--overprovisioned` flag is useful.
On a development machine, one might run Scylla as


@@ -15,7 +15,7 @@ For more information, please see the [ScyllaDB web site].
## Build Prerequisites
Scylla is fairly fussy about its build environment, requiring very recent
versions of the C++20 compiler and of many libraries to build. The document
versions of the C++23 compiler and of many libraries to build. The document
[HACKING.md](HACKING.md) includes detailed information on building and
developing Scylla, but to get Scylla building quickly on (almost) any build
machine, Scylla offers a [frozen toolchain](tools/toolchain/README.md),
@@ -65,11 +65,13 @@ $ ./tools/toolchain/dbuild ./build/release/scylla --help
## Testing
[![Build with the latest Seastar](https://github.com/scylladb/scylladb/actions/workflows/seastar.yaml/badge.svg)](https://github.com/scylladb/scylladb/actions/workflows/seastar.yaml) [![Check Reproducible Build](https://github.com/scylladb/scylladb/actions/workflows/reproducible-build.yaml/badge.svg)](https://github.com/scylladb/scylladb/actions/workflows/reproducible-build.yaml) [![clang-nightly](https://github.com/scylladb/scylladb/actions/workflows/clang-nightly.yaml/badge.svg)](https://github.com/scylladb/scylladb/actions/workflows/clang-nightly.yaml)
See [test.py manual](docs/dev/testing.md).
## Scylla APIs and compatibility
By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and
Thrift. There is also support for the API of Amazon DynamoDB™,
By default, Scylla is compatible with Apache Cassandra and its API - CQL.
There is also support for the API of Amazon DynamoDB™,
which needs to be enabled and configured in order to be used. For more
information on how to enable the DynamoDB™ API in Scylla,
and the current compatibility of this feature as well as Scylla-specific extensions, see
@@ -82,11 +84,11 @@ Documentation can be found [here](docs/dev/README.md).
Seastar documentation can be found [here](http://docs.seastar.io/master/index.html).
User documentation can be found [here](https://docs.scylladb.com/).
## Training
## Training
Training material and online courses can be found at [Scylla University](https://university.scylladb.com/).
The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling,
administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions,
Training material and online courses can be found at [Scylla University](https://university.scylladb.com/).
The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling,
administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions,
multi-datacenters and how Scylla integrates with third-party applications.
## Contributing to Scylla

View File

@@ -28,7 +28,7 @@ The files created are:
By default, these files are created in the 'build'
subdirectory under the directory containing the script.
The destination directory can be overriden by
The destination directory can be overridden by
using '-o PATH' option.
END
)
@@ -78,7 +78,7 @@ fi
# Default scylla product/version tags
PRODUCT=scylla
VERSION=5.4.10
VERSION=6.2.4
if test -f version
then
@@ -87,12 +87,14 @@ then
else
SCYLLA_VERSION=$VERSION
if [ -z "$SCYLLA_RELEASE" ]; then
DATE=$(date --utc +%Y%m%d)
GIT_COMMIT=$(git -C "$SCRIPT_DIR" log --pretty=format:'%h' -n 1 --abbrev=12)
# For custom package builds, replace "0" with "counter.your_name",
# For custom package builds, replace "0" with "counter.yourname",
# where counter starts at 1 and increments for successive versions.
# This ensures that the package manager will select your custom
# package over the standard release.
# Do not use any special characters like - or _ in the name above!
# These characters either have special meaning or are illegal in
# version strings.
SCYLLA_BUILD=0
SCYLLA_RELEASE=$SCYLLA_BUILD.$DATE.$GIT_COMMIT
elif [ -f "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" ]; then
@@ -102,7 +104,7 @@ else
fi
if [ -f "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" ]; then
GIT_COMMIT_FILE=$(cat "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" |cut -d . -f 3)
GIT_COMMIT_FILE=$(cat "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" | rev | cut -d . -f 1 | rev)
if [ "$GIT_COMMIT" = "$GIT_COMMIT_FILE" ]; then
exit 0
fi

1
abseil Submodule

Submodule abseil added at d7aaad83b4

View File

@@ -27,4 +27,8 @@ target_link_libraries(alternator
cql3
idl
Seastar::seastar
xxHash::xxhash)
xxHash::xxhash
absl::headers)
check_headers(check-headers alternator
GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

View File

@@ -7,37 +7,37 @@
*/
#include "alternator/error.hh"
#include "auth/common.hh"
#include "log.hh"
#include <string>
#include <string_view>
#include "bytes.hh"
#include "alternator/auth.hh"
#include <fmt/format.h>
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "auth/roles-metadata.hh"
#include "service/storage_proxy.hh"
#include "alternator/executor.hh"
#include "cql3/selection/selection.hh"
#include "query-result-set.hh"
#include "cql3/result_set.hh"
#include "types/types.hh"
#include <seastar/core/coroutine.hh>
namespace alternator {
static logging::logger alogger("alternator-auth");
future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::string username) {
schema_ptr schema = proxy.data_dictionary().find_schema("system_auth", "roles");
future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::service& as, std::string username) {
schema_ptr schema = proxy.data_dictionary().find_schema(auth::get_auth_ks_name(as.query_processor()), "roles");
partition_key pk = partition_key::from_single_value(*schema, utf8_type->decompose(username));
dht::partition_range_vector partition_ranges{dht::partition_range(dht::decorate_key(*schema, pk))};
std::vector<query::clustering_range> bounds{query::clustering_range::make_open_ended_both_sides()};
const column_definition* salted_hash_col = schema->get_column_definition(bytes("salted_hash"));
if (!salted_hash_col) {
co_await coroutine::return_exception(api_error::unrecognized_client(format("Credentials cannot be fetched for: {}", username)));
const column_definition* can_login_col = schema->get_column_definition(bytes("can_login"));
if (!salted_hash_col || !can_login_col) {
co_await coroutine::return_exception(api_error::unrecognized_client(fmt::format("Credentials cannot be fetched for: {}", username)));
}
auto selection = cql3::selection::selection::for_columns(schema, {salted_hash_col});
auto partition_slice = query::partition_slice(std::move(bounds), {}, query::column_id_vector{salted_hash_col->id}, selection->get_query_options());
auto selection = cql3::selection::selection::for_columns(schema, {salted_hash_col, can_login_col});
auto partition_slice = query::partition_slice(std::move(bounds), {}, query::column_id_vector{salted_hash_col->id, can_login_col->id}, selection->get_query_options());
auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice,
proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));
auto cl = auth::password_authenticator::consistency_for_user(username);
@@ -51,11 +51,18 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::strin
auto result_set = builder.build();
if (result_set->empty()) {
co_await coroutine::return_exception(api_error::unrecognized_client(format("User not found: {}", username)));
co_await coroutine::return_exception(api_error::unrecognized_client(fmt::format("User not found: {}", username)));
}
const managed_bytes_opt& salted_hash = result_set->rows().front().front(); // We only asked for 1 row and 1 column
const auto& result = result_set->rows().front();
bool can_login = result[1] && value_cast<bool>(boolean_type->deserialize(*result[1]));
if (!can_login) {
// This is a valid role name, but has "login=False" so should not be
// usable for authentication (see #19735).
co_await coroutine::return_exception(api_error::unrecognized_client(fmt::format("Role {} has login=false so cannot be used for login", username)));
}
const managed_bytes_opt& salted_hash = result.front();
if (!salted_hash) {
co_await coroutine::return_exception(api_error::unrecognized_client(format("No password found for user: {}", username)));
co_await coroutine::return_exception(api_error::unrecognized_client(fmt::format("No password found for user: {}", username)));
}
co_return value_cast<sstring>(utf8_type->deserialize(*salted_hash));
}
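The new `can_login` check treats a null cell the same as `false`, so a role without an explicit `can_login` value is rejected. A minimal standard-library sketch of that rule follows; `std::optional<bool>` stands in for the nullable cell, and the name `role_can_login` is hypothetical rather than taken from the source.

```cpp
#include <optional>

// Hypothetical stand-in for a nullable boolean cell read from the roles table.
// A null cell (std::nullopt) counts as "cannot log in", mirroring the shape of
// `result[1] && value_cast<bool>(...)` in the diff above.
inline bool role_can_login(const std::optional<bool>& can_login_cell) {
    return can_login_cell.has_value() && *can_login_cell;
}
```

Only an explicit `true` passes; both a missing value and an explicit `false` are refused.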

View File

@@ -9,10 +9,8 @@
#pragma once
#include <string>
#include <string_view>
#include <array>
#include "gc_clock.hh"
#include "utils/loading_cache.hh"
#include "auth/service.hh"
namespace service {
class storage_proxy;
@@ -22,6 +20,6 @@ namespace alternator {
using key_cache = utils::loading_cache<std::string, std::string, 1>;
future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::string username);
future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::service& as, std::string username);
}

View File

@@ -6,12 +6,9 @@
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include <list>
#include <map>
#include <string_view>
#include "alternator/conditions.hh"
#include "alternator/error.hh"
#include "cql3/constants.hh"
#include <unordered_map>
#include "utils/rjson.hh"
#include "serialization.hh"
@@ -45,12 +42,12 @@ comparison_operator_type get_comparison_operator(const rjson::value& comparison_
{"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},
};
if (!comparison_operator.IsString()) {
throw api_error::validation(format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));
throw api_error::validation(fmt::format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));
}
std::string op = comparison_operator.GetString();
auto it = ops.find(op);
if (it == ops.end()) {
throw api_error::validation(format("Unsupported comparison operator {}", op));
throw api_error::validation(fmt::format("Unsupported comparison operator {}", op));
}
return it->second;
}
@@ -342,7 +339,7 @@ static bool check_NOT_NULL(const rjson::value* val) {
}
// Only types S, N or B (string, number or bytes) may be compared by the
// various comparion operators - lt, le, gt, ge, and between.
// various comparison operators - lt, le, gt, ge, and between.
// Note that in particular, if the value is missing (v->IsNull()), this
// check returns false.
static bool check_comparable_type(const rjson::value& v) {
@@ -432,7 +429,7 @@ static bool check_BETWEEN(const T& v, const T& lb, const T& ub, bool bounds_from
if (cmp_lt()(ub, lb)) {
if (bounds_from_query) {
throw api_error::validation(
format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
fmt::format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
} else {
return false;
}
@@ -616,7 +613,7 @@ conditional_operator_type get_conditional_operator(const rjson::value& req) {
return conditional_operator_type::OR;
} else {
throw api_error::validation(
format("'ConditionalOperator' parameter must be AND, OR or missing. Found {}.", s));
fmt::format("'ConditionalOperator' parameter must be AND, OR or missing. Found {}.", s));
}
}
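The `check_BETWEEN` hunk above distinguishes two cases for inverted bounds: a validation error when the bounds come from the query, and a plain non-match otherwise. The sketch below illustrates that rule with the standard library only. `std::invalid_argument` stands in for `api_error::validation`, and the final in-range comparison is an assumption, since that part of the function is not shown in the diff.

```cpp
#include <stdexcept>

// Sketch of the BETWEEN bounds rule: inverted bounds are an error when they
// were supplied by the query itself, and simply "no match" otherwise.
// std::invalid_argument is a stand-in for api_error::validation.
template <typename T>
bool check_between(const T& v, const T& lb, const T& ub, bool bounds_from_query) {
    if (ub < lb) {
        if (bounds_from_query) {
            throw std::invalid_argument(
                "BETWEEN operator requires lower_bound <= upper_bound");
        }
        return false;
    }
    // Assumed inclusive range test: lb <= v <= ub.
    return !(v < lb) && !(ub < v);
}
```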

View File

@@ -18,8 +18,6 @@
#pragma once
#include "cql3/restrictions/statement_restrictions.hh"
#include "serialization.hh"
#include "expressions_types.hh"
namespace alternator {

View File

@@ -32,8 +32,10 @@ controller::controller(
sharded<service::memory_limiter>& memory_limiter,
sharded<auth::service>& auth_service,
sharded<qos::service_level_controller>& sl_controller,
const db::config& config)
: _gossiper(gossiper)
const db::config& config,
seastar::scheduling_group sg)
: protocol_server(sg)
, _gossiper(gossiper)
, _proxy(proxy)
, _mm(mm)
, _sys_dist_ks(sys_dist_ks)
@@ -62,7 +64,9 @@ std::vector<socket_address> controller::listen_addresses() const {
}
future<> controller::start_server() {
return seastar::async([this] {
seastar::thread_attributes attr;
attr.sched_group = _sched_group;
return seastar::async(std::move(attr), [this] {
_listen_addresses.clear();
auto preferred = _config.listen_interface_prefer_ipv6() ? std::make_optional(net::inet_address::family::INET6) : std::nullopt;
@@ -73,11 +77,11 @@ future<> controller::start_server() {
// shards - if necessary for LWT.
smp_service_group_config c;
c.max_nonlocal_requests = 5000;
_ssg = create_smp_service_group(c).get0();
_ssg = create_smp_service_group(c).get();
rmw_operation::set_default_write_isolation(_config.alternator_write_isolation());
net::inet_address addr = utils::resolve(_config.alternator_address, family).get0();
net::inet_address addr = utils::resolve(_config.alternator_address, family).get();
auto get_cdc_metadata = [] (cdc::generation_service& svc) { return std::ref(svc.get_cdc_metadata()); };
auto get_timeout_in_ms = [] (const db::config& cfg) -> utils::updateable_value<uint32_t> {
@@ -156,7 +160,9 @@ future<> controller::stop_server() {
}
future<> controller::request_stop_server() {
return stop_server();
return with_scheduling_group(_sched_group, [this] {
return stop_server();
});
}
}

View File

@@ -80,7 +80,8 @@ public:
sharded<service::memory_limiter>& memory_limiter,
sharded<auth::service>& auth_service,
sharded<qos::service_level_controller>& sl_controller,
const db::config& config);
const db::config& config,
seastar::scheduling_group sg);
virtual sstring name() const override;
virtual sstring protocol() const override;

View File

@@ -10,6 +10,7 @@
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
#include "utils/rjson.hh"
namespace alternator {
@@ -27,10 +28,16 @@ public:
status_type _http_code;
std::string _type;
std::string _msg;
api_error(std::string type, std::string msg, status_type http_code = status_type::bad_request)
// Additional data attached to the error, null value if not set. It's wrapped in copyable_value
// class because copy constructor is required for exception classes otherwise it won't compile
// (despite that its use may be optimized away).
rjson::copyable_value _extra_fields;
api_error(std::string type, std::string msg, status_type http_code = status_type::bad_request,
rjson::value extra_fields = rjson::null_value())
: _http_code(std::move(http_code))
, _type(std::move(type))
, _msg(std::move(msg))
, _extra_fields(std::move(extra_fields))
{ }
// Factory functions for some common types of DynamoDB API errors
@@ -58,8 +65,13 @@ public:
static api_error access_denied(std::string msg) {
return api_error("AccessDeniedException", std::move(msg));
}
static api_error conditional_check_failed(std::string msg) {
return api_error("ConditionalCheckFailedException", std::move(msg));
static api_error conditional_check_failed(std::string msg, rjson::value&& item) {
if (!item.IsNull()) {
auto tmp = rjson::empty_object();
rjson::add(tmp, "Item", std::move(item));
item = std::move(tmp);
}
return api_error("ConditionalCheckFailedException", std::move(msg), status_type::bad_request, std::move(item));
}
static api_error expired_iterator(std::string msg) {
return api_error("ExpiredIteratorException", std::move(msg));
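The new `conditional_check_failed` overload wraps a non-null item as `{"Item": ...}` and carries it in `_extra_fields`, so that it can appear in the error reply next to `__type` and `message`. The following toy model sketches that flow using only the standard library. The `json_object` alias, `wrap_item`, and `build_error_body` are hypothetical names, and values are kept as pre-serialized strings without any escaping.

```cpp
#include <map>
#include <optional>
#include <string>

// Toy JSON object: member name -> already-serialized JSON value.
using json_object = std::map<std::string, std::string>;

// A non-null item becomes {"Item": <item>}; a null item adds no extra fields.
inline json_object wrap_item(const std::optional<std::string>& item_json) {
    json_object extra;
    if (item_json) {
        extra["Item"] = *item_json;
    }
    return extra;
}

// Starts from the extra fields, then adds the "__type" and "message" members.
// No JSON escaping is performed; this is an illustration only.
inline json_object build_error_body(const std::string& type,
                                    const std::string& msg,
                                    json_object extra) {
    extra["__type"] = "\"com.amazonaws.dynamodb.v20120810#" + type + "\"";
    extra["message"] = "\"" + msg + "\"";
    return extra;
}
```

Calling `build_error_body("ConditionalCheckFailedException", "failed", wrap_item(item))` yields three members when `item` is set and two when it is not.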

File diff suppressed because it is too large

View File

@@ -9,7 +9,6 @@
#pragma once
#include <seastar/core/future.hh>
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
#include <seastar/json/json_elements.hh>
#include <seastar/core/sharded.hh>
@@ -263,4 +262,9 @@ public:
// add more than a couple of levels in its own output construction.
bool is_big(const rjson::value& val, int big_size = 100'000);
// Check CQL's Role-Based Access Control (RBAC) permission (MODIFY,
// SELECT, DROP, etc.) on the given table. When permission is denied an
// appropriate user-readable api_error::access_denied is thrown.
future<> verify_permission(const service::client_state&, const schema_ptr&, auth::permission);
}

View File

@@ -28,7 +28,7 @@
namespace alternator {
template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>
template <typename Func, typename Result = std::invoke_result_t<Func, expressionsParser&>>
static Result do_with_parser(std::string_view input, Func&& f) {
expressionsLexer::InputStreamType input_stream{
reinterpret_cast<const ANTLR_UINT8*>(input.data()),
@@ -43,7 +43,7 @@ static Result do_with_parser(std::string_view input, Func&& f) {
return result;
}
template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>
template <typename Func, typename Result = std::invoke_result_t<Func, expressionsParser&>>
static Result parse(const char* input_name, std::string_view input, Func&& f) {
if (input.length() > 4096) {
throw expressions_syntax_error(format("{} expression size {} exceeds allowed maximum 4096.",
@@ -57,10 +57,10 @@ static Result parse(const char* input_name, std::string_view input, Func&& f) {
// TODO: displayRecognitionError could set a position inside the
// expressions_syntax_error in throws, and we could use it here to
// mark the broken position in 'input'.
throw expressions_syntax_error(format("Failed parsing {} '{}': {}",
throw expressions_syntax_error(fmt::format("Failed parsing {} '{}': {}",
input_name, input, e.what()));
} catch (...) {
throw expressions_syntax_error(format("Failed parsing {} '{}': {}",
throw expressions_syntax_error(fmt::format("Failed parsing {} '{}': {}",
input_name, input, std::current_exception()));
}
}
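The two template headers above switch from `std::result_of_t<Func(expressionsParser&)>` to `std::invoke_result_t<Func, expressionsParser&>`. `std::result_of` was deprecated in C++17 and removed in C++20, and its replacement takes the callable and the argument types as separate template parameters. A small standalone sketch of the same default-template-argument pattern, with a hypothetical `toy_parser` in place of the generated parser class:

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Hypothetical parser type used only for this illustration.
struct toy_parser {
    std::string_view input;
};

// Result defaults to whatever f returns when called with a toy_parser&.
// Note the comma-separated form: invoke_result_t<Func, Args...>, not the
// function-type form Func(Args...) that std::result_of_t required.
template <typename Func, typename Result = std::invoke_result_t<Func, toy_parser&>>
Result with_toy_parser(std::string_view input, Func&& f) {
    toy_parser p{input};
    return std::forward<Func>(f)(p);
}
```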
@@ -133,21 +133,6 @@ void path::check_depth_limit() {
}
}
std::ostream& operator<<(std::ostream& os, const path& p) {
os << p.root();
for (const auto& op : p.operators()) {
std::visit(overloaded_functor {
[&] (const std::string& member) {
os << '.' << member;
},
[&] (unsigned index) {
os << '[' << index << ']';
}
}, op);
}
return os;
}
} // namespace parsed
// The following resolve_*() functions resolve references in parsed
@@ -175,12 +160,12 @@ static std::optional<std::string> resolve_path_component(const std::string& colu
if (column_name.size() > 0 && column_name.front() == '#') {
if (!expression_attribute_names) {
throw api_error::validation(
format("ExpressionAttributeNames missing, entry '{}' required by expression", column_name));
fmt::format("ExpressionAttributeNames missing, entry '{}' required by expression", column_name));
}
const rjson::value* value = rjson::find(*expression_attribute_names, column_name);
if (!value || !value->IsString()) {
throw api_error::validation(
format("ExpressionAttributeNames missing entry '{}' required by expression", column_name));
fmt::format("ExpressionAttributeNames missing entry '{}' required by expression", column_name));
}
used_attribute_names.emplace(column_name);
return std::string(rjson::to_string_view(*value));
@@ -217,16 +202,16 @@ static void resolve_constant(parsed::constant& c,
[&] (const std::string& valref) {
if (!expression_attribute_values) {
throw api_error::validation(
format("ExpressionAttributeValues missing, entry '{}' required by expression", valref));
fmt::format("ExpressionAttributeValues missing, entry '{}' required by expression", valref));
}
const rjson::value* value = rjson::find(*expression_attribute_values, valref);
if (!value) {
throw api_error::validation(
format("ExpressionAttributeValues missing entry '{}' required by expression", valref));
fmt::format("ExpressionAttributeValues missing entry '{}' required by expression", valref));
}
if (value->IsNull()) {
throw api_error::validation(
format("ExpressionAttributeValues null value for entry '{}' required by expression", valref));
fmt::format("ExpressionAttributeValues null value for entry '{}' required by expression", valref));
}
validate_value(*value, "ExpressionAttributeValues");
used_attribute_values.emplace(valref);
@@ -723,7 +708,7 @@ rjson::value calculate_value(const parsed::value& v,
auto function_it = function_handlers.find(std::string_view(f._function_name));
if (function_it == function_handlers.end()) {
throw api_error::validation(
format("{}: unknown function '{}' called.", caller, f._function_name));
fmt::format("{}: unknown function '{}' called.", caller, f._function_name));
}
return function_it->second(caller, previous_item, f);
},
@@ -756,3 +741,20 @@ rjson::value calculate_value(const parsed::set_rhs& rhs,
}
} // namespace alternator
auto fmt::formatter<alternator::parsed::path>::format(const alternator::parsed::path& p, fmt::format_context& ctx) const
-> decltype(ctx.out()) {
auto out = ctx.out();
out = fmt::format_to(out, "{}", p.root());
for (const auto& op : p.operators()) {
std::visit(overloaded_functor {
[&] (const std::string& member) {
out = fmt::format_to(out, ".{}", member);
},
[&] (unsigned index) {
out = fmt::format_to(out, "[{}]", index);
}
}, op);
}
return out;
}
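The formatter above renders a path as its root followed by `.member` for each member access and `[index]` for each list index. The sketch below reproduces that output format with `std::string` and `std::visit` only, so it can be tried without fmt. `path_operator` and `render_path` are hypothetical names for illustration.

```cpp
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

using path_operator = std::variant<std::string, unsigned>;

// Renders a document path such as a.b[3]: member accesses are prefixed with
// '.', and list indexes are wrapped in square brackets.
inline std::string render_path(const std::string& root,
                               const std::vector<path_operator>& ops) {
    std::string out = root;
    for (const auto& op : ops) {
        std::visit([&out](const auto& v) {
            using T = std::decay_t<decltype(v)>;
            if constexpr (std::is_same_v<T, std::string>) {
                out += "." + v;
            } else {
                out += "[" + std::to_string(v) + "]";
            }
        }, op);
    }
    return out;
}
```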

View File

@@ -60,24 +60,30 @@ enum class calculate_value_caller {
UpdateExpression, ConditionExpression, ConditionExpressionAlone
};
inline std::ostream& operator<<(std::ostream& out, calculate_value_caller caller) {
switch (caller) {
case calculate_value_caller::UpdateExpression:
out << "UpdateExpression";
break;
case calculate_value_caller::ConditionExpression:
out << "ConditionExpression";
break;
case calculate_value_caller::ConditionExpressionAlone:
out << "ConditionExpression";
break;
default:
out << "unknown type of expression";
break;
}
return out;
}
template <> struct fmt::formatter<alternator::calculate_value_caller> {
constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
auto format(alternator::calculate_value_caller caller, fmt::format_context& ctx) const {
std::string_view name = "unknown type of expression";
switch (caller) {
using enum alternator::calculate_value_caller;
case UpdateExpression:
name = "UpdateExpression";
break;
case ConditionExpression:
name = "ConditionExpression";
break;
case ConditionExpressionAlone:
name = "ConditionExpression";
break;
}
return fmt::format_to(ctx.out(), "{}", name);
}
};
namespace alternator {
rjson::value calculate_value(const parsed::value& v,
calculate_value_caller caller,
const rjson::value* previous_item);
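The replacement formatter above initializes the name to a fallback string and lets the `switch` overwrite it, which keeps the "unknown" text for out-of-range values without a `default:` label. A sketch of the same mapping without fmt, assuming a hypothetical `caller_kind` enum and `caller_name` function:

```cpp
#include <string_view>

enum class caller_kind { UpdateExpression, ConditionExpression, ConditionExpressionAlone };

// Returns the user-visible name. Both condition variants map to the same
// string, and any value outside the enumerators keeps the fallback text.
constexpr std::string_view caller_name(caller_kind c) {
    std::string_view name = "unknown type of expression";
    switch (c) {
    case caller_kind::UpdateExpression:
        name = "UpdateExpression";
        break;
    case caller_kind::ConditionExpression:
        name = "ConditionExpression";
        break;
    case caller_kind::ConditionExpressionAlone:
        name = "ConditionExpression";
        break;
    }
    return name;
}
```

Omitting `default:` also lets the compiler warn when a new enumerator is added but not handled.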

View File

@@ -66,7 +66,6 @@ public:
std::vector<std::variant<std::string, unsigned>>& operators() {
return _operators;
}
friend std::ostream& operator<<(std::ostream&, const path&);
};
// When an expression is first parsed, all constants are references, like
@@ -255,3 +254,7 @@ public:
} // namespace parsed
} // namespace alternator
template <> struct fmt::formatter<alternator::parsed::path> : fmt::formatter<string_view> {
auto format(const alternator::parsed::path&, fmt::format_context& ctx) const -> decltype(ctx.out());
};

View File

@@ -19,7 +19,7 @@ namespace alternator {
// operations which may involve a read of the item before the write
// (so-called Read-Modify-Write operations). These operations include PutItem,
// UpdateItem and DeleteItem: All of these may be conditional operations (the
// "Expected" parameter) which requir a read before the write, and UpdateItem
// "Expected" parameter) which require a read before the write, and UpdateItem
// may also have an update expression which refers to the item's old value.
//
// The code below supports running the read and the write together as one
@@ -69,7 +69,11 @@ protected:
enum class returnvalues {
NONE, ALL_OLD, UPDATED_OLD, ALL_NEW, UPDATED_NEW
} _returnvalues;
enum class returnvalues_on_condition_check_failure {
NONE, ALL_OLD
} _returnvalues_on_condition_check_failure;
static returnvalues parse_returnvalues(const rjson::value& request);
static returnvalues_on_condition_check_failure parse_returnvalues_on_condition_check_failure(const rjson::value& request);
// When _returnvalues != NONE, apply() should store here, in JSON form,
// the values which are to be returned in the "Attributes" field.
// The default null JSON means do not return an Attributes field at all.
@@ -77,6 +81,8 @@ protected:
// it (see explanation below), but note that because apply() may be
// called more than once, if apply() will sometimes set this field it
// must set it (even if just to the default empty value) every time.
// Additionally when _returnvalues_on_condition_check_failure is ALL_OLD
// then condition check failure will also result in storing values here.
mutable rjson::value _return_attributes;
public:
// The constructor of a rmw_operation subclass should parse the request

View File

@@ -11,7 +11,6 @@
#include "log.hh"
#include "serialization.hh"
#include "error.hh"
#include "rapidjson/writer.h"
#include "concrete_types.hh"
#include "cql3/type_json.hh"
#include "mutation/position_in_partition.hh"
@@ -59,7 +58,7 @@ type_representation represent_type(alternator_type atype) {
// calculate its magnitude and precision from its scale() and unscaled_value().
// So in the following ugly implementation we calculate them from the string
// representation instead. We assume the number was already parsed
// sucessfully to a big_decimal to it follows its syntax rules.
// successfully to a big_decimal so it follows its syntax rules.
//
// FIXME: rewrite this function to take a big_decimal, not a string.
// Maybe a snippet like this can help:
@@ -144,17 +143,17 @@ static big_decimal parse_and_validate_number(std::string_view s) {
big_decimal ret(s);
auto [magnitude, precision] = internal::get_magnitude_and_precision(s);
if (magnitude > 125) {
throw api_error::validation(format("Number overflow: {}. Attempting to store a number with magnitude larger than supported range.", s));
throw api_error::validation(fmt::format("Number overflow: {}. Attempting to store a number with magnitude larger than supported range.", s));
}
if (magnitude < -130) {
throw api_error::validation(format("Number underflow: {}. Attempting to store a number with magnitude lower than supported range.", s));
throw api_error::validation(fmt::format("Number underflow: {}. Attempting to store a number with magnitude lower than supported range.", s));
}
if (precision > 38) {
throw api_error::validation(format("Number too precise: {}. Attempting to store a number with more significant digits than supported.", s));
throw api_error::validation(fmt::format("Number too precise: {}. Attempting to store a number with more significant digits than supported.", s));
}
return ret;
} catch (const marshal_exception& e) {
throw api_error::validation(format("The parameter cannot be converted to a numeric value: {}", s));
throw api_error::validation(fmt::format("The parameter cannot be converted to a numeric value: {}", s));
}
}
@@ -266,7 +265,7 @@ bytes get_key_column_value(const rjson::value& item, const column_definition& co
std::string column_name = column.name_as_text();
const rjson::value* key_typed_value = rjson::find(item, column_name);
if (!key_typed_value) {
throw api_error::validation(format("Key column {} not found", column_name));
throw api_error::validation(fmt::format("Key column {} not found", column_name));
}
return get_key_from_typed_value(*key_typed_value, column);
}
@@ -278,19 +277,26 @@ bytes get_key_column_value(const rjson::value& item, const column_definition& co
// mentioned in the exception message).
// If the type does match, a reference to the encoded value is returned.
static const rjson::value& get_typed_value(const rjson::value& key_typed_value, std::string_view type_str, std::string_view name, std::string_view value_name) {
if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1 ||
!key_typed_value.MemberBegin()->value.IsString()) {
if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1) {
throw api_error::validation(
format("Malformed value object for {} {}: {}",
fmt::format("Malformed value object for {} {}: {}",
value_name, name, key_typed_value));
}
auto it = key_typed_value.MemberBegin();
if (rjson::to_string_view(it->name) != type_str) {
throw api_error::validation(
format("Type mismatch: expected type {} for {} {}, got type {}",
fmt::format("Type mismatch: expected type {} for {} {}, got type {}",
type_str, value_name, name, it->name));
}
// We assume this function is called just for key types (S, B, N), and
// all of those always have a string value in the JSON.
if (!it->value.IsString()) {
throw api_error::validation(
fmt::format("Malformed value object for {} {}: {}",
value_name, name, key_typed_value));
}
return it->value;
}
@@ -396,16 +402,16 @@ position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema)
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {
if (!v.IsObject() || v.MemberCount() != 1) {
throw api_error::validation(format("{}: invalid number object", diagnostic));
throw api_error::validation(fmt::format("{}: invalid number object", diagnostic));
}
auto it = v.MemberBegin();
if (it->name != "N") {
throw api_error::validation(format("{}: expected number, found type '{}'", diagnostic, it->name));
throw api_error::validation(fmt::format("{}: expected number, found type '{}'", diagnostic, it->name));
}
if (!it->value.IsString()) {
// We shouldn't reach here. Callers normally validate their input
// earlier with validate_value().
throw api_error::validation(format("{}: improperly formatted number constant", diagnostic));
throw api_error::validation(fmt::format("{}: improperly formatted number constant", diagnostic));
}
big_decimal ret = parse_and_validate_number(rjson::to_string_view(it->value));
return ret;
@@ -486,7 +492,7 @@ rjson::value set_sum(const rjson::value& v1, const rjson::value& v2) {
auto [set1_type, set1] = unwrap_set(v1);
auto [set2_type, set2] = unwrap_set(v2);
if (set1_type != set2_type) {
throw api_error::validation(format("Mismatched set types: {} and {}", set1_type, set2_type));
throw api_error::validation(fmt::format("Mismatched set types: {} and {}", set1_type, set2_type));
}
if (!set1 || !set2) {
throw api_error::validation("UpdateExpression: ADD operation for sets must be given sets as arguments");
@@ -514,7 +520,7 @@ std::optional<rjson::value> set_diff(const rjson::value& v1, const rjson::value&
auto [set1_type, set1] = unwrap_set(v1);
auto [set2_type, set2] = unwrap_set(v2);
if (set1_type != set2_type) {
throw api_error::validation(format("Set DELETE type mismatch: {} and {}", set1_type, set2_type));
throw api_error::validation(fmt::format("Set DELETE type mismatch: {} and {}", set1_type, set2_type));
}
if (!set1 || !set2) {
throw api_error::validation("UpdateExpression: DELETE operation can only be performed on a set");

View File

@@ -8,6 +8,7 @@
#include "alternator/server.hh"
#include "log.hh"
#include <fmt/ranges.h>
#include <seastar/http/function_handlers.hh>
#include <seastar/http/short_streams.hh>
#include <seastar/core/coroutine.hh>
@@ -16,14 +17,18 @@
#include <seastar/util/short_streams.hh>
#include "seastarx.hh"
#include "error.hh"
#include "service/client_state.hh"
#include "service/qos/service_level_controller.hh"
#include "utils/assert.hh"
#include "timeout_config.hh"
#include "utils/rjson.hh"
#include "auth.hh"
#include <cctype>
#include <string_view>
#include <utility>
#include "service/storage_proxy.hh"
#include "gms/gossiper.hh"
#include "utils/overloaded_functor.hh"
#include "utils/fb_utilities.hh"
#include "utils/aws_sigv4.hh"
static logging::logger slogger("alternator-server");
@@ -34,8 +39,6 @@ using reply = http::reply;
namespace alternator {
static constexpr auto TARGET = "X-Amz-Target";
inline std::vector<std::string_view> split(std::string_view text, char separator) {
std::vector<std::string_view> tokens;
if (text == "") {
@@ -118,7 +121,7 @@ public:
}
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
auto res = resf.get0();
auto res = resf.get();
std::visit(overloaded_functor {
[&] (const json::json_return_type& json_return_value) {
slogger.trace("api_handler success case");
@@ -156,6 +159,9 @@ public:
protected:
void generate_error_reply(reply& rep, const api_error& err) {
rjson::value results = rjson::empty_object();
if (!err._extra_fields.IsNull() && err._extra_fields.IsObject()) {
results = rjson::copy(err._extra_fields);
}
rjson::add(results, "__type", rjson::from_string("com.amazonaws.dynamodb.v20120810#" + err._type));
rjson::add(results, "message", err._msg);
rep._content = rjson::print(std::move(results));
@@ -210,9 +216,11 @@ protected:
for (auto& ip : local_dc_nodes) {
// Note that it's not enough for the node to be is_alive() - a
// node joining the cluster is also "alive" but not responsive to
// requests. We need the node to be in normal state. See #19694.
if (_gossiper.is_normal(ip)) {
rjson::push_back(results, rjson::from_string(ip.to_sstring()));
// requests. We need the node to be alive *and* normal. See #19694, #21538.
if (_gossiper.is_alive(ip) && _gossiper.is_normal(ip)) {
// Use the gossiped broadcast_rpc_address if available instead
// of the internal IP address "ip". See discussion in #18711.
rjson::push_back(results, rjson::from_string(_gossiper.get_rpc_address(ip)));
}
}
rep->set_status(reply::status_type::ok);
@@ -255,7 +263,7 @@ future<std::string> server::verify_signature(const request& req, const chunked_c
std::string_view authorization_header = authorization_it->second;
auto pos = authorization_header.find_first_of(' ');
if (pos == std::string_view::npos || authorization_header.substr(0, pos) != "AWS4-HMAC-SHA256") {
throw api_error::invalid_signature(format("Authorization header must use AWS4-HMAC-SHA256 algorithm: {}", authorization_header));
throw api_error::invalid_signature(fmt::format("Authorization header must use AWS4-HMAC-SHA256 algorithm: {}", authorization_header));
}
authorization_header.remove_prefix(pos+1);
std::string credential;
@@ -290,7 +298,7 @@ future<std::string> server::verify_signature(const request& req, const chunked_c
std::vector<std::string_view> credential_split = split(credential, '/');
if (credential_split.size() != 5) {
throw api_error::validation(format("Incorrect credential information format: {}", credential));
throw api_error::validation(fmt::format("Incorrect credential information format: {}", credential));
}
std::string user(credential_split[0]);
std::string datestamp(credential_split[1]);
@@ -311,8 +319,8 @@ future<std::string> server::verify_signature(const request& req, const chunked_c
}
}
auto cache_getter = [&proxy = _proxy] (std::string username) {
return get_key_from_roles(proxy, std::move(username));
auto cache_getter = [&proxy = _proxy, &as = _auth_service] (std::string username) {
return get_key_from_roles(proxy, as, std::move(username));
};
return _key_cache.get_ptr(user, cache_getter).then([this, &req, &content,
user = std::move(user),
@@ -375,7 +383,7 @@ static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_
std::string buf;
tracing::add_session_param(trace_state, "alternator_op", op);
tracing::add_query(trace_state, truncated_content_view(query, buf));
tracing::begin(trace_state, format("Alternator {}", op), client_state.get_client_address());
tracing::begin(trace_state, seastar::format("Alternator {}", op), client_state.get_client_address());
if (!username.empty()) {
tracing::set_username(trace_state, auth::authenticated_user(username));
}
@@ -385,10 +393,10 @@ static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_
future<executor::request_return_type> server::handle_api_request(std::unique_ptr<request> req) {
_executor._stats.total_operations++;
sstring target = req->get_header(TARGET);
std::vector<std::string_view> split_target = split(target, '.');
//NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)
std::string op = split_target.empty() ? std::string() : std::string(split_target.back());
sstring target = req->get_header("X-Amz-Target");
// target is DynamoDB API version followed by a dot '.' and operation type (e.g. CreateTable)
auto dot = target.find('.');
std::string_view op = (dot == sstring::npos) ? std::string_view() : std::string_view(target).substr(dot+1);
// JSON parsing can allocate up to roughly 2x the size of the raw
// document, + a couple of bytes for maintenance.
// TODO: consider the case where req->content_length is missing. Maybe
@@ -400,7 +408,7 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
++_executor._stats.requests_blocked_memory;
}
auto units = co_await std::move(units_fut);
assert(req->content_stream);
SCYLLA_ASSERT(req->content_stream);
chunked_content content = co_await util::read_entire_stream(*req->content_stream);
auto username = co_await verify_signature(*req, content);
@@ -411,7 +419,7 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
auto callback_it = _callbacks.find(op);
if (callback_it == _callbacks.end()) {
_executor._stats.unsupported_operations++;
co_return api_error::unknown_operation(format("Unsupported operation {}", op));
co_return api_error::unknown_operation(fmt::format("Unsupported operation {}", op));
}
if (_pending_requests.get_count() >= _max_concurrent_requests) {
_executor._stats.requests_shed++;
@@ -419,11 +427,11 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
}
_pending_requests.enter();
auto leave = defer([this] () noexcept { _pending_requests.leave(); });
//FIXME: Client state can provide more context, e.g. client's endpoint address
// We use unique_ptr because client_state cannot be moved or copied
executor::client_state client_state = username.empty()
? service::client_state{service::client_state::internal_tag()}
: service::client_state{service::client_state::internal_tag(), _auth_service, _sl_controller, username};
executor::client_state client_state(service::client_state::external_tag(),
_auth_service, &_sl_controller, _timeout_config.current_values(), req->get_client_address());
if (!username.empty()) {
client_state.set_login(auth::authenticated_user(username));
}
co_await client_state.maybe_update_per_service_level_params();
tracing::trace_state_ptr trace_state = maybe_trace_query(client_state, username, op, content);
@@ -470,6 +478,7 @@ server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gos
, _enforce_authorization(false)
, _enabled_servers{}
, _pending_requests{}
, _timeout_config(_proxy.data_dictionary().get_config())
, _callbacks{
{"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.create_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
@@ -569,14 +578,14 @@ future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std:
set_routes(_https_server._routes);
_https_server.set_content_length_limit(server::content_length_limit);
_https_server.set_content_streaming(true);
_https_server.set_tls_credentials(creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
auto server_creds = creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
if (ep) {
slogger.warn("Exception loading {}: {}", files, ep);
} else {
slogger.info("Reloaded {}", files);
}
}).get0());
_https_server.listen(socket_address{addr, *https_port}).get();
}).get();
_https_server.listen(socket_address{addr, *https_port}, std::move(server_creds)).get();
_enabled_servers.push_back(std::ref(_https_server));
}
});
@@ -634,7 +643,7 @@ future<> server::json_parser::stop() {
const char* api_error::what() const noexcept {
if (_what_string.empty()) {
_what_string = format("{} {}: {}", static_cast<int>(_http_code), _type, _msg);
_what_string = fmt::format("{} {}: {}", std::to_underlying(_http_code), _type, _msg);
}
return _what_string.c_str();
}
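The X-Amz-Target handling in the hunk above (take everything after the first dot as the operation name, or an empty name when there is no dot) can be sketched in isolation. This is a minimal illustration, not the code from the commit: `operation_from_target` is a hypothetical helper name, it uses `std::string_view` in place of `sstring`, and it assumes C++17 or later.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper (not part of the diff): returns the operation name that
// follows the first '.' in an X-Amz-Target value, or an empty view when the
// value contains no dot.
constexpr std::string_view operation_from_target(std::string_view target) {
    const auto dot = target.find('.');
    return dot == std::string_view::npos ? std::string_view() : target.substr(dot + 1);
}

// Compile-time checks of the two cases handled in the hunk.
static_assert(operation_from_target("DynamoDB_20120810.CreateTable") == "CreateTable");
static_assert(operation_from_target("CreateTable").empty());
```

The returned view borrows from its argument, so the header string has to outlive it. An empty result finds no entry in a callback map and so would take the "Unsupported operation" path.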

@@ -42,6 +42,11 @@ class server {
bool _enforce_authorization;
utils::small_vector<std::reference_wrapper<seastar::httpd::http_server>, 2> _enabled_servers;
gate _pending_requests;
// In some places we will need a CQL updateable_timeout_config object even
// though it isn't really relevant for Alternator which defines its own
// timeouts separately. We can create this object only once.
updateable_timeout_config _timeout_config;
alternator_callbacks_map _callbacks;
semaphore* _memory_limiter;

@@ -21,10 +21,12 @@ stats::stats() : api_operations{} {
_metrics.add_group("alternator", {
#define OPERATION(name, CamelCaseName) \
seastar::metrics::make_total_operations("operation", api_operations.name, \
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}),
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}).set_skip_when_empty(),
#define OPERATION_LATENCY(name, CamelCaseName) \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return to_metrics_histogram(api_operations.name);}),
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return to_metrics_histogram(api_operations.name.histogram());}).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(), \
seastar::metrics::make_summary("op_latency_summary", \
seastar::metrics::description("Latency summary of an operation via Alternator API"), [this]{return to_metrics_summary(api_operations.name.summary());})(op(CamelCaseName)).set_skip_when_empty(),
OPERATION(batch_get_item, "BatchGetItem")
OPERATION(batch_write_item, "BatchWriteItem")
OPERATION(create_backup, "CreateBackup")
@@ -65,6 +67,8 @@ stats::stats() : api_operations{} {
OPERATION_LATENCY(get_item_latency, "GetItem")
OPERATION_LATENCY(delete_item_latency, "DeleteItem")
OPERATION_LATENCY(update_item_latency, "UpdateItem")
OPERATION_LATENCY(batch_write_item_latency, "BatchWriteItem")
OPERATION_LATENCY(batch_get_item_latency, "BatchGetItem")
OPERATION(list_streams, "ListStreams")
OPERATION(describe_stream, "DescribeStream")
OPERATION(get_shard_iterator, "GetShardIterator")
@@ -92,6 +96,10 @@ stats::stats() : api_operations{} {
seastar::metrics::description("number of rows read and matched during filtering operations")),
seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },
seastar::metrics::description("number of rows read and dropped during filtering operations")),
seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"),{op("BatchWriteItem")},
api_operations.batch_write_item_batch_total).set_skip_when_empty(),
seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"),{op("BatchGetItem")},
api_operations.batch_get_item_batch_total).set_skip_when_empty(),
});
}

@@ -11,8 +11,7 @@
#include <cstdint>
#include <seastar/core/metrics_registration.hh>
#include "seastarx.hh"
#include "utils/estimated_histogram.hh"
#include "utils/histogram.hh"
#include "cql3/stats.hh"
namespace alternator {
@@ -27,6 +26,8 @@ public:
struct {
uint64_t batch_get_item = 0;
uint64_t batch_write_item = 0;
uint64_t batch_get_item_batch_total = 0;
uint64_t batch_write_item_batch_total = 0;
uint64_t create_backup = 0;
uint64_t create_global_table = 0;
uint64_t create_table = 0;
@@ -66,11 +67,13 @@ public:
uint64_t get_shard_iterator = 0;
uint64_t get_records = 0;
utils::time_estimated_histogram put_item_latency;
utils::time_estimated_histogram get_item_latency;
utils::time_estimated_histogram delete_item_latency;
utils::time_estimated_histogram update_item_latency;
utils::time_estimated_histogram get_records_latency;
utils::timed_rate_moving_average_summary_and_histogram put_item_latency;
utils::timed_rate_moving_average_summary_and_histogram get_item_latency;
utils::timed_rate_moving_average_summary_and_histogram delete_item_latency;
utils::timed_rate_moving_average_summary_and_histogram update_item_latency;
utils::timed_rate_moving_average_summary_and_histogram batch_write_item_latency;
utils::timed_rate_moving_average_summary_and_histogram batch_get_item_latency;
utils::timed_rate_moving_average_summary_and_histogram get_records_latency;
} api_operations;
// Miscellaneous event counters
uint64_t total_operations = 0;

@@ -13,8 +13,7 @@
#include <seastar/json/formatter.hh>
#include "utils/base64.hh"
#include "log.hh"
#include "auth/permission.hh"
#include "db/config.hh"
#include "cdc/log.hh"
@@ -25,7 +24,6 @@
#include "utils/UUID_gen.hh"
#include "cql3/selection/selection.hh"
#include "cql3/result_set.hh"
#include "cql3/type_json.hh"
#include "cql3/column_identifier.hh"
#include "schema/schema_builder.hh"
#include "service/storage_proxy.hh"
@@ -33,7 +31,6 @@
#include "gms/feature_service.hh"
#include "executor.hh"
#include "rmw_operation.hh"
#include "data_dictionary/data_dictionary.hh"
/**
@@ -237,11 +234,8 @@ struct shard_id {
// dynamo specifies shardid as max 65 chars.
friend std::ostream& operator<<(std::ostream& os, const shard_id& id) {
boost::io::ios_flags_saver fs(os);
return os << marker << std::hex
<< id.time.time_since_epoch().count()
<< ':' << id.id.to_bytes()
;
fmt::print(os, "{} {:x}:{}", marker, id.time.time_since_epoch().count(), id.id.to_bytes());
return os;
}
};
@@ -280,7 +274,7 @@ struct sequence_number {
* Timeuuids viewed as msb<<64|lsb are _not_,
* but they are still sorted as
* timestamp() << 64|lsb
* so we can simpy unpack the mangled msb
* so we can simply unpack the mangled msb
* and use as hi 64 in our "bignum".
*/
uint128_t hi = uint64_t(num.uuid.timestamp());
@@ -419,7 +413,7 @@ using namespace std::string_literals;
*
* In scylla, this is sort of akin to an ID having corresponding ID/ID:s
* that cover the token range it represents. Because ID:s are per
* vnode shard however, this relation can be somewhat ambigous.
* vnode shard however, this relation can be somewhat ambiguous.
* We still provide some semblance of this by finding the ID in
* older generation that has token start < current ID token start.
* This will be a partial overlap, but it is the best we can do.
@@ -526,7 +520,7 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl
// (see explanation above) since we want to find closest
// token boundary when determining parent.
// #7346 - we processed and searched children/parents in
// stored order, which is not neccesarily token order,
// stored order, which is not necessarily token order,
// so the finding of "closest" token boundary (using upper bound)
// could give somewhat weird results.
static auto token_cmp = [](const cdc::stream_id& id1, const cdc::stream_id& id2) {
@@ -783,7 +777,7 @@ struct event_id {
cdc::stream_id stream;
utils::UUID timestamp;
static const auto marker = 'E';
static constexpr auto marker = 'E';
event_id(cdc::stream_id s, utils::UUID ts)
: stream(s)
@@ -791,10 +785,8 @@ struct event_id {
{}
friend std::ostream& operator<<(std::ostream& os, const event_id& id) {
boost::io::ios_flags_saver fs(os);
return os << marker << std::hex << id.stream.to_bytes()
<< ':' << id.timestamp
;
fmt::print(os, "{}{}:{}", marker, id.stream.to_bytes(), id.timestamp);
return os;
}
};
}
@@ -827,11 +819,13 @@ future<executor::request_return_type> executor::get_records(client_state& client
}
if (!schema || !base || !is_alternator_keyspace(schema->ks_name())) {
throw api_error::resource_not_found(fmt::to_string(iter.table));
co_return api_error::resource_not_found(fmt::to_string(iter.table));
}
tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
co_await verify_permission(client_state, schema, auth::permission::SELECT);
db::consistency_level cl = db::consistency_level::LOCAL_QUORUM;
partition_key pk = iter.shard.id.to_partition_key(*schema);
@@ -896,7 +890,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice),
query::tombstone_limit(_proxy.get_tombstone_limit()), query::row_limit(limit * mul));
return _proxy.query(schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), std::move(permit), client_state)).then(
co_return co_await _proxy.query(schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), std::move(permit), client_state)).then(
[this, schema, partition_slice = std::move(partition_slice), selection = std::move(selection), start_time = std::move(start_time), limit, key_names = std::move(key_names), attr_names = std::move(attr_names), type, iter, high_ts] (service::storage_proxy::coordinator_query_result qr) mutable {
cql3::selection::result_set_builder builder(*selection, gc_clock::now());
query::result_view::consume(*qr.query_result, partition_slice, cql3::selection::result_set_builder::visitor(builder, *schema, *selection));
@@ -1020,7 +1014,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
// shard did end, then the next read will have nrecords == 0 and
// will notice the end of shard and not return NextShardIterator.
rjson::add(ret, "NextShardIterator", next_iter);
_stats.api_operations.get_records_latency.add(std::chrono::steady_clock::now() - start_time);
_stats.api_operations.get_records_latency.mark(std::chrono::steady_clock::now() - start_time);
return make_ready_future<executor::request_return_type>(make_jsonable(std::move(ret)));
}
@@ -1043,7 +1037,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
shard_iterator next_iter(iter.table, iter.shard, utils::UUID_gen::min_time_UUID(high_ts.time_since_epoch()), true);
rjson::add(ret, "NextShardIterator", iter);
}
_stats.api_operations.get_records_latency.add(std::chrono::steady_clock::now() - start_time);
_stats.api_operations.get_records_latency.mark(std::chrono::steady_clock::now() - start_time);
if (is_big(ret)) {
return make_ready_future<executor::request_return_type>(make_streamed(std::move(ret)));
}
@@ -1061,9 +1055,6 @@ void executor::add_stream_options(const rjson::value& stream_specification, sche
if (stream_enabled->GetBool()) {
auto db = sp.data_dictionary();
if (!db.features().cdc) {
throw api_error::validation("StreamSpecification: streams (CDC) feature not enabled in cluster.");
}
if (!db.features().alternator_streams) {
throw api_error::validation("StreamSpecification: alternator streams feature not enabled in cluster.");
}

@@ -26,19 +26,19 @@
#include "log.hh"
#include "gc_clock.hh"
#include "replica/database.hh"
#include "service/client_state.hh"
#include "service_permit.hh"
#include "timestamp.hh"
#include "service/storage_proxy.hh"
#include "service/pager/paging_state.hh"
#include "service/pager/query_pagers.hh"
#include "gms/feature_service.hh"
#include "sstables/types.hh"
#include "mutation/mutation.hh"
#include "types/types.hh"
#include "types/map.hh"
#include "utils/assert.hh"
#include "utils/rjson.hh"
#include "utils/big_decimal.hh"
#include "utils/fb_utilities.hh"
#include "cql3/selection/selection.hh"
#include "cql3/values.hh"
#include "cql3/query_options.hh"
@@ -81,6 +81,11 @@ future<executor::request_return_type> executor::update_time_to_live(client_state
co_return api_error::validation("UpdateTimeToLive requires boolean Enabled");
}
bool enabled = v->GetBool();
// Alternator TTL doesn't yet work when the table uses tablets (#16567)
if (enabled && _proxy.local_db().find_keyspace(schema->ks_name()).get_replication_strategy().uses_tablets()) {
co_return api_error::validation("TTL not yet supported on a table using tablets (issue #16567). "
"Create a table with the tag 'experimental:initial_tablets' set to 'none' to use vnodes.");
}
v = rjson::find(*spec, "AttributeName");
if (!v || !v->IsString()) {
co_return api_error::validation("UpdateTimeToLive requires string AttributeName");
@@ -94,6 +99,7 @@ future<executor::request_return_type> executor::update_time_to_live(client_state
}
sstring attribute_name(v->GetString(), v->GetStringLength());
co_await verify_permission(client_state, schema, auth::permission::ALTER);
co_await db::modify_tags(_mm, schema->ks_name(), schema->cf_name(), [&](std::map<sstring, sstring>& tags_map) {
if (enabled) {
if (tags_map.contains(TTL_TAG_KEY)) {
@@ -155,7 +161,7 @@ future<executor::request_return_type> executor::describe_time_to_live(client_sta
// node owning this range as a "primary range" (the first node in the ring
// with this range), but when this node is down, the secondary owner (the
// second in the ring) may take over.
// An expiration thread is reponsible for all tables which need expiration
// An expiration thread is responsible for all tables which need expiration
// scans. Currently, the different tables are scanned sequentially (not in
// parallel).
// The expiration thread scans items using CL=QUORUM to ensure that it reads
@@ -309,7 +315,7 @@ static size_t random_offset(size_t min, size_t max) {
// this range's primary node is down. For this we need to return not just
// a list of this node's secondary ranges - but also the primary owner of
// each of those ranges.
static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary_ranges(
static future<std::vector<std::pair<dht::token_range, gms::inet_address>>> get_secondary_ranges(
const locator::effective_replication_map_ptr& erm,
gms::inet_address ep) {
const auto& tm = *erm->get_token_metadata_ptr();
@@ -320,6 +326,7 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary
}
auto prev_tok = sorted_tokens.back();
for (const auto& tok : sorted_tokens) {
co_await coroutine::maybe_yield();
inet_address_vector_replica_set eps = erm->get_natural_endpoints(tok);
if (eps.size() <= 1 || eps[1] != ep) {
prev_tok = tok;
@@ -347,7 +354,7 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary
}
prev_tok = tok;
}
return ret;
co_return ret;
}
@@ -380,65 +387,68 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary
// the chances of covering all ranges during a scan when restarts occur.
// A more deterministic way would be to regularly persist the scanning state,
// but that incurs overhead that we want to avoid if not needed.
enum primary_or_secondary_t {primary, secondary};
template<primary_or_secondary_t primary_or_secondary>
class token_ranges_owned_by_this_shard {
// ranges_holder_primary holds just the primary ranges themselves
class ranges_holder_primary {
const dht::token_range_vector _token_ranges;
public:
ranges_holder_primary(const locator::vnode_effective_replication_map_ptr& erm, gms::gossiper& g, gms::inet_address ep)
: _token_ranges(erm->get_primary_ranges(ep)) {}
std::size_t size() const { return _token_ranges.size(); }
const dht::token_range& operator[](std::size_t i) const {
return _token_ranges[i];
}
bool should_skip(std::size_t i) const {
return false;
}
};
// ranges_holder<secondary> holds the secondary token ranges plus each
// range's primary owner, needed to implement should_skip().
class ranges_holder_secondary {
std::vector<std::pair<dht::token_range, gms::inet_address>> _token_ranges;
gms::gossiper& _gossiper;
public:
ranges_holder_secondary(const locator::effective_replication_map_ptr& erm, gms::gossiper& g, gms::inet_address ep)
: _token_ranges(get_secondary_ranges(erm, ep))
, _gossiper(g) {}
std::size_t size() const { return _token_ranges.size(); }
const dht::token_range& operator[](std::size_t i) const {
return _token_ranges[i].first;
}
// range i should be skipped if its primary owner is alive.
bool should_skip(std::size_t i) const {
return _gossiper.is_alive(_token_ranges[i].second);
}
};
//
// FIXME: Check if this algorithm is safe with tablet migration.
// https://github.com/scylladb/scylladb/issues/16567
// ranges_holder_primary holds just the primary ranges themselves
class ranges_holder_primary {
dht::token_range_vector _token_ranges;
public:
explicit ranges_holder_primary(dht::token_range_vector token_ranges) : _token_ranges(std::move(token_ranges)) {}
static future<ranges_holder_primary> make(const locator::vnode_effective_replication_map_ptr& erm, gms::inet_address ep) {
co_return ranges_holder_primary(co_await erm->get_primary_ranges(ep));
}
std::size_t size() const { return _token_ranges.size(); }
const dht::token_range& operator[](std::size_t i) const {
return _token_ranges[i];
}
bool should_skip(std::size_t i) const {
return false;
}
};
// ranges_holder<secondary> holds the secondary token ranges plus each
// range's primary owner, needed to implement should_skip().
class ranges_holder_secondary {
std::vector<std::pair<dht::token_range, gms::inet_address>> _token_ranges;
const gms::gossiper& _gossiper;
public:
explicit ranges_holder_secondary(std::vector<std::pair<dht::token_range, gms::inet_address>> token_ranges, const gms::gossiper& g)
: _token_ranges(std::move(token_ranges))
, _gossiper(g) {}
static future<ranges_holder_secondary> make(const locator::effective_replication_map_ptr& erm, gms::inet_address ep, const gms::gossiper& g) {
co_return ranges_holder_secondary(co_await get_secondary_ranges(erm, ep), g);
}
std::size_t size() const { return _token_ranges.size(); }
const dht::token_range& operator[](std::size_t i) const {
return _token_ranges[i].first;
}
// range i should be skipped if its primary owner is alive.
bool should_skip(std::size_t i) const {
return _gossiper.is_alive(_token_ranges[i].second);
}
};
template<class primary_or_secondary_t>
class token_ranges_owned_by_this_shard {
schema_ptr _s;
locator::effective_replication_map_ptr _erm;
// _token_ranges will contain a list of token ranges owned by this node.
// We'll further need to split each such range to the pieces owned by
// the current shard, using _intersecter.
using ranges_holder = std::conditional_t<
primary_or_secondary == primary_or_secondary_t::primary,
ranges_holder_primary,
ranges_holder_secondary>;
const ranges_holder _token_ranges;
const primary_or_secondary_t _token_ranges;
// NOTICE: _range_idx is used modulo _token_ranges size when accessing
// the data to ensure that it doesn't go out of bounds
size_t _range_idx;
size_t _end_idx;
std::optional<dht::selective_token_range_sharder> _intersecter;
locator::effective_replication_map_ptr _erm;
public:
token_ranges_owned_by_this_shard(replica::database& db, gms::gossiper& g, schema_ptr s)
token_ranges_owned_by_this_shard(schema_ptr s, primary_or_secondary_t token_ranges)
: _s(s)
, _token_ranges(db.find_keyspace(s->ks_name()).get_effective_replication_map(),
g, utils::fb_utilities::get_broadcast_address())
, _erm(s->table().get_effective_replication_map())
, _token_ranges(std::move(token_ranges))
, _range_idx(random_offset(0, _token_ranges.size() - 1))
, _end_idx(_range_idx + _token_ranges.size())
, _erm(s->table().get_effective_replication_map())
{
tlogger.debug("Generating token ranges starting from base range {} of {}", _range_idx, _token_ranges.size());
}
@@ -492,6 +502,7 @@ struct scan_ranges_context {
bytes column_name;
std::optional<std::string> member;
service::client_state internal_client_state;
::shared_ptr<cql3::selection::selection> selection;
std::unique_ptr<service::query_state> query_state_ptr;
std::unique_ptr<cql3::query_options> query_options;
@@ -501,6 +512,7 @@ struct scan_ranges_context {
: s(s)
, column_name(column_name)
, member(member)
, internal_client_state(service::client_state::internal_tag())
{
// FIXME: don't read the entire items - read only parts of it.
// We must read the key columns (to be able to delete) and also
@@ -519,10 +531,9 @@ struct scan_ranges_context {
std::vector<query::clustering_range> ck_bounds{query::clustering_range::make_open_ended_both_sides()};
auto partition_slice = query::partition_slice(std::move(ck_bounds), {}, std::move(regular_columns), opts);
command = ::make_lw_shared<query::read_command>(s->id(), s->version(), partition_slice, proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));
executor::client_state client_state{executor::client_state::internal_tag()};
tracing::trace_state_ptr trace_state;
// NOTICE: empty_service_permit is used because the TTL service has fixed parallelism
query_state_ptr = std::make_unique<service::query_state>(client_state, trace_state, empty_service_permit());
query_state_ptr = std::make_unique<service::query_state>(internal_client_state, trace_state, empty_service_permit());
// FIXME: What should we do on multi-DC? Will we run the expiration on the same ranges on all
// DCs or only once for each range? If the latter, we need to change the CLs in the
// scanner and deleter.
@@ -545,7 +556,7 @@ static future<> scan_table_ranges(
expiration_service::stats& expiration_stats)
{
const schema_ptr& s = scan_ctx.s;
assert (partition_ranges.size() == 1); // otherwise issue #9167 will cause incorrect results.
SCYLLA_ASSERT (partition_ranges.size() == 1); // otherwise issue #9167 will cause incorrect results.
auto p = service::pager::query_pagers::pager(proxy, s, scan_ctx.selection, *scan_ctx.query_state_ptr,
*scan_ctx.query_options, scan_ctx.command, std::move(partition_ranges), nullptr);
while (!p->is_exhausted()) {
@@ -718,7 +729,9 @@ static future<bool> scan_table(
expiration_stats.scan_table++;
// FIXME: need to pace the scan, not do it all at once.
scan_ranges_context scan_ctx{s, proxy, std::move(column_name), std::move(member)};
token_ranges_owned_by_this_shard<primary> my_ranges(db.real_database(), gossiper, s);
auto erm = db.real_database().find_keyspace(s->ks_name()).get_vnode_effective_replication_map();
auto my_address = erm->get_topology().my_address();
token_ranges_owned_by_this_shard my_ranges(s, co_await ranges_holder_primary::make(erm, my_address));
while (std::optional<dht::partition_range> range = my_ranges.next_partition_range()) {
// Note that because of issue #9167 we need to run a separate
// query on each partition range, and can't pass several of
@@ -738,7 +751,7 @@ static future<bool> scan_table(
// by tasking another node to take over scanning of the dead node's primary
// ranges. What we do here is that this node will also check expiration
// on its *secondary* ranges - but only those whose primary owner is down.
token_ranges_owned_by_this_shard<secondary> my_secondary_ranges(db.real_database(), gossiper, s);
token_ranges_owned_by_this_shard my_secondary_ranges(s, co_await ranges_holder_secondary::make(erm, my_address, gossiper));
while (std::optional<dht::partition_range> range = my_secondary_ranges.next_partition_range()) {
expiration_stats.secondary_ranges_scanned++;
dht::partition_range_vector partition_ranges;
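The range walk described in the comments of this file (start at a random offset, index modulo the number of ranges so that every range is visited exactly once, and skip a secondary range while its primary owner is alive) can be sketched on its own. This is a simplified illustration under stated assumptions: `owned_range` and `rotating_ranges` are hypothetical names, plain strings stand in for `dht::token_range` and `gms::inet_address`, and a callback stands in for the gossiper's `is_alive()`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a (token range, primary owner) pair.
struct owned_range {
    std::string range;          // placeholder for dht::token_range
    std::string primary_owner;  // placeholder for gms::inet_address
};

// Visits every range exactly once, starting at `start` and wrapping around.
// A range is skipped while its primary owner is reported alive.
class rotating_ranges {
    std::vector<owned_range> _ranges;
    std::function<bool(const std::string&)> _is_alive;
    std::size_t _idx;  // used modulo _ranges.size(), so it never indexes out of bounds
    std::size_t _end;  // one past the last index to visit
public:
    rotating_ranges(std::vector<owned_range> ranges, std::size_t start,
                    std::function<bool(const std::string&)> is_alive)
        : _ranges(std::move(ranges))
        , _is_alive(std::move(is_alive))
        , _idx(_ranges.empty() ? 0 : start % _ranges.size())
        , _end(_idx + _ranges.size()) {}

    // Returns the next range that should be scanned, or nullopt when done.
    std::optional<std::string> next() {
        while (_idx < _end) {
            const owned_range& r = _ranges[_idx % _ranges.size()];
            ++_idx;
            if (!_is_alive(r.primary_owner)) {
                return r.range;
            }
        }
        return std::nullopt;
    }
};
```

With an always-false liveness callback the same class models the primary-range case, where nothing is skipped. The sketch leaves out the further split of each range into the pieces owned by the current shard.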

@@ -7,6 +7,7 @@ set(swagger_files
api-doc/commitlog.json
api-doc/compaction_manager.json
api-doc/config.json
api-doc/cql_server_test.json
api-doc/endpoint_snitch_info.json
api-doc/error_injection.json
api-doc/failure_detector.json
@@ -15,10 +16,12 @@ set(swagger_files
api-doc/lsa.json
api-doc/messaging_service.json
api-doc/metrics.json
api-doc/raft.json
api-doc/storage_proxy.json
api-doc/storage_service.json
api-doc/stream_manager.json
api-doc/system.json
api-doc/tasks.json
api-doc/task_manager.json
api-doc/task_manager_test.json
api-doc/utils.json)
@@ -44,6 +47,7 @@ target_sources(api
commitlog.cc
compaction_manager.cc
config.cc
cql_server_test.cc
endpoint_snitch.cc
error_injection.cc
authorization_cache.cc
@@ -52,12 +56,15 @@ target_sources(api
hinted_handoff.cc
lsa.cc
messaging_service.cc
raft.cc
storage_proxy.cc
storage_service.cc
stream_manager.cc
system.cc
tasks.cc
task_manager.cc
task_manager_test.cc
token_metadata.cc
${swagger_gen_files})
target_include_directories(api
PUBLIC
@@ -66,6 +73,9 @@ target_include_directories(api
target_link_libraries(api
idl
wasmtime_bindings
Seastar::seastar
xxHash::xxhash)
xxHash::xxhash
absl::headers)
check_headers(check-headers api
GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

@@ -67,7 +67,7 @@
"parameters":[
{
"name":"pluginid",
"description":"The plugin ID, describe the component the metric belongs to. Examples are cache, thrift, etc'. Regex are supported.The plugin ID, describe the component the metric belong to. Examples are: cache, thrift etc'. regex are supported",
"description":"The plugin ID, describe the component the metric belongs to. Examples are cache and alternator, etc'. Regex are supported.",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -199,4 +199,4 @@
}
}
}
}
}

@@ -92,6 +92,14 @@
"type":"boolean",
"paramType":"query"
},
{
"name":"consider_only_existing_data",
"description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"split_output",
"description":"true if the output of the major compaction should be split in several sstables",
@@ -211,7 +219,7 @@
"operations":[
{
"method":"POST",
"summary":"Sets the minumum and maximum number of sstables in queue before compaction kicks off",
"summary":"Sets the minimum and maximum number of sstables in queue before compaction kicks off",
"type":"string",
"nickname":"set_compaction_threshold",
"produces":[

@@ -144,6 +144,21 @@
"parameters": []
}
]
},
{
"path": "/commitlog/metrics/max_disk_size",
"operations": [
{
"method": "GET",
"summary": "Get max disk size",
"type": "long",
"nickname": "get_max_disk_size",
"produces": [
"application/json"
],
"parameters": []
}
]
}
]
}

View File

@@ -0,0 +1,26 @@
{
"apiVersion":"0.0.1",
"swaggerVersion":"1.2",
"basePath":"{{Protocol}}://{{Host}}",
"resourcePath":"/cql_server_test",
"produces":[
"application/json"
],
"apis":[
{
"path":"/cql_server_test/connections_params",
"operations":[
{
"method":"GET",
"summary":"Get service level params of each CQL connection",
"type":"connections_service_level_params",
"nickname":"connections_params",
"produces":[
"application/json"
],
"parameters":[]
}
]
}
]
}

View File

@@ -63,6 +63,28 @@
"paramType":"path"
}
]
},
{
"method":"GET",
"summary":"Read the state of an injection from all shards",
"type":"array",
"items":{
"type":"error_injection_info"
},
"nickname":"read_injection",
"produces":[
"application/json"
],
"parameters":[
{
"name":"injection",
"description":"injection name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
@@ -90,6 +112,30 @@
}
]
},
{
"path":"/v2/error_injection/disconnect/{ip}",
"operations":[
{
"method":"POST",
"summary":"Drop connection to a given IP",
"type":"void",
"nickname":"inject_disconnect",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ip",
"description":"IP address to disconnect from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/v2/error_injection/injection",
"operations":[
@@ -128,5 +174,39 @@
}
}
}
},
"models":{
"mapper":{
"id":"mapper",
"description":"A key value mapping",
"properties":{
"key":{
"type":"string",
"description":"The key"
},
"value":{
"type":"string",
"description":"The value"
}
}
},
"error_injection_info":{
"id":"error_injection_info",
"description":"Information about an error injection",
"properties":{
"enabled":{
"type":"boolean",
"description":"Is the error injection enabled"
},
"parameters":{
"type":"array",
"items":{
"type":"mapper"
},
"description":"The parameter values"
}
},
"required":["enabled"]
}
}
}
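
The `read_injection` operation added in this file returns an array of `error_injection_info` objects, one per shard, each with a required `enabled` flag and an optional list of `mapper` key/value pairs. A minimal sketch of how a client might flatten such a response follows. Only the field names come from the model definitions in the diff; the helper name `summarize_injection` and any sample payload are hypothetical.

```python
from typing import Any, Dict, List


def summarize_injection(shards: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten a per-shard list of error_injection_info objects.

    Hypothetical helper: only the field names (enabled, parameters,
    key, value) are taken from the models shown in the diff.
    """
    per_shard = []
    for info in shards:
        # "parameters" is optional in the model; only "enabled" is required.
        params = {p["key"]: p["value"] for p in info.get("parameters", [])}
        per_shard.append({"enabled": info["enabled"], "parameters": params})
    return {
        # An empty response is reported as "not enabled everywhere".
        "enabled_everywhere": bool(per_shard) and all(s["enabled"] for s in per_shard),
        "per_shard": per_shard,
    }
```

Treating an empty response as "not enabled" is a design choice of this sketch, not something the API doc specifies.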

View File

@@ -12,7 +12,7 @@
"operations":[
{
"method":"GET",
"summary":"Get the addreses of the down endpoints",
"summary":"Get the addresses of the down endpoints",
"type":"array",
"items":{
"type":"string"
@@ -31,7 +31,7 @@
"operations":[
{
"method":"GET",
"summary":"Get the addreses of live endpoints",
"summary":"Get the addresses of live endpoints",
"type":"array",
"items":{
"type":"string"

View File

@@ -7,11 +7,11 @@
"items": {
"type": "string"
},
"description": "The source labels, a match is based on concatination of the labels"
"description": "The source labels, a match is based on concatenation of the labels"
},
"action": {
"type": "string",
"description": "The action to perfrom on match",
"description": "The action to perform on match",
"enum": ["skip_when_empty", "report_when_empty", "replace", "keep", "drop", "drop_label"]
},
"target_label": {
@@ -28,7 +28,7 @@
},
"separator": {
"type": "string",
"description": "The separator string to use when concatinating the labels"
"description": "The separator string to use when concatenating the labels"
}
}
}

View File

@@ -38,6 +38,62 @@
]
}
]
},
{
"path":"/raft/leader_host",
"operations":[
{
"method":"GET",
"summary":"Returns host ID of the current leader of the given Raft group",
"type":"string",
"nickname":"get_leader_host",
"produces":[
"application/json"
],
"parameters":[
{
"name":"group_id",
"description":"The ID of the group. When absent, group0 is used.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path": "/raft/read_barrier",
"operations": [
{
"method": "POST",
"summary": "Triggers read barrier for the given Raft group to wait for previously committed commands in this group to be applied locally. For example, can be used on group 0 to wait for the node to obtain latest schema changes.",
"type": "string",
"nickname": "read_barrier",
"produces": [
"application/json"
],
"parameters": [
{
"name": "group_id",
"description": "The ID of the group. When absent, group0 is used.",
"required": false,
"allowMultiple": false,
"type": "string",
"paramType": "query"
},
{
"name": "timeout",
"description": "Timeout in seconds after which the endpoint returns a failure. If not provided, 60s is used.",
"required": false,
"allowMultiple": false,
"type": "long",
"paramType": "query"
}
]
}
]
}
]
}
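
The `/raft/read_barrier` operation above takes two optional query parameters, `group_id` and `timeout`. A sketch of how a client could build the request URL follows. The base address `http://localhost:10000` is an assumption about where the REST API listens, and the function name `read_barrier_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the node's REST API is reachable here; adjust as needed.
BASE_URL = "http://localhost:10000"


def read_barrier_url(group_id: Optional[str] = None,
                     timeout_s: Optional[int] = None,
                     base_url: str = BASE_URL) -> str:
    """Build the URL for a POST to /raft/read_barrier.

    Both parameters are optional per the API doc: without group_id the
    server uses group0, and without timeout it uses 60 seconds.
    """
    query = {}
    if group_id is not None:
        query["group_id"] = group_id
    if timeout_s is not None:
        if timeout_s < 0:
            raise ValueError("timeout must be non-negative")
        query["timeout"] = str(timeout_s)
    url = f"{base_url}/raft/read_barrier"
    return f"{url}?{urlencode(query)}" if query else url
```

The resulting URL would then be sent with the POST method, for example through `urllib.request.Request(url, method="POST")`.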

View File

@@ -90,7 +90,7 @@
"operations":[
{
"method":"GET",
"summary":"Returns a list of the tokens endpoint mapping",
"summary":"Returns a list of the tokens endpoint mapping, provide keyspace and cf param to get tablet mapping",
"type":"array",
"items":{
"type":"mapper"
@@ -100,6 +100,22 @@
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace to provide the tablet mapping for",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"cf",
"description":"The table to provide the tablet mapping for",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
@@ -336,6 +352,14 @@
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"cf",
"description":"Column family name",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
@@ -368,25 +392,6 @@
}
]
},
{
"path":"/storage_service/describe_ring/",
"operations":[
{
"method":"GET",
"summary":"The TokenRange for a any keyspace",
"type":"array",
"items":{
"type":"token_range"
},
"nickname":"describe_any_ring",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/describe_ring/{keyspace}",
"operations":[
@@ -409,6 +414,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"table",
"description":"The name of table to fetch information about",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
@@ -436,6 +449,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Column family name",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
@@ -720,11 +741,123 @@
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"consider_only_existing_data",
"description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/backup",
"operations":[
{
"method":"POST",
"summary":"Starts copying SSTables from a specified keyspace to a designated bucket in object storage",
"type":"string",
"nickname":"start_backup",
"produces":[
"application/json"
],
"parameters":[
{
"name":"endpoint",
"description":"ID of the configured object storage endpoint to copy sstables to",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"bucket",
"description":"Name of the bucket to backup sstables to",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"keyspace",
"description":"Name of a keyspace to copy sstables from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"snapshot",
"description":"Name of a snapshot to copy sstables from",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/restore",
"operations":[
{
"method":"POST",
"summary":"Starts copying SSTables from a designated bucket in object storage to a specified keyspace",
"type":"string",
"nickname":"start_restore",
"produces":[
"application/json"
],
"parameters":[
{
"name":"endpoint",
"description":"ID of the configured object storage endpoint to copy SSTables from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"bucket",
"description":"Name of the bucket to read SSTables from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"snapshot",
"description":"Name of a snapshot to copy SSTables from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"keyspace",
"description":"Name of a keyspace to copy SSTables to",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Name of a table to copy SSTables to",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/keyspace_compaction/{keyspace}",
"operations":[
@@ -739,7 +872,7 @@
"parameters":[
{
"name":"keyspace",
"description":"The keyspace to query about",
"description":"The keyspace to compact",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -760,6 +893,14 @@
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"consider_only_existing_data",
"description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
@@ -779,7 +920,7 @@
"parameters":[
{
"name":"keyspace",
"description":"The keyspace to query about",
"description":"The keyspace to cleanup",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -797,6 +938,21 @@
}
]
},
{
"path":"/storage_service/cleanup_all",
"operations":[
{
"method":"POST",
"summary":"Trigger a global cleanup",
"type":"long",
"nickname":"cleanup_all",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/storage_service/keyspace_offstrategy_compaction/{keyspace}",
"operations":[
@@ -1169,6 +1325,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"small_table_optimization",
"description":"If the value is the string 'true' with any capitalization, perform small table optimization. When this option is enabled, user can send the repair request to any of the nodes in the cluster. There is no need to send repair requests to multiple nodes. All token ranges for the table will be repaired automatically.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
},
@@ -1502,6 +1666,15 @@
"type":"string",
"enum": [ "all", "user", "non_local_strategy" ],
"paramType":"query"
},
{
"name":"replication",
"description":"Filter keyspaces for the replication used: vnodes or tablets (default: all)",
"required":false,
"allowMultiple":false,
"type":"string",
"enum": [ "all", "vnodes", "tablets" ],
"paramType":"query"
}
]
}
@@ -1636,33 +1809,11 @@
{
"path":"/storage_service/rpc_server",
"operations":[
{
"method":"DELETE",
"summary":"Allows a user to disable thrift",
"type":"void",
"nickname":"stop_rpc_server",
"produces":[
"application/json"
],
"parameters":[
]
},
{
"method":"POST",
"summary":"allows a user to reenable thrift",
"type":"void",
"nickname":"start_rpc_server",
"produces":[
"application/json"
],
"parameters":[
]
},
{
"method":"GET",
"summary":"Determine if thrift is running",
"type":"boolean",
"nickname":"is_rpc_server_running",
"nickname":"is_thrift_server_running",
"produces":[
"application/json"
],
@@ -1860,6 +2011,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"force",
"description":"Enforce the source_dc option, even if it unsafe to use for rebuild",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
@@ -2017,7 +2176,7 @@
"operations":[
{
"method":"POST",
"summary":"Enables/Disables tracing for the whole system. Only thrift requests can start tracing currently",
"summary":"Enables/Disables tracing for the whole system.",
"type":"void",
"nickname":"set_trace_probability",
"produces":[
@@ -2457,6 +2616,254 @@
}
]
},
{
"path":"/storage_service/tablets/move",
"operations":[
{
"nickname":"move_tablet",
"method":"POST",
"summary":"Moves a tablet replica",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ks",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"token",
"description":"Token owned by the tablet to move",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"src_host",
"description":"Source host id",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"dst_host",
"description":"Destination host id",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"src_shard",
"description":"Source shard number",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"dst_shard",
"description":"Destination shard number",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"force",
"description":"When set to true, replication strategy constraints can be broken (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/tablets/add_replica",
"operations":[
{
"nickname":"add_tablet_replica",
"method":"POST",
"summary":"Adds replica to tablet",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ks",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"token",
"description":"Token owned by the tablet to add replica to",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"dst_host",
"description":"Destination host id",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"dst_shard",
"description":"Destination shard number",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"force",
"description":"When set to true, replication strategy constraints can be broken (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/tablets/del_replica",
"operations":[
{
"nickname":"del_tablet_replica",
"method":"POST",
"summary":"Deletes replica from tablet",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ks",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"token",
"description":"Token owned by the tablet to delete replica from",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"host",
"description":"Host id to remove replica from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"shard",
"description":"Shard number to remove replica from",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"force",
"description":"When set to true, replication strategy constraints can be broken (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/tablets/balancing",
"operations":[
{
"nickname":"tablet_balancing_enable",
"method":"POST",
"summary":"Controls tablet load-balancing",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"enabled",
"description":"When set to false, tablet load balancing is disabled",
"required":true,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/quiesce_topology",
"operations":[
{
"nickname":"quiesce_topology",
"method":"POST",
"summary":"Waits until there are no ongoing topology operations. Guarantees that topology operations which started before the call are finished after the call. This doesn't consider requested but not started operations. Such operations may start after the call succeeds.",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/metrics/total_hints",
"operations":[
@@ -2558,6 +2965,33 @@
]
}
]
},
{
"path":"/storage_service/raft_topology/upgrade",
"operations":[
{
"method":"POST",
"summary":"Trigger the upgrade to topology on raft.",
"type":"void",
"nickname":"upgrade_to_raft_topology",
"produces":[
"application/json"
],
"parameters":[
]
},
{
"method":"GET",
"summary":"Get information about the current upgrade status of topology on raft.",
"type":"string",
"nickname":"raft_topology_upgrade_status",
"produces":[
"application/json"
],
"parameters":[
]
}
]
}
],
"models":{
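
Several operations added in this file (`/storage_service/backup`, `/storage_service/restore`, `/storage_service/tablets/move`) take a mix of required and optional query parameters. The sketch below transcribes those parameter lists from the hunks above and checks a request against them before building the URL. The base address, the helper name `build_post_url`, and the lowercase rendering of booleans are assumptions of this example, not something the diff states.

```python
from typing import Any, Dict
from urllib.parse import urlencode

# (required, optional) query parameters, transcribed from the API doc hunks.
ENDPOINTS = {
    "/storage_service/backup": (
        {"endpoint", "bucket", "keyspace"}, {"snapshot"}),
    "/storage_service/restore": (
        {"endpoint", "bucket", "snapshot", "keyspace"}, {"table"}),
    "/storage_service/tablets/move": (
        {"ks", "table", "token", "src_host", "dst_host",
         "src_shard", "dst_shard"}, {"force"}),
}


def build_post_url(path: str, params: Dict[str, Any],
                   base_url: str = "http://localhost:10000") -> str:
    """Validate params against the documented lists and build the URL."""
    required, optional = ENDPOINTS[path]
    given = set(params)
    missing = required - given
    unknown = given - required - optional
    if missing:
        raise ValueError(f"missing required parameters: {sorted(missing)}")
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Assumption: booleans are sent as lowercase "true"/"false".
    encoded = {
        key: str(value).lower() if isinstance(value, bool) else str(value)
        for key, value in sorted(params.items())
    }
    return f"{base_url}{path}?{urlencode(encoded)}"
```

Checking the parameter set on the client side gives an early, readable error instead of relying on whatever status code the server returns for a malformed request.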

View File

@@ -179,6 +179,36 @@
]
}
]
},
{
"path":"/system/dump_llvm_profile",
"operations":[
{
"method":"POST",
"summary":"Dump llvm profile data (raw profile data) that can later be used for coverage reporting or PGO (no-op if the current binary is not instrumented)",
"type":"void",
"nickname":"dump_profile",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/system/highest_supported_sstable_version",
"operations":[
{
"method":"GET",
"summary":"Get highest supported sstable version",
"type":"string",
"nickname":"get_highest_supported_sstable_version",
"produces":[
"application/json"
],
"parameters":[]
}
]
}
]
}

View File

@@ -115,7 +115,7 @@
"parameters":[
{
"name":"task_id",
"description":"The uuid of a task to abort",
"description":"The uuid of a task to abort; if the task is not abortable, 403 status code is returned",
"required":true,
"allowMultiple":false,
"type":"string",
@@ -144,6 +144,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"timeout",
"description":"Timeout for waiting; if times out, 408 status code is returned",
"required":false,
"allowMultiple":false,
"type":"long",
"paramType":"query"
}
]
}
@@ -197,11 +205,60 @@
"paramType":"query"
}
]
},
{
"method":"GET",
"summary":"Get current ttl value",
"type":"long",
"nickname":"get_ttl",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/task_manager/drain/{module}",
"operations":[
{
"method":"POST",
"summary":"Drain finished local tasks",
"type":"void",
"nickname":"drain_tasks",
"produces":[
"application/json"
],
"parameters":[
{
"name":"module",
"description":"The module to drain",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
}
],
"models":{
"task_identity":{
"id": "task_identity",
"description":"Id and node of a task",
"properties":{
"task_id":{
"type":"string",
"description":"The uuid of a task"
},
"node":{
"type":"string",
"description":"Address of a server on which a task is created"
}
}
},
"task_stats" :{
"id": "task_stats",
"description":"A task statistics object",
@@ -224,6 +281,14 @@
"type":"string",
"description":"The description of the task"
},
"kind":{
"type":"string",
"enum":[
"node",
"cluster"
],
"description":"The kind of a task"
},
"scope":{
"type":"string",
"description":"The scope of the task"
@@ -258,6 +323,14 @@
"type":"string",
"description":"The description of the task"
},
"kind":{
"type":"string",
"enum":[
"node",
"cluster"
],
"description":"The kind of a task"
},
"scope":{
"type":"string",
"description":"The scope of the task"
@@ -327,9 +400,9 @@
"children_ids":{
"type":"array",
"items":{
"type":"string"
"type":"task_identity"
},
"description":"Task IDs of children of this task"
"description":"Task identities of children of this task"
}
}
}
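
The last hunk above changes the item type of `children_ids` from a plain string to `task_identity`, an object with `task_id` and `node`. A client that has to read both shapes could normalize them as sketched below. The names `TaskIdentity` and `child_identities` are hypothetical, and tolerating the older string form is an assumption of this example rather than behaviour described in the diff.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class TaskIdentity(NamedTuple):
    task_id: str
    node: Optional[str]


def child_identities(status: Dict[str, Any]) -> List[TaskIdentity]:
    """Return the children of a task status as TaskIdentity tuples."""
    children = []
    for child in status.get("children_ids", []):
        if isinstance(child, str):
            # Older shape: a bare task UUID with no node information.
            children.append(TaskIdentity(child, None))
        else:
            # Newer shape: a task_identity object.
            children.append(TaskIdentity(child["task_id"], child.get("node")))
    return children
```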

230
api/api-doc/tasks.json Normal file
View File

@@ -0,0 +1,230 @@
{
"apiVersion":"0.0.1",
"swaggerVersion":"1.2",
"basePath":"{{Protocol}}://{{Host}}",
"resourcePath":"/tasks",
"produces":[
"application/json"
],
"apis":[
{
"path":"/tasks/compaction/keyspace_compaction/{keyspace}",
"operations":[
{
"method":"POST",
"summary":"Forces major compaction of a single keyspace asynchronously, returns uuid which can be used to check progress with task manager",
"type":"string",
"nickname":"force_keyspace_compaction_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace to query about",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Comma-separated table (column family) names",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"flush_memtables",
"description":"Controls flushing of memtables before compaction (true by default). Set to \"false\" to skip automatic flushing of memtables before compaction, e.g. when tables were flushed explicitly before invoking the compaction api.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/tasks/compaction/keyspace_cleanup/{keyspace}",
"operations":[
{
"method":"POST",
"summary":"Trigger a cleanup of keys on a single keyspace asynchronously, returns uuid which can be used to check progress with task manager",
"type": "string",
"nickname":"force_keyspace_cleanup_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace to query about",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Comma-separated table (column family) names",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/tasks/compaction/keyspace_offstrategy_compaction/{keyspace}",
"operations":[
{
"method":"POST",
"summary":"Perform offstrategy compaction, if needed, in a single keyspace asynchronously, returns uuid which can be used to check progress with task manager",
"type":"string",
"nickname":"perform_keyspace_offstrategy_compaction_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace to operate on",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Comma-separated table (column family) names",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/tasks/compaction/keyspace_scrub/{keyspace}",
"operations":[
{
"method":"GET",
"summary":"Scrub (deserialize + reserialize at the latest version, resolving corruptions if any) the given keyspace asynchronously, returns uuid which can be used to check progress with task manager. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false. Scrub has the following modes: Abort (default) - abort scrub if corruption is detected; Skip (same as `skip_corrupted=true`) skip over corrupt data, omitting them from the output; Segregate - segregate data into multiple sstables if needed, such that each sstable contains data with valid order; Validate - read (no rewrite) and validate data, logging any problems found.",
"type": "string",
"nickname":"scrub_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"disable_snapshot",
"description":"When set to true, disable snapshot",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"skip_corrupted",
"description":"When set to true, skip corrupted",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"scrub_mode",
"description":"How to handle corrupt data (overrides 'skip_corrupted'); ",
"required":false,
"allowMultiple":false,
"type":"string",
"enum":[
"ABORT",
"SKIP",
"SEGREGATE",
"VALIDATE"
],
"paramType":"query"
},
{
"name":"quarantine_mode",
"description":"Controls whether to scrub quarantined sstables (default INCLUDE)",
"required":false,
"allowMultiple":false,
"type":"string",
"enum":[
"INCLUDE",
"EXCLUDE",
"ONLY"
],
"paramType":"query"
},
{
"name":"keyspace",
"description":"The keyspace to query about",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Comma-separated table (column family) names",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/tasks/compaction/keyspace_upgrade_sstables/{keyspace}",
"operations":[
{
"method":"GET",
"summary":"Rewrite all sstables to the latest version. Unlike scrub, it doesn't skip bad rows and do not snapshot sstables first asynchronously, returns uuid which can be used to check progress with task manager.",
"type": "string",
"nickname":"upgrade_sstables_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"exclude_current_version",
"description":"When set to true exclude current version",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"cf",
"description":"Comma-separated table (column family) names",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
}
]
}
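
The new `tasks.json` file describes asynchronous variants of the compaction operations, each returning a task UUID. A sketch of building the URL for `/tasks/compaction/keyspace_compaction/{keyspace}` with its optional `cf` and `flush_memtables` parameters follows. The base address and the function name `keyspace_compaction_async_url` are assumptions of this example.

```python
from typing import Iterable, Optional
from urllib.parse import quote, urlencode


def keyspace_compaction_async_url(
        keyspace: str,
        tables: Optional[Iterable[str]] = None,
        flush_memtables: Optional[bool] = None,
        base_url: str = "http://localhost:10000") -> str:
    """Build the URL for the asynchronous major compaction request."""
    # The keyspace is a path parameter, so escape it fully.
    path = f"/tasks/compaction/keyspace_compaction/{quote(keyspace, safe='')}"
    query = {}
    table_list = list(tables) if tables is not None else []
    if table_list:
        # "cf" is documented as comma-separated table names.
        query["cf"] = ",".join(table_list)
    if flush_memtables is not None:
        query["flush_memtables"] = "true" if flush_memtables else "false"
    url = base_url + path
    return f"{url}?{urlencode(query)}" if query else url
```

Note that `urlencode` percent-encodes the comma in `cf` as `%2C`. According to the file above, `keyspace_compaction`, `keyspace_cleanup` and `keyspace_offstrategy_compaction` use POST, while `keyspace_scrub` and `keyspace_upgrade_sstables` use GET, so the HTTP method has to be chosen per operation.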

View File

@@ -75,7 +75,7 @@
"items":{
"type":"double"
},
"description":"One, five and fifteen mintues rates"
"description":"One, five and fifteen minutes rates"
},
"mean_rate": {
"type":"double",

View File

@@ -10,7 +10,9 @@
#include <seastar/http/file_handler.hh>
#include <seastar/http/transformers.hh>
#include <seastar/http/api_docs.hh>
#include "cql_server_test.hh"
#include "storage_service.hh"
#include "token_metadata.hh"
#include "commitlog.hh"
#include "gossiper.hh"
#include "failure_detector.hh"
@@ -31,6 +33,7 @@
#include "api/config.hh"
#include "task_manager.hh"
#include "task_manager_test.hh"
#include "tasks.hh"
#include "raft.hh"
logging::logger apilog("api");
@@ -66,6 +69,13 @@ future<> set_server_init(http_context& ctx) {
"The system related API");
rb02->add_definitions_file(r, "metrics");
set_system(ctx, r);
rb->register_function(r, "error_injection",
"The error injection API");
set_error_injection(ctx, r);
rb->register_function(r, "storage_proxy",
"The storage proxy API");
rb->register_function(r, "storage_service",
"The storage service API");
});
}
@@ -76,6 +86,10 @@ future<> set_server_config(http_context& ctx, const db::config& cfg) {
});
}
future<> unset_server_config(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_config(ctx, r); });
}
static future<> register_api(http_context& ctx, const sstring& api_name,
const sstring api_desc,
std::function<void(http_context& ctx, routes& r)> f) {
@@ -95,16 +109,16 @@ future<> unset_transport_controller(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_transport_controller(ctx, r); });
}
future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl) {
return ctx.http_server.set_routes([&ctx, &ctl] (routes& r) { set_rpc_controller(ctx, r, ctl); });
future<> set_thrift_controller(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { set_thrift_controller(ctx, r); });
}
future<> unset_rpc_controller(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_rpc_controller(ctx, r); });
future<> unset_thrift_controller(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_thrift_controller(ctx, r); });
}
future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, service::raft_group0_client& group0_client) {
return register_api(ctx, "storage_service", "The storage service API", [&ss, &group0_client] (http_context& ctx, routes& r) {
return ctx.http_server.set_routes([&ctx, &ss, &group0_client] (routes& r) {
set_storage_service(ctx, r, ss, group0_client);
});
}
@@ -113,6 +127,14 @@ future<> unset_server_storage_service(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_storage_service(ctx, r); });
}
future<> set_load_meter(http_context& ctx, service::load_meter& lm) {
return ctx.http_server.set_routes([&ctx, &lm] (routes& r) { set_load_meter(ctx, r, lm); });
}
future<> unset_load_meter(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_load_meter(ctx, r); });
}
future<> set_server_sstables_loader(http_context& ctx, sharded<sstables_loader>& sst_loader) {
return ctx.http_server.set_routes([&ctx, &sst_loader] (routes& r) { set_sstables_loader(ctx, r, sst_loader); });
}
@@ -156,6 +178,14 @@ future<> unset_server_snapshot(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_snapshot(ctx, r); });
}
future<> set_server_token_metadata(http_context& ctx, sharded<locator::shared_token_metadata>& tm) {
return ctx.http_server.set_routes([&ctx, &tm] (routes& r) { set_token_metadata(ctx, r, tm); });
}
future<> unset_server_token_metadata(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_token_metadata(ctx, r); });
}
future<> set_server_snitch(http_context& ctx, sharded<locator::snitch_ptr>& snitch) {
return register_api(ctx, "endpoint_snitch_info", "The endpoint snitch info API", [&snitch] (http_context& ctx, routes& r) {
set_endpoint_snitch(ctx, r, snitch);
@@ -167,20 +197,31 @@ future<> unset_server_snitch(http_context& ctx) {
}
future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g) {
return register_api(ctx, "gossiper",
co_await register_api(ctx, "gossiper",
"The gossiper API", [&g] (http_context& ctx, routes& r) {
set_gossiper(ctx, r, g.local());
});
co_await register_api(ctx, "failure_detector",
"The failure detector API", [&g] (http_context& ctx, routes& r) {
set_failure_detector(ctx, r, g.local());
});
}
future<> set_server_load_sstable(http_context& ctx, sharded<db::system_keyspace>& sys_ks) {
future<> unset_server_gossip(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) {
unset_gossiper(ctx, r);
unset_failure_detector(ctx, r);
});
}
future<> set_server_column_family(http_context& ctx, sharded<db::system_keyspace>& sys_ks) {
return register_api(ctx, "column_family",
"The column family API", [&sys_ks] (http_context& ctx, routes& r) {
set_column_family(ctx, r, sys_ks);
});
}
future<> unset_server_load_sstable(http_context& ctx) {
future<> unset_server_column_family(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_column_family(ctx, r); });
}
@@ -195,10 +236,7 @@ future<> unset_server_messaging_service(http_context& ctx) {
}
future<> set_server_storage_proxy(http_context& ctx, sharded<service::storage_proxy>& proxy) {
return register_api(ctx, "storage_proxy",
"The storage proxy API", [&proxy] (http_context& ctx, routes& r) {
set_storage_proxy(ctx, r, proxy);
});
return ctx.http_server.set_routes([&ctx, &proxy] (routes& r) { set_storage_proxy(ctx, r, proxy); });
}
future<> unset_server_storage_proxy(http_context& ctx) {
@@ -221,6 +259,10 @@ future<> set_server_cache(http_context& ctx) {
"The cache service API", set_cache_service);
}
future<> unset_server_cache(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_cache_service(ctx, r); });
}
future<> set_hinted_handoff(http_context& ctx, sharded<service::storage_proxy>& proxy) {
return register_api(ctx, "hinted_handoff",
"The hinted handoff API", [&proxy] (http_context& ctx, routes& r) {
@@ -232,16 +274,6 @@ future<> unset_hinted_handoff(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_hinted_handoff(ctx, r); });
}
future<> set_server_gossip_settle(http_context& ctx, sharded<gms::gossiper>& g) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, &g](routes& r) {
rb->register_function(r, "failure_detector",
"The failure detector API");
set_failure_detector(ctx, r, g.local());
});
}
future<> set_server_compaction_manager(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
@@ -265,36 +297,65 @@ future<> set_server_done(http_context& ctx) {
rb->register_function(r, "collectd",
"The collectd API");
set_collectd(ctx, r);
rb->register_function(r, "error_injection",
"The error injection API");
set_error_injection(ctx, r);
});
}
future<> set_server_task_manager(http_context& ctx, lw_shared_ptr<db::config> cfg) {
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, &cfg = *cfg](routes& r) {
return ctx.http_server.set_routes([rb, &ctx, &tm, &cfg = *cfg](routes& r) {
rb->register_function(r, "task_manager",
"The task manager API");
set_task_manager(ctx, r, cfg);
set_task_manager(ctx, r, tm, cfg);
});
}
future<> unset_server_task_manager(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_task_manager(ctx, r); });
}
#ifndef SCYLLA_BUILD_MODE_RELEASE
future<> set_server_task_manager_test(http_context& ctx) {
future<> set_server_task_manager_test(http_context& ctx, sharded<tasks::task_manager>& tm) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) mutable {
return ctx.http_server.set_routes([rb, &ctx, &tm](routes& r) mutable {
rb->register_function(r, "task_manager_test",
"The task manager test API");
set_task_manager_test(ctx, r);
set_task_manager_test(ctx, r, tm);
});
}
future<> unset_server_task_manager_test(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_task_manager_test(ctx, r); });
}
future<> set_server_cql_server_test(http_context& ctx, cql_transport::controller& ctl) {
return register_api(ctx, "cql_server_test", "The CQL server test API", [&ctl] (http_context& ctx, routes& r) {
set_cql_server_test(ctx, r, ctl);
});
}
future<> unset_server_cql_server_test(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_cql_server_test(ctx, r); });
}
#endif
future<> set_server_tasks_compaction_module(http_context& ctx, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, &ss, &snap_ctl](routes& r) {
rb->register_function(r, "tasks",
"The tasks API");
set_tasks_compaction_module(ctx, r, ss, snap_ctl);
});
}
future<> unset_server_tasks_compaction_module(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_tasks_compaction_module(ctx, r); });
}
future<> set_server_raft(http_context& ctx, sharded<service::raft_group_registry>& raft_gr) {
auto rb = std::make_shared<api_registry_builder>(ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, &raft_gr] (routes& r) {

@@ -14,11 +14,11 @@
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include <boost/units/detail/utility.hpp>
#include "api/api_init.hh"
#include "api/api-doc/utils.json.hh"
#include "utils/histogram.hh"
#include "utils/estimated_histogram.hh"
#include <seastar/http/exception.hh>
#include "api_init.hh"
#include "seastarx.hh"
namespace api {
@@ -26,7 +26,9 @@ namespace api {
template<class T>
std::vector<sstring> container_to_vec(const T& container) {
std::vector<sstring> res;
for (auto i : container) {
res.reserve(std::size(container));
for (const auto& i : container) {
res.push_back(fmt::to_string(i));
}
return res;
@@ -35,27 +37,31 @@ std::vector<sstring> container_to_vec(const T& container) {
template<class T>
std::vector<T> map_to_key_value(const std::map<sstring, sstring>& map) {
std::vector<T> res;
for (auto i : map) {
res.reserve(map.size());
for (const auto& [key, value] : map) {
res.push_back(T());
res.back().key = i.first;
res.back().value = i.second;
res.back().key = key;
res.back().value = value;
}
return res;
}
template<class T, class MAP>
std::vector<T>& map_to_key_value(const MAP& map, std::vector<T>& res) {
for (auto i : map) {
res.reserve(res.size() + std::size(map));
for (const auto& [key, value] : map) {
T val;
val.key = fmt::to_string(i.first);
val.value = fmt::to_string(i.second);
val.key = fmt::to_string(key);
val.value = fmt::to_string(value);
res.push_back(val);
}
return res;
}
template <typename T, typename S = T>
T map_sum(T&& dest, const S& src) {
for (auto i : src) {
for (const auto& i : src) {
dest[i.first] += i.second;
}
return std::move(dest);
@@ -64,6 +70,8 @@ T map_sum(T&& dest, const S& src) {
template <typename MAP>
std::vector<sstring> map_keys(const MAP& map) {
std::vector<sstring> res;
res.reserve(std::size(map));
for (const auto& i : map) {
res.push_back(fmt::to_string(i.first));
}
@@ -238,7 +246,7 @@ public:
value = T{boost::lexical_cast<Base>(param)};
}
} catch (boost::bad_lexical_cast&) {
throw httpd::bad_param_exception(format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));
throw httpd::bad_param_exception(fmt::format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));
}
}
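
Note on the hunks above: the helper changes swap by-value iteration (`for (auto i : map)`) for `const auto&` with structured bindings, and reserve the result vector before filling it. A minimal standalone sketch of that pattern follows. It assumes `std::string` in place of `sstring`, skips the `fmt::to_string` conversion, and uses a hypothetical `key_value` struct as a stand-in for the generated JSON types; it is an illustration, not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the generated JSON key/value types.
struct key_value {
    std::string key;
    std::string value;
};

// Appends one T per map entry to res and returns res.
// Reserving first avoids repeated reallocation while pushing back;
// binding by const reference avoids copying each map entry.
template <class T, class Map>
std::vector<T>& map_to_key_value(const Map& map, std::vector<T>& res) {
    res.reserve(res.size() + std::size(map));
    for (const auto& [key, value] : map) {
        T val;
        val.key = key;
        val.value = value;
        res.push_back(std::move(val));
    }
    return res;
}
```

Entries already in `res` are kept, which is why the reservation adds to `res.size()` rather than replacing it.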

@@ -33,6 +33,10 @@ namespace streaming {
class stream_manager;
}
namespace gms {
class inet_address;
}
namespace locator {
class token_metadata;
@@ -42,7 +46,6 @@ class snitch_ptr;
} // namespace locator
namespace cql_transport { class controller; }
class thrift_controller;
namespace db {
class snapshot_ctl;
class config;
@@ -62,6 +65,10 @@ class gossiper;
namespace auth { class service; }
namespace tasks {
class task_manager;
}
namespace api {
struct http_context {
@@ -69,20 +76,16 @@ struct http_context {
sstring api_doc;
httpd::http_server_control http_server;
distributed<replica::database>& db;
service::load_meter& lmeter;
const sharded<locator::shared_token_metadata>& shared_token_metadata;
sharded<tasks::task_manager>& tm;
http_context(distributed<replica::database>& _db,
service::load_meter& _lm, const sharded<locator::shared_token_metadata>& _stm, sharded<tasks::task_manager>& _tm)
: db(_db), lmeter(_lm), shared_token_metadata(_stm), tm(_tm) {
http_context(distributed<replica::database>& _db)
: db(_db)
{
}
const locator::token_metadata& get_token_metadata();
};
future<> set_server_init(http_context& ctx);
future<> set_server_config(http_context& ctx, const db::config& cfg);
future<> unset_server_config(http_context& ctx);
future<> set_server_snitch(http_context& ctx, sharded<locator::snitch_ptr>& snitch);
future<> unset_server_snitch(http_context& ctx);
future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, service::raft_group0_client&);
@@ -95,15 +98,18 @@ future<> set_server_repair(http_context& ctx, sharded<repair_service>& repair);
future<> unset_server_repair(http_context& ctx);
future<> set_transport_controller(http_context& ctx, cql_transport::controller& ctl);
future<> unset_transport_controller(http_context& ctx);
future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl);
future<> unset_rpc_controller(http_context& ctx);
future<> set_thrift_controller(http_context& ctx);
future<> unset_thrift_controller(http_context& ctx);
future<> set_server_authorization_cache(http_context& ctx, sharded<auth::service> &auth_service);
future<> unset_server_authorization_cache(http_context& ctx);
future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl);
future<> unset_server_snapshot(http_context& ctx);
future<> set_server_token_metadata(http_context& ctx, sharded<locator::shared_token_metadata>& tm);
future<> unset_server_token_metadata(http_context& ctx);
future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g);
future<> set_server_load_sstable(http_context& ctx, sharded<db::system_keyspace>& sys_ks);
future<> unset_server_load_sstable(http_context& ctx);
future<> unset_server_gossip(http_context& ctx);
future<> set_server_column_family(http_context& ctx, sharded<db::system_keyspace>& sys_ks);
future<> unset_server_column_family(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);
future<> unset_server_messaging_service(http_context& ctx);
future<> set_server_storage_proxy(http_context& ctx, sharded<service::storage_proxy>& proxy);
@@ -112,13 +118,21 @@ future<> set_server_stream_manager(http_context& ctx, sharded<streaming::stream_
future<> unset_server_stream_manager(http_context& ctx);
future<> set_hinted_handoff(http_context& ctx, sharded<service::storage_proxy>& p);
future<> unset_hinted_handoff(http_context& ctx);
future<> set_server_gossip_settle(http_context& ctx, sharded<gms::gossiper>& g);
future<> set_server_cache(http_context& ctx);
future<> unset_server_cache(http_context& ctx);
future<> set_server_compaction_manager(http_context& ctx);
future<> set_server_done(http_context& ctx);
future<> set_server_task_manager(http_context& ctx, lw_shared_ptr<db::config> cfg);
future<> set_server_task_manager_test(http_context& ctx);
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg);
future<> unset_server_task_manager(http_context& ctx);
future<> set_server_task_manager_test(http_context& ctx, sharded<tasks::task_manager>& tm);
future<> unset_server_task_manager_test(http_context& ctx);
future<> set_server_tasks_compaction_module(http_context& ctx, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl);
future<> unset_server_tasks_compaction_module(http_context& ctx);
future<> set_server_raft(http_context&, sharded<service::raft_group_registry>&);
future<> unset_server_raft(http_context&);
future<> set_load_meter(http_context& ctx, service::load_meter& lm);
future<> unset_load_meter(http_context& ctx);
future<> set_server_cql_server_test(http_context& ctx, cql_transport::controller& ctl);
future<> unset_server_cql_server_test(http_context& ctx);
}

@@ -9,8 +9,6 @@
#include "api/api-doc/authorization_cache.json.hh"
#include "api/authorization_cache.hh"
#include "api/api.hh"
#include "auth/common.hh"
#include "auth/service.hh"
namespace api {

@@ -8,11 +8,20 @@
#pragma once
#include "api.hh"
#include <seastar/core/sharded.hh>
namespace seastar::httpd {
class routes;
}
namespace auth {
class service;
}
namespace api {
void set_authorization_cache(http_context& ctx, httpd::routes& r, sharded<auth::service> &auth_service);
void unset_authorization_cache(http_context& ctx, httpd::routes& r);
struct http_context;
void set_authorization_cache(http_context& ctx, seastar::httpd::routes& r, seastar::sharded<auth::service> &auth_service);
void unset_authorization_cache(http_context& ctx, seastar::httpd::routes& r);
}

@@ -7,6 +7,7 @@
*/
#include "cache_service.hh"
#include "api/api.hh"
#include "api/api-doc/cache_service.json.hh"
#include "column_family.hh"
@@ -195,9 +196,9 @@ void set_cache_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
cs::get_row_capacity.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {
return db.row_cache_tracker().region().occupancy().used_space();
cs::get_row_capacity.set(r, [] (std::unique_ptr<http::request> req) {
return seastar::map_reduce(smp::all_cpus(), [] (int cpu) {
return make_ready_future<uint64_t>(memory::stats().total_memory());
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);
});
@@ -240,9 +241,9 @@ void set_cache_service(http_context& ctx, routes& r) {
cs::get_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
// In origin row size is the weighted size.
// We currently do not support weights, so we use num entries instead
// We currently do not support weights, so we use raw size in bytes instead
return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {
return db.row_cache_tracker().partitions();
return db.row_cache_tracker().region().occupancy().used_space();
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);
});
@@ -319,5 +320,50 @@ void set_cache_service(http_context& ctx, routes& r) {
});
}
void unset_cache_service(http_context& ctx, routes& r) {
cs::get_row_cache_save_period_in_seconds.unset(r);
cs::set_row_cache_save_period_in_seconds.unset(r);
cs::get_key_cache_save_period_in_seconds.unset(r);
cs::set_key_cache_save_period_in_seconds.unset(r);
cs::get_counter_cache_save_period_in_seconds.unset(r);
cs::set_counter_cache_save_period_in_seconds.unset(r);
cs::get_row_cache_keys_to_save.unset(r);
cs::set_row_cache_keys_to_save.unset(r);
cs::get_key_cache_keys_to_save.unset(r);
cs::set_key_cache_keys_to_save.unset(r);
cs::get_counter_cache_keys_to_save.unset(r);
cs::set_counter_cache_keys_to_save.unset(r);
cs::invalidate_key_cache.unset(r);
cs::invalidate_counter_cache.unset(r);
cs::set_row_cache_capacity_in_mb.unset(r);
cs::set_key_cache_capacity_in_mb.unset(r);
cs::set_counter_cache_capacity_in_mb.unset(r);
cs::save_caches.unset(r);
cs::get_key_capacity.unset(r);
cs::get_key_hits.unset(r);
cs::get_key_requests.unset(r);
cs::get_key_hit_rate.unset(r);
cs::get_key_hits_moving_avrage.unset(r);
cs::get_key_requests_moving_avrage.unset(r);
cs::get_key_size.unset(r);
cs::get_key_entries.unset(r);
cs::get_row_capacity.unset(r);
cs::get_row_hits.unset(r);
cs::get_row_requests.unset(r);
cs::get_row_hit_rate.unset(r);
cs::get_row_hits_moving_avrage.unset(r);
cs::get_row_requests_moving_avrage.unset(r);
cs::get_row_size.unset(r);
cs::get_row_entries.unset(r);
cs::get_counter_capacity.unset(r);
cs::get_counter_hits.unset(r);
cs::get_counter_requests.unset(r);
cs::get_counter_hit_rate.unset(r);
cs::get_counter_hits_moving_avrage.unset(r);
cs::get_counter_requests_moving_avrage.unset(r);
cs::get_counter_size.unset(r);
cs::get_counter_entries.unset(r);
}
}

@@ -8,10 +8,14 @@
#pragma once
#include "api.hh"
namespace seastar::httpd {
class routes;
}
namespace api {
void set_cache_service(http_context& ctx, httpd::routes& r);
struct http_context;
void set_cache_service(http_context& ctx, seastar::httpd::routes& r);
void unset_cache_service(http_context& ctx, seastar::httpd::routes& r);
}

@@ -10,9 +10,9 @@
#include "api/api-doc/collectd.json.hh"
#include <seastar/core/scollectd.hh>
#include <seastar/core/scollectd_api.hh>
#include "endian.h"
#include <boost/range/irange.hpp>
#include <regex>
#include "api/api_init.hh"
namespace api {

@@ -8,10 +8,13 @@
#pragma once
#include "api.hh"
namespace seastar::httpd {
class routes;
}
namespace api {
void set_collectd(http_context& ctx, httpd::routes& r);
struct http_context;
void set_collectd(http_context& ctx, seastar::httpd::routes& r);
}

@@ -6,12 +6,16 @@
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include <fmt/ranges.h>
#include "column_family.hh"
#include "api/api.hh"
#include "api/api-doc/column_family.json.hh"
#include "api/api-doc/storage_service.json.hh"
#include <vector>
#include <seastar/http/exception.hh>
#include "sstables/sstables.hh"
#include "sstables/metadata_collector.hh"
#include "utils/assert.hh"
#include "utils/estimated_histogram.hh"
#include <algorithm>
#include "db/system_keyspace.hh"
@@ -27,6 +31,7 @@ using namespace httpd;
using namespace json;
namespace cf = httpd::column_family_json;
namespace ss = httpd::storage_service_json;
std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {
auto pos = name.find("%3A");
@@ -78,6 +83,65 @@ future<json::json_return_type> get_cf_stats(http_context& ctx,
}, std::plus<int64_t>());
}
static future<json::json_return_type> set_tables(http_context& ctx, const sstring& keyspace, std::vector<sstring> tables, std::function<future<>(replica::table&)> set) {
if (tables.empty()) {
tables = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return do_with(keyspace, std::move(tables), [&ctx, set] (const sstring& keyspace, const std::vector<sstring>& tables) {
return ctx.db.invoke_on_all([&keyspace, &tables, set] (replica::database& db) {
return parallel_for_each(tables, [&db, &keyspace, set] (const sstring& table) {
replica::table& t = db.find_column_family(keyspace, table);
return set(t);
});
});
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
}
class autocompaction_toggle_guard {
replica::database& _db;
public:
autocompaction_toggle_guard(replica::database& db) : _db(db) {
SCYLLA_ASSERT(this_shard_id() == 0);
if (!_db._enable_autocompaction_toggle) {
throw std::runtime_error("Autocompaction toggle is busy");
}
_db._enable_autocompaction_toggle = false;
}
autocompaction_toggle_guard(const autocompaction_toggle_guard&) = delete;
autocompaction_toggle_guard(autocompaction_toggle_guard&&) = default;
~autocompaction_toggle_guard() {
SCYLLA_ASSERT(this_shard_id() == 0);
_db._enable_autocompaction_toggle = true;
}
};
static future<json::json_return_type> set_tables_autocompaction(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {
apilog.info("set_tables_autocompaction: enabled={} keyspace={} tables={}", enabled, keyspace, tables);
return ctx.db.invoke_on(0, [&ctx, keyspace, tables = std::move(tables), enabled] (replica::database& db) {
auto g = autocompaction_toggle_guard(db);
return set_tables(ctx, keyspace, tables, [enabled] (replica::table& cf) {
if (enabled) {
cf.enable_auto_compaction();
} else {
return cf.disable_auto_compaction();
}
return make_ready_future<>();
}).finally([g = std::move(g)] {});
});
}
static future<json::json_return_type> set_tables_tombstone_gc(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {
apilog.info("set_tables_tombstone_gc: enabled={} keyspace={} tables={}", enabled, keyspace, tables);
return set_tables(ctx, keyspace, std::move(tables), [enabled] (replica::table& t) {
t.set_tombstone_gc_enabled(enabled);
return make_ready_future<>();
});
}
static future<json::json_return_type> get_cf_stats_count(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_summary_and_histogram replica::column_family_stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const replica::column_family& cf) {
@@ -303,10 +367,20 @@ ratio_holder filter_recent_false_positive_as_ratio_holder(const sstables::shared
return ratio_holder(f + sst->filter_get_recent_true_positive(), f);
}
uint64_t accumulate_on_active_memtables(replica::table& t, noncopyable_function<uint64_t(replica::memtable& mt)> action) {
uint64_t ret = 0;
t.for_each_active_memtable([&] (replica::memtable& mt) {
ret += action(mt);
});
return ret;
}
void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace>& sys_ks) {
cf::get_column_family_name.set(r, [&ctx] (const_req req){
std::vector<sstring> res;
ctx.db.local().get_tables_metadata().for_each_table_id([&] (const std::pair<sstring, sstring>& kscf, table_id) {
const replica::database::tables_metadata& meta = ctx.db.local().get_tables_metadata();
res.reserve(meta.size());
meta.for_each_table_id([&] (const std::pair<sstring, sstring>& kscf, table_id) {
res.push_back(kscf.first + ":" + kscf.second);
});
return res;
@@ -326,21 +400,23 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::get_column_family_name_keyspace.set(r, [&ctx] (const_req req){
std::vector<sstring> res;
for (auto i = ctx.db.local().get_keyspaces().cbegin(); i!= ctx.db.local().get_keyspaces().cend(); i++) {
res.push_back(i->first);
const flat_hash_map<sstring, replica::keyspace>& keyspaces = ctx.db.local().get_keyspaces();
res.reserve(keyspaces.size());
for (const auto& i : keyspaces) {
res.push_back(i.first);
}
return res;
});
cf::get_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t{0}, [](replica::column_family& cf) {
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed(std::mem_fn(&replica::memtable::partition_count)), uint64_t(0));
return accumulate_on_active_memtables(cf, std::mem_fn(&replica::memtable::partition_count));
}, std::plus<>());
});
cf::get_all_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t{0}, [](replica::column_family& cf) {
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed(std::mem_fn(&replica::memtable::partition_count)), uint64_t(0));
return accumulate_on_active_memtables(cf, std::mem_fn(&replica::memtable::partition_count));
}, std::plus<>());
});
@@ -354,33 +430,33 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::get_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {
return active_memtable->region().occupancy().total_space();
}), uint64_t(0));
return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {
return active_memtable.region().occupancy().total_space();
});
}, std::plus<int64_t>());
});
cf::get_all_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {
return active_memtable->region().occupancy().total_space();
}), uint64_t(0));
return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {
return active_memtable.region().occupancy().total_space();
});
}, std::plus<int64_t>());
});
cf::get_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {
return active_memtable->region().occupancy().used_space();
}), uint64_t(0));
return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {
return active_memtable.region().occupancy().used_space();
});
}, std::plus<int64_t>());
});
cf::get_all_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {
return active_memtable->region().occupancy().used_space();
}), uint64_t(0));
return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {
return active_memtable.region().occupancy().used_space();
});
}, std::plus<int64_t>());
});
@@ -418,9 +494,9 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::get_all_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {
return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {
return active_memtable->region().occupancy().used_space();
}), uint64_t(0));
return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {
return active_memtable.region().occupancy().used_space();
});
}, std::plus<int64_t>());
});
@@ -759,24 +835,6 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
return make_ready_future<json::json_return_type>(0);
});
cf::get_true_snapshots_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
auto uuid = get_uuid(req->param["name"], ctx.db.local());
return ctx.db.local().find_column_family(uuid).get_snapshot_details().then([](
const std::unordered_map<sstring, replica::column_family::snapshot_details>& sd) {
int64_t res = 0;
for (auto i : sd) {
res += i.second.total;
}
return make_ready_future<json::json_return_type>(res);
});
});
cf::get_all_true_snapshots_size.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
cf::get_row_cache_hit_out_of_range.set(r, [] (std::unique_ptr<http::request> req) {
//TBD
unimplemented();
@@ -872,26 +930,32 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/enable_auto_compaction: name={}", req->get_path_param("name"));
return ctx.db.invoke_on(0, [&ctx, req = std::move(req)] (replica::database& db) {
auto g = replica::database::autocompaction_toggle_guard(db);
return foreach_column_family(ctx, req->get_path_param("name"), [](replica::column_family &cf) {
cf.enable_auto_compaction();
}).then([g = std::move(g)] {
return make_ready_future<json::json_return_type>(json_void());
});
});
auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));
validate_table(ctx, ks, cf);
return set_tables_autocompaction(ctx, ks, {std::move(cf)}, true);
});
cf::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/disable_auto_compaction: name={}", req->get_path_param("name"));
return ctx.db.invoke_on(0, [&ctx, req = std::move(req)] (replica::database& db) {
auto g = replica::database::autocompaction_toggle_guard(db);
return foreach_column_family(ctx, req->get_path_param("name"), [](replica::column_family &cf) {
return cf.disable_auto_compaction();
}).then([g = std::move(g)] {
return make_ready_future<json::json_return_type>(json_void());
});
});
auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));
validate_table(ctx, ks, cf);
return set_tables_autocompaction(ctx, ks, {std::move(cf)}, false);
});
ss::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("enable_auto_compaction: keyspace={} tables={}", keyspace, tables);
return set_tables_autocompaction(ctx, keyspace, tables, true);
});
ss::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("disable_auto_compaction: keyspace={} tables={}", keyspace, tables);
return set_tables_autocompaction(ctx, keyspace, tables, false);
});
cf::get_tombstone_gc.set(r, [&ctx] (const_req req) {
@@ -902,20 +966,32 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
cf::enable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/enable_tombstone_gc: name={}", req->get_path_param("name"));
return foreach_column_family(ctx, req->get_path_param("name"), [](replica::table& t) {
t.set_tombstone_gc_enabled(true);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));
validate_table(ctx, ks, cf);
return set_tables_tombstone_gc(ctx, ks, {std::move(cf)}, true);
});
cf::disable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
apilog.info("column_family/disable_tombstone_gc: name={}", req->get_path_param("name"));
return foreach_column_family(ctx, req->get_path_param("name"), [](replica::table& t) {
t.set_tombstone_gc_enabled(false);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));
validate_table(ctx, ks, cf);
return set_tables_tombstone_gc(ctx, ks, {std::move(cf)}, false);
});
ss::enable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("enable_tombstone_gc: keyspace={} tables={}", keyspace, tables);
return set_tables_tombstone_gc(ctx, keyspace, tables, true);
});
ss::disable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {
auto keyspace = validate_keyspace(ctx, req);
auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");
apilog.info("disable_tombstone_gc: keyspace={} tables={}", keyspace, tables);
return set_tables_tombstone_gc(ctx, keyspace, tables, false);
});
cf::get_built_indexes.set(r, [&ctx, &sys_ks](std::unique_ptr<http::request> req) {
@@ -1050,6 +1126,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
auto params = req_params({
std::pair("name", mandatory::yes),
std::pair("flush_memtables", mandatory::no),
std::pair("consider_only_existing_data", mandatory::no),
std::pair("split_output", mandatory::no),
});
params.process(*req);
@@ -1058,7 +1135,8 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
}
auto [ks, cf] = parse_fully_qualified_cf_name(*params.get("name"));
auto flush = params.get_as<bool>("flush_memtables").value_or(true);
apilog.info("column_family/force_major_compaction: name={} flush={}", req->get_path_param("name"), flush);
auto consider_only_existing_data = params.get_as<bool>("consider_only_existing_data").value_or(false);
apilog.info("column_family/force_major_compaction: name={} flush={} consider_only_existing_data={}", req->get_path_param("name"), flush, consider_only_existing_data);
auto keyspace = validate_keyspace(ctx, ks);
std::vector<table_info> table_infos = {table_info{
@@ -1067,11 +1145,11 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace
}};
auto& compaction_module = ctx.db.local().get_compaction_manager().get_task_manager_module();
std::optional<major_compaction_task_impl::flush_mode> fmopt;
if (!flush) {
fmopt = major_compaction_task_impl::flush_mode::skip;
std::optional<flush_mode> fmopt;
if (!flush && !consider_only_existing_data) {
fmopt = flush_mode::skip;
}
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), ctx.db, std::move(table_infos), fmopt);
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), ctx.db, std::move(table_infos), fmopt, consider_only_existing_data);
co_await task->done();
co_return json_void();
});
@@ -1151,8 +1229,6 @@ void unset_column_family(http_context& ctx, routes& r) {
cf::get_speculative_retries.unset(r);
cf::get_all_speculative_retries.unset(r);
cf::get_key_cache_hit_rate.unset(r);
cf::get_true_snapshots_size.unset(r);
cf::get_all_true_snapshots_size.unset(r);
cf::get_row_cache_hit_out_of_range.unset(r);
cf::get_all_row_cache_hit_out_of_range.unset(r);
cf::get_row_cache_hit.unset(r);
@@ -1169,6 +1245,13 @@ void unset_column_family(http_context& ctx, routes& r) {
cf::get_auto_compaction.unset(r);
cf::enable_auto_compaction.unset(r);
cf::disable_auto_compaction.unset(r);
ss::enable_auto_compaction.unset(r);
ss::disable_auto_compaction.unset(r);
cf::get_tombstone_gc.unset(r);
cf::enable_tombstone_gc.unset(r);
cf::disable_tombstone_gc.unset(r);
ss::enable_tombstone_gc.unset(r);
ss::disable_tombstone_gc.unset(r);
cf::get_built_indexes.unset(r);
cf::get_compression_metadata_off_heap_memory_used.unset(r);
cf::get_compression_parameters.unset(r);
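
Note on `autocompaction_toggle_guard` in this file: the class is a scope guard that claims a busy flag on construction, throws if the flag is already taken, and releases it on destruction. The sketch below shows the same idea in isolation. `toggle_owner` and `toggle_free` are hypothetical names standing in for the database object and its flag. Unlike the class in the diff, which holds a reference and defaults its move constructor, this sketch holds a pointer and nulls it on move so that only the final owner releases the flag; that is a choice made for the sketch, not something the patch does.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the object that owns the busy flag.
struct toggle_owner {
    bool toggle_free = true;
};

class toggle_guard {
    toggle_owner* _owner;
public:
    // Claims the flag, or throws if another guard already holds it.
    explicit toggle_guard(toggle_owner& o) : _owner(&o) {
        if (!_owner->toggle_free) {
            throw std::runtime_error("toggle is busy");
        }
        _owner->toggle_free = false;
    }
    toggle_guard(const toggle_guard&) = delete;
    toggle_guard& operator=(const toggle_guard&) = delete;
    // Moving transfers ownership; the moved-from guard releases nothing.
    toggle_guard(toggle_guard&& other) noexcept : _owner(other._owner) {
        other._owner = nullptr;
    }
    toggle_guard& operator=(toggle_guard&&) = delete;
    // Releases the flag only if this guard still owns it.
    ~toggle_guard() {
        if (_owner) {
            _owner->toggle_free = true;
        }
    }
};
```

A guard moved into a continuation, as in `.finally([g = std::move(g)] {})`, then keeps the flag claimed until the continuation itself is destroyed.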

@@ -8,11 +8,10 @@
#pragma once
#include "api.hh"
#include "api/api-doc/column_family.json.hh"
#include "replica/database.hh"
#include <seastar/core/future-util.hh>
#include <seastar/json/json_elements.hh>
#include <any>
#include "api/api_init.hh"
namespace db {
class system_keyspace;

@@ -9,6 +9,7 @@
#include "commitlog.hh"
#include "db/commitlog/commitlog.hh"
#include "api/api-doc/commitlog.json.hh"
#include "api/api_init.hh"
#include "replica/database.hh"
#include <vector>
@@ -16,7 +17,7 @@ namespace api {
using namespace seastar::httpd;
template<typename T>
static auto acquire_cl_metric(http_context& ctx, std::function<T (db::commitlog*)> func) {
static auto acquire_cl_metric(http_context& ctx, std::function<T (const db::commitlog*)> func) {
typedef T ret_type;
return ctx.db.map_reduce0([func = std::move(func)](replica::database& db) {
@@ -62,6 +63,9 @@ void set_commitlog(http_context& ctx, routes& r) {
httpd::commitlog_json::get_total_commit_log_size.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));
});
httpd::commitlog_json::get_max_disk_size.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::disk_limit, std::placeholders::_1));
});
}
}
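Note on the hunk above: `acquire_cl_metric` now takes a callback over `const db::commitlog*`, so the bound accessors (`get_total_size`, `disk_limit`) have to be callable through a const pointer. A minimal standalone sketch of that pattern follows. `fake_commitlog`, `read_metric`, `total_size_of` and `disk_limit_of` are hypothetical stand-ins, not names from the patch, and the sketch assumes the real accessors are const member functions.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for db::commitlog; only const accessors are exposed.
struct fake_commitlog {
    uint64_t total_size;
    uint64_t limit;
    uint64_t get_total_size() const { return total_size; }
    uint64_t disk_limit() const { return limit; }
};

// Same shape as the new acquire_cl_metric signature: the callback receives
// a pointer to const, so it cannot mutate the commitlog.
template <typename T>
T read_metric(const fake_commitlog& cl, std::function<T (const fake_commitlog*)> func) {
    return func(&cl);
}

// Binding a const member function works with a const pointer argument.
uint64_t total_size_of(const fake_commitlog& cl) {
    return read_metric<uint64_t>(cl, std::bind(&fake_commitlog::get_total_size, std::placeholders::_1));
}

uint64_t disk_limit_of(const fake_commitlog& cl) {
    return read_metric<uint64_t>(cl, std::bind(&fake_commitlog::disk_limit, std::placeholders::_1));
}
```

Binding a non-const member function here would fail to compile, which is presumably the point of tightening the signature.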


@@ -8,10 +8,12 @@
#pragma once
#include "api.hh"
namespace seastar::httpd {
class routes;
}
namespace api {
void set_commitlog(http_context& ctx, httpd::routes& r);
struct http_context;
void set_commitlog(http_context& ctx, seastar::httpd::routes& r);
}


@@ -11,6 +11,7 @@
#include "compaction_manager.hh"
#include "compaction/compaction_manager.hh"
#include "api/api.hh"
#include "api/api-doc/compaction_manager.json.hh"
#include "db/system_keyspace.hh"
#include "column_family.hh"
@@ -51,7 +52,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {
for (const auto& c : cm.get_compactions()) {
cm::summary s;
s.id = c.compaction_uuid.to_sstring();
s.id = fmt::to_string(c.compaction_uuid);
s.ks = c.ks_name;
s.cf = c.cf_name;
s.unit = "keys";
@@ -116,9 +117,9 @@ void set_compaction_manager(http_context& ctx, routes& r) {
table_names = map_keys(ctx.db.local().find_keyspace(ks_name).metadata().get()->cf_meta_data());
}
auto type = req->get_query_param("type");
co_await ctx.db.invoke_on_all([&ks_name, &table_names, type] (replica::database& db) {
co_await ctx.db.invoke_on_all([&] (replica::database& db) {
auto& cm = db.get_compaction_manager();
return parallel_for_each(table_names, [&db, &cm, &ks_name, type] (sstring& table_name) {
return parallel_for_each(table_names, [&] (sstring& table_name) {
auto& t = db.find_column_family(ks_name, table_name);
return t.parallel_foreach_table_state([&] (compaction::table_state& ts) {
return cm.stop_compaction(type, &ts);
@@ -161,7 +162,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {
co_await s.write("[");
co_await ctx.db.local().get_compaction_manager().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable -> future<> {
cm::history h;
h.id = entry.id.to_sstring();
h.id = fmt::to_string(entry.id);
h.ks = std::move(entry.ks);
h.cf = std::move(entry.cf);
h.compacted_at = entry.compacted_at;


@@ -8,10 +8,12 @@
#pragma once
#include "api.hh"
namespace seastar::httpd {
class routes;
}
namespace api {
void set_compaction_manager(http_context& ctx, httpd::routes& r);
struct http_context;
void set_compaction_manager(http_context& ctx, seastar::httpd::routes& r);
}


@@ -6,14 +6,21 @@
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include "api/api.hh"
#include "api/config.hh"
#include "api/api-doc/config.json.hh"
#include "api/api-doc/storage_proxy.json.hh"
#include "api/api-doc/storage_service.json.hh"
#include "replica/database.hh"
#include "db/config.hh"
#include <sstream>
#include <boost/algorithm/string/replace.hpp>
#include <seastar/http/exception.hh>
namespace api {
using namespace seastar::httpd;
namespace sp = httpd::storage_proxy_json;
namespace ss = httpd::storage_service_json;
template<class T>
json::json_return_type get_json_return_type(const T& val) {
@@ -100,6 +107,112 @@ void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx
}
throw bad_param_exception(sstring("No such config entry: ") + id);
});
sp::get_rpc_timeout.set(r, [&cfg](const_req req) {
return cfg.request_timeout_in_ms()/1000.0;
});
sp::set_rpc_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
return make_ready_future<json::json_return_type>(seastar::json::json_void());
});
sp::get_read_rpc_timeout.set(r, [&cfg](const_req req) {
return cfg.read_request_timeout_in_ms()/1000.0;
});
sp::set_read_rpc_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
return make_ready_future<json::json_return_type>(seastar::json::json_void());
});
sp::get_write_rpc_timeout.set(r, [&cfg](const_req req) {
return cfg.write_request_timeout_in_ms()/1000.0;
});
sp::set_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
return make_ready_future<json::json_return_type>(seastar::json::json_void());
});
sp::get_counter_write_rpc_timeout.set(r, [&cfg](const_req req) {
return cfg.counter_write_request_timeout_in_ms()/1000.0;
});
sp::set_counter_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
return make_ready_future<json::json_return_type>(seastar::json::json_void());
});
sp::get_cas_contention_timeout.set(r, [&cfg](const_req req) {
return cfg.cas_contention_timeout_in_ms()/1000.0;
});
sp::set_cas_contention_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
return make_ready_future<json::json_return_type>(seastar::json::json_void());
});
sp::get_range_rpc_timeout.set(r, [&cfg](const_req req) {
return cfg.range_request_timeout_in_ms()/1000.0;
});
sp::set_range_rpc_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
return make_ready_future<json::json_return_type>(seastar::json::json_void());
});
sp::get_truncate_rpc_timeout.set(r, [&cfg](const_req req) {
return cfg.truncate_request_timeout_in_ms()/1000.0;
});
sp::set_truncate_rpc_timeout.set(r, [](std::unique_ptr<http::request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("timeout");
return make_ready_future<json::json_return_type>(seastar::json::json_void());
});
ss::get_all_data_file_locations.set(r, [&cfg](const_req req) {
return container_to_vec(cfg.data_file_directories());
});
ss::get_saved_caches_location.set(r, [&cfg](const_req req) {
return cfg.saved_caches_directory();
});
}
void unset_config(http_context& ctx, routes& r) {
cs::find_config_id.unset(r);
sp::get_rpc_timeout.unset(r);
sp::set_rpc_timeout.unset(r);
sp::get_read_rpc_timeout.unset(r);
sp::set_read_rpc_timeout.unset(r);
sp::get_write_rpc_timeout.unset(r);
sp::set_write_rpc_timeout.unset(r);
sp::get_counter_write_rpc_timeout.unset(r);
sp::set_counter_write_rpc_timeout.unset(r);
sp::get_cas_contention_timeout.unset(r);
sp::set_cas_contention_timeout.unset(r);
sp::get_range_rpc_timeout.unset(r);
sp::set_range_rpc_timeout.unset(r);
sp::get_truncate_rpc_timeout.unset(r);
sp::set_truncate_rpc_timeout.unset(r);
ss::get_all_data_file_locations.unset(r);
ss::get_saved_caches_location.unset(r);
}
}


@@ -8,10 +8,11 @@
#pragma once
#include "api.hh"
#include "api/api_init.hh"
#include <seastar/http/api_docs.hh>
namespace api {
void set_config(std::shared_ptr<httpd::api_registry_builder20> rb, http_context& ctx, httpd::routes& r, const db::config& cfg, bool first = false);
void unset_config(http_context& ctx, httpd::routes& r);
}

api/cql_server_test.cc (new file, 70 lines)

@@ -0,0 +1,70 @@
/*
* Copyright (C) 2024-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#ifndef SCYLLA_BUILD_MODE_RELEASE
#include <seastar/core/coroutine.hh>
#include <boost/range/algorithm/transform.hpp>
#include "api/api-doc/cql_server_test.json.hh"
#include "cql_server_test.hh"
#include "transport/controller.hh"
#include "transport/server.hh"
#include "service/qos/qos_common.hh"
namespace api {
namespace cst = httpd::cql_server_test_json;
using namespace json;
using namespace seastar::httpd;
struct connection_sl_params : public json::json_base {
json::json_element<sstring> _role_name;
json::json_element<sstring> _workload_type;
json::json_element<sstring> _timeout;
connection_sl_params(const sstring& role_name, const sstring& workload_type, const sstring& timeout) {
_role_name = role_name;
_workload_type = workload_type;
_timeout = timeout;
register_params();
}
connection_sl_params(const connection_sl_params& params)
: connection_sl_params(params._role_name(), params._workload_type(), params._timeout()) {}
void register_params() {
add(&_role_name, "role_name");
add(&_workload_type, "workload_type");
add(&_timeout, "timeout");
}
};
void set_cql_server_test(http_context& ctx, seastar::httpd::routes& r, cql_transport::controller& ctl) {
cst::connections_params.set(r, [&ctl] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto sl_params = co_await ctl.get_connections_service_level_params();
std::vector<connection_sl_params> result;
boost::transform(std::move(sl_params), std::back_inserter(result), [] (const cql_transport::connection_service_level_params& params) {
auto nanos = std::chrono::duration_cast<std::chrono::nanoseconds>(params.timeout_config.read_timeout).count();
return connection_sl_params(
std::move(params.role_name),
sstring(qos::service_level_options::to_string(params.workload_type)),
to_string(cql_duration(months_counter{0}, days_counter{0}, nanoseconds_counter{nanos})));
});
co_return result;
});
}
void unset_cql_server_test(http_context& ctx, seastar::httpd::routes& r) {
cst::connections_params.unset(r);
}
}
#endif
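Note on the handler above: it reports each connection's read timeout by converting it to a nanosecond count and then formatting that as a CQL duration with zero months and zero days. A minimal sketch of the conversion step alone, using only the standard library. `timeout_to_nanos` is a hypothetical helper, not part of the patch, and the CQL duration formatting is deliberately left out.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: convert any std::chrono duration to a whole number of
// nanoseconds, mirroring the duration_cast in the handler.
template <typename Rep, typename Period>
int64_t timeout_to_nanos(std::chrono::duration<Rep, Period> timeout) {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(timeout).count();
}
```

For example, a 5000 ms timeout becomes 5,000,000,000 ns, which is the value that would be passed as the nanoseconds component of the duration.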

api/cql_server_test.hh (new file, 29 lines)

@@ -0,0 +1,29 @@
/*
* Copyright (C) 2024-present ScyllaDB
*/
/*
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#ifndef SCYLLA_BUILD_MODE_RELEASE
#pragma once
namespace cql_transport {
class controller;
}
namespace seastar::httpd {
class routes;
}
namespace api {
struct http_context;
void set_cql_server_test(http_context& ctx, seastar::httpd::routes& r, cql_transport::controller& ctl);
void unset_cql_server_test(http_context& ctx, seastar::httpd::routes& r);
}
#endif


@@ -6,45 +6,15 @@
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include "locator/token_metadata.hh"
#include "locator/snitch_base.hh"
#include "locator/production_snitch_base.hh"
#include "endpoint_snitch.hh"
#include "api/api-doc/endpoint_snitch_info.json.hh"
#include "api/api-doc/storage_service.json.hh"
#include "utils/fb_utilities.hh"
namespace api {
using namespace seastar::httpd;
void set_endpoint_snitch(http_context& ctx, routes& r, sharded<locator::snitch_ptr>& snitch) {
static auto host_or_broadcast = [](const_req req) {
auto host = req.get_query_param("host");
return host.empty() ? gms::inet_address(utils::fb_utilities::get_broadcast_address()) : gms::inet_address(host);
};
httpd::endpoint_snitch_info_json::get_datacenter.set(r, [&ctx](const_req req) {
auto& topology = ctx.shared_token_metadata.local().get()->get_topology();
auto ep = host_or_broadcast(req);
if (!topology.has_endpoint(ep)) {
// Cannot return error here, nodetool status can race, request
// info about just-left node and not handle it nicely
return locator::endpoint_dc_rack::default_location.dc;
}
return topology.get_datacenter(ep);
});
httpd::endpoint_snitch_info_json::get_rack.set(r, [&ctx](const_req req) {
auto& topology = ctx.shared_token_metadata.local().get()->get_topology();
auto ep = host_or_broadcast(req);
if (!topology.has_endpoint(ep)) {
// Cannot return error here, nodetool status can race, request
// info about just-left node and not handle it nicely
return locator::endpoint_dc_rack::default_location.rack;
}
return topology.get_rack(ep);
});
httpd::endpoint_snitch_info_json::get_snitch_name.set(r, [&snitch] (const_req req) {
return snitch.local()->get_name();
});
@@ -60,8 +30,6 @@ void set_endpoint_snitch(http_context& ctx, routes& r, sharded<locator::snitch_p
}
void unset_endpoint_snitch(http_context& ctx, routes& r) {
httpd::endpoint_snitch_info_json::get_datacenter.unset(r);
httpd::endpoint_snitch_info_json::get_rack.unset(r);
httpd::endpoint_snitch_info_json::get_snitch_name.unset(r);
httpd::storage_service_json::update_snitch.unset(r);
}


@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api/api_init.hh"
namespace locator {
class snitch_ptr;


@@ -7,10 +7,8 @@
*/
#include "api/api-doc/error_injection.json.hh"
#include "api/api.hh"
#include "api/api_init.hh"
#include <seastar/http/exception.hh>
#include "log.hh"
#include "utils/error_injection.hh"
#include "utils/rjson.hh"
#include <seastar/core/future-util.hh>
@@ -64,6 +62,32 @@ void set_error_injection(http_context& ctx, routes& r) {
});
});
hf::read_injection.set(r, [](std::unique_ptr<request> req) -> future<json::json_return_type> {
const sstring injection = req->get_path_param("injection");
std::vector<error_injection_json::error_injection_info> error_injection_infos(smp::count, error_injection_json::error_injection_info{});
co_await smp::invoke_on_all([&] {
auto& info = error_injection_infos[this_shard_id()];
auto& errinj = utils::get_local_injector();
const auto enabled = errinj.is_enabled(injection);
info.enabled = enabled;
if (!enabled) {
return;
}
std::vector<error_injection_json::mapper> parameters;
for (const auto& p : errinj.get_injection_parameters(injection)) {
error_injection_json::mapper param;
param.key = p.first;
param.value = p.second;
parameters.push_back(std::move(param));
}
info.parameters = std::move(parameters);
});
co_return json::json_return_type(error_injection_infos);
});
hf::disable_on_all.set(r, [](std::unique_ptr<request> req) {
auto& errinj = utils::get_local_injector();
return errinj.disable_on_all().then([] {


@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api/api_init.hh"
namespace api {


@@ -7,6 +7,7 @@
*/
#include "failure_detector.hh"
#include "api/api.hh"
#include "api/api-doc/failure_detector.json.hh"
#include "gms/application_state.hh"
#include "gms/gossiper.hh"
@@ -65,7 +66,7 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
return g.container().invoke_on(0, [] (gms::gossiper& g) {
std::map<sstring, sstring> nodes_status;
g.for_each_endpoint_state([&] (const gms::inet_address& node, const gms::endpoint_state&) {
nodes_status.emplace(node.to_sstring(), g.is_alive(node) ? "UP" : "DOWN");
nodes_status.emplace(fmt::to_string(node), g.is_alive(node) ? "UP" : "DOWN");
});
return make_ready_future<json::json_return_type>(map_to_key_value<fd::mapper>(nodes_status));
});
@@ -98,5 +99,16 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
});
}
void unset_failure_detector(http_context& ctx, routes& r) {
fd::get_all_endpoint_states.unset(r);
fd::get_up_endpoint_count.unset(r);
fd::get_down_endpoint_count.unset(r);
fd::get_phi_convict_threshold.unset(r);
fd::get_simple_states.unset(r);
fd::set_phi_convict_threshold.unset(r);
fd::get_endpoint_state.unset(r);
fd::get_endpoint_phi_values.unset(r);
}
}


@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api_init.hh"
namespace gms {
@@ -19,5 +19,6 @@ class gossiper;
namespace api {
void set_failure_detector(http_context& ctx, httpd::routes& r, gms::gossiper& g);
void unset_failure_detector(http_context& ctx, httpd::routes& r);
}


@@ -12,6 +12,7 @@
#include "api/api-doc/gossiper.json.hh"
#include "gms/endpoint_state.hh"
#include "gms/gossiper.hh"
#include "api/api.hh"
namespace api {
using namespace seastar::httpd;
@@ -70,4 +71,14 @@ void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {
});
}
void unset_gossiper(http_context& ctx, routes& r) {
httpd::gossiper_json::get_down_endpoint.unset(r);
httpd::gossiper_json::get_live_endpoint.unset(r);
httpd::gossiper_json::get_endpoint_downtime.unset(r);
httpd::gossiper_json::get_current_generation_number.unset(r);
httpd::gossiper_json::get_current_heart_beat_version.unset(r);
httpd::gossiper_json::assassinate_endpoint.unset(r);
httpd::gossiper_json::force_remove_endpoint.unset(r);
}
}


@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api/api_init.hh"
namespace gms {
@@ -19,5 +19,6 @@ class gossiper;
namespace api {
void set_gossiper(http_context& ctx, httpd::routes& r, gms::gossiper& g);
void unset_gossiper(http_context& ctx, httpd::routes& r);
}


@@ -6,10 +6,10 @@
* SPDX-License-Identifier: AGPL-3.0-or-later
*/
#include <algorithm>
#include <vector>
#include "hinted_handoff.hh"
#include "api/api.hh"
#include "api/api-doc/hinted_handoff.json.hh"
#include "gms/inet_address.hh"


@@ -9,7 +9,7 @@
#pragma once
#include <seastar/core/sharded.hh>
#include "api.hh"
#include "api/api_init.hh"
namespace service { class storage_proxy; }


@@ -8,12 +8,10 @@
#include "api/api-doc/lsa.json.hh"
#include "api/lsa.hh"
#include "api/api.hh"
#include <seastar/http/exception.hh>
#include "utils/logalloc.hh"
#include "log.hh"
#include "replica/database.hh"
namespace api {
using namespace seastar::httpd;
@@ -21,9 +19,9 @@ using namespace seastar::httpd;
static logging::logger alogger("lsa-api");
void set_lsa(http_context& ctx, routes& r) {
httpd::lsa_json::lsa_compact.set(r, [&ctx](std::unique_ptr<request> req) {
httpd::lsa_json::lsa_compact.set(r, [](std::unique_ptr<request> req) {
alogger.info("Triggering compaction");
return ctx.db.invoke_on_all([] (replica::database&) {
return smp::invoke_on_all([] {
logalloc::shard_tracker().reclaim(std::numeric_limits<size_t>::max());
}).then([] {
return json::json_return_type(json::json_void());


@@ -8,7 +8,7 @@
#pragma once
#include "api.hh"
#include "api/api_init.hh"
namespace api {


@@ -10,8 +10,8 @@
#include "message/messaging_service.hh"
#include <seastar/rpc/rpc_types.hh>
#include "api/api-doc/messaging_service.json.hh"
#include <iostream>
#include <sstream>
#include "api/api-doc/error_injection.json.hh"
#include "api/api.hh"
using namespace seastar::httpd;
using namespace httpd::messaging_service_json;
@@ -19,6 +19,8 @@ using namespace netw;
namespace api {
namespace hf = httpd::error_injection_json;
using shard_info = messaging_service::shard_info;
using msg_addr = messaging_service::msg_addr;
@@ -112,7 +114,7 @@ void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging
}));
get_version.set(r, [&ms](const_req req) {
return ms.local().get_raw_version(req.get_query_param("addr"));
return ms.local().get_raw_version(gms::inet_address(req.get_query_param("addr")));
});
get_dropped_messages_by_ver.set(r, [&ms](std::unique_ptr<request> req) {
@@ -142,6 +144,14 @@ void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging
return make_ready_future<json::json_return_type>(res);
});
});
hf::inject_disconnect.set(r, [&ms] (std::unique_ptr<request> req) -> future<json::json_return_type> {
auto ip = msg_addr(req->get_path_param("ip"));
co_await ms.invoke_on_all([ip] (netw::messaging_service& ms) {
ms.remove_rpc_client(ip);
});
co_return json::json_void();
});
}
void unset_messaging_service(http_context& ctx, routes& r) {
@@ -155,6 +165,7 @@ void unset_messaging_service(http_context& ctx, routes& r) {
get_respond_completed_messages.unset(r);
get_version.unset(r);
get_dropped_messages_by_ver.unset(r);
hf::inject_disconnect.unset(r);
}
}

Some files were not shown because too many files have changed in this diff.