Commit Graph

493 Commits

Author SHA1 Message Date
Avi Kivity
5e764d1de2 Merge 'Drop v2 and flat from reader and related names' from Botond Dénes
Following a number of similar code cleanup PR, this one aims to be the last one, definitely dropping flat from all reader and related names.
Similarly, v2 is also dropped from reader names, although it still persists in mutation_fragment_v2, mutation_v2 and related names. This won't change in the foreseeable future, as we don't have plans to drop mutation (the v1 variant).
The changes in this PR are entirely mechanical, mostly just search-and-replace.

Code cleanup, no backport required.

Closes scylladb/scylladb#24087

* github.com:scylladb/scylladb:
  test/boost/mutation_reader_another_test: drop v2 from reader and related names
  test/boost/mutation_reader: s/puppet_reader_v2/puppet_reader/
  test/boost/sstable_datafile_test: s/sstable_reader_v2/sstable_mutation_reader/
  test/boost/mutation_test: s/consumer_v2/consumer/
  test/lib/mutation_reader_assertions: s/flat_reader_assertions_v2/mutation_reader_assertions/
  readers/mutation_readers: s/generating_reader_v2/generating_reader/
  readers/mutation_readers: s/delegating_reader_v2/delegating_reader/
  readers/mutation_readers: s/empty_flat_reader_v2/empty_mutation_reader/
  readers/mutation_source: s/make_reader_v2/make_mutation_reader/
  readers/mutation_source: s/flat_reader_v2_factory_type/mutation_reader_factory/
  readers/mutation_reader: s/reader_consumer_v2/mutation_reader_consumer/
  mutation/mutation_compactor: drop v2 from compactor and related names
  replica/table: s/make_reader_v2/make_mutation_reader/
  mutation_writer: s/bucket_writer_v2/bucket_writer/
  readers/queue: drop v2 from reader and related names
  readers/multishard: drop v2 from reader and related names
  readers/evictable: drop v2 from reader and related names
  readers/multi_range: remove flat from name
2025-05-11 22:22:35 +03:00
Botond Dénes
b5170e27d0 replica/table: s/make_reader_v2/make_mutation_reader/ 2025-05-09 07:53:29 -04:00
Botond Dénes
ca7f557e86 readers/multishard: drop v2 from reader and related names 2025-05-09 07:53:29 -04:00
Raphael S. Carvalho
28056344ba replica: Fix take_storage_snapshot() running concurrently to merge completion
Some background:
When merge happens, a background fiber wakes up to merge compaction
groups of sibling tablets into main one. It cannot happen when
rebuilding the storage group list, since token metadata update is
not preemptable. So a storage group, post merge, has the main
compaction group and two other groups to be merged into the main.
When the merge happens, those two groups are empty and will be
freed.

Consider this scenario:
1) merge happens, from 2 to 1 tablet
2) produces a single storage group, containing main and two
other compaction groups to be merged into main.
3) take_storage_snapshot(), triggered by migration post merge,
gets a list of pointer to all compaction groups.
4) t__s__s() iterates first on main group, yields.
5) background fiber wakes up, moves the data into main
and frees the two groups
6) t__s__s() advances to other groups that are now freed,
since step 5.
7) segmentation fault

In addition to memory corruption, there's also a potential for
data to escape the iteration in take_storage_snapshot(), since
data can be moved across compaction groups in background, all
belonging to the same storage group. That could result in
data loss.

Readers should all operate on storage group level since it can
provide a view on all the data owned by a tablet replica.
The movement of sstable from group A to B is atomic, but
iteration first on A, then later on B, might miss data that
was moved from B to A, before the iteration reached B.
By switching to storage group in the interface that retrieves
groups by token range, we guarantee that all data of a given
replica can be found regardless of which compaction group they
sit on.

Fixes #23162.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#24058
2025-05-09 14:07:06 +03:00
Botond Dénes
0a9ca52cfd replica/database: memtable_list: save ref to memtable_table_shared_data
This is passed by reference to the constructor, but a copy is saved into
the _table_shared_data member. A reference to this member is passed down
to all memtable readers. Because of the copy, the memtable readers save
a reference to the memtable_list's member, which goes away together with
the memtable_list when the storage_group is destroyed.
This causes use-after-free when a storage group is destroyed while a
memtable read is still ongoing. The memtable reader keeps the memtable
alive, but its reference to the memtable_table_shared_data becomes
stale.
Fix by saving a reference in the memtable_list too, so memtable readers
receive a reference pointing to the original replica::table member,
which is stable accross tablet migrations and merges.
The copy was introduced by 2a76065e3d.
There was a copy even before this commit, but in the previous vnode-only
world this was fine -- there was one memtable_list per table and it was
around until the table itself was. In the tablet world, this is no
longer given, but the above commit didn't account for this.

A test is included, which reproduces the use-after-free on memtable
migration. The test is somewhat artificial in that the use-after-free
would be prevented by holding on to an ERM, but this is done
intentionaly to keep the test simple. Migration -- unlike merge where
this use-after-free was originally observed -- is easy to trigger from
unit tests.

Fixes: #23762

Closes scylladb/scylladb#23984
2025-05-06 22:13:17 +03:00
Raphael S. Carvalho
21d1e78457 compaction: Wire table_state into make_sstable_set()
This will be useful for feeding token range owned by compaction group
into sstable set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-04-29 15:47:33 -03:00
Botond Dénes
f5125ffa18 Merge 'Ensure raft group0 RPCs use the gossip scheduling group.' from Sergey Zolotukhin
Scylla operations use concurrency semaphores to limit the number of concurrent operations and prevent resource exhaustion. The semaphore is selected based on the current scheduling group.

For RAFT group operations, it is essential to use a system semaphore to avoid queuing behind user operations. This patch ensures that RAFT operations use the `gossip` scheduling group to leverage the system semaphore.

Fixes scylladb/scylladb#21637

Backport: 6.2 and 6.1

Closes scylladb/scylladb#22779

* github.com:scylladb/scylladb:
  Ensure raft group0 RPCs use the gossip scheduling group
  Move RAFT operations verbs to GOSSIP group.
2025-04-16 09:11:29 +03:00
Sergey Zolotukhin
e05c082002 Ensure raft group0 RPCs use the gossip scheduling group
Scylla operations use concurrency semaphores to limit the number
of concurrent operations and prevent resource exhaustion. The
semaphore is selected based on the current scheduling group.
For Raft group operations, it is essential to use a system semaphore to
avoid queuing behind user operations.
This commit adds a check to ensure that the raft group0 RPCs are
executed with the `gossiper` scheduling group.
2025-04-14 17:10:46 +02:00
Benny Halevy
7342a57cbb replica: table: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:49 +03:00
Pavel Emelyanov
d9853efa7c Merge '[Out-of-space prevention] db: backup: prioritize sstables that were deleted from the table' from Benny Halevy
The motivation behind this change to free up disk space as early as possible.
The reason is that snapshot locks the space of all SSTables in the snapshot,
and deleting form the table, for example, by compaction, or tablet migration,
won't free-up their capacity until they are uploaded to object storage and deleted from the snapshot.

This series adds prioritization of deleted sstables in two cases:
First, after the snapshot dir is processed, the list of SSTable generation is cross-referenced with the
list of SSTables presently in the table and any generation that is not in the table is prioritized to
be uploaded earlier.
In addition, a subscription mechanism was added to sstables_manager
and it is used in backup to prioritize SSTables that get deleted from the table directory
during backup.

This is particularly important when backup happens during high disk utilization (e.g. 90%).
Without it, even if the cluster is scaled up and tablets are migrated away from the full nodes
to new nodes, tablet cleanup might not free any space if all the tablet sstables are hardlinked to the
snapshot taken for backup.

* Enhancement, no backport needed

Closes scylladb/scylladb#23241

* github.com:scylladb/scylladb:
  db: snapshot: backup_task: prioritize sstables deleted during upload
  sstables_manager: add subscriptions
  db: snapshot: backup_task: limit concurrency
  sstables: directory_semaphore: expose get_units
  db: snapshot: backup_task: add sharded sstables_manager
  database: expose get_sstables_manager(schema)
  db: snapshot: backup_task: do_backup: prioritize sstables that are already deleted from the table
  db: snapshot-ctl: pass table_id to backup_task
  db: snapshot-ctl: expose sharded db() getter
  db: snapshot: backup_task: do_backup: organize components by sstable generation
  db: snapshot: coroutinize backup_task
  db: snapshot: backup_task: refactor backup_file out of uploads_worker
  db: snapshot: backup_task: refactor uploads_worker out of do_backup
  db: snapshot: backup_task: process_snapshot_dir: initialize total progress
  utils/s3: upload_progress: init members to 0
  db: snapshot: backup_task: do_backup: refactor process_snapshot_dir
  db: snapshot: backup_task: keep expection as member
2025-04-09 15:32:11 +03:00
Benny Halevy
b270d552fb database: expose get_sstables_manager(schema)
Return either the system or use sstables manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-09 08:54:07 +03:00
Tomasz Grabiec
06b49bdf69 Merge 'row_cache: don't garbage-collect tombstones which cover data in memtables' from Botond Dénes
The row cache can garbage-collect tombstones in two places:
1) When populating the cache - the underlying reader pipeline has a `compacting_reader` in it;
2) During reads - reads now compact data including garbage collection;

In both cases, garbage collection has to do overlap checks against memtables, to avoid collecting tombstones which cover data in the memtables.
This PR includes fixes for (2), which were not handled at all currently.
(1) was already supposed to be fixed, see https://github.com/scylladb/scylladb/issues/20916. But the test added in this PR showed that the test is incomplete: https://github.com/scylladb/scylladb/issues/23291. A fix for this issue is also included.

Fixes: https://github.com/scylladb/scylladb/issues/23291
Fixes: https://github.com/scylladb/scylladb/issues/23252

The fix will need backport to all live release.

Closes scylladb/scylladb#23255

* github.com:scylladb/scylladb:
  test/boost/row_cache_test: add memtable overlap check tests
  replica/table: add error injection to memtable post-flush phase
  utils/error_injection: add a way to set parameters from error injection points
  test/cluster: add test_data_resurrection_in_memtable.py
  test/pylib/utils: wait_for_cql_and_get_hosts(): sort hosts
  replica/mutation_dump: don't assume cells are live
  replica/database: do_apply() add error injection point
  replica: improve memtable overlap checks for the cache
  replica/memtable: add is_merging_to_cache()
  db/row_cache: add overlap-check for cache tombstone garbage collection
  mutation/mutation_compactor: copy key passed-in to consume_new_partition()
2025-04-08 17:26:58 +02:00
Raphael S. Carvalho
0f59deffaa replica: Fix truncate and drop table after tablet migration happens
When running those operations after a tablet replica is migrated away from
a shard, an assert can fail resulting in a crash.

Status quo (around the assert in truncate procedure):

1) Highest RP seen by table is saved in low_mark, and the current time in
low_mark_at.
2) Then compaction is disabled in order to not mix data written before truncate,
and data written later.
3) Then memtable is flushed in order for the data written before truncate to be
available in sstables and then removed.
4) Now, current time is saved in truncated_at, which is supposedly the time of
truncate to decide which sstables to remove.

Note: truncated_at is likely above low_mark_at due to steps 2 and 3.

The interesting part of the assert is:
    (truncated_at <= low_mark_at ? rp <= low_mark : low_mark <= rp)

Note: RP in the assert above is the highest RP among all sstables generated
before truncated_at. RP is retrieved by table::discard_sstables().

If truncated_at > low_mark_at, maybe newer data was written during steps 2 and
3, and memtable's RP becomes greater than low_mark, resulting in a SSTable with
RP > low_mark.
So assert's 2nd condition is there to defend against the scenario above.

truncated_at and low_mark_at uses millisecond granularity, so even if
truncated_at == low_mark_at, data could have been written in steps 2 and 3
(during same MS window), failing the assert. This is fragile.

Reproducer:

To reproduce the problem, truncated_at must be > low_mark_at, which can easily
happen with both drop table and truncate due to steps 2 and 3.

If a shard has 2 or more tablets, the table's highest RP refer to just one
tablet in that shard.
If the tablet with the highest RP is migrated away, then the sstables in that
shard will have lower RP than the recorded highest RP (it's a table wide state,
which makes sense since CL is shared among tablets).

So when either drop table or truncate runs, low_mark will be potentially bigger
than highest RP retrieved from sstables.

Proposed solution:

The current assert is hacked to not fail if writes sneak in, during steps 2 and
3, but it's still fragile and seems not to serve its real purpose, since it's
allowing for RP > low_mark.

We should be able to say that low_mark >= RP, as a way of asserting we're not
leaving data targeted by truncate behind (or that we're not removing the wrong
data).

But the problem is that we're saving low_mark in step 1, before preparation
steps (2 and 3). When truncated_at is recorded in step 4, it's a way of saying
all data written so far is targeted for removal. But as of today, low_mark
refers to all data written up to step 1. So low_mark is now only one set
before issuing flush, and also accounts for all potentially flushed data.

Fixes #18059.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#23560
2025-04-08 07:32:58 +03:00
Botond Dénes
d126ea09ba replica: improve memtable overlap checks for the cache
The current memtable overlap check that is used by the cache
-- table::get_max_purgeable_fn_for_cache_underlying_reader() -- only
checks the active memtable, so memtables which are either being flushed
or are already flushed and also have active reads against them do not
participate in the overlap check.
This can result in temporary data resurrection, where a cache read can
garbage-collect a tombstone which still covers data in a flushing or
flushed memtable, which still have active read against it.

To prevent this, extend the overlap check to also consider all of the
memtable list. Furthermore, memtable_list::erase() now places the removed
(flushed) memtable in an intrusive list. These entries are alive only as
long as there are readers still keeping an `lw_shared_ptr<memtable>`
alive. This list is now also consulted on overlap checks.
2025-04-08 00:11:35 -04:00
Pavel Emelyanov
10376b5b85 db: Re-use database::snapshot_table_on_all_shards()
There are two snapshot-on-all-shards methods on the database -- the one
that snapshots a keyspace and the one that snapshots a vector of tables.
The latter snapshots a single table with a neat helper, while the former
has the helper open-coded.

Re-using the helper in keyspace snapshot is worth it, but needs to patch
the helper to work on uuid, rather than ks:cf pair of strings.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23532
2025-04-07 11:55:43 +02:00
Michał Chojnowski
d920ab5366 database: add sample_data_files()
Add a helper for sampling the Data files for a given table.
We will use it to take samples for dictionary training.
2025-04-01 00:07:29 +02:00
Michał Chojnowski
48c06c7e4b database: add take_sstable_set_snapshot()
We want a method that will allow us to take a stable snapshot of
SSTables, to asynchronously compute some stats on them.
But `take_storage_snapshot` is overly invasive for that, because
it flushes memtables on each call.
(If `take_storage_snapshot` was, for example, called repetitively,
it could create a ton of small memtables and lead to trouble).

This commit adds a weaker version which only takes a snapshot of
*existing SSTables*, and doesn't flush memtables by itself.

This will be useful for dictionary training, which doesn't
care about the semantics of SSTables, only their rough statistical
properties.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
30a9d471fa sstables: plug an sstable_compressor_factory into sstables_manager
Create a `sstable_compressor_factory_impl` in `scylla_main`,
and pipe it through constructors into `sstables_manager`.

In next commits, the factory available through the `sstables_manager`
will be used to create compressors for SSTable readers and writers.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
e23fdc0799 table: fix a race in table::take_storage_snapshot()
`safe_foreach_sstable` doesn't do its job correctly.

It iterates over an sstable set under the sstable deletion
lock in an attempt to ensure that SSTables aren't deleted during the iteration.

The thing is, it takes the deletion lock after the SSTable set is
already obtained, so SSTables might get unlinked *before* we take the lock.

Remove this function and fix its usages to obtain the set and iterate
over it under the lock.

Closes scylladb/scylladb#23397
2025-03-31 09:40:32 +03:00
Avi Kivity
7646e1448a Merge 'cql3: Introduce RF-rack-valid keyspaces' from Dawid Mędrek
This PR is an introductory step towards enforcing
RF-rack-valid keyspaces in Scylla.

The scope of changes:
* defining RF-rack-valid keyspaces,
* introducing a configuration option enforcing RF-rack-valid
  keyspaces,
* restricting the CREATE and ALTER KEYSPACE statements
  so that they never lead to RF-rack invalid keyspaces,
* during the initialization of a node, it verifies that all existing
  keyspaces are RF-rack-valid. If not, the initialization fails.

We provide tests verifying that the changes behave as intended.

---

Note that there are a number of things that still need to be implemented.
That includes, for instance, restricting topology operations too.

---

Implementation strategy (going beyond the scope of this PR):

1. Introduce the new configuration option `rf_rack_valid_keyspaces`.
2. Start enforcing RF-rack-validity in keyspaces if the option is enabled.
3. Adjust the tests: in the tree and out of it. Explicitly enable the option in all tests.
4. Once the tests have been adjusted, change the default value of the option to enabled.
5. Stop explicitly enabling the option in tests.
6. Get rid of the option.

---

Fixes scylladb/scylladb#20356
Fixes scylladb/scylladb#23276
Fixes scylladb/scylladb#23300

---

Backport: this is part of the requirements for releasing 2025.1.

Closes scylladb/scylladb#23138

* github.com:scylladb/scylladb:
  main: Refuse to start node when RF-rack-invalid keyspace exists
  cql3: Ensure that CREATE and ALTER never lead to RF-rack-invalid keyspaces
  db/config: Introduce RF-rack-valid keyspaces
2025-03-20 19:10:36 +02:00
Dawid Mędrek
0e04a6f3eb main: Refuse to start node when RF-rack-invalid keyspace exists
When a node is started with the option `rf_rack_valid_keyspaces`
enabled, the initialization will fail if there is an RF-rack-invalid
keyspace. We want to force the user to adjust their existing
keyspaces when upgrading to 2025.* so that the invariant that
every keyspace is RF-rack-valid is always satisfied.

Fixes scylladb/scylladb#23300
2025-03-19 15:13:44 +01:00
Botond Dénes
fda3486770 Merge 'Remove some excessive ks:cf -> table_id conversions in API and schema_tables' from Pavel Emelyanov
Actually, the main goal of this PR was to remove parse_tables() helpers from api/ in favor of more flexible (yet same complex) parse_table_infos(), but it turned out that it also saves some lookups in database maps.

There are several places in API and schema_tables that have table_id at hand, but at some point drop it and carry keyspace and table names over to a place that maps ks:cf back to table_id and then uses it to find the table object. This PR keeps the table_id with the help of table_info struct in those places. This change allows removing the aforementioned parse_table() helpers from api/ and also saves few lookups in database maps.

Removing the parse_tables() from api/ is the continuation of previous effort that reduces the set of helpers in api/ code that help handlers "parse" keyspaces and tables names see #22742 #21533

Closes scylladb/scylladb#23216

* github.com:scylladb/scylladb:
  api: Remove the remaining parse_tables() overload
  database: Sanitize flush_tables_on_all_shards()
  schema_tables: Remove all_table_names()
  database: Make tables flushing helper use table_info-s, not names
  api: Make keyspace flush endpoint use parse_table_infos() (and a bit more)
  schema_tables,client_state: Switch to using all_table_infos()
  schema_tables: Tune up some methods to benefit from table_infos
  schema_tables: Introduce all_table_infos()
2025-03-17 15:40:41 +02:00
Pavel Emelyanov
2bb455ec75 Merge 'Main: stop system_keyspace' from Benny Halevy
This series adds an async guard to system_keyspace operations
and adds a deferred action to stop the system_keyspace in main() before destroying the service.

This helps to make sure that sys_ks is unplugged from its users and that all async operations using it are drained once it's stopped.

* Enhancement, no backport needed

Closes scylladb/scylladb#23113

* github.com:scylladb/scylladb:
  main: stop system keyspace
  system_keyspace: call shutdown from stop
  system_keyspace: shutdown: allow calling more than once
  database, compaction_manager, large_data_handler: use pluggable<system_keysapce>
  utils: add class pluggable
2025-03-14 13:23:28 +03:00
Pavel Emelyanov
89f3c1a91e database: Sanitize flush_tables_on_all_shards()
Previous patch left this method with few uglinesses

- the vector<table_id> argument is named table_names
- the sstring keyspace argument is unused
- the keyspace argument is captured for no use

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-10 13:13:10 +03:00
Pavel Emelyanov
c2d23d7948 database: Make tables flushing helper use table_info-s, not names
The database::flush_tables_on_all_shards() method accepts a keyspace
name and a vector of table names. Then it converts ks:cf pair for each
of the table name into a table-id and flushes the table with the ID.

All the callers of that method already have or can easily get the vector
of table_id-s, not just names, so make use of this.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-10 13:11:32 +03:00
Raphael S. Carvalho
fedd838b9d replica: Fix race of some operations like cleanup with snapshot
There are two semaphores in table for synchronizing changes to sstable list:

sstable_set_mutation_sem: used to serialize two concurrent operations updating
the list, to prevent them from racing with each other.

sstable_deletion_sem: A deletion guard, used to serialize deletion and
iteration over the list, to prevent iteration from finding deleted files on
disk.

they're always taken in this order to avoid deadlocks:
sstable_set_mutation_sem -> sstable_deletion_sem.

problem:

A = tablet cleanup
B = take_snapshot()

1) A acquires sstable_set_mutation_sem for updating list
2) A acquires sstable_deletion_sem, then delete sstable before updating list
3) A releases sstable_deletion_sem, then yield
4) B acquires sstable_deletion_sem
5) B iterates through list and bumps sstable deleted in step 2
6) B fails since it cannot find the file on disk

Initial reaction is to say that no procedure must delete sstable before
updating the list, that's true.

But we want a iteration, running concurrently to cleanup, to not find sstables
being removed from the system. Otherwise, e.g. snapshot works with sstables
of a tablet that was just cleaned up. That's achieved by serializing iteration
with list update.
Since sstable_deletion_sem is used within the scope of deletion only, it's
useless for achieving this. Cleanup could acquire the deletion sem when
preparing list updates, and then pass the "permit" to deletion function, but
then sstable_deletion_sem would essentially become sstable_set_mutation_sem,
which was created exactly to protect the list update.

That being said, it makes sense to merge both semaphores. Also things become
easier to reason about, and we don't have to worry about deadlocks anymore.

The deletion goes through sstable_list_builder, which holds a permit throughout
its lifetime, which guarantees that list updates and deletion are atomic to
other concurrent operations. The interface becomes less error prone with that.
It allowed us to find discard_sstables() was doing deletion without any permit,
meaning another race could happen between truncate and snapshot.

So we're fixing race of (truncate|cleanup) with take_snapshot, as far as we
know. It's possible another unknown races are fixed as well.

Fixes #23049.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#23117
2025-03-06 11:00:48 +02:00
Benny Halevy
fba88bdd62 database, compaction_manager, large_data_handler: use pluggable<system_keysapce>
To allow safe plug and unplug of the system_keyspace.

This patch follows-up on 917fdb9e53
(more specifically - f9b57df471)
Since just keeping a shared_ptr<system_keyspace> doesn't prevent
stopping the system_keyspace shards, while using the `pluggable`
interface allows safe draining of outstanding async calls
on shutdown, before stopping the system_keyspace.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-05 08:27:23 +02:00
Pavel Emelyanov
db1e29cfea replica: Mark database::make_keyspace_config() private
It's not been used outside of database class for long ago already

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-26 09:56:07 +03:00
Pavel Emelyanov
d7018ae3d9 replica: Prepare full keyspace config in make_keyspace_config()
Currently for system keyspace part of config members are configured
outside of this helper, in the caller. It's more consistent to have full
config initialization in one place.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-26 09:54:12 +03:00
Avi Kivity
d4c531307d replica: database.hh: drop dependency on boost ranges
Reduces dependency load.

Closes scylladb/scylladb#22749
2025-02-10 13:29:55 +01:00
Ran Regev
edd56a2c1c moved cache files to db
As requested in #22097, moved the files
and fixed other includes and build system.

Fixes: #22097
Signed-off-by: Ran Regev <ran.regev@scylladb.com>

Closes scylladb/scylladb#22495
2025-02-04 12:21:31 +03:00
Benny Halevy
0e388a1594 view: get_view_natural_endpoint: handle case when there are too few view replicas
Currently, when reducing RF, we may drop replicas from
the view before dropping replicas from the base table.
Since get_view_natural_endpoint is allowed to return
a disengaged optional if it can't find a pair for the
base replica, replcace the exiting assertion with code
handling this case, and count those events in a new
table metric: total_view_updates_failed_pairing.

Note that this does not fix the root cause for the issue
which is the unsynchronized dropping of replicas, that
should be atomic, using a single group0 transaction.

Refs scylladb/scylladb#21492

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-01-22 09:04:24 +02:00
Botond Dénes
47989b1503 Merge 'tasks: add tablet resize virtual task' from Aleksandra Martyniuk
In this change, tablet_virtual_task starts supporting tablet
resize (i.e. split and merge).

Users can see running resize tasks - finished tasks are not
presented with the task manager API.

A new task state "suspended" is added. If a resize was revoked,
it will appear to users as suspended. We assume that the resize was revoked
when the tablet number didn't change.

Fixes: #21366.
Fixes: #21367.

No backport, new feature

Closes scylladb/scylladb#21891

* github.com:scylladb/scylladb:
  test: boost: check resize_task_info in tablet_test.cc
  test: add tests to check revoked resize virtual tasks
  test: add tests to check the list of resize virtual tasks
  test: add tests to check spilt and merge virtual tasks status
  test: test_tablet_tasks: generalize functions
  replica: service: add split virtual task's children
  replica: service: pass parent info down to storage_group::split
  tasks: children of virtual tasks aren't internal by default
  tasks: initialize shard in task_info ctor
  service: extend tablet_virtual_task::abort
  service: retrun status_helper struct from tablet_virtual_task::get_status_helper
  service: extend tablet_virtual_task::wait
  tasks: add suspended task state
  service: extend tablet_virtual_task::get_status
  service: extend tablet_virtual_task::contains
  service: extend tablet_virtual_task::get_stats
  service: add service::task_manager_module::get_nodes
  tasks: add task_manager::get_nodes
  tasks: drop noexcept from module::get_nodes
  replica: service: add resize_task_info static column to system.tablets
  locator: extend tablet_task_info to cover resize tasks
2025-01-17 14:24:07 +02:00
Botond Dénes
55963f8f79 replica: remove noexcept from token -> tablet resolution path
The methods to resolve a key/token/range to a table are all noexcept.
Yet the method below all of these, `storage_group_for_id()` can throw.
This means that if due to any mistake a tablet without local replica is
attempted to be looked up, it will result in a crash, as the exception
bubbles up into the noexcept methods.
There is no value in pretending that looking up the tablet replica is
noexcept, remove the noexcept specifiers so that any bad lookup only
fails the operation at hand and doesn't crash the node. This is
especially relevant to replace, which still has a window where writes
can arrive for tablets that don't (yet) have a local replica. Currently,
this results in a crash. After this patch, this will only fail the
writes and the replace can move on.

Fixes: #21480

Closes scylladb/scylladb#22251
2025-01-17 11:24:09 +03:00
Kefu Chai
7215d4bfe9 utils: do not include unused headers
these unused includes were identifier by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.

please note, because quite a few source files relied on
`utils/to_string.hh` to pull in the specialization of
`fmt::formatter<std::optional<T>>`, after removing
`#include <fmt/std.h>` from `utils/to_string.hh`, we have to
include `fmt/std.h` directly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-14 07:56:39 -05:00
Aleksandra Martyniuk
7ef6900837 replica: service: pass parent info down to storage_group::split
Pass task_info down to storage_group::split.

In the following patches, it will be used to set the parent
of offstrategy_compaction_task_executor and split_compaction_task_executor
running as a part of the split. The task_info param will contain task
info of a split virtual task.
2025-01-10 10:03:08 +01:00
Piotr Dulikowski
7383013f43 replica/database: add reader concurrency semaphore groups
Replace the reader concurrency semaphores for user reads and view
updates with the newly introduced reader concurrency semaphore group,
which assigns a semaphore for each service level.

Each group is statically assigned to some pool of memory on startup and
dynamically distribute this memory between the semaphores, relative to
the number of shares of the corresponding scheduling group.

The intent of having a separate reader concurrency semaphore for each
scheduling group is to prevent priority inversion issues due to reads
with different priorities waiting on the same semaphore, as well as make
memory allocation more fair between service levels due to the adjusted
number of shares.
2025-01-02 07:13:34 +01:00
Avi Kivity
4905b1bf76 Merge 'table: make update_effective_replication_map sync again' from Benny Halevy
Commit f2ff701489 introduced
a yield in update_effective_replication_map that might
cause the storage_group manager to be inconsistent with the
new effective_replication_map (e.g. if yielding right
before calling `handle_tablet_split_completion`.

Also, yielding inside storage_service::replicate_to_all_cores
update loop means that base tables and their views
aren't updated atomically, that caused scylladb/scylladb#17786

This change essentially reverts f2ff701489
and makes handle_tablet_split_completion synchronous too.
The stopped compaction groups future is kept as a member and
storage_group_manager::stop() consumes this future during table::stop().

- storage_service: replicate_to_all_cores: update base and view tables atomically

Currently, the loop updating all tables (including views) with the
new effective_replication_map may yield, and therefore expose
a state where the base and view tables effective_replication_map
and topology are out of sync (as seen in scylladb/scylladb#17786)

To prevent that, loop over all base tables and for each table
update the base table and all views atomically, without yielding,
and so allow yielding only between base tables.

* Regression was introduced in f2ff701489, so backport is required to 6.x, 2024.2

Closes scylladb/scylladb#21781

* github.com:scylladb/scylladb:
  storage_service: replicate_to_all_cores: clear_gently pending erms
  test_mv_topology_change: drop delay_after_erm_update injection case
  storage_service: replicate_to_all_cores: update base and view tables atomically
  table: make update_effective_replication_map sync again
2024-12-30 23:42:06 +02:00
Takuya ASADA
03461d6a54 test: compile unit tests into a single executable
To reduce test executable size and speed up compilation time, compile unit
tests into a single executable.

Here is a file size comparison of the unit test executable:

- Before applying the patch
$ du -h --exclude='*.o' --exclude='*.o.d' build/release/test/boost/ build/debug/test/boost/
11G	build/release/test/boost/
29G	build/debug/test/boost/

- After applying the patch
du -h --exclude='*.o' --exclude='*.o.d' build/release/test/boost/ build/debug/test/boost/
5.5G	build/release/test/boost/
19G	build/debug/test/boost/

It reduces executable sizes 5.5GB on release, and 10GB on debug.

Closes #9155

Closes scylladb/scylladb#21443
2024-12-22 19:14:09 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Kefu Chai
e65fc35b5e replica: do not include unused headers
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21836
2024-12-18 13:52:57 +02:00
Benny Halevy
10c4cf930c table: make update_effective_replication_map sync again
Commit f2ff701489 introduced
a yield in update_effective_replication_map that might
cause the storage_group manager to be inconsistent with the
new effective_replication_map (e.g. if yielding right
before calling `handle_tablet_split_completion`.

Also, yielding inside storage_service::replicate_to_all_cores
update loop means that base tables and their views
aren't updated atomically, that caused scylladb/scylladb#17786

This change essentially reverts f2ff701489
and makes handle_tablet_split_completion synchronous too.
The stopped compaction groups future is kept as a memebr and
storage_group_manager::stop() consumes this future during table::stop().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-12-15 11:45:08 +02:00
Avi Kivity
841481c202 Merge "move storage proxy and adjacent services to identify hosts by ids" from Gleb
"
This rather large patch series moves storage proxy and some adjacent
services (like migration manager) to use host ids to identify nodes rather
than ips. Messaging service gains a capability to address nodes by host
ids (which allows dropping translations from topology coordinator code
that worked on host ids already) and also makes sure that a node with
incorrect host id will reject a message (can happen during address
changes).

The series gets rid of the raft address map completely and replaces it with
the gossiper address map which is managed by the gossiper since translation
is now done in the layer below raft.

Fixes: scylladb/scylladb#6403

perf-simple-query -- smp 1 -m 1G output

Before:

enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
64336.82 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41291 insns/op,   24485 cycles/op,        0 errors)
62669.58 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41277 insns/op,   24695 cycles/op,        0 errors)
69172.12 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41326 insns/op,   24463 cycles/op,        0 errors)
56706.60 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41143 insns/op,   24513 cycles/op,        0 errors)
56416.65 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41186 insns/op,   24851 cycles/op,        0 errors)

         throughput: mean=61860.35 standard-deviation=5395.48 median=62669.58 median-absolute-deviation=5153.75 maximum=69172.12 minimum=56416.65
instructions_per_op: mean=41244.62 standard-deviation=76.90 median=41276.94 median-absolute-deviation=58.55 maximum=41326.19 minimum=41142.80
  cpu_cycles_per_op: mean=24601.35 standard-deviation=167.39 median=24512.64 median-absolute-deviation=116.65 maximum=24851.45 minimum=24462.70

After:

enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
65237.35 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   40733 insns/op,   23145 cycles/op,        0 errors)
59283.09 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40624 insns/op,   23948 cycles/op,        0 errors)
70851.03 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40625 insns/op,   23027 cycles/op,        0 errors)
70549.61 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40650 insns/op,   23266 cycles/op,        0 errors)
68634.96 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   40622 insns/op,   22935 cycles/op,        0 errors)

         throughput: mean=66911.21 standard-deviation=4814.60 median=68634.96 median-absolute-deviation=3638.40 maximum=70851.03 minimum=59283.09
instructions_per_op: mean=40650.89 standard-deviation=47.55 median=40624.60 median-absolute-deviation=27.11 maximum=40733.37 minimum=40622.33
  cpu_cycles_per_op: mean=23264.16 standard-deviation=402.12 median=23145.29 median-absolute-deviation=237.63 maximum=23947.96 minimum=22934.59

CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13531/
SCT (longevity-100gb-4h with nemesis_selector: ['topology_changes']): https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/gleb/job/move-to-host-id/3/

Tested mixed cluster manually.
"

* 'gleb/move-to-host-id-v2' of github.com:scylladb/scylla-dev: (55 commits)
  group0: drop unused field from replace_info struct
  test: rename raft_address_map_test to address_map_test and move if from raft tests
  raft_address_map: remove raft address map
  topology coordinator: do not modify expire state for left/new nodes any more in raft address map
  topology coordinator: drop expiring entries in gossiper address map on error injections since raft one is no longer used
  group0: drop raft address map dependency from raft_rpc
  group0: move raft_ticker_type definition from raft_address_map.hh
  storage_service: do not update raft address map on gossiper events
  group0: drop raft address map dependency from raft_server_with_timeouts
  group0: move group0 upgrade code to host ids
  repair: drop raft address map dependency
  group0: remove unused raft address map getter from raft_group0
  group0: drop raft address map from group0_state_machine dependency since it is not used there any more
  group0: remove dependency on raft address map from group0_state_id_handler
  gossiper: add get_application_state_ptr that searches by host_id
  gossiper: change get_live_token_owners to return host ids
  view: move view building to host id
  hints: use host id to send hints
  storage_proxy: remove id_vector_to_addr since it is no longer used
  db: consistency_level: change is_sufficient_live_nodes to work on host ids
  ...
2024-12-03 18:18:48 +02:00
Avi Kivity
58baeac0ad Merge 'compaction: update maintenance sstable set on scrub compaction completion' from Lakshmi Narayanan Sreethar
Scrub compaction can pick up input sstables from maintenance sstable set
but on compaction completion, it doesn't update the maintenance set
leaving the original sstable in set after it has been scrubbed. To fix
this, on compaction completion has to update the maintenance sstable if
the input originated from there. This PR solves the issue by updating the
correct sstable_sets on compaction completion.

Fixes #20030

This issue has existed since the introduction of main and maintenance sstable sets into scrub compaction. It would be good to have the fix backported to versions 6.1 and 6.2.

Closes scylladb/scylladb#21582

* github.com:scylladb/scylladb:
  compaction: remove unused `update_sstable_lists_on_off_strategy_completion`
  compaction_group: replace `update_sstable_lists_on_off_strategy_completion`
  compaction_group: rename `update_main_sstable_list_on_compaction_completion`
  compaction_group: update maintenance sstable set on scrub compaction completion
  compaction_group: store table::sstable_list_builder::result in replacement_desc
  table::sstable_list_builder: remove old sstables only from current list
  table::sstable_list_builder: return removed sstables from build_new_list
2024-12-02 13:32:49 +02:00
Gleb Natapov
474b47ed22 database: move hits rates handling to host ids
Hits rates map is now indexed by ip. Change it to be indexed by host id since this is what
storage proxy uses now.
2024-12-02 10:31:12 +02:00
Botond Dénes
055a36ae55 main: dump diagnostics on SIGQUIT
Dump a diagnostics report on each shard when receiving a SIGQUIT. The
report is logged with a dedicated logger, called diagnostics.
The report has multiple parts:
* seastar memory diagnostics, similar to that printed by the scylla
  memory command (from scylla-gdb.py).
* reader concurrency semaphore diagnostics for each semaphore.

Example report:

    INFO  2024-11-27 01:31:55,882 [shard 0:main] diagnostics - Diagnostics dump requested via SIGQUIT:
    Dumping seastar memory diagnostics
    Used memory:   3988M
    Free memory:   58M
    Total memory:  4G
    Hard failures: 0

    LSA
      allocated: 4M
      used:      16
      free:      4G

    Cache:
      total: 1M
      used:  642K
      free:  398K

    Memtables:
     total: 3M
     Regular:
      real dirty: 0B
      virt dirty: 0B
     System:
      real dirty: 3M
      virt dirty: 3M

    Replica:
      Read Concurrency Semaphores:
        user: 0/100, 0B/81M, queued: 0
        streaming: 0/10, 0B/81M, queued: 0
        system: 0/10, 0B/81M, queued: 0
        compaction: 0/unlimited, 0B/unlimited
        view update: 0/50, 0B/40M, queued: 0
      Execution Stages:
        apply stage:
             Total: 0
      Tables - Ongoing Operations:
        Pending writes (top 10):
          0 Total (all)
        Pending reads (top 10):
          0 Total (all)
        Pending streams (top 10):
          0 Total (all)

    Small pools:
    objsz spansz usedobj memory unused wst%
        8     4K     858    16K     9K   58
       10     4K       5     8K     8K   99
       12     4K       5     8K     8K   99
       14     4K       0     0B     0B    0
       16     4K      2k    44K    15K   35
       32     4K      4k   136K    16K   11
       32     4K      8k   280K    24K    8
       32     4K      3k    92K     6K    6
       32     4K      4k   140K    21K   14
       48     4K      3k   180K    25K   14
       48     4K      2k   120K    27K   22
       64     4K      2k   156K    18K   11
       64     4K     19k     1M    11K    0
       80     4K      3k   236K    16K    6
       96     4K      6k   572K    49K    8
      112     4K      2k   276K    72K   25
      128     4K     477    80K    20K   25
      160     4K     194    60K    30K   49
      192     4K      1k   232K    39K   16
      224     4K      2k   468K    15K    3
      256     4K     182   100K    55K   54
      320     8K     349   152K    43K   28
      384     8K     332   288K   164K   56
      448     4K     243   180K    74K   40
      512     4K     256   244K   116K   47
      640    16K     185   192K    76K   39
      768    16K     394   432K   137K   31
      896     8K      54   192K   144K   75
     1024     4K     288   432K   144K   33
     1280    32K      92   256K   140K   54
     1536    32K      11   128K   111K   86
     1792    16K      10   144K   126K   87
     2048     8K     487     1M    90K    8
     2560    64K     113   384K   100K   26
     3072    64K       9   256K   228K   89
     3584    32K       3   288K   277K   96
     4096    16K     129   912K   396K   43
     5120   128K      21   384K   275K   71
     6144   128K       4   512K   486K   94
     7168    64K       3   576K   553K   96
     8192    32K     373     3M    56K    1
    10240    64K       6   832K   770K   92
    12288    64K      17   960K   756K   78
    14336   128K       2     1M     1M   97
    16384    64K      14     1M   992K   81

    Page spans:
    index  size  free  used spans
        0    4K    4K    5M    1k
        1    8K    8K    2M   213
        2   16K   16K    2M   106
        3   32K   64K    6M   200
        4   64K   64K    4M    71
        5  128K  384K 3934M   31k
        6  256K    1M  256K     5
        7  512K  512K  512K     2
        8    1M    2M    0B     2
        9    2M    2M    2M     2
       10    4M    4M    0B     1
       11    8M   16M    0B     2
       12   16M   32M    0B     2
       13   32M    0B   32M     1
       14   64M    0B    0B     0
       15  128M    0B    0B     0
       16  256M    0B    0B     0
       17  512M    0B    0B     0
       18    1G    0B    0B     0
       19    2G    0B    0B     0
       20    4G    0B    0B     0
       21    8G    0B    0B     0
       22   16G    0B    0B     0
       23   32G    0B    0B     0
       24   64G    0B    0B     0
       25  128G    0B    0B     0
       26  256G    0B    0B     0
       27  512G    0B    0B     0
       28    1T    0B    0B     0
       29    2T    0B    0B     0
       30    4T    0B    0B     0
       31    8T    0B    0B     0

    INFO  2024-11-27 01:31:55,882 [shard 0:main] diagnostics - Diagnostics dump requested via SIGQUIT:
    Semaphore user with 0/100 count and 0/84850769 memory resources: user request, dumping permit diagnostics:

    permits	count	memory	table/operation/state

    0	0	0B	total

    Stats:
    permit_based_evictions: 0
    time_based_evictions: 0
    inactive_reads: 0
    total_successful_reads: 0
    total_failed_reads: 0
    total_reads_shed_due_to_overload: 0
    total_reads_killed_due_to_kill_limit: 0
    reads_admitted: 0
    reads_enqueued_for_admission: 0
    reads_enqueued_for_memory: 0
    reads_admitted_immediately: 0
    reads_queued_because_ready_list: 0
    reads_queued_because_need_cpu_permits: 0
    reads_queued_because_memory_resources: 0
    reads_queued_because_count_resources: 0
    reads_queued_with_eviction: 0
    total_permits: 0
    current_permits: 0
    need_cpu_permits: 0
    awaits_permits: 0
    disk_reads: 0
    sstables_read: 0
    INFO  2024-11-27 01:31:55,882 [shard 0:main] diagnostics - Diagnostics dump requested via SIGQUIT:
    Semaphore streaming with 0/10 count and 0/84850769 memory resources: user request, dumping permit diagnostics:

    permits	count	memory	table/operation/state

    0	0	0B	total

    Stats:
    permit_based_evictions: 0
    time_based_evictions: 0
    inactive_reads: 0
    total_successful_reads: 6
    total_failed_reads: 0
    total_reads_shed_due_to_overload: 0
    total_reads_killed_due_to_kill_limit: 0
    reads_admitted: 6
    reads_enqueued_for_admission: 0
    reads_enqueued_for_memory: 0
    reads_admitted_immediately: 6
    reads_queued_because_ready_list: 0
    reads_queued_because_need_cpu_permits: 0
    reads_queued_because_memory_resources: 0
    reads_queued_because_count_resources: 0
    reads_queued_with_eviction: 0
    total_permits: 6
    current_permits: 0
    need_cpu_permits: 0
    awaits_permits: 0
    disk_reads: 0
    sstables_read: 0
    INFO  2024-11-27 01:31:55,882 [shard 0:main] diagnostics - Diagnostics dump requested via SIGQUIT:
    Semaphore compaction with 0/2147483647 count and 0/9223372036854775807 memory resources: user request, dumping permit diagnostics:

    permits	count	memory	table/operation/state

    0	0	0B	total

    Stats:
    permit_based_evictions: 0
    time_based_evictions: 0
    inactive_reads: 0
    total_successful_reads: 0
    total_failed_reads: 0
    total_reads_shed_due_to_overload: 0
    total_reads_killed_due_to_kill_limit: 0
    reads_admitted: 0
    reads_enqueued_for_admission: 0
    reads_enqueued_for_memory: 0
    reads_admitted_immediately: 0
    reads_queued_because_ready_list: 0
    reads_queued_because_need_cpu_permits: 0
    reads_queued_because_memory_resources: 0
    reads_queued_because_count_resources: 0
    reads_queued_with_eviction: 0
    total_permits: 27
    current_permits: 0
    need_cpu_permits: 0
    awaits_permits: 0
    disk_reads: 0
    sstables_read: 0
    INFO  2024-11-27 01:31:55,882 [shard 0:main] diagnostics - Diagnostics dump requested via SIGQUIT:
    Semaphore system with 0/10 count and 0/84850769 memory resources: user request, dumping permit diagnostics:

    permits	count	memory	table/operation/state
    1	0	0B	*.*/view_builder/active

    1	0	0B	total

    Stats:
    permit_based_evictions: 0
    time_based_evictions: 0
    inactive_reads: 0
    total_successful_reads: 234
    total_failed_reads: 0
    total_reads_shed_due_to_overload: 0
    total_reads_killed_due_to_kill_limit: 0
    reads_admitted: 234
    reads_enqueued_for_admission: 154
    reads_enqueued_for_memory: 0
    reads_admitted_immediately: 80
    reads_queued_because_ready_list: 154
    reads_queued_because_need_cpu_permits: 0
    reads_queued_because_memory_resources: 0
    reads_queued_because_count_resources: 0
    reads_queued_with_eviction: 0
    total_permits: 235
    current_permits: 1
    need_cpu_permits: 0
    awaits_permits: 0
    disk_reads: 0
    sstables_read: 0
    INFO  2024-11-27 01:31:55,882 [shard 0:main] diagnostics - Diagnostics dump requested via SIGQUIT:
    Semaphore view_update with 0/50 count and 0/42425384 memory resources: user request, dumping permit diagnostics:

    permits	count	memory	table/operation/state

    0	0	0B	total

    Stats:
    permit_based_evictions: 0
    time_based_evictions: 0
    inactive_reads: 0
    total_successful_reads: 0
    total_failed_reads: 0
    total_reads_shed_due_to_overload: 0
    total_reads_killed_due_to_kill_limit: 0
    reads_admitted: 0
    reads_enqueued_for_admission: 0
    reads_enqueued_for_memory: 0
    reads_admitted_immediately: 0
    reads_queued_because_ready_list: 0
    reads_queued_because_need_cpu_permits: 0
    reads_queued_because_memory_resources: 0
    reads_queued_because_count_resources: 0
    reads_queued_with_eviction: 0
    total_permits: 0
    current_permits: 0
    need_cpu_permits: 0
    awaits_permits: 0
    disk_reads: 0
    sstables_read: 0

Fixes: scylladb/scylladb#7400

Closes scylladb/scylladb#21692
2024-11-28 18:52:29 +02:00
Lakshmi Narayanan Sreethar
0e08ccd307 table::sstable_list_builder: return removed sstables from build_new_list
Updated the method table::sstable_list_builder::build_new_list() to
return the list of sstables that was removed along with the newly built
sstable set. This change will be used to unify the
`update_sstable_lists` variants in a following patch.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-11-28 11:25:11 +05:30
Ernest Zaslavsky
793f2c95d1 snapshots: Stop taking snapshots of MVs
Stop taking snapshots of MVs and allow taking snapshot of individual tables, now one can take a snapshot of any base table, any view or index. Also add tests to cover new cases both boost test (using cc code) and pytest (using the API)
Also, update documentation to reflect the change

fixes: #21339
fixes: #20760

Closes scylladb/scylladb#21433
2024-11-26 15:27:30 +02:00
Kefu Chai
a5ee0c896b treewide: migrate from boost::adaptors::filtered to std::views::filter
Modernize the codebase by replacing Boost range adaptors with C++23 standard library views,
reducing external dependencies and leveraging modern C++ language features.

Key Changes:
- Replace `boost::adaptors::filtered` with `std::views::filter`
- Remove `#include <boost/range/adaptor/filtered.hpp>`
- Utilize standard library range views

Motivation:
- Reduce project's external dependency footprint
- Leverage standard library's range and view capabilities
- Improve long-term code maintainability
- Align with modern C++ best practices

Implementation Challenges and Considerations:
1. Range Conversion and Move Semantics
   - `std::ranges::to` adaptor requires rvalue references
   - Necessitated updates to variable and parameter constness
   - Example: `cql3/restrictions/statement_restrictions.cc` modified to remove `const`
     from `common` to enable efficient range conversion

2. Range Iteration and Mutation
   - Range views may mutate internal state during iteration
   - Cannot pass ranges by const reference in some scenarios
   - Solution: Pass ranges by rvalue reference to explicitly indicate
     state invalidation

Limitations:
- One instance of `boost::adaptors::filtered` temporarily preserved
  due to lack of a C++23 alternative for `boost::join()`
- A comprehensive replacement will be addressed in a follow-up change

This change is part of our ongoing effort to modernize the codebase,
reducing external dependencies and adopting modern C++ practices.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21648
2024-11-26 14:26:50 +02:00
Lakshmi Narayanan Sreethar
c4db4abcae replica/table: implement get_token_range_after_split() wrappers
Expose the functionality of `tablet_map::get_token_range_after_split()`
via the replica::table class.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-11-11 12:24:00 +05:30