Commit Graph

3411 Commits

Author SHA1 Message Date
Tomasz Grabiec
416cbafd16 Merge '[Backport 6.0] sstables: fix some mixups between the writer's schema and the sstable's schema' from Michał Chojnowski
There are two schemas associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).

It's easy to mix up the two and break something as a result.

The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.

The sstables's schema is used to configure some parameters for newly created
sstable, such as bloom filter false positive ratio, or compression.

This series fixes the known mixups between the two — when setting up compression,
and when setting up the bloom filters.

Fixes scylladb/scylladb#16065

The bug is present in all supported versions, so the patch has to be backported to all of them.

(cherry picked from commit a1834efd82)

(cherry picked from commit d10b38ba5b)

(cherry picked from commit 1a8ee69a43)

Refs scylladb/scylladb#19695

Closes scylladb/scylladb#19877

* github.com:scylladb/scylladb:
  sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's
  sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's
  sstables: for i_filter downcasts, use dynamic_cast instead of static_cast
2024-07-29 15:36:52 +02:00
Michał Chojnowski
43ba44ce97 sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's
There are two schema's associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).

It's easy to mix up the two and break something as a result.

The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.

The sstables's schema is used to configure some parameters for newly created
sstable, such as bloom filter false positive ratio, or compression.

The problem fixed by this patch is that the writer was wrongly creating
the compressor objects based on its own schema, but using them based
based on the sstable's schema the sstable's schema.
This patch forces the writer to use the sstable's schema for both.

(cherry picked from commit 1a8ee69a43)
2024-07-25 12:23:58 +02:00
Michał Chojnowski
d6d3a91283 sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's
There are two schema's associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).

It's easy to mix up the two and break something as a result.

The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.

The sstables's schema is used to configure some parameters for newly created
sstable, such as bloom filter false positive ratio, or compression.

The problem fixed by this patch is that the writer was wrongly creating
the filter based on its own schema, while the layer outside the writer
was interpreting it as if it was created with the sstable's schema.

This patch forces the writer to pick the filter's parameters based on the
sstable's schema instead.

(cherry picked from commit d10b38ba5b)
2024-07-25 12:23:58 +02:00
Michał Chojnowski
0555d4c30b sstables: for i_filter downcasts, use dynamic_cast instead of static_cast
As of this patch, those static_casts are actually invalid in some cases
(they cast to the wrong type) because of an oversight.
A later patch will fix that. But to even write a reliable reproducer
for the problem, we must force the invalid casts to manifest as a crash
(instead of weird results).

This patch both allows writing a reproducer for the bug and serves
as a bit of defensive programming for the future.

(cherry picked from commit a1834efd82)

# Conflicts:
#	sstables/sstables.cc
2024-07-25 12:23:58 +02:00
Lakshmi Narayanan Sreethar
3c1fd843c8 [Backport 6.0]: sstables: do not reload components of unlinked sstables
The SSTable is removed from the reclaimed memory tracking logic only
when its object is deleted. However, there is a risk that the Bloom
filter reloader may attempt to reload the SSTable after it has been
unlinked but before the SSTable object is destroyed. Prevent this by
removing the SSTable from the reclaimed list maintained by the manager
as soon as it is unlinked.

The original logic that updated the memory tracking in
`sstables_manager::deactivate()` is left in place as (a) the variables
have to be updated only when the SSTable object is actually deleted, as
the memory used by the filter is not freed as long as the SSTable is
alive, and (b) the `_reclaimed.erase(*sst)` is still useful during
shutdown, for example, when the SSTable is not unlinked but just
destroyed.

Fixes https://github.com/scylladb/scylladb/issues/19722

Closes scylladb/scylladb#19717

* github.com:scylladb/scylladb:
  boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded
  sstables: do not reload components of unlinked sstables
  sstables/sstables_manager: introduce on_unlink method

(cherry picked from commit 591876b44e)

Backported from #19717 to 6.0

Closes scylladb/scylladb#19830
2024-07-23 23:16:53 +03:00
Botond Dénes
4f165eb3e9 sstables: introduce load_metadata()
Loads just the metadata components. No validation.
Split off from load(), to allow scylla-sstable to partially load an
sstable.

(cherry picked from commit 435c01d1e6)
2024-07-12 10:36:59 +00:00
Kefu Chai
1b2f10a4e7 sstables: use "me" sstable format by default
in 7952200c, we changed the `selected_format` from `mc` to `me`,
but to be backward compatible the cluster starts with "md", so
when the nodes in cluster agree on the "ME_SSTABLE_FORMAT" feature,
the format selector believes that the node is already using "ME",
which is specified by `_selected_format`. even it is actually still
using "md", which is specified by `sstable_manager::_format`, as
changed by 54d49c04. as explained above, it was specified to "md"
in hope to be backward compatible when upgrading from an existign
installation which might be still using "md". but after a second
thought, since we are able to read sstables persisted with older
formats, this concern is not valid.

in other words, 7952200c introduced a regression which changed the
"default" sstable format from `me` to `md`.

to address this, we just change `sstable_manager::_format` to "me",
so that all sstables are created using "me" format.

a test is added accordingly.

Fixes #18995
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 5a0d30f345)

Closes scylladb/scylladb#19422
2024-06-23 19:26:53 +03:00
Raphael S. Carvalho
37f1af2646 sstables: Allow to get free space from underlying storage
That will be used in turn to restrict reshape to 10% of available space
in underlying storage.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 51c7ee889e)
2024-06-20 20:41:41 +00:00
Kefu Chai
5db315930e sstables: fix a typo in comment: s/Mimicks/Mimics/
this typo was identified by the codespell workflow

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18781
2024-05-21 12:14:10 +03:00
Botond Dénes
e1c4e6c151 Merge 'sstables_manager: use maintenance scheduling group to run components reload fiber' from Lakshmi Narayanan Sreethar
PR https://github.com/scylladb/scylladb/pull/18186 introduced a fiber that reloads reclaimed bloom filters when memory becomes available. Use maintenance scheduling group to run that fiber instead of running it in the main scheduling group.

Fixes #18675

Closes scylladb/scylladb#18721

* github.com:scylladb/scylladb:
  sstables_manager: use maintenance scheduling group to run components reload fiber
  sstables_manager: add member to store maintenance scheduling group
2024-05-20 16:38:42 +03:00
Lakshmi Narayanan Sreethar
6f58768c46 sstables_manager: use maintenance scheduling group to run components reload fiber
Fixes #18675

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-19 15:23:45 +05:30
Lakshmi Narayanan Sreethar
79f6746298 sstables_manager: add member to store maintenance scheduling group
Store that maintenance scheduling group inside the sstables_manager. The
next patch will use this to run the components reloader fiber.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-19 15:23:45 +05:30
Raphael S. Carvalho
715ae689c0 Implement fast streaming for intra-node migration
With intra-node migration, all the movement is local, so we can make
streaming faster by just cloning the sstable set of leaving replica
and loading it into the pending one.

This cloning is underlying storage specific, but s3 doesn't support
snapshot() yet (th sstables::storage procedure which clone is built
upon). It's only supported by file system, with help of hard links.
A new generation is picked for new cloned sstable, and it will
live in the same directory as the original.

A challenge I bumped into was to understand why table refused to
load the sstable at pending replica, as it considered them foreign.
Later I realized that sharder (for reads) at this stage of migration
will point only to leaving replica. It didn't fail with mutation
based streaming, because the sstable writer considers the shard --
that the sstable was written into -- as its owner, regardless of what
sharder says. That was fixed by mimicking this behavior during
loading at pending.

test:
./test.py --mode=dev intranode --repeat=100 passes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
c9e6b4dca7 dht: split_range_to_single_shard: Work with static_sharder only
In preparation for intra-node tablet migration, to avoid
using deprecated sharder APIs.

This function is used for generating sstable sharding metadata.
For tablets, it is not invoked, so we can safely work with the
static sharder. The call site already passes static_sharder only.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
4d84451cf1 sstables, gdb: Track readers in a linked list
For the purpose of scylla-gdb.py command "scylla
active-sstables". Before the patch, readers were located by scanning
the heap for live objects with vtable pointers corresponding to
readers. It was observed that the test scylla_gdb/test_misc.py::test_active_sstables started failing like this:

  gdb.error: Error occurred in Python: Cannot access memory at address 0x300000000000000

This could be explained by there being a live object on the heap which
used to be a reader but now is a different object, and the _sst field
contains some other data which is not a pointer.

To fix, track readers explicitly in a linked list so that the gdb
script can reliably walk readers.

Fixes #18618.
2024-05-16 00:28:46 +02:00
Raphael S. Carvalho
0b2ec3063c sstables: Fix incremental_reader_selector (for range reads) with tablets
incremental_reader_selector is the mechanism for incremental comsumption
of disjoint sstables on range reads.

tablet_sstable_set was implemented, such that selector is efficient with
tablets.

The problem is selector is vnode addicted and will only consider a given
set exhausted when maximum token is reached.

With tablets, that means a range read on first tablet of a given shard
will also consume other tablets living in the same shard. That results
in combined reader having to work with empty sstable readers of tablets
that don't intersect with the range of the read. It won't cause extra
I/O because the underlying sstables don't intersect with the range of
the read. It's only unnecessary CPU work, as it involves creating
readers (= allocation), feeding them into combined reader, which will
in turn invoke the sstable readers only to realize they don't have any
data for that range.

With 100k tablets (ranges), and 100 tablets per shard, and ~5 sstables
per tablet, there will be this amount of readers (empty or not):
  (100k * ((100^2 + 100) / 2) * avg_sstable_per_tablet=5) = ~2.5 billions.

~5000 times more readers, it can be quite significant additional cpu
work, even though I/O dominates the most in scans. It's an inefficiency
that we rather get rid of.

The behavior can be observed from logs (there's 1 sstable for each of
4 tablets, but note how readers are created for every single one of
them when reading only 1 tablet range):
```
table - make_reader_v2 - range=(-inf, {-4611686018427387905, end}]
    incremental_reader_selector - create_new_readers(null): selecting on pos {minimum token, w=-1}
    sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._34qn42... that has range [{-9151620220812943033, start},{-4813568684827439727, end}]
    incremental_reader_selector - create_new_readers(null): selecting on pos {-4611686018427387904, w=-1}
    sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._368nk2... that has range [{-4599560452460784857, start},{-78043747517466964, end}]
    incremental_reader_selector - create_new_readers(null): selecting on pos {0, w=-1}
    sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._38lj42... that has range [{851021166589397842, start},{3516631334339266977, end}]
    incremental_reader_selector - create_new_readers(null): selecting on pos {4611686018427387904, w=-1}
    sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._3dba82... that has range [{5065088566032249228, start},{9215673076482556375, end}]
```

Fix is about making sure the tablet set won't select past the
supplied range of the read.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18556
2024-05-14 07:43:22 +03:00
Lakshmi Narayanan Sreethar
4d22c4b68b sstable_datafile_test: add testcase to test reclaim during reload
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 19:57:40 +05:30
Lakshmi Narayanan Sreethar
0b061194a7 sstables_manager: reload previously reclaimed components when memory is available
When an SSTable is dropped, the associated bloom filter gets discarded
from memory, bringing down the total memory consumption of bloom
filters. Any bloom filter that was previously reclaimed from memory due
to the total usage crossing the threshold, can now be reloaded back into
memory if the total usage can still stay below the threshold. Added
support to reload such reclaimed filters back into memory when memory
becomes available.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:49:22 +05:30
Lakshmi Narayanan Sreethar
f758d7b114 sstables_manager: start a fiber to reload components
Start a fiber that gets notified whenever an sstable gets deleted. The
fiber doesn't do anything yet but the following patch will add support
to reload reclaimed components if there is sufficient memory.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:49:22 +05:30
Lakshmi Narayanan Sreethar
54bb03cff8 sstables: support reloading reclaimed components
Added support to reload components from which memory was previously
reclaimed as the total memory of reclaimable components crossed a
threshold. The implementation is kept simple as only the bloom filters
are considered reclaimable for now.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
2340ab63c6 sstables_manager: add new intrusive set to track the reclaimed sstables
The new set holds the sstables from where the memory has been reclaimed
and is sorted in ascending order of the total memory reclaimed.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
140d8871e1 sstable: add link and comparator class to support new instrusive set
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
3ef2f79d14 sstable: renamed intrusive list link type
Renamed the intrusive list link type to differentiate it from the set
link type that will be added in an upcoming patch.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
02d272fdb3 sstable: track memory reclaimed from components per sstable
Added a member variable _total_memory_reclaimed to the sstable class
that tracks the total memory reclaimed from a sstable.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar
a53af1f878 sstable: rename local variable in sstable::total_reclaimable_memory_size
Renamed local variable in sstable::total_reclaimable_memory_size in
preparation for the next patch which adds a new member variable
_total_memory_reclaimed to the sstable class.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Botond Dénes
96a7ed7efb Merge 'sstables: add dead row count when issuing warning to system.large_partitions' from Ferenc Szili
This is the second half of the fix for issue #13968. The first half is already merged with PR #18346

Scylla issues warnings for partitions containing more rows than a configured threshold. The warning is issued by inserting a row into the `system.large_partitions` table. This row contains the information about the partition for which the warning is issued: keyspace, table, sstable, partition key and size, compaction time and the number of rows in the partition. A previous PR #18346 also added range tombstone count to this row.

This change adds a new counter for dead rows to the large_partitions table.

This change also adds cluster feature protection for writing into these new counters. This is needed in case a cluster is in the process of being upgraded to this new version, after which an upgraded node writes data with the new schema into `system.large_partitions`, and finally a node is then rolled back to an old version. This node will then revert the schema to the old version, but the written sstables will still contain data with the new counters, causing any readers of this table to throw errors when they encounter these cells.

This is an enhancement, and backporting is not needed.

Fixes #13968

Closes scylladb/scylladb#18458

* github.com:scylladb/scylladb:
  sstable: added test for counting dead rows
  sstable: added docs for system.large_partitions.dead_rows
  sstable: added cluster feature for dead rows and range tombstones
  sstable: write dead_rows count to system.large_partitions
  sstable: added counter for dead rows
2024-05-09 08:26:43 +03:00
Kefu Chai
0b0e661a85 build: bring abseil submodule back
because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689,
the rebuilt abseil package provided by fedora has different settings
than the ones if the tree is built with the sanitizer enabled. this
inconsistency leads to a crash.

to address this problem, we have to reinstate the abseil submodule, so
we can built it with the same compiler options with which we build the
tree.

in this change

* Revert "build: drop abseil submodule, replace with distribution abseil"
* update CMake building system with abseil header include settings
* bump up the abseil submodule to the latest LTS branch of abseil:
  lts_2024_01_16
* update scylla-gdb.py to adapt to the new structure of
  flat_hash_map

This reverts commit 8635d24424.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18511
2024-05-05 23:31:09 +03:00
Pavel Emelyanov
99f9807f15 sstables: Remove operator<<(std::ostream&, const deletion_time&)
It's completely unused, likely in favor of recently added formatter
for the type in question.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18502
2024-05-05 14:43:27 +03:00
Pavel Emelyanov
b8f9eeb82b sstable_directory: Remove _sstable_dir member
It's no longer in use.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-02 13:12:59 +03:00
Pavel Emelyanov
608762adda sstable_directory: Create sstable path with make_path() when logging
The sstable_directory::sstable_filename() should generate a name of an
sstable for log messages. It's not accurate, because it silently assumes
that the filename is on local storage, which might not be the case.
Fixing it is large chage, so for now replace _sstable_dir with explicit
call to make_path(). The change is idempotent, as _sstable_dir is
initialized with the result of make_path() call in constructor.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-02 13:12:59 +03:00
Pavel Emelyanov
07c1df575e sstable_directory: Use make_path to construct filesystem lister
The _sstable_dir is used currently, but it's initialized with
make_path() result anyway.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-02 13:12:59 +03:00
Pavel Emelyanov
ef98777b27 sstable_directory: Move some logging around
At the beginning of .process() method there's a log message which path
and which storage is being processed. That's not really nice, because,
e.g. filesystem lister may skip processing quarantine directory. Also,
the registry lister doesn't list entries by their _sstable_dir, but
rather by its _location (spoiler: dir = location / state).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-02 13:08:28 +03:00
Ferenc Szili
b06af5b2b9 sstable: write dead_rows count to system.large_partitions 2024-05-02 11:49:10 +02:00
Ferenc Szili
63e724c974 sstable: added counter for dead rows 2024-05-02 11:49:10 +02:00
Pavel Emelyanov
ba512c52a5 sstable_directory: Use sstable location to initialize registry lister
When populating sstables on boot a bunch of sstable_directory objects is
created. For each sstable there come three -- one for normal, quarantine
and staging state. Each is initialized with sstable location (which is
now a datadir/ks_name/cf_name-and-uuid) and the desired state (a enum
class). When created, the directory object wires up component lister,
depending on which storage options are provided. For local sstables a
legacy filesystem lister is created and it's initialized with a path
where to search files for -- location + / + string(state). But for s3
sstables, that keep their entries in registry, the lister is
errorneously initialized with the same location + / + string(state)
value. The mistake is that sstables in registry keep location and state
in different columns, so for any state lister should query registry with
the same location value (then it filters entries by state on its own).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-26 16:36:47 +03:00
Lakshmi Narayanan Sreethar
6af2659b57 sstables: reclaim_memory_from_components: do not update _recognised_components
When reclaiming memory from bloom filters, do not remove them from
_recognised_components, as that leads to the on-disk filter component
being left back on disk when the SSTable is deleted.

Fixes #18398

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#18400
2024-04-25 19:15:59 +03:00
Botond Dénes
ca26899c36 Merge 'sstable: large data handler needs to count range tombstones as rows' from Ferenc Szili
When issuing warnings about partitions with the number of rows above a configured threshold, the  large partitions handler does not take into consideration the number of range tombstone markers in the total rows count. This fix adds the number of range tombstone markers to the total number of rows and saves this total in system.large_partitions.rows (if it is above the threshold). It also adds a new column range_tombstones to the system.large_partitions table which only contains the number of range tombstone markers for the given partition.

This PR fixes the first part of issue #13968
It does not cover distinguishing between live and dead rows. A subsequent PR will handle that.

Closes scylladb/scylladb#18346

* github.com:scylladb/scylladb:
  sstables: add docs changes for system.large_partitions
  sstable: large data handler needs to count range tombstones as rows
2024-04-25 11:38:30 +03:00
Raphael S. Carvalho
fa2dc5aefa sstables: Fix use-after-move in an error path of FS-based sstable writer
```
sstables/storage.cc:152:21: warning: 'file_path' used after it was moved [bugprone-use-after-move]
        remove_file(file_path).get();
                    ^
sstables/storage.cc:145:64: note: move occurred here
    auto w = file_writer(output_stream<char>(std::move(sink)), std::move(file_path));

```

It's a regression when TOC is found for a new sstable, and we try to delete temporary TOC.

courtesy of clang-tidy.

Fixes #18323.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18367
2024-04-23 17:19:55 +03:00
Botond Dénes
5a1e3b25d0 Merge 'Sanitize sstables::directory_semaphore usage' from Pavel Emelyanov
The semaphore in question is used to limit parallelism of manipulations with table's sstables. It's currently used in two places -- sstable_directory (mainly on boot) and by table::take_snapshot() to take snapshot. For the latter, there's also a database -> sharded<directory_semaphore> reference.

This PR sanitizes the semaphore usage. The results are
- directory_semaphore no longer needs to friend several classes that mess with its internals
- database no longer references directory_semaphore

Closes scylladb/scylladb#18281

* github.com:scylladb/scylladb:
  database: Keep local directory_semaphore to initialize sstables managers
  database: Don't reference directory_semaphore
  table: Use directory semaphore from sstables manager
  table: Indentation fix after previous patch
  table: Use directory_semaphore for rate-limited snapshot taking
  sstables: Move directory_semaphore::parallel_for_each() to header
  sstables: Move parallel_for_each_restricted to directory_semaphore
  table: Use smp::all_cpus() to iterate over all CPUs locally
2024-04-23 13:54:52 +03:00
Lakshmi Narayanan Sreethar
de6570e1ec serializer_impl, sstables: fix build failure due to missing includes
When building scylla with cmake, it fails due to missing includes in
serializer_impl.hh and sstables/compress.hh files. Fix that by adding
the appropriate include files.

Fixes #18343

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#18344
2024-04-23 12:03:51 +03:00
Ferenc Szili
98bec4e02a sstable: large data handler needs to count range tombstones as rows
When issuing warnings about partitions with the number of rows above a configured threshold,
the  large partitions handler does not take into consideration the number of range tombstone
markers in the total rows count. This fix adds the number of range tombstone markers to the
total number of rows and saves this total in system.large_partitions.rows (if it is above
the threshold). It also adds a new column range_tombstones to the system.large_partitions
table which only contains the number of range tombstone markers for the given partition.

This PR fixes the first part of issue #13968
It does not cover distinguishing between live and dead rows. A subsequent PR will handle that.
2024-04-22 15:24:18 +02:00
Kefu Chai
372a4d1b79 treewide: do not define FMT_DEPRECATED_OSTREAM
since we do not rely on FMT_DEPRECATED_OSTREAM to define the
fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`.

in this change,

* utils: drop the range formatters in to_string.hh and to_string.c, as
  we don't use them anymore. and the tests for them in
  test/boost/string_format_test.cc are removed accordingly.
* utils: use fmt to print chunk_vector and small_vector. as
  we are not able to print the elements using operator<< anymore
  after switching to {fmt} formatters.
* test/boost: specialize fmt::details::is_std_string_like<bytes>
  due to a bug in {fmt} v9, {fmt} fails to format a range whose
  element type is `basic_sstring<uint8_t>`, as it considers it
  as a string-like type, but `basic_sstring<uint8_t>`'s char type
  is signed char, not char. this issue does not exist in {fmt} v10,
  so, in this change, we add a workaround to explicitly specialize
  the type trait to assure that {fmt} format this type using its
  `fmt::formatter` specialization instead of trying to format it
  as a string. also, {fmt}'s generic ranges formatter calls the
  pair formatter's `set_brackets()` and `set_separator()` methods
  when printing the range, but operator<< based formatter does not
  provide these method, we have to include this change in the change
  switching to {fmt}, otherwise the change specializing
  `fmt::details::is_std_string_like<bytes>` won't compile.
* test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends
  for comparing values. but without the operator<< based formatters,
  Boost.Test would not be able to print them. after removing
  the homebrew formatters, we need to use the generic
  `boost_test_print_type()` helper to do this job. so we are
  including `test_utils.hh` in tests so that we can print
  the formattable types.
* treewide: add "#include "utils/to_string.hh" where
  `fmt::formatter<optional<>>` is used.
* configure.py: do not define FMT_DEPRECATED_OSTREAM
* cmake: do not define FMT_DEPRECATED_OSTREAM

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:57:36 +08:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Pavel Emelyanov
2fced3c557 table: Use directory_semaphore for rate-limited snapshot taking
The table::take_snapshot() limits its parallelizm with the help of
direcoty semaphore already, but implements it "by hand". There's already
parallel_for_each() method on the dir.sem. class that does exactly that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 13:53:57 +03:00
Pavel Emelyanov
6514c67fae sstables: Move directory_semaphore::parallel_for_each() to header
It's a template and in order to use it in other .cc files it's more
convenient to move it into a header file

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 13:53:57 +03:00
Pavel Emelyanov
ad1a9d4c11 sstables: Move parallel_for_each_restricted to directory_semaphore
In order not to make sstable_directory mess with private members of this
class. Next patch will also make use of this new method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 13:53:57 +03:00
Kefu Chai
168ade72f8 treewide: replace formatter<std::string_view> with formatter<string_view>
in in {fmt} before v10, it provides the specialization of `fmt::formatter<..>`
for `std::string_view` as well as the specialization of `fmt::formatter<..>`
for `fmt::string_view` which is an implementation builtin in {fmt} for
compatibility of pre-C++17. and this type is used even if the code is
compiled with C++ stadandard greater or equal to C++17. also, before v10,
the `fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. after v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using `format_as()` machinery, so it's
`format()` method does not actually accept `std::string_view`, it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.

this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would have following compilation failure:

```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
  254 |     return formatter<std::string_view>::format(it->second, ctx);
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
 2759 |   FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
      |                      ^      ~~~~~~~~~~~~
```

because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::format<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood -- we do not
use the specialization created by `FMT_FORMAT_AS` which inherit
from `formatter<fmt::string_view>`, hopefully this can improve
the compilation speed a little bit. also, this change addresses
the build failure with {fmt} v10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18299
2024-04-19 07:44:07 +03:00
Pavel Emelyanov
c60639d582 sstables: Coroutinize drop_caches() method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18220
2024-04-15 17:22:59 +03:00
Lakshmi Narayanan Sreethar
d261f0fbea sstable_datafile_test: add tests to verify auto reclamation of components
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30
Lakshmi Narayanan Sreethar
a36965c474 sstables_manager: support reclaiming memory from components
Reclaim memory from the SSTable that has the most reclaimable memory if
the total reclaimable memory has crossed the threshold. Only the bloom
filter memory is considered reclaimable for now.

Fixes #17747

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-04-02 01:37:47 +05:30