211 Commits

Author SHA1 Message Date
Botond Dénes
674d41e3e6 readers/mutation_source: s/make_reader_v2/make_mutation_reader/ 2025-05-09 07:53:29 -04:00
Botond Dénes
7af0690762 mutation/mutation_compactor: drop v2 from compactor and related names 2025-05-09 07:53:29 -04:00
Botond Dénes
c29c696780 readers: mv from_mutations_v2.hh from_mutations.hh
Completely mechanical change.
2025-04-16 04:46:08 -04:00
Botond Dénes
b104862702 tree: s/make_mutation_reader_from_mutations_v2/make_mutation_reader_from_mutations/s
Completely mechanical change.
2025-04-16 04:46:07 -04:00
Botond Dénes
d67202972a mutation/frozen_mutation: frozen_mutation_consumer_adaptor: fix end-of-partition handling
This adaptor adapts a mutation reader pausable consumer to the frozen
mutation visitor interface. The pausable consumer protocol allows the
consumer to skip the remaining parts of the partition and resume the
consumption with the next one. To do this, the consumer just has to
return stop_iteration::yes from one of the consume() overloads for
clustering elements, then return stop_iteration::no from
consume_end_of_partition(). Due to a bug in the adaptor, this sequence
leads to terminating the consumption completely -- so any remaining
partitions are also skipped.

This protocol implementation bug has user-visible effects, when the
only user of the adaptor -- read repair -- happens during a query which
has limitations on the amount of content in each partition.
There are two such queries: select distinct ... and select ... with
partition limit. When converting the repaired mutation to a query
result, these queries will trigger the skip sequence in the consumer and
due to the above described bug, will skip the remaining partitions in
the results, omitting these from the final query result.

This patch fixes the protocol bug, the return value of the underlying
consumer's consume_end_of_partition() is now respected.
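
The skip sequence described above can be sketched as follows. This is a simplified model, not the actual adaptor: `stop_iteration` is a stand-in enum, and `partition`, `one_row_per_partition_consumer` and the free function `consume_pausable()` are hypothetical names used only for illustration.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for seastar::stop_iteration (assumption, not the real type).
enum class stop_iteration { no, yes };

// Hypothetical partition: a key plus some clustering rows.
struct partition {
    std::string key;
    std::vector<int> rows;
};

// Hypothetical consumer that wants at most one row per partition,
// similar in spirit to SELECT DISTINCT or a per-partition limit.
struct one_row_per_partition_consumer {
    std::vector<std::string> seen;

    void consume_new_partition(const std::string& key) { seen.push_back(key); }
    // yes: skip the rest of the current partition.
    stop_iteration consume(int) { return stop_iteration::yes; }
    // no: continue with the next partition.
    stop_iteration consume_end_of_partition() { return stop_iteration::no; }
};

// Drives the consumer. A stop requested by consume() is scoped to the current
// partition; only consume_end_of_partition() can end the whole stream.
template <typename Consumer>
void consume_pausable(const std::vector<partition>& partitions, Consumer& c) {
    for (const auto& p : partitions) {
        c.consume_new_partition(p.key);
        for (int row : p.rows) {
            if (c.consume(row) == stop_iteration::yes) {
                break; // skip the remaining rows of this partition only
            }
        }
        if (c.consume_end_of_partition() == stop_iteration::yes) {
            return; // the consumer asked to stop the stream
        }
    }
}
```

In this model the bug would correspond to returning from the outer loop as soon as `consume()` returns `stop_iteration::yes`, which drops every later partition.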

A unit test is also added which reproduces the problem both with select
distinct ... and select ... per partition limit.

Follow-up work:
* frozen_mutation_consumer_adaptor::on_end_of_partition() calls the
  underlying consumer's on_end_of_stream(), so when consuming multiple
  frozen mutations, the underlying's on_end_of_stream() is called for
  each partition. This is incorrect but benign.
* Improve documentation of mutation_reader::consume_pausable().

Fixes: #20084

Closes scylladb/scylladb#23657
2025-04-10 13:19:57 +03:00
Botond Dénes
df09b3f970 replica/mutation_dump: don't assume cells are live
Currently the dumper unconditionally extracts the value of atomic cells,
assuming they are live. This doesn't always hold of course and
attempting to get the value of a dead cell will lead to marshalling
errors. Fix by checking is_live() before attempting to get the cell
value. Fix for both regular and collection cells.
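
A minimal sketch of the check, assuming a simplified cell type. `atomic_cell` here is a hypothetical struct and `dump_value()` is an illustrative helper, not the dumper's real code.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified cell: a dead cell carries no usable value.
struct atomic_cell {
    bool live;
    std::string data;

    bool is_live() const { return live; }
    // Only meaningful for live cells.
    const std::string& value() const { return data; }
};

// Return the value only for live cells; dead cells yield std::nullopt
// instead of attempting to read a value that is not there.
std::optional<std::string> dump_value(const atomic_cell& c) {
    if (!c.is_live()) {
        return std::nullopt;
    }
    return c.value();
}
```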
2025-04-08 00:11:36 -04:00
Botond Dénes
c2518cdf1a mutation/mutation_compactor: copy key passed-in to consume_new_partition()
This doesn't introduce additional work for single-partition queries: the
key is copied anyway on consume_end_of_stream().
Multi-partition reads and compaction are not that sensitive to
the additional copy.

This change fixes a bug in the compacting_reader: currently the reader
passes _last_uncompacted_partition_start.key() to the compactor's
consume_new_partition(). When the compactor emits enough content for this
partition, _last_uncompacted_partition_start is moved from to emit the
partition start, which leaves the key reference passed to the compactor
corrupt (it refers to a moved-from value). This in turn means that subsequent
GC checks done by the compactor will be done with a corrupt key and
therefore can result in tombstones being garbage-collected while they
still cover data elsewhere (data resurrection).
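
The effect of copying the key can be illustrated with a reduced model. The `compactor` class and `key_after_emit()` below are hypothetical, and `std::string` stands in for the real key type.

```cpp
#include <string>
#include <utility>

// Hypothetical compactor that keeps its own copy of the partition key
// rather than a reference to the caller's object.
class compactor {
    std::string _key; // owned copy
public:
    void consume_new_partition(const std::string& key) { _key = key; }
    const std::string& current_key() const { return _key; }
};

// Mimics the reader: hand the key to the compactor, then move the
// partition start away in order to emit it downstream.
std::string key_after_emit() {
    std::string partition_start_key = "pk1";
    compactor c;
    c.consume_new_partition(partition_start_key);
    std::string emitted = std::move(partition_start_key); // source is now moved-from
    (void)emitted;
    return c.current_key(); // unaffected by the move: the compactor owns a copy
}
```

Had the compactor stored a reference or pointer to `partition_start_key` instead, later reads would observe the moved-from object.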

The compacting reader is violating the API contract and normally the bug
should be fixed there. We make an exception here because doing the fix
in the mutation compactor better aligns with our future plans:
* The fix simplifies the compactor (gets rid of _last_dk).
* Prepares the way to get rid of the consume API used by the compactor.
2025-04-08 00:11:35 -04:00
Botond Dénes
a2d0d7b9a0 mutation: fold FragmentConsumer[V2] into FlattenedConsumer[V2]
FragmentConsumer[V2] also has no direct users, so fold it into
FlattenedConsumer[V2] as well. With this, FlattenedConsumer[V2] has a
nice and simple definition, with a single nesting level required due to
the return-type flexibility.
2025-03-18 09:24:49 -04:00
Botond Dénes
8768e2e08e mutation: fold StreamedMutationConsumer[V2] into FlattenedConsumer[V2]
No code uses StreamedMutationConsumer[V2] directly, so let's take this
opportunity to reduce the jungle of consumer concepts.
2025-03-18 09:24:44 -04:00
Kefu Chai
a483ff8647 mutation: replace boost::upper_bound with std::ranges::upper_bound
Reduces dependencies on boost/range.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23119
2025-03-04 10:36:57 +03:00
Kefu Chai
6e4cb20a69 tree: implement boost::accumulate with std::ranges library
Replace boost::accumulate() calls with std::ranges facilities. This
change reduces external dependencies and modernizes the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23062
2025-02-26 23:22:02 +02:00
Kefu Chai
6e4df57f97 mutation,test: replace boost::equal with std::ranges::equal
to reduce third-party dependencies and modernize the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22999
2025-02-26 14:27:42 +03:00
Kefu Chai
3cf0f71420 query-result-writer: reorder initialization to prevent use-after-move
Reorder member variable initialization sequence to ensure `pw` is accessed
before being moved. While the current use-after-move warning from clang-tidy
is a false positive, this change:
- Makes the initialization order more logical
- Eliminates misleading static analysis warnings
- Prevents potential future issues if class structure changes
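
The ordering concern can be sketched with a reduced example. `partition_writer` and `row_writer` are hypothetical types, not the actual query-result-writer classes; the point is only that members are initialized in declaration order.

```cpp
#include <memory>
#include <utility>

struct partition_writer { int id; }; // hypothetical

class row_writer {
    // Members are initialized in declaration order, regardless of the order
    // written in the constructor's initializer list.
    int _id;                               // reads from pw first
    std::unique_ptr<partition_writer> _pw; // then takes ownership of pw
public:
    explicit row_writer(std::unique_ptr<partition_writer> pw)
        : _id(pw->id)        // pw is still valid here
        , _pw(std::move(pw)) // moved only after the read above
    {}
    int id() const { return _id; }
    bool owns_writer() const { return _pw != nullptr; }
};
```

Declaring `_pw` before `_id` would move `pw` first and then dereference a null pointer, which is the pattern the reordering rules out.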

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22830
2025-02-17 13:45:35 +03:00
Li Bo
de8de50fb9 Remove redundant code in mutation_partition.cc
Use the defined `cdef` variable.

Closes scylladb/scylladb#22048
2025-02-15 20:32:22 +02:00
Nadav Har'El
bc7b5926d2 mv: support regular_column_transformation key columns in view
In an earlier patch, we introduced regular_column_transformation,
a new type of computed column that does a computation on a cell in
regular column in the base and returns a potentially transformed cell
(value or deletion, timestamp and ttl). In *this* patch, we wire the
materialized view code to support this new kind of computed column that
is usable as a materialized-view key column. This new type of computed
column is not yet used in this patch - this will come in the next
patch, where we will use it for Alternator GSIs.

Before this patch, the logic of deciding when the view update needs
to create a new row or delete a new one, and which timestamp and ttl
to give to the new row, could depend on one (or two - in Alternator)
cells read from base-table regular columns. In this patch, this logic
is rewritten - the notion of "base table regular columns" is generalized
to the notion of "updatable view key columns" - these are view key
columns that an update may change - because they really are base regular
columns, or a computed function of one (regular_column_transformation).

In some sense, the new code is easier to understand - there is no longer
a separate "compute_row_marker()" function, rather the top-level
generate_update() is now in charge of finding the "updatable view key
columns" and calculate the row marker (timestamp and ttl) as part
of deciding what needs to be done.

But unfortunately the code still has separate code paths for "collection
secondary indexing", and also for old-style column_computation (basically,
only token_column_computation). Perhaps in the future this can be further
simplified.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-02-06 09:59:49 +01:00
Ran Regev
edd56a2c1c moved cache files to db
As requested in #22097, moved the files
and fixed other includes and build system.

Fixes: #22097
Signed-off-by: Ran Regev <ran.regev@scylladb.com>

Closes scylladb/scylladb#22495
2025-02-04 12:21:31 +03:00
Kefu Chai
7215d4bfe9 utils: do not include unused headers
these unused includes were identified by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.

please note, because quite a few source files relied on
`utils/to_string.hh` to pull in the specialization of
`fmt::formatter<std::optional<T>>`, after removing
`#include <fmt/std.h>` from `utils/to_string.hh`, we have to
include `fmt/std.h` directly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-14 07:56:39 -05:00
Kefu Chai
353b522ca0 treewide: migrate from boost::adaptors::reversed to std::views::reverse
now that we are allowed to use C++23. we now have the luxury of using
`std::views::reverse`.

- replace `boost::adaptors::reversed` with `std::views::reverse`
- remove unused `#include <boost/range/adaptor/reversed.hpp>`

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-07 13:22:00 +02:00
Kefu Chai
f1a0613a39 mutation: remove unused function
`prefixed()` is a static function in `mutation_partition_v2.cc`.
and this function is not used in this translation unit. so let's
remove it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22006
2024-12-20 16:12:10 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Kefu Chai
ce2f80c227 treewide: migrate from boost::make_iterator_range to ranges::subrange
Replace boost::make_iterator_range() with std::ranges::subrange.

This change improves code modernization and reduces external dependencies:

- Replace boost::make_iterator_range() with std::ranges::subrange
- Remove boost/range/iterator_range.hpp include
- Improve iterator type detection in interval.hh using std::ranges::const_iterator_t<Range>

This is part of ongoing efforts to modernize our codebase and minimize
external dependencies.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21787
2024-12-09 21:31:53 +02:00
Kefu Chai
48c8d24345 treewide: drop support for fmt < v10
since fedora 38 is EOL. and fedora 39 comes with fmt v10.0.0, also,
we've switched to the build image based on fedora 40, which ships
fmt-devel v10.2.1, there is no need to support fmt < 10.

in this change, we drop the support for fmt < 10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21847
2024-12-09 20:42:38 +02:00
Emil Maskovsky
8191e57036 treewide: fix annotations reported by GH checks
Clean up the unnecessary includes reported by the GitHub checks that are
polluting the PR diffs.

The "utils/assert.hh" report should be actually fixed by the #21739, but
as the usage of `SEASTAR_ASSERT()` is protected by the `SEASTAR_DEBUG`
check it makes sense to include the header conditionally as well.

Closes scylladb/scylladb#21817
2024-12-09 13:44:12 +03:00
Kefu Chai
61ae4a1c86 mutation: remove unused "#include"s
This commit follows up on commit f436edfa22, which initially cleaned up
unused #include directives in the "mutation" subdirectory. This change
removes additional unused header files that were missed in the previous
cleanup.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21740
2024-12-04 15:36:33 +03:00
Benny Halevy
d5d4307a20 scylla-sstable: dump-summary: print also first and last tokens
To help scylla-manager restore to map sstables to
nodes or tablets, print also the tokens of the
sstable first and last keys.

For example, the json output will now look like this:
```
$ build/dev/scylla sstable dump-summary /tmp/scylla-344593/data/ks/t-52a92590afd011ef9b68ba86378ed63b/me-3glp_0tm9_00uv52doobo0bvk2t7-big-Data.db | jq
{
  "sstables": {
    "/tmp/scylla-344593/data/ks/t-52a92590afd011ef9b68ba86378ed63b/me-3glp_0tm9_00uv52doobo0bvk2t7-big-Data.db": {
      "header": {
        "min_index_interval": 128,
        "size": 1,
        "memory_size": 16,
        "sampling_level": 128,
        "size_at_full_sampling": 0
      },
      "positions": [
        4
      ],
      "entries": [
        {
          "key": {
            "token": "2008715943680221220",
            "raw": "000400000064",
            "value": "100"
          },
          "position": 0
        }
      ],
      "first_key": {
        "token": "2008715943680221220",
        "raw": "000400000064",
        "value": "100"
      },
      "last_key": {
        "token": "9010454139840013625",
        "raw": "000400000003",
        "value": "3"
      }
    }
  }
}
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#21735
2024-12-04 10:16:13 +02:00
Kefu Chai
bab12e3a98 treewide: migrate from boost::adaptors::transformed to std::views::transform
now that we are allowed to use C++23. we now have the luxury of using
`std::views::transform`.

in this change, we:

- replace `boost::adaptors::transformed` with `std::views::transform`
- use `fmt::join()` when appropriate where `boost::algorithm::join()`
  is not applicable to a range view returned by `std::views::transform`.
- use `std::ranges::fold_left()` to accumulate the range returned by
  `std::views::transform`
- use `std::ranges::fold_left()` to get the maximum element in the
  range returned by `std::views::transform`
- use `std::ranges::min()` to get the minimal element in the range
  returned by `std::views::transform`
- use `std::ranges::equal()` to compare the range views returned
  by `std::views::transform`
- remove unused `#include <boost/range/adaptor/transformed.hpp>`
- use `std::ranges::subrange()` instead of `boost::make_iterator_range()`,
  to feed `std::views::transform()` a view range.

to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

limitations:

there are still a couple places where we are still using
`boost::adaptors::transformed` due to the lack of a C++23 alternative
for `boost::join()` and `boost::adaptors::uniqued`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21700
2024-12-03 09:41:32 +02:00
Kefu Chai
f436edfa22 mutation: remove unused "#include"s
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

please note, because `mutation/mutation.hh` does not include
`seastar/coroutine/maybe_yield.hh` anymore, and quite a few source
files were relying on this header to bring in the declaration of
`maybe_yield()`, we have to include this header in the places where
this symbol is used. the same applies to `seastar/core/when_all.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-29 14:01:44 +08:00
Avi Kivity
1c26c8deeb mutation: mutation_partition_v2.hh: switch from boost ranges to std ranges
Consolidate on one range solution. Fallout in mutation_partition_v2.cc
and row_cache_test.cc due to interoperability problems is adjusted.
2024-11-15 14:36:28 +02:00
Avi Kivity
de822d3a46 mutation: mutation_partition.hh: switch from boost ranges to std ranges
Consolidate on one range solution. Fallout in mutation_partition.cc
due to interoperability problems is adjusted.
2024-11-15 14:09:31 +02:00
Kefu Chai
00810e6a01 treewide: include seastar/core/format.hh instead of seastar/core/print.hh
The latter includes the former and in addition to `seastar::format()`,
`print.hh` also provides helpers like `seastar::fprint()` and
`seastar::print()`, which are deprecated and not used by scylladb.

Previously, we include `seastar/core/print.hh` for using
`seastar::format()`. and in seastar 5b04939e, we extracted
`seastar::format()` into `seastar/core/format.hh`. this allows us
to include a much smaller header.

In this change, we just include `seastar/core/format.hh` in place of
`seastar/core/print.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21574
2024-11-14 17:45:07 +02:00
Michał Chojnowski
35921eb67e mvcc_test: fix a benign failure of test_apply_to_incomplete_respects_continuity
For performance reasons, mutation_partition_v2::maybe_drop(), and by extension
also mutation_partition_v2::apply_monotonically(mutation_partition_v2&&)
can evict empty row entries, and hence change the continuity of the merged
entry.

For checking that apply_to_incomplete respects continuity,
test_apply_to_incomplete_respects_continuity obtains the continuity of
the partition entry before and after apply_to_incomplete by calling
e.squashed().get_continuity(). But squashed() uses apply_monotonically(),
so in some circumstances the result of squashed() can have smaller
continuity than the argument of squashed(), which messes with the thing
that the test is trying to check, and causes spurious failures.

This patch changes the method of calculating the continuity set,
so that it matches the entry exactly, fixing the test failures.

Fixes scylladb/scylladb#13757

Closes scylladb/scylladb#21459
2024-11-08 06:08:39 +01:00
Avi Kivity
ee92784098 serialization: replace boost::type with std::type_identity
Recently, seastar rpc started accepting std::type_identity in addition
to boost::type as a type marker (while labeling the latter with an
ominous deprecation warning). Reduce our dependency on boost
by switching to std::type_identity.
2024-11-05 00:43:27 +01:00
Avi Kivity
2531dc2d80 schema_registry: stop including replica/database.hh
database.hh is a hotspot that changes often (or its dependencies
do). Avoid including it to reduce recompilations.

Closes scylladb/scylladb#21407
2024-11-04 13:16:27 +01:00
Avi Kivity
907da210b6 compound_compat: replace use of boost ranges with std ranges
To reduce the dependency load, replace use of boost ranges
with the std equivalent.

Files that lost the indirect boost dependency have it added as a
direct dependency.
2024-10-30 19:58:07 +02:00
Kefu Chai
6ead5a4696 treewide: move log.hh into utils/log.hh
the log.hh under the root of the tree was created to keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used by all subsystems.

in this change, we move log.hh into utils/log.hh so that it is more
modularized. and this also improves the readability, when one sees
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-10-22 06:54:46 +03:00
Avi Kivity
d12ba753e0 utils/unconst, mutation_partition: switch to ranges
unconst is a small helper that converts a const iterator to a non-const
iterator with the help of the container. Currently it is using the
boost iterator/range libraries.

Convert it to <ranges> as part of an effort to standardize on a single
range library. Its only user in mutation_partition is converted as well.
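
One common way to write such a helper is sketched below, on the assumption that the container's `erase()` accepts const iterators, as the standard containers do. This is an illustration of the idea and not necessarily the code in utils/unconst.

```cpp
#include <iterator>
#include <vector>

// Erasing the empty range [it, it) removes nothing, but erase() returns
// a non-const iterator to the same position.
template <typename Container>
typename Container::iterator unconst(Container& c, typename Container::const_iterator it) {
    return c.erase(it, it);
}
```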

Due to more interoperability problems between <ranges> and boost, some
calls to boost::adaptors::reversed have to be converted as well.
2024-10-07 17:30:12 +03:00
Avi Kivity
e99426df60 treewide: de-static namespace scope functions in headers
'static inline' is always wrong in headers - if the same header is
included multiple times, and the function happens not to be inlined,
then multiple copies of it will be generated.

Fix by mechanically changing '^static inline' to 'inline'.
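
A header-style sketch of the difference, with a hypothetical function `next_id()`:

```cpp
// Hypothetical header content.

// With plain `inline`, every translation unit that includes this header refers
// to the same function, and therefore to the same static local `id`.
// With `static inline`, each translation unit would get its own private copy
// of the function and its own `id`.
inline int next_id() {
    static int id = 0;
    return ++id;
}
```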
2024-10-01 14:02:50 +03:00
Tomasz Grabiec
adf99402c5 Merge 'readers/flat_mutation_reader_v2: call set_close_required() from consume*()' from Botond Dénes
The `consume*()` variants just forward the call to the `_impl` method with the same name. The latter, being a member of `::impl`, will bypass the top level `fill_buffer()`, etc. methods and thus will never call `set_close_required()`. Do this in the top-level `consume*()` methods instead, to ensure a reader, on which only `consume*()` is called, and then is destroyed, will complain as it should (and abort).
Only one place was found in core code, which didn't close the reader: `split_mutation()` in `mutation/mutation.cc` and this reader is the "from-mutation" one which has no real close routine. All other places were in tests. All this is to say, there were no real bugs uncovered by this PR.

Fixes #16520

Improvement, no backport required.

Closes scylladb/scylladb#16522

* github.com:scylladb/scylladb:
  readers/flat_mutation_reader_v2: call set_close_required() from consume*()
  test/boost/sstable_compaction_test: close reader after use
  test/boost/repair_test: close reader after use
  mutation/mutation: split_mutation(): close reader after use
2024-09-17 13:21:34 +02:00
Botond Dénes
1a11f9cf95 mutation/mutation: split_mutation(): close reader after use 2024-09-13 06:52:26 -04:00
Botond Dénes
c7c5817808 Merge 'Improve timestamp heuristics for tombstone garbage collection' from Benny Halevy
When purging regular tombstone consult the min_live_timestamp, if available.
This is safe since we don't need to protect dead data from resurrection, as it is already dead.

For shadowable_tombstones, consult the min_memtable_live_row_marker_timestamp,
if available, otherwise fallback to the min_live_timestamp.

If we see in a view table a shadowable tombstone with time T, then in any row where the row marker's timestamp is higher than T the shadowable tombstone is completely ignored and it doesn't hide any data in any column, so the shadowable tombstone can be safely purged without any effect or risk resurrecting any deleted data.

In other words, rows which might cause problems for purging a shadowable tombstone with time T are rows with row markers older or equal T. So to know if a whole sstable can cause problems for shadowable tombstone of time T, we need to check if the sstable's oldest row marker (and not oldest column) is older or equal T. And the same check applies similarly to the memtable.

If both extended timestamp statistics are missing, fallback to the legacy (and inaccurate) min_timestamp.
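
The fallback order described above can be sketched as follows. The struct and function names are hypothetical, and the real code combines statistics from memtables and sstables, which this reduced model leaves out.

```cpp
#include <cstdint>
#include <optional>

using timestamp_t = std::int64_t;

// Hypothetical container for the statistics named in the description.
struct timestamp_stats {
    timestamp_t min_timestamp;                                // legacy: live and dead data
    std::optional<timestamp_t> min_live_timestamp;            // live data only
    std::optional<timestamp_t> min_live_row_marker_timestamp; // live row markers only
};

// Pick the timestamp to compare a tombstone against, most precise first.
timestamp_t purge_bound(const timestamp_stats& s, bool is_shadowable) {
    if (is_shadowable && s.min_live_row_marker_timestamp) {
        return *s.min_live_row_marker_timestamp;
    }
    if (s.min_live_timestamp) {
        return *s.min_live_timestamp;
    }
    return s.min_timestamp; // legacy fallback, less accurate
}
```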

Fixes scylladb/scylladb#20423
Fixes scylladb/scylladb#20424

> [!NOTE]
> no backport needed at this time
> We may consider backport later on after given some soak time in master/enterprise
> since we do see tombstone accumulation in the field under some materialized views workloads

Closes scylladb/scylladb#20446

* github.com:scylladb/scylladb:
  cql-pytest: add test_compaction_tombstone_gc
  sstable_compaction_test: add mv_tombstone_purge_test
  sstable_compaction_test: tombstone_purge_test: test that old deleted data do not inhibit tombstone garbage collection
  sstable_compaction_test: tombstone_purge_test: add testlog debugging
  sstable_compaction_test: tombstone_purge_test: make_expiring: use next_timestamp
  sstable, compaction: add debug logging for extended min timestamp stats
  compaction: get_max_purgeable_timestamp: use memtable and sstable extended timestamp stats
  compaction: define max_purgeable_fn
  tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh
  sstables: scylla_metadata: add ext_timestamp_stats
  compaction_group, storage_group, table_state: add extended timestamp stats getters
  sstables, memtable: track live timestamps
  memtable_encoding_stats_collector: update row_marker: do nothing if missing
2024-09-13 08:56:51 +03:00
Kefu Chai
3e84d43f93 treewide: use seastar::format() or fmt::format() explicitly
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADL for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the candidates.

that is what is happening in scylladb in quite a few caller sites of
`format()`, hence ADL is not able to tell which function is the winner
in the name lookup:

```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
  265 |     return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
      |            ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
 4290 |     format(format_string<_Args...> __fmt, _Args&&... __args)
      |     ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
  143 | format(fmt::format_string<A...> fmt, A&&... a) {
      | ^
```

in this change, we

change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
  `seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
  because, `sstring::operator std::basic_string` would incur a deep
  copy.
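
A reduced analogue of the ambiguity and of the fix. The namespaces `stdlike` and `sealike` and everything inside them are hypothetical stand-ins, not the real `std` or `seastar` APIs.

```cpp
#include <string>

namespace stdlike {
struct string_view_like {};
// Found through ADL whenever an argument's type lives in `stdlike`.
template <typename... Args>
std::string format(const char* fmt, Args&&...) { return std::string("stdlike:") + fmt; }
}

namespace sealike {
// Found through the using-directive below.
template <typename... Args>
std::string format(const char* fmt, Args&&...) { return std::string("sealike:") + fmt; }
}

using namespace sealike;

std::string describe(const stdlike::string_view_like& name) {
    // return format("{}", name);       // error: call to 'format' is ambiguous
    return sealike::format("{}", name); // qualification disables ADL and picks one
}
```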

we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to minimize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-11 23:21:40 +03:00
Benny Halevy
5849ba83e0 sstable, compaction: add debug logging for extended min timestamp stats
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:57 +03:00
Benny Halevy
7d893a5ed9 compaction: get_max_purgeable_timestamp: use memtable and sstable extended timestamp stats
When purging regular tombstone consult the min_live_timestamp,
if available.

For shadowable_tombstones, consult the
min_memtable_live_row_marker_timestamp,
if available, otherwise fallback to the min_live_timestamp.

If both are missing, fallback to the legacy
(and inaccurate) min_timestamp.

Fixes scylladb/scylladb#20423
Fixes scylladb/scylladb#20424

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:57 +03:00
Benny Halevy
57e9e9c369 compaction: define max_purgeable_fn
Before we add a new, is_shadowable, parameter to it.

And define global `can_always_purge` and `can_never_purge`
functions, a-la `always_gc` and `never_gc`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:57 +03:00
Benny Halevy
b6fabd98c6 tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh
And define `never_gc` globally, same as `always_gc`

Before adding a new, is_shadowable parameter to it.

Since it is used in the context of compaction
it better fits compaction_garbage_collector header
rather than tombstone.hh

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:57 +03:00
Benny Halevy
14d86a3a12 sstables, memtable: track live timestamps
When garbage collecting tombstones, we care only
about shadowing of live data.  However, currently
we track min/max timestamp of both live and dead
data, but there is no problem with purging tombstones
that shadow dead data (expired or shadowed by other
tombstones in the sstable/memtable).

Also, for shadowable tombstones, we track live row marker timestamps
separately since, if the live row marker timestamp is greater than
a shadowable tombstone timestamp, then the row marker
would shadow the shadowable tombstone thus exposing the cells
in that row, even if their timestamp may be smaller
than the shadow tombstone's.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-09-10 19:05:49 +03:00
Łukasz Paszkowski
ba2f037af5 mutation_partition: drop reverse parameter in compact_for_query
The reverse parameter is no longer used with native reverse reads.
The row ranges are provided in native reverse order together with
a reversed schema, thus the reverse parameter remains false all the
time and can be dropped.
2024-08-13 10:07:12 +02:00
Łukasz Paszkowski
8b5ec0e963 streamed_mutation_freezer: drop the reverse parameter
The reverse parameter is no longer used with native reverse reads.
A reversed schema is provided and thus the reverse parameter shall
remain false all the time.
2024-08-13 10:07:12 +02:00
Łukasz Paszkowski
da95f44adc readers: Use reversed schema and native reversed slices
The reconcilable_result is built as it would be constructed for
forward read queries for tables with reversed order.

Mutations constructed for reversed queries are consumed forward.

Drop overloaded reversed functions that reverse read_command and
reconcilable_result directly and keep only those requiring smart
pointers. They are not used any more.
2024-08-13 10:03:46 +02:00
Botond Dénes
fb0ab3c1fb mutation/canonical_mutation: add key()
Extracts the partition key without deserializing the entire mutation.
2024-08-11 09:52:37 -04:00