Commit Graph

59 Commits

Author SHA1 Message Date
Ernest Zaslavsky
d2c5765a6b treewide: Move keys related files to a new keys directory
As requested in #22102, #22103 and #22105 moved the files and fixed other includes and build system.

Moved files:
- clustering_bounds_comparator.hh
- keys.cc
- keys.hh
- clustering_interval_set.hh
- clustering_key_filter.hh
- clustering_ranges_walker.hh
- compound_compat.hh
- compound.hh
- full_position.hh

Fixes: #22102
Fixes: #22103
Fixes: #22105

Closes scylladb/scylladb#25082
2025-07-25 10:45:32 +03:00
Michał Chojnowski
746ec1d4e4 test/boost/mvcc_test: fix an overly-strong assertion in test_snapshot_cursor_is_consistent_with_merging
The test checks that merging the partition versions on-the-fly using the
cursor gives the same results as merging them destructively with apply_monotonically.

In particular, it tests that the continuity of both results is equal.
However, there's a subtlety which makes this not true.
The cursor puts empty dummy rows (i.e. dummies shadowed by the partition
tombstone) in the output.
But the destructive merge is allowed (as an expection to the general
rule, for optimization reasons), to remove those dummies and thus reduce
the continuity.

So after this patch we instead check that the output of the cursor
has continuity equal to the merged continuities of version.
(Rather than to the continuity of merged versions, which can be
smaller as described above).

Refs https://github.com/scylladb/scylladb/pull/21459, a patch which did
the same in a different test.
Fixes https://github.com/scylladb/scylladb/issues/13642

Closes scylladb/scylladb#24044
2025-05-08 00:41:01 +02:00
Kefu Chai
7ff0d7ba98 tree: Remove unused boost headers
This commit eliminates unused boost header includes from the tree.

Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22857
2025-02-15 20:32:22 +02:00
Ran Regev
edd56a2c1c moved cache files to db
As requested in #22097, moved the files
and fixed other includes and build system.

Fixes: #22097
Signed-off-by: Ran Regev <ran.regev@scylladb.com>

Closes scylladb/scylladb#22495
2025-02-04 12:21:31 +03:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Kefu Chai
bab12e3a98 treewide: migrate from boost::adaptors::transformed to std::views::transform
now that we are allowed to use C++23. we now have the luxury of using
`std::views::transform`.

in this change, we:

- replace `boost::adaptors::transformed` with `std::views::transform`
- use `fmt::join()` when appropriate where `boost::algorithm::join()`
  is not applicable to a range view returned by `std::view::transform`.
- use `std::ranges::fold_left()` to accumulate the range returned by
  `std::view::transform`
- use `std::ranges::fold_left()` to get the maximum element in the
  range returned by `std::view::transform`
- use `std::ranges::min()` to get the minimal element in the range
  returned by `std::view::transform`
- use `std::ranges::equal()` to compare the range views returned
  by `std::view::transform`
- remove unused `#include <boost/range/adaptor/transformed.hpp>`
- use `std::ranges::subrange()` instead of `boost::make_iterator_range()`,
  to feed `std::views::transform()` a view range.

to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

limitations:

there are still a couple places where we are still using
`boost::adaptors::transformed` due to the lack of a C++23 alternative
for `boost::join()` and `boost::adaptors::uniqued`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21700
2024-12-03 09:41:32 +02:00
Michał Chojnowski
35921eb67e mvcc_test: fix a benign failure of test_apply_to_incomplete_respects_continuity
For performance reasons, mutation_partition_v2::maybe_drop(), and by extension
also mutation_partition_v2::apply_monotonically(mutation_partition_v2&&)
can evict empty row entries, and hence change the continuity of the merged
entry.

For checking that apply_to_incomplete respects continuity,
test_apply_to_incomplete_respects_continuity obtains the continuity of
the partition entry before and after apply_to_incomplete by calling
e.squashed().get_continuity(). But squashed() uses apply_monotonically(),
so in some circumstances the result of squashed() can have smaller
continuity than the argument of squashed(), which messes with the thing
that the test is trying to check, and causes spurious failures.

This patch changes the method of calculating the continuity set,
so that it matches the entry exactly, fixing the test failures.

Fixes scylladb/scylladb#13757

Closes scylladb/scylladb#21459
2024-11-08 06:08:39 +01:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Benny Halevy
9f05072527 token: make kind-based ctor private
Users outside of the token module don't
need to mess with the token::kind.
They can only create key tokens.
Never, minimum or maximum tokens, with a particular
datya value.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:42 +03:00
Michał Chojnowski
bed20a2e37 row_cache: use preemption_source in update()
To facilitate testing the state of cache after the update is preempted
at various points, pass a preemption_source& to update() instead of
calling the reactor directly.

In release builds, the calls to preemption_source methods should compile
to the same direct reactor calls as today. Only in dev mode they should
add an extra branch. (However, the `preemption_source&` argument has
to be shoveled in any mode).
2024-02-07 18:31:36 +01:00
Michał Chojnowski
2d1a345068 test: mvcc_test: add a test for gentle schema upgrades 2023-05-04 03:35:15 +02:00
Michał Chojnowski
0273101890 partition_version: remove the unused "from" argument in partition_entry::upgrade()
partition_entry now contains a reference to its schema, so it doesn't have to
be supplied by the caller anymore.
2023-05-04 02:37:30 +02:00
Michał Chojnowski
94e4dc3d8d partition_version: add a logalloc::region argument to partition_entry::upgrade()
The argument is currently unused, but will be further propagated to
add_version() in an upcoming patch.
2023-05-04 02:37:29 +02:00
Michał Chojnowski
caaf0bd6bf partition_version: remove _schema from partition_entry::operator<<
operator<< accepts a schema& and a partition_entry&. But since the latter
now contains a reference to its schema inside, the former is redundant.
Remove it.
2023-05-04 02:37:29 +02:00
Michał Chojnowski
f6e11c95e2 partition_version: remove the schema argument from partition_entry::read()
partition_entry now contains a reference to its schema, so it no longer
needs to be supplied by the caller.
2023-05-04 02:37:29 +02:00
Michał Chojnowski
1d01a4a168 partition_version: add a _schema field to partition_version
Currently, partition_version does not reference its schema.
All partition_version reachable from a entry/snapshot have the same schema,
which is referenced in memtable_entry/cache_entry/partition_snapshot.

To enable gentle schema upgrades, we want to use the existing background
version merging mechanism. To achieve that, we will move the schema reference
into partition_version, and we will allow neighbouring MVCC versions to have
different schemas, and we will merge them on-the-fly during reads and
persistently during background version merges.
This way, an upgrade will boil down to adding a new empty version with
the new schema.

This patch adds the _schema field to partition_version and propagates
the schema pointer to it from the version's containers (entry/snapshot).
Subsequent patches will remove the schema references from the containers,
because they are now redundant.
2023-05-04 02:37:29 +02:00
Michał Chojnowski
a70c5704df mutation_partition: change schema_ptr to schema& in mutation_partition constructor
Cosmetic change. See the preceding commit for details.
2023-05-04 02:37:29 +02:00
Michał Chojnowski
781514acfe mutation_partition_v2: change schema_ptr to schema& in mutation_partition_v2 constructor
We don't have a convention for when to pass `schema_ptr` and and when to pass
`const schema&` around.
In general, IMHO the natural convention for such a situation is to pass the
shared pointer if the callee might extend the lifetime of shared_ptr,
and pass a reference otherwise. But we convert between them willy-nilly
through shared_from_this().

While passing a reference to a function which actually expects a shared_ptr
can make sense (e.g. due to the fact that smart pointers can't be passed in
registers), the other way around is rather pointless.

This patch takes one occurence of that and modifies the parameter to a reference.

Since enable_shared_from_this makes shared pointer parameters and reference
parameters interchangeable, this is a purely cosmetic change.
2023-05-04 02:37:29 +02:00
Michał Chojnowski
88a0871729 mutation_partition: remove apply_weak()
apply_weak is just an alias for apply(), and most of its variants
are dead code. Get rid of it.
2023-05-04 02:37:29 +02:00
Kefu Chai
3ae11de204 treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:56:53 +08:00
Avi Kivity
c5e4bf51bd Introduce mutation/ module
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.

mutation_reader remains in the readers/ module.

mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.

This is a step forward towards librarization or modularization of the
source base.

Closes #12788
2023-02-14 11:19:03 +02:00
Avi Kivity
f73e2c992f Merge 'Keep range tombstones with rows in memtables and cache' from Tomasz Grabiec
This series switches memtable and cache to use a new representation for mutation data,
called `mutation_partition_v2`. In this representation, range tombstone information is stored
in the same tree as rows, attached to row entries. Each entry has a new tombstone field,
which represents range tombstone part which applies to the interval between this entry and
the previous one. See docs/dev/mvcc.md for more details about the format.

The transient mutation object still uses the old model in order to avoid work needed to adapt
old code to the new model. It may also be a good idea to live with two models, since the
transient mutation has different requirements and thus different trade-offs can be made.
Transient mutation doesn't need to support eviction and strong exception guarantees,
so its algorithms and in-memory representation can be simpler.

This allows us to incrementally evict range tombstone information. Before this series,
range tombstones were accumulated and evicted only when the whole partition entry was evicted. This
could lead to inefficient use of cache memory.

Another advantage of the new representation is that reads don't have to lookup
range tombstone information in a different tree while reading. This leads to simpler
and more efficient readers.

There are several disadvantages too. Firstly, rows_entry is now larger by 16 bytes.
Secondly, update algorithms are more complex because they need to deoverlap range tombstone
information. Also, to handle preemption and provide strong exception guarantees, update
algorithms may need to allocate sentinel entries, which adds complexity and reduces performance.

The memtable reader was changed to use the same cursor implementation
which cache uses, for improved code reuse and reducing risk of bugs
due to discrepancy of algorithms which deal with MVCC.

Remaining work:
  - performance optimizations to apply_monotonically() to avoid regressions
  - performance testing
  - preemption support in apply_to_incomplete (cache update from memtable)

Fixes #2578
Fixes #3288
Fixes #10587

Closes #12048

* github.com:scylladb/scylladb:
  test: mvcc: Extend some scenarios with exhaustive consistency checks on eviction
  test: mvcc: Extract mvcc_container::allocate_in_region()
  row_cache, lru: Introduce evict_shallow()
  test: mvcc: Avoid copies of mutation under failure injection
  test: mvcc: Add missing logalloc::reclaim_lock to test_apply_is_atomic
  mutation_partition_v2: Avoid full scan when applying mutation to non-evictable
  Pass is_evictable to apply()
  tests: mutation_partition_v2: Introduce test_external_memory_usage_v2 mirroring the test for v1
  tests: mutation: Fix test_external_memory_usage() to not measure mutation object footprint
  tests: mutation_partition_v2: Add test for exception safety of mutation merging
  tests: Add tests for the mutation_partition_v2 model
  mutation_partition_v2: Implement compact()
  cache_tracker: Extract insert(mutation_partition_v2&)
  mvcc, mutation_partition: Document guarantees in case merging succeeds
  mutation_partition_v2: Accept arbitrary preemption source in apply_monotonically()
  mutation_partition_v2: Simplify get_continuity()
  row_cache: Distinguish dummy insertion site in trace log
  db: Use mutation_partition_v2 in mvcc
  range_tombstone_change_merger: Introduce peek()
  readers: Extract range_tombstone_change_merger
  mvcc: partition_snapshot_row_cursor: Handle non-evictable snapshots
  mvcc: partition_snapshot_row_cursor: Support digest calculation
  mutation_partition_v2: Store range tombstones together with rows
  db: Introduce mutation_partition_v2
  doc: Introduce docs/dev/mvcc.md
  db: cache_tracker: Introduce insert() variant which positions before existing entry in the LRU
  db: Print range_tombstone bounds as position_in_partition
  test: memtable_test: Relax test_segment_migration_during_flush
  test: cache_flat_mutation_reader: Avoid timestamp clash
  test: cache_flat_mutation_reader_test: Use monotonic timestamps when inserting rows
  test: mvcc: Fix sporadic failures due to compact_for_compaction()
  test: lib: random_mutation_generator: Produce partition tombstone less often
  test: lib: random_utils: Introduce with_probability()
  test: lib: Improve error message in has_same_continuity()
  test: mvcc: mvcc_container: Avoid UB in tracker() getter when there is no tracker
  test: mvcc: Insert entries in the tracker
  test: mvcc_test: Do not set dummy::no on non-clustering rows
  mutation_partition: Print full position in error report in append_clustered_row()
  db: mutation_cleaner: Extract make_region_space_guard()
  position_in_partition: Optimize equality check
  mvcc: Fix version merging state resetting
  mutation_partition: apply_resume: Mark operator bool() as explicit
2023-02-05 22:33:10 +02:00
Raphael S. Carvalho
3c5afb2d5c test: Enable Scylla test command line options for boost tests
We have enabled the command line options without changing a
single line of code, we only had to replace old include
with scylla_test_case.hh.

Next step is to add x-log-compaction-groups options, which will
determine the number of compaction groups to be used by all
instantiations of replica::table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:14:51 -03:00
Tomasz Grabiec
c9c476afd7 test: mvcc: Extend some scenarios with exhaustive consistency checks on eviction 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
80de99cb1b test: mvcc: Extract mvcc_container::allocate_in_region() 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
f2832046e9 test: mvcc: Avoid copies of mutation under failure injection
Speeds up the test a bit because we avoid the copy when converting to
mutation_partition_v2 in apply().
2023-01-27 21:56:31 +01:00
Tomasz Grabiec
b8980f68f0 test: mvcc: Add missing logalloc::reclaim_lock to test_apply_is_atomic 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
bc35fa7696 Pass is_evictable to apply() 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
919ff433d1 tests: Add tests for the mutation_partition_v2 model 2023-01-27 21:56:31 +01:00
Tomasz Grabiec
026f8cc1e7 db: Use mutation_partition_v2 in mvcc
This patch switches memtable and cache to use mutation_partition_v2,
and all affected algorithms accordingly.

The memtable reader was changed to use the same cursor implementation
which cache uses, for improved code reuse and reducing risk of bugs
due to discrepancy of algorithms which deal with MVCC.

Range tombstone eviction in cache has now fine granularity, like with
rows.

Fixes #2578
Fixes #3288
Fixes #10587
2023-01-27 21:56:28 +01:00
Tomasz Grabiec
71057412ed test: mvcc: Fix sporadic failures due to compact_for_compaction()
compact_for_compaction() will perform cell expiration based on
gc_clock::now(), which introduces sporadic mismatches due to expiry
status of a row marker.

Drop this, we can rely on compaction done by is_equal_to_compacted()
2023-01-27 19:15:38 +01:00
Tomasz Grabiec
08f68c5f20 test: mvcc: mvcc_container: Avoid UB in tracker() getter when there is no tracker 2023-01-27 19:15:38 +01:00
Tomasz Grabiec
5aa8cb56a8 test: mvcc: Insert entries in the tracker
evictable snapshots must have all entries added to the
tracker. Partition version merging assumes this. Before this was
benign, but will start to trigger asserts in mutation_partition_v2.
2023-01-27 19:15:38 +01:00
Tomasz Grabiec
9d38997971 test: mvcc_test: Do not set dummy::no on non-clustering rows
This will trigger an assert in apply_monotonically() later in the
series, where this row would be merged with a dummy at the same
position. This row must not be marked as non-dummy, there is an
assumption that non-clustering positions are all dummies. There can't
be two entries with the same position an a different dummy status.
2023-01-27 19:15:38 +01:00
Michał Radwański
dcab289656 boost/mvcc_test: use failure_injecting_allocation_strategy where it is meant to
In test_apply_is_atomic, a basic form of exception testing is used.
There is failure_injecting_allocation_strategy, which however is not
used for any allocation, since for some reason,
`with_allocator(r.allocator()` is used instead of
`with_allocator(alloc`. Fix that.

Closes #12354
2023-01-10 12:01:36 +01:00
Pavel Emelyanov
6075e01312 test/lib: Remove sstable_utils.hh from simple_schema.hh
The latter is pretty popular test/lib header that disseminates the
former one over whole lot of unit tests. The former, in turn, naturally
includes sstables.hh thus making tons of unrelated tests depend on
sstables class unused by them.

However, simple removal doesn't work, becase of local_shard_only bool
class definition in sstable_utils.hh used in simple_schema.hh. This
thing, in turn, is used in keys making helpers that don't belong to
sstable utils, so these are moved into simple_schema as well.

When done, this affects the mutation_source_test.hh, which needs the
local_shard_only bool class (and helps spreading the sstables.hh
throughout more unrelated tests) and a bunch of .cc test sources that
used sstable_utils.hh to indirectly include various headers of their
demand.

After patching, sstables.hh touches 2x times less tests. As a side
effect the sstables_manager.hh also becomes 2x times less dependent
on by tests.

Continuation of 9bdea110a6

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #12240
2022-12-08 15:37:33 +02:00
Avi Kivity
444de2831e dirty_memory_manager: move to replica module
It's a replica-side thing, so move it there. The related
flush_permit and sstable_write_permit are moved alongside.
2022-12-06 22:24:17 +02:00
Michał Chojnowski
d785364375 cache: make all cache unlinks explicit
Our LSA cache is implemented as an auto_unlink Boost intrusive list, meaning
that elements of the list unlink themselves from the list automatically on
destruction. Some parts of the code rely on that, and don't unlink them
manually.

However, this precludes accurate bookkeeping about the cache. Elements only have
access to themselves and their neighbours, not to any bookkeeping context.
Therefore, a destructor cannot update the relevant metadata.

In this patch, we fix this by adding explicit unlink calls to places where it
would be done by a destructor. In a following patch, we will add an assert to
the destructor to check that every element is unlinked before destruction.
2022-10-17 12:07:27 +02:00
Benny Halevy
0627667a06 mutation_partition: compact_for_compaction: get tombstone_gc_state
And pass down to `do_compact`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-07 07:43:15 +03:00
Tomasz Grabiec
05b0a62132 test: mvcc: Fix illegal use of maybe_refresh()
maybe_refresh() can only be called if the cursor is pointing at a row.
The code was calling it before the cursor was advanced, and was
thus relying on implementation detail.
2022-08-09 02:28:56 +02:00
Tomasz Grabiec
02c92d5ea2 test: mutation: Compare against compacted mutations
Memtables and cache will compact eagerly, so tests should not expect
readers to produce exact mutations written, only those which are
equivalant after applying copmaction.
2022-06-15 11:30:01 +02:00
Tomasz Grabiec
a4e96960b8 mvcc: Apply mutations in memtable with preemption enabled
Preerequisite for eagerly applying tombstones, which we want to be
preemptible. Before the patch, apply path to the memtable was not
preemptible.

Because merging can now be defered, we need to involve snapshots to
kick-off background merging in case of preemption. This requires us to
propagate region and cleaner objects, in order to create a snapshot.
2022-06-15 11:29:43 +02:00
Avi Kivity
5129280f45 Revert "Merge 'memtable, cache: Eagerly compact data with tombstones' from Tomasz Grabiec"
This reverts commit e0670f0bb5, reversing
changes made to 605ee74c39. It causes failures
in debug mode in
database_test.test_database_with_data_in_sstables_is_a_mutation_source_plain,
though with low probability.

Fixes #10780
Reopens #652.
2022-06-14 18:06:22 +03:00
Tomasz Grabiec
374234cf76 test: mutation: Compare against compacted mutations
Memtables and cache will compact eagerly, so tests should not expect
readers to produce exact mutations written, only those which are
equivalant after applying copmaction.
2022-06-06 19:25:40 +02:00
Tomasz Grabiec
0e3c4fc641 mvcc: Apply mutations in memtable with preemption enabled
Preerequisite for eagerly applying tombstones, which we want to be
preemptible. Before the patch, apply path to the memtable was not
preemptible.

Because merging can now be defered, we need to involve snapshots to
kick-off background merging in case of preemption. This requires us to
propagate region and cleaner objects, in order to create a snapshot.
2022-06-06 19:23:37 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Asias He
a8ad385ecd repair: Get rid of the gc_grace_seconds
The gc_grace_seconds is a very fragile and broken design inherited from
Cassandra. Deleted data can be resurrected if cluster wide repair is not
performed within gc_grace_seconds. This design pushes the job of making
the database consistency to the user. In practice, it is very hard to
guarantee repair is performed within gc_grace_seconds all the time. For
example, repair workload has the lowest priority in the system which can
be slowed down by the higher priority workload, so that there is no
guarantee when a repair can finish. A gc_grace_seconds value that is
used to work might not work after data volume grows in a cluster. Users
might want to avoid running repair during a specific period where
latency is the top priority for their business.

To solve this problem, an automatic mechanism to protect data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.

In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:

1) GC a tombstone after gc_grace_seconds

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;

This is the default mode. If no tombstone_gc option is specified by the
user. The old gc_grace_seconds based gc will be used.

2) Never GC a tombstone

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};

3) GC a tombstone immediately

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};

4) GC a tombstone after repair

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};

In addition to the 'mode' option, another option 'propagation_delay_in_seconds'
is added. It defines the max time a write could possibly delay before it
eventually arrives at a node.

A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.

Tests: compaction_test.py, ninja test

Fixes #3560

[avi: resolve conflicts vs data_dictionary]
2022-01-04 19:48:14 +02:00
Tomasz Grabiec
d0c367f44f mvcc: partition_snapshot: Support slicing range tombstones in reverse 2021-12-19 22:41:35 +01:00
Tomasz Grabiec
757fc1275f partition_snapshot_row_cursor: Support reverse iteration 2021-12-19 22:41:35 +01:00
Tomasz Grabiec
0d7b3f9463 tests: mvcc: Relax monotonicity check
Consecutive range tombstones can have the same position. They will, in
one of the test cases, after the range tombstone merger in
partition_snapshot_flat_reader no longer uses range_tombstone_list to
merge data form multiple versions, which deoverlaps, but rather merges
the streams corresponding to each version, which interleaves range
tombstones from different versions.
2021-07-26 17:27:03 +02:00