Commit Graph

41 Commits

Author SHA1 Message Date
Benny Halevy
2d80057617 range_tombstone_list: insert_from: correct rev.update range_tombstone in not overlapping case
2nd std::move(start) looks like a typo
in fe2fa3f20d.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220404124741.1775076-1-bhalevy@scylladb.com>
2022-04-04 22:26:29 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Tomasz Grabiec
78a6474982 range_tombstone_list: Deoverlap adjacent empty ranges
Appending an empty range adjacent to an existing range tombstone would
not deoverlap (by dropping the empty range tombstone) resulting in
different (non canoncial) result depending on the order of appending.

 Suppose that [a, b] covers [x, x)

Appending [a, x) then [x, b) then [x, x) would give [a, b)
Appending [a, x) then [x, x) then [x, b) would give [a, x), [x, x), [x, b)

Fix by dropping empty range tombstones.
2021-12-13 21:31:36 +01:00
Tomasz Grabiec
fe2fa3f20d range_tombstone_list: Convert to work in terms of position_in_partition
This makes it comprehensible, and a bit simpler.
2021-12-08 15:16:18 +01:00
Michał Radwański
35b1c3ff52 range_tombstone_list: {lower,upper,}slice share comparator
implementation

slice (2 overloads), upper_slice, lower_slice previously had
implementations of a comparator. Move out the common structs, so that
all 4 of them can share implementation.
2021-11-05 10:51:58 +01:00
Michał Radwański
07e78807e6 range_tombstone_list: add lower_slice
lower_slice returns the range tombstones which have end inside range
[start, before).
2021-11-02 10:50:31 +01:00
Pavel Emelyanov
d6af441eaa range_tombstone: Move linkage into range_tombstone_entry
Now it's time to remove the boost set's hook from the range_tombstone
and keep it wrapped into another class if the r._tombstone's location
is the range_tombstone_list.

Also the added previously .tombstone() getters and the _entry alias
can be removed -- all the code can work with the new class.

Two places in the code that made use of without_link{} move-constructor
are patched to get the range_tombstone part from the respective _entry
with the same result.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-03 19:34:45 +03:00
Pavel Emelyanov
b8c585c54d range_tombstone_list: Prepare to use range_tombstone_entry
A continuation of the previous patch. The range_tombstone_list works
with the range_tombstone very actively, kicking every single line
doing this to call .tombstone() seems excessive. Instead, declare
the range_tombstone_entry alias. When the entry will appear for real,
the alias would go away and the range_tombstone_list will be switched
into new entity right at once.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-03 19:34:45 +03:00
Pavel Emelyanov
5515f7187d range_tombstone, code: Add range_tombstone& getters
Currently all the code operates on the range_tombstone class.
and many of those places get the range tombstone in question
from the range_tombstone_list. Next patches will make that list
carry (and return) some new object called range_tombstone_entry,
so all the code that expects to see the former one there will
need to patched to get the range_tombstone from the _entry one.

This patch prepares the ground for that by introdusing the

    range_tombstone& tombstone() { return *this; }

getter on the range_tombstone itself and patching all future
users of the _entry to call .tombstone() right now.

Next patch will remove those getters together with adding the new
range_tombstone_entry object thus automatically converting all
the patched places into using the entry in a proper way.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-03 19:34:45 +03:00
Pavel Emelyanov
ae8a5bd046 range_tombstone_list: Factor out tombstone construction
Just add a helper for constructing the managed range
tombstone object. This will also help further patch
have less duplicating hunks in it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-03 19:34:45 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Pavel Emelyanov
c8b2079705 range_tombstone_list: Add new slice() helper
There are two of them now -- one to return iterator_range that
covers the given query::clustering_range, the other to return
it for two given positions.

In the next patch the 3rd one is needed -- the slice() to get
iterator_range that's

a) starts strictly after a given position
b) ends after the given clustering_range's end

It will be used to refresh the range tombstones iterators after
some of them will have been emitted. The same thing is currently
done by partition_snapshot_reader's refresh_state wrt rows:

    if (last_row)
        start = rows.upper_bound(last_row) // continuation
    else
        start = rows.lower_bound(range.start) // initial

    end = rows.upper_bound(range.end) // end is the same in
                                      // either case

Respectively for range tombstones the goal is the same.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-03-16 11:55:28 +03:00
Pavel Emelyanov
7e1170ecb9 range_tombstone_list: Introduce iterator_range alias
The range_tombstone_list::slice() set of methods return
back pair of iterators represending a range. In the next
patches this pair will be actively used, and it's handy
to have a shorter alias for it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-03-16 11:55:28 +03:00
Avi Kivity
6f394e8e90 tombstone: use comparison operator instead of ad-hoc compare() function and with_relational_operators
The comparison operator (<=>) default implementation happens to exactly
match tombstone::compare(), so use the compiler-generated defaults. Also
default operator== and operator!= (these are not brought in by operator<=>).
These become slightly faster as they perform just an equality comparison,
not three-way compare.

shadowable_tombstone and row_tombstone depend on tombstone::compare(),
so convert them too in a similar way.

with_relational_operations.hh becomes unused, so delete it.

Tests: unit (dev)
Message-Id: <20200602055626.2874801-1-avi@scylladb.com>
2020-06-02 09:28:52 +03:00
Avi Kivity
f3da043230 Merge "Make in-memory partition version merging preemptable" from Tomasz
"
Partition snapshots go away when the last read using the snapshot is done.
Currently we will synchronously attempt to merge partition versions on this event.
If partitions are large, that may stall the reactor for a significant amount of time,
depending on the size of newer versions. Cache update on memtable flush can
create especially large versions.

The solution implemented in this series is to allow merging to be preemptable,
and continue in the background. Background merging is done by the mutation_cleaner
associated with the container (memtable, cache). There is a single merging process
per mutation_cleaner. The merging worker runs in a separate scheduling group,
introduced here, called "mem_compaction".

When the last user of a snapshot goes away the snapshot is slided to the
oldest unreferenced version first so that the version is no longer reachable
from partition_entry::read(). The cleaner will then keep merging preceding
(newer) versions into it, until it merges a version which is referenced. The
merging is preemtable. If the initial merging is preempted, the snapshot is
enqueued into the cleaner, the worker woken up, and merging will continue
asynchronously.

When memtable is merged with cache, its cleaner is merged with cache cleaner,
so any outstanding background merges will be continued by the cache cleaner
without disruption.

This reduces scheduling latency spikes in tests/perf_row_cache_update
for the case of large partition with many rows. For -c1 -m1G I saw
them dropping from >23ms to 1-2ms. System-level benchmark using scylla-bench
shows a similar improvement.
"

* tag 'tgrabiec/merge-snapshots-gradually-v4' of github.com:tgrabiec/scylla:
  tests: perf_row_cache_update: Test with an active reader surviving memtable flush
  memtable, cache: Run mutation_cleaner worker in its own scheduling group
  mutation_cleaner: Make merge() redirect old instance to the new one
  mvcc: Use RAII to ensure that partition versions are merged
  mvcc: Merge partition version versions gradually in the background
  mutation_partition: Make merging preemtable
  tests: mvcc: Use the standard maybe_merge_versions() to merge snapshots
2018-07-01 15:32:51 +03:00
Vladimir Krivopalov
82f76b0947 Use std::reference_wrapper instead of a plain reference in bound_view.
The presence of a plain reference prohibits the bound_view class from
being copyable. The trick employed to work around that was to use
'placement new' for copy-assigning bound_view objects, but this approach
is ill-formed and causes undefined behaviour for classes that have const
and/or reference members.

The solution is to use a std::reference_wrapper instead.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <a0c951649c7aef2f66612fc006c44f8a33713931.1530113273.git.vladimir@scylladb.com>
2018-06-28 11:24:06 +01:00
Tomasz Grabiec
4d3cc2867a mutation_partition: Make merging preemtable 2018-06-27 12:48:30 +02:00
Tomasz Grabiec
40cc766cf2 database: Add API for incremental clearing of partition entries
Partitions can get very large. Destroying them all at once can stall
the reactor for significant amount of time. We want to avoid that by
doing destruction incrementally, deferring in between. A new API is
added for that at various levels:

  stop_iteration clear_gently() noexcept;

It returns stop_iteration::yes when the object is fully cleared and
can be now destroyed quickly. So a deferring destruction can look like
this:

  return repeat([this] { return clear_gently(); });

The reason why clear_gently() doesn't return a future<> itself is that some
contexts cannot defer, like memory reclamation.
2018-05-30 12:18:56 +02:00
Vladimir Krivopalov
5c3b32a9bf Remove to_boost_visitor heler.
The minimal Boost version required for Scylla now is 1.58 and this
helper is no longer needed.
Replaced it with more generic visitation utils from Seastar.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <e589ace7ac411d3d55dead475a8a2271f51642f1.1520976010.git.vladimir@scylladb.com>
2018-03-14 23:49:07 +00:00
Tomasz Grabiec
dfe48bbbc7 range_tombstone_list: Fix insert_from()
end_bound was not updated in one of the cases in which end and
end_kind was changed, as a result later merging decision using
end_bound were incorrect. end_bound was using the new key, but the old
end_kind.

Fixes #3083.
Message-Id: <1513772083-5257-1-git-send-email-tgrabiec@scylladb.com>
2017-12-20 12:20:20 +00:00
Tomasz Grabiec
9c620e0246 range_tombstone_list: Introduce apply_monotonically() 2017-11-07 15:33:24 +01:00
Tomasz Grabiec
2fe53ac617 range_tombstone_list: Make reverter::erase() exception-safe
erase_undo_op() constructor takes ownership of *it, and destroys it
when it goes out of scope. If emplace_back() fails, *it would be
destroyed before being removed from its container (_dst._tombstones).
Fix by making sure _ops.emplace_back() won't fail.
2017-11-07 15:33:24 +01:00
Tomasz Grabiec
6190f9fc63 range_tombstone_list: Fix memory leaks in case of bad_alloc
If insert() fails, the allocated range_tombstone would not be freed.
Use alloc_strategy_unique_ptr.
2017-11-07 15:33:24 +01:00
Tomasz Grabiec
b6d349728f range_tombstone_list: Introduce slice() working with position range 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
92d6456070 range_tombstone_list: Introduce equal() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
19edb0b535 range_tombstone_list: Make printable 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
2e75595ecf range_tombstone_list: Introduce trim() 2017-06-24 18:06:11 +02:00
Tomasz Grabiec
3c509308ab range_tombstone_list: Merge adjacent range tombstones in apply()
Needed for equivalence to work correctly with difference and addition:

  m1 + (m2 - m1) = m1 + m2

Fixes #2158.
2017-05-23 13:16:03 +02:00
Tomasz Grabiec
1dea251ca2 range_tombstone_list: Avoid violating set invariant
The code was inserting an entry with the same key as its successor,
and only later adjusting the key of the old entry. This is violating
set's invariant of unique keys, and insertion may cause rebalancing. I
don't know if this violation actually causes problems currently, but
it's safer not to.

Fix by first updating the existing entry and then inserting the new
one.
2017-05-23 12:11:12 +02:00
Tomasz Grabiec
a2a22e5f00 range_tombstone_list: Make tombstone merging commutative
Example of non-commutative case:

 a = [1, 5]@t2
 b = {[2, 3]@t1, [4, 5]@t1}

 a + b = [1, 5]@t2
 b + a = [1, 4)@t2, [4, 5]@t2

After this patch, both will yield [1, 5]@t2.

The patch also changes the logic to handle overlaps of tombstones with
equal timestamps to be handled symmetrically. They are now merged
instead of split on either of the boundary.

Refs #2158.
2017-05-23 12:11:12 +02:00
Tomasz Grabiec
c4dac7c80f range_tombstone_list: Add erase() operation to the reverter 2017-05-23 12:11:12 +02:00
Tomasz Grabiec
935709cddc range_tombstone_list: Make all undo operations ordered relative to each other
Later operation may depend on the result of previous operation. Same dependency
is present when reverting the operations.

Fixes assertion failure in update reverter.
2017-05-23 12:11:12 +02:00
Tomasz Grabiec
fb42366552 range_tombstone_list: Introduce erase() 2017-02-13 16:12:15 +01:00
Tomasz Grabiec
8b7f93175c range_tombstone_list: Introduce slice() 2017-02-13 16:12:15 +01:00
Duarte Nunes
85315d1760 range_tombstone_list: Correctly implement difference()
The difference method wasn't properly implemented. The version in this
patch correctly computes the difference and returns a range tombstone
list contains those range tombstones in "this" but absent from the
other, specified range tombstone list.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-01-23 18:14:33 +01:00
Avi Kivity
056b427855 range_tombstone_list: use non-template lambda for cloning tombstones
Using a template lambda invokes a bug in Fedora 24's boost where the
lambda's parameter is an internal boost type rather than a range_tombestone.

Constraining the parameter with an explicit type avoids the problem.
Message-Id: <1466844211-17298-1-git-send-email-avi@scylladb.com>
2016-06-27 10:48:59 +02:00
Paweł Dziepak
a200189541 range_tombstone_list: mark apply() argument as const
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
c24f08a683 range_tombstone_list: compare full tombstones not just timestamps
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Duarte Nunes
95594b8171 mutations: Encapsulate row tombstones difference
This patch moves the difference between two mutation_partition's
row_tombstones inside the range_tombstone_list.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:59 +02:00
Duarte Nunes
284bb6b66f range_tombstone_list: Make it ReversiblyMergeable
This patch implements the ReversiblyMergeable cancellative monoid
for the range_tombstone_list.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00
Duarte Nunes
86030885c8 mutations: Introduce range tombstone list
This class is responsible for representing a set of range tombstones
as non-overlapping disjoint sets of range tombstones.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-02 16:21:58 +02:00