Commit Graph

80 Commits

Author SHA1 Message Date
Botond Dénes
c8ea0715e9 tests: move away from table::make_reader()
Use v2 equivalents instead.
2022-04-01 13:39:26 +03:00
Botond Dénes
b029bd3db7 tree: remove mutation_reader.hh include
In most files it was unused. We should move these to the patch which
moved out the last interesting reader from mutation_reader.hh (and added
the corresponding new header include) but its probably not worth the
effort.
Some other files still relied on mutation_reader.hh to provide reader
concurrency semaphore and some other misc reader related definitions.
2022-03-30 15:42:51 +03:00
Avi Kivity
39504dea61 Merge "Convert result builders to v2" from Botond
"
Namely the query result writer and the reconcilable result builder, used
for building results for regular queries and mutation queries (used in
read repair) respectively.
With this, there are no users left for the v1 output of the compactor,
so we remove that, making the compactor v2 all-the-way (and simpler).
This means that for regular queries, a downgrade phase is eliminated
completely, as regular queries don't store range tombstone in their
result, so no need to convert them.

Tests: unit(dev, release, debug)
"

* 'result-builders-v2/v1' of https://github.com/denesb/scylla:
  reconcilable_result_builder: remove v1 support
  query_result_builder: remove v1 support
  mutation_compactor: drop v1 related code-paths
  mutation_compactor: drop v1 support altogether from the API
  tree: migrate to the v2 consumer APIs
  test/boost/mutation_test: remove v1 specific test code
  querier: switch to v2 compactor output
  reconcilable_result_builder: add v2 support
  query_result_writer: add v2 support
  query_result_builder: make consume(range_tombstone) noop
2022-03-15 14:32:58 +02:00
Benny Halevy
90edddd7e3 everywhere: use make_flat_mutation_reader_from_mutations_v2
Rather than upgrade_to_v2(make_flat_mutation_reader_from_mutations)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220315083425.2786228-1-bhalevy@scylladb.com>
2022-03-15 11:41:10 +02:00
Mikołaj Sielużycki
1d84a254c0 flat_mutation_reader: Split readers by file and remove unnecessary includes.
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.

With changes

real	29m14.051s
user	168m39.071s
sys	5m13.443s

Without changes

real	30m36.203s
user	175m43.354s
sys	5m26.376s

Closes #10194
2022-03-14 13:20:25 +02:00
Botond Dénes
924ff6a503 mutation_compactor: drop v1 support altogether from the API
Fully mechanical change. Drop all v1 types, template types. Internal
code is left unchanged, will be made v2 only in the next patch.
2022-03-11 09:24:05 +02:00
Botond Dénes
eacdfb2cb7 test/boost/mutation_test: remove v1 specific test code
From test_compactor_range_tombstone_spanning_many_pages, preparing for
the retirement of the v1 output of the compactor.
2022-03-11 09:24:05 +02:00
Botond Dénes
32e9809e9c test/boost/mutation_test: migrate to compact_for_mutation_v2 2022-03-10 09:16:33 +02:00
Botond Dénes
2057db54ab test/boost/mutation_test: test_compactor_range_tombstone_spanning_many_pages extend to check v2 output too 2022-03-10 07:03:49 +02:00
Benny Halevy
a085ef74ff atomic_cell: compare_atomic_cell_for_merge: compare ttl if expiry is equal
Following up on a57c087c89,
compare_atomic_cell_for_merge should compare the ttl value in the
reverse order since, when comparing two cells that are identical
in all attributes but their ttl, we want to keep the cell with the
smaller ttl value rather than the larger ttl, since it was written
at a later (wall-clock) time, and so would remain longer after it
expires, until purged after gc_grace seconds.

Fixes #10173

Test: mutation_test.test_cell_ordering, unit(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220302154328.2400717-1-bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220306091913.106508-1-bhalevy@scylladb.com>
2022-03-07 11:05:30 +02:00
Benny Halevy
a57c087c89 atomic_cell: compare_atomic_cell_for_merge: compare ttl if expiry is equal
Unlike atomic_cell_or_collection::equals, compare_atomic_cell_for_merge
currently returns std::strong_ordering::equal if two cells are equal in
every way except their ttl:s.

The problem with that is that the cells' hashes are different and this
will cause repair to keep trying to repair discrepancies caused by the
ttl being different.

This may be triggered by e.g. the spark migrator that computes the ttl
based on the expiry time by subtracting the expiry time from the current
time to produce a respective ttl.

If the cell is migrated multiple times at different times, it will generate
cells that the same expiry (by design) but have different ttl values.

Fixes #10156

Test: mutation_test.test_cell_ordering, unit(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220302154328.2400717-1-bhalevy@scylladb.com>
2022-03-03 15:27:16 +02:00
Avi Kivity
cbba80914d memtable: move to replica module and namespace
Memtables are a replica-side entity, and so are moved to the
replica module and namespace.

Memtables are also used outside the replica, in two places:
 - in some virtual tables; this is also in some way inside the replica,
   (virtual readers are installed at the replica level, not the
   cooordinator), so I don't consider it a layering violation
 - in many sstable unit tests, as a convenient way to create sstables
   with known input. This is a layering violation.

We could make memtables their own module, but I think this is wrong.
Memtables are deeply tied into replica memory management, and trying
to make them a low-level primitive (at a lower level than sstables) will
be difficult. Not least because memtables use sstables. Instead, we
should have a memtable-like thing that doesn't support merging and
doesn't have all other funky memtable stuff, and instead replace
the uses of memtables in sstable tests with some kind of
make_flat_mutation_reader_from_unsorted_mutations() that does
the sorting that is the reason for the use of memtables in tests (and
live with the layering violation meanwhile).

Test: unit (dev)

Closes #10120
2022-02-23 09:05:16 +02:00
Botond Dénes
f1e9e3b3b7 compact_mutation: drop support for v1 input 2022-02-21 12:29:24 +02:00
Botond Dénes
284ed9154f test: pass v2 input to mutation_compaction 2022-02-21 12:29:24 +02:00
Botond Dénes
dec4e5659b test/boost/mutation_test: simplify test_compaction_data_stream_split test
This test has very elaborate infrastructure essentially duplicating
mutation, mutation::apply() and mutation::operator==. Drop all this
extra code and use mutations directly instead. This makes migrating the
test to v2 easier.
2022-02-21 12:29:24 +02:00
Botond Dénes
fcda35d08e mutation: migrate consume() to v2
The underlying mutation format is still v1, so consume() ends up doing
an online conversion. This allows converting all downstream code to v2,
leaving the conversion close to the code that is yet to be migrated to
v2 native: the mutation itself.
2022-02-21 12:27:55 +02:00
Botond Dénes
eb42213db4 compact_mutation: close active range tombstone on page end
The compactor recently acquired the ability to consume a v2 stream. The
v2 spec requires that all streams end with a null tombstone.
`range_tombstone_assembler`, the component the compactor uses for
converting the v2 input into its v1 output enforces this with a check on
`consume_end_of_partition()`. Normally the producer of the stream the
compactor is consuming takes care of closing the active tombstone before
the stream ends. The compactor however (or its consumer) can decide to
end the consume early, e.g. to cut the current page. When this happens
the compactor must take care of closing the tombstone itself.
Furthermore it has to keep this tombstone around to re-open it on the
next page.
This patch implements this mechanism which was left out of 134601a15e.
It also adds a unit test which reproduces the problems caused by the
missing mechanism.
The compactor now tracks the last clustering position emitted. When the
page ends, this position will be used as the position of the closing
range tombstone change. This ensures the range tombstone only covers the
actually emitted range.

Fixes: #9907

Tests: unit(dev), dtest(paging_test.py, paging_additional_test.py)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220114053215.481860-1-bdenes@scylladb.com>
2022-01-25 09:52:30 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
134601a15e Merge "Convert input side of mutation compactor to v2" from Botond
"
With this series the mutation compactor can now consume a v2 stream. On
the output side it still uses v1, so it can now act as an online
v2->v1 converter. This allows us to push out v2->v1 conversion to as far
as the compactor, usually the next to last component in a read pipeline,
just before the final consumer. For reads this is as far as we can go,
as the intra-node ABI and hence the result-sets built are v1. For
compaction we could go further and eliminate conversion altogether, but
this requires some further work on both the compactor and the sstable
writer and so it is left to be done later.
To summarize, this patchset enables a v2 input for the compactor and it
updates compaction and single partition reads to use it.
"

* 'mutation-compactor-consume-v2/v1' of https://github.com/denesb/scylla:
  table: add make_reader_v2()
  querier: convert querier_cache and {data,mutation}_querier to v2
  compaction: upgrade compaction::make_interposer_consumer() to v2
  mutation_reader: remove unecessary stable_flattened_mutations_consumer
  compaction/compaction_strategy: convert make_interposer_consumer() to v2
  mutation_writer: migrate timestamp_based_splitting_writer to v2
  mutation_writer: migrate shard_based_splitting_writer to v2
  mutation_writer: add v2 clone of feed_writer and bucket_writer
  flat_mutation_reader_v2: add reader_consumer_v2 typedef
  mutation_reader: add v2 clone of queue_reader
  compact_mutation: make start_new_page() independent of mutation_fragment version
  compact_mutation: add support for consuming a v2 stream
  compact_mutation: extract range tombstone consumption into own method
  range_tombstone_assembler: add get_range_tombstone_change()
  range_tombstone_assembler: add get_current_tombstone()
2022-01-12 14:37:19 +02:00
Botond Dénes
aa3c943f4c mutation_reader: remove unecessary stable_flattened_mutations_consumer
Said wrapper was conceived to make unmovable `compact_mutation` because
readers wanted movable consumers. But `compact_mutation` is movable for
years now, as all its unmovable bits were moved into an
`lw_shared_ptr<>` member. So drop this unnecessary wrapper and its
unnecessary usages.
2022-01-07 13:52:07 +02:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Avi Kivity
ae3a360725 database: Move database, keyspace, table classes to replica/ directory
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.

As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
2022-01-06 17:07:30 +02:00
Avi Kivity
53a83c4b1e Merge "flat_mutation_reader: convert flat_mutation_reader_from_mutations to v2" from Botond
"
Like flat_mutation_reader_from_fragments, this reader is also heavily
used by tests to compose a specific workload for readers above it. So
instead of converting it, we add a v2 variant and leave the v1 variant
in place.
The v2 variant was written from scratch to have built-in support for
reading in reverse. It is built-on `mutation::consume()` to avoid
duplicating the logic of consuming the contents of the mutation. To
avoid stalls, `mutation::consume()` gets support for pausing and
resuming consuming a mutation.

Tests: unit(dev)
"

* 'flat_mutation_reader_from_mutations_v2/v2' of https://github.com/denesb/scylla:
  flat_mutation_reader: convert make_flat_mutation_reader_from_mutation() v2
  flat_mutation_reader: extract mutation slicing into a function
  mutation: consume(): make it pausable/resumable
  mutation: consume(): restructure clustering iterator initialization
  test/boost/mutation_test: add rebuild test for mutation::consume()
2022-01-05 10:23:17 +02:00
Asias He
a8ad385ecd repair: Get rid of the gc_grace_seconds
The gc_grace_seconds is a very fragile and broken design inherited from
Cassandra. Deleted data can be resurrected if cluster wide repair is not
performed within gc_grace_seconds. This design pushes the job of making
the database consistency to the user. In practice, it is very hard to
guarantee repair is performed within gc_grace_seconds all the time. For
example, repair workload has the lowest priority in the system which can
be slowed down by the higher priority workload, so that there is no
guarantee when a repair can finish. A gc_grace_seconds value that is
used to work might not work after data volume grows in a cluster. Users
might want to avoid running repair during a specific period where
latency is the top priority for their business.

To solve this problem, an automatic mechanism to protect data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.

In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:

1) GC a tombstone after gc_grace_seconds

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;

This is the default mode. If no tombstone_gc option is specified by the
user. The old gc_grace_seconds based gc will be used.

2) Never GC a tombstone

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};

3) GC a tombstone immediately

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};

4) GC a tombstone after repair

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};

In addition to the 'mode' option, another option 'propagation_delay_in_seconds'
is added. It defines the max time a write could possibly delay before it
eventually arrives at a node.

A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.

Tests: compaction_test.py, ninja test

Fixes #3560

[avi: resolve conflicts vs data_dictionary]
2022-01-04 19:48:14 +02:00
Botond Dénes
5e547dcc8a test/boost/mutation_test: add rebuild test for mutation::consume()
In the next patches we will refactor mutation::consume(). Before doing
that add another test, which rebuilds the consumed mutation, comparing
it with the original.
2022-01-04 11:43:46 +02:00
Botond Dénes
64bb48855c flat_mutation_reader: revamp flat_mutation_reader_from_mutations()
Add schema parameter so that:
* Caller has better control over schema -- especially relevant for
  reverse reads where it is not possible to follow the convention of
  passing the query schema which is reversed compared to that of the
  mutations.
* Now that we don't depend on the mutations for the schema, we can lift
  the restriction on mutations not being empty: this leads to safer
  code. When the mutations parameter is empty, an empty reader is
  created.
Add "make_" prefix to follow convention of similar reader factory
functions.

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211115155614.363663-1-bdenes@scylladb.com>
2021-11-15 17:58:46 +02:00
Tomasz Grabiec
c4328ffc4d tests: mutation_test: Add test for position_in_partition::reversed()
Message-Id: <20210927154942.44236-1-tgrabiec@scylladb.com>
2021-09-28 13:09:39 +02:00
Botond Dénes
f02632aeb0 range_tombstone_accumulator: drop _reversed flag 2021-09-09 15:42:15 +03:00
Botond Dénes
f07805c3ef test/boost/mutation_test: add test for mutation::consume() monotonicity
In both forward and reverse modes.
2021-09-09 15:42:15 +03:00
Pavel Emelyanov
5515f7187d range_tombstone, code: Add range_tombstone& getters
Currently all the code operates on the range_tombstone class.
and many of those places get the range tombstone in question
from the range_tombstone_list. Next patches will make that list
carry (and return) some new object called range_tombstone_entry,
so all the code that expects to see the former one there will
need to patched to get the range_tombstone from the _entry one.

This patch prepares the ground for that by introdusing the

    range_tombstone& tombstone() { return *this; }

getter on the range_tombstone itself and patching all future
users of the _entry to call .tombstone() right now.

Next patch will remove those getters together with adding the new
range_tombstone_entry object thus automatically converting all
the patched places into using the entry in a proper way.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-03 19:34:45 +03:00
Benny Halevy
4476800493 flat_mutation_reader: get rid of timeout parameter
Now that the timeout is taken from the reader_permit.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Avi Kivity
0909e3c17d treewide: remove redundant "x <=> 0" compares
If x is of type std::strong_ordering, then "x <=> 0" is equivalent to
x. These no-ops were inserted during #1449 fixes, but are now unnecessary.
They have potential for harm, since they can hide an accidental of the
type of x to an arithmetic type, so remove them.

Ref #1449.
2021-07-28 13:30:32 +03:00
Avi Kivity
70f481a1f0 test: mutation_test: convert internal tri-compare to std::strong_ordering
Drop the temporary merge_container() overload we had to support
tri-compares that returned int.
2021-07-28 13:30:07 +03:00
Avi Kivity
d86e529239 serialized_tri_compare: change to std::strong_ordering
Also convert a users in mutation_test.

Ref #1449.
2021-07-28 13:19:00 +03:00
Botond Dénes
2d2b9e7b36 test/boost: migrate off the global test reader semaphore 2021-07-08 16:53:38 +03:00
Botond Dénes
3679418e62 test/lib/test_services: migrate off the global test reader semaphore 2021-07-08 15:28:39 +03:00
Raphael S. Carvalho
1924e8d2b6 treewide: Move compaction code into a new top-level compaction dir
Since compaction is layered on top of sstables, let's move all compaction code
into a new top-level directory.
This change will give me extra motivation to remove all layer violations, like
sstable calling compaction-specific code, and compaction entanglement with
other components like table and storage service.

Next steps:
- remove all layer violations
- move compaction code in sstables namespace into a new one for compaction.
- move compaction unit tests into its own file

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210707194058.87060-1-raphaelsc@scylladb.com>
2021-07-07 23:21:51 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Pavel Emelyanov
d2442a1bb3 tests: Ditch storage_service_for_tests
The purpose of the class in question is to start sharded storage
service to make its global instance alive. I don't know when exactly
it happened but no code that instantiates this wrapper really needs
the global storage service.

Ref: #2795
tests: unit(dev), perf_sstable(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210526170454.15795-1-xemul@scylladb.com>
2021-05-27 14:39:13 +03:00
Avi Kivity
e2e723cc4c build: enable -Wrange-loop-construct warning
This warning triggers when a range for ("for (auto x : range)") causes
non-trivial copies, prompting the developer to replace with a capture
by reference. A few minor violations in the test suite are corrected.

Closes #8699
2021-05-26 10:32:56 +03:00
Pavel Solodovnikov
fff7ef1fc2 treewide: reduce boost headers usage in scylla header files
`dev-headers` target is also ensured to build successfully.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-05-20 01:33:18 +03:00
Benny Halevy
efe938cf1f flat_mutation_reader: make sure to close reader passed to read_mutation_from_flat_mutation_reader
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
4b8dc7ac7e flat_mutation_reader: make sure to close flat_mutation_reader_from_mutations
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:25:47 +03:00
Pavel Emelyanov
64074f45ce code: Relax position_in_partition::tri_compare users
There are some pieces left doing res <=> 0 with the
res now being a strong_ordering itself. All these can
be just dropped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-09 18:20:39 +03:00
Pavel Emelyanov
a15f158661 test: Convert clustering_fragment_summary::tri_cmp to strong_ordering
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-09 18:20:39 +03:00
Avi Kivity
f0092ae475 test: mutation_test: prepare merge_container for std::strong_ordering
The function merge_container() accepts a trichotomic comparator returning
an int. As #1449 explains, this is dangerous as it could be mistaken for
a less comparator. Switch to std::strong_ordering, but leave a compatible
merge_container() in place as it is still needed (even after this series).
2021-03-18 12:40:05 +02:00
Botond Dénes
cf28552357 mutation_test: test_mutation_diff_with_random_generator: compact input mutations
This test checks that `mutation_partition::difference()` works correctly.
One of the checks it does is: m1 + m2 == m1 + (m2 - m1).
If the two mutations are identical but have compactable data, e.g. a
shadowable tombstone shadowed by a row marker, the apply will collapse
these, causing the above equality check to fail (as m2 - m1 is null).
To prevent this, compact the two input mutations.

Fixes: #8221
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210310141118.212538-1-bdenes@scylladb.com>
2021-03-10 16:28:14 +01:00
Michał Chojnowski
5b79d6ca4c test: mutation_test: remove an obsolete assertion
Due to small value optimizations, the removed assertions are not true in
general. Until now, atomic_cell did not use small value optimizations, but
it will after upcoming changes.
2021-02-16 21:35:14 +01:00
Michał Chojnowski
aa60f28a09 test: mutation_test: initialize an uninitialized variable
It was assumed to be zero-initialized, but C++ does not guarantee that.
It has to be initialized explicitly.
2021-02-16 21:35:14 +01:00
Pavel Emelyanov
575c992a35 test: Bring test_apply_monotonically_is_monotonic back to work
The idea of the monotonicity checking test is: try to apply
one one random partition to another random one sequentually
failing allocations. Each time allocation fails (with the
bad_alloc exception) -- check the exception guarantee is
respected, then apply (!) the very same two partitions to
each other. At the end of the test we make sure, that an
exception may pop up at any point of application and it
will be safe.

This idea is flawed currently. When verifying the guarantee
the test moves the 2nd partition and leaves it empty for the
next loop iteration. So right on the 2nd attempt to apply
partitions it becomes a no-op, doesn't fail and no more
exceptions arise.

Fix by restoring both partitions at the end of each check.
Broken since 74db08165d.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210129153641.5449-1-xemul@scylladb.com>
2021-01-29 18:47:15 +01:00