Commit Graph

72 Commits

Author SHA1 Message Date
Botond Dénes
f8015d9c26 readers: move combined reader into readers/
Since the combined reader family weighs more than 1K SLOC, it gets its
own .cc file.
2022-03-30 15:42:51 +03:00
Benny Halevy
90edddd7e3 everywhere: use make_flat_mutation_reader_from_mutations_v2
Rather than upgrade_to_v2(make_flat_mutation_reader_from_mutations)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220315083425.2786228-1-bhalevy@scylladb.com>
2022-03-15 11:41:10 +02:00
Mikołaj Sielużycki
1d84a254c0 flat_mutation_reader: Split readers by file and remove unnecessary includes.
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.

With changes

real	29m14.051s
user	168m39.071s
sys	5m13.443s

Without changes

real	30m36.203s
user	175m43.354s
sys	5m26.376s

Closes #10194
2022-03-14 13:20:25 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Kamil Braun
168c6f47f9 replica: database: allow disabling optimized TWCS queries through compaction strategy options
As requested from field engineering, add a way to disable
the optimized TWCS query algorithm (use regular query path)
just in case a bug or a performance regression shows up in
production.

To disable the optimized query path, add
'enable_optimized_twcs_queries': 'false' to compaction strategy options,
e.g.
```
alter table ks.t with compaction =
    {'class': 'TimeWindowCompactionStrategy',
     'enable_optimized_twcs_queries': 'false'};
```

Setting the `enable_optimized_twcs_queries` key to anything other than
`'false'` (note: a boolean `false` expands to a string `'false'`) or
skipping it (re)enables the optimized query path.

Note: the flag can be set in a cluster in the middle of upgrade. Nodes
which do not understand it simply ignore it, but they do store it in
their schema tables (they store the entire `compaction` map). After
these nodes are upgraded, they will understand the flag and act
accordingly.

Note: in the situation above, some nodes may use the optimized path and
some may use the regular path. This may happen also in a fully upgraded
cluster when compaction options are changed concurrently to reads;
there is a short period of time where the schema change propagates and
some nodes got the flag but some didn't.

These should not be a problem since the optimization does not change the
returned read results (unless there is a bug).

Generally, the flag is not intended for normal use, but for field
engineers to disable it in case of a serious problem.

Ref #6418.

Closes #9900
2022-01-14 07:10:02 +02:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Avi Kivity
ae3a360725 database: Move database, keyspace, table classes to replica/ directory
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.

As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
2022-01-06 17:07:30 +02:00
Botond Dénes
9027c6f936 sstables/sstable_set: create_single_key_sstable_reader() upgrade to v2
With this all methods of the sstable set create v2 readers.
2021-12-20 17:17:33 +02:00
Botond Dénes
e1bbc4a480 mutation_reader: convert make_clustering_combined_reader() to v2
Just sprinkle the right amount downgrade_to_v1() and upgrade_to_v2() to
call sites, no attempts at optimization was done.
2021-12-20 09:29:05 +02:00
Botond Dénes
2364144b19 mutation_reader: convert position_reader_queue to v2
By removing the converting (v1->v2) constructor of
`reader_and_upper_bound` and adjusting its users.
2021-12-20 09:29:05 +02:00
Botond Dénes
aeddcf50a1 mutation_reader: convert make_combined_reader() overloads to v2
Just sprinkle the right amount downgrade_to_v1() and upgrade_to_v2() to
call sites, no attempts at optimization was done.
2021-12-20 09:29:05 +02:00
Botond Dénes
1554b94b78 mutation_reader: combined_reader: convert reader_selector to v2 2021-12-20 09:29:05 +02:00
Raphael S. Carvalho
c6399005a3 sstable_set: update make_crawling_reader() to flat_mutation_reader_v2
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-07 09:37:55 -03:00
Raphael S. Carvalho
aebbe68239 sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-07 09:37:53 -03:00
Raphael S. Carvalho
c3c070a5ca sstable_set: update make_local_shard_sstable_reader() to flat_mutation_reader_v2
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-07 09:37:51 -03:00
Raphael S. Carvalho
6b664067dd sstable_set: update incremental_reader_selector to flat_mutation_reader_v2
Cannot be fully converted to flat_mutation_reader_v2 yet, as the
selector is built on combined_reader interface which is still not
converted. So only updated wherever possible.
Subsequent work will update sstable_set readers, which uses the
selector, to flat_mutation_reader_v2.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-07 09:37:49 -03:00
Raphael S. Carvalho
1f3135abb4 sstable_set: use for_each_sstable() in make_crawling_reader()
sstable_set_impl::all() may have to copy all sstables from multiple
sets, if compound. let's avoid this overhead by using
sstable_set_impl::for_each_sstable().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211127181037.56542-1-raphaelsc@scylladb.com>
2021-11-29 19:59:39 +02:00
Botond Dénes
64bb48855c flat_mutation_reader: revamp flat_mutation_reader_from_mutations()
Add schema parameter so that:
* Caller has better control over schema -- especially relevant for
  reverse reads where it is not possible to follow the convention of
  passing the query schema which is reversed compared to that of the
  mutations.
* Now that we don't depend on the mutations for the schema, we can lift
  the restriction on mutations not being empty: this leads to safer
  code. When the mutations parameter is empty, an empty reader is
  created.
Add "make_" prefix to follow convention of similar reader factory
functions.

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211115155614.363663-1-bdenes@scylladb.com>
2021-11-15 17:58:46 +02:00
Raphael S. Carvalho
aa4aba40aa sstables: sstable_run: introduce estimate_droppable_tombstone_ratio
Make it possible to estimate dropppable tombstones for sstable runs.
The result is averaged by number of fragments composing the run.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211014143424.353357-1-raphaelsc@scylladb.com>
2021-10-18 12:24:08 +03:00
Tomasz Grabiec
cc56a971e8 database, treewide: Introduce partition_slice::is_reversed()
Cleanup, reduces noise.

Message-Id: <20211014093001.81479-1-tgrabiec@scylladb.com>
2021-10-14 12:39:16 +03:00
Botond Dénes
41facb3270 treewide: move reversing to the mutation sources
Push down reversing to the mutation-sources proper, instead of doing it
on the querier level. This will allow us to test reverse reads on the
mutation source level.
The `max_size` parameter of `consume_page()` is now unused but is not
removed in this patch, it will be removed in a follow-up to reduce
churn.
2021-09-29 12:15:45 +03:00
Kamil Braun
7dc4ee35c9 sstable_set: time_series_sstable_set: reverse mode
`time_series_sstable_set` uses `clustering_combined_reader` to implement
efficient single-partition reads. It provides a `position_reader_queue`
to the reader. This queue returns readers to the sstables from the set
in order of the sstables' lower bounds, and with each reader it provides
an upper bound for the positions-in-partition returned by the reader.

Until now we would assume non-reversed queries only. Reversed queries
were implemented by performing forward query in the lower layers
and reversing the results at the upper-most layer of the reader stack.
Before pushing the reversing down to the sources (in particular,
to sstable readers), we need to support the reverse mode in
`time_series_sstable_set` and the queue it provides to
`clustering_combined_reader`.

This requires using different lower and upper bounds in the queue.
For non-reversed reads we used `sstable::min_position()` as the lower
bound and `sstable::max_position()` as the upper bound. For reversed
reads all comparisons performed by `clustering_combined_reader` will be
reversed, as it will use a reversed schema. We can then use
`sstable::max_position().reversed()` for the lower bound and
`sstable::min_position().reversed()` for the upper bound.
2021-09-28 17:03:57 +03:00
Botond Dénes
1abf665d1d sstables: wire in crawling reader 2021-09-01 16:21:49 +03:00
Benny Halevy
4476800493 flat_mutation_reader: get rid of timeout parameter
Now that the timeout is taken from the reader_permit.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
4439e5c132 everywhere: cleanup defer.hh includes
Get rid of unused includes of seastar/util/{defer,closeable}.hh
and add a few that are missing from source files.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-22 21:11:39 +03:00
Michael Livshin
f07306d75c sstables: make sstable::make_reader() return flat_mutation_reader_v2
Rename the old version to `sstables::make_reader_v1()`, to have a
nicely searcheable eradication target.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-08-09 19:20:48 +03:00
Raphael S. Carvalho
f75154afca compaction: Remove overhead of merging reader for cleanup compaction
When perfing cleanup, merging reader showed up as significant.
Given that cleanup is performed on a single sstable at a time,
merging reader becomes an extra layer doing useless work.

     1.71%     1.71%  scylla     scylla               [.] merging_reader<mutation_reader_merger>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#1}::operator()

mutation compactor, to get rid of purgeable expired data
and so on, still consumes the data retrieved by sstable
reader, so no semantic change is done.

With the overhead removed, cleanup becomes ~9% faster, see:

BEFORE

real	1m15.240s
user	0m2.648s
sys	0m0.128s

240MB to 237MB (~98% of original) in 3301ms = 71MB/s.
719MB to 719MB (~99% of original) in 9761ms = 73MB/s.
1GB to 1GB (~100% of original) in 15372ms = 73MB/s.
1GB to 1GB (~100% of original) in 15343ms = 74MB/s.
1GB to 1GB (~100% of original) in 15329ms = 74MB/s.
1GB to 1GB (~100% of original) in 15360ms = 73MB/s.

AFTER

real	1m9.154s
user	0m2.428s
sys	0m0.123s

240MB to 237MB (~98% of original) in 3010ms = 78MB/s.
719MB to 719MB (~99% of original) in 8997ms = 79MB/s.
1GB to 1GB (~100% of original) in 14114ms = 80MB/s.
1GB to 1GB (~100% of original) in 14145ms = 80MB/s.
1GB to 1GB (~100% of original) in 14106ms = 80MB/s.
1GB to 1GB (~100% of original) in 14053ms = 80MB/s.

With 1TB set, ~20m would had been reduced instead.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210730190713.462135-1-raphaelsc@scylladb.com>
2021-08-02 19:22:41 +03:00
Botond Dénes
2bab76c80e sstables: sstable_set: remove now unused make_restricted_range_sstable_reader() 2021-07-14 17:19:02 +03:00
Raphael S. Carvalho
1924e8d2b6 treewide: Move compaction code into a new top-level compaction dir
Since compaction is layered on top of sstables, let's move all compaction code
into a new top-level directory.
This change will give me extra motivation to remove all layer violations, like
sstable calling compaction-specific code, and compaction entanglement with
other components like table and storage service.

Next steps:
- remove all layer violations
- move compaction code in sstables namespace into a new one for compaction.
- move compaction unit tests into its own file

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210707194058.87060-1-raphaelsc@scylladb.com>
2021-07-07 23:21:51 +03:00
Raphael S. Carvalho
ab9d08d80e sstables: Remove unused filtering reader from sstable_set::make_local_shard_sstable_reader()
It's been a long time since table no longer accepts shared sstables, so this
code which creates a filtering reader, if sstable is shared, is never used.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210618200026.1002621-2-raphaelsc@scylladb.com>
2021-06-24 12:03:26 +03:00
Raphael S. Carvalho
846f0bd16e sstables: Fix incremental selection with compound sstable set
Incremental selection may not work properly for LCS and ICS due to an
use-after-free bug in partitioned set which came into existence after
compound set was introduced.

The use-after-free happens because partitioned set wasn't taking into
account that the next position can become the current position in the
next iteration, which will be used by all selectors managed by
compound set. So if next position is freed, when it were being used
as current position, subsequent selectors would find the current
position freed, making them produce incorrect results.

Fix this by moving ownership of next pos from incremental_selector_impl
to incremental_selector, which makes it more robust as the latter knows
better when the selection is done with the next pos. incremental_selector
will still return ring_position_view to avoid copies.

Fixes #8802.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210611130957.156712-1-raphaelsc@scylladb.com>
2021-06-13 16:45:07 +03:00
Raphael S. Carvalho
f8b2a6c923 sstables: Optimize incremental selection when only primary set contains sstables
Compound set's incremental selector isn't needed when only one set
contains sstables, which is the common case because secondary set
will only contain data during maintenance operations.
From now on, if only primary set contains data, its selector will
be built directly without compound set's selector acting as an
interposer.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210607193651.126937-1-raphaelsc@scylladb.com>
2021-06-08 10:25:19 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Benny Halevy
6ce826206a sstables: use vector empty method rather than size
Testing std::vector::empty() is slightly more efficient
than testing for `size() > 0`.

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210601115552.155148-2-bhalevy@scylladb.com>
2021-06-06 09:21:23 +03:00
Avi Kivity
0acf5bfca6 build: enable -Wreturn-std-move
Clang warns when "return std::move(x)" is needed to elide a copy,
but the call to std::move() is missing. We disabled the warning during
the migration to clang. This patch re-enables the warning and fixes
the places it points out, usually by adding std::move() and in one
place by converting the returned variable from a reference to a local,
so normal copy elision can take place.

Closes #8739
2021-05-27 21:16:26 +03:00
Pavel Solodovnikov
c3a7b55507 treewide: remove extraneous database.hh includes from headers
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-05-20 01:59:14 +03:00
Pavel Solodovnikov
fff7ef1fc2 treewide: reduce boost headers usage in scylla header files
`dev-headers` target is also ensured to build successfully.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-05-20 01:33:18 +03:00
Benny Halevy
7d42a71310 mutation_reader: position_reader_queue: add close method
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Raphael S. Carvalho
0f7774a6f8 sstables: Extract code to count amount of overlapping into a function
This function will be reused by TWCS reshape when checking if all
sstables in a window are disjoint and can be all compacted together.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-04-21 11:03:16 -03:00
Kamil Braun
5c7ed7a83f time_series_sstable_set: return partition start if some sstables were ck-filtered out
When a particular partition exists in at least one sstable, the cache
expects any single-partition query to this partition to return a `partition_start`
fragment, even if the result is empty.

In `time_series_sstable_set::create_single_key_sstable_reader` it could
happen that all sstables containing data for the given query get
filtered out and only sstables without the relevant partition are left,
resulting in a reader which immediately returns end-of-stream (while it
should return a `partition_start` and if not in forwarding mode, a
`partition_end`). This commit fixes that.

We do it by extending the reader queue (used by the clustering reader
merger) with a `dummy_reader` which will be returned by the queue as
the very first reader. This reader only emits a `partition_start` and,
if not in forwarding mode, a `partition_end` fragment.

Fixes #8447.

Closes #8448
2021-04-14 13:16:00 +02:00
Raphael S. Carvalho
224120f7df sstables: rewrite compound_sstable_set::all()
Procedure is rewritten using std::partition, making it easier to
maintain and it also fixes a theoretical quadratic behavior because
list is entirely copied when extending it, which isn't harmful
because maintenance set will be rarely populated and there are only
2 sets at most.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210409171412.57729-1-raphaelsc@scylladb.com>
2021-04-12 12:45:43 +03:00
Kamil Braun
3687757115 sstables: fix TWCS single key reader sstable filter
The filter passed to `min_position_reader_queue`, which was used by
`clustering_order_reader_merger`, would incorrectly include sstables as
soon as they passed through the PK (bloom) filter, and would include
sstables which didn't pass the PK filter (if they passed the CK
filter). Fortunately this wouldn't cause incorrect data to be returned,
but it would cause sstables to be opened unnecessarily (these sstables
would immediately return eof), resulting in a performance drop. This commit
fixes the filter and adds a regression test which uses statistics to
check how many times the CK filter was invoked.

Fixes #8432.

Closes #8433
2021-04-08 18:03:49 +03:00
Raphael S. Carvalho
8e0a1ca866 sstable_set: Implement compound_sstable_set's create_single_key_sstable_reader()
compound set isn't overriding create_single_key_sstable_reader(), so
default implementation is always called. Although default impl will
provide correct behavior, specialized ones which provides better perf,
which currently is only available for TWCS, were being ignored.

compound set impl of single key reader will basically combine single key
readers of all sets managed by it.

Fixes #8415.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210406205009.75020-1-raphaelsc@scylladb.com>
2021-04-07 12:36:30 +03:00
Raphael S. Carvalho
c86dd125a1 sstables: clean up partitioned_sstable_set::insert()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210322130227.16805-2-raphaelsc@scylladb.com>
2021-03-22 15:30:32 +02:00
Raphael S. Carvalho
48d8cc261e sstables: don't swallow exception in partitioned_sstable_set::insert()
regression introduced by 02b2df1ea9 (Fri Mar 12 01:22:41 2021 -0300).

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210322130227.16805-1-raphaelsc@scylladb.com>
2021-03-22 15:30:31 +02:00
Raphael S. Carvalho
e4b5f5ba33 sstable_set: Introduce compound sstable set
This new sstable set implementation is useful for combining operation of
multiple sstable sets, which can still be referenced individually via
its shared ptr reference.
It will be used when maintenance set is introduced in table, so a
compound set is required to allow both sets to have their operations
efficiently combined.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-03-18 11:42:49 -03:00
Benny Halevy
7862cad669 sstable_set: partitioned_sstable_set: clone: do clone all sstables
The existing implementation wrongfully shares _all sstables
rather than cloning it. This caused a use-after-free
in `repair_meta::do_estimate_partitions_on_local_shard`
when traversing a shared sstable_set, during which
`table::make_reader_excluding_sstables` erased an entry.
The erase should have happened on a cloned copy
of the sstable_list, not on a shared copy.

The regression was introduced in
c3b8757fa1.

Added a unit test that reproduces the share-on-copy issue
for partitioned_stable_set (sstables::sstable_set).

Fixes #8274

Test: unit(release, debug)
DTest: materialized_views_test.py:TestMaterializedViews.simple_repair_test(debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210317145552.701559-1-bhalevy@scylladb.com>
2021-03-18 11:15:59 +02:00
Benny Halevy
6846319e65 partitioned_sstables_set: insert: propagate exception
Do not swallow the caught exception.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210316170821.496218-1-bhalevy@scylladb.com>
2021-03-17 13:29:03 +02:00
Raphael S. Carvalho
2065e2c912 partitioned_sstable_set: adjust select_sstable_runs() to work with compound set
compound set will select runs from all of its managed sets, so let's
adjust select_sstable_runs() to only return runs which belong to it.
without this adjustment, selection of runs would fail because
function would try to unconditionally retrieve the run which may
live somewhere else.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210312042255.111060-3-raphaelsc@scylladb.com>
2021-03-17 09:59:22 +02:00
Raphael S. Carvalho
02b2df1ea9 sstable_set: move select_sstable_runs() into partitioned_sstable_set
after compound set is introduced, select_sstable_runs() will no longer
work because the sstable runs live in sstable_set, but they should
actually live in the sstable_set being written to.

Given that runs is a concept that belongs only to strategies which
use partitioned_sstable_set, let's move the implementation of
select_sstable_runs() to it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210312042255.111060-2-raphaelsc@scylladb.com>
2021-03-17 09:59:22 +02:00