Commit Graph

1424 Commits

Author SHA1 Message Date
Gleb Natapov
512556914a test: move memtable_test.cc to new schema announcement api 2022-01-13 23:10:13 +02:00
Gleb Natapov
be46109af6 test: move cql_query_test.cc to new schema announcement api 2022-01-13 23:09:02 +02:00
Gleb Natapov
5dffc8ed3e test: convert database_test to new schema announcement api 2022-01-13 23:09:02 +02:00
Avi Kivity
134601a15e Merge "Convert input side of mutation compactor to v2" from Botond
"
With this series the mutation compactor can now consume a v2 stream. On
the output side it still uses v1, so it can now act as an online
v2->v1 converter. This allows us to push out v2->v1 conversion to as far
as the compactor, usually the next to last component in a read pipeline,
just before the final consumer. For reads this is as far as we can go,
as the intra-node ABI and hence the result-sets built are v1. For
compaction we could go further and eliminate conversion altogether, but
this requires some further work on both the compactor and the sstable
writer and so it is left to be done later.
To summarize, this patchset enables a v2 input for the compactor and it
updates compaction and single partition reads to use it.
"

* 'mutation-compactor-consume-v2/v1' of https://github.com/denesb/scylla:
  table: add make_reader_v2()
  querier: convert querier_cache and {data,mutation}_querier to v2
  compaction: upgrade compaction::make_interposer_consumer() to v2
  mutation_reader: remove unecessary stable_flattened_mutations_consumer
  compaction/compaction_strategy: convert make_interposer_consumer() to v2
  mutation_writer: migrate timestamp_based_splitting_writer to v2
  mutation_writer: migrate shard_based_splitting_writer to v2
  mutation_writer: add v2 clone of feed_writer and bucket_writer
  flat_mutation_reader_v2: add reader_consumer_v2 typedef
  mutation_reader: add v2 clone of queue_reader
  compact_mutation: make start_new_page() independent of mutation_fragment version
  compact_mutation: add support for consuming a v2 stream
  compact_mutation: extract range tombstone consumption into own method
  range_tombstone_assembler: add get_range_tombstone_change()
  range_tombstone_assembler: add get_current_tombstone()
2022-01-12 14:37:19 +02:00
Avi Kivity
4118f2d8be treewide: replace deprecated seastar::later() with seastar::yield()
seastar::later() was recently deprecated and replaced with two
alternatives: a cheap seastar::yield() and an expensive (but more
powerful) seastar::check_for_io_immediately(), that corresponds to
the original later().

This patch replaces all later() calls with the weaker yield(). In
all cases except one, it's unambiguously correct. In one case
(test/perf scheduling_latency_measurer::stop()) it's not so ambiguous,
since check_for_io_immediately() will additionally force a poll and
so will cause more work to be done (but no additional tasks to be
executed). However, I think that any measurement that relies on
the measuring the work on the last tick to be inaccurate (you need
thousands of ticks to get any amount of confidence in the
measurement) that in the end it doesn't matter what we pick.

Tests: unit (dev)

Closes #9904
2022-01-12 12:19:19 +01:00
Nadav Har'El
7a9f69ec38 Merge 'lister cleanup and test' from Benny Halevy
Split off of #9835.

The series removes extraneous includes of lister.hh from header files
and adds a unit test for lister::scan_dir to test throwing an exception
from the walker function passed to `scan_dir`.

Test: unit(dev)

Closes #9885

* github.com:scylladb/scylla:
  test: add lister_list
  lister: add more overloads of fs::path operator/ for std::string and string_view
  resource_manager: remove unnecessary include of lister.hh from header file
  sstables: sstable_directory: remove unncessary include of lister.hh from header file
2022-01-12 08:20:07 +01:00
Benny Halevy
1e6829e9f1 test: add lister_list
Test the lister class.

In particular the ability to abort the lister
when the walker function throws an exception.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-11 17:04:16 +02:00
Benny Halevy
b9c41dc0fd sstables: sstable_directory: remove unncessary include of lister.hh from header file
The source file depends on it, not the header.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-11 17:04:16 +02:00
Botond Dénes
97d74de8fc Merge "flat_mutation_reader: clone evictable_reader & convert some others" from Michael Livshin
"
The first patch introduces evictable_reader_v2, and the second one
further simplifies it.  We clone instead of converting because there
is at least one downstream (by way of multishard_combining_reader) use
that is not itself straightforward to convert at the moment
(multishard_mutation_query), and because evictable_reader instances
cannot be {up,down}graded (since users also access the undelying
buffers).  This also means that shard_reader, reader_lifecycle_policy
and multishard_combining_reader have to be cloned.
"

* tag 'clone-evictable-reader-to-v2/v3' of https://github.com/cmm/scylla:
  convert make_multishard_streaming_reader() to flat_mutation_reader_v2
  convert table::make_streaming_reader() to flat_mutation_reader_v2
  convert make_flat_multi_range_reader() to flat_mutation_reader_v2
  view_update_generator: remove unneeded call to downgrade_to_v1()
  introduce multishard_combining_reader_v2
  introduce shard_reader_v2
  introduce the reader_lifecycle_policy_v2 abstract base
  evictable_reader_v2: further code simplifications
  introduce evictable_reader_v2 & friends
2022-01-11 17:01:08 +02:00
Michael Livshin
1f27e12dc6 convert make_multishard_streaming_reader() to flat_mutation_reader_v2
All changes are mechanical.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Avi Kivity
4392c20bd3 replica: move distributed_loader into replica module
distributed_loader is replica-side thing, so it belongs in the
replica module ("distributed" refers to its ability to load
sstables in their correct shards). So move it to the replica
module.
2022-01-10 15:25:28 +02:00
Botond Dénes
aa3c943f4c mutation_reader: remove unecessary stable_flattened_mutations_consumer
Said wrapper was conceived to make unmovable `compact_mutation` because
readers wanted movable consumers. But `compact_mutation` is movable for
years now, as all its unmovable bits were moved into an
`lw_shared_ptr<>` member. So drop this unnecessary wrapper and its
unnecessary usages.
2022-01-07 13:52:07 +02:00
Botond Dénes
9826b5d732 mutation_writer: migrate timestamp_based_splitting_writer to v2 2022-01-07 13:51:48 +02:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Avi Kivity
ae3a360725 database: Move database, keyspace, table classes to replica/ directory
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.

As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
2022-01-06 17:07:30 +02:00
Avi Kivity
d01e1a774b Merge 'Build performance: do not include the entire <seastar/net/ip.hh>' from Nadav Har'El
The header file <seastar/net/ip.hh> is a large collection of unrelated stuff, and according to ClangBuildAnalyzer, takes 2 seconds to compile for every source file that included it - and unfortunately virtually all Scylla source files included it - through either "types.hh" or "gms/inet_address.hh". That's 2*300 CPU seconds wasted.

In this two-patch series we completely eliminate the inclusion of <seastar/net/ip.hh> from Scylla. We still need the ipv4_address, ipv6_address types (e.g., gms/inet_address.hh uses it to hold a node's IP address) so those were split (in a Seastar patch that is already in) from ip.hh into separate small header files that we can include.

This patch reduces the entire build time (of build/dev/scylla) by 4% - reducing almost 10 sCPU minutes (!) from the build.

Closes #9875

* github.com:scylladb/scylla:
  build performance: do not include <seastar/net/ip.hh>
  build performance: speed up inclusion of <gm/inet_address.hh>
2022-01-05 17:55:07 +02:00
Nadav Har'El
6012f6f2b6 build performance: do not include <seastar/net/ip.hh>
In a previous patch, we noticed that the header file <gm/inet_address.hh>,
which is included, directly or indirectly, by most source files,
includes <seastar/net/ip.hh> which is very slow to compile, and
replaced it by the much faster-to-include <seastar/net/ipv[46]_address.hh>.

However, we also included <seastar/net/ip.hh> in types.hh - and that
too is included by almost every file, so the actual saving from the
above patch was minimal. So in this patch we replace this include too.
After this patch Scylla does not include <seastar/net/ip.hh> at all.

According to ClangBuildAnalyzer, this reduces the average time to include
types.hh (multiply this by 312 times!) from 4 seconds to 1.8 seconds,
and reduces total build time (dev mode) by about 3%.

Some of the source files were now missing some include directives, that
were previously included in ip.hh - so we need to add those explicitly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-01-05 17:29:21 +02:00
Avi Kivity
53a83c4b1e Merge "flat_mutation_reader: convert flat_mutation_reader_from_mutations to v2" from Botond
"
Like flat_mutation_reader_from_fragments, this reader is also heavily
used by tests to compose a specific workload for readers above it. So
instead of converting it, we add a v2 variant and leave the v1 variant
in place.
The v2 variant was written from scratch to have built-in support for
reading in reverse. It is built-on `mutation::consume()` to avoid
duplicating the logic of consuming the contents of the mutation. To
avoid stalls, `mutation::consume()` gets support for pausing and
resuming consuming a mutation.

Tests: unit(dev)
"

* 'flat_mutation_reader_from_mutations_v2/v2' of https://github.com/denesb/scylla:
  flat_mutation_reader: convert make_flat_mutation_reader_from_mutation() v2
  flat_mutation_reader: extract mutation slicing into a function
  mutation: consume(): make it pausable/resumable
  mutation: consume(): restructure clustering iterator initialization
  test/boost/mutation_test: add rebuild test for mutation::consume()
2022-01-05 10:23:17 +02:00
Botond Dénes
62d82b8b0e flat_mutation_reader: convert make_flat_mutation_reader_from_mutation() v2
Since this reader is also heavily used by tests to compose a specific
workload for readers above it, we just add a v2 variant, instead of
changing the existing v1 one.
The v2 variant was written from scratch to have built-in support for
reading in reverse. It is built-on `mutation::consume()` to avoid
duplicating the logic of consuming the contents of the mutation.

A v2 native unit test is also added.
2022-01-05 09:06:16 +02:00
Asias He
a8ad385ecd repair: Get rid of the gc_grace_seconds
The gc_grace_seconds is a very fragile and broken design inherited from
Cassandra. Deleted data can be resurrected if cluster wide repair is not
performed within gc_grace_seconds. This design pushes the job of making
the database consistency to the user. In practice, it is very hard to
guarantee repair is performed within gc_grace_seconds all the time. For
example, repair workload has the lowest priority in the system which can
be slowed down by the higher priority workload, so that there is no
guarantee when a repair can finish. A gc_grace_seconds value that is
used to work might not work after data volume grows in a cluster. Users
might want to avoid running repair during a specific period where
latency is the top priority for their business.

To solve this problem, an automatic mechanism to protect data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.

In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:

1) GC a tombstone after gc_grace_seconds

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;

This is the default mode. If no tombstone_gc option is specified by the
user. The old gc_grace_seconds based gc will be used.

2) Never GC a tombstone

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};

3) GC a tombstone immediately

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};

4) GC a tombstone after repair

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};

In addition to the 'mode' option, another option 'propagation_delay_in_seconds'
is added. It defines the max time a write could possibly delay before it
eventually arrives at a node.

A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.

Tests: compaction_test.py, ninja test

Fixes #3560

[avi: resolve conflicts vs data_dictionary]
2022-01-04 19:48:14 +02:00
Botond Dénes
5e547dcc8a test/boost/mutation_test: add rebuild test for mutation::consume()
In the next patches we will refactor mutation::consume(). Before doing
that add another test, which rebuilds the consumed mutation, comparing
it with the original.
2022-01-04 11:43:46 +02:00
Avi Kivity
9e74556413 Merge 'Support reverse reads in the row cache natively' from Tomasz Grabiec
This change makes row cache support reverse reads natively so that reversing wrappers are not needed when reading from cache and thus the read can be executed efficiently, with similar cost as the forward-order read.

The database is serving reverse reads from cache by default after this. Before, it was bypassing cache by default after 703aed3277.

Refs: #1413

Tests:

  - unit [dev]
  - manual query with build/dev/scylla and cache tracing on

Closes #9454

* github.com:scylladb/scylla:
  tests: row_cache: Extend test_concurrent_reads_and_eviction to run reverse queries
  row_cache: partition_snapshot_row_cursor: Print more details about the current version vector
  row_cache: Improve trace-level logging
  config: Use cache for reversed reads by default
  config: Adjust reversed_reads_auto_bypass_cache description
  row_cache: Support reverse reads natively
  mvcc: partition_snapshot: Support slicing range tombstones in reverse
  test: flat_mutation_reader_assertions: Consume expected range tombstones before end_of_partition
  row_cache: Log produced range tombstones
  test: Make produces_range_tombstone() report ck_ranges
  tests: lib: random_mutation_generator: Extract make_random_range_tombstone()
  partition_snapshot_row_cursor: Support reverse iteration
  utils: immutable-collection: Make movable
  intrusive_btree: Make default-initialized iterator cast to false
2021-12-29 16:53:25 +02:00
Botond Dénes
aba68c8f83 Merge "reader_concurrency_semaphore: convert to flat_mutation_reader_v2" from Michael
"
The second patch in this series is a mechanical conversion of
reader_concurrency_semaphore to flat_mutation_reader_v2, and caller
updates.

The first patch is needed to pass the test suite, since without it a
real reader version conversion would happen on every entry to and exit
from reader_concurrency_semaphore, which is stressful (for example:
mutation_reader_test.test_multishard_streaming_reader reaches 8191
conversions for a couple of readers, which somehow causes it to catch
SIGSEGV in diverse and seemingly-random places).

Note that in a real workload it is unreasonable to expect readers being
parked in a reader_concurrency_semaphore to be pristine, so
short-circuiting their version conversions will be impossible and this
workaround will not really help.
"

* tag 'rcs-v2-v4' of https://github.com/cmm/scylla:
  reader_concurrency_semaphore: convert to flat_mutation_reader_v2
  short-circuit flat mutation reader upgrades and downgrades
2021-12-22 15:08:31 +02:00
Michael Livshin
a1b8ba23d2 reader_concurrency_semaphore: convert to flat_mutation_reader_v2
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-12-21 11:26:17 +02:00
Raphael S. Carvalho
64ec1c6ec6 table: Make sure major compaction doesn't miss data in memtable
Make sure that major will compact data in all sstables and memtable,
as tombstones sitting in memtable could shadow data in sstables.
For example, a tombstone in memtable deleting a large partition could
be missed in major, so space wouldn't be saved as expected.
Additionally, write amplification is reduced as data in memtable
won't have to travel through tiers once flushed.

Fixes #9514.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211217160055.96693-2-raphaelsc@scylladb.com>
2021-12-21 07:21:34 +02:00
Botond Dénes
55bb70a878 Merge "Make sure TWCS per-window major includes all files" from Raphael
"
TWCS perform STCS on a window as long as it's the most recent one.
From there on, TWCS will compact all files in the past window into
a single file. With some moderate write load, it could happen that
there's still some compaction activity in that past window, meaning
that per-window major may miss some files being currently compacted.
As a result, a past window may contain more than 1 file after all
compaction activity is done on its behalf, which may increase read
amplification. To avoid that, TWCS will now make sure that per-window
major is serialized, to make sure no files are missed.

Fixes #9553.

tests: unit(dev).
"

* 'fix_twcs_per_window_major_v3' of https://github.com/raphaelsc/scylla:
  TWCS: Make sure major on past window is done on all its sstables
  TWCS: remove needless param for STCS options
  TWCS: kill unused param in newest_bucket()
  compaction: Implement strategy control and wire it
  compaction: Add interface to control strategy behavior.
2021-12-20 17:12:50 +02:00
Avi Kivity
e772fcbd57 Merge "Convert combined reader to v2" from Botond
"
Users are adjusted by sprinkling `upgrade_to_v2()` and
`downgrade_to_v1()` where necessary (or removing any of these where
possible). No attempt was made to optimize and reduce the amount of
v1<->v2 conversions. This is left for follow-up patches to keep this set
small.

The combined reader is composed of 3 layers:
1. fragment producer - pop fragments from readers, return them in batches
  (each fragment in a batch having the same type and pos).
2. fragment merger - merge fragment batches into single fragments
3. reader implementation glue-code

Converting layers (1) and (3) was mostly mechanical. The logic of
merging range tombstone changes is implemented at layer (2), so the two
different producer (layer 1) implementations we have share this logic.

Tests: unit(dev)
"

* 'combined-reader-v2/v4' of https://github.com/denesb/scylla:
  test/boost/mutation_reader_test: add test_combined_reader_range_tombstone_change_merging
  mutation_reader: convert make_clustering_combined_reader() to v2
  mutation_reader: convert position_reader_queue to v2
  mutation_reader: convert make_combined_reader() overloads to v2
  mutation_reader: combined_reader: convert reader_selector to v2
  mutation_reader: convert combined reader to v2
  mutation_reader: combined_reader: attach stream_id to mutation_fragments
  flat_mutation_reader_v2: add v2 version of empty reader
  test/boost/mutation_reader_test: clustering_combined_reader_mutation_source_test: fix end bound calculation
2021-12-20 14:01:03 +02:00
Botond Dénes
7f331cee01 test/boost/mutation_reader_test: add test_combined_reader_range_tombstone_change_merging
Stressing the range tombstone change merging logic.
2021-12-20 09:29:05 +02:00
Botond Dénes
e1bbc4a480 mutation_reader: convert make_clustering_combined_reader() to v2
Just sprinkle the right amount downgrade_to_v1() and upgrade_to_v2() to
call sites, no attempts at optimization was done.
2021-12-20 09:29:05 +02:00
Botond Dénes
2364144b19 mutation_reader: convert position_reader_queue to v2
By removing the converting (v1->v2) constructor of
`reader_and_upper_bound` and adjusting its users.
2021-12-20 09:29:05 +02:00
Botond Dénes
aeddcf50a1 mutation_reader: convert make_combined_reader() overloads to v2
Just sprinkle the right amount downgrade_to_v1() and upgrade_to_v2() to
call sites, no attempts at optimization was done.
2021-12-20 09:29:05 +02:00
Botond Dénes
1554b94b78 mutation_reader: combined_reader: convert reader_selector to v2 2021-12-20 09:29:05 +02:00
Tomasz Grabiec
1c80d7fec4 tests: row_cache: Extend test_concurrent_reads_and_eviction to run reverse queries 2021-12-19 22:43:52 +01:00
Tomasz Grabiec
63351483f0 row_cache: Support reverse reads natively
Some implementation notes below.

When iterating in reverse, _last_row is after the current entry
(_next_row) in table schema order, not before like in the forward
mode.

Since there is no dummy row before all entries, reverse iteration must
be now prepared for the fact that advancing _next_row may land not
pointing at any row. The partition_snapshot_row_cursor maintains
continuity() correctly in this case, and positions the cursor before
all rows, so most of the code works unchanged. The only excpetion is
in move_to_next_entry(), which now cannot assume that failure to
advance to an entry means it can end a read.

maybe_drop_last_entry() is not implemented in reverse mode, which may
expose reverse-only workload to the problem of accumulating dummy
entries.

ensure_population_lower_bound() was not updating _last_row after
inserting the entry in latets version. This was not a problem for
forward reads because they do not modify the row in the partition
snapshot represented by _last_row. They only need the row to be there
in the latest version after the call. It's different for reveresed
reads, which change the continuity of the entry represented by
_last_row, hence _last_row needs to have the iterator updated to point
to the entry from the latest version, otherwise we'd set the
continuity of the previous version entry which would corrupt the
continuity.
2021-12-19 22:41:35 +01:00
Tomasz Grabiec
d0c367f44f mvcc: partition_snapshot: Support slicing range tombstones in reverse 2021-12-19 22:41:35 +01:00
Tomasz Grabiec
757fc1275f partition_snapshot_row_cursor: Support reverse iteration 2021-12-19 22:41:35 +01:00
Raphael S. Carvalho
f508f54f3e table: move min_compaction_threshold() and compaction_enforce_min_threshold() into table_state
Compaction specific methods can be implemented in table_state only,
as they aren't needed elsewhere.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211214191822.164223-1-raphaelsc@scylladb.com>
2021-12-17 10:00:31 +02:00
Botond Dénes
f15f4952be test/boost/mutation_reader_test: clustering_combined_reader_mutation_source_test: fix end bound calculation
Currently the test assumes that fragments represent weakly monotonic
upper bounds and therefore unconditionally overwrites the upper-bound on
receiving each fragment. Range tombstones however violate this as a
range tombstone with a smaller position (lower bound) may have a higher
upper bound than some or all fragments that follow it in the stream.
This causes test failures after the converting the combined reader to
v2, but not before, no idea why.
2021-12-16 14:57:49 +02:00
Pavel Emelyanov
b2a62d2b59 Merge 'db: range_tombstone_list: Deoverlap empty range tombstones' from Tomasz Grabiec
Appending an empty range adjacent to an existing range tombstone would
not deoverlap (by dropping the empty range tombstone) resulting in
different (non canoncial) result depending on the order of appending.

Suppose that range tombstone [a, b] covers range tombstone [x, x), and [a, x) and [x, b) are range tombstones which correspond to [a, b] split around position x.

Appending [a, x) then [x, b) then [x, x) would give [a, b)
Appending [a, x) then [x, x) then [x, b) would give [a, x), [x, x), [x, b)

The fix is to drop empty range tombstones in range_tombstone_list so that the result is canonical.

Fixes #9661

Closes #9764

* github.com:scylladb/scylla:
  range_tombstone_list: Deoverlap adjacent empty ranges
  range_tombstone_list: Convert to work in terms of position_in_partition
2021-12-16 10:00:40 +03:00
Avi Kivity
d768e9fac5 cql3, related: switch to data_dictionary
Stop using database (and including database.hh) for schema related
purposes and use data_dictionary instead.

data_dictionary::database::real_database() is called from several
places, for these reasons:

 - calling yet-to-be-converted code
 - callers with a legitimate need to access data (e.g. system_keyspace)
   but with the ::database accessor removed from query_processor.
   We'll need to find another way to supply system_keyspace with
   data access.
 - to gain access to the wasm engine for testing whether used
   defined functions compile. We'll have to find another way to
   do this as well.

The change is a straightforward replacement. One case in
modification_statement had to change a capture, but everything else
was just a search-and-replace.

Some files that lost "database.hh" gained "mutation.hh", which they
previously had access to through "database.hh".
2021-12-15 13:54:23 +02:00
Avi Kivity
3ac622bdd8 Merge "Add v2 versions of make_forwadable() and make_flat_mutation_reader_from_fragments()" from Botond
"
These two readers are crucial for writing tests for any composable
reader so we need v2 versions of them before we can convert and test the
combined reader (for example). As these two readers are often used in
situations where the payload they deliver is specially crafted for the
test at hand, we keep their v1 versions too to avoid conversion meddling
with the tests.

Tests: unit(dev)
"

* 'forwarding-and-fragment-reader-v2/v1' of https://github.com/denesb/scylla:
  flat_mutation_reader_v2: add make_flat_mutation_reader_from_fragments()
  test/lib/mutation_source_test: don't force v1 reader in reverse run
  mutation_source: add native_version() getter
  flat_mutation_reader_v2: add make_forwardable()
  position_in_partition: add after_key(position_in_partition_view)
  flat_mutation_reader: make_forwardable(): fix indentation
  flat_mutation_reader: make_forwardable(): coroutinize reader
2021-12-14 20:43:09 +02:00
Benny Halevy
32d61a3d09 test: sstable_directory_test_table_lock_works: verify that truncate is blocked on the the table lock
The test in its current form is invalid, as database::remove
does removing the table's name from its listing
as well as from the keyspace metadata, so it won't be found
after that.

That said, database::drop_column_family then proceeds
to truncate and stop the table, after calling await_pending_ops,
and the latter should indeed block on the lock taken by the test.

This change modifies the test to create some sstables in the
table's directory before starting the sstable_directory.

Then, when executing "drop table" in the background,
wait until the table is not found by db.find_column_family

That would fail the test before this change.
See https://jenkins.scylladb.com/job/scylla-enterprise/job/next/1442/artifact/testlog/x86_64_debug/sstable_directory_test.sstable_directory_test_table_lock_works.4720.log
```
INFO  2021-12-13 14:00:17,298 [shard 0] schema_tables - Dropping ks.cf id=00487bc0-5c1d-11ec-9e3b-a44f824027ae version=b10c4994-31c7-3f5a-9591-7fedb0273c82
test/boost/sstable_directory_test.cc(453): fatal error: in "sstable_directory_test_table_lock_works": unexpected exception thrown by table_ok.get()
```

A this point, the test verifies again that the sstables are still on
disk (and no truncate happened), and only after drop completed,
the table should not exist on disk.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211214104407.2225080-1-bhalevy@scylladb.com>
2021-12-14 14:26:17 +02:00
Tomasz Grabiec
b228ddabb7 Merge "Move schema altering statement to raft" from Gleb
The series is on top of "wire up schema raft state machine". It will
apply without, but will not work obviously (when raft is disabled it
does nothing anyway).

This series does not provide any linearisability just yet though. It
only uses raft as a means to distribute schema mutations. To achieve
linearisability more work is needed. We need to at lease make sure
that schema mutation use monotonically increasing timestamps and,
since schema altering statement are RMW, no modification to schema
were done between schema mutation creation and application. If there
were an operation needs to be restarted.

* scylla-dev/gleb/raft-schema-v5: (59 commits)
  cql3: cleanup mutation creation code in ALTER TYPE
  cql3: use migration_manager::schema_read_barrier() before accessing a schema in altering statements
  cql3: bounce schema altering statement to shard 0
  migration_manager: add is_raft_enabled() to check if raft is enabled on a cluster
  migration_manager: add schema_read_barrier() function
  migration_manager: make announce() raft aware
  migration_manager: co-routinize announce() function
  migration_manager: pass raft_gr to the migration manager
  migration_manager: drop view_ptr array from announce_column_family_update()
  mm: drop unused announce_ methods
  cql3: drop schema_altering_statement::announce_migration()
  cql3: drop has_prepare_schema_mutations() from schema altering statement
  cql3: drop announce_migration() usage from schema_altering_statement
  cql3: move DROP AGGREGATE statement to prepare_schema_mutations() api
  migration_manager: add prepare_aggregate_drop_announcement() function
  cql3: move DROP FUNCTION statement to prepare_schema_mutations() api
  migration_manager: add prepare_function_drop_announcement() function
  cql3: move CREATE AGGREGATE statement to prepare_schema_mutations() api
  migration_manager: add prepare_new_aggregate_announcement() function
  cql3: move CREATE FUNCTION statement to prepare_schema_mutations() api
  ...
2021-12-14 11:05:32 +01:00
Tomasz Grabiec
78a6474982 range_tombstone_list: Deoverlap adjacent empty ranges
Appending an empty range adjacent to an existing range tombstone would
not deoverlap (by dropping the empty range tombstone) resulting in
different (non canoncial) result depending on the order of appending.

 Suppose that [a, b] covers [x, x)

Appending [a, x) then [x, b) then [x, x) would give [a, b)
Appending [a, x) then [x, x) then [x, b) would give [a, x), [x, x), [x, b)

Fix by dropping empty range tombstones.
2021-12-13 21:31:36 +01:00
Raphael S. Carvalho
8eace8fc49 TWCS: Make sure major on past window is done on all its sstables
Once current window is sealed, TWCS is supposed to compact all its
sstables into one. If there's ongoing compaction, it can happen that
sstables are missed and therefore past windows will contain more than
one sstable. Additionally, it could happen that major doesn't happen
at all if under heavy load. All these problems are fixed by serializing
major on past window and also postponing it if manager refuses to run
the job now.

Fixes #9553.

Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-13 16:10:43 -03:00
Raphael S. Carvalho
2dc890d8e6 TWCS: remove needless param for STCS options
STCS option can be retrieved from class member, as newest_bucket()
is no longer a static function. let's get rid of it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-13 16:05:40 -03:00
Raphael S. Carvalho
41a5736aaf TWCS: kill unused param in newest_bucket()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-13 16:05:36 -03:00
Raphael S. Carvalho
49f40c8791 compaction: Implement strategy control and wire it
This implements strategy control interface for both manager and
tests, and wire it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-13 16:05:23 -03:00
Avi Kivity
e44a28dce4 Merge "compaction: Allow data from different buckets (e.g. windows) to be compacted together" from Raphael
"
Today, data from different buckets (e.g. windows) cannot be compacted together because
mutation compactor happens inside each consumer, where each consumer is done on behalf
of a particular bucket. To solve this problem, mutation compaction process is being
moved from consumer into producer, such that interposer consumer, which is responsible
for segregation, will be feeded with compacted data and forward it into the owner bucket.

Fixes #9662.

tests: unit(debug).
"

* 'compact_across_buckets_v2' of github.com:raphaelsc/scylla:
  tests: sstable_compaction_test: add test_twcs_compaction_across_buckets
  compaction: Move mutation compaction into producer for TWCS
  compaction: make enable_garbage_collected_sstable_writer() more precise
2021-12-12 15:07:15 +02:00
Gleb Natapov
38e1f85959 migration_manager: drop view_ptr array from announce_column_family_update()
No users pass it any longer.
2021-12-11 12:31:07 +02:00