Commit Graph

459 Commits

Author SHA1 Message Date
Benny Halevy
2f2e3b2e84 test: lib: index_reader_assertions: close reader before it is destroyed
Otherwise, it may trip an assertion when the nuderlying
file is closed, as seen in e.g.:
https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/4318/artifact/testlog/x86_64_release/sstable_3_x_test.test_read_rows_only_index.4174.log
```
test/boost/sstable_3_x_test.cc(0): Entering test case "test_read_rows_only_index"
sstable_3_x_test: ./seastar/src/core/fstream.cc:205: virtual seastar::file_data_source_impl::~file_data_source_impl(): Assertion `_reads_in_progress == 0' failed.
Aborting on shard 0.
Backtrace:
  0x22557e8
  0x2286842
  0x7f2799e99a1f
  /lib64/libc.so.6+0x3d2a1
  /lib64/libc.so.6+0x268a3
  /lib64/libc.so.6+0x26788
  /lib64/libc.so.6+0x35a15
  0x222c53d
  0x222c548
  0xb929cc
  0xc0b23b
  0xa84bbf
  0x24d0111
```

Decoded:
```
__GI___assert_fail at :?
~file_data_source_impl at ./build/release/seastar/./seastar/src/core/fstream.cc:205
~file_data_source_impl at ./build/release/seastar/./seastar/src/core/fstream.cc:202
std::default_delete<seastar::data_source_impl>::operator()(seastar::data_source_impl*) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:85
 (inlined by) ~unique_ptr at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:361
 (inlined by) ~data_source at ././seastar/include/seastar/core/iostream.hh:55
 (inlined by) ~input_stream at ././seastar/include/seastar/core/iostream.hh:254
 (inlined by) ~continuous_data_consumer at ././sstables/consumer.hh:484
 (inlined by) ~index_consume_entry_context at ././sstables/index_reader.hh:116
 (inlined by) std::default_delete<sstables::index_consume_entry_context<sstables::index_consumer> >::operator()(sstables::index_consume_entry_context<sstables::index_consumer>*) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:85
 (inlined by) ~unique_ptr at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:361
 (inlined by) ~index_bound at ././sstables/index_reader.hh:395
 (inlined by) ~index_reader at ././sstables/index_reader.hh:435
std::default_delete<sstables::index_reader>::operator()(sstables::index_reader*) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:85
 (inlined by) ~unique_ptr at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:361
 (inlined by) ~index_reader_assertions at ././test/lib/index_reader_assertions.hh:31
 (inlined by) operator() at ./test/boost/sstable_3_x_test.cc:4630
```

Test: unit(dev), sstable_3_x_test.test_read_rows_only_index(release X 10000)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211222132858.2155227-1-bhalevy@scylladb.com>
2021-12-22 15:33:22 +02:00
Raphael S. Carvalho
e1e8e020fe tests: Allow memtable to be flushed through column_family_for_tests
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211217160055.96693-1-raphaelsc@scylladb.com>
2021-12-21 07:21:26 +02:00
Botond Dénes
55bb70a878 Merge "Make sure TWCS per-window major includes all files" from Raphael
"
TWCS perform STCS on a window as long as it's the most recent one.
From there on, TWCS will compact all files in the past window into
a single file. With some moderate write load, it could happen that
there's still some compaction activity in that past window, meaning
that per-window major may miss some files being currently compacted.
As a result, a past window may contain more than 1 file after all
compaction activity is done on its behalf, which may increase read
amplification. To avoid that, TWCS will now make sure that per-window
major is serialized, to make sure no files are missed.

Fixes #9553.

tests: unit(dev).
"

* 'fix_twcs_per_window_major_v3' of https://github.com/raphaelsc/scylla:
  TWCS: Make sure major on past window is done on all its sstables
  TWCS: remove needless param for STCS options
  TWCS: kill unused param in newest_bucket()
  compaction: Implement strategy control and wire it
  compaction: Add interface to control strategy behavior.
2021-12-20 17:12:50 +02:00
Avi Kivity
e772fcbd57 Merge "Convert combined reader to v2" from Botond
"
Users are adjusted by sprinkling `upgrade_to_v2()` and
`downgrade_to_v1()` where necessary (or removing any of these where
possible). No attempt was made to optimize and reduce the amount of
v1<->v2 conversions. This is left for follow-up patches to keep this set
small.

The combined reader is composed of 3 layers:
1. fragment producer - pop fragments from readers, return them in batches
  (each fragment in a batch having the same type and pos).
2. fragment merger - merge fragment batches into single fragments
3. reader implementation glue-code

Converting layers (1) and (3) was mostly mechanical. The logic of
merging range tombstone changes is implemented at layer (2), so the two
different producer (layer 1) implementations we have share this logic.

Tests: unit(dev)
"

* 'combined-reader-v2/v4' of https://github.com/denesb/scylla:
  test/boost/mutation_reader_test: add test_combined_reader_range_tombstone_change_merging
  mutation_reader: convert make_clustering_combined_reader() to v2
  mutation_reader: convert position_reader_queue to v2
  mutation_reader: convert make_combined_reader() overloads to v2
  mutation_reader: combined_reader: convert reader_selector to v2
  mutation_reader: convert combined reader to v2
  mutation_reader: combined_reader: attach stream_id to mutation_fragments
  flat_mutation_reader_v2: add v2 version of empty reader
  test/boost/mutation_reader_test: clustering_combined_reader_mutation_source_test: fix end bound calculation
2021-12-20 14:01:03 +02:00
Botond Dénes
2364144b19 mutation_reader: convert position_reader_queue to v2
By removing the converting (v1->v2) constructor of
`reader_and_upper_bound` and adjusting its users.
2021-12-20 09:29:05 +02:00
Botond Dénes
aeddcf50a1 mutation_reader: convert make_combined_reader() overloads to v2
Just sprinkle the right amount downgrade_to_v1() and upgrade_to_v2() to
call sites, no attempts at optimization was done.
2021-12-20 09:29:05 +02:00
Avi Kivity
a97731a7e5 migration_manager: replace uses of get_storage_proxy and get_local_storage_proxy with constructor-provided reference
A static helper also gained a storage_proxy parameter.
2021-12-16 21:05:47 +02:00
Avi Kivity
d768e9fac5 cql3, related: switch to data_dictionary
Stop using database (and including database.hh) for schema related
purposes and use data_dictionary instead.

data_dictionary::database::real_database() is called from several
places, for these reasons:

 - calling yet-to-be-converted code
 - callers with a legitimate need to access data (e.g. system_keyspace)
   but with the ::database accessor removed from query_processor.
   We'll need to find another way to supply system_keyspace with
   data access.
 - to gain access to the wasm engine for testing whether used
   defined functions compile. We'll have to find another way to
   do this as well.

The change is a straightforward replacement. One case in
modification_statement had to change a capture, but everything else
was just a search-and-replace.

Some files that lost "database.hh" gained "mutation.hh", which they
previously had access to through "database.hh".
2021-12-15 13:54:23 +02:00
Avi Kivity
399e2895f1 test: cql_test_env: provide access to data_dictionary
Allow tests to have access to the data_dictionary.
2021-12-15 13:54:18 +02:00
Avi Kivity
3ac622bdd8 Merge "Add v2 versions of make_forwadable() and make_flat_mutation_reader_from_fragments()" from Botond
"
These two readers are crucial for writing tests for any composable
reader so we need v2 versions of them before we can convert and test the
combined reader (for example). As these two readers are often used in
situations where the payload they deliver is specially crafted for the
test at hand, we keep their v1 versions too to avoid conversion meddling
with the tests.

Tests: unit(dev)
"

* 'forwarding-and-fragment-reader-v2/v1' of https://github.com/denesb/scylla:
  flat_mutation_reader_v2: add make_flat_mutation_reader_from_fragments()
  test/lib/mutation_source_test: don't force v1 reader in reverse run
  mutation_source: add native_version() getter
  flat_mutation_reader_v2: add make_forwardable()
  position_in_partition: add after_key(position_in_partition_view)
  flat_mutation_reader: make_forwardable(): fix indentation
  flat_mutation_reader: make_forwardable(): coroutinize reader
2021-12-14 20:43:09 +02:00
Raphael S. Carvalho
49f40c8791 compaction: Implement strategy control and wire it
This implements strategy control interface for both manager and
tests, and wire it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-13 16:05:23 -03:00
Gleb Natapov
e9fafea5c1 migration_manager: pass raft_gr to the migration manager
Migration manager will be use raft group zero to distribute schema
changes.
2021-12-11 12:31:07 +02:00
Botond Dénes
20e45987b5 test/lib/mutation_source_test: don't force v1 reader in reverse run
Currently in the reverse run we wrap the test-provided mutation-source
and create a v1 reader with it, forcing a conversion if the
mutation-source has a v2 factory. Worse still, if the test is v2 native,
there will be a double conversion. This patch fixes this by creating a
wrapper mutation-source appropriate to the version of the underlying
factory of the wrapped mutation-source.
2021-12-10 15:48:49 +02:00
Avi Kivity
79bcdc104e Merge "Fix stateful multi-range scans" from Botond
"
Currently stateful (readers being saved and resumed on page boundaries)
multi-range scans are broken in multiple ways. Trying to use them can
result in anything from use-after-free (#6716) or getting corrupt data
(#9718). Luckily no-one is doing such queries today, but this started to
change recently as code such as Alternator TTL and distributed
aggregate reads started using this.
This series fixes both problems and adds a unit test too exercising this
previously completely unused code-path.

Fixes: #6716
Fixes: #9718

Tests: unit(dev, release, debug)
"

* 'fix-stateful-multi-range-scans/v1' of https://github.com/denesb/scylla:
  test/boost/multishard_mutation_query_test: add multi-range test
  test/boost/multishard_mutation_query_test: add multi-range support
  multishard_mutation_query: don't drop data during stateful multi-range reads
  multishard_combining_reader: reader_lifecycle_policy: allow saving read range on fast-forward
2021-12-07 12:19:56 +02:00
Botond Dénes
953603199e multishard_combining_reader: reader_lifecycle_policy: allow saving read range on fast-forward
The reader_lifecycle_policy API was created around the idea of shard
readers (optionally) being saved and reused on the next page. To do
this, the lifecycle policy has to also be able to control the lifecycle
of by-reference parameters of readers: the slice and the range. This was
possible from day 1, as the readers are created through the lifecycle
policy, which can intercept and replace the said parameters with copies
that are created in stable storage. There was one whole in the design
though: fast-forwarding, which can change the range of the read, without
the lifecycle policy knowing about this. In practice this results in
fast-forwarded readers being saved together with the wrong range, their
range reference becoming stale. The only lifecycle implementation prone
to this is the one in `multishard_mutation_query.cc`, as it is the only
one actually saving readers. It will fast-forward its reader when the
query happens over multiple ranges. There were no problems related to
this so far because no one passes more than one range to said functions,
but this is incidental.
This patch solves this by adding an `update_read_range()` method to the
lifecycle policy, allowing the shard reader to update the read range
when being fast forwarded. To allow the shard reader to also have
control over the lifecycle of this range, a shared pointer is used. This
control is required because when an `evictable_reader` is the top-level
reader on the shard, it can invoke `create_reader()` with an edited
range after `update_read_range()`, replacing the fast-forwarded-to
range with a new one, yanking it out from under the feet of the
evictable reader itself. By using a shared pointer here, we can ensure
the range stays alive while it is the current one.
2021-12-03 10:27:44 +02:00
Avi Kivity
3b82ef854d Merge "Some compaction manager cleanups" from Raphael
"
couple of preparatory changes for coroutinization of manager
"

* 'some_compaction_manager_cleanups_v5' of github.com:raphaelsc/scylla:
  compaction_manager: move check_for_cleanup into perform_cleanup()
  compaction_manager: replace get_total_size by one liner
  compaction_manager: make consistent usage of type and name table
  compaction_manager: simplify rewrite_sstables()
  compaction_manager: restore indentation
2021-12-02 19:53:13 +02:00
Botond Dénes
f0b9519999 test/lib/exception_utils: add message_matches() predicate
Which checks the message against the given regex.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211202124955.542293-1-bdenes@scylladb.com>
2021-12-02 19:43:30 +02:00
Raphael S. Carvalho
760cfd93fb compaction_manager: make consistent usage of type and name table
new code in manager adopted name and type table, whereas historical
code still uses name and type column family. let's make it consistent
for newcomers to not get confused.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-02 14:39:27 -03:00
Avi Kivity
078f69c133 Merge "raft: (service) implement group 0 as a service" from Kostja
"
To ensure consistency of schema and topology changes,
Scylla needs a linearizable storage for this data
available at every member of the database cluster.

The series introduces such storage as a service,
available to all Scylla subsystems. Using this service, any other
internal service such as gossip or migrations (schema) could
persist changes to cluster metadata and expect this to be done in
a consistent, linearizable way.

The series uses the built-in Raft library to implement a
dedicated Raft group, running on shard 0, which includes all
members of the cluster (group 0), adds hooks to topology change
events, such as adding or removing nodes of the cluster, to update
group 0 membership, ensures the group is started when the
server boots.

The state machine for the group, i.e. the actual storage
for cluster-wide information still remains a stub. Extending
it to actually persist changes of schema or token ring
is subject to a subsequent series.

Another Raft related service was implemented earlier: Raft Group
Registry. The purpose of the registry is to allow Scylla have an
arbitrary number of groups, each with its own subset of cluster
members and a relevant state machine, sharing a common transport.
Group 0 is one (the first) group among many.
"

* 'raft-group-0-v12' of github.com:scylladb/scylla-dev:
  raft: (server) improve tracing
  raft: (metrics) fix spelling of waiters_awaken
  raft: make forwarding optional
  raft: (service) manage Raft configuration during topology changes
  raft: (service) break a dependency loop
  raft: (discovery) introduce leader discovery state machine
  system_keyspace: mark scylla_local table as always-sync commitlog
  system_keyspace: persistence for Raft Group 0 id and Raft Server Id
  raft: add a test case for adding entries on follower
  raft: (server) allow adding entries/modify config on a follower
  raft: (test) replace virtual with override in derived class
  raft: (server) fix a typo in exception message
  raft: (server) implement id() helper
  raft: (server) remove apply_dummy_entry()
  raft: (test) fix missing initialization in generator.hh
2021-11-30 16:24:51 +02:00
Eliran Sinvani
ddd7248b3b testlib: close index_reader to avoid racing condition
In order to avoid race condition introduced in 9dce1e4 the
index_reader should be closed prior to it's destruction.
This only exposes 4.4 and earlier releases to this specific race.
However, it is always a good idea to first close the index reader
and only then destroy it since it is most likely to be assumed by
all developers that will change the reader index in the future.

Ref #9704 (because on 4.4 and earlier releases are vulnerable).

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes #9705
2021-11-30 13:05:24 +01:00
Tomasz Grabiec
3226c5bf9d Merge 'sstables: mx: enable position fast-forwarding in reverse mode' from Kamil Braun
Most of the machinery was already implemented since it was used when
jumping between clustering ranges of a query slice. We need only perform
one additional thing when performing an index skip during
fast-forwarding: reset the stored range tombstone in the consumer (which
may only be stored in fast-forwarding mode, so it didn't matter that it
wasn't reset earlier). Comments were added to explain the details.

As a preparation for the change, we extend the sstable reversing reader
random schema test with a fast-forwarding test and include some minor
fixes.

Fixes #9427.

Closes #9484

* github.com:scylladb/scylla:
  query-request: add comment about clustering ranges with non-full prefix key bounds
  sstables: mx: enable position fast-forwarding in reverse mode
  test: sstable_conforms_to_mutation_source_test: extend `test_sstable_reversing_reader_random_schema` with fast-forwarding
  test: sstable_conforms_to_mutation_source_test: fix `vector::erase` call
  test: mutation_source_test: extract `forwardable_reader_to_mutation` function
  test: random_schema: fix clustering column printing in `random_schema::cql`
2021-11-29 16:01:53 +01:00
Nadav Har'El
1e2ecd282a Merge 'Harden compaction manager remove' from Benny Halevy
This series hardens compaction_manager::remove by:
- add debug logging around task execution and stopping.
- access compaction_state as lw_shared_ptr rather than via a raw pointer.
  - with that, detach it from `_compaction_state` in `compaction_manager::remove` right away, to prevent further use of it while compactions are stopped.
 - added write_lock in `remove` to make sure the lock is not held by any stray task.

Test: unit(dev), sstable_compaction_test(debug)
Dtest: alternator_tests.py:AlternatorTest.test_slow_query_logging (debug)

Closes #9636

* github.com:scylladb/scylla:
  compaction_manager: add compaction_state when table is constructed
  compaction_manager: remove: fixup indentation
  compaction_manager: remove: detach compaction_state before stopping ongoing compactions
  compaction_manager: remove: serialize stop_ongoing_compactions and gate.close
  compaction_manager: task: keep a reference on compaction_state
  test: sstable_compaction_test: incremental_compaction_data_resurrection_test: stop table before it's destroyed.
  test: sstable_utils: compact_sstables: deregister compaction also on error path
  test: sstable_compaction_test: partial_sstable_run_filtered_out_test: deregiser_compaction also on error path
  test: compaction_manager_test: add debug logging to register/deregister compaction
  test: compaction_manager_test: deregister_compaction: erase by iterator
  test: compaction_manager_test: move methods out of line
  compaction_manager: compaction_state: use counter for compaction_disabled
  compaction_manager: task: delete move and copy constructors
  compaction_manager: add per-task debug log messages
  compaction_manager: stop_ongoing_compactions: log number of tasks to stop
2021-11-28 22:12:52 +02:00
Konstantin Osipov
c22f945f11 raft: (service) manage Raft configuration during topology changes
Operations of adding or removing a node to Raft configuration
are made idempotent: they do nothing if already done, and
they are safe to resume after a failure.

However, since topology changes are not transactional, if a
bootstrap or removal procedure fails midway, Raft group 0
configuration may go out of sync with topology state as seen by
gossip.

In future we must change gossip to avoid making any persistent
changes to the cluster: all changes to persistent topology state
will be done exclusively through Raft Group 0.

Specifically, instead of persisting the tokens by advertising
them through gossip, the bootstrap will commit a change to a system
table using Raft group 0. nodetool will switch from looking at
gossip-managed tables to consulting with Raft Group 0 configuration
or Raft-managed tables.
Once this transformation is done, naturally, adding a node to Raft
configuration (perhaps as a non-voting member at first) will become the
first persistent change to ring state applied when a node joins;
removing a node from the Raft Group 0 configuration will become the last
action when removing a node.

Until this is done, do our best to avoid a cluster state when
a removed node or a node which addition failed is stuck in Raft
configuration, but the node is no longer present in gossip-managed
system tables. In other words, keep the gossip the primary source of
truth. For this purpose, carefully chose the timing when we
join and leave Raft group 0:

Join the Raft group 0 only after we've advertised our tokens, so the
cluster is aware of this node, it's visible in nodetool status,
but before node state jumps to "normal", i.e. before it accepts
queries. Since the operation is idempotent, invoke it on each
restart.

Remove the node from Group 0 *before* its tokens are removed
from gossip-managed system tables. This guarantees
that if removal from Raft group 0 fails for whatever reason,
the node stays in the ring, so nodetool removenode and
friends are re-tried.

Add tracing.
2021-11-25 12:35:42 +03:00
Pavel Emelyanov
ef1960d034 code: Carry gossiper down to virtual tables creation
One of the tables needs gossiper and uses global one. This patch
prepares the fix by patching the main -> register_virtual_tables
stack with the gossiper reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-25 10:52:55 +03:00
Mikołaj Sielużycki
44f4ea38c5 test: Future-proof reader conversions tests.
Query time must be fetched after populate. If compaction is executed
during populate it may be executed with timestamp later than query_time.
This would cause the test expected compaction and compaction during
populate to be executed at different time points producing different
results. The result would be sporadic test failures depending on relative
timing of those operations. If no other mutations happen after populate,
and query_time is later than the compaction time during population, we're
guaranteed to have the same results.
Message-Id: <20211123134808.105068-1-mikolaj.sieluzycki@scylladb.com>
2021-11-24 21:01:57 +01:00
Pavel Emelyanov
aaa58b7b89 storage_service: Keep streaming_manager reference
The manager is drained() on drain/decommission/isolate. Since now
it's storage_service who orchestrates all of the above, it needs
and explicit reference on the target.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-24 12:17:35 +03:00
Benny Halevy
3940ffb085 compaction_manager: task: keep a reference on compaction_state
And hold its gate to make sure the compaction_state outlives
the task and can be used to wait on all tasks and functions
using it.

With that, doing access _compaction_state[cf] to acquire
shared/exclusive locks but rather get to it via
task->compaction_state so it can be detached from
_compaction_state while task is running, if needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:40:06 +02:00
Benny Halevy
3955829286 test: sstable_utils: compact_sstables: deregister compaction also on error path
We need to call deregister_compaction(cdata) also if
compact_sstables failed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:39:10 +02:00
Benny Halevy
d344765ec6 get rid of the global batchlog_manager
Now that it's unused.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 08:27:30 +02:00
Benny Halevy
9cde52c58f storage_service: keep a reference to the batchlog_manager
Rather than accessing the global batchlog_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 08:27:30 +02:00
Benny Halevy
c6d82891cc test: cql_test_env: expose batchlog_manager
And use in batchlog_manager_test.test_execute_batch
to help deglobalize the batchlog_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 08:27:30 +02:00
Benny Halevy
03039e8f8a main: allow setting the global batchlog_manager
As a prerequisite to globalizing the batchlog_manager,
allow setting a global pointer to it and instantiate
the sharded<db::batchlog_manager> on the main/cql_test_env
stack.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 08:27:30 +02:00
Benny Halevy
8d7909de83 test: compaction_manager_test: add debug logging to register/deregister compaction
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:09:40 +02:00
Benny Halevy
ca97c919eb test: compaction_manager_test: deregister_compaction: erase by iterator
No need to search for the task again in the list.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:09:40 +02:00
Benny Halevy
5d6ea651d7 test: compaction_manager_test: move methods out of line
No need for them to be inlined in the sstable_utils.hh.

While at it, mark constructor noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:09:40 +02:00
Avi Kivity
96e9c3951c Merge "Finally stop including database.hh in compaction.cc" from Raphael
"
After this series, compaction will finally stop including database.hh.

tests: unit(debug).
"

* 'stop_including_database_hh_for_compaction' of github.com:raphaelsc/scylla:
  compaction: stop including database.hh
  compaction: switch to table_state in get_fully_expired_sstables()
  compaction: switch to table_state
  compaction: table_state: Add missing methods required by compaction
2021-11-20 18:28:05 +02:00
Raphael S. Carvalho
d89edad9fb compaction: switch to table_state
Make compaction procedure switch to table_state. Only function in
compaction.cc still directly using table is
get_fully_expired_sstables(T,...), but subsequently we'll make it
switch to table_state and then we can finally stop including database.hh
in the compaction code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-19 22:06:01 -03:00
Benny Halevy
1e259665fe test: cql_test_env: stop view_update_generator before database shuts down
We can't have view updates happening after the database shuts down.
In particular, mutateMV depends on the keyspace effective_replaication_map
and it is going to be released when all keyspaces shut down, in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:52:41 +02:00
Benny Halevy
1d7556d099 main: pass erm_factory to storage_service
To be used for creating effective_replication_map
when token_metadata changes, and update all
keyspaces with it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Benny Halevy
242043368e main: pass erm_factory to storage_proxy
To be used for creating the effective_replication_map per keyspace.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Benny Halevy
3fed73e7c2 locator: add effective_replication_map_factory
It will be used further to create shared copies
of effective_replication_map based on replication_strategy
type and config options.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00
Kamil Braun
3abcbf6875 test: mutation_source_test: extract forwardable_reader_to_mutation function
The function shall be used in other places as well.
2021-11-15 17:32:17 +01:00
Kamil Braun
9f0e13dd0b test: random_schema: fix clustering column printing in random_schema::cql
Also leave a FIXME to include the key ordering in the string as well.
2021-11-15 17:30:59 +01:00
Pavel Emelyanov
947e4c9a10 code: Push db::config down to virtual tables
The db::config reference is available on the database, which
can be get from the virtual_table itself. The problem is that
it's a const refernece, while system.config will be updateable
and will need non-const reference.

Adding non-const get_config() on the database looks wrong. The
database shouldn't be used as config provider, even the const
one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-11 16:39:34 +03:00
Avi Kivity
d2e02ea7aa Merge " Abstract table for compaction layer with table_state" from Raphael
"
table_state is being introduced for compaction subsystem, to remove table dependency
from compaction interface, fix layer violations, and also make unit testing
easier as table_state is an abstraction that can be implemented even with no
actual table backing it.

In this series, compaction strategy interfaces are switching to table_state,
and eventually, we'll make compact_sstables() switch to it too. The idea is
that no compaction code will directly reference a table object, but only work
with the abstraction instead. So compaction subdirectory can stop
including database.hh altogether, which is a great step forward.
"

* 'table_state_v5' of https://github.com/raphaelsc/scylla:
  sstable_compaction_test: switch to table_state
  compaction: stop including database.hh for compaction_strategy
  compaction: switch to table_state in estimated_pending_compactions()
  compaction: switch to table_state in compaction_strategy::get_major_compaction_job()
  compaction: switch to table_state in compaction_strategy::get_sstables_for_compaction()
  DTCS: reduce table dependency for task estimation
  LCS: reduce table dependency for task estimation
  table: Implement table_state
  compaction: make table param of get_fully_expired_sstables() const
  compaction_manager: make table param of has_table_ongoing_compaction() const
  Introduce table_state
2021-11-09 19:21:57 +02:00
Raphael S. Carvalho
e2f6a47999 compaction: switch to table_state in estimated_pending_compactions()
Last method in compaction_strategy using table. From now on,
compaction strategy no longer works directly with table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-09 11:25:28 -03:00
Raphael S. Carvalho
d881310b52 compaction: switch to table_state in compaction_strategy::get_sstables_for_compaction()
From now on, get_sstables_for_compaction() will use table_state.
With table_state, we avoid layer violations like strategy using
manager and also makes testing easier.

Compaction unit tests were temporarily disabled to avoid a giant
commit which is hard to parse.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-09 10:52:14 -03:00
Michael Livshin
4941e2ec41 tests: fix range tombstone checking and deal with the fallout
flat_reader_assertions::produces_range_tombstone() does not actually
check range tombstones beyond the fact that they are in fact range
tombstones (unless non-empty ck_ranges is passed).

Fixing the immediate problem reveals that:

* The assertion logic is not flexible enough to deal with
  creatively-split or creatively-overlapping range tombstones.

* Some existing tests involving range tombstones are in fact wrong:
  some assertions may (at least with some readers) refer to wrong
  tombstones entirely, while others assert wrong things about right
  tombstones.

* Range tombstones in pre-made sstables (such as those read by
  sstable_3_x_test) have deletion time drift, and that now has to be
  somehow dealt with.

This patch (which is not split into smaller ones because that would
either generate unreasonable amount of work towards ensuring
bisectability or entail "temporarily" disabling problematic tests,
which is cheating) contains the following changes:

* flat_reader_assertions check range tombstones more carefully, by
  accumulating both expected and actually-read range tombstones into
  lists and comparing those lists when a partition ends (or when the
  assertion object is destroyed).

* flat_reader_assertions::may_produce_tombstones() can take
  constraining ck_ranges.

* Both flat_reader_assertions and flat_reader_assertions_v2 can be
  instructed to ignore tombstone deletion times, to help with tests that
  read pre-made sstables.

* Affected tests are changed to reflect reality.  Most changes to
  tests make sense; the only one I am not completely sure about is in
  test_uncompressed_filtering_and_forwarding_range_tombstones_read.

Fixes #9470

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-11-08 00:56:39 +02:00
Raphael S. Carvalho
63dc4e2107 compaction_manager: simplify creation of compaction_data
there's no need for wrapping compaction_data in shared_ptr, also
let's kill unused params in create_compaction_data to simplify
its creation.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 09:35:49 -03:00
Botond Dénes
6a76e12768 mutation_partition: row: make row marker shadowing symmetric
Currently row marker shadowing the shadowable tombstone is only checked
in `apply(row_marker)`. This means that shadowing will only be checked
if the shadowable tombstone and row marker are set in the correct order.
This at the very least can cause flakyness in tests when a mutation
produced just the right way has a shadowable tombstone that can be
eliminated when the mutation is reconstructed in a different way,
leading to artificial differences when comparing those mutations.

This patch fixes this by checking shadowing in
`apply(shadowable_tombstone)` too, making the shadowing check symmetric.

There is still one vulnerability left: `row_marker& row_marker()`, which
allow overwriting the marker without triggering the corresponding
checks. We cannot remove this overload as it is used by compaction so we
just add a comment to it warning that `maybe_shadow()` has to be manually
invoked if it is used to mutate the marker (compaction takes care of
that). A caller which didn't do the manual check is
mutation_source_test: this patch updates it to use `apply(row_marker)`
instead.

Fixes: #9483

Tests: unit(dev)

Closes #9519
2021-10-26 20:40:31 +02:00