1965 Commits

Author SHA1 Message Date
Glauber Costa
628dd16519 compaction: deprecate DTCS. Step 1.
This patch adds a warning of deprecation to DTCS. In a follow up step,
we will start requiring a flag for it to be enabled to make sure users
notice.

For now we'll just be nice and add a warning for the log watchers.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200224164405.9656-1-glauber@scylladb.com>
2020-02-24 20:26:24 +02:00
Raphael S. Carvalho
e81076b01c compaction: Implement ranges for cache invalidation on behalf of cleanup
This procedure will calculate ranges for cache invalidation by subtracting
all owned ranges from the sstables' partition ranges. That's done so as
to reduce the size of invalidated ranges.
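A minimal sketch of the subtraction. It assumes closed ranges of int64 tokens and an owned-range list that is sorted and non-overlapping. The `token_range` type and the `subtract_owned()` function are hypothetical stand-ins, not Scylla's `dht::partition_range` API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical closed range [start, end] over int64 tokens.
struct token_range {
    int64_t start;
    int64_t end;
    bool operator==(const token_range& o) const { return start == o.start && end == o.end; }
};

// Returns the parts of `r` not covered by any range in `owned`.
// Assumes `owned` is sorted by start and non-overlapping.
std::vector<token_range> subtract_owned(token_range r, const std::vector<token_range>& owned) {
    std::vector<token_range> result;
    int64_t cursor = r.start;  // first token of r not yet accounted for
    for (const auto& o : owned) {
        if (o.end < cursor) continue;  // owned range lies before the remaining part
        if (o.start > r.end) break;    // owned range lies after r
        if (o.start > cursor) {
            result.push_back({cursor, o.start - 1});  // gap before this owned range
        }
        if (o.end >= r.end) return result;  // the rest of r is owned
        cursor = o.end + 1;
    }
    if (cursor <= r.end) result.push_back({cursor, r.end});
    return result;
}
```

The returned pieces are the non-owned parts of the sstable's range, which is the data cleanup drops. That is why they are the only ranges that need invalidating.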

Refs #4446.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-02-20 10:55:49 -03:00
Raphael S. Carvalho
db4c3230f7 compaction: Add ranges for cache invalidation to compaction_completion_desc
It will store the ranges to be invalidated in row cache on compaction
completion. Intended to be used by cleanup compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-02-19 19:30:35 -03:00
Raphael S. Carvalho
51532b84f8 compaction: Make it possible for a compaction type to customize compaction_completion_desc
compaction_completion_desc will eventually store more information that can be
customized by the compaction type.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-02-19 19:30:35 -03:00
Raphael S. Carvalho
65b4fc8bcd sstables/compaction: Introduce compaction_completion_desc
This descriptor contains all information needed for the table to be properly
updated on compaction completion. A new member will be added to it soon,
which will store ranges to be invalidated in row cache on behalf of
cleanup compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-02-19 19:29:32 -03:00
Avi Kivity
6c7aa18238 Merge "Introduce schema::get_partitioner" from Piotr
"
Introduce schema::get_partitioner and use it instead of dht::global_partitioner.

Fixes #5493

Tests: unit(dev, release, debug)
"

* 'per_table_partitioner_prep' of https://github.com/haaawk/scylla: (35 commits)
  cdc: stop using partitioners
  partitioner_test: stop calling set_global_partitioner
  storage_service: stop calling global_partitioner()
  mutation_writer_test: stop calling global_partitioner()
  schema: reduce number of global_partitioner() calls
  test_services: stop calling global_partitioner()
  sstable_utils: stop calling global_partitioner()
  sstable_resharding_test: stop depending on global partitioner
  sstable_mutation_test: stop calling global_partitioner()
  sstable_data_file_test: stop calling global_partitioner()
  random_schema: stop taking partitioner in constructor
  mutation_reader_test: stop calling global_partitioner()
  multishard_mutation_query_test: stop calling global_partitioner()
  row_level repair: stop calling global_partitioner()
  distribute_reader_and_consume_on_shards: don't take partitioner
  thrift: reduce global_partitioner() calls
  binary_search: stop calling global_partitioner()
  index_entry: stop calling global_partitioner()
  mc writer: stop calling global_partitioner()
  sstable: stop calling global_partitioner()
  ...
2020-02-17 18:12:53 +02:00
Tomasz Grabiec
76d1dd7ec6 Merge "nodetool scrub: implement validation and the skip-corrupted flag" from Botond

Nodetool scrub rewrites all sstables, validating their data. If corrupt
data is found the scrub is aborted. If the skip-corrupted flag is set,
corrupt data is instead logged (just the keys) and skipped.

The scrubbing algorithm itself is fairly simple, especially since we
already have a mutation stream validator that we can use to validate the
data. However, scrub is currently piggy-backed on top of cleanup
compaction. To implement this flag, we have to make scrub a separate
compaction type and propagate the flag down. This required some
massaging of the code:
* Add support for more than two (cleanup or not) compaction types.
* Allow passing custom options for each compaction type.
* Allow stopping a compaction without the manager retrying it later.

Additionally the validator itself needed some changes to allow different
ways to handle errors, as needed by the scrub.

Fixes: #5487

* https://github.com/denesb/nodetool-scrub-skip-corrupted/v7:
  table: cleanup_sstables(): only short-circuit on actual cleanup
  compaction: compaction_type: add Upgrade
  compaction: introduce compaction_options
  compaction: compaction_descriptor: use compaction options instead of
    cleanup flag
  compaction_manager: collect all cleanup related logic in
    perform_cleanup()
  sstables: compaction_stop_exception: add retry flag
  mutation_fragment_stream_validator: split into low-level and
    high-level API
  compaction: introduce scrub_compaction
  compaction_manager: scrub: don't piggy-back on upgrade_sstables()
  test: sstable_datafile_test: add scrub unit test
2020-02-17 15:28:07 +02:00
Piotr Jastrzebski
56e3cb8c3a binary_search: stop calling global_partitioner()
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:15 +01:00
Piotr Jastrzebski
1db437ee91 index_entry: stop calling global_partitioner()
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:15 +01:00
Piotr Jastrzebski
1f866d7001 mc writer: stop calling global_partitioner()
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:15 +01:00
Piotr Jastrzebski
6fe0dcbac4 sstable: stop calling global_partitioner()
parse functions now take const schema& which allows
them to reach a partitioner. It's safe to take schema
by const& because the only caller takes the schema
from an sstable object.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:15 +01:00
Piotr Jastrzebski
ca4a89d239 dht: add dht::decorate_key
and replace all dht::global_partitioner().decorate_key
with dht::decorate_key

This is an improvement because dht::decorate_key takes a schema
and uses it to obtain the partitioner, instead of using the global
partitioner as before.
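To illustrate the shape of this change, here is a sketch in which the partitioner is reached through the schema instead of a global. The `get_partitioner()` accessor name comes from the per-table partitioner merge earlier in this log. The classes, the `decorate_key()` signature and `hash_partitioner` (a toy hash, not Murmur3) are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal types; the real dht/schema classes are much richer.
struct decorated_key {
    int64_t token;
    std::string key;
};

class partitioner {
public:
    virtual ~partitioner() = default;
    virtual int64_t get_token(const std::string& key) const = 0;
};

// Toy partitioner for illustration only (not Murmur3).
class hash_partitioner : public partitioner {
public:
    int64_t get_token(const std::string& key) const override {
        return static_cast<int64_t>(std::hash<std::string>{}(key));
    }
};

class schema {
    std::shared_ptr<const partitioner> _partitioner;
public:
    explicit schema(std::shared_ptr<const partitioner> p) : _partitioner(std::move(p)) {}
    const partitioner& get_partitioner() const { return *_partitioner; }
};

// The partitioner comes from the schema, so each table may have its own.
decorated_key decorate_key(const schema& s, std::string key) {
    int64_t t = s.get_partitioner().get_token(key);  // compute before moving key
    return decorated_key{t, std::move(key)};
}
```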

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:06 +01:00
Piotr Jastrzebski
abd76e566f dht::shard_of: stop calling global_partitioner()
Take const schema& as a parameter of shard_of and
use it to obtain partitioner instead of calling
global_partitioner().

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:23:16 +01:00
Piotr Jastrzebski
57e4b7f215 ring_position_range_sharder: stop calling global_partitioner
Remove ring_position_range_sharder(nonwrapping_range<ring_position>)
which calls another constructor with partitioner obtained with
dht::global_partitioner().

Fix all the places the removed constructor was used and obtain
partitioner from schema instead.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:19:15 +01:00
Piotr Jastrzebski
dd1120454b dht: move sharders to a separate header
i_partitioner.hh is widely included while sharders are used
only in 6 places so there's no need to include them in
the whole codebase.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:19:02 +01:00
Pavel Emelyanov
b11cf6e950 cql3/query_processor.hh: Debloat from other headers
This results in ~30% fewer recompilation jobs (251 -> 181) when touching it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200212225828.3374-1-xemul@scylladb.com>
2020-02-16 11:22:30 +02:00
Botond Dénes
26d4c8be95 compaction_manager: scrub: don't piggy-back on upgrade_sstables()
Now that we have the necessary infrastructure to do actual scrubbing,
don't rely on `upgrade_sstables()` anymore behind the scenes, instead do
an actual scrub.

Also, use the skip-corrupted flag.
2020-02-13 15:02:37 +02:00
Botond Dénes
33c126e8c0 compaction: introduce scrub_compaction
A specialized compaction subclass for executing a scrub compaction.
`scrub_compaction` supplies a specialized reader which will validate its
input and stop on the first error. If it is configured with
`skip_corrupted`, it will instead skip bad data, logging it.
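The stop-or-skip behaviour can be sketched as follows. Fragments are reduced to integer keys, and "corrupt" here only means "key not strictly increasing". Both are simplifying assumptions. The `fragment`, `scrub_error` and `scrub()` names are hypothetical and do not belong to the real `scrub_compaction` reader.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a mutation fragment: only its key matters here.
struct fragment { int64_t key; };

struct scrub_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Copies valid fragments to the output. Without skip_corrupted the scrub
// stops at the first out-of-order key; with it, the key is logged and skipped.
std::vector<fragment> scrub(const std::vector<fragment>& input, bool skip_corrupted) {
    std::vector<fragment> output;
    std::optional<int64_t> last_key;
    for (const auto& f : input) {
        if (last_key && f.key <= *last_key) {
            if (!skip_corrupted) {
                throw scrub_error("out-of-order key " + std::to_string(f.key));
            }
            std::cerr << "scrub: skipping corrupt key " << f.key << '\n';
            continue;
        }
        last_key = f.key;
        output.push_back(f);
    }
    return output;
}
```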
2020-02-13 15:02:37 +02:00
Botond Dénes
1b7725af4b mutation_fragment_stream_validator: split into low-level and high-level API
The low-level validator allows fine-grained validation of different
aspects of monotonicity of a fragment stream. It doesn't do any error
handling. Since different aspects can be validated with different
functions, this allows callers to understand what exactly is invalid.

The high-level API is the previous fragment filter API, which is now
built on top of the low-level one.

This division allows for advanced use cases where the user of the
validator wants to do all error handling and wants to decide exactly
what monotonicity to validate. The motivating use-case is scrubbing
compaction, added in the next patches.
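A sketch of the two layers, under simplifying assumptions. Fragments are reduced to a kind plus an integer partition key. The class names are hypothetical, not the validator's real interface. The low-level class checks each aspect separately and only reports success or failure. The high-level class decides that any failure is an error.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

enum class fragment_kind { partition_start, row, partition_end };

// Low-level: one check per aspect, no error handling.
class low_level_validator {
    fragment_kind _prev = fragment_kind::partition_end;  // start "between" partitions
    std::optional<int64_t> _prev_key;
public:
    // Kinds must follow partition_start -> row* -> partition_end.
    bool validate_kind(fragment_kind k) {
        bool ok = false;
        switch (k) {
        case fragment_kind::partition_start: ok = _prev == fragment_kind::partition_end; break;
        case fragment_kind::row:
        case fragment_kind::partition_end: ok = _prev != fragment_kind::partition_end; break;
        }
        if (ok) _prev = k;
        return ok;
    }
    // Partition keys must be strictly increasing.
    bool validate_partition_key(int64_t key) {
        if (_prev_key && key <= *_prev_key) return false;
        _prev_key = key;
        return true;
    }
};

// High-level: validates everything and turns any failure into an exception.
class validating_filter {
    low_level_validator _v;
public:
    void on_partition_start(int64_t key) {
        if (!_v.validate_kind(fragment_kind::partition_start) || !_v.validate_partition_key(key)) {
            throw std::runtime_error("invalid partition start");
        }
    }
    void on_fragment(fragment_kind k) {
        if (!_v.validate_kind(k)) throw std::runtime_error("invalid fragment kind");
    }
};
```

A scrub-like caller would use `low_level_validator` directly, so it can decide per failure whether to skip or abort.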
2020-02-13 15:02:32 +02:00
Botond Dénes
7d3bce403d sstables: compaction_stop_exception: add retry flag
Allow the thrower to communicate that it doesn't want the compaction to
be retried later. I know, using exceptions for control flow is *very*
bad, but this is the existing mechanism to stop a compaction and I don't
want to invent a new one for this.

Also massage the error messages a bit to take the value of this flag
into consideration.
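A sketch of how such a flag might travel from thrower to manager. The exception name comes from this commit, but this constructor, the `retry()` accessor and the `run_with_retries()` loop are hypothetical. The real compaction manager's retry logic is asynchronous and more involved.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical shape: the exception records whether a retry is wanted.
class compaction_stop_exception : public std::runtime_error {
    bool _retry;
public:
    compaction_stop_exception(const std::string& reason, bool retry)
        : std::runtime_error("compaction stopped: " + reason + (retry ? "" : " (will not be retried)"))
        , _retry(retry) {}
    bool retry() const { return _retry; }
};

// Simplified manager loop: rerun the task while it asks to be retried.
// Returns the number of attempts made.
template <typename Task>
int run_with_retries(Task task, int max_attempts) {
    int attempts = 0;
    while (attempts < max_attempts) {
        ++attempts;
        try {
            task();
            return attempts;
        } catch (const compaction_stop_exception& e) {
            if (!e.retry()) {
                return attempts;  // the thrower asked not to be retried
            }
        }
    }
    return attempts;
}
```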
2020-02-11 18:38:35 +02:00
Botond Dénes
8014c7124d compaction_manager: collect all cleanup related logic in perform_cleanup()
Currently the call chain for a cleanup compaction looks like this:
compaction_manager::perform_cleanup()
    compaction_manager::rewrite_sstables()
        table::cleanup_sstables()
            ...

`perform_cleanup()` is essentially empty, immediately deferring to
`rewrite_sstables()`. Cleanup-related logic is scattered between the
latter two methods on the call chain. These methods, however, recently
started serving as generic methods for compactions that want to
rewrite each sstable one-by-one, accumulating cleanup-related ifs in
various places.
The reason is historic: we first had cleanup, then bolted others on top,
trying to share the underlying code as much as possible.

It is time this is cleaned up (pun intended). Make `perform_cleanup()`
the place where all cleanup related logic is, with the rest of the stack
made truly generic.
2020-02-11 17:47:44 +02:00
Botond Dénes
b2dc5d4895 compaction: compaction_descriptor: use compaction options instead of cleanup flag
Instead of the restrictive `cleanup` boolean flag, which allows for choosing
between only two compaction types, use `compaction_options`, which in
addition to allowing any number of compaction types to be selected,
also allows seamlessly passing specific options to them.
2020-02-11 17:47:44 +02:00
Botond Dénes
8579bef076 compaction: introduce compaction_options
Currently the compaction API is quite restrictive. It offers the generic
`compact_sstables()` and `reshard_sstables()` methods. The former is the
one used by all but resharding; however, it only really supports two
modes: regular and cleanup. The latter is selected by a semi-hidden
`cleanup` flag in `compaction_descriptor`. In fact, there are already two
more compaction types piggy-backed on cleanup: upgrade and
scrub. The upper layers distinguish between actual cleanup and "fake"
cleanup by an `is_actual_cleanup` flag. The two "fake" cleanup
compactions cannot be distinguished from each other even by the upper layers.
This is terribly confusing and hard to follow, in addition to being
restrictive.

This has worked so far because upgrade is served quite well by the cleanup
compaction type, with certain preparations turned off by the above-mentioned
`is_actual_cleanup` flag. Scrub is barely implemented and is just an
upgrade behind the scenes.

This situation, however, prevents truly specializing each
compaction. Enter `compaction_options`. This variant in disguise is
designed to allow passing specific options to each compaction type, and
doubles as an enum allowing more than two low-level compaction types.

This patch only adds the option class itself, propagating and handling
it will be done by the next patches.
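One way a "variant in disguise" can double as an enum is sketched below. There is one struct per compaction type carrying that type's options, and the active alternative's index maps to a `compaction_type` value. Apart from the `compaction_options` name and the `Upgrade` enumerator mentioned in this log, every member and factory name here is a hypothetical illustration, not the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>

enum class compaction_type { Compaction, Cleanup, Upgrade, Scrub };

// Hypothetical sketch: one alternative per compaction type.
class compaction_options {
public:
    struct regular {};
    struct cleanup {};
    struct upgrade {};
    struct scrub { bool skip_corrupted = false; };
private:
    using options_variant = std::variant<regular, cleanup, upgrade, scrub>;
    options_variant _options;
    explicit compaction_options(options_variant o) : _options(o) {}
public:
    static compaction_options make_regular() { return compaction_options(regular{}); }
    static compaction_options make_cleanup() { return compaction_options(cleanup{}); }
    static compaction_options make_upgrade() { return compaction_options(upgrade{}); }
    static compaction_options make_scrub(bool skip_corrupted) {
        return compaction_options(scrub{skip_corrupted});
    }

    // The active alternative doubles as the compaction type.
    compaction_type type() const {
        static constexpr compaction_type types[] = {
            compaction_type::Compaction, compaction_type::Cleanup,
            compaction_type::Upgrade, compaction_type::Scrub};
        return types[_options.index()];
    }
    const options_variant& options() const { return _options; }
};
```

Adding a new compaction type then means adding one alternative, and type-specific options such as `skip_corrupted` cannot be attached to the wrong type.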
2020-02-11 17:47:44 +02:00
Botond Dénes
6bc3b41c20 compaction: compaction_type: add Upgrade
Although we currently do support upgrade compaction, it is piggy-backed
on top of cleanup compaction. This is soon going to change, so in
preparation for that, add an `Upgrade` member to the `compaction_type`
enum.
2020-02-11 17:47:44 +02:00
Raphael S. Carvalho
140520ff87 sstables/compaction_manager: add metric for pending compaction tasks
We have the compaction_manager.compactions metric for the number of active tasks,
but it doesn't account for tasks blocked waiting for an opportunity to run,
and those are the problematic ones.
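The distinction between the two counts can be sketched like this. A task counts as pending from submission until it gets a slot to run, and as active from then until it finishes. The `compaction_task_stats` class and its hook names are hypothetical. The sketch only models the counting, not metric registration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counters behind an "active" and a "pending" gauge.
class compaction_task_stats {
    size_t _active = 0;
    size_t _pending = 0;
public:
    void on_submitted() { ++_pending; }           // queued, waiting for a slot
    void on_started() { --_pending; ++_active; }  // got a slot, now running
    void on_finished() { --_active; }
    size_t active() const { return _active; }    // what the existing metric reports
    size_t pending() const { return _pending; }  // what the new metric adds
};
```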

Fixes #5254.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200210131929.30981-1-raphaelsc@scylladb.com>
2020-02-10 17:55:02 +01:00
Avi Kivity
bed61b96a2 Merge "Move features from storage- into feature-service" from Pavel
"
There's a lot of code around that needs storage service purely to
get the specific feature value (cluster_supports_<something> calls).
This creates several circular dependencies, e.g. storage_service <->
migration_manager and database <-> storage_service. Also, the features
sit in storage_service but register themselves with feature_service,
and the former subscribes back to them, which also looks strange.

I propose to keep all the features in feature_service. This keeps the
latter independent of other components, makes it possible to break
one of the mentioned circular dependencies, and heavily relaxes the other.

The set also helps us fight the globals and, after it, the
feature_service can be safely stopped at the very last moment.

Tests: unit(dev), manual debug build start-stop
"

* 'br-features-to-service-5' of https://github.com/xemul/scylla:
  gossiper: Avoid string merge-split for nothing
  features: Stop on shutdown
  storage_service: Remove helpers
  storage_service: Prepare to switch from on-board feature helpers
  cql3: Check feature in .validate
  database: Use feature service
  storage_proxy: Use feature service
  migration_manager: Use feature service
  start: Pass needed feature as argument into migrate_truncation_records
  features: Unfriend storage_service
  features: Simplify feature registration
  features: Introduce known_feature_set
  features: Move disabled features set from storage_service
  features: Move schema_features helper
  features: Move all features from storage_service to feature_service
  storage_service: Use feature_config from _feature_service
  features: Add feature_config
  storage_service: Kill set_disabled_features
  gms: Move features stuff into own .cc file
  migration_manager: Move some fns into class
2020-02-09 19:22:07 +02:00
Pavel Emelyanov
d1775dd701 utils: Move disk-error-handler into it
The disk-error-handler is a purely auxiliary thing that helps
propagate I/O errors to the rest of the code. It deserves
not to sit in the root namespace.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200207112443.18475-1-xemul@scylladb.com>
2020-02-09 17:26:52 +02:00
Piotr Jastrzebski
8813a6ca2a index_reader: avoid copying schema to lambda
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-06 14:10:58 +01:00
Avi Kivity
e719ea1bba Merge "Fix assert on initialization error" (in large_data_handler) from Rafael
"
This series fixes an assertion when initialization fails after
creating a database. I don't know of a case where that currently
happens, but it is easy to cause that when writing a patch and the
produced assert is just confusing.
"

* 'espindola/dont-assert-on-init-error' of https://github.com/espindola/scylla:
  db: Replace large_data_handler::_stopped with _running
  db: Move nop_large_data_handler constructor out-of-line
  db: Move large_data_handler::stop out-of-line
2020-02-05 18:49:11 +02:00
Piotr Jastrzebski
1d1ac476c3 token: remove token_view
Now that both token and token_view contain int64_t
it makes no sense to keep the view.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-05 09:31:32 +01:00
Piotr Jastrzebski
06dfd16aad sstables: use copy constructor for tokens
instead of manually creating a new token from another
token's internals.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-05 09:31:32 +01:00
Piotr Jastrzebski
05e0451b27 token: change _data to int64_t
Previously _data was stored as an array of 8 bytes in
network byte order.
After this change it stores the same value in int64_t
in host byte order.
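The value is unchanged; only its encoding differs. The sketch below converts between the old 8-byte big-endian (network order) form and a plain `int64_t`. The helper names are hypothetical. It assumes two's complement when casting from `uint64_t` to `int64_t`, which holds on all mainstream platforms and is guaranteed since C++20.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Old form: most significant byte first. New form: native int64_t.
int64_t token_from_network_bytes(const std::array<uint8_t, 8>& data) {
    uint64_t v = 0;
    for (uint8_t b : data) {
        v = (v << 8) | b;  // shift in the next, less significant byte
    }
    return static_cast<int64_t>(v);
}

std::array<uint8_t, 8> token_to_network_bytes(int64_t token) {
    std::array<uint8_t, 8> data{};
    uint64_t v = static_cast<uint64_t>(token);
    for (int i = 7; i >= 0; --i) {
        data[i] = static_cast<uint8_t>(v & 0xff);  // least significant byte last
        v >>= 8;
    }
    return data;
}
```

Storing the integer directly means comparisons are a single integer compare instead of a byte-wise one.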

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-05 09:31:32 +01:00
Piotr Jastrzebski
b569d127a0 token: change data to array<uint8_t, 8>
It is safe to make this change because we support only
Murmur3Partitioner which uses only tokens that are
8 bytes long.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-05 09:30:46 +01:00
Rafael Ávila de Espíndola
5d4671526c db: Replace large_data_handler::_stopped with _running
This is not just a direct flip to a variable with the negated Boolean
value. When created, a large_data_handler is not considered to be
running, the user has to call start() before it can be used.

The advantage of doing this is that if initialization fails and a
database is destroyed before the large_data_handler is started, the
assert

database::stop() {
    assert(!_large_data_handler->running());

is not triggered.
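A reduced model of why the default value matters. The class below is a hypothetical stand-in, not the real `large_data_handler`, and `safe_to_destroy()` mirrors the quoted assert. A handler that was never started reports `running() == false`, so a database torn down after a failed initialization passes the check.

```cpp
#include <cassert>

// Hypothetical minimal handler showing the start/stop protocol.
class large_data_handler {
    bool _running = false;  // not running until start() is called
public:
    void start() { _running = true; }
    void stop() { _running = false; }
    bool running() const { return _running; }
};

// Mirrors the assert in database::stop(): fine as long as not running.
// With a "_stopped" flag defaulting to false, a never-started handler
// would have failed the equivalent check.
bool safe_to_destroy(const large_data_handler& h) {
    return !h.running();
}
```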

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-02-04 21:15:44 -08:00
Pavel Emelyanov
0e62d615ae storage_service: Prepare to switch from on-board feature helpers
There are some places that get the global storage_service instance
just to check individual features. In the next patch all these helpers
will be removed, so this patch prepares for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-03 15:16:23 +03:00
Avi Kivity
adb64dc72f treewide: tighten concepts syntax
gcc 10 requires a semicolon after every compound requirement,
as per the standard. Add missing semicolons where necessary.
Message-Id: <20200129205805.20928-1-avi@scylladb.com>
2020-01-30 14:10:18 +02:00
Botond Dénes
dfc66194c8 index_reader: make the index file tracked
Track I/O going to the index file, similarly to how we already track I/O
going to the data file.
2020-01-28 08:13:16 +02:00
Botond Dénes
936619a8d3 sstables/continuous_data_consumer: track buffers used for parsing
Based on heap profiling, buffers used for storing half-parsed fields are
a major contributor to the overall memory consumption of reads. This
memory was completely "under the radar" before. Track it by using
tracked `temporary_buffer` instances everywhere in
`continuous_data_consumer`. As `continuous_data_consumer` is the basis
for parsing all index and data files, adding the tracking here
automatically covers all data, index and promoted index parsing.
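The tracking idea reduces to an RAII buffer that charges its size to a per-read permit on allocation and refunds it on destruction. `toy_permit` and `tracked_buffer` are hypothetical stand-ins. They are not Seastar's `temporary_buffer` or Scylla's `reader_permit`, whose interfaces differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical permit: copies share one counter of bytes consumed by a read.
class toy_permit {
    std::shared_ptr<size_t> _consumed = std::make_shared<size_t>(0);
public:
    void consume(size_t n) { *_consumed += n; }
    void release(size_t n) { *_consumed -= n; }
    size_t consumed() const { return *_consumed; }
};

// RAII buffer: memory is visible to the permit for exactly its lifetime.
class tracked_buffer {
    toy_permit _permit;
    std::vector<char> _data;
public:
    tracked_buffer(toy_permit permit, size_t size) : _permit(std::move(permit)), _data(size) {
        _permit.consume(_data.size());
    }
    ~tracked_buffer() { _permit.release(_data.size()); }
    tracked_buffer(const tracked_buffer&) = delete;
    tracked_buffer& operator=(const tracked_buffer&) = delete;
    size_t size() const { return _data.size(); }
};
```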

I'm almost convinced that there is a better place to store the `permit`
than the three places it is stored now, but so far I was unable to completely
decipher our data/index file parsing class hierarchy.
2020-01-28 08:13:16 +02:00
Botond Dénes
dfc8b2fc45 treewide: replace reader_resource_tracer with reader_permit
The former was never really more than a reader_permit with one
additional method. Currently using it doesn't even save one from any
includes. Now that readers will be using reader_permit we would have to
pass down both to mutation_source. Instead get rid of
reader_resource_tracker and just use reader_permit. Instead of making it
the last, optional parameter that is easy to ignore, make it a
first-class parameter, right after schema, to signify that permits are
now a prominent part of the reader API.

This (mostly mechanical) patch essentially refactors mutation_source
to ask for the reader_permit instead of the reader_resource_tracker and
updates all usage sites.
2020-01-28 08:13:16 +02:00
Botond Dénes
a74a82d4d2 flat_mutation_reader: mutation_fragment_stream_validator: add name
Add a name parameter to the validator, so that the validator can be
identified in log messages. Schema identity information is added to the
name automatically. This should help pinpoint the problematic place
where validation failed.
Although at the moment we have a single validator, it still benefits
from having a name, as we can now include in it the name of the sstable
being written and hence trace the source of the bad data.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200117150616.895878-1-bdenes@scylladb.com>
2020-01-20 11:06:30 +01:00
Raphael S. Carvalho
390c8b9b37 sstables: Move STCS implementation to source file
A header-only implementation can potentially cause duplicate-symbol problems.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200107154258.9746-1-raphaelsc@scylladb.com>
2020-01-08 09:55:35 +02:00
Avi Kivity
e5e42672f5 sstables: reduce bloat from sstables::write_simple()
sstables::write_simple() has quite a lot of boilerplate
which gets replicated into each template instance. Move
all of that into a non-template do_write_simple(), leaving
only things that truly depend on the component being written
in the template, and encapsulating them with a
noncopyable_function.

An explicit template instantiation was added, since this
is used in a header file. Before, it likely worked by
accident and stopped working when the template became
small enough to inline.
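The pattern can be sketched with strings standing in for file I/O. The shared boilerplate lives once in a non-template function. The template only wraps the component-specific part in a callable. An explicit instantiation provides the symbol for callers that see only a declaration. `std::function` replaces Seastar's `noncopyable_function` to keep the sketch self-contained, and `summary` with its `serialize()` method is a hypothetical component type.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Non-template part: the boilerplate is compiled once, not per component.
std::string do_write_simple(const std::string& component_name,
                            const std::function<std::string()>& write_body) {
    std::string out = "open " + component_name + "\n";  // stands in for opening the file
    out += write_body();                                 // component-specific part
    out += "close " + component_name + "\n";            // stands in for flush and close
    return out;
}

// Template part: only what depends on the component type.
template <typename Component>
std::string write_simple(const std::string& component_name, const Component& c) {
    return do_write_simple(component_name, [&c] { return c.serialize(); });
}

struct summary {
    std::string serialize() const { return "summary-bytes\n"; }
};

// Explicit instantiation: emits write_simple<summary> in this translation
// unit, so other units can link against it without seeing the definition.
template std::string write_simple<summary>(const std::string&, const summary&);
```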

Tests: unit (dev)
Message-Id: <20200106135453.1634311-1-avi@scylladb.com>
2020-01-07 11:56:11 +01:00
Rafael Ávila de Espíndola
75817d1fe7 sstable: Add checks to help track problems with large_data_handler use after free
I can't quite figure out how we were trying to write a sstable with
the large data handler already stopped, but the backtrace suggests a
good place to add extra checks.

This patch adds two checks: one at the start and one at the end of
sstable::write_components. The first one should give us better
backtraces if the large_data_handler is already stopped. The second
one should help catch a race condition.

Refs: #5470
Message-Id: <20191231173237.19040-1-espindola@scylladb.com>
2020-01-01 12:03:31 +02:00
Benny Halevy
abda12107f sstables: move_to_new_dir: add do_sync_dirs param
To be used for "batch" move of several sstables from staging
to the base directory, allowing the caller to sync the directories
once when all are moved rather than for each one of them.
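The batching idea is sketched below with injectable filesystem hooks so the operations can be counted. The `dir_ops` struct and the `move_sstable()` and `move_sstables_batch()` functions are hypothetical. Only the `do_sync_dirs` parameter name comes from this commit, and the real code is asynchronous, moving sstables in a `parallel_for_each` loop.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical hooks standing in for rename() and fsync() on a directory.
struct dir_ops {
    std::function<void(const std::string& from, const std::string& to)> rename;
    std::function<void(const std::string& dir)> sync_dir;
};

// Moves one sstable; syncs both directories only when asked to.
void move_sstable(const dir_ops& ops, const std::string& sstable,
                  const std::string& from_dir, const std::string& to_dir,
                  bool do_sync_dirs) {
    ops.rename(from_dir + "/" + sstable, to_dir + "/" + sstable);
    if (do_sync_dirs) {
        ops.sync_dir(to_dir);
        ops.sync_dir(from_dir);
    }
}

// Batch move: no per-sstable syncs, then each directory is synced once.
void move_sstables_batch(const dir_ops& ops, const std::vector<std::string>& sstables,
                         const std::string& from_dir, const std::string& to_dir) {
    for (const auto& s : sstables) {
        move_sstable(ops, s, from_dir, to_dir, false);
    }
    ops.sync_dir(to_dir);
    ops.sync_dir(from_dir);
}
```

For N sstables this costs 2 directory syncs instead of 2N.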

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Benny Halevy
6efef84185 sstable: return future from move_to_new_dir
distributed_loader::probe_file needlessly creates a seastar
thread for it and the next patch will use it as part of
a parallel_for_each loop to move a list of sstables
(and sync the directories once at the end).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Pavel Solodovnikov
2f442f28af treewide: add const qualifiers throughout the code base
2019-11-26 02:24:49 +03:00
Benny Halevy
f9e93bba38 sstables: compaction: move cleanup parameter to compaction_descriptor
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191117165806.3234-1-bhalevy@scylladb.com>
2019-11-18 10:52:20 +01:00
Nadav Har'El
2fb2eb27a2 sstables: allow non-traditional characters in table name
The goal of this patch is to fix issue #5280, a rather serious Alternator
bug, where Scylla fails to restart when an Alternator table has secondary
indexes (LSI or GSI).

Traditionally, Cassandra allows table names to contain only alphanumeric
characters and underscores. However, most of our internal implementation
doesn't actually have this restriction. So Alternator uses the characters
':' and '!' in the table names to mark global and local secondary indexes,
respectively. And this actually works. Or almost...

This patch fixes a problem of listing, during boot, the sstables stored
for tables with such non-traditional names. The sstable listing code
needlessly assumes that the *directory* name, i.e., the CF name, matches
the "\w+" regular expression. When an sstable is found in a directory not
matching this regular expression, the boot fails. But there is no real
reason to require such a strict regular expression. So this patch relaxes
this requirement, and allows Scylla to boot with Alternator's GSI and LSI
tables and their names which include the ":" and "!" characters, and in
fact any other name allowed as a directory name.
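The difference can be seen with `std::regex` (ECMAScript grammar, where `\w` is `[A-Za-z0-9_]`). This only checks a bare CF-name component. The relaxed pattern `[^/]+` is a hypothetical choice meaning "anything that is a single path component", not necessarily the expression the patch uses.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The strict check described above: the CF name must match \w+.
bool strict_cf_dir_ok(const std::string& dir) {
    static const std::regex strict(R"(\w+)");
    return std::regex_match(dir, strict);
}

// Hypothetical relaxed check: any non-empty name without a path separator.
bool relaxed_cf_dir_ok(const std::string& dir) {
    static const std::regex relaxed(R"([^/]+)");
    return std::regex_match(dir, relaxed);
}
```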

Fixes #5280.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191114153811.17386-1-nyh@scylladb.com>
2019-11-17 14:27:47 +02:00
Avi Kivity
27ef73f4f1 Merge "Report file I/O in CQL tracing when reading from sstables." from Kamil
"
Introduce the traced_file class which wraps a file, adding CQL trace messages before and after every operation that returns a future.
Use this file to trace reads from SSTable data and index files.

Fixes #4908.
"

* 'traced_file' of https://github.com/kbr-/scylla:
  sstables: report sstable index file I/O in CQL tracing
  sstables: report sstable data file I/O in CQL tracing
  tracing: add traced_file class
2019-10-26 22:53:37 +03:00
Kamil Braun
432ef7c9af sstables: report sstable index file I/O in CQL tracing
Use tracing::make_traced_file when reading from the index file in
index_reader.
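The `traced_file` described in the merge above is a decorator. It forwards each operation to the wrapped file and records a message before and after. The sketch below is synchronous and records into a vector. The real class returns futures and emits CQL trace messages. The `file_impl` interface, `memory_file` and the message format are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical synchronous file interface.
class file_impl {
public:
    virtual ~file_impl() = default;
    virtual std::string read(size_t pos, size_t len) = 0;
};

class memory_file : public file_impl {
    std::string _contents;
public:
    explicit memory_file(std::string c) : _contents(std::move(c)) {}
    std::string read(size_t pos, size_t len) override { return _contents.substr(pos, len); }
};

// Decorator: same interface, adds a trace entry before and after each call.
class traced_file : public file_impl {
    std::unique_ptr<file_impl> _inner;
    std::vector<std::string>& _trace;
    std::string _name;
public:
    traced_file(std::unique_ptr<file_impl> inner, std::vector<std::string>& trace, std::string name)
        : _inner(std::move(inner)), _trace(trace), _name(std::move(name)) {}
    std::string read(size_t pos, size_t len) override {
        _trace.push_back(_name + ": read(" + std::to_string(pos) + ", " + std::to_string(len) + ") start");
        std::string result = _inner->read(pos, len);
        _trace.push_back(_name + ": read done, " + std::to_string(result.size()) + " bytes");
        return result;
    }
};
```

Because the wrapper has the same interface as the file it wraps, readers such as index_reader need no changes beyond receiving the wrapped file.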
2019-10-25 14:10:28 +02:00