Commit Graph

797 Commits

Author SHA1 Message Date
Botond Dénes
cc5137ffe3 table: require a valid permit to be passed to most read methods
Now that the most prevalent users (range scan and single partition
reads) all pass valid permits we require all users to do so and
propagate the permit down towards `make_sstable_reader()`. The plan is
to use this permit for restricting the sstable readers, instead of the
semaphore the table is configured with. The various
`make_streaming_*reader()` overloads keep using the internal semaphores
as but they also create the permit before the read starts and pass it to
`make_sstable_reader()`.
2020-05-28 11:34:35 +03:00
Botond Dénes
14743c4412 data_query, mutation_query: use query_class_config
We want to move away from the current practice of selecting the relevant
read concurrency semaphore inside `table` and instead want to pass it
down from `database` so that we can pass down a semaphore that is
appropriate for the class of the query. Use the recently created
`query_class_config` struct for this. This is added as a parameter to
`data_query`, `mutation_query` and propagated down to the point where we
create the `querier` to execute the read. We are already propagating
down a parameter down the same route -- max_memory_reverse_query --
which also happens to be part of `query_class_config`, so simply replace
this parameter with a `query_class_config` one. As the lower layers are
not prepared for a semaphore passed from above, make sure this semaphore
is the same that is selected inside `table`. After the lower layers are
prepared for a semaphore arriving from above, we will switch it to be
the appropriate one for the class of the query.
2020-05-28 11:34:35 +03:00
Botond Dénes
e0b98ba921 database: give system reads a concurrency boost during startup
In the next patches we will match reads to the appropriate reader
concurrency semaphore based on the scheduling group they run in. This
will result in a lot of system reads that are executed during startup
and that were up to now (incorrectly) using the user read semaphore to
switch to the system read semaphore. This latter has a much more
constrained concurrency, which was observed to cause system reads to
saturate and block on the semaphore, slowing down startup.
To solve this, boost the concurrency of the system read semaphore during
startup to match that of the user semaphore. This is ok, as during
startup there are no user reads to compete with. After startup, before
we start serving user reads the concurrency is reverted back to the
normal value.
2020-05-28 10:40:08 +03:00
Tomasz Grabiec
1424543e11 Merge "Move sstables_format on sstable_manager" from Pavel Emelyanov
The format is currently sitting in storage_service, but the
previous set patched all the users not to call it, instead
they use sstables_manager to get the highest supported format.
So this set finalizes this effort and places the format on
sstables_manager(s).

The set introduces the db::sstables_format_selector, that

 - starts with the lowest format (ka)
 - reads one on start from system tables
 - subscribes on sstables-related features and bumps
   up the selection if the respective feature is enabled

During its lifetime the selector holds a reference to the
sharded<database> and updates the format on it, the database,
in turn, propagates it further to sstables_managers. The
managers start with the highest known format (mc) which is
done for tests.

* https://github.com/xemul/scylla br-move-sstables-format-4:
  storage_service: Get rid of one-line helpers
  system_keyspace: Cleanup setup() from storage_service
  format_selector: Log which format is being selected
  sstables_manager: Keep format on
  format_selector: Make it standalone
  format_selector: Move the code into db/
  format_selector: Select format locally
  storage_service: Introduce format_selector
  storage_service: Split feature_enabled_listener::on_enabled
  storage_service: Tossing bits around
  features: Introduce and use masked features
  features: Get rid of per-features booleans
2020-05-27 08:40:05 +03:00
Pavel Emelyanov
89a1b09214 sstables_manager: Keep format on
Make the database be the format_selector target, so
when the format is selected its set on database which
in turn just forwards the selection into sstables
managers. All users of the format are already patched
to read it from those managers.

The initial value for the format is the highest, which
is needed by tests. When scylla starts the format is
updated by format_selector, first after reading from
system tables, then by selectiing it from features.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-05-25 14:17:28 +03:00
Piotr Sarna
92aadb94e5 treewide: propagate trace state to write path
In order to add tracing to places where it can be useful,
e.g. materialized view updates and hinted handoff, tracing state
is propagated to all applicable call sites.
2020-05-18 16:05:23 +02:00
Avi Kivity
beaeda5234 database: remove variadic future from query() and query_mutations()
Variadic futures are deprecated; replace with future<std::tuple<...>>.

Tests: unit (dev)
2020-05-17 18:45:38 +02:00
Glauber Costa
7423ccc318 compaction_manager: allow early aborts through abort sources.
The shutdown process of compaction manager starts with an explicit call
from the database object. However that can only happen everything is
already initialized. This works well today, but I am soon to change
the resharding process to operate before the node is fully ready.

One can still stop the database in this case, but reshardings will
have to finish before the abort signal is processed.

This patch passes the existing abort source to the construction of the
compaction_manager and subscribes to it. If the abort source is
triggered, the compaction manager will react to it firing and all
compactions it manages will be stopped.

We still want the database object to be able to wait for the compaction
manager, since the database is the object that owns the lifetime of
the compaction manager. To make that possible we'll use a future
that is return from stop(): no matter what triggered the abort, either
an early abort during initial resharding or a database-level event like
drain, everything will shut down in the right order.

The abort source is passed to the database, who is responsible from
constructing the compaction manager.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-05-13 16:51:25 -04:00
Avi Kivity
76d21a0c22 Merge 'Make it possible to turn caching off per table and stop caching CDC Log' from Piotr J.
"
We inherited from Origin a `caching` table parameter. It's a map of named caching parameters. Before this PR two caching parameters were expected: `keys` and `rows_per_partition`. So far we have been ignoring them. This PR adds a new caching parameter called `enabled` which can be set to `true` or `false` and controls the usage of the cache for the table. By default, it's set to `true` which reflects Scylla behavior before this PR.

This new capability is used to disable caching for CDC Log table. It is desirable because CDC Log entries are not expected to be read often. They also put much more pressure on memory than entries in Base Table. This is caused by the fact that some writes to Base Table can override previous writes. Every write to CDC Log is unique and does not invalidate any previous entry.

Fixes #6098
Fixes #6146

Tests: unit(dev, release), manual
"

* haaawk-dont_cache_cdc:
  cdc: Don't cache CDC Log table
  table: invalidate disabled cache on memtable flush
  table: Add cache_enabled member function
  cf_prop_defs: persist caching_options in schema
  property_definitions: add get that returns variant
  feature: add PER_TABLE_CACHING feature
  caching_options: add enabled parameter
2020-05-10 15:39:42 +03:00
Avi Kivity
5b971397aa Revert "compaction_manager: allow early aborts through abort sources."
This reverts commit e8213fb5c3. It results
in an assertion failure in remove_index_file_test.

Fixes #6413.
2020-05-10 12:32:18 +03:00
Raphael S. Carvalho
88d2486fca sstables: Synchronize deletion of SSTables in resharding with other operations
Input SSTables of resharding is deleted at the coordinator shard, not at the
shards they belong to.
We're not acquiring deletion semaphore before removing those input SSTables
from the SSTable set, so it could happen that resharding deletes those
SSTables while another operation like snapshot, which acquires the semaphore,
find them deleted.

Let's acquire the deletion semaphore so that the input SSTables will only
be removed from the set, when we're certain that nobody is relying on their
existence anymore.
Now resharding will only delete input SStables after they're safely removed
from the SSTable set of all shards they belong to.

unit: test(dev).

Fixes #6328.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200507233636.92104-1-raphaelsc@scylladb.com>
2020-05-10 10:50:32 +03:00
Ivan Prisyazhnyy
84e25e8ba4 api: support table auto compaction control
The patch implements:

- /storage_service/auto_compaction API endpoint
- /column_family/autocompaction/{name} API endpoint

Those APIs allow to control and request the status of background
compaction jobs for the existing tables.

The implementation introduces the table::_compaction_disabled_by_user.
Then the CompactionManager checks if it can push the background
compaction job for the corresponding table.

New members
===

    table::enable_auto_compaction();
    table::disable_auto_compaction();
    bool table::is_auto_compaction_disabled_by_user() const

Test
===
Tests: unit(sstable_datafile_test autocompaction_control_test), manual

    $ ninja build/dev/test/boost/sstable_datafile_test
    $ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000

The test tries to submit a compaction job after playing
with autocompaction control table switch. However, there is
no reliable way to hook pending compaction task. The code
assumed that with_scheduling_group() closure will never
preempt execution of the stats check.

Revert
===
Reverts commit c8247ac. In previous version the execution
sometimes resulted into the following error:

    test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test":
    critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed

This version adds a few sstables to the cf, starts
the compaction and awaits until it is finished.

API change
===

- `/column_family/autocompaction/` always returned `true` while answering to the question: if the autocompaction disabled (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). now it answers to the question: if the autocompaction for specific table is enabled. The question logic is inverted. The patch to the JMX is required. However, the change is decent because all old values were invalid (it always reported all compactions are disabled).
- `/column_family/autocompaction/` got support for POST/DELETE per table

Fixes
===

Fixes #1488
Fixes #1808
Fixes #440

Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>
Reviewed-by: Glauber Costa <glauber@scylladb.com>
2020-05-07 16:23:38 +03:00
Glauber Costa
e8213fb5c3 compaction_manager: allow early aborts through abort sources.
The shutdown process of compaction manager starts with an explicit call
from the database object. However that can only happen everything is
already initialized. This works well today, but I am soon to change
the resharding process to operate before the node is fully ready.

One can still stop the database in this case, but reshardings will
have to finish before the abort signal is processed.

This patch passes the existing abort source to the construction of the
compaction_manager and subscribes to it. If the abort source is
triggered, the compaction manager will react to it firing and all
compactions it manages will be stopped.

We still want the database object to be able to wait for the compaction
manager, since the database is the object that owns the lifetime of
the compaction manager. To make that possible we'll use a future
that is return from stop(): no matter what triggered the abort, either
an early abort during initial resharding or a database-level event like
drain, everything will shut down in the right order.

The abort source is passed to the database, who is responsible from
constructing the compaction manager.

Tests: unit (dev), manual start+stop, manual drain + stop

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200506184749.98288-1-glauber@scylladb.com>
2020-05-07 13:24:47 +03:00
Piotr Jastrzebski
1a43849cd2 table: Add cache_enabled member function
This function determines cache usage based both on table _config
and dynamic schema information.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-05-06 18:39:01 +02:00
Piotr Sarna
c66661c582 table: bypass cache when generating view updates from streaming
There's no indication that data needed for generating view updates
from staging sstables is going to be immediately useful for the
user, and a large amount of it can push hot rows out of the cache,
thus deteriorating performance.

Fixes #6233
Tests: unit(dev)
2020-04-26 15:43:02 +03:00
Piotr Sarna
71ac6ebcc5 Merge 'prepare the view building generator to work through a compaction' from Glauber
There is no reason to read a single SSTable at a time from the staging
directory. Moving SSTables from staging directory essentially involves
scanning input SSTables and creating new SSTables (albeit in a different
directory).

We have a mechanism that does that: compactions. In a follow up patch, I
will introduce a new specialization of compaction that moves SSTables
from staging (potentially compacting them if there are plenty).

In preparation for that, some signatures have to be changed and the
view_updating_consumer has to be more compaction friendly. Meaning:
    - Operating with an sstable vector
    - taking a table reference, not a database

Because this code is a bit fragile and the reviewer set is fundamentally
different from anything compaction related, I am sending this separately

* glommer-view_build:
  staging: potentially read many SSTables at the same time
  view_build_test: make sure it works with smp > 1
2020-04-15 18:07:09 +02:00
Glauber Costa
4e6400293e staging: potentially read many SSTables at the same time
There is no reason to read a single SSTable at a time from the staging
directory. Moving SSTables from staging directory essentially involves
scanning input SSTables and creating new SSTables (albeit in a different
directory).

We have a mechanism that does that: compactions. In a follow up patch, I
will introduce a new specialization of compaction that moves SSTables
from staging (potentially compacting them if there are plenty).

In preparation for that, some signatures have to be changed and the
view_updating_consumer has to be more compaction friendly. Meaning:
- Operating with an sstable vector
- taking a table reference, not a database

Because this code is a bit fragile and the reviewer set is fundamentally
different from anything compaction related, I am sending this separately

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-04-15 11:26:44 -04:00
Konstantin Osipov
18b9bb57ac lwt: rename metrics to match accepted terminology
Rename inherited metrics cas_propose and cas_commit
to cas_accept and cas_learn respectively.

A while ago we made a decision to stick to widely accepted
terms for Paxos rounds: prepare, accept, learn. The rest
of the code is using these terms, so rename the metrics
to avoid confusion/technical debt.

While at it, rename a few internal methods and functions.

Fixes #6169

Message-Id: <20200414213537.129547-1-kostja@scylladb.com>
2020-04-15 12:20:30 +02:00
Pekka Enberg
c8247aced6 Revert "api: support table auto compaction control"
This reverts commit 1c444b7e1e. The test
it adds sometimes fails as follows:

  test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test":
  critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed

Ivan is working on a fix, but let's revert this commit to avoid blocking
next promotion failing from time to time.
2020-04-11 17:56:02 +03:00
Ivan Prisyazhnyy
1c444b7e1e api: support table auto compaction control
This patch adds API endpoint /column_family/autocompaction/{name}
that listen to GET and POST requests to pick and control table
background compactions.

To implement that the patch introduces "_compaction_disabled_by_user"
flag that affects if CompactionManager is allowed to push background
compactions jobs into the work.

It introduces

    table::enable_auto_compaction();
    table::disable_auto_compaction();
    bool table::is_auto_compaction_disabled_by_user() const

to control auto compaction state.

Fixes #1488
Fixes #1808
Fixes #440
Tests: unit(sstable_datafile_test autocompaction_control_test), manual
2020-04-08 21:18:38 +03:00
Glauber Costa
463d0ab37c compaction: move rewrite_sstables to the compaction_manager
There is no reason why the table code has to be aware of the efforts of
rewriting (cleanup, scrub, upgrade) an SSTable versus compacting it.

Rewrite is special, because we need to do it one SSTable at a time,
without lumping it together. However, the compaction manager is totally
capable of doing that itself. If we do that, the special
"table::rewrite_sstables" can be killed.

This code would maybe be better off as a thread, where we wouldn't need
to keep state. However there are some methods like maybe_stop_on_error()
that expect a future so I am leaving this be for now. This is a cleanup
that can be done later.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200401162722.28780-2-glauber@scylladb.com>
2020-04-06 16:02:30 +03:00
Glauber Costa
87dd23db03 compaction: use a larger min_threshold during bootstrap, replace
During bootstrap and replace operations the node can't take reads and
we'd like to see the process ending ASAP. This is because until the
process ends, we keep having to duplicate writes to an extended set. Not
to mention, in the case of a cluster expansion users want to use the
added capacity sooner rather than later.

Streaming generates a lot of compaction activity, that competes with the
bootstrap itself, slowing it down.

Long term, we are moving to treat those compactions differently and
maybe postpone them altogether. However for now we can reduce the amount
of compactions by increasing the minimum threshold of SSTables that have
to accumulate before they are selected for compactions. The default is
2, meaning we will trigger a compaction every time 2 SSTables of about
the same size are found (for STCS, others follow a similar pattern).

Until we have offstrategy infrastructure we don't want the compactions
to stop happening altogether so the reads, when they start, don't
suffer.  This patch sets the minimum threshold to 16 (for the default
max_threshold of 32), meaning we will generate a lot less compaction
activity during streaming. Once streaming is done we revert it to its
original.

Unfortunately there isn't much we can do at the moment about decommission.
During decommission the nodes receiving data are also taking reads and
we don't want SSTables to accumulate.

Fixes #5109

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-04-01 10:06:27 +03:00
Rafael Ávila de Espíndola
c5795e8199 everywhere: Replace engine().cpu_id() with this_shard_id()
This is a bit simpler and might allow removing a few includes of
reactor.hh.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200326194656.74041-1-espindola@scylladb.com>
2020-03-27 11:40:03 +03:00
Nadav Har'El
7922b9eb8f materialized views: reduce recompilation when db/view/view.hh changes.
Before this patch, when db/view/view.hh was modified, 89 source files had to
be recompiled. After this patch, this number is down to 5.

Most of the irrelevant source files got view.hh by including database.hh,
which included view.hh just for the definition of statistics. So in this
patch we split the view statistics to a separate header file, view_stats.hh,
and database.hh only includes that. A few source files which included
only database.hh and also needed view.hh (for materialized-view related
functions) now need to include view.hh explicitly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200319121031.540-1-nyh@scylladb.com>
2020-03-19 15:46:14 +02:00
Pavel Emelyanov
96e3d0fa36 mutation_partition: Debloat header form others
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200317191051.12623-1-xemul@scylladb.com>
2020-03-18 11:53:36 +02:00
Piotr Jastrzebski
924ed7bb1c make_multishard_combining_reader: stop taking partitioner
The function already takes schema so there's no need
for it to take partitioner. It can be obtained using
schema::get_partitioner

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-15 10:25:20 +01:00
Piotr Jastrzebski
54d24553bb schema: get_partitioner return const&
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-06 13:33:53 +01:00
Pavel Emelyanov
4fa12f2fb8 header: De-bloat schema.hh
The header sits in many other headers, but there's a handy
schema_fwd.hh that's tiny and contains needed declarations
for other headers. So replace shema.hh with schema_fwd.hh
in most of the headers (and remove completely from some).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200303102050.18462-1-xemul@scylladb.com>
2020-03-03 11:34:00 +01:00
Avi Kivity
157fe4bd19 Merge "Remove default timeouts" from Botond
"
Timeouts defaulted to `db::no_timeout` are dangerous. They allow any
modifications to the code to drop timeouts and introduce a source of
unbounded request queue to the system.

This series removes the last such default timeouts from the code. No
problems were found, only test code had to be updated.

tests: unit(dev)
"

* 'no-default-timeouts/v1' of https://github.com/denesb/scylla:
  database: database::query*(), database::apply*(): remove default timeouts
  database: table::query(): remove default timeout
  mutation_query: data_query(): remove default timeout
  mutation_query: mutation_query(): remove default timeout
  multishard_mutation_query: query_mutations_on_all_shards(): remove default timeout
  reader_concurrency_semaphore: wait_admission(): remove default timeout
  utils/logallog: run_when_memory_available(): remove default timeout
2020-03-01 17:29:17 +02:00
Rafael Ávila de Espíndola
ba453d832b Pass string_view to keyspace_metadata::new_keyspace
This avoids a few sstring copies.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola
94d07fba07 Pass string_view to the keyspace_metadata constructor
This avoids a few sstring copies when constructing keyspace_metadata.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-02-28 17:04:12 -08:00
Rafael Ávila de Espíndola
2b96abcece Pass string_view to no_such_column_family's constructor
With this we don't have to construct a sstring to construct a
no_such_column_family.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-02-28 08:36:27 -08:00
Botond Dénes
1073094f04 database: database::query*(), database::apply*(): remove default timeouts 2020-02-27 19:14:12 +02:00
Botond Dénes
2c1ee7b9cd database: table::query(): remove default timeout 2020-02-27 19:14:09 +02:00
Botond Dénes
75efa707ce db/config: add config memory limit of otherwise unlimited queries
We have a few kind of queries whose memory consumption is not limited at
all. One of these is reverse queries, which reads entire partitions into
memory, before reversing them. These partitions can be larger than
memory and thus such a query can single-handedly cause OOM.
This patch introduces a configuration for a memory limit for such
queries. This will serve as a hard limit and queries which attempt to
use more memory than this, will be aborted.
The limit is propagated to table objects, with the intention of keeping
system tables unlimited. These tables are usually small and initiators
of system queries are not prepared for failures.
2020-02-27 18:11:54 +02:00
Piotr Sarna
5e07c00eeb Merge 'Delete table snapshot' from Amnon
This series adds an option to the API that supports deleting
a specific table from a snapshot.
The implementation works in a similar way to the option
to specify specific keyspaces when deleting a snapshot.
The motivation is to allow reducing disk-space when using
the snapshot for backup. A dtest PR is sent to the dtest
repository.

Fixes #5658

Original PR #5805

Tests: (database_test) (dtest snapshot_test.py:TestSnapshot.test_cleaning_snapshot_by_cf)

* amnonh/delete_table_snapshot:
  test/boost/database_test: adopt new clear_snapshot signature
  api/storage_service: Support specifying a table when deleting a snapshot
  storage_service: Add optional table name to clear snapshot

* amnonh/delete_table_snapshot:
  test/boost/database_test: adopt new clear_snapshot signature
  api/storage_service: Support specifying a table when deleting a snapshot
  storage_service: Add optional table name to clear snapshot
2020-02-24 09:38:57 +01:00
Tomasz Grabiec
d0b6be0820 Merge "Don't return stale data by properly invalidating row cache after cleanup" from Raphael
Row cache needs to be invalidated whenever data in sstables
changes. Cleanup removes data from sstables which doesn't belong to
the node anymore, which means cache must be invalidated on cleanup.
Currently, stale data can be returned when a node re-owns ranges which
data are still stored in the node's row cache, because cleanup didn't
invalidate the cache."

Fixes #4446.

tests:
- unit tests (dev mode)
- dtests:
    update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test
    cleanup_test.py
2020-02-20 18:20:56 +01:00
Raphael S. Carvalho
fa16845353 database: Fix on_compaction_completion doc
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-02-19 19:30:34 -03:00
Raphael S. Carvalho
65b4fc8bcd sstables/compaction: Introduce compaction_completion_desc
This descriptor contain all information needed for table to be properly
updated on compaction completion. A new member will be added to it soon,
which will store ranges to be invalidated in row cache on behalf of
cleanup compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-02-19 19:29:32 -03:00
Pavel Emelyanov
8435e93549 db: Move unbounded_range_tombstones listening from storage_service
Now the database keeps reference on feature service, so we
can listen on the feature in it directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-19 14:08:24 +03:00
Amnon Heiman
c3260bad25 storage_service: Add optional table name to clear snapshot
There are cases when it is useful to delete specific table from a
snapshot.

An example is when a snapshot is used for backup. Backup can take a long
period of time, during that time, each of the tables can be deleted once
it was backup without waiting for the entire backup process to
completed.

This patch adds such an option to the database and to the storage_service
wrapping method that calls it.

If a table is specified a filter function is created that filter only
the column family with that given name.

This is similar to the filtering at the keyspace level.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-02-18 16:34:10 +02:00
Avi Kivity
6c7aa18238 Merge "Introduce schema::get_partitioner" from Piotr
"
Introduce schema::get_partitioner and use it instead of dht::global_partitioner.

Fixes #5493

Tests: unit(dev, release, debug)
"

* 'per_table_partitioner_prep' of https://github.com/haaawk/scylla: (35 commits)
  cdc: stop using partitioners
  partitioner_test: stop calling set_global_partitioner
  storage_service: stop calling global_partitioner()
  mutation_writer_test: stop calling global_partitioner()
  schema: reduce number of global_partitioner() calls
  test_services: stop calling global_partitioner()
  sstable_utils: stop calling global_partitioner()
  sstable_resharding_test: stop depending on global partitioner
  sstable_mutation_test: stop calling global_partitioner()
  sstable_data_file_test: stop calling global_partitioner()
  random_schema: stop taking partitioner in constructor
  mutation_reader_test: stop calling global_partitioner()
  multishard_mutation_query_test: stop calling global_partitioner()
  row_level repair: stop calling global_partitioner()
  distribute_reader_and_consume_on_shards: don't take partitioner
  thrift: reduce global_partitioner() calls
  binary_search: stop calling global_partitioner()
  index_entry: stop calling global_partitioner()
  mc writer: stop calling global_partitioner()
  sstable: stop calling global_partitioner()
  ...
2020-02-17 18:12:53 +02:00
Piotr Dulikowski
01084a79b8 hh: send orphaned hints on HINT_MUTATION verb
When replaying a hint with a destination node that is no longer in the
cluster, it will be sent with cl=ALL to all its new replicas. Before
this patch, the MUTATION verb was used, which causes such hints to be
handled on the same connection and with the same priority as regular
writes. This can cause problems when a large number of hints is
orphaned and they are scheduled to be sent at once. Such situation
may happen when replacing a dead node - all nodes that accumulated hints
for the dead node will now send them with cl=ALL to their new replicas.

This patch changes the verb used to send such hints to HINT_MUTATION.
This verb is handled on a separate connection and with streaming
scheduling group, which gives them similar priority to non-orphaned
hints.

Refs: #4712

Tests: unit(dev)
2020-02-17 14:45:22 +01:00
Tomasz Grabiec
76d1dd7ec6 Merge "nodetool scrub: implement validation and the skip-corrupted flag
" from Botond

Nodetool scrub rewrites all sstables, validating their data. If corrupt
data is found the scrub is aborted. If the skip-corrupted flag is set,
corrupt data is instead logged (just the keys) and skipped.

The scrubbing algorithm itself is fairly simple, especially that we
already have a mutation stream validator that we can use to validate the
data. However currently scrub is piggy-backed on top of cleanup
compaction. To implement this flag, we have to make scrub a separate
compaction type and propagate down the flag. This required some
massaging of the code:
* Add support for more than two (cleanup or not) compaction types.
* Allow passing custom options for each compaction type.
* Allow stopping a compaction without the manager retrying it later.

Additionally the validator itself needed some changes to allow different
ways to handle errors, as needed by the scrub.

Fixes: #5487

* https://github.com/denesb/nodetool-scrub-skip-corrupted/v7:
  table: cleanup_sstables(): only short-circuit on actual cleanup
  compaction: compaction_type: add Upgrade
  compaction: introduce compaction_options
  compaction: compaction_descriptor: use compaction options instead of
    cleanup flag
  compaction_manager: collect all cleanup related logic in
    perform_cleanup()
  sstables: compaction_stop_exception: add retry flag
  mutation_fragment_stream_validator: split into low-level and
    high-level API
  compaction: introduce scrub_compaction
  compaction_manager: scrub: don't piggy-back on upgrade_sstables()
  test: sstable_datafile_test: add scrub unit test
2020-02-17 15:28:07 +02:00
Piotr Jastrzebski
abd76e566f dht::shard_of: stop calling global_partitioner()
Take const schema& as a parameter of shard_of and
use it to obtain partitioner instead of calling
global_partitioner().

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:23:16 +01:00
Botond Dénes
8014c7124d compaction_manager: collect all cleanup related logic in perform_cleanup()
Currently the call chain for a cleanup collection looks like this:
compaction_manager::perform_cleanup()
    compaction_manager::rewrite_sstables()
        table::cleanup_sstables()
            ...

`perform_cleanup()` is essentially empty, immediately deferring to
`rewrite_sstables()`. Cleanup related logic is scattered between the
latter two methods on the call chain. These methods however recently
started serving as generic methods for compactions that want to
rewrite each sstable one-by-one, collecting cleanup related ifs in
various places.
The reason is historic, we first had cleanup, then bolted others on top,
trying to share the underlying code as much as possible.

It is time this is cleaned up (pun intended). Make `perform_cleanup()`
the place where all cleanup related logic is, with the rest of the stack
made truly generic.
2020-02-11 17:47:44 +02:00
Botond Dénes
b2dc5d4895 compaction: compaction_descriptor: use compaction options instead of cleanup flag
Instead of the restrictive `cleanup` boolean flag, which allows for choosing
between only two compaction types, use `compaction_options`, which in
addition to allowing any number of compaction types to be selected,
also allows seamlessly passing specific options to them.
2020-02-11 17:47:44 +02:00
Pavel Emelyanov
1a3f78a57d database: Use own token_metadata
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-10 20:54:32 +03:00
Pavel Emelyanov
de1dc59548 migration_manager: Refactor validation of new/updating ksm
The goal is to have token_metadata reference intide the
keyspace_metadata.validate method. This can be acheived
by doing the validation through the database reference
which is "at hands" in migration_manager.

While at it, merge the validation with exists/not-exists
checks done in the same places.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-10 18:10:38 +03:00
Avi Kivity
bed61b96a2 Merge "Move features from storage- into feature-service" from Pavel
"
There's a lot of code around that needs storage service purely to
get the specific feature value (cluster_supports_<something> calls).
This creates several circular dependencies, e.g. storage_service <->
migration_manager one and database <-> storage_servuce. Also features
sit on storage_service, but register themselfs on the feature_service
and the former subscribes on them back which also looks strange.

I propose to keep all the features on feature_service, this keeps the
latter intependent from other components, makes it possible to break
one of the mentioned circle dependencyand heavily relax the other.

Also the set helps us fighting the globals and, after it, the
feature_service can be safely stopped at the very last moment.

Tests: unit(dev), manual debug build start-stop
"

* 'br-features-to-service-5' of https://github.com/xemul/scylla:
  gossiper: Avoid string merge-split for nothing
  features: Stop on shutdown
  storage_service: Remove helpers
  storage_service: Prepare to switch from on-board feature helpers
  cql3: Check feature in .validate
  database: Use feature service
  storage_proxy: Use feature service
  migration_manager: Use feature service
  start: Pass needed feature as argument into migrate_truncation_records
  features: Unfriend storage_service
  features: Simplify feature registration
  features: Introduce known_feature_set
  features: Move disabled features set from storage_service
  features: Move schema_features helper
  features: Move all features from storage_service to feature_service
  storage_service: Use feature_config from _feature_service
  features: Add feature_config
  storage_service: Kill set_disabled_features
  gms: Move features stuff into own .cc file
  migration_manager: Move some fns into class
2020-02-09 19:22:07 +02:00