Commit Graph

39 Commits

Author SHA1 Message Date
Botond Dénes
cc5137ffe3 table: require a valid permit to be passed to most read methods
Now that the most prevalent users (range scan and single partition
reads) all pass valid permits we require all users to do so and
propagate the permit down towards `make_sstable_reader()`. The plan is
to use this permit for restricting the sstable readers, instead of the
semaphore the table is configured with. The various
`make_streaming_*reader()` overloads keep using the internal semaphores
as but they also create the permit before the read starts and pass it to
`make_sstable_reader()`.
2020-05-28 11:34:35 +03:00
Glauber Costa
e29701ca1c compaction_manager: expand state to be able to differentiate between enabled and stopped
We are having many issues with the stop code in the compaction_manager.
Part of the reason is that the "stopped" state has its meaning overloaded
to indicate both "compaction manager is not accepting compactions" and
"compaction manager is not ready or destructed".

In a later step we could default to enabled-at-start, but right now we
maintain current behavior to minimize noise.

It is only possible to stop the compaction manager once.
It is possible to enable / disable the compaction manager many times.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-05-13 16:51:25 -04:00
Glauber Costa
70a89ab4ab compaction: do not assume I/O priority class
We shouldn't assume the I/O priority class for compactions.  For
instance, if we are dealing with offstrategy compactions we may want to
use the maintenance group priority for them.

For now, all compactions are put in the compaction class.  rewrite
compactions (scrub, cleanup) could be maintenance, but we don't have
clear access to the database object at this time to derive the
equivalent CPU priority. This is planned to be changed in the future,
and when we do change it, we'll adjust.

Same goes for resharding: while we could at this point change it we'd
risking memory pressure since resharding is run online and sstables are
shared until resharding is done. When we move it to offline execution
we'll do it with maintenance priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200512002233.306538-3-glauber@scylladb.com>
2020-05-12 08:23:19 +03:00
Ivan Prisyazhnyy
84e25e8ba4 api: support table auto compaction control
The patch implements:

- /storage_service/auto_compaction API endpoint
- /column_family/autocompaction/{name} API endpoint

Those APIs allow to control and request the status of background
compaction jobs for the existing tables.

The implementation introduces the table::_compaction_disabled_by_user.
Then the CompactionManager checks if it can push the background
compaction job for the corresponding table.

New members
===

    table::enable_auto_compaction();
    table::disable_auto_compaction();
    bool table::is_auto_compaction_disabled_by_user() const

Test
===
Tests: unit(sstable_datafile_test autocompaction_control_test), manual

    $ ninja build/dev/test/boost/sstable_datafile_test
    $ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000

The test tries to submit a compaction job after playing
with autocompaction control table switch. However, there is
no reliable way to hook pending compaction task. The code
assumed that with_scheduling_group() closure will never
preempt execution of the stats check.

Revert
===
Reverts commit c8247ac. In previous version the execution
sometimes resulted into the following error:

    test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test":
    critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed

This version adds a few sstables to the cf, starts
the compaction and awaits until it is finished.

API change
===

- `/column_family/autocompaction/` always returned `true` while answering to the question: if the autocompaction disabled (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). now it answers to the question: if the autocompaction for specific table is enabled. The question logic is inverted. The patch to the JMX is required. However, the change is decent because all old values were invalid (it always reported all compactions are disabled).
- `/column_family/autocompaction/` got support for POST/DELETE per table

Fixes
===

Fixes #1488
Fixes #1808
Fixes #440

Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>
Reviewed-by: Glauber Costa <glauber@scylladb.com>
2020-05-07 16:23:38 +03:00
Raphael S. Carvalho
a214ccdf89 sstables/compaction: Don't invalidate row cache when adding GC SSTable to SSTable set
Garbage collected SSTable is incorrectly added to SSTable set with a function
that invalidates row cache. This problem is fixed by adding GC SStable
to set using mechanism which replaces old sstables with new sstables.

Also, adding GC SSTable to set in a separate call is not correct.
We should make sure that GC SSTable reaches the SSTable set at the same time
its respective old (input) SSTable is removed from the set, and that's done
using a single request call to table.

Fixes #5956.
Fixes #6275.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-05-05 12:03:19 -03:00
Raphael S. Carvalho
8f4458f1d5 sstables/compaction: Change meaning of compaction_completion_desc input and output fields
input_sstables is renamed to old_sstables and is about old SSTables that should be
deleted and removed from the SSTable set.
output_sstables is renamed to new_sstables and is about new SSTable that should be
added to the SSTable set, replacing the old ones.

This will allow us, for example, to add auxiliary SSTables to SSTable set using
the same call which replaces output SSTables by input SSTables in compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-05-05 12:03:08 -03:00
Glauber Costa
55f5ca39a9 sstable_test: rework test to use a thread
The compaction_manager test lives inside a thread and it is not taking
advantage of it, with continuations all over.

One of the side effects of it is that the test is calling stop() twice
on the compaction_manager.  While this works today, it is not good
practice. A change I am making is just about to break it.

This patch converts the test to fully use .get() instead of chained
continuations and in doing so also guarantees that the compaction
manager will be RAII-stopped just one, from a defer object.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200503161420.8346-2-glauber@scylladb.com>
2020-05-03 19:54:04 +03:00
Pekka Enberg
c8247aced6 Revert "api: support table auto compaction control"
This reverts commit 1c444b7e1e. The test
it adds sometimes fails as follows:

  test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test":
  critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed

Ivan is working on a fix, but let's revert this commit to avoid blocking
next promotion failing from time to time.
2020-04-11 17:56:02 +03:00
Ivan Prisyazhnyy
1c444b7e1e api: support table auto compaction control
This patch adds API endpoint /column_family/autocompaction/{name}
that listen to GET and POST requests to pick and control table
background compactions.

To implement that the patch introduces "_compaction_disabled_by_user"
flag that affects if CompactionManager is allowed to push background
compactions jobs into the work.

It introduces

    table::enable_auto_compaction();
    table::disable_auto_compaction();
    bool table::is_auto_compaction_disabled_by_user() const

to control auto compaction state.

Fixes #1488
Fixes #1808
Fixes #440
Tests: unit(sstable_datafile_test autocompaction_control_test), manual
2020-04-08 21:18:38 +03:00
Avi Kivity
e9e2b75a76 Merge "Allow Major compactions for TWCS" from Glauber
"
This patch makes makes major compaction aware of time buckets
for TWCS. That means that calling a major compaction with TWCS
will not bundle all SSTables together, but rather split them
based on their timestamps.

There are two motivations for this work:

Telling users not to ever major compact is easier said than
done: in practice due to a variety of circumstances it might
end up being done in which case data will have a hard time
expiring later.

We are about to start working with offstrategy compactions,
which are compactions that work in parallel with the main
compactions. In those cases we may be converting SSTables from
one format to another and it might be necessary to split a single
big STCS SSTable into something that TWCS expects

In order to achieve that, we start by changing the way resharding works:
it will now work with a read interposer, similar to the one TWCS uses for
streaming data. Once we do that, a lot of assumptions that exist in the
compaction code can be simplified and supporting TWCS major
compactions become a matter of simply enabling its interposer in the
compaction code as well.

There are many further simplifications that this work exposes:

The compaction method create_new_sstable seems out of place. It is not
used by resharding, and it seems duplicated for normal compactions. We
could clean it up with more refactoring in a later patch.
The whole logic of the feed_writer could be part of the consumer code.
Testing details:

scylla unit tests (dev, release)
sstable_datafile_test (debug)
dtests (resharding_test.py)
manual scylla resharding

Fixes #1431
"

Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

* 'twcs-major-v3' of github.com:glommer/scylla:
  compaction: make major compaction time-aware with TWCS
  compaction: do resharding through an interposer
  mutation_writer: introduce shard_based splitting writer
  mutation_writer: factor out part of the code for the timestamp splitter
  compaction: abort if create_new_sstable is called from resharding
2020-04-06 12:54:08 +03:00
Avi Kivity
88ade3110f treewide: replace calls to engine().some_api() with some_api()
This removes the need to include reactor.hh, a source of compile
time bloat.

In some places, the call is qualified with seastar:: in order
to resolve ambiguities with a local name.

Includes are adjusted to make everything compile. We end up
having 14 translation units including reactor.hh, primarily for
deprecated things like reactor::at_exit().

Ref #1
2020-04-05 12:46:04 +03:00
Glauber Costa
098b215b0d compaction: make major compaction time-aware with TWCS
This patch makes makes major compaction aware of time buckets
for TWCS. That means that calling a major compaction with TWCS
will not bundle all SSTables together, but rather split them
based on their timestamps.

There are two motivations for this work:
1. Telling users not to ever major compact is easier said than
   done: in practice due to a variety of circumstances it might
   end up being done in which case data will have a hard time
   expiring later.

2. We are about to start working with offstrategy compactions,
   which are compactions that work in parallel with the main
   compactions. In those cases we may be converting SSTables from
   one format to another and it might be necessary to split a single
   big STCS SSTable into something that TWCS expects

With the motivation out of the way, let's talk about the implementation:
The implementation is quite simple and builds upon the previous patches.
It simply specializes the interposer implementation for regular compaction
with a table-specific interposer.

Fixes #1431

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-04-03 10:10:10 -04:00
Pekka Enberg
75b55cea88 Merge "Resharding through compact sstables" from Glauber
"
This patchseries is part of my effort to make resharding less special -
and hopefully less problematic.  The next steps are a bit heavy, so I'd
like to, if possible, get this out of the way.

After these two patches, there is no more need to ever call
reshard_sstables: compact_sstables will do, and it will be able to
recognize resharding compactions.

To do that we need to unify the creator function, which is trivially
done by adding a shard parameter to regular compactions as well: they
can just ignore it. I have considered just making the
compaction_descriptor have a virtual create() function and specializing
it, but because we have to store the creator in the compaction object I
decided to keep the virtual function for now.

In a later cleanup step, if we can for instance store the entire
compaction_descriptor object in the compaction object we could do that.

Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Tests: unit tests (dev), dtest (resharding.py)
"

* 'resharding-through-compact-sstables' of github.com:glommer/scylla:
  resharding: get rid of special reshard_sstables
  compaction: enhance compaction_descriptor with creator and replace function
2020-04-02 14:43:35 +02:00
Glauber Costa
e8801cd77b compaction: enhance compaction_descriptor with creator and replace function
There are many differences between resharding and compaction that are
artificial, arising more from the way we ended up implementing it than
necessity. This patch attempts to pass the creator and replacer functions
through the compaction_descriptor.

There is a difference between the creator function for resharding and
regular compaction: resharding has to pass the shard number on behalf
of which the SSTable is created. However regular compactions can just
ignore this. No need to have a special path just for this.

After this is done, the constructor for the compaction object can be
greatly simplified. In further patches I intend to simplify it a bit
further, but some more cleanup has to happen first.

To make that happen we have to construct a compaction_descriptor object
inside the resharding function. This is temporary: resharding currently
works with a descriptor, but at some point that descriptor is lost and
broken into pieces to be passed to this function. The overarching goal
of this work is exactly to be able to keep that descriptor for as long
as possible, which should simplify things a lot.

Callers are patched, but there are plenty for sstable_datafile_test.cc.
For their benefit, a helper function is provided to keep the previous
signature (test only).

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-03-31 19:41:25 -04:00
Piotr Jastrzebski
e72696a8e6 sharding_info: rename the class to sharder
Also rename all variables that were named si or sinfo
to sharder.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
7bd2b8d73f schema: make it possible to set sharding_info per schema
Previously schema::get_sharding_info was obtaining
sharding_info from the partitioner but we want to remove
sharding_info from the partitioner so we need a place
in schema to store it there instead.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
dc2e060313 create_token_range_from_keys: use sharding info for shard_of
Replace i_partitioner::shard_of with sharding_info::shard_of

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Glauber Costa
dd65f7dcbb tests: move token_generation_for_shard to common code
We now have a utils file for SSTables. This is potentially useful for
other tests.

As a matter of fact, this function is repeated right now for the
resharding test. And to add insult to injury, the version in the
resharding test has the parameters shard and number of tokens flipped,
which although extremely confusing is the predictable outcome of
such repetition

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-03-22 19:00:26 +02:00
Piotr Jastrzebski
7064f6b831 partitioner: hide dht::default_partitioner
Remove last usage of this global outside i_partitioner.cc
and hide it inside the compilation unit.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-15 10:25:20 +01:00
Piotr Jastrzebski
54d24553bb schema: get_partitioner return const&
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-06 13:33:53 +01:00
Piotr Jastrzebski
08ebf1f69d sstable_datafile_test: stop calling dht::global_partitioner()
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-06 13:33:53 +01:00
Konstantin Osipov
ff3f9cb7cf test: stop using BOOST_TEST_MESSAGE() for logging
We use boost test logging primarily to generate nice XML xunit
files used in Jenkins. These XML files can be bloated
with messages from BOOST_TEST_MESSAGE(), hundreds of megabytes
of build archives, on every build.

Let's use seastar logger for test logging instead, reserving
the use of boost log facilities for boost test markup information.
2020-03-05 11:38:11 +03:00
Avi Kivity
906784639d Merge "Clean sstables from using global objects" from Pavel E
"
This set cleans sstable_writer_config and surrounding sstables
code from using global storage_ and feature_ service-s and database
by moving the configuration logic onto sstables_manager (that
was supposed to do it since eebc3701a5).

Most of the complexity is hidden around sstable_writer_config
creation, this set makes the sstables_manager create this object
with an explicit call. All the rest are consequences of this change.

Tests: unit(debug), manual start-stop
"

* 'br-clean-sstables-manager-2' of https://github.com/xemul/scylla:
  sstables: Move get_highest_supported_format
  sstables: Remove global get_config() helper
  sstables: Use manager's config() in .new_sstable_component_file()
  sstable_writer_config: Extend with more db::config stuff
  sstables_manager: Don't use global helper to generate writer config
  sstable_writer_config: Sanitize out some features fields initialization
  sstable_writer_config: Factor out some field initialization
  sstables: Generate writer config via manager only
  sstables: Keep reference on manager
  test: Re-use existing global sstables_manager
  table: Pass sstable_writer_config into write_memtable_to_sstable
2020-03-03 18:33:01 +02:00
Botond Dénes
b6f8a6fbd3 test/boost: sstable_datafile_test: sstable_scrub_test: stop table
`table` is not registered with the database, and hence will not be
waited on during shutdown.
Stop it explicitly to prevent any asynchronous operation on it racing
with shutdown.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200302142845.569638-1-bdenes@scylladb.com>
2020-03-02 16:20:00 +01:00
Pavel Emelyanov
5adce3390c sstables: Generate writer config via manager only
The sstable_writer_config creation looks simple (just declare
the struct instance) but behind the scenes references storage
and feature services, messes with database config, etc.

This patch teaches the sstables_manager generate the writer
config and makes the rest of the code use it. For future
safety by-hands creation of the sstable_writer_config is
prohibited.

The manager is referenced through table-s and sstable-s, but
two existing sstables_managers live on database object, and
table-s and sstable-s both live shorter than the database,
this reference is save.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-02-25 14:31:04 +03:00
Raphael S. Carvalho
65b4fc8bcd sstables/compaction: Introduce compaction_completion_desc
This descriptor contain all information needed for table to be properly
updated on compaction completion. A new member will be added to it soon,
which will store ranges to be invalidated in row cache on behalf of
cleanup compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2020-02-19 19:29:32 -03:00
Avi Kivity
6c7aa18238 Merge "Introduce schema::get_partitioner" from Piotr
"
Introduce schema::get_partitioner and use it instead of dht::global_partitioner.

Fixes #5493

Tests: unit(dev, release, debug)
"

* 'per_table_partitioner_prep' of https://github.com/haaawk/scylla: (35 commits)
  cdc: stop using partitioners
  partitioner_test: stop calling set_global_partitioner
  storage_service: stop calling global_partitioner()
  mutation_writer_test: stop calling global_partitioner()
  schema: reduce number of global_partitioner() calls
  test_services: stop calling global_partitioner()
  sstable_utils: stop calling global_partitioner()
  sstable_resharding_test: stop depending on global partitioner
  sstable_mutation_test: stop calling global_partitioner()
  sstable_data_file_test: stop calling global_partitioner()
  random_schema: stop taking partitioner in constructor
  mutation_reader_test: stop calling global_partitioner()
  multishard_mutation_query_test: stop calling global_partitioner()
  row_level repair: stop calling global_partitioner()
  distribute_reader_and_consume_on_shards: don't take partitioner
  thrift: reduce global_partitioner() calls
  binary_search: stop calling global_partitioner()
  index_entry: stop calling global_partitioner()
  mc writer: stop calling global_partitioner()
  sstable: stop calling global_partitioner()
  ...
2020-02-17 18:12:53 +02:00
Tomasz Grabiec
76d1dd7ec6 Merge "nodetool scrub: implement validation and the skip-corrupted flag
" from Botond

Nodetool scrub rewrites all sstables, validating their data. If corrupt
data is found the scrub is aborted. If the skip-corrupted flag is set,
corrupt data is instead logged (just the keys) and skipped.

The scrubbing algorithm itself is fairly simple, especially that we
already have a mutation stream validator that we can use to validate the
data. However currently scrub is piggy-backed on top of cleanup
compaction. To implement this flag, we have to make scrub a separate
compaction type and propagate down the flag. This required some
massaging of the code:
* Add support for more than two (cleanup or not) compaction types.
* Allow passing custom options for each compaction type.
* Allow stopping a compaction without the manager retrying it later.

Additionally the validator itself needed some changes to allow different
ways to handle errors, as needed by the scrub.

Fixes: #5487

* https://github.com/denesb/nodetool-scrub-skip-corrupted/v7:
  table: cleanup_sstables(): only short-circuit on actual cleanup
  compaction: compaction_type: add Upgrade
  compaction: introduce compaction_options
  compaction: compaction_descriptor: use compaction options instead of
    cleanup flag
  compaction_manager: collect all cleanup related logic in
    perform_cleanup()
  sstables: compaction_stop_exception: add retry flag
  mutation_fragment_stream_validator: split into low-level and
    high-level API
  compaction: introduce scrub_compaction
  compaction_manager: scrub: don't piggy-back on upgrade_sstables()
  test: sstable_datafile_test: add scrub unit test
2020-02-17 15:28:07 +02:00
Piotr Jastrzebski
8a9dc8b394 test_services: stop calling global_partitioner()
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:15 +01:00
Piotr Jastrzebski
aae6240273 sstable_data_file_test: stop calling global_partitioner()
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:15 +01:00
Piotr Jastrzebski
ca4a89d239 dht: add dht::decorate_key
and replace all dht::global_partitioner().decorate_key
with dht::decorate_key

It is an improvement because dht::decorate_key takes schema
and uses it to obtain partitioner instead of using global
partitioner as it was before.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:06 +01:00
Avi Kivity
91c4409376 locator: token_metadata: remove unused include "query-request.hh"
sstable_datafile_test.cc lost access to interval_map (via
position_in_partition.hh), so it now includes that directly.
2020-02-14 20:46:25 +02:00
Botond Dénes
78624b5069 test: sstable_datafile_test: add scrub unit test 2020-02-13 15:02:37 +02:00
Botond Dénes
b2dc5d4895 compaction: compaction_descriptor: use compaction options instead of cleanup flag
Instead of the restrictive `cleanup` boolean flag, which allows for choosing
between only two compaction types, use `compaction_options`, which in
addition to allowing any number of compaction types to be selected,
also allows seamlessly passing specific options to them.
2020-02-11 17:47:44 +02:00
Piotr Jastrzebski
05e0451b27 token: change _data to int64_t
Previously _data was stored as array of 8 bytes in
network byte order.
After this change it stores the same value in int64_t
in host byte order.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-05 09:31:32 +01:00
Piotr Jastrzebski
b569d127a0 token: change data to array<uint8_t, 8>
It is save to do such change because we support only
Murmur3Partitioner which uses only tokens that are
8 bytes long.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-05 09:30:46 +01:00
Botond Dénes
936619a8d3 sstables/continuous_data_consumer: track buffers used for parsing
Based on heap profiling, buffers used for storing half-parsed fields are
a major contributor to the overall memory consumption of reads. This
memory was completely "under the radar" before. Track it by using
tracked `temporary_buffer` instances everywhere in
`continuous_data_consumer`. As `continuous_data_consumer` is the basis
for parsing all index and data files, adding the tracing here
automatically covers all data, index and promoted index parsing.

I'm almost convinced that there is a better place to store the `permit`
then the three places now, but so far I was unable to completely
decipher the our data/index file parsing class hierarchy.
2020-01-28 08:13:16 +02:00
Botond Dénes
dfc8b2fc45 treewide: replace reader_resource_tracer with reader_permit
The former was never really more than a reader_permit with one
additional method. Currently using it doesn't even save one from any
includes. Now that readers will be using reader_permit we would have to
pass down both to mutation_source. Instead get rid of
reader_resource_tracker and just use reader_permit. Instead of making it
a last and optional parameter that is easy to ignore, make it a
first class parameter, right after schema, to signify that permits are
now a prominent part of the reader API.

This -- mostly mechanical -- patch essentially refactors mutation_source
to ask for the reader_permit instead of reader_resource_tracking and
updates all usage sites.
2020-01-28 08:13:16 +02:00
Konstantin Osipov
1c8736f998 tests: move all test source files to their new locations
1. Move tests to test (using singular seems to be a convention
   in the rest of the code base)
2. Move boost tests to test/boost, other
   (non-boost) unit tests to test/unit, tests which are
   expected to be run manually to test/manual.

Update configure.py and test.py with new paths to tests.
2019-12-16 17:47:42 +03:00