Commit Graph

17033 Commits

Author SHA1 Message Date
Tomasz Grabiec
bf0164cdaf sstables: compress: Use libdeflate for crc32
Improves memtable flush performance by 10% in a CPU-bound case.

Unlike the zlib implementation, libdeflate is optimized for modern
CPUs. It utilizes the PCLMUL instruction.
2018-11-26 18:59:42 +01:00
Tomasz Grabiec
0ac1905f4f sstables: compress: Rename crc32_utils to zlib_crc32_checksummer 2018-11-26 18:59:42 +01:00
Tomasz Grabiec
ba141a4852 licenses: Add libdeflate license 2018-11-26 18:59:41 +01:00
Tomasz Grabiec
048d569b45 Integrate libdeflate with the build system 2018-11-26 18:59:09 +01:00
Tomasz Grabiec
f704f7bc19 Add libdeflate submodule 2018-11-26 18:57:51 +01:00
Tomasz Grabiec
743cf43847 sstables: Avoid checksum_combine() for the crc32 checksummer
checksum_combine() is much slower than re-feeding the buffer to
checksum() for the zlib CRC32 checksummer.

Introduce Checksum::prefer_combine() to determine this and select
more optimal behavior for given checksummer.

Improves performance of memtable flush with compression enabled by 30%.
2018-11-26 18:57:33 +01:00
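The re-feeding approach described in the commit above can be sketched roughly as follows. This is an illustration only, not the Scylla code: it uses Python's `zlib.crc32` as a stand-in for the zlib CRC32 checksummer, and the helper name `checksum_chunks` is hypothetical. Each buffer is fed to the checksum function twice, once for the per-chunk checksum and once to extend the running full checksum, so the two values never need to be combined.

```python
import zlib


def checksum_chunks(chunks):
    """Return (per_chunk_checksums, full_checksum) without combining checksums.

    Hypothetical helper: each chunk is checksummed on its own and then fed
    again into the running checksum of the whole stream.
    """
    full = 0
    per_chunk = []
    for chunk in chunks:
        per_chunk.append(zlib.crc32(chunk))  # checksum of this chunk alone
        full = zlib.crc32(chunk, full)       # re-feed the chunk into the running value
    return per_chunk, full


chunks = [b"hello ", b"sstable ", b"world"]
per_chunk, full = checksum_chunks(chunks)
```

The running value passed as the second argument to `zlib.crc32` makes the final result equal to the checksum of the concatenated data.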
Tomasz Grabiec
88cf1c61ba sstables: compress: Avoid unnecessary checksum_combine() 2018-11-26 14:31:38 +01:00
Tomasz Grabiec
8372cf7bcc sstables: checksum_utils: Add missing include 2018-11-26 14:31:38 +01:00
Piotr Sarna
6ab8235369 main: fix deinitialization order for view update generator
View update generator should be stopped only after
drain_on_shutdown() is performed on storage service.
Message-Id: <4d2bda4c73422a2ebf46d6dcd06c95d960839889.1543230849.git.sarna@scylladb.com>
2018-11-26 11:21:37 +00:00
Duarte Nunes
2a371c2689 Merge 'Allow bypassing cache on a per-query basis' from Avi
"
Some queries are very unlikely to hit cache. Usually this includes
range queries on large tables, but other patterns are possible.

While the database should adapt to the query pattern, sometimes the
user has information the database does not have. By passing this
information along, the user helps the database manage its resources
more optimally.

To do this, this patch introduces a BYPASS CACHE clause to the
SELECT statement. A query thus marked will not attempt to read
from the cache, and instead will read from sstables and memtables
only. This reduces CPU time spent to query and populate the cache,
and will prevent the cache from being flooded with data that is
not likely to be read again soon. The existing cache disabled path
is engaged when the option is selected.

Tests: unit (release), manual metrics verification with ccm with and without the
    BYPASS CACHE clause.

Ref #3770.
"

* tag 'cache-bypass/v2' of https://github.com/avikivity/scylla:
  doc: document SELECT ... BYPASS CACHE
  tests: add test for SELECT ... BYPASS CACHE
  cql: add SELECT ... BYPASS CACHE clause
  db: add query option to bypass cache
2018-11-26 09:59:40 +00:00
Paweł Dziepak
13385778fd Merge "Measure performance of dataset population in perf_fast_forward" from Tomasz
* tag 'perf-ffwd-dataset-population-v2' of github.com:tgrabiec/scylla:
  tests: perf_fast_forward: Measure performance of dataset population
  tests: perf_fast_forward: Record the dataset on which test case was run
  tests: perf_fast_forward: Introduce the concept of a dataset
  tests: perf_fast_forward: Introduce make_compaction_disabling_guard()
  tests: perf_fast_forward: Initialize output manager before population
  tests: perf_fast_forward: Handle empty test parameter set
  tests: perf_fast_forward: Extract json_output_writer::write_common_test_group()
  tests: perf_fast_forward: Factor out access to cfg to a single place per function
  tests: perf_fast_forward: Extract result_collector
  tests: perf_fast_forward: Take writes into account in AIO statistics
  tests: perf_fast_forward: Reorder members
  tests: perf_fast_forward: Add --sstable-format command line option
2018-11-26 09:45:55 +00:00
Avi Kivity
58033ad3a4 doc: document SELECT ... BYPASS CACHE
Add a new cql-extensions.md file and document BYPASS CACHE there.
2018-11-26 11:37:52 +02:00
Avi Kivity
f69401c609 tests: add test for SELECT ... BYPASS CACHE
The test verifies that cache read metrics are not incremented during a cache
bypass read.
2018-11-26 11:37:52 +02:00
Avi Kivity
ecf3f92ec7 cql: add SELECT ... BYPASS CACHE clause
The BYPASS CACHE clause instructs the database not to read from or populate the
cache for this query. The new keywords (BYPASS and CACHE) are not reserved.
2018-11-26 11:37:49 +02:00
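As a rough illustration of how a client might mark a query this way, here is a minimal sketch of a query-string builder. The helper `build_select`, the table name `ks.events`, and the placement of the clause at the very end of the statement are assumptions made for the example; the cql-extensions.md file mentioned in this series is the place to check the actual grammar.

```python
def build_select(table, where=None, bypass_cache=False):
    """Build a CQL SELECT string, optionally marked with BYPASS CACHE.

    Hypothetical helper for illustration only.
    """
    query = f"SELECT * FROM {table}"
    if where:
        query += f" WHERE {where}"
    if bypass_cache:
        # Assumed placement: the clause follows the rest of the statement.
        query += " BYPASS CACHE"
    return query


# A large scan that is unlikely to be read again soon.
full_scan = build_select("ks.events", bypass_cache=True)
```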
Takuya ASADA
7740cd2142 dist/common/systemd/scylla-housekeeping-restart.service.mustache: specify correct repo for Debian variants
We do specify the correct repo for both Red Hat/Debian variants on -daily, but
mistakenly don't for -restart, so do the same on -restart.

Fixes #3906

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181109224509.27380-1-syuu@scylladb.com>
2018-11-26 11:02:25 +02:00
Rafael Ávila de Espíndola
6746907999 Use fully covered switches in continuous_data_consumer
do_process_buffer had two unreachable default cases and a long
if-else-if chain.

This converts the if-else-if chain to a switch and a helper
function.

This moves the error checking from run time to compile time. If we
were to add a 128 bit integer for example, gcc would complain about it
missing from the switch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181125221451.106067-1-espindola@scylladb.com>
2018-11-25 22:52:11 +00:00
Avi Kivity
b4765af790 Merge "Introduce SSTable-run-based compaction" from Raphael
"
This new compaction approach consists of releasing exhausted fragments[1] of a run[2] as
compaction proceeds, considerably decreasing the space requirement.
These changes will immediately benefit leveled strategy because it already works with
the run concept.

[1] fragment is a sstable composing a run; exhausted means sstable was fully consumed
by compaction procedure.
[2] run is a set of non-overlapping sstables which roughly span the
entire token range.

Note:
Last patch includes an example compaction strategy showing how to work with the interface.

unit tests: all modes passing
dtests: compaction ones passing
"

* 'sstable_run_based_compaction_v10' of github.com:raphaelsc/scylla:
  tests: add example compaction strategy for sstable run based approach
  sstables/compaction: propagate sstable replacement to all compaction of a CF
  sstables: store cf pointer in compaction_info
  tests/sstable_test: add test for compaction replacement of exhausted sstable
  sstables: add sstable's on closed handling
  tests/sstables: add test for sstable run based compaction
  sstables/compaction_manager: prevent partial run from being selected for compaction
  compaction: use same run identifier for sstables generated by same compaction
  sstables: introduce sstable run
  sstables/compaction_manager: release reference to exhausted sstable through callback
  sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor
  database: do not keep reference to sstable in selector when done selecting
  compaction: share sstable set with incremental reader selector
  sstables/compaction: release space earlier of exhausted input sstables
  sstables: make partitioned sstable set's incremental selector resilient to changes in the set
  database: do not store reference to sstable in incremental selector
  tests/sstables: add run identifier correctness test
  sstables: use a random uuid for sstables without run identifier
  sstables: add run identifier to scylla metadata
2018-11-25 17:20:24 +02:00
Avi Kivity
b835b93ee6 db: add query option to bypass cache
With the option enabled, we bypass the cache unconditionally and only
read from memtables+sstables. This is useful for analytics queries.
2018-11-25 16:26:08 +02:00
Raphael S. Carvalho
3fa70d6b5f tests: add example compaction strategy for sstable run based approach
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 20:16:54 -02:00
Raphael S. Carvalho
2058001f94 sstables/compaction: propagate sstable replacement to all compaction of a CF
This is needed for parallel compaction to work with sstable run based approach.
That's because regular compaction clones a set containing all sstables of its
column family. So compaction A can potentially hold a reference to a compacting
sstable of compaction B, preventing compaction B from releasing its exhausted
sstable.

So all replacements are propagated to all compactions of a given column family,
and compactions in turn, including the one which initiated the propagation,
will do the replacement.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:30 -02:00
Raphael S. Carvalho
953fdcc867 sstables: store cf pointer in compaction_info
The motivation is that we need a more efficient way to find compactions
that belong to a given column family in compaction list.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:28 -02:00
Raphael S. Carvalho
baf89f0df3 tests/sstable_test: add test for compaction replacement of exhausted sstable
Make sure that compaction is capable of releasing exhausted sstable space
early in the procedure.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:26 -02:00
Raphael S. Carvalho
824c20b76d sstables: add sstable's on closed handling
Motivation is that it will be useful for catching regression on compaction
when releasing early exhausted sstables. That's because sstable's space
is only released once it's closed. So this will allow us to write a test
case and possibly use it for entities holding exhausted sstable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:25 -02:00
Raphael S. Carvalho
0085e8371d tests/sstables: add test for sstable run based compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:23 -02:00
Raphael S. Carvalho
e88d1d54b9 sstables/compaction_manager: prevent partial run from being selected for compaction
Filter out sstable belonging to a partial run being generated by an ongoing
compaction. Otherwise, that could lead to wrong decisions by the compaction
strategy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:22 -02:00
Raphael S. Carvalho
23884fe9f6 compaction: use same run identifier for sstables generated by same compaction
SSTables composing the same run will share the same run identifier.
Therefore, a new compaction strategy will be able to get all sstables belonging
to the same run from sstable_set, which now keeps track of existing runs.

Same UUID is passed to writers of a given compaction. Otherwise, a new UUID
is picked for every sstable created by compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:20 -02:00
Raphael S. Carvalho
4f68cb34a6 sstables: introduce sstable run
sstable run is a structure that will hold all sstables that have the same
run identifier. All sstables belonging to the same run will not overlap
with one another.
It can be used by compaction strategy to work on runs instead of individual
sstables.

sstable_set structure which holds all sstables for a given column family
will be responsible for providing to its user an interface to work with
runs instead of individual sstables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:18 -02:00
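The idea of a set that exposes runs rather than individual sstables can be sketched as below. This is a simplified model, not the Scylla `sstable_set`: the `SSTable` fields, `group_by_run`, and `is_non_overlapping` are hypothetical names, and token ranges are reduced to plain integers.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    generation: int
    run_id: uuid.UUID
    first_token: int
    last_token: int


def group_by_run(sstables):
    """Group sstables by run identifier, each run sorted by first token."""
    runs = defaultdict(list)
    for sst in sstables:
        runs[sst.run_id].append(sst)
    for members in runs.values():
        members.sort(key=lambda s: s.first_token)
    return dict(runs)


def is_non_overlapping(run):
    """Check the run invariant: consecutive fragments do not overlap."""
    return all(a.last_token < b.first_token for a, b in zip(run, run[1:]))


run_a, run_b = uuid.uuid4(), uuid.uuid4()
sstables = [
    SSTable(1, run_a, 0, 99),
    SSTable(2, run_b, 0, 500),
    SSTable(3, run_a, 100, 199),
]
runs = group_by_run(sstables)
```

A strategy working on runs would then iterate over `runs.values()` instead of the flat sstable list.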
Raphael S. Carvalho
fc92fb955d sstables/compaction_manager: release reference to exhausted sstable through callback
That's important for the reference to sstable to not be kept throughout
the compaction procedure, which would break the goal of releasing
space during compaction.

Manager passes a callback to compaction which calls it whenever
there's sstable replacement.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:16 -02:00
Raphael S. Carvalho
3f309ebba9 sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor
Motivation is that we want to release space for exhausted sstable and that
will only happen when all references to it are gone *and* that backlog
tracker takes the early replacement into account.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:13 -02:00
Raphael S. Carvalho
3433de3dc0 database: do not keep reference to sstable in selector when done selecting
When compacting, we'll create all readers at once and will not select
again from incremental selector, meaning the selector will keep all
respective sstables in current_sstables, preventing compaction from
releasing space as it goes on.

The change is about refreshing sstable set's selector such that it
will not hold a reference to an exhausted sstable whatsoever.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:12 -02:00
Raphael S. Carvalho
f6df949c1a compaction: share sstable set with incremental reader selector
By doing that, we'll be able to release exhausted sstable from both
simultaneously.
That's achieved by sharing set containing input sstables with the incremental
reader selector and removing exhausted sstables from shared set when the
time has come.

Step towards reducing disk requirement for compaction by making it delete
an sstable whose data is all in a new, sealed sstable. For that to happen,
all references must be gone.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:10 -02:00
Raphael S. Carvalho
e5a0b05c15 sstables/compaction: release space earlier of exhausted input sstables
Currently, compaction only replaces input sstables at end of compaction,
meaning compaction must be finished for all the space of those sstables
to be released.

What we can do instead is to delete earlier some input sstable under
some conditions:

1) SStable data should be committed to a new, sealed output sstable,
meaning it's exhausted.
2) Exhausted sstable mustn't overlap with a non-exhausted sstable
because a tombstone in the exhausted one could have been purged and the
shadowed data in the non-exhausted one could be resurrected if the system
crashes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:07 -02:00
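The two conditions above can be expressed as a small predicate. The following is a sketch under simplifying assumptions, not the actual compaction code: `InputSSTable`, `overlaps`, and `releasable_early` are hypothetical names, token ranges are closed integer intervals, and the `exhausted` flag stands for "all data is already in a sealed output sstable".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputSSTable:
    name: str
    first_token: int
    last_token: int
    exhausted: bool


def overlaps(a, b):
    """True if the closed token ranges of a and b intersect."""
    return a.first_token <= b.last_token and b.first_token <= a.last_token


def releasable_early(inputs):
    """Exhausted inputs that do not overlap any non-exhausted input."""
    pending = [s for s in inputs if not s.exhausted]
    return [
        s for s in inputs
        if s.exhausted and not any(overlaps(s, p) for p in pending)
    ]


inputs = [
    InputSSTable("a", 0, 99, exhausted=True),
    InputSSTable("b", 50, 149, exhausted=True),
    InputSSTable("c", 120, 300, exhausted=False),
]
early = [s.name for s in releasable_early(inputs)]
```

Here `b` is exhausted but still overlaps the pending sstable `c`, so only `a` qualifies for early deletion.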
Raphael S. Carvalho
ace070c8fc sstables: make partitioned sstable set's incremental selector resilient to changes in the set
The motivation is that compaction may remove a sstable from the set while the
incremental selector is alive, and for that to work, we need to invalidate
the iterators stored by the selector. We could have added a method to notify
it, but there will be a case where the one keeping the set cannot forward
the notification to the selector. So it's better for the selector to take
care of itself. Change counter approach is used which allows the selector
to know when to invalidate the iterators.

After invalidation, selector will move the iterator back into its right
place by looking for lower bound for current pos.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:05 -02:00
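The change counter approach can be illustrated with a small model. This is a sketch, not the Scylla implementation: `SSTableSet` and `IncrementalSelector` here are simplified hypothetical classes, positions are non-negative integers, and a list index plays the role of the stored iterator. When the selector notices that the set's counter has moved, it discards its index and finds the lower bound of its current position again.

```python
import bisect


class SSTableSet:
    """Sorted positions plus a counter bumped on every modification."""

    def __init__(self, positions):
        self._positions = sorted(positions)
        self.change_count = 0

    def remove(self, pos):
        self._positions.remove(pos)
        self.change_count += 1

    def positions(self):
        return self._positions


class IncrementalSelector:
    def __init__(self, sstable_set):
        self._set = sstable_set
        self._seen_change_count = sstable_set.change_count
        self._index = 0   # plays the role of the stored iterator
        self._pos = 0     # smallest position not yet selected

    def select_next(self):
        positions = self._set.positions()
        if self._seen_change_count != self._set.change_count:
            # The set changed under us: re-seat the index at the lower
            # bound of the current position instead of trusting it.
            self._index = bisect.bisect_left(positions, self._pos)
            self._seen_change_count = self._set.change_count
        if self._index >= len(positions):
            return None
        result = positions[self._index]
        self._index += 1
        self._pos = result + 1
        return result


sstable_set = SSTableSet([10, 20, 30, 40])
selector = IncrementalSelector(sstable_set)
first = selector.select_next()
sstable_set.remove(10)  # e.g. an exhausted sstable is dropped mid-iteration
rest = [selector.select_next() for _ in range(3)]
```

Without the counter check, the stale index would skip position 20 after the removal.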
Raphael S. Carvalho
8d11b0bbb4 database: do not store reference to sstable in incremental selector
Use sstable generation instead to keep track of read sstables.
The motivation is that we'll not keep reference to sstables, so allowing
their space on disk to be released as soon as they get exhausted.
Generation is used because it guarantees uniqueness of the sstable.

Reviewed-by: Botond Dénes <bdenes@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:04 -02:00
Raphael S. Carvalho
edc87014c1 tests/sstables: add run identifier correctness test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:02 -02:00
Raphael S. Carvalho
a66b1954cc sstables: use a random uuid for sstables without run identifier
Older sstables must have an identifier for them to be associated
with their own run.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:01 -02:00
Raphael S. Carvalho
62025fa52c sstables: add run identifier to scylla metadata
It identifies a run which a particular sstable belongs to.
Existing sstables will have a random uuid associated with them
in memory.

UUID is the correct choice because it allows sstables to be
exported without having conflicts when using identifier generated
by different nodes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:52:44 -02:00
Rafael Ávila de Espíndola
d18bbe9d45 Remove unreachable default cases.
These switches are fully covered. We can be sure they will stay this
way because of -Werror and gcc's -Wswitch warning.

We can also be sure that we never have an invalid enum value since the
state machine values are not read from disk.

The patch also removes a superfluous ';'.
Message-Id: <20181124020128.111083-1-espindola@scylladb.com>
2018-11-24 09:31:51 +00:00
Raphael S. Carvalho
d29482dce8 sstables: deprecate sstable metadata's ancestors
The reason for that is that it's not available in sstable format mc,
so we can no longer rely on it in common code for the currently
supported formats.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181121170057.20900-1-raphaelsc@scylladb.com>
2018-11-23 19:38:32 +01:00
Tomasz Grabiec
8e93046abc tests: perf_fast_forward: Measure performance of dataset population 2018-11-23 19:22:50 +01:00
Tomasz Grabiec
2c95aa4d8d tests: perf_fast_forward: Record the dataset on which test case was run
Now any given test case can potentially run on many different datasets.
2018-11-23 19:22:12 +01:00
Tomasz Grabiec
470552b7ab tests: perf_fast_forward: Introduce the concept of a dataset
A dataset represents a table with data, populated in certain way, with
certain characteristics of the schema and data.

Before this change, datasets were implicitly defined, with population
hard-coded inside the populate() function.

This change gathers logic related to datasets into classes, in order to:

  - make it easier to define new datasets.

  - be able to measure performance of dataset population in a
    standardized way.

  - be able to express constraints on datasets imposed by different
    test cases.  Test cases are matched with possible datasets based
    on the abstract interface they accept (e.g. clustered_ds,
    multipartition_ds), and which must be implemented by a compatible
    dataset. To facilitate this matching, test function is now wrapped
    into a dataset_acceptor object, with an automatically-generated can_run()
    virtual method, deduced by make_test_fn().

  - be able to select tests to run based on the dataset name.
    Only tests which are compatible with that dataset will be run.
2018-11-23 19:22:09 +01:00
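The matching of test cases to datasets described above can be modelled in a few lines. This is a loose Python analogy, not the C++ code in perf_fast_forward: `ClusteredDs`, `MultipartitionDs`, `DatasetAcceptor`, and `make_test_fn` mirror the names clustered_ds, multipartition_ds, dataset_acceptor, and make_test_fn() from the commit message, while the concrete datasets and test functions are hypothetical. The accepted interface is deduced from the type annotation of the test function's first parameter.

```python
import inspect


class Dataset:
    def __init__(self, name):
        self.name = name


class ClusteredDs(Dataset):
    """Abstract interface: a dataset with clustered rows."""


class MultipartitionDs(Dataset):
    """Abstract interface: a dataset with many partitions."""


class LargePartition(ClusteredDs):      # hypothetical concrete dataset
    pass


class SmallPartitions(MultipartitionDs):  # hypothetical concrete dataset
    pass


class DatasetAcceptor:
    def __init__(self, fn, accepted):
        self.fn = fn
        self.accepted = accepted

    def can_run(self, ds):
        return isinstance(ds, self.accepted)


def make_test_fn(fn):
    """Wrap fn, deducing the accepted interface from its first parameter."""
    first_param = next(iter(inspect.signature(fn).parameters.values()))
    return DatasetAcceptor(fn, first_param.annotation)


def slicing_case(ds: ClusteredDs):
    return f"slicing on {ds.name}"


def scan_case(ds: MultipartitionDs):
    return f"scan on {ds.name}"


datasets = [SmallPartitions("small-part"), LargePartition("large-part")]
cases = [make_test_fn(slicing_case), make_test_fn(scan_case)]
plan = [
    (case.fn.__name__, ds.name)
    for case in cases
    for ds in datasets
    if case.can_run(ds)
]
```

Selecting tests by dataset name then amounts to filtering `plan` on its second element.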
Tomasz Grabiec
2746f78a9f tests: perf_fast_forward: Introduce make_compaction_disabling_guard() 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
b00d360281 tests: perf_fast_forward: Initialize output manager before population 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
25dc481030 tests: perf_fast_forward: Handle empty test parameter set 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
38a1b7e87b tests: perf_fast_forward: Extract json_output_writer::write_common_test_group() 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
a507ca8159 tests: perf_fast_forward: Factor out access to cfg to a single place per function
Preparatory change before making n_rows be determined through a
dataset object.
2018-11-23 19:18:09 +01:00
Tomasz Grabiec
3fc78a25bf tests: perf_fast_forward: Extract result_collector
Extracts the result collection and reporting logic out of
run_test_case(). Will be needed in population tests, for which we
don't want the looping logic.
2018-11-23 19:18:09 +01:00
Tomasz Grabiec
f4a70283ee tests: perf_fast_forward: Take writes into account in AIO statistics
Relevant for population tests. So far all tests were read tests.
2018-11-23 19:18:09 +01:00
Tomasz Grabiec
96f5bd2f46 tests: perf_fast_forward: Reorder members 2018-11-23 19:18:09 +01:00