Commit Graph

57 Commits

Author SHA1 Message Date
Botond Dénes
6ca0464af5 mutation_fragment: add schema and permit
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and pass to
the permit.

In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
2020-09-28 11:27:23 +03:00
Botond Dénes
4f5ccf82cb mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Uses where this method was just used to move the fragment away are
converted to use `as_clustering_row() &&`.
2020-09-28 10:53:56 +03:00
Botond Dénes
3fab83b3a1 flat_mutation_reader: impl: add reader_permit parameter
Not used yet, this patch does all the churn of propagating a permit
to each impl.

In the next patch we will use it to track to track the memory
consumption of `_buffer`.
2020-09-28 10:53:48 +03:00
Avi Kivity
5e292897df test: perf/perf_sstable.hh: prepare for asynchronously closed sstables_manager
Obtain a test_env using do_with_async_returning(); and pass it to
column_family_test_config so it can stop using test_sstables_manager.
2020-09-23 20:55:14 +03:00
Piotr Jastrzebski
c001374636 codebase wide: replace count with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
`count` function was often used in various ways.

`contains` does not only express the intend of the code better but also
does it in more unified way.

This commit replaces all the occurences of the `count` with the
`contains`.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
2020-08-15 20:26:02 +03:00
Avi Kivity
3530e80ce1 Merge "Support md format" from Benny
"
This series adds support for the "md" sstable format.

Support is based on the following:

* do not use clustering based filtering in the presence
  of static row, tombstones.
* Disabling min/max column names in the metadata for
  formats older than "md".
* When updating the metadata, reset and disable min/max
  in the presence of range tombstones (like Cassandra does
  and until we process them accurately).
* Fix the way we maintain min/max column names by:
  keeping whole clustering key prefixes as min/max
  rather than calculating min/max independently for
  each component, like Cassandra does in the "md" format.

Fixes #4442

Tests: unit(dev), cql_query_test -t test_clustering_filtering* (debug)
md migration_test dtest from git@github.com:bhalevy/scylla-dtest.git migration_test-md-v1
"

* tag 'md-format-v4' of github.com:bhalevy/scylla: (27 commits)
  config: enable_sstables_md_format by default
  test: cql_query_test: add test_clustering_filtering unit tests
  table: filter_sstable_for_reader: allow clustering filtering md-format sstables
  table: create_single_key_sstable_reader: emit partition_start/end for empty filtered results
  table: filter_sstable_for_reader: adjust to md-format
  table: filter_sstable_for_reader: include non-scylla sstables with tombstones
  table: filter_sstable_for_reader: do not filter if static column is requested
  table: filter_sstable_for_reader: refactor clustering filtering conditional expression
  features: add MD_SSTABLE_FORMAT cluster feature
  config: add enable_sstables_md_format
  database: add set_format_by_config
  test: sstable_3_x_test: test both mc and md versions
  test: Add support for the "md" format
  sstables: mx/writer: use version from sstable for write calls
  sstables: mx/writer: update_min_max_components for partition tombstone
  sstables: metadata_collector: support min_max_components for range tombstones
  sstable: validate_min_max_metadata: drop outdated logic
  sstables: rename mc folder to mx
  sstables: may_contain_rows: always true for old formats
  sstables: add may_contain_rows
  ...
2020-08-11 13:29:11 +03:00
Piotr Jastrzebski
80e3923b3c codebase wide: replace find(...) != end() with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
the code pattern looked like:

<collection>.find(<element>) != <collection>.end()

In C++20 the same can be expressed with:

<collection>.contains(<element>)

This is not only more concise but also expresses the intend of the code
more clearly.

This commit replaces all the occurences of the old pattern with the new
approach.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>
2020-08-11 13:28:50 +03:00
Benny Halevy
e2340d0684 config: enable_sstables_md_format by default
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 19:19:32 +03:00
Benny Halevy
65239a6e50 config: add enable_sstables_md_format
MD format is disabled by default at this point.

The option extends enable_sstables_mc_format
so that both are needed to be set for supporting
the md format.

The MD_FORMAT cluster feature will be added in
a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Avi Kivity
257c17a87a Merge "Don't depend on seastar::make_(lw_)?shared idiosyncrasies" from Rafael
"
While working on another patch I was getting odd compiler errors
saying that a call to ::make_shared was ambiguous. The reason was that
seastar has both:

template <typename T, typename... A>
shared_ptr<T> make_shared(A&&... a);

template <typename T>
shared_ptr<T> make_shared(T&& a);

The second variant doesn't exist in std::make_shared.

This series drops the dependency in scylla, so that a future change
can make seastar::make_shared a bit more like std::make_shared.
"

* 'espindola/make_shared' of https://github.com/espindola/scylla:
  Everywhere: Explicitly instantiate make_lw_shared
  Everywhere: Add a make_shared_schema helper
  Everywhere: Explicitly instantiate make_shared
  cql3: Add a create_multi_column_relation helper
  main: Return a shared_ptr from defer_verbose_shutdown
2020-08-02 19:51:24 +03:00
Botond Dénes
6660a5df51 result_memory_accounter: remove default constructor
If somebody wants to bypass proper memory accounting they should at
the very least be forced to consider if that is indeed wise and think a
second about the limit they want to apply.
2020-07-28 18:00:29 +03:00
Rafael Ávila de Espíndola
efeaded427 Everywhere: Add a make_shared_schema helper
This replaces a lot of make_lw_shared(schema(...)) with
make_shared_schema(...).

This makes it easier to drop a dependency on the differences between
seastar::make_shared and std::make_shared.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-07-21 10:33:49 -07:00
Pavel Emelyanov
f8ffc31218 test: Print more sizes in memory_footprint_test
The row cache memory footprint changed after switch to B+
because we no longer have a sole cache_entry allocation, but
also the bplus::data and bplus::node. Knowing their sizes
helps analyzing the footprint changes.

Also print the size of memtable_entry that's now also stored
in B+'s data.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-14 16:30:02 +03:00
Pavel Emelyanov
174b101a49 row_cache: Switch partition tree onto B+ rails
The row_cache::partitions_type is replaced from boost::intrusive::set
to bplus::tree<Key = int64_t, T = array_trusted_bounds<cache_entry>>

Where token is used to quickly locate the partition by its token and
the internal array -- to resolve hashing conflicts.

Summary of changes in cache_entry:

- compare's goes away as the new collection needs tri-compare one which
  is provided by ring_position_comparator

- when initialized the dummy entry is added with "after_all_keys" kind,
  not "before_all_keys" as it was by default. This is to make tree
  entries sorted by token

- insertion and removing of cache_entries happens inside double_decker,
  most of the changes in row_cache.cc are about passing constructor args
  from current_allocator.construct into double_decker.empace_before()

- the _flags is extended to keep array head/tail bits. There's a room
  for it, sizeof(cache_entry) remains unchanged

The rest fits smothly into the double_decker API.

Also, as was told in the previous patch, insertion and removal _may_
invalidate iterators, but may leave them intact. However, currently
this doesn't seem to be a problem as the cache_tracker ::insert() and
::on_partition_erase do invalidate iterators unconditionally.

Later this can be otimized, as iterators are invalidated by double-decker
only in case of hash conflict, otherwise it doesn't change arrays and
B+ tree doesn't invalidate its.

tests: unit(dev), perf(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-14 16:30:02 +03:00
Pavel Emelyanov
95f15ea383 utils: B+ tree implementation
// The story is at
// https://groups.google.com/forum/#!msg/scylladb-dev/sxqTHM9rSDQ/WqwF1AQDAQAJ

This is the B+ version which satisfies several specific requirements
to be suitable for row-cache usage.

1. Insert/Remove doesn't invalidate iterators
2. Elements should be LSA-compactable
3. Low overhead of data nodes (1 pointer)
4. External less-only comparator
5. As little actions on insert/delete as possible
6. Iterator walks the sorted keys

The design, briefly is:

There are 3 types of nodes: inner, leaf and data, inner and leaf
keep build-in array of N keys and N(+1) nodes. Leaf nodes sit in
a doubly linked list. Data nodes live separately from the leaf ones
and keep pointers on them. Tree handler keeps pointers on root and
left-most and right-most leaves. Nodes do _not_ keep pointers or
references on the tree (except 3 of them, see below).

changes in v9:

- explicitly marked keys/kids indices with type aliases
- marked the whole erase/clear stuff noexcept
- disposers now accept object pointer instead of reference
- clear tree in destructor
- added more comments
- style/readability review comments fixed

Prior changes

**
- Add noexcepts where possible
- Restrict Less-comparator constraint -- it must be noexcept
- Generalized node_id
- Packed code for beging()/cbegin()

**
- Unsigned indices everywhere
- Cosmetics changes

**
- Const iterators
- C++20 concepts

**
- The index_for() implmenetation is templatized the other way
  to make it possible for AVX key search specialization (further
  patching)

**
- Insertion tries to push kids to siblings before split

  Before this change insertion into full node resulted into this
  node being split into two equal parts. This behaviour for random
  keys stress gives a tree with ~2/3 of nodes half-filled.

  With this change before splitting the full node try to push one
  element to each of the siblings (if they exist and not full).
  This slows the insertion a bit (but it's still way faster than
  the std::set), but gives 15% less total number of nodes.

- Iterator method to reconstruct the data at the given position

  The helper creates a new data node, emplaces data into it and
  replaces the iterator's one with it. Needed to keep arrays of
  data in tree.

- Milli-optimize erase()
  - Return back an iterator that will likely be not re-validated
  - Do not try to update ancestors separation key for leftmost kid

  This caused the clear()-like workload work poorly as compared to
  std:set. In particular the row_cache::invalidate() method does
  exactly this and this change improves its timing.

- Perf test to measure drain speed
- Helper call to collect tree counters

**
- Fix corner case of iterator.emplace_before()
- Clean heterogenous lookup API
- Handle exceptions from nodes allocations
- Explicitly mark places where the key is copied (for future)
- Extend the tree.lower_bound() API to report back whether
  the bound hit the key or not
- Addressed style/cleanness review comments

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-14 16:29:43 +03:00
Pavel Emelyanov
9d38846ed2 test: Move perf measurement helpers into header
To use the code in new perf tests in next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-14 12:58:26 +03:00
Tomasz Grabiec
19501d9ef2 sstables, tests: Allow disabling binary search in promoted index from perf tests 2020-06-16 16:15:23 +02:00
Raphael S. Carvalho
8e47f61df7 compaction: Enable tombstone expiration based on the presence of the sstable set
For tombstone expiration to proceed correctly without the risk of resurrecting
data, the sstable set must be present.
Regular compaction and derivatives provide the sstable set, so they're able
to expire tombstones with no resurrection risk.
Resharding, on the other hand, can run on any shard, not necessarily on the
same shard that one of the input sstables belongs to, so it currently cannot
provide a sstable set for tombstone expiration to proceed safely.
That being said, let's only do expiration based on the presence of the set.
This makes room for the sstable set to be feeded to compaction via descriptor,
allowing even resharding to do expiration. Currently, compaction thinks that
sstable set can only come from the table, and that also needs to be changed
for further flexibility.

It's theoretically possible that a given resharding job will resurrect data if
a fully expired SSTable is resharded at a shard which it doesn't belong to.
Resharding will have no way to tell that expiring all that data will lead to
resurrection because the relevant SSTables are at different shards.
This is fixed by checking for fully expired sstables only on presence of
the sstable set.

Fixes #6600.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200605200954.24696-1-raphaelsc@scylladb.com>
2020-06-07 11:46:48 +03:00
Piotr Sarna
d1f5d42a25 test: add big_decimal perf test
In order to be able to measure the impact of rewritting
the parsing mechanism from std::regex to a hand-written
state machine.
2020-06-01 16:11:49 +02:00
Botond Dénes
d68ac8bf18 treewide: remove all uses of no_reader_permit() 2020-05-28 11:34:35 +03:00
Botond Dénes
fe024cecdc row_cache: pass a valid permit to underlying read
All reader are soon going to require a valid permit, so make sure we
have a valid permit which we can pass to the underlying reader when
creating it. This means `row_cache::make_reader()` now also requires
a permit to be passed to it.
2020-05-28 11:34:35 +03:00
Botond Dénes
9ede82ebf8 memtable: pass a valid permit to the delegate reader
All reader are soon going to require a valid permit, so make sure we
have a valid permit which we can pass to the delegate reader when
creating it. This means `memtable::make_flat_reader()` now also requires
a permit to be passed to it.
Internally the permit is stored in `scanning_reader`, which is used both
for flushes and normal reads. In the former case a permit is not
required.
2020-05-28 11:34:35 +03:00
Botond Dénes
cc5137ffe3 table: require a valid permit to be passed to most read methods
Now that the most prevalent users (range scan and single partition
reads) all pass valid permits we require all users to do so and
propagate the permit down towards `make_sstable_reader()`. The plan is
to use this permit for restricting the sstable readers, instead of the
semaphore the table is configured with. The various
`make_streaming_*reader()` overloads keep using the internal semaphores
as but they also create the permit before the read starts and pass it to
`make_sstable_reader()`.
2020-05-28 11:34:35 +03:00
Piotr Sarna
92aadb94e5 treewide: propagate trace state to write path
In order to add tracing to places where it can be useful,
e.g. materialized view updates and hinted handoff, tracing state
is propagated to all applicable call sites.
2020-05-18 16:05:23 +02:00
Glauber Costa
70a89ab4ab compaction: do not assume I/O priority class
We shouldn't assume the I/O priority class for compactions.  For
instance, if we are dealing with offstrategy compactions we may want to
use the maintenance group priority for them.

For now, all compactions are put in the compaction class.  rewrite
compactions (scrub, cleanup) could be maintenance, but we don't have
clear access to the database object at this time to derive the
equivalent CPU priority. This is planned to be changed in the future,
and when we do change it, we'll adjust.

Same goes for resharding: while we could at this point change it we'd
risking memory pressure since resharding is run online and sstables are
shared until resharding is done. When we move it to offline execution
we'll do it with maintenance priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200512002233.306538-3-glauber@scylladb.com>
2020-05-12 08:23:19 +03:00
Tomasz Grabiec
2078016f84 test: memory_footprint: Avoid invalid identifiers as columnnames
Column name should not start with a digit, as can be the case with
random_string().

Message-Id: <1588860648-15796-1-git-send-email-tgrabiec@scylladb.com>
2020-05-07 17:33:34 +03:00
Pavel Emelyanov
ef181fb2d0 test: Add option to flush memtables for perf_simple_query
The test in question measures the speed of memtables, not
the row_cache. With this option it can do both.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200507140603.12350-1-xemul@scylladb.com>
2020-05-07 16:09:40 +02:00
Piotr Sarna
bec95a0605 treewide: use thread-safe variant of localtime
In order to ensure thread-safety, all usages of localtime()
are replaced with localtime_r(), which may accept a local
buffer.

Tests: unit(dev)
Fixes #6364
Message-Id: <ad4a0c0e1707f0318325718715a3a647e3ebfdfe.1588592156.git.sarna@scylladb.com>
2020-05-04 14:46:08 +03:00
Raphael S. Carvalho
5ac0d31323 test: perf_simple_query: fix test with smp count > 1
that code doesn't run under a thread, so let's futurize it.
the code worked with single cpu because get() returns right away
due to no deferring point.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200427155303.82763-1-raphaelsc@scylladb.com>
2020-04-27 18:58:25 +03:00
Tomasz Grabiec
92771e904a test: memory_footprint: Silence logging by default 2020-04-17 11:34:13 +02:00
Tomasz Grabiec
1df63b60c3 test: memory_footprint: Introduce --partition-count option 2020-04-17 11:34:13 +02:00
Tomasz Grabiec
7c2f6dd75e test: memory_footprint: Run under a cql_test_env 2020-04-17 11:34:13 +02:00
Tomasz Grabiec
04c093cbec test: memory_footprint: Calculate sstable size for each format version 2020-04-17 11:34:12 +02:00
Avi Kivity
88ade3110f treewide: replace calls to engine().some_api() with some_api()
This removes the need to include reactor.hh, a source of compile
time bloat.

In some places, the call is qualified with seastar:: in order
to resolve ambiguities with a local name.

Includes are adjusted to make everything compile. We end up
having 14 translation units including reactor.hh, primarily for
deprecated things like reactor::at_exit().

Ref #1
2020-04-05 12:46:04 +03:00
Avi Kivity
1799cfa88a logalloc: use namespace-scope seastar::idle_cpu_handler and related rather than reactor scope
This allows us to drop a #include <reactor.hh>, reducing compile time.

Several translation units that lost access to required declarations
are updated with the required includes (this can be an include of
reactor.hh itself, in case the translation unit that lost it got it
indirectly via logalloc.hh)

Ref #1.
2020-04-05 12:45:08 +03:00
Botond Dénes
240b5e0594 frozen_schema: key() remove unused schema parameter
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200402092249.680210-1-bdenes@scylladb.com>
2020-04-02 14:43:35 +02:00
Glauber Costa
e8801cd77b compaction: enhance compaction_descriptor with creator and replace function
There are many differences between resharding and compaction that are
artificial, arising more from the way we ended up implementing it than
necessity. This patch attempts to pass the creator and replacer functions
through the compaction_descriptor.

There is a difference between the creator function for resharding and
regular compaction: resharding has to pass the shard number on behalf
of which the SSTable is created. However regular compactions can just
ignore this. No need to have a special path just for this.

After this is done, the constructor for the compaction object can be
greatly simplified. In further patches I intend to simplify it a bit
further, but some more cleanup has to happen first.

To make that happen we have to construct a compaction_descriptor object
inside the resharding function. This is temporary: resharding currently
works with a descriptor, but at some point that descriptor is lost and
broken into pieces to be passed to this function. The overarching goal
of this work is exactly to be able to keep that descriptor for as long
as possible, which should simplify things a lot.

Callers are patched, but there are plenty for sstable_datafile_test.cc.
For their benefit, a helper function is provided to keep the previous
signature (test only).

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-03-31 19:41:25 -04:00
Rafael Ávila de Espíndola
c5795e8199 everywhere: Replace engine().cpu_id() with this_shard_id()
This is a bit simpler and might allow removing a few includes of
reactor.hh.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200326194656.74041-1-espindola@scylladb.com>
2020-03-27 11:40:03 +03:00
Botond Dénes
575466b2cf test/boost/sstable_test.hh: move generic stuff to test/lib/sstable_utils.hh
sstable_test.hh started as collection of utilities shared between the
various `_sstable_test.cc` files. Predictably other tests started using
it as well, among them some that are non boost unit tests. This poses a
problem as if we add the missing boost/test/unit_test.hpp include to
sstable_test.hh these tests will suddenly have missing symbols from
boost::test. To avoid linking boost::test into all these users, extract
utilities more widely used into sstable_utils.hh
2020-03-23 09:29:45 +02:00
Rafael Ávila de Espíndola
80d969ce31 everywhere: Use uninitialized_string instead of sstring::initialized_later
This is just a trivial wrapper over initialized_later when using
sstring, but also works when std::string is used.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-03-10 13:17:49 -07:00
Rafael Ávila de Espíndola
caef2ef903 everywhere: Don't assume sstring::begin() and sstring::end() are pointers
If we switch to using std::string we have to handle begin and end
returning iterators.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-03-10 13:13:48 -07:00
Konstantin Osipov
ac0717fb64 test: consistently use a global testlog object in all tests
Use test/lib/log.hh in all tests now that we have it.
2020-03-05 13:34:24 +03:00
Avi Kivity
157fe4bd19 Merge "Remove default timeouts" from Botond
"
Timeouts defaulted to `db::no_timeout` are dangerous. They allow any
modifications to the code to drop timeouts and introduce a source of
unbounded request queue to the system.

This series removes the last such default timeouts from the code. No
problems were found, only test code had to be updated.

tests: unit(dev)
"

* 'no-default-timeouts/v1' of https://github.com/denesb/scylla:
  database: database::query*(), database::apply*(): remove default timeouts
  database: table::query(): remove default timeout
  mutation_query: data_query(): remove default timeout
  mutation_query: mutation_query(): remove default timeout
  multishard_mutation_query: query_mutations_on_all_shards(): remove default timeout
  reader_concurrency_semaphore: wait_admission(): remove default timeout
  utils/logallog: run_when_memory_available(): remove default timeout
2020-03-01 17:29:17 +02:00
Rafael Ávila de Espíndola
2679c0cc87 perf_simple_query: Pass a string_view to make_counter_schema
With this we don't need to construct a sstring just to call
make_counter_schema.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-02-28 08:36:27 -08:00
Botond Dénes
1073094f04 database: database::query*(), database::apply*(): remove default timeouts 2020-02-27 19:14:12 +02:00
Rafael Ávila de Espíndola
17f12a8197 perf_simple_query: Call set_abort_on_internal_error(true)
We should never ignore an internal error in a perf test.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200225055745.321086-2-espindola@scylladb.com>
2020-02-26 18:22:05 +02:00
Rafael Ávila de Espíndola
c6897dcbea perf_simple_query: Simplify with seastar::thread
There is no reason not to use a seastar::thread in setup code.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200225055745.321086-1-espindola@scylladb.com>
2020-02-26 18:22:04 +02:00
Piotr Jastrzebski
ca4a89d239 dht: add dht::decorate_key
and replace all dht::global_partitioner().decorate_key
with dht::decorate_key

It is an improvement because dht::decorate_key takes schema
and uses it to obtain partitioner instead of using global
partitioner as it was before.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:06 +01:00
Piotr Jastrzebski
edd7398a0c memory_footprint_test: Make it clear it has to run with -c1
The test fails when run on number of cores different than 1.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-05 10:22:32 +01:00
Piotr Jastrzebski
1a8fe4befd tests: move memory_footprint_test to perf/
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-05 10:18:28 +01:00