Commit Graph

17 Commits

Author SHA1 Message Date
Avi Kivity
71398f3fb4 Merge "Cleanup sstable writer" from Benny
"
This series cleans up the legacy and common ssatble writer code.

metadata_collector::_ancestors were moved to class sstable
so that the former can be moved out of sstable into file_writer_impl.

Moved setting of replay position and sstable level via
sstable_writer_config so that compaction won't need to access
the metadata_collector via the sstable.

With that, metadata_collector could be moved from class sstable
to sstable_writer::writer_impl along with the column_stats.

That allowed moved "generic" file_writer methods that were actually
k/l format specific into sstable_writer_k_l.

Eventually `file_writer` code is moved into sstables/writer.cc
and sstable_writer_k_l into sstables/kl/writer.{hh,cc}

A bonus cleanup is the ability to get rid of
sstable::_correctly_serialize_non_compound_range_tombstones as
it's now available to the writers via the writer configuration
and not required to be stored in the sstable object.

Fixes #3012

Test: unit(dev)
"

* tag 'cleanup-sstable-writer-v2' of github.com:bhalevy/scylla:
  sstables: move writer code away to writer.cc
  sstables: move sstable_writer_k_l away to kl/writer
  sstables: get rid of sstable::_correctly_serialize_non_compound_range_tombstones
  sstables: move writer methods to sstable_writer_k_l
  sstables: move compaction ancestors to sstable
  sstables: sstable_writer: optionally set sstable level via config
  sstables: sstable_writer: optionally set replay position via config
  sstables: compaction: make_sstable_writer_config
  sstables: open code update_stats_on_end_of_stream in sstable_writer::consume_end_of_stream
  sstables: fold components_writer into sstable_writer_k_l
  sstables: move sstable_writer_k_l definition upwards
  sstables: components_writer: turn _index into unique_ptr
2020-10-15 10:40:28 +03:00
Benny Halevy
8cd4d53643 sstables: mx/writer: fix copy-paste error in reader_semaphore name
It was copied from sstables.cc in
6ca0464af5.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201014171651.541232-1-bhalevy@scylladb.com>
2020-10-14 22:17:49 +02:00
Benny Halevy
96cd6adc71 sstables: get rid of sstable::_correctly_serialize_non_compound_range_tombstones
Now it's available to the writers via the writer configuration
and not required to be stored in the sstable object.

Refs #3012

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 19:53:23 +03:00
Benny Halevy
97a446f9fa sstables: move writer methods to sstable_writer_k_l
They are called solely from the sstable_writer_k_l path.
With that, moce the metadata collector and column stats
to writer_impl.  They are now only used by the sstable
writers.

Refs #3012

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 19:52:17 +03:00
Benny Halevy
e1692bec17 sstables: move compaction ancestors to sstable
Compaction needs access to the sstable's ancestors so we need to
keep the ancestors for the sstable separately from the metadata collector
as the latter is about to be moved to the sstable writer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-10-14 19:51:26 +03:00
Avi Kivity
86bbf1763d Merge "reader concurrency semaphore: dump permit diagnostics on timeout or queue overflow" from Botond
"
The reader concurrency semaphore timing out or its queue being overflown
are fairly common events both in production and in testing. At the same
time it is a hard to diagnose problem that often has a benign cause
(especially during testing), but it is equally possible that it points
to something serious. So when this error starts to appear in logs,
usually we want to investigate and the investigation is lengthy...
either involves looking at metrics or coredumps or both.

This patch intends to jumpstart this process by dumping a diagnostics on
semaphore timeout or queue overflow. The diagnostics is printed to the
log with debug level to avoid excessive spamming. It contains a
histogram of all the permits associated with the problematic semaphore
organized by table, operation and state.

Example:

DEBUG 2020-10-08 17:05:26,115 [shard 0] reader_concurrency_semaphore -
Semaphore _read_concurrency_sem: timed out, dumping permit diagnostics:
Permits with state admitted, sorted by memory
memory  count   name
3499M   27      ks.test:data-query

3499M   27      total

Permits with state waiting, sorted by count
count   memory  name
1       0B      ks.test:drain
7650    0B      ks.test:data-query

7651    0B      total

Permits with state registered, sorted by count
count   memory  name

0       0B      total

Total: permits: 7678, memory: 3499M

This allows determining several things at glance:
* What are the tables involved
* What are the operations involved
* Where is the memory

This can speed up a follow-up investigation greatly, or it can even be
enough on its own to determine that the issue is benign.

Tests: unit(dev, debug)
"

* 'dump-diagnostics-on-semaphore-timeout/v2' of https://github.com/denesb/scylla:
  reader_concurrency_semaphore: dump permit diagnostics on timeout or queue overflow
  utils: add to_hr_size()
  reader_concurrency_semaphore: link permits into an intrusive list
  reader_concurrency_semaphore: move expiry_handler::operator()() out-of-line
  reader_concurrency_semaphore: move constructors out-of-line
  reader_concurrency_semaphore: add state to permits
  reader_concurrency_semaphore: name permits
  querier_cache_test: test_immediate_evict_on_insert: use two permits
  multishard_combining_reader: reader_lifecycle_policy: add permit param to create_reader()
  multishard_combining_reader: add permit parameter
  multishard_combining_reader: shard_reader: use multishard reader's permit
2020-10-13 12:44:23 +03:00
Botond Dénes
ff623e70b3 reader_concurrency_semaphore: name permits
Require a schema and an operation name to be given to each permit when
created. The schema is of the table the read is executed against, and
the operation name, which is some name identifying the operation the
permit is part of. Ideally this should be different for each site the
permit is created at, to be able to discern not only different kind of
reads, but different code paths the read took.

As not all read can be associated with one schema, the schema is allowed
to be null.

The name will be used for debugging purposes, both for coredump
debugging and runtime logging of permit-related diagnostics.
2020-10-13 12:32:13 +03:00
Avi Kivity
5065ae835f sstables: move bound_kind_m formatter to namespace sstables
Without this, clang complains that we violate argument dependent
lookup rules:

  note: 'operator<<' should be declared prior to the call site or in namespace 'sstables'
  std::ostream& operator<<(std::ostream&, const sstables::bound_kind_m&);

we can't enforce the #include order, but we can easily move it it
to namespace sstables (where it belongs anyway), so let's do that.

gcc is happy either way.
2020-10-12 20:38:11 +03:00
Avi Kivity
a00fca1a69 sstables: move bound_kind_m formatter to its natural place
Move bound_kind_m's formatter to the same header file where
is is defined. This prevents cases where the compiler decays
the type (an enum) to the underlying integral type because it
does not see the formatter declaration, resulting in the wrong
output.
2020-10-12 20:36:10 +03:00
Botond Dénes
6ca0464af5 mutation_fragment: add schema and permit
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and pass to
the permit.

In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
2020-09-28 11:27:23 +03:00
Pavel Emelyanov
812eed27fe code: Force formatting of pointer in .debug and .trace
... and tests. Printin a pointer in logs is considered to be a bad practice,
so the proposal is to keep this explicit (with fmt::ptr) and allow it for
.debug and .trace cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
Benny Halevy
f5ffd5fc5f sstables: Fix reactor stall in sstables::seal_summary()
With relatively big summaries, reactor can be stalled for a couple
of milliseconds.

This patch:
a. allocates positions upfront to avoid excessive reallocation.
b. returns a future from seal_summary() and uses `seastar::do_for_each`
to iterate over the summary entries so the loop can yield if necessary.

Fixes #7108.

Based on 2470aad5a389dfd32621737d2c17c7e319437692 by Raphael S. Carvalho <raphaelsc@scylladb.com>

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200826091337.28530-1-bhalevy@scylladb.com>
2020-08-26 12:18:05 +03:00
Avi Kivity
3530e80ce1 Merge "Support md format" from Benny
"
This series adds support for the "md" sstable format.

Support is based on the following:

* do not use clustering based filtering in the presence
  of static row, tombstones.
* Disabling min/max column names in the metadata for
  formats older than "md".
* When updating the metadata, reset and disable min/max
  in the presence of range tombstones (like Cassandra does
  and until we process them accurately).
* Fix the way we maintain min/max column names by:
  keeping whole clustering key prefixes as min/max
  rather than calculating min/max independently for
  each component, like Cassandra does in the "md" format.

Fixes #4442

Tests: unit(dev), cql_query_test -t test_clustering_filtering* (debug)
md migration_test dtest from git@github.com:bhalevy/scylla-dtest.git migration_test-md-v1
"

* tag 'md-format-v4' of github.com:bhalevy/scylla: (27 commits)
  config: enable_sstables_md_format by default
  test: cql_query_test: add test_clustering_filtering unit tests
  table: filter_sstable_for_reader: allow clustering filtering md-format sstables
  table: create_single_key_sstable_reader: emit partition_start/end for empty filtered results
  table: filter_sstable_for_reader: adjust to md-format
  table: filter_sstable_for_reader: include non-scylla sstables with tombstones
  table: filter_sstable_for_reader: do not filter if static column is requested
  table: filter_sstable_for_reader: refactor clustering filtering conditional expression
  features: add MD_SSTABLE_FORMAT cluster feature
  config: add enable_sstables_md_format
  database: add set_format_by_config
  test: sstable_3_x_test: test both mc and md versions
  test: Add support for the "md" format
  sstables: mx/writer: use version from sstable for write calls
  sstables: mx/writer: update_min_max_components for partition tombstone
  sstables: metadata_collector: support min_max_components for range tombstones
  sstable: validate_min_max_metadata: drop outdated logic
  sstables: rename mc folder to mx
  sstables: may_contain_rows: always true for old formats
  sstables: add may_contain_rows
  ...
2020-08-11 13:29:11 +03:00
Benny Halevy
e44ec45ab9 sstables: mx/writer: use version from sstable for write calls
Rather than using a constant sstable_version_types::mc.
In preparation to supporting sstable_version_types::md.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
bd4383a842 sstables: mx/writer: update_min_max_components for partition tombstone
Partition tombstones represent an implicit clustering range
that is unbound on both sides, so reflect than in min/max
column names metadata using empty clustering key prefixes.

If we don't do that, when using the sstable for filtering, we have no
other way of distinguishing range tombstones from partition tombstones
given the sstable metadata and we would need to include any sstable
with tombstones, even if those are range tombstone, for which
we can do a better filtering job, using the sstable min/max
column names metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
68acae5873 sstables: metadata_collector: support min_max_components for range tombstones
We essentially treat min/max column names as range bounds
with min as incl_start and max as incl_end.

By generating a bound_view for min/max column names on the fly,
we can correctly track and compare also short clustering
key prefixes that may be used as bounds for range tombstones.

Extend the sstable_tombstone_metadata_check unit test
to cover these cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
12393c5ec2 sstables: rename mc folder to mx
Prepare for supporting the md format.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00