Commit Graph

23141 Commits

Author SHA1 Message Date
Avi Kivity
d36601a838 Merge 'Make commitlog respect disk limit better' from Calle
"
Refs #6148

Separates disk usage into two cases: Allocated and used.
This is needed since we use both reserve and recycled segments,
neither of which is actually filled with anything at the point
of waiting.

Also refuses to recycle segments or increase reserve size
if our current disk footprint exceeds threshold.

And finally uses some initial heuristics to determine when
we should suggest flushing, based on disk limit, segment
size, and current usage. Right now, this is when only
half a segment is left before hitting used == max.

Some initial tests show an improved adherence to limit
though it will still be exceeded, because we do _not_
force waiting for segments to become cleared or similar
if we need to add data, thus slow flushing can still make
usage create extra segments. We will however attempt to
shrink disk usage when load is lighter.

Somewhat unclear how much this impacts performance
with tight limits, and how much this matters.
"

* elcallio-calle/commitlog_size:
  commitlog: Make commitlog respect disk limit better
  commitlog: Demote buffer write log messages to trace
2020-08-11 15:03:32 +03:00
Dejan Mircevski
013893b08d auth: Drop needless role-manager check
The service constructor included a check ensuring that only
standard_role_manager can be used with password_authenticator. But
after 00f7bc6, password_authenticator does not depend on any action of
standard_role_manager. All queries to meta::roles_table in
password_authenticator seem self-contained: the table is created at
the start if missing, and salted_hash is CRUDed independently of any
other columns bar the primary key role_col_name.

NOTE: a nonstandard role manager may not delete a role's row in
meta::roles_table when that role is dropped. This will result in
successful authentication for that non-existing role. But the clients
call check_user_can_login() after such authentication, which in turn
calls role_manager::exists(role). Any correctly implemented role
manager will then return false, and authentication_exception will be
thrown. Therefore, no dependencies exist on the role-manager
behaviour, other than it being self-consistent.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-08-11 14:56:18 +03:00
Avi Kivity
4547949420 Merge "Fix repair stalls in get_sync_boundary and apply_rows_on_master_in_thread" from Asias
"
This patch set fixes stalls in repair that are caused by std::list merge and clear operations during the test_latency_read_with_nemesis test.

Fixes #6940
Fixes #6975
Fixes #6976
"

* 'fix_repair_list_stall_merge_clear_v2' of github.com:asias/scylla:
  repair: Fix stall in apply_rows_on_master_in_thread and apply_rows_on_follower
  repair: Use clear_gently in get_sync_boundary to avoid stall
  utils: Add clear_gently
  repair: Use merge_to_gently to merge two lists
  utils: Add merge_to_gently
2020-08-11 14:52:23 +03:00
Botond Dénes
db5926134a sstables: sstable_mutation_reader: read_partition(): include more information in exception
Resolve the FIXME to help investigating related issues and include the
position of the consumer in the error message.

Refs: #6529

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200811111101.1576222-1-bdenes@scylladb.com>
2020-08-11 14:52:04 +03:00
Asias He
c65ad02fcd repair: Fix stall in apply_rows_on_master_in_thread and apply_rows_on_follower
The row_diff list in apply_rows_on_master_in_thread and
apply_rows_on_follower can be large. Modify do_apply_rows to remove the
row from the list when the row is consumed to avoid stall when the list
is destroyed.
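
A minimal sketch of the idea, assuming a plain std::list and a caller-supplied
consumer; `consume_and_release` is a hypothetical name and not the actual
do_apply_rows code. Each node is freed right after it is consumed, so no large
list is left for the destructor to tear down in one go:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>

// Hypothetical helper: hands each row to `consume` and frees its list node
// immediately, instead of leaving the whole list for ~list() to destroy.
template <typename T, typename Consumer>
std::size_t consume_and_release(std::list<T>& rows, Consumer consume) {
    std::size_t consumed = 0;
    while (!rows.empty()) {
        consume(std::move(rows.front()));
        rows.pop_front();  // releases this node now
        ++consumed;
    }
    assert(rows.empty());
    return consumed;
}
```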

Fixes #6975
2020-08-11 19:37:47 +08:00
Asias He
9f4b3a5fa6 repair: Use clear_gently in get_sync_boundary to avoid stall
The _row_buf and _working_row_buf list can be large. Use
clear_gently helper to avoid stalls.

Fixes #6940
2020-08-11 19:37:47 +08:00
Asias He
3e8c4a6788 utils: Add clear_gently
A helper to clear a list without stall.
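
A rough sketch of what such a helper could look like, assuming the caller
supplies a `maybe_yield` callback as a stand-in for a scheduler preemption
point. The name `clear_gently_sketch`, the callback and the batch size are
illustrative assumptions, not the helper added by this commit:

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical sketch: frees elements one by one and calls `maybe_yield`
// after every `batch` elements, so that clearing a huge list is split into
// bounded chunks of work.
template <typename T, typename Yield>
void clear_gently_sketch(std::list<T>& items, Yield maybe_yield, std::size_t batch = 128) {
    assert(batch > 0);
    std::size_t since_yield = 0;
    while (!items.empty()) {
        items.pop_front();
        if (++since_yield == batch) {
            since_yield = 0;
            maybe_yield();
        }
    }
}
```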

Refs #6975
Refs #6940
2020-08-11 19:37:47 +08:00
Calle Wilund
ed86e870ee docs/cdc.md: Add short explanation of stream ID bit composition
Bit layout, sort order and field usage of CDC stream ids.
2020-08-11 14:09:45 +03:00
Avi Kivity
41a75f2b99 Merge "make do_io_check path noexcept" from Benny
"
Make do_io_check and the io_check functions that
call it noexcept.  Up to sstable_write_io_check
and sstable_touch_directory_io_check.

Tests: unit (dev)
"

* tag 'io-check-noexcept-v1' of github.com:bhalevy/scylla:
  ssstable: io_check functions: make noexcept
  utils: do_io_check: adjust indentation
  utils: io_check: make noexcept for future-returning functions
2020-08-11 13:41:20 +03:00
Calle Wilund
5d044ab74e commitlog: Make commitlog respect disk limit better
Refs #6148

Separates disk usage into two cases: Allocated and used.
This is needed since we use both reserve and recycled segments,
neither of which is actually filled with anything at the point
of waiting.

Also refuses to recycle segments or increase reserve size
if our current disk footprint exceeds threshold.

And finally uses some initial heuristics to determine when
we should suggest flushing, based on disk limit, segment
size, and current usage. Right now, this is when only
half a segment is left before hitting used == max.
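
One possible reading of that heuristic, as a sketch only: the function and
parameter names are hypothetical and not taken from the commitlog code, and the
threshold is simply "max minus half a segment":

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: suggest a flush once used space is within half a
// segment of the configured disk limit.
bool should_suggest_flush(std::uint64_t used_bytes,
                          std::uint64_t max_bytes,
                          std::uint64_t segment_bytes) {
    assert(segment_bytes > 0);
    const std::uint64_t headroom = segment_bytes / 2;
    // Guard against unsigned underflow when the limit is below the headroom.
    const std::uint64_t threshold = max_bytes > headroom ? max_bytes - headroom : 0;
    return used_bytes >= threshold;
}
```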

Some initial tests show an improved adherence to limit
though it will still be exceeded, because we do _not_
force waiting for segments to become cleared or similar
if we need to add data, thus slow flushing can still make
usage create extra segments. We will however attempt to
shrink disk usage when load is lighter.

Somewhat unclear how much this impacts performance
with tight limits, and how much this matters.

v2:
* Add some comments/explanations
v3:
* Made disk footprint subtract happen post delete (non-optimistic)
2020-08-11 10:40:56 +00:00
Avi Kivity
3530e80ce1 Merge "Support md format" from Benny
"
This series adds support for the "md" sstable format.

Support is based on the following:

* do not use clustering based filtering in the presence
  of static row, tombstones.
* Disabling min/max column names in the metadata for
  formats older than "md".
* When updating the metadata, reset and disable min/max
  in the presence of range tombstones (like Cassandra does
  and until we process them accurately).
* Fix the way we maintain min/max column names by:
  keeping whole clustering key prefixes as min/max
  rather than calculating min/max independently for
  each component, like Cassandra does in the "md" format.

Fixes #4442

Tests: unit(dev), cql_query_test -t test_clustering_filtering* (debug)
md migration_test dtest from git@github.com:bhalevy/scylla-dtest.git migration_test-md-v1
"

* tag 'md-format-v4' of github.com:bhalevy/scylla: (27 commits)
  config: enable_sstables_md_format by default
  test: cql_query_test: add test_clustering_filtering unit tests
  table: filter_sstable_for_reader: allow clustering filtering md-format sstables
  table: create_single_key_sstable_reader: emit partition_start/end for empty filtered results
  table: filter_sstable_for_reader: adjust to md-format
  table: filter_sstable_for_reader: include non-scylla sstables with tombstones
  table: filter_sstable_for_reader: do not filter if static column is requested
  table: filter_sstable_for_reader: refactor clustering filtering conditional expression
  features: add MD_SSTABLE_FORMAT cluster feature
  config: add enable_sstables_md_format
  database: add set_format_by_config
  test: sstable_3_x_test: test both mc and md versions
  test: Add support for the "md" format
  sstables: mx/writer: use version from sstable for write calls
  sstables: mx/writer: update_min_max_components for partition tombstone
  sstables: metadata_collector: support min_max_components for range tombstones
  sstable: validate_min_max_metadata: drop outdated logic
  sstables: rename mc folder to mx
  sstables: may_contain_rows: always true for old formats
  sstables: add may_contain_rows
  ...
2020-08-11 13:29:11 +03:00
Piotr Jastrzebski
80e3923b3c codebase wide: replace find(...) != end() with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
the code pattern looked like:

<collection>.find(<element>) != <collection>.end()

In C++20 the same can be expressed with:

<collection>.contains(<element>)

This is not only more concise but also expresses the intent of the code
more clearly.

This commit replaces all the occurrences of the old pattern with the new
approach.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>
2020-08-11 13:28:50 +03:00
Avi Kivity
55cf219c97 Merge "sstable: close files on error" from Benny
"
Make sure to close sstable files also on error paths.

Refs #5509
Fixes #6448

Tests: unit (dev)
"

* tag 'sstable-close-files-on-error-v6' of github.com:bhalevy/scylla:
  sstable: file_writer: auto-close in destructor
  sstable: file_writer: add optional filename member
  sstable: add make_component_file_writer
  sstable: remove_by_toc_name: accept std::string_view
  sstable: remove_by_toc_name: always close file and input stream
  sstable: delete_sstables: delete outdated FIXME comment
  sstable: remove_by_toc_name: drop error_handler parameter
  sstable: remove_by_toc_name: make static
  sstable: read_toc: always close file
  sstable: mark read_toc and methods calling it noexcept
  sstable: read_toc: get rid of file_path
  sstable: open_data, create_data: set member only on success.
  sstable: open_file: mark as noexcept
  sstable: new_sstable_component_file: make noexcept
  sstable: new_sstable_component_file: close file on failure
  sstable: rename_new_sstable_component_file: do not pass file
  sstable: open_sstable_component_file_non_checked: mark as noexcept
  sstable: open_integrity_checked_file_dma: make noexcept
  sstable: open_integrity_checked_file_dma: close file on failure
2020-08-11 13:28:50 +03:00
Botond Dénes
b11d181413 scylla-gdb.py: restore python2 compatibility
Although python2 should be a distant memory by now, the reality is that
we still need to debug scylla on platforms that still have no python3
available (centos7), so we need to keep scylla-gdb.py python2
compatible.

Refs: #7014
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200811093753.1567689-1-bdenes@scylladb.com>
2020-08-11 12:55:42 +03:00
Nadav Har'El
796ad24f37 docs: correct typo in maintainers.md
maintainers.md contains a very helpful explanation of how to backport
Seastar fixes to old branches of Scylla, but has a tiny typo, which
this patch corrects.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200811095350.77146-1-nyh@scylladb.com>
2020-08-11 12:54:41 +03:00
Takuya ASADA
6fbbe836c1 scylla_raid_setup: use mdadm.service on older Debian variants
Older Debian variants do not have mdmonitor.service, so we should use
mdadm.service instead.

Fixes #7000
2020-08-11 12:52:24 +03:00
Calle Wilund
a6ad70d3da cdc:stream_id: Encode format version + vnode grouping/index in id
Fixes #6948

Changes the stream_id format from
 <token:64>:<rand:64>
to
 <token:64>:<rand:38><index:22><version:4>

The code will attempt to assert version match when
presented with a stored id (i.e. construct from bytes).
This means that IDs created by previous (experimental)
versions will break.

Moves the ID encoding fully into the ID class, and makes
the code path private for the topology generation code
path.

Removes some superfluous accessors but adds accessors for
token, version and index. (For alternator etc).
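
The second 64-bit word of the new format can be thought of as three bit
fields. A minimal sketch of packing and unpacking it, assuming the fields are
laid out from most significant (rand) to least significant (version) bits.
That bit ordering and all helper names are assumptions for illustration; see
docs/cdc.md for the authoritative layout:

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned version_bits = 4;
constexpr unsigned index_bits = 22;
constexpr unsigned rand_bits = 38;
static_assert(version_bits + index_bits + rand_bits == 64, "fields must fill one word");

// Mask with the lowest `bits` bits set (bits < 64).
constexpr std::uint64_t mask(unsigned bits) {
    return (std::uint64_t{1} << bits) - 1;
}

// Hypothetical packing: <rand:38><index:22><version:4>, rand in the top bits.
std::uint64_t pack_low_word(std::uint64_t rand, std::uint32_t index, std::uint8_t version) {
    assert(rand <= mask(rand_bits));
    assert(index <= mask(index_bits));
    assert(version <= mask(version_bits));
    return (rand << (index_bits + version_bits))
         | (std::uint64_t{index} << version_bits)
         | std::uint64_t{version};
}

std::uint8_t unpack_version(std::uint64_t w) {
    return static_cast<std::uint8_t>(w & mask(version_bits));
}

std::uint32_t unpack_index(std::uint64_t w) {
    return static_cast<std::uint32_t>((w >> version_bits) & mask(index_bits));
}

std::uint64_t unpack_rand(std::uint64_t w) {
    return w >> (index_bits + version_bits);
}
```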
2020-08-11 12:48:04 +03:00
Calle Wilund
9167d1ac76 commitlog: Demote buffer write log messages to trace
Because they become very plentiful and annoying when
one tries to analyze segment behaviour. More so in
batch mode.
2020-08-11 09:18:23 +00:00
Asias He
53fee789f0 repair: Use merge_to_gently to merge two lists
During a performance test (test_latency_read_with_nemesis) with manager
repair running, a stall of 73 ms was observed:

```
 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >::operator=(repair_row const&) at /usr/include/c++/9/bits/stl_iterator.h:515
 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > std::__copy_move<false, false, std::bidirectional_iterator_tag>::__copy_m<std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > >(std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >) at /usr/include/c++/9/bits/stl_algobase.h:312
 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > std::__copy_move_a<false, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > >(std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >) at /usr/include/c++/9/bits/stl_algobase.h:404
 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > std::__copy_move_a2<false, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > >(std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >) at /usr/include/c++/9/bits/stl_algobase.h:440
 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > std::copy<std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > >(std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >) at /usr/include/c++/9/bits/stl_algobase.h:474
 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > std::__merge<std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >, __gnu_cxx::__ops::_Iter_comp_iter<repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int)::{lambda(repair_row const&, repair_row const&)#1}> >(std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >, std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, __gnu_cxx::__ops::_Iter_comp_iter<repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int)::{lambda(repair_row const&, repair_row const&)#1}>, __gnu_cxx::__ops::_Iter_comp_iter<repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int)::{lambda(repair_row const&, repair_row const&)#1}>) at /usr/include/c++/9/bits/stl_algo.h:4923
 (inlined by) std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > > std::merge<std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >, repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int)::{lambda(repair_row const&, repair_row const&)#1}>(std::_List_iterator<repair_row>, std::back_insert_iterator<std::__cxx11::list<repair_row, std::allocator<repair_row> > >, std::_List_iterator<repair_row>, std::_List_iterator<repair_row>, repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int)::{lambda(repair_row const&, repair_row const&)#1}, repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int)::{lambda(repair_row const&, repair_row const&)#1}) at /usr/include/c++/9/bits/stl_algo.h:5018
 (inlined by) repair_meta::apply_rows_on_master_in_thread(std::__cxx11::list<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >, gms::inet_address, seastar::bool_class<update_working_row_buf_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, unsigned int) at ./repair/row_level.cc:1242
repair_meta::get_row_diff_source_op(seastar::bool_class<update_peer_row_hash_sets_tag>, gms::inet_address, unsigned int, seastar::rpc::sink<repair_hash_with_cmd>&, seastar::rpc::source<repair_row_on_wire_with_cmd>&) at ./repair/row_level.cc:1608
repair_meta::get_row_diff_with_rpc_stream(std::unordered_set<repair_hash, std::hash<repair_hash>, std::equal_to<repair_hash>, std::allocator<repair_hash> >, seastar::bool_class<needs_all_rows_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, gms::inet_address, unsigned int) at ./repair/row_level.cc:1674
row_level_repair::get_missing_rows_from_follower_nodes(repair_meta&) at ./repair/row_level.cc:2413
```

The problem was that when std::merge() ran out of one range, it copied the second range.

To fix, use the new merge_to_gently helper.

Fixes #6976
2020-08-11 10:37:34 +08:00
Asias He
0bf0019eeb utils: Add merge_to_gently
This helper is similar to std::merge but it runs inside a thread and
does not stall.
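
A rough sketch of how such a merge could work on two sorted std::list
instances, assuming a caller-supplied `maybe_yield` callback that stands in
for yielding from the thread. The name `merge_gently_sketch`, the callback and
the batch size are hypothetical, and this is not the helper added by this
commit. Nodes are moved with splice, so no element is copied, and the loop
stops as soon as the source list is empty:

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical sketch: merges sorted `src` into sorted `dst` by splicing
// nodes, calling `maybe_yield` after every `batch` steps.
template <typename T, typename Less, typename Yield>
void merge_gently_sketch(std::list<T>& dst, std::list<T>& src, Less less,
                         Yield maybe_yield, std::size_t batch = 128) {
    assert(batch > 0);
    std::size_t steps = 0;
    auto pos = dst.begin();
    while (!src.empty()) {
        if (pos != dst.end() && !less(src.front(), *pos)) {
            ++pos;  // *pos sorts before (or equal to) the next src element
        } else {
            dst.splice(pos, src, src.begin());  // O(1) node move, no copy
        }
        if (++steps == batch) {
            steps = 0;
            maybe_yield();
        }
    }
}
```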

Refs #6976
2020-08-11 10:37:34 +08:00
Benny Halevy
e2340d0684 config: enable_sstables_md_format by default
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 19:19:32 +03:00
Benny Halevy
0d85ceaf37 test: cql_query_test: add test_clustering_filtering unit tests
Add unit tests reproducing https://github.com/scylladb/scylla/issues/3552
with clustering-key filtering enabled.

enable_sstables_md_format option is set to true as clustering-key
filtering is enabled only for md-format sstables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 19:19:32 +03:00
Benny Halevy
7cfca519cb table: filter_sstable_for_reader: allow clustering filtering md-format sstables
Now that it is safe to filter md format sstable by min/max column names
we can remove the `filtering_broken` variable that disabled filtering
in 19b76bf75b to fix #4442.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 19:19:32 +03:00
Benny Halevy
ab67629ea6 table: create_single_key_sstable_reader: emit partition_start/end for empty filtered results
To prevent https://github.com/scylladb/scylla/issues/3552
we want to ensure that in any case that the partition exists in any
sstable, we emit partition_start/end, even when returning no rows.

In the first filtering pass, filter_sstable_for_reader_by_pk filters
the input sstables based on the partition key, and num_sstables is set to the size
of the sstables list after the first filtering pass.

An empty sstables list at this stage means there are indeed no sstables
with the required partition so returning an empty result will leave the
cache in the desired state.

Otherwise, we filter again, using filter_sstable_for_reader_by_ck,
and examine the list of the remaining readers.

If num_readers != num_sstables, we know that
some sstables were filtered by clustering key, so
we append a flat_mutation_reader_from_mutations to
the list of readers and return a combined reader as before.
This will ensure that we will always have a partition_start/end
mutations for the queried partition, even if the filtered
readers emit no rows.
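
The decision described above can be summarized in a small sketch. The enum and
function names are hypothetical and only model the control flow, not the
actual reader construction:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the three outcomes described in this message.
enum class reader_plan {
    empty_result,                    // no sstable has the partition
    combined_only,                   // nothing was filtered by clustering key
    combined_plus_partition_markers  // add a reader emitting partition_start/end
};

// num_sstables: sstables left after partition-key filtering.
// num_readers:  readers left after clustering-key filtering.
reader_plan plan_reader(std::size_t num_sstables, std::size_t num_readers) {
    assert(num_readers <= num_sstables);
    if (num_sstables == 0) {
        return reader_plan::empty_result;
    }
    if (num_readers != num_sstables) {
        return reader_plan::combined_plus_partition_markers;
    }
    return reader_plan::combined_only;
}
```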

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 19:19:32 +03:00
Benny Halevy
a672747da3 table: filter_sstable_for_reader: adjust to md-format
With the md sstable format, min/max column names in the metadata now
track clustering rows (with or without row tombstones),
range tombstones, and partition tombstones (that are
reflected with empty min/max column names - indicating
the full range).

As such, min and max column names may be of different lengths
due to range tombstones and potentially short clustering key
prefixes with compact storage, so the current matching algorithm
must be changed to take this into account.

To determine if a slice range overlaps the min/max range
we are using position_range::overlaps.

sstable::clustering_components_ranges was renamed to position_range
as it now holds a single position_range rather than a vector of bytes_view ranges.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 19:19:30 +03:00
Benny Halevy
90d0fea7df table: filter_sstable_for_reader: include non-scylla sstables with tombstones
Move contains_rows from table code to sstable::may_contain_rows
since its implementation now has too specific knowledge of sstable
internals.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
2a57ec8c3d table: filter_sstable_for_reader: do not filter if static column is requested
Static rows aren't reflected in the sstable min/max clustering keys metadata.
Since we don't have any indication in the metadata that the sstable stores
static rows, we must read all sstables if a static column is requested.

Refs #3553

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
2fed3f472c table: filter_sstable_for_reader: refactor clustering filtering conditional expression
We're about to drop `filtering_broken` in a future patch
when clustering filtering can be supported for md-format sstables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
e8d7744040 features: add MD_SSTABLE_FORMAT cluster feature
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
65239a6e50 config: add enable_sstables_md_format
MD format is disabled by default at this point.

The option extends enable_sstables_mc_format
so that both are needed to be set for supporting
the md format.

The MD_FORMAT cluster feature will be added in
a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
8e0e2c8a48 database: add set_format_by_config
This is required for test applications that may select a sstable
format different than the default mc format, like perf_fast_forward.

These apps don't use the gossip-based sstables_format_selector
to set the format based on the cluster feature and so they
need to rely on the db config.

Call set_format_by_config in single_node_cql_env::do_with.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
d77ceba498 test: sstable_3_x_test: test both mc and md versions
Run the test cases that write sstables using both the
mc and md versions.  Note that we can still compare the
resulting Data, Index, Digest, and Filter components
with the prepared mc sstables we have since these
haven't changed in md.

We take special consideration around validating
min/max column names that are now calculated using
a revised algorithm in the md format.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Pekka Enberg
3168be3483 test: Add support for the "md" format
Test also the md format in all_sstable_versions.
Add pre-computed md-sstable files generated using Cassandra version 3.11.7

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
e44ec45ab9 sstables: mx/writer: use version from sstable for write calls
Rather than using a constant sstable_version_types::mc.
In preparation to supporting sstable_version_types::md.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
bd4383a842 sstables: mx/writer: update_min_max_components for partition tombstone
Partition tombstones represent an implicit clustering range
that is unbound on both sides, so reflect that in min/max
column names metadata using empty clustering key prefixes.

If we don't do that, when using the sstable for filtering, we have no
other way of distinguishing range tombstones from partition tombstones
given the sstable metadata and we would need to include any sstable
with tombstones, even if those are range tombstones, for which
we can do a better filtering job, using the sstable min/max
column names metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
68acae5873 sstables: metadata_collector: support min_max_components for range tombstones
We essentially treat min/max column names as range bounds
with min as incl_start and max as incl_end.

By generating a bound_view for min/max column names on the fly,
we can correctly track and compare also short clustering
key prefixes that may be used as bounds for range tombstones.

Extend the sstable_tombstone_metadata_check unit test
to cover these cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
34fb95dacf sstable: validate_min_max_metadata: drop outdated logic
The following checks were introduced in 0a5af61176
to deal with a bug in min/max metadata generation of our own,
from a time when only ka / la were supported.

This is no longer relevant now that we'll consider min_max_column_names
only for sstable format > mc (in sstable::may_contain_rows)

We choose not to clear_incorrect_min_max_column_names
from older versions here as this disturbs sstable unit tests.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
12393c5ec2 sstables: rename mc folder to mx
Prepare for supporting the md format.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
7139fb92e6 sstables: may_contain_rows: always true for old formats
the min/max column names metadata can be trusted only
starting the md format, so just always return `true`
for older sstable formats.

Note that we could achieve that by clearing the min/max
metadata in set_clustering_components_ranges but we choose
not to do so since it disturbs sstable unit tests

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
200d8d41d9 sstables: add may_contain_rows
Move the logic from table to sstable as it will contain
intimate knowledge of the sstable min/max column names validity
for md format.

Also, get rid of the sstable::clustering_components_ranges() method
as the member is used only internally by the sstable code now.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Pekka Enberg
a37eaaa022 sstables: Add support for the "md" format enum value
Add the sstable_version_types::md enum value
and logically extend sstable_version_types comparisons to cover
also the > sstable_version_types::mc cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
7de004d42a sstables: version: delete unused is_latest_supported predicate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
025b74e20e sstables: metadata_collector: use empty key to represent full min/max range
Instead of keeping the `_has_min_max_clustering_keys` flag,
just store an empty key for `_{min,max}_clustering_key` to represent
the full range.  These will never be narrowed down and will be
encoded as empty min/max column names as if they weren't set.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
9f114d821a sstables: keep whole clustering_key_prefix as min/max_column_names
Currently we compare each min/max component independently.
This may lead to suboptimal, inclusive clustering ranges
that do not indicate any actual key we encountered.

For example: ['a', 2], ['b', 1] will lead to min=['a', 1], max=['b', 2]
instead of the keys themselves.

This change keeps the min or max keys as a whole.

It considers shorter clustering prefixes (that are possible with compact
storage) as range tombstone bounds, so that a shorter key is considered
less than the minimum if the latter has a common prefix, and greater
than the maximum if the latter has a common prefix.

Extend the min_max_clustering_key_test to test for this case.
Previously {"a", "2"}, {"b", "1"} clustering keys would erroneously
end up with min={"a", "1"} max={"b", "2"} while we want them to be
min={"a", "2"} max={"b", "1"}.
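
The difference between the two approaches can be reproduced with a small
sketch. It assumes clustering keys are modelled as vectors of strings compared
lexicographically, which ignores schema-specific comparators and short
prefixes; the type and function names are illustrative only:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using key = std::vector<std::string>;
using min_max = std::pair<key, key>;

// Old behaviour: min/max computed independently for each component.
min_max componentwise_min_max(const std::vector<key>& keys) {
    assert(!keys.empty());
    key lo = keys.front();
    key hi = keys.front();
    for (const key& k : keys) {
        assert(k.size() == lo.size());  // sketch handles full keys only
        for (std::size_t i = 0; i < k.size(); ++i) {
            lo[i] = std::min(lo[i], k[i]);
            hi[i] = std::max(hi[i], k[i]);
        }
    }
    return {lo, hi};
}

// New behaviour: keep the smallest and largest key as a whole.
min_max whole_key_min_max(const std::vector<key>& keys) {
    assert(!keys.empty());
    auto [lo, hi] = std::minmax_element(keys.begin(), keys.end());
    return {*lo, *hi};
}
```

With the keys {"a", "2"} and {"b", "1"}, the component-wise version produces
bounds that match neither key, while the whole-key version returns the keys
themselves.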

Adjust sstable_3_x_test to ignore original mc sstables that were
previously computed with different min/max column names.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:03 +03:00
Benny Halevy
707b098f44 sstables: metadata_collector: construct with schema
Pass the sstable schema to the metadata_collector constructor.

Note that the long term plan is to move metadata_collector
to the sstable writer but this requires a bigger change to
get rid of the dependencies on it in the legacy writer code
in class sstable methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:52:43 +03:00
Benny Halevy
c9cade833c sstables: metadata_collector: make only for write path
make a metadata_collector only when writing the sstable,
no need to make one when reading.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:51:12 +03:00
Rafael Ávila de Espíndola
74db08165d tests: Convert to using memory::with_allocation_failures
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200805155143.122396-1-espindola@scylladb.com>
2020-08-10 18:37:42 +03:00
Piotr Jastrzebski
52ec0c683e codebase wide: replace erase + remove_if with erase_if
C++20 introduced std::erase_if which simplifies removal of elements
from the collection. Previously the code pattern looked like:

<collection>.erase(
        std::remove_if(<collection>.begin(), <collection>.end(), <predicate>),
        <collection>.end());

In C++20 the same can be expressed with:

std::erase_if(<collection>, <predicate>);

This commit replaces all the occurrences of the old pattern with the new
approach.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <6ffcace5cce79793ca6bd65c61dc86e6297233fd.1597064990.git.piotr@scylladb.com>
2020-08-10 18:17:38 +03:00
Calle Wilund
9620755c7f database: Do not assert on replay positions if truncate does not flush
Fixes #6995

In c2c6c71 the assert on replay positions in flushed sstables discarded by
truncate was broken, by the fact that we no longer flush all sstables
unless auto snapshot is enabled.

This means the low_mark assertion does not hold, because we maybe/probably
never got around to creating the sstables that would hold said mark.

Note that the (old) change to not create sstables and then just delete
them is in itself good. But in that case we should not try to verify
the rp mark.
2020-08-10 18:17:38 +03:00
Avi Kivity
f9aea94c5c Merge 'add out of box configs for GCP VMs with nvmes' from Lubos
"
not recommended setups will still run iotune
fixes #6631
"

* tarzanek-gcp-iosetup:
  scylla_io_setup: Supported GCP VMs with NVMEs get out of box I/O configs
  scylla_util.py: add support for gcp instances
  scylla_util.py: support http headers in curl function
  scylla_io_setup: refactor iotune run to a function
2020-08-10 18:17:38 +03:00