Commit Graph

896 Commits

Author SHA1 Message Date
Pavel Emelyanov
5ecbc33be5 database.*: Remove unused headers
The database.hh is the central recursive-headers knot -- it has ~50
includes. This patch leaves only 34 (it remains the champion though).
Similar thing for database.cc.
Both changes help the latter compile ~4% faster :)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210414183107.30374-1-xemul@scylladb.com>
2021-04-18 14:03:17 +03:00
Kamil Braun
5c7ed7a83f time_series_sstable_set: return partition start if some sstables were ck-filtered out
When a particular partition exists in at least one sstable, the cache
expects any single-partition query to this partition to return a `partition_start`
fragment, even if the result is empty.

In `time_series_sstable_set::create_single_key_sstable_reader` it could
happen that all sstables containing data for the given query get
filtered out and only sstables without the relevant partition are left,
resulting in a reader which immediately returns end-of-stream (while it
should return a `partition_start` and if not in forwarding mode, a
`partition_end`). This commit fixes that.

We do it by extending the reader queue (used by the clustering reader
merger) with a `dummy_reader` which will be returned by the queue as
the very first reader. This reader only emits a `partition_start` and,
if not in forwarding mode, a `partition_end` fragment.

Fixes #8447.

Closes #8448
2021-04-14 13:16:00 +02:00
Calle Wilund
03590c8254 commitlog_test: Add test for deadlock in shutdown w. segment wait
Refs #8438

Ensures shutting down (well behaved) works even if an allocating
path is stuck waiting for a new segment - i.e. other aspect of

Closes #8475
2021-04-14 13:16:00 +02:00
Avi Kivity
b756693e64 Merge "mutation_query: move query methods into table" from Botond
"
These methods are generic ways to query a mutation source. At least they
used to be, but nowadays they are pretty specific to how tables are
queried -- they use a querier cache to lookup queriers from and save
them into. With the coming changes to how permits are obtained, they are
about to get even more specific to tables. Instead of forcing the
genericity and keep adding new parameters, this patchset bites the
bullet and moves them to table. `data_query()` is inlined into
`table::query()`, while `mutation_query()` is replaced with
`table::mutation_query()`.
The only other users besides table are tests and they are adjusted to
use similarly named local methods that just combine the right querier
with the right result builder. This combination is what the tests really
want to test, as this is also what is used by the table methods behind
the scenes.

Tests: unit(release, debug)
"

* 'mutation-query-move-query-methods-into-table/v1' of https://github.com/denesb/scylla:
  mutation_query: remove now unused mutation_query()
  test: mutation_query_test: use local mutation_query() implementation
  database: mutation_query(): use table::mutation_query()
  table: add mutation_query()
  query: remove the now unused data_query()
  test: mutation_query_test: use local data_query() implementation
  table: query(): inline data_query() code into query()
  table: make query() a coroutine
2021-04-14 13:15:59 +02:00
Avi Kivity
e3db889057 Merge 'Introduce service levels' from Piotr Sarna
This series introduces service level syntax borrowed from https://docs.scylladb.com/using-scylla/workload-prioritization/ , but without workload prioritization itself - just for the sake of using identical syntax to provide different parameters later. The new parameters may include:
 * per-service-level timeouts
 * oltp/olap declaration, which may change the way Scylla treats long requests - e.g. time them out (the oltp way) or keep them sustained with empty pages (the olap way)

Refs #7617

Closes #7867

* github.com:scylladb/scylla:
  transport: initialize query state with service level controller
  main: add initializing service level data accessor
  service: make enable_shared_from_this inheritance public
  cql3: add SERVICE LEVEL syntax (without an underscore)
  unit test: Add unit test for per user sla syntax
  cql: Add support for service level cql queries
  auth: Add service_level resource for supporting in authorization of cql service_level
  cql: Support accessing service_level_controller from query state
  instantiate and initialize the service_level_controller
  qos: Add a standard implementation for service level data accessor
  qos: add waiting for the updater future
  service/qos: adding service level controller
  service_levels: Add documentation for distributed tables
  service/qos: adding service level table to the distributed keyspace
  service/qos: add common definitions
  auth: add support for role attributes
2021-04-12 17:34:43 +03:00
Eliran Sinvani
144fe02c23 unit test: Add unit test for per user sla syntax
This commit adds the infrastructure needed to test per user sla,
more specificaly, a service level accessor that triggers the
update_service_levels_from_distributed_data function uppon any
change to the dystributed sla data.
A test was added that indirectly consumes this infrastructure by
changing the distributed service level data with cql queries.
Message-Id: <23b2211e409446c4f4e3e57b00f78d9ff75fc978.1609249294.git.sarna@scylladb.com>
2021-04-12 16:31:26 +02:00
Eliran Sinvani
a88929da15 auth: Add service_level resource for supporting in authorization of cql service_level
queries

In order to be able to manage service_level configuration one must be authorized
to do so, or to be a superuser. This commit adds the support for service_levels
resource. Since service_levels are relative, reconfiguring one service level is not locallized
only to that service level and will affect the QOS for all of the service levels,
so there is not much sense of granting permissions to manage individual service_levels.
This is why only root resource named service_levels that  represents all service levels is used.
This commit also implements the unit test additions for the newly introduced resource.
Message-Id: <81ab16fa813b61be117155feea405da6266921e3.1609237687.git.sarna@scylladb.com>
2021-04-12 16:01:04 +02:00
Ivan Prisyazhnyy
0836efd830 tracing: test/boost/tracing: fix use after free
fixes AddressSanitizer: stack-buffer-underflow on address 0x7ffd9a375820 at pc 0x555ac9721b4e bp 0x7ffd9a374e70 sp 0x7ffd9a374620

Backend registry holds a unique pointer to the backend implementation
that must outlive the whole tracing lifetime until the shutdown call.

So it must be catched/moved before the program exits its scope by
passing out the lambda chain.

Regarding deletion of the default destructor: moving object requires
a move constructor (for do_with) that is not implicitly provided if
there is a user-defined object destructor defined even tho its impl
is default.

Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>

Closes #8461
2021-04-12 16:44:07 +03:00
Kamil Braun
7ffb0d826b clustering_order_reader_merger: handle empty readers
The merger could return end-of-stream if some (but not all) of the
underlying readers were empty (i.e. not even returning a
`partition_start`). This could happen in places where it was used
(`time_series_sstable_set::create_single_key_sstable_reader`) if we
opened an sstable which did not have the queried partition but passed
all the filters (specifically, the bloom filter returned a false
positive for this sstable).

The commit also extends the random tests for the merger to include empty
readers and adds an explicit test case that catches this bug (in a
limited scope: when we merge a single empty reader).

It also modifies `test_twcs_single_key_reader_filtering` (regression
test for #8432) because the time where the clustering key filter is
invoked changes (some invocations move from the constructor of the
merger to operator()). I checked manually that it still catches the bug
when I reintroduce it.

Fixes #8445.

Closes #8446
2021-04-12 10:34:52 +03:00
Tomasz Grabiec
305372820d Merge "Make position_in_partition::tri_compare use strong_ordering" from Pavel Emelyanov
There are some users of that tri_comparator which are also
converted to strong_ordering. Most of the code using those
is, in turn, already handling return values interchangeably.

The bound_view::tri_compare, which's used by the guy, is
still returning int.

tests: unit(dev)

* xemul/br-position-tri-compare:
  code: Relax position_in_partition::tri_compare users
  position_in_partition: Convert tri_compare to strong_ordering
  test: Convert clustering_fragment_summary::tri_cmp to strong_ordering
  repair: Convert repair_sync_boundary::tri_compare to strong_ordering
  view: Don't expect int from position_in_partition::tri_compare
2021-04-09 17:54:38 +02:00
Pavel Emelyanov
64074f45ce code: Relax position_in_partition::tri_compare users
There are some pieces left doing res <=> 0 with the
res now being a strong_ordering itself. All these can
be just dropped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-09 18:20:39 +03:00
Pavel Emelyanov
a15f158661 test: Convert clustering_fragment_summary::tri_cmp to strong_ordering
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-09 18:20:39 +03:00
Botond Dénes
3dbb456fba test: mutation_query_test: use local mutation_query() implementation
Add a local `mutation_query()` variant, which only contains the pieces
of logic the test really wants to test: invoking
`mutation_querier::consume_page()` with a `reconcilable_result_builder`.
This allows us to get rid of the now otherwise unused
`mutation_query()`.
2021-04-09 13:40:27 +03:00
Botond Dénes
59ea36731b test: mutation_query_test: use local data_query() implementation
The test only wants to test result size calculation so it doesn't need
the whole `data_query()` logic. Replace the call to `data_query()` with
one to a local alternative which contains just the necessary bits --
invoking `data_querier::consume_page()` with the right result builder.
This allows us get rid of the now otherwise unused `data_query()`.
2021-04-09 13:40:27 +03:00
Pavel Emelyanov
4558eb3afc partition_snapshot_row_cursor: Move cells hash creation to reader
Right now call to .row() method may create hash on row's cells.
It's counterintuitive to see a const method that transparently
changes something it points to. Since the only caller of a row()
who knows whether the hash creation is required is the cache
reader, it's better to move the call to prepare_hash() into it.

Other than making the .row() less surprising this also helps to
get rid of the whole method by the next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-09 12:18:29 +03:00
Pavel Emelyanov
00caf5f219 partition_snapshot_row_cursor: Move read_partition into test
The method in question is test-only helper, there's no
need in keeping it as a part of the API.

Another reason to move is that the method is O(number of
rows) and doesn't preempt while looping, but cursor code
users try hard not to stall the reactor. So even though
this method has a meaningful semantics within the class,
it will better be reinvented if needed in core code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-09 12:16:13 +03:00
Kamil Braun
3687757115 sstables: fix TWCS single key reader sstable filter
The filter passed to `min_position_reader_queue`, which was used by
`clustering_order_reader_merger`, would incorrectly include sstables as
soon as they passed through the PK (bloom) filter, and would include
sstables which didn't pass the PK filter (if they passed the CK
filter). Fortunately this wouldn't cause incorrect data to be returned,
but it would cause sstables to be opened unnecessarily (these sstables
would immediately return eof), resulting in a performance drop. This commit
fixes the filter and adds a regression test which uses statistics to
check how many times the CK filter was invoked.

Fixes #8432.

Closes #8433
2021-04-08 18:03:49 +03:00
Piotr Sarna
8e808a56d2 Merge 'commitlog: Fix race and edge condition in delete_segments' from Calle Wilund
Fixes #8363
Fixes #8376

Delete segements has two issues when running with size-limited
commit log and strict adherence to said limit.

1.) It uses parallel processing, with deferral. This means that
    the disk usage variables it looks at might not be fully valid
    - i.e. we might have already issued a file delete that will
    reduce disk footprint such that a segment could instead be
    recycled, but since vars are (and should) only updated
    _post_ delete, we don't know.
2.) It does not take into account edge conditions, when we only
    delete a single segment, and this segment is the border segment
    - i.e. the one pushing us over the limit, yet allocation is
    desperately waiting for recycling. In this case we should
    allow it to live on, and assume that next delete will reduce
    footprint. Note: to ensure exact size limit, make sure
    total size is a multiple of segment size.

if we had an error in recycling (disk rename?), and no elements
are available, we could have waiters hoping they will get segements.
abort the queue (not permanent, but wakes up waiters), and let them
retry. Since we did deletions instead, disk footprint should allow
for new allocs at least. Or more likely, everything is broken, but
we will at least make more noise.

Closes #8372

* github.com:scylladb/scylla:
  commitlog: Add signalling to recycle queue iff we fail to recycle
  commitlog: Fix race and edge condition in delete_segments
  commitlog: coroutinize delete_segments
  commitlog_test: Add test for deadlock in recycle waiter
2021-04-07 15:13:25 +02:00
Raphael S. Carvalho
8e0a1ca866 sstable_set: Implement compound_sstable_set's create_single_key_sstable_reader()
compound set isn't overriding create_single_key_sstable_reader(), so
default implementation is always called. Although default impl will
provide correct behavior, specialized ones which provides better perf,
which currently is only available for TWCS, were being ignored.

compound set impl of single key reader will basically combine single key
readers of all sets managed by it.

Fixes #8415.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210406205009.75020-1-raphaelsc@scylladb.com>
2021-04-07 12:36:30 +03:00
Calle Wilund
813694b617 commitlog_test: Add test for deadlock in recycle waiter
Not a very good test, mind you. Nothing to verify, just see if
the test times out. But try to make it at least complete for
failure report.
2021-04-06 16:38:14 +00:00
Konstantin Osipov
c83cf1f965 uuid: switch the API to use std::chrono
A follow up for the patch for #7611. This change was requested
during review and moved out of #7611 to reduce its scope.

The patch switches UUID_gen API from using plain integers to
hold time units to units from std::chrono.

For one, we plan to switch the entire code base to std::chrono units,
to ensure type safety. Secondly, using std::chrono units allows to
increase code reuse with template metaprogramming and remove a few
of UUID_gen functions that beceme redundant as a result.

* switch  get_time_UUID(), unix_timestamp(), get_time_UUID_raw(), switch
  min_time_UUID(), max_time_UUID(), create_time_safe() to
  std::chrono
* remove unused variant of from_unix_timestamp()
* remove unused get_time_UUID_bytes(), create_time_unsafe(),
  redundant get_adjusted_timestamp()
* inline get_raw_UUID_bytes()
* collapse to similar implementations of get_time_UUID()
* switch internal constants to std::chrono
* remove unnecessary unique_ptr from UUID_gen::_instance
Message-Id: <20210406130152.3237914-2-kostja@scylladb.com>
2021-04-06 17:12:54 +03:00
Avi Kivity
4739df2cb1 Merge 'cql3: remove linearizations in the write path' from Michał Chojnowski
As a part of the effort of removing big, contiguous buffers from the codebase,
cql3::raw_value should be made fragmented. Unfortunately a straightforward
rewrite to a fragmented buffer type is not possible, because we want
cql3::raw_value to be compatible with cql3::raw_value_view, and we want that
view to be based on fragmented_temporary_buffer::view, so that it can be
used to view data coming directly from seastar without copying.

This patch makes cql3::raw_value fragmented by making cql3::raw_value_view
a `variant` of managed_bytes_view and fragmented_temporary_buffer::view.

Code users which depended on `cql3::raw_value` being `bytes`,
and cql::raw_value_view being `fragmented_temporary_buffer::view` underneath
were adjusted to the new, dual representation, mainly through the
`cql3::raw_value_view::with_value` visitor and deserialization/validation
helpers added to `cql3::raw_value_view`.

The second part of this series gets rid of linearizations occuring when processing
compound types in the CQL layer. This is achieved by storing their elements in
`managed_bytes` instead of `bytes` in the partially deserialized form (`lists::value`
`tuples::value`, etc.) outputting `managed_bytes` instead of `bytes` in functions
which go from the partially deserialized form to the atomic cell format (for frozen
types), and avoiding calling deserialize/serialize on individual elements when
it's not necessary. (It's only necessary for CQLv2, because since CQLv3 the format
on the wire is the same as our internal one).

The above also forces some changes to `expression.cc`, and `restrictions`, mainly because
`IN` clauses store their arguments as `lists` and `tuples`, and the code which handled
this clause expected `bytes`.

After this series, the path from prepared CQL statements to `atomic_cell_or_collection`
is almost completely linearization-free. The last remaining place is `collection_mutation_description`,
where map keys are linearized to `bytes`.

Closes #8160

* github.com:scylladb/scylla:
  cql3: update_parameters: remove unused version of make_cell for bytes_view
  types: collection: remove an unused version of pack_fragmented
  cql3: optimize the deserialization of collections
  cql3: maps, sets: switch the element type from bytes to managed_bytes
  cql3: expression: use managed_bytes instead of bytes where possible
  cql3: expr: expression: make the argument of to_range a forwarding reference
  cql3: don't linearize elements of lists, tuples, and user types
  cql3: values: add const managed_bytes& constructor to raw_value_view
  cql3: output managed_bytes instead of bytes in get_with_protocol_version
  types: collection: add versions of pack for fragmented buffers
  types: add write_collection_{value,size} for managed_bytes_mutable_view
  cql3: tuples, user_types: avoid linearization in from_serialized() and get()
  types: tuple: add build_value_fragmented
  cql3: update_parameters: add make_cell version for managed_bytes_view
  cql3: remove operation::make_*cell
  cql3: values: make raw_value fragmented
  cql3: values: remove raw_value_view::operator==
  cql3: switch users of cql3::raw_value_view to internals-independent API
  cql3: values: add an internals-independent API to raw_value_view
  utils: managed_bytes: add a managed_bytes constructor from FragmentedView
  utils: managed_bytes: add operator<< and to_hex for managed_bytes
  utils: fragment_range: add to_hex
  configure: remove unused link dependencies from UUID_test
2021-04-01 15:21:32 +03:00
Pavel Emelyanov
8bbe2eae5e btree: Convert comparator to <=>
It turned out that all the users of btree can already be converted
to use safer std::strong_ordering. The only meaningful change here
is the btree code itself -- no more ints there.

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210330153648.27049-1-xemul@scylladb.com>
2021-04-01 12:56:08 +03:00
Michał Chojnowski
5984d6b2ce cql3: values: remove raw_value_view::operator==
It's only used in a single test, and there is no reason why it should ever
be used anywhere else. So let's remove it from the public header and move
it to that test.
2021-04-01 10:42:07 +02:00
Michał Chojnowski
b9322a6b71 cql3: switch users of cql3::raw_value_view to internals-independent API
We want to change the internals of cql3::raw_value{_view}.
However, users of cql3::raw_value and cql3::raw_value_view often
use them by extracting the internal representation, which will be different
after the planned change.

This commit prepares us for the change by making all accesses to the value
inside cql3::raw_value(_view) be done through helper methods which don't expose
the internal representation publicly.

After this commit we are free to change the internal representation of
raw_value_{view} without messing up their users.
2021-04-01 10:42:04 +02:00
Michał Chojnowski
4715268e30 utils: managed_bytes: add operator<< and to_hex for managed_bytes
We will need them to replace bytes with managed_bytes in some places in an
upcoming patch.

The change to configure.py is necessary because opearator<< links to to_hex
in bytes.cc.
2021-04-01 10:39:42 +02:00
Piotr Sarna
6de2691bbd sstables,test: remove variables depending on old features
In order to maintain backward compatibility wrt. cluster features,
two boolean variables were kept in sstable writers:
 - correctly_serialize_non_compound_range_tombstones
 - correctly_serialize_static_compact_in_mc

Since these features are assumed to always be present now,
the above variables are no longer needed and can be purged.
2021-03-30 09:37:41 +02:00
Botond Dénes
3c54c990ab test: view_build_test: test_view_update_generator_buffering: fail gracefully
Failures in this test typically happen inside the test consumer object.
These however don't stop the test as the code invoking the consumer
object handles exceptions coming from it. So the test will run to
completion and will fail again when comparing the produced output with
the expected one. This results in distracting failures. The real problem
is not the difference in the output, but the first check that failed,
which is however buried in the noise. To prevent this add an "ok" flag
which is set to false if the consumer fails. In this case the additional
checks are skipped in the end to not generate useless noise.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210326083147.26113-2-bdenes@scylladb.com>
2021-03-29 17:58:28 +03:00
Avi Kivity
a8463cfb37 Merge "reader_permit: signal leaked resources" from Botond
"
When a permit is destroyed we check if it still holds on to any
resources in the destructor. Any resources the permit still holds on are
leaked resources, as users should have released these. Currently we just
invoke `on_internal_error_noexcept()` to handle this, which -- depending
on the configuration -- will result in an error message or an assert. In
the former case, the resources will be leaked for good. This mini-series
fixes this, by signaling back these resources to the semaphore. This
helps avoid an eventual complete dry-up of all semaphore resources and a
subsequent complete shutdown of reads.

Tests: unit(release, debug)
"

* 'reader-permit-signal-leaked-resources/v1' of https://github.com/denesb/scylla:
  reader_permit: signal leaked resources
  test: test_reader_lifecycle_policy: keep semaphores alive until all ops cease
  sstables: generate_summary(): extend the lifecycle of the reader concurrency semaphore
2021-03-29 17:57:31 +03:00
Botond Dénes
9e01c4c667 test: view_build_test: test_view_update_generator_buffering: use separate permit for readers
Said test has two separate logical readers, but they share the same
permit, which is illegal. This didn't cause any problems yet, but soon
the semaphore will start to keep score of active/inactive permits which
will be confused by such sharing, so have them use separate permits.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210326083147.26113-1-bdenes@scylladb.com>
2021-03-29 17:35:51 +03:00
Pavel Solodovnikov
7c229998e8 raft: unit-tests for raft_address_map
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-03-26 20:22:44 +03:00
Botond Dénes
0f1a72ba59 test: test_reader_lifecycle_policy: keep semaphores alive until all ops cease
To ensure the semaphores outlive all permits created as part of the
tests.
2021-03-26 14:22:43 +02:00
Piotr Wojtczak
c1daf2bb24 column_family: Make toppartitions queries more generic
Right now toppartitions can only be invoked on one column family at a time.
This change introduces a natural extension to this functionality,
allowing to specify a list of families.

We provide three ways for filtering in the query parameter "name_list":
    1. A specific column family to include in the form "ks:cf"
    2. A keyspace, telling the server to include all column families in it.
       Specified by omitting the cf name, i.e. "ks:"
    3. All column families, which is represented by an empty list
The list can include any amount of one or both of the 1. and 2. option.

Fixes #4520

Closes #7864
2021-03-24 17:54:05 +02:00
Avi Kivity
3c44445c07 Merge "Introduce off-strategy compaction for repair-based bootstrap and replace" from Raphael
"
Scylla suffers with aggressive compaction after repair-based operation has initiated. That translates into bad latency and slowness for the operation itself.

This aggressiveness comes from the fact that:
1) new sstables are immediately added to the compaction backlog, so reducing bandwidth available for the operation.
2) new sstables are in bad shape when integrated into the main sstable set, not conforming to the strategy invariant.

To solve this problem, new sstables will be incrementally reshaped, off the compaction strategy, until finally integrated into the main set.

The solution takes advantage there's only one sstable per vnode range, meaning sstables generated by repair-based operations are disjoint.

NOTE: off-strategy for repair-based decommission and removenode will follow this series and require little work as the infrastructure is introduced in this series.

Refs #5226.
"

* 'offstrategy_v7' of github.com:raphaelsc/scylla:
  tests: Add unit test for off-strategy sstable compaction
  table: Wire up off-strategy compaction on repair-based bootstrap and replace
  table: extend add_sstable_and_update_cache() for off-strategy
  sstables/compaction_manager: Add function to submit off-strategy work
  table: Introduce off-strategy compaction on maintenance sstable set
  table: change build_new_sstable_list() to accept other sstable sets
  table: change non_staging_sstables() to filter out off-strategy sstables
  table: Introduce maintenance sstable set
  table: Wire compound sstable set
  table: prepare make_reader_excluding_sstables() to work with compound sstable set
  table: prepare discard_sstables() to work with compound sstable set
  table: extract add_sstable() common code into a function
  sstable_set: Introduce compound sstable set
  reshape: STCS: preserve token contiguity when reshaping disjoint sstables
2021-03-22 10:43:13 +02:00
Benny Halevy
f562c9c2f3 test: sstable_datafile_test: tombstone_purge_test: use a longer ttl
As seen in next-3319 unit testing on jenkins
The cell ttl may expire during the test (presuming
that the test machine was overloaded), leading to:
```
INFO  2021-03-21 10:05:23,048 [shard 0] compaction - [Compact tests.tombstone_purge 2fcaf680-8a1c-11eb-b1b9-97020c5d261e] Compacting [/jenkins/workspace/scylla-master/next/scylla/testlog/release/scylla-af8644ec-7f07-4ffe-80bf-6703a942e435/la-17-big-Data.db:level=0:origin=, ]
INFO  2021-03-21 10:05:23,048 [shard 0] compaction - [Compact tests.tombstone_purge 2fcaf680-8a1c-11eb-b1b9-97020c5d261e] Compacted 1 sstables to []. 4kB to 0 bytes (~0% of original) in 0ms = 0 bytes/s. ~128 total partitions merged to 0.
./test/lib/mutation_assertions.hh(108): fatal error: in "tombstone_purge_test": Mutations differ, expected {table: 'tests.tombstone_purge', key: {'id': alpha, token: -7531858254489963}, mutation_partition: {
  rows: [
    {
      cont: true,
      dummy: false,
      position: {
        bound_weight: 0,
      },
      'value': { atomic_cell{1,ts=1616313953,expiry=1616313958,ttl=5} },
    },
  ]
}
}
 ...but got: {table: 'tests.tombstone_purge', key: {'id': alpha, token: -7531858254489963}, mutation_partition: {
  rows: [
    {
      cont: true,
      dummy: false,
      position: {
        bound_weight: 0,
      },
      'value': { atomic_cell{DEAD,ts=1616313953,deletion_time=1616313953} },
    },
  ]
}
}
```

This corresponds to:
```
2395            auto mut2 = make_expiring(alpha, ttl);
2396            auto mut3 = make_insert(beta);
...
2399            auto sst2 = make_sstable_containing(sst_gen, {mut2, mut3});
```

Extend (logical) ttl to 10 seconds to reduce flakiness
due to real-time timing.

Test: sstable_datafile_test(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210321142931.1226850-1-bhalevy@scylladb.com>
2021-03-21 16:42:00 +02:00
Avi Kivity
1e820687eb Merge "reader_concurrency_semaphore: limit non-admitted inactive reads" from Botond
"
Due to bad interaction of recent changes (913d970 and 4c8ab10) inctive
readers that are not admitted have managed to completely fly under the
radar, avoiding any sort of limitation. The reason is that pre-admission
the permits don't forward their resource cost to the semaphore, to
prevent them possibly blocking their own admission later. However this
meant that if such a reader is registered as inactive, it completely
avoids the normal resource based eviction mechanism and can accumulate
without bounds.
The real solution to this is to move the semaphore before the cache and
make all reads pass admission before they get started (#4758). Although
work has been started towards this, it is still a while until it lands.
In the meanwhile this patchset provides a workaround in the form of a
new inactive state, which -- like admitted -- causes the permit to
forward its cost to the semaphore, making sure these un-admitted
inactive reads are accounted for and evicted if there is too much of
them.

Fixes: #8258

Tests: unit(release), dtest(oppartitions_test.py:TestTopPartitions.test_read_by_gause_key_distribution_for_compound_primary_key_and_large_rows_number)
"

* 'reader-concurrency-semaphore-limit-inactive-reads/v4' of https://github.com/denesb/scylla:
  test: mutation_reader_test: add test for permit cleanup
  test: querier_cache_test: add memory based cache eviction test
  reader_permit: add inactive state
  querier: insert(): account immediately evicted querier as resource based eviction
  reader_concurrency_semaphore: fix clear_inactive_reads()
  reader_concurrency_semaphore: make inactive_read_handle a weak reference
  reader_concurrency_semaphore: make evict() noexcept
  reader_concurrency_semaphore: update out-of-date comments
2021-03-21 16:24:54 +02:00
Avi Kivity
a78f43b071 Merge 'tracing: fast slow query tracing' from Ivan Prisyazhnyy
The set of patches introduces a new tracing mode - `fast slow query tracing`. In this mode, Scylla tracks only tracing sessions and omits all tracing events if the tracing context does not have a `full_tracing` state set.

Fixes #2572

Motivation
---

We want to run production systems with that option always enabled so we could always catch slow queries without an overhead. The next step is we are gonna optimize further the costs of having tracing enabled to minimize session context handling overhead to allow it to be as transparent for the end-user as possible.

Fast tracing mode
---

To read the status do

    $ curl -v http://localhost:10000/storage_service/slow_query

To enable fast slow-query tracing

    $ curl -v --request POST http://localhost:10000/storage_service/slow_query\?fast=true\&enable=true

Potential optimizations
---

- remove tracing::begin(lazy_eval)
- replace tracing::begin(string) for enum to remove copying and memory allocations
- merge parameters allocations
- group parameters check for trace context
- delay formatting
- reuse prepared statement shared_ptr instead of both copying it and copying its query

Performance
---

100% cache hits
---

1 Core:

```
$ SCYLLA_HOME=/home/sitano.public/Projects/scylla build/release/scylla --smp 1 --cpuset 7 --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --workdir /home/sitano.public/Projects/scylla --developer-mode 1 --listen-address 0.0.0.0 --api-address 0.0.0.0 --rpc-address 0.0.0.0 --broadcast-rpc-address 172.18.0.1 --broadcast-address 127.0.0.1

./cassandra-stress write n=100000 no-warmup -pop seq=1..100000 -node 127.0.0.1 -log level=verbose -rate threads=1 -mode native cql3

curl --request POST http://localhost:10000/storage_service/slow_query\?fast\=false\&enable\=false
for i in $(seq 5); do
  taskset -c 2,3,4,5 ./cassandra-stress read duration=5m -pop seq=1..100000 -node 127.0.0.1 -log level=verbose -rate threads=4 throttle=30000/s -mode native cql3
done

curl --request POST http://localhost:10000/storage_service/slow_query\?fast\=true\&enable\=true
for i in $(seq 5); do
  taskset -c 2,3,4,5 ./cassandra-stress read duration=5m -pop seq=1..100000 -node 127.0.0.1 -log level=verbose -rate threads=4 throttle=30000/s -mode native cql3
done

curl --request POST http://localhost:10000/storage_service/slow_query\?fast\=false\&enable\=true
for i in $(seq 5); do
  taskset -c 2,3,4,5 ./cassandra-stress read duration=5m -pop seq=1..100000 -node 127.0.0.1 -log level=verbose -rate threads=4 throttle=30000/s -mode native cql3
done
```

  | qps |   |   |  
-- | -- | -- | -- | --
  | baseline | fast, slow | nofast, slow | %[1-fastslow/baseline]
  | 29,018 | 26,468 | 23,591 | 8.79%
  | 28,909 | 26,274 | 23,584 | 9.11%
  | 28,900 | 26,547 | 23,598 | 8.14%
  | 28,921 | 26,669 | 23,596 | 7.79%
  | 28,821 | 26,385 | 23,601 | 8.45%
stdev | 70.24030182 | 150.9678774 | 6.670832032 |  
avg | 28,914 | 26,469 | 23,594 |  
stderr | 0.24% | 0.57% | 0.03% |  
%[avg/baseline] |   | **8.46%** | 18.40% |  

8.46% performance degradation in `fast slow query mode` for pure in-memory workload with minimum traces.
18.40%  performance degradation in `original slow query mode` for pure in-memory workload with minimum traces.

0% cache hits
---

1GB memory, 1 Core:

    $ SCYLLA_HOME=/home/sitano.public/Projects/scylla build/release/scylla --memory 1G --smp 1 --cpuset 7 --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --workdir /home/sitano.public/Projects/scylla --developer-mode 1 --listen-address 0.0.0.0 --api-address 0.0.0.0 --rpc-address 0.0.0.0 --broadcast-rpc-address 172.18.0.1 --broadcast-address 127.0.0.1

2.4GB, 10000000 keys data:

    $ ./cassandra-stress write n=10000000 no-warmup -pop seq=1..10000000 -node 127.0.0.1 -log level=verbose -rate threads=4 -mode native cql3
    $ curl --request POST http://localhost:10000/storage_service/slow_query\?fast\=true\&enable\=true

CASSANDRA_STRESS prepared statements with BYPASS CACHE

    $ taskset -c 2,3,4,5 ./cassandra-stress read duration=5m -pop seq=1..10000000 -node 127.0.0.1 -log level=verbose -rate threads=4 throttle=30000/s -mode native cql3

20000 reads IOPS, 100MB/s from disk

  | qps |   |   |  
-- | -- | -- | -- | --
  | baseline reads | fast, slow reads | %[1-fastslow/baseline] |  
  | 9,575 | 9,054 | 5.44% |  
  | 9,614 | 9,065 | 5.71% |  
  | 9,610 | 9,066 | 5.66% |  
  | 9,611 | 9,062 | 5.71% |  
  | 9,614 | 9,073 | 5.63% |  
stdev | 16.75410397 | 6.892024376 |
avg | 9,605 | 9,064 |
stderr | 0.17% | 0.08% |
%[avg/baseline] |   | **5.63%** |

5.63% performance degradation in `fast slow query mode` for pure on-disk workload with minimum traces.

Closes #8314

* github.com:scylladb/scylla:
  tracing: fast mode unit test
  tracing: rest api for lightweight slow query tracing
  tracing: omit tracing session events and subsessions in fast mode
2021-03-21 12:15:17 +02:00
Avi Kivity
58b7f225ab keys: convert trichotomic comparators to return std::strong_ordering
A trichotomic comparator returning an int an easily be mistaken
for a less comparator as the return types are convertible.

Use the new std::strong_ordering instead.

A caller in cql3's update_parameters.hh is also converted, following
the path of least resistance.

Ref #1449.

Test: unit (dev)

Closes #8323
2021-03-21 09:30:43 +02:00
Raphael S. Carvalho
64d78eae6a tests: Add unit test for off-strategy sstable compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-03-18 16:56:00 -03:00
Raphael S. Carvalho
439e9b6fab table: change build_new_sstable_list() to accept other sstable sets
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-03-18 11:47:49 -03:00
Raphael S. Carvalho
6e95860e09 table: change non_staging_sstables() to filter out off-strategy sstables
SSTables that are off-strategy should be excluded by this function as
it's used to select candidates for regular compaction.
So in addition to only returning candidates from the main set, let's
also rename it to precisely reflect its behavior.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-03-18 11:47:49 -03:00
Raphael S. Carvalho
1e7a444a8b table: Wire compound sstable set
From now own, _sstables  becomes the compound set, and _main_sstables refer
only to the main sstables of the table. In the near future, maintenance
set will be introduced and will also be managed by the compound set.

So add_sstable() and on_compaction_completion() are changed to
explicitly insert and remove sstables from the main set.

By storing compound set in _sstables, functions which used _sstables for
creating reader, computing statistics, etc, will not have to be changed
when we introduce the maintenance set, so code change is a lot minimized
by this approach.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-03-18 11:46:06 -03:00
Raphael S. Carvalho
e4b5f5ba33 sstable_set: Introduce compound sstable set
This new sstable set implementation is useful for combining operation of
multiple sstable sets, which can still be referenced individually via
its shared ptr reference.
It will be used when maintenance set is introduced in table, so a
compound set is required to allow both sets to have their operations
efficiently combined.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-03-18 11:42:49 -03:00
Botond Dénes
ad02f313dd test: mutation_reader_test: add test for permit cleanup
Check that a permit correctly restores the units on the semaphore in
each state it can be destroyed in.
2021-03-18 16:18:22 +02:00
Ivan Prisyazhnyy
f00391af8b tracing: fast mode unit test
Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>
2021-03-18 15:05:09 +02:00
Botond Dénes
c822f0d02a test: querier_cache_test: add memory based cache eviction test
Ensure that the memory consumption of querier cache entries is kept
under the limit.
2021-03-18 14:58:21 +02:00
Botond Dénes
594636ebbf querier: insert(): account immediately evicted querier as resource based eviction
`reader_concurrency_semaphore::register_inactive_read()` drops the
registered inactive read immediately if there is a resource shortage.
This is in effect a resource based eviction, so account it as such in
`querier::insert()`.
2021-03-18 14:57:57 +02:00
Botond Dénes
1a337d0ec1 reader_concurrency_semaphore: fix clear_inactive_reads()
Broken by the move to an intrusive container (9cbbf40), which caused
said method to only clear the container but not destroy the inactive
reads contained therein. This patch restores the previous behaviour and
also adds a call the destructor (to ensure inactive reads are cleaned up
under any circumstances), as well as a unit test.
2021-03-18 14:57:57 +02:00
Piotr Sarna
2509b7dbde Merge 'dht: convert ring_position and decorated_key to std::strong_ordering' from Avi Kivity
As #1449 notes, trichotomic comparators returning int are dangerous as they
can be mistaken for less comparators. This series converts dht::ring_position
and dht::decorated_key, as well as a few closely related downstream types, to
return std::strong_ordering.

Closes #8225

* github.com:scylladb/scylla:
  dht: ring_position, decorated_key: convert tri_comparators to std::strong_ordering
  pager: rephrase misleading comparison check
  test: total_order_checks: prepare for std::strong_ordering
  test: mutation_test: prepare merge_container for std::strong_ordering
  intrusive_array: prepare for std::strong_ordering
  utils: collection-concepts: prepare for std::strong_ordering
2021-03-18 11:51:54 +01:00
Avi Kivity
a5f17b9a2d test: total_order_checks: prepare for std::strong_ordering
Adjust the total_order_check template to work with comparators
returning either int (as a temporary compatibility measure) or
std::strong_ordering (for #1449 safety).
2021-03-18 12:40:05 +02:00