Commit Graph

1823 Commits

Nadav Har'El
c8bb147f84 Merge 'cql3: don't ignore other restrictions when a multi column restriction is present during filtering' from Jan Ciołek
When filtering with a multi-column restriction present, all other restrictions were ignored.
So a query like:
`SELECT * FROM WHERE pk = 0 AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING;`
would ignore the restriction `regular_col = 0`.

This was caused by a bug in the filtering code:
2779a171fc/cql3/selection/selection.cc (L433-L449)

When multi-column restrictions were detected, the code checked whether they were satisfied and returned immediately.
The fix is to return early only when these restrictions are not satisfied; when they are satisfied, the remaining restrictions are checked as well, to ensure all of them hold.
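The fixed control flow can be sketched as a toy model (the function name `row_is_selected` and the representation of restrictions as callables are invented for illustration; this is not the actual selection.cc code):

```cpp
#include <functional>
#include <vector>

// Toy model of the fix (invented names): multi-column restrictions may only
// short-circuit the check on *failure*; on success, the remaining
// restrictions must still be evaluated.
using restriction_check = std::function<bool()>;

bool row_is_selected(const std::vector<restriction_check>& multi_column,
                     const std::vector<restriction_check>& others) {
    for (const auto& r : multi_column) {
        if (!r()) {
            return false; // not satisfied: reject the row immediately
        }
        // The buggy version effectively did `return true;` once the
        // multi-column restrictions passed, ignoring `others`.
    }
    for (const auto& r : others) {
        if (!r()) {
            return false; // e.g. regular_col = 0 not satisfied
        }
    }
    return true;
}
```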

This code was introduced back in 2019, when fixing #3574.
Perhaps back then it was impossible to mix multi-column and regular-column restrictions, so this approach was correct at the time.

Fixes: #6200
Fixes: #12014

Closes #12031

* github.com:scylladb/scylladb:
  cql-pytest: add a reproducer for #12014, verify that filtering multi column and regular restrictions works
  boost/restrictions-test: uncomment part of the test that passes now
  cql-pytest: enable test for filtering combined multi column and regular column restrictions
  cql3: don't ignore other restrictions when a multi column restriction is present during filtering

(cherry picked from commit 2d2034ea28)
2022-11-21 14:02:33 +02:00
Tomasz Grabiec
25d2da08d1 db: range_tombstone_list: Avoid quadratic behavior when applying
Range tombstones are kept in memory (cache/memtable) in
range_tombstone_list. It keeps them deoverlapped, so applying a range
tombstone which covers many range tombstones will erase existing range
tombstones from the list. This operation needs to be exception-safe,
so range_tombstone_list maintains an undo log. This undo log will
receive a record for each range tombstone which is removed. For
exception safety reasons, before pushing an undo log entry, we reserve
space in the log by calling std::vector::reserve(size() + 1). Each such
reserve is O(N), where N is the number of undo log entries, so applying
the whole range tombstone is O(N^2).

This can cause reactor stalls and availability issues when replicas
apply such deletions.

This patch avoids the problem by reserving exponentially increasing
amounts of space. It also switches the container to chunked_vector to
avoid large contiguous allocations.
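The difference between the two reservation strategies can be sketched with plain std::vector (the helper names are invented, and the real patch uses utils::chunked_vector; the up-front reserve exists so the later push cannot throw):

```cpp
#include <cstddef>
#include <vector>

// reserve(size() + 1) before every push requests an exact-fit capacity, so
// (on typical implementations) every push reallocates and copies the whole
// log: O(N) per entry, O(N^2) for N entries.
template <typename T>
void push_with_exact_reserve(std::vector<T>& v, T x) {
    v.reserve(v.size() + 1);
    v.push_back(std::move(x)); // cannot throw: capacity already reserved
}

// Reserving exponentially growing capacity keeps the number of
// reallocations logarithmic, amortizing the copy cost to O(1) per entry,
// while preserving the "reserve first, then push" exception-safety pattern.
template <typename T>
void push_with_geometric_reserve(std::vector<T>& v, T x) {
    if (v.size() == v.capacity()) {
        v.reserve(v.capacity() ? v.capacity() * 2 : 4);
    }
    v.push_back(std::move(x)); // cannot throw: capacity already reserved
}
```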

Fixes #11211

Closes #11215

(cherry picked from commit 7f80602b01)
2022-09-30 00:01:26 +03:00
Botond Dénes
9b1a570f6f sstables: crawling mx-reader: make on_out_of_clustering_range() no-op
Said method currently emits a partition-end. This method is only called
when the last fragment in the stream is a range tombstone change with a
position after all clustered rows. The problem is that
consume_partition_end() is also called unconditionally, resulting in two
partition-end fragments being emitted. The fix is simple: make this
method a no-op; there is nothing to do there.

Also add two tests: one targeted at this bug and another testing the
crawling reader with random mutations generated for a random schema.

Fixes: #11421

Closes #11422

(cherry picked from commit be9d1c4df4)
2022-09-29 23:42:01 +03:00
Piotr Sarna
26ead53304 Merge 'Fix mutation commutativity with shadowable tombstone' from Tomasz Grabiec

This series fixes a lack of mutation associativity, which manifests as
sporadic failures in
row_cache_test.cc::test_concurrent_reads_and_eviction due to differences
between the mutations applied and those read back.

No known production impact.

Refs https://github.com/scylladb/scylladb/issues/11307

Closes #11312

* github.com:scylladb/scylladb:
  test: mutation_test: Add explicit test for mutation commutativity
  test: random_mutation_generator: Workaround for non-associativity of mutations with shadowable tombstones
  db: mutation_partition: Drop unnecessary maybe_shadow()
  db: mutation_partition: Maintain shadowable tombstone invariant when applying a hard tombstone
  mutation_partition: row: make row marker shadowing symmetric

(cherry picked from commit 484004e766)
2022-09-20 23:21:58 +02:00
Tomasz Grabiec
f60bab9471 test: row_cache: Use more narrow key range to stress overlapping reads more
This makes catching issues related to concurrent access of same or
adjacent entries more likely. For example, catches #11239.

Closes #11260

(cherry picked from commit 8ee5b69f80)
2022-09-20 23:21:54 +02:00
Avi Kivity
2eadaad9f7 Merge 'database: evict all inactive reads for table when detaching table' from Botond Dénes
Currently, when detaching a table from the database, we force-evict all queriers for said table. This series broadens the scope of this force-evict to include all inactive reads registered at the semaphore. This ensures that a regular inactive read "forgotten" in the semaphore for any reason will not access a dangling table reference when destroyed later.

Fixes: https://github.com/scylladb/scylladb/issues/11264

Closes #11273

* github.com:scylladb/scylladb:
  querier: querier_cache: remove now unused evict_all_for_table()
  database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table()
  reader_concurrency_semaphore: add evict_inactive_reads_for_table()

(cherry picked from commit afa7960926)
2022-09-02 10:41:22 +03:00
Avi Kivity
856703a85e Merge 'row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy' from Tomasz Grabiec
Scenario:

cache = [
    row(pos=2, continuous=false),
    row(pos=after(2), dummy=true)
]

Scanning read starts, starts populating [-inf, before(2)] from sstables.

row(pos=2) is evicted.

cache = [
    row(pos=after(2), dummy=true)
]

Scanning read finishes reading from sstables.

Refreshes cache cursor via
partition_snapshot_row_cursor::maybe_refresh(), which calls
partition_snapshot_row_cursor::advance_to() because iterators are
invalidated. This advances the cursor to
after(2). no_clustering_row_between(2, after(2)) returns true, so
advance_to() returns true, and maybe_refresh() returns true. This is
interpreted by the cache reader as "the cursor has not moved forward",
so it marks the range as complete, without emitting the row with
pos=2. Also, it marks row(pos=after(2)) as continuous, so later reads
will also miss the row.

The bug is in advance_to(), which uses
no_clustering_row_between(a, b) to determine its result; that predicate
by definition excludes the starting key.

Discovered by row_cache_test.cc::test_concurrent_reads_and_eviction
with reduced key range in the random_mutation_generator (1024 -> 16).

Fixes #11239

Closes #11240

* github.com:scylladb/scylladb:
  test: mvcc: Fix illegal use of maybe_refresh()
  tests: row_cache_test: Add test_eviction_of_upper_bound_of_population_range()
  tests: row_cache_test: Introduce one_shot mode to throttle
  row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy
2022-08-11 16:51:59 +02:00
Benny Halevy
14faa3b6f4 compaction_manager: perform_cleanup, perform_sstable_upgrade: use a lw_shared_ptr for owned token ranges
And completely get rid of the dependency on replica::database.

Also, add respective rest_api tests.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 08:08:11 +03:00
Benny Halevy
e1fe598760 compaction: cleanup, upgrade: use a lw_shared_ptr for owned token ranges
Currently they are copied for the get_sstables function,
so this change reduces copies.

Also, it will allow further decoupling of compaction_manager
from replica::database, by letting the caller of
perform_cleanup and perform_sstable_upgrade get the
owned token ranges from the db and pass them to the perform_*
functions in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 07:57:41 +03:00
Aleksandra Martyniuk
6ea5bc96d7 scrub compaction: return status indicating aborted operations over the rest api

Previously, a user performing a compaction scrub did not know whether
the operation was aborted.

Now, if a compaction scrub is aborted, the return status the user gets
over the REST API is set to 1.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
3a805a9d9b compaction: extract statistics in compaction_result
Statistics from compaction_result are extracted to new struct
compaction_stats and stored as a field of compaction_result.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
ab85dab05d scrub compaction: count validation errors
The number of validation errors encountered during scrub compaction
is counted.
2022-07-29 09:35:20 +02:00
Avi Kivity
e66809d051 Merge 'Memtable flush: wait for sstable count reduction if needed' from Benny Halevy
Called from try_flush_memtable_to_sstable,
maybe_wait_for_sstable_count_reduction will wait for
compaction to catch up with memtable flush if
the bucket to compact is inflated, having too many
sstables.  In that case we don't want to add fuel
to the fire by creating yet another sstable.

Fixes #4116

Closes #10954

* github.com:scylladb/scylla:
  table: Add test where compaction doesn't keep up with flush rate.
  compaction_manager: add maybe_wait_for_sstable_count_reduction
  time_window_compaction_strategy: get_sstables_for_compaction: clean up code
  time_window_compaction_strategy: make get_sstables_for_compaction idempotent
  time_window_compaction_strategy: get_sstables_for_compaction: improve debug messages
  leveled_manifest: pass compaction_counter as const&
2022-07-28 19:11:04 +03:00
Avi Kivity
09a6b93ddf Merge 'logalloc: region: properly track listeners when moved' from Benny Halevy
Currently logalloc::region relies on the boost binomial_heap handle to properly move the listener registration when the region (when derived from dirty_memory_manager_logalloc::size_tracked_region) is moved, as boost::intrusive link hooks do -
hence 81e20ceaab/dirty_memory_manager.cc (L89-L90) does nothing.

Unfortunately, this doesn't work as expected.

This series adds a unit test that verifies the move semantics
and a fix to size_tracked_region and region_group code to make it pass.

Also, "logalloc: region: get_impl might be called on disengaged _impl when moved"
fixes a couple of corner cases where the shared _impl could be dereferenced
while disengaged, and adds a unit test for that as well.

Closes #11141

* github.com:scylladb/scylla:
  logalloc: region: properly track listeners when moved
  logalloc: region_impl: add moved method
  logalloc: region: merge: optimize getting other impl
  logalloc: region: merge: call region_impl::unlisten
  logalloc: region: call unlisten rather than open coding it
  logalloc: region move-ctor: initialize _impl
  logalloc: region: get_impl might be called on disengaged _impl when moved
2022-07-28 15:29:54 +03:00
Mikołaj Sielużycki
e0c6e1ef3c table: Add test where compaction doesn't keep up with flush rate.
The test simulates a situation where 2 threads issue flushes to 2
tables. Both issue small flushes, but one has injected reactor stalls.
This can lead to a situation where lots of small sstables accumulate on
disk, and, if compaction never has a chance to keep up, resources can be
exhausted.

(cherry picked from commit b5684aa96d)
(cherry picked from commit 25407a7e41)
2022-07-28 14:43:33 +03:00
Botond Dénes
26f1295536 Merge 'mutation: Ignore dummy rows when consuming clustering fragments' from Mikołaj Sielużycki
consume_clustering_fragments already ignores dummy rows, but does it in
the wrong place. Currently they're ignored after comparing them with
range tombstones. This change skips them before any useful work is done
with them.

Consider a simplified mutation reversal scenario (ckp is a
clustering key prefix; -1, 0, 1 are bound_weights):

schema_ptr s = schema_builder{"ks", "cf"}
    .with_column("pk", bytes_type, column_kind::partition_key)
    .with_column("ck1", bytes_type, column_kind::clustering_key)
    .build();

Input range tombstone positions:
    {clustered, ckp{}, before}
    {clustered, ckp{1}, after}

Clustering rows:
    {clustered, ckp{2}, equal}
    {clustered, ckp{}, after} // dummy row

During reversal, clustering rows are read backwards, and reversed range
tombstone positions are read forwards (because the range tombstones are
reversed and applied backwards). The read order in the example above is:

Reversed range tombstone positions:
    1: {clustered, ckp{}, before}
    2: {clustered, ckp{1}, before}

Clustering rows read backwards:
    3: {clustered, ckp{}, after} // dummy row
    4: {clustered, ckp{2}, equal}

Then we effectively do the merge part of merge sort, trying to put all
fragments in order according to their positions from the two lists
above. However, the dummy row takes part in the comparison, and it
compares greater than each of the reversed range tombstone positions.
Then we try to emit the clustering row, and only at that point do we
notice it is dummy and should be skipped. The subsequent row with ckp{2}
is compared to the last used range tombstone position, and the fragments
are out of order (in the reversed schema, ckp{2} should come before ckp{1}).

The solution is to move the logic skipping the dummy clustering rows to
the beginning of the loop, so they can be ignored before they're used.
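The shape of the fix can be illustrated with a toy merge (the `entry` type and `merge_positions` are invented; the real code merges position_in_partition values with a schema-aware comparator): dummy entries are skipped at the top of the loop, before they take part in any comparison.

```cpp
#include <cstddef>
#include <vector>

struct entry {
    int pos;
    bool dummy; // dummy rows carry no data and must never be emitted
};

// Merge clustering-row positions with range-tombstone positions, both
// already sorted. The fix: ignore dummy rows *before* any comparison,
// otherwise they can advance the "last used position" and make later
// fragments appear out of order.
std::vector<int> merge_positions(const std::vector<entry>& rows,
                                 const std::vector<int>& rt_positions) {
    std::vector<int> out;
    std::size_t i = 0, j = 0;
    while (i < rows.size() || j < rt_positions.size()) {
        if (i < rows.size() && rows[i].dummy) {
            ++i; // skip dummy rows up front (the fix)
            continue;
        }
        if (j == rt_positions.size() ||
            (i < rows.size() && rows[i].pos <= rt_positions[j])) {
            out.push_back(rows[i++].pos);
        } else {
            out.push_back(rt_positions[j++]);
        }
    }
    return out;
}
```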

Fixes: https://github.com/scylladb/scylla/issues/11147

Closes #11129

* github.com:scylladb/scylla:
  mutation: Add test if mutations are consumed in order
  test: Move validating_consumer to test/lib/mutation_assertions.hh
  mutation: Ignore dummy rows when consuming clustering fragments
2022-07-28 11:18:36 +03:00
Benny Halevy
f6645313d8 logalloc: region: properly track listeners when moved
And add targeted unit tests for that.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 11:17:55 +03:00
Benny Halevy
c7d77e4076 logalloc: region: get_impl might be called on disengaged _impl when moved
First check if _impl is engaged before accessing it
to set its _region = this in the move constructor and
move assignment operator.

Add a unit test for these odd corner cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-28 10:48:58 +03:00
Avi Kivity
2c0932cc41 Merge 'Reduce the amount of per-table metrics' from Amnon Heiman
This series is the first step in the effort to reduce the number of metrics reported by Scylla.
The series focuses on the per-table metrics.

The combination of per-table, per-shard histograms makes the number of metrics in a cluster explode.
The following series uses multiple tools to reduce the number of metrics.
1. Many metrics should be reported only for user tables, but the condition that checked this was not updated when more non-user keyspaces were added.
2. Instead of a histogram per table per shard, report a summary per table per shard and a single histogram per node.
3. Histograms, summaries, and counters will be reported only if they are used (for example, the CAS-related metrics will not be reported for tables that do not use CAS).
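The histogram-to-summary idea in point 2 can be sketched like this (`to_summary` and its types are invented simplifications; the real to_metrics_summary works on Scylla's estimated-histogram types):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct summary_sample {
    uint64_t count; // total number of observations
    double sum;     // approximate sum (upper-bound per bucket)
    double p99;     // estimated 99th percentile
};

// Collapse histogram buckets (parallel arrays of bucket upper bounds and
// per-bucket counts) into a three-number summary, shrinking the number of
// time series exported per table per shard.
summary_sample to_summary(const std::vector<double>& upper_bounds,
                          const std::vector<uint64_t>& counts) {
    uint64_t total = 0;
    double sum = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        total += counts[i];
        sum += counts[i] * upper_bounds[i];
    }
    uint64_t rank = total - total / 100; // rank of the 99th percentile
    uint64_t running = 0;
    double p99 = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        running += counts[i];
        if (total > 0 && running >= rank) {
            p99 = upper_bounds[i]; // first bucket covering the rank
            break;
        }
    }
    return {total, sum, p99};
}
```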

Closes #11058

* github.com:scylladb/scylla:
  Add summary_test
  database: Reduce the number of per-table metrics
  replica/table.cc: Do not register per-table metrics for system
  histogram_metrics_helper.hh: Add to_metrics_summary function
  Unified histogram, estimated_histogram, rates, and summaries
  Split the timed_rate_moving_average into data and timer
  utils/histogram.hh: should_sample should use a bitmask
  estimated_histogram: add missing getter method
2022-07-27 22:01:08 +03:00
Avi Kivity
4438865a26 Merge 'memtable flush error handling' from Benny Halevy
The series unifies memtable flush error handling into table::seal_active_memtable
following up on f6d9d6175f.

The goal here is to prevent an infinite retry loop as in #10498
by aborting on any error that is not bad_alloc.

Fixes #10498

Closes #10691

* github.com:scylladb/scylla:
  test: memtable_test: failed_flush_prevents_writes: notify_soft_pressure only once
  test: memtable_test: failed_flush_prevents_writes: extend error injection
  table: seal_active_memtable: abort if retried for too long
  table: seal_active_memtable: abort on unexpected error
  table: try_flush_memtable_to_sstable: propagate errors to seal_active_memtable
  dirty_memory_manager: flush_when_needed: move error handling to flush_one/seal_active_memtable
  dirty_memory_manager: flush_permit: add has_sstable_write_permit
  dirty_memory_manager: flush_permit: release_sstable_write_permit: mark noexcept
  dirty_memory_manager: flush_permit: make _sstable_write_permit optional
  table: reindent seal_active_memtable
  table: coroutinize seal_active_memtable
  memtable_list: mark functions noexcept
  commitlog: make discard_completed_segments and friends noexcept
  dirty_memory_manager: flush_when_needed: target error handling at flush_one
  database: delete unused seal_delayed_fn_type
  dirty_memory_manager: mark functions noexcept
  memtable: mark functions noexcept
  memtable: memtable_encoding_stats_collector: mark functions noexcept
  encoding_state: mark functions noexcept
  logalloc: mark free functions noexcept
  logalloc: allocating_section: mark functions noexcept
  logalloc: allocating_section: guard: mark constructor noexcept
  logalloc: reclaim_lock: mark functions noexcept
  logalloc: tracker_reclaimer_lock: mark constructor noexcept
  logalloc: mark shard_tracker noexcept
  logalloc: region: mark functions const/noexcept
  logalloc: basic_region_impl: mark functions noexcept
  logalloc: region_impl: mark functions noexcept
  utils: log_heap: mark functions noexcept
  logalloc: region_impl: object_descriptor: mark functions noexcept
  logalloc: region_group: mark functions noexcept
  logalloc: tracker: mark functions const/noexcept
  logalloc: tracker::impl: make region_occupancy and friends const
  logalloc: tracker::impl: occupancy: get rid of reclaiming_lock
  logalloc: tracker::impl: mark functions noexcept
  logalloc: segment: mark functions const / noexcept
  logalloc: segment_pool: add const variant of descriptor method
  logalloc: segment_pool: move descriptor method to class definition
  logalloc: segment_pool: mark functions const/noexcept
  logalloc: segment_pool: delete unused free_or_restore_to_reserve method
  utils: dynamic_bitset: mark functions noexcept
  utils: dynamic_bitset: delete unused members
  logalloc: segment_store, segment_pool: idx_from_segment: get a const segment* in const overload
  logalloc: segment_store, segment_pool: return const segment* from segment_from_idx() const
  logalloc: segment_store: make can_allocate_more_segments const
  logalloc: segment_store: mark functions noexcept
  logalloc: segment_descriptor: mark functions noexcept
  logalloc: occupancy_stats: mark functions noexcept
  min_max_tracker: mark functions noexcept
  gc_clock, db_clock: mark functions noexcept
  dirty_memory_manager: region_group: mark functions noexcept
  dirty_memory_manager: region_group: make simple constructor noexcept
  dirty_memory_manager: region_group_reclaimer mark functions noexcept
  logalloc: lsa_buffer: mark functions noexcept
2022-07-27 19:08:59 +03:00
Amnon Heiman
3658aa9ec2 Add summary_test
This patch adds unit tests for the summary implementation.
2022-07-27 16:58:52 +03:00
Raphael S. Carvalho
0796b8c97a sstables: Enforce disjoint invariant in sstable_run
We know that sstable_run is supposed to contain only disjoint files,
but this assumption can temporarily break when switching strategies,
as TWCS, for example, can incorrectly pick the same run id for
sstables in different windows during segregation. So when switching
from TWCS to ICS, an sstable_run may end up not containing disjoint
files. We should definitely fix TWCS and any other strategy doing
that, but sstable_run should hold disjointness as an actual invariant,
not relax it. Otherwise, we cannot build readers on this assumption,
and more complicated logic would have to be added to merge
overlapping files.
After this patch, sstable_run rejects insertion of a file that would
break the invariant, so the caller has to check for that and push the
file into a different sstable run.
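The invariant check can be sketched with a toy model (the class and key representation are invented; the real sstable_run compares the first/last decorated keys of each member sstable):

```cpp
#include <map>

// Toy sstable_run: member sstables are modeled as closed integer key
// ranges [first, last], kept disjoint. insert() rejects any range that
// would overlap an existing member, so the caller must start a new run
// for the offending sstable.
class toy_sstable_run {
    std::map<int, int> _fragments; // first key -> last key, disjoint
public:
    bool insert(int first, int last) {
        auto next = _fragments.lower_bound(first);
        if (next != _fragments.end() && next->first <= last) {
            return false; // would overlap the following fragment
        }
        if (next != _fragments.begin() && std::prev(next)->second >= first) {
            return false; // would overlap the preceding fragment
        }
        _fragments.emplace(first, last);
        return true;
    }
};
```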

Closes #11116
2022-07-27 14:48:28 +03:00
Benny Halevy
bb9eddc67f test: memtable_test: failed_flush_prevents_writes: notify_soft_pressure only once
Now that memtable flush error handling has been moved entirely
to table::seal_active_memtable, we don't need notify_soft_pressure
to keep the retry going.  The infinite retry loop should
eventually either succeed or die (by isolating the node or aborting)
on its own.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-27 14:06:59 +03:00
Benny Halevy
b5abbb971f test: memtable_test: failed_flush_prevents_writes: extend error injection
Inject errors into all seal_active_memtable distinct error
handling sites.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-27 14:06:59 +03:00
Benny Halevy
f0a597a252 table: try_flush_memtable_to_sstable: propagate errors to seal_active_memtable
And let seal_active_memtable decide about how to handle them
as now all flush error handling logic is implemented there.

In particular, unlike today, sstable write errors will
cause an internal error rather than looping forever.

Also, check for shutdown earlier to ignore errors
like semaphore_broken that might happen when
the table is stopped.

Refs #10498

(The issue will be considered fixed when going
into maintenance mode on write errors rather than
throwing internal error and potentially retrying forever)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-27 14:04:55 +03:00
Mikołaj Sielużycki
9f5655bb97 mutation: Add test if mutations are consumed in order
It explicitly interleaves clustering rows with range tombstones and
ensures the last clustering row is dummy.
2022-07-27 11:22:55 +02:00
Mikołaj Sielużycki
9c43f1266a test: Move validating_consumer to test/lib/mutation_assertions.hh
2022-07-27 11:19:50 +02:00
Botond Dénes
81e20ceaab Merge 'logalloc, dirty_memory_manager: move region_groups to dirty_memory_manager' from Avi Kivity
logalloc manages regions of log-structured allocated memory, and region_groups
containing such regions and other region_groups. region_groups were introduced
for accounting purposes - first to limit the amount of memory in memtables, then to
match the new dirty memory allocation rate with the memtable flushing
rate, so we never hit a situation where the allocation rate exceeds the
flush rate and we exceed our limit.

The problem is that the abstraction is very weak - if we want to change anything
in memtable flush control we'll need to change region_groups too - and also
expensive to maintain.

The solution is to break the abstraction and move region_groups to memtable
dirty memory management code. Instead introduce a new, simpler abstraction,
the region_listener, which communicates changes in region memory consumption
to an external piece of code, which can then choose to do with it what it likes.

The long term plan is to completely remove region_groups and fold them into dirty_memory_manager:
 - make each memtable a region_listener so it gets called back after size changes
 - make memtables inform their dirty_memory_manager about their size, so dirty_memory_manager can decide whether to throttle writes and which memtable to pick for flushing
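The new, simpler abstraction can be sketched as a narrow callback interface (simplified, invented signatures; the real region_listener in the Scylla tree has more hooks):

```cpp
#include <cstddef>

class region; // opaque stand-in for logalloc::region

// region_listener: logalloc reports memory-consumption changes through
// this narrow interface; the policy (accounting, write throttling,
// picking a memtable to flush) lives entirely on the listener side,
// e.g. in dirty_memory_manager.
class region_listener {
public:
    virtual ~region_listener() = default;
    virtual void add(region* r) = 0;
    virtual void del(region* r) = 0;
    virtual void increase_usage(region* r, std::ptrdiff_t delta) = 0;
    virtual void decrease_usage(region* r, std::ptrdiff_t delta) = 0;
};

// A minimal listener that just tracks total bytes across listened regions.
class accounting_listener final : public region_listener {
    std::size_t _total = 0;
public:
    void add(region*) override {}
    void del(region*) override {}
    void increase_usage(region*, std::ptrdiff_t delta) override { _total += delta; }
    void decrease_usage(region*, std::ptrdiff_t delta) override { _total -= delta; }
    std::size_t total() const { return _total; }
};
```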

Closes #10839

* github.com:scylladb/scylla:
  logalloc: drop region_impl public accessors
  logalloc, dirty_memory_manager: move size-tracking binomial heap out of logalloc
  logalloc: relax lifetime rules around region_listener
  logalloc, dirty_memory_manager: move region_group and associated code
  logalloc: expose tracker_reclaimer_lock
  logalloc: reimplement tracker_reclaim_lock to avoid using hidden classes
  logalloc: reduce friendship between region and region_group
  logalloc: decouple region_group from region
  memtable: stop using logalloc::region::group() to test for flushed memtables
2022-07-26 17:08:37 +03:00
Nadav Har'El
cb8a67dc98 Merge 'Allow materialized views to be synchronous' from Piotr Sarna
This pull request introduces a "synchronous mode" for global views. In this mode, all view updates are applied synchronously as if the view was local.

Marking a view as synchronous can be done using `CREATE MATERIALIZED VIEW` and `ALTER MATERIALIZED VIEW`. E.g.:
```cql
ALTER MATERIALIZED VIEW ks.v WITH synchronous_updates = true;
```

Marking a view as synchronous is implemented using tags (originally used by Alternator). No big modifications to the view code were needed.

Fixes: https://github.com/scylladb/scylla/issues/10545

Closes #11013

* github.com:scylladb/scylla:
  cql-pytest: extend synchronous mv test with new cases
  cql-pytest: allow extra parameters in new_materialized_view
  docs: add a paragraph on view synchronous updates
  test/boost/cql_query_test: add test setting synchronous updates property
  test: cql-pytest: add a test for synchronous mode materialized views
  db: view: react to synchronous updates tag
  cql3: statements: cf_prop_defs: apply synchronous updates tag
  alternator, db: move the tag code to db/tags
  cql3: statements: add a synchronous_updates property
2022-07-26 15:42:51 +03:00
Avi Kivity
2cb5f79e9d logalloc, dirty_memory_manager: move size-tracking binomial heap out of logalloc
The region_group mechanism used an intrusive heap handle embedded in
logalloc::region to allow region_group:s to track the largest region. But
with region_group moved out of logalloc, the handle is out of place.

Move it out, introducing a new intermediate class size_tracked_region
to hold the heap handle. We might eventually merge the new class into
memtable (which derives from it), but that requires a large rearrangement
of unit tests, so defer that.
2022-07-26 11:12:10 +03:00
Avi Kivity
ee720fa23b logalloc: relax lifetime rules around region_listener
Currently, a region_listener is added during construction and removed
during destruction. This was done to mimic the old region(region_group&)
constructor, as region_listener replaces region_group.

However, this makes moving the binomial heap handle outside logalloc
difficult. The natural place for the handle is in a derived class
of logalloc::region (e.g. memtable), but members of this derived class
will be destroyed earlier than the logalloc::region here. We could play
tricks with an earlier base class, but it's better to just decouple
region lifecycle from listener lifecycle.

Do that by adding listen()/unlisten() methods. Some small awkwardness
remains in that merge() implicitly unlistens (see comment in
region::unlisten).

Unit tests are adjusted.
2022-07-26 11:12:10 +03:00
Avi Kivity
fbe8ea7727 logalloc, dirty_memory_manager: move region_group and associated code
region_group is an abstraction that allows accounting for groups of
regions, but the cost/benefit ratio of maintaining the abstraction
is poor. Each time we need to change the decision algorithm of memtable
flushing (admittedly rarely), we need to distill that into an abstraction
for region_groups and then use it. An example is virtual region groups:
we wanted to account for partially flushed memtables and had to
invent virtual region groups to stand in their place.

Rather than continuing to invest in the abstraction, break it now
and move it to the memtable dirty memory manager which is responsible
for making those decisions. The relevant code is moved to
dirty_memory_manager.hh and dirty_memory_manager.cc (new file), and
a new unit test file is added as well.

A downside of the change is that unit testing will be more difficult.
2022-07-26 11:12:10 +03:00
Michał Sala
c7b78cfd81 test/boost/cql_query_test: add test setting synchronous updates property
The test checks if a synchronous_updates property can be set via ALTER
MATERIALIZED VIEW or CREATE MATERIALIZED VIEW statements.
2022-07-25 09:53:33 +02:00
Avi Kivity
fd663bcb94 cql3: util: change where clause utilities to accept a single expression rather than a vector of terms
Conversion to terms happens internally via boolean_factors().
2022-07-22 20:14:48 +03:00
Avi Kivity
a5dd588465 cql3: statement_restrictions: accept a single expression rather than a vector
Move closer to the goal of accepting a generic expression for WHERE
clause by accepting a generic expression in statement_restrictions. The
various callers will synthesize it from a vector of terms.
2022-07-22 20:14:48 +03:00
Avi Kivity
8085b9f57a cql3: expr: add boolean_factors() function to factorize an expression
When analyzing a WHERE clause, we want to separate individual
factors (usually relations), and later partition them into
partition key, clustering key, and regular column relations. The
first step is separation, for which this helper is added.

Currently, it is not required since the grammar supplies the
expression in separated form, but this will not work once it is
relaxed to allow any expression in the WHERE clause.
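The factorization can be sketched on a toy expression type (all names here are invented; the real boolean_factors() operates on cql3::expr::expression and its conjunction node):

```cpp
#include <memory>
#include <string>
#include <variant>
#include <vector>

// Toy WHERE-clause expression: either an atomic relation (kept as text
// here) or a conjunction of sub-expressions.
struct expr;
struct atom { std::string text; };
struct conjunction { std::vector<std::shared_ptr<expr>> children; };
struct expr { std::variant<atom, conjunction> v; };

std::shared_ptr<expr> make_atom(std::string s) {
    return std::make_shared<expr>(expr{atom{std::move(s)}});
}
std::shared_ptr<expr> make_and(std::vector<std::shared_ptr<expr>> cs) {
    return std::make_shared<expr>(expr{conjunction{std::move(cs)}});
}

static void collect(const std::shared_ptr<expr>& e,
                    std::vector<std::string>& out) {
    if (const auto* c = std::get_if<conjunction>(&e->v)) {
        for (const auto& child : c->children) {
            collect(child, out); // recurse into nested ANDs
        }
    } else {
        out.push_back(std::get<atom>(e->v).text);
    }
}

// Flatten nested ANDs into the list of individual factors, ready to be
// partitioned into partition-key, clustering-key, and regular-column
// restrictions.
std::vector<std::string> boolean_factors(const std::shared_ptr<expr>& e) {
    std::vector<std::string> out;
    collect(e, out);
    return out;
}
```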

A unit test is added.
2022-07-22 20:14:48 +03:00
Avi Kivity
13a64d8ab2 Merge 'Remove all remaining restrictions classes' from Jan Ciołek
This PR removes all code that used classes `restriction`, `restrictions` and their children.

There were two fields in `statement_restrictions` that needed to be dealt with: `_clustering_columns_restrictions` and `_nonprimary_key_restrictions`.

Each function was reimplemented to operate on the new expression representation, and eventually these fields weren't needed anymore.

After that the restriction classes weren't used anymore and could be deleted as well.

Now all of the code responsible for analyzing WHERE clause and planning a query works on expressions.

Closes #11069

* github.com:scylladb/scylla:
  cql3: Remove all remaining restrictions code
  cql3: Move a function from restrictions class to the test
  cql3: Remove initial_key_restrictions
  cql3: expr: Remove convert_to_restriction
  cql3: Remove _new from _new_nonprimary_key_restrictions
  cql3: Remove _nonprimary_key_restrictions field
  cql3: Reimplement uses of _nonprimary_key_restrictions using expression
  cql3: Keep a map of single column nonprimary key restrictions
  cql3: Remove _new from _new_clustering_columns_restrictions
  cql3: Remove _clustering_columns_restrictions from statement_restrictions
  cql3: Use a variable instead of dynamic cast
  cql3: Use the new map of single column clustering restrictions
  cql3: Keep a map of single column clustering key restrictions
  cql3: Return an expression in get_clustering_columns_restrctions()
  cql3: Reimplement _clustering_columns_restrictions->has_supporting_index()
  cql3: Don't create single element conjunction
  cql3: Add expr::index_supports_some_column
  cql3: Reimplement has_unrestricted_components()
  cql3: Reimplement _clustering_columns_restrictions->need_filtering()
  cql3: Reimplement num_prefix_columns_that_need_not_be_filtered
  cql3: Use the new clustering restrictions field instead of ->expression
  cql3: Reimplement _clustering_columns_restrictions->size() using expressions
  cql3: Reimplement _clustering_columns_restrictions->get_column_defs() using expressions
  cql3: Reimplement _clustering_columns_restrictions->is_all_eq() using expressions
  cql3: expr: Add has_only_eq_binops function
  cql3: Reimplement _clustering_columns_restrictions->empty() using expressions
2022-07-20 18:01:15 +03:00
Jan Ciolek
599bcd6ea7 cql3: Remove all remaining restrictions code
The classes restriction, restrictions and their children
aren't used anywhere now and can be safely removed.

Some includes need to be modified for the code to compile.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-20 09:10:31 +02:00
Jan Ciolek
bff0b87c18 cql3: Move a function from restrictions class to the test
statement_restrictions_test uses a function that is defined
in multi_column_restriction.hh.

This file will be removed soon, so for the test to keep working,
the function is moved to the test source.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2022-07-20 09:10:31 +02:00
Botond Dénes
6e20cb3255 Merge 'database_test: test_truncate_without_snapshot_during_writes: apply mutation on the correct shard' from Benny Halevy
Currently, all the mutations this test generates are applied on shard 0.
In rare cases, this may lead to the following crash, when the flushed
sstable doesn't contain any key that belongs to the current shard,
as seen in https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1390/artifact/testlog/x86_64/dev/database_test.test_truncate_without_snapshot_during_writes.114.log
```
WARN  2022-07-17 17:41:36,630 [shard 0] sstable - create_sharding_metadata: range=[{-468459073612751032, pk{00046b657930}}, {-468459073612751032, pk{00046b657930}}] has no intersection with shard=0 first_key={key: pk{00046b657930}, token:-468459073612751032} last_key={key: pk{00046b657930}, token:-468459073612751032} ranges_single_shard=[] ranges_all_shards={{1, {[{-468459073612751032, pk{00046b657930}}, {-468459073612751032, pk{00046b657930}}]}}}
ERROR 2022-07-17 17:41:36,630 [shard 0] table - failed to write sstable /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db: std::runtime_error (Failed to generate sharding metadata for /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db)
ERROR 2022-07-17 17:41:36,631 [shard 0] table - Memtable flush failed due to: std::runtime_error (Failed to generate sharding metadata for /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db). Aborting, at 0x329e28e 0x329e780 0x329ea88 0xf5bc69 0xf956b1 0x3196dc4 0x3198037 0x319742a 0x32be2e4 0x32bd8e1 0x32ba01c 0x317f97d /lib64/libpthread.so.0+0x92a4 /lib64/libc.so.6+0x100322
```

Instead, generate random keys and apply them on their
owning shard, and truncate all database shards.
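
The ownership rule behind the fix can be sketched as follows (a simplified model, assumed for illustration; Scylla's actual shard_of also ignores a configurable number of most-significant token bits):

```python
# Simplified sketch of token-based shard ownership (not Scylla's exact
# algorithm): every key hashes to a 64-bit token, and each token is
# owned by exactly one shard, so a mutation must be applied on the
# shard that owns its key's token.
def shard_of(token: int, shard_count: int) -> int:
    # Map the token range [0, 2**64) proportionally onto shards.
    return (token * shard_count) >> 64

# A flush on shard S may only contain keys whose tokens map to S;
# applying every mutation on shard 0 violates that for most tokens.
shards = [shard_of(t, 4) for t in (0, 2**62, 2**63, 2**64 - 1)]
print(shards)  # [0, 1, 2, 3]
```

This is why the test now generates random keys and routes each mutation to its owning shard instead of applying everything on shard 0.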

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11066

* github.com:scylladb/scylla:
  database_test: test_truncate_without_snapshot_during_writes: apply mutation on the correct shard
  table: try_flush_memtable_to_sstable: consume: close reader on error
2022-07-20 09:06:07 +03:00
Avi Kivity
5a30f9b789 Merge 'Distributed aggregate query' from Michał Jadwiszczak
This PR extends #9209. It consists of 2 main points:

To enable parallelization of user-defined aggregates, a reduction function was added to the UDA definition. The reduction function is optional; it has to be a scalar function that takes two arguments of the UDA's state type and returns the UDA's state.

All currently implemented native aggregates got a reducible counterpart, which returns its state as the final result so it can be merged with other partial results. Hence all native aggregates can now be distributed.

Tested on a local 3-node cluster built from current master, with `node1` updated to this branch; nodes accessed with `ccm <node-name> cqlsh`.

I've tested the following from both the old and the new node:
- creating UDA with reduce function - not allowed
- selecting count(*) - distributed
- selecting other aggregate function - not distributed
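
The map-reduce shape that the reduce function enables can be sketched as follows (a hypothetical Python model of the flow, not Scylla's implementation; an avg-style aggregate is assumed):

```python
# Hypothetical sketch of why a reduce function makes an aggregate
# distributable: each node accumulates a partial state with the state
# function, the coordinator merges partial states with the reduce
# function, and the final function turns the merged state into the
# result.
def state_func(state, value):   # runs on each node, once per row
    total, count = state
    return (total + value, count + 1)

def reduce_func(a, b):          # merges two partial states
    return (a[0] + b[0], a[1] + b[1])

def final_func(state):          # produces the final result
    total, count = state
    return total / count if count else None

# Per-node partial aggregation over disjoint row sets...
partitions = [[1, 2, 3], [4, 5], [6]]
states = []
for rows in partitions:
    s = (0, 0)
    for v in rows:
        s = state_func(s, v)
    states.append(s)

# ...then coordinator-side reduction of the partial states.
merged = states[0]
for s in states[1:]:
    merged = reduce_func(merged, s)
print(final_func(merged))  # 3.5
```

Without `reduce_func`, every row would have to stream through a single state on the coordinator, which is why aggregates lacking a reduction function cannot be distributed.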

Fixes: #10224

Closes #10295

* github.com:scylladb/scylla:
  test: add tests for parallelized aggregates
  test: cql3: Add UDA REDUCEFUNC test
  forward_service: enable multiple selection
  forward_service: support UDA and native aggregate parallelization
  cql3:functions: Add cql3::functions::functions::mock_get()
  cql3: selection: detect parallelize reduction type
  db,cql3: Move part of cql3's function into db
  selection: detect if selectors factory contains only simple selectors
  cql3: reducible aggregates
  DB: Add `scylla_aggregates` system table
  db,gms: Add SCYLLA_AGGREGATES schema features
  CQL3: Add reduce function to UDA
  gms: add UDA_NATIVE_PARALLELIZED_AGGREGATION feature
2022-07-19 19:05:19 +03:00
Benny Halevy
1c26d49fba database_test: test_truncate_without_snapshot_during_writes: apply mutation on the correct shard
Currently, all the mutations this test generates are applied on shard 0.
In rare cases, this may lead to the following crash, when the flushed
sstable doesn't contain any key that belongs to the current shard,
as seen in https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1390/artifact/testlog/x86_64/dev/database_test.test_truncate_without_snapshot_during_writes.114.log
```
WARN  2022-07-17 17:41:36,630 [shard 0] sstable - create_sharding_metadata: range=[{-468459073612751032, pk{00046b657930}}, {-468459073612751032, pk{00046b657930}}] has no intersection with shard=0 first_key={key: pk{00046b657930}, token:-468459073612751032} last_key={key: pk{00046b657930}, token:-468459073612751032} ranges_single_shard=[] ranges_all_shards={{1, {[{-468459073612751032, pk{00046b657930}}, {-468459073612751032, pk{00046b657930}}]}}}
ERROR 2022-07-17 17:41:36,630 [shard 0] table - failed to write sstable /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db: std::runtime_error (Failed to generate sharding metadata for /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db)
ERROR 2022-07-17 17:41:36,631 [shard 0] table - Memtable flush failed due to: std::runtime_error (Failed to generate sharding metadata for /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db). Aborting, at 0x329e28e 0x329e780 0x329ea88 0xf5bc69 0xf956b1 0x3196dc4 0x3198037 0x319742a 0x32be2e4 0x32bd8e1 0x32ba01c 0x317f97d /lib64/libpthread.so.0+0x92a4 /lib64/libc.so.6+0x100322
```

Instead, generate random keys and apply them on their
owning shard, and truncate all database shards.

Fixes #11076

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-19 16:55:11 +03:00
Pavel Emelyanov
a56e2c83f3 sstables: Keep priority class on sstable_directory
The current code accepts the priority class as an argument to the
various functions that need it, and all callers use the streaming
class. Upcoming patches will sometimes need to use the default class,
which would require heavy patching of the distributed loader. Things
get simpler if the priority class is kept on sstable_directory from
the start.

This change also simplifies the ongoing effort on unification of sched
and IO classes.
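
The design change can be sketched as follows (a hypothetical Python model, not the actual C++ code; the class and method names are illustrative only):

```python
# Hypothetical sketch of the design change: instead of threading an I/O
# priority class through every call (and patching every caller when a
# different class is needed), the directory captures the class once at
# construction and uses it for all subsequent work.
class SstableDirectory:
    def __init__(self, path: str, io_priority: str):
        self.path = path
        self.io_priority = io_priority  # fixed when the directory starts

    def process_sstable(self, name: str) -> tuple[str, str]:
        # All I/O issued here inherits the stored priority class.
        return (f"{self.path}/{name}", self.io_priority)

streaming_dir = SstableDirectory("/var/lib/scylla", "streaming")
print(streaming_dir.process_sstable("me-1-big-Data.db"))
```

Callers that need a different class simply construct the directory with it, rather than passing it down through every intermediate function.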

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-07-19 12:14:41 +03:00
Jadw1
7497fda370 test: add tests for parallelized aggregates 2022-07-18 15:25:42 +02:00
Botond Dénes
9afd2dc428 Merge 'Make compaction manager switch to table abstraction ' from Raphael "Raph" Carvalho
This work gets us a step closer to compaction groups.

Everything in the compaction layer except compaction_manager was converted to table_state.

After this work, we can start implementing compaction groups, as each group will be represented by its own table_state. User-triggered operations that span the entire table, not only a group, can be done by calling the manager operation on behalf of each group and then merging the results, if any.
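
The abstraction can be sketched as follows (a hypothetical Python model, not the actual C++ interface; names are illustrative only):

```python
# Hypothetical sketch of the abstraction: the compaction manager talks
# to a narrow table_state interface rather than the full table type, so
# a table can later be split into compaction groups, each exposing its
# own table_state to the same manager code.
from abc import ABC, abstractmethod

class TableState(ABC):
    @abstractmethod
    def get_sstables(self) -> list: ...
    @abstractmethod
    def is_auto_compaction_disabled_by_user(self) -> bool: ...

class CompactionManager:
    def submit(self, t: TableState) -> list:
        # The manager never touches a concrete table; any object
        # implementing TableState (a whole table or one group) works.
        if t.is_auto_compaction_disabled_by_user():
            return []
        return list(t.get_sstables())

class OneGroup(TableState):
    def __init__(self, sstables, disabled=False):
        self._sstables, self._disabled = sstables, disabled
    def get_sstables(self):
        return self._sstables
    def is_auto_compaction_disabled_by_user(self):
        return self._disabled

mgr = CompactionManager()
print(mgr.submit(OneGroup(["sst1", "sst2"])))  # ['sst1', 'sst2']
```

A table-wide operation can then call the manager once per group and merge the results, as the description above outlines.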

Closes #11028

* github.com:scylladb/scylla:
  compaction: remove forward declaration of replica::table
  compaction_manager: make add() and remove() switch to table_state
  compaction_manager: make run_custom_job() switch to table_state
  compaction_manager: major: switch to table_state
  compaction_manager: scrub: switch to table_state
  compaction_manager: upgrade: switch to table_state
  compaction: table_state: add get_sstables_manager()
  compaction_manager: cleanup: switch to table_state
  compaction_manager: offstrategy: switch to table_state()
  compaction_manager: rewrite_sstables(): switch to table_state
  compaction_manager: make run_with_compaction_disabled() switch to table_state
  compaction_manager: compaction_reenabler: switch to table_state
  compaction_manager: make submit(T) switch to table_state
  compaction_manager: task: switch to table_state
  compaction: table_state: Add is_auto_compaction_disabled_by_user()
  compaction: table_state: Add on_compaction_completion()
  compaction: table_state: Add make_sstable()
  compaction_manager: make can_proceed switch to table_state
  compaction_manager: make stop compaction procedures switch to table_state
  compaction_manager: make get_compactions() switch to table_state
  compaction_manager: change task::update_history() to use table_state instead
  compaction_manager: make can_register_compaction() switch to table_state
  compaction_manager: make get_candidates() switch to table_state
  compaction_manager: make propagate_replacement() switch to table_state
  compaction: Move table::in_strategy_sstables() and switch to table_state
  compaction: table_state: Add maintenance sstable set
  compaction_manager: make has_table_ongoing_compaction() switch to table_state
  compaction_manager: make compaction_disabled() switch to table_state
  compaction_manager: switch to table_state for mapping of compaction_state
  compaction_manager: move task ctor into source
2022-07-18 15:18:29 +03:00
Benny Halevy
bbbbea65fb database: clear_snapshot: remove dropped table directory when it has no remaining snapshots
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
c70a675d77 database: clear_snapshot: make it a coroutine and use thread
and use an async thread around `directory_lister`
rather than `lister::scan_dir` to simplify the implementation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
e710fe527c database_test: add clear_multiple_snapshots test
Based on the `clear_snapshot` test.

Test with multiple snapshots and different
combinations of parameters to database::clear_snapshot.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
ae3b1b5a64 database_test: drop_table_with_snapshots: test auto_snapshot
Refactor test_drop_table_with_auto_snapshot out of
drop_table_with_snapshots, adding an auto_snapshot param that
controls how to configure the cql_test_env's db::config::auto_snapshot,
so we can test both cases: auto_snapshot enabled and disabled.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00
Benny Halevy
af6805dd75 database_test: populate_from_quarantine_works: pass optional db:config to do_with_some_data
Instead of just `tmpdir_for_data`, so we can easily set auto_snapshot
for `drop_table_with_snapshots` in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-07-17 14:33:34 +03:00