scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 03:56:42 +00:00

Author	SHA1	Message	Date
Nadav Har'El	c8bb147f84	Merge 'cql3: don't ignore other restrictions when a multi column restriction is present during filtering' from Jan Ciołek When filtering with multi column restriction present all other restrictions were ignored. So a query like: `SELECT * FROM WHERE pk = 0 AND (ck1, ck2) < (0, 0) AND regular_col = 0 ALLOW FILTERING;` would ignore the restriction `regular_col = 0`. This was caused by a bug in the filtering code: `2779a171fc/cql3/selection/selection.cc (L433-L449)` When multi column restrictions were detected, the code checked if they are satisfied and returned immediately. This is fixed by returning only when these restrictions are not satisfied. When they are satisfied the other restrictions are checked as well to ensure all of them are satisfied. This code was introduced back in 2019, when fixing #3574. Perhaps back then it was impossible to mix multi column and regular columns and this approach was correct. Fixes: #6200 Fixes: #12014 Closes #12031 * github.com:scylladb/scylladb: cql-pytest: add a reproducer for #12014, verify that filtering multi column and regular restrictions works boost/restrictions-test: uncomment part of the test that passes now cql-pytest: enable test for filtering combined multi column and regular column restrictions cql3: don't ignore other restrictions when a multi column restriction is present during filtering (cherry picked from commit `2d2034ea28`)	2022-11-21 14:02:33 +02:00
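The control-flow change in the commit above can be sketched in C++ as follows. This is only an illustration: the `row` struct and every function name are hypothetical and do not come from `selection.cc`. The restrictions mirror the example query.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical row layout matching the example query; not Scylla's types.
struct row {
    int pk;
    int ck1;
    int ck2;
    int regular_col;
};

// (ck1, ck2) < (0, 0), compared lexicographically like a tuple.
bool multi_column_satisfied(const row& r) {
    return std::make_tuple(r.ck1, r.ck2) < std::make_tuple(0, 0);
}

// pk = 0 AND regular_col = 0
bool other_restrictions_satisfied(const row& r) {
    return r.pk == 0 && r.regular_col == 0;
}

// Behaviour described as buggy: the multi-column result is returned
// immediately, so the remaining restrictions are never evaluated.
bool accepts_before_fix(const row& r) {
    return multi_column_satisfied(r);
}

// Behaviour described by the fix: return early only on failure,
// otherwise continue with the remaining restrictions.
bool accepts_after_fix(const row& r) {
    if (!multi_column_satisfied(r)) {
        return false;
    }
    return other_restrictions_satisfied(r);
}

// A row that passes the tuple restriction but has regular_col != 0.
void example() {
    row r{0, -1, 5, 7};
    assert(accepts_before_fix(r));   // wrongly accepted
    assert(!accepts_after_fix(r));   // correctly filtered out
}
```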
Tomasz Grabiec	25d2da08d1	db: range_tombstone_list: Avoid quadratic behavior when applying Range tombstones are kept in memory (cache/memtable) in range_tombstone_list. It keeps them deoverlapped, so applying a range tombstone which covers many range tombstones will erase existing range tombstones from the list. This operation needs to be exception-safe, so range_tombstone_list maintains an undo log. This undo log will receive a record for each range tombstone which is removed. For exception safety reasons, before pushing an undo log entry, we reserve space in the log by calling std::vector::reserve(size() + 1). This is O(N) where N is the number of undo log entries. Therefore, the whole application is O(N^2). This can cause reactor stalls and availability issues when replicas apply such deletions. This patch avoids the problem by reserving exponentially increasing amount of space. Also, to avoid large allocations, switches the container to chunked_vector. Fixes #11211 Closes #11215 (cherry picked from commit `7f80602b01`)	2022-09-30 00:01:26 +03:00
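The reservation strategy described above can be illustrated with a small C++ sketch that counts how often the undo log's capacity changes. It uses a plain `std::vector<int>` as a stand-in for the undo log; the function names are hypothetical, and the real patch additionally switches to `chunked_vector`, which is not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Make room for one more entry, doubling the capacity when it is exhausted.
void reserve_for_one_more(std::vector<int>& log) {
    if (log.size() < log.capacity()) {
        return;  // space already reserved, nothing to do
    }
    log.reserve(std::max<std::size_t>(1, log.capacity() * 2));
}

// Push n entries, reserving space before each push, and count how many
// times the capacity changed (each change implies an O(size) reallocation).
std::size_t count_reallocations(std::size_t n, bool exponential) {
    std::vector<int> log;
    std::size_t reallocations = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t before = log.capacity();
        if (exponential) {
            reserve_for_one_more(log);
        } else {
            log.reserve(log.size() + 1);  // the pattern the commit replaces
        }
        if (log.capacity() != before) {
            ++reallocations;
        }
        assert(log.size() < log.capacity());  // push_back cannot reallocate now
        log.push_back(static_cast<int>(i));
    }
    return reallocations;
}
```

With doubling, the number of reallocations grows logarithmically with the number of entries, so the total copying work stays linear. How often `reserve(size() + 1)` reallocates depends on the standard library, but implementations that allocate exactly the requested capacity reallocate on every call.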
Botond Dénes	9b1a570f6f	sstables: crawling mx-reader: make on_out_of_clustering_range() no-op Said method currently emits a partition-end. This method is only called when the last fragment in the stream is a range tombstone change with a position after all clustered rows. The problem is that consume_partition_end() is also called unconditionally, resulting in two partition-end fragments being emitted. The fix is simple: make this method a no-op, there is nothing to do there. Also add two tests: one targeted to this bug and another one testing the crawling reader with random mutations generated for random schema. Fixes: #11421 Closes #11422 (cherry picked from commit `be9d1c4df4`)	2022-09-29 23:42:01 +03:00
Piotr Sarna	26ead53304	Merge 'Fix mutation commutativity with shadowable tombstone' from Tomasz Grabiec This series fixes lack of mutation associativity which manifests as sporadic failures in row_cache_test.cc::test_concurrent_reads_and_eviction due to differences in mutations applied and read. No known production impact. Refs https://github.com/scylladb/scylladb/issues/11307 Closes #11312 * github.com:scylladb/scylladb: test: mutation_test: Add explicit test for mutation commutativity test: random_mutation_generator: Workaround for non-associativity of mutations with shadowable tombstones db: mutation_partition: Drop unnecessary maybe_shadow() db: mutation_partition: Maintain shadowable tombstone invariant when applying a hard tombstone mutation_partition: row: make row marker shadowing symmetric (cherry picked from commit `484004e766`)	2022-09-20 23:21:58 +02:00
Tomasz Grabiec	f60bab9471	test: row_cache: Use more narrow key range to stress overlapping reads more This makes catching issues related to concurrent access of same or adjacent entries more likely. For example, catches #11239. Closes #11260 (cherry picked from commit `8ee5b69f80`)	2022-09-20 23:21:54 +02:00
Avi Kivity	2eadaad9f7	Merge 'database: evict all inactive reads for table when detaching table' from Botond Dénes Currently, when detaching the table from the database, we force-evict all queriers for said table. This series broadens the scope of this force-evict to include all inactive reads registered at the semaphore. This ensures that any regular inactive read "forgotten" for any reason in the semaphore, will not end up in said readers accessing a dangling table reference when destroyed later. Fixes: https://github.com/scylladb/scylladb/issues/11264 Closes #11273 * github.com:scylladb/scylladb: querier: querier_cache: remove now unused evict_all_for_table() database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table() reader_concurrency_semaphore: add evict_inactive_reads_for_table() (cherry picked from commit `afa7960926`)	2022-09-02 10:41:22 +03:00
Avi Kivity	856703a85e	Merge 'row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy' from Tomasz Grabiec Scenario: cache = [ row(pos=2, continuous=false), row(pos=after(2), dummy=true) ] Scanning read starts, starts populating [-inf, before(2)] from sstables. row(pos=2) is evicted. cache = [ row(pos=after(2), dummy=true) ] Scanning read finishes reading from sstables. Refreshes cache cursor via partition_snapshot_row_cursor::maybe_refresh(), which calls partition_snapshot_row_cursor::advance_to() because iterators are invalidated. This advances the cursor to after(2). no_clustering_row_between(2, after(2)) returns true, so advance_to() returns true, and maybe_refresh() returns true. This is interpreted by the cache reader as "the cursor has not moved forward", so it marks the range as complete, without emitting the row with pos=2. Also, it marks row(pos=after(2)) as continuous, so later reads will also miss the row. The bug is in advance_to(), which is using no_clustering_row_between(a, b) to determine its result, which by definition excludes the starting key. Discovered by row_cache_test.cc::test_concurrent_reads_and_eviction with reduced key range in the random_mutation_generator (1024 -> 16). Fixes #11239 Closes #11240 * github.com:scylladb/scylladb: test: mvcc: Fix illegal use of maybe_refresh() tests: row_cache_test: Add test_eviction_of_upper_bound_of_population_range() tests: row_cache_test: Introduce one_shot mode to throttle row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy	2022-08-11 16:51:59 +02:00
Benny Halevy	14faa3b6f4	compaction_manager: perform_cleanup, perform_sstable_upgrade: use a lw_shared_ptr for owned token ranges And completely get rid of the dependency on replica::database. Also, add respective rest_api tests. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-02 08:08:11 +03:00
Benny Halevy	e1fe598760	compaction: cleanup, upgrade: use a lw_shared_ptr for owned token ranges Currently they are copied for the get_sstables function so this change reduces copies. Also, it will allow further decoupling of compaction_manager from replica::database, by letting the caller of perform_cleanup and perform_sstable_upgrade get the owned token ranges from db and pass it to the perform_* functions in the following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-02 07:57:41 +03:00
Aleksandra Martyniuk	6ea5bc96d7	scrub compaction: return status indicating aborted operations over the rest api Previously, a user performing a scrub compaction could not tell whether the operation was aborted. Now, if a scrub compaction is aborted, the return status the user gets over the REST API is set to 1.	2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk	3a805a9d9b	compaction: extract statistics in compaction_result Statistics from compaction_result are extracted to new struct compaction_stats and stored as a field of compaction_result.	2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk	ab85dab05d	scrub compaction: count validation errors The number of validation errors encountered during scrub compaction is counted.	2022-07-29 09:35:20 +02:00
Avi Kivity	e66809d051	Merge 'Memtable flush: wait for sstable count reduction if needed' from Benny Halevy Called from try_flush_memtable_to_sstable, maybe_wait_for_sstable_count_reduction will wait for compaction to catch up with memtable flush if the bucket to compact is inflated, having too many sstables. In that case we don't want to add fuel to the fire by creating yet another sstable. Fixes #4116 Closes #10954 * github.com:scylladb/scylla: table: Add test where compaction doesn't keep up with flush rate. compaction_manager: add maybe_wait_for_sstable_count_reduction time_window_compaction_strategy: get_sstables_for_compaction: clean up code time_window_compaction_strategy: make get_sstables_for_compaction idempotent time_window_compaction_strategy: get_sstables_for_compaction: improve debug messages leveled_manifest: pass compaction_counter as const&	2022-07-28 19:11:04 +03:00
Avi Kivity	09a6b93ddf	Merge 'logalloc: region: properly track listeners when moved' from Benny Halevy Currently logalloc::region is relying on boost binomial_heap handle to properly move listeners registration when the region (when derived from dirty_memory_manager_logalloc::size_tracked_region) is moved, like boost::intrusive link hooks do - hence `81e20ceaab/dirty_memory_manager.cc (L89-L90)` does nothing. Unfortunately, this doesn't work as expected. This series adds a unit test that verifies the move semantics and a fix to size_tracked_region and region_group code to make it pass. Also "logalloc: region: get_impl might be called on disengaged _impl when moved" fixes a couple corner cases where the shared _impl could be dereferenced when disengaged, and the change also adds a unit test for that too. Closes #11141 * github.com:scylladb/scylla: logalloc: region: properly track listeners when moved logalloc: region_impl: add moved method logalloc: region: merge: optimize getting other impl logalloc: region: merge: call region_impl::unlisten logalloc: region: call unlisten rather than open coding it logalloc: region move-ctor: initialize _impl logalloc: region: get_impl might be called on disengaged _impl when moved	2022-07-28 15:29:54 +03:00
Mikołaj Sielużycki	e0c6e1ef3c	table: Add test where compaction doesn't keep up with flush rate. The test simulates a situation where 2 threads issue flushes to 2 tables. Both issue small flushes, but one has injected reactor stalls. This can lead to a situation where lots of small sstables accumulate on disk, and, if compaction never has a chance to keep up, resources can be exhausted. (cherry picked from commit `b5684aa96d`) (cherry picked from commit `25407a7e41`)	2022-07-28 14:43:33 +03:00
Botond Dénes	26f1295536	Merge 'mutation: Ignore dummy rows when consuming clustering fragments' from Mikołaj Sielużycki consume_clustering_fragments already ignores dummy rows, but does it in the wrong place. Currently they're ignored after comparing them with range tombstones. This change skips them before any useful work is done with them. Consider a simplified mutation reversal scenario (ckp is clustering key prefix, -1, 0, 1 are bound_weights): schema_ptr s = schema_builder{"ks", "cf"} .with_column("pk", bytes_type, column_kind::partition_key) .with_column("ck1", bytes_type, column_kind::clustering_key) .build(); Input range tombstone positions: {clustered, ckp{}, before} {clustered, ckp{1}, after} Clustering rows: {clustered, ckp{2}, equal} {clustered, ckp{}, after} // dummy row During reversal, clustering rows are read backwards, and reversed range tombstone positions are read forwards (because the range tombstones are reversed and applied backwards). The read order in the example above is: Reversed range tombstone positions: 1: {clustered, ckp{}, before} 2: {clustered, ckp{1}, before} Clustering rows read backwards: 3: {clustered, ckp{}, after} // dummy row 4: {clustered, ckp{2}, equal} Then we effectively do the merge part of merge sort, trying to put all fragments in order according to their positions from the two lists above. However, the dummy row is used in the comparison, and it compares to be gt each of the reversed range tombstone positions. Then we try to emit the clustering row, but only at that point we notice it's dummy and should be skipped. Subsequent row with ckp{2} is compared to the last used range tombstone position and the fragments are out of order (in reversed schema, ckp{2} should come before ckp{1}). The solution is to move the logic skipping the dummy clustering rows to the beginning of the loop, so they can be ignored before they're used.
Fixes: https://github.com/scylladb/scylla/issues/11147 Closes #11129 * github.com:scylladb/scylla: mutation: Add test if mutations are consumed in order test: Move validating_consumer to test/lib/mutation_assertions.hh mutation: Ignore dummy rows when consuming clustering fragments	2022-07-28 11:18:36 +03:00
Benny Halevy	f6645313d8	logalloc: region: properly track listeners when moved And add targeted unit tests for that. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-28 11:17:55 +03:00
Benny Halevy	c7d77e4076	logalloc: region: get_impl might be called on disengaged _impl when moved First check if _impl is engaged before accessing it to set its _region = this in the move constructor and move assignment operator. Add unit test for these odd corner cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-28 10:48:58 +03:00
Avi Kivity	2c0932cc41	Merge 'Reduce the amount of per-table metrics' from Amnon Heiman This series is the first step in the effort to reduce the number of metrics reported by Scylla. The series focuses on the per-table metrics. The combination of histograms, per-tables, and per shard makes the number of metrics in a cluster explode. The following series uses multiple tools to reduce the number of metrics. 1. Multiple metrics should only be reported for the user tables and the condition that checked it was not updated when more non-user keyspaces were added. 2. Second, instead of a histogram, per table, per shard, it will report a summary per table, per shard, and a single histogram per node. 3. Histograms, summaries, and counters will be reported only if they are used (for example, the cas-related metrics will not be reported for tables that are not using cas). Closes #11058 * github.com:scylladb/scylla: Add summary_test database: Reduce the number of per-table metrics replica/table.cc: Do not register per-table metrics for system histogram_metrics_helper.hh: Add to_metrics_summary function Unified histogram, estimated_histogram, rates, and summaries Split the timed_rate_moving_average into data and timer utils/histogram.hh: should_sample should use a bitmask estimated_histogram: add missing getter method	2022-07-27 22:01:08 +03:00
Avi Kivity	4438865a26	Merge 'memtable flush error handling' from Benny Halevy The series unifies memtable flush error handling into table::seal_active_memtable following up on `f6d9d6175f`. The goal here is to prevent an infinite retry loop as in #10498 by aborting on any error that is not bad_alloc. Fixes #10498 Closes #10691 * github.com:scylladb/scylla: test: memtable_test: failed_flush_prevents_writes: notify_soft_pressure only once test: memtable_test: failed_flush_prevents_writes: extend error injection table: seal_active_memtable: abort if retried for too long table: seal_active_memtable: abort on unexpected error table: try_flush_memtable_to_sstable: propagate errors to seal_active_memtable dirty_memory_manager: flush_when_needed: move error handling to flush_one/seal_active_memtable dirty_memory_manager: flush_permit: add has_sstable_write_permit dirty_memory_manager: flush_permit: release_sstable_write_permit: mark noexcept dirty_memory_manager: flush_permit: make _sstable_write_permit optional table: reindent seal_active_memtable table: coroutinize seal_active_memtable memtable_list: mark functions noexcept commitlog: make discard_completed_segments and friends noexcept dirty_memory_manager: flush_when_needed: target error handling at flush_one database: delete unused seal_delayed_fn_type dirty_memory_manager: mark functions noexcept memtable: mark functions noexcept memtable: memtable_encoding_stats_collector: mark functions noexcept encoding_state: mark functions noexcept logalloc: mark free functions noexcept logalloc: allocating_section: mark functions noexcept logalloc: allocating_section: guard: mark constructor noexcept logalloc: reclaim_lock: mark functions noexcept logalloc: tracker_reclaimer_lock: mark constructor noexcept logalloc: mark shard_tracker noexcept logalloc: region: mark functions const/noexcept logalloc: basic_region_impl: mark functions noexcept logalloc: region_impl: mark functions noexcept utils: log_heap: mark functions 
noexcept logalloc: region_impl: object_descriptor: mark functions noexcept logalloc: region_group: mark functions noexcept logalloc: tracker: mark functions const/noexcept logalloc: tracker::impl: make region_occupancy and friends const logalloc: tracker::impl: occupancy: get rid of reclaiming_lock logalloc: tracker::impl: mark functions noexcept logalloc: segment: mark functions const / noexcept logalloc: segment_pool: add const variant of descriptor method logalloc: segment_pool: move descriptor method to class definition logalloc: segment_pool: mark functions const/noexcept logalloc: segment_pool: delete unused free_or_restore_to_reserve method utils: dynamic_bitset: mark functions noexcept utils: dynamic_bitset: delete unused members logalloc: segment_store, segment_pool: idx_from_segment: get a const segment* in const overload logalloc: segment_store, segment_pool: return const segment* from segment_from_idx() const logalloc: segment_store: make can_allocate_more_segments const logalloc: segment_store: mark functions noexcept logalloc: segment_descriptor: mark functions noexcept logalloc: occupancy_stats: mark functions noexcept min_max_tracker: mark functions noexcept gc_clock, db_clock: mark functions noexcept dirty_memory_manager: region_group: mark functions noexcept dirty_memory_manager: region_group: make simple constructor noexcept dirty_memory_manager: region_group_reclaimer mark functions noexcept logalloc: lsa_buffer: mark functions noexcept	2022-07-27 19:08:59 +03:00
Amnon Heiman	3658aa9ec2	Add summary_test This patch adds unit tests for the summary implementation.	2022-07-27 16:58:52 +03:00
Raphael S. Carvalho	0796b8c97a	sstables: Enforce disjoint invariant in sstable_run We know that sstable_run is supposed to contain disjoint files only, but this assumption can temporarily break when switching strategies as TWCS, for example, can incorrectly pick the same run id for sstables in different windows during segregation. So when switching from TWCS to ICS, an sstable_run might end up not containing disjoint files. We should definitely fix TWCS and any other strategy doing that, but sstable_run should have disjointness as an actual invariant, not be relaxed on it. Otherwise, we cannot build readers on this assumption, so more complicated logic has to be added to merge overlapping files. After this patch, sstable_run will reject insertion of a file that would cause the invariant to break, so the caller will have to check that and push that file into a different sstable run. Closes #11116	2022-07-27 14:48:28 +03:00
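A minimal C++ sketch of a container that enforces the disjointness invariant on insertion is shown below. It models each file by an inclusive range of integer keys. The names `key_range` and `disjoint_run` are hypothetical and are not the actual `sstable_run` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Inclusive key range covered by one (hypothetical) file.
struct key_range {
    std::int64_t first;
    std::int64_t last;
};

class disjoint_run {
    // Maps the first key of each stored range to its last key.
    std::map<std::int64_t, std::int64_t> _ranges;
public:
    // Returns false, leaving the run unchanged, if r overlaps a stored
    // range. The caller is then expected to place the file elsewhere.
    bool insert(key_range r) {
        assert(r.first <= r.last);
        // First stored range starting at or after r.first.
        auto next = _ranges.lower_bound(r.first);
        if (next != _ranges.end() && next->first <= r.last) {
            return false;  // overlaps the following range
        }
        if (next != _ranges.begin()) {
            auto prev = std::prev(next);
            if (prev->second >= r.first) {
                return false;  // overlaps the preceding range
            }
        }
        _ranges.emplace(r.first, r.last);
        return true;
    }

    std::size_t size() const {
        return _ranges.size();
    }
};
```

Because the stored ranges are always disjoint, checking only the two neighbours of the insertion point is sufficient, which keeps insertion logarithmic in the number of ranges.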
Benny Halevy	bb9eddc67f	test: memtable_test: failed_flush_prevents_writes: notify_soft_pressure only once Now that memtable flush error handling was moved entirely to table::seal_active_memtable, we don't need to notify_soft_pressure to keep retry going. The infinite retry loop should eventually either succeed or die (by isolating the node or aborting) on its own. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-27 14:06:59 +03:00
Benny Halevy	b5abbb971f	test: memtable_test: failed_flush_prevents_writes: extend error injection Inject errors into all seal_active_memtable distinct error handling sites. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-27 14:06:59 +03:00
Benny Halevy	f0a597a252	table: try_flush_memtable_to_sstable: propagate errors to seal_active_memtable And let seal_active_memtable decide about how to handle them as now all flush error handling logic is implemented there. In particular, unlike today, sstable write errors will cause internal error rather than loop forever. Also, check for shutdown earlier to ignore errors like semaphore_broken that might happen when the table is stopped. Refs #10498 (The issue will be considered fixed when going into maintenance mode on write errors rather than throwing internal error and potentially retrying forever) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-27 14:04:55 +03:00
Mikołaj Sielużycki	9f5655bb97	mutation: Add test if mutations are consumed in order It explicitly interleaves clustering rows with range tombstones and ensures the last clustering row is dummy.	2022-07-27 11:22:55 +02:00
Mikołaj Sielużycki	9c43f1266a	test: Move validating_consumer to test/lib/mutation_assertions.hh	2022-07-27 11:19:50 +02:00
Botond Dénes	81e20ceaab	Merge 'logalloc, dirty_memory_manager: move region_groups to dirty_memory_manager' from Avi Kivity logalloc manages regions of log-structured allocated memory, and region_groups containing such regions and other region_groups. region_groups were introduced for accounting purposes - first to limit the amount of memory in memtables, then to match new dirty memory allocation rate with memtable flushing rate so we never hit a situation where allocation rate exceeded flush rate, and we exceed our limit. The problem is that the abstraction is very weak - if we want to change anything in memtable flush control we'll need to change region_groups too - and also expensive to maintain. The solution is to break the abstraction and move region_groups to memtable dirty memory management code. Instead introduce a new, simpler abstraction, the region_listener, which communicates changes in region memory consumption to an external piece of code, which can then choose to do with it what it likes. The long term plan is to completely remove region_groups and fold them into dirty_memory_manager: - make each memtable a region_listener so it gets called back after size changes - make memtables inform their dirty_memory_manager about the size so dirty_memory_manager can decide to throttle writes and which memtable to pick to flush Closes #10839 * github.com:scylladb/scylla: logalloc: drop region_impl public accessors logalloc, dirty_memory_manager: move size-tracking binomial heap out of logalloc logalloc: relax lifetime rules around region_listener logalloc, dirty_memory_manager: move region_group and associated code logalloc: expose tracker_reclaimer_lock logalloc: reimplement tracker_reclaim_lock to avoid using hidden classes logalloc: reduce friendship between region and region_group logalloc: decouple region_group from region memtable: stop using logalloc::region::group() to test for flushed memtables	2022-07-26 17:08:37 +03:00
Nadav Har'El	cb8a67dc98	Merge 'Allow materialized views to be synchronous' from Piotr Sarna This pull request introduces a "synchronous mode" for global views. In this mode, all view updates are applied synchronously as if the view was local. Marking view as a synchronous one can be done using `CREATE MATERIALIZED VIEW` and `ALTER MATERIALIZED VIEW`. E.g.: ```cql ALTER MATERIALIZED VIEW ks.v WITH synchronous_updates = true; ``` Marking view as a synchronous one was done using tags (originally used by alternator). No big modifications in the view's code were needed. Fixes: https://github.com/scylladb/scylla/issues/10545 Closes #11013 * github.com:scylladb/scylla: cql-pytest: extend synchronous mv test with new cases cql-pytest: allow extra parameters in new_materialized_view docs: add a paragraph on view synchronous updates test/boost/cql_query_test: add test setting synchronous updates property test: cql-pytest: add a test for synchronous mode materialized views db: view: react to synchronous updates tag cql3: statements: cf_prop_defs: apply synchronous updates tag alternator, db: move the tag code to db/tags cql3: statements: add a synchronous_updates property	2022-07-26 15:42:51 +03:00
Avi Kivity	2cb5f79e9d	logalloc, dirty_memory_manager: move size-tracking binomial heap out of logalloc The region_group mechanism used an intrusive heap handle embedded in logalloc::region to allow region_group:s to track the largest region. But with region_group moved out of logalloc, the handle is out of place. Move it out, introducing a new intermediate class size_tracked_region to hold the heap handle. We might eventually merge the new class into memtable (which derives from it), but that requires a large rearrangement of unit tests, so defer that.	2022-07-26 11:12:10 +03:00
Avi Kivity	ee720fa23b	logalloc: relax lifetime rules around region_listener Currently, a region_listener is added during construction and removed during destruction. This was done to mimic the old region(region_group&) constructor, as region_listener replaces region_group. However, this makes moving the binomial heap handle outside logalloc difficult. The natural place for the handle is in a derived class of logalloc::region (e.g. memtable), but members of this derived class will be destroyed earlier than the logalloc::region here. We could play tricks with an earlier base class but it's better to just decouple region lifecycle from listener lifecycle. Do that by adding listen()/unlisten() methods. Some small awkwardness remains in that merge() implicitly unlistens (see comment in region::unlisten). Unit tests are adjusted.	2022-07-26 11:12:10 +03:00
Avi Kivity	fbe8ea7727	logalloc, dirty_memory_manager: move region_group and associated code region_group is an abstraction that allows accounting for groups of regions, but the cost/benefit ratio of maintaining the abstraction is poor. Each time we need to change decision algorithm of memtable flushing (admittedly rarely), we need to distill that into an abstraction for region_groups and then use it. An example is virtual regions groups; we wanted to account for the partially flushed memtables and had to invent region groups to stand in their place. Rather than continuing to invest in the abstraction, break it now and move it to the memtable dirty memory manager which is responsible for making those decisions. The relevant code is moved to dirty_memory_manager.hh and dirty_memory_manager.cc (new file), and a new unit test file is added as well. A downside of the change is that unit testing will be more difficult.	2022-07-26 11:12:10 +03:00
Michał Sala	c7b78cfd81	test/boost/cql_query_test: add test setting synchronous updates property The test checks if a synchronous_updates property can be set via ALTER MATERIALIZED VIEW or CREATE MATERIALIZED VIEW statements.	2022-07-25 09:53:33 +02:00
Avi Kivity	fd663bcb94	cql3: util: change where clause utilities to accept a single expression rather than a vector of terms Conversion to terms happens internally via boolean_factors().	2022-07-22 20:14:48 +03:00
Avi Kivity	a5dd588465	cql3: statement_restrictions: accept a single expression rather than a vector Move closer to the goal of accepting a generic expression for WHERE clause by accepting a generic expression in statement_restrictions. The various callers will synthesize it from a vector of terms.	2022-07-22 20:14:48 +03:00
Avi Kivity	8085b9f57a	cql3: expr: add boolean_factors() function to factorize an expression When analyzing a WHERE clause, we want to separate individual factors (usually relations), and later partition them into partition key, clustering key, and regular column relations. The first step is separation, for which this helper is added. Currently, it is not required since the grammar supplies the expression in separated form, but this will not work once it is relaxed to allow any expression in the WHERE clause. A unit test is added.	2022-07-22 20:14:48 +03:00
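The separation step can be sketched in C++ as below, assuming that "factorizing" means flattening nested conjunctions into a flat list of factors. The `expr` struct is a hypothetical miniature stand-in (Scylla's `cql3::expr::expression` is far richer), and `boolean_factors_sketch` is not the real function's signature.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature expression type, for illustration only.
struct expr {
    std::string text;            // leaf relation such as "pk = 0" (used when children is empty)
    std::vector<expr> children;  // when non-empty, this node is an AND of its children
};

// Depth-first walk: leaves are factors, conjunctions are flattened.
void collect_factors(const expr& e, std::vector<std::string>& out) {
    if (e.children.empty()) {
        assert(!e.text.empty());  // a leaf must carry a relation
        out.push_back(e.text);
        return;
    }
    for (const expr& child : e.children) {
        collect_factors(child, out);
    }
}

// Returns the factors of e in left-to-right order.
std::vector<std::string> boolean_factors_sketch(const expr& e) {
    std::vector<std::string> out;
    collect_factors(e, out);
    return out;
}
```

For example, the tree for `pk = 0 AND (ck = 1 AND v = 2)` yields three factors, which a later step could partition into partition key, clustering key, and regular column relations.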
Avi Kivity	13a64d8ab2	Merge 'Remove all remaining restrictions classes' from Jan Ciołek This PR removes all code that used classes `restriction`, `restrictions` and their children. There were two fields in `statement_restrictions` that needed to be dealt with: `_clustering_columns_restrictions` and `_nonprimary_key_restrictions`. Each function was reimplemented to operate on the new expression representation and eventually these fields weren't needed anymore. After that the restriction classes weren't used anymore and could be deleted as well. Now all of the code responsible for analyzing WHERE clause and planning a query works on expressions. Closes #11069 * github.com:scylladb/scylla: cql3: Remove all remaining restrictions code cql3: Move a function from restrictions class to the test cql3: Remove initial_key_restrictions cql3: expr: Remove convert_to_restriction cql3: Remove _new from _new_nonprimary_key_restrictions cql3: Remove _nonprimary_key_restrictions field cql3: Reimplement uses of _nonprimary_key_restrictions using expression cql3: Keep a map of single column nonprimary key restrictions cql3: Remove _new from _new_clustering_columns_restrictions cql3: Remove _clustering_columns_restrictions from statement_restrictions cql3: Use a variable instead of dynamic cast cql3: Use the new map of single column clustering restrictions cql3: Keep a map of single column clustering key restrictions cql3: Return an expression in get_clustering_columns_restrctions() cql3: Reimplement _clustering_columns_restrictions->has_supporting_index() cql3: Don't create single element conjunction cql3: Add expr::index_supports_some_column cql3: Reimplement has_unrestricted_components() cql3: Reimplement _clustering_columns_restrictions->need_filtering() cql3: Reimplement num_prefix_columns_that_need_not_be_filtered cql3: Use the new clustering restrictions field instead of ->expression cql3: Reimplement _clustering_columns_restrictions->size() using expressions cql3: Reimplement 
_clustering_columns_restrictions->get_column_defs() using expressions cql3: Reimplement _clustering_columns_restrictions->is_all_eq() using expressions cql3: expr: Add has_only_eq_binops function cql3: Reimplement _clustering_columns_restrictions->empty() using expressions	2022-07-20 18:01:15 +03:00
Jan Ciolek	599bcd6ea7	cql3: Remove all remaining restrictions code The classes restriction, restrictions and its children aren't used anywhere now and can be safely removed. Some includes need to be modified for the code to compile. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-07-20 09:10:31 +02:00
Jan Ciolek	bff0b87c18	cql3: Move a function from restrictions class to the test statement_restrictions_test uses a function that is defined in multi_column_restriction.hh. This file will be removed soon and for the test to still work the function is moved to the test source. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-07-20 09:10:31 +02:00
Botond Dénes	6e20cb3255	Merge 'database_test: test_truncate_without_snapshot_during_writes: apply mutation on the correct shard' from Benny Halevy Currently, all the mutations this test generates are applied on shard 0. In rare cases, this may lead to the following crash, when the flushed sstable doesn't contain any key that belongs to the current shard, as seen in https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1390/artifact/testlog/x86_64/dev/database_test.test_truncate_without_snapshot_during_writes.114.log ``` WARN 2022-07-17 17:41:36,630 [shard 0] sstable - create_sharding_metadata: range=[{-468459073612751032, pk{00046b657930}}, {-468459073612751032, pk{00046b657930}}] has no intersection with shard=0 first_key={key: pk{00046b657930}, token:-468459073612751032} last_key={key: pk{00046b657930}, token:-468459073612751032} ranges_single_shard=[] ranges_all_shards={{1, {[{-468459073612751032, pk{00046b657930}}, {-468459073612751032, pk{00046b657930}}]}}} ERROR 2022-07-17 17:41:36,630 [shard 0] table - failed to write sstable /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db: std::runtime_error (Failed to generate sharding metadata for /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db) ERROR 2022-07-17 17:41:36,631 [shard 0] table - Memtable flush failed due to: std::runtime_error (Failed to generate sharding metadata for /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db). 
Aborting, at 0x329e28e 0x329e780 0x329ea88 0xf5bc69 0xf956b1 0x3196dc4 0x3198037 0x319742a 0x32be2e4 0x32bd8e1 0x32ba01c 0x317f97d /lib64/libpthread.so.0+0x92a4 /lib64/libc.so.6+0x100322 ``` Instead, generate random keys and apply them on their owning shard, and truncate all database shards. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11066 * github.com:scylladb/scylla: database_test: test_truncate_without_snapshot_during_writes: apply mutation on the correct shard table: try_flush_memtable_to_sstable: consume: close reader on error	2022-07-20 09:06:07 +03:00
Avi Kivity	5a30f9b789	Merge 'Distributed aggregate query' from Michał Jadwiszczak This PR extends #9209. It consists of 2 main points: To enable parallelization of user-defined aggregates, a reduction function was added to the UDA definition. The reduction function is optional and has to be a scalar function that takes 2 arguments of the UDA's state type and returns the UDA's state. All currently implemented native aggregates got a reducible counterpart, which returns its state as the final result, so it can be reduced with other results. Hence all native aggregates can now be distributed. Tested on a local 3-node cluster made with current master, with `node1` updated to this branch. Accessing a node with `ccm <node-name> cqlsh`, I tested the following from both the old and the new node: - creating a UDA with a reduce function - not allowed - selecting count() - distributed - selecting another aggregate function - not distributed Fixes: #10224 Closes #10295 github.com:scylladb/scylla: test: add tests for parallelized aggregates test: cql3: Add UDA REDUCEFUNC test forward_service: enable multiple selection forward_service: support UDA and native aggregate parallelization cql3:functions: Add cql3::functions::functions::mock_get() cql3: selection: detect parallelize reduction type db,cql3: Move part of cql3's function into db selection: detect if selectors factory contains only simple selectors cql3: reducible aggregates DB: Add `scylla_aggregates` system table db,gms: Add SCYLLA_AGGREGATES schema features CQL3: Add reduce function to UDA gms: add UDA_NATIVE_PARALLELIZED_AGGREGATION feature	2022-07-19 19:05:19 +03:00
Benny Halevy	1c26d49fba	database_test: test_truncate_without_snapshot_during_writes: apply mutation on the correct shard Currently, all the mutations this test generates are applied on shard 0. In rare cases, this may lead to the following crash, when the flushed sstable doesn't contain any key that belongs to the current shard, as seen in https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1390/artifact/testlog/x86_64/dev/database_test.test_truncate_without_snapshot_during_writes.114.log ``` WARN 2022-07-17 17:41:36,630 [shard 0] sstable - create_sharding_metadata: range=[{-468459073612751032, pk{00046b657930}}, {-468459073612751032, pk{00046b657930}}] has no intersection with shard=0 first_key={key: pk{00046b657930}, token:-468459073612751032} last_key={key: pk{00046b657930}, token:-468459073612751032} ranges_single_shard=[] ranges_all_shards={{1, {[{-468459073612751032, pk{00046b657930}}, {-468459073612751032, pk{00046b657930}}]}}} ERROR 2022-07-17 17:41:36,630 [shard 0] table - failed to write sstable /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db: std::runtime_error (Failed to generate sharding metadata for /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db) ERROR 2022-07-17 17:41:36,631 [shard 0] table - Memtable flush failed due to: std::runtime_error (Failed to generate sharding metadata for /jenkins/workspace/releng/Scylla-CI/scylla/testlog/x86_64/dev/scylla-e2b694c7-db4f-4f9d-9940-9c6c21850888/ks/cf-8f74aba005de11ed92fa8661a0ed7890/me-2-big-Data.db). 
Aborting, at 0x329e28e 0x329e780 0x329ea88 0xf5bc69 0xf956b1 0x3196dc4 0x3198037 0x319742a 0x32be2e4 0x32bd8e1 0x32ba01c 0x317f97d /lib64/libpthread.so.0+0x92a4 /lib64/libc.so.6+0x100322 ``` Instead, generate random keys and apply them on their owning shard, and truncate all database shards. Fixes #11076 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-19 16:55:11 +03:00
Pavel Emelyanov	a56e2c83f3	sstables: Keep priority class on sstable_directory Current code accepts the priority class as an argument to various functions that need it, and all its callers use the streaming class. Upcoming patches will sometimes need to use the default class, which would require heavy patching of the distributed loader. Things get simpler if the priority class is kept on sstable_directory on start. This change also simplifies the ongoing effort on unification of sched and IO classes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-07-19 12:14:41 +03:00
Jadw1	7497fda370	test: add tests for parallelized aggregates	2022-07-18 15:25:42 +02:00
Botond Dénes	9afd2dc428	Merge 'Make compaction manager switch to table abstraction ' from Raphael "Raph" Carvalho This work gets us a step closer to compaction groups. Everything in compaction layer but compaction_manager was converted to table_state. After this work, we can start implementing compaction groups, as each group will be represented by its own table_state. User-triggered operations that span the entire table, not only a group, can be done by calling the manager operation on behalf of each group and then merging the results, if any. Closes #11028 * github.com:scylladb/scylla: compaction: remove forward declaration of replica::table compaction_manager: make add() and remove() switch to table_state compaction_manager: make run_custom_job() switch to table_state compaction_manager: major: switch to table_state compaction_manager: scrub: switch to table_state compaction_manager: upgrade: switch to table_state compaction: table_state: add get_sstables_manager() compaction_manager: cleanup: switch to table_state compaction_manager: offstrategy: switch to table_state() compaction_manager: rewrite_sstables(): switch to table_state compaction_manager: make run_with_compaction_disabled() switch to table_state compaction_manager: compaction_reenabler: switch to table_state compaction_manager: make submit(T) switch to table_state compaction_manager: task: switch to table_state compaction: table_state: Add is_auto_compaction_disabled_by_user() compaction: table_state: Add on_compaction_completion() compaction: table_state: Add make_sstable() compaction_manager: make can_proceed switch to table_state compaction_manager: make stop compaction procedures switch to table_state compaction_manager: make get_compactions() switch to table_state compaction_manager: change task::update_history() to use table_state instead compaction_manager: make can_register_compaction() switch to table_state compaction_manager: make get_candidates() switch to table_state compaction_manager: 
make propagate_replacement() switch to table_state compaction: Move table::in_strategy_sstables() and switch to table_state compaction: table_state: Add maintenance sstable set compaction_manager: make has_table_ongoing_compaction() switch to table_state compaction_manager: make compaction_disabled() switch to table_state compaction_manager: switch to table_state for mapping of compaction_state compaction_manager: move task ctor into source	2022-07-18 15:18:29 +03:00
Benny Halevy	bbbbea65fb	database: clear_snapshot: remove dropped table directory when it has no remaining snapshots Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-17 14:33:34 +03:00
Benny Halevy	c70a675d77	database: clear_snapshot: make it a coroutine and use an async thread around `directory_lister` rather than `lister::scan_dir` to simplify the implementation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-17 14:33:34 +03:00
Benny Halevy	e710fe527c	database_test: add clear_multiple_snapshots test Based on the `clear_snapshot` test. Test with multiple snapshots and different combinations of parameters to database::clear_snapshot. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-17 14:33:34 +03:00
Benny Halevy	ae3b1b5a64	database_test: drop_table_with_snapshots: test auto_snapshot Refactor test_drop_table_with_auto_snapshot out of drop_table_with_snapshots, adding an auto_snapshot param that controls how the cql_test_env db::config::auto_snapshot is configured, so we can test both cases: auto_snapshot enabled and disabled. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-17 14:33:34 +03:00
Benny Halevy	af6805dd75	database_test: populate_from_quarantine_works: pass optional db::config to do_with_some_data Instead of just `tmpdir_for_data`, so we can easily set auto_snapshot for `drop_table_with_snapshots` in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-17 14:33:34 +03:00
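
The 'Distributed aggregate query' merge (5a30f9b789) describes a reduction function that takes two aggregate states and returns one state, which is what lets partial results computed separately be combined. A minimal sketch of that idea for an average, assuming a (total, count) state: the names `AvgState`, `accumulate`, `reduce_states`, `finalize` and `partial_state` are hypothetical, and this is plain Python for illustration, not Scylla's implementation or its CQL syntax.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class AvgState:
    """Hypothetical aggregate state for an average: running total and row count."""
    total: int = 0
    count: int = 0


def accumulate(state: AvgState, value: int) -> AvgState:
    # State function: fold one row's value into the state.
    return AvgState(state.total + value, state.count + 1)


def reduce_states(a: AvgState, b: AvgState) -> AvgState:
    # Reduction function: two states in, one state out.
    return AvgState(a.total + b.total, a.count + b.count)


def finalize(state: AvgState):
    # Final function: turn the state into the user-visible result.
    return state.total / state.count if state.count else None


def partial_state(values) -> AvgState:
    # What a single participant would compute over its own rows.
    return reduce(accumulate, values, AvgState())


# Rows split across three hypothetical participants.
partitions = [[1, 2, 3], [4, 5], [6]]
partials = [partial_state(p) for p in partitions]

# Merge the partial states first, and finalize only once at the end.
merged = reduce(reduce_states, partials, AvgState())
distributed_avg = finalize(merged)
```

Finalizing each partial result and then averaging the averages would be wrong whenever the partitions have different sizes, which is why the sketch returns and merges states rather than final values.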

1823 Commits