scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	737285d342	db/view/build_progress_virtual_reader: Fix use-after-move use-after-free in ctor, which potentially leads to a failure when locating table from moved schema object. static report In file included from db/system_keyspace.cc:51: ./db/view/build_progress_virtual_reader.hh:202:40: warning: invalid invocation of method 'operator->' on object 's' while it is in the 'consumed' state [-Wconsumed] _db.find_column_family(s->ks_name(), system_keyspace::v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS), Fixes #13395. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `1ecba373d6`)	2023-05-15 20:26:17 +03:00
Benny Halevy	d7e65a1a0a	view: view_builder: start: demote sleep_aborted log error This is not really an error, so print it in debug log_level rather than error log_level. Fixes #13374 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #13462 (cherry picked from commit `cc42f00232`)	2023-05-14 21:22:21 +03:00
Marcin Maliszkiewicz	f4200098ce	db: view: use deferred_close for closing staging_sstable_reader When consume_in_thread throws the reader should still be closed. Related https://github.com/scylladb/scylla-enterprise/issues/2661 Closes #13398 Refs: scylladb/scylla-enterprise#2661 Fixes: #13413 (cherry picked from commit `99f8d7dcbe`)	2023-05-08 09:45:54 +03:00
Botond Dénes	efba2c38ef	Merge 'db: system_keyspace: use microsecond resolution for group0_history range tombstone' from Kamil Braun In `make_group0_history_state_id_mutation`, when adding a new entry to the group 0 history table, if the parameter `gc_older_than` is engaged, we create a range tombstone in the mutation which deletes entries older than the new one by `gc_older_than`. In particular if `gc_older_than = 0`, we want to delete all older entries. There was a subtle bug there: we were using millisecond resolution when generating the tombstone, while the provided state IDs used microsecond resolution. On a super fast machine it could happen that we managed to perform two schema changes in a single millisecond; this happened sometimes in `group0_test.test_group0_history_clearing_old_entries` on our new CI/promotion machines, causing the test to fail because the tombstone didn't clear the entry corresponding to the previous schema change when performing the next schema change (since they happened in the same millisecond). Use microsecond resolution to fix that. The consecutive state IDs used in group 0 mutations are guaranteed to be strictly monotonic at microsecond resolution (see `generate_group0_state_id` in service/raft/raft_group0_client.cc). Fixes #13594 Closes #13604 * github.com:scylladb/scylladb: db: system_keyspace: use microsecond resolution for group0_history range tombstone utils: UUID_gen: accept decimicroseconds in min_time_UUID (cherry picked from commit `10c1f1dc80`)	2023-04-23 16:03:21 +03:00
Botond Dénes	6fa78b90b5	db/view/view_update_check: check_needs_view_update_path(): filter out non-member hosts We currently don't clean up the system_distributed.view_build_status table after removed nodes. This can cause false-positive check for whether view update generation is needed for streaming. The proper fix is to clean up this table, but that will be more involved, and even when done, it might not be immediate. So until then and to be on the safe side, filter out entries belonging to unknown hosts from said table. Fixes: #11905 Refs: #11836 Closes #11860 (cherry picked from commit `84a69b6adb`)	2023-03-22 09:08:37 +02:00
Wojciech Mitros	772ac59299	functions: initialize aggregates on scylla start Currently, UDAs can't be reused if Scylla has been restarted since they have been created. This is caused by the missing initialization of saved UDAs that should have inserted them to the cql3::functions::functions::_declared map, that should store all (user-)created functions and aggregates. This patch adds the missing implementation in a way that's analogous to the method of inserting UDF to the _declared map. Fixes #11309 (cherry picked from commit `e558c7d988`)	2023-03-09 12:21:07 +02:00
Benny Halevy	f3a6af663d	view: row_lock: lock_ck: find or construct row_lock under partition lock Since we're potentially searching the row_lock in parallel to acquiring the read_lock on the partition, we're racing with row_locker::unlock that may erase the _row_locks entry for the same clustering key, since there is no lock to protect it up until the partition lock has been acquired and the lock_partition future is resolved. This change moves the code to search for or allocate the row lock _after_ the partition lock has been acquired to make sure we're synchronously starting the read/write lock function on it, without yielding, to prevent this use-after-free. This adds an allocation for copying the clustering key in advance even if a row_lock entry already exists, that wasn't needed before. It only slows us down (a bit) when there is contention and the lock already existed when we want to go locking. In the fast path there is no contention and then the code already had to create the lock and copy the key. In any case, the penalty of copying the key once is tiny compared to the rest of the work that view updates are doing. This is required on top of `5007ded2c1` as seen in https://github.com/scylladb/scylladb/issues/12632 which is closely related to #12168 but demonstrates a different race causing use-after-free. Fixes #12632 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `4b5e324ecb`)	2023-02-05 17:38:29 +02:00
Michał Chojnowski	5e88421360	commitlog: fix total_size_on_disk accounting after segment file removal Currently, segment file removal first calls `f.remove_file()` and does `total_size_on_disk -= f.known_size()` later. However, `remove_file()` resets `known_size` to 0, so in effect the freed space is not accounted for. `total_size_on_disk` is not just a metric. It is also responsible for deciding whether a segment should be recycled -- it is recycled only if `total_size_on_disk - known_size < max_disk_size`. Therefore this bug has dire performance consequences: if `total_size_on_disk - known_size` ever exceeds `max_disk_size`, the recycling of commitlog segments will stop permanently, because `total_size_on_disk - known_size` will never go back below `max_disk_size` due to the accounting bug. All new segments from this point will be allocated from scratch. The bug was uncovered by a QA performance test. It isn't easy to trigger -- it took the test 7 hours of constant high load to step into it. However, the fact that the effect is permanent, and degrades the performance of the cluster silently, makes the bug potentially quite severe. The bug can be easily spotted with Prometheus as infinitely rising `commitlog_total_size_on_disk` on the affected shards. Fixes #12645 Closes #12646 (cherry picked from commit `fa7e904cd6`)	2023-02-01 21:54:52 +02:00
Nadav Har'El	099145fe9a	materialized view: fix bug in some large modifications to base partitions Sometimes a single modification to a base partition requires updates to a large number of view rows. A common example is deletion of a base partition containing many rows. A large BATCH is also possible. To avoid large allocations, we split the large amount of work into batches of 100 (max_rows_for_view_updates) rows each. The existing code assumed an empty result from one of these batches meant that we are done. But this assumption was incorrect: There are several cases when a base-table update may not need a view update to be generated (see can_skip_view_updates()) so if all 100 rows in a batch were skipped, the view update stopped prematurely. This patch includes two tests showing when this bug can happen - one test using a partition deletion with a USING TIMESTAMP causing the deletion to not affect the first 100 rows, and a second test using a specially-crafted large BATCH. These use cases are fairly esoteric, but in fact hit a user in the wild, which led to the discovery of this bug. The fix is fairly simple: To detect when build_some() is done it is no longer enough to check if it returned zero view-update rows; Rather, it explicitly returns whether or not it is done as an std::optional. The patch includes several tests for this bug, which pass on Cassandra, failed on Scylla before this patch, and pass with this patch. Fixes #12297. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #12305 (cherry picked from commit `92d03be37b`)	2023-01-04 10:05:18 +02:00
Benny Halevy	9173a3d808	view: row_lock: lock_ck: serialize partition and row locking The problematic scenario this patch fixes might happen due to unfortunate serialization of locks/unlocks between lock_pk and lock_ck, as follows: 1. lock_pk acquires an exclusive lock on the partition. 2.a lock_ck attempts to acquire shared lock on the partition and any lock on the row. Both cases currently use a fiber returning a future<rwlock::holder>. 2.b since the partition is locked, the lock_partition times out returning an exceptional future. lock_row has no such problem and succeeds, returning a future holding a rwlock::holder, pointing to the row lock. 3.a the lock_holder previously returned by lock_pk is destroyed, calling `row_locker::unlock` 3.b row_locker::unlock sees that the partition is not locked and erases it, including the row locks it contains. 4.a when_all_succeeds continuation in lock_ck runs. Since the lock_partition future failed, it destroys both futures. 4.b the lock_row future is destroyed with the rwlock::holder value. 4.c ~holder attempts to return the semaphore units to the row rwlock, but the latter was already destroyed in 3.b above. Acquiring the partition lock and row lock in parallel doesn't help anything, but it complicates error handling as seen above. This patch serializes acquiring the row lock in lock_ck after locking the partition to prevent the above race. This way, erasing the unlocked partition is never expected to happen while any of its row locks is held. Fixes #12168 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12208 (cherry picked from commit `5007ded2c1`)	2022-12-13 14:51:44 +02:00
Nadav Har'El	eebe77b5b8	materialized views: fix view writes after base table schema change When we write to a materialized view, we need to know some information defined in the base table such as the columns in its schema. We have a "view_info" object that tracks each view and its base. This view_info object has a couple of mutable attributes which are used to lazily-calculate and cache the SELECT statement needed to read from the base table. If the base-table schema ever changes - and the code calls set_base_info() at that point - we need to forget this cached statement. If we don't (as before this patch), the SELECT will use the wrong schema and writes will no longer work. This patch also includes a reproducing test that failed before this patch, and passes afterwards. The test creates a base table with a view that has a non-trivial SELECT (it has a filter on one of the base-regular columns), makes a benign modification to the base table (just a silly addition of a comment), and then tries to write to the view - and before this patch it fails. Fixes #10026 Fixes #11542 (cherry picked from commit `2f2f01b045`)	2022-12-05 20:09:15 +02:00
Botond Dénes	ee82323599	db/view/view_builder: don't drop partition and range tombstones when resuming The view builder builds the views from a given base table in view_builder::batch_size batches of rows. After processing this many rows, it suspends so the view builder can switch to building views for other base tables in the name of fairness. When resuming the build step for a given base table, it reuses the reader used previously (also serving the role of a snapshot, pinning sstables read from). The compactor however is created anew. As the reader can be in the middle of a partition, the view builder injects a partition start into the compactor to prime it for continuing the partition. This however only included the partition-key, crucially missing any active tombstones: partition tombstone or -- since the v2 transition -- active range tombstone. This can result in base rows covered by either of these to be resurrected and the view builder to generate view updates for them. This patch solves this by using the detach-state mechanism of the compactor which was explicitly developed for situations like this (in the range scan code) -- resuming a read with the readers kept but the compactor recreated. Also included are two test cases reproducing the problem, one with a range tombstone, the other with a partition tombstone. Fixes: #11668 Closes #11671 (cherry picked from commit `5621cdd7f9`)	2022-11-07 11:45:37 +02:00
Botond Dénes	fa94222662	Merge 'Alternator, MV: fix bug in some view updates which set the view key to its existing value' from Nadav Har'El As described in issue #11801, we saw in Alternator when a GSI has both partition and sort keys which were non-key attributes in the base, cases where updating the GSI-sort-key attribute to the same value it already had caused the entire GSI row to be deleted. This series fixes this bug (it was a bug in our materialized views implementation) and adds a reproducing test (plus a few more tests for similar situations which worked before the patch, and continue to work after it). Fixes #11801 Closes #11808 * github.com:scylladb/scylladb: test/alternator: add test for issue 11801 MV: fix handling of view update which reassign the same key value materialized views: inline used-once and confusing function, replace_entry() (cherry picked from commit `e981bd4f21`)	2022-11-01 13:14:21 +02:00
Michał Chojnowski	4047528bd9	db: commitlog: don't print INFO logs on shutdown The intention was for these logs to be printed during the database shutdown sequence, but it was overlooked that it's not the only place where commitlog::shutdown is called. Commitlogs are started and shut down periodically by hinted handoff. When that happens, these messages spam the log. Fix that by adding INFO commitlog shutdown logs to database::stop, and change the level of the commitlog::shutdown log call to DEBUG. Fixes #11508 Closes #11536 (cherry picked from commit `9b6fc553b4`)	2022-09-18 13:33:05 +03:00
Michał Chojnowski	1a82c61452	sstables: add a flag for disabling long-term index caching Long-term index caching in the global cache, as introduced in 4.6, is a major pessimization for workloads where accesses to the index are (spatially) sparse. We want to have a way to disable it for the affected workloads. There is already infrastructure in place for disabling it for BYPASS CACHE queries. One way of solving the issue is hijacking that infrastructure. This patch adds a global flag (and a corresponding CLI option) which controls index caching. Setting the flag to `false` causes all index reads to behave like they would in BYPASS CACHE queries. Consequences of this choice: - The per-SSTable partition_index_cache is unused. Every index_reader has its own, and they die together. Independent reads can no longer reuse the work of other reads which hit the same index pages. This is not crucial, since partition accesses have no (natural) spatial locality. Note that the original reason for partition_index_cache -- the ability to share reads for the lower and upper bound of the query -- is unaffected. - The per-SSTable cached_file is unused. Every index_reader has its own (uncached) input stream from the index file, and every bsearch_clustered_cursor has its own cached_file, which dies together with the cursor. Note that the cursor still can perform its binary search with caching. However, it won't be able to reuse the file pages read by index_reader. In particular, if the promoted index is small, and fits inside the same file page as its index_entry, that page will be re-read. It can also happen that index_reader will read the same index file page multiple times. When the summary is so dense that multiple index pages fit in one index file page, advancing the upper bound, which reads the next index page, will read the same index file page. Since summary:disk ratio is 1:2000, this is expected to happen for partitions with size greater than 2000 partition keys. Fixes #11202 (cherry picked from commit `cdb3e71045`)	2022-09-18 13:27:46 +03:00
Avi Kivity	268e4abe77	Merge 'wasm: reuse instances for wasm UDFs' from Wojciech Mitros Calling WebAssembly UDFs requires wasmtime instance. Creating such an instance is expensive, but these instances can be reused for subsequent calls of the same UDF on various inputs. This patch introduces a way of reusing wasmtime instances: a wasm instance cache. The cache stores a wasmtime instance for each UDF and scheduling group. The instances are evicted using LRU strategy and their size is based on the size of their wasm memories. The instances stored in the cache are also dropped when the UDF is dropped itself. For that reason, the first patch modifies the current implementation of UDF dropping, so that the instance dropping may be added later. The patch also removes the need of compiling the UDF again when dropping it. The second patch contains the implementation and use of the new cache. The cache is implemented in `lang/wasm_instance_cache.hh` and the main ways of using it are the `run_script` methods from `wasm.hh` The third patch adds tests to `test_wasm.py` that check the correctness and performance of the new cache. The tests confirm the instance reuse, size limits, instance eviction after timeout and after dropping the UDF. Closes #10306 * github.com:scylladb/scylladb: wasm: test instances reuse wasm: reuse UDF instances schema_tables: simplify merge_functions and avoid extra compilation	2022-08-02 13:51:16 +03:00
Benny Halevy	edd308c705	config: use ordered map for experimental features So that the help string will be sorted lexicographically. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11178	2022-08-01 17:40:10 +03:00
Benny Halevy	5991482049	commitlog: make discard_completed_segments and friends noexcept To simplify table::seal_active_memtable error handling and retry logic. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-27 13:43:17 +03:00
Michał Sala	d573ab0b58	db: view: react to synchronous updates tag Code that waited for all remote view updates was already there. This commit modifies the conditions of this wait to take into account the "synchronous mode" (enabled when db::SYNCHRONOUS_VIEW_UPDATES_TAG_KEY is set).	2022-07-25 09:53:33 +02:00
Michał Sala	128806f022	cql3: statements: cf_prop_defs: apply synchronous updates tag This commit defines a new tag key (SYNCHRONOUS_VIEW_UPDATES_TAG_KEY) to be used for marking "synchronous mode" views. This key is used in `cf_prop_defs::apply_to_builder` if the properties contain KW_SYNCHRONOUS_UPDATES.	2022-07-25 09:53:33 +02:00
Michał Sala	041cb77ad0	alternator, db: move the tag code to db/tags Tags are a useful mechanism that could be used outside of alternator namespace. My motivation to move tags_extension and other utilities to db/tags/ was that I wanted to use them to mark "synchronous mode" views. I have extracted `get_tags_of_table`, `find_tag` and `update_tags` method to db/tags/utils.cc and moved alternator/tags_extension.hh to db/tags/. The signature of `get_tags_of_table` was changed from `const std::map<sstring, sstring>&` to `const std::map<sstring, sstring>*` Original behavior of this function was to throw an `alternator::api_error` exception. This was undesirable, as it introduced a dependency on the alternator module. I chose to change it to return a potentially null value, and added a wrapper function to the alternator module - `get_tags_of_table_or_throw` to keep the previous throwing behavior.	2022-07-25 09:53:33 +02:00
Wojciech Mitros	9281ba3919	wasm: reuse UDF instances When executing a wasm UDF, most of the time is spent on setting up the instance. To minimize its cost, we reuse the instance using wasm::instance_cache. This patch adds a wasm instance cache, that stores a wasmtime instance for each UDF and scheduling group. The instances are evicted using LRU strategy. The cache may store some entries for the UDF after evicting the instance, but they are evicted when the corresponding UDF is dropped, which greatly limits their number. The size of stored instances is estimated using the size of their WASM memories. In order to be able to read the size of memory, we require that the memory is exported by the client. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2022-07-20 18:19:22 +02:00
Wojciech Mitros	d7a933068a	schema_tables: simplify merge_functions and avoid extra compilation Currently, we have 2 merge_functions methods, where one is only the only call to the other. We can replace them with a simple one. The merge_functions method compiles a UDF (using create_func) only to read its signature. We can avoid that by reading it from the row ourselves. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2022-07-20 18:10:21 +02:00
Avi Kivity	13a64d8ab2	Merge 'Remove all remaining restrictions classes' from Jan Ciołek This PR removes all code that used classes `restriction`, `restrictions` and their children. There were two fields in `statement_restrictions` that needed to be dealt with: `_clustering_columns_restrictions` and `_nonprimary_key_restrictions`. Each function was reimplemented to operate on the new expression representation and eventually these fields weren't needed anymore. After that the restriction classes weren't used anymore and could be deleted as well. Now all of the code responsible for analyzing WHERE clause and planning a query works on expressions. Closes #11069 * github.com:scylladb/scylla: cql3: Remove all remaining restrictions code cql3: Move a function from restrictions class to the test cql3: Remove initial_key_restrictions cql3: expr: Remove convert_to_restriction cql3: Remove _new from _new_nonprimary_key_restrictions cql3: Remove _nonprimary_key_restrictions field cql3: Reimplement uses of _nonprimary_key_restrictions using expression cql3: Keep a map of single column nonprimary key restrictions cql3: Remove _new from _new_clustering_columns_restrictions cql3: Remove _clustering_columns_restrictions from statement_restrictions cql3: Use a variable instead of dynamic cast cql3: Use the new map of single column clustering restrictions cql3: Keep a map of single column clustering key restrictions cql3: Return an expression in get_clustering_columns_restrctions() cql3: Reimplement _clustering_columns_restrictions->has_supporting_index() cql3: Don't create single element conjunction cql3: Add expr::index_supports_some_column cql3: Reimplement has_unrestricted_components() cql3: Reimplement _clustering_columns_restrictions->need_filtering() cql3: Reimplement num_prefix_columns_that_need_not_be_filtered cql3: Use the new clustering restrictions field instead of ->expression cql3: Reimplement _clustering_columns_restrictions->size() using expressions cql3: Reimplement _clustering_columns_restrictions->get_column_defs() using expressions cql3: Reimplement _clustering_columns_restrictions->is_all_eq() using expressions cql3: expr: Add has_only_eq_binops function cql3: Reimplement _clustering_columns_restrictions->empty() using expressions	2022-07-20 18:01:15 +03:00
Jan Ciolek	9d1ba07471	cql3: Reimplement uses of _nonprimary_key_restrictions using expression All parts of the code that use _nonprimary_key_restrictions are changed to use _new_nonprimary_key_restrictions instead. I decided not to split this into multiple commits, as there isn't a lot of changes and they are analogous to the ones done before for partition and clustering columns. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-07-20 09:10:30 +02:00
Avi Kivity	5a30f9b789	Merge 'Distributed aggregate query' from Michał Jadwiszczak This PR extends #9209. It consists of 2 main points: To enable parallelization of user-defined aggregates, a reduction function was added to the UDA definition. The reduction function is optional and has to be a scalar function that takes 2 arguments of the UDA's state type and returns the UDA's state. All currently implemented native aggregates got a reducible counterpart, which returns its state as the final result, so it can be reduced with other results. Hence all native aggregates can now be distributed. Local 3-node cluster made with current master. `node1` updated to this branch. Accessing node with `ccm <node-name> cqlsh` I've tested the things below from both the old and the new node: - creating UDA with reduce function - not allowed - selecting count() - distributed - selecting other aggregate function - not distributed Fixes: #10224 Closes #10295 github.com:scylladb/scylla: test: add tests for parallelized aggregates test: cql3: Add UDA REDUCEFUNC test forward_service: enable multiple selection forward_service: support UDA and native aggregate parallelization cql3:functions: Add cql3::functions::functions::mock_get() cql3: selection: detect parallelize reduction type db,cql3: Move part of cql3's function into db selection: detect if selectors factory contains only simple selectors cql3: reducible aggregates DB: Add `scylla_aggregates` system table db,gms: Add SCYLLA_AGGREGATES schema features CQL3: Add reduce function to UDA gms: add UDA_NATIVE_PARALLELIZED_AGGREGATION feature	2022-07-19 19:05:19 +03:00
Avi Kivity	1f21c1ecc8	Merge "Add IO throttling to streaming class" from Pavel E " Same thing was done for compaction class some time ago, now it's time for streaming to keep repair-generated IO in bounds. This set mostly resembles the one for compaction IO class with the exception that boot-time reshard/reshape currently runs in streaming class, but that's not great if the class is throttled, so the set also moves boot-time IO into default IO class. " * 'br-streaming-class-throttling-2' of https://github.com/xemul/scylla: distributed_loader: Populate keyspaces in default class streaming: Maintain class bandwidth streaming: Pass db::config& to manager constructor config: Add stream_io_throughput_mb_per_sec option sstables: Keep priority class on sstable_directory	2022-07-19 17:10:25 +03:00
Jan Ciolek	2b7ffd57fb	cql3: Return an expression in get_clustering_columns_restrctions() get_clustering_columns_restrctions() used to return a shared pointer to the clustering_restrictions class. Now everything is being converted to expression, so it should return an expression as well. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-07-19 16:02:01 +02:00
Pavel Emelyanov	07460761fb	Merge "Make compaction_static_shares and memtable_flush_static_shares live updateable" from Igor Ribeiro Barbosa Duarte (3): Currently, after updating the static shares it's necessary to restart the cluster. This patch series makes compaction_static_shares and memtable_flush_static_shares live updateable so that this restart isn't necessary anymore. dtests: https://github.com/igorribeiroduarte/scylla-dtest/tree/test_liveupdate_compaction_static_shares ci: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1412/ * https://github.com/igorribeiroduarte/scylla/tree/make_compaction_static_shares_live_updateable: memtable_flush: Make memtable_flush_static_shares liveupdateable compaction: Make compaction_static_shares liveupdateable backlog_controller: Unify backlog_controller constructors	2022-07-19 16:55:55 +03:00
Igor Ribeiro Barbosa Duarte	3b19bcf1a1	memtable_flush: Make memtable_flush_static_shares liveupdateable This patch makes memtable_flush_static_shares liveupdateable to avoid having to restart the cluster after updating this config. Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>	2022-07-19 10:10:46 -03:00
Igor Ribeiro Barbosa Duarte	8dd0f4672d	compaction: Make compaction_static_shares liveupdateable This patch makes compaction_static_shares liveupdateable to avoid having to restart the cluster after updating this config. Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>	2022-07-19 10:10:46 -03:00
Pavel Emelyanov	85d32485d9	config: Mark compaction_throughput_mb_per_sec option as Used Otherwise it's not shown in the --help output. Should've been the part of `868c3be0` Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220716085221.26634-1-xemul@scylladb.com>	2022-07-19 13:18:17 +03:00
Pavel Emelyanov	7d0110cd31	config: Add stream_io_throughput_mb_per_sec option It's going to control the bandwidth for the streaming prio class. For now it's just added but doesn't work for real Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-07-19 12:14:41 +03:00
Pavel Emelyanov	62d95f09de	view: De-futurize make_view_update_builder() It doesn't sleep, just returns ready future with builder tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1384 it's red because e-mail notification is broken (scylla-pkg#2988) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220718132529.30751-1-xemul@scylladb.com>	2022-07-18 17:15:48 +03:00
Jadw1	59498caeca	db,cql3: Move part of cql3's function into db Moving `function`, `function_name` and `aggregate_function` into db namespace to avoid including cql3 namespace into query-request. For now, only minimal subset of cql3 function was moved to db.	2022-07-18 15:25:41 +02:00
Jadw1	d13f347621	DB: Add `scylla_aggregates` system table Saving information about UDA's reduce function to `scylla_aggregates` table and distributing it across cluster.	2022-07-18 15:25:37 +02:00
Jadw1	2c46222e31	db,gms: Add SCYLLA_AGGREGATES schema features This schema feature will be used to guard system_schema.scylla_aggregates schema table.	2022-07-18 14:18:48 +02:00
Jadw1	d8f3461147	CQL3: Add reduce function to UDA Add optional field to UDA, that describes reduce function to allow parallelization of UDA aggregates.	2022-07-18 14:18:48 +02:00
Benny Halevy	3f0402db68	legacy_schema_migrator: simplify drop_legacy_tables There is no need for utils::make_joinpoint now that the function calls replica::database::drop_table_on_all_shards. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-18 10:28:18 +03:00
Benny Halevy	71aad45757	schema_tables: merge_tables_and_views: use drop_table_on_all_shards So that the dropped table's directory can be removed after it has been dropped on all shards if it has no snapshots. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-17 14:33:34 +03:00
Benny Halevy	e005629afb	database: add drop_table_on_all_shards Runs drop_column_family on all database shards. Will be extended later to consider removing the table directory. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-07-17 14:33:34 +03:00
Botond Dénes	4d2ce5c304	mutation_compactor: remove emit_only_live_rows template parameter Now that we use emit_only_live_rows::no everywhere we can remove this template parameter. Only the template parameter is removed, the internal logic around it is left in place (will be removed in a next patch), by hard-wiring `only_live()`.	2022-07-12 08:43:49 +03:00
Botond Dénes	bedc82e52c	tree: use emit_only_live_rows::no emit_only_live_rows is a convenience so downstream consumers of the mutation compactors don't have to check the `bool is_live` already passed to them. This convenience however causes a template parameter and additional logic for the compactor. As the most prominent of these consumers (the query result builder) will soon have to switch to emit_only_live_rows::no for other reasons anyway (it will want to count tombstones), we take the opportunity to switch everybody to ::no. This can be done with very little additional complexity to these consumers -- basically an additional if or two. This prepares the ground for removing this template parameter and the associated logic from the compactor.	2022-07-12 08:41:51 +03:00
Pavel Emelyanov	5526738794	view: Fix trace-state pointer use after move It's moved into .mutate_locally() but it is captured and used in its continuation. It works well just because moved-from pointer looks like nullptr and all the tracing code checks for it to be non-such. tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1266/ (CI job failed on post-actions thus it's red) Fixes #11015 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220711134152.30346-1-xemul@scylladb.com>	2022-07-11 17:20:51 +03:00
Nadav Har'El	cc69177dcc	config: fix printing of experimental feature list Recently we noticed a regression where with certain versions of the fmt library, SELECT value FROM system.config WHERE name = 'experimental_features' returns string numbers, like "5", instead of feature names like "raft". It turns out that the fmt library keeps changing its overload resolution order when there are several ways to print something. For enum_option<T> we happen to have two conflicting ways to print it: 1. We have an explicit operator<<. 2. We have an implicit converter to the type held by T. We were hoping that the operator<< always wins. But in fmt 8.1, there is special logic that if the type is convertible to an int, this is used before operator<<()! For experimental_features_t, the type held in it was an old-style enum, so it is indeed convertible to int. The solution I used in this patch is to replace the old-style enum in experimental_features_t by the newer and more recommended "enum class", which does not have an implicit conversion to int. I could have fixed it in other ways, but it wouldn't have been much prettier. For example, dropping the implicit converter would require us to change a bunch of switch() statements over enum_option (and not just experimental_features_t, but other types of enum_option). Going forward, all uses of enum_option should use "enum class", not "enum". tri_mode_restriction_t was already using an enum class, and now so does experimental_features_t. I changed the examples in the comments to also use "enum class" instead of enum. This patch also adds to the existing experimental_features test a check that the feature names are words that are not numbers. Fixes #11003. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #11004	2022-07-11 09:17:30 +02:00
Nadav Har'El	a7fa29bceb	cross-tree: fix header file self-sufficiency Scylla's coding standard requires that each header is self-sufficient, i.e., it includes whatever other headers it needs - so it can be included without having to include any other header before it. We have a test for this, "ninja dev-headers", but it isn't run very frequently, and it turns out our code deviated from this requirement in a few places. This patch fixes those places, and after it "ninja dev-headers" succeeds again. Fixes #10995 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #10997	2022-07-08 12:59:14 +03:00
Avi Kivity	3b20407f25	Merge 'db: Avoid memtable flush latency on schema merge' from Tomasz Grabiec Currently, applying schema mutations involves flushing all schema tables so that on restart commit log replay is performed on top of latest schema (for correctness). The downside is that schema merge is very sensitive to fdatasync latency. Flushing a single memtable involves many syncs, and we flush several of them. It was observed to take as long as 30 seconds on GCE disks under some conditions. This patch changes the schema merge to rely on a separate commit log to replay the mutations on restart. This way it doesn't have to wait for memtables to be flushed. It has to wait for the commitlog to be synced, but this cost is well amortized. We put the mutations into a separate commit log so that schema can be recovered before replaying user mutations. This is necessary because regular writes have a dependency on schema version, and replaying on top of latest schema satisfies all dependencies. Without this, we could get loss of writes if we replay a write which depends on the latest schema on top of old schema. Also, if we have a separate commit log for schema we can delay schema parsing for after the replay and avoid complexity of recognizing schema transactions in the log and invoking the schema merge logic. I reproduced bad behavior locally on my machine with a tired (high latency) SSD disk, load driver remote. Under high load, I saw table alter (server-side part) taking up to 10 seconds before. After the patch, it takes up to 200 ms (50:1 improvement). Without load, it is 300ms vs 50ms. 
Fixes #8272 Fixes #8309 Fixes #1459 Closes #10333 * github.com:scylladb/scylla: config: Introduce force_schema_commit_log option config: Introduce unsafe_ignore_truncation_record db: Avoid memtable flush latency on schema merge db: Allow splitting initialization of system tables db: Flush system.scylla_local on change migration_manager: Do not drop system.IndexInfo on keyspace drop Introduce SCHEMA_COMMITLOG cluster feature frozen_mutation: Introduce freeze/unfreeze helpers for vectors of mutations db/commitlog: Improve error messages in case of unknown column mapping db/commitlog: Fix error format string to print the version db: Introduce multi-table atomic apply()	2022-07-07 16:03:50 +03:00
Benny Halevy	acae3cc223	treewide: stop use of deprecated coroutine::make_exception Convert most use sites from `co_return coroutine::make_exception` to `co_await coroutine::return_exception{,_ptr}` where possible. In cases this is done in a catch clause, convert to `co_return coroutine::exception`, generating an exception_ptr if needed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #10972	2022-07-07 15:02:16 +03:00
Avi Kivity	bfc521ee9c	Merge "Activate compaction_throughput_mb_per_sec option" from Pavel E " The option controls the IO bandwidth of the compaction sched class. It's now set to be 16MB/s, but is unused. This set makes it 0 by default (which means unlimited), live-updateable and plugs it into the seastar sched group IO throttling. branch: https://github.com/xemul/scylla/tree/br-compaction-throttling-3 tests: unit(dev), v2: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1010/ , v2: manual config update " * 'br-compaction-throttling-3-a' of https://github.com/xemul/scylla: compaction_manager: Add compaction throughput limit updateable_value: Support dummy observing serialized_action: Allow being observer for updateable_value config: Tune the config option	2022-07-07 13:14:07 +03:00
Tomasz Grabiec	6622e3369a	config: Introduce force_schema_commit_log option	2022-07-06 22:08:56 +02:00

2664 Commits