Commit Graph

29204 Commits

Jan Ciolek
7bbfa48bc5 cql3: Add has_eq_restriction_on_column function
Adds a function that checks whether a given expression has an EQ restriction
on the specified column.

It finds restrictions like
col = ...
or
(col, col2) = ...

IN restrictions don't count; they aren't EQ restrictions.
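
A minimal sketch of such a check, over deliberately simplified, hypothetical expression types (the real cql3::expr machinery is far richer):
```
#include <string>
#include <variant>
#include <vector>

// Simplified stand-ins for the real cql3 expression types.
struct column { std::string name; };
struct eq_restriction { std::vector<column> columns; };  // col = ... or (col, col2) = ...
struct in_restriction { std::vector<column> columns; };  // col IN (...): must not match
using restriction = std::variant<eq_restriction, in_restriction>;

bool has_eq_restriction_on_column(const std::vector<restriction>& where,
                                  const std::string& col_name) {
    for (const auto& r : where) {
        // Only EQ restrictions count; IN restrictions are deliberately skipped.
        if (const auto* eq = std::get_if<eq_restriction>(&r)) {
            for (const auto& c : eq->columns) {
                if (c.name == col_name) {
                    return true;
                }
            }
        }
    }
    return false;
}
```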

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-12-09 12:06:43 +01:00
Jan Ciolek
f76a1cd4bf cql3: Reorganize orderings code
Reorganized the code that handles column ordering (ASC or DESC).
I feel that it's now clearer and easier to understand.

Added an enum that describes column ordering.
It has two possible values: ascending or descending.
It used to be a bool that was sometimes called 'reversed',
which could mean multiple things.

Instead of column.type->is_reversed() != <ordering bool>
there is now a function called are_column_select_results_reversed.

Split checking whether the ordering is reversed and verifying that it's correct into two functions.
Before, all of this was done by is_reversed().
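
A hedged sketch of the enum and the reversal check described above; only the names taken from the commit text are real, the surrounding types are invented stand-ins:
```
// Illustrative stand-ins for the real schema types.
enum class ordering { ascending, descending };

struct column_type { bool reversed; };  // e.g. set by "WITH CLUSTERING ORDER BY (c DESC)"

// True when results for this column come back in the opposite order to the
// requested one, i.e. the column's native order and the requested order differ.
bool are_column_select_results_reversed(const column_type& type, ordering requested) {
    bool native_descending = type.reversed;
    bool requested_descending = (requested == ordering::descending);
    return native_descending != requested_descending;
}
```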

This is a preparation to later allow skipping ORDER BY restrictions on some columns.
Adding this to the existing code caused it to get quite complex,
but this new version is better suited for the task.

The diff is a bit messy because I moved all ordering functions to one place;
it's better to read select_statement.cc lines 1495-1651 directly.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-12-09 12:06:42 +01:00
Piotr Sarna
337906bc1c alternator: precompute scan range parameters in a function
This commit addresses a very simple FIXME left in the alternator TTL
implementation - it reduces the number of parameters passed
to scan_table_ranges() by enclosing them in a separate
object.
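
A hedged illustration of the parameter-object refactoring; the field names here are invented for the example:
```
#include <string>

// Hypothetical parameter object; the real fields in the TTL code differ.
struct scan_ranges_context {
    std::string table_name;
    unsigned shard;
    bool include_expired;
};

// One struct parameter instead of a long positional argument list.
void scan_table_ranges(const scan_ranges_context& ctx) {
    // ... scan using ctx.table_name, ctx.shard, ctx.include_expired ...
    (void)ctx;
}
```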

Tests: unit(release)

Message-Id: <214afcd9d5c1968182ad98550105f82add216c80.1638354094.git.sarna@scylladb.com>
2021-12-02 10:04:05 +02:00
Raphael S. Carvalho
de165b864c repair: Enable off-strategy compaction for rebuild
Let's enable off-strategy compaction for repair-based rebuild, so it can take
advantage of off-strategy benefits, one of the most important
being that compaction doesn't act aggressively, which is important
both for reducing operation time and for delivering good latency
while the operation is running.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211130115957.13779-1-raphaelsc@scylladb.com>
2021-12-02 09:58:58 +02:00
David Garcia
954d5d5d63 Fix cql docs error
Closes #9613
2021-12-02 09:58:58 +02:00
Avi Kivity
ef3edcf848 test: refine test suite names exposed via xunit format
The test suite names seen by Jenkins are suboptimal: there is
no distinction between modes, and the ".cc" suffix of file names
is interpreted as a class name, which is converted to a tree node
that must be clicked to expand. Massage the names to remove
unnecessary information and add the mode.

Closes #9696
2021-12-02 09:58:58 +02:00
Avi Kivity
9edd86362a test: sstable_test: don't read compressed file size from closed file
We read the compressed file size from a file that was already closed,
resulting in EBADF on my machine. Not sure why it works for everyone
else.

Fix by reading the size using the path.
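
The idea of the fix in a few lines, assuming Seastar's file_size() free function (exact signatures differ across Seastar versions):
```
#include <cstdint>
#include <seastar/core/future.hh>
#include <seastar/core/seastar.hh>
#include <seastar/core/sstring.hh>

// Query the size via the filesystem (by path), not via a file handle that
// may already have been closed.
seastar::future<uint64_t> compressed_file_size(seastar::sstring path) {
    return seastar::file_size(path);
}
```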

Closes #9675
2021-12-01 16:28:46 +02:00
Raphael S. Carvalho
f23e0d7f2d compaction_manager: Disregard inactive tasks when filtering sstables
After commit 1f5b17f, overlapping can be introduced in level 1 because the
procedure that filters out sstables from partial runs considers
inactive tasks, so L1 sstables can be incorrectly filtered out from
the next compaction attempt. When L0 is merged into L1, overlapping is
then introduced in L1 because the old L1 sstables weren't considered in
the L0 -> L1 compaction.

From now on, compaction_manager::get_candidates() will only consider
active tasks, to make sure actual partial runs are filtered out.
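
A hypothetical, heavily simplified sketch of the filtering rule after the fix; types and member names are illustrative:
```
#include <memory>
#include <unordered_set>
#include <vector>

struct sstable { int run_id; };
struct task { bool compaction_running; std::unordered_set<int> run_ids; };

std::vector<sstable> get_candidates(const std::vector<sstable>& all,
                                    const std::vector<std::shared_ptr<task>>& tasks) {
    std::unordered_set<int> active_runs;
    for (const auto& t : tasks) {
        if (t->compaction_running) {  // inactive tasks are now ignored
            active_runs.insert(t->run_ids.begin(), t->run_ids.end());
        }
    }
    std::vector<sstable> candidates;
    for (const auto& sst : all) {
        // Only sstables belonging to *actual* partial runs are filtered out.
        if (!active_runs.count(sst.run_id)) {
            candidates.push_back(sst);
        }
    }
    return candidates;
}
```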

Fixes #9693.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129180459.125847-1-raphaelsc@scylladb.com>
2021-12-01 16:11:44 +02:00
Raphael S. Carvalho
9de7abdc80 compaction: LCS: Fix inefficiency when pushing SSTables to higher levels
To satisfy the backlog controller, commit 28382cb25c changed LCS to
incrementally push sstables to the highest level *when* there's nothing else
to be done.

That's overkill, because the controller will be satisfied with level L being
fanout times larger than L-1. There's no need to push everything to the last
level; that's even worse than a major compaction, because any file being
promoted will overlap with ~10 files in the next level. The cost is at least
amortized by multiple iterations, but the terrible write amplification is
still there, reducing overall efficiency.
For example, it might happen that LCS in table A starts pushing everything
to the highest level while table B needs resources for compaction to reduce
its backlog. The increased write amplification in A may prevent other tables
from reducing their backlogs in a timely manner.

It's clear that LCS should stop promoting as soon as level L is 10x
larger than L-1; the strategy will still be satisfied while the
inefficiency problem is fixed.

Now the layout will look as follows:
SSTables in each level: [0, 2, 15, 121]

Previously, once the table stopped being written to, it looked like:
SSTables in each level: [0, 0, 0, 138]

It's always good to have everything in a single run, but that comes
with a high write amplification cost which we cannot afford in steady
state. With this change, the layout will still be good enough to make
everybody happy.
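
A hedged sketch of the promotion cut-off described above; names and types are illustrative only:
```
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint64_t fanout = 10;  // LCS default fanout

// Promote sstables from level l-1 into level l (l >= 1) only while l is
// still smaller than fanout * size(l-1); beyond that the backlog
// controller is already satisfied.
bool should_promote(const std::vector<uint64_t>& level_bytes, std::size_t l) {
    return level_bytes[l] < fanout * level_bytes[l - 1];
}
```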

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129143606.71257-1-raphaelsc@scylladb.com>
2021-12-01 16:10:25 +02:00
Gleb Natapov
f2ab5f4e60 raft service: insert a new raft instance into the servers' list only after it is started
The RPC module starts dispatching calls to a server the moment it is in the
servers' list, but until raft::server::start() completes the instance is
not fully created yet and is not ready to accept anything. Fix the code
that initializes a new raft group so it inserts the new raft instance into
the list only after it is started.
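
A minimal sketch of the start-before-publish ordering using Seastar coroutines; the raft types here are illustrative stand-ins, not the actual Scylla code:
```
#include <map>
#include <memory>
#include <utility>
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>

// Illustrative stand-ins for the real raft server/registry types.
struct raft_server {
    seastar::future<> start() { return seastar::make_ready_future<>(); }
};
using group_id = int;

struct group_registry {
    // RPC dispatch looks servers up here, so insertion makes them reachable.
    std::map<group_id, std::unique_ptr<raft_server>> servers;

    seastar::future<> add_group(group_id gid, std::unique_ptr<raft_server> srv) {
        co_await srv->start();                 // finish initialization first...
        servers.emplace(gid, std::move(srv));  // ...only then publish to RPC
    }
};
```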

Message-Id: <YZTFFW9v0NlV7spR@scylladb.com>
2021-12-01 13:11:49 +01:00
Nadav Har'El
94eb5c55c8 Merge 'Loading cache improve eviction use policy' from Vladislav Zolotarov
This series introduces a new version of a loading_cache class.
The old implementation was susceptible to a "pollution" phenomenon where a frequently used entry could get evicted by an intensive burst of "used once" entries pushed into the cache.

The new version has privileged and unprivileged cache sections, and there's a new loading_cache template parameter, SectionHitThreshold. The new cache algorithm goes as follows (a sketch of the policy follows the list):
  * We define 2 dynamic cache sections whose total size should not exceed the maximum cache size.
  * A new cache entry is always added to the "unprivileged" section.
  * After a cache entry is read more than SectionHitThreshold times, it moves to the second, privileged cache section.
  * Both sections' entries obey expiration and reload rules in the same way as before this patch.
  * When cache entries need to be evicted due to a size restriction, the "unprivileged" section's
    least recently used entries are evicted first.
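
A hedged sketch of the two-section policy (expiration, reload, and the async loading machinery omitted; all names illustrative):
```
#include <cstddef>
#include <list>
#include <string>
#include <utility>

template <int SectionHitThreshold>
class two_section_cache {
    struct entry { std::string key, value; int hits = 0; bool privileged = false; };
    std::list<entry> _unprivileged;  // LRU order: front = most recent; evicted first
    std::list<entry> _privileged;    // evicted only when the other section is empty
    std::size_t _max_size;

public:
    explicit two_section_cache(std::size_t max_size) : _max_size(max_size) {}

    void put(std::string key, std::string value) {
        // Shrink *before* inserting, so the newcomer can't be evicted right
        // away even when every other entry is privileged (see the Note in
        // the individual commit further down).
        shrink_to(_max_size > 0 ? _max_size - 1 : 0);
        _unprivileged.push_front({std::move(key), std::move(value)});
    }

    void on_hit(entry& e) {
        if (!e.privileged && ++e.hits > SectionHitThreshold) {
            e.privileged = true;  // promote (list splicing omitted for brevity)
        }
    }

private:
    void shrink_to(std::size_t target) {
        while (_unprivileged.size() + _privileged.size() > target) {
            if (!_unprivileged.empty()) {
                _unprivileged.pop_back();  // unprivileged LRU goes first
            } else {
                _privileged.pop_back();
            }
        }
    }
};
```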

More details may be found in #8674.

In addition, during testing another issue was found in the authorized_prepared_statements_cache: #9590.
There is a patch that fixes it as well.

Closes #9708

* github.com:scylladb/scylla:
  loading_cache: account unprivileged section evictions
  loading_cache: implement a variation of least frequent recently used (LFRU) eviction policy
  authorized_prepared_statements_cache: always "touch" a corresponding cache entry when accessed
  loading_cache::timestamped::lru_entry: refactoring
  loading_cache.hh: rearrange the code (no functional change)
  loading_cache: use std::pmr::polymorphic_allocator
2021-12-01 13:13:53 +02:00
Calle Wilund
3e21fea2b6 test_streams: test_streams_starting_sequence_number fix 'LastEvaluatedShardId' usage
It is not part of the raw response, but of the 'StreamDescription' object.
The test fails intermittently depending on PK randomization.

Closes #9710
2021-12-01 11:05:40 +02:00
Avi Kivity
03755b362a Merge 'compaction_manager api: stop ongoing compactions' from Benny Halevy
This series extends `compaction_manager::stop_ongoing_compaction` so it can be used from the API layer for:
- table::disable_auto_compaction
- compaction_manager::stop_compaction

Fixes #9313
Fixes #9695

Test: unit(dev)

Closes #9699

* github.com:scylladb/scylla:
  compaction_manager: stop_compaction: wait for ongoing compactions to stop
  compaction_manager: stop_ongoing_compactions: log Stopping 0 tasks at debug level
  compaction_manager: unify stop_ongoing_compactions implementations
  compaction_manager: stop_ongoing_compactions: add compaction_type option
  compaction_manager: get_compactions: get a table* parameter
  table: disable_auto_compaction: stop ongoing compactions
  compaction_manager: make stop_ongoing_compactions public
  table: futurize disable_auto_compactions
2021-11-30 19:08:14 +02:00
Avi Kivity
595cc328b1 Merge 'cql3: Remove term, replace with expression' from Jan Ciołek
This PR finally removes the `term` class and replaces it with `expression`.

* There was some trouble with `lwt_cache_id` in `expr::function_call`.
  The current code works the following way:
  * for each `function_call` inside a `term` that describes a pk restriction, `prepare_context::add_pk_function_call` is called.
  * `add_pk_function_call` takes a `::shared_ptr<cql3::functions::function_call>`, sets its `cache_id` and pushes this shared pointer onto a vector of all collected function calls
  * Later, when some condition is met, we want to clear the cache ids of all those collected function calls. To do this we iterate through the shared pointers collected in `prepare_context` and clear the cache id for each of them.

  This doesn't work with `expr::function_call` because it isn't kept inside a shared pointer.
  To solve this I put the `lwt_cache_id` inside a shared pointer and then `prepare_context` collects these shared pointers to cache ids.

  I also experimented with doing this without any shared pointers: maybe we could just walk through the expression and clear the cache ids ourselves. But the problem is that expressions are copied all the time; we could clear the cache in one place but forget about a copy. Doing it with shared pointers more closely matches the original behaviour (a sketch of this approach follows the list below).
The experiment is on the [term2-pr3-backup-altcache](https://github.com/cvybhu/scylla/tree/term2-pr3-backup-altcache) branch
* `shared_ptr<term>` being `nullptr` could mean:
  * It represents a cql value `null`
  * That there is no value, like `std::nullopt` (for example in `attributes.hh`)
  * That it's a mistake, it shouldn't be possible

  A good way to distinguish between optional and mistake is to look for `my_term->bind_and_get()`; we then know that it's not an optional value.

* On the other hand, `raw_value` cast to bool means:
   * `false` - null or unset
   * `true` - some value, maybe empty

I ran a simple benchmark on my laptop to see how performance is affected:
```
build/release/test/perf/perf_simple_query --smp 1 -m 1G --operations-per-shard 1000000 --task-quota-ms 10
```
* On master (a21b1fbb2f) I get:
  ```
  176506.60 tps ( 77.0 allocs/op,  12.0 tasks/op,   45831 insns/op)

  median 176506.60 tps ( 77.0 allocs/op,  12.0 tasks/op,   45831 insns/op)
  median absolute deviation: 0.00
  maximum: 176506.60
  minimum: 176506.60
  ```
* On this branch I get:
  ```
  172225.30 tps ( 75.1 allocs/op,  12.1 tasks/op,   46106 insns/op)

  median 172225.30 tps ( 75.1 allocs/op,  12.1 tasks/op,   46106 insns/op)
  median absolute deviation: 0.00
  maximum: 172225.30
  minimum: 172225.30
  ```

Closes #9481

* github.com:scylladb/scylla:
  cql3: Remove remaining mentions of term
  cql3: Remove term
  cql3: Rename prepare_term to prepare_expression
  cql3: Make prepare_term return an expression instead of term
  cql3: expr: Add size check to evaluate_set
  cql3: expr: Add expr::contains_bind_marker
  cql3: expr: Rename find_atom to find_binop
  cql3: expr: Add find_in_expression
  cql3: Remove term in operations
  cql3: Remove term in relations
  cql3: Remove term in multi_column_restrictions
  cql3: Remove term in term_slice, rename to bounds_slice
  cql3: expr: Remove term in expression
  cql3: expr: Add evaluate_IN_list(expression, options)
  cql3: Remove term in column_condition
  cql3: Remove term in select_statement
  cql3: Remove term in update_statement
  cql3: Use internal cql format in insert_prepared_json_statement cache
  types: Add map_type_impl::serialize(range of <bytes, bytes>)
  cql3: Remove term in cql3/attributes
  cql3: expr: Add constant::view() method
  cql3: expr: Implement fill_prepare_context(expression)
  cql3: expr: add expr::visit that takes a mutable expression
  cql3: expr: Add receiver to expr::bind_variable
2021-11-30 16:39:39 +02:00
Avi Kivity
078f69c133 Merge "raft: (service) implement group 0 as a service" from Kostja
"
To ensure consistency of schema and topology changes,
Scylla needs linearizable storage for this data
available at every member of the database cluster.

The series introduces such storage as a service,
available to all Scylla subsystems. Using this service, any other
internal service such as gossip or migrations (schema) could
persist changes to cluster metadata and expect this to be done in
a consistent, linearizable way.

The series uses the built-in Raft library to implement a
dedicated Raft group, running on shard 0, which includes all
members of the cluster (group 0); adds hooks to topology change
events, such as adding or removing nodes of the cluster, to update
group 0 membership; and ensures the group is started when the
server boots.

The state machine for the group, i.e. the actual storage
for cluster-wide information still remains a stub. Extending
it to actually persist changes of schema or token ring
is subject to a subsequent series.

Another Raft-related service was implemented earlier: the Raft Group
Registry. The purpose of the registry is to allow Scylla to have an
arbitrary number of groups, each with its own subset of cluster
members and a relevant state machine, sharing a common transport.
Group 0 is one (the first) group among many.
"

* 'raft-group-0-v12' of github.com:scylladb/scylla-dev:
  raft: (server) improve tracing
  raft: (metrics) fix spelling of waiters_awaken
  raft: make forwarding optional
  raft: (service) manage Raft configuration during topology changes
  raft: (service) break a dependency loop
  raft: (discovery) introduce leader discovery state machine
  system_keyspace: mark scylla_local table as always-sync commitlog
  system_keyspace: persistence for Raft Group 0 id and Raft Server Id
  raft: add a test case for adding entries on follower
  raft: (server) allow adding entries/modify config on a follower
  raft: (test) replace virtual with override in derived class
  raft: (server) fix a typo in exception message
  raft: (server) implement id() helper
  raft: (server) remove apply_dummy_entry()
  raft: (test) fix missing initialization in generator.hh
2021-11-30 16:24:51 +02:00
Raphael S. Carvalho
0d5ac845e1 compaction: Make cleanup withstand better disk pressure scenario
It's not uncommon for cleanup to be issued against an entire keyspace,
which may be composed of tons of tables. To increase the chances of success
when low on space, cleanup will now start from the smaller tables first, so
that the bigger tables will have more space available, once they're reached,
to satisfy their space requirements.

parallel_for_each() is dropped; it wasn't needed given that the manager
performs per-shard serialization of cleanup jobs.
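
A hedged sketch of the smallest-first ordering; types and names are illustrative:
```
#include <algorithm>
#include <cstdint>
#include <vector>

struct table_info { uint64_t live_disk_space; };  // illustrative

// Smaller tables first: each completed cleanup frees space that the larger
// tables will need once their turn comes.
void order_for_cleanup(std::vector<table_info>& tables) {
    std::sort(tables.begin(), tables.end(),
              [](const table_info& a, const table_info& b) {
                  return a.live_disk_space < b.live_disk_space;
              });
}
```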

Refs #9504.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211130133712.64517-1-raphaelsc@scylladb.com>
2021-11-30 16:15:24 +02:00
Benny Halevy
957003e73f compaction_manager: stop_compaction: wait for ongoing compactions to stop
Similar to #9313, stop_compaction should also reuse the
stop_ongoing_compactions() infrastructure and wait for ongoing
compactions of the given type to stop.

Fixes #9695

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 16:09:11 +02:00
Benny Halevy
b9ba181d3c compaction_manager: stop_ongoing_compactions: log Stopping 0 tasks at debug level
Normally, "Stopping 0 tasks for 0 ongoing compactions for table ..."
is not very interesting so demote its log_level to debug.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 16:09:11 +02:00
Benny Halevy
03e969dbef compaction_manager: unify stop_ongoing_compactions implementations
Now stop_ongoing_compactions(reason) is equivalent to
stop_ongoing_compactions(reason, nullptr, std::nullopt),
so share the code of the latter for the former entry point.
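
The unification, reduced to a self-contained sketch (the real signatures in compaction_manager differ in detail):
```
#include <optional>
#include <string>
#include <utility>

struct table;  // a forward declaration is enough for a pointer parameter
enum class compaction_type { regular, cleanup, upgrade, scrub };

struct compaction_manager {
    // General entry point (body elided in this sketch).
    void stop_ongoing_compactions(std::string reason, table* t,
                                  std::optional<compaction_type> type) {
        // ... stop every task matching (t, type); all tasks if unset ...
        (void)reason; (void)t; (void)type;
    }

    // The old entry point is now just sugar over the general one.
    void stop_ongoing_compactions(std::string reason) {
        stop_ongoing_compactions(std::move(reason), nullptr, std::nullopt);
    }
};
```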

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 16:09:07 +02:00
Benny Halevy
94011bdcca compaction_manager: stop_ongoing_compactions: add compaction_type option
And make the table optional as well, so it can be used
by stop_compaction() to stop a particular compaction type on all tables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 16:07:47 +02:00
Benny Halevy
a419759835 compaction_manager: get_compactions: get a table* parameter
Optionally get the running compactions on the provided table.
This is required for stop_ongoing_compactions on a given table.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 16:06:34 +02:00
Benny Halevy
4affa801a5 table: disable_auto_compaction: stop ongoing compactions
The API call prevents new regular compaction jobs from starting,
but it doesn't wait for ongoing compactions to stop, which makes it
much less useful.

Returning only after stopping regular compaction jobs and waiting
for them to stop guarantees that no regular compaction jobs are
running when nodetool disableautocompaction returns successfully.

Fixes #9313

Test: sstable_compaction_test,sstable_directory_test(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 16:06:34 +02:00
Benny Halevy
3c721eb228 compaction_manager: make stop_ongoing_compactions public
So it can be used directly by table code
in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 16:06:29 +02:00
Raphael S. Carvalho
3006394312 compaction: Allow incremental compaction with interposer consumer
Until commit c94e6f8567, the interposer consumer wouldn't work
with our GC writer, which is needed for incremental compaction correctness.
Now that the technical debt is gone, let's allow incremental
compaction with the interposer consumer.

The only change needed is serialization of the replacer, as two
consumers cannot step on each other's toes, e.g. when we have concurrent
bucket writers with TWCS.

sstable_compaction_test.test_bug_6472 passes with this change,
which was added when #6472 was fixed by not allowing incremental
compaction with interposer consumer.

Refs #6472.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211126191000.43292-1-raphaelsc@scylladb.com>
2021-11-30 15:24:17 +02:00
Eliran Sinvani
ddd7248b3b testlib: close index_reader to avoid race condition
In order to avoid the race condition introduced in 9dce1e4, the
index_reader should be closed prior to its destruction.
Only 4.4 and earlier releases are exposed to this specific race.
However, it is always a good idea to first close the index reader
and only then destroy it, since that is what developers who change
the index reader in the future are most likely to assume.

Ref #9704 (because 4.4 and earlier releases are vulnerable).

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>

Closes #9705
2021-11-30 13:05:24 +01:00
Benny Halevy
b60d697084 table: futurize disable_auto_compactions
So it can stop ongoing compactions and wait
for them to complete.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 08:33:04 +02:00
Vlad Zolotarov
4cb245fe3c loading_cache: account unprivileged section evictions
Provide a template parameter for a static callbacks object that
increments a counter of evictions from the unprivileged section.

Entries getting evicted from the cache while still in the unprivileged
section indicates inefficient usage of the cache and should be
investigated.

This patch instruments the authorized_prepared_statements_cache and
prepared_statements_cache objects to provide non-empty callbacks.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2021-11-29 21:45:53 -05:00
Vlad Zolotarov
1a9c6d9fd3 loading_cache: implement a variation of least frequent recently used (LFRU) eviction policy
This patch implements a simple variation of the LFRU eviction policy:
  * We define 2 dynamic cache sections whose total size should not exceed the maximum cache size.
  * A new cache entry is always added to the "unprivileged" section.
  * After a cache entry is read more than SectionHitThreshold times, it moves to the second, privileged cache section.
  * Both sections' entries obey expiration and reload rules in the same way as before this patch.
  * When cache entries need to be evicted due to a size restriction, the "unprivileged" section's
    least recently used entries are evicted first.

Note:
With a two-section cache it's not enough for a new entry to have the latest timestamp
in order not to be evicted right after insertion: e.g. if all other entries
are in the privileged section.

And obviously we want to allow new cache entries to be added to the cache.

Therefore we can no longer first add a new entry and then shrink the cache.
Switching the order of these two operations resolves the problem.

Fixes #8674

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2021-11-29 21:45:21 -05:00
Pavel Solodovnikov
e3f922c48b raft: write raft log in user memory
System dirty memory space is limited to 10MB.
This means that memtables cannot accumulate more than
5MB before they are flushed to sstables.

This can impact performance under load.

Move the `system.raft` table to the regular dirty
memory space.

Fixes: #9692
Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20211129200044.1144961-1-pa.solodovnikov@scylladb.com>
2021-11-29 23:51:24 +01:00
Vlad Zolotarov
66c150769b authorized_prepared_statements_cache: always "touch" a corresponding cache entry when accessed
Always "touch" a prepared_statements_cache entry when it's accessed via
authorized_prepared_statements_cache.

If we don't do this, it may turn out that the most recently used prepared statement doesn't have
the newest last_read timestamp and can get evicted before a not-so-recently-read statement when
we need to create space in the prepared statements cache for a new entry.

This in turn triggers an eviction of the corresponding entry from the authorized_prepared_cache,
breaking the LRU paradigm of these caches.

Fixes #9590

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2021-11-29 17:37:25 -05:00
Nadav Har'El
d9c5c4eab6 test/alternator: tests for Select parameter in GSI and LSI
We already have tests for the behavior of the "Select" parameter when
querying a base table, but this patch adds additional tests for its
behavior when querying a GSI or an LSI. There are some differences:
Select=ALL_PROJECTED_ATTRIBUTES is not allowed for base tables, but is
allowed - and in fact is the default - for GSI and LSI. Also, GSI may
not allow ALL_ATTRIBUTES (which is the default for base tables) if
only a subset of the attributes were projected.

The new tests xfail because the Select and Projection features have
not yet been implemented in Alternator. They pass in DynamoDB.
After this patch we have (hopefully) complete test coverage of the
Select feature, which will be helpful when we start implementing it.

Refs #5058 (Select)
Refs #5036 (Projection)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211125100443.746917-1-nyh@scylladb.com>
2021-11-29 20:28:43 +01:00
Nadav Har'El
1c279118f4 test/alternator: more test cases for Select parameter
Add to the existing tests for the Select parameter of the Query and Scan
operations another check: That when Select is ALL_ATTRIBUTES or COUNT,
specifying AttributesToGet or ProjectionExpression is forbidden -
because the combination doesn't make sense.

The expanded test continues to xfail on Alternator (because the Select
parameter isn't yet implemented), and passes on DynamoDB. Strengthening
the tests for this feature will be helpful when we decide to implement it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211125074128.741677-1-nyh@scylladb.com>
2021-11-29 20:28:25 +01:00
Vlad Zolotarov
cbabde9622 loading_cache::timestamped::lru_entry: refactoring
   * Store a reference to the parent (loading_cache) object instead of holding
     references to separate fields.
   * Access loading_cache fields via accessors.
   * Move the LRU "touch" logic to the loading_cache.
   * Keep only plain "list entry" logic in the lru_entry class.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2021-11-29 14:24:56 -05:00
Vlad Zolotarov
9125b4545e loading_cache.hh: rearrange the code (no functional change)
Hide internal classes inside the loading_cache class:
  * Simpler calls - no need for tricky back-referencing to access loading_cache fields.
  * Cleaner interface.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2021-11-29 14:24:56 -05:00
Vlad Zolotarov
fd92718f48 loading_cache: use std::pmr::polymorphic_allocator
Use std::pmr::polymorphic_allocator instead of
std::allocator - the former doesn't require the allocated
type to be defined at template-specification time.

As a result we won't have to define lru_entry
before loading_cache, which in turn allows us
to rearrange the classes, making them all internal to
loading_cache and hence simplifying the interface.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2021-11-29 14:24:56 -05:00
Raphael S. Carvalho
1f3135abb4 sstable_set: use for_each_sstable() in make_crawling_reader()
sstable_set_impl::all() may have to copy all sstables from multiple
sets if compound. Let's avoid this overhead by using
sstable_set_impl::for_each_sstable().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211127181037.56542-1-raphaelsc@scylladb.com>
2021-11-29 19:59:39 +02:00
Michael Livshin
f0e2ada748 fix mutation_source::operator bool() for v2 factories
A mutation source is valid when it has either a v1 or v2 flat mutation
reader factory, but `operator bool()` only checks for the former.
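
A sketch of the fix; the member names are illustrative stand-ins for the real reader-factory types:
```
#include <functional>

struct mutation_source {
    std::function<int()> _fn;     // v1 reader factory (stand-in)
    std::function<int()> _fn_v2;  // v2 reader factory (stand-in)

    explicit operator bool() const {
        // Before the fix only _fn was checked, so a v2-only source
        // incorrectly reported itself as invalid.
        return bool(_fn) || bool(_fn_v2);
    }
};
```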

Fixes #9697

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>

Closes #9698
2021-11-29 19:50:37 +02:00
Nadav Har'El
8618346331 config: automate experimental_features_t::all()
experimental_features_t has an all() method, supposedly returning
all values of the enum - but it's easy to forget to update it when
adding a new experimental feature, and it's currently out of sync
(it's missing the ALTERNATOR_TTL option).
We already have another method, map(), where a new experimental feature
must be listed or else it can't be used, so let's just take all()'s
values from map(), automatically, instead of forcing developers to keep
both lists up to date.
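
A hedged sketch of deriving all() from map() so the two can never diverge; the real code uses Scylla's enum_option machinery, and the feature names shown here are only examples:
```
#include <map>
#include <string>
#include <vector>

struct experimental_features_t {
    enum class feature { UDF, ALTERNATOR_STREAMS, ALTERNATOR_TTL };

    // map() is the single source of truth: a feature must appear here
    // to be usable at all.
    static const std::map<std::string, feature>& map() {
        static const std::map<std::string, feature> m = {
            {"udf", feature::UDF},
            {"alternator-streams", feature::ALTERNATOR_STREAMS},
            {"alternator-ttl", feature::ALTERNATOR_TTL},
        };
        return m;
    }

    // all() is generated from map(), so adding a feature in one place suffices.
    static std::vector<feature> all() {
        std::vector<feature> features;
        for (const auto& kv : map()) {
            features.push_back(kv.second);
        }
        return features;
    }
};
```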

Note that using the all() function to enable all experimental features
is not recommended - the best practice is to enable specific experimental
features, not all of them. Nevertheless, this all() function is still used
in one place - in the cql_repl tool - which uses it to enable all
experimental features.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211108135601.78460-1-nyh@scylladb.com>
2021-11-29 18:44:23 +02:00
Tomasz Grabiec
3226c5bf9d Merge 'sstables: mx: enable position fast-forwarding in reverse mode' from Kamil Braun
Most of the machinery was already implemented since it was used when
jumping between clustering ranges of a query slice. We need only perform
one additional thing when performing an index skip during
fast-forwarding: reset the stored range tombstone in the consumer (which
may only be stored in fast-forwarding mode, so it didn't matter that it
wasn't reset earlier). Comments were added to explain the details.

As a preparation for the change, we extend the sstable reversing reader
random schema test with a fast-forwarding test and include some minor
fixes.

Fixes #9427.

Closes #9484

* github.com:scylladb/scylla:
  query-request: add comment about clustering ranges with non-full prefix key bounds
  sstables: mx: enable position fast-forwarding in reverse mode
  test: sstable_conforms_to_mutation_source_test: extend `test_sstable_reversing_reader_random_schema` with fast-forwarding
  test: sstable_conforms_to_mutation_source_test: fix `vector::erase` call
  test: mutation_source_test: extract `forwardable_reader_to_mutation` function
  test: random_schema: fix clustering column printing in `random_schema::cql`
2021-11-29 16:01:53 +01:00
Raphael S. Carvalho
80a1ebf0f3 compaction_manager: Fix race when selecting sstables for rewrite operations
Rewrite operations are scrub, cleanup and upgrade.

The race can happen because 'selection of sstables' and 'marking sstables
as compacting' are decoupled, so any deferring point in between can lead
to a parallel compaction picking the same files. After commit 2cf0c4bbf,
files are marked as compacting before rewrite starts, but that didn't
take into account commit c84217ad, which moved retrieval of
candidates to a deferring thread, before rewrite_sstables() is even
called.

Scrub isn't affected by this because it uses a coarse-grained approach
where the whole operation is run with compaction disabled, which isn't good
because regular compaction cannot run until its completion.

From now on, selection of files and marking them as compacting will
be serialized by running them with compaction disabled.

Now cleanup will also retrieve sstables with compaction disabled,
meaning it will no longer leave uncleaned files behind, which is
important to avoid data resurrection if the node regains ownership of
data in uncleaned files.

Fixes #8168.
Refs #8155.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129133107.53011-1-raphaelsc@scylladb.com>
2021-11-29 16:27:29 +02:00
Avi Kivity
bcadd8229b Merge "memtable-sstable: Add compacting reader when flushing memtable." from Mikołaj
"
When a memtable contains both mutations and tombstones that delete them,
the output flushed to sstables contains both. Inserting a
compacting reader results in writing smaller sstables and saves
compaction work later.

There are mixed performance implications of this change:
- If no rows are removed, there is a ~12% penalty on writing. Read times
  are not affected. A heuristic is implemented to avoid this problem -
  compaction is executed only if there are tombstones.
- Read and write performance linearly improves with percentage of rows
  removed. At ~15% of rows removed, writes become faster than without
  compaction.

In the tables below in columns 4 and 7, values below 100% denote
improvement and values over 100% denote regression. The tests were
performed on a table with 5 columns and the exact percentages will vary
across different table schemas.

1. percentage removed
2. write duration/row no compaction
3. write duration/row with compaction
4. write performance new/old
5. read duration/row no compaction
6. read duration/row with compaction
7. read performance new/old
  1           2           3          4           5           6          7
5      8.91E-07    9.64E-07    108.25%    6.05E-07    5.76E-07    95.23%
10     9.28E-07    9.94E-07    107.15%    6.14E-07    5.56E-07    90.55%
15     9.27E-07    9.21E-07    99.43%     6.24E-07    5.27E-07    84.39%
20     9.28E-07    9.03E-07    97.31%     6.19E-07    4.83E-07    78.03%
25     9.49E-07    8.58E-07    90.40%     6.40E-07    4.59E-07    71.76%
30     9.68E-07    8.28E-07    85.61%     6.35E-07    4.20E-07    66.07%
35     9.81E-07    8.07E-07    82.26%     6.38E-07    3.88E-07    60.85%
40     9.97E-07    7.81E-07    78.35%     6.43E-07    3.59E-07    55.91%
45     1.01E-06    7.59E-07    75.28%     6.45E-07    3.34E-07    51.75%
50     1.02E-06    7.30E-07    71.52%     6.55E-07    3.00E-07    45.78%
55     1.06E-06    7.08E-07    66.97%     6.65E-07    2.70E-07    40.56%
60     1.04E-06    6.87E-07    66.20%     6.62E-07    2.40E-07    36.22%
65     1.05E-06    6.56E-07    62.49%     6.60E-07    2.12E-07    32.04%
70     1.06E-06    6.34E-07    59.58%     6.66E-07    1.80E-07    27.07%
75     1.07E-06    6.09E-07    56.90%     6.69E-07    1.50E-07    22.38%
80     1.09E-06    5.84E-07    53.58%     6.80E-07    1.20E-07    17.62%
85     1.10E-06    5.56E-07    50.49%     6.83E-07    9.00E-08    13.18%
90     1.11E-06    5.33E-07    47.92%     6.90E-07    5.97E-08    8.66%
95     1.12E-06    5.07E-07    45.10%     6.93E-07    3.04E-08    4.39%
100    1.14E-06    4.87E-07    42.77%     6.97E-07    6.56E-12    0.00%

1. percentage removed
2. write instructions retired/row no compaction
3. write instructions retired/row with compaction
4. write performance new/old
5. read instructions retired/row no compaction
6. read instructions retired/row with compaction
7. read performance new/old
  1     2       3         4    5       6          7
5   10276   11188   108.88% 7735    7297    94.34%
10  10463   10891   104.09% 7797    6913    88.66%
15  10633   10596   99.65%  7852    6529    83.15%
20  10811   10300   95.27%  7910    6145    77.69%
25  10997   9998    90.92%  7976    5755    72.15%
30  11177   9707    86.85%  8033    5376    66.92%
35  11353   9412    82.90%  8092    4992    61.69%
40  11522   9111    79.07%  8143    4604    56.54%
45  11708   8819    75.32%  8208    4224    51.46%
50  11877   8520    71.74%  8259    3836    46.45%
55  12064   8228    68.20%  8325    3456    41.51%
60  12240   7928    64.77%  8382    3069    36.61%
65  12419   7635    61.48%  8440    2688    31.85%
70  12598   7339    58.26%  8499    2304    27.11%
75  12768   7043    55.16%  8549    1920    22.46%
80  12977   6747    51.99%  8616    1536    17.83%
85  13131   6451    49.13%  8673    1152    13.28%
90  13311   6155    46.24%  8731    767     8.78%
95  13487   5858    43.43%  8790    383     4.36%
100 13657   5562    40.73%  8841    0       0.00%
"

* 'add-compacting-reader-when-flushing-memtable-v6' of github.com:mikolajsieluzycki/scylla:
  memtable-sstable: Add compacting reader when flushing memtable.
  memtable-sstable: Track existence of tombstones in memtable.
2021-11-29 15:15:59 +02:00
Mikołaj Sielużycki
a88f7df195 memtable-sstable: Add compacting reader when flushing memtable.
When a memtable contains both mutations and tombstones that delete them,
the output flushed to sstables contains both. Inserting a
compacting reader results in writing smaller sstables and saves
compaction work later.

Performance tests of this change have shown a regression in a common
case where there are no deletes. A heuristic is employed to skip
compaction unless there are tombstones in the memtable to minimise
the impact of that issue.
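
A hedged sketch of the heuristic, with illustrative stand-ins for the real memtable and reader types:
```
#include <utility>

// Illustrative stand-ins for the real Scylla reader/memtable types.
struct flat_mutation_reader {};
struct memtable {
    bool has_tombstones = false;  // flag maintained as mutations are applied
    flat_mutation_reader make_reader() { return {}; }
};
flat_mutation_reader make_compacting_reader(flat_mutation_reader rd) { return rd; }

// Heuristic from the commit: compact on flush only when it can pay off.
flat_mutation_reader make_flush_reader(memtable& mt) {
    auto rd = mt.make_reader();
    if (!mt.has_tombstones) {
        return rd;                                 // no deletes: skip the ~12% write penalty
    }
    return make_compacting_reader(std::move(rd));  // drop shadowed data early
}
```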
2021-11-29 13:19:42 +01:00
Mikołaj Sielużycki
6dd9f63f3b memtable-sstable: Track existence of tombstones in memtable.
Add a flag indicating whether a memtable contains tombstones. It can be used
as a heuristic to determine whether a memtable should be compacted on
flush. It's an intermediate step until we can compact while applying
mutations to a memtable.
2021-11-29 13:06:12 +01:00
Kamil Braun
b2b242d0ad query-request: add comment about clustering ranges with non-full prefix key bounds
2021-11-29 11:10:49 +01:00
Kamil Braun
8722e0d23c sstables: mx: enable position fast-forwarding in reverse mode
Most of the machinery was already implemented since it was used when
jumping between clustering ranges of a query slice. We need only perform
one additional thing when performing an index skip during
fast-forwarding: reset the stored range tombstone in the consumer (which
may only be stored in fast-forwarding mode, so it didn't matter that it
wasn't reset earlier). Comments were added to explain the details.
2021-11-29 11:10:49 +01:00
Kamil Braun
ea6310961c test: sstable_conforms_to_mutation_source_test: extend test_sstable_reversing_reader_random_schema with fast-forwarding
The test would check whether the forward and reverse readers returned
consistent results when created in non-forwarding mode with slicing.

Do the same but using fast-forwarding instead of slicing.

To do this we require a vector of `position_range`s. We also need a
vector of `clustering_range`s for the existing test. We modify the
existing `random_ranges` function to return `position_range`s instead of
`clustering_range`s since `position_range`s are easier to reason about,
especially when we consider non-full clustering key prefixes. A function
is introduced to convert a `position_range` to a `clustering_range` for
the existing test.
2021-11-29 11:10:46 +01:00
Benny Halevy
cf528d7df9 database: shutdown: don't shutdown keyspaces yet
Don't shut down the keyspaces just yet,
since they are needed during shutdown.

FIXME: restore when #8995 is fixed and no queries are issued
after the database shuts down.

Refs #8995
Fixes #9684

Test: unit(dev)
- scylla-gdb test fails locally with #9677

DTest: update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_new_node_while_adding_info_{1,2}_test(dev)
- running now into #8995. dtest fails with unexpected error: "storage_proxy - Exception when communicating with
  127.0.62.4, to read from system_distributed.service_levels:
  seastar::gate_closed_exception (gate closed)"

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211127083348.146649-2-bhalevy@scylladb.com>
2021-11-29 11:59:45 +02:00
Benny Halevy
93367ba55f effective_replication_map_factory: temporarily unregister outstanding maps when destroyed
The next patch will disable stopping the keyspaces
in database shutdown due to #9684.

This will leave outstanding e_r_m:s when the factory
is destroyed. They must be unregistered from the factory
so they won't try to submit_background_work()
to gently clear their contents.

Support that temporarily until shutdown is fixed
to ensure there are no outstanding e_r_m:s when
the factory is destroyed, at which point this
can turn into an internal error.

Refs #8995
Refs #9684

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211127083348.146649-1-bhalevy@scylladb.com>
2021-11-29 11:59:44 +02:00
Nadav Har'El
1e2ecd282a Merge 'Harden compaction manager remove' from Benny Halevy
This series hardens compaction_manager::remove by:
- adding debug logging around task execution and stopping.
- accessing compaction_state as lw_shared_ptr rather than via a raw pointer.
  - with that, detaching it from `_compaction_state` in `compaction_manager::remove` right away, to prevent further use of it while compactions are stopped.
- adding a write_lock in `remove` to make sure the lock is not held by any stray task.

Test: unit(dev), sstable_compaction_test(debug)
Dtest: alternator_tests.py:AlternatorTest.test_slow_query_logging (debug)

Closes #9636

* github.com:scylladb/scylla:
  compaction_manager: add compaction_state when table is constructed
  compaction_manager: remove: fixup indentation
  compaction_manager: remove: detach compaction_state before stopping ongoing compactions
  compaction_manager: remove: serialize stop_ongoing_compactions and gate.close
  compaction_manager: task: keep a reference on compaction_state
  test: sstable_compaction_test: incremental_compaction_data_resurrection_test: stop table before it's destroyed.
  test: sstable_utils: compact_sstables: deregister compaction also on error path
  test: sstable_compaction_test: partial_sstable_run_filtered_out_test: deregister_compaction also on error path
  test: compaction_manager_test: add debug logging to register/deregister compaction
  test: compaction_manager_test: deregister_compaction: erase by iterator
  test: compaction_manager_test: move methods out of line
  compaction_manager: compaction_state: use counter for compaction_disabled
  compaction_manager: task: delete move and copy constructors
  compaction_manager: add per-task debug log messages
  compaction_manager: stop_ongoing_compactions: log number of tasks to stop
2021-11-28 22:12:52 +02:00
Avi Kivity
b23af15432 tests: consolidate boost xunit result files
The recent parallelization of boost unit tests caused an increase
in XML result files. This is challenging for Jenkins, since it
appears to use rpc-over-ssh to read the result files; as a result
it takes more than an hour to read all result files when the Jenkins
main node is not on the same continent as the agent.

To fix this, merge the result files in test.py and leave one result
file per mode. Later we can leave one result file overall (integrating
the mode into the testsuite name), but that can wait.

Tested on a local Jenkins instance (just reading the result files,
not the entire build).

Closes #9668
2021-11-28 22:12:52 +02:00