scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-01 05:35:48 +00:00

Author	SHA1	Message	Date
Benny Halevy	6ad94fedf3	utils: clear_gently: do not clear null unique_ptr Otherwise the null pointer is dereferenced. Add a unit test reproducing the issue and testing this fix. Fixes #13636 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `12877ad026`)	2023-04-24 17:51:01 +03:00
Botond Dénes	7b2215d8e0	Merge 'Backport bugfixes regarding UDT, UDF, UDA interactions to branch-5.2' from Wojciech Mitros This patch backports https://github.com/scylladb/scylladb/pull/12710 to branch-5.2. To resolve the conflicts that it's causing, it also includes * https://github.com/scylladb/scylladb/pull/12680 * https://github.com/scylladb/scylladb/pull/12681 Closes #13542 * github.com:scylladb/scylladb: uda: change the UDF used in a UDA if it's replaced functions: add helper same_signature method uda: return aggregate functions as shared pointers udf: also check reducefunc to confirm that a UDF is not used in a UDA udf: fix dropping UDFs that share names with other UDFs used in UDAs pytest: add optional argument for new_function argument types udt: disallow dropping a user type used in a user function	2023-04-19 01:38:08 -04:00
Botond Dénes	c9a17c80f6	mutation/mutation_compactor: consume_partition_end(): reset _stop The purpose of `_stop` is to remember whether the consumption of the last partition was interrupted or whether it was consumed fully. In the former case, the compactor allows retrieving the compaction state for the given partition, so that its compaction can be resumed at a later point in time. Currently, `_stop` is set to `stop_iteration::yes` whenever the return value of any of the `consume()` methods is also `stop_iteration::yes`. Meaning, if the consumption of the partition is interrupted, this is remembered in `_stop`. However, a partition whose consumption was interrupted is not always continued later. Sometimes consumption of a partition is interrupted because the partition is not interesting and the downstream consumer wants to stop it. In these cases the compactor should not return an engaged optional from `detach_state()`, because there is no state to detach; the state should be thrown away. This was incorrectly handled so far and is fixed in this patch, by overwriting `_stop` in `consume_partition_end()` with whatever the downstream consumer returns. Meaning if they want to skip the partition, then `_stop` is reset to `stop_iteration::no` and `detach_state()` will return a disengaged optional as it should in this case. Fixes: #12629 Closes #13365 (cherry picked from commit `bae62f899d`)	2023-04-18 02:32:24 -04:00
Wojciech Mitros	51f19d1b8c	udt: disallow dropping a user type used in a user function Currently, nothing prevents us from dropping a user type used in a user function, even though doing so may make us unable to use the function correctly. This patch prevents this behavior by checking all function argument and return types when executing a drop type statement and preventing it from completing if the type is referenced by any of them. (cherry picked from commit `86c61828e6`)	2023-04-17 13:13:35 +02:00
Botond Dénes	3e10c3fc89	reader_concurrency_semaphore: don't evict inactive readers needlessly Inactive readers should only be evicted to free up resources for waiting readers. Evicting them when waiters are not admitted for any reason other than resources is wasteful and leads to extra load later on when these evicted readers have to be recreated and requeued. This patch changes the logic on both the registering path and the admission path to not evict inactive readers unless there are readers actually waiting on resources. A unit test is also added, reproducing the overly aggressive eviction and checking that it doesn't happen anymore. Fixes: #11803 Closes #13286 (cherry picked from commit `bd57471e54`)	2023-04-14 10:37:30 +03:00
Pavel Emelyanov	487ba9f3e1	Merge '[backport] reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict()' from Botond Dénes This PR backports `2f4a793457` to branch-5.2. Said patch depends on some other patches that are not part of any release yet. This PR should apply to 5.1 and 5.0 too. Closes #13162 * github.com:scylladb/scylladb: reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict() reader_permit: expose operator<<(reader_permit::state) reader_permit: add get_state() accessor	2023-03-16 18:41:08 +03:00
Botond Dénes	bd4f9e3615	Merge 'readers/nonforwarding: don't emit partition_end on next_partition,fast_forward_to' from Gusev Petr The series fixes the `make_nonforwardable` reader: it shouldn't emit `partition_end` for the previous partition after `next_partition()` and `fast_forward_to()`. Fixes: #12249 Closes #12978 * github.com:scylladb/scylladb: flat_mutation_reader_test: cleanup, seastar::async -> SEASTAR_THREAD_TEST_CASE make_nonforwardable: test through run_mutation_source_tests make_nonforwardable: next_partition and fast_forward_to when single_partition is true make_forwardable: fix next_partition flat_mutation_reader_v2: drop forward_buffer_to nonforwardable reader: fix indentation nonforwardable reader: refactor, extract reset_partition nonforwardable reader: add more tests nonforwardable reader: no partition_end after fast_forward_to() nonforwardable reader: no partition_end after next_partition() nonforwardable reader: no partition_end for empty reader row_cache: pass partition_start through nonforwardable reader (cherry picked from commit `46efdfa1a1`)	2023-03-16 10:42:03 +02:00
Botond Dénes	c68deb2461	reader_concurrency_semaphore:: clear_inactive_reads(): defer evicting to evict() Instead of open-coding the same, in an incomplete way. clear_inactive_reads() does incomplete eviction in several ways: * it doesn't decrement _stats.inactive_reads * it doesn't set the permit to evicted state * it doesn't cancel the ttl timer (if any) * it doesn't call the eviction notifier on the permit (if there is one) The list goes on. We already have an evict() method that does all this correctly; use that instead of the current badly open-coded alternative. This patch also enhances the existing test for clear_inactive_reads() and adds a new one specifically for `stop()` being called while having inactive reads. Fixes: #13048 Closes #13049 (cherry picked from commit `2f4a793457`)	2023-03-14 09:50:16 +02:00
Raphael S. Carvalho	22c1685b3d	sstables: Temporarily disable loading of first and last position metadata It's known that reading large cells in reverse causes large allocations. Source: https://github.com/scylladb/scylladb/issues/11642 The loading is preliminary work for splitting large partitions into fragments composing a run and then being able to later read such a run in an efficient way using the position metadata. The splitting is not turned on yet, anywhere. Therefore, we can temporarily disable the loading, as a way to avoid regressions in stable versions. Large allocations can cause stalls due to foreground memory eviction kicking in. The default values for position metadata say that first and last position include all clustering rows, but they aren't used anywhere other than by sstable_run to determine if a run is disjoint at clustering level, and given that no splitting is done yet, it does not really matter. Unit tests relying on position metadata were adjusted to enable the loading, such that they can still pass. Fixes #11642. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12979 (cherry picked from commit `d73ffe7220`)	2023-02-27 08:58:34 +02:00
Avi Kivity	0b418fa7cf	cql3, transport, tests: remove "unset" from value type system The CQL binary protocol introduced "unset" values in version 4 of the protocol. Unset values can be bound to variables, which causes certain CQL fragments to be skipped. For example, the fragment `SET a = :var` will not change the value of `a` if `:var` is bound to an unset value. Unsets, however, are very limited in where they can appear. They can only appear at the top-level of an expression, and any computation done with them is invalid. For example, `SET list_column = [3, :var]` is invalid if `:var` is bound to unset. This causes the code to be littered with checks for unset, and there are plenty of tests dedicated to catching unsets. However, a simpler way is possible - prevent the infiltration of unsets at the point of entry (when evaluating a bind variable expression), and introduce guards to check for the few cases where unsets are allowed. This is what this long patch does. It performs the following: (general) 1. unset is removed from the possible values of cql3::raw_value and cql3::raw_value_view. (external->cql3) 2. query_options is fortified with a vector of booleans, unset_bind_variable_vector, where each boolean corresponds to a bind variable index and is true when it is unset. 3. To avoid churn, two compatibility structs are introduced: cql3::raw_value{,_view}_vector_with_unset, which can be constructed from a std::vector<raw_value{,_view}>, which is what most callers have. They can also be constructed with explicit unset vectors, for the few cases they are needed. (cql3->variables) 4. query_options::get_value_at() now throws if the requested bind variable is unset. This replaces all the throwing checks in expression evaluation and statement execution, which are removed. 5. A new query_options::is_unset() is added for the users that can tolerate unset; though it is not used directly. 6. A new cql3::unset_operation_guard class guards against unsets. 
It accepts an expression, and can be queried whether an unset is present. Two conditions are checked: the expression must be a singleton bind variable, and at runtime it must be bound to an unset value. 7. The modification_statement operations are split into two, via two new subclasses of cql3::operation. cql3::operation_no_unset_support ignores unsets completely. cql3::operation_skip_if_unset checks if an operand is unset (luckily all operations have at most one operand that tolerates unset) and applies unset_operation_guard to it. 8. The various sites that accept expressions or operations are modified to check for should_skip_operation(). These are the loops around operations in update_statement and delete_statement, and the checks for unset in attributes (LIMIT and PER PARTITION LIMIT) (tests) 9. Many unset tests are removed. It's now impossible to enter an unset value into the expression evaluation machinery (there's just no unset value), so it's impossible to test for it. 10. Other unset tests now have to be invoked via bind variables, since there's no way to create an unset cql3::expr::constant. 11. Many tests have their exception message match strings relaxed. Since unsets are now checked very early, we don't know the context where they happen. It would be possible to reintroduce it (by adding a format string parameter to cql3::unset_operation_guard), but it seems not to be worth the effort. Usage of unsets is rare, and it is explicit (at least with the Python driver, an unset cannot be introduced by omission). I tried as an alternative to wrap cql3::raw_value{,_view} (that doesn't recognize unsets) with cql3::maybe_unset_value (that does), but that caused huge amounts of churn, so I abandoned that in favor of the current approach. Closes #12517	2023-01-16 21:10:56 +02:00
Kamil Braun	bed555d1e5	db: system_keyspace: rename 'raft_config' to 'raft_snapshot_config' Make it clear that the table stores the snapshot configuration, which is not necessarily the currently operating configuration (the last one appended to the log). In the future we plan to have a separate virtual table for showing the currently operating configuration, perhaps we will call it `system.raft_config`.	2023-01-12 16:21:26 +01:00
Michał Radwański	dcab289656	boost/mvcc_test: use failure_injecting_allocation_strategy where it is meant to In test_apply_is_atomic, a basic form of exception testing is used. There is failure_injecting_allocation_strategy, which however is not used for any allocation, since for some reason, `with_allocator(r.allocator()` is used instead of `with_allocator(alloc`. Fix that. Closes #12354	2023-01-10 12:01:36 +01:00
Tomasz Grabiec	ebcd736343	cache: Fix undefined behavior when populating with non-full keys Regression introduced in `23e4c8315`. view_and_holder position_in_partition::after_key() triggers undefined behavior when the key was not full because the holder is moved, which invalidates the view. Fixes #12367 Closes #12447	2023-01-10 12:51:54 +02:00
Raphael S. Carvalho	05ffb024bb	replica: Kill table::calculate_shard_from_sstable_generation() Inferring shard from generation is long gone. We still use it in some scripts, but that's no longer needed in Scylla, when loading the SSTables, and it also conflicts with ongoing work of UUID-based generations. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12476	2023-01-09 20:17:57 +02:00
Nadav Har'El	d6e6820f33	Merge 'Drop support for cql binary protocols versions 1 and 2' from Avi Kivity The CQL binary protocol version 3 was introduced in 2014. All Scylla versions support it, as do Cassandra versions 2.1 and newer. Versions 1 and 2 have 16-bit collection sizes, while protocol 3 and newer use 32-bit collection sizes. Unfortunately, we implemented support for multiple serialization formats very intrusively, by pushing the format everywhere. This avoids the need to re-serialize (sometimes) but is quite obnoxious. It's also likely to be broken, since it's almost untested and it's too easy to write cql_serialization_format::internal() instead of propagating the client specified value. Since protocols 1 and 2 have been obsolete for 9 years, just drop them. It's easy to verify that they are no longer in use on a running system by examining the `system.clients` table before upgrade. Fixes #10607 Closes #12432 * github.com:scylladb/scylladb: treewide: drop cql_serialization_format cql: modification_statement: drop protocol check for LWT transport: drop cql protocol versions 1 and 2	2023-01-09 18:52:41 +02:00
Tomasz Grabiec	f97268d8f2	row_cache: Fix violation of the "oldest versions are evicted first" rule when evicting last dummy Consider the following MVCC state of a partition: v2: ==== <7> [entry2] ==== <9> ===== <last dummy> v1: ================================ <last dummy> [entry1] Where === means a continuous range and --- means a discontinuous range. After two LRU items are evicted (entry1 and entry2), we will end up with: v2: ---------------------- <9> ===== <last dummy> v1: ================================ <last dummy> [entry1] This will cause readers to incorrectly think there are no rows before entry <9>, because the range is continuous in v1, and continuity of a snapshot is a union of continuous intervals in all versions. The cursor will see the interval before <9> as continuous and the reader will produce no rows. This is only temporary, because current MVCC merging rules are such that the flag on the latest entry wins, so we'll end up with this once v1 is no longer needed: v2: ---------------------- <9> ===== <last dummy> ...and the reader will go to sstables to fetch the evicted rows before entry <9>, as expected. The bug is in rows_entry::on_evicted(), which treats the last dummy entry in a special way, and doesn't evict it, and doesn't clear the continuity by omission. The situation is not easy to trigger because it requires a certain eviction pattern concurrent with multiple reads of the same partition in different versions, so across memtable flushes. Closes #12452	2023-01-09 16:10:52 +02:00
Avi Kivity	5ffe4fee6d	Merge 'Remove legacy half reverse' from Michał Radwański This commit removes consume_in_reverse::legacy_half_reverse, an option once used to indicate that the given key ranges are sorted descending, based on the clustering key of the start of the range, and that the range tombstones inside partition would be sorted (descending, as all the mutation fragments would) according to their end (but range tombstone would still be stored according to their start bound). As it turns out, mutation::consume, when called with legacy_half_reverse option produces invalid fragment stream, one where all the row tombstone changes come after all the clustering rows. This was not an issue, since when constructing results from the query, Scylla would not pass the tombstones to the client, but instead compact data beforehand. In this commit, the consume_in_reverse::legacy_half_reverse is removed, along with all the uses. As for the swap out in mutation_partition.cc in query_mutation and to_data_query_result: The downstream was not prepared to deal with legacy_half_reverse. mutation::consume contains ``` if (reverse == consume_in_reverse::yes) { while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::yes>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) { co_await yield(); } } else { while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::no>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) { co_await yield(); } } ``` So why did it work at all? to_data_query_result deals with a single slice. The used consumer (compact_for_query_v2) compacts-away the range tombstone changes, and thus the only difference between the consume_in_reverse::no and consume_in_reverse::yes was that one was ordered increasing wrt. ckeys and the second one was ordered decreasing. This property is maintained if we swap out for the consume_in_reverse::yes format. 
Refs: #12353 Closes #12453 * github.com:scylladb/scylladb: mutation{,_consumer,_partition}: remove consume_in_reverse::legacy_half_reverse mutation_partition_view: treat query::partition_slice::option::reversed in to_data_query_result as consume_in_reverse::yes mutation: move consume_in_reverse def to mutation_consumer.hh	2023-01-08 15:42:00 +02:00
Botond Dénes	c4688563e3	sstables: track decompressed buffers Convert decompressed temporary buffers into tracked buffers just before returning them to the upper layer. This ensures these buffers are known to the reader concurrency semaphore and it has an accurate view of the actual memory consumption of reads. Fixes: #12448 Closes #12454	2023-01-08 15:34:28 +02:00
Wojciech Mitros	996a942e05	test: assert that WASM allocations can fail without crashing The main source of big allocations in the WASM UDF implementation is the WASM Linear Memory. We do not want Scylla to crash even if a memory allocation for the WASM Memory fails, so we assert that an exception is thrown instead. The wasmtime runtime does not actually fail on an allocation failure (assuming the memory allocator does not abort and returns nullptr instead - which our seastar allocator does). What happens then depends on the failed allocation handling of the code that was compiled to WASM. If the original code threw an exception or aborted, the resulting WASM code will trap. To make sure that we can handle the trap, we need to allow wasmtime to handle SIGILL signals, because that is what is used to carry information about WASM traps. The new test uses a special WASM Memory allocator that fails after n allocations, and the allocations include both memory growth instructions in WASM, as well as growing memory manually using the wasmtime API. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2023-01-06 14:07:29 +01:00
Wojciech Mitros	f05d612da8	wasm: limit memory allocated using mmap The wasmtime runtime allocates memory for the executable code of the WASM programs using mmap and not the seastar allocator. As a result, the memory that Scylla actually uses becomes not only the memory preallocated for the seastar allocator but the sum of that and the memory allocated for executable code by the WASM runtime. To keep limiting the memory used by Scylla, we measure how much memory the WASM programs use and if they use too much, compiled WASM UDFs (modules) that are currently not in use are evicted to make room. To evict a module it is required to evict all instances of this module (the underlying implementation of modules and instances uses shared pointers to the executable code). For this reason, we add reference counts to modules. Each instance using a module is a reference. When an instance is destroyed, a reference is removed. If all references to a module are removed, the executable code for this module is deallocated. The eviction of a module is actually achieved by eviction of all its references. When we want to free memory for a new module we repeatedly evict instances from the wasm_instance_cache using its LRU strategy until some module loses all its instances. This process may not succeed if the instances currently in use (so not in the cache) use too much memory - in this case the query also fails. Otherwise the new module is added to the tracking system. This strategy may evict some instances unnecessarily, but evicting modules should not happen frequently, and any more efficient solution requires an even bigger intervention into the code.	2023-01-06 14:07:29 +01:00
Wojciech Mitros	b8d28a95bf	wasm: add configuration options for instance cache and udf execution Different users may require different limits for their UDFs. This patch allows them to configure the size of their cache of wasm, the maximum size of individual instances stored in the cache, the time after which the instances are evicted, the fuel that all wasm UDFs are allowed to consume before yielding (for the control of latency), the fuel that wasm UDFs are allowed to consume in total (to allow performing longer computations in the UDF without detecting an infinite loop) and the hard limit of the size of UDFs that are executed (to avoid large allocations).	2023-01-06 14:07:27 +01:00
Wojciech Mitros	3214f5c2db	test: check that wasmtime functions yield The new implementation for WASM UDFs allows executing the UDFs in pieces. This commit adds a test asserting that the UDF is in fact divided and that each of the execution segments takes no longer than 1ms.	2023-01-06 14:05:53 +01:00
Michał Radwański	1fbf433966	mutation{,_consumer,_partition}: remove consume_in_reverse::legacy_half_reverse This commit removes consume_in_reverse::legacy_half_reverse, an option once used to indicate that the given key ranges are sorted descending, based on the clustering key of the start of the range, and that the range tombstones inside partition would be sorted (descending, as all the mutation fragments would) according to their end (but range tombstone would still be stored according to their start bound). As it turns out, mutation::consume, when called with legacy_half_reverse option produces invalid fragment stream, one where all the row tombstone changes come after all the clustering rows. This was not an issue, since when constructing results from the query, Scylla would not pass the tombstones to the client, but instead compact data beforehand. In this commit, the consume_in_reverse::legacy_half_reverse is removed, along with all the uses. As for the swap out in mutation_partition.cc in query_mutation and to_data_query_result: The downstream was not prepared to deal with legacy_half_reverse. mutation::consume contains ``` if (reverse == consume_in_reverse::yes) { while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::yes>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) { co_await yield(); } } else { while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::no>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) { co_await yield(); } } ``` So why did it work at all? to_data_query_result deals with a single slice. The used consumer (compact_for_query_v2) compacts-away the range tombstone changes, and thus the only difference between the consume_in_reverse::no and consume_in_reverse::yes was that one was ordered increasing wrt. ckeys and the second one was ordered decreasing. This property is maintained if we swap out for the consume_in_reverse::yes format.	2023-01-05 18:48:55 +01:00
Michał Jadwiszczak	83bb77b8bb	test/boost/cql_query_test: enable `parallelized_aggregation` Run tests for parallelized aggregation with `enable_parallelized_aggregation` set always to true, so the tests work even if the default value of the option is false. Closes #12409	2023-01-04 10:11:25 +02:00
Avi Kivity	2739ac66ed	treewide: drop cql_serialization_format Now that we don't accept cql protocol version 1 or 2, we can drop cql_serialization_format everywhere, except in the IDL (since it's part of the inter-node protocol). A few functions had duplicate versions, one with and one without a cql_serialization_format parameter. They are deduplicated. Care is taken that `partition_slice`, which communicates the cql_serialization_format across nodes, still presents a valid cql_serialization_format to other nodes when transmitting itself and rejects protocol 1 and 2 serialization format when receiving. The IDL is unchanged. One test checking the 16-bit serialization format is removed.	2023-01-03 19:54:13 +02:00
Raphael S. Carvalho	b4e4bbd64a	database_test: Reduce x_log2_compaction_group values to avoid timeout database_test is timing out because it has to run the tests calling do_with_cql_env_and_compaction_groups 3x, once for each compaction group setting. Reduce it to 2 settings instead of 3 if running in debug mode. Refs #12396. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12421	2023-01-01 13:56:18 +02:00
Raphael S. Carvalho	e7380bea65	test: mutation_test: Test multiple compaction groups Extends mutation_test to run the tests with more than one compaction group, in addition to a single one (default). Piggyback on existing tests. Avoids duplication. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 12:36:07 -03:00
Raphael S. Carvalho	e3e7c3c7e5	test: database_test: Test multiple compaction groups Extends database_test to run the tests with more than one compaction group, in addition to a single one (default). Piggyback on existing tests. Avoids duplication. Caught a bug when snapshotting, in implementation of table::can_flush(), showing its usefulness. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 12:36:07 -03:00
Raphael S. Carvalho	e103e41c76	test: database_test: Adapt it to compaction groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 12:36:05 -03:00
Raphael S. Carvalho	c807e61715	test: sstable_test: Stop referencing single compaction group Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:20 -03:00
Raphael S. Carvalho	ef8f542d75	replica: Adapt table::active_memtable() to compaction groups active_memtable() was fine for a single group, but with multiple groups, there will be one active memtable per group. Let's change the interface to reflect that. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:14 -03:00
Avi Kivity	7c7eb81a66	Merge 'Encapsulate filesystem access by sstable into filesystem_storage subclass' from Pavel Emelyanov This is to define the API sstable needs from underlying storage. When implementing object-storage backend it will need to implement those. The API looks like future<> snapshot(const sstable& sst, sstring dir, absolute_path abs) const; future<> quarantine(const sstable& sst, delayed_commit_changes* delay); future<> move(const sstable& sst, sstring new_dir, generation_type generation, delayed_commit_changes* delay); void open(sstable& sst, const io_priority_class& pc); // runs in async context future<> wipe(const sstable& sst) noexcept; future<file> open_component(const sstable& sst, component_type type, open_flags flags, file_open_options options, bool check_integrity); It doesn't have "list" or the like, because it's not a method of an individual sstable, but rather the one from sstables_manager. It will come as a separate PR. Closes #12217 * github.com:scylladb/scylladb: sstable, storage: Mark dir/temp_dir private sstable: Remove get_dir() (well, almost) sstable: Add quarantine() method to storage sstable: Use absolute/relative path marking for snapshot() sstable: Remove temp_... 
stuff from sstable sstable: Move open_component() on storage sstable: Mark rename_new_sstable_component_file() const sstable: Print filename(type) on open-component error sstable: Reorganize new_sstable_component_file() sstable: Mark filename() private sstable: Introduce index_filename() tests: Disclosure private filename() calls sstable: Move wipe_storage() on storage sstable: Remove temp dir in wipe_storage() sstable: Move unlink parts into wipe_storage sstable: Remove get_temp_dir() sstable: Move write_toc() to storage sstable: Shuffle open_sstable() sstable: Move touch_temp_dir() to storage sstable: Move move() to storage sstable: Move create_links() to storage sstable: Move seal_sstable() to storage sstable: Tossing internals of seal_sstable() sstable: Move remove_temp_dir() to storage sstable: Move create_links_common() to storage sstable: Move check_create_links_replay() to storage sstable: Remove one of create_links() overloads sstable: Remove create_links_and_mark_for_removal() sstable: Indentation fix after previous patch sstable: Coroutinize create_links_common() sstable: Rename create_links_common()'s "dir" argument sstable: Make mark_for_removal bool_class sstable, table: Add sstable::snapshot() and use in table::take_snapshot sstable: Move _dir and _temp_dir on filesystem_storage sstable: Use sync_directory() method test, sstable: Use component_basename in test sstables: Move read_{digest|checksum} on sstable	2022-12-18 17:29:35 +02:00
Botond Dénes	8f8284783a	Merge 'Fix handling of non-full clustering keys in the read path' from Tomasz Grabiec This PR fixes several bugs related to handling of non-full clustering keys. One is in trim_clustering_row_ranges_to(), which is broken for non-full keys in reverse mode. It will trim the range to position_in_partition_view::after_key(full_key) instead of position_in_partition_view::before_key(key), hence it will include the key in the resulting range rather than exclude it. Fixes #12180 after_key() was creating a position which is after all keys prefixed by a non-full key, rather than a position which is right after that key. This issue will be caught by cql_query_test::test_compact_storage in debug mode when mutation_partition_v2 merging starts inserting sentinels at position after_key() on preemption. It probably already causes problems for such keys as after_key() is used in various parts in the read path. Refs #1446 Closes #12234 * github.com:scylladb/scylladb: position_in_partition: Make after_key() work with non-full keys position_in_partition: Introduce before_key(position_in_partition_view) db: Fix trim_clustering_row_ranges_to() for non-full keys and reverse order types: Fix comparison of frozen sets with empty values	2022-12-15 10:47:12 +02:00
Pavel Emelyanov	a46d378bee	sstable: Remove temp_... stuff from sstable There's a bunch of helpers around the XFS-specific temp-dir sitting in the public sstable part. Drop it altogether, no code needs it for real. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	bbbbd6dbfc	tests: Disclosure private filename() calls The sstable::filename() is going to become private method. Lots of tests call it, but tests do call a lot of other sstable private methods, that's OK. Make the sstable::filename() yet another one of that kind in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Pavel Emelyanov	18f6165993	sstable: Move create_links() to storage This method is currently used in two places: sstable::snapshot() and sstable::seal_sstable(). The latter additionally touches the target backup/ subdir. This patch moves the whole thing on storage and adds touch for all the cases. For snapshots this might be excessive, but harmless. Tests get their private-disclosure way to access sstable._storage in few places to call create_links directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:45 +03:00
Pavel Emelyanov	e934f42402	test, sstable: Use component_basename in test One case gets full sstable datafile path to get the basename from it. There's already the basename helper on the class sstable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:13:44 +03:00
Pavel Emelyanov	d561495f0d	Merge 'topology: get rid of pending state' from Benny Halevy Now, with `a44ca06906`, is_normal_token_owner that replaced is_member does not rely anymore on the pending status of endpoints in topology. With that we can get rid of this state and just keep all endpoints we know about in the topology. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12294 * github.com:scylladb/scylladb: topology: get rid of pending state topology: debug log update and remove endpoint	2022-12-14 19:28:35 +03:00
Tomasz Grabiec	23e4c83155	position_in_partition: Make after_key() work with non-full keys This fixes a long standing bug related to handling of non-full clustering keys, issue #1446. after_key() was creating a position which is after all keys prefixed by a non-full key, rather than a position which is right after that key. This issue will be caught by cql_query_test::test_compact_storage in debug mode when mutation_partition_v2 merging starts inserting sentinels at position after_key() on preemption. It probably already causes problems for such keys.	2022-12-14 14:47:33 +01:00
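The difference between "after all keys prefixed by a non-full key" and "right after that key" can be illustrated with a toy model. This is only a sketch: clustering keys are modelled as Python tuples, a prefix is assumed to sort before its extensions, and the helper names below are hypothetical, not ScyllaDB APIs.

```python
from typing import List, Tuple

Key = Tuple[int, ...]


def is_prefixed_by(key: Key, prefix: Key) -> bool:
    """True if `key` starts with all components of `prefix`."""
    return key[:len(prefix)] == prefix


def keys_before_after_all_prefixed(keys: List[Key], k: Key) -> List[Key]:
    """Keys that sort before a position placed after every key prefixed by `k`
    (the behaviour the commit message describes as the bug)."""
    return [x for x in keys if x <= k or is_prefixed_by(x, k)]


def keys_before_right_after(keys: List[Key], k: Key) -> List[Key]:
    """Keys that sort before a position placed right after `k` itself
    (the behaviour the commit message describes as intended)."""
    return [x for x in keys if x <= k]


# Toy data: (1,) is a non-full key, (1, 0) and (1, 5) are keys prefixed by it.
keys = [(0, 9), (1,), (1, 0), (1, 5), (2, 0)]
non_full = (1,)

too_wide = keys_before_after_all_prefixed(keys, non_full)
exact = keys_before_right_after(keys, non_full)
```

In this model the first helper also swallows `(1, 0)` and `(1, 5)`, while the second stops immediately after `(1,)`, which is the distinction the fix is about.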
Avi Kivity	3fa230fee4	Merge 'cql3: expr: make it possible to prepare and evaluate conjunctions' from Jan Ciołek This PR implements two things: * Getting the value of a conjunction of elements separated by `AND` using `expr::evaluate` * Preparing conjunctions using `prepare_expression` --- `NULL` is treated as an "unknown value" - maybe `true` maybe `false`. `TRUE AND NULL` evaluates to `NULL` because it might be `true` but also might be `false`. `FALSE AND NULL` evaluates to `FALSE` because no matter what value `NULL` acts as, the result will still be `FALSE`. Unset and empty values are not allowed. Usually in CQL the rule is that when `NULL` occurs in an operation the whole expression becomes `NULL`, but here we decided to deviate from this behavior. Treating `NULL` as an "unknown value" is the standard SQL way of handling `NULL`s in conjunctions. It works this way in MySQL and Postgres so we do it this way as well. The evaluation short-circuits. Once `FALSE` is encountered the function returns `FALSE` immediately without evaluating any further elements. It works this way in Postgres as well, for example: `SELECT true AND NULL AND 1/0 = 0` will throw a division by zero error, but `SELECT false AND 1/0 = 0` will successfully evaluate to `FALSE`. Closes #12300 * github.com:scylladb/scylladb: expr_test: add unit tests for prepare_expression(conjunction) cql3: expr: make it possible to prepare conjunctions expr_test: add tests for evaluate(conjunction) cql3: expr: make it possible to evaluate conjunctions	2022-12-14 09:48:26 +02:00
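The `NULL` handling and short-circuit rule described in this merge can be sketched roughly as follows. This is a minimal illustration, not the `expr::evaluate` implementation: `NULL` is modelled as Python's `None`, each element is a zero-argument callable so that later elements are evaluated only if needed, and the function name `evaluate_conjunction` is hypothetical.

```python
from typing import Callable, Iterable, Optional

# A lazily evaluated conjunction element: returns True, False, or None (NULL).
Element = Callable[[], Optional[bool]]


def evaluate_conjunction(elements: Iterable[Element]) -> Optional[bool]:
    """Evaluate `e1 AND e2 AND ...` treating NULL as an unknown value."""
    saw_null = False
    for element in elements:
        value = element()
        if value is False:
            # FALSE decides the result; stop without evaluating the rest.
            return False
        if value is None:
            # NULL might be true or false, so keep looking for a FALSE.
            saw_null = True
    # No FALSE was found: the result is NULL if any element was NULL.
    return None if saw_null else True


# TRUE AND NULL -> NULL, FALSE AND NULL -> FALSE
true_and_null = evaluate_conjunction([lambda: True, lambda: None])
false_and_null = evaluate_conjunction([lambda: False, lambda: None])

# FALSE short-circuits, so the division by zero is never evaluated.
false_and_error = evaluate_conjunction([lambda: False, lambda: 1 / 0 == 0])
```

As in the Postgres example from the message, a `NULL` does not stop evaluation: `evaluate_conjunction([lambda: True, lambda: None, lambda: 1 / 0 == 0])` still reaches the division and raises `ZeroDivisionError`.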
Jan Ciolek	9afa9f0e50	expr_test: add unit tests for prepare_expression(conjunction) Add unit tests which ensure that preparing conjunctions works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-12-13 20:23:17 +01:00
Jan Ciolek	5f5b1c4701	expr_test: add tests for evaluate(conjunction) Add unit tests which ensure that evaluating a conjunction behaves as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-12-13 20:23:17 +01:00
Benny Halevy	68141d0aac	topology: get rid of pending state Now, with `a44ca06906`, is_normal_token_owner that replaced is_member does not rely anymore on the pending status of endpoints in topology. With that we can get rid of this state and just keep all endpoints we know about in the topology. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-13 14:17:18 +02:00
Avi Kivity	e6ffc22053	Merge 'cql3: Server-side DESC statement' from Michał Jadwiszczak This PR adds server-side `DESCRIBE` statement, which is required in latest cqlsh version. The only change from the user perspective is the `DESC ...` statement can be used with cqlsh version >= 6.0. Previously the statement was executed from client side, but starting with Cassandra 4.0 and cqlsh 6.0, execution of describe was moved to server side, so the user was unable to do `DESC ...` with Scylla and cqlsh 6.0. Implemented describe statements: - `DESC CLUSTER` - `DESC [FULL] SCHEMA` - `DESC [ONLY] KEYSPACE` - `DESC KEYSPACES/TYPES/FUNCTIONS/AGGREGATES/TABLES` - `DESC TYPE/FUNCTION/AGGREGATE/MATERIALIZED VIEW/INDEX/TABLE` - `DESC` [Cassandra's implementation for reference](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/DescribeStatement.java) Changes in this patch: - cql3::util: added `single_quote()` function - added `data_dictionary::keyspace_element` interface - implemented `data_dictionary::keyspace_element` for: - keyspace_metadata, - UDT, UDF, UDA - schema - cql3::functions: added `get_user_functions()` and `get_user_aggregates()` to get all UDFs/UDAs in specified keyspace - data_dictionary::user_types_metadata: added `has_type()` function - extracted `describe_ring()` from storage_service to standalone helper function in `locator/util.hh` - storage_proxy: added `describe_ring()` (implemented using helper function mentioned above) - extended CQL grammar to handle describe statement - increased version in `version.hh` to 4.0.0, so cqlsh will use server-side describe statement Referring: https://github.com/scylladb/scylla/issues/9571, https://github.com/scylladb/scylladb/issues/11475 Closes #11106 * github.com:scylladb/scylladb: version: Increasing version cql-pytest: Add tests for server-side describe statement cql-pytest: creating random elements for describe's tests cql3: Extend CQL grammar with server-side describe statement 
cql3:statements: server-side describe statement data_dictonary: add `get_all_keyspaces()` and `get_user_keyspaces()` storage_proxy: add `describe_ring()` method storage_service, locator: extract describe_ring() data_dictionary:user_types_metadata: add has_type() function cql3:functions: `get_user_functions()` and `get_user_aggregates()` implement `keyspace_element` interface data_dictionary: add `keyspace_element` interface cql3: single_quote() util function view: row_lock: lock_ck: reindent test/topology: enable replace tests service/raft: report an error when Raft ID can't be found in `raft_group0::remove_from_group0` service: handle replace correctly with Raft enabled gms/gossiper: fetch RAFT_SERVER_ID during shadow round service: storage_service: sleep 2*ring_delay instead of BROADCAST_INTERVAL before replace	2022-12-11 18:29:36 +02:00
Michał Jadwiszczak	29ad5a08a8	implement `keyspace_element` interface This patch implements `data_dictionary::keyspace_element` interface in: `keyspace_metadata`, `user_type_impl`, `user_function`, `user_aggregate` and schema.	2022-12-10 12:34:09 +01:00
Pavel Emelyanov	6075e01312	test/lib: Remove sstable_utils.hh from simple_schema.hh The latter is pretty popular test/lib header that disseminates the former one over whole lot of unit tests. The former, in turn, naturally includes sstables.hh thus making tons of unrelated tests depend on sstables class unused by them. However, simple removal doesn't work, because of local_shard_only bool class definition in sstable_utils.hh used in simple_schema.hh. This thing, in turn, is used in key-making helpers that don't belong to sstable utils, so these are moved into simple_schema as well. When done, this affects the mutation_source_test.hh, which needs the local_shard_only bool class (and helps spreading the sstables.hh throughout more unrelated tests) and a bunch of .cc test sources that used sstable_utils.hh to indirectly include various headers they need. After patching, sstables.hh is pulled into half as many tests. As a side effect, half as many tests depend on sstables_manager.hh. Continuation of `9bdea110a6` Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12240	2022-12-08 15:37:33 +02:00
Tomasz Grabiec	536c0ab194	db: Fix trim_clustering_row_ranges_to() for non-full keys and reverse order trim_clustering_row_ranges_to() is broken for non-full keys in reverse mode. It will trim the range to position_in_partition_view::after_key(full_key) instead of position_in_partition_view::before_key(key), hence it will include the key in the resulting range rather than exclude it. Fixes #12180 Refs #1446	2022-12-08 13:41:28 +01:00
Tomasz Grabiec	232ce699ab	types: Fix comparison of frozen sets with empty values A frozen set can be part of the clustering key, and with compact storage, the corresponding key component can have an empty value. Comparison was not prepared for this, the iterator attempts to deserialize the item count and will fail if the value is empty. Fixes #12242	2022-12-08 13:41:11 +01:00
Nadav Har'El	4cdaba778d	Merge 'Secondary indexes on static columns' from Piotr Dulikowski This pull request introduces support for global secondary indexes based on static columns. Local secondary indexes based on static columns are not planned to be supported and are explicitly forbidden. Because there is only one static row per partition and local indexes require full partition key when querying, such indexes wouldn't be very useful and would only waste resources. The index table for secondary indexes on static columns, unlike other secondary indexes, does not contain clustering keys from the base table. A static column's value determines a set of full partitions, so the clustering keys would only be unnecessary. The already existing logic for querying using secondary indexes works after introducing minimal modifications. The view update generation path now works on a common representation of static and clustering rows, but the new representation made it possible to keep most of the logic intact. New cql-pytests are added. All but one of the existing tests for secondary indexes on static columns - ported from Cassandra - now work and have their `xfail` marks lifted; the remaining test requires support for collection indexing, so it will start working only after #2962 is fixed. Materialized views with static rows as a key are __not__ implemented in this PR. 
Fixes: #2963 Closes #11166 * github.com:scylladb/scylladb: test_materialized_view: verify that static columns are not allowed test_secondary_index: add (currently failing) test for static index paging test_secondary_index: add more tests for secondary indexes on static columns cassandra_tests: enable existing tests for static columns create_index_statement: lift restriction on secondary indexes on static rows db/view: fetch and process static rows when building indexes gms/feature_service: introduce SECONDARY_INDEXES_ON_STATIC_COLUMNS cluster feature create_index_statement: disallow creation of local indexes with static columns select_statement: prepare paging for indexes on static columns select_statement: do not attempt to fetch clustering columns from secondary index's table secondary_index_manager: don't add clustering key columns to index table of static column index replica/table: adjust the view read-before-write to return static rows when needed db/view: process static rows in view_update_builder::on_results db/view: adjust existing view update generation path to use clustering_or_static_row column_computation: adjust to use clustering_or_static_row db/view: add clustering_or_static_row deletable_row: add column_kind parameter to is_live view_info: adjust view_column to accept column_kind db/view: base_dependent_view_info: split non-pk columns into regular and static	2022-12-08 09:54:05 +02:00
Avi Kivity	444de2831e	dirty_memory_manager: move to replica module It's a replica-side thing, so move it there. The related flush_permit and sstable_write_permit are moved alongside.	2022-12-06 22:24:17 +02:00

2103 Commits