scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 16:33:35 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	2e3f6a9622	tests: perf_fast_forward: Print output directory Message-Id: <20210203180053.230627-1-tgrabiec@scylladb.com>	2021-02-04 10:39:41 +02:00
Tomasz Grabiec	e0ceb454c0	tests: perf_fast_forward: Print error hints to stdout They point to lines printed to stdout, so should be aligned with them. Message-Id: <20210203180016.230547-1-tgrabiec@scylladb.com>	2021-02-04 10:39:41 +02:00
Benny Halevy	ca6f5cb0bc	test: commitlog_test: test_allocation_failure: fill memory using smaller allocations commitlog was changed to use fragmented_temporary_buffer::ostream (db::commitlog::output). So if there are discontiguous small memory blocks, they can be used to satisfy an allocation even if no contiguous memory blocks are available. To prevent that, as Avi suggested, this change allocates in 128K blocks and frees the last one to succeed (so that we won't fail on allocating continuations). Fixes #8028 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210203100333.862036-1-bhalevy@scylladb.com>	2021-02-03 12:21:20 +02:00
Avi Kivity	913d970c64	Merge "Unify inactive readers" from Botond " Currently inactive readers are stored in two different places: * reader concurrency semaphore * querier cache With the latter registering its inactive readers with the former. This is an unnecessarily complex (and possibly surprising) setup that we want to move away from. This series solves this by moving the responsibility for storing inactive reads solely to the reader concurrency semaphore, including all supported eviction policies. The querier cache is now only responsible for indexing queriers and maintaining relevant stats. This makes the ownership of the inactive readers much clearer, hopefully making Benny's work on introducing close() and abort() a little bit easier. Tests: unit(release, debug:v1) " * 'unify-inactive-readers/v2' of https://github.com/denesb/scylla: reader_concurrency_semaphore: store inactive readers directly querier_cache: store readers in the reader concurrency semaphore directly querier_cache: retire memory based cache eviction querier_cache: delegate expiry to the reader_concurrency_semaphore reader_concurrency_semaphore: introduce ttl for inactive reads querier_cache: use new eviction notify mechanism to maintain stats reader_concurrency_semaphore: add eviction notification facility reader_concurrency_semaphore: extract evict code into method evict()	2021-02-03 10:59:04 +02:00
Tomasz Grabiec	873e732042	Merge "Switch partition rows onto B-tree" from Pavel Emelyanov This is the continuation of the row-cache performance improvements, this time -- the rework of the clustering keys part. The goal is to solve the same set of problems: - logN eviction complexity - deep and sparse tree Unlike partitions, this cache has one big feature that makes it impossible to just use the existing B+ tree: There's no copyable key at hand. The clustering key is the managed_bytes() that is not nothrow-copy-constructible, nor is it hashable for lookup due to prefix lookup. Thus the choice is the B-tree, which is also an N-ary one, but doesn't copy keys around. B-trees are like B+, but can have key:data pairs in inner nodes, thus those nodes may be significantly bigger than B+ ones, which have data only in leaf nodes. Not to make the memory footprint worse, the tree assumes that keys and data live on the same object (the rows_entry one), and the tree itself manages only the key pointers. Not to invalidate iterators on insert/remove the tree nodes keep pointers on keys, not the keys themselves. The tree uses tri-compare instead of less-compare. This makes the .find and .lower_bound methods do ~10% fewer comparisons on random insert/lookup test. 
Numbers: - memory_footprint: B-tree master rows_entry size: 216 232 1 row in-cache: 968 960 (because of dummy entry) in-memtable: 1006 1022 100 rows in-cache: 50774 50856 in-memtable: 50620 50918 - mutation_test: B-tree master tps.average: 891177 833896 - simple_query: B-tree master tps.median: 71807 71656 tps.maximum: 71847 71708 * xemul/clustering-cache-over-btree-4: mutation_partition: Save one keys comparison partition_snapshot_row_cursor: Remove rows pointer mutation_partition: Use B-tree insertion sugar perf-test : Print B-tree sizes mutation_partition: Switch cache of rows onto B-tree partition_snapshot_reader: Rename cmp to less for explicity mutation_partition: Make insertion bullet-proof mutation_partition: Use tri-compare in non-set places flat_mutation_reader: Use clear() in destroy_current_mutation() rows_entry: Generalize compare utils: Intrusive B-tree (with tests) tests: Generalize bptree compaction test tests: Generalize bptree stress test	2021-02-02 12:26:02 +01:00
Tomasz Grabiec	75eb97b12c	Merge 'Commitlog multi-entry write' from Calle Wilund Fixes #7615 Makes the CL writer interface N-valued (though still 1 for the "old" paths). Adds a new write path to input N mutations -> N rp_handles. Guarantees that all entries are written or none are, and that they will be flushed to disk together. Small test included. Closes #7616 * github.com:scylladb/scylla: commitlog_test: Add multi-entry write test commitlog: Add "add_entries" call to allow inputting N mutations commitlog: Make commitlog entries optionally multi-entry commitlog: Move entry_writer definition to cc file	2021-02-02 12:23:19 +01:00
Tomasz Grabiec	7b17969a6e	Merge 'sstable: reader: preempt after every fragment' from Avi Kivity Whenever we push a fragment, we check whether the buffer is full and return proceed::no if so, so that the state machine pauses and lets the consumer continue. This patch adds an additional condition - if preemption is needed, we also return proceed::no. This drops us back to the outer loop (in sstable_mutation_reader::fill_buffer), which will yield to the reactor as part of seastar::do_until(). Two cases (partition_start and partition_end) did not have the check for is_buffer_full(); it is added now. This can trigger if the partition has no rows. Unlike the previous attempt, push_ready_fragments() is not touched. The extra preemption opportunities triggered a preexisting bug in clustering_ranges_walker; it is fixed in the first patch of the series. I tested this by reading from a large partition with a simple schema (pk int, ck int, primary key(pk, ck)) with BYPASS CACHE. However, even without the patch I only got sporadic stalls with the detector set to 1ms, so it's possible I'm not testing correctly. Test: unit (dev, debug, release) Fixes #7883. Closes #7928 * github.com:scylladb/scylla: sstable: reader: preempt after every fragment clustering_range_walker: fix false discontiguity detected after a static row	2021-02-02 12:21:58 +01:00
Benny Halevy	0fecc78d88	user_function: throw on_internal_error if executed outside a seastar thread Rather than asserting, as seen in #7977. This shouldn't crash the server in production. Add unit test that reproduces this scenario and verifies the internal error exception. Fixes #7977 Test: unit(release) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210201163051.1775536-1-bhalevy@scylladb.com>	2021-02-02 13:03:39 +02:00
Calle Wilund	720a47fe8a	commitlog_test: Add multi-entry write test	2021-02-02 10:41:08 +00:00
Avi Kivity	da4fa0629a	Merge "sstables: add sstable_origin to scylla_metadata" from Benny " This series extends the scylla_metadata sstable component to hold an optional textual description of the sstable origin. It describes where the sstables originated from (e.g. memtable, repair, streaming, compaction, etc.) The origin string is provided by the sstable writer via sstable_writer_config, written to the scylla_metadata component, and loaded on sstable::load(). A get_origin() method was added to class sstable to retrieve its origin. It returns an empty string by default if the origin is missing. Compaction now logs the sstable origin for each sstable it compacts, and it generates the sstable origin for all sstables it generates. Regular compaction origin is simply set to "compaction" while other compaction types are mentioned by name, as "cleanup", "resharding", "reshaping", etc. A unit test was added to test the sstable_origin by writing either an empty origin or a random string, and then comparing the origin retrieved by sstable::load to the one written. Test: unit(release) Fixes #7880 " * tag 'sstable-origin-v2' of github.com:bhalevy/scylla: compaction: log sstable origin sstables: scylla_metadata: add support for sstable_origin sstables: sstable_writer_config: add origin member	2021-02-02 10:35:11 +02:00
Pavel Emelyanov	a92eb2f7a9	perf-test : Print B-tree sizes After the switch from BST to B-tree the memory footprint includes inner/leaf nodes from the B-tree, so it's useful to know their sizes too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-02 09:30:30 +03:00
Pavel Emelyanov	5c0f9a8180	mutation_partition: Switch cache of rows onto B-tree The switch is pretty straightforward, and consists of - change less-compare into tri-compare - rename insert/insert_check into insert_before_hint - use tree::key_grabber in mutation_partition::apply_monotonically to exception-safely transfer a row from one tree to another - explicitly erase the row from tree in rows_entry::on_evicted, there's a O(1) tree::iterator method for this - rewrite rows_entry -> cache_entry transformation in the on_evicted to fit the B-tree API - include the B-tree's external memory usage into stats That's it. The number of keys per node is set to 12 with linear search and linear extension of 20 because - experimenting with tree shows that numbers 8 through 10 keys with linear search show the best performance on stress tests for insert/find-s of keys that are memcmp-able arrays of bytes (which is an approximation of current clustering key compare). More keys work slower, but still better than any bigger value with any type of search up to 64 keys per node - having 12 keys per node is the threshold at which the memory footprint for B-tree becomes smaller than for boost::intrusive::set for partitions with 32+ keys - 20 keys for linear root eats the first-split peak and still performs well in linear search As a result the footprint for B tree is bigger than the one for BST only for trees filled with 21...32 keys by 0.1...0.7 bytes per key. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-02 09:30:30 +03:00
Pavel Emelyanov	2f7c03d84c	utils: Intrusive B-tree (with tests) The design of the tree goes from the row-cache needs, which are 1. Insert/Remove do not invalidate iterators 2. Elements are LSA-manageable 3. Low key overhead 4. External tri-comparator 5. As few actions on insert/remove as possible With the above the design is Two types of nodes -- inner and leaf. Both types keep pointer on parent nodes and N pointers on keys (not keys themselves). Two differences: inner nodes have array of pointers on kids, leaf nodes keep pointer on the tree (to update left- and rightmost tree pointers on node move). Nodes do not keep pointers/references on trees, thus we have O(1) move of any object, but O(logN) to get the tree size. Fortunately, with big keys-per-node value this won't result in too many steps. In turn, the tree has 3 pointers -- root, left- and rightmost leaves. The latter is for constant-time begin() and end(). Keys are managed by user with the help of embeddable member_hook instance, which is 1 pointer in size. The code was copied from the B+ tree one, then heavily reworked, the internal algorithms turned out to differ quite significantly. For the sake of mutation_partition::apply_monotonically(), which needs to move an element from one tree into another, there's a key_grabber helping wrapper that allows doing this move respecting the exception-safety requirement. As measured by the perf_collections test the B-tree with 8 keys is faster than the std::set, but slower than the B+tree: vs set vs b+tree fill: +13% -6% find: +23% -35% Another neat thing is that 1-key insertion-removal is ~40% faster than for BST (the same number of allocations, but the key object is smaller, fewer pointers to set up and fewer instructions to execute when linking node with root). v4: - equip insertion methods with on_alloc_point() calls to catch potential exception guarantees violations earlier - add unlink_leftmost_without_rebalance. 
The method is borrowed from boost intrusive set, and is added to kill two birds -- provide it, as it turns out to be popular, and use a bit faster step-by-step tree destruction than plain begin+erase loop v3: - introduce "inline" root node that is embedded into tree object and in which the 1st key is inserted. This greatly improves the 1-key-tree performance, which is pretty common case for rows cache v2: - introduce "linear" root leaf that grows on demand This improves the memory consumption for small trees. This linear node may and should over-grow the NodeSize parameter. This comes from the fact that there are two big per-key memory spikes on small trees -- 1-key root leaf and the first split, when the tree becomes 1-key root with two half-filled leaves. If the linear extension goes above NodeSize it can flatten even the 2nd peak - mitigate the keys indirection a bit Prefetching the keys while doing the intra-node linear scan and the nodes while descending the tree gives ~+5% of fill and find - generalize stress tests for B and B+ trees - cosmetic changes TODO: - fix few inefficiencies in the core code (walks the sub-tree twice sometimes) - try to optimize the leaf nodes, that are not left-/rightmost not to carry unused tree pointer on board Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-02 09:30:29 +03:00
Pavel Emelyanov	6d63bdbefe	tests: Generalize bptree compaction test Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-02 09:28:59 +03:00
Pavel Emelyanov	8bdad0bb28	tests: Generalize bptree stress test Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-02 09:28:57 +03:00
Avi Kivity	7634a90dd2	clustering_range_walker: fix false discontiguity detected after a static row clustering_range_walker detects when we jump from one row range to another. When a static row is included in the query, the constructor sets up the first before/after bounds to be exactly that static row. That creates an artificial range crossing if the first clustering range is contiguous with the static row. This can cause the index to be consulted needlessly if we happen to fall back to sstable_mutation_reader after reading the static row. A unit test is added. Ref #7883.	2021-02-01 19:32:07 +02:00
Pavel Solodovnikov	9d17a654a6	raft: use null_sharder for raft tables Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210201105300.110210-1-pa.solodovnikov@scylladb.com>	2021-02-01 18:52:04 +02:00
Tomasz Grabiec	eac9c1d80a	Merge "raft: configuration changes with joint consensus" from Kostja Support configuration changes based on joint consensus. When a user adds a configuration entry, commit an interim "joint consensus" configuration to the log first, and transition to the final configuration once both C_old and C_new configurations accept the joint entry. Misc cleanups. * scylla-dev/raft-config-changes-v2: raft: update README.md raft: add a simple test for configuration changes raft: joint consensus, wire up configuration changes in the API raft: joint consensus, count votes using joint config raft: joint consensus, wire up configuration changes in FSM raft: joint consensus, update progress tracker with joint configuration raft: joint consensus, don't store configuration in FSM raft: joint consensus, keep track of the last confchange index in the log raft: joint consensus, implement helpers in class configuration raft: joint consensus, use unordered_set for server_address list raft: joint consensus, switch configuration to joint raft: rename check_committed() to maybe_commit() raft: fix spelling and add comments	2021-02-01 18:52:04 +02:00
Benny Halevy	77328a936a	sstables: scylla_metadata: add support for sstable_origin Add new scylla_metadata_type::SSTableOrigin. Store and retrieve an sstring in the scylla metadata component. Pass sstable_writer_config::origin from the mx sstable writer and ignore it in the k_l writer. Add unit test to verify the sstable_origin extension using both empty and a random string. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-01 16:45:52 +02:00
Benny Halevy	22f6023ac3	sstables: sstable_writer_config: add origin member Add a string describing where the sstables originated from (e.g. memtable, repair, streaming, compaction, etc.) If configure_writer is called with a nullptr, the origin will be equal to an empty string. Introduce test_env_sstables_manager that provides an overload of configure_writer with no parameters that calls the base-class' configure_writer with "test" origin. This was to reduce the code churn in this patch and to keep the tests simple. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-01 16:45:52 +02:00
Nadav Har'El	75a4281bff	cql-pytest: test the units supposed to be usable for "duration" type This patch adds a test for the different units which are supposed to be usable for assigning a "duration" type in CQL. It turns out that all documented units are supported correctly except µs (with a unicode mu), so the test reproduces issue #8001. The test xfails on Scylla (because µs is not supported) and passes on Cassandra. Refs: #8001. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210131192220.407481-1-nyh@scylladb.com>	2021-02-01 11:05:10 +01:00
Konstantin Osipov	b7692af8bc	raft: add a simple test for configuration changes Test adding, removing, and replacing a node. With fix-ups by Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-01-29 22:07:08 +03:00
Konstantin Osipov	1ca738d9a2	raft: joint consensus, use unordered_set for server_address list	2021-01-29 22:07:07 +03:00
Konstantin Osipov	df944f953c	raft: joint consensus, switch configuration to joint In order to work correctly in transitional configuration, participants must enter it after crashes, restarts and state changes. This means it must be stored in Raft log and snapshot on the leader and followers. This is most easily done if transitional configuration is just a flavour of standard configuration. In FSM, rename _current_config to _configuration, it now contains both current and future configuration at all times.	2021-01-29 22:07:07 +03:00
Gleb Natapov	aad0209b1c	raft: fix spelling and add comments Fix spelling errors in a few comments, improve comments. With fix-ups by Gleb Natapov <gleb@scylladb.com>	2021-01-29 22:07:07 +03:00
Pavel Emelyanov	575c992a35	test: Bring test_apply_monotonically_is_monotonic back to work The idea of the monotonicity checking test is: try to apply one random partition to another random one while sequentially failing allocations. Each time allocation fails (with the bad_alloc exception) -- check the exception guarantee is respected, then apply (!) the very same two partitions to each other. At the end of the test we make sure that an exception may pop up at any point of application and it will be safe. This idea is flawed currently. When verifying the guarantee the test moves the 2nd partition and leaves it empty for the next loop iteration. So right on the 2nd attempt to apply partitions it becomes a no-op, doesn't fail and no more exceptions arise. Fix by restoring both partitions at the end of each check. Broken since `74db08165d`. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210129153641.5449-1-xemul@scylladb.com>	2021-01-29 18:47:15 +01:00
Tomasz Grabiec	16eb4c6ce2	Merge "raft: system table backed persistency module" from Pavel Solodovnikov This series contains an initial implementation of raft persistency module that uses `raft` system table as the underlying storage model. "system.raft" table will be used as a backend storage for implementing raft persistence module in Scylla. It combines both raft log, persisted vote and term, and snapshot info. The table is partitioned by group id, thus allowing multi-raft operation. The rest of the table structure mirrors the fields of corresponding core raft structures defined in `raft.hh`, such as `raft::log_entry`. The raft table stores only the latest snapshot id while the actual snapshot will be available in a separate table called `system.raft_snapshots`. The schema of `raft_snapshots` mirrors the fields of `raft::snapshot` structure. IDL definitions are also added for every raft struct so that we automatically provide serialization and deserialization facilities needed both for persistency module and for future RPC implementation. The first patch is a side-change needed to provide complete serialization/deserialization for `bytes_ostream`, which we need when persisting the raft log in the table (since `data` is a variant containing `raft::command` (aka `bytes_ostream`) among others). `bytes_ostream` was lacking `deserialize` function, which is added in the patch. The second patch provides serializer for `lw_shared_ptr<T>` which will be used for `raft::append_entries`, which has a field with `std::vector<const lw_shared_ptr<raft::log_entry>>` type. There is also a patch to extend `fragmented_temporary_buffer` with a static function `allocate_to_fit` that allocates an instance of the fragmented buffer that has a specified size. Individual fragment size is limited to 128kb. The patch-set also contains the test suite covering basic functionality of the persistency module. 
* manmanson/raft-api-impl-v11: raft/sys_table_storage: add basic tests for raft_sys_table_storage raft: introduce `raft_sys_table_storage` class utils: add `fragmented_temporary_buffer::allocate_to_fit` raft: add IDL definitions for raft types raft: create `system.raft` and `system.raft_snapshots` tables serializer: add `serializer<lw_shared_ptr<T>>` specialization serializer: add `deserialize` function overload for `bytes_ostream`	2021-01-29 11:40:39 +02:00
Pavel Solodovnikov	e309502c42	raft/sys_table_storage: add basic tests for raft_sys_table_storage The test suite covers the most basic use cases for the system table backed raft persistency module: * store/load vote and term * store/load snapshot * store snapshot with log tail truncation * store/load log entries * log truncation Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-29 02:00:27 +03:00
Kamil Braun	bf115e7d69	schema_tables: put schema tables on shard 0 We use a custom sharder for all schema tables: every table under the `system_schema` keyspace, plus `system.scylla_table_schema_history`. This sharder puts all data on shard 0. To achieve this, we hardcode the sharder in initial schema object definitions. Furthermore - since the sharder is not stored inside schema mutations yet - whenever we deserialize schema objects from mutations, we modify the sharder based on the schema's keyspace and table names. A regression test is added to ensure no one forgets to set the special sharder for newly added schema tables. This test assumes that all newly added schema tables will end up in the `system_schema` keyspace (other tables may go unnoticed, unfortunately). Closes #7947	2021-01-28 13:28:22 +02:00
Avi Kivity	32cdcc0c8b	Merge "sstables: consolidate reader factory methods" from Botond " Currently there are three different methods for creating an sstable reader: * one for single key reads * one for ranged reads * and one nobody uses This patch-set consolidates all these into a single `make_reader()` method, which behind the scenes uses the same logic to dispatch to the right sstable reader constructor that `sstables::as_mutation_source()` uses. This patch-set is part of an effort to clean up the jungle that is the various reader creation methods. The next step is to clean up the sstable_set, which has even more methods. One very sad discovery I made while working on this patch-set is that we still default `mutation_reader::forwarding` to `yes` in the sstable range reader creator method and in the `mutation_source::make_reader()`. I couldn't assume that all callers are passing what they mean as the value for that parameter. I found many sites in tests that create forwardable single partition readers. This is also something we should address soon. Tests: unit(release, debug:v3) " * 'sstables-consolidate-reader-factory-methods-v4' of https://github.com/denesb/scylla: cql_query_test: add unit test covering the non-optimal TWCS sstable read path sstable_mutation_reader: consolidate constructors tests: don't pass temporary ranges to readers sstables: sstable_mutation_reader: remove now unused whole sstable constructor sstables: stats: remove now unused sstable_partition_reads counter sstable: remove read_.row._flat() methods tree-wide: use sstables::make_reader() instead of the read_.row._flat() methods sstables: pass partition_range to create_single_key_sstable_reader() sstables: sstable: add make_reader()	2021-01-28 12:05:06 +02:00
Botond Dénes	1e9ce62ee6	cql_query_test: add unit test covering the non-optimal TWCS sstable read path The sstable read path for TWCS tables takes a different path when the optimized read path cannot be used. This path was found to be not covered at all by unit tests which allowed a trivial use-after-free to slip in. Add a unit test to cover this path as well, so ASAN can catch such bugs in the future.	2021-01-28 11:34:03 +02:00
Konstantin Osipov	b4f875f08e	uuid: reduce code dependency on UUID_gen.hh Do not include UUID_gen.hh in trace_state.hh and lists.hh to reduce header level dependency on it. Message-Id: <20210127173114.725761-2-kostja@scylladb.com>	2021-01-27 20:08:29 +02:00
Botond Dénes	dd26a96e63	tests: don't pass temporary ranges to readers The sstable_mutation_reader, like all other mutation readers expects that the partition-range passed to it is kept alive by its creator for the duration of its lifetime. However, the single-key constructor of the sstable reader was more tolerant, as it only extracted the key from the range, essentially requiring only the key to be kept alive (but not the containing range). Naturally in time some code came to rely on it and ended up passing temporary ranges to the reader. This behaviour will no longer be acceptable as we are about to consolidate the various sstable reader constructors, uniformly requiring that the range is kept alive. So this patch fixes up the tests so they work with this stricter requirement. Only two occurrences were found.	2021-01-27 17:38:17 +02:00
Botond Dénes	c3b4e990a2	tree-wide: use sstables::make_reader() instead of the read_.row._flat() methods	2021-01-27 17:38:17 +02:00
Avi Kivity	aec231ba2e	Merge "Unify query paths" from Botond " Currently we have two parallel query paths: * database::query() -> table::query() -> data_query() * mutation::query() The former is used by single partition queries, the latter by range scans, as mutation::query() is used to convert reconcilable_result to query::result (which means it is also used in single partition queries if it triggers read repair). This is a rather unfortunate situation as we have two parallel implementations of the query code, which means they are prone to diverge, and in fact they already have -- more on that later. This patchset aims to remedy this situation by retiring `mutation::query()` and migrating users to an implementation based on the "standard" query path, in other words one using the same building blocks as the `database::query()` path. This means using `compact_mutation` for compacting and `query_result_builder` for result building. These components however were created to work with `flat_mutation_reader`, and introducing a reader into this pipeline would mean that we'd have to make all the related APIs asynchronous, which would cause an insane amount of churn. To avoid this, this patchset adds an API compatible `consume()` method to `mutation`, which can accept a `compact_mutation` instance as-is. This allows an elegant and succinct reimplementation. So far so good. Like mentioned above, the two implementations have diverged in time, or have been different from the start. The difference manifests when calculating digests, more precisely in which tombstones are included in the digest. The retired `mutation::query()` path incorporates only non-purgeable tombstones in the digest. The standard query path however incorporates all tombstones, even those that can be purged. 
After some scrutiny however this difference proved to be completely theoretical, as the code path where this would matter -- converting reconcilable result to query result -- passes min timestamp as the query time to the compaction, so nothing is compacted and hence the difference has no chance to manifest. This patch-set was motivated by the desire to provide a single solution to #7434, instead of two, one for each path. Tests: unit(release:v2, debug:v2, dev:v3) " * 'unified-query-path/v3' of https://github.com/denesb/scylla: mutation: remove now unused query() and query_compacted() treewide: use query_mutations() instead of mutation::query() mutation_test: test_query_digest: ensure digest is produced consistently mutation_query: introduce query_mutation() mutation_query: to_data_query_result(): migrate to standard query code mutation_query: move to_data_query_result() to mutation_partition.cc mutation: add consume() flat_mutation_reader: move mutation consumer concepts to separate header mutation compactor: query compaction: ignore purgeable tombstones	2021-01-27 15:58:47 +02:00
Nadav Har'El	2113849a2b	cql-pytest: reproducer for toJson() bug with doubles This patch adds a cql-pytest, test_json.py::test_tojson_double(), which reproduces issue #7972 - where toJson() prints some doubles incorrectly - truncated to integers, but some it prints fine (I still don't know why, this will need to be debugged). The test is marked xfail: It fails on Scylla, and passes on Cassandra. Refs #7972. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210127124338.297544-1-nyh@scylladb.com>	2021-01-27 14:00:25 +01:00
Tomasz Grabiec	90f6bb754e	Merge "raft: replication tests: fixes for debug mode" from Alejo The following patches fix issues seen occasionally in debug mode. Notes: - In debug mode there's still the UB nullptr arithmetic warning. * https://github.com/alecco/scylla/tree/raft-ale-tests-07h-wait-propagation: raft: replication test: wait for log propagation raft: replication test: move wait for log to a function raft: replication test: remove unused member raft: replication test: use later() raft: testing: remove election wait time and just yield	2021-01-26 11:14:42 +02:00
Avi Kivity	f58151d191	test: mutation_test: fix initialization order bug with thread local storage test_cell_external_memory_usage uses with_allocator() to observe how some types allocate memory. However, compiler reordering (observed with clang 11 on aarch64) can move the various thread-local CQL type object initialization into the with_allocator() scope; so any managed object allocated as part of this initialization also gets measured, and the test fails. The code movement is legal, as far as I can tell. Fix this by initializing the type object early; use an atomic_thread_fence as an optimization barrier so the compiler doesn't eliminate or move the early initialization. Closes #7951	2021-01-26 11:14:42 +02:00
Nadav Har'El	356250f720	cql-pytest: tests for fromJson() failing to set tuple elements to null This patch adds a test for trying to set a tuple element to null with fromJson(), which works on Cassandra but fails on Scylla. So the test xfails on Scylla. Reproduces issue #7954. Refs #7954. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210124082311.126300-1-nyh@scylladb.com>	2021-01-26 11:14:42 +02:00
Takuya ASADA	7a6ee9858f	redis: fix large message handling If the message is larger than the current buffer size, we need to consume more data until we reach the tail of the message. To do so, we need to return nullptr when it's not on the tail. Fixes #7273	2021-01-25 10:26:37 +09:00
Alejo Sanchez	0d694990cf	raft: replication test: wait for log propagation Wait until entries propagate after adding and before changing leader using the same code as done for partitioning. This fixes occasional hangs in debug mode when a test switches to a different leader without leaving enough time for full propagation. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-01-24 20:33:54 -04:00
Alejo Sanchez	4d1ec88f90	raft: replication test: move wait for log to a function Move wait for log propagation to its own function for reuse. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-01-24 20:25:48 -04:00
Alejo Sanchez	72f9b108e3	raft: replication test: remove unused member Initial state doesn't need to specify total entries anymore. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-01-24 20:25:48 -04:00
Alejo Sanchez	db95d6e7f1	raft: replication test: use later() Instead of sleep 1us use later() Also use later to yield after sending append entries in rpc test impl. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-01-24 20:25:48 -04:00
Benny Halevy	1847d49971	test: test_env: pick the highest sstable version by default If possible, test the highest sstable format version, as it's the most used. If there are pre-written sstables from an older version that we need to load from the test directory, either specify their version explicitly, or use the new test_env::reusable_sst method that looks up the latest sstable version in the given directory and generation. Test: unit(release) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201210161822.2833510-1-bhalevy@scylladb.com>	2021-01-24 10:38:55 +02:00
Botond Dénes	1a3ee71b39	treewide: use query_mutations() instead of mutation::query() We want to retire the latter.	2021-01-22 15:36:37 +02:00
Nadav Har'El	cb9e2ee00a	cql-pytest: tests for fromJson() setting a map<ascii, int> The fromJson() function can take a map JSON and use it to set a map column. However, the specific example of a map<ascii, int> doesn't work in Scylla (it does work in Cassandra). The xfailing tests in this patch demonstrate this. Although the tests use perfectly legal ASCII, scylla fails the fromJson() function, with a misleading error. Refs #7949. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210121233855.100640-1-nyh@scylladb.com>	2021-01-22 14:29:25 +01:00
Botond Dénes	a9d726c7ba	mutation_test: test_query_digest: ensure digest is produced consistently Before we retire the mutation::query() code, expand the digest test to check that the new code replacing it produces identical digest on all possible equivalent mutations.	2021-01-22 15:27:48 +02:00
Kamil Braun	570d15c7bc	multishard_combining_reader: do not use `smp::count` `multishard_combining_reader` currently only works under the assumption that every table uses the same sharder configured using the node's number of shards. But we could potentially specify a different sharder for a chosen table, e.g. one that puts everything on shard 0. Then this assumption will be broken and the reader causes a segfault. Fixes #7945.	2021-01-21 18:28:18 +02:00
Nadav Har'El	328be1ca7c	cql-pytest: tests for fromJson() not accepting empty string as integer When writing to an integer column, Cassandra's fromJson() function allows not just JSON number constants, it also allows a string containing a number. Strings which do not hold a number fail with a FunctionFailure. In particular, the empty string "" is an invalid number, and should fail. The tests in this patch check this for two integer types: int and varint. Curiously, Cassandra and Scylla have opposite bugs here: Scylla fails to recognize the error for varint, while Cassandra fails to recognize the error for int. The tests in this patch reproduce these bugs. The tests demonstrating Scylla's bug are marked xfail, and the test demonstrating Cassandra's bug is marked "cassandra_bug" (which means it is marked xfail only when running against Cassandra, but expected to succeed on Scylla). Refs #7944. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210121133833.66075-1-nyh@scylladb.com>	2021-01-21 15:24:48 +01:00

1200 Commits