scylladb

Author	SHA1	Message	Date
Botond Dénes	e8f3d7dd13	sstables/index_reader: short-circuit fast-forward-to when at EOF Attempting to call advance_to() on the index, after it is positioned at EOF, can result in an assert failure, because the operation results in an attempt to move backwards in the index-file (to read the last index page, which was already read). This only happens if the index cache entry belonging to the last index page is evicted, otherwise the advance operation just looks-up said entry and returns it. To prevent this, we add an early return conditioned on eof() to all the partition-level advance-to methods. A regression unit test reproducing the above described crash is also added.	2022-05-05 14:42:37 +03:00
Botond Dénes	98f3d516a2	test/lib/random_schema: add a simpler overload for fixed partition count Some tests want to generate a fixed amount of random partitions, make their life easier.	2022-05-05 14:33:37 +03:00
Calle Wilund	78350a7e1b	cdc: Ensure columns removed from log table are registered as dropped If we are redefining the log table, we need to ensure any dropped columns are registered in "dropped_columns" table, otherwise clients will not be able to read data older than now. Includes unit test. Should probably be backported to all CDC enabled versions. Fixes #10473 Closes #10474	2022-05-04 14:19:39 +02:00
Botond Dénes	4440d4b41a	Merge "De-globalize gossiper" from Pavel Emelyanov " - Alternator gets gossiper for its proxy dependency - Forward service method that takes global gossiper can re-use proxy method (forward -> proxy reference is already there) - Table code is patched to require gossiper argument - Snitch gets a dependency reference on snitch_ptr and some extra care for snitch driver vs snitch-ptr interaction and gossip test - Cql test env should carry gossiper reference on-board - Few places can re-use the existing local gossiper reference - Scylla-gdb needs to get gossiper from debug namespace and needs _not_ to get feature service from gossiper " * 'br-gossiper-deglobal-2' of https://github.com/xemul/scylla: code: De-globalize gossiper scylla-gdb, main: Get feature service without gossiper help test: Use cql-test-env gossiper cql test env: Keep gossiper reference on board code: Use gossiper reference where possible snitch: Use local gossiper in drivers snitch: Keep gossiper reference test: Remove snitch from manual gossip test gossiper: Use container() instead of the global pointer main, cql_test_env: Start snitch later snitch: Move snitch_base::get_endpoint_info() forward service: Re-use proxy's helper with duplicated code table: Don't use global gossiper alternator: Don't use global gossiper	2022-05-03 15:56:07 +03:00
Nadav Har'El	6fb762630b	cql-pytest: translate Cassandra's tests for SELECT operations This is a translation of Cassandra's CQL unit test source file validation/operations/SelectTest.java into our cql-pytest framework. This large test file includes 78 tests for various types of SELECT operations. Four additional tests require UDF in Java syntax, and were skipped. All 78 tests pass on Cassandra. 25 of the tests fail on Scylla reproducing 3 already known Scylla issues and 8 previously-unknown issues: Previously known issues: Refs #2962: Collection column indexing Refs #4244: Add support for mixing token, multi- and single-column restrictions Refs #8627: Cleanly reject updates with indexed values where value > 64k Newly-discovered issues: Refs #10354: SELECT DISTINCT should allow filter on static columns, not just partition keys Refs #10357: Spurious static row returned from query with filtering, despite not matching filter Refs #10358: Comparison with UNSET_VALUE should produce an error Refs #10359: "CONTAINS NULL" and "CONTAINS KEY NULL" restrictions should match nothing Refs #10361: Null or UNSET_VALUE subscript should generate an invalid request error Refs #10366: Enforce Key-length limits during SELECT Refs #10443: SELECT with IN and ORDER BY orders rows per partition instead of for the entire response Refs #10448: The CQL token() function should validate its parameters Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #10449	2022-05-03 11:45:05 +03:00
Pavel Emelyanov	e80adbade3	code: De-globalize gossiper No code uses global gossiper instance, it can be removed. The main and cql-test-env code now have their own real local instances. This change also requires adding the debug:: pointer and fixing the scylla-gdb.py to find the correct global location. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:40 +03:00
Pavel Emelyanov	b25fc29801	test: Use cql-test-env gossiper There's yet another -test-env -- the alternator- one -- which needs gossiper. It now uses global reference, but can grab gossiper reference from the cql-test-env which participates in initialization. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:40 +03:00
Pavel Emelyanov	b0544ba7bd	cql test env: Keep gossiper reference on board The reference is already available at the env initialization, but it's not kept on the env instance itself. Will be used by the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:40 +03:00
Pavel Emelyanov	4bea0b7491	code: Use gossiper reference where possible Some places in the code have function-local gossiper reference but continue to use global instance. Re-use the local reference (it's going to become sharded<> instance soon). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:40 +03:00
Pavel Emelyanov	38c77d0d85	snitch: Keep gossiper reference The reference is put on the snitch_ptr because this is the sharded<> thing and because gossiper reference is the same for different snitch drivers. Also, getting gossiper from snitch_ptr by driver will look simpler than getting it from any base class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:40 +03:00
Pavel Emelyanov	52fc4d6b22	test: Remove snitch from manual gossip test It's not in use out there Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:40 +03:00
Pavel Emelyanov	2d32c47d0d	main, cql_test_env: Start snitch later Snitch depends on gossiper and system keyspace, so it needs to be started after those two do. fixes #10402 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-05-03 10:57:32 +03:00
Botond Dénes	53c66fe24a	Merge "Make LCS reshape and major more efficient by picking the ideal output level" from Raphael S. Carvalho " Today, both operations are picking the highest level as the ideal level for placing the output, but the size of input should be used instead. The formula for calculating the ideal level is: ceil(log base(fan_out) of (total_input_size / max_fragment_size)) where fan_out = 10 by default, total_input_size = total size of input data and max_fragment_size = maximum size for fragment (160M by default) such that 20 fragments will be placed at level 2, as level 1 capacity is 10 fragments only. By placing the output in the incorrect level, tons of backlog will be generated for LCS because it will either have to promote or demote fragments until the levels are properly balanced. " * 'optimize_lcs_major_and_reshape/v2' of https://github.com/raphaelsc/scylla: compaction: LCS: avoid needless work post major compaction completion compaction: LCS: avoid needless work post reshape completion compaction: LCS: extract calculation of ideal level for input compaction: LCS: Fix off-by-one in formula used to calculate ideal level	2022-05-02 10:16:09 +03:00
Avi Kivity	5169ce40ef	Merge 'loading_cache: force minimum size of unprivileged ' from Piotr Grabowski This series enforces a minimum size of the unprivileged section when performing `shrink()` operation. When the cache is shrunk, we still drop entries first from unprivileged section (as before this commit), however, if this section is already small (smaller than `max_size / 2`), we will drop entries from the privileged section. This is necessary, as before this change the unprivileged section could be starved. For example if the cache could store at most 50 entries and there are 49 entries in privileged section, after adding 5 entries (that would go to unprivileged section) 4 of them would get evicted and only the 5th one would stay. This caused problems with BATCH statements where all prepared statements in the batch have to stay in cache at the same time for the batch to correctly execute. To correctly check if the unprivileged section might get too small after dropping an entry, `_current_size` variable, which tracked the overall size of cache, is changed to two variables: `_unprivileged_section_size` and `_privileged_section_size`, tracking section sizes separately. New tests are added to check this new behavior and bookkeeping of the section sizes. A test is added, that sets up a CQL environment with a very small prepared statement cache, reproduces issue in #10440 and stresses the cache. Fixes #10440. Closes #10456 * github.com:scylladb/scylla: loading_cache_test: test prepared stmts cache loading_cache: force minimum size of unprivileged loading_cache: extract dropping entries to lambdas loading_cache: separately track size of sections loading_cache: fix typo in 'privileged'	2022-05-01 19:36:35 +03:00
Eliran Sinvani	e0c7178e75	query_processor: remove default internal query caching behavior When executing internal queries, it is important that the developer decides whether or not to cache the query internally, since internal queries are cached indefinitely. It is also important that the programmer is aware of whether caching is going to happen or not. The code contained two "groups" of `query_processor::execute_internal`, one group has caching by default and the other doesn't. Here we add overloads to eliminate default values for caching behaviour, forcing an explicit parameter for the caching values. All the call sites were changed to reflect the original caching default that was there. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2022-05-01 08:33:55 +03:00
Avi Kivity	7f1e368e92	Merge 'replica/database: drop_column_family(): properly cleanup stale querier cache entries' from Botond Dénes Said method has to evict all querier cache entries, belonging to the to-be-dropped table. This is already the case, but there was a window where new entries could sneak in, causing a stale reference to the table to be de-referenced later when they are evicted due to TTL. This window is now closed, the entries are evicted after the method has waited for all ongoing operations on said table to stop. Fixes: #10450 Closes #10451 * github.com:scylladb/scylla: replica/database: drop_column_family(): drop querier cache entries after waiting for ops replica/database: finish coroutinizing drop_column_family() replica/database: make remove(const column_family&) private	2022-04-29 22:06:51 +03:00
Tomasz Grabiec	dbef83af71	Merge 'raft: fix startup hangs' from Kamil Braun Fix hangs on Scylla node startup with Raft enabled that were caused by: - a deadlock when enabling the USES_RAFT feature, - a non-voter server forgetting who the leader is and not being able to forward a `modify_config` entry to become a voter. Read the commit messages for details. Fixes: #10379 Refs: #10355 Closes #10380 * github.com:scylladb/scylla: raft: actively search for a leader if it is not known for a tick duration raft: server: return immediately from `wait_for_leader` if leader is known service: raft: don't support/advertise USES_RAFT feature	2022-04-29 19:47:10 +02:00
Piotr Grabowski	6537dc6126	loading_cache_test: test prepared stmts cache Add a new test that sets up a CQL environment with a very small prepared statements cache. The test reproduces a scenario described in #10440, where a privileged section of prepared statement cache gets large and that could possibly starve the unprivileged section, making it impossible to execute BATCH statements. Additionally, at the end of the test, prepared statements/"simulated batches" with prepared statements are executed a random number of times, stressing the cache. To create a CQL environment with small prepared cache, cql_test_config is extended to allow setting custom memory_config value.	2022-04-29 19:22:55 +02:00
Piotr Grabowski	3f2224a47f	loading_cache: force minimum size of unprivileged This patch enforces a minimum size of unprivileged section when performing shrink() operation. When the cache is shrunk, we still drop entries first from unprivileged section (as before this commit), however if this section is already small (smaller than max_size / 2), we will drop entries from the privileged section. For example if the cache could store at most 50 entries and there are 49 entries in privileged section, after adding 5 entries (that would go to unprivileged section) 4 of them would get evicted and only the 5th one would stay. This caused problems with BATCH statements where all prepared statements in the batch have to stay in cache at the same time for the batch to correctly execute. New tests are added to check this behavior and bookkeeping of section sizes. Fixes #10440.	2022-04-29 19:19:04 +02:00
Raphael S. Carvalho	736c96cc6f	compaction: LCS: avoid needless work post major compaction completion That's done by picking the ideal level for the input, such that LCS won't have to either promote or demote data, because the output level is not the best candidate for having the size of the output data. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-04-28 20:19:28 -03:00
Avi Kivity	aee94b7176	Merge "Convert remaining mutation sources to v2" from Botond " After the recent conversion of the row-cache, two v1 mutation sources remained: the memtable and the kl sstable reader. This series converts both to a native v2 implementation. The conversion is shallow: both continue to read and process the underlying (v1) data in v1, the fragments are converted to v2 right before being pushed to the reader's buffer. This conversion is simple, surgical and low-risk. It is also better than the upgrade_to_v2() used previously. Following this, the remaining v1 reader implementations are removed, with the exception of the downgrade_to_v1(), which is the only one left at this point. Removing this requires converting all mutation sinks to accept a v2 stream. upgrade_to_v2() is now not used in any production code. It is still needed to properly test downgrade_to_v1() (which is still used), so we can't remove it yet. Instead it is hidden as a private method of mutation_source. This still allows for the above mentioned testing to continue, while preventing anyone from being tempted to introduce new usage. tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/191 " * 'convert-remaining-v1-mutation-sources/v2' of https://github.com/denesb/scylla: readers: make upgrade_to_v2() private test/lib/mutation_source_test: remove upgrade_to_v2 tests readers: remove v1 forwardable reader readers: remove v1 empty_reader readers: remove v1 delegating_reader sstables/kl: make reader impl v2 native sstables/kl: return v2 reader from factory methods sstables: move mp_row_consumer_reader_k_l to kl/reader.cc partition_snapshot_reader: convert implementation to native v2 mutation_fragment_v2: range_tombstone_change: add minimal_memory_usage()	2022-04-28 20:31:23 +03:00
Nadav Har'El	ad7cc71748	Merge 'sstables: Fix deletion of partial SSTables' from Raphael "Raph" Carvalho If SSTable write fails, it will leave a partial sst which contains a temporary TOC in addition to other components partially written. temporary TOC content is written upfront, to allow us to delete all partial components using the former content if write fails. After commit `e5fc4b6`, partial sst cannot be deleted because it is incorrectly assuming all files being deleted unconditionally has TOC, but that's not true for partial files that need to be removed. The consequence of this is that space of partial files cannot be reclaimed, making it worse for Scylla to recover from ENOSPC, which could happen by selecting a set of files for compaction with higher chance of succeeding given the free space. Let's fix this by taking into account temp TOC for partial files. Fixes #10410. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #10411 * github.com:scylladb/scylla: sstables: Fix deletion of partial SSTables sstables: Fix fsync_directory() sstables: Rename dirname() to a more descriptive name	2022-04-28 16:35:46 +03:00
Botond Dénes	272da51f80	test/lib/mutation_source_test: remove upgrade_to_v2 tests We don't have any upgrade_to_v2() left in production code, so no need to keep testing it. Removing it from this test paves the way for removing it for good (not in this series).	2022-04-28 14:12:24 +03:00
Botond Dénes	7420fb9411	readers: remove v1 forwardable reader No users.	2022-04-28 14:12:24 +03:00
Botond Dénes	f527956cdb	readers: remove v1 empty_reader The only user is row level repair: it is replaced with downgrade_to_v1(make_empty_flat_reader_v2()). The row level reader has lots of downgrade_to_v1() calls, we will deal with these later all at once. Another use is the empty mutation source, this is trivially converted to use the v2 variant.	2022-04-28 14:12:24 +03:00
Botond Dénes	ea37e9c04e	readers: remove v1 delegating_reader The only user is a test, which is hereby converted to use the v2 delegating reader.	2022-04-28 14:12:24 +03:00
Botond Dénes	024ceec61e	replica/database: drop_column_family(): drop querier cache entries after waiting for ops Reads (part of operations) running concurrent to `drop_column_family()` can create querier cache entries while we wait for them to finish in `await_pending_ops()`. Move the cache entry eviction to after this, to ensure such entries are also cleaned up before destroying the table object. This moves the `_querier_cache.evict_all_for_table()` from `database::remove()` to `database::drop_column_family()`. With that the former doesn't have to return `future<>` anymore. While at it (changing the signature) also rename `column_family` -> `table`. Also add a regression unit test.	2022-04-28 13:40:13 +03:00
Benny Halevy	e88871f4ec	replica: database: move shard_of implementation to mutation layer We don't need the database to determine the shard of the mutation, only its schema. So move the implementation to the respective definitions of mutation and frozen_mutation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #10430	2022-04-27 14:40:24 +03:00
Nadav Har'El	f6ce7891a5	test/alternator: add test for key length limits DynamoDB limits partition-key length to 2048 bytes and sort-key length to 1024 bytes. Alternator currently has no such limits officially, but if a user tries a key length of over 64 KB, the result will be an "internal server error" as Alternator runs into Scylla's low-level key length limit of 64 KB. In this patch we add (mostly xfailing) tests confirming all the above observations. The tests include extensive comments on what they are testing and why. Some of these tests (specifically, the ones checking what happens above 64 KB) should pass once Alternator is fixed. Other tests - requiring that the limits be exactly what they are in DynamoDB - may either not pass or change in the future, depending on what we decide the limits should be in Alternator. Refs #10347 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #10438	2022-04-26 18:09:19 +02:00
Raphael S. Carvalho	791403e4bb	sstables: Fix deletion of partial SSTables If SSTable write fails, it will leave a partial sst which contains a temporary TOC in addition to other components partially written. temporary TOC content is written upfront, to allow us to delete all partial components using the former content if write fails. After commit `e5fc4b6`, partial sst cannot be deleted because deletion procedure is incorrectly assuming all SSTs being deleted unconditionally have TOC, but partial SSTs only have TMP TOC instead. That happens because parent_path() requires all path components to exist due to its usage of fs::path::canonical. The consequence of this is that space of partial files cannot be reclaimed, making it worse for Scylla to recover from ENOSPC, which could happen by selecting a set of files for compaction with higher chance of succeeding given the free space. This is fixed by only calling parent_path() on TMP TOC, which is guaranteed to exist prior to calling fsync_directory(). Fixes #10410. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-04-26 11:00:27 -03:00
Avi Kivity	582802825a	treewide: use system-#include (angle brackets) for seastar Seastar is an external library from Scylla's point of view so we should use the angle bracket #include style. Most of the source follows this, this patch fixes a few stragglers. Also fix cases of #include which reached out to seastar's directory tree directly, via #include "seastar/include/seastar/..." to just refer to <seastar/...>. Closes #10433	2022-04-26 14:46:42 +03:00
Gleb Natapov	7f26a8eef5	raft: actively search for a leader if it is not known for a tick duration For a follower to forward requests to a leader the leader must be known. But there may be a situation where a follower does not learn about a leader for a while. This may happen when a node becomes a follower while its log is up-to-date and there are no new entries submitted to raft. In such case the leader will send nothing to the follower and the only way to learn about the current leader is to get a message from it. Until a new entry is added to the raft's log a follower that does not know who the leader is will not be able to add entries. Kind of a deadlock. Note that the problem is specific to our implementation where failure detection is done by an outside module. In vanilla raft a leader sends messages to all followers periodically, so essentially it is never idle. The patch solves this by broadcasting specially crafted append reject to all nodes in the cluster on a tick in case a leader is not known. The leader responds to this message with an empty append request which will cause the node to learn about the leader. For optimisation purposes the patch sends the broadcast only in case there is actually an operation that waits for leader to be known. Fixes #10379	2022-04-25 14:51:22 +02:00
Avi Kivity	728479a6ea	Merge 'Fix map subscript crashes when map or subscript is null' from Nadav Har'El In the filtering expression "WHERE m[?] = 2", our implementation was buggy when either the map, or the subscript, was NULL (and also when the latter was an UNSET_VALUE). Our code ended up dereferencing null objects, yielding bizarre errors when we were lucky, or crashes when we were less lucky - see examples of both in issues #10361, #10399, #10401. The existing test `test_null.py::test_map_subscript_null` reproduced all these bugs sporadically. In this series we improve the test to reproduce the separate bugs separately, and also reproduce additional problems (like the UNSET_VALUE). We then define both `m[NULL]` and `NULL[2]` to result in NULL instead of the existing undefined (and buggy, and crashing) behavior. This new definition is consistent with our usual SQL-inspired tradition that NULL "wins" in expressions - e.g., `NULL < 2` is also defined as resulting in NULL. However, this decision differs from Cassandra, where `m[NULL]` is considered an error but `NULL[2]` is allowed. We believe that making `m[NULL]` be a NULL instead of an error is more consistent, and moreover - necessary if we ever want to support more complicated expressions like `m[a]`, where the column `a` can be NULL for some rows and non-NULL for others, and it doesn't make sense to return an "invalid query" error in the middle of the scan. Fixes #10361 Fixes #10399 Fixes #10401 Closes #10420 * github.com:scylladb/scylla: expressions: don't dereference invalid map subscript in filter expressions: fix invalid dereference in map subscript evaluation test/cql-pytest: improve tests for map subscripts and nulls	2022-04-24 21:16:10 +03:00
Nadav Har'El	fbb2a41246	expressions: don't dereference invalid map subscript in filter If we have the filter expression "WHERE m[?] = 2", the existing code simply assumed that the subscript is an object of the right type. However, while it should indeed be the right type (we already have code that verifies that), there are two more options: It can also be a NULL, or an UNSET_VALUE. Either of these cases causes the existing code to dereference a non-object as an object, leading to bizarre errors (as in issue #10361) or even crashes (as in issue #10399). Cassandra returns an invalid request error in these cases: "Unsupported unset map key for column m" or "Unsupported null map key for column m". We decided to do things differently: * For NULL, we consider m[NULL] to result in NULL - instead of an error. This behavior is more consistent with other expressions that contain null - for example NULL[2] and NULL<2 both result in NULL as well. Moreover, if in the future we allow more complex expressions, such as m[a] (where a is a column), we can find the subscript to be null for some rows and non-null for other rows - and throwing an "invalid query" in the middle of the filtering doesn't make sense. * For UNSET_VALUE, we do consider this an error like Cassandra, and use the same error message as Cassandra. However, the current implementation checks for this error only when the expression is evaluated - not before. It means that if the scan is empty before the filtering, the error will not be reported and we'll silently return an empty result set. We currently consider this ok, but we can also change this in the future by binding the expression only once (today we do it on every evaluation) and validating it once after this binding. Fixes #10361 Fixes #10399 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-04-24 16:05:34 +03:00
Nadav Har'El	808a93d29b	expressions: fix invalid dereference in map subscript evaluation When we have a filter such as "WHERE m[2] = 3" (where m is a map column), if a row had a null value for m, our expression evaluation code incorrectly dereferenced an unset optional, and continued processing the result of this dereference which resulted in undefined behavior - sometimes we were lucky enough to get "marshaling error" but other times Scylla crashed. The fix is trivial - just check before dereferencing the optional value of the map. We return null in that case, which means that we consider the result of null[2] to be null. I think this is a reasonable approach and fits our overall approach of making null dominate expressions (e.g., the value of "null < 2" is also null). The test test_filtering.py::test_filtering_null_map_with_subscript, which used to frequently fail with marshaling errors or crashes, now passes every time so its "xfail" mark is removed. Fixes #10417 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-04-24 14:58:56 +03:00
Nadav Har'El	189b8845fe	test/cql-pytest: improve tests for map subscripts and nulls The test test_null.py::test_map_subscript_null turned out to reproduce multiple bugs related to using map subscripts in filtering expressions. One was issue #10361 (m[null] resulted in a bizarre error) or #10399 (m[null] resulted in a crash), and a different issue was #10401 (m[2] resulted in a bizarre error or a crash if m itself was null). Moreover, the same test uncovered different bugs depending how it was run - alone or with other tests - because it was using a shared table. In this patch we introduce two separate tests in test_filtering.py which are designed to reproduce these separate bugs instead of mixing them into one test. The new tests also cover a few more corners which the previous test (which focused on nulls) missed - such as UNSET_VALUE. The two new tests (and the old test_map_subscript_null) pass on Cassandra so still assume that the Cassandra behavior - that m[null] should be an error - is the correct behavior. We may want to change the desired behavior (e.g., to decide that m[null] be null, not an error), and change the tests accordingly later - but for now the tests follow Cassandra's behavior exactly, and pass on Cassandra and fail on Scylla (so are marked xfail). The bugs reproduced by these tests involve randomness or reading uninitialized memory, so these tests sometimes pass, sometimes fail, and sometimes even crash (as reported in #10399 and #10401). So to reproduce these bugs run the tests multiple times. For example: test/cql-pytest/run --count 100 --runxfail test_filtering.py::test_filtering_null_map_with_subscript Refs #10361 Refs #10399 Refs #10401 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-04-24 13:26:26 +03:00
Avi Kivity	8624718983	Merge "row_cache: update reader implementations to v2" from Botond " cache_flat_mutation_reader gets a native v2 implementation. The underlying mutation representation is not changed: range deletions are still stored as v1 range_tombstones in mutation_partition. These are converted to range tombstone changes during reading. This allows for separating the change of a native v2 reader implementation and a native v2 in-memory storage format, enabling the two to be done at separate times and incrementally. This means there is still conversion ongoing when reading from cache and when populating, but when reading from underlying, the stream can now be passed through as-is without conversions. Also, any future v2 related changes to the in-memory storage will now be limited to the cache reader implementation itself. In the process, the non-forwarding reader, whose only user is the cache, is also converted to v2. " Performance results reported by Botond: " build/release/test/perf/perf_simple_query -c1 -m2G --flush -- duration=20 BEFORE median 130421.76 tps ( 71.1 allocs/op, 12.1 tasks/op, 47462 insns/op) median absolute deviation: 319.64 maximum: 131028.33 minimum: 127502.55 AFTER median 133297.41 tps ( 64.1 allocs/op, 12.2 tasks/op, 45406 insns/op) median absolute deviation: 2964.24 maximum: 137581.56 minimum: 123739.4 Getting rid of those upgrade/downgrade was good for allocs and ops. Curiously there is a 0.1 rise in number of tasks though. " * 'row-cache-readers-v2/v1' of https://github.com/denesb/scylla: row_cache: update reader implementations to v2 range_tombstone_change_generator: flush(): add end_of_range readers/nonforwardable: convert to v2 read_context: fix indentation read_context: coroutinize move_to_next_partition() row_cache: cache_entry::read(): return v2 reader row_cache: return v2 readers from make_reader*() readers/delegating_v2: s/make_delegating_reader_v2/make_delegating_reader/	2022-04-23 19:10:43 +03:00
Botond Dénes	5e97fb9fc4	row_cache: update reader implementations to v2 cache_flat_mutation_reader gets a native v2 implementation. The underlying mutation representation is not changed: range deletions are still stored as v1 range_tombstones in mutation_partition. These are converted to range tombstone changes during reading. This allows for separating the change of a native v2 reader implementation and a native v2 in-memory storage format, enabling the two to be done at separate times and incrementally.	2022-04-21 14:57:04 +03:00
Botond Dénes	b061acb668	Merge 'Remove queue reader v1' from Mikołaj Sielużycki The patchset embeds the mutation_fragment upgrading logic from v1 to v2 into the mutation_fragment_queue. This way the mutation fragments coming to the mutation_fragment_queue can be v1, but the underlying queue_reader receives mutation_fragment_v2, eliminating the last usage of queue_reader (v1). The last commit removes queue_reader, queue_reader_handle and associated factory functions. tests: unit(dev), dtest(incremental_repair_test, read_repair_test, repair_additional_test, repair_test) Closes #10371 * github.com:scylladb/scylla: readers: Remove queue_reader v1 and associated code. repair: Make mutation_fragment_queue internally upgrade fragments to v2 repair: Make mutation_fragment_queue::impl a seastar::shared_ptr	2022-04-21 12:34:48 +03:00
Mikołaj Sielużycki	339b60e5b0	repair: Make mutation_fragment_queue internally upgrade fragments to v2	2022-04-20 17:55:58 +02:00
Mikołaj Sielużycki	eeb2b458de	repair: Make mutation_fragment_queue::impl a seastar::shared_ptr It makes mutation_fragment_queue copyable and makes the pointer to pending mutation fragments in next commit stable. This allows moving the mutation_fragment_queue without breaking the underlying upgrading_consumer.	2022-04-20 17:51:58 +02:00
Botond Dénes	0b035c9099	row_cache: return v2 readers from make_reader*() And adjust callers. The factory functions just sprinkle upgrade_to_v2() on returned readers for now. One test in row_cache_test.cc had to be disabled, because the upgrade to v2 wrapper we now have over cache readers doesn't allow it to directly control the reader's buffer size and so the test fails. There is a FIXME left in the test code and the test will be re-enabled once a native v2 reader implementation allows us to get rid of the upgrade wrapper.	2022-04-20 10:59:09 +03:00
Botond Dénes	c3c71b3aa5	readers/delegating_v2: s/make_delegating_reader_v2/make_delegating_reader/ The argument type (v1 or v2 reader) is enough to disambiguate and overloading the v1 method makes a transition to v2 more seamless.	2022-04-20 10:59:09 +03:00
Nadav Har'El	cc40685c28	test/cql-pytest: add test for filtering with IN restriction It turns out that Cassandra does not allow IN restrictions together with filtering, except, curiously, when the restriction is on a clustering key. There is no real reason for this limitation - the error message even says it is not yet supported. Scylla, on the other hand, does support this case. Of course it's not enough that we support it - we need to support it correctly... But we don't have a full regression test that this support is correct - in filtering_test.cc we test it with clustering and regular columns - but not partition key columns. So this patch adds a simple cql-pytest test that this sort of filtering works in Scylla correctly for partition, clustering and regular columns (and also confirms that these cases don't work, yet, on Cassandra). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220420075553.1008062-1-nyh@scylladb.com>	2022-04-20 09:56:22 +02:00
Botond Dénes	75786c42cb	Merge 'Add repair unit tests/v1' from Mikołaj Sielużycki This patch series splits up parts of repair pipeline to allow unit testing various bits of code without having to run full dtest suite. The reason why repair pipeline has no unit tests is that by definition repair requires multiple nodes, while unit test environment works only for a single node. However, it is possible to explicitly define interfaces between various parts of the pipeline, inject dependencies and test them individually. This patch series is focused on taking repair_rows_on_wire (frozen mutation representation of changes coming from another node) and flushing them to an sstable. The commits are split into the following parts: - pulling out classes to separate headers so that they can be included (potentially indirectly) from the test, - pulling out repair_meta::to_repair_rows_list and part of repair_meta::flush_rows_in_working_row_buf so that they can be tested, - refactoring repair_writer so that the actual writing logic can be injected as dependency, - creating the unit test. tests: unit(dev), dtest(incremental_repair_test, read_repair_test, repair_additional_test, repair_test) Closes #10345 * github.com:scylladb/scylla: repair: Add unit test for flushing repair_rows_on_wire to disk. repair: Extract mutation_fragment_queue and repair_writer::impl interfaces. repair: Make parts of repair_writer interface private. repair: Rename inputs to flush_rows. repair: Make repair_meta::flush_rows a free function. repair: Split flush_rows_in_working_row_buf to two functions and make one static. repair: Rename inputs to to_repair_rows_list. repair: Make to_repair_rows_list a free function. repair: Make repair_meta::to_repair_rows_list a static function repair: Fix indentation in repair_writer. repair: Move repair_writer to separate header. repair: Move repair_row to a separate header. repair: Move repair_sync_boundary to a separate header. repair: Move decorated_key_with_hash to separate header. repair: Move row_repair hashing logic to separate class and file.	2022-04-14 18:17:03 +03:00
Botond Dénes	737cc798ca	Merge "Add flat_mutation_reader_from_mutation_v2" from Benny Halevy " Optimize consuming from a single partition. This gives us significant improvement with single, small mutations, as shown with perf_mutation_readers, compared to the vector-based flat_mutation_reader_from_mutations_v2. These are expected to be common on the write path, and can be optimized for view building. results from: perf_mutation_readers -c1 --random-seed=840478750 (userspace cpu-frequency governor, 2.2GHz) test iterations median mad min max Before: combined.one_row 720118 825.668ns 1.020ns 824.648ns 827.750ns After: combined.one_mutation 881482 751.157ns 0.397ns 750.211ns 751.912ns combined.one_row 843270 756.553ns 0.303ns 755.889ns 757.911ns The grand plan is to follow up with make_flat_mutation_reader_from_frozen_mutation_v2 so that we can read directly from either a mutation or frozen_mutation without having to unfreeze it e.g. in table::push_view_replica_updates. Test: unit(dev) Perf: perf_mutation_readers(release) " * tag 'flat_mutation_reader_from_mutation-v3' of https://github.com/bhalevy/scylla: perf: perf_mutation_readers: add one_mutation case test: mutation_query_test: make make_source static mutation readers: refactor make_flat_mutation_reader_from_mutation*_v2 mutation readers: add make_flat_mutation_reader_from_mutation_v2 readers: delete slice_mutation.hh test: flat_mutation_reader_test: mock_consumer: add debug logging test: flat_mutation_reader_test: mock_consumer: make depth counter signed	2022-04-14 17:23:21 +03:00
Botond Dénes	fa75d58cf0	Merge "Make snitch start/stop code look classical" from Pavel Emelyanov " There's a generic way to start-stop services in scylla, that includes 5 "actions" (some are optional and/or implicit though) service_config cfg = ... sharded<service>.start(cfg) service.invoke_on_all(&service::start) service.invoke_on_all(&service::shutdown) service.invoke_on_all(&service::stop) sharded<service>.stop() and most of the services out there conform to that scheme. Not snitch (spoiler: and not tracing), for which there's a couple of helpers that do all that magic behind the scenes, "configuring" snitch is done with the help of overloaded constructors. The latter is extra complicated with the need to register snitch drivers in class-registry for each constructor overload. Also there's an external shards synchronization on stop. This set brings snitch start/stop code to the described standard: the create/stop helpers are removed, creation accepts the config structure, per-shard start/stop (snitch has no drain for now) happens in the simple invoke-on-all manner. The intended side effect of this change is the ability to add explicit dependencies to snitch (in the future, not in this set). tests: unit(dev) " * 'br-snitch-config' of https://github.com/xemul/scylla: snitch: Remove create_snitch/stop_snitch snitch: Simplify stop (and pause_io) snitch: Move io_is_stopped to property-file driver snitch: Remove init_snitch_obj() snitch: Move instance creation into snitch_ptr constructor snitch: Make config-based construction of all drivers snitch: Declare snitch_ptr peering and rework container() method snitch: Introduce container() method	2022-04-14 16:56:32 +03:00
Benny Halevy	f5ef687acd	perf: perf_mutation_readers: add one_mutation case Measure performance of the single-mutation reader: make_flat_mutation_reader_from_mutation_v2. Comparable to the `one_row` case that consumes the single mutation using the multi-mutation reader: make_flat_mutation_reader_from_mutations_v2 perf_mutation_readers shows ~20-30% improvement of make_flat_mutation_reader_from_mutation_v2 over the same single mutation, just given as a single-item vector to make_flat_mutation_reader_from_mutations_v2. test iterations median mad min max Before: combined.one_row 720118 825.668ns 1.020ns 824.648ns 827.750ns After: combined.one_mutation 881482 751.157ns 0.397ns 750.211ns 751.912ns combined.one_row 843270 756.553ns 0.303ns 755.889ns 757.911ns Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-04-14 11:39:05 +03:00
Benny Halevy	a4b69fe7b6	test: mutation_query_test: make make_source static No need for it to be public. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-04-14 11:15:19 +03:00
Benny Halevy	e85241d5b6	mutation readers: add make_flat_mutation_reader_from_mutation_v2 Optimize reading from a single partition. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-04-14 11:14:43 +03:00


3039 Commits