scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 20:16:43 +00:00

Author	SHA1	Message	Date
Avi Kivity	77a2b4b520	test: perf: perf_simple_query: add instructions_per_op to the json-result output It's in text output, but `863b49af03` forgot to add it to the machine readable results. Closes #9017	2021-07-27 20:26:19 +02:00
Pavel Emelyanov	b3c89787be	mutation_partition: Return immutable collection for range tombstones Patch the .row_tombstones() to return the range_tombstone_list wrapped into the immutable_collection<> so that callers are guaranteed not to touch the collection itself, but still can modify the tombstones. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-27 20:06:53 +03:00
Pavel Emelyanov	1bf643d4fd	mutation_partition: Pin mutable access to range tombstones Some callers of mutation_partition::row_tombstones() don't want to (and shouldn't) modify the list itself, while they may want to modify the tombstones. This patch explicitly locates those that need to modify the collection, because the next patch will return an immutable collection for the others. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-27 20:06:53 +03:00
Pavel Emelyanov	05b8cdfd24	mutation_partition: Return immutable collection for rows Patch the .clustered_rows() method to return the btree of rows wrapped into the immutable_collection<> so that callers are guaranteed not to touch the collection itself, but still can modify the elements in it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-27 20:06:53 +03:00
Pavel Emelyanov	e652b03b4e	btree tests: Don't use iterator erase Next patches will mark btree::iterator methods that modify the tree itself as private, so stop using them in tests. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-27 20:06:53 +03:00
Avi Kivity	f86e65b4e7	Merge "Fix quadratic behavior in memtable/row_cache with lots of range tombstones" from Tomasz " This series fixes two issues which cause very poor efficiency of reads when there are a lot of range tombstones per live row in a partition. The first issue is in the row_cache reader. Before the patch, all range tombstones up to the next row were copied into a vector, and then put into the buffer until it's full. This would get quadratic if there are many more range tombstones than fit in a buffer. The fix is to avoid the accumulation of all tombstones in the vector and invoke the callback instead, which stops the iteration as soon as the buffer is full. Fixes #2581. The second, similar issue was in the memtable reader. Tests: - unit (dev) - perf_row_cache_update (release) " * tag 'no-quadratic-rt-in-reads-v1' of github.com:tgrabiec/scylla: test: perf_row_cache_update: Uncomment test case for lots of range tombstones row_cache: Consume range tombstones incrementally partition_snapshot_reader: Avoid quadratic behavior with lots of range tombstones tests: mvcc: Relax monotonicity check range_tombstone_stream: Introduce peek_next()	2021-07-27 14:39:13 +03:00
Avi Kivity	2cca461652	Merge 'sstables: merge row consumer interfaces with implementations' from Wojciech Mitros This patch follows #9002, further reducing the complexity of the sstable readers. The split between row consumer interfaces and implementations was first added in 2015, and there is no reason to create new implementations anymore. By merging those classes, we achieve a sizeable reduction in sstable reader length and complexity. Refs #7952 Tests: unit(dev) Closes #9073 * github.com:scylladb/scylla: sstables: merge row_consumer into mp_row_consumer_k_l sstables: move kl row_consumer sstables: merge consumer_m into mp_row_consumer_m sstables: move mp_row_consumer_m	2021-07-27 12:23:29 +03:00
Nadav Har'El	8030461a2c	cql-pytest: translate Cassandra's misc. type tests This is a translation of Cassandra's CQL unit test source file validation/entities/TypeTest.java into our cql-pytest framework. This is a tiny test file, with only four tests which apparently didn't find their place in other source files. All four tests pass on Cassandra, and all but one pass on Scylla - the test marked xfail discovered one previously-unknown incompatibility with Cassandra: Refs #9082: DROP TYPE IF EXISTS shouldn't fail on non-existent keyspace Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210726140934.1479443-1-nyh@scylladb.com>	2021-07-27 08:28:16 +03:00
Tomasz Grabiec	7578cef0a4	test: perf_row_cache_update: Uncomment test case for lots of range tombstones	2021-07-26 21:38:00 +02:00
Tomasz Grabiec	0d7b3f9463	tests: mvcc: Relax monotonicity check Consecutive range tombstones can have the same position. They will, in one of the test cases, after the range tombstone merger in partition_snapshot_flat_reader no longer uses range_tombstone_list to merge data from multiple versions, which deoverlaps, but rather merges the streams corresponding to each version, which interleaves range tombstones from different versions.	2021-07-26 17:27:03 +02:00
Nadav Har'El	b503ec36c2	cql-pytest: translate Cassandra's tests for tuples This is a translation of Cassandra's CQL unit test source file validation/entities/TupleTypeTest.java into our cql-pytest framework. This test file has a few tests on various features of tuples. Unfortunately, some of the tests could not be easily translated into Python so were left commented out: Some tests try to send invalid input to the server which the Python driver "helpfully" forbids; Two tests used an external testing library "QuickTheories" and are the only two tests in the Cassandra test suite to use this library - so it's not worthwhile to translate it to Python. 11 tests remain, all of them pass on Cassandra, and just one fails on Scylla (so marked xfail for now), reproducing one known issue: Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax Actually, += is not supposed to be supported on tuple columns anyway, but should print the appropriate error - not the syntax error we get now as the "+=" feature is not supported at all. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210722201900.1442391-1-nyh@scylladb.com>	2021-07-26 08:20:12 +03:00
Nadav Har'El	ec5e4c338b	cql: fix undefined behavior in timestamp verification Commit `2150c0f7a2` proposed by issue #5619 added a limitation that USING TIMESTAMP cannot be more than 3 days into the future. But the actual code used to check it, timestamp - now > MAX_DIFFERENCE only makes sense for positive timestamps. For negative timestamps, which are allowed in Cassandra, the difference "timestamp - now" might overflow the signed integer and the result is undefined - leading the undefined-behavior sanitizer to complain as reported in issue #8895. Beyond the sanitizer, in practice, on my test setup, the timestamp -2^63+1 causes such overflow, which causes the above if() to make the nonsensical statement that the timestamp is more than 3 days into the future. This patch assumes that negative timestamps of any magnitude are still allowed (as they are in Cassandra), and fixes the above if() to only check timestamps which are in the future (timestamp > now). We also add a cql-pytest test for negative timestamps, passing on both Cassandra and Scylla (after this patch - it failed before, and also reported sanitizer errors in the debug build). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210621141255.309485-1-nyh@scylladb.com>	2021-07-24 11:01:08 +03:00
Tomasz Grabiec	b044db863f	Merge 'db/virtual_table: Streaming tables for large data + describe_ring example table' from Juliusz Stasiewicz This is the 2nd PR in series with the goal to finish the hackathon project authored by @tgrabiec, @kostja, @amnonh and @mmatczuk (improved virtual tables + function call syntax in CQL). This one introduces a new implementation of the virtual tables, the streaming tables, which are suitable for large amounts of data. This PR was created by @jul-stas and @StarostaGit Closes #8961 * github.com:scylladb/scylla: test/boost: run_mutation_source_tests on streaming virtual table system_keyspace: Introduce describe_ring table as virtual_table storage_service: Pass the reference down to system_keyspace endpoint_details: store `_host` as `gms::inet_address` queue_reader: implement next_partition() virtual_tables: Introduce streaming_virtual_table flat_mutation_reader: Add a new filtering reader factory method	2021-07-23 18:05:51 +02:00
Avi Kivity	aaf35b5ac2	Merge "Remove storage-service from transport (and a bit more)" from Pavel E " The cql-server -> storage-service dependency comes from the server's event_notifier which (un)subscribes on the lifecycle events that come from the storage service. To break this link the same trick as with migration manager notifications is used -- the notification engine is split out of the storage service and then is pushed directly into both -- the listeners (to (un)subscribe) and the storage service (to notify). tests: unit(dev), dtest(simple_boot_shutdown, dev) manual({ start/stop, with/without started transport, nodetool enable-/disablebinary } in various combinations, dev) " * 'br-remove-storage-service-from-transport' of https://github.com/xemul/scylla: transport.controller: Brushup cql_server declarations code: Remove storage-service header from irrelevant places storage_service: Remove (unlifecycle) subscribe methods transport: Use local notifier to (un)subscribe server transport: Keep lifecycle notifier sharded reference main: Use local lifecycle notifier to (un)subscribe listeners main, tests: Push notifier through storage service storage_service: Move notification core into dedicated class storage_service: Split lifecycle notification code transport, generic_server: Remove no longer used functionality transport: (Un)Subscribe cql_server::event_notifier from controller tests: Remove storage service from manual gossiper test	2021-07-22 19:27:45 +03:00
Pavel Emelyanov	c39f04fa6f	code: Remove storage-service header from irrelevant places Some .cc files across the code include the storage service header for no real need. Drop the header and include (in some) what's really needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:50:19 +03:00
Pavel Emelyanov	8248bc9e33	main, tests: Push notifier through storage service Now it's time to move the lifecycle notifier from storage service to the main's scope. Next patches will remove the $lifecycle-subscriber -> storage_service dependency. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:45:51 +03:00
Pavel Emelyanov	b57fb0aa9a	tests: Remove storage service from manual gossiper test It's not needed there, gossiper starts and works without it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:36:28 +03:00
Piotr Sarna	526ad2a151	Merge 'secondary_index: Fix TOKEN() restrictions in indexed SELECTs' from Jan Ciołek This is a rewrite of an old PR: #7582 `TOKEN()` restrictions don't work properly when a query uses an index. For example this returns both rows: ```cql CREATE TABLE t(pk int, ck int, v int, PRIMARY KEY(pk, ck)); CREATE INDEX ON t(v); INSERT INTO t (pk, ck, v) VALUES (0, 0, 0); INSERT INTO t (pk, ck, v) VALUES (1, 0, 0); SELECT token(pk), pk, ck, v FROM t WHERE v = 0 AND token(pk) = token(0) ALLOW FILTERING; ``` This functionality is supported on both old and new indexes. In old indexes the type of the token column was `blob`. This causes problems, because `blob` representation of tokens is ordered differently. Tokens represented as blobs are ordered like this: ``` 0, 1, 2, 3, 4, 5, ..., bigint_max, bigint_min, ...., -5, -4, -3, -2, -1 ``` Because of that clustering range for `token()` restrictions needs to be translated to two clustering ranges on the `blob` column. To create old indexes disable the feature called: `CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX` or run scylla version from branch [`cvybhu/si-token2-old-index`](https://github.com/cvybhu/scylla/commits/si-token2-old-index) I'm not sure if it's possible to create automatic tests with old indexes. I ran `dev-test` manually on the `si-token2-old-index` branch, and the only tests that failed were the ones testing row ordering. Rows should be ordered by `token`, but because in old indexes the token is represented as a `blob` this ordering breaks. This is a known issue (#7443), that has been fixed by introducing new indexes. To sum up: * `token()` restrictions are fixed on both new and old indexes. * When using old indexes, the rows are not properly ordered by token. * With new indexes the rows are properly ordered by token. 
Fixes #7043 Closes #9067 * github.com:scylladb/scylla: tests: add secondary index tests with TOKEN clause secondary_index_test: extract test data secondary_index: Fix TOKEN() restrictions in indexed SELECTs expression: Add replace_token function	2021-07-22 10:22:45 +02:00
Wojciech Mitros	1ff72ca0a6	sstables: move kl row_consumer In preparation for the next patch combining row_consumer and mp_row_consumer_k_l, move row_consumer next to mp_row_consumer_k_l. Because row_consumer is going to be removed, we retire some old tests for different implementations of the row_consumer interface; as a result, we don't need to expose internal types of kl sstable reader for tests, so all classes from reader_impl.hh are moved to reader.cc, and the reader_impl.hh file is deleted, and the reader.cc file has an analogous structure to the reader.cc file in sstables/mx directory. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-07-21 18:04:22 +02:00
Piotr Grabowski	e06102aed9	tests: add secondary index tests with TOKEN clause Add tests of SELECTs with TOKEN clauses on tables with secondary indexes (both global and local). test_select_with_token_range_cases checks all possible token range combinations (inclusive/exclusive/infinity start/end) on tables without index, with local or with global index. test_select_with_token_range_filtering checks whether TOKEN restrictions combined with column restrictions work properly. As different code paths are taken if index is created on clustering key (first or non-first) or non-primary-key column, the tests check scenarios when index is created on different columns.	2021-07-21 16:12:55 +02:00
Piotr Grabowski	e2bd1cdb9d	secondary_index_test: extract test data Extract test data to separate variables, allowing it to be easily reused by other tests. The tokens are hard-coded, because calculating their value brought too much complexity to this code.	2021-07-21 16:12:55 +02:00
Raphael S. Carvalho	e4eb7df1a1	table: Make correctness of concurrent sstable list update robust Today, table relies on row_cache::invalidate() serialization for concurrent sstable list updates to produce correct results. That's very error prone because table is relying on an implementation detail of invalidate() to get things right. Instead, let's make table itself take care of serialization on concurrent updates. To achieve that, sstable_list_builder is introduced. Only one builder can be alive for a given table, so serialization is guaranteed as long as the builder is kept alive throughout the update procedure. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210721001716.210281-1-raphaelsc@scylladb.com>	2021-07-21 16:45:30 +03:00
Juliusz Stasiewicz	38b8a6ce2c	test/boost: run_mutation_source_tests on streaming virtual table Tests that require inter-partition forwarding are excluded.	2021-07-20 14:19:17 +02:00
Juliusz Stasiewicz	f8067d938d	storage_service: Pass the reference down to system_keyspace According to the policy of avoiding globals.	2021-07-20 14:18:24 +02:00
Tomasz Grabiec	50ec3ea295	lsa: Fix misaccounting of used space when allocating lsa_buffers lsa_buffer allocations are aligned to 4K. If smaller size is requested, whole 4K is used. However, only requested size was used in accounting segment occupancy. This can confuse reclaimer which may think the segment is sparse while it is actually dense, and compacting it will yield no or little gain. This can cause inefficient memory reclamation or lack of progress. Refs #9038 Message-Id: <20210720104110.463812-1-tgrabiec@scylladb.com>	2021-07-20 14:08:06 +03:00
Botond Dénes	11b39cbc23	reader_concurrency_semaphore: merge permit_stats into stats If there was any reason to have them separate when permit_stats was conceived, it is gone now, so merge the two. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210720073121.63027-1-bdenes@scylladb.com>	2021-07-20 10:35:12 +03:00
Nadav Har'El	36ec1d792e	Merge 'cql-pytest: Test selecting from indexed table using only clustering key' from Jan Ciołek Add examples from issue #8991 to tests Both of these tests pass on `cassandra 4.0` but fail on `scylla 4.4.3` First test tests that selecting values from indexed table using only clustering key returns correct values. The second test tests that performing this operation requires filtering. The filtering test looks similar to [the one for #7608](`1924e8d2b6/test/cql-pytest/test_allow_filtering.py (L124)`) but there are some differences - here the table has two clustering columns and an index, so it could test different code paths. Contains a quick fix for the `needs_filtering()` function to make these tests pass. It returns `true` for this case and the one described in #7708. This implementation is a bit conservative - it might sometimes return `true` where filtering isn't actually needed, but at least it prevents scylla from returning incorrect results. Fixes #8991. Fixes #7708. Closes #8994 * github.com:scylladb/scylla: cql3: Fix need_filtering on indexed table cql-pytest: Test selecting using only clustering key requires filtering cql-pytest: Test selecting from indexed table using clustering key	2021-07-19 18:23:08 +03:00
Tomasz Grabiec	049a1ef729	Merge 'flat_mutation_reader: downgrade_to_v1 - reset state of rt_assembler' from enedil The downgrade_to_v1 didn't reset the state of range tombstone assembler in case of the calls to next_partition or fast_forward_to, which caused a situation where the closing range tombstone change is cleared from the buffer before being emitted, without notifying the assembler. This patch fixes the behaviour in fast_forward_to as well. Fixes #9022 Closes #9023 * github.com:scylladb/scylla: flat_mutation_reader: downgrade_to_v1 - reset state of rt_assembler flat_mutation_reader: introduce public method returning the default size of internal buffer.	2021-07-19 17:10:23 +02:00
Jan Ciolek	54149242b4	cql3: Fix need_filtering on indexed table There were cases where a query on an indexed table needed filtering but need_filtering returned false. This is fixed by using new conditions in cases where we are using an index. Fixes #8991. Fixes #7708. For now this is an overly conservative implementation that returns true in some cases where filtering is not needed. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-07-19 16:22:17 +02:00
Michał Radwański	67d99e02a7	flat_mutation_reader: downgrade_to_v1 - reset state of rt_assembler The downgrade_to_v1 didn't reset the state of range tombstone assembler in case of the calls to next_partition or fast_forward_to, which caused a situation where the closing range tombstone change is cleared from the buffer before being emitted, without notifying the assembler. This patch fixes the behaviour in fast_forward_to as well. Fixes #9022	2021-07-19 15:54:26 +02:00
Nadav Har'El	4c6dc5fce2	Merge 'continuous_data_consumer: properly skip bytes at the end of a range' from Wojciech Mitros When skipping bytes at the end of a continuous_data_consumer range, the position of the consumer is moved after the skipped bytes, but the position of the underlying input_stream is not. This patch adds skipping of the underlying input_stream, to make its position consistent with the position of the consumer. Fixes #9024 Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com> Closes #9039 * github.com:scylladb/scylla: tests: add test for skipping bytes at end of consumer continuous_data_consumer: properly skip bytes at the end of a range	2021-07-19 15:57:26 +03:00
Wojciech Mitros	507bdfc36a	tests: add test for skipping bytes at end of consumer The new test confirms that the regression issue, where we didn't correctly skip bytes at the end of a continuous_data_consumer range, is fixed. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-07-19 14:42:38 +02:00
Jan Ciolek	9bd62a07c9	cql-pytest: Test selecting using only clustering key requires filtering Adds test that creates a table with primary key (p, c1, c2) with a global index on c2 and then selects where c1 = 1 and c2 = 1. This should require filtering, but doesn't. Refs #8991. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-07-19 10:24:48 +02:00
Jan Ciolek	a041767aa3	cql-pytest: Test selecting from indexed table using clustering key Adds test that creates a table with primary key (p, c1, c2) with a global index on c2 and then selects where c1 = 1 and c2 = 1. This currently fails. Refs #8991. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-07-19 10:24:46 +02:00
Avi Kivity	2cfc517874	main, test: adjust number of networking iocbs Seastar's default limit of 10,000 iocbs per shard is too low for some workloads (it places an upper bound on the number of idle connections, above which a crash occurs). Use the new Seastar feature to raise the default to 50000. Also multiply the global reservation by 5, and round it upwards so the number is less weird. This prevents io_setup() from failing. For tests, the reservation is reduced since they don't create large numbers of connections. This reduces surprise test failures when they are run on machines that haven't been adjusted. Fixes #9051 Closes #9052	2021-07-18 14:38:44 +03:00
Avi Kivity	df822e09e0	Merge "Run test cases in parallel" from Pavel E " The debug-mode tests nowadays take ~1 hour to complete on a 24-core threadripper machine. This is mostly because of a bunch of individual test cases that run sequentially (since they sit in one test) each taking half-an-hour and longer. The previous attempt was to break the longest tests into pieces, and to update the list of long-running tests in suite.yaml file, but the concern was that the linkage time and disk space would grow without limits if this continues. Also the long-running tests list needs to be revisited every so often. So the new attempt is to resurrect Avi's patch that ran test cases in parallel for boost tests. This set applies parallelism to all tests and allows to blacklist those that shouldn't (the logalloc needs the very first case to prime_segment_pools so that other cases run smoothly, thus it cannot be parallelized). Although this wild parallelism adds an overhead for _each_ test case this is good enough even for short dev-mode tests (saves 25% of runtime), but greatly relaxes the maintenance of the "parallelizable list of tests". For debug tests the problem is not 100% solved. There are 6 cases that run longer than 30min, while all the others complete much, much faster. So if excluding those slow 6 cases the full parallel run saves 50+% of the runtime -- 60+m now vs 25m with the patch. Those 6 slowest cases will need more incremental care. The --parallel-cases mode is not yet default, because it requires larger max-aio-nr value to be set, which is not (yet?) automatic. Also it sometimes hits nr-open-files limit, which also needs more work. 
tests: unit(dev), unit(debug) " * 'br-parallel-testpy-3' of https://github.com/xemul/scylla: tests: Update boost long tests list test.py: Parallelize test-cases run (for boost tests) test.py: Prepare BoostTest for running individual cases test.py: Prepare TestSuite::create_test() for parallelizm test.py: Treat shortname as composite test.py: Reformat tabluar output	2021-07-17 13:57:56 +03:00
Pavel Emelyanov	9d59f1daf3	tests: Update boost long tests list Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-16 17:25:07 +03:00
Pavel Emelyanov	cbb4837b77	test.py: Parallelize test-cases run (for boost tests) The parallelism is achieved by listing the content of each (boost) test and by adding a test for each case found appending the '--run_test={case_name}' option. Also, a few tests (logalloc and memtable) have cases that depend on each other (the former explicitly stated this in the head comment), so these are marked as "no_parallel_cases" in the suite.yaml file. In dev mode tests need 2m:5s to run by default. With parallelism (and updated long-running tests list) -- 1m 35s. In debug mode there are 6 slow _cases_ that overrun 30 minutes. They finish last and deserve some special (incremental) care. All the other tests run ~1h by default vs ~25m in parallel. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-16 17:25:07 +03:00
Tomasz Grabiec	97aa335a60	Merge "test: raft: randomized_nemesis_test: refactors and improvements" from Kamil A couple of improvements to prepare for the next patchset. We move `logical_timer` and `ticker` to their own headers due to the generality of these data structures. They are not very specific to the test. `logical_timer` is extended with a `schedule` function, allowing to schedule any given function to be called at the given time point. The interface of `network` in `randomized_nemesis_test` is extended by `add_grudge` and `remove_grudge` functions for implementing network partitioning nemeses. Furthermore `network` can be now constructed with an arbitrary network delay, which was previously hardcoded. `with_env_and_ticker` is now generic w.r.t. return values (previously `future<>` was assumed). `environment` exposes a reference to the `network` through a getter. The `not_a_leader` exception now shows the leader's ID in the exception message. Useful for logging. In `logical_timer::with_timeout`, when we timeout, we don't just return `timed_out_error`. The returned exception now actually contains the original future... well almost; in any case, the user can now do something different to the future other than simply discarding it. We also fix some `broken_promise` exceptions appearing in discarded futures in certain scenarios. See the corresponding commit for detailed explanation. We handle `raft::dropped_entry` in the `call` function. `persistence` is fixed to avoid creating gaps in the log when storing snapshots and to support complex state types. Waiting for leader was refactored into a separate function and generalized (we wait for a set of nodes to elect a leader instead of a single node to elect itself) to be useful in more situations. Finally, we introduce `reconfigure`, a higher-level version of `set_configuration` which performs error handling and supports timeouts. 
* kbr/raft-nemesis-improvements-v4: test: raft: randomized_nemesis_test: `reconfigure` function test: raft: randomized_nemesis_test: refactor waiting for leader into a separate function test: raft: randomized_nemesis_test: persistence: avoid creating gaps in the log when storing snapshots test: raft: randomized_nemesis_test: persistence: handle complex state types test: raft: randomized_nemesis_test: `call`: handle `raft::dropped_entry` test: raft: randomized_nemesis_test: impure_state_machine/call: handle dropped channels test: raft: randomized_nemesis_test: environment: expose the network test: raft: randomized_nemesis_test: configurable network delay and FD convict threshold test: raft: randomized_nemesis_test: generalize `with_env_and_ticker` test: raft: randomized_nemesis_test: network: `add_grudge`, `remove_grudge` functions test: raft: randomized_nemesis_test: move `ticker` to its own header test: raft: randomized_nemesis_test: ticker: take `logger` as a constructor parameter test: raft: logical_timer: handle immediate timeout test: raft: logical_timer: on timeout, return the original future in the exception test: raft: logical_timer: add `schedule` member function test: raft: randomized_nemesis_test: move `logical_timer` to its own header test: raft: include the leader's ID in the `not_a_leader` exception's message	2021-07-16 16:12:05 +02:00
Nadav Har'El	5183e0cbe9	Merge 'Fix artificial view update size limit' from Piotr Sarna The series which split the view update process into smaller parts accidentally put an artificial 10MB limit on the generated mutation size, which is wrong - this limit is configurable for users, and, what's more important, this data was already validated when it was inserted into the base table. Thus, the limit is lifted. The series comes with a cql-pytest which failed before the fix and succeeds now. This bug is also covered by `wide_rows_test.py:TestWideRows_with_LeveledCompactionStrategy.test_large_cell_in_materialized_view` dtest, but it needs over a minute to run, as opposed to cql-pytest's <1 second. Fixes #9047 Tests: unit(release), dtest(wide_rows_test.py:TestWideRows_with_LeveledCompactionStrategy.test_large_cell_in_materialized_view) Closes #9048 * github.com:scylladb/scylla: cql-pytest: add a materialized views suite with first cases db,view: drop the artificial limit on view update mutation size	2021-07-15 17:03:07 +03:00
Piotr Sarna	c05340c4bf	cql-pytest: add a materialized views suite with first cases cql-pytest did not have a suite for materialized views, so one is created. At the same time, test cases for building/updating a view on a base table with large cells is added as a regression test for #9047.	2021-07-15 15:40:38 +02:00
Piotr Sarna	3d816b7c16	Merge 'Move the reader concurrency semaphore in front of the cache' from Botond This patchset combines two important changes to the way reader permits are created and admitted: 1) It switches admission to be up-front. 2) It changes the admission algorithm. (1) Currently permits are created before the read is started, but they only wait for admission when going to the disk. This leaves the resource consumption of cache and memtable reads unbounded, possibly leading to OOM (rare but happens). This series changes this so that permits are admitted at the moment they are created, making admission up-front -- at least those reads that pass admission at all (some don't). (2) Admission currently is based on availability of resources. We have a certain amount of memory available, which is derived from the memory available to the shard, as well as a hardcoded count resource. Reads are admitted when a count and a certain amount (base cost) of memory is available. This patchset adds a new aspect to this admission process beyond the existing resource availability: the number of used/blocked reads. Namely it only admits new reads if in addition to the necessary amount of resources being available, all currently used readers are blocked. In other words we only admit new reads if all currently admitted reads require something other than CPU to progress. They are either waiting on I/O, a remote shard, or attention from their consumers (not used currently). The reason for making these two changes at the same time is that up-front admission means cache reads now need to obtain a permit too. For cache reads the optimal concurrency is 1. Anything above that just increases latency (without increasing throughput). So we want to make sure that if a cache reader hits it doesn't get any competition for CPU and it can run to completion. We admit new reads only if the read misses and has to go to disk. 
A side effect of these changes is that the execution stages from the replica-side read path are replaced with the reader concurrency semaphore as an execution stage. This is necessary due to bad interaction between said execution stages and up-front admission. This has an important consequence: read timeouts are more strictly enforced because the execution stage doesn't have a timeout so it can execute already timed-out reads too. This is not the case with the semaphore's queue which will drop timed-out reads. Another consequence is that data and mutation reads now share the same execution stage, which increases its effectiveness; on the other hand, system and user reads don't anymore. Fixes: #4758 Fixes: #5718 Tests: unit(dev, release, debug) * 'reader-concurrency-semaphore-in-front-of-the-cache/v5.3' of https://github.com/denesb/scylla: (54 commits) test/boost/reader_concurrency_semaphore_test: add used/blocked test test/boost/reader_concurrency_semaphore_test: add admission test reader_permit: add operator<< for reader_resources reader_concurrency_semaphore: add reads_{admitted,enqueued} stats table: make_sstable_reader(): fix indentation table: clean up make_sstable_reader() database: remove now unused query execution stages mutation_reader: remove now unused restricting_reader sstables: sstable_set: remove now unused make_restricted_range_sstable_reader() reader_permit: remove now unused wait_admission() reader_concurrency_semaphore: remove now unused obtain_permit_nowait() reader_concurrency_semaphore: admission: flip the switch database: increase semaphore max queue size test: index_with_paging_test: increase semaphore's queue size reader_concurrency_semaphore: add set_max_queue_size() test: mutation_reader_test: remove restricted reader tests reader_concurrency_semaphore: remove now unused make_permit() test: reader_concurrency_semaphore_test: move away from make_permit() test: move away from make_permit() treewide: use make_tracking_only_permit() ...	2021-07-14 16:22:56 +02:00
Botond Dénes	e2dfb2df71	test/boost/reader_concurrency_semaphore_test: add used/blocked test Make sure that releasing a bunch of used/blocked guards in random order doesn't break the permit state.	2021-07-14 17:19:02 +03:00
Botond Dénes	0337d3ea4a	test/boost/reader_concurrency_semaphore_test: add admission test Checking every conceivable admission scenario (hopefully).	2021-07-14 17:19:02 +03:00
Botond Dénes	b81f39cec9	reader_permit: add operator<< for reader_resources And use it in tests; it results in actually useful error messages.	2021-07-14 17:19:02 +03:00
Botond Dénes	1b7eea0f52	reader_concurrency_semaphore: admission: flip the switch This patch flips two "switches": 1) It switches admission to be up-front. 2) It changes the admission algorithm. (1) by now all permits are obtained up-front, so this patch just yanks out the restricted reader from all reader stacks and simultaneously switches all `obtain_permit_nowait()` calls to `obtain_permit()`. By doing this, admission is now waited on when creating the permit. (2) we switch to an admission algorithm that adds a new aspect to the existing resource availability: the number of used/blocked reads. Namely, it only admits new reads if, in addition to the necessary amount of resources being available, all currently used readers are blocked. In other words, we only admit new reads if all currently admitted reads require something other than CPU to progress. They are either waiting on I/O, a remote shard, or attention from their consumers (not used currently). We flip these two switches at the same time because up-front admission means cache reads now need to obtain a permit too. For cache reads the optimal concurrency is 1. Anything above that just increases latency (without increasing throughput). So we want to make sure that if a cache reader hits, it doesn't get any competition for CPU and it can run to completion. We admit new reads only if the read misses and has to go to disk. Another change made to accommodate this switch is the replacement of the replica-side read execution stages with the reader concurrency semaphore as an execution stage. This replacement is needed because with the introduction of up-front admission, reads are no longer independent of each other. One executed read can influence whether later reads will be admitted or not, and execution stages require independent operations to work well. 
By moving the execution stage into the semaphore, we have an execution stage which is in control of both admission and running the operations in batches, avoiding the bad interaction between the two.	2021-07-14 17:19:02 +03:00
Botond Dénes	dcf49dcb67	test: index_with_paging_test: increase semaphore's queue size To allow the flood of reads generated by this test to be queued up during up-front admission without failing the test.	2021-07-14 17:19:02 +03:00
Botond Dénes	388da36bbb	test: mutation_reader_test: remove restricted reader tests Soon we will switch to up-front admission which will break these tests. No point in trying to fix them as once the switch is done we'll retire the restricted reader too. Remove these tests now so they are not in the way of progress.	2021-07-14 17:19:02 +03:00
Botond Dénes	bacfaf9582	test: reader_concurrency_semaphore_test: move away from make_permit() Migrate to the appropriate up-front admission variants.	2021-07-14 17:19:02 +03:00
Botond Dénes	c07db00b70	test: move away from make_permit() Use the most appropriate up-front admission variant.	2021-07-14 17:19:02 +03:00
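
The admission rule described in 3d816b7c16 and 1b7eea0f52 above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not ScyllaDB's implementation (which is C++): the class `AdmissionSketch`, its methods and fields, and the use of an integer logical clock for deadlines are all hypothetical names chosen for illustration. It shows the two conditions from the commit messages (a count unit and the base memory cost are available, and every used read is blocked) plus a queue that drops timed-out reads.

```python
from collections import deque


class AdmissionSketch:
    """Toy model of the admission rule; all names here are hypothetical."""

    def __init__(self, count, memory, base_cost):
        self.count = count          # free units of the "count" resource
        self.memory = memory        # free memory
        self.base_cost = base_cost  # memory charged to each admitted read
        self.used = 0               # admitted reads that are active
        self.blocked = 0            # active reads waiting on something other than CPU
        self.waiters = deque()      # queued (read_id, deadline) pairs

    def can_admit(self):
        # Condition 1: resource availability (count and base memory cost).
        has_resources = self.count >= 1 and self.memory >= self.base_cost
        # Condition 2: all currently used reads are blocked.
        return has_resources and self.used == self.blocked

    def _admit(self):
        self.count -= 1
        self.memory -= self.base_cost
        self.used += 1

    def obtain(self, read_id, deadline):
        """Admit right away if allowed, otherwise queue. Returns True if admitted."""
        if not self.waiters and self.can_admit():
            self._admit()
            return True
        self.waiters.append((read_id, deadline))
        return False

    def mark_blocked(self):
        # An admitted read now waits on e.g. I/O or a remote shard.
        self.blocked += 1

    def mark_unblocked(self):
        self.blocked -= 1

    def release(self):
        # An admitted, unblocked read finished and returns its resources.
        self.used -= 1
        self.count += 1
        self.memory += self.base_cost

    def drain(self, now):
        """Walk the queue front to back: drop timed-out reads, admit while allowed."""
        admitted, dropped = [], []
        while self.waiters:
            read_id, deadline = self.waiters[0]
            if deadline <= now:
                self.waiters.popleft()
                dropped.append(read_id)
            elif self.can_admit():
                self.waiters.popleft()
                self._admit()
                admitted.append(read_id)
            else:
                break
        return admitted, dropped


sem = AdmissionSketch(count=10, memory=1000, base_cost=100)
sem.obtain("r1", deadline=5)   # nothing is running, so r1 is admitted
sem.obtain("r2", deadline=5)   # r1 is using the CPU, so r2 is queued
sem.obtain("r3", deadline=1)   # queued behind r2
sem.mark_blocked()             # r1 missed the cache and waits on I/O
print(sem.drain(now=2))        # (['r2'], ['r3'])
```

In this model a read that never blocks (a cache hit) keeps `used` above `blocked` until it is released, so nothing else is admitted while it runs, which matches the stated goal of a concurrency of 1 for cache reads.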

2040 Commits