Commit Graph

1860 Commits

Gleb Natapov
09528b8671 raft: test: test leadership transfer timeout
Test that if leadership transfer cannot be done in the configured time
frame, the fsm cancels the leadership transfer process. Also check that
the timeout_now message is resent on each tick while leadership transfer
is in progress.
2021-06-22 14:42:50 +03:00
Nadav Har'El
a9b383f423 cql-pytest: improve test for SSL/TLS versions
The existing test_ssl.py, which tests Scylla's support of various TLS
and SSL versions, used a deprecated and misleading Python API for
choosing the protocol version. In particular, the protocol version
ssl.PROTOCOL_SSLv23 is *not*, despite its name, SSL versions 2 or 3,
or SSL at all - it is in fact an alias for the latest TLS version :-(
This misunderstanding led us to open the incorrect issue #8837.

So in this patch, we avoid the old Python APIs for choosing protocols,
which were gradually deprecated, and switch to the new API introduced
in Python 3.7 and OpenSSL 1.1.0g - supplying the minimum and maximum
desired protocol version.

With this new API, we can correctly connect with various versions of
the SSL and TLS protocol - from SSLv3 through TLSv1.3. With the
fixed test, we confirm that Scylla does *not* allow SSLv3 - as desired -
so issue #8837 is a non-issue.

Moreover, now that issue #8827 has been fixed, this test passes,
so the "xfail" mark is removed.

Refs #8837.
Refs #8827.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210617134305.173034-1-nyh@scylladb.com>
2021-06-17 17:06:31 +03:00
Tomasz Grabiec
6d8440fe70 Merge "raft: (testing) leadership transfer tests" from Pavel Solodovnikov
The patch set introduces a few leadership transfer tests, some of which
are adaptations of corresponding etcd tests (e.g.
`test_leader_transfer_ignore_proposal` and `test_transfer_non_member`).

Others test different scenarios, ensuring that a pending leadership
transfer doesn't prevent the rest of the cluster from making progress:

Lost `timeout_now` messages (`test_leader_transfer_lost_timeout_now` and
`test_leader_transferee_dies_upon_receiving_timeout_now`) as well as a
lost `vote_request(force)` from the new candidate
(`test_leader_transfer_lost_force_vote_request`) don't impact the
subsequent election process, and the leader is elected as normal.

* manmanson/leadership_transfer_tests_v3:
  raft: etcd_test: test_transfer_non_member
  raft: etcd_test: test_leader_transfer_ignore_proposal
  raft: fsm_test: test_leader_transfer_lost_force_vote_request
  raft: fsm_test: test_leader_transfer_lost_timeout_now
  raft: fsm_test: test_leader_transferee_dies_upon_receiving_timeout_now
2021-06-17 13:58:31 +02:00
Piotr Sarna
8cca68de75 cql3: add USING TIMEOUT support for deletes
Turns out the DELETE statement already supports attributes
like timestamp, so it's ridiculously easy to add USING TIMEOUT
support - it's just a matter of accepting it in the grammar.
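
For example, with this change a statement such as
`DELETE FROM ks.tbl USING TIMEOUT 50ms WHERE pk = 1;` is accepted,
mirroring the existing `USING TIMESTAMP` attribute (the keyspace,
table, column and values here are only illustrative).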

Fixes #8855

Closes #8876
2021-06-17 14:21:01 +03:00
Avi Kivity
00ff3c1366 Merge 'treewide: add support for snapshot skip-flush option' from Benny Halevy
The option is provided by nodetool snapshot
https://docs.scylladb.com/operating-scylla/nodetool-commands/snapshot/
```
nodetool [(-h <host> | --host <host>)] [(-p <port> | --port <port>)]
         [(-pp | --print-port)] [(-pw <password> | --password <password>)]
         [(-pwf <passwordFilePath> | --password-file <passwordFilePath>)]
         [(-u <username> | --username <username>)] snapshot
         [(-cf <table> | --column-family <table> | --table <table>)]
         [(-kc <kclist> | --kc.list <kclist>)]
         [(-sf | --skip-flush)] [(-t <tag> | --tag <tag>)] [--] [<keyspaces...>]

-sf / --skip-flush    Do not flush memtables before snapshotting (snapshot will not contain unflushed data)
```

But it is currently ignored by scylla-jmx (scylladb/scylla-jmx#167)
and not supported at the API level.

This patch adds support for the option in advance of that,
from the API service level down via snapshot_ctl
to the table class and snapshot implementation.

In addition, a corresponding unit test was added to verify
that taking a snapshot with `skip_flush` does not flush the memtable
(at the table::snapshot level).

Refs #8725

Closes #8726

* github.com:scylladb/scylla:
  test: database_test: add snapshot_skip_flush_works
  api: storage_service/snapshots: support skip-flush option
  snapshot: support skip_flush option
  table: snapshot: add skip_flush option
  api: storage_service/snapshots: add sf (skip_flush) option
2021-06-17 13:32:23 +03:00
Nadav Har'El
7fd7e90213 cql-pytest: translate Cassandra's tests for static columns
This is a translation of Cassandra's CQL unit test source file
validation/entities/StaticColumnsTest.java into our cql-pytest framework.

This test file checks various features of static columns. All these tests
pass on Cassandra, and all but one pass on Scylla. The xfailing test,
testStaticColumnsWithSecondaryIndex, exposes a query that Cassandra
allows but we don't. The new issue opened about that is #8869.

Refs #8869.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210616141633.114325-1-nyh@scylladb.com>
2021-06-17 11:08:28 +02:00
Tomasz Grabiec
6bdf8c4c46 Merge "raft: second series of preparatory patches for group 0 discovery" from Kostja
Miscellaneous preparatory patches for group 0 discovery.

* scylla-dev/raft-group-0-part-2-v4:
  raft: (service) servers map is gid -> server, not sid -> server
  system_keyspace: raft.group_id and raft_snapshots.group_id are TIMEUUID
  raft: (server) wait for configuration transition to complete
  raft: (server) implement raft::server::get_configuration()
  raft: (service) don't throw from schema state machine
  raft: (service) permit some scylla.raft cells to be empty
  raft: (service) properly handle failure to add a server
  raft: implement is_transient_error()
2021-06-17 00:15:40 +02:00
Konstantin Osipov
18e3fcdbf1 raft: (service) servers map is gid -> server, not sid -> server
Raft Group registry should map Raft Group Id to Raft Server,
not Raft Server ID (which is identical for all groups) to Raft server.

Raft Group 0 ID works as a cluster identifier, so it is generated when a
new cluster is created and is shared by all nodes of the same cluster.

Implement a helper to get raft::server by group id.

Consistently throw a new raft_group_not_found exception
if there is no server or rpc for the specified group id.
2021-06-16 19:05:50 +03:00
Avi Kivity
f05ddf0967 Merge "Improve LSA descriptor encoding" from Pavel
"
The LSA small objects allocation latency is greatly affected by
the way this allocator encodes the object descriptor in front of
each allocated slot.

Nowadays it's one of the VLE (variable-length encoding) variants,
implemented with the help of a loop. Re-implementing this piece with
fewer instructions and without a loop greatly reduces the allocation
latency.

The speed-up mostly comes from loop-less code that doesn't confuse
the branch predictor. Also, the express encoder seems to benefit from
writing 8 bytes of the encoded value in one go, rather than
byte-by-byte.
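
To illustrate the difference between the two approaches, here is a
hedged, plain-C++ sketch of a loop-based variable-length encoder versus
a branch-free "express" one that writes 8 bytes in one go; the layout
below is made up for illustration and is not the actual Scylla uleb64
format or code:

```
#include <bit>
#include <cstdint>
#include <cstring>

// Loop-based VLE (LEB128-style): one byte per 7 bits of payload,
// with a data-dependent loop and branches.
inline char* encode_loop(char* out, uint64_t v) {
    do {
        uint8_t b = v & 0x7f;
        v >>= 7;
        if (v) {
            b |= 0x80;                       // continuation bit
        }
        *out++ = static_cast<char>(b);
    } while (v);
    return out;
}

// "Express"-style sketch: fold a 3-bit length tag into the low bits and emit
// everything with a single 8-byte store (the buffer needs 8 bytes of slack).
// Assumes the value fits in 61 bits; again, not the actual uleb64 layout.
inline char* encode_express(char* out, uint64_t v) {
    unsigned bits = 64 - std::countl_zero(v | 1);
    unsigned len = (bits + 3 + 7) / 8;       // bytes needed, incl. the tag
    uint64_t word = (v << 3) | len;          // length tag in the low 3 bits
    std::memcpy(out, &word, sizeof(word));   // one wide store, no loop
    return out + len;
}
```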

Perf measurements:

1. (new) logallog test shows ~40% smaller times

2. perf_mutation in release mode shows ~2% increase in tps

3. the encoder itself is 2 - 4 times faster on x86_64 and
   1.05 - 3 times faster on aarch64. The speed-up depends on
   the 'encoded length': the old encoder takes linear time,
   while the new one is constant

tests: unit(dev), perf(release), just encoder on Aarch64
"

* 'br-lsa-alloc-latency-4' of https://github.com/xemul/scylla:
  lsa: Use express encoder
  uleb64: Add express encoding
  lsa: Extract uleb64 code into header
  test: LSA allocation perf test
2021-06-16 18:07:13 +03:00
Avi Kivity
0948908502 Merge "mutation_reader: multishard_combining_reader clean-up close path" from Botond
"
The close path of the multishard combining reader is riddled with
workarounds for the fact that the flat mutation reader couldn't wait on
futures when destroyed. Now that we have a close() method that can do
just that, all these workarounds can be removed.
Even more workarounds can be found in tests, where resources like the
reader concurrency semaphore are created separately for each tested
multishard reader and then destroyed once the reader no longer needs
them, so we had to come up with all sorts of creative and ugly
workarounds to keep these alive until background cleanup is finished.
This series fixes all this. Now, after calling close on the multishard
reader, all resources it used, including the life-cycle policy and the
semaphores created by it, can be safely destroyed. This greatly
simplifies the handling of the multishard reader, and makes it much
easier to reason about life-cycle dependencies.

Tests: unit(dev, release:v2, debug:v2,
    mutation_reader_test:debug -t test_multishard,
    multishard_mutation_query_test:debug,
    multishard_combining_reader_as_mutation_source:debug)
"

* 'multishard-combining-reader-close-cleanup/v3' of https://github.com/denesb/scylla:
  mutation_reader: reader_lifecycle_policy: remove convenience methods
  mutation_reader: multishard_combining_reader: store shard_reader via unique ptr
  test/lib/reader_lifecycle_policy: destroy_reader: cleanup context
  test/lib/reader_lifecycle_policy: get rid of lifecycle workarounds
  test/lib/reader_lifecycle_policy: destroy_reader(): stop the semaphore
  test/lib/reader_lifecycle_policy: use a more robust eviction mechanism
  reader_concurrency_semaphore: wait for all permits to be destroyed in stop()
  test/lib/reader_lifcecycle_policy: fix indentation
  mutation_reader: reader_lifecycle_policy::destroy_reader(): require to be called on native shard
  reader_lifecycle_policy implementations: fix indentation
  mutation_reader: reader_lifecycle_policy::destroy_reader(): de-futurize reader parameter
  mutation_reader: shard_reader::close(): wait on the remote reader
  multishard_mutation_query: destroy remote parts in the foreground
  mutation_reader: shard_reader::close(): close _reader
  mutation_reader: reader_lifcecycle_policy::destroy_reader(): remove out-of-date comment
2021-06-16 17:25:50 +03:00
Konstantin Osipov
9c93d77e74 system_keyspace: raft.group_id and raft_snapshots.group_id are TIMEUUID
Fix a bug in the definitions of system.raft and system.raft_snapshots:
group_id is TIMEUUID, not long.
2021-06-16 16:52:43 +03:00
Pavel Emelyanov
1e67361267 test: LSA allocation perf test
The test measures the time it takes to allocate a bunch
of small objects on LSA inside a single segment.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-06-16 13:40:44 +03:00
Botond Dénes
b4e69cf63d test/lib/test_utils: require(): also log failed conditions
Currently `require()` throws an exception when the condition fails. The
problem with this is that the error is only printed at the end of the
test, with no trace in the logs of where exactly it happened relative
to other logged events. This patch adds an error-level log line to
address this.
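
A minimal sketch of the resulting behaviour (plain C++ with a stderr
"logger"; the real helper lives in test/lib/test_utils and uses the test
logger, so names and signatures here are only illustrative):

```
#include <iostream>
#include <source_location>
#include <sstream>
#include <stdexcept>
#include <string>

// Throws on failure (as before), but also emits an error-level log line
// first, so the failure shows up in-place among the other logged events.
inline void require(bool condition, const std::string& msg,
        const std::source_location loc = std::source_location::current()) {
    if (condition) {
        return;
    }
    std::ostringstream os;
    os << loc.file_name() << ":" << loc.line() << ": requirement failed: " << msg;
    std::cerr << "ERROR " << os.str() << "\n";   // immediate, ordered trace
    throw std::runtime_error(os.str());          // still fails the test
}
```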

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210616065711.46224-1-bdenes@scylladb.com>
2021-06-16 12:05:25 +03:00
Botond Dénes
a69db31b5c test/lib/reader_lifecycle_policy: destroy_reader: cleanup context
Now that we don't rely on any external machinery to keep the relevant
parts of the context alive until needed (its life-cycle is effectively
enclosed in that of the life-cycle policy itself), we can clean up the
context in `destroy_reader()` itself, avoiding a background trip back to
this shard.
2021-06-16 11:29:36 +03:00
Botond Dénes
d2ddaced4e test/lib/reader_lifecycle_policy: get rid of lifecycle workarounds
The lifecycle of the reader lifecycle policy and all the resources the
reads use is now enclosed in that of the multishard reader, thanks to its
close() method. We can now remove all the workarounds we had in place to
keep various resources alive until background reader cleanup finishes.
2021-06-16 11:29:36 +03:00
Botond Dénes
5a271e42a5 test/lib/reader_lifecycle_policy: destroy_reader(): stop the semaphore
So that when this method returns, the semaphore is safe to destroy. This
in turn will enable us to get rid of all the machinery we have in place
to deal with the semaphore having to outlive the lifecycle policy
without a clear point in time at which it is safe to destroy.
2021-06-16 11:29:36 +03:00
Botond Dénes
c09c62a0fb test/lib/reader_lifecycle_policy: use a more robust eviction mechanism
The test reader lifecycle policy has a mode in which it wants to ensure
all inactive readers are evicted, so tests can stress reader recreation
logic. For this it currently employs a trick of creating a waiter on the
semaphore. I don't even know how this works (or if it even does),
but it sure complicates the lifecycle policy code a lot.
So switch to the much more reliable and simple method of creating the
semaphore with a single count and no memory. This ensures that all
inactive reads are immediately evicted, while still allowing a single
read to be admitted at any time.
2021-06-16 11:29:36 +03:00
Botond Dénes
578a092e4a reader_concurrency_semaphore: wait for all permits to be destroyed in stop()
To prevent use-after-free resulting from any permit outliving the
semaphore.
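
A standalone sketch of the pattern in plain, blocking C++ (the real
reader_concurrency_semaphore is Seastar future-based, so this is only an
analogy, not its actual API): stop() returns only once every outstanding
permit has been destroyed.

```
#include <condition_variable>
#include <mutex>

class semaphore {
    std::mutex _m;
    std::condition_variable _cv;
    int _outstanding = 0;
public:
    class permit {
        semaphore* _s;
    public:
        explicit permit(semaphore* s) : _s(s) {}
        permit(const permit&) = delete;
        permit& operator=(const permit&) = delete;
        ~permit() {
            std::lock_guard<std::mutex> g(_s->_m);
            if (--_s->_outstanding == 0) {
                _s->_cv.notify_all();    // last permit gone, unblock stop()
            }
        }
    };
    permit obtain() {
        std::lock_guard<std::mutex> g(_m);
        ++_outstanding;
        return permit(this);             // guaranteed copy elision (C++17)
    }
    void stop() {
        // Wait until every permit has been destroyed, so none can outlive us.
        std::unique_lock<std::mutex> l(_m);
        _cv.wait(l, [this] { return _outstanding == 0; });
    }
};
```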
2021-06-16 11:29:36 +03:00
Botond Dénes
a10a6e253e test/lib/reader_lifcecycle_policy: fix indentation
Left broken from the previous patch.
2021-06-16 11:29:36 +03:00
Botond Dénes
8c7447effd mutation_reader: reader_lifecycle_policy::destroy_reader(): require to be called on native shard
Currently shard_reader::close() (its caller) goes to the remote shard,
copies back all fragments left there to the local shard, then calls
`destroy_reader()`, which in the case of the multishard mutation query
copies it all back to the native shard. This was required before because
`shard_reader::stop()` (`close()`'s) predecessor) couldn't wait on
`smp::submit_to()`. But close can, so we can get rid of all this
back-and-forth and just call `destroy_reader()` on the shard the reader
lives on, just like we do with `create_reader()`.
2021-06-16 11:29:35 +03:00
Botond Dénes
4ecf061c90 reader_lifecycle_policy implementations: fix indentation
Left broken from the previous patch.
2021-06-16 11:21:38 +03:00
Botond Dénes
a7e59d3e2c mutation_reader: reader_lifecycle_policy::destroy_reader(): de-futurize reader parameter
The shard reader is now able to wait on the stopped reader and pass the
already stopped reader to `destroy_reader()`, so we can de-futurize the
reader parameter of said method. The shard reader was already patched to
pass a ready future so adjusting the call-site is trivial.
The most prominent implementation, the multishard mutation query, can
now also drop its `_dismantling_gate` which was put in place so it could
wait on the background stopping of readers.

A consequence of this move is that errors that might happen during the
stopping of the reader are now handled in the shard reader, rather than
in all lifecycle policy implementations.
2021-06-16 11:21:38 +03:00
Avi Kivity
fce124bd90 Merge "Introduce flat_mutation_reader_v2" from Tomasz
"
This series introduces a new version of the mutation fragment stream (called v2)
which aims at improving range tombstone handling in the system.

When compacting a mutation fragment stream (e.g. for sstable compaction, data query, repair),
the compactor needs to accumulate range tombstones which are relevant for the yet-to-be-processed range.
See range_tombstone_accumulator. One problem is that it has an unbounded memory footprint because the
accumulator needs to keep track of all the tombstoned ranges which are still active.

Another, although more benign, problem is computational complexity needed to maintain that data structure.

The fix is to get rid of the overlap of range tombstones in the mutation fragment stream. In v2 of the
stream, there is no longer a range_tombstone fragment. Deletions of ranges of rows within a given
partition are represented with range_tombstone_change fragments. At any point in the stream there
is a single active clustered tombstone. It is initially equal to the neutral tombstone when the
stream of each partition starts. The range_tombstone_change fragment type signifies changes of the
active clustered tombstone. All fragments emitted while a given clustered tombstone is active are
affected by that tombstone. Like with the old range_tombstone fragments, the clustered tombstone
is independent from the partition tombstone carried in partition_start.
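
As a schematic illustration (the names are illustrative rather than the
exact API), a deletion of the clustering range [k1, k3) with tombstone t,
covering a row at k2, could look like this in the two stream versions:

```
v1 stream:
  partition_start(pk, partition_tombstone)
  range_tombstone([k1, k3), t)              -- may overlap/split with others
  clustering_row(k2)
  partition_end()

v2 stream:
  partition_start(pk, partition_tombstone)
  range_tombstone_change(before(k1), t)     -- active clustered tombstone becomes t
  clustering_row(k2)                        -- affected by the active tombstone
  range_tombstone_change(before(k3), {})    -- back to the neutral tombstone
  partition_end()
```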

The memory needed to compact a stream is now constant, because the compactor needs to only track the
current tombstone. Also, there is no need to expire ranges on each fragment because the stream emits
a fragment when the range ends.

This series doesn't convert all readers to v2. It introduces adaptors which can convert
between v1 and v2 streams. Each mutation source can be constructed with either v1 or v2 stream factory,
but it can be asked any version, performing conversion under the hood if necessary.

In order to guarantee that v1 to v2 conversion produces a well-formed stream, this series needs to
impose a constraint on v1 streams to trim range tombstones to clustering restrictions. Otherwise,
the v1->v2 converter could produce range tombstone changes which lie outside query restrictions, making
the stream non-canonical.

The v2 stream is strict about range tombstone trimming. It emits range tombstone changes which reflect
range tombstones trimmed to query restrictions, and fast-forwarding ranges. This makes the stream
more canonical, meaning that for a given set of writes, querying the database should produce the
same stream of fragments for given restrictions. There is less ambiguity in how the writes
are represented in the fragment stream. This wasn't the case with v1. For example, a given set
of deletions could be produced either as one range_tombstone, or as many, split and/or deoverlapped
with other fragments. Making the stream canonical makes diff calculation easier.

The mc sstable reader was converted to v2 because it seemed like a comparable effort to do that
versus implementing range tombstone trimming in v1.

The classes related to mutation fragment streams were cloned:
flat_mutation_reader_v2, mutation_fragment_v2, related concepts.

Refs #8625. To fully fix #8625 we need to finish the transition and get rid of the converters.
Converters accumulate range tombstones.

Tests:

 - unit [dev]
"

* tag 'flat_mutation_reader_range_tombstone_split-v3.2' of github.com:tgrabiec/scylla: (26 commits)
  tests: mutation_source_test: Run tests with conversions inserted in the middle
  tests: mutation_source_tests: Unroll run_flat_mutation_reader_tests()
  tests: Add tests for flat_mutation_reader_v2
  flat_mutation_reader: Update the doc to reflect range tombstone trimming
  sstables: Switch the mx reader to flat_mutation_reader_v2
  row_cache: Emit range tombstone adjacent to upper bound of population range
  tests: sstables: Fix test assertions to not expect more than they should
  flat_mutation_reader: Trim range tombstones in make_flat_mutation_reader_from_fragments()
  clustering_ranges_walker: Emit range tombstone changes while walking
  tests: flat_mutation_reader_assertions_v2: Adapt to the v2 stream
  Clone flat_reader_assertions into flat_reader_assertions_v2
  test: lib: simple_schema: Reuse new_tombstone()
  test: lib: simple_schema: Accept tombstone in delete_range()
  mutation_source: Introduce make_reader_v2()
  partition_snapshot_flat_reader: Trim range tombstones to query ranges
  mutation_partition: Trim range tombstones to query ranges
  sstables: reader: Inline specialization of sstable_mutation_reader
  sstables: k_l: reader: Trim range tombstones to query ranges
  clustering_ranges_walker: Introduce split_tombstone()
  position_range: Introduce contains() check for ranges
  ...
2021-06-16 11:10:54 +03:00
Tomasz Grabiec
605a6e0166 Merge "Remove int_or_strong_ordering concept" from Pavel
The concept was added to smoothly switch tri-comparing stuff from int
to strong-ordering. As of today only tests still need it, and the
conversion is pretty simple, plus an operator<<(ostream&) for the
std::strong_ordering type.
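
A minimal sketch of such a formatter (illustrative; not necessarily the
exact one added by the series):

```
#include <compare>
#include <iostream>

// Make std::strong_ordering printable, e.g. in test failure messages.
inline std::ostream& operator<<(std::ostream& os, std::strong_ordering o) {
    return os << (o < 0 ? "lt" : o > 0 ? "gt" : "eq");
}

int main() {
    std::cout << (1 <=> 2) << "\n";   // prints "lt"
}
```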

* xemul/br-remove-int-or-strong-ordering-2:
  util: Drop int_or_strong_ordering concept
  tests: Switch total-order-check onto strong_ordering
  to_string: Add formatter for strong_ordering
  tests: Return strong-ordering from tri-comparators
2021-06-16 09:34:49 +02:00
Tomasz Grabiec
3fcd1f43ba tests: mutation_source_test: Run tests with conversions inserted in the middle 2021-06-16 00:23:49 +02:00
Tomasz Grabiec
cddcba27de tests: mutation_source_tests: Unroll run_flat_mutation_reader_tests()
All readers are now flat so there is no need for this grouping.

This will be needed for the next patch, which needs a single function
with all test cases.
2021-06-16 00:23:49 +02:00
Tomasz Grabiec
ffb616fef6 tests: Add tests for flat_mutation_reader_v2 2021-06-16 00:23:49 +02:00
Tomasz Grabiec
a4275cf8bc sstables: Switch the mx reader to flat_mutation_reader_v2
The main difficulty was in making sure that emitted range tombstone
changes reflect range tombstones trimmed to clustering restrictions.
This is handled by mutation_fragment_filter and
clustering_ranges_walker. They return a list of range_tombstone_change
fragments to emit for each hop as the reader walks over the clustering
domain.

Tests which were using a normalizing reader expected range tombstones
to be split around rows. Drop this and adjust the tests accordingly. No
reader splits range tombstones around rows now.
2021-06-16 00:23:49 +02:00
Tomasz Grabiec
cf958b0ad0 row_cache: Emit range tombstone adjacent to upper bound of population range
Cache populating reader was emitting the row entry which stands for
the upper bound of the population range, but did not emit range
tombstones for the clustering range corresponding to:

  [ before(key), after(key) ).

This surfaces after sstable readers are changed to trim emitted range
tombstones to the fast-forwarding range. Before, it didn't cause
problems, because that range tombstone part would be emitted as part
of the sstable read.

The fix is to drop the optimization which pushes the row after
population is done, and let the regular handling for
copy_from_cache_to_buffer() take care of emitting the row and
tombstones for the remaining range.

A unit test is added which covers population from all sstable
versions.
2021-06-16 00:23:49 +02:00
Tomasz Grabiec
5b182ff29a tests: sstables: Fix test assertions to not expect more than they should
Before this patch, the tests expected readers to emit range tombstones
which are outside clustering restrictions. Readers do not have to emit
range tombstones outside clustering restrictions, so fix tests to only
expect the part which overlaps with query ranges.

This is a preparatory patch before changing readers to trim range
tombstones to clustering ranges.
2021-06-16 00:23:49 +02:00
Tomasz Grabiec
ed055db63e tests: flat_mutation_reader_assertions_v2: Adapt to the v2 stream 2021-06-16 00:23:49 +02:00
Tomasz Grabiec
276c68c867 Clone flat_reader_assertions into flat_reader_assertions_v2 2021-06-16 00:23:49 +02:00
Tomasz Grabiec
a13e7b30b7 test: lib: simple_schema: Reuse new_tombstone() 2021-06-16 00:23:49 +02:00
Tomasz Grabiec
7e01679c99 test: lib: simple_schema: Accept tombstone in delete_range() 2021-06-16 00:23:49 +02:00
Pavel Solodovnikov
e9258f43cd raft: etcd_test: test_transfer_non_member
Test that a node outside the configuration that receives a `timeout_now`
message doesn't disrupt the operation of the rest of the cluster.

That is, `timeout_now` has no effect and the outsider stays in
the follower state without promoting to candidate.

Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-06-15 19:44:21 +03:00
Pavel Solodovnikov
2b6d73de98 raft: etcd_test: test_leader_transfer_ignore_proposal
Test that a leader which has entered leader stepdown mode rejects
new append requests.

Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-06-15 19:44:21 +03:00
Pavel Solodovnikov
ab6b0e3d62 raft: fsm_test: test_leader_transfer_lost_force_vote_request
3-node cluster (A, B, C). A is initially elected leader.
The leader adds a new configuration entry that removes it from the
cluster, leaving (B, C).

Wait until the former leader commits the new configuration and starts
the leader transfer procedure, sending out the `timeout_now` message to
one of the remaining nodes. At that point the target hasn't received it yet.

Deliver the `timeout_now` message to the target but lose all the
`vote_request(force)` messages it attempts to send.
This should halt the election process.
Then wait for the election timeout so that the candidate node starts
another, normal election (without the `force` flag for vote requests).

Check that this candidate then makes progress and is elected
leader.

Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-06-15 19:44:21 +03:00
Pavel Solodovnikov
97fe6f9d49 raft: fsm_test: test_leader_transfer_lost_timeout_now
3-node cluster (A, B, C). A is initially elected leader.
The leader adds a new configuration entry that removes it from the
cluster, leaving (B, C).

Wait until the former leader commits the new configuration and starts
the leader transfer procedure, sending out the `timeout_now` message to
one of the remaining nodes. At that point the target hasn't received it yet.

Lose this message and verify that the rest of the cluster (B, C)
can make progress and elect a new leader.

Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-06-15 19:44:21 +03:00
Pavel Solodovnikov
c32497b798 raft: fsm_test: test_leader_transferee_dies_upon_receiving_timeout_now
4-node cluster (A, B, C, D). A is initially elected leader.
The leader adds a new configuration entry that removes it from the
cluster, leaving (B, C, D).
Let the cluster communicate up to the point where A starts to resign
its leadership (calls `transfer_leadership()`).
At this point, A should send a `timeout_now` message to one of
the remaining nodes (B, C or D) and the new configuration should be
committed. But no node has actually received the `timeout_now` message
yet.

Determine on which node the message should arrive, accept the
`timeout_now` message and disconnect the target from the rest of the
group.

Check that after that the cluster, which has only two live members,
can make progress and elect a new leader through a normal election process.

tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-06-15 19:44:19 +03:00
Alejo Sanchez
9a22a30554 raft: replication test: split elect_new_leader for prevote
Branch URL: https://github.com/alecco/scylla/tree/raft-fixes-02-v3-01

Tests: unit ({dev}), unit ({debug}), unit ({release})

This fixes current election hangs in next.

Message-Id: <20210610143558.131685-1-alejo.sanchez@scylladb.com>
2021-06-15 11:53:24 +02:00
Tomasz Grabiec
9d49a26e79 Merge "raft: randomized_nemesis_test: tick servers less often than the network in basic_test" from Kamil
Previously `ticker` would use a single function, `on_tick`, which it
called in a loop with yields in-between. In `basic_test` we would use
this to tick every object in synchrony.

However, to closely simulate a production environment, we want the
tick ratios to be different. For example Raft servers should be ticked
rarely compared to the network.

We may also want to give the Seastar reactor more space between the
function calls (e.g. if they cause a bunch of work to be created for the
reactor that needs more than one tick to complete).

To support these use cases we first generalize `ticker` to take a set of
functions with associated numbers. These numbers are the call periods of
their corresponding functions: given {n, f}, `f` will be called each
`n`th tick.
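
A hedged sketch of the generalized ticker interface described above
(plain C++, not the actual randomized_nemesis_test code):

```
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Each registered function has a period n and runs on every n-th tick.
class ticker {
    struct entry {
        uint64_t period;
        std::function<void()> fn;
    };
    std::vector<entry> _entries;
    uint64_t _tick = 0;
public:
    void add(uint64_t period, std::function<void()> fn) {
        _entries.push_back({period, std::move(fn)});
    }
    void tick() {
        ++_tick;
        for (auto& e : _entries) {
            if (_tick % e.period == 0) {
                e.fn();
            }
        }
    }
};

// Usage resembling the described setup: deliver network messages on every
// tick, but tick the Raft servers only once per 10 network ticks.
//
//   ticker t;
//   t.add(1,  [&] { net.deliver(); });
//   t.add(10, [&] { for (auto& srv : servers) srv.tick(); });
```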

We use this new functionality to tick Raft servers less often than the
network in basic_test.

This patchset effectively reverts 01b6a2eb38
which caused the ticker to call `on_tick` only when the Seastar reactor
had no work to do. The approach taken here is unfortunately incompatible with the
approach taken there. We *do* want the ticker to race with other work,
potentially producing more work while already scheduled work is executing,
and we want to see in tests what happens when we adjust the ticking ratios
of different subsystems.

The previous approach also had a problem where if there was an infinite task
loop executing, the ticker wouldn't ever tick.

The previous fix was introduced since the ticker caused too much work to
be produced (so the reactor couldn't keep up) due to ticking the Raft
servers too often (after each yield). This commit deals with the problem
in a different way, by ticking the servers rarely, which also resembles
"real-life" scenarios better.

* kbr/tick-network-often-v4:
  raft: randomized_nemesis_test: generalize `ticker` to take a set of functions
  raft: randomized_nemesis_test: split `environment::tick` into two functions
  raft: randomized_nemesis_test: fix potential use-after-free in basic_test
2021-06-15 01:54:57 +02:00
Kamil Braun
8f1caa6a90 raft: randomized_nemesis_test: generalize ticker to take a set of functions
... with associated calling periods and use the new API in `basic_test`.

Previously `ticker` would use a single function, `on_tick`, which it
called in a loop with yields in-between. In `basic_test` we would use
this to tick every object in synchrony.

However, to closely simulate a production environment, we may want the
tick ratios to be different. For example Raft servers should be ticked
rarely compared to the network.

We may also want to give the Seastar reactor more space between the
function calls (e.g. if they cause a bunch of work to be created for the
reactor that needs more than one tick to complete).

To support these use cases we generalize `ticker` to take a set of
functions with associated numbers. These numbers are the call periods of
their corresponding functions: given {n, f}, `f` will be called each
`n`th tick.

We also modify `basic_test` to use this new approach: we tick Raft
servers once per 10 network ticks (in particular, once per 10 reactor
yields).

This commit effectively reverts 01b6a2eb38
which caused the ticker to call `on_tick` only when the Seastar reactor
had no work to do. The approach taken here is unfortunately incompatible with the
approach taken there. We *do* want the ticker to race with other work,
potentially producing more work while already scheduled work is executing,
and we want to see in tests what happens when we adjust the ticking ratios
of different subsystems.

The previous approach also had a problem where if there was an infinite task
loop executing, the ticker wouldn't ever tick.

The previous fix was introduced since the ticker caused too much work to
be produced (so the reactor couldn't keep up) due to ticking the Raft
servers too often (after each yield). This commit deals with the problem
in a different way, by ticking the servers rarely, which also resembles
"real-life" scenarios better.

With this change we must also wait a bit longer for the first node to
elect itself as a leader at the beginning of the test.
2021-06-14 16:54:38 +02:00
Kamil Braun
c0b80f1f8a raft: randomized_nemesis_test: split environment::tick into two functions
One for ticking the network and one for ticking the servers.
2021-06-14 16:54:38 +02:00
Kamil Braun
f42776aded raft: randomized_nemesis_test: fix potential use-after-free in basic_test
The test starts by waiting a certain number of ticks for the first node
to elect itself as a leader.

If this wait times out - i.e. the number of ticks passes before the node
manages to elect itself - the future associated with the task which checks
for the leader condition becomes discarded (it is passed to
`with_timeout`) and the task may keep using the `environment` (which it
has a reference to) even after the `environment` is destroyed.

Furthermore, the aforementioned task is a coroutine which uses lambda
captures in its body. Leaving `with_timeout` destroys the lambda object,
causing the coroutine to refer to no-longer-existing captures.

We fix the problems by:
- making `environment` `weakly_referencable` and checking if it's alive
  before it's used inside the task,
- not capturing anything in the lambda but passing whatever's needed as
  function arguments (so these things get allocated inside the coroutine
  frame).
2021-06-14 16:54:38 +02:00
Nadav Har'El
3645c7104b Merge: Wrap alternator start-stop into controller
Merged patch series by Pavel Emelyanov:

Alternator start and stop code is sitting inside main()
and it's a big piece of code out there. Having it all in main
complicates rework of start-stop sequences; it's much
handier to have it in alternator/.

This set puts the mentioned code into a transport- and thrift-
like controller model. While doing so, one more call to the global
storage service goes away.

* 'br-alternator-clientize' of https://github.com/xemul/scylla:
  alternator: Move start-stop code into controller
  alternator: Move the whole starting code into a sched group
  alternator: Dont capture db, use cfg
  alternator: Controller skeleton
  alternator: Controller basement
  alternator: Drop storage service from executor
2021-06-14 15:44:10 +03:00
Alejo Sanchez
5c8092cf42 raft: fix election with disruptive candidate
This patch also fixes rare hangs in debug mode for drops_04 without
prevote.

Branch URL: https://github.com/alecco/scylla/tree/raft-fixes-05-v2-dueling

Tests: unit ({dev}), unit ({debug}), unit ({release})

Changes in v2:
    - Fixed commit message                               @kostja

Without prevote, a node disconnected for long enough becomes a candidate.
While disconnected, (A) keeps increasing its term.
When it rejoins, it disrupts the current leader (C), which steps down due
to the higher term in (A)'s append_entries_reply, and (C) also increases
its term.

Meanwhile followers (B) and (D) don't know (C) stepped down but see it
alive according to the current failure detector implementation, and
also (A) has a shorter log than them.
So they reject (A)'s vote requests (Raft 4.2.3 Disruptive servers).

Then (C) rejects voting for (A) because it has a shorter log.
(C) then becomes a candidate, but even though (A) votes for (C), the
previous followers (B) and (D) ignore a vote request while leader (C) is
still alive and the election timeout has not passed.

(A) and (C) alone can't reach a quorum (2/4), so elections never succeed.

This patch addresses this problem by making followers not ignore vote
requests from whoever they think is the current leader, even though
the election timeout was not reached.

As @kostja noted, if the failure detector considered a leader alive only
as long as it sends heartbeats (append requests), this patch would no
longer be needed.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20210611172734.254757-1-alejo.sanchez@scylladb.com>
2021-06-14 11:07:38 +02:00
Raphael S. Carvalho
846f0bd16e sstables: Fix incremental selection with compound sstable set
Incremental selection may not work properly for LCS and ICS due to a
use-after-free bug in the partitioned set which came into existence after
the compound set was introduced.

The use-after-free happens because the partitioned set wasn't taking into
account that the next position can become the current position in the
next iteration, which will be used by all selectors managed by the
compound set. So if the next position is freed while it is being used
as the current position, subsequent selectors would find the current
position freed, making them produce incorrect results.

Fix this by moving ownership of next pos from incremental_selector_impl
to incremental_selector, which makes it more robust as the latter knows
better when the selection is done with the next pos. incremental_selector
will still return ring_position_view to avoid copies.

Fixes #8802.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210611130957.156712-1-raphaelsc@scylladb.com>
2021-06-13 16:45:07 +03:00
Tomasz Grabiec
7521301b72 Merge "raft: add tests for non-voters and fix related bugs" from Kostja
Add test coverage inspired by etcd for non-voter servers,
and fix issues discovered when testing.

* scylla-dev/raft-learner-test-v4:
  raft: (testing) test non-voter can vote
  raft: (testing) test receiving a confchange in a snapshot
  raft: (testing) test voter-non-voter config change loop
  raft: (testing) test non-voter doesn't start election on election timeout
  raft: (testing) test what happens when a learner gets TimeoutNow
  raft: (testing) implement a test for a leader becoming non-voter
  raft: style fix
  raft: step down as a leader if converted to a non-voter
  raft: improve configuration consistency checks
  raft: (testing) test that non-voter stays in PIPELINE mode
  raft: (testing) always return fsm_debug in create_follower()
2021-06-12 21:36:47 +03:00
Nadav Har'El
9774c146cc cql-pytest: add test for connecting with different SSL/TLS versions
This is a reproducer for issue #8827: it checks that a client which
tries to connect to Scylla with an unsupported version of SSL or TLS
gets the expected error alert - not some sort of unexpected EOF.

Issue #8827 is still open, so this test is still xfailing. However,
I verified that with a fix for this issue, the test passes.

The test also prints which protocol versions worked - so it also helps
check issue #8837 (about the ancient SSL protocol being allowed).

Refs #8837
Refs #8827

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210610151714.1746330-1-nyh@scylladb.com>
2021-06-12 21:36:47 +03:00
Michael Livshin
2bbc293e22 tests: improve error reporting of test_env::reusable_sst()
Distinguish the "no such sstable" case from any reading errors.

While at it, coroutinize the function.

Refs #8785.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Message-Id: <20210610113304.264922-1-michael.livshin@scylladb.com>
2021-06-11 19:06:43 +02:00