Test that if leadership transfer cannot be done in configured time frame
fsm cancels the leadership transfer process. Also check that timeout_now
message is resent on each tick while leadership transfer is in progress.
The existing test_ssl.py which tests for Scylla's support of various TLS
and SSL versions, used a deprecated and misleading Python API for
choosing the protocol version. In particular, the protocol version
ssl.PROTOCOL_SSLv23 is *not*, despite it's name, SSL versions 2 or 3,
or SSL at all - it is in fact an alias for the latest TLS version :-(
This misunderstanding led us to open the incorrect issue #8837.
So in this patch, we avoid the old Python APIs for choosing protocols,
which were gradually deprecated, and switch to the new API introduced
in Python 3.7 and OpenSSL 1.1.0g - supplying the minimum and maximum
desired protocol version.
With this new API, we can correctly connect with various versions of
the SSL and TLS protocol - between SSLv3 through TLSv1.3. With the
fixed test, we confirm that Scylla does *not* allow SSLv3 - as desired -
so issue #8837 is a non-issue.
Moreover, after issue #8827 was already fixed, this test now passes,
so the "xfail" mark is removed.
Refs #8837.
Refs #8827.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210617134305.173034-1-nyh@scylladb.com>
The patch set introduces a few leadership transfer tests, some of them
are adaptations of corresponding etcd tests (e.g.
`test_leader_transfer_ignore_proposal` and `test_transfer_non_member`).
Others test different scenarios ensuring that pending leadership
transfer doesn't disrupt the rest of the cluster from progressing:
Lost `timeout_now` messages` (`test_leader_transfer_lost_timeout_now` and
`test_leader_transferee_dies_upon_receiving_timeout_now`) as well as
lost `vote_request(force)` from the new candidate
(test_leader_transfer_lost_force_vote_request) don't impact the
election process following that and the leader is elected as normal.
* manmanson/leadership_transfer_tests_v3:
raft: etcd_test: test_transfer_non_member
raft: etcd_test: test_leader_transfer_ignore_proposal
raft: fsm_test: test_leader_transfer_lost_force_vote_request
raft: fsm_test: test_leader_transfer_lost_timeout_now
raft: fsm_test: test_leader_transferee_dies_upon_receiving_timeout_now
Turns out the DELETE statement already supports attributes
like timestamp, so it's ridiculously easy to add USING TIMEOUT
support - it's just the matter of accepting it in the grammar.
Fixes#8855Closes#8876
The option is provided by nodetool snapshot
https://docs.scylladb.com/operating-scylla/nodetool-commands/snapshot/
```
nodetool [(-h <host> | --host <host>)] [(-p <port> | --port <port>)]
[(-pp | --print-port)] [(-pw <password> | --password <password>)]
[(-pwf <passwordFilePath> | --password-file <passwordFilePath>)]
[(-u <username> | --username <username>)] snapshot
[(-cf <table> | --column-family <table> | --table <table>)]
[(-kc <kclist> | --kc.list <kclist>)]
[(-sf | --skip-flush)] [(-t <tag> | --tag <tag>)] [--] [<keyspaces...>]
-sf / –skip-flush Do not flush memtables before snapshotting (snapshot will not contain unflushed data)
```
But is currently ignored by scylla-jmx (scylladb/scylla-jmx#167)
and not supported at the api level.
This patch adds support for the option in advance
from the api service level down via snapshot_ctl
to the table class and snapshot implementation.
In addition, a corresponding unit test was added to verify
that taking a snapshot with `skip_flush` does not flush the memtable
(at the table::snapshot level).
Refs #8725Closes#8726
* github.com:scylladb/scylla:
test: database_test: add snapshot_skip_flush_works
api: storage_service/snapshots: support skip-flush option
snapshot: support skip_flush option
table: snapshot: add skip_flush option
api: storage_service/snapshots: add sf (skip_flush) option
This is a translation of Cassandra's CQL unit test source file
validation/entities/StaticColumnsTest.java into our our cql-pytest framework.
This test file checks various features of static columns. All these tests
pass on Cassandra, and all but one pass on Scylla. The xfailing test,
testStaticColumnsWithSecondaryIndex, exposes a query that Cassandra
allows but we don't. The new issue about that is:
Refs #8869.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210616141633.114325-1-nyh@scylladb.com>
Miscellaneous preparatory patches for group 0 discovery.
* scylla-dev/raft-group-0-part-2-v4:
raft: (service) servers map is gid -> server, not sid -> server
system_keyspace: raft.group_id and raft_snapshots.group_id are TIMEUUID
raft: (server) wait for configuration transition to complete
raft: (server) implement raft::server::get_configuration()
raft: (service) don't throw from schema state machine
raft: (service) permit some scylla.raft cells to be empty
raft: (service) properly handle failure to add a server
raft: implement is_transient_error()
Raft Group registry should map Raft Group Id to Raft Server,
not Raft Server ID (which is identical for all groups) to Raft server.
Raft Group 0 ID works as a cluster identifier, so is generated when a
new cluster is created and is shared by all nodes of the same cluster.
Implement a helper to get raft::server by group id.
Consistently throw a new raft_group_not_found exception
if there is no server or rpc for the specified group id.
"
The LSA small objects allocation latency is greatly affected by
the way this allocator encodes the object descriptor in front of
each allocated slot.
Nowadays it's one of VLE variants implemented with the help of a
loop. Re-implementing this piece with less instructions and without
a loop allows greatly reducing the allocation latency.
The speed-up mostly comes from loop-less code that doesn't confuse
branch predictor. Also the express encoder seems to benefit from
writing 8 bytes of the encoded value in one go, rather than byte-
-by-byte.
Perf measurements:
1. (new) logallog test shows ~40% smaller times
2. perf_mutation in release mode shows ~2% increase in tps
3. the encoder itself is 2 - 4 times faster on x86_64 and
1.05 - 3 times faster on aarch64. The speed-up depends on
the 'encoded length', old encoder has linear time, the
new one is constant
tests: unit(dev), perf(release), just encoder on Aarch64
"
* 'br-lsa-alloc-latency-4' of https://github.com/xemul/scylla:
lsa: Use express encoder
uleb64: Add express encoding
lsa: Extract uleb64 code into header
test: LSA allocation perf test
"
The close path of the multishard combining reader is riddled with
workarounds the fact that the flat mutation reader couldn't wait on
futures when destroyed. Now that we have a close() method that can do
just that, all these workarounds can be removed.
Even more workarounds can be found in tests, where resources like the
reader concurrency semaphore are created separately for each tested
multishard reader and then destroyed after it doesn't need it, so we had
to come up with all sorts of creative and ugly workarounds to keep
these alive until background cleanup is finished.
This series fixes all this. Now, after calling close on the multishard
reader, all resources it used, including the life-cycle policy, the
semaphores created by it can be safely destroyed. This greatly
simplifies the handling of the multishard reader, and makes it much
easier to reason about life-cycle dependencies.
Tests: unit(dev, release:v2, debug:v2,
mutation_reader_test:debug -t test_multishard,
multishard_mutation_query_test:debug,
multishard_combining_reader_as_mutation_source:debug)
"
* 'multishard-combining-reader-close-cleanup/v3' of https://github.com/denesb/scylla:
mutation_reader: reader_lifecycle_policy: remove convenience methods
mutation_reader: multishard_combining_reader: store shard_reader via unique ptr
test/lib/reader_lifecycle_policy: destroy_reader: cleanup context
test/lib/reader_lifecycle_policy: get rid of lifecycle workarounds
test/lib/reader_lifecycle_policy: destroy_reader(): stop the semaphore
test/lib/reader_lifecycle_policy: use a more robust eviction mechanism
reader_concurrency_semaphore: wait for all permits to be destroyed in stop()
test/lib/reader_lifcecycle_policy: fix indentation
mutation_reader: reader_lifecycle_policy::destroy_reader(): require to be called on native shard
reader_lifecycle_policy implementations: fix indentation
mutation_reader: reader_lifecycle_policy::destroy_reader(): de-futurize reader parameter
mutation_reader: shard_reader::close(): wait on the remote reader
multishard_mutation_query: destroy remote parts in the foreground
mutation_reader: shard_reader::close(): close _reader
mutation_reader: reader_lifcecycle_policy::destroy_reader(): remove out-of-date comment
The test measures the time it takes to allocate a bunch
of small objects on LSA inside single segment.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently `require()` throws an exception when the condition fails. The
problem with this is that the error is only printed at the end of the
test, with no trace in the logs on where exactly it happened, compared
to other logged events. This patchs also adds an error-level log line to
address this.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210616065711.46224-1-bdenes@scylladb.com>
Now that we don't rely on any external machinery to keep the relevant
parts of the context alive until needed as its life-cycle is effectively
enclosed in that of the life-cycle policy itself, we can cleanup the
context in `destroy_reader()` itself, avoiding a background trip back to
this shard.
The lifecycle of the reader lifecycle policy and all the resources the
reads use is now enclosed in that of the multishard reader thanks to its
close() method. We can now remove all the workarounds we had in place to
keep different resources as long as background reader cleanup finishes.
So that when this method returns the semaphore is safe to destroy. This
in turn will enable us to get rid of all the machinery we have in place
to deal with the semaphore having to out-live the lifecycle policy
without a clear time as to when it can be safe to destroy.
The test reader lifecycle policy has a mode in which it wants to ensure
all inactive readers are evicted, so tests can stress reader recreation
logic. For this it currently employs a trick of creating a waiter on the
semaphore. I don't even know how this even works (or if it even does)
but it sure complicates the lifecycle policy code a lot.
So switch to the much more reliable and simple method of creating the
semaphore with a single count and no memory. This ensures that all
inactive reads are immediately evicted, while still allows a single read
to be admitted at all times.
Currently shard_reader::close() (its caller) goes to the remote shard,
copies back all fragments left there to the local shard, then calls
`destroy_reader()`, which in the case of the multishard mutation query
copies it all back to the native shard. This was required before because
`shard_reader::stop()` (`close()`'s) predecessor) couldn't wait on
`smp::submit_to()`. But close can, so we can get rid of all this
back-and-forth and just call `destroy_reader()` on the shard the reader
lives on, just like we do with `create_reader()`.
The shard reader is now able to wait on the stopped reader and pass the
already stopped reader to `destroy_reader()`, so we can de-futurize the
reader parameter of said method. The shard reader was already patched to
pass a ready future so adjusting the call-site is trivial.
The most prominent implementation, the multishard mutation query, can
now also drop its `_dismantling_gate` which was put in place so it can
wait on the background stopping if readers.
A consequence of this move is that handling errors that might happen
during the stopping of the reader is now handled in the shard reader,
not all lifecycle policy implementations.
"
This series introduces a new version of the mutation fragment stream (called v2)
which aims at improving range tombstone handling in the system.
When compacting a mutation fragment stream (e.g. for sstable compaction, data query, repair),
the compactor needs to accumulate range tombstones which are relevant for the yet-to-be-processed range.
See range_tombstone_accumulator. One problem is that it has unbounded memory footprint because the
accumulator needs to keep track of all the tombstoned ranges which are still active.
Another, although more benign, problem is computational complexity needed to maintain that data structure.
The fix is to get rid of the overlap of range tombstones in the mutation fragment stream. In v2 of the
stream, there is no longer a range_tombstone fragment. Deletions of ranges of rows within a given
partition are represented with range_tombstone_change fragments. At any point in the stream there
is a single active clustered tombstone. It is initially equal to the neutral tombstone when the
stream of each partition starts. The range_tombstone_change fragment type signify changes of the
active clustered tombstone. All fragments emitted while a given clustered tombstone is active are
affected by that tombstone. Like with the old range_tombstone fragments, the clustered tombstone
is independent from the partition tombstone carried in partition_start.
The memory needed to compact a stream is now constant, because the compactor needs to only track the
current tombstone. Also, there is no need to expire ranges on each fragment because the stream emits
a fragment when the range ends.
This series doesn't convert all readers to v2. It introduces adaptors which can convert
between v1 and v2 streams. Each mutation source can be constructed with either v1 or v2 stream factory,
but it can be asked any version, performing conversion under the hood if necessary.
In order to guarantee that v1 to v2 conversion produces a well-formed stream, this series needs to
impose a constraint on v1 streams to trim range tombstones to clustering restrictions. Otherwise,
v1->v2 converted could produce range tombstone changes which lie outside query restrictions, making
the stream non-canonical.
The v2 stream is strict about range tombstone trimming. It emits range tombstone changes which reflect
range tombstones trimmed to query restrictions, and fast-forwarding ranges. This makes the stream
more canonical, meaning that for a given set of writes, querying the database should produce the
same stream of fragments for a given restrictions. There is less ambiguity in how the writes
are represented in the fragment stream. It wasn't the case with v1. For example, A given set
of deletions could be produced either as one range_tombstone, or may, split and/or deoverlapped
with other fragments. Making a stream canonical is easier for diff-calculating.
The mc sstable reader was converted to v2 because it seemed like a comparable effort to do that
versus implementing range tombstone trimming in v1.
The classes related to mutation fragment streams were cloned:
flat_mutation_reader_v2, mutation_fragment_v2, related concepts.
Refs #8625. To fully fix#8625 we need to finish the transition and get rid of the converters.
Converters accumulate range tombstones.
Tests:
- unit [dev]
"
* tag 'flat_mutation_reader_range_tombstone_split-v3.2' of github.com:tgrabiec/scylla: (26 commits)
tests: mutation_source_test: Run tests with conversions inserted in the middle
tests: mutation_source_tests: Unroll run_flat_mutation_reader_tests()
tests: Add tests for flat_mutation_reader_v2
flat_mutation_reader: Update the doc to reflect range tombstone trimming
sstables: Switch the mx reader to flat_mutation_reader_v2
row_cache: Emit range tombstone adjacent to upper bound of population range
tests: sstables: Fix test assertions to not expect more than they should
flat_mutation_reader: Trim range tombstones in make_flat_mutation_reader_from_fragments()
clustering_ranges_walker: Emit range tombstone changes while walking
tests: flat_mutation_reader_assertions_v2: Adapt to the v2 stream
Clone flat_reader_assertions into flat_reader_assertions_v2
test: lib: simple_schema: Reuse new_tombstone()
test: lib: simple_schema: Accept tombstone in delete_range()
mutation_source: Introduce make_reader_v2()
partition_snapshot_flat_reader: Trim range tombstones to query ranges
mutation_partition: Trim range tombstones to query ranges
sstables: reader: Inline specialization of sstable_mutation_reader
sstables: k_l: reader: Trim range tombstones to query ranges
clustering_ranges_walker: Introduce split_tombstone()
position_range: Introduce contains() check for ranges
...
The one was added to smothly switch tri-comparing stuff from int
to strong-ordering. As for today only tests still need it and the
conversion is pretty simple, plus operator<<(ostream&) for the
std::strong_ordering type.
* xemul/br-remove-int-or-strong-ordering-2:
util: Drop int_or_strong_ordering concept
tests: Switch total-order-check onto strong_ordering
to_string: Add formatter for strong_ordering
tests: Return strong-ordering from tri-comparators
The main difficulty was in making sure that emitted range tombstone
changes reflect range tombstones trimmed to clustering restrictions.
This is handled by mutation_fragment_filter and
clustering_ranges_walker. They return a list of range_tombstone_change
fragments to emit for each hop as the reader walks over the clustering
domain.
Tests which were using a normalizing reader expected range tombstones
to be split around rows. Drop this an adjust the tests accoridngly. No
reader splits range tombstones around rows now.
Cache populating reader was emitting the row entry which stands for
the upper bound of the population range, but did not emit range
tombstones for the clustering range corresponding to:
[ before(key), after(key) ).
This surfaces after sstable readers are changed to trim emitted range
tombstones to the fast-forwarding range. Before, it didn't cause
problems, because that range tombstone part would be emitted as part
of the sstable read.
The fix is to drop the optimization which pushes the row after
population is done, and let the regular handling for
copy_from_cache_to_buffer() take care of emitting the row and
tombstones for the remaining range.
A unit test is added which covers population from all sstable
versions.
Before this patch, the tests expected readers to emit range tombstones
which are outside clustering restrictions. Readers do not have to emit
range tombstones outside clustering restrictions, so fix tests to only
expect the part which overlaps with query ranges.
This is a preparatory patch before changing readers to trim range
tombstones to clustering ranges.
Test that a node outside configuration, that receives `timeout_now`
message, doesn't disrupt operation of the rest of the cluster.
That is, `timeout_now` has no effect and the outsider stays in
the follower state without promoting to the candidate.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Test that a leader which has entered leader stepdown mode rejects
new append requests.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
3-node cluster (A, B, C). A is initially elected a leader.
The leader adds a new configuration entry, that removes it from the
cluster (B, C).
Wait up until the former leader commits the new configuration and starts
leader transfer procedure, sending out the `timeout_now` message to
one of the remaining nodes. But at that point it haven't received it yet.
Deliver the `timeout_now` message to the target but lose all the
`vote_request(force)` messages it attempts to send.
This should halt the election process.
Then wait for election timeout so that candidate node starts another
normal election (without `force` flag for vote requests).
Check that this candidate further makes progress and is elected a
leader.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
3-node cluster (A, B, C). A is initially elected a leader.
The leader adds a new configuration entry, that removes it from the
cluster (B, C).
Wait up until the former leader commits the new configuration and starts
leader transfer procedure, sending out the `timeout_now` message to
one of the remaining nodes. But at that point it haven't received it yet.
Lose this message and verify that the rest of the cluster (B, C)
can make progress and elect a new leader.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
4-node cluster (A, B, C, D). A is initially elected a leader.
The leader adds a new configuration entry, that removes it from the
cluster (B, C, D).
Communicate the cluster up to the point where A starts to resign
its leadership (calls `transfer_leadership()`).
At this point, A should send a `timeout_now` message to one
the remaining nodes (B, C or D) and the new configuration should be
committed. But no nodes actually have received the `timeout_now` message
yet.
Determine on which node the message should arrive, accept the
`timeout_now` message and disconnect the target from the rest of the
group.
Check that after that the cluster, which has only two live members,
could progress and elect a new leader through a normal election process.
tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Previously `ticker` would use a single function, `on_tick`, which it
called in a loop with yields in-between. In `basic_test` we would use
this to tick every object in synchrony.
However, to closely simulate a production environment, we want the
tick ratios to be different. For example Raft servers should be ticked
rarely compared to the network.
We may also want to give the Seastar reactor more space between the
function calls (e.g. if they cause a bunch of work to be created for the
reactor that needs more than one tick to complete).
To support these use cases we first generalize `ticker` to take a set of
functions with associated numbers. These numbers are the call periods of
their corresponding functions: given {n, f}, `f` will be called each
`n`th tick.
We use this new functionality to tick Raft servers less often than the
network in basic_test.
This patchset effectively reverts 01b6a2eb38
which caused the ticker to call `on_tick` only when the Seastar reactor
had no work to do. This approach is unfortunately incompatible with the
approach taken there. We *do* want the ticker to race with other work,
potentially producing more work while already scheduled work is executing,
and we want to see in tests what happens when we adjust the ticking ratios
of different subsystems.
The previous approach also had a problem where if there was an infinite task
loop executing, the ticker wouldn't ever tick.
The previous fix was introduced since the ticker caused too much work to
be produced (so the reactor couldn't keep up) due to ticking the Raft
servers too often (after each yield). This commit deals with the problem
in a different way, by ticking the servers rarely, which also resembles
"real-life" scenarios better.
* kbr/tick-network-often-v4:
raft: randomized_nemesis_test: generalize `ticker` to take a set of functions
raft: randomized_nemesis_test: split `environment::tick` into two functions
raft: randomized_nemesis_test: fix potential use-after-free in basic_test
... with associated calling periods and use the new API in `basic_test`.
Previously `ticker` would use a single function, `on_tick`, which it
called in a loop with yields in-between. In `basic_test` we would use
this to tick every object in synchrony.
However, to closely simulate a production environment, we may want the
tick ratios to be different. For example Raft servers should be ticked
rarely compared to the network.
We may also want to give the Seastar reactor more space between the
function calls (e.g. if they cause a bunch of work to be created for the
reactor that needs more than one tick to complete).
To support these use cases we generalize `ticker` to take a set of
functions with associated numbers. These numbers are the call periods of
their corresponding functions: given {n, f}, `f` will be called each
`n`th tick.
We also modify `basic_test` to use this new approach: we tick Raft
servers once per 10 network ticks (in particular, once per 10 reactor
yields).
This commit effectively reverts 01b6a2eb38
which caused the ticker to call `on_tick` only when the Seastar reactor
had no work to do. This approach is unfortunately incompatible with the
approach taken there. We *do* want the ticker to race with other work,
potentially producing more work while already scheduled work is executing,
and we want to see in tests what happens when we adjust the ticking ratios
of different subsystems.
The previous approach also had a problem where if there was an infinite task
loop executing, the ticker wouldn't ever tick.
The previous fix was introduced since the ticker caused too much work to
be produced (so the reactor couldn't keep up) due to ticking the Raft
servers too often (after each yield). This commit deals with the problem
in a different way, by ticking the servers rarely, which also resembles
"real-life" scenarios better.
With this change we must also wait a bit longer for the first node to
elect itself as a leader at the beginning of the test.
The test starts by waiting a certain number of ticks for the first node
to elect itself as a leader.
If this wait times out - i.e. the number of ticks passes before the node
manages to elect itself - the future associated with the task which checks
for the leader condition becomes discarded (it is passed to
`with_timeout`) and the task may keep using the `environment` (which it
has a reference to) even after the `environment` is destroyed.
Furthermore, the aforementioned task is a coroutine which uses lambda
captures in its body. Leaving `with_timeout` destroys the lambda object,
causing the coroutine to refer to no-longer-existing captures.
We fix the problems by:
- making `environment` `weakly_referencable` and checking if its alive
before it's used inside the task,
- not capturing anything in the lambda but passing whatever's needed as
function arguments (so these things get allocated inside the coroutine
frame).
Merged patch series by Pavel Emelyanov:
Alternator start and stop code is sitting inside the main()
and it's a big piece of code out there. Havig it all in main
complicates rework of start-stop sequences, it's much more
handy to have it in alternator/.
This set puts the mentioned code into transport- and thrift-
like controller model. While doing it one more call for global
storage service goes away.
* 'br-alternator-clientize' of https://github.com/xemul/scylla:
alternator: Move start-stop code into controller
alternator: Move the whole starting code into a sched group
alternator: Dont capture db, use cfg
alternator: Controller skeleton
alternator: Controller basement
alternator: Drop storage service from executor
This patch also fixes rare hangs in debug mode for drops_04 without
prevote.
Branch URL: https://github.com/alecco/scylla/tree/raft-fixes-05-v2-dueling
Tests: unit ({dev}), unit ({debug}), unit ({release})
Changes in v2:
- Fixed commit message @kostja
Whithout prevote, a node disconnected for long enough becomes candidate.
While disconnected (A) it keeps increasing its term.
When it rejoins it disrupts the current leader (C) which steps down due
to the higher term in (A)'s append_entries_reply and (C) also increases
its term.
Meanwhile followers (B) and (D) don't know (C) stepped down but see it
alive according to the current failure detecture implementation, and
also (A) has shorter log than them.
So they reject (A)'s vote requests (Raft 4.2.3 Disruptive servers).
Then (C) rejects voting for (A) because it has shorter log.
And (C) becomes candidate but even though (A) votes for (C), the
previous followers (B) and (D) ignore a vote request while leader (C) is
still alive and election timeout has not passed.
(A) and (C) alone can't reach quorum 2/4. So elections never succeed.
This patch addresses this problem by making followers not ignore vote
requests from who they think is the current leader even though
election timout was not reached.
As @kostja noted, if failure detector would consider a leader alive only
as long as it sends heartbeats (append requests) this patch is no longer
needed.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20210611172734.254757-1-alejo.sanchez@scylladb.com>
Incremental selection may not work properly for LCS and ICS due to an
use-after-free bug in partitioned set which came into existence after
compound set was introduced.
The use-after-free happens because partitioned set wasn't taking into
account that the next position can become the current position in the
next iteration, which will be used by all selectors managed by
compound set. So if next position is freed, when it were being used
as current position, subsequent selectors would find the current
position freed, making them produce incorrect results.
Fix this by moving ownership of next pos from incremental_selector_impl
to incremental_selector, which makes it more robust as the latter knows
better when the selection is done with the next pos. incremental_selector
will still return ring_position_view to avoid copies.
Fixes#8802.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210611130957.156712-1-raphaelsc@scylladb.com>
Add test coverage inspired by etcd for non-voter servers,
and fix issues discovered when testing.
* scylla-dev/raft-learner-test-v4:
raft: (testing) test non-voter can vote
raft: (testing) test receiving a confchange in a snapshot
raft: (testing) test voter-non-voter config change loop
raft: (testing) test non-voter doesn't start election on election timeout
raft: (testing) test what happens when a learner gets TimeoutNow
raft: (testing) implement a test for a leader becoming non-voter
raft: style fix
raft: step down as a leader if converted to a non-voter
raft: improve configuration consistency checks
raft: (testing) test that non-voter stays in PIPELINE mode
raft: (testing) always return fsm_debug in create_follower()
This is a reproducer for issue #8827, that checks that a client which
tries to connect to Scylla with an unsupported version of SSL or TLS
gets the expected error alert - not some sort of unexpected EOF.
Issue #8827 is still open, so this test is still xfailing. However,
I verified that with a fix for this issue, the test passes.
The test also prints which protocol versions worked - so it also helps
checking issue #8837 (about the ancient SSL protocol being allowed).
Refs #8837
Refs #8827
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210610151714.1746330-1-nyh@scylladb.com>