One INSERT statement was unnecessary for the test, so delete it.
Another was necessary, so explain it.
Tests: cql-pytest/test_null on both Scylla and Cassandra
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes#8304
Test raft configuration changes:
a node with empty configuration, transitioning
to an entirely different cluster, transitioning
in presence of down nodes, leader change during
configuration change, stray replies, etc.
* scylla-dev/raft-empty-confchange-v5: (21 commits)
raft: (testing) stray replies from removed followers
raft: always return a non-zero configuration index from the log
raft: (testing) leader change during configuration change
raft: (testing) test confchange {ABCDE} -> {ABCDEFG}
raft: (testing) test confchange {ABCDEF} -> {ABCGH}
raft: (testing) test confchange {ABC} -> {CDE}
raft: (testing) test confchange {AB} -> {CD}
raft: (testing) test confchange {A} -> {B}
raft: (testing) test a server with empty configuration
raft: (testing) introduce testing utilities
raft: (testing) simplify id allocation in test
raft: (testing) add select_leader() helper
raft: (testing) introduce communicate() helper
raft: (testing) style cleanup in raft_fsm_test
raft: (testing) fix bug in election_threshold
raft: minor style changes & comments
raft: do not assert when transitioning to empty config
raft: assert we never apply a snapshot over uncommitted entries (leader)
raft: improve tracing
raft: add fsm_output::empty() helper to aid testing
...
Right now toppartitions can only be invoked on one column family at a time.
This change introduces a natural extension to this functionality,
allowing to specify a list of families.
We provide three ways for filtering in the query parameter "name_list":
1. A specific column family to include in the form "ks:cf"
2. A keyspace, telling the server to include all column families in it.
Specified by omitting the cf name, i.e. "ks:"
3. All column families, which is represented by an empty list
The list can include any amount of one or both of the 1. and 2. option.
Fixes#4520Closes#7864
This series implements leader stepdown extension. See patch 4 for
justification for its existence. First three patches either implement
cleanups to existing code that future patch will touch or fix bugs
that need to be fixed in order for stepdown test to work.
* 'raft-leader-stepdown-v3' of github.com:scylladb/scylla-dev:
raft: add test for leader stepdown
raft: introduce leader stepdown procedure
raft: fix replication when leader is not part of current config
raft: do not update last election time if current leader is not a part of current configuration
raft: move log limiting semaphore into the leader state
"
Scylla suffers with aggressive compaction after repair-based operation has initiated. That translates into bad latency and slowness for the operation itself.
This aggressiveness comes from the fact that:
1) new sstables are immediately added to the compaction backlog, so reducing bandwidth available for the operation.
2) new sstables are in bad shape when integrated into the main sstable set, not conforming to the strategy invariant.
To solve this problem, new sstables will be incrementally reshaped, off the compaction strategy, until finally integrated into the main set.
The solution takes advantage there's only one sstable per vnode range, meaning sstables generated by repair-based operations are disjoint.
NOTE: off-strategy for repair-based decommission and removenode will follow this series and require little work as the infrastructure is introduced in this series.
Refs #5226.
"
* 'offstrategy_v7' of github.com:raphaelsc/scylla:
tests: Add unit test for off-strategy sstable compaction
table: Wire up off-strategy compaction on repair-based bootstrap and replace
table: extend add_sstable_and_update_cache() for off-strategy
sstables/compaction_manager: Add function to submit off-strategy work
table: Introduce off-strategy compaction on maintenance sstable set
table: change build_new_sstable_list() to accept other sstable sets
table: change non_staging_sstables() to filter out off-strategy sstables
table: Introduce maintenance sstable set
table: Wire compound sstable set
table: prepare make_reader_excluding_sstables() to work with compound sstable set
table: prepare discard_sstables() to work with compound sstable set
table: extract add_sstable() common code into a function
sstable_set: Introduce compound sstable set
reshape: STCS: preserve token contiguity when reshaping disjoint sstables
Section 3.10 of the PhD describes two cases for which the extension can
be helpful:
1. Sometimes the leader must step down. For example, it may need to reboot
for maintenance, or it may be removed from the cluster. When it steps
down, the cluster will be idle for an election timeout until another
server times out and wins an election. This brief unavailability can be
avoided by having the leader transfer its leadership to another server
before it steps down.
2. In some cases, one or more servers may be more suitable to lead the
cluster than others. For example, a server with high load would not make
a good leader, or in a WAN deployment, servers in a primary datacenter
may be preferred in order to minimize the latency between clients and
the leader. Other consensus algorithms may be able to accommodate these
preferences during leader election, but Raft needs a server with a
sufficiently up-to-date log to become leader, which might not be the
most preferred one. Instead, a leader in Raft can periodically check
to see whether one of its available followers would be more suitable,
and if so, transfer its leadership to that server. (If only human leaders
were so graceful.)
The patch here implements the extension and employs it automatically
when a leader removes itself from a cluster.
"
Due to bad interaction of recent changes (913d970 and 4c8ab10) inctive
readers that are not admitted have managed to completely fly under the
radar, avoiding any sort of limitation. The reason is that pre-admission
the permits don't forward their resource cost to the semaphore, to
prevent them possibly blocking their own admission later. However this
meant that if such a reader is registered as inactive, it completely
avoids the normal resource based eviction mechanism and can accumulate
without bounds.
The real solution to this is to move the semaphore before the cache and
make all reads pass admission before they get started (#4758). Although
work has been started towards this, it is still a while until it lands.
In the meanwhile this patchset provides a workaround in the form of a
new inactive state, which -- like admitted -- causes the permit to
forward its cost to the semaphore, making sure these un-admitted
inactive reads are accounted for and evicted if there is too much of
them.
Fixes: #8258
Tests: unit(release), dtest(oppartitions_test.py:TestTopPartitions.test_read_by_gause_key_distribution_for_compound_primary_key_and_large_rows_number)
"
* 'reader-concurrency-semaphore-limit-inactive-reads/v4' of https://github.com/denesb/scylla:
test: mutation_reader_test: add test for permit cleanup
test: querier_cache_test: add memory based cache eviction test
reader_permit: add inactive state
querier: insert(): account immediately evicted querier as resource based eviction
reader_concurrency_semaphore: fix clear_inactive_reads()
reader_concurrency_semaphore: make inactive_read_handle a weak reference
reader_concurrency_semaphore: make evict() noexcept
reader_concurrency_semaphore: update out-of-date comments
After commit 0bd201d3ca ("cql3: Skip indexed
column for CK restrictions") fixed issue #7888, the test
cassandra_tests/validation/entities/frozen_collections_test.py::testClusteringColumnFiltering
began passing, as expected. So we can remove its "xfail" label.
Refs #7888.
cassandra_tests/validation/entities/frozen_collections_test.py::testClusteringColumnFiltering
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210321080522.1831115-1-nyh@scylladb.com>
This is a translation of Cassandra's CQL unit test source file
validation/entities/SecondaryIndexOnMapEntriesTest.java into our
our cql-pytest framework.
This test file checks various features of indexing (with secondary index)
individual entries of maps. All these tests pass on Cassandra, but fail on
Scylla because of issue #2962 - we do not yet support indexing of the content
of unfrozen collections. The failing test currently fail as soon as they
try to create the index, with the message:
"Cannot create secondary index on non-frozen collection or UDT column v".
Refs #2962.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210310124638.1653606-1-nyh@scylladb.com>
When a tuple value is serialized, we go through every element type and
use it to serialize element values. But an element type can be
reversed, which is artificially different from the type of the value
being read. This results in a server error due to the type mismatch.
Fix it by unreversing the element type prior to comparing it to the
value type.
Fixes#7902
Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes#8316
When querying an index table, we assemble clustering-column
restrictions for that query by going over the base table token,
partition columns, and clustering columns. But if one of those
columns is the indexed column, there is a problem; the indexed column
is the index table's partition key, not clustering key. We end up
with invalid clustering slice, which can cause problems downstream.
Fix this by skipping the indexed column when assembling the clustering
restrictions.
Tests: unit (dev)
Fixes#7888
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes#8320
A trichotomic comparator returning an int an easily be mistaken
for a less comparator as the return types are convertible.
Use the new std::strong_ordering instead.
A caller in cql3's update_parameters.hh is also converted, following
the path of least resistance.
Ref #1449.
Test: unit (dev)
Closes#8323
Currently send_snapshot is the only two-way RPC used by Raft.
However, the sender (the leader) does not look at the receiver's
reply, other than checks it's not an error. This has the following
issues:
- if the follower has a newer term and rejects the snapshot for
that reason, the leader will not learn about a newer follower
term and will not step down
- the send_snapshot message doesn't pass through a single-endpoint
fsm::step() and thus may not follow the general Raft rules
which apply for all messages.
- making a general purpose transport that simply calls fsm::step()
for every message becomes impossible.
Fix it by actually responding with snapshot_reply to send_snapshot
RPC, generating this reply in fsm::step() on the follower,
and feeding into fsm::step() on the leader.
* scylla-dev/raft-send-snapshot-v2:
raft: pass snapshot_reply into fsm::step()
raft: respond with snapshot_reply to send_snapshot RPC
raft: set follower's next_idx when switching to SNAPSHOT mode
raft: set the current leader upon getting InstallSnapshot
By default the boto3 library waits up to 60 second for a response,
and if got no response, it sends the same request again, multiple
times. We already noticed in the past that it retries too many times
thus slowing down failures, so in our test configuration lowered the
number of retries to 3, but the setting of 60-second-timeout plus
3 retries still causes two problems:
1. When the test machine and the build are extremely slow, and the
operation is long (usually, CreateTable or DeleteTable involving
multiple views), the 60 second timeout might not be enough.
2. If the timeout is reached, boto3 silently retries the same operation.
This retry may fail because the previous one really succeeded at
least partially! The symptom is tests which report an error when
creating a table which already exists, or deleting a table which
dooesn't exist.
The solution in this patch is first of all to never do retries - if
a query fails on internal server error, or times out, just report this
failure immediately. We don't expect to see transient errors during
local tests, so this is exactly the right behavior.
The second thing we do is to increase the default timeout. If 1 minute
was not enough, let's raise it to 5 minutes. 5 minutes should be enough
for every operation (famous last words...).
Even if 5 minutes is not enough for something, at least we'll now see
the timeout errors instead of some wierd errors caused by retrying an
operation which was already almost done.
Fixes#8135
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210222125630.1325011-1-nyh@scylladb.com>
SSTables that are off-strategy should be excluded by this function as
it's used to select candidates for regular compaction.
So in addition to only returning candidates from the main set, let's
also rename it to precisely reflect its behavior.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
From now own, _sstables becomes the compound set, and _main_sstables refer
only to the main sstables of the table. In the near future, maintenance
set will be introduced and will also be managed by the compound set.
So add_sstable() and on_compaction_completion() are changed to
explicitly insert and remove sstables from the main set.
By storing compound set in _sstables, functions which used _sstables for
creating reader, computing statistics, etc, will not have to be changed
when we introduce the maintenance set, so code change is a lot minimized
by this approach.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This new sstable set implementation is useful for combining operation of
multiple sstable sets, which can still be referenced individually via
its shared ptr reference.
It will be used when maintenance set is introduced in table, so a
compound set is required to allow both sets to have their operations
efficiently combined.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Raft send_snapshot RPC is actually two-way, the follower
responds with snapshot_reply message. This message until now
was, however, muted by RPC.
Do not mute snapshot_reply any more:
- to make it obvious the RPC is two way
- to feed the follower response directly into leader's FSM and
thus ensure that FSM testing results produced when using a test
transport are representative of the real world uses of
raft::rpc.
`reader_concurrency_semaphore::register_inactive_read()` drops the
registered inactive read immediately if there is a resource shortage.
This is in effect a resource based eviction, so account it as such in
`querier::insert()`.
Broken by the move to an intrusive container (9cbbf40), which caused
said method to only clear the container but not destroy the inactive
reads contained therein. This patch restores the previous behaviour and
also adds a call the destructor (to ensure inactive reads are cleaned up
under any circumstances), as well as a unit test.
As #1449 notes, trichotomic comparators returning int are dangerous as they
can be mistaken for less comparators. This series converts dht::ring_position
and dht::decorated_key, as well as a few closely related downstream types, to
return std::strong_ordering.
Closes#8225
* github.com:scylladb/scylla:
dht: ring_position, decorated_key: convert tri_comparators to std::strong_ordering
pager: rephrase misleading comparison check
test: total_order_checks: prepare for std::strong_ordering
test: mutation_test: prepare merge_container for std::strong_ordering
intrusive_array: prepare for std::strong_ordering
utils: collection-concepts: prepare for std::strong_ordering
Adjust the total_order_check template to work with comparators
returning either int (as a temporary compatibility measure) or
std::strong_ordering (for #1449 safety).
The function merge_container() accepts a trichotomic comparator returning
an int. As #1449 explains, this is dangerous as it could be mistaken for
a less comparator. Switch to std::strong_ordering, but leave a compatible
merge_container() in place as it is still needed (even after this series).
The slowest test in test_streams.py is test_list_streams_paged. It is meant
to test the ListStreams operation with paging. The existing test repeated
its test four times, for four different stream types. However, there is
no reason to suspect that the ListStreams operation might somehow be
different for the four stream types... We already have other tests which
create streams of the four types, and uses these streams - we don't
need the test for ListStreams to also test creating the four types.
By doing this test just once, not four times, we can save around 1.5
seconds of test time.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210318073755.1784349-1-nyh@scylladb.com>
In the test test_tracing.py::test_tracing_all, we do some operations and
then need to wait until they appear in the tracing table.
The current code used an exponentially-increasing delay during this wait,
starting with 0.1 seconds and then doubling the delay until we find what
we're looking for.
However, it turns out that the delay until the data appears in the table
is deliberately chosen by Scylla - and is always around 2 seconds.
In this case, an exponential delay is really bad - we will usually wait
for around 1 seconds too long after the needed wait of 2 seconds.
So in this patch we replace the exponential delay by a constant delay -
we wait 0.3 seconds between each retry.
This change makes the test test_tracing.py::test_tracing_all finish
in a little over 2 seconds, instead of a little over 3 seconds
before this patch. We cannot reduce this 2 second time any further
unless we make the 2-second tracing delay configurable.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210318000040.1782933-1-nyh@scylladb.com>
The test test_table.py::test_table_streams_on creates tables with various
stream types, and then immediately deletes them without testing anything.
This is a slow test (taking almost a full second on my laptop), and is
redundant because in test_streams.py we have tests which create tables
with streams in the same way - but then actually test that things work
with these streams. So this test might as well be removed, and this is
what we do in this patch.
Removing this test shaves another second from the Alternator test suite's
run time.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210317230530.1780849-1-nyh@scylladb.com>