Commit Graph

1467 Commits

Author SHA1 Message Date
Tomasz Grabiec
305372820d Merge "Make position_in_partition::tri_compare use strong_ordering" from Pavel Emelyanov
There are some users of that tri_comparator which are also
converted to strong_ordering. Most of the code using those
is, in turn, already handling return values interchangeably.

The bound_view::tri_compare, which's used by the guy, is
still returning int.

tests: unit(dev)

* xemul/br-position-tri-compare:
  code: Relax position_in_partition::tri_compare users
  position_in_partition: Convert tri_compare to strong_ordering
  test: Convert clustering_fragment_summary::tri_cmp to strong_ordering
  repair: Convert repair_sync_boundary::tri_compare to strong_ordering
  view: Don't expect int from position_in_partition::tri_compare
2021-04-09 17:54:38 +02:00
Pavel Emelyanov
64074f45ce code: Relax position_in_partition::tri_compare users
There are some pieces left doing res <=> 0 with the
res now being a strong_ordering itself. All these can
be just dropped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-09 18:20:39 +03:00
Pavel Emelyanov
a15f158661 test: Convert clustering_fragment_summary::tri_cmp to strong_ordering
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-09 18:20:39 +03:00
Pavel Emelyanov
4558eb3afc partition_snapshot_row_cursor: Move cells hash creation to reader
Right now call to .row() method may create hash on row's cells.
It's counterintuitive to see a const method that transparently
changes something it points to. Since the only caller of a row()
who knows whether the hash creation is required is the cache
reader, it's better to move the call to prepare_hash() into it.

Other than making the .row() less surprising this also helps to
get rid of the whole method by the next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-09 12:18:29 +03:00
Pavel Emelyanov
00caf5f219 partition_snapshot_row_cursor: Move read_partition into test
The method in question is test-only helper, there's no
need in keeping it as a part of the API.

Another reason to move is that the method is O(number of
rows) and doesn't preempt while looping, but cursor code
users try hard not to stall the reactor. So even though
this method has a meaningful semantics within the class,
it will better be reinvented if needed in core code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-09 12:16:13 +03:00
Gleb Natapov
b9175edea4 raft: test: check that a server with id zero cannot be neither created nor added to a config
Message-Id: <20210407134853.1964226-2-gleb@scylladb.com>
2021-04-08 17:07:18 +02:00
Kamil Braun
3687757115 sstables: fix TWCS single key reader sstable filter
The filter passed to `min_position_reader_queue`, which was used by
`clustering_order_reader_merger`, would incorrectly include sstables as
soon as they passed through the PK (bloom) filter, and would include
sstables which didn't pass the PK filter (if they passed the CK
filter). Fortunately this wouldn't cause incorrect data to be returned,
but it would cause sstables to be opened unnecessarily (these sstables
would immediately return eof), resulting in a performance drop. This commit
fixes the filter and adds a regression test which uses statistics to
check how many times the CK filter was invoked.

Fixes #8432.

Closes #8433
2021-04-08 18:03:49 +03:00
Tomasz Grabiec
6d6f39a7b3 Merge "fixes for stepdown and quorum check" from Gleb
The series contains code cleanups and fixes for stepdown process
and quorum check code.

Note this is re-send of already posted patches lumped together for
convenience.

* scylla-dev/raft-fixes-v1:
  raft: add test for check quorum on a leader
  raft: fix quorum check code for joint config and non-voting members
  raft: do not hang on waiting for entries on a leader that was removed from a cluster
  raft: add more tracing to stepdown code
  raft: use existing election_elapsed() function instead of redo the calculation
  raft: test: add test case for stepdown process
  raft: check that a node is still the leader after initiating stepdown process
2021-04-08 15:18:52 +02:00
Avi Kivity
29a674cd94 test: perf: perf_fast_forward: report allocation rate and tasks
These are more stable than cpu consumed across runs, and impact
performance directly.

Closes #8422
2021-04-07 15:41:43 +02:00
Piotr Sarna
8e808a56d2 Merge 'commitlog: Fix race and edge condition in delete_segments' from Calle Wilund
Fixes #8363
Fixes #8376

Delete segements has two issues when running with size-limited
commit log and strict adherence to said limit.

1.) It uses parallel processing, with deferral. This means that
    the disk usage variables it looks at might not be fully valid
    - i.e. we might have already issued a file delete that will
    reduce disk footprint such that a segment could instead be
    recycled, but since vars are (and should) only updated
    _post_ delete, we don't know.
2.) It does not take into account edge conditions, when we only
    delete a single segment, and this segment is the border segment
    - i.e. the one pushing us over the limit, yet allocation is
    desperately waiting for recycling. In this case we should
    allow it to live on, and assume that next delete will reduce
    footprint. Note: to ensure exact size limit, make sure
    total size is a multiple of segment size.

if we had an error in recycling (disk rename?), and no elements
are available, we could have waiters hoping they will get segements.
abort the queue (not permanent, but wakes up waiters), and let them
retry. Since we did deletions instead, disk footprint should allow
for new allocs at least. Or more likely, everything is broken, but
we will at least make more noise.

Closes #8372

* github.com:scylladb/scylla:
  commitlog: Add signalling to recycle queue iff we fail to recycle
  commitlog: Fix race and edge condition in delete_segments
  commitlog: coroutinize delete_segments
  commitlog_test: Add test for deadlock in recycle waiter
2021-04-07 15:13:25 +02:00
Raphael S. Carvalho
8e0a1ca866 sstable_set: Implement compound_sstable_set's create_single_key_sstable_reader()
compound set isn't overriding create_single_key_sstable_reader(), so
default implementation is always called. Although default impl will
provide correct behavior, specialized ones which provides better perf,
which currently is only available for TWCS, were being ignored.

compound set impl of single key reader will basically combine single key
readers of all sets managed by it.

Fixes #8415.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210406205009.75020-1-raphaelsc@scylladb.com>
2021-04-07 12:36:30 +03:00
Nadav Har'El
da11cd99f7 Merge 'Add a (failing) test for picking secondary indexes in order' from Piotr Sarna
Currently the heuristics for picking an index for a query
are not very well defined. It would be best if we used
statistics to pick the index which is likely to perform
the fastest, but for starters we should at least let the user
decide which index to pick by picking the first one by the
order of restrictions passed to the query.
The (failing) test case from this patch shows the expected
results.

Ref: #7969

Closes #8414

* github.com:scylladb/scylla:
  cql-pytest: add a failing test for index picking order
  cql3: add tracing used secondary index
2021-04-07 11:40:37 +03:00
Piotr Sarna
1f7b972db7 cql-pytest: add a failing test for index picking order
Currently the heuristics for picking an index for a query
are not very well defined. It would be best if we used
statistics to pick the index which is likely to perform
the fastest, but for starters we should at least let the user
decide which index to pick by picking the first one by the
order of restrictions passed to the query.
The (failing) test case from this patch shows the expected
results.

Ref: #7969
2021-04-07 10:05:00 +02:00
Gleb Natapov
68d73bd4c8 raft: add test for check quorum on a leader 2021-04-07 10:15:33 +03:00
Gleb Natapov
bdb59307d3 raft: test: add test case for stepdown process
Add the test for the case where C_new entry is not the last one in a
leader that is been removed from a cluster. In this case a leader will
continue replication even after committing C_new and will start stepdown
process later, when at least one follower is fully synchronized.
2021-04-07 10:15:33 +03:00
Calle Wilund
813694b617 commitlog_test: Add test for deadlock in recycle waiter
Not a very good test, mind you. Nothing to verify, just see if
the test times out. But try to make it at least complete for
failure report.
2021-04-06 16:38:14 +00:00
Tomasz Grabiec
4b10247a4f Merge "raft: do not assert when receiving unexpected messages in a leader state" from Gleb
* scylla-dev/raft-cleanup-v2:
  raft: test: add test that leader behaves as expected when it gets unexpended messages
  raft: do not assert when receiving unexpected messages in a leader state
  raft: use existing function to check if election timeout elapsed
2021-04-06 16:52:23 +02:00
Konstantin Osipov
c83cf1f965 uuid: switch the API to use std::chrono
A follow up for the patch for #7611. This change was requested
during review and moved out of #7611 to reduce its scope.

The patch switches UUID_gen API from using plain integers to
hold time units to units from std::chrono.

For one, we plan to switch the entire code base to std::chrono units,
to ensure type safety. Secondly, using std::chrono units allows to
increase code reuse with template metaprogramming and remove a few
of UUID_gen functions that beceme redundant as a result.

* switch  get_time_UUID(), unix_timestamp(), get_time_UUID_raw(), switch
  min_time_UUID(), max_time_UUID(), create_time_safe() to
  std::chrono
* remove unused variant of from_unix_timestamp()
* remove unused get_time_UUID_bytes(), create_time_unsafe(),
  redundant get_adjusted_timestamp()
* inline get_raw_UUID_bytes()
* collapse to similar implementations of get_time_UUID()
* switch internal constants to std::chrono
* remove unnecessary unique_ptr from UUID_gen::_instance
Message-Id: <20210406130152.3237914-2-kostja@scylladb.com>
2021-04-06 17:12:54 +03:00
Nadav Har'El
0d0db05cf3 test/alternator: speed up two slow xfailing tests
By far the two slowest Alternator tests when running a development build on
my laptop are
	test_gsi.py::test_gsi_projection_include
and
	test_gsi.py::test_gsi_projection_keys_only
Each of those takes around 3.2, and the sum of just these two tests is as
much as 10% (!) of all other 600 tests.

The reason why these tests are slow is that they check scanning a GSI
with *projection*. Scylla currently ignores the projection, so the scan
returns the wrong value. Because this is a GSI, which supports only
eventually- consistent reads, we need to retry the read - and did it for
up to 3 seconds!

But this retry only makes sense if the GSI read did not *yet* return
the expected data. But in these xfailing test, we read a *wrong* item
(with too many attributes) almost immediately, and this should indicate
an immediate failure - no amount of retry would help. So in this patch
we detect this case and fail the test immediately instead of wasting
3 seconds in retries.

On my laptop with dev build, this patch reduces the time to run the
entire Alternator test suite from 70 seconds to 63 seconds.

Also, now that we never just waste time until the timeout, we can
increase it to any number, and in this patch we increase it from 3
seconds to 5.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210317183918.1775383-1-nyh@scylladb.com>
2021-04-06 14:49:15 +02:00
Nadav Har'El
15cab90f7b test/alternator: switch some fixture scopes from "session" to "module"
In conftest.py we have several fixtures creating shared tables which many
test files can share, so they are marked with the "session" scope - all
the tests in the testing session may share the same instance. This is fine.

Some of test files have additional fixtures for creating special tables
needed only in those files. Those were also, unnecessarily, marked
"session" scope as well. This means that these temporary tables are
only deleted at the very end of test suite, event though they can be
deleted at the end of the test file which needed them. This is exactly
what the "module" fixture scope is, so this patch changes all the
fixtures private to one test file to be "module".

After this patch, the teardown of the last test in the suite goes down
from 4 seconds to just 1.5 seconds (it's still long because there are
still plenty of session-scoped fixtures in conftest.py).

Another small benefit is that the peak disk usage of the test suite is
lower, because some of the temporary tables are deleted sooner.

This patch does not change any test functionality, and also does not
make any test faster - it just changes the order of the fixture
teardowns.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210317175036.1773774-1-nyh@scylladb.com>
2021-04-06 14:43:36 +02:00
Avi Kivity
40b60e8f09 Merge 'repair: Switch to use NODE_OPS_CMD for replace operation' from Asias He
In commit c82250e0cf (gossip: Allow deferring
advertise of local node to be up), the replacing node is changed to postpone
the responding of gossip echo message to avoid other nodes sending read
requests to the replacing node. It works as following:

1) replacing node does not respond echo message to avoid other nodes to
   mark replacing node as alive

2) replacing node advertises hibernate state so other nodes knows
   replacing node is replacing

3) replacing node responds echo message so other nodes can mark
   replacing node as alive

This is problematic because after step 2, the existing nodes in the
cluster will start to send writes to the replacing node, but at this
time it is possible that existing nodes haven't marked the replacing
node as alive, thus failing the write request unnecessarily.

For instance, we saw the following errors in issue #8013 (Cassandra
stress fails to achieve consistency when only one of the nodes is down)

```
scylla:
[shard 1] consistency - Live nodes 2 do not satisfy ConsistencyLevel (2
required, 1 pending, live_endpoints={127.0.0.2, 127.0.0.1},
pending_endpoints={127.0.0.3}) [shard 0] gossip - Fail to send
EchoMessage to 127.0.0.3: std::runtime_error (Not ready to respond
gossip echo message)

c-s:
java.io.IOException: Operation x10 on key(s) [4c4f4d37324c35304c30]:
Error executing: (UnavailableException): Not enough replicas available
for query at consistency QUORUM (2 required but only 1 alive
```

To solve this problem, we can do the replacing operation in multiple stages.

One solution is to introduce a new gossip status state as proposed
here: gossip: Introduce STATUS_PREPARE_REPLACE #7416

1) replacing node does not respond echo message

2) replacing node advertises prepare_replace state (Remove replacing
   node from natural endpoint, but do not put in pending list yet)

3) replacing node responds echo message

4) replacing node advertises hibernate state (Put replacing node in
   pending list)

Since we now have the node ops verb introduced in
829b4c1438 (repair: Make removenode safe
by default), we can do the multiple stage without introducing a new
gossip status state.

This patch uses the NODE_OPS_CMD infrastructure to implement replace
operation.

Improvements:

1) It solves the race between marking replacing node alive and sending
   writes to replacing node

2) The cluster reverts to a state before the replace operation
   automatically in case of error. As a result, it solves when the
   replacing node fails in the middle of the operation, the repacing
   node will be in HIBERNATE status forever issue.

3) The gossip status of the node to be replaced is not changed until the
   replace operation is successful. HIBERNATE gossip status is not used
   anymore.

4) Users can now pass a list of dead nodes to ignore explicitly.

Fixes #8013

Closes #8330

* github.com:scylladb/scylla:
  repair: Switch to use NODE_OPS_CMD for replace operation
  gossip: Add advertise_to_nodes
  gossip: Add helper to wait for a node to be up
  gossip: Add is_normal_ring_member helper
2021-04-04 12:54:09 +03:00
Gleb Natapov
10781037f5 raft: test: add test that leader behaves as expected when it gets unexpended messages 2021-04-04 11:33:35 +03:00
Avi Kivity
4739df2cb1 Merge 'cql3: remove linearizations in the write path' from Michał Chojnowski
As a part of the effort of removing big, contiguous buffers from the codebase,
cql3::raw_value should be made fragmented. Unfortunately a straightforward
rewrite to a fragmented buffer type is not possible, because we want
cql3::raw_value to be compatible with cql3::raw_value_view, and we want that
view to be based on fragmented_temporary_buffer::view, so that it can be
used to view data coming directly from seastar without copying.

This patch makes cql3::raw_value fragmented by making cql3::raw_value_view
a `variant` of managed_bytes_view and fragmented_temporary_buffer::view.

Code users which depended on `cql3::raw_value` being `bytes`,
and cql::raw_value_view being `fragmented_temporary_buffer::view` underneath
were adjusted to the new, dual representation, mainly through the
`cql3::raw_value_view::with_value` visitor and deserialization/validation
helpers added to `cql3::raw_value_view`.

The second part of this series gets rid of linearizations occuring when processing
compound types in the CQL layer. This is achieved by storing their elements in
`managed_bytes` instead of `bytes` in the partially deserialized form (`lists::value`
`tuples::value`, etc.) outputting `managed_bytes` instead of `bytes` in functions
which go from the partially deserialized form to the atomic cell format (for frozen
types), and avoiding calling deserialize/serialize on individual elements when
it's not necessary. (It's only necessary for CQLv2, because since CQLv3 the format
on the wire is the same as our internal one).

The above also forces some changes to `expression.cc`, and `restrictions`, mainly because
`IN` clauses store their arguments as `lists` and `tuples`, and the code which handled
this clause expected `bytes`.

After this series, the path from prepared CQL statements to `atomic_cell_or_collection`
is almost completely linearization-free. The last remaining place is `collection_mutation_description`,
where map keys are linearized to `bytes`.

Closes #8160

* github.com:scylladb/scylla:
  cql3: update_parameters: remove unused version of make_cell for bytes_view
  types: collection: remove an unused version of pack_fragmented
  cql3: optimize the deserialization of collections
  cql3: maps, sets: switch the element type from bytes to managed_bytes
  cql3: expression: use managed_bytes instead of bytes where possible
  cql3: expr: expression: make the argument of to_range a forwarding reference
  cql3: don't linearize elements of lists, tuples, and user types
  cql3: values: add const managed_bytes& constructor to raw_value_view
  cql3: output managed_bytes instead of bytes in get_with_protocol_version
  types: collection: add versions of pack for fragmented buffers
  types: add write_collection_{value,size} for managed_bytes_mutable_view
  cql3: tuples, user_types: avoid linearization in from_serialized() and get()
  types: tuple: add build_value_fragmented
  cql3: update_parameters: add make_cell version for managed_bytes_view
  cql3: remove operation::make_*cell
  cql3: values: make raw_value fragmented
  cql3: values: remove raw_value_view::operator==
  cql3: switch users of cql3::raw_value_view to internals-independent API
  cql3: values: add an internals-independent API to raw_value_view
  utils: managed_bytes: add a managed_bytes constructor from FragmentedView
  utils: managed_bytes: add operator<< and to_hex for managed_bytes
  utils: fragment_range: add to_hex
  configure: remove unused link dependencies from UUID_test
2021-04-01 15:21:32 +03:00
Pavel Emelyanov
8bbe2eae5e btree: Convert comparator to <=>
It turned out that all the users of btree can already be converted
to use safer std::strong_ordering. The only meaningful change here
is the btree code itself -- no more ints there.

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210330153648.27049-1-xemul@scylladb.com>
2021-04-01 12:56:08 +03:00
Michał Chojnowski
5984d6b2ce cql3: values: remove raw_value_view::operator==
It's only used in a single test, and there is no reason why it should ever
be used anywhere else. So let's remove it from the public header and move
it to that test.
2021-04-01 10:42:07 +02:00
Michał Chojnowski
b9322a6b71 cql3: switch users of cql3::raw_value_view to internals-independent API
We want to change the internals of cql3::raw_value{_view}.
However, users of cql3::raw_value and cql3::raw_value_view often
use them by extracting the internal representation, which will be different
after the planned change.

This commit prepares us for the change by making all accesses to the value
inside cql3::raw_value(_view) be done through helper methods which don't expose
the internal representation publicly.

After this commit we are free to change the internal representation of
raw_value_{view} without messing up their users.
2021-04-01 10:42:04 +02:00
Michał Chojnowski
4715268e30 utils: managed_bytes: add operator<< and to_hex for managed_bytes
We will need them to replace bytes with managed_bytes in some places in an
upcoming patch.

The change to configure.py is necessary because opearator<< links to to_hex
in bytes.cc.
2021-04-01 10:39:42 +02:00
Asias He
bdb95233e8 gossip: Add advertise_to_nodes
gossiper::advertise_to_nodes() is added to allow respond to gossip echo
message with specified nodes and the current gossip generation number
for the nodes.

This is helpful to avoid the restarted node to be marked as alive during
a pending replace operation.

After this patch, when a node sends a echo message, the gossip
generation number is sent in the echo message. Since the generation
number changes after a restart, the receiver of the echo message can
compare the generation number to tell if the node has restarted.

Refs #8013
2021-04-01 09:38:54 +08:00
Piotr Jastrzebski
57c7964d6c config: ignore enable_sstables_mc_format flag
Don't allow users to disable MC sstables format any more.
We would like to retire some old cluster features that has been around
for years. Namely MC_SSTABLE and UNBOUNDED_RANGE_TOMBSTONES. To do this
we first have to make sure that all existing clusters have them enabled.
It is impossible to know that unless we stop supporting
enable_sstables_mc_format flag.

Test: unit(dev)

Refs #8352

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>

Closes #8360
2021-03-31 12:23:59 +03:00
Avi Kivity
d2921b5112 Merge 'Clean up > 2-year-old features' from Piotr Sarna
Following the work started in 253a7640e, a new batch of old features is assumed to be always available. They are all still announced via gossip, but the code assumes that the feature is always true, because we only support upgrades from a previous release, and the release window is considerably smaller than 2 years.

Features picked this time via `git blame`, along with the date of their introduction:

* fe4afb1aa3 (Asias He                  2018-09-05 14:52:10 +0800  109) static const sstring ROW_LEVEL_REPAIR = "ROW_LEVEL_REPAIR";
* ff5e541335 (Calle Wilund              2019-02-05 13:06:07 +0000  110) static const sstring TRUNCATION_TABLE = "TRUNCATION_TABLE";
* fefef7b9eb (Tomasz Grabiec            2019-03-05 19:08:07 +0100  111) static const sstring CORRECT_STATIC_COMPACT_IN_MC = "CORRECT_STATIC_COMPACT_IN_MC";

Tests: unit(dev)

Closes #8235

* github.com:scylladb/scylla:
  sstables,test: remove variables depending on old features
  gms: make CORRECT_STATIC_COMPACT_IN_MC ft unconditionally true
  sstables: stop relying on CORRECT_STATIC_COMPACT_IN_MC feature
  gms: make TRUNCATION_TABLE feature unconditionally true
  gms: make ROW_LEVEL_REPAIR feature unconditionally true
  repair: stop relying on ROW_LEVEL_REPAIR feature
2021-03-30 16:13:35 +03:00
Avi Kivity
8785dd62cb tests: use kernel page cache
Tests are short-lived and use a small amount of data. They
are also often run repeatly, and the data is deleted immediately
after the test. This is a good scenario for using the kernel page
cache, as it can cache read-only data from test to test, and avoid
spilling write data to disk if it is deleted quickly.

Acknowledge this by using the new --kernel-page-cache option for
tests.

This is expected to help on large machines, where the disk can be
overloaded. Smaller machines with NVMe disks probably will not see
a difference.

Closes #8347
2021-03-30 12:04:55 +02:00
Piotr Sarna
6de2691bbd sstables,test: remove variables depending on old features
In order to maintain backward compatibility wrt. cluster features,
two boolean variables were kept in sstable writers:
 - correctly_serialize_non_compound_range_tombstones
 - correctly_serialize_static_compact_in_mc

Since these features are assumed to always be present now,
the above variables are no longer needed and can be purged.
2021-03-30 09:37:41 +02:00
Botond Dénes
3c54c990ab test: view_build_test: test_view_update_generator_buffering: fail gracefully
Failures in this test typically happen inside the test consumer object.
These however don't stop the test as the code invoking the consumer
object handles exceptions coming from it. So the test will run to
completion and will fail again when comparing the produced output with
the expected one. This results in distracting failures. The real problem
is not the difference in the output, but the first check that failed,
which is however buried in the noise. To prevent this add an "ok" flag
which is set to false if the consumer fails. In this case the additional
checks are skipped in the end to not generate useless noise.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210326083147.26113-2-bdenes@scylladb.com>
2021-03-29 17:58:28 +03:00
Avi Kivity
a8463cfb37 Merge "reader_permit: signal leaked resources" from Botond
"
When a permit is destroyed we check if it still holds on to any
resources in the destructor. Any resources the permit still holds on are
leaked resources, as users should have released these. Currently we just
invoke `on_internal_error_noexcept()` to handle this, which -- depending
on the configuration -- will result in an error message or an assert. In
the former case, the resources will be leaked for good. This mini-series
fixes this, by signaling back these resources to the semaphore. This
helps avoid an eventual complete dry-up of all semaphore resources and a
subsequent complete shutdown of reads.

Tests: unit(release, debug)
"

* 'reader-permit-signal-leaked-resources/v1' of https://github.com/denesb/scylla:
  reader_permit: signal leaked resources
  test: test_reader_lifecycle_policy: keep semaphores alive until all ops cease
  sstables: generate_summary(): extend the lifecycle of the reader concurrency semaphore
2021-03-29 17:57:31 +03:00
Botond Dénes
9e01c4c667 test: view_build_test: test_view_update_generator_buffering: use separate permit for readers
Said test has two separate logical readers, but they share the same
permit, which is illegal. This didn't cause any problems yet, but soon
the semaphore will start to keep score of active/inactive permits which
will be confused by such sharing, so have them use separate permits.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210326083147.26113-1-bdenes@scylladb.com>
2021-03-29 17:35:51 +03:00
Piotr Sarna
bc1c92fd05 Merge 'Improve flat_mutation_reader::consume_pausable' from Piotr Jastrzębski
`flat_mutation_reader::consume_pausable` is widely used in Scylla. Some places
worth mentioning are memtables and combined readers but there are others as
well.

This patchset improves `consume_pausable` in three ways:
1. it removes unnecessary allocation
2. it rearranges ifs to not check the same thing twice
3. for a consumer that returns plain stop_iteration not a future<stop_iteration>
   it reduces the amount of future usage

Test: unit(dev, release, debug)

Combined reader microbenchmark has shown from 2% to 22% improvement in median
execution time while memtable microbenchmark has shown from 3.6% to 7.8%
improvement in median execution time.

Before the change:
```
./build/release/test/perf/perf_mutation_readers --random-seed 3549335083
single run iterations:    0
single run duration:      1.000s
number of runs:           5
number of cores:          16
random seed:              3549335083

test                                      iterations      median         mad         min         max
combined.one_row                             1316234   140.120ns     0.020ns   140.074ns   140.141ns
combined.single_active                          7332    91.484us    31.890ns    91.453us    91.778us
combined.many_overlapping                        945   870.973us   429.720ns   868.625us   871.403us
combined.disjoint_interleaved                   7102    85.989us     7.847ns    85.973us    85.997us
combined.disjoint_ranges                        7129    85.570us     7.840ns    85.562us    85.596us
combined.overlapping_partitions_disjoint_rows        5458   124.787us    56.738ns   124.731us   125.370us
clustering_combined.ranges_generic           1920688   217.940ns     0.184ns   217.742ns   218.275ns
clustering_combined.ranges_specialized       1935318   194.610ns     0.199ns   194.210ns   195.228ns
memtable.one_partition_one_row                624001     1.600us     1.405ns     1.599us     1.605us
memtable.one_partition_many_rows               79551    12.555us     1.829ns    12.549us    12.558us
memtable.many_partitions_one_row               40557    24.748us    77.083ns    24.644us    25.135us
memtable.many_partitions_many_rows              3220   310.429us    57.628ns   310.295us   311.189us
```

After the change:
```
./build/release/test/perf/perf_mutation_readers --random-seed 3549335083
single run iterations:    0
single run duration:      1.000s
number of runs:           5
number of cores:          16
random seed:              3549335083

test                                      iterations      median         mad         min         max
combined.one_row                             1358839   109.222ns     0.122ns   109.089ns   109.348ns
combined.single_active                          7525    87.305us    25.540ns    87.273us    87.362us
combined.many_overlapping                        962   853.195us     1.904us   851.244us   855.142us
combined.disjoint_interleaved                   7310    81.988us    28.877ns    81.949us    82.032us
combined.disjoint_ranges                        7315    81.699us    37.144ns    81.662us    81.874us
combined.overlapping_partitions_disjoint_rows        5591   120.964us    15.294ns   120.949us   121.120us
clustering_combined.ranges_generic           1954722   211.993ns     0.052ns   211.883ns   212.084ns
clustering_combined.ranges_specialized       2042194   187.807ns     0.066ns   187.732ns   188.289ns
memtable.one_partition_one_row                648701     1.542us     0.339ns     1.542us     1.543us
memtable.one_partition_many_rows               85007    11.759us     1.168ns    11.752us    11.782us
memtable.many_partitions_one_row               43893    22.805us    17.147ns    22.782us    22.843us
memtable.many_partitions_many_rows              3441   290.220us    41.720ns   290.172us   290.306us
```

Closes #8359

* github.com:scylladb/scylla:
  flat_mutation_reader: optimize consume_pausable for some consumers
  flat_mutation_reader: special case consumers in consume_pausable
  flat_mutation_reader: Change order of checks in consume_pausable
  flat_mutation_reader: fix indentation in consume_pausable
  flat_mutation_reader: Remove allocation in consume_pausable
  perf: Add benchmarks for large partitions
2021-03-29 13:06:56 +02:00
Piotr Jastrzebski
3aa7bee5e3 perf: Add benchmarks for large partitions
in perf_mutation_readers.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2021-03-29 09:48:11 +02:00
Pavel Solodovnikov
7c229998e8 raft: unit-tests for raft_address_map
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-03-26 20:22:44 +03:00
Botond Dénes
0f1a72ba59 test: test_reader_lifecycle_policy: keep semaphores alive until all ops cease
To ensure the semaphores outlive all permits created as part of the
tests.
2021-03-26 14:22:43 +02:00
Tomasz Grabiec
ef06a939c4 Merge "raft: seven etcd unit tests ported" from Alejo
Seven etcd unit tests as boost tests.

* alejo/raft-tests-etcd-08-v4-communicate-v5:
  raft: etcd unit tests: test proposal handling scenarios
  raft: etcd unit tests: test old messages ignored
  raft: etcd unit tests: test single node precandidate
  raft: etcd unit tests: test dueling precandidates
  raft: etcd unit tests: test dueling candidates
  raft: etcd unit tests: test cannot commit without new term
  raft: etcd unit tests: test single node commit
  raft: etcd unit tests: update test_leader_election_overwrite_newer_logs
  raft: etcd unit tests: fix test_progress_leader
  raft: testing: log comparison helper functions
  raft: testing: helper to make fsm candidate
  raft: testing: expose log for test verification
  raft: testing: use server_address_set
  raft: testing: add prevote configuration
  raft: testing: make become_follower() available for tests
2021-03-25 20:27:07 +01:00
Alejo Sanchez
ace0ee514f raft: etcd unit tests: test proposal handling scenarios
TestProposal
For multiple scenarios, check proposal handling.

Note, instead of expecting an explicit result for each specified case,
the test automatically checks for expected behavior when quorum is
reached or not.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-03-25 15:04:29 -04:00
Alejo Sanchez
77163ea76a raft: etcd unit tests: test old messages ignored
TestOldMessages
Checks an append request from a leader from a previous term is ignored.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-03-25 15:04:29 -04:00
Alejo Sanchez
bf65b19803 raft: etcd unit tests: test single node precandidate
TestSingleNodePreCandidate
Checks a single node configuration with precandidate on works to
automatically elect the node.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-03-25 15:04:29 -04:00
Alejo Sanchez
de7051467b raft: etcd unit tests: test dueling precandidates
TestDuelingPreCandidates
In a configuration of 3 nodes, two nodes don't see each other and they
compete for leadership. Loser (3) should revert to follower when prevote
is rejected and revert to term 1.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-03-25 15:04:29 -04:00
Alejo Sanchez
aa7d23f86b raft: etcd unit tests: test dueling candidates
TestDuelingCandidates
In a configuration of 3 nodes, two nodes don't see each other and they
compete for leadership. Once reconnected, loser should not disrupt.

But note it will remain candidate with current algorithm without
prevoting and other fsms will not bump term.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-03-25 15:04:29 -04:00
Alejo Sanchez
1eac94e7d6 raft: etcd unit tests: test cannot commit without new term
TestCannotCommitWithoutNewTermEntry tests the entries cannot be
committed when leader changes, no new proposal comes in and ChangeTerm
proposal is filtered.

NOTE: this doesn't check committed but it's implicit for next round;
      this could also use communicate() providing committed output map

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-03-25 15:04:29 -04:00
Alejo Sanchez
b421fe3605 raft: etcd unit tests: test single node commit
Port etcd TestSingleNodeCommit

In a single node configuration elect the node, add 2 entries and check
number of committed entries.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-03-25 15:04:29 -04:00
Alejo Sanchez
9b4538476b raft: etcd unit tests: update test_leader_election_overwrite_newer_logs
Make test_leader_election_overwrite_newer_logs use newer communicate()
and other new helpers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-03-25 15:04:29 -04:00
Alejo Sanchez
368eec1190 raft: etcd unit tests: fix test_progress_leader
Make implementation follow closer to original test.
Use newer boost test helpers.

NOTE: in etcd it seems a leader's self progress is in PIPELINE state.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-03-25 15:04:28 -04:00
Alejo Sanchez
ba29970e29 raft: testing: log comparison helper functions
Two helper functions to compare logs. For now only index, term, and data
type are used. Data content comparison does not seem to be necessary for now.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-03-25 15:04:28 -04:00