Commit Graph

1578 Commits

Author SHA1 Message Date
Gleb Natapov
d0ebd79deb raft: test: return error from rpc module if nodes are disconnected
Returning an error when nodes are disconnected closer resembles what
will happen in real networking.
2021-05-06 11:34:31 +03:00
Gleb Natapov
745f63991f raft: test: fix c&p error in a test
Message-Id: <YJKBOwBX8hqHLxsB@scylladb.com>
2021-05-05 17:18:49 +02:00
Botond Dénes
992819b188 database: add get_unlimited_query_max_result_size()
Similar to the already existing get_reader_concurrency_semaphore(),
this method determines the appropriate max result size for the query
class, which is deduced from the current scheduling group. This method
shares its scheduling group -> query class association mechanism with
the above mentioned semaphore getter.
2021-05-05 13:30:42 +03:00
Tomasz Grabiec
121eb32679 Merge 'test: perf: report instructions retired per operations' from Avi Kivity
Instructions retired per op is a much more stable than time per op
(inverse throughput) since it isn't much affected by changes in
CPU frequencey or other load on the test system (it's still somewhat
affected since a slower system will run more reactor polls per op).
It's also less indicative of real performance, since it's possible for
fewer inststructions to execute in more time than more instructions,
but that isn't an issue for comparative tests).

This allows incremental changes to the code base to be compared with
more confidence.

Current results are around 55k instructions per read, and 52k for writes.

Closes #8563

* github.com:scylladb/scylla:
  test: perf: tidy up executor_stats snapshot computation
  test: perf: report instructions retired per operations
  test: perf: add RAII wrapper around Linux perf_event_open()
  test: perf: make executor_stats_snapshot() a member function of executor
2021-05-05 00:54:08 +02:00
Tomasz Grabiec
b8665c459d Merge "raft: replication test updates" from Alejo
Cleanups, fixes, and configuration change support for replication tests.

* alejo/raft-tests-replication-01-fixes-v13:
  raft: replication test: remove obsolete helper
  raft: replication test: add_entry with retries
  raft: replication test: support config change
  raft: replication test: add dummy command support
  raft: replication test: test both with and without prevote
  raft: replication test: make initial leader just default
  raft: replication test: create command helper
  raft: replication test: free elections as helper
  raft: replication test: fix election connectivity
  raft: replication test: fix custom election
  raft: replication test: add helpers for threshold and election
  raft: replication test: connectivity improvement
  raft: replication test: helper for server_address
  raft: replication test: use wait_log()
  raft: replication test: cycle leader more
  raft: replication test: fix a test description
  raft: replication test: remove multiple state machines
  raft: replication test: remove checksum
  raft: replication test: remove unused class param
2021-05-04 18:52:47 +02:00
Alejo Sanchez
27ad2a0f28 raft: replication test: remove obsolete helper
As we are now serially adding commands with consecutive integers there
is no need to build vectors of commands. Remove helper.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-04 11:01:07 -04:00
Alejo Sanchez
0a54fd848b raft: replication test: add_entry with retries
The current leader might have stepped down. Try again and learn if
there's a new leader.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-04 11:00:46 -04:00
Nadav Har'El
df65d09e08 Merge ' cdc: log: fill cdc$deleted_ columns in pre-images ' from Piotr Grabowski
Before this change, `cdc$deleted_` columns were all `NULL` in pre-images. Lack of such information made it hard to correctly interpret the pre-image rows, for example:

```
INSERT INTO tbl(pk, ck, v, v2) VALUES (1, 1, null, 1);
INSERT INTO tbl(pk, ck, v2) VALUES (1, 1, 1);
```

For this example, pre-image generated for the second operation would look like this (in both `true` and `full` pre-image mode):

```
pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1
```

`v=NULL` has two meanings:
1. If pre-image was in `true` mode, `v=NULL` describes that v was not affected (affected columns: pk, ck, v2).
2. If pre-image was in `full` mode, `v=NULL` describes that v was equal to `NULL` in the pre-image.

Therefore, to properly decode pre-images you would need to know in which mode pre-image was configured on the CDC-enabled table at the moment this CDC log row was inserted. There is no way to determine such information (you can only check a current mode of pre-image).

A solution to this problem is to fill in the `cdc$deleted_` columns for pre-images. After this PR, for the `INSERT` described above, CDC now generates the following log row:

If in pre-image 'true' mode:
```
pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1
```

If in pre-image 'full' mode:
```
pk=1, ck=1, v=NULL, cdc$deleted_v=true, v2=1
```

A client library now can properly decode a pre-image row. If it sees a `NULL` value, it can now check the `cdc$deleted_` column to determine if this `NULL` value was a part of pre-image or it was omitted due to not being an affected column in the delta operation.

No such change is necessary for the post-image rows, as those images are always generated in the `full` mode.

Additional example:
Additional example of trouble decoding pre-images before this change.
tbl2 - `true` pre-image mode, tbl3 - `full` pre-image mode:

```
INSERT INTO tbl2(pk, ck, v, v2) VALUES (1, 1, 5, 1);
INSERT INTO tbl3(pk, ck, v, v2) VALUES (1, 1, null, 1);
```

```
INSERT INTO tbl2(pk, ck, v2) VALUES (1, 1, 1);
```
generated pre-image:
```
pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1
```

```
INSERT INTO tbl3(pk, ck, v2) VALUES (1, 1, 1);
```

generated pre-image:
```
pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1
```

Both pre-images look the same, but:
1. `v=NULL` in tbl2 describes v being omitted from the pre-image.
2. `v=NULL` in tbl3 described v being `NULL` in the pre-image.

Closes #8568

* github.com:scylladb/scylla:
  cdc: log: assert post_image is always in full mode
  cdc: tests: check cdc$deleted_ columns in images
  cdc: log: fill cdc$deleted_ columns in pre-images
2021-05-04 14:45:27 +03:00
Piotr Grabowski
778fbb144f cdc: tests: check cdc$deleted_ columns in images
Add a test that checks whether the cdc$deleted_ columns are properly
filled in the pre/post-image rows.

This test checks tables with only atomic columns, tables with frozen
collections and non-frozen collections. The test is performed with
both 'true' pre-image mode and 'full' pre-image mode.
2021-05-04 12:33:15 +02:00
Calle Wilund
7e345e37e8 cql/cdc_batch_delete_postimage_test - rename test files + fix result
The tests, when added, where not named kosher (*_test), which the
runner apparently quaintly, require to pick it up (instead of the more
sensisble *.cql).

Thusly, the test was never run beyond initial creation, and also
bit-rotted slightly during behaviour changes.

Renamed and re-resulted.

Closes #8581
2021-05-04 12:47:33 +03:00
Alejo Sanchez
56e977ae69 raft: replication test: support config change
Add support for configuration change on leader.

Keep track of servers in config in test.

Add a dummy entry to confirm configuration changed. If the add fails,
because the old leader was not in the new config and stepped down, the
config is considered changed, too.

Add a test with some configuration changes.
Add a test cycling every scenario for 1 of 4 nodes removed.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
8d8af92cbb raft: replication test: add dummy command support
Use a special value as dummy entry to be ignored when seen in state
machine input.

Ignore dummy entries for count.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
4aa52be7e5 raft: replication test: test both with and without prevote
Before this change the default was prevote enabled.
With this change each test is run with and without prevote.
This duplicates the number of test cases.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
e759e492c7 raft: replication test: make initial leader just default
The test suite requires an initial leader and at the moment it's always
just 0. Make it default and simplify code.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
eb5bbcdec7 raft: replication test: create command helper
Factor out repeated code and make it available for other uses.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
eb94dd26dc raft: replication test: free elections as helper
Add a helper to run free elections and use it in partitioning.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
cb297a57df raft: replication test: fix election connectivity
If a leader was already disconnected the election of a new leader could
re-connect. Save original connectivity and restore it when done electing
new leader.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
0a5c605713 raft: replication test: fix custom election
Use the new specific connectivity to manage old leader disconnection
more specifically.

This fixes having elections where the vote of the old leader is required
for quorum. For example {A,B} and we want to switch leader.  For B to
become candidate it has to see A as down. Then A has to see B's request
for vote, and vote for A.

So to make the general case old leader needs to be first disconnected
from all nodes, make the desired node candidate, then have the old
leader connected only to the desired candidate (else, other nodes would
see the new candidate as disrupting a live leader).

Also, there might be stray messages from the former leader. These could
revert the candidate to follower. To handle this this patch retries
the process until the desired node becomes leader.

The helper function elect_me_leader() is split and renamed to
wait_until_candidate() and wait_election_done(). The former ticks until
the node is a candidate and the later waits until a candidate either
becomes a leader or reverts to follower

The existing etcd test workaround of incrementing from n=2 to n=3 nodes
is corrected back to original n=2.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
9909983e38 raft: replication test: add helpers for threshold and election
Add 2 helper functions for making nodes reach timeout threshold and to
elect a specific node.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
38526d7a2f raft: replication test: connectivity improvement
Replace simple full disconnect of a node with specific from -> to
disconnection tracking.

This will help electing new leaders.

Say there are {A,B,C} with A leader and we want to elect B.
Before this patch, we would disconnect A, run an election with just
{B,C}, and then re-connect A.

If we have {A,B} and want to elect B, this won't work as B needs 2/2+1
votes and A is disconnected. Even if we made A stepped down. This patch
corrects this shortcoming. (@gleb-cloudius)

With this patch, we can specify other followers (not the previous or
next leader) to not see the old leader, but the new and old leaders see
each other just fine. In the example {A,B,C} above we can cut A<->B
specifcally.

Also, this is closer to etcd testing and should help porting cases.

NOTE: in the current test implementation failure_detector reports
node.is_alive(other_node) if there is a connection both ways.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
f53dea432c raft: replication test: helper for server_address
A helper function to convert from local 0-based id to raft 1-based
server_address.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
294e16cf8b raft: replication test: use wait_log()
Use wait_log() helper in leftover election code.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
355c8a052f raft: replication test: cycle leader more
For ported etcd test cycle leader, cycle some more.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
5b2c9a6c94 raft: replication test: fix a test description
Fix replace_log_leaders_log_empty description comment.

Reported by @kbraun

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
bbb56e2265 raft: replication test: remove multiple state machines
Checksum was removed so undo support for multiple versions added in:

    test: add support for different state machines
    43dc5e7dc2

NOTE: as there is a test with custom total_values, expected value cannot
      be static const anymore. (line 630)

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
e77af8573b raft: replication test: remove checksum
Previously, entries were added in parallel and we needed to check if
order was broken. Using a simple checksum was better than a hash as you
could easily find the position it broke (we add consecutive numbers).

Now order of entries is forced so it's not useful. This patch removes
it.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Alejo Sanchez
9335941b49 raft: replication test: remove unused class param
persisted_snapshots is not used in state_machine class. Remove it.

Reported by @kbraun

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-05-03 07:53:35 -04:00
Pavel Solodovnikov
4c351ff260 raft: switch group_id type from uint64_t to utils::UUID
Introduce a tagged id struct for `group_id`.

Raft code would want to generate quite a lot of unique
raft groups in the future (e.g. tablets). UUID is designed
exactly for that (e.g. larger capacity than `uint64_t`, obviously,
and also has built-in procedures to generate random ids).

Also, this is a preparation to make "raft group 0" use a random
ID instead of a literal fixed `0` as a group id.

The purpose is that every scylla cluster must have a unique ID
for "raft group 0" since we don't want the nodes from some other
cluster to disrupt the current cluster. This can happen if,
for some reason, a foreign node happens to contact a node in
our cluster.

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210429170630.533596-3-pa.solodovnikov@scylladb.com>
2021-05-02 16:39:54 +03:00
Botond Dénes
26ae9555d1 test: multishard_mutation_query_test: fuzzy-test: don't consume resource up-front
The fuzzy test consumes a large chunk of resource from the semaphore
up-front to simulate a contested semaphore. This isn't an accurate
simulation, because no permit will have more than 1 units in reality.
Furthermore this can even cause a deadlock since 8aaa3a7 as now we rely
on all count units being available to make forward progress when memory
is scarce.
This patch just cuts out this part of the test, we now have a dedicated
unit test for checking a heavily contested semaphore, that does it
properly, so no need to try to fix this clumsy attempt that is just
making trouble at this point.

Refs: #8493

Tests: release(multishard_mutation_query_test:fuzzy_test)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210429084458.40406-1-bdenes@scylladb.com>
2021-04-29 11:45:53 +03:00
Avi Kivity
2b252ef9b7 test: perf: tidy up executor_stats snapshot computation
Now that executor_stats_snapshot() is a member function, we can move
the capture of _count into invocations into it, capturing all the
stats in one place.
2021-04-28 19:02:35 +03:00
Avi Kivity
863b49af03 test: perf: report instructions retired per operations
Instructions retired per op is a much more stable than time per op
(inverse throughput) since it isn't much affected by changes in
CPU frequencey or other load on the test system (it's still somewhat
affected since a slower system will run more reactor polls per op).
It's also less indicative of real performance, since it's possible for
fewer inststructions to execute in more time than more instructions,
but that isn't an issue for comparative tests).

This allows incremental changes to the code base to be compared with
more confidence.
2021-04-28 18:46:55 +03:00
Avi Kivity
0bc98caf3e test: perf: add RAII wrapper around Linux perf_event_open()
Make it easy to embed in other classes.

A helper function is provided for the instructions retired counter.
2021-04-28 18:41:02 +03:00
Avi Kivity
498e6b9a64 test: perf: make executor_stats_snapshot() a member function of executor
I'd like to add an instructions counter which isn't accessible via
a global, so make the snapshot function a member. Out of respect to #1,
define functions for getting the number of allocations and tasks processed,
as they need heavy header files.
2021-04-28 18:38:35 +03:00
Avi Kivity
a43e896396 test: perf: don't truncate allocation/req and tasks/req report
I used {:.0} to truncate to integer, but apparently that resulted
in only one significant digit in the report, so 93.1 was reported as
90. Use the {:5.1f} to avoid truncation, and even get an extra
digit (we can have fractional tasks/op due to batching).

Current result is 93.1 allocs/op, 20.1 tasks/op (which suggests
batch size of around 10).

Closes #8550
2021-04-28 12:50:13 +02:00
Nadav Har'El
7d2df8a9bc test/alternator,cql-pytest: fix resource leak on failure
In the alternator and cql-pytest test frameworks, we have some convenient
contextmanager-based functions that allows us to create a temporary
resource (e.g., a table) that will be automatically deleted, for
example:

    with create_stream_test_table(...) as table:
        test_something(table)

However, our implementation of these functions wasn't safe. We had
code looking like:

    table = ...
    yield table
    table.delete()

The thinking was that the cleanup part (the table.delete()) will be
called after the user's code. However, if the user's code threw
(i.e., a failed assertion), the cleanup wasn't called... When the user's
code throws, it looks as if the "yield" throws. So the correct code
should look like:

    table = ...
    try:
        yield table
    finally:
        table.delete()

Python's contextmanager documentation indeed gives this idiom in its
example.

This patch fixes all contextmanager implementations in our tests to do
the cleanup even if the user's "with" block throws.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210428083748.552203-1-nyh@scylladb.com>
2021-04-28 10:51:02 +02:00
Nadav Har'El
f50db50d10 test/cql-pytest: test for "WHERE v=NULL" in restrictions
Issues #4476 and #8489, and also Cassandra's CASSANDRA-10715, all request
that filtering with "WHERE v=NULL" should return the rows where the column
v is unset. However, we made a deliberate decision to do something else:
That "WHERE v=NULL" should match no row. Exactly like it does in SQL.
This is what this test verifies - that "WHERE v=NULL" never matches any
row - not even rows where "v" is unset.

This test is expected to fail on Cassandra (so marked cassandra_bug),
because in Cassandra the "WHERE v=NULL" restriction is forbidden,
instead of succeeding and returning nothing.

Although we differ here from Cassandra, after a lot of deliberation we
decided that Scylla's behavior is the correct one, so this test verifies
it.

Refs #4776.
Refs #8489.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210426183145.323301-1-nyh@scylladb.com>
2021-04-27 09:26:33 +03:00
Kamil Braun
4c95277619 raft: fsm: fix assertion failure on stray rejects
When probes are sent over a slow network, the leader would send
multiple probes to a lagging follower before it would get a
reject response to the first probe back. After getting a reject, the
leader will be able to correctly position `next_idx` for that
follower and switch to pipeline mode. Then, an out of order reject
to a now irrelevant probe could crash the leader, since it would
effectively request it to "rewind" its `match_idx` for that
follower, and the code asserts this never happens.

We fix the problem by strengthening `is_stray_reject`. The check that
was previously only made in `PIPELINE` case
(`rejected.non_matching_idx <= match_idx`) is now always performed and
we add a new check: `rejected.last_idx < match_idx`. We also strengthen
the assert.

The commit improves the documentation by explaining that
`is_stray_reject` may return false negatives.  We also precisely state
the preconditions and postconditions of `is_stray_reject`, give a more
precise definition of `progress.match_idx`, argue how the
postconditions of `is_stray_reject` follow from its preconditions
and Raft invariants, and argue why the (strengthened) assert
must always pass.
Message-Id: <20210423173117.32939-1-kbraun@scylladb.com>
2021-04-27 01:07:22 +02:00
Nadav Har'El
f17de6ca45 test/cql-pytest: test that "!=" not supported in WHERE
Our documentation of SELECT https://docs.scylladb.com//getting-started/dml
suggests that like a "=" operator exists, there is also a "!=" operator.

However, this is not true: The != operator (which is recognized by the
parser) is not allowed in WHERE clauses. This test verifies that this
is indeed the case - neither Cassandra nor Scylla allow this operator
in WHERE clauses.

Refs https://github.com/scylladb/scylla-doc-issues/issues/732

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210426165511.318066-1-nyh@scylladb.com>
2021-04-26 20:23:21 +03:00
Piotr Sarna
8aaa3a7bb8 Merge 'reader_permit: always forward resources to the semaphore' from Botond
This series is a conceptual revert of 4c8ab10, which turned out to be a
misguided defense mechanism that proved to be a hotbed for bugs. This
protection was superseded by 0fe75571d9 which guarantees forward
progress at all times without all the gotchas and bad interactions
introduced by 4c8ab10.
The latest instance of bad interaction that triggered this series is a
case of resource units being leaked when a previously evicted reader is
re-admitted, leaking already owned resources on each re-admission.
To prove that neither the resource leak, nor the deadlock 4c8ab10 was
supposed to guard against exists after this series, it includes two unit
tests stressing the respective areas: readmission and admission on a
highly contested semaphore.

Fixes: #8493

Also on: https://github.com/denesb/scylla.git
reader-permit-resource-leak-v2

Changelog

v2:
* Rebase over the recently merged reader close series. Fix merge
  conflicts and an exposed bug.

* 'reader-permit-resource-leak-v2' of https://github.com/denesb/scylla:
  test: mutation_reader_test: add test_reader_concurrency_semaphore_forward_progress
  test: mutation_reader_test: add test_reader_concurrency_semaphore_readmission_preserves_units
  reader_concurrency_semaphore: add dump_diagnostics()
  reader_permit: always forward resources
  reader_concurrency_semaphore: inactive_read_handle: abandon(): close reader
2021-04-26 16:30:18 +02:00
Botond Dénes
45d580f056 test: mutation_reader_test: add test_reader_concurrency_semaphore_forward_progress
This unit test checks that the semaphore doesn't get into a deadlock
when contended, in the presence of many memory-only reads (that don't
wait for admission). This is tested by simulating the 3 kind of reads we
currently have in the system:
* memory-only: reads that don't pass admission and only own memory.
* admitted: reads that pass admission.
* evictable: admitted reads that are furthermore evictable.

The test creates and runs a large number of these reads in parallel,
read kinds being selected randomly, then creates a watchdog which
kills the test if no progress is being made.
2021-04-26 15:57:17 +03:00
Botond Dénes
cadc26de38 test: mutation_reader_test: add test_reader_concurrency_semaphore_readmission_preserves_units
This unit test passes a read through admission again-and-again, just
like an evictable reader would be during its lifetime. When readmitted
the read sometimes has to wait and sometimes not. This is to check that
the readmitting a previously admitted reader doesn't leak any units.
2021-04-26 15:57:17 +03:00
Botond Dénes
caaa8ef59a reader_permit: always forward resources
This commit conceptually reverts 4c8ab10. Said commit was meant to
prevent the scenario where memory-only permits -- those that don't pass
admission but still consume memory -- completely prevent the admission
of reads, possibly even causing a deadlock because a permit might even
blocks its own admission. The protection introduced by said commit
however proved to be very problematic. It made the status of resources
on the permit very hard to reason about and created loopholes via which
permits could accumulate without tracking or they could even leak
resources. Instead of continuing to patch this broken system, this
commit does away with this "protection" based on the observation that
deadlocks are now prevented anyway by the admission criteria introduced
by 0fe75571d9, which admits a read anyway when all the initial count
resources are available (meaning no admitted reader is alive),
regardless of availability of memory.
The benefits of this revert is that the semaphore now knows about all
the resources and is able to do its job better as it is not "lied to"
about resource by the permits. Furthermore the status of a permit's
resources is much simpler to reason about, there are no more loopholes
in unexpected state transitions to swallow/leak resources.
To prove that this revert is indeed safe, in the next commit we add
robust tests that stress test admission on a highly contested semaphore.
This patch also does away with the registered/admitted differentiation
of permits, as this doesn't make much sense anymore, instead these two
are unified into a single "active" state. One can always tell whether a
permit was admitted or not from whether it owns count resources anyway.
2021-04-26 15:56:56 +03:00
Botond Dénes
2b66f7222e reader_concurrency_semaphore: inactive_read_handle: abandon(): close reader
fa43d7680 recently introduced mandatory closing of readers before they
are destroyed. One reader destroy path that was left not closing the
reader before destruction is `inactive_reader_handle::abandon()`. This
path is executed when the handle is destroyed while still referring to a
non-evicted inactive read. This patch fixes it up to close the reader
and adds a small unit test which checks that this happens.
2021-04-26 15:56:54 +03:00
Piotr Sarna
0779fa8428 test: add username verification to alternator tracing tests
The test case now additionally checks if the username entry from
found tracing events matches the username used by the test suite.
2021-04-26 11:54:02 +02:00
Avi Kivity
fa43d7680c Merge "Close flat mutation readers" from Benny
"
This patchset adds future-returning close methods to all
flat_mutation_reader-s and makes sure that all readers
are explicitly closed and waited for.

The main motivation for doing so is for providing a path
for cancelling outstanding i/o requests via a the input_stream
close (See https://github.com/scylladb/seastar/issues/859)
and wait until they complete.

Also, this series also introduces a stop
method to reader_concurrency_semaphore to be used when
shutting down the database, instead of calling
clear_inactive_readers in the database destructor.

The series does not change microbenchmarks performance in a significant way.
It looks like the results are within the tests' jitter.

- perf_simple_query: (in transactions per second, more is better)
before: median 184701.83 tps (90 allocs/op, 20 tasks/op)
after:  median 188970.69 tps (90 allocs/op, 20 tasks/op) (+2.3%)

- perf_mutation_readers: (in time per iteration, less is better)
combined.one_row                               65.042ns  -> 57.961ns  (-10.9%)
combined.single_active                         46.634us  -> 46.216us  ( -0.9%)
combined.many_overlapping                      364.752us -> 371.507us ( +1.9%)
combined.disjoint_interleaved                  43.634us  -> 43.448us  ( -0.4%)
combined.disjoint_ranges                       43.011us  -> 42.991us  ( -0.0%)
combined.overlapping_partitions_disjoint_rows  57.609us  -> 58.820us  ( +2.1%)
clustering_combined.ranges_generic             93.464ns  -> 96.236ns  ( +3.0%)
clustering_combined.ranges_specialized         86.537ns  -> 87.645ns  ( +1.3%)
memtable.one_partition_one_row                 903.546ns -> 957.639ns ( +6.0%)
memtable.one_partition_many_rows               6.474us   -> 6.444us   ( -0.5%)
memtable.one_large_partition                   905.593us -> 878.271us ( -3.0%)
memtable.many_partitions_one_row               13.815us  -> 14.718us  ( +6.5%)
memtable.many_partitions_many_rows             161.250us -> 158.590us ( -1.6%)
memtable.many_large_partitions                 24.237ms  -> 23.348ms  ( -3.7%)
average                                        -0.02%

Fixes #1076
Refs #2927

Test: unit(release, debug)
Perf: perf_mutation_readers, perf_simple_query (release)
Dtest: next-gating(release),
       materialized_views_test:TestMaterializedViews.interrupt_build_process_and_resharding_max_to_half_test repair_additional_test:RepairAdditionalTest.repair_disjoint_row_3nodes_diff_shard_count_test(debug)
"

* tag 'flat_mutation_reader-close-v7' of github.com:bhalevy/scylla: (94 commits)
  mutation_reader: shard_reader: get rid of stop
  mutation_reader: multishard_combining_reader: get rid of destructor
  flat_mutation_reader: abort if not closed before destroyed
  flat_mutation_reader: require close
  repair: row_level_repair: run: close repair_meta when done
  repair: repair_reader: close underlying reader on_end_of_stream
  perf: everywhere: close flat_mutation_reader when done
  test: everywhere: close flat_mutation_reader when done
  mutation_partition: counter_write_query: close reader when done
  index: built_indexes_reader: implement close
  mutation_writer: multishard_writer: close readers when done
  mutation_writer: feed_writer: close reader when done
  table: for_all_partitions_slow: close iteration_step reader when done
  view_builder: stop: close all build_step readers
  stream_transfer_task: execute: close send_info reader when done
  view_update_generator: start: close staging_sstable_reader when done
  view: build_progress_virtual_reader: implement close method
  view: generate_view_updates: close builder readers when done
  view_builder: initialize_reader_at_current_token: close reader before reassigning it
  view_builder: do_build_step: close build_step reader when done
  ...
2021-04-25 13:53:11 +03:00
Benny Halevy
5b22731f9a flat_mutation_reader: require close
Make flat_mutation_reader::impl::close pure virtual
so that all implementations are required to implemnt it.

With that, provide a trivial implementation to
all implementations that currently use the default,
trivial close implementation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
391f942b2a perf: everywhere: close flat_mutation_reader when done
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
aa5289f255 test: everywhere: close flat_mutation_reader when done
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
7c7569f0ad querier_cache: implement stop
Close the _closing_gate to wait on background
close of dropped queries, and close all remaining queriers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
87c62b5f59 test: querier_cache_test: close looked up querier
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00