Commit Graph

2040 Commits

Author SHA1 Message Date
Avi Kivity
77a2b4b520 test: perf: perf_simple_query: add instructions_per_op to the json-result output
It's in text output, but 863b49af03 forgot to add it to the machine
readable results.

Closes #9017
2021-07-27 20:26:19 +02:00
Pavel Emelyanov
b3c89787be mutation_partition: Return immutable collection for range tombstones
Patch the .row_tombstones() to return the range_tombstone_list
wrapped into the immutable_collection<> so that callers are
guaranteed not to touch the collection itself, but still can
modify the tombstones.
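
As a rough illustration of the idea (not Scylla's actual `immutable_collection<>` template), a wrapper can expose iteration and element access while offering no methods that insert or erase. The class names `ImmutableCollection` and `Tombstone` below are hypothetical and exist only for this sketch.

```python
class ImmutableCollection:
    """Read-only view over a container: elements stay mutable, the container does not."""

    def __init__(self, items):
        self._items = items  # keep a reference, not a copy

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


class Tombstone:
    """Hypothetical stand-in for a range tombstone."""

    def __init__(self, start, end, timestamp):
        self.start, self.end, self.timestamp = start, end, timestamp


tombstones = [Tombstone(0, 10, 100), Tombstone(10, 20, 200)]
view = ImmutableCollection(tombstones)

# Modifying an element through the view is allowed.
view[0].timestamp = 150

# Structural modification is simply not offered by the wrapper.
try:
    view.append(Tombstone(20, 30, 300))
    structural_change_allowed = True
except AttributeError:
    structural_change_allowed = False
```

Callers that hold only the view can still update individual tombstones, but any attempt to grow or shrink the underlying list has to go through whoever owns the list itself.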

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-27 20:06:53 +03:00
Pavel Emelyanov
1bf643d4fd mutation_partition: Pin mutable access to range tombstones
Some callers of mutation_partition::row_tombstones() don't want to
(and shouldn't) modify the list itself, while they may want to
modify the tombstones. This patch explicitly locates those that
need to modify the collection, because the next patch will
return immutable collection for the others.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-27 20:06:53 +03:00
Pavel Emelyanov
05b8cdfd24 mutation_partition: Return immutable collection for rows
Patch the .clustered_rows() method to return the btree of rows
wrapped into the immutable_collection<> so that callers are
guaranteed not to touch the collection itself, but still can
modify the elements in it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-27 20:06:53 +03:00
Pavel Emelyanov
e652b03b4e btree tests: Don't use iterator erase
Next patches will mark btree::iterator methods that modify
the tree itself as private, so stop using them in tests.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-27 20:06:53 +03:00
Avi Kivity
f86e65b4e7 Merge "Fix quadratic behavior in memtable/row_cache with lots of range tombstones" from Tomasz
"
This series fixes two issues which cause very poor efficiency of reads
when there are a lot of range tombstones per live row in a partition.

The first issue is in the row_cache reader. Before the patch, all range
tombstones up to the next row were copied into a vector, and then put
into the buffer until it's full. This would get quadratic if there are
many more range tombstones than fit in a buffer.

The fix is to avoid the accumulation of all tombstones in the vector
and invoke the callback instead, which stops the iteration as soon as
the buffer is full.
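
The difference between the two shapes can be sketched as follows. This is a simplified model, not the row_cache reader code: the function names are made up for illustration, and plain integers stand in for range tombstones.

```python
def fill_buffer_accumulating(tombstones, buffer_capacity):
    """Old shape: copy every pending tombstone, then keep what fits."""
    copied = list(tombstones)  # touches all n tombstones on every call
    return copied[:buffer_capacity], len(copied)


def fill_buffer_incremental(tombstones, buffer_capacity):
    """New shape: pass each tombstone to a callback that can stop the walk."""
    buffer = []
    visited = 0

    def consume(rt):
        buffer.append(rt)
        return len(buffer) < buffer_capacity  # False means "buffer full, stop"

    for rt in tombstones:
        visited += 1
        if not consume(rt):
            break
    return buffer, visited


pending = list(range(1000))  # stand-ins for range tombstones
old_buffer, old_visited = fill_buffer_accumulating(pending, 10)
new_buffer, new_visited = fill_buffer_incremental(pending, 10)
```

Both variants produce the same buffer, but the first one visits all pending tombstones on each refill, so repeated refills add up to roughly quadratic work, while the second visits only as many as fit.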

Fixes #2581.

The second, similar issue was in the memtable reader.

Tests:

  - unit (dev)
  - perf_row_cache_update (release)
"

* tag 'no-quadratic-rt-in-reads-v1' of github.com:tgrabiec/scylla:
  test: perf_row_cache_update: Uncomment test case for lots of range tombstones
  row_cache: Consume range tombstones incrementally
  partition_snapshot_reader: Avoid quadratic behavior with lots of range tombstones
  tests: mvcc: Relax monotonicity check
  range_tombstone_stream: Introduce peek_next()
2021-07-27 14:39:13 +03:00
Avi Kivity
2cca461652 Merge 'sstables: merge row consumer interfaces with implementations' from Wojciech Mitros
This patch follows #9002, further reducing the complexity of the sstable readers.
The split between row consumer interfaces and implementations was first added in 2015, and there is no reason to create new implementations anymore. By merging those classes, we achieve a sizeable reduction in sstable reader length and complexity.
Refs #7952
Tests: unit(dev)

Closes #9073

* github.com:scylladb/scylla:
  sstables: merge row_consumer into mp_row_consumer_k_l
  sstables: move kl row_consumer
  sstables: merge consumer_m into mp_row_consumer_m
  sstables: move mp_row_consumer_m
2021-07-27 12:23:29 +03:00
Nadav Har'El
8030461a2c cql-pytest: translate Cassandra's misc. type tests
This is a translation of Cassandra's CQL unit test source file
validation/entities/TypeTest.java into our cql-pytest framework.

This is a tiny test file, with only four tests which apparently didn't
find their place in other source files. All four tests pass on Cassandra,
and all but one pass on Scylla - the test marked xfail discovered one
previously-unknown incompatibility with Cassandra:

Refs #9082: DROP TYPE IF EXISTS shouldn't fail on non-existent keyspace

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210726140934.1479443-1-nyh@scylladb.com>
2021-07-27 08:28:16 +03:00
Tomasz Grabiec
7578cef0a4 test: perf_row_cache_update: Uncomment test case for lots of range tombstones
2021-07-26 21:38:00 +02:00
Tomasz Grabiec
0d7b3f9463 tests: mvcc: Relax monotonicity check
Consecutive range tombstones can have the same position. They will, in
one of the test cases, after the range tombstone merger in
partition_snapshot_flat_reader no longer uses range_tombstone_list to
merge data from multiple versions, which deoverlaps, but rather merges
the streams corresponding to each version, which interleaves range
tombstones from different versions.
2021-07-26 17:27:03 +02:00
Nadav Har'El
b503ec36c2 cql-pytest: translate Cassandra's tests for tuples
This is a translation of Cassandra's CQL unit test source file
validation/entities/TupleTypeTest.java into our cql-pytest framework.

This test file has a few tests on various features of tuples.
Unfortunately, some of the tests could not be easily translated into
Python so were left commented out: Some tests try to send invalid input
to the server which the Python driver "helpfully" forbids; Two tests
used an external testing library "QuickTheories" and are the only two
tests in the Cassandra test suite to use this library - so it's not
worthwhile to translate it to Python.

11 tests remain, all of them pass on Cassandra, and just one fails on
Scylla (so marked xfail for now), reproducing one known issue:

Refs #7735: CQL parser missing support for Cassandra 3.10's new "+=" syntax
Actually, += is not supposed to be supported on tuple columns anyway, but
should print the appropriate error - not the syntax error we get now as
the "+=" feature is not supported at all.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210722201900.1442391-1-nyh@scylladb.com>
2021-07-26 08:20:12 +03:00
Nadav Har'El
ec5e4c338b cql: fix undefined behavior in timestamp verification
Commit 2150c0f7a2 proposed by issue #5619
added a limitation that USING TIMESTAMP cannot be more than 3 days into
the future. But the actual code used to check it,

     timestamp - now > MAX_DIFFERENCE

only makes sense for *positive* timestamps. For negative timestamps,
which are allowed in Cassandra, the difference "timestamp - now" might
overflow the signed integer and the result is undefined - leading the
undefined-behavior sanitizer to complain as reported in issue #8895.
Beyond the sanitizer, in practice, on my test setup, the timestamp -2^63+1
causes such overflow, which causes the above if() to make the nonsensical
statement that the timestamp is more than 3 days into the future.

This patch assumes that negative timestamps of any magnitude are still
allowed (as they are in Cassandra), and fixes the above if() to only
check timestamps which are in the future (timestamp > now).
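
The two forms of the check can be compared in a small model. Python integers do not overflow, so the sketch below emulates 64-bit two's-complement wraparound explicitly, which is only one possible outcome of the undefined behavior. The microsecond unit, the value of `now`, and all function names are assumptions made for this illustration.

```python
MAX_DIFFERENCE = 3 * 24 * 3600 * 1_000_000  # 3 days, assuming microsecond timestamps


def wrap_int64(value):
    """Emulate two's-complement wraparound of a signed 64-bit integer."""
    return (value + 2**63) % 2**64 - 2**63


def too_far_in_future_old(timestamp, now):
    # Old check: the subtraction may overflow for very negative timestamps.
    return wrap_int64(timestamp - now) > MAX_DIFFERENCE


def too_far_in_future_fixed(timestamp, now):
    # Fixed check: only timestamps that are actually in the future are compared.
    return timestamp > now and timestamp - now > MAX_DIFFERENCE


now = 1_627_000_000_000_000      # hypothetical "current time" in microseconds
very_negative = -2**63 + 1       # the value mentioned in the commit message
one_day_ahead = now + 24 * 3600 * 1_000_000
four_days_ahead = now + 4 * 24 * 3600 * 1_000_000
```

With the emulated wraparound, the old check reports `very_negative` as being too far in the future, while the fixed check accepts it and still rejects `four_days_ahead`.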

We also add a cql-pytest test for negative timestamps, passing on both
Cassandra and Scylla (after this patch - it failed before, and also
reported sanitizer errors in the debug build).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210621141255.309485-1-nyh@scylladb.com>
2021-07-24 11:01:08 +03:00
Tomasz Grabiec
b044db863f Merge 'db/virtual_table: Streaming tables for large data + describe_ring example table' from Juliusz Stasiewicz
This is the 2nd PR in series with the goal to finish the hackathon project authored by @tgrabiec, @kostja, @amnonh and @mmatczuk (improved virtual tables + function call syntax in CQL). This one introduces a new implementation of the virtual tables, the streaming tables, which are suitable for large amounts of data.

This PR was created by @jul-stas and @StarostaGit

Closes #8961

* github.com:scylladb/scylla:
  test/boost: run_mutation_source_tests on streaming virtual table
  system_keyspace: Introduce describe_ring table as virtual_table
  storage_service: Pass the reference down to system_keyspace
  endpoint_details: store `_host` as `gms::inet_address`
  queue_reader: implement next_partition()
  virtual_tables: Introduce streaming_virtual_table
  flat_mutation_reader: Add a new filtering reader factory method
2021-07-23 18:05:51 +02:00
Avi Kivity
aaf35b5ac2 Merge "Remove storage-service from transport (and a bit more)" from Pavel E
"
The cql-server -> storage-service dependency comes from the server's
event_notifier which (un)subscribes on the lifecycle events that come
from the storage service. To break this link the same trick as with
migration manager notifications is used -- the notification engine
is split out of the storage service and then is pushed directly into
both -- the listeners (to (un)subscribe) and the storage service (to
notify).

tests: unit(dev), dtest(simple_boot_shutdown, dev)
       manual({ start/stop,
                with/without started transport,
                nodetool enable-/disablebinary
              } in various combinations, dev)
"

* 'br-remove-storage-service-from-transport' of https://github.com/xemul/scylla:
  transport.controller: Brushup cql_server declarations
  code: Remove storage-service header from irrelevant places
  storage_service: Remove (unlifecycle) subscribe methods
  transport: Use local notifier to (un)subscribe server
  transport: Keep lifecycle notifier sharded reference
  main: Use local lifecycle notifier to (un)subscribe listeners
  main, tests: Push notifier through storage service
  storage_service: Move notification core into dedicated class
  storage_service: Split lifecycle notification code
  transport, generic_server: Remove no longer used functionality
  transport: (Un)Subscribe cql_server::event_notifier from controller
  tests: Remove storage service from manual gossiper test
2021-07-22 19:27:45 +03:00
Pavel Emelyanov
c39f04fa6f code: Remove storage-service header from irrelevant places
Some .cc files over the code include the storage service
for no real need. Drop the header and include (in some)
what's really needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:50:19 +03:00
Pavel Emelyanov
8248bc9e33 main, tests: Push notifier through storage service
Now it's time to move the lifecycle notifier from storage
service to the main's scope. Next patches will remove the
$lifecycle-subscriber -> storage_service dependency.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:45:51 +03:00
Pavel Emelyanov
b57fb0aa9a tests: Remove storage service from manual gossiper test
It's not needed there, gossiper starts and works without it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:36:28 +03:00
Piotr Sarna
526ad2a151 Merge 'secondary_index: Fix TOKEN() restrictions in indexed SELECTs' from Jan Ciołek
This is a rewrite of an old PR: #7582

`TOKEN()` restrictions don't work properly when a query uses an index.
For example this returns both rows:
```cql
CREATE TABLE t(pk int, ck int, v int, PRIMARY KEY(pk, ck));
CREATE INDEX ON t(v);
INSERT INTO t (pk, ck, v) VALUES (0, 0, 0);
INSERT INTO t (pk, ck, v) VALUES (1, 0, 0);
SELECT token(pk), pk, ck, v FROM t WHERE v = 0 AND token(pk) = token(0) ALLOW FILTERING;
```

This functionality is supported on both old and new indexes.  In old
indexes the type of the token column was `blob`.  This causes problems,
because `blob` representation of tokens is ordered differently. Tokens
represented as blobs are ordered like this:
```
0, 1, 2, 3, 4, 5, ..., bigint_max, bigint_min, ...., -5, -4, -3, -2, -1
```
Because of that clustering range for `token()` restrictions needs to be
translated to two clustering ranges on the `blob` column.
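
A minimal sketch of that translation, assuming the `blob` form is the 8-byte big-endian two's-complement encoding of the token (which matches the ordering shown above) and that ranges are inclusive. The function names are hypothetical and this is not the code from the patch.

```python
import struct


def token_to_blob(token):
    """Big-endian two's-complement bytes of a signed 64-bit token."""
    return struct.pack(">q", token)


def blob_ranges_for_token_range(start, end):
    """Translate an inclusive token range into inclusive blob ranges.

    Blobs compare as unsigned bytes, so negative tokens sort after
    non-negative ones and a range crossing zero must be split in two.
    """
    if start > end:
        return []
    if start >= 0 or end < 0:
        return [(token_to_blob(start), token_to_blob(end))]
    return [
        (token_to_blob(0), token_to_blob(end)),     # non-negative part
        (token_to_blob(start), token_to_blob(-1)),  # negative part
    ]


def in_ranges(token, ranges):
    blob = token_to_blob(token)
    return any(low <= blob <= high for low, high in ranges)


blob_order = sorted([-2, -1, 0, 1, 2], key=token_to_blob)
ranges = blob_ranges_for_token_range(-5, 5)
```

Here `blob_order` comes out as `[0, 1, 2, -2, -1]`, and the range from -5 to 5 becomes two blob ranges that together select exactly the same tokens.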

To create old indexes disable the feature called:
`CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX` or run scylla version from branch
[`cvybhu/si-token2-old-index`](https://github.com/cvybhu/scylla/commits/si-token2-old-index)

I'm not sure if it's possible to create automatic tests with old
indexes. I ran `dev-test` manually on the `si-token2-old-index` branch,
and the only tests that failed were the ones testing row ordering. Rows
should be ordered by `token`, but because in old indexes the token is
represented as a `blob` this ordering breaks. This is a known issue
(#7443), that has been fixed by introducing new indexes.

To sum up:
* `token()` restrictions are fixed on both new and old indexes.
* When using old indexes, the rows are not properly ordered by token.
* With new indexes the rows are properly ordered by token.

Fixes #7043

Closes #9067

* github.com:scylladb/scylla:
  tests: add secondary index tests with TOKEN clause
  secondary_index_test: extract test data
  secondary_index: Fix TOKEN() restrictions in indexed SELECTs
  expression: Add replace_token function
2021-07-22 10:22:45 +02:00
Wojciech Mitros
1ff72ca0a6 sstables: move kl row_consumer
In preparation for the next patch combining row_consumer and
mp_row_consumer_k_l, move row_consumer next to mp_row_consumer_k_l.

Because row_consumer is going to be removed, we retire some
old tests for different implementations of the row_consumer
interface; as a result, we don't need to expose internal
types of kl sstable reader for tests, so all classes from
reader_impl.hh are moved to reader.cc, and the reader_impl.hh
file is deleted, and the reader.cc file has an analogous
structure to the reader.cc file in sstables/mx directory.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2021-07-21 18:04:22 +02:00
Piotr Grabowski
e06102aed9 tests: add secondary index tests with TOKEN clause
Add tests of SELECTs with TOKEN clauses on tables with secondary
indexes (both global and local).

test_select_with_token_range_cases checks all possible token range
combinations (inclusive/exclusive/infinity start/end) on tables without
index, with local or with global index.

test_select_with_token_range_filtering checks whether TOKEN restrictions
combined with column restrictions work properly. As different code paths
are taken if index is created on clustering key (first or non-first) or
non-primary-key column, the tests check scenarios when index is created
on different columns.
2021-07-21 16:12:55 +02:00
Piotr Grabowski
e2bd1cdb9d secondary_index_test: extract test data
Extract test data to separate variables, allowing it to be easily
reused by other tests. The tokens are hard-coded, because calculating
their value brought too much complexity to this code.
2021-07-21 16:12:55 +02:00
Raphael S. Carvalho
e4eb7df1a1 table: Make correctness of concurrent sstable list update robust
Today, table relies on row_cache::invalidate() serialization for
concurrent sstable list updates to produce correct results.
That's very error prone because table is relying on an implementation
detail of invalidate() to get things right.
Instead, let's make table itself take care of serialization on
concurrent updates.
To achieve that, sstable_list_builder is introduced. Only one
builder can be alive for a given table, so serialization is guaranteed
as long as the builder is kept alive throughout the update procedure.
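
The "only one builder alive at a time" rule can be modelled with a lock that the builder holds for its whole lifetime. The sketch below uses Python's `asyncio` purely as a stand-in for the real concurrency model; the `Table` and `SSTableListBuilder` classes and their methods are hypothetical and do not mirror the actual interface.

```python
import asyncio


class Table:
    def __init__(self):
        self.sstables = []
        self._update_lock = asyncio.Lock()

    def sstable_list_builder(self):
        return SSTableListBuilder(self)


class SSTableListBuilder:
    """Holds the table's update lock for as long as the builder is alive."""

    def __init__(self, table):
        self._table = table

    async def __aenter__(self):
        await self._table._update_lock.acquire()
        return self

    async def __aexit__(self, *exc):
        self._table._update_lock.release()

    async def add(self, name):
        snapshot = list(self._table.sstables)  # read
        await asyncio.sleep(0)                 # yield, as a real update might
        self._table.sstables = snapshot + [name]  # write


async def update(table, name):
    async with table.sstable_list_builder() as builder:
        await builder.add(name)


async def main():
    table = Table()
    await asyncio.gather(*(update(table, f"sst-{i}") for i in range(5)))
    return table.sstables


result = asyncio.run(main())
```

Because each read-modify-write happens while the builder holds the lock, no concurrent update is lost, and the table no longer depends on a side effect of some other component to get this right.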

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210721001716.210281-1-raphaelsc@scylladb.com>
2021-07-21 16:45:30 +03:00
Juliusz Stasiewicz
38b8a6ce2c test/boost: run_mutation_source_tests on streaming virtual table
Tests that require inter-partition forwarding are excluded.
2021-07-20 14:19:17 +02:00
Juliusz Stasiewicz
f8067d938d storage_service: Pass the reference down to system_keyspace
According to the policy of avoiding globals.
2021-07-20 14:18:24 +02:00
Tomasz Grabiec
50ec3ea295 lsa: Fix misaccounting of used space when allocating lsa_buffers
lsa_buffer allocations are aligned to 4K. If smaller size is
requested, whole 4K is used. However, only requested size was used in
accounting segment occupancy. This can confuse reclaimer which may
think the segment is sparse while it is actually dense, and compacting
it will yield no or little gain. This can cause inefficient memory
reclamation or lack of progress.
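
A small numeric sketch shows how far the two accounting methods can diverge. The 4K alignment comes from the text above; the 128 KiB segment size, the buffer sizes, and the function names are assumptions chosen only to make the arithmetic visible.

```python
LSA_BUFFER_ALIGNMENT = 4096  # from the commit message


def align_up(size, alignment=LSA_BUFFER_ALIGNMENT):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def occupancy(requested_sizes, segment_size, account_aligned):
    """Fraction of the segment considered used under each accounting method."""
    used = sum(align_up(s) if account_aligned else s for s in requested_sizes)
    return used / segment_size


segment_size = 128 * 1024  # assumed segment size for this example
requests = [100] * 32      # 32 small buffers of 100 bytes each

reported_before = occupancy(requests, segment_size, account_aligned=False)
reported_after = occupancy(requests, segment_size, account_aligned=True)
```

With these numbers the segment is completely full, yet accounting by requested size reports it as about 2.4% occupied, which is exactly the kind of segment a reclaimer would pick for compaction with nothing to gain.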

Refs #9038
Message-Id: <20210720104110.463812-1-tgrabiec@scylladb.com>
2021-07-20 14:08:06 +03:00
Botond Dénes
11b39cbc23 reader_concurrency_semaphore: merge permit_stats into stats
If there was any reason to have them separate when permit_stats was
conceived, it is gone now, so merge the two.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210720073121.63027-1-bdenes@scylladb.com>
2021-07-20 10:35:12 +03:00
Nadav Har'El
36ec1d792e Merge 'cql-pytest: Test selecting from indexed table using only clustering key' from Jan Ciołek
Add examples from issue #8991 to tests
Both of these tests pass on `cassandra 4.0` but fail on `scylla 4.4.3`

The first test checks that selecting values from an indexed table using only the clustering key returns correct values.
The second test tests that performing this operation requires filtering.

The filtering test looks similar to [the one for #7608](1924e8d2b6/test/cql-pytest/test_allow_filtering.py (L124)) but there are some differences - here the table has two clustering columns and an index, so it could test different code paths.

Contains a quick fix for the `needs_filtering()` function to make these tests pass.
It returns `true` for this case and the one described in #7708.

This implementation is a bit conservative - it might sometimes return `true` where filtering isn't actually needed, but at least it prevents scylla from returning incorrect results.

Fixes #8991.
Fixes #7708.

Closes #8994

* github.com:scylladb/scylla:
  cql3: Fix need_filtering on indexed table
  cql-pytest: Test selecting using only clustering key requires filtering
  cql-pytest: Test selecting from indexed table using clustering key
2021-07-19 18:23:08 +03:00
Tomasz Grabiec
049a1ef729 Merge 'flat_mutation_reader: downgrade_to_v1 - reset state of rt_assembler' from enedil
The downgrade_to_v1 didn't reset the state of range tombstone assembler
in case of the calls to next_partition or fast_forward_to, which caused
a situation where the closing range tombstone change is cleared from the
buffer before being emitted, without notifying the assembler. This patch
fixes the behaviour in fast_forward_to as well.

Fixes #9022

Closes #9023

* github.com:scylladb/scylla:
  flat_mutation_reader: downgrade_to_v1 - reset state of rt_assembler
  flat_mutation_reader: introduce public method returning the default size of internal buffer.
2021-07-19 17:10:23 +02:00
Jan Ciolek
54149242b4 cql3: Fix need_filtering on indexed table
There were cases where a query on an indexed table
needed filtering but need_filtering returned false.

This is fixed by using new conditions in cases where
we are using an index.

Fixes #8991.
Fixes #7708.

For now this is an overly conservative implementation
that returns true in some cases where filtering
is not needed.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-07-19 16:22:17 +02:00
Michał Radwański
67d99e02a7 flat_mutation_reader: downgrade_to_v1 - reset state of rt_assembler
The downgrade_to_v1 didn't reset the state of range tombstone assembler
in case of the calls to next_partition or fast_forward_to, which caused
a situation where the closing range tombstone change is cleared from the
buffer before being emitted, without notifying the assembler. This patch
fixes the behaviour in fast_forward_to as well.

Fixes #9022
2021-07-19 15:54:26 +02:00
Nadav Har'El
4c6dc5fce2 Merge 'continuous_data_consumer: properly skip bytes at the end of a range' from Wojciech Mitros
When skipping bytes at the end of a continuous_data_consumer range,
the position of the consumer is moved after the skipped bytes, but
the position of the underlying input_stream is not.

This patch adds skipping of the underlying input_stream, to make
its position consistent with the position of the consumer.

Fixes #9024

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>

Closes #9039

* github.com:scylladb/scylla:
  tests: add test for skipping bytes at end of consumer
  continuous_data_consumer: properly skip bytes at the end of a range
2021-07-19 15:57:26 +03:00
Wojciech Mitros
507bdfc36a tests: add test for skipping bytes at end of consumer
The new test confirms that the regression issue, where
we didn't correctly skip bytes at the end of a
continuous_data_consumer range, is fixed.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2021-07-19 14:42:38 +02:00
Jan Ciolek
9bd62a07c9 cql-pytest: Test selecting using only clustering key requires filtering
Adds test that creates a table with primary key (p, c1, c2)
with a global index on c2 and then selects where c1 = 1 and c2 = 1.

This should require filtering, but doesn't.
Refs #8991.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-07-19 10:24:48 +02:00
Jan Ciolek
a041767aa3 cql-pytest: Test selecting from indexed table using clustering key
Adds test that creates a table with primary key (p, c1, c2)
with a global index on c2 and then selects where c1 = 1 and c2 = 1.

This currently fails.
Refs #8991.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
2021-07-19 10:24:46 +02:00
Avi Kivity
2cfc517874 main, test: adjust number of networking iocbs
Seastar's default limit of 10,000 iocbs per shard is too low for
some workloads (it places an upper bound on the number of idle
connections, above which a crash occurs). Use the new Seastar
feature to raise the default to 50000.

Also multiply the global reservation by 5, and round it upwards
so the number is less weird. This prevents io_setup() from failing.

For tests, the reservation is reduced since they don't create large
numbers of connections. This reduces surprise test failures when they
are run on machines that haven't been adjusted.

Fixes #9051

Closes #9052
2021-07-18 14:38:44 +03:00
Avi Kivity
df822e09e0 Merge "Run test cases in parallel" from Pavel E
"
The debug-mode tests nowadays take ~1 hour to complete on a
24-cores threadripper machine. This is mostly because of a bunch
of individual test cases that run sequentially (since they sit
in one test) each taking half-an-hour and longer.

The previous attempt was to break the longest tests into pieces,
and to update the list of long-running tests in the suite.yaml file,
but the concern was that the linkage time and disk space would
grow without limits if this continues. Also the long-running tests
list needs to be revisited every so often.

So the new attempt is to resurrect Avi's patch that ran test
cases in parallel for boost tests. This set applies parallelism
to all tests and allows blacklisting those that shouldn't be (the
logalloc needs the very first case to prime_segment_pools so
that other cases run smoothly, thus it cannot be parallelized).

This wild parallelism adds an overhead for _each_ test
case, but it is good enough even for short dev-mode tests (saves
25% of runtime), and it greatly relaxes the maintenance of the
"parallelizable list of tests".

For debug tests the problem is not 100% solved. There are 6 cases
that run longer than 30min, while all the others complete much,
much faster. So, excluding those 6 slow cases, the full parallel
run saves 50+% of the runtime -- 60+m now vs 25m with the patch.
Those 6 slowest cases will need more incremental care.

The --parallel-cases mode is not yet default, because it requires
larger max-aio-nr value to be set, which is not (yet?) automatic.
Also it sometimes hits nr-open-files limit, which also needs more
work.

tests: unit(dev), unit(debug)
"

* 'br-parallel-testpy-3' of https://github.com/xemul/scylla:
  tests: Update boost long tests list
  test.py: Parallelize test-cases run (for boost tests)
  test.py: Prepare BoostTest for running individual cases
  test.py: Prepare TestSuite::create_test() for parallelism
  test.py: Treat shortname as composite
  test.py: Reformat tabular output
2021-07-17 13:57:56 +03:00
Pavel Emelyanov
9d59f1daf3 tests: Update boost long tests list
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-16 17:25:07 +03:00
Pavel Emelyanov
cbb4837b77 test.py: Parallelize test-cases run (for boost tests)
The parallelism is achieved by listing the content of each (boost)
test and by adding a test for each case found appending the
'--run_test={case_name}' option.

Also, a few tests (logalloc and memtable) have cases that depend on
each other (the former explicitly stated this in the head comment),
so these are marked as "no_parallel_cases" in the suite.yaml file.
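
The expansion described above can be sketched as a small helper. Only the `--run_test={case_name}` option and the "no_parallel_cases" marker come from the commit message; the helper name, its parameters, and the example paths are hypothetical, and the case names are assumed to have already been obtained by listing the test's content.

```python
def commands_for_test(test_path, case_names, no_parallel_cases=False):
    """Return one command line per test case, or a single one for the whole test."""
    if no_parallel_cases or not case_names:
        # Cases depend on each other: run the test binary once, as before.
        return [[test_path]]
    return [[test_path, f"--run_test={name}"] for name in case_names]


# Hypothetical paths and case names, for illustration only.
parallel = commands_for_test(
    "build/dev/test/boost/example_test", ["case_a", "case_b"]
)
serial = commands_for_test(
    "build/dev/test/boost/logalloc_test", ["case_1", "case_2"], no_parallel_cases=True
)
```

Each returned command can then be scheduled as an independent test, which is what lets a long multi-case test spread across cores.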

In dev mode, tests need 2m 5s to run by default. With parallelism
(and an updated long-running tests list) -- 1m 35s.

In debug mode there are 6 slow _cases_ that overrun 30 minutes.
They finish last and deserve some special (incremental) care. All
the other tests run ~1h by default vs ~25m in parallel.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-16 17:25:07 +03:00
Tomasz Grabiec
97aa335a60 Merge "test: raft: randomized_nemesis_test: refactors and improvements" from Kamil
A couple of improvements to prepare for the next patchset.

We move `logical_timer` and `ticker` to their own headers due to the
generality of these data structures. They are not very specific to the
test.

`logical_timer` is extended with a `schedule` function, allowing to
schedule any given function to be called at the given time point.
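
A logical timer with such a `schedule` operation can be sketched as a priority queue keyed by logical time. This is an illustrative model only: the class below is not the test's `logical_timer`, and its `tick` method and integer time points are assumptions.

```python
import heapq
import itertools


class LogicalTimer:
    """Clock that advances only when tick() is called."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._seq = itertools.count()  # tie-breaker keeps insertion order

    def schedule(self, time_point, fn):
        """Arrange for fn() to be called once the clock reaches time_point."""
        heapq.heappush(self._queue, (time_point, next(self._seq), fn))

    def tick(self):
        """Advance the clock by one step and run everything that became due."""
        self.now += 1
        while self._queue and self._queue[0][0] <= self.now:
            _, _, fn = heapq.heappop(self._queue)
            fn()


events = []
timer = LogicalTimer()
timer.schedule(2, lambda: events.append("b"))
timer.schedule(1, lambda: events.append("a"))

timer.tick()
after_first_tick = list(events)
timer.tick()
after_second_tick = list(events)
```

Because time only moves on `tick`, a test can decide precisely when scheduled functions fire, independent of wall-clock time.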

The interface of `network` in `randomized_nemesis_test` is extended by
`add_grudge` and `remove_grudge` functions for implementing network
partitioning nemeses.
Furthermore `network` can be now constructed with an arbitrary network
delay, which was previously hardcoded.

`with_env_and_ticker` is now generic w.r.t. return values (previously
`future<>` was assumed).

`environment` exposes a reference to the `network` through a getter.

The `not_a_leader` exception now shows the leader's ID in the exception
message. Useful for logging.

In `logical_timer::with_timeout`, when we timeout, we don't just return
`timed_out_error`. The returned exception now actually contains the
original future... well almost; in any case, the user can now do
something different to the future other than simply discarding it.

We also fix some `broken_promise` exceptions appearing in discarded
futures in certain scenarios. See the corresponding commit for detailed
explanation.

We handle `raft::dropped_entry` in the `call` function.

`persistence` is fixed to avoid creating gaps in the log when storing
snapshots and to support complex state types.

Waiting for leader was refactored into a separate function and
generalized (we wait for a set of nodes to elect a leader instead of a
single node to elect itself) to be useful in more situations.

Finally, we introduce `reconfigure`, a higher-level version of
`set_configuration` which performs error handling and supports timeouts.

* kbr/raft-nemesis-improvements-v4:
  test: raft: randomized_nemesis_test: `reconfigure` function
  test: raft: randomized_nemesis_test: refactor waiting for leader into a separate function
  test: raft: randomized_nemesis_test: persistence: avoid creating gaps in the log when storing snapshots
  test: raft: randomized_nemesis_test: persistence: handle complex state types
  test: raft: randomized_nemesis_test: `call`: handle `raft::dropped_entry`
  test: raft: randomized_nemesis_test: impure_state_machine/call: handle dropped channels
  test: raft: randomized_nemesis_test: environment: expose the network
  test: raft: randomized_nemesis_test: configurable network delay and FD convict threshold
  test: raft: randomized_nemesis_test: generalize `with_env_and_ticker`
  test: raft: randomized_nemesis_test: network: `add_grudge`, `remove_grudge` functions
  test: raft: randomized_nemesis_test: move `ticker` to its own header
  test: raft: randomized_nemesis_test: ticker: take `logger` as a constructor parameter
  test: raft: logical_timer: handle immediate timeout
  test: raft: logical_timer: on timeout, return the original future in the exception
  test: raft: logical_timer: add `schedule` member function
  test: raft: randomized_nemesis_test: move `logical_timer` to its own header
  test: raft: include the leader's ID in the `not_a_leader` exception's message
2021-07-16 16:12:05 +02:00
Nadav Har'El
5183e0cbe9 Merge 'Fix artificial view update size limit' from Piotr Sarna
The series which split the view update process into smaller parts
accidentally put an artificial 10MB limit on the generated mutation
size, which is wrong - this limit is configurable for users,
and, what's more important, this data was already validated when
it was inserted into the base table. Thus, the limit is lifted.

The series comes with a cql-pytest which failed before the fix and succeeds now. This bug is also covered by `wide_rows_test.py:TestWideRows_with_LeveledCompactionStrategy.test_large_cell_in_materialized_view` dtest, but it needs over a minute to run, as opposed to cql-pytest's <1 second.

Fixes #9047

Tests: unit(release), dtest(wide_rows_test.py:TestWideRows_with_LeveledCompactionStrategy.test_large_cell_in_materialized_view)

Closes #9048

* github.com:scylladb/scylla:
  cql-pytest: add a materialized views suite with first cases
  db,view: drop the artificial limit on view update mutation size
2021-07-15 17:03:07 +03:00
Piotr Sarna
c05340c4bf cql-pytest: add a materialized views suite with first cases
cql-pytest did not have a suite for materialized views, so one is
created. At the same time, test cases for building/updating a view on
a base table with large cells are added as a regression test for #9047.
2021-07-15 15:40:38 +02:00
Piotr Sarna
3d816b7c16 Merge 'Move the reader concurrency semaphore in front of the cache' from Botond
This patchset combines two important changes to the way reader permits
are created and admitted:
1) It switches admission to be up-front.
2) It changes the admission algorithm.

(1) Currently permits are created before the read is started, but they
only wait for admission when going to the disk. This leaves the
resource consumption of cache and memtable reads unbounded, possibly
leading to OOM (rare but happens). This series changes this so that
permits are admitted at the moment they are created, making admission
up-front -- at least for those reads that pass admission at all (some don't).

(2) Admission currently is based on availability of resources. We have a
certain amount of memory available, which is derived from the memory
available to the shard, as well as a hardcoded count resource. Reads are
admitted when a count and a certain amount (base cost) of memory is
available. This patchset adds a new aspect to this admission process
beyond the existing resource availability: the number of used/blocked
reads. Namely it only admits new reads if in addition to the necessary
amount of resources being available, all currently used readers are
blocked. In other words we only admit new reads if all currently
admitted reads require something other than CPU to progress. They are
either waiting on I/O, a remote shard, or attention from their consumers
(not used currently).
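
The admission rule described in (2) can be written down as a small predicate. This is a simplified model under stated assumptions, not the reader_concurrency_semaphore implementation: the `Semaphore` class, its field names, and the numbers in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Semaphore:
    available_count: int
    available_memory: int
    base_cost: int
    used: int = 0     # admitted reads currently marked as used
    blocked: int = 0  # used reads waiting on I/O, a remote shard, etc.

    def has_resources(self):
        return self.available_count >= 1 and self.available_memory >= self.base_cost

    def can_admit_old(self):
        # Previous rule: resource availability only.
        return self.has_resources()

    def can_admit_new(self):
        # New rule: resources available AND every used read is blocked.
        return self.has_resources() and self.used == self.blocked


cpu_bound = Semaphore(available_count=10, available_memory=1 << 20,
                      base_cost=16 * 1024, used=1, blocked=0)
all_blocked = Semaphore(available_count=10, available_memory=1 << 20,
                        base_cost=16 * 1024, used=1, blocked=1)
exhausted = Semaphore(available_count=0, available_memory=1 << 20,
                      base_cost=16 * 1024, used=1, blocked=1)
```

In the `cpu_bound` case one admitted read is still making progress on the CPU, so the new rule holds back further reads even though resources are free; once that read blocks, admission resumes.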

The reason for making these two changes at the same time is that
up-front admission means cache reads now need to obtain a permit too.
For cache reads the optimal concurrency is 1. Anything above that just
increases latency (without increasing throughput). So we want to make sure
that if a cache reader hits it doesn't get any competition for CPU and
it can run to completion. We admit new reads only if the read misses and
has to go to disk.

A side effect of these changes is that the execution stages from the
replica-side read path are replaced with the reader concurrency
semaphore acting as an execution stage. This is necessary due to a bad
interaction between said execution stages and up-front admission. This
has an important consequence: read timeouts are more strictly enforced.
The execution stage doesn't have a timeout, so it can execute already
timed-out reads too. This is not the case with the semaphore's queue,
which will drop timed-out reads. Another consequence is that data and
mutation reads now share the same execution stage, which increases its
effectiveness. On the other hand, system and user reads no longer share
one.
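
The timeout point can be illustrated with a minimal sketch of a wait
queue that discards expired entries instead of running them. This is a
model only: `read_queue`, `queued_read` and `pop_next()` are hypothetical
names, and the real semaphore queue may expire entries by a different
mechanism than checking deadlines at dequeue time.

```cpp
#include <chrono>
#include <deque>

using clock_type = std::chrono::steady_clock;

// Hypothetical queued read: an identifier plus its timeout deadline.
struct queued_read {
    int id;
    clock_type::time_point deadline;
};

// Hypothetical FIFO wait queue that never hands out a timed-out read.
class read_queue {
    std::deque<queued_read> _waiters;
public:
    void enqueue(queued_read r) { _waiters.push_back(r); }

    // Pops entries in FIFO order. Reads whose deadline has passed are
    // dropped (and counted in `dropped`) rather than executed.
    // Returns true and fills `out` if a live read was found.
    bool pop_next(clock_type::time_point now, queued_read& out, unsigned& dropped) {
        while (!_waiters.empty()) {
            queued_read r = _waiters.front();
            _waiters.pop_front();
            if (r.deadline <= now) {
                ++dropped; // timed out while waiting: do not execute it
                continue;
            }
            out = r;
            return true;
        }
        return false;
    }
};
```

A plain batching stage without deadlines would have run the expired
entry as well; here it is skipped and only the still-valid read is
returned.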

Fixes: #4758
Fixes: #5718

Tests: unit(dev, release, debug)

* 'reader-concurrency-semaphore-in-front-of-the-cache/v5.3' of https://github.com/denesb/scylla: (54 commits)
  test/boost/reader_concurrency_semaphore_test: add used/blocked test
  test/boost/reader_concurrency_semaphore_test: add admission test
  reader_permit: add operator<< for reader_resources
  reader_concurrency_semaphore: add reads_{admitted,enqueued} stats
  table: make_sstable_reader(): fix indentation
  table: clean up make_sstable_reader()
  database: remove now unused query execution stages
  mutation_reader: remove now unused restricting_reader
  sstables: sstable_set: remove now unused make_restricted_range_sstable_reader()
  reader_permit: remove now unused wait_admission()
  reader_concurrency_semaphore: remove now unused obtain_permit_nowait()
  reader_concurrency_semaphore: admission: flip the switch
  database: increase semaphore max queue size
  test: index_with_paging_test: increase semaphore's queue size
  reader_concurrency_semaphore: add set_max_queue_size()
  test: mutation_reader_test: remove restricted reader tests
  reader_concurrency_semaphore: remove now unused make_permit()
  test: reader_concurrency_semaphore_test: move away from make_permit()
  test: move away from make_permit()
  treewide: use make_tracking_only_permit()
  ...
2021-07-14 16:22:56 +02:00
Botond Dénes
e2dfb2df71 test/boost/reader_concurrency_semaphore_test: add used/blocked test
Make sure that releasing a bunch of used/blocked guards in random order
doesn't break the permit state.
2021-07-14 17:19:02 +03:00
Botond Dénes
0337d3ea4a test/boost/reader_concurrency_semaphore_test: add admission test
Checking every conceivable admission scenario (hopefully).
2021-07-14 17:19:02 +03:00
Botond Dénes
b81f39cec9 reader_permit: add operator<< for reader_resources
And use it in tests, it results in actually useful error messages.
2021-07-14 17:19:02 +03:00
Botond Dénes
1b7eea0f52 reader_concurrency_semaphore: admission: flip the switch
This patch flips two "switches":
1) It switches admission to be up-front.
2) It changes the admission algorithm.

(1) by now all permits are obtained up-front, so this patch just yanks
out the restricted reader from all reader stacks and simultaneously
switches all `obtain_permit_nowait()` calls to `obtain_permit()`. By
doing this admission is now waited on when creating the permit.

(2) we switch to an admission algorithm that adds a new aspect to the
existing resource availability: the number of used/blocked reads.
Namely, it only admits new reads if, in addition to the necessary amount
of resources being available, all currently used readers are blocked. In
other words, we only admit new reads if all currently admitted reads
require something other than CPU to progress. They are either waiting
on I/O, a remote shard, or attention from their consumers (not used
currently).

We flip these two switches at the same time because up-front admission
means cache reads now need to obtain a permit too. For cache reads the
optimal concurrency is 1. Anything above that just increases latency
(without increasing throughput). So we want to make sure that if a cache
reader hits, it doesn't get any competition for CPU and it can run to
completion. We admit new reads only if the read misses and has to go to
disk.

Another change made to accommodate this switch is the replacement of the
replica-side read execution stages with the reader concurrency
semaphore as an execution stage. This replacement is needed because with
the introduction of up-front admission, reads are not independent of
each other anymore. One executed read can influence whether later reads
will be admitted or not, and execution stages require independent
operations to work well. By moving the execution stage into the
semaphore, we have an execution stage which is in control of both
admission and running the operations in batches, avoiding the bad
interaction between the two.
2021-07-14 17:19:02 +03:00
Botond Dénes
dcf49dcb67 test: index_with_paging_test: increase semaphore's queue size
To allow the flood of reads generated by this test to be queued up
during up-front admission without failing the test.
2021-07-14 17:19:02 +03:00
Botond Dénes
388da36bbb test: mutation_reader_test: remove restricted reader tests
Soon we will switch to up-front admission, which will break these tests.
No point in trying to fix them as once the switch is done we'll retire
the restricted reader too. Remove these tests now so they are not in the
way of progress.
2021-07-14 17:19:02 +03:00
Botond Dénes
bacfaf9582 test: reader_concurrency_semaphore_test: move away from make_permit()
Migrate to the appropriate up-front admission variants.
2021-07-14 17:19:02 +03:00
Botond Dénes
c07db00b70 test: move away from make_permit()
Use the most appropriate up-front admission variant.
2021-07-14 17:19:02 +03:00