Commit Graph

3296 Commits

Author SHA1 Message Date
Avi Kivity
973d2a58d0 Merge 'docs: move docs to docs/dev folder' from David Garcia
In order to allow our Scylla OSS customers the ability to select a version for their documentation, we are migrating the Scylla docs content to the Scylla OSS repository. This PR covers the following points of the [Migration Plan](https://docs.google.com/document/d/15yBf39j15hgUVvjeuGR4MCbYeArqZrO1ir-z_1Urc6A/edit#):

1. Creates a subdirectory for dev docs: /docs/dev
2. Moves the existing dev doc content in the scylla repo to /docs/dev, but keep Alternator docs in /docs.
3. Flattens the structure in /docs/dev (remove the subfolders).
4. Adds redirects from `scylla.docs.scylladb.com/<version>/<document>` to `https://github.com/scylladb/scylla/blob/master/docs/dev/<document>.md`
5. Excludes publishing docs for /docs/devs.

1. Enter the docs folder with `cd docs`.
2. Run `make redirects`.
3. Enter the docs folder and run `make preview`. The docs should build without warnings.
4. Open http://127.0.0.1:5500 in your browser. You shoul donly see the alternator docs.
5. Open http://127.0.0.1:5500/stable/design-notes/IDL.html in your browser. It should redirect you to https://github.com/scylladb/scylla/blob/master/docs/dev/IDL.md and raise a 404 error since this PR is not merged yet.
6. Surf the `docs/dev` folder. It should have all the scylla project internal docs without subdirectories.

Closes #10873

* github.com:scylladb/scylla:
  Update docs/conf.py
  Update docs/dev/protocols.md
  Update docs/dev/README.md
  Update docs/dev/README.md
  Update docs/conf.py
  Fix broken links
  Remove source folder
  Add redirections
  Move dev docs to docs/dev
2022-07-03 20:37:11 +03:00
Wojciech Mitros
bfa3c0e734 test: move codes of UDFs compiled to WASM to test/resource
After compiling to WASM, UDFs become much larger than the
source code. When they're included in test_wasm.py, it
becomes difficult to navigate in the file. Moving them
to another place does not make understanding the test
scripts harder, because the source code is still included.

This problem will become even more severe when testing
UDFs using WASI.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>

Closes #10934
2022-07-03 17:37:21 +03:00
Wojciech Mitros
4fd289a78c wasm: fix freeing in wasm UDFs using WASI
To call a UDF that is using WASI, we need to properly
configure the wasmtime instance that it will be called
on. The configuration was missing from udf_cache::load(),
so we add it here.

The free function does not return any value, so we should use
a calling method that does not expect any returns.
This patch adds such a method and uses it.

A test that did not pass without this fix and does pass after
is added.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>

Closes #10935
2022-07-01 07:57:45 +02:00
Nadav Har'El
29a0a2d694 test/scylla-gdb: detect Scylla compiled without debugging information
test/scylla-gdb tests Scylla's gdb debugging tools, and cannot work if
Scylla was compiled without debug information (i.e, the "dev" build mode).
In the past, test/scylla-gdb/run detected this case and printed a clear error:

   Scylla executable was compiled without debugging information (-g)
   so cannot be used to test gdb. Please set SCYLLA environment variable.

Unfortunately, since recently this detection fails, because even when
Scylla is compiled without debug information we link into it a library
(libwasmtime.a) which has *some* debug information. As a result, instead
of one clear error message, we get all scylla-gdb tests running -
and each of them failing separately. This is ugly and unhelpful.

Each of the tests fail because our "gdb" test fixture tries to load
scylla-gdb.py and fails when the symbols it needs (e.g., "size_t")
cannot be found. So in this patch, we check once for the existance
of this symbol - and if missing we exit pytest instead of failing each
individual test.

Moreover, if loading scylla-gdb.py fails for some other unexpected
reason, let's exit the test as well, instead of failing each individual
test.

Fixes #10863.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #10937
2022-06-30 20:22:50 +02:00
Botond Dénes
3a19412237 Merge 'Various improvements to perf_row_cache_upgrade test' from Tomasz Grabiec
Closes #10930

* github.com:scylladb/scylla:
  test: perf_row_cache_update: Flush std output after each line
  test: perf_row_cache_update: Drain background cleaner before starting the test
  test: perf_row_cache_update: Measure memtable filling time
  test: perf_row_cache_update: Respect preemption when applying mutations
  test: perf_row_cache_update: Drop unused pk variable
2022-06-30 15:57:29 +03:00
Tomasz Grabiec
8f3349b407 test: lib: flat_mutation_reader_assertion: Add trace-level logging of read fragments
Message-Id: <20220629153926.137824-1-tgrabiec@scylladb.com>
2022-06-30 08:43:30 +03:00
Nadav Har'El
8024da10f0 test/cql-pytest: avoid leaving behind temporary files
Before this patch, the test cql-pytest/test_tools.py left behind
a temporary file in /tmp. It used pytest's "tmp_path_factory" feature,
but it doesn't remove temporary files it creates.

This patch removes the temporary file when the fixture using it ends,
but moreover, it puts the temporary file not in /tmp but rather next
to Scylla's data directory. That directory will be eventually removed
entirely, so even if we accidentally leave a file there, it will
eventually be deleted.

Fixes #10924

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #10929
2022-06-30 07:35:55 +03:00
Tomasz Grabiec
a6aef60b93 memtable: Fix missing range tombstones during reads under ceratin rare conditions
There is a bug introduced in e74c3c8 (4.6.0) which makes memtable
reader skip one a range tombstone for a certain pattern of deletions
and under certain sequence of events.

_rt_stream contains the result of deoverlapping range tombstones which
had the same position, which were sipped from all the versions. The
result of deoverlapping may produce a range tombstone which starts
later, at the same position as a more recent tombstone which has not
been sipped from the partition version yet. If we consume the old
range tombstone from _rt_stream and then refresh the iterators, the
refresh will skip over the newer tombstone.

The fix is to drop the logic which drains _rt_stream so that
_rt_stream is always merged with partition versions.

For the problem to trigger, there have to be multiple MVCC versions
(at least 2) which contain deletions of the following form:

[a, c] @ t0
[a, b) @ t1, [b, d] @ t2

c > b

The proper sequence for such versions is (assuming d > c):

[a, b) @ t1,
[b, d] @ t2

Due to the bug, the reader will produce:

[a, b) @ t1,
[b, c] @ t0

The reader also needs to be preempted right before processing [b, d] @
t2 and iterators need to get invalidated so that
lsa_partition_reader::do_refresh_state() is called and it skips over
[b, d] @ t2. Otherwise, the reader will emit [b, d] @ t2 later. If it
does emit the proper range tombstone, it's possible that it will violate
fragment order in the stream if _rt_stream accumulated remainders
(possible with 3 MVCC versions).

The problem goes away once MVCC versions merge.

Fixes #10913
Fixes #10830

Closes #10914
2022-06-29 19:02:23 +03:00
Tomasz Grabiec
3a222c1db3 test: perf_row_cache_update: Flush std output after each line 2022-06-29 17:36:02 +02:00
Tomasz Grabiec
daf0f041be test: perf_row_cache_update: Drain background cleaner before starting the test 2022-06-29 17:36:02 +02:00
Tomasz Grabiec
5d4bd5d6d5 test: perf_row_cache_update: Measure memtable filling time 2022-06-29 17:36:02 +02:00
Tomasz Grabiec
9d9bf8c196 test: perf_row_cache_update: Respect preemption when applying mutations
Otherwise, once preemption is signalled, memtable::apply() will keep
creating MVCC snapshots, which will slow the test down.
2022-06-29 17:36:02 +02:00
Tomasz Grabiec
46a5a606c4 test: perf_row_cache_update: Drop unused pk variable 2022-06-29 17:36:02 +02:00
Pavel Emelyanov
85033ea6ae Merge 'A bunch of refactors related to Raft group 0' from Kamil Braun
The commits here were extracted from PR https://github.com/scylladb/scylla/pull/10835 which implements upgrade procedure for Raft group 0.

They are mostly refactors which don't affect the behavior of the system, except one: the commit 4d439a16b3 causes all schema changes to be bounced to shard 0. Previously, they would only be bounced when the local Raft feature was enabled. I do that because:
1. eventually, we want this to be the default behavior
2. in the upgrade PR I remove the `is_raft_enabled()` function - the function was basically created with the mindset "Raft is either enabled or not" - which was right when we didn't support upgrade, but will be incorrect when we introduce intermediate states (when we upgrade from non-raft-based to raft-based operations); the upgrade PR introduces another mechanism to dispatch based on the upgrade state, but for the case of bouncing to shard 0, dispatching is simply not necessary.

Closes #10864

* github.com:scylladb/scylla:
  service/raft: raft_group_registry: add assertions when fetching servers for groups
  service/raft: raft_group_registry: remove `_raft_support_listener`
  service/raft: raft_group0: log adding/removing servers to/from group 0 RPC map
  service/raft: raft_group0: move group 0 RPC handlers from `storage_service`
  service/raft: messaging: extract raft_addr/inet_addr conversion functions
  service: storage_service: initialize `raft_group0` in `main` and pass a reference to `join_cluster`
  treewide: remove unnecessary `migration_manager::is_raft_enabled()` calls
  test/boost: memtable_test: perform schema operations on shard 0
  test/boost: cdc_test: remove test_cdc_across_shards
  message: rename `send_message_abortable` to `send_message_cancellable`
  message: change parameter order in `send_message_oneway_timeout`
2022-06-29 16:51:54 +03:00
Pavel Emelyanov
b60f2c220b test_tools: Do not create type if it exists
There effectively are several test-cases in this test, each calls the
scylla_sstable() to prepare, thus each creates a type in the same scylla
instance. The 2nd attempt ends up with the "already exists" error:

E   cassandra.InvalidRequest: Error from server: code=2200 [Invalid query] message="A user type of name cql_test_1656396925652.type1 already exists"

tests: unit(dev)
       https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1075/
fixes: #10872

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220628081459.12791-1-xemul@scylladb.com>
2022-06-29 14:31:57 +03:00
Calle Wilund
aab7794c31 commitlog_test: Change timeout handling to do abort()
Refs #10805

To help debug spurious failures, ensure to do abort() for debugger/core
ease.

Closes #10843.
2022-06-29 13:26:51 +03:00
Nadav Har'El
fc0243a43a Merge 'Implement a number of improvements in test.py' from Konstantin Osipov
A number of improvements in test.py as requested by maintainers:

* don't capture pytest output
* stick to the specific server in control connections
* support --log-level option and pass it to logging module
* when checking if CQL is up, ignore timeout errors
* no longer force schema migration when starting the server
* use test uname, not id, in log output
* improve logging of ScyllaServer
* log what cluster is used for a test
* extend xml output with logs

On the same token, remove mypy warnings and make linter pass on test.py, as well as add some type checking.

Fixes #10871
Fixes #10785

Closes #10902

* github.com:scylladb/scylla:
  test.py: extend xml output with logs
  test.py: log what cluster is used for a test
  test.py: improve logging of ScyllaServer
  test.py: use test uname, not id, in log output
  test.py: support --log-level option and pass it to logging module
  test.py: make ScyllaServer more reliable and fast
  test.py: don't capture pytest output
  test.py: add type annotations
  test.py: convert log_filename to pathlib
  test.py: please linter
  test.py: remove mypy warnings
2022-06-29 13:06:07 +03:00
Pavel Emelyanov
3a753068be Merge "Make permissions cache live updateable and add an API for resetting authorization cache" from Igor Ribeiro Barbosa Duarte
Currently, for users who have permissions_cache configs set to very high
values (and thus can't wait for the configured times to pass) having to restart
the service every time they make a change related to permissions or
prepared_statements cache (e.g. Adding a user and changing their permissions)
can become pretty annoying.
This patch series make permissions_validity_in_ms, permissions_update_interval_in_ms
and permissions_cache_max_entries live updateable so that restarting the
service is not necessary anymore for these cases.
It also adds an API for flushing the cache to make it easier for users who
don't want to modify their permissions_cache config.

branch: https://github.com/igorribeiroduarte/scylla/tree/make_permissions_cache_live_updateable
CI: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1005/
dtests: https://github.com/igorribeiroduarte/scylla-dtest/tree/test_permissions_cache

* https://github.com/igorribeiroduarte/scylla/make_permissions_cache_live_updateable:
  loading_cache_test: Test loading_cache::reset and loading_cache::update_config
  api: Add API for resetting authorization cache
  authorization_cache: Make permissions cache and authorized prepared statements cache live updateable
  auth_prep_statements_cache: Make aut_prep_statements_cache accept a config struct
  utils/loading_cache.hh: Add update_config method
  utils/loading_cache.hh: Rename permissions_cache_config to loading_cache_config and move it to loading_cache.hh
  utils/loading_cache.hh: Add reset method
2022-06-29 11:14:13 +03:00
Igor Ribeiro Barbosa Duarte
8cc2de5fe0 loading_cache_test: Test loading_cache::reset and loading_cache::update_config
Validate that the size of the cache is zero after calling the
reset method and that the config is being updated correctly
after calling update_config.

Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
2022-06-28 19:58:06 -03:00
Igor Ribeiro Barbosa Duarte
c8c48a98fa auth_prep_statements_cache: Make aut_prep_statements_cache accept a config struct
This patch makes authorized_prepared_statements_cache acccept a config struct,
similarly to permissions_cache. This will make it easier to make this cache
live updateable on the next patch.

Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
2022-06-28 19:57:52 -03:00
Igor Ribeiro Barbosa Duarte
667840a7eb utils/loading_cache.hh: Rename permissions_cache_config to loading_cache_config and move it to loading_cache.hh
This patch renames the permissions_cache_config struct to loading_cache_config
and moves it to utils/loading_cache.hh. This will make it easier to handle
config updates to the authorization caches on the next patches

Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
2022-06-28 19:46:22 -03:00
Nadav Har'El
630959bb77 Merge 'test.py async and schema changes' from Alecco
Change tests to use async mode and add helpers and tests for schema changes.

These test series will be expanded with topology changes.

Closes #10550

* github.com:scylladb/scylla:
  test.py topology: repro for issue #1207
  test.py: port fixture fails_without_raft
  test.py topology: table methods to add/remove index
  test.py topology: add/drop table column helpers
  test.py topology: insert sequential row
  test.py: remove deprecated test test_null
  test.py: managed random tables
  test.py: test_keyspace fixture async
  test.py: rename fixture test_keyspace to keyspace
  test.py topology: test with asyncio
2022-06-28 23:25:18 +03:00
Avi Kivity
37780c6521 Merge 'test: perf: allow testing timeouts in perf_simple_query' from Piotr Dulikowski
This PR adds necessary modifications to perf_simple_query so that it can be used to test performance of the timeout handling path. With an appropriate combination of flags, it is possible to consistently trigger timeouts on every operation.

The following flags are added:
- `--stop-on-error` - if true (which is the default), the test stops after encountering the first exception and reports it; otherwise it causes errors to be counted and reported at the end.
- `--timeout <x>` - allows to use `USE TIMEOUT <x>` in the benchmark query/statement.
- `--bypass-cache` - uses `BYPASS CACHE` in the benchmark query (relevant only to reads).

Examples:

```
./build/release/test/perf/perf_simple_query --smp=1 --operations-per-shard=1000000 --write
131023.65 tps ( 56.2 allocs/op,  13.2 tasks/op,   49784 insns/op,        0 errors)

./build/release/test/perf/perf_simple_query --smp=1 --operations-per-shard=1000000 --write --stop-on-error=false --timeout=0s
97163.73 tps ( 53.1 allocs/op,   5.1 tasks/op,   78687 insns/op,  1000000 errors)

./build/release/test/perf/perf_simple_query --smp=1 --operations-per-shard=1000000
154060.36 tps ( 63.1 allocs/op,  12.1 tasks/op,   42998 insns/op,        0 errors)

./build/release/test/perf/perf_simple_query --smp=1 --operations-per-shard=1000000 --stop-on-error=false --flush --bypass-cache --timeout=0s
30127.43 tps ( 48.2 allocs/op,  14.3 tasks/op,  312416 insns/op,  1000000 errors)
```

Refs: #2363

Closes #10899

* github.com:scylladb/scylla:
  test: perf: add bypass cache argument
  test: perf: add timeout argument
  test: perf: count errors and report the count in results
  test: perf: add stop-on-error argument
  test: perf: coroutinize run_worker()
  test: perf: fix crash on exception in time_parallel_ex
2022-06-28 19:28:22 +03:00
Nadav Har'El
a8b02f7965 test: set sanitizer options in run scripts
When the run scripts for tests of cql-pytest, alternator, redis, etc.,
run Scylla, they should set the UBSAN_OPTIONS and ASAN_OPTIONS so that
if the executable is built with sanitizers enabled, it will ignore false
positives that we know about, and fail on real errors.

The change in this patch affects all test/*/run scripts which use the
this shared Scylla-starting code. test.py already had the same settings,
and it affected the tests that it knows to run directly (unit tests,
cql-pytest, etc.).

Fixes #10904

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #10915
2022-06-28 18:24:48 +03:00
Konstantin Osipov
22d30abed8 test.py: log what cluster is used for a test 2022-06-28 18:22:01 +03:00
Konstantin Osipov
dbdfac7a0f test.py: improve logging of ScyllaServer 2022-06-28 18:22:01 +03:00
Konstantin Osipov
ad7423649f test.py: make ScyllaServer more reliable and fast
1) Stick to the specific server in control connections.
It could happen that, when starting a cluster and checking
if a specific node is up, the check would actually execute
against an already running node. Prevent this from happening
by setting a white list connection balancing policy for control
connections.

2) When checking if CQL is up, ignore timeout errors
Scylla in debug mode can easily time out on a DDL query,
and the timeout error at start up would lead to the entire cluster
marked as broken. This is too harsh, allow timeouts at start.

3) No longer force schema migration when starting the server
By default, Raft is on, so the nodes are getting schema
through Raft leader. Schema migration significantly slows
down cluster start in debug mode (60 seconds -> 100 seconds),
and even though it was a great test that helped discover
several bugs in Scylla, it shouldn't be part of normal
cluster boot, so disable it.
2022-06-28 18:21:25 +03:00
David Garcia
b85843b9cc Fix broken links
Fix broken links
2022-06-28 15:19:36 +01:00
Alejo Sanchez
c478a53d9c test.py topology: repro for issue #1207
Repro for bug in concurrent schema changes for many tables and indexing
involved.

Do alter tables by doing in parallel new table creation, alter a table
(_alter), and index other tables (_index).

Original repro had sets of 20 of those and slept for 20 seconds to
settle. This repro does it for Scylla with just 1 set and 1 second.

This issue goes away once Raft is enabled.

https://github.com/scylladb/scylla/issues/1207

Originally at https://issues.apache.org/jira/browse/CASSANDRA-10250

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-06-28 15:07:27 +02:00
Alejo Sanchez
4228bfef84 test.py: port fixture fails_without_raft
Port fails_without_raft to higher level conftest file for future use in
topology pytests.

While there, make it async.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-06-28 15:07:27 +02:00
Alejo Sanchez
e2cc35b768 test.py topology: table methods to add/remove index
Add helper methods to add and drop indexes on a given column.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-06-28 15:07:27 +02:00
Alejo Sanchez
d80857e26e test.py topology: add/drop table column helpers
Helper to add/drop a specified or random column.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-06-28 15:07:27 +02:00
Alejo Sanchez
e8e6a8e85a test.py topology: insert sequential row
For each table keep a counter and insert rows with sequential values
generated correspondingly by each column's type.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-06-28 15:07:27 +02:00
Alejo Sanchez
0624be6d58 test.py: remove deprecated test test_null
With test_schema there's no need for test_null.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-06-28 15:07:27 +02:00
Alejo Sanchez
ed140f98d8 test.py: managed random tables
Helpers to create keyspace and manange randomized tables.

Fixture drops all created tables still active after the test finishes.

Includes helper methods to verify schema consistency.

These helpers will be used in Raft schema changes tests coming later.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-06-28 15:07:27 +02:00
Alejo Sanchez
df1a032c04 test.py: test_keyspace fixture async
Make test_keyspace fixture async.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-06-28 15:07:26 +02:00
Alejo Sanchez
fda69a0773 test.py: rename fixture test_keyspace to keyspace
Name makes better sense as it's not a test but a fixture for tests.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-06-28 15:07:26 +02:00
Alejo Sanchez
00648342a6 test.py topology: test with asyncio
Run test async using a wrapper for Cassandra python driver's future.

The wrapper was suggested by a user and brought forward by @fruch.
It's based on https://stackoverflow.com/a/49351069 .

Redefine pytest event_loop fixture to avoid issues with fixtures with
scope bigger than function (like keyspace).
See https://github.com/pytest-dev/pytest-asyncio/issues/68

Convert sample test_null to async. More useful test cases will come
afterwards.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-06-28 15:07:26 +02:00
David Garcia
8e7ebea335 Merge remote-tracking branch 'upstream/master' into move-dev-docs 2022-06-28 11:02:38 +01:00
Botond Dénes
6c818f8625 Merge 'sstables: generation_type tidy-up' from Michael Livshin
- Use `sstables::generation_type` in more places
- Enforce conceptual separation of `sstables::generation_type` and `int64_t`
- Fix `extremum_tracker` so that `sstables::generation_type` can be non-default-constructible

Fixes #10796.

Closes #10844

* github.com:scylladb/scylla:
  sstables: make generation_type an actual separate type
  sstables: use generation_type more soundly
  extremum_tracker: do not require default-constructible value types
2022-06-28 08:50:12 +03:00
Calle Wilund
688fd31e64 commitlog: Add counters for actual pending allocations + segment wait
Fixes #9367

The CL counters pending_allocations and requests_blocked_memory are
exposed in graphana (etc) and often referred to as metrics on whether
we are blocking on commit log. But they don't really show this, as
they only measure whether or not we are blocked on the memory bandwidth
semaphore that provides rate back pressure (fixed num bytes/s - sortof).

However, actual tasks in allocation or segment wait is not exposed, so
if we are blocked on disk IO or waiting for segments to become available,
we have no visible metrics.

While the "old" counters certainly are valid, I have yet to ever see them
be non-zero in modern life.

Closes #9368
2022-06-28 08:36:27 +03:00
Piotr Dulikowski
6c69606702 test: perf: add bypass cache argument
Adds the "--bypass-cache" argument which adds a "BYPASS CACHE" clause
to the query being run in the benchmark. It only affects the read mode.
2022-06-27 22:14:29 +02:00
Piotr Dulikowski
fdd0a4146f test: perf: add timeout argument
Adds the "--timeout" argument which allows specifying a timeout used in
all operations. It works by inserting "USING <timeout>" in appropriate
place in the query.

The flag is most useful when set to zero - with an appropriate
combination of other flags (flush, bypass cache) it guarantees that each
operation will time out and performance of the timeout handling logic
can be measured.
2022-06-27 22:14:29 +02:00
Piotr Dulikowski
b9250a43e3 test: perf: count errors and report the count in results
Now, exceptions encountered during the test are counted as errors, and
the error count is reported at the end of the test.
2022-06-27 22:14:29 +02:00
Piotr Dulikowski
21612f97b0 test: perf: add stop-on-error argument
Adds the "--stop-on-error" argument to perf_simple_query. When enabled
(and it is enabled by default), the benchmark will propagate exceptions
if any occur in the tested function. Otherwise, errors will be ignored.
2022-06-27 22:14:29 +02:00
Piotr Dulikowski
d3bc946859 test: perf: coroutinize run_worker()
Converts the executor::run_worker() method to a coroutine. This will
allow extending the function in further commits without having to
allocate continuations.
2022-06-27 22:14:29 +02:00
Piotr Dulikowski
33b22e78be test: perf: fix crash on exception in time_parallel_ex
The `time_parallel_ex` function creates a sharded<executor> and uses it
to run the benchmark on multiple shards in parallel. However, if the
benchmarking function throws an exception, the sharded<executor> will be
destroyed without being stopped, which triggers an assertion in
sharded<T> destructor.

This commit makes sure that the executor is stopped before being
destroyed by putting `exec.stop()` into a `seastar::defer`.
2022-06-27 22:14:29 +02:00
Avi Kivity
3131cbea62 Merge 'query: allow replica to provide arbitrary continue position' from Botond Dénes
Currently, we use the last row in the query result set as the position where the query is continued from on the next page. Since only live rows make it into query result set, this mandates the query to be stopped on a live row on the replica, lest any dead rows or tombstones processed after the live rows, would have to be re-processed on the next page (and the saved reader would have to be thrown away due to position mismatch). This requirement of having to stop on a live row is problematic with datasets which have lots of dead rows or tombstones, especially if these form a prefix. In the extreme case, a query can time out before it can process a single live row and the data-set becomes effectively unreadable until compaction gets rid of the tombstones.
This series prepares the way for the solution: it allows the replica to determine what position the query should continue from on the next page. This position can be that of a dead row, if the query stopped on a dead row. For now, the replica supplies the same position that would have been obtained with looking at the last row in the result set, this series merely introduces the infrastructure for transferring a position together with the query result, and it prepares the paging logic to make use of this position. If the coordinator is not prepared for the new field, it will simply fall-back to the old way of looking at the last row in the result set. As I said for now this is still the same as the content of the new field so there is no problem in mixed clusters.

Refs: https://github.com/scylladb/scylla/issues/3672
Refs: https://github.com/scylladb/scylla/issues/7689
Refs: https://github.com/scylladb/scylla/issues/7933

Tests: manual upgrade test.
I wrote a data set with:
```
./scylla-bench -mode=write -workload=sequential -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -clustering-row-size=8096 -partition-count=1000
```
This creates large, 80MB partitions, which should fill many pages if read in full. Then I started a read workload:
```
./scylla-bench -mode=read -workload=uniform -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -duration=10m -rows-per-request=9000 -page-size=100
```
I confirmed that paging is happening as expected, then upgraded the nodes one-by-one to this PR (while the read-load was ongoing). I observed no read errors or any other errors in the logs.

Closes #10829

* github.com:scylladb/scylla:
  query: have replica provide the last position
  idl/query: add last_position to query_result
  mutlishard_mutation_query: propagate compaction state to result builder
  multishard_mutation_query: defer creating result builder until needed
  querier: use full_position instead of ad-hoc struct
  querier: rely on compactor for position tracking
  mutation_compactor: add current_full_position() convenience accessor
  mutation_compactor: s/_last_clustering_pos/_last_pos/
  mutation_compactor: add state accessor to compact_mutation
  introduce full_position
  idl: move position_in_partition into own header
  service/paging: use position_in_partition instead of clustering_key for last row
  alternator/serialization: extract value object parsing logic
  service/pagers/query_pagers.cc: fix indentation
  position_in_partition: add to_string(partition_region) and parse_partition_region()
  mutation_fragment.hh: move operator<<(partition_region) to position_in_partition.hh
2022-06-27 12:23:21 +03:00
Pavel Emelyanov
708d3a1ea4 cql-test: Initialize cluster port as integer
Otherwise it complains like this:

_________ ERROR at setup of test_allow_filtering_indexed_no_filtering __________

request = <SubRequest 'cql' for <Function test_allow_filtering_indexed_no_filtering>>

    @pytest.fixture(scope="session")
    def cql(request):
        [...<snip>...]
>       return cluster.connect()

test/cql-pytest/conftest.py:66:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cassandra/cluster.py:1708: in cassandra.cluster.Cluster.connect
    ???
cassandra/cluster.py:1765: in cassandra.cluster.Cluster._new_session
    ???
cassandra/cluster.py:2563: in cassandra.cluster.Session.__init__
    ???
cassandra/pool.py:203: in cassandra.pool.Host.__str__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: %d format: a real number is required, not str

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-27 09:00:48 +03:00
Benny Halevy
751eceb2e6 types: time_point_to_string: use numeric formatting rather than chrono-format specifiers
As reported in #10867, newer versions of the fmt library
format %Y using 4-characters width, 0-padding the prefix
when needed, while older versions don't do that.

This change moves away from using %Y and friends
fmt specifiers to using explicit numeric-based formatting
conforming to ISO 8601 and making sure the year field
has at least 4 digits and is zero padded.  When
negative, the width is upped to 5 so it would show as -0001
rather than -001.

The unit test was updated respectively.

Fixes #10867

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10870
2022-06-27 08:28:56 +03:00