Commit Graph

2938 Commits

Author SHA1 Message Date
Calle Wilund
5b60a6cf7c perf: Add aio_writes mixin for perf_results
Can be used with time_parallel_ex. Adds measurements for aio writes/aio written bytes.
2022-04-05 13:42:36 +00:00
Calle Wilund
12ab34a3d9 test/perf/perf.hh: Make templated version of test routine to allow extended stats
Adds sub-template for time_parallel with templated result type + optional
per-iteration post-process func. Idea is that Res may be a subtype of
perf_result, with additional stats, initiated on init, and post-process
function can fix up and apply stats -> we can add stats to result.
2022-04-05 13:30:42 +00:00
Tomasz Grabiec
9c96a37143 Merge "raft: nemesis test: use abort_source for time-outs" from Kamil
When a Raft API call such as `add_entry`, `set_configuration` or
`modify_config` takes too long, we need to time-out. There was no way to
abort these calls previously so we would do that by discarding the futures.
Recently the APIs were extended with `abort_source` parameters. Use this.

Also improve debuggability if the functions throw an exception type that
we don't expect. Previously if they did, a cryptic assert would fail
somewhere deep in the generator code, making the problem hard to debug.

Also collect some statistics in the test about the number of successful
and failed ops. I used it to manually check whether there was a
difference in how often operations fail with using the out timeout
method and the new timeout method (there doesn't seem to be any).

* kbr/nemesis-abort-source:
  test: raft: randomized_nemesis_test: on timeout, abort calls instead of discarding them
  raft: server: translate semaphore_aborted to request_aborted
  test: raft: logical_timer: add abortable version of `sleep_until`
  test: raft: randomized_nemesis_test: collect statistics on successful and failed ops
2022-04-01 16:25:23 +02:00
Pavel Emelyanov
886a275192 Merge 'replica/table: remove v1 reader factory methods' from Botond
Only users are internal and tests.

Tests: unit(dev)

* replica-table-remove-make-reader-v1/v2 of github.com/denesb/scylla.git
  replica/table: remove v1 reader factory methods
  tests: move away from table::make_reader()
  replica/table: add short make_reader_v2() variant:
2022-04-01 13:57:10 +03:00
Botond Dénes
c8ea0715e9 tests: move away from table::make_reader()
Use v2 equivalents instead.
2022-04-01 13:39:26 +03:00
Nadav Har'El
758f8f01d7 test/alternator: turn REST API finding into a fixture
In test_tracing.py and util.py, we already have three duplicates of code
which looks for the Scylla REST API. We'll soon want to add even more uses
of this REST API, so it's good time to add a single fixture, "rest_api",
which can be use in all tests that need the Scylla REST API instead of
duplicating the same code.

A test using the "rest_api" fixture will be skipped if the server isn't
Scylla, or its port 10000 is not available or not responsive.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220331195337.64352-1-nyh@scylladb.com>
2022-04-01 10:51:59 +03:00
Avi Kivity
af07519928 Merge "Remove reader from mutations v1" from Botond
"
First migrate all users to the v2 variant, all of which are tests.
However, to be able to properly migrate all tests off it, a v2 variant
of the restricted reader is also needed. All restricted reader users are
then migrated to the freshly introduced v2 variant and the v1 variant is
removed.
Users include:
* replica::table::make_reader_v2()
* streaming_virtual_table::as_mutation_source()
* sstables::make_reader()
* tests

This allows us to get rid of a bunch of conversions on the query path,
which was mostly v2 already.

With a few tests we did kick the can down the road by wrapping the v2
reader in `downgrade_to_v1()`, but this series is long enough already.

Tests: unit(dev), unit(boost/flat_mutation_reader_test:debug)
"

* 'remove-reader-from-mutations-v1/v3' of https://github.com/denesb/scylla:
  readers: remove now unused v1 reader from mutations
  test: move away from v1 reader from mutations
  test/boost/mutation_reader_test: use fragment_scatterer
  test/boost/mutation_fragment_test: extract fragment_scatterer into a separate hh
  test/boost: mutation_fragment_test: refactor fragment_scatterer
  readers: remove now unused v1 reversing reader
  test/boost/flat_mutation_reader_test: convert to v2
  frozen_mutation: fragment_and_freeze(): convert to v2
  frozen_mutation: coroutinize fragment_and_freeze()
  readers: migrate away from v1 reversing reader
  db/virtual_table: use v2 variant of reversing and forwardable readers
  replica/table: use v2 variant of reversing reader
  sstables/sstable: remove unused make_crawling_reader_v1()
  sstables/sstable: remove make_reader_v1()
  readers: add v2 variant of reversing reader
  readers/reversing: remove FIXME
  readers: reader from mutations: use mutation's own schema when slicing
2022-03-31 13:29:11 +03:00
Avi Kivity
5fc093ad42 Merge 'wasm: manage memory using exports from the client' from Wojciech Mitros
This patch adds importing the `malloc` and `free` method from the wasm client, and using them for allocating wasm memory for UDF arguments and freeing its result. When the methods are not exported, the old behaviour is used instead. To make that possible, this patch also includes a fix to the usage of pages in wasm memory (methods `size` and `grow`) that were used for allocating memory for arguments until now. (The source codes  for the examples didn't work on my machine in their original form, so when updating paging I've also added small unrelated modifications)

Tests:unit(dev)

Closes #10234

* github.com:scylladb/scylla:
  wasm: add wasm ABI version 2
  wasm: add WASI handling
  wasm: add documentation
  wasm: add _scylla_abi export for specifying abi for wasm udfs
  wasm: update ABI for passing parameters to wasm UDFs
  wasm: move common code to a separate function
  wasm: use wasm pages for wasm memory
2022-03-31 12:33:55 +03:00
Wojciech Mitros
56c5459c50 wasm: add null handling for wasm udf
As the name suggests, for UDFs defined as RETURNS NULL ON NULL
INPUT, we sometimes want to return nulls. However, currently
we do not return nulls. Instead, we fail on the null check in
init_arg_visitor. Fix by adding null handling before passing
arguments, same as in lua.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>

Closes #10298
2022-03-31 12:27:38 +03:00
Piotr Sarna
c0fd53a9d7 cql3: fix qualifying restrictions with IN for indexing
When a query contains IN restriction on its partition key,
it's currently not eligible for indexing. It was however
erroneously qualified as such, which lead to fetching incorrect
results. This commit fixes the issue by not allowing such queries
to undergo indexing, and comes with a regression test.

Fixes #10300

Closes #10302
2022-03-31 11:04:17 +03:00
Nadav Har'El
3eaafbbdf7 test/cql-pytest: mark a test failing on Cassandra with cassandra_bug
We have a test for the LIKE restriction with ALLOW FILTERING.

Cassandra does not yet support this combination (it only supports LIKE
with SASI indexes), so this test fails on Cassandra, suggesting either
the test is wrong, or Cassandra is wrong. In this case, Cassandra is
wrong - they have an issue requesting this to be fixed -
https://issues.apache.org/jira/browse/CASSANDRA-17198, and even an
implementation which is being reviewed.

So let's mark this test with "cassandra_bug", meaning it is expected
to fail (xfail) when running against Cassandra. When CASSANDRA-17198
is fixed, we can remove the cassandra_bug mark.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220330211734.4103691-1-nyh@scylladb.com>
2022-03-31 09:47:44 +02:00
Botond Dénes
fd69add579 test: move away from v1 reader from mutations
Use the v2 variant instead.
2022-03-31 10:36:23 +03:00
Botond Dénes
2e00ff314d test/boost/mutation_reader_test: use fragment_scatterer
Instead of the open-coded equivalent the test currently has.
2022-03-31 10:25:45 +03:00
Botond Dénes
feecc19d5b test/boost/mutation_fragment_test: extract fragment_scatterer into a separate hh
We want to use it in test/boost/mutation_reader_test.cc too.
2022-03-31 10:25:45 +03:00
Botond Dénes
226f01162e test/boost: mutation_fragment_test: refactor fragment_scatterer
Instead of taking an output parameter in the constructor, take just the
desired number of mutations to build and return the mutation list from
`consume_end_of_stream()`.
2022-03-31 10:25:45 +03:00
Botond Dénes
b8f0ab3b98 readers: remove now unused v1 reversing reader 2022-03-31 10:04:45 +03:00
Botond Dénes
56e3c6add6 test/boost/flat_mutation_reader_test: convert to v2 2022-03-31 10:04:29 +03:00
Botond Dénes
2e634883d9 frozen_mutation: fragment_and_freeze(): convert to v2 2022-03-31 09:57:48 +03:00
Botond Dénes
219cb881a4 sstables/sstable: remove make_reader_v1()
No external users, only used internally, by make_reader(), who delegates
cases currently unsupported by v2 to it. The code needed from
make_reader_v1() is inlined into make_reader() and the former is
removed.
2022-03-31 09:57:48 +03:00
Botond Dénes
470dc0d013 readers: add v2 variant of reversing reader
The v2 format allows for a much simpler reversing mechanism since
clustering fragments can simply be reversed as they are read. Fragments
are directly pushed in the reader's buffer eliminating a separate move
phase.
Existing reverse reader unit tests are converted to test the v2 one.
2022-03-31 09:57:48 +03:00
Wojciech Mitros
93b082e8d9 wasm: add _scylla_abi export for specifying abi for wasm udfs
Different languages may require different ABIs for passing
parameters, etc. This patch adds a requirement for all wasm
UDFs to export an _scylla_abi symbol, that is an 32-bit integer
with a value specifying the ABI version.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2022-03-30 19:37:11 +02:00
Wojciech Mitros
a7ee3ccf52 wasm: update ABI for passing parameters to wasm UDFs
WebAssembly uses 32-bit address space, while also
having 64-bit integers as it native types. As a result,
when passing size of an object in memory and its address,
it can be combined into one 64-bit value. As a bonus,
if the object is null, we can signal it by passing -1 as
its size.

This patch implements handling of this new ABI and adjusts
expamples in test_wasm.py.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2022-03-30 17:13:25 +02:00
Wojciech Mitros
62761a7cf3 wasm: use wasm pages for wasm memory
The memory.grow and memory.size wasm methods return
the memory size in pages, and memory.size takes its
argument in the number of pages. A WebAssembly page
has a size of 64KiB, so during memory allocation
we have to divide our desired size in bytes by page
size and round up. Similarly, when reading memory
size we need to multiply the result by 64KiB to
get the size in bytes.

The change affects current naive allocator for
arguments when calling wasm UDFs and the examples
in wasm_test.py - both commented code and compiled
wasm in text representation.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2022-03-30 17:13:13 +02:00
Botond Dénes
b029bd3db7 tree: remove mutation_reader.hh include
In most files it was unused. We should move these to the patch which
moved out the last interesting reader from mutation_reader.hh (and added
the corresponding new header include) but its probably not worth the
effort.
Some other files still relied on mutation_reader.hh to provide reader
concurrency semaphore and some other misc reader related definitions.
2022-03-30 15:42:51 +03:00
Botond Dénes
b7954138ac mutation_reader: move compacting reader into readers/ 2022-03-30 15:42:51 +03:00
Botond Dénes
11c378a175 mutation_reader: move queue reader to readers/ 2022-03-30 15:42:51 +03:00
Botond Dénes
f24f2f726a mutation_reader: move filtering reader into readers/ 2022-03-30 15:42:51 +03:00
Botond Dénes
d0ea895671 readers: move multishard reader & friends to reader/multishard.cc
Since the multishard reader family weighs more than 1K SLOC, it gets
its own .cc file.
2022-03-30 15:42:51 +03:00
Botond Dénes
f8015d9c26 readers: move combined reader into readers/
Since the combined reader family weighs more than 1K SLOC, it gets its
own .cc file.
2022-03-30 15:42:51 +03:00
Botond Dénes
0c3d4091a4 Merge "Make TWCS' cleanup bucket aware" from Raphael S. Carvalho
"
Quoting patch 3/4:
"This continues the work in a69d98c3d0,
by implementing the cleanup method in TWCS to make it bucket aware.
Till now, the default impl was used which cleanups on file at a
time, starting from the smallest.

The cleanup strategy for TWCS is simple. It's simply calling the
size tiered cleanup method for each bucket, so there will be
one job for each tier in each window.

The next strategies to receive this improvement are LCS and ICS
(the latter one being only available in enterprise).

Refs #10097."

** Simply put, the goal is to reduce writeamp when performing cleanup
on a TWCS table, therefore reducing the operation time. **

tests: unit(dev).
"

* 'twcs_cleanup_bucket_aware/v1' of https://github.com/raphaelsc/scylla:
  tests: sstable_compaction_test: Add test for TWCS' bucket-aware cleanup
  compaction: TWCS: Implement cleanup method for bucket awareness
  compaction: TWCS: change get_buckets() signature to work with const qualified functions
  compaction_strategy: get_cleanup_compaction_jobs: accept candidates by value
2022-03-30 11:45:28 +03:00
Kamil Braun
475a408792 test: raft: randomized_nemesis_test: on timeout, abort calls instead of discarding them
When a Raft API call such as `add_entry`, `set_configuration` or
`modify_config` takes too long, we need to time-out. There was no way to
abort these calls previously so we would do that by discarding the futures.
Recently the APIs were extended with `abort_source` parameters. Use this.

Also improve debuggability if the functions throw an exception type that
we don't expect. Previously if they did, a cryptic assert would fail
somewhere deep in the generator code, making the problem hard to debug.
2022-03-29 18:25:26 +02:00
Kamil Braun
8d4dd53c25 test: raft: logical_timer: add abortable version of sleep_until
`sleep_until(time_point tp, abort_source& as)` will sleep until `tp`, or
until `as.request_abort()` is called, whatever comes first.
2022-03-29 15:10:29 +02:00
Kamil Braun
6f9dcd784c test: raft: randomized_nemesis_test: collect statistics on successful and failed ops 2022-03-29 15:10:29 +02:00
Raphael S. Carvalho
a1fd9c1ee8 tests: sstable_compaction_test: Add test for TWCS' bucket-aware cleanup
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-29 09:52:11 -03:00
Botond Dénes
316ff9eb86 test/boost/mutation_test: add test for mutation_fragment_stream_validator 2022-03-29 13:19:05 +03:00
Nadav Har'El
024ecd45a2 test/rest_api: add reproducer for REST API JSON encoding bug
This patch adds a reproducer for the JSON encoding in issue #9061.
The bug was already fixed (it was a Seastar bug, and Seastar was
updated in commit 5d4213e1b8), but
I verified that the test fails before that patch - and passes today.

It is useful to have such a test for regressions, as well as for
testing backports.

Unfortunately, the test isn't pretty. The test uses the toppartitions
API, which instead of having a "start" and "stop" request has a single
synchronous "start for a given duration" request, and we need to run
it with some fixed duration (we took 1 second), and in parallel, one
request.

Refs #9061.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220323180855.3307931-1-nyh@scylladb.com>
2022-03-28 15:28:45 +03:00
Nadav Har'El
d8c0680585 test/alternator: add regression test for old ALL_NEW bug
In commit 964500e47a, in the middle of
a larger series, I fixed a small Alternator bug that I found while working
on that series. The bug was that the ReturnValues=ALL_NEW feature moved out
the read previous_item, which breaks operations that need previous_item,
e.g., an ADD operation. Unfortunately, we never had a regression test for
this fix bug, so in this patch I add one.

This bug was re-discovered on an old branch by a user, at which point
I noticed that we don't have a test for it - so I want to add it now,
even though the bug itself is long gone from Scylla master.

I verified that the new test indeed fails on old versions of Scylla
before the aforementioned commit, and passes when backporting only that
commit.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220327074928.3608576-1-nyh@scylladb.com>
2022-03-28 08:40:28 +02:00
Avi Kivity
3c2271af52 Merge "De-globalize system keyspace local cache" from Pavel E
"
There's a static global sharded<local_cache> variable in system keyspace
the keeps several bits on board that other subsystems need to get from
the system keyspace, but what to have it in future<>-less manner.

Some time ago the system_keyspace became a classical sharded<> service
that references the qctx and the local cache. This set removes the global
cache variable and makes its instances be unique_ptr's sitting on the
system keyspace instances.

The biggest obstacle on this route is the local_host_id that was cached,
but at some point was copied onto db::config to simplify getting the value
from sstables manager (there's no system keyspace at hand there at all).
So the first thing this set does is removes the cached host_id and makes
all the users get it from the db::config.

(There's a BUG with config copy of host id  -- replace node doesn't
 update it. This set also fixes this place)

De-globalizing the cache is the prerequisite for untangling the snitch-
-messaging-gossiper-system_keyspace knot. Currently cache is initialized
too late -- when main calls system_keyspace.start() on all shards -- but
before this time messaging should already have access to it to store
its preferred IP mappings.

tests: unit(dev), dtest.simple_boot_shutdown(dev)
"

* 'br-trade-local-hostid-for-global-cache' of https://github.com/xemul/scylla:
  system_keyspace: Make set_local_host_id non-static
  system_keyspace: Make load_local_host_id non-static
  system_keyspace: Remove global cache instance
  system_keyspace: Make it peering service
  system_keyspace,snitch: Make load_dc_rack_info non-static
  system_keyspace,cdc,storage_service: Make bootstrap manipulations non-static
  system_keyspace: Coroutinize set_bootstrap_state
  gossiper: Add system keyspace dependency
  cdc_generation_service: Add system keyspace dependency
  system_keyspace: Remove local host id from local cache
  storage_service: Update config.host_id on replace
  storage_service: Indentation fix after previous patch
  storage_service: Coroutinize prepare_replacement_info()
  system_distributed_keyspace: Indentation fix after previous patch
  code,system_keyspace: Relax system_keyspace::load_local_host_id() usage
  code,system_keyspace: Remove system_keyspace::get_local_host_id()
2022-03-27 17:19:24 +03:00
Nadav Har'El
653f2df28f alternator: fix JSON escaping of error responses
In the DynamoDB API, error responses are in JSON format with specific
fields ("__type" and "message" in the x-amz-json-1.0 format currently
used). Alternator tried to be clever and build the string representation
of this JSON itself, instead of using RapidJSON. But this optimization
was a mistake - if the error message contains characters that need
escaping (such as double quotes and newlines), they weren't escaped,
and the resulting JSON was malformed. When the client library boto3
read this malformed JSON it got confused, cosidered the entire error
response to be a string, which resulted in an ugly error message.

The fix is easy - just build the JSON output as usual with RapidJSON
instead of trying to optimize using string operation.

The patch also includes two tests reproducing this bug and checking its
fix. The first test uses boto3 and shows it got confused on the type
of error (not understanding that it is a ValidationException). The
second test bypasses boto3 and shows exactly where the bug happens -
the response is an unparsable JSON.

Fixes #10278

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220327132705.3707979-1-nyh@scylladb.com>
2022-03-27 16:32:36 +03:00
Botond Dénes
a69d98c3d0 Merge "Improve efficiency of cleanup compaction by making it bucket aware" from Raphael S. Carvalho
"
Cleanup compaction works by rewriting all sstables that need clean up, one at
a time.
This approach can cause bad write amplification because the output data is
being made incrementally available for regular compaction.
Cleanup is a long operation on large data sets, and while it's happening,
new data can be written to buckets, triggering regular compaction.
Cleanup fighting for resources with regular compaction is a known problem.
With cleanup adding one file at a time to buckets, regular may require multiple
rounds to compact the data in a given bucket B, producing bad writeamp.

To fix this problem, cleanup will be made bucket aware. As each compaction
strategy has its own definition of bucket, strategies will implement their
own method to retrieve cleanup jobs. The method will be implemented such that
all files in a bucket B will be cleaned up together, and on completion,
they'll be made available for regular at once.

For STCS / ICS, a bucket is a size tier.
For TWCS, a bucket is a window.
For LCS, a bucket is a level.

In this way, writeamp problem is fixed as regular won't have to perform
multiple rounds to compact the data in a given bucket. Additionally, cleanup
will now be able to deduplicate data and will become way more efficient at
garbage collecting expired data.

The space requirement shouldn't be an issue, as compacting an entire bucket
happens during regular compaction anyway.
With leveled strategy, compacting an entire level is also not a problem because
files in a level L don't overlap and therefore incremental compaction is
employed to limit the space requirement.

By the time being, only STCS cleanup was made bucket aware. The others will be
using a default method, where one file is cleaned up at a time. Making cleanup
of other strategies bucket aware is relatively easy now and will be done soon.

Refs #10097.
"

* 'cleanup-compaction-revamp/v3' of https://github.com/raphaelsc/scylla:
  test: sstable_compaction_test: Add test for strategy cleanup method
  compaction: STCS: Implement cleanup strategy
  compaction_manager: Wire cleanup task into the strategy cleanup method
  compaction_strategy: Allow strategies to define their own cleanup strategy
  compaction: Introduce compaction_descriptor::sstables_size
  compaction: Move decision of garbage collection from strategy to task type
2022-03-25 16:30:28 +02:00
Raphael S. Carvalho
5312526e5e test: sstable_compaction_test: Add test for strategy cleanup method
Stresses default and STCS implementations of cleanup method

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-25 11:23:29 -03:00
Mikołaj Sielużycki
6f1b6da68a compile: Fix headers so that *-headers targets compile cleanly.
Closes #10273
2022-03-25 16:19:26 +02:00
Pavel Emelyanov
3da5f6ac30 gossiper: Add system keyspace dependency
The gossiper reads peer features from system keyspace. Also the snitch
code needs system keyspace, and since now it gets all its dependencies
from gossiper (will be fixed some day, but not now), it will do the same
for sys.ks.. Thus it's worth having gossiper->system_keyspace explicit
dependency.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-25 15:08:13 +03:00
Pavel Emelyanov
62417577ab cdc_generation_service: Add system keyspace dependency
The service uses system keyspace to, e.g., manage the generation id,
thus it depends on the system_keyspace instance and deserves the
explicit reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-25 13:39:32 +03:00
Nadav Har'El
06fdce82aa test/*/run: clearer error message on wrong SCYLLA setting
The test runners cql-pytest/run et al. try to automatically find the
last-compile Scylla executable, but this decision can be overriden by
the SCYLLA environment variable. If the user sets by mistake SCYLLA to
something which is not a valid path of an executable, the result was a
long and obscure Python stack trace.

So after this patch, if SCYLLA points to something which is not an
executable, a clear error is produced immediately, directing the user
to set it this variable to a correct executable

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220323164427.3301828-1-nyh@scylladb.com>
2022-03-23 19:01:17 +02:00
Jadw1
1438be5311 CQL3: Bloom filter efficacy test
Added CQL pytest to check bloom filter efficacy by
inserting `N` rows and reading `M` non-existing keys.

Added `bloom_filter_false_positives` method to Python `nodetool`
module. Method gets fp number by calling Scylla's API.

Fixes: #1055

Closes #10186
2022-03-23 16:51:50 +02:00
Calle Wilund
56c383ba8e test/perf/perf_commitlog: Add a small commitlog throughput test
Based on perf_simple_query, just bashes data into CL using
normal distribution min/max data chunk size, allowing direct
freeing of segments, _but_ delayed by a normal dist as well,
to "simulate" secondary delay in data persistance.

Needs more stuff.

Some baseline measurements on master:

--min-flush-delay-in-ms 10 --max-flush-delay-in-ms 200
--commitlog-use-hard-size-limit true
--commitlog-total-space-in-mb 10000 --min-data-size 160 --max-data-size 1024
--smp1

median 2065648.59 tps (  1.1 allocs/op,   0.0 tasks/op,    1482 insns/op)
median absolute deviation: 48752.44
maximum: 2161987.06
minimum: 1984267.90

--min-data-size 256 --max-data-size 16384

median 269385.25 tps (  2.2 allocs/op,   0.7 tasks/op,    3244 insns/op)
median absolute deviation: 15719.13
maximum: 323574.43
minimum: 228206.28

--min-data-size 4096 --max-data-size 61440

median 67734.22 tps (  6.4 allocs/op,   2.9 tasks/op,    9153 insns/op)
median absolute deviation: 2070.93
maximum: 82833.17
minimum: 61473.57

--min-data-size 61440 --max-data-size 1843200

median 2281.37 tps ( 79.7 allocs/op,  43.5 tasks/op,  202963 insns/op)
median absolute deviation: 128.87
maximum: 3143.84
minimum: 2140.80

--min-data-size 368640 --max-data-size 6144000

median 679.76 tps (225.5 allocs/op, 116.3 tasks/op,  662700 insns/op)
median absolute deviation: 39.30
maximum: 1148.95
minimum: 586.86

Actual throughput obviously meaningless, as it is run on my slow
machine, but IPS might be relevant.
Note that transaction throughput plummets as we increase median data
sizes above ~200k, since we then more or less always end up replacing
buffers in every call.

Closes #10230
2022-03-22 15:18:25 +02:00
Avi Kivity
72c6859c25 Merge "readers: get rid of v1 mutation from fragments" from Botond
"
The only real user is view building, which is converted to v2 and then
the v1 version of the mutation from fragments reader is removed.

Tests: unit(dev, release)
"

* 'v2-only-from-fragments-mutations/v1' of https://github.com/denesb/scylla:
  readers: remove now unused v1 reader from fragments
  test/boost: flat_mutation_reader_test: remove reader from fragments test
  replica/table: migrate generate_and_propagate_view_updates() to v2
  replica/table: migrate populate_views() to v2
  db/view: convert view_update_builder interface to v2
  db/view: migrate view_update_builder to v2
2022-03-22 15:18:25 +02:00
Raphael S. Carvalho
c25d8f6770 compaction: Move decision of garbage collection from strategy to task type
For compaction to be able to purge expired data, like tombstones, a
sstable set snapshot is set in the compaction descriptor.

That's a decision that belongs to task type. For example, all regular
compaction enable GC, whereas scrub for example doesn't for safety
reasons.

The problem is that the decision is being made by every instantiation
of compaction_descriptor in the strategies, which is both unnecessary
and also adds lots of boilerplate to the code, making it hard to
understand and work with.

As sstable set snapshot is an implementation detail, a new method
is being added to compaction_descriptor to make the intention
clearer, making the interface easier to understand.

can_purge_tombstones, used previously by rewrite task only, is being
reused for communicating GC intention into task::compact_sstables().

The boilerplate was a pain when adding a new strategy method for
the ongoing work on cleanup, described by issue #10097.
Another benefit is that we'll now only create a set snapshot when
compaction will really run. Before, it could happen that the snapshot
would be discarded if the compaction attempt had to be postponed,
which is a waste of cpu cycles.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-21 12:14:04 -03:00
Nadav Har'El
f76f6dbccb secondary index: avoid special characters in default index names
In CQL, table names are limited to so-called word characters (letters,
numbers and underscores), but column names don't have such a limitation.
When we create a secondary index, its default name is constructed from
the column name - so can contain problematic characters. It can include
even the "/" character. The problem is that the index name is then used,
like a table name, to create a directory with that name.

The test included in this patch demonstrates that before this patch, this
can be misused to create subdirectories anywhere in the filesystem, or to
crash Scylla when it fails to create a directory (which it considers an
unrecoverable I/O error).

In this patch we do what Cassandra does - remove all non-word
characters from the indexed column name before constructing the default
index name. In the included test - which can run on both Scylla and
Cassandra - we verify that the constructed index name is the same as
in Cassandra, which is useful to know (e.g., because knowing the index
name is needed to DROP the index).

Also, this patch adds a second line of defense against the security problem
described above: It is now an error to create a schema with a slash or
null (the two characters not allowed in Unix filenames) in the keyspace
or table names. So if the first line of defense (CQL checking the validity
of its commands) fails, we'll have that second line of defense. I verified
that if I revert the default-index-name fix, the second line of defense
kicks in, and the index creation is aborted and cannot create files in
the wrong place to crash Scylla.

Fixes #3403

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220320162543.3091121-1-nyh@scylladb.com>
2022-03-20 18:33:48 +02:00