Now that the timeout is stored in the reader
permit use it for admission rather than a timeout
parameter.
Note that evictable_reader::next_partition
currently passes db::no_timeout to
resume_or_create_reader, which propagated to
maybe_wait_readmission, but it seems to be
an oversight of the f_m_r api that doesn't
pass a timeout to next_partition().
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Introduce `raft` experimental option.
Adjust the tests accordingly to accomodate the new option.
It's not enabled by default when providing
`--experimental=true` config option and should be
requested explicitly via `--experimental-options=raft`
config option.
Hide the code related to `raft_group_registry` behind
the switch. The service object is still constructed
but no initialization is performed (`init()` is not
called) if the flag is not set.
Later, other raft-related things, such as raft schema
changes, will also use this flag.
Also, don't introduce a corresponding gossiper feature
just yet, because again, it should be done after the
raft schema changes API contract is stabilized.
This will be done in a separate series, probably related to
implementing the feature itself.
Tests: unit(dev)
Ref #9239.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210823121956.167682-1-pa.solodovnikov@scylladb.com>
Merged patch series by By Benny Halevy:
Prepare for updating seastar submodule to a change
that requires deferred actions to be noexcept
(and return void).
Test: unit(dev, debug)
* tag 'deferred_action-noexcept-v1' of github.com:bhalevy/scylla:
everywhere: make deferred actions noexcept
cql3: prepare_context: mark methods noexcept
commitlog: segment, segment_manager: mark methods noexcept
everywhere: cleanup defer.hh includes
Use a forward declaration of cql3::expr::oper_t to reduce the
number of translation units depending on expression.hh.
Before:
$ find build/dev -name '*.d' | xargs cat | grep -c expression.hh
272
After:
$ find build/dev -name '*.d' | xargs cat | grep -c expression.hh
154
Some translation units adjust their includes to restore access
to required headers.
Closes#9229
Prepare for updating seastar submodule to a change
that requires deferred actions to be noexcept
(and return void).
Test: unit(dev, debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Get rid of unused includes of seastar/util/{defer,closeable}.hh
and add a few that are missing from source files.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
loading_shared_values/loading_cache'es iterators interface is dangerous/fragile because
iterator doesn't "lock" the entry it points to and if there is a
preemption point between aquiring non-end() iterator and its
dereferencing the corresponding cache entry may had already got evicted (for
whatever reason, e.g. cache size constraints or expiration) and then
dereferencing may end up in a use-after-free and we don't have any
protection against it in the value_extractor_fn today.
And this is in addition to #8920.
So, instead of trying to fix the iterator interface this patch kills two
birds in a single shot: we are ditching the iterators interface
completely and return value_ptr from find(...) instead - the same one we
are returning from loading_cache::get_ptr(...) asyncronous APIs.
A similar rework is done to a loading_shared_values loading_cache is
based on: we drop iterators interface and return
loading_shared_values::entry_ptr from find(...) instead.
loading_cache::value_ptr already takes care of "lock"ing the returned value so that it
would relain readable even if it's evicted from the cache by the time
one tries to read it. And of course it also takes care of updating the
last read time stamp and moving the corresponding item to the top of the
MRU list.
Fixes#8920
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20210817222404.3097708-1-vladz@scylladb.com>
Row-level rpair hashes the mutation fragment and wraps this into a
private fragment_hasher class. For some reason it takes ~20 minutes
for clang to compile the row_level.o with -O3 level (release mode).
Putting the whole fragment_hasher into a dedicated file reduces the
compilation times ~9 times.
However, it seems more natural not to move the fragment_hasher around
but to specialize the appending_hash<> for mutation_fragment and make
row_level.cc code just call feed_hash().
Compilation times (release mode):
before after
row_level.o 19m34s 2m4s
mutation_fragment.o 13s 17s
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patch is going to change the way row-level repair code hashes
mutation_fragment objects. This patch prepares the sanity check for
the hash values not be accidentally changed by hashing some simple
fragments and comparing them against known expected values.
The hash_mutation_fragment_for_test helper is added for this patch
only and will be removed really soon.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
* Make `sstable::make_reader()` return `flat_mutation_reader_v2`,
retain the old one as `sstable::make_reader_v1()`
* Start weaning tests off `sstable::make_reader_v1()` (done all the
easy ones, i.e. those not involving range tombstones)
"
* tag 'sstable-make-reader-v2-v1' of github.com:cmm/scylla:
tests: use flat_mutation_reader_v2 in the easier part of sstable_3_x_test
tests: upgrade the "buffer_overflow" test to flat_mutation_reader_v2
tests: get rid of sstable::make_reader_v1() in broken_sstable_test
tests: get rid of sstable::make_reader_v1() in the trivial cases
sstables: make sstable::make_reader() return flat_mutation_reader_v2
"
Validation compaction -- although I still maintain that it is a good
descriptive name -- was an unfortunate choice for the underlying
functionality because Origin has burned the name already as it uses it
for a compaction type used during repair. This opens the door for
confusion for users coming from Cassandra who will associate Validation
compaction with the purpose it is used for in Origin.
Additionally, since Origin's validation compaction was not user
initiated, it didn't have a corresponding `nodetool` command to start
it. Adding such a command would create an operational difference between
us and Origin.
To avoid all this we fold validation compaction into scrub compaction,
under a new "validation" mode. I decided against using the also
suggested `--dry-mode` flag as I feel that a new mode is a more natural
choice, we don't have to define how it interacts with all the other
modes, unlike with a `--dry-mode` flag.
Fixes: #7736
Tests: unit(dev), manual(REST API)
"
* 'scrub-validation-mode/v2' of https://github.com/denesb/scylla:
compaction/compaction_descriptor: add comment to Validation compaction type
compaction/compaction_descriptor: compaction_options: remove validate
api: storage_service: validate_keyspace -> scrub_keyspace (validate mode)
compaction/compaction_manager: hide perform_sstable_validation()
compaction: validation compaction -> scrub compaction (validate mode)
compaction/compaction_descriptor: compaction_options: add options() accessor
compaction/compaction_descriptor: compaction_options::scrub::mode: add validate
Rename the old version to `sstables::make_reader_v1()`, to have a
nicely searcheable eradication target.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Adds a sync_point structure. A sync point is a (possibly incomplete)
mapping from hint queues to a replay position in it. Users will be able
to create sync points consisting of the last written positions of some
hint queues, so then they can wait until hint replay in all of the
queues reach that point.
The sync point supports serialization - first it is serialized with the
help of IDL to a binary form, and then converted to a hexadecimal
string. Deserialization is also possible.
The base64 encoding/decoding functions will be used for serialization of
hint sync point descriptions. Base64 format is not specific to
Alternator, so it can be moved to utils.
Calculating clustering ranges on a local index has been rewritten to use the new `expression` variant.
This allows us to finally remove the old `bounds_ranges` function.
Closes#9080
* github.com:scylladb/scylla:
cql3: Remove unused functions like bounds_ranges
cql3: Use expressions to calculate the local-index clustering ranges
statement_restrictions_test: tests for extracting column restrictions
expression: add a function to extract restrictions for a column
We are folding validation compaction into scrub (at least on the
interface level), so remove the validation entry point accordingly and
have users go through `perform_sstable_scrub()` instead.
Fold validation compaction into scrub compaction (validate mode). Only
on the interface level though: to initiate validation compaction one now
has to use `compaction_options::make_scrub(compaction_options::scrub::mode::validate)`.
The implementation code stays as-is -- separate.
With commit 1924e8d2b6, compaction code was moved into a
top level dir as compaction is layered on top of sstables.
Let's continue this work by moving all compaction unit tests
into its own test file. This also makes things much more
organized.
sstable_datafile_test, as its name implies, will only contain
sstable data tests. Perhaps it should be renamed to only
sstable_data_test, as the test also contains tests involving
other components, not only the data one.
BEFORE
$ cat test/boost/sstable_datafile_test.cc | grep TEST_CASE | wc -l
105
AFTER
$ cat test/boost/sstable_compaction_test.cc | grep TEST_CASE | wc -l
57
$ cat test/boost/sstable_datafile_test.cc | grep TEST_CASE | wc -l
48
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210802192120.148583-1-raphaelsc@scylladb.com>
These include the following:
* `currenttimestamp()`
* `currenttime()`
* `currentdate()`
* `currenttimeuuid()`
Previously, they were incorrectly marked as pure, meaning
that the function is executed at "prepare" step.
For functions that possibly depend on timing and random seed,
this is clearly a bug. Cassandra doesn't have a notion of pure
functions, so they are lazily evaluated.
Make Scylla to match Cassandra behavior for these functions.
Add a unit-test for a fix (excluding `currentdate()` function,
because there is no way to use synthetic clock with query
processor and sleeping for a whole day to demonstrate correct
behavior is clearly not an option).
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
"
`function_call` AST nodes are created for each function
with side effects in a CQL query, i.e. non-deterministic
functions (`uuid()`, `now()` and some others timeuuid-related).
These nodes are evaluated either when a query itself is executed
or query restrictions are computed (e.g. partition/clustering
key ranges for LWT requests).
We need to cache the calls since otherwise when handling a
`bounce_to_shard` request for an LWT query, we can possibly
enter an infinite bouncing loop (in case a function is used
to calculate partition key ranges for a query), since the
results can be different each time.
Furthermore, we don't support bouncing more than one time.
Returning `bounce_to_shard` message more than one time
will result in a crash.
Caching works only for LWT statements and only for the function
calls that affect partition key range computation for the query.
`variable_specifications` class is renamed to `prepare_context`
and generalized to record information about each `function_call`
AST node and modify them, as needed:
* Check whether a given function call is a part of partition key
statement restriction.
* Assign ids for caching if above is true and the call is a part
of an LWT statement.
There is no need to include any kind of statement identifier
in the cache key since `query_options` (which holds the cache)
is limited to a single statement, anyway.
Function calls are indexed by the order in which they appear
within a statement while parsing. There is no need to
include any kind of statement identifier to the cache key
since `query_options` (which holds the cache) is limited
to a single statement, anyway.
Note that `function_call::raw` AST nodes are not created
for selection clauses of a SELECT statement hence they
can only accept only one of the following things as parameters:
* Other function calls.
* Literal values.
* Parameter markers.
In other words, only parameters that can be immediately reduced
to a byte buffer are allowed and we don't need to handle
database inputs to non-pure functions separately since they
are not possible in this context. Anyhow, we don't even have
a single non-pure function that accepts arguments, so precautions
are not needed at the moment.
Add a test written in `cql-pytest` framework to verify
that both prepared and unprepared lwt statements handle
`bounce_to_shard` messages correctly in such scenario.
Fixes: #8604
Tests: unit(dev, debug)
NOTE: the patchset uses `query_options` as a container for
cached values. This doesn't look clean and `service::query_state`
seems to be a better place to store them. But it's not
forwarded to most of the CQL code and would mean that a huge number
of places would have to be amended.
The series presents a trade-off to avoid forwarding `query_state`
everywhere (but maybe it's the thing that needs to be done, nonetheless).
"
* 'lwt_bounce_to_shard_cached_fn_v6' of https://github.com/ManManson/scylla:
cql-pytest: add a test for non-pure CQL functions
cql3: cache function calls evaluation for non-deterministic functions
cql3: rename `variable_specifications` to `prepare_context`
Convert all known tri-compares that return an int to return std::strong_ordering.
Returning an int is dangerous since the caller can treat it as a bool, and indeed
this series uncovered a minor bug (#9103).
Test: unit (dev)
Fixes#1449Closes#9106
* github.com:scylladb/scylla:
treewide: remove redundant "x <=> 0" compares
test: mutation_test: convert internal tri-compare to std::strong_ordering
utils: int_range: change to std::strong_ordering
test: change some internal comparators to std::strong_ordering
utils: big_decimal: change to std::strong_ordering
utils: fragment_range: change to std::strong_ordering
atomic_cell: change compare_atomic_cell_for_merge() to std::strong_ordering
types: drop scaffolding erected around lexicographical_tri_compare
sstables: keys: change to std::strong_ordering internally
bytes: compare_unsigned(): change to std::strong_ordering
uuid: change comparators to std::strong_ordering
types: convert abstract_type::compare and related to std::strong_ordering
types: reduce boilerplate when comparing empty value
serialized_tri_compare: change to std::strong_ordering
compound_compat: change to std::strong-ordering
types: change lexicographical_tri_compare, prefix_equality_tri_compare to std::strong_ordering
"
There are few places that call global storage service, but all
are easily fixable without significant changes.
1. alternator -- needs token metadata, switch to using proxy
2. api -- calls methods from storage service, all handlers are
registered in main and can capture storage service from there
3. thrift -- calls methods from storage service, can carry the
reference via controller
4. view -- needs tokens, switch to using (global) proxy
5. storage_service -- (surprisingly) can use "this"
tests: unit(dev), dtest(simple_boot_shutdown, dev)
"
* 'br-unglobal-storage-service' of https://github.com/xemul/scylla:
storage_service: Make it local
storage_service: Remove (de)?init_storage_service()
storage_service: Use container() in run_with(out)_api_lock
storage_service: Unmark update_topology static
storage_service: Capture this when appropriate
view: Use proxy to get token metadata from
thrift: Use local storage service in handlers
thrift: Carry sharded<storage_service>& down to handler
api: Capture and use sharded<storage_service>& in handlers
api: Carry sharded<storage_service>& down to some handlers
alternator: Take token metadata from server's storage_proxy
alternator: Keep storage_proxy on server
"
This is the 3rd slowest test in the set. There are 3 cases out
there that are hard-coded to be sequential. However, splitting
them into boost test cases helps running this test faster in
--parallel-cases mode. Timings for debug mode:
Total before the patch: 25 min
Sequential after the patch: 25 min
Basic case: 5 min
Evict-paused-readers case: 5 min
Single-mutation-buffer case: 15 min
tests: unit.multishard_combining_reader_as_mutation_source(debug)
"
* 'br-parallel-mcr-test' of https://github.com/xemul/scylla:
test: Split test_multishard_combining_reader_as_mutation_source into 3
test: Fix indentation after previous patch
test: Move out internals of test_multishard_combining_reader_as_mutation_source
There are 3 places that can now declare local instance:
- main
- cql_test_env
- boost gossiper test
The global pointer is saved in debug namespace for debugging.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
This series changes the behavior of the system when executing reads
annotated with "bypass cache" clause in CQL. Such reads will not
use nor populate the sstable partition index cache and sstable index page cache.
"
* 'bypass-cache-in-sstable-index-reads' of github.com:tgrabiec/scylla:
sstables: Do not populate page cache when searching in promoted index for "bypass cache" reads
sstables: Do not populate partition index cache for "bypass cache" reads
If x is of type std::strong_ordering, then "x <=> 0" is equivalent to
x. These no-ops were inserted during #1449 fixes, but are now unnecessary.
They have potential for harm, since they can hide an accidental of the
type of x to an arithmetic type, so remove them.
Ref #1449.
gcc 10.3.1 spews the following error:
```
_test_generator::generate_scenario(std::mt19937&) const’:
test/boost/mutation_reader_test.cc:3731:28: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
3731 | for (auto i = 0; i < num_ranges; ++i) {
| ~~^~~~~~~~~~~~
```
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210728073538.2467040-1-bhalevy@scylladb.com>
Patch the .row_tombstones() to return the range_tombstone_list
wrapped into the immutable_collection<> so that callers are
guaranteed not to touch the collection itself, but still can
modify the tombstones.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some callers of mutation_partition::row_tomstones() don't want
(and shouldn't) modify the list itself, while they may want to
modify the tombstones. This patch explicitly locates those that
need to modify the collection, because the next patch will
return immutable collection for the others.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Patch the .clustered_rows() method to return the btree of rows
wrapped into the immutable_collection<> so that callers are
guaranteed not to touch the collection itself, but still can
modify the elements in it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will mark btree::iterator methods that modify
the tree itself as private, so stop using them in tests.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
This series fixes two issues which cause very poor efficiency of reads
when there is a lot of range tombstones per live row in a partition.
The first issue is in the row_cache reader. Before the patch, all range
tombstones up to the next row were copied into a vector, and then put
into the buffer until it's full. This would get quadratic if there is
much more range tombstones than fit in a buffer.
The fix is to avoid the accumulation of all tombstones in the vector
and invoke the callback instead, which stops the iteration as soon as
the buffer is full.
Fixes#2581.
The second, similar issue was in the memtable reader.
Tests:
- unit (dev)
- perf_row_cache_update (release)
"
* tag 'no-quadratic-rt-in-reads-v1' of github.com:tgrabiec/scylla:
test: perf_row_cache_update: Uncomment test case for lots of range tombstones
row_cache: Consume range tombstones incrementally
partition_snapshot_reader: Avoid quadratic behavior with lots of range tombstones
tests: mvcc: Relax monotonicity check
range_tombstone_stream: Introduce peek_next()