Pass the test-generated shared_sstable to validate_read
and then to sstable_assertions so it can be used
for make_sstable version and generation params.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Just use the one we created during compaction
for verification so we won't have to rely on a particular
generation/version.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than calling cf.stop_and_keep_alive() before the test exits.
since it must be stopped also on failure.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We already use test_env::do_with_async in this function
but we didn't take full advantage of it to simplify the
implementation.
Do that before further changes are made.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
generations to idx
The function used `calculate_generation_for_new_table` for
the sstables generation. The so-called `generations` are just used
to generate key indices.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
this is the 13rd changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals:
- to enable developer to use CMake for building scylla. so they can use tools (CLion for instance) with CMake integration for better developer experience
- to enable us to tweak the dependencies in a simpler way. a well-defined cross module / subsystem dependency is a prerequisite for building this project with the C++20 modules.
this changeset includes following changes:
- build: cmake: increase per link job mem to 4GiB
- build: cmake: add missing sources to test-lib
- build: cmake: add more tests
- build: cmake: remote quotes in "include()" commands
- build: cmake: drop unnecessary linkages
Closes#13199
* github.com:scylladb/scylladb:
build: cmake: drop unnecessary linkages
build: cmake: remote quotes in "include()" commands
build: cmake: add more tests
build: cmake: add missing sources to test-lib
build: cmake: increase per link job mem to 4GiB
The translated Cassandra unit tests in cassandra_tests/validation/operations/
reproduced three bugs in GROUP BY's interaction with LIMIT and PER PARTITION
LIMIT - issue #5361, #5362 and #5363. Unfortunately, those test functions
are very long, and each test fails on all of these issues and a few more,
making it difficult to use these tests to verify when those tests have
been fixed. In other words, ideally a patch for issue 5361 should un-xfail
some reproducing test for this issue - but all the existing tests will
continue to fail after fixing 5361, because of other remaining bugs.
So in this patch, I created a new test file test_group_by.py with my own
tests for the GROUP BY feature. I tried to explore the different
capabilities of the GROUP BY feature, its different success and error
paths, and how GROUP BY interacts with LIMIT and PER PARTITION LIMIT.
As usual, I created many small test functions and not one huge test
function, and as a result we now have 5 xfailing tests which each
reproduces one bug and when the bug is fixed, it will start to pass.
All tests added here pass on Cassandra.
Refs #5361
Refs #5362
Refs #5363
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13136
A read that requested memory and has to wait for it can be registered as inactive. This can happen for example if the memory request originated from a background I/O operation (a read-ahead maybe).
Handling this case is currently very difficult. What we want to do is evict such a read on-the-spot: the fact that there is a read waiting on memory means memory is in demand and so inactive reads should be evicted. To evict this reader, we'd first have to remove it from the memory wait list, which is almost impossible currently, because `expiring_fifo<>`, the type used for the wait list, doesn't allow for that. So in this PR we set out to make this possible first, by transforming all current queues to be intrusive lists of permits. Permits are already linked into an intrusive list, to allow for enumerating all existing permits. We use these existing hooks to link the permits into the appropriate queue, and back to `_permit_list` when they are not in any special queue. To make this possible we first have to make all lists store naked permits, moving all auxiliary data fields currently stored in wrappers like `entry` into the permit itself. With this, all queues and lists in the semaphore are intrusive lists, storing permits directly, which has the following implications:
* queues no longer take extra memory, as all of them are intrusive
* permits are completely self-sufficient w.r.t to queuing: code can queue or dequeue permits just with a reference to a permit at hand, no other wrapper, iterator, pointer, etc. is necessary.
* queues don't keep permits alive anymore; destroying a permit will automatically unlink it from the respective queue, although this might lead to use-after-free. Not a problem in practice, only one code-path (`reader_concurrenc_semaphore::with_permit()`) had to be adjusted.
After all that extensive preparations, we can now handle the case of evicting a reader which is queued on memory.
Fixes: #12700Closes#12777
* github.com:scylladb/scylladb:
reader_concurrency_semaphore: handle reader blocked on memory becoming inactive
reader_concurrency_semaphore: move _permit_list next to the other lists
reader_permit: evict inactive read on timeout
reader_concurrency_semaphore: move inactive_read to .cc
reader_concurrency_semaphore: store permits in _inactive_reads
reader_concurrency_semaphore: inactive_read: de-inline more methods
reader_concurrency_semaphore: make _ready_list intrusive
reader_permit: add wait_for_execution state
reader_concurrency_semaphore: make wait lists intrusive
reader_concurrency_semaphore: move most wait_queue methods out-of-line
reader_concurrency_semaphore: store permits directly in queues
reader_permit: introduce (private) operator * and ->
reader_concurrency_semaphore: remove redundant waiters() member
reader_concurrency_semaphore: add waiters counter
reader_permit: use check_abort() for timeout
reader_concurrency_semaphore: maybe_dump_permit_diagnostics(): remove permit list param
reader_concurrency_semaphroe: make foreach_permit() const
reader_permit: add get_schema() and get_op_name() accessors
reader_concurrency_semaphore: mark maybe_dump_permit_diagnostics as noexcept
this is the 12nd changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals:
- to enable developer to use CMake for building scylla. so they can use tools (CLion for instance) with CMake integration for better developer experience
- to enable us to tweak the dependencies in a simpler way. a well-defined cross module / subsystem dependency is a prerequisite for building this project with the C++20 modules.
this changeset includes following changes:
- build: cmake: remove Seastar from the option name
- build: cmake: add missing sources in test-lib and utils
- build: cmake: do not include main.cc in scylla-main
- build: cmake: define SEASTAR_TESTING_MAIN for SEASTAR tests
- build: cmake: add more tests
Closes#13180
* github.com:scylladb/scylladb:
build: cmake: add more tests
build: cmake: define SEASTAR_TESTING_MAIN for SEASTAR tests
build: cmake: do not include main.cc in scylla-main
build: cmake: add missing sources in test-lib and utils
build: cmake: remove Seastar from the option name
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectGroupByTest.java into our cql-pytest
framework.
This test file contains only 8 separate test functions, but each of them
is very long checking hundreds of different combinations of GROUP BY with
other things like LIMIT, ORDER BY, etc., so 6 out of the 7 tests fail on
Scylla on one of the bugs listed below - most of the tests actually fail
in multiple places due to multiple bugs. All tests pass on Cassandra.
The tests reproduce six already-known Scylla issues and one new issue:
Already known issues:
Refs #2060: Allow mixing token and partition key restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #12477: Combination of COUNT with GROUP BY is different from Cassandra
in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column
A new issue discovered by these tests:
Refs #13109: Incorrect sort order when combining IN, GROUP BY and ORDER BY
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13126
In debug mode the timings are:
view_schema_test: 90 sec
cql_query_test: 170 sec
memtable_test: 2090 sec
cql_functions_test: 2591 sec
other tests that are in/out of this list are not that obvious, but the
former two apparently deserve being replaced with the latter two.
Timings for dev/release modes are not that horrible, but the "first pair
is notably smaller than the latter" relation also exists.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13142
Today, the SSTable generation provides a hint on which shard owns a
particular SSTable. That hint determines which shard will load the
SSTable into memory.
With upcoming UUID generation, we will no longer have this hint
embedded into the SSTable generation, meaning that SSTables will be
loaded at random shards. This is not good because shards will have
to reference memory from other shards to access the SSTable
metadata that was allocated elsewhere.
This patch changes sstable_directory to:
1) Use generation value to only determine which shard will calculate
the owner shards for SSTables. Essentially works like a round-robin
distribution.
2) The shard assigned to compute the owners for a SSTable will do
so reading the minimum from disk, usually only Scylla file is
needed.
3) Once that shard finished computing the owners, it will forward
the SSTable to the shard that own it.
4) Shards will later load SSTables locally that were forwarded to
them.
Closes#13114
* github.com:scylladb/scylladb:
sstables: sstable_directory: Load SSTable at the shard that actually own it
sstables: sstable_directory: Give sstable_info_vector a more descriptive name
sstables: Allow owner shards to be computed for a partially loaded SSTable
sstables: Move SSTable loading to sstable_directory::sort_sstable()
sstables: Move sstable_directory::sort_sstable() to private interface
sstables: Restore indentation in sstable_directory::sort_sstable()
sstables: Coroutinize sstable_directory::sort_sstable()
sstables: sstable_directory: Extract sstable loading from process_descriptor()
sstables: sstable_directory: Separate private fields from methods
sstables: Coroutinize sstable_directory::process_descriptor
* test/boost: add more tests: all tests listed in test/boost/CMakeLists.txt
should build now.
* rust: add inc library, which is used for testing.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Simplified, more direct version of "dependency injection".
I.e. caller/initiator (main/cql_test_env) provides a set of
services it will eventually start. Configurable can remember
these. And use, at least after "start" notification.
Closes#13037
- build: cmake: remove test which does not exist yet
- build: cmake: document add_scylla_test()
- build: cmake: extract index, repair and data_dictionary out
- build: cmake: extract scylla-main out
- build: cmake: find Snappy before using it
- build: cmake: add missing linkages
- build: cmake: add missing sources to test-lib
- build: cmake: link sstables against libdeflate
- build: cmake: link Boost::regex against ICU::uc
Closes#13110
* github.com:scylladb/scylladb:
build: cmake: link Boost::regex against ICU::uc
build: cmake: link sstables against libdeflate
build: cmake: add missing sources to test-lib
build: cmake: add missing linkages
build: cmake: find Snappy before using it
build: cmake: extract scylla-main out
build: cmake: extract index, repair and data_dictionary out
build: cmake: document add_scylla_test()
build: cmake: remove test which does not exist yet
Today, owner shards can only be computed for a fully loaded SSTable.
For upcoming changes in the SSTable loader, we want to load the minimum
from disk to be able to compute the set of shards owning the SSTable.
If sharding metadata is available, it means we only need to read
TOC and Scylla components.
Otherwise, Summary must be read to provide first and last keys for
compute_shards_for_this_sstable() to operate on them instead.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Kill said read's memory requests with std::bad_alloc and dequeue it from
the memory wait list, then evict it on the spot.
Now that `_inactive_reads` just store permits, we can do this easily.
Allows a configurable to subscribe to life cycle notifications for scylla app.
I.e. do stuff on start/stop.
Also allow configurables in cql_test_env
v2:
* Fix camel casing
* Make callbacks future<> (should have been. mismerge?)
Closes#13035
When creating a deletion log for a bunch of sstables the code checks
that all sstables share the same "storage" by lexicographically
comparing their prefixes. That's not correct, as filesystem paths may
refer to the same directory even if not being equal.
So far that's been mostly OK, because paths manipulations were done in
simple forms without producing unequal paths. Patch 8a061bd8 (sstables,
code: Introduce and use change_state() call) triggerred a corner case.
fs::path foo("/foo");
sstring sub("");
foo = foo / sub;
produces a correct path of "/foo/", but the trailing slash breaks the
aforementioned assumption about prefixes comparison. As a result, when
an sstable moves between, say, staging and normal locations it may gain
a trailing slash breaking the deletion log creation code.
The fix is to restrict the deletion log creation not to rely on path
strings comparison completely and trim the trailing slash if it happens.
A test is included.
fixes: #13085
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13090
The compilation of wasm UDFs is performed by a call to a foreign
function, which cannot be divided with yielding points and, as a
result, causes long reactor stalls for big UDFs.
We avoid them by submitting the compilation task to a non-seastar
std::thread, and retrieving the result using seastar::alien.
The thread is created at the start of the program. It executes
tasks from a queue in an infinite loop.
All seastar shards reference the thread through a std::shared_ptr
to a `alien_thread_runner`.
Considering that the compilation takes a long time anyway, the
alien_thread_runner is implemented with focus on simplicity more
than on performance. The tasks are stored in an std::queue, reading
and writing to it is synchronized using an std::mutex for reading/
writing to the queue, and an std::condition_variable waiting until
the queue has elements.
When the destructor of the alien runner is called, an std::nullopt
sentinel is pushed to the queue, and after all remaining tasks are
finished and the sentinel is read, the thread finishes.
Fixes#12904Closes#13051
* github.com:scylladb/scylladb:
wasm: move compilation to an alien thread
wasm: convert compilation to a future
Adds two test cases which test what happens when we perform an LWT UPDATE, but the partition/clustering key has 0 possible values. This can happen e.g when a column is supposed to be equal to two different values (`c = 0 AND c = 1`).
Empty partition ranges work properly, empty clustering range currently causes a crash (#13129).
I added tests for both of these cases.
Closes#13130
* github.com:scylladb/scylladb:
cql-pytest/test_lwt: test LWT update with empty clustering range
cql-pytest/test_lwt: test LWT update with empty partition range
CQL supports type casting using C-style casts.
For example it's possible to do: `blob_column = (blob)funcReturningInt()`
This functionality is pretty limited, we only allow such casts between types that have a compatible binary representation. Compatible means that the bytes will stay unchanged after the conversion.
This means that it's legal to cast an int to blob (int is just a 4 byte blob), but it's illegal to cast a bigint to int (change 4 bytes -> 8 bytes).
This simplifies things, to cast we can just reinterpret the value as the other type.
Another use of C-style casts are type hints. Sometimes it's impossible to infer the exact type of an expression from the context. In such cases the type can be specified by casting the expression to this type.
For example: `overloadedFunction((int)?)`
Without the cast it would be impossible to guess what should be the bind marker's type. The function is overloaded, so there are many possible argument types. The type hint specifies that the bind marker has type int.
An interesting thing is that such casts don't have to be explicit. CQL allows to put an int value in a place where a blob value is expected and it will be automatically converted without any explicit casting.
---
I started looking at our implementation of casts because of #12900. In there the author expressed the need to specify a type hint for bind marker used to pass the WASM code. It could be either `(text)?` for text WASM, or `(blob)?` for binary WASM. This specific use of type hints wasn't supported because there was no `receiver` and the implementation of `prepare_expression` didn't handle that. Preparing casts without a receiver should be easy to implement - we can infer the type of the expression by looking at the type to which the expression is cast.
But while reading `prepare_expression` for `expr::cast` I noticed that the code there is a bit strange. The implementation prepared the expression to cast using the original `receiver` instead of a receiver with the cast type. This caused some issues because of which casting didn't work as expected.
For example it was possible to do:
```cql
blob_column = (blob)funcReturningInt()
```
But this didn't work at all:
```cql
blob_column = (blob)(int)12323
```
It tried to prepare `untyped_contant(12323)` with a `blob` receiver, which fails.
This makes `expr::cast` useless for casting. Casting when the representation is compatible is already implicit. I couldn't find a single case where adding a cast would change the behavior in any way.
There was some use for it as a type hint to choose a specific overload of a function, but it was worthless for casting.
Cassandra has the same issue, I created a `cql-pytest` test and it showed that we behave in the same way as Cassandra does.
I decided to improve this. By preparing the expression using a receiver with the cast type, `expr::cast` becomes actually useful for casting values. Things like `(blob)(int)12323` now work without any issues.
This diverges from the behavior in Cassandra, but it's an extension, not a breaking incompatibility.
---
This PR improves `prepare_expression` for `expr::cast` in the following ways:
1) Support for more complex casts by preparing the expression using a different receiver. This makes casts like `(blob)(int)123` possible
2) Support preparing `expr::cast` without a receiver. Type inference chooses the cast type as the type of the expression.
3) Add pytest tests for C-style casts
`2)` Is needed for #12900, the other changes is just something I decided to do since I was already working on this piece of code.
Closes#13053
* github.com:scylladb/scylladb:
expr_test: more tests for preparing bind variables with type hints
prepare_expr: implement preparing expr::cast with no receiver
prepare_expr: use :user formatting in cast_prepare_expression
prepare_expr: remove std::get<> in cast_prepare_expression
prepare_expr: improve cast_prepare_expression
prepare_expr: improve readability in cast_prepare_expression
cql-pytest: test expr::cast in test_cast.py
This series aims to allow users to set permissions on user-defined functions.
The implementation is based on Cassandra's documentation and should be fully compatible: https://cassandra.apache.org/doc/latest/cassandra/cql/security.html#cql-permissionsFixes: #5572Fixes: #10633Closes#12869
* github.com:scylladb/scylladb:
cql3: allow UDTs in permissions on UDFs
cql3: add type_parser::parse() method taking user_types_metadata
schema_change_test: stop using non-existent keyspace
cql3: fix parameter names in function resource constructors
cql3: handle complex types as when decoding function permissions
cql3: enforce permissions for ALTER FUNCTION
cql-pytest: add a (failing) test case for UDT in UDF
cql-pytest: add a test case for user-defined aggregate permissions
cql-pytest: add tests for function permissions
cql3: enforce permissions on function calls
selection: add a getter for used functions
abstract_function_selector: expose underlying function
cql3: enforce permissions on DROP FUNCTION
cql3: enforce permissions for CREATE FUNCTION
client_state: add functions for checking function permissions
cql-pytest: add a case for serializing function permissions
cql3: allow specifying function permissions in CQL
auth: add functions_resource to resources
Many sstable test cases create tempdir on their own to create sstables with. Sometimes it's justified when the test needs to check files on disk by hand for some validation, but often all checks are fs-agnostic. The latter case(s) can be patched to work on top of any storage, in particular -- on top of object storage. To make it work tests should stop creating sstables explicitly in tempdir and this PR does exactly that.
All relevant occurrences of tempdir are removed from test cases, instead the sstable::test_env's tempdir is used. Next, the test_env::{create_sstable|reusable_sst} are patched not to accept the `fs::path dir` argument and pick the env's tempdir. Finally, the `make_sstable_easy` helper is patched to use path-less env methods too.
refs: #13015Closes#13116
* github.com:scylladb/scylladb:
test,sstables: Remove path from make_sstable_easy()
test,lib: Remove wrapper over reusable_sst and move the comment
test: Make "compact" test case use env dir
test,compaction: Use env tempdir in some more cases
test,compaction: Make check_compacted_sstables() use env's dir
test: Relax making sstable with sequential generation
test/sstable::test_env: Keep track of auto-incrementing generation
test/lib: Add sstable maker helper without factory
test: Remove last occurrence of test_env::do_with(rval, ...)
test,sstables: Dont mess with tempdir where possible
test/sstable::test_env: Add dir-less sstables making helpers
test,sstables: Use sstables::test_env's tempdir with sweeper
test,sstables: Use sstables::test_env's tempdir
test/lib: Add tempdir sweeper
test/lib: Open-code make_sstabl_easy into make_sstable
test: Remove vector of mutation interposer from test_key_count_estimation