this is the 12nd changeset of a series which tries to give an overhaul to the CMake building system. this series has two goals:
- to enable developer to use CMake for building scylla. so they can use tools (CLion for instance) with CMake integration for better developer experience
- to enable us to tweak the dependencies in a simpler way. a well-defined cross module / subsystem dependency is a prerequisite for building this project with the C++20 modules.
this changeset includes following changes:
- build: cmake: remove Seastar from the option name
- build: cmake: add missing sources in test-lib and utils
- build: cmake: do not include main.cc in scylla-main
- build: cmake: define SEASTAR_TESTING_MAIN for SEASTAR tests
- build: cmake: add more tests
Closes#13180
* github.com:scylladb/scylladb:
build: cmake: add more tests
build: cmake: define SEASTAR_TESTING_MAIN for SEASTAR tests
build: cmake: do not include main.cc in scylla-main
build: cmake: add missing sources in test-lib and utils
build: cmake: remove Seastar from the option name
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectGroupByTest.java into our cql-pytest
framework.
This test file contains only 8 separate test functions, but each of them
is very long checking hundreds of different combinations of GROUP BY with
other things like LIMIT, ORDER BY, etc., so 6 out of the 7 tests fail on
Scylla on one of the bugs listed below - most of the tests actually fail
in multiple places due to multiple bugs. All tests pass on Cassandra.
The tests reproduce six already-known Scylla issues and one new issue:
Already known issues:
Refs #2060: Allow mixing token and partition key restrictions
Refs #5361: LIMIT doesn't work when using GROUP BY
Refs #5362: LIMIT is not doing it right when using GROUP BY
Refs #5363: PER PARTITION LIMIT doesn't work right when using GROUP BY
Refs #12477: Combination of COUNT with GROUP BY is different from Cassandra
in case of no matches
Refs #12479: SELECT DISTINCT should refuse GROUP BY with clustering column
A new issue discovered by these tests:
Refs #13109: Incorrect sort order when combining IN, GROUP BY and ORDER BY
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#13126
In debug mode the timings are:
view_schema_test: 90 sec
cql_query_test: 170 sec
memtable_test: 2090 sec
cql_functions_test: 2591 sec
other tests that are in/out of this list are not that obvious, but the
former two apparently deserve being replaced with the latter two.
Timings for dev/release modes are not that horrible, but the "first pair
is notably smaller than the latter" relation also exists.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13142
Today, the SSTable generation provides a hint on which shard owns a
particular SSTable. That hint determines which shard will load the
SSTable into memory.
With upcoming UUID generation, we will no longer have this hint
embedded into the SSTable generation, meaning that SSTables will be
loaded at random shards. This is not good because shards will have
to reference memory from other shards to access the SSTable
metadata that was allocated elsewhere.
This patch changes sstable_directory to:
1) Use generation value to only determine which shard will calculate
the owner shards for SSTables. Essentially works like a round-robin
distribution.
2) The shard assigned to compute the owners for a SSTable will do
so reading the minimum from disk, usually only Scylla file is
needed.
3) Once that shard finished computing the owners, it will forward
the SSTable to the shard that own it.
4) Shards will later load SSTables locally that were forwarded to
them.
Closes#13114
* github.com:scylladb/scylladb:
sstables: sstable_directory: Load SSTable at the shard that actually own it
sstables: sstable_directory: Give sstable_info_vector a more descriptive name
sstables: Allow owner shards to be computed for a partially loaded SSTable
sstables: Move SSTable loading to sstable_directory::sort_sstable()
sstables: Move sstable_directory::sort_sstable() to private interface
sstables: Restore indentation in sstable_directory::sort_sstable()
sstables: Coroutinize sstable_directory::sort_sstable()
sstables: sstable_directory: Extract sstable loading from process_descriptor()
sstables: sstable_directory: Separate private fields from methods
sstables: Coroutinize sstable_directory::process_descriptor
* test/boost: add more tests: all tests listed in test/boost/CMakeLists.txt
should build now.
* rust: add inc library, which is used for testing.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
change the option name to "LINK_MEM_PER_JOB" as this is not
a Seastar option, but a top-level project option.
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Under some circumstances, service_level_controller renames service
levels for internal purposes. However, the per-service-level metrics
registered by storage_proxy keep the name seen at first registration
time. This sometimes leads to mislabeled metrics.
Fix that by re-registering the metrics after scheduling groups
are renamed.
Fixes scylladb/scylla-enterprise#2755
Closes#13174
This series handles errors when aborting node operations and prints them rather letting them leak and be exposed to the user.
Also, cleanup the node_ops logging formats when aborting different node ops
and add more error logging around errors in the "worker" nodes.
Closes#12799
* github.com:scylladb/scylladb:
storage_service: node_ops_signal_abort: print a warning when signaling abort
storage_service: s/node_ops_singal_abort/node_ops_signal_abort/
storage_service: node_ops_abort: add log messages
storage_service: wire node_ops_ctl for node operations
storage_service: add node_ops_ctl class to formalize all node_ops flow
repair: node_ops_cmd_request: add print function
repair: do_decommission_removenode_with_repair: log ignore_nodes
repair: replace_with_repair: get ignore_nodes as unordered_set
gossiper: get_generation_for_nodes: get nodes as unordered_set
storage_service: don't let node_ops abort failures mask the real error
Simplified, more direct version of "dependency injection".
I.e. caller/initiator (main/cql_test_env) provides a set of
services it will eventually start. Configurable can remember
these. And use, at least after "start" notification.
Closes#13037
Until now, the instructions on generating wasm files and using them
for Scylla UDFs were stored in docs/dev, so they were not visible
on the docs website. Now that the Rust helper library for UDFs
is ready, and we're inviting users to try it out, we should also
make the rest of the Wasm UDF documentation readily available
for the users.
Closes#13139
- build: cmake: remove test which does not exist yet
- build: cmake: document add_scylla_test()
- build: cmake: extract index, repair and data_dictionary out
- build: cmake: extract scylla-main out
- build: cmake: find Snappy before using it
- build: cmake: add missing linkages
- build: cmake: add missing sources to test-lib
- build: cmake: link sstables against libdeflate
- build: cmake: link Boost::regex against ICU::uc
Closes#13110
* github.com:scylladb/scylladb:
build: cmake: link Boost::regex against ICU::uc
build: cmake: link sstables against libdeflate
build: cmake: add missing sources to test-lib
build: cmake: add missing linkages
build: cmake: find Snappy before using it
build: cmake: extract scylla-main out
build: cmake: extract index, repair and data_dictionary out
build: cmake: document add_scylla_test()
build: cmake: remove test which does not exist yet
The `database::stop` method is sometimes hanging and it's always hard to spot where exactly it sleeps. Few more logging messages would make this much simpler.
refs: #13100
refs: #10941Closes#13141
* github.com:scylladb/scylladb:
database: Increase verbosity of database::stop() method
large_data_handler: Increase verbosity on shutdown
large_data_handler: Coroutinize .stop() method
Today, the SSTable generation provides a hint on which shard owns a
particular SSTable. That hint determines which shard will load the
SSTable into memory.
With upcoming UUID generation, we will no longer have this hint
embedded into the SSTable generation, meaning that SSTables will be
loaded at random shards. This is not good because shards will have
to reference memory from other shards to access the SSTable
metadata that was allocated elsewhere.
This patch changes sstable_directory to:
1) Use generation value to only determine which shard will calculate
the owner shards for SSTables. Essentially works like a round-robin
distribution.
2) The shard assigned to compute the owners for a SSTable will do
so reading the minimum from disk, usually only Scylla file is
needed.
3) Once that shard finished computing the owners, it will forward
the SSTable to the shard that own it.
4) Shards will later load SSTables locally that were forwarded to
them.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Today, owner shards can only be computed for a fully loaded SSTable.
For upcoming changes in the SSTable loader, we want to load the minimum
from disk to be able to compute the set of shards owning the SSTable.
If sharding metadata is available, it means we only need to read
TOC and Scylla components.
Otherwise, Summary must be read to provide first and last keys for
compute_shards_for_this_sstable() to operate on them instead.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The reason for this change is that we'll want to fully load the
SSTable only at the destination shard.
Later, sort_sstable() will calculate set of owner shards for a
SSTable by only loading scylla metadata file.
If it turns out that the SSTable belongs to current shard, then
we'll fully load the SSTable using the new and fresh
sstable_directory::load_sstable().
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Will make it easier for process_descriptor to process the SSTable
without having to fully load the SSTable.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Following the expected coding convention. It's also somewhat
disturbing to see them mixed up.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Allows a configurable to subscribe to life cycle notifications for scylla app.
I.e. do stuff on start/stop.
Also allow configurables in cql_test_env
v2:
* Fix camel casing
* Make callbacks future<> (should have been. mismerge?)
Closes#13035
When creating a deletion log for a bunch of sstables the code checks
that all sstables share the same "storage" by lexicographically
comparing their prefixes. That's not correct, as filesystem paths may
refer to the same directory even if not being equal.
So far that's been mostly OK, because paths manipulations were done in
simple forms without producing unequal paths. Patch 8a061bd8 (sstables,
code: Introduce and use change_state() call) triggerred a corner case.
fs::path foo("/foo");
sstring sub("");
foo = foo / sub;
produces a correct path of "/foo/", but the trailing slash breaks the
aforementioned assumption about prefixes comparison. As a result, when
an sstable moves between, say, staging and normal locations it may gain
a trailing slash breaking the deletion log creation code.
The fix is to restrict the deletion log creation not to rely on path
strings comparison completely and trim the trailing slash if it happens.
A test is included.
fixes: #13085
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13090
The compilation of wasm UDFs is performed by a call to a foreign
function, which cannot be divided with yielding points and, as a
result, causes long reactor stalls for big UDFs.
We avoid them by submitting the compilation task to a non-seastar
std::thread, and retrieving the result using seastar::alien.
The thread is created at the start of the program. It executes
tasks from a queue in an infinite loop.
All seastar shards reference the thread through a std::shared_ptr
to a `alien_thread_runner`.
Considering that the compilation takes a long time anyway, the
alien_thread_runner is implemented with focus on simplicity more
than on performance. The tasks are stored in an std::queue, reading
and writing to it is synchronized using an std::mutex for reading/
writing to the queue, and an std::condition_variable waiting until
the queue has elements.
When the destructor of the alien runner is called, an std::nullopt
sentinel is pushed to the queue, and after all remaining tasks are
finished and the sentinel is read, the thread finishes.
Fixes#12904Closes#13051
* github.com:scylladb/scylladb:
wasm: move compilation to an alien thread
wasm: convert compilation to a future
These two are static binaries, so no need in yum/apt-installing them with dependencies.
Just download with curl and put them into /urs/local/bin with X-bit set.
This is needed for future object-storage work in order to run unit tests against minio.
refs: #12523
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
[avi: regenerate frozen toolchain]
Closes#13064Closes#13099
Adds two test cases which test what happens when we perform an LWT UPDATE, but the partition/clustering key has 0 possible values. This can happen e.g when a column is supposed to be equal to two different values (`c = 0 AND c = 1`).
Empty partition ranges work properly, empty clustering range currently causes a crash (#13129).
I added tests for both of these cases.
Closes#13130
* github.com:scylladb/scylladb:
cql-pytest/test_lwt: test LWT update with empty clustering range
cql-pytest/test_lwt: test LWT update with empty partition range
CQL supports type casting using C-style casts.
For example it's possible to do: `blob_column = (blob)funcReturningInt()`
This functionality is pretty limited, we only allow such casts between types that have a compatible binary representation. Compatible means that the bytes will stay unchanged after the conversion.
This means that it's legal to cast an int to blob (int is just a 4 byte blob), but it's illegal to cast a bigint to int (change 4 bytes -> 8 bytes).
This simplifies things, to cast we can just reinterpret the value as the other type.
Another use of C-style casts are type hints. Sometimes it's impossible to infer the exact type of an expression from the context. In such cases the type can be specified by casting the expression to this type.
For example: `overloadedFunction((int)?)`
Without the cast it would be impossible to guess what should be the bind marker's type. The function is overloaded, so there are many possible argument types. The type hint specifies that the bind marker has type int.
An interesting thing is that such casts don't have to be explicit. CQL allows to put an int value in a place where a blob value is expected and it will be automatically converted without any explicit casting.
---
I started looking at our implementation of casts because of #12900. In there the author expressed the need to specify a type hint for bind marker used to pass the WASM code. It could be either `(text)?` for text WASM, or `(blob)?` for binary WASM. This specific use of type hints wasn't supported because there was no `receiver` and the implementation of `prepare_expression` didn't handle that. Preparing casts without a receiver should be easy to implement - we can infer the type of the expression by looking at the type to which the expression is cast.
But while reading `prepare_expression` for `expr::cast` I noticed that the code there is a bit strange. The implementation prepared the expression to cast using the original `receiver` instead of a receiver with the cast type. This caused some issues because of which casting didn't work as expected.
For example it was possible to do:
```cql
blob_column = (blob)funcReturningInt()
```
But this didn't work at all:
```cql
blob_column = (blob)(int)12323
```
It tried to prepare `untyped_contant(12323)` with a `blob` receiver, which fails.
This makes `expr::cast` useless for casting. Casting when the representation is compatible is already implicit. I couldn't find a single case where adding a cast would change the behavior in any way.
There was some use for it as a type hint to choose a specific overload of a function, but it was worthless for casting.
Cassandra has the same issue, I created a `cql-pytest` test and it showed that we behave in the same way as Cassandra does.
I decided to improve this. By preparing the expression using a receiver with the cast type, `expr::cast` becomes actually useful for casting values. Things like `(blob)(int)12323` now work without any issues.
This diverges from the behavior in Cassandra, but it's an extension, not a breaking incompatibility.
---
This PR improves `prepare_expression` for `expr::cast` in the following ways:
1) Support for more complex casts by preparing the expression using a different receiver. This makes casts like `(blob)(int)123` possible
2) Support preparing `expr::cast` without a receiver. Type inference chooses the cast type as the type of the expression.
3) Add pytest tests for C-style casts
`2)` Is needed for #12900, the other changes is just something I decided to do since I was already working on this piece of code.
Closes#13053
* github.com:scylladb/scylladb:
expr_test: more tests for preparing bind variables with type hints
prepare_expr: implement preparing expr::cast with no receiver
prepare_expr: use :user formatting in cast_prepare_expression
prepare_expr: remove std::get<> in cast_prepare_expression
prepare_expr: improve cast_prepare_expression
prepare_expr: improve readability in cast_prepare_expression
cql-pytest: test expr::cast in test_cast.py
This series aims to allow users to set permissions on user-defined functions.
The implementation is based on Cassandra's documentation and should be fully compatible: https://cassandra.apache.org/doc/latest/cassandra/cql/security.html#cql-permissionsFixes: #5572Fixes: #10633Closes#12869
* github.com:scylladb/scylladb:
cql3: allow UDTs in permissions on UDFs
cql3: add type_parser::parse() method taking user_types_metadata
schema_change_test: stop using non-existent keyspace
cql3: fix parameter names in function resource constructors
cql3: handle complex types as when decoding function permissions
cql3: enforce permissions for ALTER FUNCTION
cql-pytest: add a (failing) test case for UDT in UDF
cql-pytest: add a test case for user-defined aggregate permissions
cql-pytest: add tests for function permissions
cql3: enforce permissions on function calls
selection: add a getter for used functions
abstract_function_selector: expose underlying function
cql3: enforce permissions on DROP FUNCTION
cql3: enforce permissions for CREATE FUNCTION
client_state: add functions for checking function permissions
cql-pytest: add a case for serializing function permissions
cql3: allow specifying function permissions in CQL
auth: add functions_resource to resources
It was suggested as candidate from one of previous reviews, so here it is.
Closes#13140
* github.com:scylladb/scylladb:
distributed_loader: Indentation fix after previous patch
distributed_loader: Coroutinize reshape() helper
Many sstable test cases create tempdir on their own to create sstables with. Sometimes it's justified when the test needs to check files on disk by hand for some validation, but often all checks are fs-agnostic. The latter case(s) can be patched to work on top of any storage, in particular -- on top of object storage. To make it work tests should stop creating sstables explicitly in tempdir and this PR does exactly that.
All relevant occurrences of tempdir are removed from test cases, instead the sstable::test_env's tempdir is used. Next, the test_env::{create_sstable|reusable_sst} are patched not to accept the `fs::path dir` argument and pick the env's tempdir. Finally, the `make_sstable_easy` helper is patched to use path-less env methods too.
refs: #13015Closes#13116
* github.com:scylladb/scylladb:
test,sstables: Remove path from make_sstable_easy()
test,lib: Remove wrapper over reusable_sst and move the comment
test: Make "compact" test case use env dir
test,compaction: Use env tempdir in some more cases
test,compaction: Make check_compacted_sstables() use env's dir
test: Relax making sstable with sequential generation
test/sstable::test_env: Keep track of auto-incrementing generation
test/lib: Add sstable maker helper without factory
test: Remove last occurrence of test_env::do_with(rval, ...)
test,sstables: Dont mess with tempdir where possible
test/sstable::test_env: Add dir-less sstables making helpers
test,sstables: Use sstables::test_env's tempdir with sweeper
test,sstables: Use sstables::test_env's tempdir
test/lib: Add tempdir sweeper
test/lib: Open-code make_sstabl_easy into make_sstable
test: Remove vector of mutation interposer from test_key_count_estimation
add type constraits for
`sstable_directory::parallel_for_each_restricted()`, to enforce the
constraints on the function so it should be invocable with the argument
of specified type. this helps to prevent the problems of passing
function which accepts `pair<key, value>` or `tuple<key, value>`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`parallel_for_each_restricted()` maps the elements in the given
container with the specified function. in this case, the elements is
of type `unordered_map::value_type`, which is a `pair<const Key, Value>`.
to convert it to a `tuple<Key, Value>`, the constructor of the tuple
is called. but what we intend to do here is but to access the second
element in the `pair<>` here.
in this change, the function's signature is changed to match
`scan_descriptors_map::value_type` to avoid the unnecessary overhead of
constructor of `tuple<>`. also, because the underlying
`max_concurrent_for_each()` does not pass a xvalue to the given func,
instead, it just pass `*s.begin` to the function, where `s.begin` is
an `Iterator` returned by `std::begin(container)`. so let's just use
a plain reference as the parameter type for the function.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
- sstables: mark param of sstable::*_from_sstring() const
- sstables: mark param of reverse_map() const
- sstables: mark static lookup table const
Closes#13115
* github.com:scylladb/scylladb:
sstables: mark static lookup table const
sstables: mark param of reverse_map() const
sstables: mark param of sstable::*_from_sstring() const
This reverts commit 49e0d0402d, reversing
changes made to 25cf325674.
An old version of PR #13115 was accidentally merged into `master` (it
was dequeued concurrently while a running next promotion job included
it).
Revert the merge. We'll merge the new version as a follow-up.