Storage service API set/unset has two flaws.
First, unset doesn't happen, so after storage service is stopped its handlers become "local is not initialized"-assertion and use-after-free landmines.
Second, setting of storage service API carry gossiper and system keyspace references, thus duplicating the knowledge about storage service dependencies.
This PR fixes both by adding the storage service API unsetting and by making the handlers use _only_ storage service instance, not any externally provided references.
Closesscylladb/scylladb#15547
* github.com:scylladb/scylladb:
main, api: Set/Unset storage_service API in proper place
api/storage_service: Remove gossiper arg from API
api/storage_service: Remove system keyspace arg from API
api/storage_service: Get gossiper from storage service
api/storage_service: Get token_metadata from storage service
When presented with queries that use the same named bind variables twice, like this one:
```cql
SELECT p FROM table WHERE p = :x AND c = :x
```
Scylla generated empty `partition_key_bind_indexes` (`pk_indexes`).
`pk_indexes` tell the driver which bind variables it should use to calculate the partition token for a query. Without it, the driver is unable to determine the token and it will send the query to a random node.
Scylla should generate pk_indexes which tell the driver that it can use bind variable with `bind_index = 0` to calculate the partition token for this query.
The problem was that `_target_columns` keep only a single target_column for each bind variable.
In the example above `:x` is compared with both `p` and `c`, but `_target_columns` would contain only one of them, and Scylla wasn't able to tell that this bind variable is compared with a partition key column.
To fix it, let's replace `_target_columns` with `_targets`. `_targets` keeps all comparisons
between bind variables and other expressions, so none of them will be forgotten/overwritten.
A `cql-pytest` reproducer is added.
I also added some comments in `prepare_context.hh/cc` to make it easier to read.
Fixes: https://github.com/scylladb/scylladb/issues/15374Closesscylladb/scylladb#15526
* github.com:scylladb/scylladb:
cql-pytest/test-prepare: remove xfail marker from *pk_indexes_duplicate_named_variables
cql3/prepare_context: fix generating pk_indexes for duplicate named bind variables
cql3: improve readability of prepare_context
cql-pytest: test generation of pk indexes during PREPARE
There's a dedicated forward_service::shutdown() method that's defer-scheduled in main for very early invocation. That's not nice, the fwd service start-shutdown-stop sequence can be made "canonical" by moving the shutting down code into abort source subscription. Similar thing was done for view updates generator in 3b95f4f107
refs: #2737
refs: #4384Closesscylladb/scylladb#15545
* github.com:scylladb/scylladb:
forward_service: Remove .shutdown() method
forward_service: Set _shutdown in abort-source subscription
forward_service: Add abort_source to constructor
we compare the symbols lists of stripped ELF file ($orig.stripped) and
that of the one including debugging symbols ($orig.debug) to get a
an ELF file which includes only the necessary bits as the debuginfo
($orig.minidebug).
but we generate the symbol list of stripped ELF file using the
sysv format, while generate the one from the unstripped one using
posix format. the former is always padded the symbol names with spaces
so that their the length at least the same as the section name after
we split the fields with "|".
that's why the diff includes the stuff we don't expect. and hence,
we have tons of warnings like:
```
objcopy: build/node_exporter/node_exporter.keep_symbols:4910: Ignoring rubbish found on this line
```
when using objcopy to filter the ELF file to keep only the
symbols we are interested in.
so, in this change
* use the same format when dumping the symbols from unstripped ELF
file
* include the symbols in the text area -- the code, by checking
"T" and "t" in the dumped symbols. this was achieved by matching
the lines with "FUNC" before this change.
* include the the symbols in .init data section -- the global
variables which are initialized at compile time. they could
be also interesting when debugging an application.
Fixes#15513
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15514
When doing a SELECT CAST(b AS int), Cassandra returns a column named
cast(b as int). Currently, Scylla uses a different name -
system.castasint(b). For Cassandra compatibility, we should switch to
the same name.
fixes#14508Closesscylladb/scylladb#14800
Before integration with task manager the state of one shard repair
was kept in repair_info. repair_info object was destroyed immediately
after shard repair was finished.
In an integration process repair_info's fields were moved to
shard_repair_task_impl as the two served the similar purposes.
Though, shard_repair_task_impl isn't immediately destoyed, but is
kept in task manager for task_ttl seconds after it's complete.
Thus, some of repair_info's fields have their lifetime prolonged,
which makes the repair state change delayed.
Release shard_repair_task_impl resources immediately after shard
repair is finished.
Fixes: #15505.
Closesscylladb/scylladb#15506
Before this commit the primary key was hashed for bloom filter check
for each sstable.
This commit makes the key be hashed once per sstable set and reused
for bloom filter lookups in all sstables in the set.
I tested this change using perf_simple_query with the following modifications:
1. Create more than one sstable to have sstable set of more than one elements
2. Try to prevent compactions (I wasn't 100% successful)
3. Use a key that's not present to avoid reading from disk
```
diff --git a/test/perf/perf_simple_query.cc b/test/perf/perf_simple_query.cc
index 26dbf1e99..6bd460df2 100644
--- a/test/perf/perf_simple_query.cc
+++ b/test/perf/perf_simple_query.cc
@@ -105,6 +105,8 @@ std::ostream& operator<<(std::ostream& os, const test_config& cfg) {
static void create_partitions(cql_test_env& env, test_config& cfg) {
std::cout << "Creating " << cfg.partitions << " partitions..." << std::endl;
+ // Create 10 sstables each with all the data
+ for (unsigned count = 0; count < 10; ++count) {
for (unsigned sequence = 0; sequence < cfg.partitions; ++sequence) {
if (cfg.counters) {
execute_counter_update_for_key(env, make_key(sequence));
@@ -117,6 +119,7 @@ static void create_partitions(cql_test_env& env, test_config& cfg) {
std::cout << "Flushing partitions..." << std::endl;
env.db().invoke_on_all(&replica::database::flush_all_memtables).get();
}
+ }
}
static int64_t make_random_seq(test_config& cfg) {
@@ -137,8 +140,18 @@ static std::vector<perf_result> test_read(cql_test_env& env, test_config& cfg) {
query += " using timeout " + cfg.timeout;
}
auto id = env.prepare(query).get0();
- return time_parallel([&env, &cfg, id] {
- bytes key = make_random_key(cfg);
+ // Always use the same key that is not present
+ // to make sure we don't read from disk and make
+ // the benchmark CPU bounded.
+ int64_t key_value = 6;
+ bytes key(bytes::initialized_later(), 5*sizeof(key_value));
+ auto i = key.begin();
+ write<uint64_t>(i, key_value);
+ write<uint64_t>(i, key_value);
+ write<uint64_t>(i, key_value);
+ write<uint64_t>(i, key_value);
+ write<uint64_t>(i, key_value);
+ return time_parallel([&env, id, key] {
return env.execute_prepared(id, {{cql3::raw_value::make_value(std::move(key))}}).discard_result();
}, cfg.concurrency, cfg.duration_in_seconds, cfg.operations_per_shard, cfg.stop_on_error);
}
@@ -423,6 +436,10 @@ static std::vector<perf_result> do_cql_test(cql_test_env& env, test_config& cfg)
.with_column("C2", bytes_type)
.with_column("C3", bytes_type)
.with_column("C4", bytes_type)
+ // Try to prevent compaction
+ // to keep the number of sstables high
+ .set_compaction_enabled(false)
+ .set_min_compaction_threshold(2000000000)
.build();
}).get();
@@ -539,6 +556,11 @@ int scylla_simple_query_main(int argc, char** argv) {
const auto enable_cache = app.configuration()["enable-cache"].as<bool>();
std::cout << "enable-cache=" << enable_cache << '\n';
db_cfg->enable_cache(enable_cache);
+ // Try to prevent compaction
+ // to keep the number of sstables high
+ db_cfg->concurrent_compactors(1);
+ db_cfg->compaction_enforce_min_threshold(true);
+ db_cfg->compaction_throughput_mb_per_sec(1);
cql_test_config cfg(db_cfg);
return do_with_cql_env_thread([&app] (auto&& env) {
```
The following command showed 2-3% improvement on my machine but this
depends on the lenght of the key and the number of sstables in the set.
```
./build/release/scylla perf-simple-query --bypass-cache --flush -c 1
--random-seed=2068087418 --enable-cache false
```
Signed-off-by: Piotr Jastrzebski <haaawk@gmail.com>
Closesscylladb/scylladb#15538
SSTable runs work hard to keep the disjointness invariant, therefore they're
expensive to build from scratch.
For every insertion, it keeps the elements sorted by their first key in
order to reject insertion of element that would introduce overlapping.
Additionally, a sstable run can grow to dozens of elements (or hundreds)
therefore, we can also make interaction with compaction strategies more
efficient by not copying them when building a list of candidates in compaction
manager. And less fragile by filtering out any sstable runs that are not
completely eligible for compaction.
Previously, ICS had to give up on using runs managed by sstable set due to
fragility of the interface (meaning runs are being built from scratch
on every call to the strategy, which is very inefficient, but that had to
be done for correctness), but now we can restore that.
Closesscylladb/scylladb#15440
* github.com:scylladb/scylladb:
compaction: Switch to strategy_control::candidates() for regular compaction
tests: Prepare sstable_compaction_test for change in compaction_strategy interface
compaction: Allow strategy to retrieve candidates either as sstables or runs
compaction: Make get_candidates() work with frozen_sstable_run too
sstables: add sstable_run::run_identifier()
sstables: tag sstable_run::insert() with nodiscard
sstables: Make all_sstable_runs() more efficient by exposing frozen shared runs
sstables: Simplify sstable_set interface to retrieve runs
Most of the time only the roots of tasks tree should be non internal.
Change default implementation of is_internal and delete overrides
consistent with it.
Closesscylladb/scylladb#15353
Distributed loader code still "knows" that table's datadir is a filesystem directory with some structure. For S3-backed sstables this still works because for S3 keyspaces scylla still creates and maintains empty directories in datadir. This set fixes the dist. loader assumptions about that and moves them into sstable directory's lister.
refs: #13020Closesscylladb/scylladb#15542
* github.com:scylladb/scylladb:
sstable_directory: Indentation fix after previous patch
sstable_directory: Simplify filesystem prepare()
distributed_loader: Remove get_path() method
distributed_loader: Move directory touching to sstable_directory
distributed_loader: Move directory existance checks to sstable_directory
sstable_directory: Move prepare() core to lister
for better readability, and do not create a CMAKE_BUILD_TYPE CACHE
entry if it is already set using `-D`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15543
Currently the storage-service API handlers are set up in "random" place.
It can happen earlier -- as soon as the storage service itself is ready.
Also, despite storage service is stopped on shutdown, API handlers
continue reference it leading to potential use-after-frees or "local is
not initialized" assertions.
Fix both. Unsetting is pretty bulky, scylladb/seastar#1620 is to help.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some handlers in set_storage_service() have implicit dependency on
gossiper. It's not API that should track it, but storage service itself,
so get the gossiper from service, not from the external argument (it
will be removed soon)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The API handlers that live in set_storage_service() should be
self-contained and operate on storage-service only. Said that, they
should get the token metadata, when needed, from storage service, not
from somewhere else.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There can be 2 waiters now (coordinator and CDC generation publisher),
so signal() is not enough.
Change made in c416c9ff33 missed to
update this site.
Closesscylladb/scylladb#15527
When preparing a `field_selection`, we need to prepare the UDT value,
and then verify that it has this field.
`field_selection_test_assignment` prepares the UDT value using the same
receiver as the whole `field_selection`. This is wrong, this receiver
has the type of the field, and not the UDT.
It's impossible to create a receiver for the UDT. Many different UDTs
can produce an `int` value when the field `a` is selected.
Therefore the receiver should be `nullptr`.
No unit test is added, as this bug doesn't currently cause any issues.
Preparing a column value doesn't do any type checks, so nothing fails.
Still it's good to fix it, just to be correct.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closesscylladb/scylladb#14788
Currently the bit is set in .shutdown() method which is called early on
stop. After the patch the bit it set in the abort-source subscription
callback which is also called early on stop.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now everything is prepared for the switch, let's do it.
Now let's wait for ICS to enjoy the set of changes.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That's needed for upcoming changes that will allow ICS to efficiently
retrieve sstable runs.
Next patch will remove candidates from compaction_strategy's interface
to retrieve candidates using this one instead.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
sstable_run may reject insertion of a sstable if it's going
to break the disjoint invariant of the run, but it's important
that the caller is aware of it, so it can act on it like
generating a new run id for the sstable so it can be inserted
in another run. the tag is important to avoid unknown
problems in this area.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Users of all_sstable_runs() don't want to mutate the runs, but rather
work with their content. So let's avoid copy and make the intention
explicit with the new frozen_sstable_run used as return type
for the interface.
This will guarantee that ICS will be able to fetch uncompacting
runs efficiently.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This interface selects all runs that store at least one of the
sstables in the vector.
But that's very fragile, to the point that even ICS had to
stop using it. A better interface is to return all runs
managed by the set and allow compaction manager to do its
filtering.
We want to use it in ICS to avoid the overhead of rebuilding
sstable runs which may be expensive as sorting is performed
to guarantee the disjoint invariant.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
more structured this way. this also allows us to quickly identify the part which should/can be reused when migrating to CMake based building system.
Refs https://github.com/scylladb/scylladb/issues/15379Closesscylladb/scylladb#15515
* github.com:scylladb/scylladb:
build: extract get_os_ids() out
build: extract find_ninja() out
build: extract thrift_uses_boost_share_ptr() out
Currently, the tools loosely follow the following convention on
error-codes:
* return 1 if the error is with any of the command-line arguments
* return 2 on other errors
This patch changes the returned error-code on unknown operation/command
to 100 (instead of the previous 1). The intent is to allow any wrapper
script to determine that the tool failed because the operation is
unrecognized and not because of something else. In particular this
should enable us to write a wrapper script for scylla-nodetool, which
dispatches commands still un-implemented in scylla-nodetool, to the java
nodetool.
Note that the tool will still print an error message on an unknown
operation. So such wrapper script would have to make sure to not let
this bleed-through when it decides to forward the operation.
Closesscylladb/scylladb#15517
When FS lister gets prepared it
- checks if the directory exists
- creates if it it doesn't or bais out if it's quarantine one
- goes and checks the directory's owner and mode
The last step is excessive if the directory didn't exist on entry and
was created.
Indentation is deliberately left broken.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is continuation of the previous patch -- when populating a table,
creating directories should be (optionally) performed by the lister
backend, not by the generic loader.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The loader code still "knows" that tables' sstables live in directories
on datadir filesystem, but that's not always so. So whether or not the
directory with sstables exists should be checked by sstable directory's
component lister, not the loader.
After this change potentially missing quarantine directory will be
processed with the sstable directory with empty result, but that's OK,
empty directories should be already handled correctly, so even if the
directory lister doesn't produce any sstables because it found no files,
or because it just skipped scanning doesn't make any difference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Current sstable_directory::prepare() code checks the sstable directory
existance which only makes sense for filesystem-backed sstables.
S3-backed don't (well -- won't) have any directories in datadir, so the
check should be moved into component lister.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Issue #15374 has been fixed, so these tests can be enabled.
Duplicate bind variable names are now handled correctly.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
When presented with queries that use the same named bind variables twice,
like this one:
```cql
SELECT p FROM table WHERE p = :x AND c = :x
```
Scylla generated empty partition_key_bind_indexes (pk_indexes).
pk_indexes tell the driver which bind variables it should use to calculate the partition
token for a query. Without it, the driver is unable to determine the token and it will
send the query to a random node.
Scylla should generate pk_indexes which tell the driver that it can use bind variable
with bind_index = 0 to calculate the partition token for a query.
The problem was that _target_columns keep only a single target_column for each bind variable.
In the example above :x is compared with both p and c, but _target_columns would contain
only one of them, and Scylla wasn't able to tell that this bind variable is compared with
a partition key column.
To fix it, let's replace _target_columns with _targets. _targets keeps all comparisons
between bind variables and other expressions, so none of them will be forgotten/overwritten.
Fixes: #15374
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
This commits adds a few comments and changes a few variable names
so that it's easier to figure out what the code does.
When I first started looking at this part of the code it wasn't
obvious what's going on - what are _specs, how are they different
from _target_columns? What happens when a variable doesn't have a name?
I hope that this change will make it easier to understand for future readers.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Add some tests that test whether `pk indexes` are generated correctly.
When a driver asks to prepare a statement, Scylla's response includes
the metadata for this prepared statement.
In this metadata there's `pk indexes`, which tells the driver which
bind variable values it should use to calculate the partition token.
For a query like:
SELECT * FROM t WHERE p2 = ? AND p1 = ? AND p3 = ?
The correct pk_indexes would be [1, 0, 2], which means
"To calculate the token calculate Hash(bind_vars[1] | bind_vars[0] | bind_vars[2])".
More information is available in the specification:
1959502d8b/doc/native_protocol_v4.spec (L699-L707)
Two tests are marked as xfail because of #15374 - Scylla doesn't correctly handle using the same
named variable in multiple places. This will be fixed soon.
I couldn't find a good place for these tests, so I created a new file - test_prepare.py.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Currently the datadir is ignored.
Use it to construct the table's base path.
Fixesscylladb/scylladb#15418Closesscylladb/scylladb#15480
* github.com:scylladb/scylladb:
distributed_loader: populate_keyspace: access cf by ref
distributed_loader: table_populator: use datadir for base_path
distributed_loader: populate_keyspace: issue table mark_ready_for_writes after all datadirs are processed
distributed_loader: populate_keyspace: fixup indentation
distributed_loader: populate_keyspace: iterate over datadirs in the inner loop
test: sstable_directory_test: add test_multiple_data_dirs
table: init_storage: create upload and staging subdirs on all datadirs
Issue #10357 is about a SELECT with a filter on a regular column which
incorrectly returns a static row without regular columns set (so the
filter would not have matched). We already have four tests reproducing
this issue, but each of them is a small part of a large tests translated
from Cassandra, making it hard to understand the scope of this bug.
So in this patch we add two new tests, one passing and one xfailing,
which clarify the scope of this bug. It turns out that the bug only
occurs when a partition has no clustering rows and only has a static
row. If the partition does have clustering rows - even if those don't
match the filter - the bug doesn't happen. The xfailing test is just
two statements long - a single INSERT and a single SELECT
Refs #10357.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#15120
Scylla can crash due to a complicated interaction of service level drop,
evictable readers, inactive read registration path.
1) service level drop invoke stop of reader concurrency semaphore, which will
wait for in flight requests
2) turns out it stops first the gate used for closing readers that will
become inactive.
3) proceeds to wait for in-flight reads by closing the reader permit gate.
4) one of evictable reads take the inactive read registration path, and
finds the gate for closing readers closed.
5) flat mutation reader is destroyed, but finds the underlying reader was
not closed gracefully and triggers the abort.
By closing permit gate first, evictable readers becoming inactive will
be able to properly close underlying reader, therefore avoiding the
crash.
Fixes#15534.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#15535
in this series, we rename s3 credential related variable and option names so they are more consistent with AWS's official document. this should help with the maintainability.
Closesscylladb/scylladb#15529
* github.com:scylladb/scylladb:
main.cc: rename aws option
utils/s3/creds: rename aws_config member variables
So far generic describe (`DESC <name>`) followed Cassandra implementation and it only described keyspace/table/view/index.
This commit adds UDT/UDF/UDA to generic describe.
Fixes: #14170Closesscylladb/scylladb#14334
* github.com:scylladb/scylladb:
docs:cql: add information about generic describe
cql-pytest:test_describe: add test for generic UDT/UDF/UDA desc
cql3:statements:describe_statement: include UDT/UDF/UDA in generic describe
- s/aws_key/aws_access_key_id/
- s/aws_secret/aws_secret_access_key/
- s/aws_token/aws_session_token/
rename them to more popular names, these names are also used by
boto's API. this should improve the readability and consistency.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>