Regression introduced in 23e4c8315.
view_and_holder position_in_partiton::after_key() triggers undefined
behavior when the key was not full because the holder is moved, which invalidates the view.
Fixes#12367Closes#12447
With sufficiently many test cases we would eventually run out of IP
addresses, because IPs (which are leased from a global host registry)
would only be released at the end of an entire test suite.
In fact we already hit this during next promotions, causing much pain
indeed.
Release IPs when a cluster, after being marked dirty, is stopped and
thrown away.
Closes#12482
Introduce a new "script" operation, which loads a script from the specified path, then feeds the mutation fragment stream to it. The script can then extract, process and present information from the sstable as it wishes.
For now only Lua scripts are supported for the simple reason that Lua is easy to write bindings for, it is simple and lightweight and more importantly we already have Lua included in the Scylla binary as it is used as the implementation language for UDF/UDA. We might consider WASM support in the future, but for now we don't have any language support in WASM available.
Example:
```lua
function new_stats(key)
return {
partition_key = key,
total = 0,
partition = 0,
static_row = 0,
clustering_row = 0,
range_tombstone_change = 0,
};
end
total_stats = new_stats(nil);
function inc_stat(stats, field)
stats[field] = stats[field] + 1;
stats.total = stats.total + 1;
total_stats[field] = total_stats[field] + 1;
total_stats.total = total_stats.total + 1;
end
function on_new_sstable(sst)
max_partition_stats = new_stats(nil);
if sst then
current_sst_filename = sst.filename;
else
current_sst_filename = nil;
end
end
function consume_partition_start(ps)
current_partition_stats = new_stats(ps.key);
inc_stat(current_partition_stats, "partition");
end
function consume_static_row(sr)
inc_stat(current_partition_stats, "static_row");
end
function consume_clustering_row(cr)
inc_stat(current_partition_stats, "clustering_row");
end
function consume_range_tombstone_change(crt)
inc_stat(current_partition_stats, "range_tombstone_change");
end
function consume_partition_end()
if current_partition_stats.total > max_partition_stats.total then
max_partition_stats = current_partition_stats;
end
end
function on_end_of_sstable()
if current_sst_filename then
print(string.format("Stats for sstable %s:", current_sst_filename));
else
print("Stats for stream:");
end
print(string.format("\t%d fragments in %d partitions - %d static rows, %d clustering rows and %d range tombstone changes",
total_stats.total,
total_stats.partition,
total_stats.static_row,
total_stats.clustering_row,
total_stats.range_tombstone_change));
print(string.format("\tPartition with max number of fragments (%d): %s - %d static rows, %d clustering rows and %d range tombstone changes",
max_partition_stats.total,
max_partition_stats.partition_key,
max_partition_stats.static_row,
max_partition_stats.clustering_row,
max_partition_stats.range_tombstone_change));
end
```
Running this script wilt yield the following:
```
$ scylla sstable script --script-file fragment-stats.lua --system-schema system_schema.columns /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/me-1-big-Data.db
Stats for sstable /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f//me-1-big-Data.db:
397 fragments in 7 partitions - 0 static rows, 362 clustering rows and 28 range tombstone changes
Partition with max number of fragments (180): system - 0 static rows, 179 clustering rows and 0 range tombstone changes
```
Fixes: https://github.com/scylladb/scylladb/issues/9679Closes#11649
* github.com:scylladb/scylladb:
tools/scylla-sstable: consume_reader(): improve pause heuristincs
test/cql-pytest/test_tools.py: add test for scylla-sstable script
tools: add scylla-sstable-scripts directory
tools/scylla-sstable: remove custom operation
tools/scylla-sstable: add script operation
tools/sstable: introduce the Lua sstable consumer
dht/i_partitioner.hh: ring_position_ext: add weight() accessor
lang/lua: export Scylla <-> lua type conversion methods
lang/lua: use correct lib name for string lib
lang/lua: fix type in aligned_used_data (meant to be user_data)
lang/lua: use lua_State* in Scylla type <-> Lua type conversions
tools/sstable_consumer: more consistent method naming
tools/scylla-sstable: extract sstable_consumer interface into own header
tools/json_writer: add accessor to underlying writer
tools/scylla-sstable: fix indentation
tools/scylla-sstable: export mutation_fragment_json_writer declaration
tools/scylla-sstable: mutation_fragment_json_writer un-implement sstable_consumer
tools/scylla-sstable: extract json writing logic from json_dumper
tools/scylla-sstable: extract json_writer into its own header
tools/scylla-sstable: use json_writer::DataKey() to write all keys
tools/scylla-types: fix use-after-free on main lambda captures
Inferring shard from generation is long gone. We still use it in
some scripts, but that's no longer needed in Scylla, when loading
the SSTables, and it also conflicts with ongoing work of UUID-based
generations.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#12476
The CQL binary protocol version 3 was introduced in 2014. All Scylla
version support it, and Cassandra versions 2.1 and newer.
Versions 1 and 2 have 16-bit collection sizes, while protocol 3 and newer
use 32-bit collection sizes.
Unfortunately, we implemented support for multiple serialization formats
very intrusively, by pushing the format everywhere. This avoids the need
to re-serialize (sometimes) but is quite obnoxious. It's also likely to be
broken, since it's almost untested and it's too easy to write
cql_serialization_format::internal() instead of propagating the client
specified value.
Since protocols 1 and 2 are obsolete for 9 years, just drop them. It's
easy to verify that they are no longer in use on a running system by
examining the `system.clients` table before upgrade.
Fixes#10607Closes#12432
* github.com:scylladb/scylladb:
treewide: drop cql_serialization_format
cql: modification_statement: drop protocol check for LWT
transport: drop cql protocol versions 1 and 2
To test the script operation, we use some of the example scripts from
the example directory. Namely, dump.lua and slice.lua. These two scripts
together have a very good coverage of the entire script API. Testing
their functionality therefore also provides a good coverage of the lua
bindings. A further advantage is that since both scripts dump output in
identical format to that of the data-dump operation, it is trivial to do
a comparison against this already tested operation.
A targeted test is written for the sstable skip functionality of the
consumer API.
Consider the following MVCC state of a partition:
v2: ==== <7> [entry2] ==== <9> ===== <last dummy>
v1: ================================ <last dummy> [entry1]
Where === means a continuous range and --- means a discontinuous range.
After two LRU items are evicted (entry1 and entry2), we will end up with:
v2: ---------------------- <9> ===== <last dummy>
v1: ================================ <last dummy> [entry1]
This will cause readers to incorrectly think there are no rows before
entry <9>, because the range is continuous in v1, and continuity of a
snapshot is a union of continuous intervals in all versions. The
cursor will see the interval before <9> as continuous and the reader
will produce no rows.
This is only temporary, because current MVCC merging rules are such
that the flag on the latest entry wins, so we'll end up with this once
v1 is no longer needed:
v2: ---------------------- <9> ===== <last dummy>
...and the reader will go to sstables to fetch the evicted rows before
entry <9>, as expected.
The bug is in rows_entry::on_evicted(), which treats the last dummy
entry in a special way, and doesn't evict it, and doesn't clear the
continuity by omission.
The situation is not easy to trigger because it requires certain
eviction pattern concurrent with multiple reads of the same partition
in different versions, so across memtable flushes.
Closes#12452
Cassandra refuses a request with more than one equality relation to the
same column, for example
DELETE FROM tbl WHERE partitionKey = ? AND partitionKey = ?
It complains that
partitionkey cannot be restricted by more than one relation if it
includes an Equal
Currently, Scylla doesn't consider such requests an error. Whether or
not we should be compatible with Cassandra here is discussed in
issue #12472. But as long as we do accept this query, we should be
sure we do the right thing: "WHERE p = 1 AND p = 2" should match
nothing (not the first, or last, value being tested..), and "WHERE p = 1
AND p = 1" should match the matches of p = 1. This patch adds a test
for verify that these requests indeed yield correct results. The
test is scylla_only because, as explained above, Cassandra doesn't
support this feature at all.
Refs #12472
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12473
This commit removes consume_in_reverse::legacy_half_reverse, an option
once used to indicate that the given key ranges are sorted descending,
based on the clustering key of the start of the range, and that the
range tombstones inside partition would be sorted (descending, as all
the mutation fragments would) according to their end (but range
tombstone would still be stored according to their start bound).
As it turns out, mutation::consume, when called with legacy_half_reverse
option produces invalid fragment stream, one where all the row
tombstone changes come after all the clustering rows. This was not an
issue, since when constructing results from the query, Scylla would not
pass the tombstones to the client, but instead compact data beforehand.
In this commit, the consume_in_reverse::legacy_half_reverse is removed,
along with all the uses.
As for the swap out in mutation_partition.cc in query_mutation and
to_data_query_result:
The downstream was not prepared to deal with legacy_half_reverse.
mutation::consume contains
```
if (reverse == consume_in_reverse::yes) {
while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::yes>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
co_await yield();
}
} else {
while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::no>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
co_await yield();
}
}
```
So why did it work at all? to_data_query_result deals with a single slice.
The used consumer (compact_for_query_v2) compacts-away the range tombstone
changes, and thus the only difference between the consume_in_reverse::no
and consume_in_reverse::yes was that one was ordered increasing wrt. ckeys
and the second one was ordered decreasing. This property is maintained if
we swap out for the consume_in_reverse::yes format.
Refs: #12353Closes#12453
* github.com:scylladb/scylladb:
mutation{,_consumer,_partition}: remove consume_in_reverse::legacy_half_reverse
mutation_partition_view: treat query::partition_slice::option::reversed in to_data_query_result as consume_in_reverse::yes
mutation: move consume_in_reverse def to mutation_consumer.hh
Convert decompressed temporary buffers into tracked buffers just before
returning them to the upper layer. This ensures these buffers are known
to the reader concurrency semaphore and it has an accurate view of the
actual memory consumption of reads.
Fixes: #12448Closes#12454
The test would use a trick to start a separate Scylla cluster from the
one provided originally by the test framework. This is not supported by
the test framework and may cause unexpected problems.
Change the test to perform regular node operations. Instead of starting
a fresh cluster of 3 nodes, we join the first of these nodes to the
original framework-provided cluster, then decommission the original
nodes, then bootstrap the other 2 fresh nodes.
Also add some logging to the test.
Refs: #12438, #12442Closes#12457
This series adds the implementation and usage of rust wasmtime bindings.
The WASM UDFs introduced by this patch are interruptable and use memory allocated using the seastar allocator.
This series includes #11102 (the first two commits) because #11102 required disabling wasm UDFs completely. This patch disables them in the middle of the series, and enables them again at the end.
After this patch, `libwasmtime.a` can be removed from the toolchain.
This patch also removes the workaround for #https://github.com/scylladb/scylladb/issues/9387 but it hasn't been tested with ARM yet - if the ARM test causes issues I'll revert this part of the change.
Closes#11351
* github.com:scylladb/scylladb:
build: remove references to unused c bindings of wasmtime
test: assert that WASM allocations can fail without crashing
wasm: limit memory allocated using mmap
wasm: add configuration options for instance cache and udf execution
test: check that wasmtime functions yield
wasm: use the new rust bindings of wasmtime
rust: add Wasmtime bindings
rust: add build profiles more aligned with ninja modes
rust: adjust build according to cxxbridge's recommendations
tools: toolchain: dbuild: prepare for sharing cargo cache
In issue #3668, a discussion spanning several years theorized that several
things are wrong with the "timestamp" type. This patch begins by adding
several tests that demonstrate that Scylla is in fact behaving correctly,
and mostly identically to Cassandra except one esoteric error handling
case.
However, after eliminating the red herrings, we are left for the real
issue that prompted opening #3668, which is a duplicate of issues #2693
and #2694, and this patch also adds a reproducer for that. The issue is
that Cassandra 4 added support for arithmetic expressions on values,
and timestamps can be added durations, for example:
'2011-02-03 04:05:12.345+0000' - 1d
is a valid timestamp - and we don't currently support this syntax.
So the new test - which passes on Cassandra 4 and fails on Scylla
(or Cassandra 3) is marked xfail.
Refs #2693
Refs #2694
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12436
The main source of big allocations in the WASM UDF implementation
is the WASM Linear Memory. We do not want Scylla to crash even if
a memory allocation for the WASM Memory fails, so we assert that
an exception is thrown instead.
The wasmtime runtime does not actually fail on an allocation failure
(assuming the memory allocator does not abort and returns nullptr
instead - which our seastar allocator does). What happens then
depends on the failed allocation handling of the code that was
compiled to WASM. If the original code threw an exception or aborted,
the resulting WASM code will trap. To make sure that we can handle
the trap, we need to allow wasmtime to handle SIGILL signals, because
that what is used to carry information about WASM traps.
The new test uses a special WASM Memory allocator that fails after
n allocations, and the allocations include both memory growth
instructions in WASM, as well as growing memory manually using the
wasmtime API.
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
The wasmtime runtime allocates memory for the executable code of
the WASM programs using mmap and not the seastar allocator. As
a result, the memory that Scylla actually uses becomes not only
the memory preallocated for the seastar allocator but the sum of
that and the memory allocated for executable codes by the WASM
runtime.
To keep limiting the memory used by Scylla, we measure how much
memory do the WASM programs use and if they use too much, compiled
WASM UDFs (modules) that are currently not in use are evicted to
make room.
To evict a module it is required to evict all instances of this
module (the underlying implementation of modules and instances uses
shared pointers to the executable code). For this reason, we add
reference counts to modules. Each instance using a module is a
reference. When an instance is destroyed, a reference is removed.
If all references to a module are removed, the executable code
for this module is deallocated.
The eviction of a module is actually acheved by eviction of all
its references. When we want to free memory for a new module we
repeatedly evict instances from the wasm_instance_cache using its
LRU strategy until some module loses all its instances. This
process may not succeed if the instances currently in use (so not
in the cache) use too much memory - in this case the query also
fails. Otherwise the new module is added to the tracking system.
This strategy may evict some instances unnecessarily, but evicting
modules should not happen frequently, and any more efficient
solution requires an even bigger intervention into the code.
Different users may require different limits for their UDFs. This
patch allows them to configure the size of their cache of wasm,
the maximum size of indivitual instances stored in the cache, the
time after which the instances are evicted, the fuel that all wasm
UDFs are allowed to consume before yielding (for the control of
latency), the fuel that wasm UDFs are allowed to consume in total
(to allow performing longer computations in the UDF without
detecting an infinite loop) and the hard limit of the size of UDFs
that are executed (to avoid large allocations)
The new implementation for WASM UDFs allows executing the UDFs
in pieces. This commit adds a test asserting that the UDF is in fact
divided and that each of the execution segments takes no longer than
1ms.
This commit removes consume_in_reverse::legacy_half_reverse, an option
once used to indicate that the given key ranges are sorted descending,
based on the clustering key of the start of the range, and that the
range tombstones inside partition would be sorted (descending, as all
the mutation fragments would) according to their end (but range
tombstone would still be stored according to their start bound).
As it turns out, mutation::consume, when called with legacy_half_reverse
option produces invalid fragment stream, one where all the row
tombstone changes come after all the clustering rows. This was not an
issue, since when constructing results from the query, Scylla would not
pass the tombstones to the client, but instead compact data beforehand.
In this commit, the consume_in_reverse::legacy_half_reverse is removed,
along with all the uses.
As for the swap out in mutation_partition.cc in query_mutation and
to_data_query_result:
The downstream was not prepared to deal with legacy_half_reverse.
mutation::consume contains
```
if (reverse == consume_in_reverse::yes) {
while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::yes>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
co_await yield();
}
} else {
while (!(stop_opt = consume_clustering_fragments<consume_in_reverse::no>(_ptr->_schema, partition, consumer, cookie, is_preemptible::yes))) {
co_await yield();
}
}
```
So why did it work at all? to_data_query_result deals with a single slice.
The used consumer (compact_for_query_v2) compacts-away the range tombstone
changes, and thus the only difference between the consume_in_reverse::no
and consume_in_reverse::yes was that one was ordered increasing wrt. ckeys
and the second one was ordered decreasing. This property is maintained if
we swap out for the consume_in_reverse::yes format.
Unlike other experimental feature we want to raft to be opt in even
after it leaves experimental mode. For that we need to have a separate
option to enable it. The patch adds the binary option "consistent-cluster-management"
for that.
* 'consistent-cluster-management-flag' of github.com:scylladb/scylla-dev:
raft: replace experimental raft option with dedicated flag
main: move supervisor notification about group registry start where it actually starts
raft_group0 used to register RPC verbs only on shard 0. This worked on
clusters with the same --smp setting on all nodes, since RPCs in this
case are processed on the same shard as the calling code, and
raft_group0 methods only run on shard 0.
A new test test_nodes_with_different_smp was added to identify the
problem. Since --smp can only be specified via the command line, a
corresponding parameter was added to the ManagerClient.server_add
method. It allows to override the default parameters set by the
SCYLLA_CMDLINE_OPTIONS variable by changing, adding or deleting
individual items.
Fixes: #12252Closes#12374
* github.com:scylladb/scylladb:
raft: raft_group0, register RPC verbs on all shards
raft: raft_append_entries, copy entries to the target shard
test.py, allow to specify the node's command line in test
Run tests for parallelized aggregation with
`enable_parallelized_aggregation` set always to true, so the tests work
even if the default value of the option is false.
Closes#12409
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization format everywhere, except when in the IDL
(since it's part of the inter-node protocol).
A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.
Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization\
format when receiving. The IDL is unchanged.
One test checking the 16-bit serialization format is removed.
test_ssl
In very slow debug builds the default driver timeouts are too low and
tests might fail. Bump up the values to a more reasonable time.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes#12408
raft_group0 used to register RPC verbs only on shard 0.
This worked on clusters with the same --smp setting on
all nodes, since RPCs in this case are (usually)
processed on the same shard as the calling code,
and raft_group0 methods only run on shard 0.
A new test test_nodes_with_different_smp was added
to identify the problem.
Fixes: #12252
An optional parameter cmdline has been added to
the ManagerClient.server_add method.
It allows you to override the default parameters
set by the SCYLLA_CMDLINE_OPTIONS variable
by changing, adding or deleting individual
items. To change or add a parameter just specify
its name and value one after the other.
To remove parameter use the special keyword
__remove__ as a value. To set a parameter
without a value (such as --overprovisioned)
use the special keyword __missing__ as the value.
Add to test/cql-pytest/README.md an explanation of the philosophy
of the cql-pytest test suite, and some guideliness on how to write
good tests in that framework.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12400
Unlike other experimental feature we want to raft to be optional even
after it leaves experimental mode. For that we need to have a separate
option to enable it. The patch adds the binary option "consistent-cluster-management"
for that.
database_test in timing out because it's having to run the tests calling
do_with_cql_env_and_compaction_groups 3x, one for each compaction group
setting. reduce it to 2 settings instead of 3 if running in debug mode.
Refs #12396.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#12421
In commit acfa180766 we added to
test/cql-pytest a mechanism to detect when Scylla crashes in the middle
of a test function - in which case we report the culprit test and exit
immediately to avoid having a hundred more tests report that they failed
as well just because Scylla was down.
However, if Scylla was *never* up - e.g., if the user ran "pytest" without
ever running Scylla - we still report hundreds of tests as having failed,
which is confusing and not helpful.
So with this patch, if a connection cannot be made to Scylla at all,
the test exits immediately, explaining what went wrong, not blaming
any specific test:
$ pytest
...
! _pytest.outcomes.Exit: Cannot connect to Scylla at --host=localhost --port=9042 !
============================ no tests ran in 0.55s =============================
Beyond being a helpful reminder for a developer who runs "pytest" without
having started Scylla first (or using test/cql-pytest/run or test.py to
start Scylla easily), this patch is also important when running tests
through test.py if it reuses an instance of Scylla that crashed during an
earlier pytest file's run.
This patch does not fix test.py - it can still try to run pytest with
a dead Scylla server without checking. But at least with this patch
pytest will notice this problem immediately and won't report hundreds of
test functions having failed. The only report the user will see will be
the last test which crashed Scylla, which will make it easier to find
this failure without being hidden between hundreds of spurious failures.
Fixes#12360
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12401
In very slow debug builds the default driver timeouts are too low and
tests might fail. Bump up the values to more reasonable time.
These timeout values are the same as used in topology tests.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes#12405
When the mutation compactor has all the rows it needs for a page, it
saves the decision to stop in a member flag: _stop.
For single partition queries, the mutation compactor is kept alive
across pages and so it has a method, start_new_page() to reset its state
for the next page. This method didn't clear the _stop flag. This meant
that the value set at the end of the previous could cause the new page
and subsequently the entire query to be stopped prematurely.
This can happen if the new page starts with a row that is covered by a
higher level tombstone and is completely empty after compaction.
Reset the _stop flag in start_new_page() to prevent this.
This commit also adds a unit test which reproduces the bug.
Fixes: #12361Closes#12384
In issue #10767, concerned were raised that the CLUSTERING ORDER BY
clause is handled incorrectly in a CREATE MATERIALIZED VIEW definition.
The tests in this patch try to explore the different ways in which
CLUSTERING ORDER BY can be used in CREATE MATERIALIZED VIEW and allows
us to compare Scylla's behaivor to Cassandra, and to common sense.
The tests discover that the CLUSTERING ORDER BY feature in materialized
views generally works as expected, but there are *three* differences
between Scylla and Cassandra in this feature. We consider two differences
to be bugs (and hence the test is marked xfail) and one a Scylla extension:
1. When a base table has a reverse-order clustering column and this
clustering column is used in the materialized view, in Cassandra
the view's clustering order inherits the reversed order. In Scylla,
the view's clustering order reverts to the default order.
Arguably, both behaviors can be justified, but usually when in doubt
we should implement Cassandra's behavior - not pick a different
behavior, even if the different behavior is also reasonable. So
this test (test_mv_inherit_clustering_order()) is marked "xfail",
and a new issue was created about this difference: #12308.
If we want to fix this behavior to match Cassandra's we should also
consider backward compatibility - what happens if we change this
behavior in Scylla now, after we had the opposite behavior in
previous releases? We may choose to enshrine Scylla's Cassandra-
incompatible behavior here - and document this difference.
2. The CLUSTERING ORDER BY should, as its name suggests, only list
clustering columns. In Scylla, specifying other things, like regular
columns, partition-key columns, or non-existent columns, is silently
ignored, whereas it should result in an Invalid Request error (as it
does in Cassandra). So test_mv_override_clustering_order_error()
is marked "xfail".
This is the difference already discovered in #10767.
3. When a materialized view has several clustering columns, Cassandra
requires that a CLUSTERING ORDER BY clause, if present, must specify
the order of all of *all* clustering columns. Scylla, in contrast,
allows the user to override the order of only *some* of these columns -
and the rest get the default order. I consider this to be a
legitimate Scylla extension, and not a compatibility bug, so marked
the test with "scylla_only", and no issue was opened about it.
Refs #10767
Refs #12308
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12307
This patch adds a scylla_inject_error(), a context manager which tests
can use to temporarily enable some error injection while some test
code is running. It can be used to write tests that artificially
inject certain errors instead of trying to reach the elaborate (and
often requiring precise timing or high amounts of data) situation where
they occur naturally.
The error-injection API is Scylla-specific (it uses the Scylla REST API)
and does not work on "release"-mode builds (all other modes are supported),
so when Cassandra or release-mode build are being tested, the test which
uses scylla_inject_error() gets skipped.
Example usage:
```python
from rest_api import scylla_inject_error
with scylla_inject_error(cql, "injection_name", one_shot=True):
# do something here
...
```
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12264
Retrieves the configuration item with the given name and prints its
value as well as its metadata.
Example:
(gdb) scylla get-config-value compaction_static_shares
value: 100, type: "float", source: SettingsFile, status: Used, live: MustRestart
Closes#12362
* github.com:scylladb/scylladb:
scylla-gdb.py: add scylla get-config-value gdb command
scylla-gdb.py: extract $downcast_vptr logic to standalone method
test: scylla-gdb/run: improve diagnostics for failed tests
Retrieves the configuration item with the given name and prints its
value as well as its metadata.
Example:
(gdb) scylla get-config-value compaction_static_shares
value: 100, type: "float", source: SettingsFile, status: Used, live: MustRestart
Allows static configuration of number of compaction groups per table per shard.
To bootstrap the project, config option x_log2_compaction_groups was added which controls both number of groups and partitioning within a shard.
With a value of 0 (default), it means 1 compaction group, therefore all tokens go there.
With a value of 3, it means 8 compaction groups, and 3 most-significant-bits of tokens being used to decide which group owns the token.
And so on.
It's still missing:
- integration with repair / streaming
- integration with reshard / reshape.
perf/perf_simple_query --smp 1 --memory 1G
BEFORE
-----
median 61358.55 tps ( 71.1 allocs/op, 12.2 tasks/op, 56375 insns/op, 0 errors)
median 61322.80 tps ( 71.1 allocs/op, 12.2 tasks/op, 56391 insns/op, 0 errors)
median 61058.58 tps ( 71.1 allocs/op, 12.2 tasks/op, 56386 insns/op, 0 errors)
median 61040.94 tps ( 71.1 allocs/op, 12.2 tasks/op, 56381 insns/op, 0 errors)
median 61118.40 tps ( 71.1 allocs/op, 12.2 tasks/op, 56379 insns/op, 0 errors)
AFTER
-----
median 61656.12 tps ( 71.1 allocs/op, 12.2 tasks/op, 56486 insns/op, 0 errors)
median 61483.29 tps ( 71.1 allocs/op, 12.2 tasks/op, 56495 insns/op, 0 errors)
median 61638.05 tps ( 71.1 allocs/op, 12.2 tasks/op, 56494 insns/op, 0 errors)
median 61726.09 tps ( 71.1 allocs/op, 12.2 tasks/op, 56509 insns/op, 0 errors)
median 61537.55 tps ( 71.1 allocs/op, 12.2 tasks/op, 56491 insns/op, 0 errors)
Closes#12139
* github.com:scylladb/scylladb:
test: mutation_test: Test multiple compaction groups
test: database_test: Test multiple compaction groups
test: database_test: Adapt it to compaction groups
db: Add config for setting static number of compaction groups
replica: Introduce static compaction groups
test: sstable_test: Stop referencing single compaction group
api: compaction_manager: Stop a compaction type for all groups
api: Estimate pending tasks on all compaction groups
api: storage_service: Run maintenance compactions on all compaction groups
replica: table: Adapt assertion to compaction groups
replica: database: stop and disable compaction on behalf of all groups
replica: Introduce table::parallel_foreach_table_state()
replica: disable auto compaction on behalf of all groups
replica: table: Rework compaction triggers for compaction groups
replica: Adapt table::get_sstables_including_compacted_undeleted() to compaction groups
replica: Adapt table::rebuild_statistics() to compaction groups
replica: table: Perform major compaction on behalf of all groups
replica: table: Perform off-strategy compaction on behalf of all groups
replica: table: Perform cleanup compaction on behalf of all groups
replica: Extend table::discard_sstables() to operate on all compaction groups
replica: table: Create compound sstable set for all groups
replica: table: Set compaction strategy on behalf of all groups
replica: table: Return min memtable timestamp across all groups
replica: Adapt table::stop() to compaction groups
replica: Adapt table::clear() to compaction groups
replica: Adapt table::can_flush() to compaction groups
replica: Adapt table::flush() to compaction groups
replica: Introduce parallel_foreach_compaction_group()
replica: Adapt table::set_schema() to compaction groups
replica: Add memtables from all compaction groups for reads
replica: Add memtable_count() method to compaction_group
replica: table: Reserve reader list capacity through a callback
replica: Extract addition of memtables to reader list into a new function
replica: Adapt table::occupancy() to compaction groups
replica: Adapt table::active_memtable() to compaction groups
replica: Introduce table::compaction_groups()
replica: Preparation for multiple compaction groups
scylla-gdb: Fix backward compatibility of scylla_memtables command
In the past we had issue #7933 where very long strings of consecutive
tombstones caused Alternator's paging to take an unbounded amount of
time and/or memory for a single page. This issue was fixed (by commit
e9cbc9ee85) but the two tests we had
reproducing that issue were left with the "xfail" mark.
They were also marked "veryslow" - each taking about 100 seconds - so
they didn't run by default so nobody noticed they started to pass.
In this patch I make these tests much faster (taking less than a second
together), confirm that they pass - and remove the "xfail" mark and
improve their descriptions.
The trick to making these tests faster is to not create a million
tombstones like we used to: We now know that after string of just 10,000
tombstones ('query_tombstone_page_limit') the page should end, so
we can check specifically this number. The story is more complicated for
partition tombstones, but there too it should be a multiple of
query_tombstone_page_limit. To make the tests even faster, we change
run.py to lower the query_tombstone_page_limit from the default 10,000
to 1000. The tests work correctly even without this change, but they are
ten times faster with it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#12350
Extends mutation_test to run the tests with more than one
compaction group, in addition to a single one (default).
Piggyback on existing tests. Avoids duplication.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Extends database_test to run the tests with more than one
compaction group, in addition to a single one (default).
Piggyback on existing tests. Avoids duplication.
Caught a bug when snapshotting, in implementation of
table::can_flush(), showing its usefulness.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
active_memtable() was fine to a single group, but with multiple groups,
there will be one active memtable per group. Let's change the
interface to reflect that.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This is to define the API sstable needs from underlying storage. When implementing object-storage backend it will need to implement those. The API looks like
future<> snapshot(const sstable& sst, sstring dir, absolute_path abs) const;
future<> quarantine(const sstable& sst, delayed_commit_changes* delay);
future<> move(const sstable& sst, sstring new_dir, generation_type generation, delayed_commit_changes* delay);
void open(sstable& sst, const io_priority_class& pc); // runs in async context
future<> wipe(const sstable& sst) noexcept;
future<file> open_component(const sstable& sst, component_type type, open_flags flags, file_open_options options, bool check_integrity);
It doesn't have "list" or alike, because it's not a method of an individual sstable, but rather the one from sstables_manager. It will come as separate PR.
Closes#12217
* github.com:scylladb/scylladb:
sstable, storage: Mark dir/temp_dir private
sstable: Remove get_dir() (well, almost)
sstable: Add quarantine() method to storage
sstable: Use absolute/relative path marking for snapshot()
sstable: Remove temp_... stuff from sstable
sstable: Move open_component() on storage
sstable: Mark rename_new_sstable_component_file() const
sstable: Print filename(type) on open-component error
sstable: Reorganize new_sstable_component_file()
sstable: Mark filename() private
sstable: Introduce index_filename()
tests: Disclosure private filename() calls
sstable: Move wipe_storage() on storage
sstable: Remove temp dir in wipe_storage()
sstable: Move unlink parts into wipe_storage
sstable: Remove get_temp_dir()
sstable: Move write_toc() to storage
sstable: Shuffle open_sstable()
sstable: Move touch_temp_dir() to storage
sstable: Move move() to storage
sstable: Move create_links() to storage
sstable: Move seal_sstable() to storage
sstable: Tossing internals of seal_sstable()
sstable: Move remove_temp_dir() to storage
sstable: Move create_links_common() to storage
sstable: Move check_create_links_replay() to storage
sstable: Remove one of create_links() overloads
sstable: Remove create_links_and_mark_for_removal()
sstable: Indentation fix after prevuous patch
sstable: Coroutinize create_links_common()
sstable: Rename create_links_common()'s "dir" argument
sstable: Make mark_for_removal bool_class
sstable, table: Add sstable::snapshot() and use in table::take_snapshot
sstable: Move _dir and _temp_dir on filesystem_storage
sstable: Use sync_directory() method
test, sstable: Use component_basename in test
sstables: Move read_{digest|checksum} on sstable
Recently, the pytest script shipped by Fedora started invoking python
with the `-s` flag, which disables python considering user site
packages. This caused problems for our tests which install the cassandra
driver in the user site packages. This was worked around in e5e7780f32
by providing our own pytest interposer launcher script which does not
pass the above mentioned flag to python. Said patch fixed test.py but
not the run.py in cql-pytest. So if the cql-pytest suite is ran via
test.py it works fine, but if it is invoked via the run script, it fails
because it cannot find the cassandra driver. This patch patches run.py
to use our own pytest launcher script, so the suite can be run via the
run script as well.
Since run.py is shared with the alternator pytest suite, this patch also
fixes said test suite too.
Closes#12253
This PR fixes several bugs related to handling of non-full
clustering keys.
One is in trim_clustering_row_ranges_to(), which is broken for non-full keys in reverse
mode. It will trim the range to position_in_partition_view::after_key(full_key) instead of
position_in_partition_view::before_key(key), hence it will include the
key in the resulting range rather than exclude it.
Fixes#12180
after_key() was creating a position which is after all keys prefixed
by a non-full key, rather than a position which is right after that
key.
This will issue will be caught by cql_query_test::test_compact_storage
in debug mode when mutation_partition_v2 merging starts inserting
sentinels at position after_key() on preemption.
It probably already causes problems for such keys as after_key() is used
in various parts in the read path.
Refs #1446Closes#12234
* github.com:scylladb/scylladb:
position_in_partition: Make after_key() work with non-full keys
position_in_partition: Introduce before_key(position_in_partition_view)
db: Fix trim_clustering_row_ranges_to() for non-full keys and reverse order
types: Fix comparison of frozen sets with empty values
There's a bunch of helpers around XFS-specific temp-dir sitting in
publie sstable part. Drop it altogether, no code needs it for real.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>