When the mutation compactor has all the rows it needs for a page, it
saves the decision to stop in a member flag: _stop.
For single partition queries, the mutation compactor is kept alive
across pages and so it has a method, start_new_page() to reset its state
for the next page. This method didn't clear the _stop flag. This meant
that the value set at the end of the previous page could cause the new page
and subsequently the entire query to be stopped prematurely.
This can happen if the new page starts with a row that is covered by a
higher level tombstone and is completely empty after compaction.
Reset the _stop flag in start_new_page() to prevent this.
This commit also adds a unit test which reproduces the bug.
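The failure mode can be illustrated with a toy model. This is only a sketch: `PageCompactor`, `read_page()` and the `reset_stop` switch are hypothetical names made up for illustration, not the actual mutation compactor code. A stale stop flag makes the second page end on its first row, which happens to be dead, so the page comes back empty.

```python
class PageCompactor:
    """Toy stand-in for a compactor that is reused across pages (hypothetical)."""

    def __init__(self, page_limit, reset_stop=True):
        self.page_limit = page_limit
        self.reset_stop = reset_stop  # False reproduces the stale-flag bug
        self.rows = []
        self.stop = False

    def start_new_page(self):
        self.rows = []
        if self.reset_stop:
            self.stop = False  # the fix: forget the previous page's decision

    def consume(self, row, is_live):
        """Return True when the reader should stop feeding rows."""
        if is_live:
            self.rows.append(row)
            if len(self.rows) >= self.page_limit:
                self.stop = True
        # A dead row never sets the flag, so a stale value leaks through here.
        return self.stop


def read_page(compactor, source):
    """Read one page from a shared iterator of (row, is_live) pairs."""
    compactor.start_new_page()
    for row, is_live in source:
        if compactor.consume(row, is_live):
            break
    return list(compactor.rows)


if __name__ == "__main__":
    rows = [("a", True), ("b", True), ("dead", False), ("c", True)]
    for reset in (True, False):
        compactor, it = PageCompactor(2, reset_stop=reset), iter(rows)
        print(reset, read_page(compactor, it), read_page(compactor, it))
```

In this model an empty second page is what a caller would interpret as the end of the query.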
Fixes: #12361
Closes #12384
On some docker instance configurations, hostname resolution does not
work, so our script will fail on startup because we use hostname -i to
construct cqlshrc.
To prevent the error, we can use --rpc-address or --listen-address
for the address since it should be the same.
Fixes #12011
Closes #12115
In case a table is dropped, we should ignore it in the repair_updater,
since we cannot update the off-strategy trigger for a dropped table.
Refs #12373
Closes #12388
Every 1 hour, compaction manager will submit all registered table_state
for a regular compaction attempt, all without yielding.
This can potentially cause a reactor stall if there are 1000s of table
states, as compaction strategy heuristics will run on behalf of each,
and processing all buckets and picking the best one is not cheap.
This problem can be magnified with compaction groups, as each group
is represented by a table state.
This might appear in the dashboard as periodic stalls, every 1h, misleading
the investigator into believing that the problem is caused by a
chronological job.
This is fixed by piggybacking on compaction reevaluation loop which
can yield between each submission attempt if needed.
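The idea of yielding between submissions can be sketched with asyncio standing in for the reactor. This is an illustration only: `submit_all()`, `demo()` and the heartbeat task are hypothetical, and the sketch yields unconditionally after every submission, whereas the text above describes yielding only if needed.

```python
import asyncio


async def submit_all(table_states, submit):
    """Submit every table state, yielding to the event loop in between."""
    for ts in table_states:
        submit(ts)              # potentially expensive per-table work
        await asyncio.sleep(0)  # cooperative yield point


async def demo():
    events = []

    async def heartbeat():
        # Stands in for other work that must not be starved.
        for _ in range(3):
            events.append("tick")
            await asyncio.sleep(0)

    await asyncio.gather(
        submit_all(["t1", "t2", "t3"], events.append),
        heartbeat(),
    )
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Without the yield point, all submissions would complete before the heartbeat task got a chance to run.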
Fixes #12390
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #12391
They are currently missing from the printout
when a table is created, but they are essential
to understanding the mode with which tombstones are to
be garbage-collected in the table. gcGraceSeconds alone
is no longer enough since the introduction of
tombstone_gc_option in a8ad385ecd.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #12381
In issue #10767, concerns were raised that the CLUSTERING ORDER BY
clause is handled incorrectly in a CREATE MATERIALIZED VIEW definition.
The tests in this patch try to explore the different ways in which
CLUSTERING ORDER BY can be used in CREATE MATERIALIZED VIEW and allow
us to compare Scylla's behavior to Cassandra, and to common sense.
The tests discover that the CLUSTERING ORDER BY feature in materialized
views generally works as expected, but there are *three* differences
between Scylla and Cassandra in this feature. We consider two differences
to be bugs (and hence the test is marked xfail) and one a Scylla extension:
1. When a base table has a reverse-order clustering column and this
clustering column is used in the materialized view, in Cassandra
the view's clustering order inherits the reversed order. In Scylla,
the view's clustering order reverts to the default order.
Arguably, both behaviors can be justified, but usually when in doubt
we should implement Cassandra's behavior - not pick a different
behavior, even if the different behavior is also reasonable. So
this test (test_mv_inherit_clustering_order()) is marked "xfail",
and a new issue was created about this difference: #12308.
If we want to fix this behavior to match Cassandra's we should also
consider backward compatibility - what happens if we change this
behavior in Scylla now, after we had the opposite behavior in
previous releases? We may choose to enshrine Scylla's Cassandra-
incompatible behavior here - and document this difference.
2. The CLUSTERING ORDER BY should, as its name suggests, only list
clustering columns. In Scylla, specifying other things, like regular
columns, partition-key columns, or non-existent columns, is silently
ignored, whereas it should result in an Invalid Request error (as it
does in Cassandra). So test_mv_override_clustering_order_error()
is marked "xfail".
This is the difference already discovered in #10767.
3. When a materialized view has several clustering columns, Cassandra
requires that a CLUSTERING ORDER BY clause, if present, must specify
the order of *all* clustering columns. Scylla, in contrast,
allows the user to override the order of only *some* of these columns -
and the rest get the default order. I consider this to be a
legitimate Scylla extension, and not a compatibility bug, so marked
the test with "scylla_only", and no issue was opened about it.
Refs #10767
Refs #12308
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #12307
This patch adds a scylla_inject_error(), a context manager which tests
can use to temporarily enable some error injection while some test
code is running. It can be used to write tests that artificially
inject certain errors instead of trying to reach the elaborate (and
often requiring precise timing or high amounts of data) situation where
they occur naturally.
The error-injection API is Scylla-specific (it uses the Scylla REST API)
and does not work on "release"-mode builds (all other modes are supported),
so when Cassandra or release-mode build are being tested, the test which
uses scylla_inject_error() gets skipped.
Example usage:
```python
from rest_api import scylla_inject_error
with scylla_inject_error(cql, "injection_name", one_shot=True):
    # do something here
    ...
```
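For readers unfamiliar with the pattern, the shape of such a context manager can be sketched as follows. This is a minimal sketch, not the real scylla_inject_error(): the `inject_error()` helper and the in-memory `active` set are hypothetical stand-ins for enabling and disabling the injection through the REST API. The point is that the injection is disabled again even when the test body raises.

```python
from contextlib import contextmanager


@contextmanager
def inject_error(active, name):
    """Enable a named injection for the duration of a with-block.

    `active` is a plain set used as a stand-in for server-side state.
    """
    active.add(name)          # stand-in for "enable the injection"
    try:
        yield
    finally:
        active.discard(name)  # always disable, even if the body raised


if __name__ == "__main__":
    active = set()
    with inject_error(active, "injection_name"):
        print("inside:", sorted(active))
    print("after:", sorted(active))
```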
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #12264
Retrieves the configuration item with the given name and prints its
value as well as its metadata.
Example:
(gdb) scylla get-config-value compaction_static_shares
value: 100, type: "float", source: SettingsFile, status: Used, live: MustRestart
Closes #12362
* github.com:scylladb/scylladb:
scylla-gdb.py: add scylla get-config-value gdb command
scylla-gdb.py: extract $downcast_vptr logic to standalone method
test: scylla-gdb/run: improve diagnostics for failed tests
These options have been nonsense since 2017.
--pie and --so are ignored, --static disables (sic!) static linking of
libraries.
Remove them.
Closes #12366
Due to an oversight, the local index cache isn't evicted gently
when _upper_bound exists. This is a source of reactor stalls.
Fix that.
Fixes #12271
Closes #12364
Currently the scylla tools (`scylla-types` and `scylla-sstable`) have documentation in two places: high level documentation can be found at `docs/operating-scylla/admin-tools/scylla-{types,sstable}.rst`, while low level, more detailed documentation is embedded in the tool itself. This is especially pronounced for `scylla-sstable`, which only has a short description of its operations online, all details being found only in the command-line help.
We want to move away from this model, such that all documentation can be found online, with the command-line help being reserved to documenting how the various switches and flags work, on top of a short description of the operation and a link to the detailed online docs.
Closes #12284
* github.com:scylladb/scylladb:
tool/scylla-sstable: move documentation online
docs: scylla-sstable.rst: add sstable content section
docs: scylla-{sstable,types}.rst: drop Syntax section
Allows static configuration of number of compaction groups per table per shard.
To bootstrap the project, config option x_log2_compaction_groups was added which controls both number of groups and partitioning within a shard.
With a value of 0 (default), it means 1 compaction group, therefore all tokens go there.
With a value of 3, it means 8 compaction groups, and 3 most-significant-bits of tokens being used to decide which group owns the token.
And so on.
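The mapping from token to group described above can be sketched as below. This is an illustration under stated assumptions: `compaction_group_for()` is a hypothetical helper, and tokens are assumed to be unsigned 64-bit integers, which may not match how the real code represents or biases tokens.

```python
TOKEN_BITS = 64  # assumption: tokens are unsigned 64-bit values


def compaction_group_for(token: int, x_log2_groups: int) -> int:
    """Pick a group from the x_log2_groups most significant bits of the token."""
    if x_log2_groups == 0:
        return 0  # a single group owns every token
    token &= (1 << TOKEN_BITS) - 1  # clamp to the assumed 64-bit range
    return token >> (TOKEN_BITS - x_log2_groups)


if __name__ == "__main__":
    # With a value of 3 there are 2**3 == 8 groups, indexed 0..7.
    for token in (0, 1 << 61, (1 << 64) - 1):
        print(hex(token), "->", compaction_group_for(token, 3))
```

Using the top bits keeps each group responsible for one contiguous token range.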
It's still missing:
- integration with repair / streaming
- integration with reshard / reshape.
perf/perf_simple_query --smp 1 --memory 1G
BEFORE
-----
median 61358.55 tps ( 71.1 allocs/op, 12.2 tasks/op, 56375 insns/op, 0 errors)
median 61322.80 tps ( 71.1 allocs/op, 12.2 tasks/op, 56391 insns/op, 0 errors)
median 61058.58 tps ( 71.1 allocs/op, 12.2 tasks/op, 56386 insns/op, 0 errors)
median 61040.94 tps ( 71.1 allocs/op, 12.2 tasks/op, 56381 insns/op, 0 errors)
median 61118.40 tps ( 71.1 allocs/op, 12.2 tasks/op, 56379 insns/op, 0 errors)
AFTER
-----
median 61656.12 tps ( 71.1 allocs/op, 12.2 tasks/op, 56486 insns/op, 0 errors)
median 61483.29 tps ( 71.1 allocs/op, 12.2 tasks/op, 56495 insns/op, 0 errors)
median 61638.05 tps ( 71.1 allocs/op, 12.2 tasks/op, 56494 insns/op, 0 errors)
median 61726.09 tps ( 71.1 allocs/op, 12.2 tasks/op, 56509 insns/op, 0 errors)
median 61537.55 tps ( 71.1 allocs/op, 12.2 tasks/op, 56491 insns/op, 0 errors)
Closes #12139
* github.com:scylladb/scylladb:
test: mutation_test: Test multiple compaction groups
test: database_test: Test multiple compaction groups
test: database_test: Adapt it to compaction groups
db: Add config for setting static number of compaction groups
replica: Introduce static compaction groups
test: sstable_test: Stop referencing single compaction group
api: compaction_manager: Stop a compaction type for all groups
api: Estimate pending tasks on all compaction groups
api: storage_service: Run maintenance compactions on all compaction groups
replica: table: Adapt assertion to compaction groups
replica: database: stop and disable compaction on behalf of all groups
replica: Introduce table::parallel_foreach_table_state()
replica: disable auto compaction on behalf of all groups
replica: table: Rework compaction triggers for compaction groups
replica: Adapt table::get_sstables_including_compacted_undeleted() to compaction groups
replica: Adapt table::rebuild_statistics() to compaction groups
replica: table: Perform major compaction on behalf of all groups
replica: table: Perform off-strategy compaction on behalf of all groups
replica: table: Perform cleanup compaction on behalf of all groups
replica: Extend table::discard_sstables() to operate on all compaction groups
replica: table: Create compound sstable set for all groups
replica: table: Set compaction strategy on behalf of all groups
replica: table: Return min memtable timestamp across all groups
replica: Adapt table::stop() to compaction groups
replica: Adapt table::clear() to compaction groups
replica: Adapt table::can_flush() to compaction groups
replica: Adapt table::flush() to compaction groups
replica: Introduce parallel_foreach_compaction_group()
replica: Adapt table::set_schema() to compaction groups
replica: Add memtables from all compaction groups for reads
replica: Add memtable_count() method to compaction_group
replica: table: Reserve reader list capacity through a callback
replica: Extract addition of memtables to reader list into a new function
replica: Adapt table::occupancy() to compaction groups
replica: Adapt table::active_memtable() to compaction groups
replica: Introduce table::compaction_groups()
replica: Preparation for multiple compaction groups
scylla-gdb: Fix backward compatibility of scylla_memtables command
Said mechanism broke tools and tests to some extent: the read it executes at sstable load time means that if the sstable is broken enough to fail this read, it will fail to load, preventing diagnostic tools from loading and examining it and preventing tests from producing broken sstables for testing purposes.
Closes #12359
* github.com:scylladb/scylladb:
sstables: allow bypassing first/last position metadata loading
sstables: sstable::{load,open_data}(): fix indentation
sstables: coroutinize sstable::open_data()
sstables: sstable::open_data(): use clear_gently() to clear token ranges
sstables: coroutinize sstable::load()
Type of the id of node operations is changed from utils::UUID
to node_ops_id. This way the id of node operations would be easily
distinguished from the ids of other entities.
Closes #11673
* seastar 3a5db04197...3db15b5681 (27):
> build: get the full path of c-ares
> build: unbreak pkgconfig output
> http: Add 206 Partial Content response code
> http: Carry integer content_length on reply
> tls_test: drop duplicated includes
> tls_test: remove duplicated test case
> reactor: define __NR_pidfd_open if not defined
> sockets: Wait on socket peer closing the connection
> tcp: Close connection when getting RST from server
> Merge 'Enhance rpc tester with delays, timeouts and verbosity' from Pavel Emelyanov
> Merge 'build: use pkg_search_module(.. IMPORTED_TARGET ..) ' from Kefu Chai
> build: define GnuTLS_{LIBRARIES, INCLUDE_DIRS} only if GnuTLS is found
> build: use pkg_search_module(.. IMPORTED_TARGET ..)
> addr2line: extend asan regex
> abort_source: move-assign operator: call base class unlink
> coroutine: correct syntax error in doxygen comment
> demo: Extend http connection demo with https
> test: temporarily disable warning for tests triggering warnings
> tests/unit/coroutine: Include <ranges>
> sstring: Document why sstring exists at all
> test: log error when read/write to pipe fails
> test: use executables in /bin
> tests: spawn_test: use BOOST_CHECK_EQUAL() for checking equality of temporary_buffer
> docker: bump up to clang {14,15} and gcc {11,12}
> shared_ptr: ignore false alarm from GCC-12
> build: check for fix of CWG2631
> circleci: use versioned container image
Closes #12355
In the past we had issue #7933 where very long strings of consecutive
tombstones caused Alternator's paging to take an unbounded amount of
time and/or memory for a single page. This issue was fixed (by commit
e9cbc9ee85) but the two tests we had
reproducing that issue were left with the "xfail" mark.
They were also marked "veryslow" - each taking about 100 seconds - so
they didn't run by default, and nobody noticed they had started to pass.
In this patch I make these tests much faster (taking less than a second
together), confirm that they pass - and remove the "xfail" mark and
improve their descriptions.
The trick to making these tests faster is to not create a million
tombstones like we used to: We now know that after a string of just 10,000
tombstones ('query_tombstone_page_limit') the page should end, so
we can check specifically this number. The story is more complicated for
partition tombstones, but there too it should be a multiple of
query_tombstone_page_limit. To make the tests even faster, we change
run.py to lower the query_tombstone_page_limit from the default 10,000
to 1000. The tests work correctly even without this change, but they are
ten times faster with it.
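Why a tombstone count just above the limit is enough to exercise the paging behavior can be seen in a toy model. This is only a sketch: `read_page()` is a hypothetical function, tombstones are modeled as `None` entries, and the real accounting (in particular for partition tombstones) is more involved.

```python
def read_page(items, start, row_limit, tombstone_limit):
    """Return (live_rows, next_position) for one page.

    The page ends when it holds row_limit live rows or has skipped
    tombstone_limit tombstones, whichever comes first.
    """
    rows, tombstones, pos = [], 0, start
    while pos < len(items) and len(rows) < row_limit and tombstones < tombstone_limit:
        item = items[pos]
        pos += 1
        if item is None:
            tombstones += 1  # dead entry: counts against the tombstone limit
        else:
            rows.append(item)
    return rows, pos


if __name__ == "__main__":
    # 25 tombstones followed by one live row, with a tombstone limit of 10.
    items = [None] * 25 + ["live"]
    pos = 0
    while pos < len(items):
        rows, pos = read_page(items, pos, row_limit=100, tombstone_limit=10)
        print(rows, pos)
```

In this model the work per page is bounded by the tombstone limit no matter how long the run of tombstones is, so a test only needs slightly more tombstones than the limit.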
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #12350
* Update Nixpkgs base
* Clarify some comments
* Get rid of custom-packaged cxxbridge (it's now present in Nixpkgs as
cxx-rs)
* Add missing libraries (libdeflate, libxcrypt)
* Fix expected hash of the gdb patch
* Fix a couple of small build problems
Fixes #12259
Closes #12346
* github.com:scylladb/scylladb:
build: fix Nix devenv
cql3: mark several private fields as maybe_unused
configure.py: link with more abseil libs
* Update Nixpkgs base
* Clarify some comments
* Get rid of custom-packaged cxxbridge (it's now present in Nixpkgs as
cxx-rs)
* Add missing libraries (libdeflate, libxcrypt)
* Fix expected hash of the gdb patch
* Bump Python driver to 3.25.20-scylla
Fixes #12259
Because they are indeed unused -- they are initialized, passed down
through some layers, but not actually used. No idea why only Clang 12
in debug mode in Nix devenv complains about it, though.
Specifically libabsl_strings{,_internal}.a.
This fixes failure to link tests in the Nix devenv; since presumably
all is good in other setups, it must be something weird having to do
with inlining?
The extra linked libraries shouldn't hurt in any case.
Extends mutation_test to run the tests with more than one
compaction group, in addition to a single one (default).
Piggyback on existing tests. Avoids duplication.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Extends database_test to run the tests with more than one
compaction group, in addition to a single one (default).
Piggyback on existing tests. Avoids duplication.
Caught a bug when snapshotting, in implementation of
table::can_flush(), showing its usefulness.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This new option allows the user to control the number of compaction groups
per table per shard. It's 0 by default, which implies a single compaction
group, as is the case today.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This is the initial support for multiple groups.
_x_log2_compaction_groups controls the number of compaction groups
and the partitioning strategy within a single table.
The value in _x_log2_compaction_groups refers to log base 2 of the
actual number of groups.
0 means 1 compaction group.
1 means 2 groups and the 1 most significant bit of the token being
used to pick the target group.
The group partitioner should be later abstracted for making tablet
integration easier in the future.
_x_log2_compaction_groups is still a constant but a config option
will come next.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Estimates # of compaction jobs to be performed on a table.
Adaptation is done by adding estimation from all groups.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
With compaction group model, truncate_table_on_all_shards() needs
to stop and disable compaction for all groups.
replica::table::as_table_state() will be removed once no user
remains, as each table may map to multiple groups.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This will replace table::as_table_state(). The latter will be
killed once its usage drops to zero.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Allow table-wide compaction trigger, as well as fine-grained trigger
like after flushing a memtable on behalf of a single group.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
discard_sstables() runs in the context of truncate, which is a table-wide
operation today, and will remain so with multiple static groups.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>