Cannot be fully converted to flat_mutation_reader_v2 yet, as the
selector is built on combined_reader interface which is still not
converted. So only updated wherever possible.
Subsequent work will update sstable_set readers, which uses the
selector, to flat_mutation_reader_v2.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
A short new test to verify that in the TagResource operation, the
Tags parameter - specifying which tags to set - is required.
The test passes on both AWS and Alternator - they both produce a
ValidationException in this case (the specific human-readable error
message is different, though, so we don't check it).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211206140541.1157574-1-nyh@scylladb.com>
The stall report uses the millisecond unit, but actually reports
nanoseconds.
Switch to microseconds (milliseconds are a bit too coarse) and
use the safer "duration / 1us" style rather than "duration::count()"
that leads to unit confusion.
Fixes#9733.
Closes#9734
"
This series adds an optional "quarantine" subdirectory
to the table data directory that may contain sstables
that are fenced-off from regular compaction.
The motivation, as discussed in
https://github.com/scylladb/scylla/issues/7658
and
https://github.com/scylladb/scylla/issues/9537#issuecomment-953635973,
is to prevent regular compaction from spreading sstable corruption
further to other sstables, and allow investigating the invalid sstables
using the scylla-sstable tool, or scrubbing them in segregate mode.
When sstables are found to be invalid in scrub::mode::validate
they are moved to the quarantine directory, where they will still
be available for reading, but will not be considered for regular
or major compaction.
By default scrub, in all other modes, will consider all sstables,
including the quaratined ones. To make it more efficient, a
new option was added and exposed via the storage_service/keyspace_scrub
api -
quarantine_mode. When set to quarantine_mode::only, scrub will read only
the quarantined sstables, so that the user can start with validate mode
to
detect invalid sstables and quarantine them, then scrub/segregate only
the quarantined sstables.
Test: unit(dev), database_test(debug)
DTest:
nodetool_additional_test.py:TestNodetool.{scrub_ks_sstable_with_invalid_fragment_test,scrub_segregate_sstable_with_invalid_fragment_test,scrub_segregate_ks_sstable_with_invalid_fragment_test,scrub_sstable_with_invalid_fragment_test,scrub_with_multi_nodes_expect_data_rebuild_test,scrub_with_one_node_expect_data_loss_test,validate_ks_sstable_with_invalid_fragment_test,validate_with_one_node_expect_data_loss_test,validate_sstable_with_invalid_fragment_test}
"
* tag 'quarantine-invalid-sstables-v6' of github.com:bhalevy/scylla:
test: sstable_compaction_test: add sstable_scrub_quarantine_mode_test
compaction: scrub: add quarantine_mode option
compaction_manager: perform_sstable_scrub: get the whole compaction_type_options::scrub
compaction: scrub_sstables_validate_mode: quarantine invalid sstables
test: database_test: add snapshot_with_quarantine_works
test: database_test: add populate_from_quarantine_works
distributed_loader: populate_keyspace: populate also from the quarantine dir
distributed_loader: populate_column_family: add must_exist param
sstables: add is_quarantined
sstables: add is_eligible_for_compaction
sstables: define symbolic names for table subdirectories
"
Thrift is one of the users of global storage proxy instance.
This set remove all such calls from the thrift/ code.
tests: unit(dev)
"
* 'br-thrift-reference-storage-proxy' of https://github.com/xemul/scylla:
thrift: Use local proxy reference in do_paged_slice
thrift: Use local proxy reference in handler methods
thrift: Keep sharded proxy reference on thrift_handler
For each quarantine mode:
Validate sstables to quarantine one of them
and then scrub with the given quarantine mode
and verify the output whwther the quarantined
sstable was scrubbed or not.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When invalid sstables are detected, move them
to the quarantine subdirectory so they won't be
selected for regular compaction.
Refs #7658
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Test that we load quarantined sstables by
creating a dataset, moving a sstable to the quarantine dir,
and then reload the table and verify the dataset.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
sstables in the quarantine subdirectory are part of the table.
They're just not eligible for non-scrub compaction.
Call populate_column_family also for the quarantine subdirectory,
allowing it to not exist.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Check if the directory to be loaded exists.
Currently must_exist=true in all cases,
but it may be set to false when loading directories
that may not exist.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Quarantined sstables will reside in a "quarantine" subdirectory
and are also not eligible for regular compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently compaction_manager tracks sstables
based on !requires_view_building() and similarly,
table::in_strategy_sstables picks up only sstables
that are not in staging.
is_eligible_for_compaction() generalizes this condition
in preparation for adding a quarantine subdirectory for
invalid sstables that should not be compacted as well.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Define the "staging", "upload", and "snapshots" subdirectory
names as named const expressions in the sstables namespace
rather than relying on their string representation,
that could lead to typo mistakes.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Initializing a vector from an initializer_list defeats move construction
(since initializer_list is const). Moreover it is suspected to cause a
crash due to a miscompile. In any case, this patch fixes the crash.
Fixes#9735.
Closes#9736
Originally mentioned in: https://github.com/scylladb/scylla/pull/9481#issuecomment-982698208
Currently we call `_restrictions->need_filtering()` each time a prepared select is executed.
This is not super efficient - `need_filtering` has to scan through the whole AST and analyze it.
This PR calculates value of `_restrictions->need_filtering()` only once and then uses this precomputed value.
I ran `perf_simple_query` on my laptop throttled to 1GHz and it looks like this saves ~1000 instructions/op.
```bash
median 38459.09 tps ( 75.1 allocs/op, 12.1 tasks/op, 46099 insns/op)
median 38743.79 tps ( 75.1 allocs/op, 12.1 tasks/op, 46091 insns/op)
median 38489.52 tps ( 75.1 allocs/op, 12.1 tasks/op, 46097 insns/op)
median 38492.10 tps ( 75.1 allocs/op, 12.1 tasks/op, 46102 insns/op)
median 38478.65 tps ( 75.1 allocs/op, 12.1 tasks/op, 46098 insns/op)
median 38930.07 tps ( 75.1 allocs/op, 12.1 tasks/op, 44922 insns/op)
median 38777.52 tps ( 75.1 allocs/op, 12.1 tasks/op, 44904 insns/op)
median 39325.41 tps ( 75.1 allocs/op, 12.1 tasks/op, 44925 insns/op)
median 38640.51 tps ( 75.1 allocs/op, 12.1 tasks/op, 44907 insns/op)
median 39075.89 tps ( 75.1 allocs/op, 12.1 tasks/op, 44920 insns/op)
./build/release/test/perf/perf_simple_query --cpuset 1 -m 1G --random-seed 0 --task-quota-ms 10 --operations-per-shard 1000000
```
Closes#9727
* github.com:scylladb/scylla:
select_statement: Use precomputed value of _restrictions->need_filtering()
select_statement: Store whether restrictions need filtering in a variable
We had a logger called "query result log", with spaces, which made it
impossible to enable it with the REST API due to missing percent
decoding support in our HTTP server (see #9614).
Although that HTTP server bug should be fixed as well (in Seastar -
see scylladb/seastar#725), there is no good reason to have a logger
name with a space in it. This is the only logger whose name has
a space: We have 77 other loggers using underscores (_) in their name,
and only 9 using hyphens (-). So in this patch we choose the more
popular alternative - an underscore.
Fixes#9614.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211205093732.1092553-1-nyh@scylladb.com>
Instead of calculating _restrictions->need_filtering() each time,
we can now use the value that has been already calculated.
This used to happen during query execution,
so we get an increase in performance.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Instead of calculating _restrictions->need_filtering()
we can calculate it only once and then use this computed variable.
It turns out that _restrictions->need_filtering() is called
during execution of prepared statements and it has to scan through the whole AST,
so doing it only once gives us a performance gain.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
This strategy method was introduced unnecessarily. We assume it was
going to be needed, but turns out it was never needed, not even
for ICS. Also it's built on a wrong assumption as an output
sstable run being generated can never be compacted in parallel
as the non-overlapping requirement can be easily broken.
LCS for example can allow parallel compaction on different runs
(levels) but correctness cannto be guaranteed with same runs
are compacted in parallel.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It was introduced by commit 5206a97915 because fully expired sstable
wouldn't be registed and therefore could be never removed from backlog
tracker. This is no longer possible as table is now responsible for
removing all input sstables. So let's kill on_skipped_expired_sstable()
as it's now only boilerplate we don't need.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
... and count_if()' from Avi Kivity
The expression code provides some utilities to examine and manipulate
expressions at prepare time. These are not (or should not be) in the fast
path and so should be optimized for compile time and code footprint
rather than run time.
This series does so by detemplating and deinlining find_in_expression()
and count_if().
Closes#9712
* github.com:scylladb/scylla:
cql3: expr: adjust indentation in recurse_until()
cql3: expr: detemplate count_if()
cql3: expr: detemplate count_if()
cql3: expr: rewrite count_if() in terms of recurse_until()
cql3: expr: deinline recurse_until()
cql3: expr: detemplate find_in_expression
Scylla doesn't support unset values inside UDT.
The old code used to convert `unset` to `null`, which seems incorrect.
There is an extra space in the error message to retain compatability with Cassandra.
Fixes: #9671Closes#9724
* github.com:scylladb/scylla:
cql-pytest: Enable test for UDT with unset values
cql3: Don't allow unset values inside UDT
The test testUDTWithUnsetValues was marked as xfail,
but now the issue has been fixed and we can enable it.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Scylla doesn't support unset values inside UDT.
The old code used to convert unset to null, which seems incorrect.
There is an extra space in the error message to retain compatability with Cassandra.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
When reshaping TWCS table in relaxed mode, which is the case for
offstrategy and boot, disjoint tolerance is too strict, which can
lead those processes to do more work than needed.
Let's increase the tolerance to max threshold, which will limit the
amount of sstables opened in compaction to a reasonable amount.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211130132538.56285-1-raphaelsc@scylladb.com>
unfortunately, correctness of std_unique_ptr and similar depends on
their implementation in libstdc++. let's support unique ptr on
newer systems while maintaining backward compatibility.
./test.py --mode=release scylla-gdb now passes to me, also verified
`scylla compaction-tasks` produces correct info.
Fixes#9677.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211202173534.359672-1-raphaelsc@scylladb.com>
"
couple of preparatory changes for coroutinization of manager
"
* 'some_compaction_manager_cleanups_v5' of github.com:raphaelsc/scylla:
compaction_manager: move check_for_cleanup into perform_cleanup()
compaction_manager: replace get_total_size by one liner
compaction_manager: make consistent usage of type and name table
compaction_manager: simplify rewrite_sstables()
compaction_manager: restore indentation
Currently storage service acts as a glue between database schema value
and the migration manager "passive_announce" call. This interposing is
not required, migration manager can do all the management itself, and
the linkage can be done in main.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Because today's migration_manager::stop is called drain-time.
Keep the .stop for next patch, but since it's called when the
whole migration_manager stops, guard it against re-entrances.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Both calls are now private. Also the non-maybe one can become void
and handle pull exceptions by itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Move the calls from respective storage service notification callbacks.
One non-move change is that token metadata that was available on the
storage service should be taken from storage proxy, but this change
is aligned with future changes -- migration manager depends on proxy
and will get a local proxy reference some day.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>