do_with_some_data runs a function in a seastar thread.
It needs to get() the future func returns rather
than propagating it.
This solves a secondary failure due to abandoned future
when the test case fails, as seen in
https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/4254/artifact/testlog/x86_64_debug/database_test.snapshot_with_quarantine_works.381.log
```
test/boost/database_test.cc(903): fatal error: in "snapshot_with_quarantine_works": critical check expected.empty() has failed
WARN 2021-12-08 00:35:16,300 [shard 0] seastar - Exceptional future ignored: boost::execution_aborted, backtrace: 0x10935e50 0x16ff2d8d 0x16ff2a4d 0x16ff5033 0x16ff5ec2 0x162d4ce9 0x10a2bdb5 0x10a2bd24 0x10a54ca4 0x10a27cf3 0x10a22151 0x10a67c9d 0x10a67a78 0x163ac37e 0x163b29e9 0x163b7690 0x163b51c1 0x17c212df 0x17c1f097 0x17bf8b4c 0x17bf83f2 0x17bf82a2 0x17bf7d52 0x10f8bf5a 0x166db84b /lib64/libpthread.so.0+0x9298 /lib64/libc.so.6+0x100352
...
*** 1 abandoned failed future(s) detected
Failing the test because fail was requested by --fail-on-abandoned-failed-futures
```
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211209174512.851945-1-bhalevy@scylladb.com>
Mutations are not guaranteed to come in the order of their timestamps.
If there is an expired tombstone in the sstable and a repair inserts old
data into memtable, the compaction would not consider memtable data and
purge the tombstone leading to data resurrection. The solution is to
disallow purging tombstones newer than min memtable timestamp.
Since sstable reader was already converted to flat_mutation_reader_v2, compaction layer can naturally be converted too.
There are many dependencies that use v1. Those strictly needed like readers in sstable set, which links compaction to sstable reader, were converted to v2 in this series. For those that aren't essential we're relying on V1<-->V2 adaptors, and conversion work on them will be postponed. Those being postponed are: scrub specialized reader (needs a validator for mutation_fragment_v2), interposer consumer, combined reader which is used by incremental selector. incremental selector itself was converted to v2.
tests: unit(debug).
Closes#9725
* github.com:scylladb/scylla:
compaction: update compaction::make_sstable_reader() to flat_mutation_reader_v2
sstable_set: update make_crawling_reader() to flat_mutation_reader_v2
sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2
sstable_set: update make_local_shard_sstable_reader() to flat_mutation_reader_v2
sstable_set: update incremental_reader_selector to flat_mutation_reader_v2
"
Currently stateful (readers being saved and resumed on page boundaries)
multi-range scans are broken in multiple ways. Trying to use them can
result in anything from use-after-free (#6716) or getting corrupt data
(#9718). Luckily no-one is doing such queries today, but this started to
change recently as code such as Alternator TTL and distributed
aggregate reads started using this.
This series fixes both problems and adds a unit test too exercising this
previously completely unused code-path.
Fixes: #6716Fixes: #9718
Tests: unit(dev, release, debug)
"
* 'fix-stateful-multi-range-scans/v1' of https://github.com/denesb/scylla:
test/boost/multishard_mutation_query_test: add multi-range test
test/boost/multishard_mutation_query_test: add multi-range support
multishard_mutation_query: don't drop data during stateful multi-range reads
multishard_combining_reader: reader_lifecycle_policy: allow saving read range on fast-forward
As part of the drive to move over to flat_mutation_reader_v2, update
make_filtering_reader(). Since it doesn't examine range tombstones
(only the partition_start, to filter the key) the entire patch
is just glue code upgrading and downgrading users in the pipeline
(or removing a conversion, in one case).
Test: unit (dev)
Closes#9723
For each quarantine mode:
Validate sstables to quarantine one of them
and then scrub with the given quarantine mode
and verify the output whwther the quarantined
sstable was scrubbed or not.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When invalid sstables are detected, move them
to the quarantine subdirectory so they won't be
selected for regular compaction.
Refs #7658
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Test that we load quarantined sstables by
creating a dataset, moving a sstable to the quarantine dir,
and then reload the table and verify the dataset.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Define the "staging", "upload", and "snapshots" subdirectory
names as named const expressions in the sstables namespace
rather than relying on their string representation,
that could lead to typo mistakes.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In the test infrastructure code, so we can add tests passing multiple
ranges to the tested `multishard_{mutation,data}_query()`, exercising
multi-range functionality.
Add the summary index and the bound's address to the error message, so
it can be correlated with other trace level logging when investigating a
problem.
Refs: #9446
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211202124955.542293-2-bdenes@scylladb.com>
We read the compressed file size from a file that was already closed,
resulting in EBADF on my machine. Not sure why it works for everyone
else.
Fix by reading the size using the path.
Closes#9675
This series introduces a new version of a loading_cache class.
The old implementation was susceptible to a "pollution" phenomena when frequently used entry can get evicted by an intensive burst of "used once" entries pushed into the cache.
The new version is going to have a privileged and unprivileged cache sections and there's a new loading_cache template parameter - SectionHitThreshold. The new cache algorithm goes as follows:
* We define 2 dynamic cache sections which total size should not exceed the maximum cache size.
* New cache entry is always added to the "unprivileged" section.
* After a cache entry is read more than SectionHitThreshold times it moves to the second cache section.
* Both sections' entries obey expiration and reload rules in the same way as before this patch.
* When cache entries need to be evicted due to a size restriction "unprivileged" section's
least recently used entries are evicted first.
More details may be found in #8674.
In addition, during a testing another issue was found in the authorized_prepared_statements_cache: #9590.
There is a patch that fixes it as well.
Closes#9708
* github.com:scylladb/scylla:
loading_cache: account unprivileged section evictions
loading_cache: implement a variation of least frequent recently used (LFRU) eviction policy
authorized_prepared_statements_cache: always "touch" a corresponding cache entry when accessed
loading_cache::timestamped::lru_entry: refactoring
loading_cache.hh: rearrange the code (no functional change)
loading_cache: use std::pmr::polymorphic_allocator
This series extends `compaction_manager::stop_ongoing_compaction` so it can be used from the api layer for:
- table::disable_auto_compaction
- compaction_manager::stop_compaction
Fixes#9313Fixes#9695
Test: unit(dev)
Closes#9699
* github.com:scylladb/scylla:
compaction_manager: stop_compaction: wait for ongoing compactions to stop
compaction_manager: stop_ongoing_compactions: log Stopping 0 tasks at debug level
compaction_manager: unify stop_ongoing_compactions implementations
compaction_manager: stop_ongoing_compactions: add compaction_type option
compaction_manager: get_compactions: get a table* parameter
table: disable_auto_compaction: stop ongoing compactions
compaction_manager: make stop_ongoing_compactions public
table: futurize disable_auto_compactions
This PR finally removes the `term` class and replaces it with `expression`.
* There was some trouble with `lwt_cache_id` in `expr::function_call`.
The current code works the following way:
* for each `function_call` inside a `term` that describes a pk restriction, `prepare_context::add_pk_function_call` is called.
* `add_pk_function_call` takes a `::shared_ptr<cql3::functions::function_call>`, sets its `cache_id` and pushes this shared pointer onto a vector of all collected function calls
* Later when some condiition is met we want to clear cache ids of all those collected function calls. To do this we iterate through shared pointers collected in `prepare_context` and clear cache id for each of them.
This doesn't work with `expr::function_call` because it isn't kept inside a shared pointer.
To solve this I put the `lwt_cache_id` inside a shared pointer and then `prepare_context` collects these shared pointers to cache ids.
I also experimented with doing this without any shared pointers, maybe we could just walk through the expression and clear the cache ids ourselves. But the problem is that expressions are copied all the time, we could clear the cache in one place, but forget about a copy. Doing it using shared pointers more closely matches the original behaviour.
The experiment is on the [term2-pr3-backup-altcache](https://github.com/cvybhu/scylla/tree/term2-pr3-backup-altcache) branch
* `shared_ptr<term>` being `nullptr` could mean:
* It represents a cql value `null`
* That there is no value, like `std::nullopt` (for example in `attributes.hh`)
* That it's a mistake, it shouldn't be possible
A good way to distinguish between optional and mistake is to look for `my_term->bind_and_get()`, we then know that it's not an optional value.
* On the other hand `raw_value` cased to bool means:
* `false` - null or unset
* `true` - some value, maybe empty
I ran a simple benchmark on my laptop to see how performance is affected:
```
build/release/test/perf/perf_simple_query --smp 1 -m 1G --operations-per-shard 1000000 --task-quota-ms 10
```
* On master (a21b1fbb2f) I get:
```
176506.60 tps ( 77.0 allocs/op, 12.0 tasks/op, 45831 insns/op)
median 176506.60 tps ( 77.0 allocs/op, 12.0 tasks/op, 45831 insns/op)
median absolute deviation: 0.00
maximum: 176506.60
minimum: 176506.60
```
* On this branch I get:
```
172225.30 tps ( 75.1 allocs/op, 12.1 tasks/op, 46106 insns/op)
median 172225.30 tps ( 75.1 allocs/op, 12.1 tasks/op, 46106 insns/op)
median absolute deviation: 0.00
maximum: 172225.30
minimum: 172225.30
```
Closes#9481
* github.com:scylladb/scylla:
cql3: Remove remaining mentions of term
cql3: Remove term
cql3: Rename prepare_term to prepare_expression
cql3: Make prepare_term return an expression instead of term
cql3: expr: Add size check to evaluate_set
cql3: expr: Add expr::contains_bind_marker
cql3: expr: Rename find_atom to find_binop
cql3: expr: Add find_in_expression
cql3: Remove term in operations
cql3: Remove term in relations
cql3: Remove term in multi_column_restrictions
cql3: Remove term in term_slice, rename to bounds_slice
cql3: expr: Remove term in expression
cql3: expr: Add evaluate_IN_list(expression, options)
cql3: Remove term in column_condition
cql3: Remove term in select_statement
cql3: Remove term in update_statement
cql3: Use internal cql format in insert_prepared_json_statement cache
types: Add map_type_impl::serialize(range of <bytes, bytes>)
cql3: Remove term in cql3/attributes
cql3: expr: Add constant::view() method
cql3: expr: Implement fill_prepare_context(expression)
cql3: expr: add expr::visit that takes a mutable expression
cql3: expr: Add receiver to expr::bind_variable
This patch implements a simple variation of LFRU eviction policy:
* We define 2 dynamic cache sections which total size should not exceed the maximum cache size.
* New cache entry is always added to the "unprivileged" section.
* After a cache entry is read more than SectionHitThreshold times it moves to the second cache section.
* Both sections' entries obey expiration and reload rules in the same way as before this patch.
* When cache entries need to be evicted due to a size restriction "unprivileged" section's
least recently used entries are evicted first.
Note:
With a 2 sections cache it's not enough for a new entry to have the latest timestamp
in order not be evicted right after insertion: e.g. if all all other entries
are from the privileged section.
And obviously we want to allow new cache entries to be added to a cache.
Therefore we can no longer first add a new entry and then shrink the cache.
Switching the order of these two operations resolves the culprit.
Fixes#8674
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Most of the machinery was already implemented since it was used when
jumping between clustering ranges of a query slice. We need only perform
one additional thing when performing an index skip during
fast-forwarding: reset the stored range tombstone in the consumer (which
may only be stored in fast-forwarding mode, so it didn't matter that it
wasn't reset earlier). Comments were added to explain the details.
As a preparation for the change, we extend the sstable reversing reader
random schema test with a fast-forwarding test and include some minor
fixes.
Fixes#9427.
Closes#9484
* github.com:scylladb/scylla:
query-request: add comment about clustering ranges with non-full prefix key bounds
sstables: mx: enable position fast-forwarding in reverse mode
test: sstable_conforms_to_mutation_source_test: extend `test_sstable_reversing_reader_random_schema` with fast-forwarding
test: sstable_conforms_to_mutation_source_test: fix `vector::erase` call
test: mutation_source_test: extract `forwardable_reader_to_mutation` function
test: random_schema: fix clustering column printing in `random_schema::cql`
When memtable contains both mutations and tombstones that delete them,
the output flushed to sstables contains both mutations. Inserting a
compacting reader results in writing smaller sstables and saves
compaction work later.
Performance tests of this change have shown a regression in a common
case where there are no deletes. A heuristic is employed to skip
compaction unless there are tombstones in the memtable to minimise
the impact of that issue.
The test would check whether the forward and reverse readers returned
consistent results when created in non-forwarding mode with slicing.
Do the same but using fast-forwarding instead of slicing.
To do this we require a vector of `position_range`s. We also need a
vector of `clustering_range`s for the existing test. We modify the
existing `random_ranges` function to return `position_range`s instead of
`clustering_range`s since `position_range`s are easier to reason about,
especially when we consider non-full clustering key prefixes. A function
is introduced to convert a `position_range` to a `clustering_range` for
the existing test.
This series hardens compaction_manager::remove by:
- add debug logging around task execution and stopping.
- access compaction_state as lw_shared_ptr rather than via a raw pointer.
- with that, detach it from `_compaction_state` in `compaction_manager::remove` right away, to prevent further use of it while compactions are stopped.
- added write_lock in `remove` to make sure the lock is not held by any stray task.
Test: unit(dev), sstable_compaction_test(debug)
Dtest: alternator_tests.py:AlternatorTest.test_slow_query_logging (debug)
Closes#9636
* github.com:scylladb/scylla:
compaction_manager: add compaction_state when table is constructed
compaction_manager: remove: fixup indentation
compaction_manager: remove: detach compaction_state before stopping ongoing compactions
compaction_manager: remove: serialize stop_ongoing_compactions and gate.close
compaction_manager: task: keep a reference on compaction_state
test: sstable_compaction_test: incremental_compaction_data_resurrection_test: stop table before it's destroyed.
test: sstable_utils: compact_sstables: deregister compaction also on error path
test: sstable_compaction_test: partial_sstable_run_filtered_out_test: deregiser_compaction also on error path
test: compaction_manager_test: add debug logging to register/deregister compaction
test: compaction_manager_test: deregister_compaction: erase by iterator
test: compaction_manager_test: move methods out of line
compaction_manager: compaction_state: use counter for compaction_disabled
compaction_manager: task: delete move and copy constructors
compaction_manager: add per-task debug log messages
compaction_manager: stop_ongoing_compactions: log number of tasks to stop
"
Reverse queries has to use the reverse schema (query schema) for the
read itself but the table schema for the result building, according to
the established interface with the coordinator (half-reverse format).
Range scans were using the query schema for both, which produced
un-parseable reconcilable results for mutation range scans.
This series fixes this and adds unit tests to cover this previously
uncovered area.
"
Fixes#9673.
* 'reverse-range-scan-test/v1' of https://github.com/denesb/scylla:
test/boost/multishard_mutation_query_test: add reverse read test
test/boost/multishard_mutation_query_test: add test for combinations of limits, paging and stateful
test/boost/multishard_mutation_query_test: generalize read_partitions_with_paged_scan()
test/boost/multishard_mutation_query_test: add read_all_partitions_one_by_one() overload with slice
multishard_mutation_query: fix reverse scans
partition_slice: init all fields in copy ctor
partition_slice: operator<<: print the entire partition row limit
partition_slice_builder: add with_partition_row_limit()
Which also tests combinations of limits, paging and statefulness.
Fixes: #9328
This patch fixes the above issue by providing the test said issue was
asking for to be considered fixed. The bug described therein was already
fixed by an earlier patch.
Extract all logic related to issuing the actual read and building the
combined result. This is now done by an ResultBuilder template object,
which allows reusing the paging logic for both mutation and data scans.
ResultBuilder implementations for which are also provided by this patch.
The paging logic is also fixed to work with correctly with
per-partition-row-limit.
This series gets rid of the global batchlog_manager instance.
It does so by first, allowing to set a global pointer
and instatiating stack-local instances in main and
cql_test_env.
Expose the cql_test_env batchlog_manager to tests
so they won't need the global `get_batchlog_manager()` as
used in batchlog_manager_test.test_execute_batch.
Then we pass a reference to the `sharded<db::batchlog_manager>` to
storage_service so it can be used instead of the global one.
Derive batchlog_manager from peering_sharded_service so it
get its `container()` rather than relying on the global `get_batchlog_manager()`.
And finally, handle a circular dependency between the batchlog_manager,
that relies on the query_processor that, in turn, relies on the storage_proxy,
and the the storage_proxy itself that depends on the batchlog_manager for
`mutate_atomically`.
Moved `endpoint_filter` to gossiper so `storage_proxy::mutate_atomically`
can call it via the `_gossiper` member it already has.
The function requires a gossiper object rather than a batchlog_manager
object.
Also moved `get_batch_log_mutation_for` to storage_proxy so it can be
called from `sync_write_to_batchlog` (also from the mutate_atomically path)
Test: unit(dev)
DTest: batch_test.py:TestBatch.test_batchlog_manager_issue(dev)
* git@github.com:bhalevy/scylla.git deglobalize-batchlog_manager-v2
get rid of the global batchlog_manager
batchlog_manager: get_batch_log_mutation_for: move to storage_proxy
batchlog_manager: endpoint_filter: move to gossiper
batchlog_manager: do_batch_log_replay: use lambda coroutine
batchlog_manager: derive from peering_sharded_service
storage_service: keep a reference to the batchlog_manager
test: cql_test_env: expose batchlog_manager
main: allow setting the global batchlog_manager
These test cases may crash if running with more shards.
This is not required for test.py runs, but rather when
running the test manually using the command line.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211122204340.1020932-1-bhalevy@scylladb.com>
It must remove itself from the compaction_manager,
that will stop_ongoing_compactions.
Without that we're hitting
```
sstable_compaction_test: ./seastar/include/seastar/core/gate.hh:56: seastar::gate::~gate(): Assertion `!_count && "gate destroyed with outstanding requests"' failed.
```
when destroying the compaction_manager.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And rename to get_batchlog_mutation_for while at it,
as it's about the batchlog, not batch_log.
This resolves a circular dependency between the
batchlog_manager and the storage_proxy that required
it in the case.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Indexed queries are using paging over the materialized view
table. Results of the view read are then used to issue reads of the
base table. If base table reads are short reads, the page is returned
to the user and paging state is adjusted accordingly so that when
paging is resumed it will query the view starting from the row
corresponding to the next row in the base which was not yet
returned. However, paging state's "remaining" count was not reset, so
if the view read was exhausted the reading will stop even though the
base table read was short.
Fix by restoring the "remaining" count when adjusting the paging state
on short read.
Tests:
- index_with_paging_test
- secondary_index_test
Fixes#9198
Message-Id: <20210818131840.1160267-1-tgrabiec@scylladb.com>
First, it doesn't test the gossiper so
it's unclear why have it at all.
And it doesn't test anything more than what we test
using the cql_test_env either.
For testing gossip there is test/manual/gossip.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211122081305.789375-2-bhalevy@scylladb.com>
"
After this series, compaction will finally stop including database.hh.
tests: unit(debug).
"
* 'stop_including_database_hh_for_compaction' of github.com:raphaelsc/scylla:
compaction: stop including database.hh
compaction: switch to table_state in get_fully_expired_sstables()
compaction: switch to table_state
compaction: table_state: Add missing methods required by compaction
Make compaction procedure switch to table_state. Only function in
compaction.cc still directly using table is
get_fully_expired_sstables(T,...), but subsequently we'll make it
switch to table_state and then we can finally stop including database.hh
in the compaction code.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
These are the only methods left for compaction to switch to
table_state, so compaction can finally stop including database.hh
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
"
Add a sharded locator::effective_replication_map_factory that holds
shared effective_replication_maps.
To search for e_r_m in the factory, we use a compound `factory_key`:
<replication_strategy type, replication_strategy options, token_metadata ring version>.
Start the sharded factory in main (plus cql_test_env and tools/schema_loader)
and pass a reference to it to storage_proxy and storage_server.
For each keyspace, use the registry to create the effective_replication_map.
When registered, effective_replication_map objects erase themselves
from the factory when destroyed. effective_replication_map then schedules
a background task to clear_gently its contents, protected by the e_r_m_f::stop()
function.
Note that for non-shard 0 instances, if the map
is not found in the registry, we construct it
by cloning the precalculated replication_map
from shard 0 to save the cpu cycles of re-calculating
it time and again on every shard.
Test: unit(dev), schema_loader_test(debug)
DTest: bootstrap_test.py:TestBootstrap.decommissioned_wiped_node_can_join_test update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_new_node_while_schema_changes_with_repair_test (dev)
"
* tag 'effective_replication_map_factory-v7' of https://github.com/bhalevy/scylla:
effective_replication_map: clear_gently when destroyed
database: shutdown keyspaces
test: cql_test_env: stop view_update_generator before database shuts down
effective_replication_map_factory: try cloning replication map from shard 0
tools: schema_loader: start a sharded erm_factory
storage_service: use erm_factory to create effective_replication_map
keyspace: use erm_factory to create effective_replication_map
effective_replication_map: erase from factory when destroyed
effective_replication_map_factory: add create_effective_replication_map
effective_replication_map: enable_lw_shared_from_this
effective_replication_map: define factory_key
keyspace: get a reference to the erm_factory
main: pass erm_factory to storage_service
main: pass erm_factory to storage_proxy
locator: add effective_replication_map_factory
To be used for creating effective_replication_map
when token_metadata changes, and update all
keyspaces with it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It will be used further to create shared copies
of effective_replication_map based on replication_strategy
type and config options.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>