When purging regular tombstone consult the min_live_timestamp, if available.
This is safe since we don't need to protect dead data from resurrection, as it is already dead.
For shadowable_tombstones, consult the min_memtable_live_row_marker_timestamp,
if available, otherwise fallback to the min_live_timestamp.
If we see in a view table a shadowable tombstone with time T, then in any row where the row marker's timestamp is higher than T the shadowable tombstone is completely ignored and it doesn't hide any data in any column, so the shadowable tombstone can be safely purged without any effect or risk resurrecting any deleted data.
In other words, rows which might cause problems for purging a shadowable tombstone with time T are rows with row markers older or equal T. So to know if a whole sstable can cause problems for shadowable tombstone of time T, we need to check if the sstable's oldest row marker (and not oldest column) is older or equal T. And the same check applies similarly to the memtable.
If both extended timestamp statistics are missing, fallback to the legacy (and inaccurate) min_timestamp.
Fixesscylladb/scylladb#20423Fixesscylladb/scylladb#20424
> [!NOTE]
> no backport needed at this time
> We may consider backport later on after given some soak time in master/enterprise
> since we do see tombstone accumulation in the field under some materialized views workloads
Closesscylladb/scylladb#20446
* github.com:scylladb/scylladb:
cql-pytest: add test_compaction_tombstone_gc
sstable_compaction_test: add mv_tombstone_purge_test
sstable_compaction_test: tombstone_purge_test: test that old deleted data do not inhibit tombstone garbage collection
sstable_compaction_test: tombstone_purge_test: add testlog debugging
sstable_compaction_test: tombstone_purge_test: make_expiring: use next_timestamp
sstable, compaction: add debug logging for extended min timestamp stats
compaction: get_max_purgeable_timestamp: use memtable and sstable extended timestamp stats
compaction: define max_purgeable_fn
tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh
sstables: scylla_metadata: add ext_timestamp_stats
compaction_group, storage_group, table_state: add extended timestamp stats getters
sstables, memtable: track live timestamps
memtable_encoding_stats_collector: update row_marker: do nothing if missing
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.
that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:
```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
| ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
4290 | format(format_string<_Args...> __fmt, _Args&&... __args)
| ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
143 | format(fmt::format_string<A...> fmt, A&&... a) {
| ^
```
in this change, we
change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
`seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
because, `sstring::operator std::basic_string` would incur a deep
copy.
we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This PR introduces a new file data source implementation for uncompressed SSTables that will be validating the checksum of each chunk that is being read. Unlike for compressed SSTables, checksum validation for uncompressed SSTables will be active for scrub/validate reads but not for normal user reads to ensure we will not have any performance regression.
It consists of:
* A new file data source for uncompressed SSTables.
* Integration of checksums into SSTable's shareable components. The validation code loads the component on demand and manages its lifecycle with shared pointers.
* A new `integrity_check` flag to enable the new file data source for uncompressed SSTables. The flag is currently enabled only through the validation path, i.e., it does not affect normal user reads.
* New scrub tests for both compressed and uncompressed SSTables, as well as improvements in the existing ones.
* A change in JSON response of `scylla validate-checksums` to report if an uncompressed SSTable cannot be validated due to lack of checksums (no `CRC.db` in `TOC.txt`).
Refs #19058.
New feature, no backport is needed.
Closesscylladb/scylladb#20207
* github.com:scylladb/scylladb:
test: Add test to validate SSTables with no checksums
tools: Fix typo in help message of scylla validate-checksums
sstables: Allow validate_checksums() to report missing checksums
test: Add test for concurrent scrub/validate operations
test: Add scrub/validate tests for uncompressed SSTables
test/lib: Add option to create uncompressed random schemas
test: Add test for scrub/validate with file-level corruption
test: Check validation errors in scrub tests
sstables: Enable checksum validation for uncompressed SSTables
sstables: Expose integrity option via crawling mutation readers
sstables: Expose integrity option via data_consume_rows()
sstables: Add option for integrity check in data streams
sstables: Remove unused variable
sstables: Add checksum in the SSTable components
sstables: Introduce checksummed file data source implementation
sstables: Replace assert with on_internal_error
Fixes#20543
In cql_test_env, if cfg_in.ms_listen is set, we try to get a free port for the current test on
which message service rpc can bind. This to allow multiple tests in parallel.
However, we just do this by using random and getting a number, not actually verifying it against
host ports in use.
This is complicated further by the fact that port reuse is effectively disabled in seastar
(see reactor::posix_reuseport_detect()). Due to this, the solution applied here is a combo
of
* Create temp socket with port = 0 to get a previously free port
* Close socket right before listen (to handle reuse not working)
* Retry on EADDRINUSE
Closesscylladb/scylladb#20547
Migrate the `system_distributed.view_build_status` table to `system.view_build_status_v2`. The writes to the v2 table are done via raft group0 operations.
The new parameter `view_builder_version` stored in `scylla_local` indicates whether nodes should use the old or the new table.
New clusters use v2. Otherwise, the migration to v2 is initiated by the topology coordinator when the feature is enabled. It reads all the rows from the old table and writes them to the new table, and sets `view_builder_version` to v2. When the change is applied, all view_builder services are updated to write and read from the v2 table.
The old table `system_distributed.view_build_status` is set to read virtually from the new table in order to maintain compatibility.
When removing a node from the cluster, we remove its rows from the table atomically (fixes https://github.com/scylladb/scylladb/issues/11836). Also, during the migration, we remove all invalid rows.
Fixesscylladb/scylladb#15329
dtest https://github.com/scylladb/scylla-dtest/pull/4827Closesscylladb/scylladb#19745
* github.com:scylladb/scylladb:
view: test view_build_status table with node replace
test/pylib: use view_build_status_v2 table in wait_for_view
view_builder: common write view_build_status function
view_builder: improve migration to v2 with intermediate phase
view: delete node rows from view_build_status on node removal
view: sanitize view_build_status during migration
view: make old view_build_status table a virtual table
replica: move streaming_reader_lifecycle_policy to header file
view_builder: test view_build_status_v2
storage_service: add view_build_status to raft snapshot
view_builder: migration to v2
db:system_keyspace: add view_builder_version to scylla_local
view_builder: read view status from v2 table
view_builder: introduce writing status mutations via raft
view_builder: pass group0_client and qp to view_builder
view_builder: extract sys_dist status operations to functions
db:system_keyspace: add view_build_status_v2 table
In a previous patch we extended the return status of
`sstables::validate_checksums()` to report if an SSTable cannot be
validated due to a missing CRC component (i.e., CRC.db does not appear
in TOC.txt).
Add a test case for this.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Extend the `random_schema_specification` to support creating both
compressed and uncompressed schemas.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
To return the minimum live timestamp and live row-marker
timestamp across a compaction_group, storage_group, or
table_state.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The test-env in question is mostly started in one-shard mode. Also there
are several boost tests that start sharded<> environment. In that case
instances on different shards live in different temp dirs. That's not
critical yet, but better to have single directory for the whole test.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#20412
Store references of group0_client and query_processor in the
view_builder service.
They are required for generating mutations and writing them via group0.
Bind variables in CQL have two formats: positional (`?`) where a variable is referred to by its relative position in the statement, and named (`:var`), where the user is expected to supply a name->value mapping.
In 19a6e69001 we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to.
However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection.
Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the `dialect` and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables.
A unit test is added.
Fixes#15559
This may be useful to users transitioning from Cassandra, so merits a backport.
Closesscylladb/scylladb#19493
* github.com:scylladb/scylladb:
cql3: add option to not unify bind variables with the same name
cql3: introduce dialect infrastructure
cql3: prepared_statement_cache: drop cache key default constructor
All users of it have sstable_test_env at hand (in fact -- they call env
method to get table_for_test). And since sstable_test_env already has a
bunch of methods to create sstable, the table_for_test wrapper doesn't
need to duplicate this code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#20360
in a recent seastar change (644bb662), we do not include
`seastar/testing/random.hh` in `seastar/testing/test_runner.hh` anymore,
as the latter is not a facade of the former, and neither does it use the
former. as a sequence, some tests which take the advantage of the
included `seastar/testing/random.hh` do not build with the latest
seastar:
```
FAILED: test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o
/usr/bin/clang++ -DBOOST_REGEX_DYN_LINK -DBOOST_REGEX_NO_LIB -DBOOST_UNIT_TEST_FRAMEWORK_DYN_LINK -DBOOST_UNIT_TEST_FRAMEWORK_NO_LIB -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSCYLLA_ENABLE_PREEMPTION_SOURCE -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/__w/scylladb/scylladb -I/__w/scylladb/scylladb/build/gen -I/__w/scylladb/scylladb/seastar/include -I/__w/scylladb/scylladb/build/seastar/gen/include -I/__w/scylladb/scylladb/build/seastar/gen/src -I/__w/scylladb/scylladb/build -isystem /__w/scylladb/scylladb/abseil -isystem /__w/scylladb/scylladb/build/rust -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/__w/scylladb/scylladb/build=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -Werror=unused-result -fstack-clash-protection -MD -MT test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o -MF test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o.d -o test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o -c /__w/scylladb/scylladb/test/lib/key_utils.cc
In file included from /__w/scylladb/scylladb/test/lib/key_utils.cc:11:
/__w/scylladb/scylladb/test/lib/random_utils.hh:25:30: error: no member named 'local_random_engine' in namespace 'seastar::testing'
25 | return seastar::testing::local_random_engine;
| ~~~~~~~~~~~~~~~~~~^
1 error generated.
```
in this change, we include `seastar/testing/random.hh` when the random
facility is used, so that they can be compiled with the latest seastar
library.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#20368
Bind variables in CQL have two formats: positional (`?`) where a
variable is referred to by its relative position in the statement,
and named (`:var`), where the user is expected to supply a
name->value mapping.
In 19a6e69001 we identified the case where a named bind variable
appears twice in a query, and collapsed it to a single entry in the
statement metadata. Without this, a driver using the named variable
syntax cannot disambiguate which variable is referred to.
However, it turns out that users can use the positional call form
even with the named variable syntax, by using the positional
API of the driver. To support this use case, we add a configuration
variable to disable the same-variable detection.
Because the detection has to happen when the entire statement is
visible, we have to supply the configuration to the parser. We
call it the `dialect` and pass it from all callers. The alternative
would be to add a pre-prepare call similar to fill_prepare_context that
rewrites all expressions in a statement to deduplicate variables.
A unit test is added.
Fixes#15559
A dialect is a different way to interpret the same CQL statement.
Examples:
- how duplicate bind variable names are handled (later in this series)
- whether `column = NULL` in LWT can return true (as is now) or
whether it always returns NULL (as in SQL)
Currently, dialect is an empty structure and will be filled in later.
It is passed to query_processor methods that also accept a CQL string,
and from there to the parser. It is part of the prepared statement cache
key, so that if the dialect is changed online, previous parses of the
statement are ignored and the statement is prepared again.
The patch is careful to pick up the dialect at the entry point (e.g.
CQL protocol server) so that the dialect doesn't change while a statement
is parsed, prepared, and cached.
This adds minimal implementation of the start-backup API call.
The method starts a task that uploads all files from the given keyspace's snapshot to the requested endpoint/bucket. Arguments are:
- endpoint -- the ID in object_store.yaml config file
- bucket -- the target bucket to put objects into
- keyspace -- the keyspace to work on
- snapshot -- the method assumes that the snapshot had been already taken and only copies sstables from it
The task runs in the background, its task_id is returned from the method once it's spawned and it should be used via /task_manager API to track the task execution and completion (hint: it's good to have non-zero TTL value to make sure fast backups don't finish before the caller manages to call wait_task API).
Sstables components are scanned for all tables in the keyspace and are uploaded into the /bucket/${cf_name}/${snapshot_name}/ path.
refs: #18391Closesscylladb/scylladb#19890
* github.com:scylladb/scylladb:
tools/scylla-nodetool: add backup integration
docs: Document the new backup method
test/object_store: Test that backup task is abortable
test/object_store: Add simple backup test
test/object_store: Move format_tuples()
test/pylib: Add more methods to rest client
backup-task: Make it abortable (almost)
code: Introduce backup API method
database: Export parse_table_directory_name() helper
database: Introduce format_table_directory_name() helper
snapshot-ctl: Add config to snapshot_ctl
snapshot-ctl: Add sstables::storage_manager dependency
snapshot-ctl: Maintain task manager module
snapshot-ctl: Add "snapshots" logger
snapshot-ctl: Outline stop() method and constructor
snapshot-ctl: Inline run_snapshot_list<>
test/cql_test_env: Export task manager from cql test env
task_manager: Print task ttl on start (for debugging)
docs: Update object_storage.md with AWS_ environment
docs: Restructure object_storage.md
Currently the major compaction task impl grabs this (non-updateable)
value from db::config. That's not good, all services including
compaction manager have their own configs from which they take options.
Said that, this patch puts the said option onto
compaction_manager::config, makes use of it and configures one from
db::config on start (and tests).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#20174
in 30e82a81, we add a contraint to the template parameter of
boost_test_print_type() to prevent it from being matched with
types which can be formatted with operator<<. but it failed to
work. we still have test failure reports like:
```
[Exception] - critical check ['s', 's', 't', '_', 'm', 'r', '.', 'i', 's', '_', 'e', 'n', 'd', '_', 'o', 'f', '_', 's', 't', 'r', 'e', 'a', 'm', '(', ')'] has failed
```
this is not what we expect. the reason is that we passed the template
parameters to the `has_left_shift` trait in the wrong order, see
https://live.boost.org/doc/libs/1_83_0/libs/type_traits/doc/html/boost_typetraits/reference/has_left_shift.html.
we should have passed the lhs of operator<< expression as first
parameter, and rhs the second.
so, in this change, we correct the type constraint by passing the
template parameter in the right order, now the error message looks
better, like:
```
test/boost/mutation_query_test.cc(110): error: in "test_partition_query_is_full": check !partition_slice_builder(*s) .with_range({}) .build() .is_full() has failed
```
it turns out boost::transformed_range<> is formattable with operator<<,
as it fulfills the constraints of `boost::has_left_shift<ostream, R>`,
but when printing it, the compiler fails when it tries to insert the
elements in the range to the output stream.
so, in order to workaround this issue, we add a specialization for
`boost::transformed_range<F, R`.
also, to improve the readability, we reimplement the `has_left_shift<>`
as a concept, so that it's obvious that we need to put both the output
stream as the first parameter.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#20233
Drop half-reversed (legacy) format of query::partition_slice.
The select query builds a fully reversed (native) slice for reversed queries and use it together with a reversed
schema to construct query::read_command that is further propagated to the database.
A cluster feature is added to support nodes that still operate on half-reversed slices. When the feature is turned off:
- query::read_command is transformed (to have table schema and half-reversed slices) before sending to other nodes
- query::read_command is transformed (to have query schema (reversed) and reversed slices) after receiving it from other nodes
- Similarly, mutations are transformed. They are reversed before being sent to other nodes or after receiving them from other nodes.
Additional manual tests were performed to test a mixed-node cluster:
1. 3-node cluster with one node upgraded: reverse read queries performed on an old node
2. 3-node cluster with one node upgraded: reverse read queries performed on a new node
3. 3-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on an old node
4. 3-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on a new node
All reverse read queries above consists of:
- single-partition reverse reads with no clustering key restrictions, with single column restrictions and multi column restrictions both with and without paging turned on
- multi-partition reverse reads with range restrictions with optional partition limit and partial ordering
The exact same tests were also performed on a fully upgraded cluster.
Fixes https://github.com/scylladb/scylladb/issues/12557Closesscylladb/scylladb#18864
* github.com:scylladb/scylladb:
mutation_partition: drop reverse parameter in compact_for_query
clustering_key_filter: unify get_ranges and get_native_ranges
streamed_mutation_freezer: drop the reverse parameter
reverse-reads.md: Drop legacy reverse format information
Fix comments refering to half-reversed (legacy) slices
select_statement::do_execute: Add tracing informaction
query::trim_clustering_row_ranges_to: require reversed schema for native reversed ranges
query-request: Drop half_reverse_slice as it is no longer used anywhere
readers: Use reversed schema and native reversed slices
database: accept reversed schema for reversed queries
storage_proxy: Support reverse queries in native format
query_pagers: Replace _schema with _query_schema
query_pagers: Support reverse queries in native format
select_statement: Execute reversed query in native format
storage_proxy::remote: Add support for mixed-node clusters
mutation_query: Add reversed function to reverse reconcilable_result
query-request: Add reversed function to reverse read_command
features: add native_reverse_queries
kl::reader::make_reader: Unify interface with mx::reader::make_reader
config: drop reversed_reads_auto_bypass_cache
config: drop enable_optimized_reversed_reads
If system_keyspace::stop() is called before system_keyspace::shutdown(),
it will never finish, because the uncleared shared pointers will keep
it alive indefinitely.
Currently this can happen if an exception is thrown before the construction
of the shutdown() defer. This patch moves the shutdown() call to immediately
before stop(). I see no reason why it should be elsewhere.
Fixesscylladb/scylla-enterprise#4380Closesscylladb/scylladb#20089
The reconcilable_result is built as it would be constructed for
forward read queries for tables with reversed order.
Mutations constructed for reversed queries are consumed forward.
Drop overloaded reversed functions that reverse read_command and
reconcilable_result directly and keep only those requiring smart
pointers. They are not used any more.
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68Closesscylladb/scylladb#20006
It's only needed to start hints via proxy, but proxy can do it without gossiper argument
Closesscylladb/scylladb#19894
* github.com:scylladb/scylladb:
storage_service: Remote gossiper argument from join_cluster()
proxy: Use remote gossiper to start hints resource manager
hints: Const-ify gossiper references and anchor pointers
Most callers of the raft group0 client interface are passing a real
source instance, so we can use the abort source reference in the client
interface. This change makes the code simpler and more consistent.
This pointer was only needed to pull all the way down the hints resource
manager start() method. It's no longer needed for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Keep task manager node ops module in storage service. It will be
used to create and manage tasks related to topology changes.
The module is created and registered in storage service constructor.
In storage_service::stop() the module is stopped and so all the remaining
tasks would be unregistered immediately after they are finished.
The SSTable is removed from the reclaimed memory tracking logic only
when its object is deleted. However, there is a risk that the Bloom
filter reloader may attempt to reload the SSTable after it has been
unlinked but before the SSTable object is destroyed. Prevent this by
removing the SSTable from the reclaimed list maintained by the manager
as soon as it is unlinked.
The original logic that updated the memory tracking in
`sstables_manager::deactivate()` is left in place as (a) the variables
have to be updated only when the SSTable object is actually deleted, as
the memory used by the filter is not freed as long as the SSTable is
alive, and (b) the `_reclaimed.erase(*sst)` is still useful during
shutdown, for example, when the SSTable is not unlinked but just
destroyed.
Fixes https://github.com/scylladb/scylladb/issues/19722Closesscylladb/scylladb#19717
* github.com:scylladb/scylladb:
boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded
sstables: do not reload components of unlinked sstables
sstables/sstables_manager: introduce on_unlink method
`filesystem_storage` methods frequently call `sync_directory()`, for the sake of flushing (sync'ing) a directory. `sync_directory()` always brackets the sync with open and close, and given that most `sync_directory()` calls target the sstable base directory, those repeated opens and closes are considered wasteful. Rework the `filesystem_storage::_dir` member (from a mere pathname) so that it stand for an `opened_directory` object, which keeps the sstable base directory open, for the purpose of repeated sync'ing.
Resolves#2399.
Closesscylladb/scylladb#19624
* github.com:scylladb/scylladb:
sstables/storage: synch "dst_dir" more leanly in create_links_common()
sstables/storage: close previous directory asynchronously upon dir change
sstables/storage: futurize change_dir_for_test()
sstables/storage: sync through "opened_directory" in filesystem...::move()
sstables/storage: sync through "opened_directory" in the "easy" cases
sstables/storage: introduce "opened_directory" class
In v4 of scylladb/scylladb#19598 the last commit of the patch was replaced but this change missed merge so submitting it in a separate patch.
In the current patch, the original functions class correctly marks methods as const where appropriate, and the instance() method now returns a const object. This ensures protection against accidental modifications, as all changes must go through the change_batch object.
Since the functions_changer class was intended to serve the same purpose, it is now redundant. Therefore, we are reverting the commit that introduced it.
Relates scylladb/scylladb#19153Closesscylladb/scylladb#19647
* github.com:scylladb/scylladb:
cql3: functions: replace template with std::function in with_udf_iter()
cql3: functions: improve functions class constness handling
Revert "cql3: functions: make modification functions accessible only via batch class"
Currently change_dir_for_test() is synchronous. Make it return a future,
so that we can use async operations in change_dir_for_test() overrides.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
The SSTable is removed from the reclaimed memory tracking logic only
when its object is deleted. However, there is a risk that the Bloom
filter reloader may attempt to reload the SSTable after it has been
unlinked but before the SSTable object is destroyed. Prevent this by
removing the SSTable from the reclaimed list maintained by the manager
as soon as it is unlinked.
The original logic that updated the memory tracking in
`sstables_manager::deactivate()` is left in place as (a) the variables
have to be updated only when the SSTable object is actually deleted, as
the memory used by the filter is not freed as long as the SSTable is
alive, and (b) the `_reclaimed.erase(*sst)` is still useful during
shutdown, for example, when the SSTable is not unlinked but just
destroyed.
Fixes#19722
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Pass origin when opening the sstable from the writer and store it in the
sstable object. This will make the origin available for the entire write
path.
Closesscylladb/scylladb#19721
* github.com:scylladb/scylladb:
sstables: use _origin in write path
sstable::open_sstable: pass and store origin
This PR adds support for aborting index reads from within `index_consume_entry_context::consume_input` when the server is being stopped. The abort source is now propagated down to the `index_consume_entry_context`, making it available for `consume_input` to check if an abort has been requested. If an abort is detected, `consume_input` will throw an exception to stop the index read operation.
Closesscylladb/scylladb#19453
* github.com:scylladb/scylladb:
test/boost: test abort behaviour during index read
sstables/index_reader: stop consuming index when abort has been requested
sstables::index_consume_entry_context: store abort_source
sstable: drop old filter only after the new filter is built during rebuild
sstables/sstables_manager: store abort_source in sstable_manager
replica/database: pass abort_source to database constructor
Pass origin when opening the sstable from the writer and store it in the
sstable object. This will make the origin available for the entire write
path.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Added a new boost test, index_reader_test, with a testcase to verifyi
the abort behaviour during an index read using
index_consume_entry_context.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Add a new member that stores the abort_source. This can later be used by
the sstables to check if an abort has been requested. Also implement
sstables_manager::get_abort_source() that returns a const reference to
the abort source.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
This is in preparation for the following patch that adds abort_source
variable to the sstables_manager.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
This patch is a follow-up to scylladb/scylladb#16585.
Once we have service levels on raft, we can get rid of update loop, which updates the configuration in a configured interval (default is 10s).
Instead, this PR introduces methods to `group0_state_machine` which look through table ids in mutations in `write_mutation` and update submodules based on that ids.
Fixes: scylladb/scylladb#18060Closesscylladb/scylladb#18758
* github.com:scylladb/scylladb:
test: remove `sleep()`s which were required to reload service levels configuration
test/cql_test_env: remove unit test service levels data accessors
service/storage_service: reload SL cache on topology_state_load()
service/qos/service_level_controller: move semaphore breaking to stop
service/qos/service_level_controller: maybe start and stop legacy update loop
service/qos/service_level_controller: make update loop legacy
raft/group0_state_machine: update submodules based on table_id
service/storage_service: add a proxy method to reload sl cache
Unit test data accessors were created to avoid starting update loop in
unit test and to update controller's configuration directly.
With raft data accessor and configuration updates on applying raft log,
we can get rid of unit test data accessors and use the raft one.
This also make unit test env a bit like real Scylla environment.
It was added to make integration of storage groups easier, but it's
complicated since it's another source of truth and we could have
problems if it becomes inconsistent with the group map.
Fixes#18506.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The following command had been executed to get the
list of headers that did not contain '#pragma once':
'grep -rnw . -e "#pragma once" --include *.hh -L'
This change adds missing include guard to headers
that did not contain any guard.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#19626
before this change, we provide `boost_test_print_type()` for all types
which can be formatted using {fmt}. these types includes those who
fulfill the concept of range, and their element can be formatted using
{fmt}. if the compilation unit happens to include `fmt/ranges.h`.
the ranges are formatted with `boost_test_print_type()` as well. this
is what we expect. in other words, we use {fmt} to format types which
do not natively support {fmt}, but they fulfill the range concept.
but `boost::unit_test::basic_cstring` is one of them
- it can be formatted using operator<<, but it does not provide
fmt::format specialization
- it fulfills the concept of range
- and its element type is `char const`, which can be formatted using
{fmt}
that's why it's formatted like:
```
test/boost/sstable_directory_test.cc(317): fatal error: in "sstable_directory_test_generation_sanity": critical check ['s', 's', 't', '-', '>', 'g', 'e', 'n', 'e', 'r', 'a', 't', 'i', 'o', 'n', '(', ')', ' ', '=', '=', ' ', 's', 's', 't', '1', '-', '>', 'g', 'e', 'n', 'e', 'r', 'a', 't', 'i', 'o', 'n', '(', ')'] has failed`
```
where the string is formatted as a sequence-alike container. this
is far from readable.
so, in this change, we do not define `boost_test_print_type()` for
the types which natively support `operator<<` anymore. so they can
be printed with `operator<<` when boost::test prints them.
Fixesscylladb/scylladb#19637
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#19638
This is done to ease code reuse in the following commit.
It'd also help should we ever want properly mount functions
class to schema object instead of static storage.