for better readability.
presumably, `sstable::seal_sstable()` is not on the critical path,
and we don't need to worry about the overhead of using C++20 coroutine.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#20410
in 372a4d1b79, we introduced a change
which was for debugging the logging message. but the logging message
intended for printing the temp_dir not prints an `optional<int>`. this
is both confusing, and more importantly, it hurts the debuggability.
in this change, the related change is reverted.
Fixesscylladb/scylladb#20408
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#20409
The method starts a task that uses sstables_loader load-and-stream
functionality to bring new sstables into the cluster. The existing
load-and-stream picks up sstables from upload/ directory, the newly
introduced task collects them from S3 bucket and given prefix (that
correspond to the path where backup API method put them).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Current S3 storage driver keeps sstables in bucket in a form of
/bucket/generation/component-name
To get sstables that are backed up on S3 this format doesn't apply,
because components are uploaded with their names unmodified. This patch
makes S3 storage driver account for that and not re-format component
paths for upload sstable state.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When component lister is created it checks the target storage options
for what kind of lister to create. For local options it creates FS
lister that collects sstables from their component files. For S3
options, it relies on sstables registry.
When collecting sstables from backup, it's not possible to use registry,
because those entries are not there. Instead, lister should pick up
individual components as it they were on local FS. This patch prepares
the lister for that -- in case S3 options are provided and the sstables'
state is "upload", don't try to read those from registry, but
instantiate the FS lister that will later use s3::bucket_lister.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When sstable directory collects a entry from storage, it tries to parse
its full path with the help of sstables::parse_path(). There are two
overloads of that function -- one with ks:cf arguments and one without.
The latter tries to "guess" keyspace and table names from the directory
name.
However, ks and table names are already known by the directory, it
doesn't even use the returned ks and cf values, so this parsing is
excessive. Also, future patches will put here backup paths, that might
not match the ks_name/table_name-table_uuid/ pattern that the parser
expects.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch hides directory_lister and bucket_lister behind a common
facade. The intention is to provide a uniform API for sstable_directory
that it could use to list sstables' components wherever they are.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Typically the sstable_directory is constructed out of a table object.
Some code, namely tests and schema-loader, don't have table at hand and
construct directory out of schema, sharder, path-to-sstables, etc. This
code doesn't work with any storage options other than local ones, so
there's no need (yet) to carry this argument over.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#20138
Drop half-reversed (legacy) format of query::partition_slice.
The select query builds a fully reversed (native) slice for reversed queries and use it together with a reversed
schema to construct query::read_command that is further propagated to the database.
A cluster feature is added to support nodes that still operate on half-reversed slices. When the feature is turned off:
- query::read_command is transformed (to have table schema and half-reversed slices) before sending to other nodes
- query::read_command is transformed (to have query schema (reversed) and reversed slices) after receiving it from other nodes
- Similarly, mutations are transformed. They are reversed before being sent to other nodes or after receiving them from other nodes.
Additional manual tests were performed to test a mixed-node cluster:
1. 3-node cluster with one node upgraded: reverse read queries performed on an old node
2. 3-node cluster with one node upgraded: reverse read queries performed on a new node
3. 3-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on an old node
4. 3-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on a new node
All reverse read queries above consists of:
- single-partition reverse reads with no clustering key restrictions, with single column restrictions and multi column restrictions both with and without paging turned on
- multi-partition reverse reads with range restrictions with optional partition limit and partial ordering
The exact same tests were also performed on a fully upgraded cluster.
Fixes https://github.com/scylladb/scylladb/issues/12557Closesscylladb/scylladb#18864
* github.com:scylladb/scylladb:
mutation_partition: drop reverse parameter in compact_for_query
clustering_key_filter: unify get_ranges and get_native_ranges
streamed_mutation_freezer: drop the reverse parameter
reverse-reads.md: Drop legacy reverse format information
Fix comments refering to half-reversed (legacy) slices
select_statement::do_execute: Add tracing informaction
query::trim_clustering_row_ranges_to: require reversed schema for native reversed ranges
query-request: Drop half_reverse_slice as it is no longer used anywhere
readers: Use reversed schema and native reversed slices
database: accept reversed schema for reversed queries
storage_proxy: Support reverse queries in native format
query_pagers: Replace _schema with _query_schema
query_pagers: Support reverse queries in native format
select_statement: Execute reversed query in native format
storage_proxy::remote: Add support for mixed-node clusters
mutation_query: Add reversed function to reverse reconcilable_result
query-request: Add reversed function to reverse read_command
features: add native_reverse_queries
kl::reader::make_reader: Unify interface with mx::reader::make_reader
config: drop reversed_reads_auto_bypass_cache
config: drop enable_optimized_reversed_reads
There are two load_sstable() overloads, and one of them is only used
inside process_descriptor(). What this loading helper does is, in fact,
processes given descriptor, so it's worth having it open-coded into its
caller.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The latter (caller) loads sstable, so does the former, so load it once
and then put it in either list/set, depending on flags and shard info.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are some places that log sstable Data file name via sstable
descriptor. After previous patching all those loggers have sstable at
hand and can use sstable::get_filename() instead.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In order to decide which list to put sstable into, the sort_sstable()
first calls get_shards_for_this_sstable() which loads the sstable
anyway. If loaded shards contain only the current one (which is the
common case) sstable is loaded again. In fact, if the sstable happens to
be remote it's loaded anyway to get its open info.
Fix that by loading sstable, then getting shards directly from it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The reconcilable_result is built as it would be constructed for
forward read queries for tables with reversed order.
Mutations constructed for reversed queries are consumed forward.
Drop overloaded reversed functions that reverse read_command and
reconcilable_result directly and keep only those requiring smart
pointers. They are not used any more.
instead of using `std::rethrow_exception()`, use
`coroutine::return_exception_ptr()` which is a little bit more
efficient.
See also 6cafd83e1c
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#20001
Commit ad0e6b79 (replica: Remove all_datadir from keyspace config) removed all_datadirs from keyspace config, now it's datadir turn. After this change keyspace no longer references any on-disk directories, only the sstables's storage driver attached to keyspace's tables does.
refs #12707Closesscylladb/scylladb#19866
* github.com:scylladb/scylladb:
replica: Remove keyspace::config::datadir
sstables/storage: Evaluate path for keyspace directory in storage
sstables/storage: Add sstables_manager arg to init_keyspace_storage()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68Closesscylladb/scylladb#20006
Currently, delete_atomically can be called with
a list of sstables from mixed prefixes in two cases:
1. truncate: where we delete all the sstables in the table directory
2. tablet cleanup: similar to truncate but restricted to sstables in a
single tablet replica
In both cases, it is possible that sstables in staging (or quarantine)
are mixed with sstables in the base directory.
Until a more comprehensive fix is in place,
(see https://github.com/scylladb/scylladb/pull/19555)
this change just lifts the ban on atomic deletion
of sstables from different prefixes, and acknowledging
that the implementation is not atomic across
prefixes. This is better than crashing for now,
and can be backported more easily to branches
that support tablets so tablet migration can
be done safely in the presence of repair of
tables with views.
Refs scylladb/scylladb#18862
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#19816
Currently the init_keyspace_storage() expects that the caller would
tell it where the ks directory is, but it's not nice as keyspace may
not necessarity keep its sstables in any directory.
This patch moves the directory path evaluation into storage code,
specifically to the lambda that is called for on-disk sstables. The
way directory is evaluated mirrors the one from make_keyspace_config()
that will be removed by next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The SSTable is removed from the reclaimed memory tracking logic only
when its object is deleted. However, there is a risk that the Bloom
filter reloader may attempt to reload the SSTable after it has been
unlinked but before the SSTable object is destroyed. Prevent this by
removing the SSTable from the reclaimed list maintained by the manager
as soon as it is unlinked.
The original logic that updated the memory tracking in
`sstables_manager::deactivate()` is left in place as (a) the variables
have to be updated only when the SSTable object is actually deleted, as
the memory used by the filter is not freed as long as the SSTable is
alive, and (b) the `_reclaimed.erase(*sst)` is still useful during
shutdown, for example, when the SSTable is not unlinked but just
destroyed.
Fixes https://github.com/scylladb/scylladb/issues/19722Closesscylladb/scylladb#19717
* github.com:scylladb/scylladb:
boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded
sstables: do not reload components of unlinked sstables
sstables/sstables_manager: introduce on_unlink method
filesystem_storage::create_links_common() runs on directories that
generally differ from "_dir", thus, we can't replace its sync_directory()
calls with _dir.sync(). We can still use a common (temporary)
"opened_directory" object for synching "dst_dir" three times, saving two
open and two close operations.
This patch is best viewed with "git show -W".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
In "filesystem_storage", change_dir_for_test() and move() replace "_dir"
with "opened_directory(new_dir)" using the move assignment operator.
Consequently, the file descriptor underlying "_dir" is closed
synchronously as a part of object destruction.
Expose the async file::close() function through "opened_directory".
Introduce filesystem_storage::change_dir() as a common async workhorse for
both change_dir_for_test() and move(). In change_dir(), close the old
directory asynchronously.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Currently change_dir_for_test() is synchronous. Make it return a future,
so that we can use async operations in change_dir_for_test() overrides.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Near the end of filesystem_storage::move(), we sync both the old
directory, and the new directory, if "delay_commit" is null. At that
point, the new directory is just "_dir"; call _dir.sync() instead of
sync_directory().
This patch is best viewed with "git show -W".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Replace
sst.sstable_write_io_check(sync_directory, _dir.native())
with
_dir.sync(sst._write_error_handler)
Also replace the explicit (but still relatively "easy")
open_checked_directory() + flush() + flush() operations in
filesystem_storage::seal() with two _dir.sync() calls.
Because filesystem_storage::create_links_common() is marked "const", we
need to declare "_dir" mutable.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
"filesystem_storage::_dir" is currently of type "std::filesystem::path".
Introduce a new class called "opened_directory", and change the type of
"_dir" to the new class "opened_directory".
"opened_directory" keeps the directory open, and offers synchronization on
that open directory (i.e., without having to reopen the directory every
time). In subsequent patches, that will be put to use.
The opening and closing of the wrapped directory cannot easily be handled
explicitly in the "filesystem_storage" member functions.
(
Namely, test::store() and test::rewrite_toc_without_scylla_component()
-- both in "test/lib/sstable_utils.hh" -- perform "open -> ... -> seal"
sequences, and such a sequence may be executed repeatedly. For example,
sstable_directory_shared_sstables_reshard_correctly()
[test/boost/sstable_directory_test.cc] does just that; it "reopens" the
"filesystem_storage" object repeatedly.
)
Rather than trying to restrict the order of "filesystem_storage" member
function calls, replace the "opened_directory" object with a new one
whenever the directory pathname is re-set; namely in
filesystem_storage::change_dir_for_test() and filesystem_storage::move().
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
The SSTable is removed from the reclaimed memory tracking logic only
when its object is deleted. However, there is a risk that the Bloom
filter reloader may attempt to reload the SSTable after it has been
unlinked but before the SSTable object is destroyed. Prevent this by
removing the SSTable from the reclaimed list maintained by the manager
as soon as it is unlinked.
The original logic that updated the memory tracking in
`sstables_manager::deactivate()` is left in place as (a) the variables
have to be updated only when the SSTable object is actually deleted, as
the memory used by the filter is not freed as long as the SSTable is
alive, and (b) the `_reclaimed.erase(*sst)` is still useful during
shutdown, for example, when the SSTable is not unlinked but just
destroyed.
Fixes#19722
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Added a new method, on_unlink() to the sstable_manager. This method is
now used by the sstable to notify the manager when it has been unlinked,
enabling the manager to update its bookkeeping as required. The
on_unlink method doesn't do anything yet but will be updated by the next
patch.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Pass origin when opening the sstable from the writer and store it in the
sstable object. This will make the origin available for the entire write
path.
Closesscylladb/scylladb#19721
* github.com:scylladb/scylladb:
sstables: use _origin in write path
sstable::open_sstable: pass and store origin
This PR adds support for aborting index reads from within `index_consume_entry_context::consume_input` when the server is being stopped. The abort source is now propagated down to the `index_consume_entry_context`, making it available for `consume_input` to check if an abort has been requested. If an abort is detected, `consume_input` will throw an exception to stop the index read operation.
Closesscylladb/scylladb#19453
* github.com:scylladb/scylladb:
test/boost: test abort behaviour during index read
sstables/index_reader: stop consuming index when abort has been requested
sstables::index_consume_entry_context: store abort_source
sstable: drop old filter only after the new filter is built during rebuild
sstables/sstables_manager: store abort_source in sstable_manager
replica/database: pass abort_source to database constructor
Now that the origin is available inside the sstable object, no need to
pass it to the methods called in the write path.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Pass origin when opening the sstable from the writer and store it in the
sstable object. This will make the origin available for the entire write
path.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
When an abort is requested, stop further reading of the index file and
throw and exception from index_consume_entry_context::process_state().
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Store abort source inside sstables::index_consume_entry_context, so that
the next patch can implement cancelling the index read when abort is
requested.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
sstable::maybe_rebuild_filter_from_index drops the existing filter first
and then rebuilds the new filter as the method is only called before the
sstable is sealed. But to make the index read abortable, the old filter
can be dropped only after the new filter is built so that in case if the
index consumer gets aborted, we still have the old filter to write to
disk.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Add a new member that stores the abort_source. This can later be used by
the sstables to check if an abort has been requested. Also implement
sstables_manager::get_abort_source() that returns a const reference to
the abort source.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
There are two schema's associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).
It's easy to mix up the two and break something as a result.
The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.
The sstables's schema is used to configure some parameters for newly created
sstable, such as bloom filter false positive ratio, or compression.
The problem fixed by this patch is that the writer was wrongly creating
the compressor objects based on its own schema, but using them based
based on the sstable's schema the sstable's schema.
This patch forces the writer to use the sstable's schema for both.
There are two schema's associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).
It's easy to mix up the two and break something as a result.
The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.
The sstables's schema is used to configure some parameters for newly created
sstable, such as bloom filter false positive ratio, or compression.
The problem fixed by this patch is that the writer was wrongly creating
the filter based on its own schema, while the layer outside the writer
was interpreting it as if it was created with the sstable's schema.
This patch forces the writer to pick the filter's parameters based on the
sstable's schema instead.
As of this patch, those static_casts are actually invalid in some cases
(they cast to the wrong type) because of an oversight.
A later patch will fix that. But to even write a reliable reproducer
for the problem, we must force the invalid casts to manifest as a crash
(instead of weird results).
This patch both allows writing a reproducer for the bug and serves
as a bit of defensive programming for the future.