Allow create_pending_deletion_log to delete a bunch of sstables
potentially resides in different prefixes (e.g. in the base directory
and under staging/).
The motivation arises from table::cleanup_tablet that calls compaction_group::cleanup on all cg:s via cleanup_compaction_groups. Cleanup, in turn, calls delete_sstables_atomically on all sstables in the compaction_group, in all states, including the normal state as well as staging - hence the requirement to support deleting sstables in different sub-directories.
Also, apparently truncate calls delete_atomically for all sstables too, via table::discard_sstables, so if it happened to be executed during view update generation, i.e. when there are sstables in staging, it should hit the assertion failure reported in https://github.com/scylladb/scylladb/issues/18862 as well (although I haven't seen it yet, but I see no reason why it would happen). So the issue was apparently present since the initial implementation of the pending_delete_log. It's just that with tablet migration it is more likely to be hit.
Fixesscylladb/scylladb#18862
Needs backport to 6.0 since tablets require this capability
Closesscylladb/scylladb#19555
* github.com:scylladb/scylladb:
sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory
sstables: storage: keep base directory in base class
sstables: storage: define opened_directory in header file
sstable_directory: use only dirlog
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.
that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:
```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
| ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
4290 | format(format_string<_Args...> __fmt, _Args&&... __args)
| ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
143 | format(fmt::format_string<A...> fmt, A&&... a) {
| ^
```
in this change, we
change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
`seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
because, `sstring::operator std::basic_string` would incur a deep
copy.
we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
To be able to atomically delete sstables both in
base table directory and in its sub-directories,
like `staging/`, use a shared pending_delete_dir
under under the base directory.
Note that this requires loading and processing
the base directory first.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, there are leftover log messages using
sstlog rather than dirlog, that was introduced
in aebd965f0e,
and that makes debugging harder.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Squash call to lister.get() and check for the returned value into
while()'s condition. This saves few more lines of code as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method already uses yielding lister, but handles the exceptions
explicitly. Use with_closeable() helper, it makes the code shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method starts a task that uses sstables_loader load-and-stream
functionality to bring new sstables into the cluster. The existing
load-and-stream picks up sstables from upload/ directory, the newly
introduced task collects them from S3 bucket and given prefix (that
correspond to the path where backup API method put them).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When component lister is created it checks the target storage options
for what kind of lister to create. For local options it creates FS
lister that collects sstables from their component files. For S3
options, it relies on sstables registry.
When collecting sstables from backup, it's not possible to use registry,
because those entries are not there. Instead, lister should pick up
individual components as it they were on local FS. This patch prepares
the lister for that -- in case S3 options are provided and the sstables'
state is "upload", don't try to read those from registry, but
instantiate the FS lister that will later use s3::bucket_lister.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When sstable directory collects a entry from storage, it tries to parse
its full path with the help of sstables::parse_path(). There are two
overloads of that function -- one with ks:cf arguments and one without.
The latter tries to "guess" keyspace and table names from the directory
name.
However, ks and table names are already known by the directory, it
doesn't even use the returned ks and cf values, so this parsing is
excessive. Also, future patches will put here backup paths, that might
not match the ks_name/table_name-table_uuid/ pattern that the parser
expects.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch hides directory_lister and bucket_lister behind a common
facade. The intention is to provide a uniform API for sstable_directory
that it could use to list sstables' components wherever they are.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Typically the sstable_directory is constructed out of a table object.
Some code, namely tests and schema-loader, don't have table at hand and
construct directory out of schema, sharder, path-to-sstables, etc. This
code doesn't work with any storage options other than local ones, so
there's no need (yet) to carry this argument over.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#20138
There are two load_sstable() overloads, and one of them is only used
inside process_descriptor(). What this loading helper does is, in fact,
processes given descriptor, so it's worth having it open-coded into its
caller.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The latter (caller) loads sstable, so does the former, so load it once
and then put it in either list/set, depending on flags and shard info.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are some places that log sstable Data file name via sstable
descriptor. After previous patching all those loggers have sstable at
hand and can use sstable::get_filename() instead.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In order to decide which list to put sstable into, the sort_sstable()
first calls get_shards_for_this_sstable() which loads the sstable
anyway. If loaded shards contain only the current one (which is the
common case) sstable is loaded again. In fact, if the sstable happens to
be remote it's loaded anyway to get its open info.
Fix that by loading sstable, then getting shards directly from it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
instead of using `std::rethrow_exception()`, use
`coroutine::return_exception_ptr()` which is a little bit more
efficient.
See also 6cafd83e1c
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#20001
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68Closesscylladb/scylladb#20006
Currently, delete_atomically can be called with
a list of sstables from mixed prefixes in two cases:
1. truncate: where we delete all the sstables in the table directory
2. tablet cleanup: similar to truncate but restricted to sstables in a
single tablet replica
In both cases, it is possible that sstables in staging (or quarantine)
are mixed with sstables in the base directory.
Until a more comprehensive fix is in place,
(see https://github.com/scylladb/scylladb/pull/19555)
this change just lifts the ban on atomic deletion
of sstables from different prefixes, and acknowledging
that the implementation is not atomic across
prefixes. This is better than crashing for now,
and can be backported more easily to branches
that support tablets so tablet migration can
be done safely in the presence of repair of
tables with views.
Refs scylladb/scylladb#18862
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#19816
The sstable_directory::sstable_filename() should generate a name of an
sstable for log messages. It's not accurate, because it silently assumes
that the filename is on local storage, which might not be the case.
Fixing it is large chage, so for now replace _sstable_dir with explicit
call to make_path(). The change is idempotent, as _sstable_dir is
initialized with the result of make_path() call in constructor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
At the beginning of .process() method there's a log message which path
and which storage is being processed. That's not really nice, because,
e.g. filesystem lister may skip processing quarantine directory. Also,
the registry lister doesn't list entries by their _sstable_dir, but
rather by its _location (spoiler: dir = location / state).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When populating sstables on boot a bunch of sstable_directory objects is
created. For each sstable there come three -- one for normal, quarantine
and staging state. Each is initialized with sstable location (which is
now a datadir/ks_name/cf_name-and-uuid) and the desired state (a enum
class). When created, the directory object wires up component lister,
depending on which storage options are provided. For local sstables a
legacy filesystem lister is created and it's initialized with a path
where to search files for -- location + / + string(state). But for s3
sstables, that keep their entries in registry, the lister is
errorneously initialized with the same location + / + string(state)
value. The mistake is that sstables in registry keep location and state
in different columns, so for any state lister should query registry with
the same location value (then it filters entries by state on its own).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The semaphore in question is used to limit parallelism of manipulations with table's sstables. It's currently used in two places -- sstable_directory (mainly on boot) and by table::take_snapshot() to take snapshot. For the latter, there's also a database -> sharded<directory_semaphore> reference.
This PR sanitizes the semaphore usage. The results are
- directory_semaphore no longer needs to friend several classes that mess with its internals
- database no longer references directory_semaphore
Closesscylladb/scylladb#18281
* github.com:scylladb/scylladb:
database: Keep local directory_semaphore to initialize sstables managers
database: Don't reference directory_semaphore
table: Use directory semaphore from sstables manager
table: Indentation fix after previous patch
table: Use directory_semaphore for rate-limited snapshot taking
sstables: Move directory_semaphore::parallel_for_each() to header
sstables: Move parallel_for_each_restricted to directory_semaphore
table: Use smp::all_cpus() to iterate over all CPUs locally
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
It's a template and in order to use it in other .cc files it's more
convenient to move it into a header file
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In order not to make sstable_directory mess with private members of this
class. Next patch will also make use of this new method.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
sstables_manager now depends on system_keyspace for access to the
system.sstables table, needed by object storage. This violates
modularity, since sstables_manager is a relatively low-level leaf
module while system_keyspace integrates large parts of the system
(including, indirectly, sstables_manager).
One area where this is grating is sstables::test_env, which has
to include the much higher level cql_test_env to accommodate it.
Fix this by having sstables_manager expose its dependency on
system_keyspace as an interface, sstables_registry, and have
system_keyspace implement the glue logic in
system_keyspace_sstables_manager.
Closesscylladb/scylladb#17868
load-and-stream is currently the only method -- for tablets -- that
can load SSTables while the node is online.
Today, sstable_directory relies on replication map (erm) not being
invalidated during loading, and the assumption is broken with
concurrent tablet migration.
It causes load-and-stream to segfault.
The sstable loader needs the sharder from erm in order to compute
the owning shard.
To fix, let's use auto_refreshing_sharder, which refreshes sharder
every time table has replication map updated. So we guarantee any
user of sharder will find it alive throughout the lifetime of
sstable_directory.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
The helper in question complicates the logic of sstable_directory::process() by making garbage collection differently for sstables deleted "atomically" and deleted "one-by-one". Also, the code that deletes sstables one-by-one and uses remove_by_toc_name() renders excessive TOC file reading, because there's sstable object at hand and it had all_components() ready for use.
Surprisingly, there was no test for the deletion-log functionality. This PR adds one. The test passes before the g.c. and regular unlink fix, and (of course) continues passing after it.
Closesscylladb/scylladb#16240
* github.com:scylladb/scylladb:
sstables: Drop remove_by_name()
sstables/fs_storage: Wipe by recognized+unrecognized components
sstable_directory: Enlight deletion log replay
sstables: Split remove_by_toc_name()
test: Add test case to validate deletion log work
sstable_directory: Close dir on exception
sstable_directory: Fix indentation after previous patch
sstable_directory: Coroutinize delete_with_pending_deletion_log()
test: Sstable on_delete() is not necessarily in a thread
sstable_directory: Split delete_with_pending_deletion_log()
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Garbage collection of sstables is scattered between two strages -- g.c.
per-se and the regular processing.
The former stage collects deletion logs and for each log found goes
ahead and deletes the full sstable with the standard sequence:
- move TOC -> TOC.tmp
- remove components
- remove TOC.tmp
The latter stage picks up partially unlinked sstables that didn't go via
atomic deletion with the log. This comes as
- collect all components
- keep TOC's and TOC.tmp's in separate lists
- attach other components to TOC/TOC.tmp by generation value
- for all TOC.tmp's get all attached components and remove them
- continue loading TOC's with attached components
Said that, replaying deletion log can be as light as just the first step
out of the above sequence -- just move TOC to TOC.tmp. After that the
regular processing would pick the remaining components and clean them
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When committing the deletion log creation its containing directory is
sync-ed via opened file. This place is not exception safe and directory
can be left unclosed
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper consists of three parts -- prepare the deletion log, unlink
sstables and drop the deletion log. For testing the first part is needed
as a separate step, so here's this split.
It renders two nested async contexts, but it will change soon.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now when the system.sstables has the state field, it can be changed
(UPDATEd). However, when changing the state AND generation, this still
won't work, because generation is the clustering key of the table in
question and cannot be just changed. This, nonetheless, is OK, as
generation changes with state only when moving an sstable from upload
dir into normal/staging and this is separate issue for S3 (#13018). For
now changing state only is OK.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The state is one of <empty>(normal)/staging/quarantine. Currently when
sstable is moved to non-normal state the s3 backend state_change() call
throws thus such sstables do not appear. Next patches are going to
change that and the new field in the system.sstables is needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>