scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-07 07:23:15 +00:00

Author	SHA1	Message	Date
Nadav Har'El	edfb89ef65	sstables: stop warning when auto-snapshot leaves non-empty directory When a table is dropped, we delete its sstables, and finally try to delete the table's top-level directory with the rmdir system call. When the auto-snapshot feature is enabled (this is still Scylla's default), the snapshot will remain in that directory so it won't be empty and cannot be removed. Today, this results in a long, ugly and scary warning in the log: ``` WARN 2023-07-06 20:48:04,995 [shard 0] sstable - Could not remove table directory "/tmp/scylla-test-198265/data/alternator_alternator_Test_1688665684546/alternator_Test_1688665684546-4238f2201c2511eeb15859c589d9be4d/snapshots": std::filesystem::__cxx11::filesystem_error (error system:39, filesystem error: remove failed: Directory not empty [/tmp/scylla-test-198265/data/alternator_alternator_Test_1688665684546/alternator_Test_1688665684546-4238f2201c2511eeb15859c589d9be4d/snapshots]). Ignored. ``` It is bad to log as a warning something which is completely normal - it happens every time a table is dropped with the perfectly valid (and even default) auto-snapshot mode. We should only log a warning if the deletion failed because of some unexpected reason. And in fact, this is exactly what the code tried to do - it does not log a warning if the rmdir failed with EEXIST. It even had a comment saying why it was doing this. But the problem is that in Linux, deleting a non-empty directory does not return EEXIST, it returns ENOTEMPTY... POSIX actually allows both. So we need to check both, and this is the only change in this patch. To confirm that this patch works, edit test/cql-pytest/run.py and change auto-snapshot from 0 to 1, run test/alternator/run (for example) and see many "Directory not empty" warnings as above. With this patch, none of these warnings appear. Fixes #13538 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #14557	2023-07-07 11:08:10 +02:00
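The check described in this commit can be sketched in standard C++ as follows. This is a minimal illustration, not Scylla's code: the names `rmdir_outcome` and `try_remove_dir` are hypothetical, and the sketch uses `std::filesystem::remove` rather than whatever call the patch itself touches. It treats both ENOTEMPTY and EEXIST as the expected "directory still has content" case.

```cpp
#include <filesystem>
#include <system_error>

// Hypothetical names, for illustration only.
enum class rmdir_outcome { removed_or_absent, not_empty, unexpected_error };

// Try to remove a directory and classify the result. A non-empty directory
// is reported separately so that the caller can skip the warning for it.
rmdir_outcome try_remove_dir(const std::filesystem::path& dir) {
    std::error_code ec;
    std::filesystem::remove(dir, ec);  // non-throwing overload
    if (!ec) {
        // Removed, or the path did not exist in the first place.
        return rmdir_outcome::removed_or_absent;
    }
    // POSIX allows rmdir() to fail with either ENOTEMPTY or EEXIST when the
    // directory is not empty, so both are treated as the benign case.
    if (ec == std::errc::directory_not_empty || ec == std::errc::file_exists) {
        return rmdir_outcome::not_empty;
    }
    return rmdir_outcome::unexpected_error;
}
```

A caller would log a warning only for `rmdir_outcome::unexpected_error`.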
Kefu Chai	04434c02b3	sstables: print generation without {:d} the formatter for sstables::generation_type does not support the "d" specifier, so we should not use "{:d}" for printing it. this worked before `d7c90b5239`, but after that change, generation_type is not an alias of int64_t anymore. and its formatter does not support "d", so we should either specialize fmt::formatter<generation_type> to support it or just drop the specifier. since seastar::format() is using ```c++ fmt::format_to(fmt::appender(out), fmt::runtime(fmt), std::forward<A>(a)...); ``` to print the arguments with given fmt string, we cannot identify this kind of error at compile time. at runtime, if we have issues like this, {fmt} would throw exception like: ``` terminate called after throwing an instance of 'fmt::v9::format_error' what(): invalid format specifier ``` when constructing the `std::runtime_error` instance. so, in this change, "d" is removed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14427	2023-07-03 13:53:13 +03:00
Raphael S. Carvalho	1d8cb32a5d	table: Optimize creation of reader excluding staging for view building View building from staging creates a reader from scratch (memtable + sstables - staging) for every partition, in order to calculate the diff between new staging data and data in base sstable set, and then pushes the result into the view replicas. perf shows that the reader creation is very expensive: + 12.15% 10.75% reactor-3 scylla [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes + 10.01% 9.99% reactor-3 scylla [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 8.95% 8.94% reactor-3 scylla [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator() + 7.29% 7.28% reactor-3 scylla [.] dht::ring_position_tri_compare + 6.28% 6.27% reactor-3 scylla [.] dht::tri_compare + 4.11% 3.52% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+ 4.09% 4.07% reactor-3 scylla [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state + 3.46% 0.93% reactor-3 scylla [.] sstables::sstable_run::will_introduce_overlapping + 2.53% 2.53% reactor-3 libstdc++.so.6 [.] std::_Rb_tree_increment + 2.45% 2.45% reactor-3 scylla [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 2.14% 2.13% reactor-3 scylla [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> > + 2.07% 2.07% reactor-3 scylla [.] 
logalloc::region_impl::free + 2.06% 1.91% reactor-3 scylla [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()#1}::operator()() const::{lambda()#1}::operator() + 2.04% 2.04% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+ 1.87% 0.00% reactor-3 [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe + 1.86% 0.00% reactor-3 [kernel.kallsyms] [k] do_syscall_64 + 1.39% 1.38% reactor-3 libc.so.6 [.] __memcmp_avx2_movbe + 1.37% 0.92% reactor-3 scylla [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables:: + 1.34% 1.33% reactor-3 scylla [.] logalloc::region_impl::alloc_small + 1.33% 1.33% reactor-3 scylla [.] seastar::memory::small_pool::add_more_objects + 1.30% 0.35% reactor-3 scylla [.] seastar::reactor::do_run + 1.29% 1.29% reactor-3 scylla [.] seastar::memory::allocate + 1.19% 0.05% reactor-3 libc.so.6 [.] syscall + 1.16% 1.04% reactor-3 scylla [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst + 1.07% 0.79% reactor-3 scylla [.] sstables::partitioned_sstable_set::insert That shows some significant amount of work for inserting sstables into the interval map and maintaining the sstable run (which sorts fragments by first key and checks for overlapping). 
The interval map is known for having issues with L0 sstables, as it will have to be replicated almost to every single interval stored by the map, causing terrible space and time complexity. With enough L0 sstables, it can fall into quadratic behavior. This overhead is fixed by not building a new fresh sstable set when recreating the reader, but rather supplying a predicate to sstable set that will filter out staging sstables when creating either a single-key or range scan reader. This could have another benefit over today's approach which may incorrectly consider a staging sstable as non-staging, if the staging sst wasn't included in the current batch for view building. With this improvement, view building was measured to be 3x faster. from INFO 2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s to INFO 2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s Refs #14089. Fixes #14244. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-06-26 22:30:39 -03:00
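The idea of filtering with a predicate instead of building a fresh sstable set can be sketched with toy types. Everything here (`toy_sstable`, `for_each_selected`, `excludes_staging`) is hypothetical and much simpler than Scylla's sstable set; it only shows that the existing container is visited in place and nothing is re-inserted.

```cpp
#include <functional>
#include <vector>

// Toy stand-in for an sstable: only the fields needed for the illustration.
struct toy_sstable {
    long generation;
    bool staging;
};

using sstable_predicate = std::function<bool(const toy_sstable&)>;
using sstable_visitor = std::function<void(const toy_sstable&)>;

// Visit the sstables of an existing set that satisfy `keep`. No new set is
// built, so there is no per-call insertion cost.
void for_each_selected(const std::vector<toy_sstable>& all,
                       const sstable_predicate& keep,
                       const sstable_visitor& visit) {
    for (const auto& sst : all) {
        if (keep(sst)) {
            visit(sst);
        }
    }
}

// Predicate that filters out staging sstables.
bool excludes_staging(const toy_sstable& sst) {
    return !sst.staging;
}
```

The trade-off in this sketch is one predicate call per sstable visited, in exchange for not rebuilding the set for every partition.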
Kefu Chai	f014ccf369	Revert "Revert "Merge 'treewide: add uuid_sstable_identifier_enabled support' from Kefu Chai"" This reverts commit `562087beff`. The regressions introduced by the reverted change have been fixed. So let's revert this revert to resurrect the uuid_sstable_identifier_enabled support. Fixes #10459	2023-06-21 13:02:40 +03:00
Tomasz Grabiec	34f28aa0cb	sstables: Add trace-level logging related to shard calculation	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	18f567385c	sstable_directory: Improve trace-level logging	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	ad983ac23d	sstables: Compute sstable shards using sharder from erm when loading schema::get_sharder() does not use the correct sharder for tablet-based tables. Code which is supposed to work with all kinds of tables should obtain the sharder from erm::get_sharder().	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	17d6163548	sstables: Generate sharding metadata using sharder from erm when writing We need to keep sharding metadata consistent with tablet mapping to shards in order for node restart to detect that those sstables belong to a single shard and that resharding is not necessary. Resharding of sstables based on tablet metadata is not implemented yet and will abort after this series. Keeping sharding metadata accurate for tablets is only necessary until compaction group integration is finished. After that, we can use the sstable token range to determine the owning tablet and thus the owning shard. Before that, we can't, because a single sstable may contain keys from different tablets, and the whole key range may overlap with keys which belong to other shards.	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	fe7922d65c	sstables: Move compute_shards_for_this_sstable() to load() Soon, compute_shards_for_this_sstable() will need to take a sharder object. open_data() is called indirectly from sstable::load() and directly after writing an sstable from various paths. The latter don't really need to compute shards, since the field is already set by the writer. In order to reduce code churn, move compute_shards_for_this_sstable() to the load() path only so that only load() needs to take the sharder.	2023-06-21 00:58:24 +02:00
Tomasz Grabiec	390bcf3fae	dht: Take sharder externally in splitting functions We need those functions to work with tablet sharder, which is not accessible through schema::get_sharder(). In order to propagate the right sharder, those functions need to take it externally rather than from the schema object. The sharder will come from the effective_replication_map attached to the table object. Those splitting functions are used when generating sharding metadata of an sstable. We need to keep this sharding metadata consistent with tablet mapping to shards in order for node restart to detect that those sstables belong to a single shard and that resharding is not necessary. Resharding of sstables based on tablet metadata is not implemented yet and will abort after this series. Keeping sharding metadata accurate for tablets is only necessary until compaction group integration is finished. After that, we can use the sstable token range to determine the owning tablet and thus the owning shard. Before that, we can't, because a single sstable may contain keys from different tablets, and the whole key range may overlap with keys which belong to other shards.	2023-06-21 00:58:24 +02:00
Botond Dénes	562087beff	Revert "Merge 'treewide: add uuid_sstable_identifier_enabled support' from Kefu Chai" This reverts commit `d1dc579062`, reversing changes made to `3a73048bc9`. Said commit caused regressions in dtests. We need to investigate and fix those, but in the meanwhile let's revert this to reduce the disruption to our workflows. Refs: #14283	2023-06-19 08:49:27 +03:00
Kefu Chai	2d265e860d	replica,sstable: introduce invalid generation id the invalid sstable id is the NULL of a sstable identifier. with this concept, it would be a lot simpler to find/track the greatest generation. the complexity is hidden in the generation_type, which compares the a) integer-based identifiers b) uuid-based identifiers c) invalid identifier in different ways. so, in this change * the default constructor of generation_type is now public. * we don't check for empty generation anymore when loading SSTables or enumerating them. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-06-15 17:54:59 +08:00
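Why a "NULL" identifier simplifies tracking the greatest generation can be shown with a toy model. This sketch is not `sstables::generation_type`: `toy_generation` is a hypothetical `std::variant`, a `std::string` stands in for a UUID, and the ordering (invalid below integers below UUID strings) is simply what `std::variant` provides, assumed here for illustration only.

```cpp
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical model: std::monostate is the invalid ("NULL") identifier,
// std::int64_t an integer-based one, std::string a stand-in for a UUID.
using toy_generation = std::variant<std::monostate, std::int64_t, std::string>;

bool is_valid(const toy_generation& g) {
    return !std::holds_alternative<std::monostate>(g);
}

// Find the greatest generation. Starting from the default-constructed
// (invalid) value means an empty input needs no special case.
toy_generation greatest(const std::vector<toy_generation>& gens) {
    toy_generation best;  // invalid; compares below every valid value
    for (const auto& g : gens) {
        if (best < g) {
            best = g;
        }
    }
    return best;
}
```

With an empty input the result is simply invalid, so callers check `is_valid()` rather than handling an optional or a sentinel integer.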
Kefu Chai	939fa087cc	sstables, replica: pass uuid_sstable_identifiers to generation generator before this change, we assume that generation is always integer based. in order to enable the UUID-based generation identifier if the related option is set, we should populate this option down to generation generator. because we don't have access to the cluster features in some places where a new generation is created, a new accessor exposing feature_service from sstable manager is added. Fixes #10459 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-06-15 17:54:59 +08:00
Kefu Chai	15543464ce	sstables, replica: support UUID in generation_type this change generalizes the value of generation_type so it also supports UUID based identifier. * sstables/generation_type.h: - add formatter and parser for UUID. please note, Cassandra uses a different format for formatting the SSTable identifier. and this formatter suits our needs as it uses underscore "_" as the delimiter, as the file name of components uses dash "-" as the delimiter. instead of reinventing the formatting or just using another delimiter in the stringified UUID, we choose to use Cassandra's formatting. - add accessors for accessing the type and value of generation_type - add constructor for constructing generation_type with UUID and string. - use hash for placing sstables with uuid identifiers into shards for more uniform distribution of tables in shards. * replica/table.cc: - only update the generator if the given generation contains an integer * test/boost: - add a simple test to verify the generation_type is able to parse and format Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-06-15 17:54:59 +08:00
Pavel Emelyanov	d1de796f6b	sstable: Move XFS renamer hack into fs storage The method sits on sstable, but is called only from fs storage and it's the only place that really needs it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14230	2023-06-14 12:35:04 +03:00
Botond Dénes	3479adc85f	Merge 'Prepare sstable_directory lister to garbage_collect() s3 stuff' from Pavel Emelyanov When scylla starts it collects dangling sstables from the datadir. It includes temporary sstable directories and pending-deletion log. S3-backed sstables cannot be garbage-collected like that, instead "garbage" entries from the ownership table should be processed. Currently the g.c. code is unaware of storage and scans datadir for whatever sstable it's called for. This PR prepares the garbage_collect() call to become virtual, but no-op for ownership-table lister. Proper S3 garbage-collecting is not yet here, it needs an extra patch to seastar http client. refs: #13024 Closes #14023 * github.com:scylladb/scylladb: sstable_directory: Do not collect filesystem garbage for S3-backed sstables sstable_directory: Deduplicate .process() location argument sstable_directory: Keep directory lister on stack sstable_directory: Use directory_lister API directly	2023-06-14 12:06:37 +03:00
Pavel Emelyanov	c68c154fb6	code: Reduce tracing/hh fanout There are some headers that include tracing/.hh ones despite all they need is forward-declared trace_state_ptr Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14155	2023-06-07 19:19:22 +03:00
Pavel Emelyanov	66e43912d6	code: Switch to seastar API level 7 In that level no io_priority_class-es exist. Instead, all the IO happens in the context of current sched-group. File API no longer accepts prio class argument (and makes io_intent arg mandatory to impls). So the change consists of - removing all usage of io_priority_class - patching file_impl's inheritants to updated API - priority manager goes away altogether - IO bandwidth update is performed on respective sched group - tune-up scylla-gdb.py io_queues command The first change is huge and was made semi-automatically by: - grep io_priority_class \| default_priority_class - remove all calls, found methods' args and class' fields Patching file_impl-s is smaller, but also mechanical: - replace io_priority_class& argument with io_intent* one - pass intent to lower file (if applicable) Dropping the priority manager is: - git-rm .cc and .hh - sed out all the #include-s - fix configure.py and cmakefile The scylla-gdb.py update is a bit hairy -- it needs to use task queues list for IO classes names and shares, but to detect whether it should, it checks that the "commitlog" group is present. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13963	2023-06-06 13:29:16 +03:00
Benny Halevy	c685ef9e71	partitioned_sstable_set: insert: return early if sst is already in the set Currently, partitioned_sstable_set::insert may erase a sstable from the set inadvertently, if an exception is thrown while (re-)inserting it. To prevent that, simply return early after detecting that insertion didn't take place, based on the unordered_set::insert result. This issue is theoretical, as there are no known cases of re-inserting sstables into the partitioned sstable set. Fixes #14060 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #14061	2023-05-29 23:03:25 +03:00
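The early-return pattern can be sketched with standard containers. The class below is a hypothetical toy, not `partitioned_sstable_set`: sstables are plain `int` ids and a `std::multimap` stands in for the secondary structures. The point is that the rollback path runs only when this call actually added the element.

```cpp
#include <cstddef>
#include <map>
#include <unordered_set>

// Toy set with a primary membership container and a secondary index.
class toy_partitioned_set {
    std::unordered_set<int> _all;           // primary membership
    std::multimap<int, int> _by_first_key;  // first key -> sstable id
public:
    // Returns true if the sstable was added, false if it was already there.
    bool insert(int sstable_id, int first_key) {
        auto [it, inserted] = _all.insert(sstable_id);
        if (!inserted) {
            // Already present: return before any step that could throw, so
            // the rollback below can never erase a pre-existing entry.
            return false;
        }
        try {
            _by_first_key.emplace(first_key, sstable_id);
        } catch (...) {
            _all.erase(it);  // undo only what this call added
            throw;
        }
        return true;
    }

    std::size_t size() const { return _all.size(); }
    std::size_t index_size() const { return _by_first_key.size(); }
};
```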
Benny Halevy	26705ba6af	partitioned_sstable_set: erase empty runs When erasing a sstable first check if its run_id exists in _all_runs, otherwise do nothing in that respect, and then if the run becomes empty when erasing the last sstable (and it could have been a single-sstable run from the get-go), erase the run from `_all_runs`. Fixes #14052 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #14054	2023-05-29 23:03:24 +03:00
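A minimal sketch of the erase logic described here, using a hypothetical `toy_run_index` in which run ids and sstables are plain integers. Only the member name `_all_runs` comes from the commit message; the rest is assumed for illustration.

```cpp
#include <cstddef>
#include <map>
#include <set>

// Toy run index: run_id -> ids of the sstables belonging to that run.
class toy_run_index {
    std::map<int, std::set<int>> _all_runs;
public:
    void insert(int run_id, int sstable_id) {
        _all_runs[run_id].insert(sstable_id);
    }

    void erase(int run_id, int sstable_id) {
        auto it = _all_runs.find(run_id);
        if (it == _all_runs.end()) {
            return;  // unknown run: nothing to do for the run index
        }
        it->second.erase(sstable_id);
        if (it->second.empty()) {
            _all_runs.erase(it);  // drop the run once its last sstable is gone
        }
    }

    std::size_t run_count() const { return _all_runs.size(); }
};
```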
Botond Dénes	5a14c3311a	Merge 'Break S3 upload 50Gb file limit' from Pavel Emelyanov Current S3 uploading sink has implicit limit for the final file size that comes from two places. First, S3 protocol declares that uploading parts count from 1 to 10000 (inclusive). Second, uploading sink sends out parts once they grow above S3 minimal part size which is 5Mb. Since sstables puts data in 128kb (or smaller) portions, parts are almost exactly 5Mb in size, so the total uploading size cannot grow above ~50Gb. That's too low. To break the limit the new sink (called jumbo sink) uses the UploadPartCopy S3 call that helps splicing several objects into one right on the server. Jumbo sink starts uploading parts into an intermediate temporary object called a piece and named ${original_object}_${piece_number}. When the number of parts in current piece grows above the configured limit the piece is finalized and upload-copied into the object as its next part, then deleted. This happens in the background, meanwhile the new piece is created and subsequent data is put into it. When the sink is flushed the current piece is flushed as is and also squashed into the object. The new jumbo sink is capable of uploading ~500Tb of data, which looks enough. fixes: #13019 Closes #13577 * github.com:scylladb/scylladb: sstables: Switch data and index sink to use jumbo uploader s3/test: Tune-up multipart upload test alignment s3/test: Add jumbo upload test s3/client: Wait for background upload fiber on close-abort c3/client: Implement jumbo upload sink s3/client: Move memory buffers to upload_sink from base s3/client: Move last part upload out of finalize_upload() s3/client: Merge do_flush() with upload_part() s3/client: Rename upload_sink -> upload_sink_base	2023-05-25 11:44:06 +03:00
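The size limits quoted in this merge follow from simple arithmetic, reproduced below as compile-time constants. The part count and minimal part size come from the description above. The assumption that a piece holds up to 10000 parts (the "configured limit" is not stated in the text) is mine, chosen because it matches the ~500Tb figure. All constant names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t MiB = 1024ULL * 1024ULL;

// From the description: parts are numbered 1..10000, and a part is sent
// once it reaches the 5Mb minimal part size.
constexpr std::uint64_t max_parts = 10000;
constexpr std::uint64_t part_size = 5 * MiB;

// Plain multipart upload: at most max_parts parts of about part_size each.
constexpr std::uint64_t plain_limit = max_parts * part_size;

// Jumbo upload: every part of the final object is itself a piece.
// ASSUMPTION: a piece is filled with up to max_parts parts before it is
// finalized; the real limit is configurable.
constexpr std::uint64_t parts_per_piece = max_parts;
constexpr std::uint64_t piece_size = parts_per_piece * part_size;
constexpr std::uint64_t jumbo_limit = max_parts * piece_size;
```

Under these assumptions `plain_limit` is 52,428,800,000 bytes (about 50Gb) and `jumbo_limit` is 524,288,000,000,000 bytes (about 500Tb), which agrees with the figures in the description.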
Pavel Emelyanov	e435ec1b5e	sstable_directory: Do not collect filesystem garbage for S3-backed sstables The sstable_directory::garbage_collect() scans /var/lib/scylla for whatever sstable it's called for. S3-backed ones don't have anything there, so the g.c. run is no-op. Make this call be lister virtual method, so that only filesystem lister does this scan and the ownership table lister becomes the real no-op. Later it will be filled with code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-24 17:45:50 +03:00
Pavel Emelyanov	16d66f2fe9	sstable_directory: Deduplicate .process() location argument When sstable directory calls lister it passes the _sstable_dir as an argument. However, the very same _sstable_dir was used to construct the lister, and by now all the lister implementations keep this value aboard. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-24 17:43:36 +03:00
Pavel Emelyanov	d6b5e18cb3	sstable_directory: Keep directory lister on stack The directory_lister _lister exists as class member, but is only used once -- when the .process() is called -- and then is closed forever. It's simpler to keep the lister on the .process() stack. This change also makes filesystem lister keep the copy of directory as class member, it will be useful for the next patch as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-24 17:42:08 +03:00
Pavel Emelyanov	524614087a	sstable_directory: Use directory_lister API directly The filesystem components lister has private wrappers on top of directory lister it uses internally. These are leftovers from making the sstable directory storage-aware, now they can be removed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-24 17:40:38 +03:00
Botond Dénes	2526b232f1	Merge 'Remove explicit default_priority_class() usage from sstable aux methods' from Pavel Emelyanov There are few places in sstables/ code that require caller to specify priority class to pass it along to file stream options. All these callers use default class, so it makes little sense to keep it. This change makes the sched classes unification mega patch a bit smaller. ref: #13963 Closes #13996 * github.com:scylladb/scylladb: sstables: Remove default prio class from rewrite_statistics() sstables: Remove prio class from validate_checksums subs sstables: Remove always default io-prio from validate_checksums()	2023-05-24 09:23:24 +03:00
Pavel Emelyanov	6c453df9d7	sstables: Remove default prio class from rewrite_statistics() The method is called with explicitly default priority class and puts one into the fstream options. This whole chain can be avoided Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-23 13:54:31 +03:00
Pavel Emelyanov	438132ad4b	sstables: Remove prio class from validate_checksums subs The sstable.read_checksum() and .read_digest() accept prio class argument from validate_checksums(), but it's always the "default" one. Remove the arg and remove stream options initializations as they'll pick up default prio class on their default constructing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-23 13:54:31 +03:00
Pavel Emelyanov	7396d9d291	sstables: Remove always default io-prio from validate_checksums() All calls to sstables::validate_checksums() happen with explicitly default priority class. Just hard-code it as such in the method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-23 13:54:31 +03:00
Pavel Emelyanov	2bb024c948	index_reader: Introduce and use default arguments to constructor Most of creators of index_reader construct it with default prio class, null trace pointer and use_caching::yes. Assigning implicit defaults to constructor arguments keeps the code shorter and easier to read. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-23 11:29:04 +03:00
Pavel Emelyanov	3fd5d3cc2b	index_reader: Use _pc field in get_file_input_stream_options() directly No need to pass this-> field into this-> call Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-23 11:18:14 +03:00
Pavel Emelyanov	21d24e8ea3	index_reader: Move index_reader::get_file_input_stream_options to private: block A "while at it" cleanup. When patching the method (next patch) it turned out that there are no other callers other than local class, so it _is_ private. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-23 11:18:14 +03:00
Botond Dénes	3b424e391b	Merge 'perform_cleanup: wait until all candidates are cleaned up' from Benny Halevy cleanup_compaction should resolve only after all sstables that require cleanup are cleaned up. Since it is possible that some of them are in staging and therefore cannot be cleaned up, retry once a second until they become eligible. Timeout if there is no progress within 5 minutes to prevent hanging due to view building bug. Fixes #9559 Closes #13812 * github.com:scylladb/scylladb: table: signal compaction_manager when staging sstables become eligible for cleanup compaction_manager: perform_cleanup: wait until all candidates are cleaned up compaction_manager: perform_cleanup: perform_offstrategy if needed compaction_manager: perform_cleanup: update_sstables_cleanup_state in advance sstable_set: add for_each_sstable_gently* helpers	2023-05-19 12:35:59 +03:00
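The "retry until done, give up when there is no progress" policy can be sketched without any timers. This is a deterministic simulation, not the compaction manager's code: `wait_for_cleanup` and `cleanup_wait_result` are hypothetical names, each loop iteration stands for one retry (one second in the description), and `max_idle_ticks` stands for the no-progress timeout (300 ticks would model the 5 minutes mentioned above).

```cpp
#include <cstddef>
#include <functional>

enum class cleanup_wait_result { done, timed_out };

// `poll` returns how many sstables still require cleanup. The loop ends when
// none remain, or when `max_idle_ticks` consecutive polls show no progress.
cleanup_wait_result wait_for_cleanup(const std::function<std::size_t()>& poll,
                                     std::size_t max_idle_ticks) {
    std::size_t remaining = poll();
    std::size_t idle_ticks = 0;
    while (remaining > 0) {
        if (idle_ticks >= max_idle_ticks) {
            return cleanup_wait_result::timed_out;
        }
        // A real implementation would sleep for one second here.
        const std::size_t now = poll();
        if (now < remaining) {
            idle_ticks = 0;  // progress was made: restart the timeout window
        } else {
            ++idle_ticks;
        }
        remaining = now;
    }
    return cleanup_wait_result::done;
}
```

Resetting the idle counter on every bit of progress means a slow but advancing cleanup never times out in this sketch; only a stalled one does.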
Botond Dénes	c2aee26278	Merge 'Keep sstables garbage collection in sstable_directory' from Pavel Emelyanov Currently temporary directories with incomplete sstables and pending deletion log are processed by distributed loader on start. That's not nice, because for s3 backed sstables this code makes no sense (and is currently a no-op because of incomplete implementation). This garbage collecting should be kept in sstable_directory where it can off-load this work onto lister component that is storage-aware. Once g.c. code moved, it allows to clean the class sstable list of static helpers a bit. refs: #13024 refs: #13020 refs: #12707 Closes #13767 * github.com:scylladb/scylladb: sstable: Toss tempdir extension usage sstable: Drop pending_delete_dir_basename() sstable: Drop is_pending_delete_dir() helper sstable_directory: Make garbage_collect() non-static sstable_directory: Move deletion log exists check distributed_loader: Move garbage collecting into sstable_directory distributed_loader: Collect garbace collecting in one call sstable: Coroutinize remove_temp_dir() sstable: Coroutinize touch_temp_dir() sstable: Use storage::temp_dir instead of hand-crafted path	2023-05-19 08:50:13 +03:00
Kefu Chai	03be1f438c	sstables: move get_components_lister() into sstable_directory sstables_manager::get_component_lister() is used by sstable_directory. and almost all the "ingredients" used to create a component lister are located in sstable_directory. among the other things, the two implementations of `components_lister` are located right in `sstable_directory`. there is no need to outsource this to sstables_manager just for accessing the system_keyspace, which is already exposed as a public function of `sstables_manager`. so let's move this helper into sstable_directory as a member function. with this change, we can even go further by moving the `components_lister` implementations into the same .cc file. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13853	2023-05-18 08:43:35 +03:00
Kefu Chai	8bcbc9a90d	sstables: add an maybe_owned_by_this_shard() helper instead of encoding the fact that we are using generation identifier as a hint where the SSTable with this generation should be processed at the caller sites of `as_int()`, just provide an accessor on sstable_generation_generator's side. this helps to encapsulate the underlying type of generation in `generation_type` instead of exposing it to its users. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13846	2023-05-18 08:41:02 +03:00
Pavel Emelyanov	ed50fda1fe	sstable: Toss tempdir extension usage The tempdir for filesystem-based sstables is {generation}.sstable one. There are two places that need to know the ".sstable" extension -- the tempdir creating code and the tempdir garbage-collecting code. This patch simplifies the sstable class by patching the aforementioned functions to use newly introduced tempdir_extension string directly, without the help of static one-line helpers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-17 15:19:38 +03:00
Pavel Emelyanov	e8c0ae28b5	sstable: Drop pending_delete_dir_basename() The helper is used to return const char* value of the pending delete dir. Callers can use it directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-17 15:17:33 +03:00
Pavel Emelyanov	7792479865	sstable: Drop is_pending_delete_dir() helper It's only used by the sstable_directory::replay_pending_delete_log() method. The latter is only called by the sstable_directory itself with the path being pending-delete dir for sure. So the method can be made private and the is_pending_delete_dir() can be removed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-17 15:17:32 +03:00
Pavel Emelyanov	7429205632	sstable_directory: Make garbage_collect() non-static When non static the call can use sstable_directory::_sstable_dir path, not the provided argument. The main benefit is that the method can later be moved onto lister so that filesystem and ownership-table listers can process dangling bits differently. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-17 15:16:23 +03:00
Pavel Emelyanov	45adf61490	sstable_directory: Move deletion log exists check Check if the deletion log exists in the handling helper, not outside of it. This makes next patch shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-17 15:16:23 +03:00
Pavel Emelyanov	3d7122d2fe	distributed_loader: Move garbage collecting into sstable_directory It's the directory that owns the components lister and can reason about the way to pick up dangling bits, be it local directories or entries from the ownership table. First thing to do is to move the g.c. code into sstable_directory. While at it -- convert sstring dir into fs::path dir and switch logger. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-17 15:16:23 +03:00
Pavel Emelyanov	22299a31c8	sstable: Coroutinize remove_temp_dir() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-17 15:16:23 +03:00
Pavel Emelyanov	9db5e9f77f	sstable: Coroutinize touch_temp_dir() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-17 15:15:38 +03:00
Pavel Emelyanov	7e506354fd	sstable: Use storage::temp_dir instead of hand-crafted path When opening an sstable on filesystem it's first created in a temporary directory whose path is saved in storage::temp_dir variable. However, the opening method constructs the path by hand. Fix that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-17 15:14:04 +03:00
Benny Halevy	ff7c9c661d	sstable_set: add for_each_sstable_gently* helpers Currently callers of `for_each_sstable` need to use a seastar thread to allow preemption in the for_each_sstable loop. Provide for_each_sstable_gently and for_each_sstable_gently_until to make using this facility from a coroutine easier, without requiring a seastar thread. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-17 11:31:07 +03:00
Pavel Emelyanov	b58ad040d2	sstables: Switch data and index sink to use jumbo uploader These two can grow large. Non-jumbo sink is effectively limited with 10000 parts, since each is ~5Mb the maximum uploadable data/index happens to be 50Gb which is too small. Other components shouldn't grow that big and continue using simple and a bit faster uploading sink. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-16 12:23:18 +03:00
Botond Dénes	20ff122a84	Merge 'Delete S3 sstables without the help of deletion log' from Pavel Emelyanov There are two layers of sstables deletion -- delete-atomically and wipe. The former is in fact the "API" method, it's called by table code when the specific sstable(s) are no longer needed. It's called "atomically" because it's expected to fail in the middle in a safe manner so that subsequent boot would pick the dangling parts and proceed. The latter is a low-level removal function that can fail in the middle, but that's not _its_ concern. Currently the atomic deletion is implemented with the help of sstable_directory::delete_atomically() method that commits sstables files names into deletion log, then calls wipe (indirectly), then drops the deletion log. On boot all found deletion logs are replayed. The described functionality is used regardless of the sstable storage type, even for S3, though deletion log is an overkill for S3, it's better be implemented with the help of ownership table. In fact, S3 storage already implements atomic deletion in its wipe method thus being overly careful. So this PR - makes atomic deletion be storage-specific - makes S3 wipe non-atomic fixes: #13016 note: Replaying sstables deletion from ownership table on boot is not here, see #13024 Closes #13562 * github.com:scylladb/scylladb: sstables: Implement atomic deleter for s3 storage sstables: Get atomic deleter from underlying storage sstables: Move delete_atomically to manager and rename	2023-05-15 08:57:47 +03:00
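The filesystem flavour of atomic deletion described here (commit names to a log, wipe, drop the log, replay on boot) can be sketched with the standard library. This is an illustration only: the log file name `pending_delete.log`, the one-name-per-line format, and all function names are hypothetical, and the fsync and rename steps a durable implementation needs are omitted.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical log location, for illustration only.
fs::path deletion_log_path(const fs::path& dir) {
    return dir / "pending_delete.log";
}

// Step 1: commit the names of the files to delete into the log.
void write_deletion_log(const fs::path& dir, const std::vector<std::string>& names) {
    std::ofstream log(deletion_log_path(dir), std::ios::trunc);
    for (const auto& name : names) {
        log << name << '\n';
    }
}  // the stream is flushed and closed here

// Steps 2 and 3, also usable on boot: remove every file named in the log
// (files that are already gone are fine), then drop the log itself.
void replay_deletion_log(const fs::path& dir) {
    const fs::path log_path = deletion_log_path(dir);
    {
        std::ifstream log(log_path);
        if (!log) {
            return;  // no pending deletion
        }
        std::string name;
        while (std::getline(log, name)) {
            std::error_code ec;
            fs::remove(dir / name, ec);  // low-level "wipe"; may be a repeat
        }
    }
    std::error_code ec;
    fs::remove(log_path, ec);
}

// The "API" level call: if the process dies anywhere after the log is
// written, the next replay_deletion_log() finishes the job.
void delete_atomically(const fs::path& dir, const std::vector<std::string>& names) {
    write_deletion_log(dir, names);
    replay_deletion_log(dir);
}
```

Calling `write_deletion_log()` alone and then `replay_deletion_log()` models a crash after the log was committed followed by a restart.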
Avi Kivity	5d6f31df8e	Merge 'Coroutinize sstable::read_toc()' from Pavel Emelyanov It consists of two parts -- a call to do_read_simple() with a lambda, and the handling of its results. The PR coroutinizes it in two steps for review simplicity -- first the lambda, then the outer caller. Then it restores indentation. Closes #13862 * github.com:scylladb/scylladb: sstables: Restore indentation after previous patches; sstables: Coroutinuze read_toc() outer part; sstables: Coroutinuze read_toc() inner part	2023-05-14 14:14:23 +03:00
Avi Kivity	0a78995e2b	Merge 'Share s3 clients between sstables' from Pavel Emelyanov Currently s3::client is created for each sstable::storage. It's later shared between the sstable's files and upload sink(s). Also, foreign_sstable_open_info can produce a file from a handle, making a new standalone client. Coupled with seastar's http client spawning connections on demand, this makes it impossible to control the number of open connections to the object storage server. In order to put some policy on top of that (as well as apply workload prioritization), s3 clients should be collected in one place and then shared by users. Since s3::client uses seastar::http::client under the hood which, in turn, can generate many connections on demand, it's enough to produce a single s3::client per configured endpoint on each shard and then share it between all the sstables, files and sinks. There's one difficulty, however, and solving it is most of what this PR does. The file handle that's used to transfer an sstable's file across shards should carry everything it needs to re-create a file on another shard. Since there's a single s3::client per shard, creating a file out of a handle should grab that shard's client somehow. The meaningful shard-local object that can help is the sstables_manager, and there are three ways to make use of it. All deal with the fact that sstables_manager-s are not sharded<> services, but are owned by the database independently on each shard. 1. walk the client -> sst.manager -> database -> container -> database -> sst.manager -> client chain by keeping its first half on the handle and unrolling the second half to produce a file; 2. keep a sharded peering service referenced by the sstables_manager that's initialized in main and passed through the database constructor down to sstables_manager(s); 3. equip file_handle::to_file with the "context" argument and teach the sstables foreign info opener to push sstables_manager down to s3 file ... somehow. This PR chooses the 2nd way and introduces the sstables::storage_manager main-local sharded peering service that maintains all the s3::clients. "While at it" the new manager gets the object_storage_config updating facilities from the database (it's overloaded even without it already). Later the manager will also be in charge of collecting and exporting S3 metrics. In order to limit the number of S3 connections it also needs a patch to seastar's http::client; there's a PR already doing that, and once (if) it is merged there'll be one more fix on top. refs: #13458 refs: #13369 refs: scylladb/seastar#1652 Closes #13859 * github.com:scylladb/scylladb: s3: Pick client from manager via handle; s3: Generalize s3 file handle; s3: Live-update clients' configs; sstables: Keep clients shared across sstables; storage_manager: Rewrap config map; sstables, database: Move object storage config maintenance onto storage_manager; sstables: Introduce sharded<storage_manager>	2023-05-14 14:14:23 +03:00
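
The size ceiling cited in the jumbo-uploader commit (b58ad040d2) follows from simple arithmetic: a fixed part count multiplied by a fixed part size. The sketch below only reproduces that calculation; the constant and function names are hypothetical and do not come from the Scylla source, and the part size is taken as exactly 5 MiB although the commit says only "~5Mb".

```python
# Hypothetical names; the values mirror the figures quoted in the commit message.
MAX_PARTS = 10_000      # part limit of a single (non-jumbo) multipart upload
PART_SIZE_MIB = 5       # assumed part size in MiB (the commit says "~5Mb")


def max_upload_mib(max_parts: int = MAX_PARTS, part_size_mib: int = PART_SIZE_MIB) -> int:
    """Largest object, in MiB, that fits in one multipart upload."""
    return max_parts * part_size_mib


total_mib = max_upload_mib()
# 50000 MiB is roughly the "50Gb" figure quoted in the commit.
print(f"{total_mib} MiB = {total_mib / 1024:.1f} GiB")
```

Under these assumptions the ceiling is 50000 MiB, a little under 49 GiB, which is why the data and index components, which can grow beyond that, were moved to the jumbo uploader.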


3188 Commits