scylladb

Author	SHA1	Message	Date
Pavel Emelyanov	11a04bfb66	code: Introduce restore API method The method starts a task that uses sstables_loader load-and-stream functionality to bring new sstables into the cluster. The existing load-and-stream picks up sstables from the upload/ directory; the newly introduced task collects them from an S3 bucket and a given prefix (which corresponds to the path where the backup API method put them). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-28 15:42:49 +03:00
Pavel Emelyanov	1f3f0b1926	sstable_loader: Add sstables::storage_manager dependency The storage_manager maintains a set of clients for the configured object storage(s). The sstables loader is going to spawn tasks that will talk to those storages, thus it needs the storage manager to get the clients from. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	06c3c53deb	sstable_loader: Maintain task manager module This service is going to start tasks managed by task manager. For that, it should have its module set up and registered. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	9cf95e8a07	sstable_loader: Out-line constructor It will grow and become more complicated. Better to have it outside the header. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	ae622d711e	sstables-loader: Run loading in its scheduling group Now the loading code has two different paths, and only one of them switches sched group. It's cleaner and more natural to switch the sched group in the loader itself, so that all code paths run in it and don't care about switching. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-28 11:07:58 +03:00
Pavel Emelyanov	b728857954	distributed_loader: Remove system_distributed_keyspace and view_update_generator Now all the code is happy with view_builder and can be shortened Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:47 +03:00
Pavel Emelyanov	0d946a5fdf	distributed_loader: Propagate view_builder& via process_upload_dir() Preparation to next patches, they'll make use of this new argument Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Kefu Chai	a439ebcfce	treewide: include fmt/ranges.h and/or fmt/std.h before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we include `fmt/ranges.h` and/or `fmt/std.h` for formatting the container types, like vector, map, optional and variant using {fmt} instead of the homebrew formatter based on operator<<. with this change, the changes adding fmt::formatter and the changes using ostream formatter explicitly, we are allowed to drop `FMT_DEPRECATED_OSTREAM` macro. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-19 22:56:16 +08:00
Pavel Emelyanov	1c1004d1bd	sstables_loader: Format list of sstables' filenames in place Loader wants to print set of sstables' names. For that it collects names into a dedicated vector, then prints it using fmt/ranges facility. There's a way to achieve the same goal without allocating extra vector with names -- use fmt::format() and pass it a range converting sstables into their names. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18159	2024-04-04 12:09:52 +03:00
Raphael S. Carvalho	6bdb456fad	sstables_loader: Fix loader when write selector is previous during tablet migration The loader is writing to pending replica even when write selector is set to previous. If migration is reverted, then the writes won't be rolled back as it assumes pending replicas weren't written to yet. That can cause data resurrection if tablet is later migrated back into the same replica. NOTE: write selector is handled correctly when set to next, because get_natural_endpoints() will return the next replica set, and none of the replicas will be considered leaving. And of course, selector set to both is also handled correctly. Fixes #17892. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#17902	2024-03-24 01:20:50 +01:00
Raphael S. Carvalho	6115c113fe	sstables_loader: Don't discard sstable that is not fully exhausted Affects load-and-stream for tablets only. The intention is that only this loop is responsible for detecting exhausted sstables and then discarding them for next iterations: while (sstable_it != _sstables.rend() && exhausted(*sstable_it)) { sstable_it++; } But the loop which consumes non-exhausted sstables, on behalf of each tablet, was incorrectly advancing the iterator even though the sstable wasn't considered exhausted. Fixes #17733. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#17899	2024-03-20 09:11:59 +02:00
Raphael S. Carvalho	771cbf9b79	sstables_loader: Stream to pending tablet replica if needed Even though taking erm blocks migration, it cannot prevent the load-and-stream to start while a migration is going on, erm only prevents migration from advancing. With tablets, new data will be streamed to pending replica too if the write replica selector, in transition metadata, is set to both. If migration is at a later stage where only new replica is written to, then data is streamed only to new replica as selector is set to next (== new replica set). primary_replica_only flag is handled by only streaming to pending if the primary replica is the one leaving through migration. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-02-27 15:17:05 -03:00
Raphael S. Carvalho	ab498489fe	sstables_loader: Implement tablet based load-and-stream Similar treatment to repair is given to load-and-stream. Jumps into a new streaming session for every tablet, so we guarantee data will be segregated into tablets co-habiting the same shard. Fixes #17315. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-02-27 13:04:20 -03:00
Raphael S. Carvalho	b9158e36ef	sstables_loader: Virtualize sstable_streamer for tablet virtualization allows for tablet version of streaming. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-02-27 11:30:14 -03:00
Raphael S. Carvalho	3523cc8063	sstables_loader: Avoid reallocations in vector Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-02-27 11:28:11 -03:00
Raphael S. Carvalho	d1db17d490	sstable_loader: Decouple sstable streaming from selection That will make it easy to introduce tablet-based load-and-stream. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-02-27 11:28:11 -03:00
Raphael S. Carvalho	0a41f2a11f	sstables_loader: Introduce sstable_streamer Will make it easier to implement tablet oriented variant. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-02-27 11:28:11 -03:00
Benny Halevy	7a7a1db86b	sstables_loader: load_new_sstables: auto-enable load-and-stream for tablets And call on_internal_error if process_upload_dir is called for tablets-enabled keyspace as it isn't supported at the moment (maybe it could be in the future if we make sure that the sstables are confined to tablets boundaries). Refs #12775 Fixes #16743 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#16788	2024-01-16 18:43:52 +02:00
Tomasz Grabiec	fd3c089ccc	service: range_streamer: Propagate topology_guard to receivers	2023-12-06 18:36:16 +01:00
Botond Dénes	5a73c3374e	sstables_loader: opt-in for compacting the stream No point in loading expired/covered data.	2023-07-27 03:22:11 -04:00
Botond Dénes	42b0dd5558	replica/table: add optional compacting to make_streaming_reader() Opt-in is possible by passing an engaged `compaction_time` (gc_clock::time_point) to the method. When this new parameter is disengaged, no compaction happens. Note that there is a global override, via the enable_compacting_data_for_streaming_and_repair config item, which can force-disable this compaction. Compaction done on the output of the streaming reader does not garbage-collect tombstones! All call-sites are adjusted (the new parameter is not defaulted), but none opt in yet. This will be done in separate commit per user.	2023-07-27 03:22:11 -04:00
Kefu Chai	84683c3549	sstable_loader: update comment to reflect latest changes we have a dedicated facility for loading sstables since `68dfcf5256`, and column_family (i.e. table) is not responsible for loading new sstables. so update the comment to reflect this change. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14154	2023-06-06 14:31:15 +03:00
Tomasz Grabiec	9b17ad3771	locator: Introduce per-table replication strategy Will be used by tablet-based replication strategies, for which effective replication map is different per table. Also, this patch adapts existing users of effective replication map to use the per-table effective replication map. For simplicity, every table has an effective replication map, even if the erm is per keyspace. This way the client code can be uniform and doesn't have to check whether replication strategy is per table. Not all users of per-keyspace get_effective_replication_map() are adapted yet to work per-table. Those algorithms will throw an exception when invoked on a keyspace which uses per-table replication strategy.	2023-04-24 10:49:36 +02:00
Raphael S. Carvalho	fe6df3d270	sstable_loader: Discard SSTable bloom filter on load-and-stream Load-and-stream reads the entire content from SSTables, therefore it can afford to discard the bloom filter that might otherwise consume a significant amount of memory. Bloom filters are only needed by compaction and other replica::table operations that might want to check the presence of keys in the SSTable files, like single-partition reads. It's not uncommon to see Data:Filter ratio of less than 100:1, meaning that for ~300G of data, filters will take ~3G. In addition to saving memory footprint, it also reduces operation time as load-and-stream no longer has to read, parse and build the filters from disk into memory. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-04-13 11:34:22 -03:00
Botond Dénes	156e5d346d	reader_permit: keep trace_state pointer on permit And propagate it down to where it is created. This will be used to add trace points for semaphore related events, but this will come in the next patches.	2023-03-22 04:58:01 -04:00
Raphael S. Carvalho	fbeee8b65d	Optimize load-and-stream load-and-stream implements no policy when deciding which SSTables will go in each streaming round (batch of 16 SSTables), meaning the choice is random. It can take advantage of the fact that the LSM-tree layout, with ICS and LCS, is a set of SSTable runs, where each run is composed of SSTables that are disjoint in their key range. By sorting SSTables to be streamed by their first key, the effect is that SSTable runs will be incrementally streamed (in token order). SSTable runs in the same replica group (or in the same node) will have their content deduplicated, reducing significantly the amount of data we need to put on the wire. The improvement is proportional to the space amplification in the table, which again, depends on the compaction strategy used. Another important benefit is that the destination nodes will receive SSTables in token order, allowing off-strategy compaction to be more efficient. This is how I tested it: 1) Generated a 5GB dataset to a ICS table. 2) Started a fresh 2-node cluster. RF=2. 3) Ran load-and-stream against one of the replicas. BEFORE: $ time curl -X POST "http://127.0.0.1:10000/storage_service/sstables/keyspace1?cf=standard1&load_and_stream=true" real 4m40.613s user 0m0.005s sys 0m0.007s AFTER: $ time curl -X POST "http://127.0.0.1:10000/storage_service/sstables/keyspace1?cf=standard1&load_and_stream=true" real 2m39.271s user 0m0.005s sys 0m0.004s That's ~1.76x faster. That's explained by deduplication: BEFORE: INFO 2023-02-17 22:59:01,100 [shard 0] stream_session - [Stream #79d3ce7a-ea47-4b6e-9214-930610a18ccd] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3445376, received_partitions=2755835 INFO 2023-02-17 22:59:41,491 [shard 0] stream_session - [Stream #bc6bad99-4438-4e1e-92db-b2cb394039c8] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3308288, received_partitions=2836491 INFO 2023-02-17 23:00:20,585 [shard 0] stream_session - [Stream #e95c4f49-0a2f-47ea-b41f-d900dd87ead5] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3129088, received_partitions=2734029 INFO 2023-02-17 23:00:49,297 [shard 0] stream_session - [Stream #255cba95-a099-4fec-a72c-f87d5cac2b1d] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2544128, received_partitions=1959370 INFO 2023-02-17 23:01:33,110 [shard 0] stream_session - [Stream #96b5737e-30c7-4af8-a8b8-96fecbcbcbd0] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3624576, received_partitions=3085681 INFO 2023-02-17 23:02:20,909 [shard 0] stream_session - [Stream #3185a48b-fb9e-4190-88f4-5c7a386bc9bd] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3505024, received_partitions=3079345 INFO 2023-02-17 23:03:02,039 [shard 0] stream_session - [Stream #0d2964dc-d5e3-4775-825c-97f736d14713] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2808192, received_partitions=2655811 AFTER: INFO 2023-02-17 23:12:49,155 [shard 0] stream_session - [Stream #bf00963c-3334-4035-b1a9-4b3ceb7a188a] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2965376, received_partitions=1006535 INFO 2023-02-17 23:13:13,365 [shard 0] stream_session - [Stream #1cd2e3ac-a68b-4cb5-8a06-707e91cf59db] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3543936, received_partitions=1406157 INFO 2023-02-17 23:13:37,474 [shard 0] stream_session - [Stream #5a278230-6b4b-461f-8396-c15df7092d03] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3639936, received_partitions=1371298 INFO 2023-02-17 23:14:02,132 [shard 0] stream_session - [Stream #19f40dc3-e02a-4321-a917-a6590d99dd03] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3638912, received_partitions=1435386 INFO 2023-02-17 23:14:26,673 [shard 0] stream_session - [Stream #d47507eb-2067-4e8f-a4f7-c82d5fbd4228] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=3561600, received_partitions=1423024 INFO 2023-02-17 23:14:49,307 [shard 0] stream_session - [Stream #d42ee911-253a-4de6-ac89-6a3c05b88d66] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2382592, received_partitions=1452656 INFO 2023-02-17 23:15:10,067 [shard 0] stream_session - [Stream #1f78c1bf-8e20-41bd-95de-16de3fc5f86c] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=2632320, received_partitions=1252298 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20230219191924.37070-1-raphaelsc@scylladb.com>	2023-02-20 12:46:14 +01:00
Benny Halevy	314e45d957	streaming: define plan_id as a strong tagged_uuid type Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-22 19:45:30 +03:00
Benny Halevy	257d74bb34	schema, everywhere: define and use table_id as a strong type Define table_id as a distinct utils::tagged_uuid modeled after raft tagged_id, so it can be differentiated from other uuid-class types, in particular from table_schema_version. Fixes #11207 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-08-08 08:09:41 +03:00
Raphael S. Carvalho	aa667e590e	sstable_set: Fix partitioned_sstable_set constructor The sstable set param isn't being used anywhere, and it's also buggy as sstable run list isn't being updated accordingly. so it could happen that set contains sstables but run list is empty, introducing inconsistency. we're fortunate that the bug wasn't activated as it would've been a hard one to catch. found this while auditing the code. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220617203438.74336-1-raphaelsc@scylladb.com>	2022-06-21 11:58:13 +03:00
Avi Kivity	afc06f0017	messaging: forward-declare types in messaging_service.hh messaging_service.hh is a switchboard - it includes many things, and many things include it. Therefore, changes in the things it includes affect many translation units. Reduce the dependencies by forward-declaring as much as possible. This isn't pretty, but it reduces compile time and recompilations. Other headers adjusted as needed so everything (including `ninja dev-headers`) still compile. Closes #10755	2022-06-09 15:52:12 +03:00
Michael Livshin	00bee4e0b3	sstables_loader: mutation_fragment_v1_stream() instead of downgrade_to_v1() Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Avi Kivity	4b53af0bd5	treewide: replace parallel_for_each with coroutine::parallel_for_each in coroutines coroutine::parallel_for_each avoids an allocation and is therefore preferred. The lifetime of the function object is less ambiguous, and so it is safer. Replace all eligible occurrences (i.e. caller is a coroutine). One case (storage_service::node_ops_cmd_heartbeat_updater()) needed a little extra attention since there was a handle_exception() continuation attached. It is converted to a try/catch. Closes #10699	2022-05-31 09:06:24 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Botond Dénes	97d74de8fc	Merge "flat_mutation_reader: clone evictable_reader & convert some others" from Michael Livshin " The first patch introduces evictable_reader_v2, and the second one further simplifies it. We clone instead of converting because there is at least one downstream (by way of multishard_combining_reader) use that is not itself straightforward to convert at the moment (multishard_mutation_query), and because evictable_reader instances cannot be {up,down}graded (since users also access the underlying buffers). This also means that shard_reader, reader_lifecycle_policy and multishard_combining_reader have to be cloned. " * tag 'clone-evictable-reader-to-v2/v3' of https://github.com/cmm/scylla: convert make_multishard_streaming_reader() to flat_mutation_reader_v2 convert table::make_streaming_reader() to flat_mutation_reader_v2 convert make_flat_multi_range_reader() to flat_mutation_reader_v2 view_update_generator: remove unneeded call to downgrade_to_v1() introduce multishard_combining_reader_v2 introduce shard_reader_v2 introduce the reader_lifecycle_policy_v2 abstract base evictable_reader_v2: further code simplifications introduce evictable_reader_v2 & friends	2022-01-11 17:01:08 +02:00
Michael Livshin	be5118a7c9	convert table::make_streaming_reader() to flat_mutation_reader_v2 All changes are mechanical. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-01-11 10:49:26 +02:00
Avi Kivity	4392c20bd3	replica: move distributed_loader into replica module distributed_loader is a replica-side thing, so it belongs in the replica module ("distributed" refers to its ability to load sstables in their correct shards). So move it to the replica module.	2022-01-10 15:25:28 +02:00
Avi Kivity	ae3a360725	database: Move database, keyspace, table classes to replica/ directory The database, keyspace, and table classes represent the replica-only part of the objects after which they are named. Reading from a table doesn't give you the full data, just the replica's view, and it is not consistent since reconciliation is applied on the coordinator. As a first step in acknowledging this, move the related files to a replica/ subdirectory.	2022-01-06 17:07:30 +02:00
Avi Kivity	e51fcc22f3	sstable_loader: add missing include <cfloat> Needed for FLT_EPSILON Closes #9646	2021-11-17 09:01:49 +02:00
Benny Halevy	fdaa891332	storage_service, sstables_loader: use effective_replication_map to get_natural_endpoints Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 13:50:27 +03:00
Pavel Emelyanov	68ecec0197	sstables_loader: Accept the sstables loading code The code was moved in the relevant .cc file by previous patch, now make it sit in the relevant class. One "significant" change is that the messaging service is available by local reference already, not by the sharded one. Other dependencies are already satisfied by the patch that introduced the sstables_loader class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-10-11 11:08:21 +03:00
Pavel Emelyanov	42f83f6669	storage_service: Move the sstables loading code Just cut-n-paste the code into sstables_loader.cc. No other changes but replace storage service logger with its own one. For now the code stays in storage_service class, but next patch will relocate the code into the sstables_loader one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-10-11 11:07:39 +03:00

41 Commits
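
Note on commit fbeee8b65d ("Optimize load-and-stream"): the message describes sorting SSTables by their first key and streaming them in rounds of 16, so that overlapping runs meet in the same round and are deduplicated. The following is only a minimal sketch of that ordering step, not the code from the commit. The `sstable_info` struct and the `make_streaming_rounds` function are hypothetical names invented for illustration, and a plain integer token stands in for the real first key.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an SSTable: only the fields this sketch needs.
struct sstable_info {
    std::string name;
    std::int64_t first_token; // token of the first key in the SSTable (assumed)
};

// Sort SSTables by first token, then cut the sorted list into fixed-size
// streaming rounds. The default of 16 mirrors the batch size quoted in the
// commit message; everything else here is an assumption.
std::vector<std::vector<sstable_info>> make_streaming_rounds(
        std::vector<sstable_info> sstables, std::size_t round_size = 16) {
    std::vector<std::vector<sstable_info>> rounds;
    if (round_size == 0) {
        return rounds; // nothing sensible to do with empty rounds
    }
    std::stable_sort(sstables.begin(), sstables.end(),
        [](const sstable_info& a, const sstable_info& b) {
            return a.first_token < b.first_token;
        });
    for (std::size_t i = 0; i < sstables.size(); i += round_size) {
        const std::size_t end = std::min(i + round_size, sstables.size());
        rounds.emplace_back(
            sstables.begin() + static_cast<std::ptrdiff_t>(i),
            sstables.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return rounds;
}
```

With this ordering, each round covers a contiguous slice of the token range, which is what lets the receiving side see SSTables in token order as the commit message describes.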