scylladb/api at ec6c540c30ceaed55016cd2b6a4677bae80da618 - scylladb - Anomalous Gitea

mirrors/scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 03:30:49 +00:00

Avi Kivity df3ef800c2 Merge 'Introduce load and stream feature' from Asias He

storage_service: Introduce load_and_stream

=== Introduction ===

This feature extends nodetool refresh to allow loading arbitrary sstables
that do not belong to a node into the cluster. It loads the sstables from
disk, calculates which nodes own the data, and streams the data to those
owners automatically.
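
As a rough illustration of "calculate the owning nodes, then stream to the owners", here is a toy token-ring lookup in Python. It is not ScyllaDB's implementation: the ring layout, the node names, and the helpers `owner_of` and `group_by_owner` are hypothetical, and replication is ignored (each token has exactly one owner).

```python
from bisect import bisect_left
from collections import defaultdict

def owner_of(token, ring):
    """Return the node owning `token` in a sorted list of (token, node) pairs.

    Assumes the usual ring convention: a node owns the range
    (previous_token, its_token], and tokens past the last entry wrap
    around to the first node.
    """
    tokens = [t for t, _ in ring]
    idx = bisect_left(tokens, token) % len(ring)
    return ring[idx][1]

def group_by_owner(partition_tokens, ring):
    """Build a streaming plan: owner node -> tokens to send to it."""
    plan = defaultdict(list)
    for tok in partition_tokens:
        plan[owner_of(tok, ring)].append(tok)
    return dict(plan)

# Hypothetical 3-node ring and a few partition tokens read from sstables.
ring = [(10, "node-a"), (20, "node-b"), (30, "node-c")]
plan = group_by_owner([5, 15, 20, 35], ring)
print(plan)
```

The point of the sketch is only that the loading node does not need to own any of the data: it looks up an owner per partition and forwards accordingly.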

For example, say the old cluster has 6 nodes and the new cluster has 3 nodes.
We can copy the sstables from the old cluster to any of the new nodes and
trigger the load and stream process.

This can make restores and migrations much easier.

=== Performance ===

I managed to get 40MB/s per shard on my build machine.
CPU: AMD Ryzen 7 1800X Eight-Core Processor
DISK: Samsung SSD 970 PRO 512GB

Assuming 1TB of sstables per node, 40MB/s per shard, and 32 shards per
node, the load and stream of 1TB of data finishes in about 13 mins on each
node.

1TB / (40MB/s per shard * 32 shards) / 60 s per min ≈ 13 mins
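
The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and perfectly linear scaling across shards, and the helper name `load_and_stream_minutes` is made up for illustration.

```python
MB = 10**6   # decimal megabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)

def load_and_stream_minutes(data_bytes, per_shard_bytes_per_s, shards):
    """Estimated minutes to load and stream `data_bytes` on one node,
    assuming every shard sustains `per_shard_bytes_per_s` in parallel."""
    node_throughput = per_shard_bytes_per_s * shards  # bytes per second per node
    return data_bytes / node_throughput / 60

minutes = load_and_stream_minutes(1 * TB, 40 * MB, 32)
print(f"{minutes:.1f} minutes")  # prints "13.0 minutes"
```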

=== Tests ===

backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_test,
which creates a cluster with 4 nodes, inserts data, then uses
load_and_stream to restore to a 2-node cluster.

=== Usage ===

curl -X POST "http://{ip}:10000/storage_service/sstables/{keyspace}?cf={table}&load_and_stream=true"
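
For scripting, the same request can be issued from Python's standard library. This is a minimal sketch: the endpoint path, port 10000, and the `cf` and `load_and_stream` parameters come from the curl command above, while the helper names `build_load_and_stream_url` and `trigger_load_and_stream` are hypothetical, and no assumption is made about the response body.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

def build_load_and_stream_url(ip, keyspace, table, port=10000):
    """Build the REST URL used by the curl command in this commit message."""
    query = urlencode({"cf": table, "load_and_stream": "true"})
    return (f"http://{ip}:{port}/storage_service/sstables/"
            f"{quote(keyspace, safe='')}?{query}")

def trigger_load_and_stream(ip, keyspace, table, timeout=None):
    """POST to the node's REST API and return the HTTP status code.

    Raises urllib.error.URLError / HTTPError if the request fails.
    """
    req = Request(build_load_and_stream_url(ip, keyspace, table), method="POST")
    with urlopen(req, timeout=timeout) as resp:
        return resp.status

# Example with placeholder values; only the URL is built here, nothing is sent.
print(build_load_and_stream_url("127.0.0.1", "ks", "t"))
```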

=== Notes ===

Btw, with the old nodetool refresh, the node will not pick up data that
does not belong to it, but it will not delete that data either. One has
to run nodetool cleanup to remove it manually, which is a surprise to me
and probably to users as well. With load and stream, the process deletes
the sstables once it finishes streaming, so no nodetool cleanup is
needed.

The name of this feature, load and stream, follows load and store in the CPU world.

Fixes #7831

Closes #7846

* github.com:scylladb/scylla:
  storage_service: Introduce load_and_stream
  distributed_loader: Add get_sstables_from_upload_dir
  table: Add make_streaming_reader for given sstables set

2021-01-18 15:08:19 +02:00

..

storage_service: Introduce load_and_stream

2021-01-18 16:32:33 +08:00

api_init.hh

main: start a shared_token_metadata

2020-11-11 14:20:23 +02:00

api.cc

repair: Keep sharded messaging service in API

2020-08-19 20:50:53 +03:00

api.hh

per table metrics: change estimated_histogram to time_estimated_histogram

2020-07-14 11:17:43 +03:00

cache_service.cc

api/cache_service: Relax getting partitions count

2020-04-23 17:47:58 +02:00

cache_service.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

collectd.cc

everywhere: Be more explicit that we don't want std::make_shared

2020-03-10 13:13:48 -07:00

collectd.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

column_family.cc

api: remove potential large allocation in /column_family/ GET request handler

2021-01-13 12:04:18 +02:00

column_family.hh

per table metrics: change estimated_histogram to time_estimated_histogram

2020-07-14 11:17:43 +03:00

commitlog.cc

build: Be consistent about system versus regular headers

2020-06-10 15:49:51 +03:00

commitlog.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

compaction_manager.cc

api/compaction_manager: indentation

2019-08-12 14:04:40 +03:00

compaction_manager.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

config.cc

api: config: stop using _make_config_values

2019-04-23 16:29:03 +03:00

config.hh

Defining the config api

2018-03-28 12:41:55 +03:00

endpoint_snitch.cc

api::endpoint_snitch: c3 compat - allow dc/rack query for broadcast

2016-11-08 12:22:04 +00:00

endpoint_snitch.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

error_injection.cc

utils: error injection inject() returning a future

2020-04-01 16:22:52 +02:00

error_injection.hh

api: add error injection to REST API

2020-03-20 20:49:03 +01:00

failure_detector.cc

api::failure_detector: c3 compat - add endpoint phi value query

2016-11-08 12:22:04 +00:00

failure_detector.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

gossiper.cc

api: Add force_remove_endpoint for gossip

2020-11-29 13:58:46 +02:00

gossiper.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

hinted_handoff.cc

API: remove unneeded refrences to collectd

2017-03-21 16:42:57 +02:00

hinted_handoff.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

lsa.cc

Update seastar submodule

2018-11-21 00:01:44 +02:00

lsa.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

messaging_service.cc

api: Use local reference to messaging_service

2020-08-19 13:08:12 +03:00

messaging_service.hh

api: Use local reference to messaging_service

2020-08-19 13:08:12 +03:00

storage_proxy.cc

api: allow changing hinted handoff configuration

2020-11-17 10:24:43 +01:00

storage_proxy.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

storage_service.cc

storage_service: Introduce load_and_stream

2021-01-18 16:32:33 +08:00

storage_service.hh

repair: Keep sharded messaging service in API

2020-08-19 20:50:53 +03:00

stream_manager.cc

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

stream_manager.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

system.cc

treewide: replace calls to engine().some_api() with some_api()

2020-04-05 12:46:04 +03:00

system.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00