Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; they are
unambiguous in any case.
mutation_reader remains in the readers/ module.
mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.
This is a step forward towards librarization or modularization of the
source base.
Closes #12788
The system keyspace uses the local replication strategy and thus
does not need to be repaired. It is nevertheless possible to invoke repair
of this keyspace through the API, which leads to a runtime error since
peer_events and scylla_table_schema_history have different sharding logic.
For keyspaces with a local replication strategy, repair_service::do_repair_start
now returns immediately.
Closes #12459
* github.com:scylladb/scylladb:
test: rest_api: check if repair of system keyspace returns before corresponding task is created
repair: finish repair immediately on local keyspaces
In most cases, the task manager's tasks are started just after they are
created. Thus, to reduce the boilerplate of creating and starting
tasks, a tasks::task_manager::module::make_and_start_task method is added.
Repair tasks are modified to use the method where possible.
Closes #12729
* github.com:scylladb/scylladb:
repair: use tasks::task_manager::module::make_and_start_task for repair tasks
tasks: add task_manager::module::make_and_start_task method
Consider:
- Bootstrap n1 in dc 1
- Create ks with EverywhereStrategy
- Bootstrap n2 in dc 2
Since n2 is the first node in dc2, there are no local-dc nodes to
sync data from. In this case, n2 should sync data with a node in dc1, even
though that node is in a remote dc.
Aborting of repair operations is fully managed by the task manager.
Repair tasks are aborted:
- on shutdown; top-level repair tasks subscribe to the global abort source, and on shutdown all tasks are aborted recursively
- through node operations (applies to data_sync_repair_task_impls and their descendants only); data_sync_repair_task_impl subscribes to the node_ops_info abort source
- with the task manager API (top-level tasks are abortable)
- with the storage_service API and on failure; these cases were modified to abort the same way as the ones above.
Closes #12085
* github.com:scylladb/scylladb:
repair: make top level repair tasks abortable
repair: unify a way of aborting repair operations
repair: delete sharded abort source from node_ops_info
repair: delete unused node_ops_info from data_sync_repair_task_impl
repair: delete redundant abort subscription from shard_repair_task_impl
repair: add abort subscription to data sync task
tasks: abort tasks on system shutdown
The type of the id of node operations is changed from utils::UUID
to node_ops_id. This way, the ids of node operations are easily
distinguished from the ids of other entities.
Closes #11673
data_sync_repair_task_impl subscribes to the corresponding node_ops_info
abort source and then, when an abort is requested, all its descendants are
aborted recursively. Thus, shard_repair_task_impl does not need
to subscribe to the node_ops_info abort source, since the parent
task takes care of aborting once it is requested.
The abort_subscription and connected attributes are deleted from
shard_repair_task_impl.
When a node operation is aborted, the same should happen to
the corresponding task manager repair task.
Subscribe data_sync_repair_task_impl's abort() to the node_ops_info
abort_source.
The type of an operation is tied to a specific task implementation.
Therefore, it should be accessed through a virtual
method on tasks::task_manager::task::impl rather than stored as
an attribute.
Closes #12326
* github.com:scylladb/scylladb:
api: delete unused type parameter from task_manager_test api
tasks: repair: api: remove type attribute from task_manager::task::status
tasks: add type() method to task_manager::task::impl
repair: add reason attribute to repair_task
The func is moved into the async thread, so the encapsulating
lambda should be declared mutable to move the func
rather than copy it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #12267
When repair master and followers have different shard count, the repair
followers need to create multi-shard readers. Each multi-shard reader
will create one local reader on each shard, N (smp::count) local readers
in total.
There is a hard limit on the number of readers that can work in parallel.
When there are more readers than this limit, the readers start to
evict each other, causing buffers already read from disk to be dropped
and readers to be recreated, which is not very efficient.
To reduce reader eviction overhead, a global reader permit
is introduced which accounts for the fan-out of multi-shard readers.
With this patch, at any point in time, the number of readers created by
repair will not exceed the reader limit.
Test Results:
1) with stream sem 10, repair global sem 10, 5 ranges in parallel, n1=2
shards, n2=8 shards, memory wanted =1
1.1)
[asias@hjpc2 mycluster]$ time nodetool -p 7200 repair ks2 (repair on n2)
[2022-11-23 17:45:24,770] Starting repair command #1, repairing 1
ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true)
[2022-11-23 17:45:53,869] Repair session 1
[2022-11-23 17:45:53,869] Repair session 1 finished
real 0m30.212s
user 0m1.680s
sys 0m0.222s
1.2)
[asias@hjpc2 mycluster]$ time nodetool repair ks2 (repair on n1)
[2022-11-23 17:46:07,507] Starting repair command #1, repairing 1
ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true)
[2022-11-23 17:46:30,608] Repair session 1
[2022-11-23 17:46:30,608] Repair session 1 finished
real 0m24.241s
user 0m1.731s
sys 0m0.213s
2) with stream sem 10, repair global sem no_limit, 5 ranges in
parallel, n1=2 shards, n2=8 shards, memory wanted =1
2.1)
[asias@hjpc2 mycluster]$ time nodetool -p 7200 repair ks2 (repair on n2)
[2022-11-23 17:49:49,301] Starting repair command #1, repairing 1
ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true)
[2022-11-23 17:52:01,414] Repair session 1
[2022-11-23 17:52:01,415] Repair session 1 finished
real 2m13.227s
user 0m1.752s
sys 0m0.218s
2.2)
[asias@hjpc2 mycluster]$ time nodetool repair ks2 (repair on n1)
[2022-11-23 17:52:19,280] Starting repair command #1, repairing 1
ranges for keyspace ks2 (parallelism=SEQUENTIAL, full=true)
[2022-11-23 17:52:42,387] Repair session 1
[2022-11-23 17:52:42,387] Repair session 1 finished
real 0m24.196s
user 0m1.689s
sys 0m0.184s
Comparing 1.1) and 2.1) shows that eviction played a major role here.
The patch gives a 73s / 30s = 2.5X speed up in this setup.
Comparing 1.1) and 1.2) shows that even with the reader limit, starting
on the node with fewer shards is faster: 30s / 24s = 1.25X (the total
number of multishard readers is lower).
Fixes #12157
Closes #12158
The PR introduces shard_repair_task_impl, which represents a repair task
that spans a single shard's repair.
repair_info is replaced with shard_repair_task_impl, since both serve
a similar purpose.
Closes #12066
* github.com:scylladb/scylladb:
repair: reindent
repair: replace repair_info with shard_repair_task_impl
repair: move repair_info methods to shard_repair_task_impl
repair: rename methods of repair_module
repair: change type of repair_module::_repairs
repair: keep a reference to shard_repair_task_impl in row_level_repair
repair: move repair_range method to shard_repair_task_impl
repair: make do_repair_ranges a method of shard_repair_task_impl
repair: copy repair_info methods to shard_repair_task_impl
repair: coroutinize shard task creation
repair: define run for shard_repair_task_impl
repair: add shard_repair_task_impl
Currently, each data sync repair task is started (and hence run) twice.
Thus, when the two runs are separated by a long enough time frame,
the following situation may occur:
- the first run finishes
- after some time (ttl) the task is unregistered from the task manager
- the second run finishes and attempts to finish the task which does
not exist anymore
- memory access causes a segfault.
The second call to start is deleted. A check is added
to the start method to ensure that each task is started at most once.
Fixes: #12089
Closes #12090
As a preparation for replacing repair_info with shard_repair_task_impl,
the type of _repairs in the repair module is changed from
std::unordered_map<int, lw_shared_ptr<repair_info>> to
std::unordered_map<int, tasks::task_id>.
As a part of replacing repair_info with shard_repair_task_impl,
instead of a reference to repair_info, row_level_repair keeps
a reference to shard_repair_task_impl.
The do_repair_ranges function is directly connected to shard repair tasks.
Turning it into a shard_repair_task_impl method enables access to the
task's members without additional intermediate layers.
Methods of repair_info are copied to shard_repair_task_impl. They are
not used yet, it's a preparation for replacing repair_info with
shard_repair_task_impl.
Also make sure the token_metadata ring version is the same as the
reference one (from the erm on shard 0) when starting the
repair on each shard.
Refs #11993
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>