Before the integration with the task manager, the state of a single
shard repair was kept in repair_info. The repair_info object was
destroyed immediately after the shard repair finished.
During the integration, repair_info's fields were moved to
shard_repair_task_impl, as the two served similar purposes.
However, shard_repair_task_impl isn't destroyed immediately; it is
kept in the task manager for task_ttl seconds after it completes.
Thus, some of repair_info's fields had their lifetime prolonged,
which delayed the release of the repair state.
Release shard_repair_task_impl's resources immediately after the
shard repair finishes.
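The lifetime fix can be sketched as follows; this is a minimal Python
model of the C++ change, with illustrative names (RepairTask,
release_resources) that are not the actual ScyllaDB identifiers:

```python
class RepairTask:
    def __init__(self, ranges):
        # Heavy per-shard repair state, formerly held by repair_info.
        self._ranges = ranges
        self._done = False

    def run(self):
        # ... perform the shard repair over self._ranges ...
        self._done = True
        # Release the heavy state now, instead of waiting for the task
        # object itself to be dropped after task_ttl seconds.
        self.release_resources()

    def release_resources(self):
        self._ranges = None

task = RepairTask(ranges=[(0, 100), (100, 200)])
task.run()
assert task._done and task._ranges is None
```

The task object itself still outlives the repair for task_ttl seconds,
so its status can be queried, but the bulky repair state is gone.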
Fixes: #15505.
Closes scylladb/scylladb#15506
Most of the time, only the roots of a task tree should be
non-internal. Change the default implementation of is_internal
accordingly and delete the overrides made redundant by it.
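A minimal sketch of the new default, under the assumption that a task
is internal exactly when it has a parent in the tree (names here are
illustrative, not the actual ScyllaDB API):

```python
class Task:
    def __init__(self, parent=None):
        self._parent = parent

    def is_internal(self):
        # Default: only tree roots are user-visible (non-internal);
        # subclasses no longer need to override this per task type.
        return self._parent is not None

root = Task()
child = Task(parent=root)
assert not root.is_internal()
assert child.is_internal()
```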
Closes scylladb/scylladb#15353
Node operations will be integrated with the task manager, so a
node_ops directory needs to be created. To give the task manager
access to node-ops-related classes and preserve consistent naming,
move the classes to node_ops/node_ops_data.cc.
Override the methods returning the expected number of children and
the job size in repair tasks. With them, the get_progress method can
return a more precise progress value.
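The idea can be sketched as follows; this is a hypothetical model, not
the ScyllaDB implementation: knowing the total job size up front lets
progress be reported against the whole job rather than only against
the child tasks created so far.

```python
def get_progress(completed_units, total_job_size):
    # With the expected job size known from the start, the ratio is
    # meaningful even before all child tasks have been created.
    if total_job_size == 0:
        return 0.0
    return completed_units / total_job_size

# 3 of 8 expected ranges finished -> 37.5%, rather than an
# overestimate based only on the children that exist so far.
assert get_progress(3, 8) == 0.375
```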
Taking a reason argument in task_manager_module::get_progress is
misleading, as the method works properly only for
streaming::stream_reason::repair
(repair::shard_repair_task_impl::nr_ranges_finished isn't updated for
any other reason).
This patch adds the ranges_parallelism option to the repair RESTful
API. Users can use this option to optionally lower the number of
ranges repaired in parallel per repair job below the default
max_repair_ranges_in_parallel calculated by the Scylla core.
Scylla Manager can also use this option to provide more ranges (>N)
in a single repair job while repairing only N (ranges_parallelism)
ranges in parallel, instead of providing N ranges per repair job.
To make it safer, unlike PR #4848, this patch does not allow the user
to exceed max_repair_ranges_in_parallel.
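The clamping behavior described above can be sketched as follows
(illustrative names, assuming a user-supplied ranges_parallelism may
lower, but never raise, the core-calculated limit):

```python
def effective_parallelism(max_repair_ranges_in_parallel,
                          ranges_parallelism=None):
    # No user override: use the core-calculated default.
    if ranges_parallelism is None:
        return max_repair_ranges_in_parallel
    # User override is honored only if it does not exceed the max.
    return min(ranges_parallelism, max_repair_ranges_in_parallel)

assert effective_parallelism(16) == 16      # default
assert effective_parallelism(16, 4) == 4    # user lowers it
assert effective_parallelism(16, 64) == 16  # cannot exceed the max
```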
Fixes #4847
Instead of just a boolean _failed flag, persist the error message of the
exception which caused the repair to fail, and include it in the log
message announcing the failure.
This helps users figure out whether the repair failed because a peer
node was down during the repair.
For example:
```
WARN [shard 0] repair - repair[ec2e9646-918e-4345-99ab-fa07aa1f17de]: Repair
1026 out of 1026 ranges, keyspace=ks2a, table={test_table, tb},
range=(9203128250168517738,+inf), peers={127.0.0.2}, live_peers={},
status=skipped_no_live_peers
INFO [shard 0] repair - repair[ec2e9646-918e-4345-99ab-fa07aa1f17de]: stats:
repair_reason=repair, keyspace=ks2a, tables={test_table, tb}, ranges_nr=513,
round_nr=0, round_nr_fast_path_already_synced=0,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=0,
tx_hashes_nr=0, rx_hashes_nr=0, duration=0 seconds, tx_row_nr=0, rx_row_nr=0,
tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={}, row_from_disk_nr={},
row_from_disk_bytes_per_sec={} MiB/s, row_from_disk_rows_per_sec={} Rows/s,
tx_row_nr_peer={}, rx_row_nr_peer={}
WARN [shard 0] repair - repair[ec2e9646-918e-4345-99ab-fa07aa1f17de]: 1026 out
of 1026 ranges failed, keyspace=ks2a, tables={test_table, tb},
repair_reason=repair, nodes_down_during_repair={127.0.0.2}
WARN [shard 0] repair - repair[ec2e9646-918e-4345-99ab-fa07aa1f17de]:
repair_tracker run failed: std::runtime_error ({shard 0: std::runtime_error
(repair[ec2e9646-918e-4345-99ab-fa07aa1f17de]: 1026 out of 1026 ranges failed,
keyspace=ks2a, tables={test_table, tb}, repair_reason=repair,
nodes_down_during_repair={127.0.0.2})})
```
In addition, change `status=skipped` to `status=skipped_no_live_peers`
to make it clearer.
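The failure-state change can be sketched as follows; this is a
hypothetical Python model (RepairState, record_failure are
illustrative names), keeping the text of the first failing exception
instead of only a boolean flag so it can appear in the final log
message:

```python
class RepairState:
    def __init__(self):
        # Was: self._failed = False
        self._failed_because = None

    def record_failure(self, exc):
        # Keep the first error message for the failure log line.
        if self._failed_because is None:
            self._failed_because = str(exc)

    def failed(self):
        return self._failed_because is not None

state = RepairState()
state.record_failure(RuntimeError("peer 127.0.0.2 is down"))
assert state.failed()
assert "127.0.0.2" in state._failed_because
```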
Closes #13928