Commit Graph

6 Commits

Author SHA1 Message Date
Asias He
32cad54c00 repair: Add aborted_by_user to repair status report
Add the aborted_by_user flag to the repair status report, for example:

INFO  [shard 0] repair - repair[4342512b-5a5f-48fc-a840-934100264cbc]: starting user-requested repair for keyspace ks2a, repair id 1,
      options {{trace -> false}, {columnFamilies -> tb5}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}, {primaryRange -> false}}
INFO  [shard 0] repair - Started to aborting repair jobs={4342512b-5a5f-48fc-a840-934100264cbc}, nr_jobs=1
WARN  [shard 0] repair - repair[4342512b-5a5f-48fc-a840-934100264cbc]: Repair job aborted by user, job=4342512b-5a5f-48fc-a840-934100264cbc, keyspace=ks2a, tables={tb5}
WARN  [shard 0] repair - repair[4342512b-5a5f-48fc-a840-934100264cbc]: 3 out of 513 ranges failed, keyspace=ks2a, tables={tb5}, repair_reason=repair, nodes_down_during_repair={}, aborted_by_user=true
WARN  [shard 1] repair - repair[4342512b-5a5f-48fc-a840-934100264cbc]: 3 out of 513 ranges failed, keyspace=ks2a, tables={tb5}, repair_reason=repair, nodes_down_during_repair={}, aborted_by_user=true
WARN  [shard 0] repair - repair[4342512b-5a5f-48fc-a840-934100264cbc]: user-requested repair failed: std::runtime_error ({
      shard 0: std::runtime_error (repair[4342512b-5a5f-48fc-a840-934100264cbc]: 3 out of 513 ranges failed, keyspace=ks2a, tables={tb5}, repair_reason=repair, nodes_down_during_repair={}, aborted_by_user=true),
      shard 1: std::runtime_error (repair[4342512b-5a5f-48fc-a840-934100264cbc]: 3 out of 513 ranges failed, keyspace=ks2a, tables={tb5}, repair_reason=repair, nodes_down_during_repair={}, aborted_by_user=true)})

In addition, change the log

from

"Aborted {} repair job(s), aborted={}"

to

"Started to abort repair jobs={}, nr_jobs={}"

to reflect the fact that the user-requested abort API is asynchronous.
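
The failure summary line in the example above can be parsed by a monitoring script to tell user aborts apart from other failures. The following is a minimal sketch, not part of the commit: the regular expression is derived only from the sample log lines shown here, and `SUMMARY_RE`, `_split_set` and `parse_repair_summary` are hypothetical names introduced for illustration.

```python
import re

# Pattern derived from the sample summary lines in this commit message
# (an assumption; the real format may vary between versions).
# aborted_by_user is optional so that lines without the flag still parse.
SUMMARY_RE = re.compile(
    r"repair\[(?P<repair_id>[0-9a-f-]+)\]: "
    r"(?P<failed>\d+) out of (?P<total>\d+) ranges failed, "
    r"keyspace=(?P<keyspace>[^,]+), "
    r"tables=\{(?P<tables>[^}]*)\}, "
    r"repair_reason=(?P<reason>[^,]+), "
    r"nodes_down_during_repair=\{(?P<nodes_down>[^}]*)\}"
    r"(?:, aborted_by_user=(?P<aborted>true|false))?"
)


def _split_set(body):
    """Turn the inside of a '{a, b}' set into a list of strings."""
    return [item.strip() for item in body.split(",") if item.strip()]


def parse_repair_summary(line):
    """Return the summary fields as a dict, or None if the line does not match."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    aborted = m.group("aborted")
    return {
        "repair_id": m.group("repair_id"),
        "failed": int(m.group("failed")),
        "total": int(m.group("total")),
        "keyspace": m.group("keyspace"),
        "tables": _split_set(m.group("tables")),
        "reason": m.group("reason"),
        "nodes_down": _split_set(m.group("nodes_down")),
        # None means the flag was absent from the line.
        "aborted_by_user": None if aborted is None else aborted == "true",
    }
```

A caller could, for instance, skip alerting when `aborted_by_user` is `True` and `nodes_down` is empty.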

Closes #14062
2023-06-06 09:08:00 +03:00
Asias He
7056b7ee9a repair: Log nodes down during repair in case of failed repair
This helps users figure out whether a repair failed because a peer node
was down during the repair.

For example:

```
WARN  [shard 0] repair - repair[ec2e9646-918e-4345-99ab-fa07aa1f17de]: Repair
1026 out of 1026 ranges, keyspace=ks2a, table={test_table, tb},
range=(9203128250168517738,+inf), peers={127.0.0.2}, live_peers={},
status=skipped_no_live_peers

INFO  [shard 0] repair - repair[ec2e9646-918e-4345-99ab-fa07aa1f17de]: stats:
repair_reason=repair, keyspace=ks2a, tables={test_table, tb}, ranges_nr=513,
round_nr=0, round_nr_fast_path_already_synced=0,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=0,
tx_hashes_nr=0, rx_hashes_nr=0, duration=0 seconds, tx_row_nr=0, rx_row_nr=0,
tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={}, row_from_disk_nr={},
row_from_disk_bytes_per_sec={} MiB/s, row_from_disk_rows_per_sec={} Rows/s,
tx_row_nr_peer={}, rx_row_nr_peer={}

WARN  [shard 0] repair - repair[ec2e9646-918e-4345-99ab-fa07aa1f17de]: 1026 out
of 1026 ranges failed, keyspace=ks2a, tables={test_table, tb},
repair_reason=repair, nodes_down_during_repair={127.0.0.2}

WARN  [shard 0] repair - repair[ec2e9646-918e-4345-99ab-fa07aa1f17de]:
repair_tracker run failed: std::runtime_error ({shard 0: std::runtime_error
(repair[ec2e9646-918e-4345-99ab-fa07aa1f17de]: 1026 out of 1026 ranges failed,
keyspace=ks2a, tables={test_table, tb}, repair_reason=repair,
nodes_down_during_repair={127.0.0.2})})
```

In addition, change `status=skipped` to `status=skipped_no_live_peers`
to make it clearer.
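
The per-range line in the example above carries `peers`, `live_peers` and `status`, which is enough to see which peers were unreachable for a skipped range. The following is a minimal sketch, not part of the commit: it assumes the line has been joined back into a single line (the example above is wrapped), and that a peer listed in `peers` but not in `live_peers` is one that was down. `RANGE_RE` and `parse_range_status` are hypothetical names introduced for illustration.

```python
import re

# Pattern derived from the sample per-range line in this commit message
# (an assumption; the real format may vary between versions).
RANGE_RE = re.compile(
    r"peers=\{(?P<peers>[^}]*)\}, "
    r"live_peers=\{(?P<live>[^}]*)\}, "
    r"status=(?P<status>\w+)"
)


def _to_set(body):
    """Turn the inside of a '{a, b}' set into a Python set of strings."""
    return {item.strip() for item in body.split(",") if item.strip()}


def parse_range_status(line):
    """Return status and peer sets for a per-range line, or None if no match."""
    m = RANGE_RE.search(line)
    if m is None:
        return None
    peers = _to_set(m.group("peers"))
    live = _to_set(m.group("live"))
    return {
        "status": m.group("status"),
        "peers": sorted(peers),
        "live_peers": sorted(live),
        # Assumption: peers absent from live_peers were down for this range.
        "missing_peers": sorted(peers - live),
    }
```

Matching on the exact `status=skipped_no_live_peers` value, rather than the older `status=skipped`, lets a script attribute the skip to unreachable peers without inspecting the peer sets.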

Closes #13928
2023-05-23 11:12:42 +03:00
Tomasz Grabiec
d3c9ad4ed6 locator: Rename effective_replication_map to vnode_effective_replication_map
In preparation for introducing a more abstract
effective_replication_map that can describe replication maps that
are not based on vnodes.
2023-04-24 10:49:36 +02:00
Aleksandra Martyniuk
f10b862955 repair: rename repair_module
2023-03-27 16:33:39 +02:00
Aleksandra Martyniuk
8f935481cd repair: add repair namespace to repair/task_manager_module.hh
2023-03-27 16:32:51 +02:00
Aleksandra Martyniuk
17e0e05f42 repair: rename repair_task.hh
2023-03-27 16:31:51 +02:00