Compare commits

...

11 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
aebf71b5a5 Optimize: fetch tablet map once per table group
Move get_tablet_map() call before the should_break check to avoid
fetching the same tablet map twice for each table group.

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2026-02-12 15:10:13 +00:00
copilot-swe-agent[bot]
0f3bd7d91b Refactor: extract logging to method and improve format
- Extract logging logic into log_active_transitions() method with
  max_count as parameter for better reusability
- Change log format to "Active {kind} transition: ..." to make the
  transition kind more prominent at the beginning of the message

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2026-02-12 15:09:12 +00:00
copilot-swe-agent[bot]
c29d976e40 Use single info() statement for transition logging
Consolidated 4 separate rtlogger.info() calls into one that
conditionally appends leaving and pending replica information.
This makes the code cleaner while maintaining the same output.

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2026-02-12 15:05:38 +00:00
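
The conditional-append idea in this commit can be illustrated outside the codebase. The following is a minimal sketch, not the PR's code: `replica`, `replica_to_string` and `format_transition` are hypothetical stand-ins, plain string concatenation replaces `fmt::format`, and the message layout follows the final "Active {kind} transition" form shown in the diff.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for a tablet replica (host plus shard);
// this is not ScyllaDB's locator::tablet_replica.
struct replica {
    std::string host;
    unsigned shard;
};

std::string replica_to_string(const replica& r) {
    return r.host + ":" + std::to_string(r.shard);
}

// Builds a single log line and appends each replica field only when the
// corresponding optional holds a value, so one statement covers all four
// combinations of leaving/pending presence.
std::string format_transition(const std::string& kind,
                              const std::string& tablet,
                              const std::string& stage,
                              const std::optional<replica>& leaving,
                              const std::optional<replica>& pending) {
    std::string line = "Active " + kind + " transition: tablet=" + tablet
                     + ", stage=" + stage;
    if (leaving) {
        line += ", leaving=" + replica_to_string(*leaving);
    }
    if (pending) {
        line += ", pending=" + replica_to_string(*pending);
    }
    return line;
}
```

Passing `std::nullopt` for either argument simply omits that field from the line, which is the behavior the commit message describes.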
copilot-swe-agent[bot]
b08bd1dac4 Optimize transition counting to avoid unnecessary iteration
After logging 5 transitions, use transitions().size() to efficiently
count remaining transitions in other table groups instead of iterating
through each one. This improves performance when there are many active
transitions.

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2026-02-12 13:27:46 +00:00
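
As a rough illustration of capped logging with a size-based remainder count, here is a standalone sketch. It is a variant under stated assumptions rather than the PR's implementation: `table_groups` and `capped_transition_log` are hypothetical names, each inner vector stands in for one table group's transitions, and every group is counted via `size()` up front so the remainder also covers unlogged entries in the group where the cap was reached.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: each inner vector holds the transition
// descriptions of one table group.
using table_groups = std::vector<std::vector<std::string>>;

// Returns the lines that would be logged: at most max_count transition
// lines, followed by a summary line when more transitions exist.
std::vector<std::string> capped_transition_log(const table_groups& groups,
                                               std::size_t max_count) {
    std::vector<std::string> lines;
    std::size_t total = 0;
    for (const auto& transitions : groups) {
        // size() yields the group's count without visiting each element.
        total += transitions.size();
        for (const auto& t : transitions) {
            if (lines.size() >= max_count) {
                break; // cap reached: stop formatting, later groups are only counted
            }
            lines.push_back("Active transition: " + t);
        }
    }
    const std::size_t logged = lines.size();
    if (total > logged) {
        lines.push_back("... and " + std::to_string(total - logged)
                        + " more active transitions");
    }
    return lines;
}
```

Once the cap is reached, each later group costs one `size()` call and an immediate `break`, so the work done per group no longer depends on how many transitions it holds.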
copilot-swe-agent[bot]
9bd784f15c Limit transition logging to 5, check optionals, remove yields
- Only log leaving/pending replicas if they exist (check optional values)
- Limit logging to first 5 transitions, print "... and N more" if more exist
- Remove co_await coroutine::maybe_yield() calls as requested
- Handle all combinations of leaving/pending presence in log messages

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2026-02-12 13:26:44 +00:00
copilot-swe-agent[bot]
6b27f8002b Add clarifying comment about logging only involved replicas
Add comment explaining why we log only leaving/pending replicas
rather than full replica sets - to focus on what's actually changing
in the transition.

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2026-02-11 23:56:20 +00:00
copilot-swe-agent[bot]
cd62a4d669 Log only leaving and pending replicas in transition
Changed the transition logging to print only the replicas involved in
the transition (leaving and pending) instead of all current and next
replicas. This makes the log output more concise and focused on what's
actually changing.

Example output:
Active transition: tablet=<uuid>/<id>, kind=migration, stage=streaming,
  leaving=<host1>:0, pending=<host2>:1

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2026-02-11 23:55:40 +00:00
copilot-swe-agent[bot]
f1543e9075 Mark unused 'tables' variable with [[maybe_unused]]
Address code review feedback by marking the 'tables' variable as
[[maybe_unused]] since it's part of the structured binding from
all_table_groups() but not used in the logging loop.

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2026-02-11 23:51:10 +00:00
copilot-swe-agent[bot]
b583dc6a72 Add coroutine yield points to transition logging loop
Add co_await coroutine::maybe_yield() at each iteration to prevent
reactor stalls when iterating over potentially large numbers of
tablet transitions.

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2026-02-11 23:50:21 +00:00
copilot-swe-agent[bot]
d4a041da59 Add logging of active tablet transitions before sleep
Log detailed information about each active tablet transition when the
topology coordinator goes to sleep with active transitions. This includes:
- Tablet ID (table + tablet)
- Transition kind (migration, rebuild, etc.)
- Transition stage
- Current replicas
- Next replicas (destination of the transition)

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2026-02-11 23:49:47 +00:00
copilot-swe-agent[bot]
fa0f23a863 Initial plan
2026-02-11 23:45:57 +00:00

@@ -1441,6 +1441,43 @@ class topology_coordinator : public endpoint_lifecycle_subscriber
}
}
void log_active_transitions(size_t max_count) {
    auto tm = get_token_metadata_ptr();
    size_t logged_count = 0;
    size_t total_count = 0;
    bool should_break = false;
    for (auto&& [base_table, tables [[maybe_unused]]] : tm->tablets().all_table_groups()) {
        const auto& tmap = tm->tablets().get_tablet_map(base_table);
        // Count the whole group up front so that the total stays accurate
        // even when the logging loop below stops partway through a group.
        total_count += tmap.transitions().size();
        if (should_break) {
            continue;
        }
        for (auto&& [tablet, trinfo] : tmap.transitions()) {
            if (logged_count >= max_count) {
                should_break = true;
                break;
            }
            locator::global_tablet_id gid { base_table, tablet };
            const auto& tinfo = tmap.get_tablet_info(tablet);
            // Log only the replicas involved in the transition (leaving/pending)
            // rather than all replicas, to focus on what's actually changing
            auto leaving = locator::get_leaving_replica(tinfo, trinfo);
            auto pending = trinfo.pending_replica;
            rtlogger.info("Active {} transition: tablet={}, stage={}{}{}",
                    trinfo.transition, gid, trinfo.stage,
                    leaving ? fmt::format(", leaving={}", *leaving) : "",
                    pending ? fmt::format(", pending={}", *pending) : "");
            logged_count++;
        }
    }
    if (total_count > logged_count) {
        rtlogger.info("... and {} more active transitions", total_count - logged_count);
    }
}
// When "drain" is true, we migrate tablets only as long as there are nodes to drain
// and then change the transition state to write_both_read_old. Also, while draining,
// we ignore pending topology requests which normally interrupt load balancing.
@@ -2026,6 +2063,7 @@ class topology_coordinator : public endpoint_lifecycle_subscriber
// to check atomically with event.wait()
if (!_tablets_ready) {
    rtlogger.debug("Going to sleep with active tablet transitions");
    log_active_transitions(5);
    release_guard(std::move(guard));
    co_await await_event();
}