scylladb/test/cluster/test_node_ops_metrics.py
Michael Litvak 5c28cffdb4 test/pylib/rest_client: fix ScyllaMetrics filtering
In the ScyllaMetrics `get` function, when requesting the value for a
specific shard, it is expected to return the sum of all values of
metrics for that shard that match the labels.

However, it would return the value of the first matching line it finds
instead of summing all matching lines.

For example, if we have two lines for one shard like:
some_metric{scheduling_group_name="compaction",shard="0"} 1
some_metric{scheduling_group_name="sl:default",shard="0"} 2

The result of this call would be 1 instead of 3:
get('some_metric', shard="0")

We fix this to sum all matching lines.
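
The difference between the two behaviours can be sketched as follows. This is a minimal illustration, not the actual `ScyllaMetrics` code: the sample text, the regular expressions, and the helper names (`matching_values`, `first_match`, `sum_matches`) are all assumptions made up for the example.

```python
import re

# Hypothetical stand-in for the Prometheus-style text exposed by a node.
SAMPLE = '''\
some_metric{scheduling_group_name="compaction",shard="0"} 1
some_metric{scheduling_group_name="sl:default",shard="0"} 2
some_metric{scheduling_group_name="compaction",shard="1"} 5
'''

# Simplified line format: name{label="value",...} value
LINE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def matching_values(text, name, shard):
    """Yield the value of every line of `name` that belongs to `shard`."""
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None or m.group('name') != name:
            continue
        labels = dict(LABEL_RE.findall(m.group('labels')))
        if labels.get('shard') == shard:
            yield float(m.group('value'))


def first_match(text, name, shard):
    # Behaviour described as the bug: stop at the first matching line.
    return next(matching_values(text, name, shard), None)


def sum_matches(text, name, shard):
    # Behaviour described as the fix: add up every matching line,
    # and return None when nothing matched.
    values = list(matching_values(text, name, shard))
    return sum(values) if values else None


print(first_match(SAMPLE, 'some_metric', '0'))  # first line only
print(sum_matches(SAMPLE, 'some_metric', '0'))  # both shard-0 lines
```

With the two shard-0 lines from the example above, `first_match` sees only the `compaction` line, while `sum_matches` adds both.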

The filtering of lines by labels is fixed to allow specifying only some
of the labels. Previously, for a line to match the filter, either the
filter had to be empty, or every label in the metric line had to be
specified in the filter parameter with a matching value. This is
unexpected, and it breaks when more labels are added.

We also simplify the function signature and the implementation - instead
of having the shard as a separate parameter, it can be specified as a
label, like any other label.
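
As a rough sketch of the label-matching change, the two rules can be written as predicates over plain dictionaries. The names `matches_old` and `matches_new` are hypothetical, and `matches_old` is only a paraphrase of the previous rule as described in this message, not the code that was removed.

```python
def matches_old(line_labels, wanted):
    # Previous rule, as described above: an empty filter matches anything;
    # otherwise every label on the line must appear in the filter with
    # the same value.
    if not wanted:
        return True
    return all(wanted.get(k) == v for k, v in line_labels.items())


def matches_new(line_labels, wanted):
    # Fixed rule: only the labels named in the filter are compared, so
    # extra labels on the line are ignored. The shard is just another label.
    return all(line_labels.get(k) == v for k, v in wanted.items())


line = {'scheduling_group_name': 'compaction', 'shard': '0'}

print(matches_old(line, {'shard': '0'}))  # False: filter omits a label
print(matches_new(line, {'shard': '0'}))  # True: partial filter is enough
```

Under the new rule, a metric that gains another label later still matches a filter written before that label existed.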
2025-08-10 10:16:00 +02:00

36 lines
1.1 KiB
Python

#
# Copyright (C) 2024-present ScyllaDB
#
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
#

import asyncio
import pytest
import logging

logger = logging.getLogger(__name__)


@pytest.mark.asyncio
async def test_bootstrap_removenode_metrics(manager):
    cfg = {'enable_repair_based_node_ops': True}
    servers = [await manager.server_add(config=cfg),
               await manager.server_add(config=cfg),
               await manager.server_add(config=cfg)]

    # Stop the third node, then remove it from the cluster via the first node.
    await manager.server_stop_gracefully(servers[2].server_id)
    await manager.remove_node(servers[0].server_id, servers[2].server_id)

    def check_ops(metrics, ops):
        metric_name = "scylla_node_ops_finished_percentage"
        shard = 0
        # Walk shards 0, 1, 2, ... until the metric is absent for a shard.
        while True:
            cnt = metrics.get(metric_name, {'ops': ops, 'shard': str(shard)})
            if cnt is None:
                break
            logger.info(f"Checking {shard=} {cnt=}")
            # The test expects each finished operation to report 1.
            assert int(cnt) == 1
            shard = shard + 1

    # Only the two remaining nodes are queried.
    for s in servers[:2]:
        metrics = await manager.metrics.query(s.ip_addr)
        check_ops(metrics, 'bootstrap')
        check_ops(metrics, 'removenode')