This patch adds metrics to the Alternator TTL feature (aka the "expiration
service").
I put these metrics deliberately in their own object in ttl.{hh,cc}, and
also with their own prefix ("expiration_*") - and *not* together with the
rest of the Alternator metrics (alternator/stats.{hh,cc}). This is
because later we may want to use the expiration service not only in
Alternator but also in CQL - to support per-item expiration with CDC
events also in CQL. So the implementation of this feature should not be
too tangled with that of Alternator.
The patch currently adds four metrics, and opens the path to easily add
more in the future. The metrics added now are:
1. scylla_expiration_scan_passes: The number of scan passes over the
entire table. We expect this to grow by 1 every
alternator_ttl_period_in_seconds seconds.
2. scylla_expiration_scan_table: The number of table scans. In each scan
pass, we scan all the tables that have the Alternator TTL feature
enabled. Each scan of each table is counted by this counter.
3. scylla_expiration_items_deleted: Counts the number of items that
the expiration service expired (deleted). Please remember that
each item is considered for expiration - and then expired - on
only one node, so each expired item is counted only once - not
RF times.
4. scylla_expiration_secondary_ranges_scanned: If this counter is
incremented, it means this node took over some other node's
expiration scanning duties while the other node was down.
This patch also includes a couple of unrelated comment fixes.
I tested the new metrics manually - they aren't yet tested by the
Alternator test suite because I couldn't make up my mind if such
tests would belong in test_ttl.py or test_metrics.py :-)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220224092419.1132655-1-nyh@scylladb.com>