scylladb/docs/dev/metrics.md

# Scylla Metrics
Scylla exposes dozens of different metrics which are valuable for
understanding the performance of a node, and for diagnosing performance
problems when those occur. Among other things, you can see counts of requests,
activity of disks, cpus and network, memory usage of different types,
activity in different individual tables, and many many more metrics.

Scylla's metrics are implemented using Seastar's metrics infrastructure.
Scylla's code updates metrics continuously in memory variables, and then
exposes them through an HTTP request, http://scyllanode:9180/metrics.
The response to this request is a text file listing the metrics and their
current values at the time of the query. This protocol, and the format of
the response was defined by the Prometheus metric collection system and
is described in detail here: https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md

> Note that the REST API in port 9180 is only devoted to publishing metrics.
> Scylla also has a separate and more powerful REST API on port 10000.

This very simple REST API is useful for quick scripting and development work,
but in Scylla production you'd usually want to collect metrics from multiple
Scylla nodes, collect a history of each metric over time, and provide a
graphical UI for viewing graphs of these histories. For this purpose,
we provide the separate scylla-grafana-monitoring project - see
https://github.com/scylladb/scylla-grafana-monitoring on how to install and
use it. The scylla-grafana-monitoring project allows you to continuously
collect metrics from several Scylla nodes into a Prometheus metric-collection
server, and then to visualize these metrics using Grafana and a web browser.
Prometheus and Grafana will be described in separate sections below.

## Metric labels: shard, instance and type
Different Scylla nodes will have different values for each metric (e.g.,
`scylla_cql_reads`, the total number of CQL read requests). Moreover, Scylla
is sharded, meaning that inside each node each core works on its own data
and keeps its own separate metrics. So in the metrics output, each metric
identifier contains, beyond the metric's name, also an additional label to
qualify which shard this metric comes from. For example:
```
scylla_cql_reads{shard="0",type="derive"} 20
```
In this example, this measurement comes from shard 0 (the first shard) of
the node which returned this metric.

When Prometheus collects measurements from multiple nodes, it further adds
an "instance" label to each measurement to remember from which node this
measurement came. The "instance" label has the form `ip_address:port` - see
https://prometheus.io/docs/concepts/jobs_instances/ for more information.
Note again that the instance label is not present in the metrics exposed by
Scylla (`http://scyllanode:9180/metrics`) but added later by Prometheus.

Saving instance and shard ids on each metric is what allows a single
Prometheus server to collect metrics from many Scylla nodes and their shards.
The visualization tool (such as Prometheus itself, or Grafana) can then show
the metrics of different nodes and shards separately, or to calculate and
display various sums - e.g., the sum on all shards of each node, or the total
sum of all shards and all nodes.

The "type" label should be ignored - it appears for historic reasons
(it was used by collectd) and is planned to be removed in the future.

## Additional metric labels
In some cases, we have several metrics which measure the same thing but for
different cases. For example, Scylla has about a dozen _scheduling groups_
(see isolation.md), and we would like to get some statistics - e.g. the
scheduler queue length - separately for each of these scheduling groups.

One option is to have a dozen different metrics with different names, e.g.,
`scylla_scheduler_queue_length_main`, `scylla_scheduler_queue_length_statement`
for the two scheduling groups called "main" and "statement".

However, there is a second option - which we chose in this case. The second
option is to have just one metric name, and qualify it by a **label** with
a value. In this case, we have one metric name `scylla_scheduler_queue_length`,
and metrics on different scheduling groups differ by the `group` label:
`scylla_scheduler_queue_length{group="main"}` and
`scylla_scheduler_queue_length{group="statement"}`.

Each metric reported by Scylla often has multiple labels, e.g.,
```
scylla_scheduler_queue_length{group="main",shard="0",type="gauge"} 0.000000
```
This metric has the `group` label, saying to which scheduling group this
measurement pertains, and also `shard` and `type` labels which we described
in the previous section.

## Per-table metrics
Most of Scylla's metrics are global (in each shard). Scylla also supports
per-table metrics, which are maintained separately for each table in the
database.

On a deployment with a large number of tables, this can result in a very
large number of metrics at each time, and overwhelm Scylla's HTTP
server and/or the Prometheus server collecting these metrics. For this
reason, the per-table metrics are currently **disabled** by default:
The per-table metrics are defined in the `table::set_metrics()` function,
and only added when the `enable_keyspace_column_family_metrics` flag is
enabled (and it is disabled by default).

To enable this flag and the per-table metrics, you can pass the parameters
`--enable-keyspace-column-family-metrics 1` in the Scylla command line, or
set this parameter in Scylla's configuration file.

We are planning to rethink this approach in the future. In particular,
it's not great that we currently need to restart Scylla to make these
metrics available. Scylla already maintains these per-table metrics in
per-table memory variables, and we just need a way to optionally expose
them through the HTTP request.

To tell the metrics of the different tables apart, each metric's identifier
contains the "ks" (*keyspace*) and "cf" (*column family* - the old name
for table) as labels. For example,

```
scylla_column_family_pending_compaction{cf="IndexInfo",ks="system",shard="0",type="gauge"} 0.000000
```

Here we can see the "scylla_column_family_pending_compactions" metric
measured in shard 0 of this node, for the table "IndexInfo" in keyspace
"system".

## Types of metrics
Scylla metrics fall under three types: "counter", "gauge" and "histogram".

Most metrics are of the "counter" type. A counter metric tracks a cumulative
value over objects or events that existed throughout the lifetime of the
node. For example, the "total number of requests processed so far", or
"the total number of bytes written to disk".

When visualizing counter metrics, it is often useful to look at the
*derivative*, or rate of change, of the number, instead of at the cumulative
number itself. Note that Scylla only provides the cumulative number - the
visualization tool used by the user (such as Grafana mentioned earlier) is
responsible for calculating the rate of change - by taking two measurements
of the cumulative value at two different times, and calculating the difference
of cumulative value divided by the time difference. For example, by
subtracting the "total number of requests" values queried one second apart,
we can show the number of requests handled during that second.

> In some contexts, we call counter metrics "derive" metrics. We do this
> mainly for historic reasons, because our previous focus on the "collectd"
> metric collection daemon - which Scylla still supports but is no longer
> our recommended choice. Collectd has both "derive" and "counter" metrics
> with a subtle difference: Both indicate cumulative values, but "counter"
> is a sum of non-negative values, while "derive" is a sum of values which
> may be negative. This distinction is not important in Scylla: all our
> cumulative metrics are sums of non-negative values, and are monotonically
> increasing. So in this document we picked the term "counter" and use it
> exclusively.

Contrary to counter metrics which accumulate a measurement throughout the
lifetime of the node, a **gauge** metric measures the state of objects
currently existing in the system. For example, the number of requests being
processed *right now*, the size of some queue, the amount of memory devoted
now to the row cache, or the amount of disk used now for the data storage.

Gauge metrics are less common than counter metrics. When visualizing them,
one usually wants to look at the metric itself rather than its rate of
change. However, even for gauge metrics it is sometimes useful to visualize
their derivative - for example, a user might want to visualize the rate of
change to the amount of disk storage.

Internally, Scylla calculates many of the gauge metrics just like calculates
counter metrics - as a cumulative value: For example, Scylla maintains a
metric of the number of requests being processed *right now* by adding 1 to
the metric when starting to process a request, and subtracting 1 when the
request's processing is complete. This metric is nevertheless labeled "gauge"
because it provides a metric over currently-existing objects in the system
(requests being processed), not a sum of historic information.

TODO: histogram metrics. They are described in the Prometheus document linked
above.

## List of metrics
Looking at the response for http://scyllanode:9180/metrics is the best
way to see the list of metrics currently exposed by Scylla, because it
includes a textual description in a comment above each metric.

TODO: mention source files in which a developer should add new metrics.

## Prometheus
So far, we described Scylla's internal metric-retrieval recapability,
a REST API for retrieving the current values of all metrics from a single
node. But in production, as well as more advanced debugging sessions, one
usually wants to collect metrics from multiple Scylla nodes, and to collect
and to graph a history of each metric over time. As already mentioned above,
we provide a separate project "scylla-grafana-monitoring" which does exactly
this using the Prometheus time-series database.

Prometheus is installed on a separate monitoring node (which we shall call
below "monitornode"). It connects to several Scylla nodes, and saves their
metrics into a time-series database. Prometheus then allows querying,
analyzing, and and graphing this data, via a Web interface at:

    http://monitornode:9090/

Through this Web interface, a user can search for a metric name (type
a word and see the list of all metrics with this word as part of their
name), and then see the current value of this metric over all shards and
nodes (the "Console" tab), and also see a graph of the value of this
metric over time (the "Graph" tab).

Prometheus allows querying and graphing not only the metric itself, but
also various functions and aggregates of these metrics. For example, if
a user asks to graph some metric `xyz` the result is a graph with multiple
lines, one line for each shard and node. The syntax `xyz{instance="..."}`
will limit the lines to all shards of just one node (given the node's IP
address), and the syntax `xyz{instance="...",shard="0"}` will show only
one shard of one node. The syntax `xyz{group=~"memtable.*"}` will show
only metrics where the `group` label matches the given regular expression.

The syntax `sum(xyz)` will plot just one line, with the total of the metric
`xyz` over all shards in all nodes. It's also possible to plot partial sums -
for example `sum(xyz) by (group)` generates a separate sum (and plot line)
for each value of the label `group`.

The expression `irate(xyz[1m])` graphs the rate of change (i.e.,
the derivative) of the metric `xyz`. In this last example, the "1m"
selector is ignored by the `irate()` function, but some duration is required
by the Prometheus syntax.

Prometheus supports many more functions and aggregations, which are described
in its documentation:
https://prometheus.io/docs/prometheus/latest/querying/basics/

## Grafana
While Prometheus already allows analyzing and graphing metric data, Grafana
is a more advanced user interface which allows displaying many of these graphs
in professional-looking "dashboards" which are more convenient for end-users
who don't know which metrics Scylla has and what they mean, and want to see
pre-canned dashboards of graphs that are useful for particular purposes.

The Grafana user interface is available in:

    http://monitornode:3000/