# Scylla Metrics
Scylla exposes dozens of different metrics which are valuable for
understanding the performance of a node, and for diagnosing performance
problems when those occur. Among other things, you can see counts of requests,
activity of disks, CPUs and network, memory usage of different types,
activity in individual tables, and many more.

Scylla's metrics are implemented using Seastar's metrics infrastructure.
Scylla's code updates metrics continuously in memory variables, and then
exposes them through an HTTP request, http://scyllanode:9180/metrics.
The response to this request is a text file listing the metrics and their
current values at the time of the query. This protocol, and the format of
the response, were defined by the Prometheus metric collection system and
are described in detail here: https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md

> Note that the REST API on port 9180 is devoted only to publishing metrics.
> Scylla also has a separate and more powerful REST API on port 10000.

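As a rough illustration of how this response can be consumed from a script,
the following Python sketch parses the text into `(name, labels, value)`
tuples. It is a minimal sketch and not a complete parser for the Prometheus
format: it skips comment lines and does not unescape label values. The
function names `parse_metrics` and `fetch_metrics` are made up for this
example, and `scyllanode` is the same placeholder hostname used above.

```python
import re
from urllib.request import urlopen

# A sample line looks like: name{label="value",...} number
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://scyllanode:9180/metrics"):
    """Fetch and parse the metrics of one node (placeholder hostname)."""
    with urlopen(url) as response:
        return parse_metrics(response.read().decode("utf-8"))
```
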
This very simple REST API is useful for quick scripting and development work,
but in production you'd usually want to collect metrics from multiple
Scylla nodes, keep a history of each metric over time, and provide a
graphical UI for viewing graphs of these histories. For this purpose,
we provide the separate scylla-grafana-monitoring project - see
https://github.com/scylladb/scylla-grafana-monitoring on how to install and
use it. The scylla-grafana-monitoring project allows you to continuously
collect metrics from several Scylla nodes into a Prometheus metric-collection
server, and then to visualize these metrics using Grafana and a web browser.
Prometheus and Grafana are described in separate sections below.

## Metric labels: shard, instance and type
Different Scylla nodes will have different values for each metric (e.g.,
`scylla_cql_reads`, the total number of CQL read requests). Moreover, Scylla
is sharded, meaning that inside each node each core works on its own data
and keeps its own separate metrics. So in the metrics output, each metric
identifier contains, beyond the metric's name, an additional label that
says which shard this metric comes from. For example:
```
scylla_cql_reads{shard="0",type="derive"} 20
```
In this example, the measurement comes from shard 0 (the first shard) of
the node which returned this metric.

When Prometheus collects measurements from multiple nodes, it further adds
an "instance" label to each measurement to remember from which node this
measurement came. The "instance" label has the form `ip_address:port` - see
https://prometheus.io/docs/concepts/jobs_instances/ for more information.
Note that the instance label is not present in the metrics exposed by
Scylla (`http://scyllanode:9180/metrics`); it is added later by Prometheus.

Saving instance and shard ids on each metric is what allows a single
Prometheus server to collect metrics from many Scylla nodes and their shards.
The visualization tool (such as Prometheus itself, or Grafana) can then show
the metrics of different nodes and shards separately, or calculate and
display various sums - e.g., the sum over all shards of each node, or the
total over all shards and all nodes.

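The aggregation described above can be sketched in a few lines of Python.
This only illustrates the idea; in practice Prometheus or Grafana does this
work. The `sum_by` helper, the addresses and the values are all hypothetical.

```python
from collections import defaultdict


def sum_by(samples, keys):
    """Sum sample values, grouped by the given label names.

    `samples` is an iterable of (labels, value) pairs and `keys` is a tuple
    of label names. With keys=() everything collapses into a single total.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        group = tuple(labels.get(key, "") for key in keys)
        totals[group] += value
    return dict(totals)


# Hypothetical measurements of one metric: two nodes with two shards each.
samples = [
    ({"instance": "10.0.0.1:9180", "shard": "0"}, 20.0),
    ({"instance": "10.0.0.1:9180", "shard": "1"}, 22.0),
    ({"instance": "10.0.0.2:9180", "shard": "0"}, 5.0),
    ({"instance": "10.0.0.2:9180", "shard": "1"}, 7.0),
]

per_node = sum_by(samples, ("instance",))  # sum over the shards of each node
cluster_total = sum_by(samples, ())        # sum over all shards and nodes
```
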
The "type" label should be ignored - it appears for historic reasons
(it was used by collectd) and is planned to be removed in the future.

## Additional metric labels
In some cases, we have several metrics which measure the same thing but for
different cases. For example, Scylla has about a dozen _scheduling groups_
(see isolation.md), and we would like to get some statistics - e.g. the
scheduler queue length - separately for each of these scheduling groups.

One option is to have a dozen different metrics with different names, e.g.,
`scylla_scheduler_queue_length_main`, `scylla_scheduler_queue_length_statement`
for the two scheduling groups called "main" and "statement".

However, there is a second option - which we chose in this case. The second
option is to have just one metric name, and qualify it by a **label** with
a value. In this case, we have one metric name `scylla_scheduler_queue_length`,
and metrics on different scheduling groups differ by the `group` label:
`scylla_scheduler_queue_length{group="main"}` and
`scylla_scheduler_queue_length{group="statement"}`.

A metric reported by Scylla often has multiple labels, e.g.,
```
scylla_scheduler_queue_length{group="main",shard="0",type="gauge"} 0.000000
```
This metric has the `group` label, saying to which scheduling group this
measurement pertains, and also the `shard` and `type` labels which we
described in the previous section.

## Per-table metrics
Most of Scylla's metrics are global (in each shard). Scylla also supports
per-table metrics, which are maintained separately for each table in the
database.

On a deployment with a large number of tables, this can result in a very
large number of metrics in each response, and overwhelm Scylla's HTTP
server and/or the Prometheus server collecting these metrics. For this
reason, the per-table metrics are currently **disabled** by default.
The per-table metrics are defined in the `table::set_metrics()` function,
and are only added when the `enable_keyspace_column_family_metrics` flag is
enabled.

To enable this flag and the per-table metrics, you can pass the parameter
`--enable-keyspace-column-family-metrics 1` in the Scylla command line, or
set this parameter in Scylla's configuration file.

We are planning to rethink this approach in the future. In particular,
it's not great that we currently need to restart Scylla to make these
metrics available. Scylla already maintains these per-table metrics in
per-table memory variables, and we just need a way to optionally expose
them through the HTTP request.

To tell the metrics of the different tables apart, each metric's identifier
contains the "ks" (*keyspace*) and "cf" (*column family* - the old name
for table) labels. For example,

```
scylla_column_family_pending_compaction{cf="IndexInfo",ks="system",shard="0",type="gauge"} 0.000000
```

Here we can see the "scylla_column_family_pending_compaction" metric
measured in shard 0 of this node, for the table "IndexInfo" in keyspace
"system".

## Types of metrics
Scylla metrics fall under three types: "counter", "gauge" and "histogram".

Most metrics are of the "counter" type. A counter metric tracks a cumulative
value over objects or events that existed throughout the lifetime of the
node. For example, the "total number of requests processed so far", or
"the total number of bytes written to disk".

When visualizing counter metrics, it is often useful to look at the
*derivative*, or rate of change, of the number, instead of at the cumulative
number itself. Note that Scylla only provides the cumulative number - the
visualization tool used by the user (such as Grafana mentioned earlier) is
responsible for calculating the rate of change, by taking two measurements
of the cumulative value at two different times and dividing the difference
in value by the difference in time. For example, by
subtracting the "total number of requests" values queried one second apart,
we can show the number of requests handled during that second.

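The calculation described above amounts to one division. The Python sketch
below shows it for two readings of a cumulative counter; the `rate` helper
and the sample numbers are hypothetical, and the sketch does not try to
handle a counter that starts again from zero after a node restart.

```python
def rate(earlier, later):
    """Per-second rate of change between two (timestamp_seconds, value) samples."""
    (t0, v0), (t1, v1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be ordered in time")
    return (v1 - v0) / (t1 - t0)


# Hypothetical readings of a "total number of requests" counter,
# taken 10 seconds apart: 500 new requests in 10 seconds.
requests_per_second = rate((100.0, 2000.0), (110.0, 2500.0))
```
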
> In some contexts, we call counter metrics "derive" metrics. We do this
> mainly for historic reasons, because of our previous focus on the "collectd"
> metric collection daemon - which Scylla still supports but is no longer
> our recommended choice. Collectd has both "derive" and "counter" metrics
> with a subtle difference: Both indicate cumulative values, but "counter"
> is a sum of non-negative values, while "derive" is a sum of values which
> may be negative. This distinction is not important in Scylla: all our
> cumulative metrics are sums of non-negative values, and are monotonically
> increasing. So in this document we picked the term "counter" and use it
> exclusively.

In contrast to counter metrics, which accumulate a measurement throughout the
lifetime of the node, a **gauge** metric measures the state of objects
currently existing in the system. For example, the number of requests being
processed *right now*, the size of some queue, the amount of memory devoted
now to the row cache, or the amount of disk used now for the data storage.

Gauge metrics are less common than counter metrics. When visualizing them,
one usually wants to look at the metric itself rather than its rate of
change. However, even for gauge metrics it is sometimes useful to visualize
their derivative - for example, a user might want to visualize the rate of
change of the amount of disk storage.

Internally, Scylla calculates many of the gauge metrics just like it calculates
counter metrics - as a cumulative value: For example, Scylla maintains a
metric of the number of requests being processed *right now* by adding 1 to
the metric when starting to process a request, and subtracting 1 when the
request's processing is complete. This metric is nevertheless labeled "gauge"
because it measures currently-existing objects in the system
(requests being processed), not a sum of historic information.

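The add-1, subtract-1 bookkeeping described above can be illustrated with a
short Python sketch. This is not Scylla's code (Scylla is written in C++ on
top of Seastar); the `Gauge` class and the `track_in_flight` helper are
hypothetical names used only to show the idea.

```python
from contextlib import contextmanager


class Gauge:
    """A value that can go up and down, unlike a counter."""

    def __init__(self):
        self.value = 0


@contextmanager
def track_in_flight(gauge):
    gauge.value += 1      # a request starts being processed
    try:
        yield
    finally:
        gauge.value -= 1  # processing is complete, or it failed


requests_in_flight = Gauge()
with track_in_flight(requests_in_flight):
    during = requests_in_flight.value  # the request is counted while it runs
after = requests_in_flight.value       # and no longer counted once it is done
```
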
TODO: histogram metrics. They are described in the Prometheus document linked
above.

## List of metrics
Looking at the response for http://scyllanode:9180/metrics is the best
way to see the list of metrics currently exposed by Scylla, because it
includes a textual description in a comment above each metric.

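In the Prometheus text format these descriptions are the `# HELP` comment
lines. As a hedged sketch, the following Python snippet collects them into a
dictionary. The helper name `metric_descriptions` is made up, and the sample
response text (including its description wording) is invented for
illustration rather than copied from a real node.

```python
def metric_descriptions(text):
    """Map each metric name to the text of its "# HELP" comment."""
    descriptions = {}
    for line in text.splitlines():
        parts = line.split(None, 3)  # "#", "HELP", name, description
        if len(parts) >= 3 and parts[0] == "#" and parts[1] == "HELP":
            descriptions[parts[2]] = parts[3] if len(parts) == 4 else ""
    return descriptions


# Invented excerpt of a response, for illustration only.
sample = """\
# HELP scylla_cql_reads Counts the total number of CQL read requests.
# TYPE scylla_cql_reads counter
scylla_cql_reads{shard="0"} 20
"""

descriptions = metric_descriptions(sample)
```
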
TODO: mention source files in which a developer should add new metrics.

## Prometheus
So far, we described Scylla's internal metric-retrieval capability,
a REST API for retrieving the current values of all metrics from a single
node. But in production, as well as in more advanced debugging sessions, one
usually wants to collect metrics from multiple Scylla nodes, and to keep
and graph a history of each metric over time. As already mentioned above,
we provide a separate project "scylla-grafana-monitoring" which does exactly
this using the Prometheus time-series database.

Prometheus is installed on a separate monitoring node (which we call
"monitornode" below). It connects to several Scylla nodes, and saves their
metrics into a time-series database. Prometheus then allows querying,
analyzing, and graphing this data, via a Web interface at:

http://monitornode:9090/

Through this Web interface, a user can search for a metric name (type
a word and see the list of all metrics with this word as part of their
name), and then see the current value of this metric over all shards and
nodes (the "Console" tab), and also see a graph of the value of this
metric over time (the "Graph" tab).

Prometheus allows querying and graphing not only the metric itself, but
also various functions and aggregates of these metrics. For example, if
a user asks to graph some metric `xyz`, the result is a graph with multiple
lines, one line for each shard and node. The syntax `xyz{instance="..."}`
will limit the lines to all shards of just one node (given the node's
`instance` label), and the syntax `xyz{instance="...",shard="0"}` will show
only one shard of one node. The syntax `xyz{group=~"memtable.*"}` will show
only metrics where the `group` label matches the given regular expression.

The syntax `sum(xyz)` will plot just one line, with the total of the metric
`xyz` over all shards in all nodes. It's also possible to plot partial sums -
for example `sum(xyz) by (group)` generates a separate sum (and plot line)
for each value of the label `group`.

The expression `irate(xyz[1m])` graphs the rate of change (i.e.,
the derivative) of the metric `xyz`. The `irate()` function uses only the
last two samples inside the "1m" window, so the duration only limits how
far back it looks for them, but some duration is required by the
Prometheus syntax.

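Besides the Web interface, Prometheus also answers such expressions over its
HTTP API, at the `/api/v1/query` endpoint. The Python sketch below only builds
the request URL for an expression; the helper name `instant_query_url` is
hypothetical, `monitornode` is the placeholder hostname used above, and `xyz`
stands for any metric name.

```python
from urllib.parse import urlencode


def instant_query_url(expression, base="http://monitornode:9090"):
    """Build the URL of a Prometheus instant query for a PromQL expression."""
    query = urlencode({"query": expression})  # percent-encodes the expression
    return f"{base}/api/v1/query?{query}"


# The expressions discussed above, with the placeholder metric name "xyz".
expressions = [
    'xyz{instance="...",shard="0"}',
    'xyz{group=~"memtable.*"}',
    "sum(xyz) by (group)",
    "irate(xyz[1m])",
]
urls = [instant_query_url(expression) for expression in expressions]
```
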
Prometheus supports many more functions and aggregations, which are described
in its documentation:
https://prometheus.io/docs/prometheus/latest/querying/basics/

## Grafana
While Prometheus already allows analyzing and graphing metric data, Grafana
is a more advanced user interface which allows displaying many of these graphs
in professional-looking "dashboards". These are more convenient for end-users
who don't know which metrics Scylla has and what they mean, and who want to
see pre-canned dashboards of graphs that are useful for particular purposes.

The Grafana user interface is available at:

http://monitornode:3000/