Files
scylladb/service
Avi Kivity fb3a83a811 Merge "Slow query logging" from Vlad
"This series introduces a "slow query logging" feature that
allows logging the queries that take more than a specified
threshold time to complete.

Once such a query detected, it will be logged in a system_traces.node_slow_log table.

In addition all trace for that query that have been collected on a Coordinator
are going to be written as well.

If the handling time on a replica in the context of a query takes more than (the same) threshold
they are going to be written too.

The raw in a node_slow_log contains a session_id of a corresponding tracing session,
thereby allowing the user to query the system_traces tables for the corresponding trace
records.

The schema of the node_slow_log table is as follows:

CREATE TABLE system_traces.node_slow_log (
    node_ip inet,
    shard int,
    session_id uuid,
    date timestamp,
    start_time timeuuid,
    command text,
    duration int,
    parameters map<text, text>,
    source_ip inet,
    table_names set<text>,
    username text,
    PRIMARY KEY (start_time, node_ip, shard))
    WITH default_time_to_live = 86400

where
 - node_ip: IP of the coordinator Node.
 - shard: shard ID on a Coordinator where the query was handled.
 - session_id: ID of a corresponding tracing session.
 - date: a time when the query has began.
 - start_time: a time-based UUID for this query (needed for a primary key mostly).
 - command: a query string.
 - duration: a time it took to handle this query (in microseconds).
 - parameters: a map of query parameters (like in system_traces.sessions).
 - source_ip: IP of a Client that sent this query.
 - table_names: a set of "<keyspace>.<table name>" strings representing column
                families used in this query.
 - username: a user name used for this query.

The good thing is that most of the data we needed is already
collected by the regular tracing framework. The only missing ones
are a username and tables' names. So, this series makes the framework collect them too.

The whole feature is integrated in the Tracing framework. The main
changes to the framework that were made are as follows:
 - Store the constant capabilities of the tracing session in an enum_set, e.g.:
  - primary/secondary.
  - write on close.
 - Introduce two new capabilities to a tracing session of a specific query:
  - full tracing: collect all traces for this query (as it is before this series).
  - log slow query: log this query if its duration is above the threshold.
  These two capabilities may be defined independently.
 - Add the logic that handles the "log slow query"-only case:
  - Build the parameters<sstring, sstring> map only if the "duration" is above
    the given threshold.
  - The same about writing the trace entries.
 - In a not-only "log slow query" case:
  - Write the node_slow_log entry.
  - Extend the trace_info struct to pass slow query threshold and TTL to the replica
    Node.

In addition to above this series add the capability to configure the slow query logging
threshold and a TTL for the node_slow_log records.

The heaviest patch in the series is the last one. The series contains a few cosmetic (renaming)
patches that are meant to align the naming of the existing methods with the ones the last one
is going to add."
2016-08-29 13:11:36 +03:00
..