Commit Graph

24 Commits

Author SHA1 Message Date
Vlad Zolotarov
026061733f tracing: set a default TTL for system_traces tables when they are created
Fixes #1482

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1469104164-4452-1-git-send-email-vladz@cloudius-systems.com>
(cherry picked from commit 4647ad9d8a)
2016-07-25 13:50:43 +03:00
Vlad Zolotarov
a197323b47 tracing::trace_state.hh: Add descriptions for main methods and functions
Add a proper description to a tracing::trace() that clarifies
that the tracing message string and the positional parameters
are going to be copied if tracing state is initialized.

Add a description for trace_state::begin() methods and for a
tracing::begin() helper function.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:59 +03:00
Vlad Zolotarov
b36b69c1d6 service::storage_proxy: remove a default value for a tracing::trace_state_ptr parameter
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:59 +03:00
Vlad Zolotarov
89a49c346c tracing::trace_state: add begin() overload for seastar::value_of given as a "request" parameter.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
0689843e79 tracing::trace_state: add method to set the session's "params" map entries
Sometimes we want to be able to set "params" map after we
started a tracing session, e.g. when the parameters values,
like a consistency level parsed from the "options" part of a binary frame,
are available only after some heavy part of a flow we would like
to trace.

This patch includes the following changes:

   - No longer pass a map to the begin().
   - Limit the parameters to the known set.
   - Define a method to set each such parameter and save its
     value till the final sstring->sstring map is created.
   - Construct the final sstring->sstring map in the destructor of the trace_state
     object in order to defer all the formatting to be after the traced flow.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
9c0a725c56 tracing: add a _local_tracing to a i_tracing_backend_helper
A backend helper has to constantly communicate with the corresponding
tracing::tracing instance. By saving a reference to the tracing::tracing instance
will save us a lot of tracing::get_local_tracing_instance() calls and thus
a lot of dereferencing.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
2bb054748e tracing: record events' time stamps
- Extend the i_tracing_backend_helper interface to accept the event
     record timestamp.
   - Grab the current timestamp when the event record is taken.
   - Add the instrumentation to the trace_keyspace_helper to create a unique time-UUID
     from a given std::chrono::duration object.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
06d4221382 tracing: add tracing::make_trace_info() helper
This helper returns an std::experimental::optional<trace_info>
which is initialized or not initialized depending on whether
a given trace_state_ptr is initialized or not.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Vlad Zolotarov
7a5fc9fcdc tracing::trace_state: add const qualifiers to a trace_state_ptr parameter
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Vlad Zolotarov
b0673aabd5 tracing: fix a logger name
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Vlad Zolotarov
da4836becc tracing::trace_state: add support for a formatted message in trace()
Add an support for passing a format string plus positional parameters
for creation of a trace point message.

Format string should be given in a fmt library native format described
here: http://fmtlib.net/latest/syntax.html#syntax .

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Vlad Zolotarov
ee0e986e96 tracing: make a service shutdown stages more strict
kick() backend during shutdown and restrict accessing a backend
after that.

Flush pending records when service is being shut down.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Vlad Zolotarov
6e38133f82 tracing: prevent a destruction of a tracing::tracing while it's used
Prevent the destruction of tracing::tracing instances while there
are still tracing::trace_state objects that are using it:

   - Make tracing::tracing inherit from seastar::async_sharded_service<tracing::tracing>.
   - Grab a tracing::tracing.shared_from_this() in each
     tracing::trace_state object using it.
   - Use a saved pointer to the local tracing::tracing instance in a destructor
     instead of accessing it via tracing::get_local_tracing_instance()
     to avoid "local is not initialized" assert when sessions are
     being destroyed after the service was stopped.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Vlad Zolotarov
a5022a09a4 tracing: use 'write' instead of 'flush' and 'store' for consistency with seastar's API
In names of functions and variables:
s/flush_/write_/
s/store_/write_/

In a i_tracing_backend_helper:
s/flush()/kick()/

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:57 +03:00
Vlad Zolotarov
d3960f0bbb tracing: rearrange shut down
tracing::tracing local instance is dereferenced from a
cql_server::connection::process_request(), therefore tracing::tracing
service may be stop()ed only after a CQL server service is down.
On the other hand it may not be stopped before RPC service is down
because a remote side may request a tracing for a specific command too.

This patch splits the tracing::tracing stop() into two phases:
   1) Flush all pending tracing records and stop the backend.
   2) Stop the service.

The first phase is called after CQL server is down and before RPC is down.
The second phase is called after RPC is down.

Fixes #1339

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1465840496-19990-1-git-send-email-vladz@cloudius-systems.com>
2016-06-14 07:58:04 +03:00
Vlad Zolotarov
ce08bc611c tracing: fix debug compilation
Define flush_period as a const and not as constexpr.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1465240516-20128-1-git-send-email-vladz@cloudius-systems.com>
2016-06-06 22:15:27 -04:00
Vlad Zolotarov
905190ac06 tracing: add support for probabilistic tracing
Add a support for defining a probability (a value in a [0,1] range)
for tracing the next CQL request.

Traces for requests that are chosen to be traced due to this feature
are not going to flushed immediately.

Use std::subtract_with_carry_engine (implements the "lagged Fibonacci" algorithm)
random number engine for fastest generation of random integer values.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-06 15:41:01 +03:00
Vlad Zolotarov
779ff88c76 tracing: add flush timer
Flush pending sessions to the storage every 2 seconds.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-06 14:34:08 +03:00
Vlad Zolotarov
4b008ac5ea tracing: rework maximum sessions amount back pressure strategy
A tracing session life cycle includes 3 stages:
   1) Active: when new trace records are being added to this session.
   2) Pending for flushing to a storage: when session is over but not
      yet flushed to the storage ("backend").
   3) Flushing: when session's records are being flushed to the storage
      and this process is not yet completed.

Sessions may accumulate in each of the stages above and we should limit
the maximum amount of sessions being accumulated in each of them in order to avoid OOM
situation.

Current in-tree implementation only limits the number of tracing sessions
accumulated in the first ("Active") stage.

Since currently every closing session is being immediately flushed (as long
as "settraceprobability" is not implemented) the second stage never accumulates
tracing sessions.

The third stage is currently not controlled at all and if, for instance, we
succeed to push enough tracing session towards a slow storage backend, they may
accumulate there consuming an uncontrolled amount of memory and may eventually consume
all of it.

This patch fixes this unpleasant situation by implying the following strategy:

   - Limit the total amount of accumulated tracing sessions in all stages above together
     by a static value - 2 times "flush threshold". "2 times" is needed to allow new
     tracing sessions to accumulate in the stage 2 while sessions in the stage 3 are still
     being  processed.
   - Forcefully flush sessions in the stage 2 to the storage when their count reaches a "flush
     threshold".

This would ensure that there will not more than totally (2 * "flush threshold") sessions (in any stage)
on each shard.

An advantage of this strategy is its simplicity - we only need a single threshold to control all stages.
If we feel that we needed a finer graining for each stage we may add separate limits for each of them
in the future.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-06 13:50:41 +03:00
Vlad Zolotarov
139fa9d1bd tracing: minor cleanups
- Make small functions on a fast path "inline".
   - Add "const" qualifier where needed.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-06 13:00:31 +03:00
Vlad Zolotarov
a53d329b25 tracing: add a serializable trace_info object
tracing::trace_info is used to pass the tracing information between nodes.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:16:53 +03:00
Vlad Zolotarov
c965528a03 tracing: add a trace_state and tracing classes
trace_state: Is a single tracing session.
tracing:     A sharded service that contains an i_trace_backend_helper instance
             and is a "factory" of trace_state objects.

trace_state main interface functions are:
   - begin(): Start time counting (should be used via tracing::begin() wrapper).
   - trace(): Create a tracing event - it's coupled with a time passed since begin()
              (should be used via tracing::trace() wrapper).
   - ~trace_state(): Destructor will close the tracing session.

"tracing" service main interface function is:
   - start(): Initialize a backend.
   - stop():  Shut down a backend.
   - create_session(): Creates a new tracing session.

(tracing::end_session(): Is called by a trace_state destructor).

When trace_state needs to store a tracing event it uses a backend helper from
a "tracing" service.

A "tracing" service limits a number of opened tracing session by a static number.
If this number is reached - next sessions will be dropped.

trace_state implements a similar strategy in regard to tracing events per singe
session.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:13:42 +03:00
Vlad Zolotarov
d3988a8113 tracing::trace_keyspace_helper: a keyspace based i_tracing_backend_helper implementation
Uses a CQL keyspace system_traces to store tracing information.

Uses two tables:

CREATE TABLE system_traces.sessions (
session_id uuid,
command text,
client inet,
coordinator inet,
duration int,
parameters map<text, text>,
request text,
started_at timestamp,
PRIMARY KEY ((session_id)))

and

CREATE TABLE system_traces.events (
session_id uuid,
event_id timeuuid,
activity text,
source inet,
source_elapsed int,
thread text,
PRIMARY KEY ((session_id), event_id))

system_traces.sessions table contains records of tracing sessions.
system_traces.sessions columns description:
   - session_id:  an ID of the session.
   - command:     type of a command this session was created for
                  (currently supported "NONE", "QUERY" and "REPAIR").
   - client:      IP of the client that issued the command.
   - coordinator: IP of a coordinator that received the command.
   - duration:    total duration of the tracing session (in us).
   - parameters:  optional parameters for this session, passed to
                  i_trace_state::begin() call.
   - request:     a CQL command this tracing session is created for.
   - started_at:  the time the session has been started at.

system_traces.events contains records of separate tracing events.
system_traces.events columns description:
   - session_id:     an ID of the session.
   - event_id:       an ID of the event.
   - activity:       the trace point description - a message given to
                     i_trace_state::trace().
   - source:         IP of the Node where trace event was issued.
   - source_elapsed: time passed since creation of a tracing session (in us) on
                     the Node where this trace event was issued.
   - thread:         name of the thread in who's context this trace event was
                     issued in (currently its "core N", where 'N' is an index of
                     a shard the trace event was issued on).

This class will cache lambdas creating the corresponding mutations for each tracing
record requested to be stored till flush() method is called.

flush() will merge all pending mutations to "sessions" and "events" tables and
then apply a mutation to "events" table and when it completes - to "sessions"
table. This way it'll ensure that when some tracing session is visible, all its
events are visible too.

trace_keyspace_helper exposes a few metrics via collectd:
   - tracing_error - a total number of errors (not including OOM)
   - bad_column_family_errors - number of times a tracing record wasn't
                                stored because system_trace tables' schema
                                didn't match the expected value. This may happen if
                                a DB administrator is doing funny things like altering
                                the schemas of the above tables.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:12:19 +03:00
Vlad Zolotarov
a2994ffd7f tracing: add i_tracing_backend_helper interface
This class represents an interface for a specific backend that is
going to store tracing information.

The specific implementation may and expected to implement caching
of pending tracing records.

Interface functions are:
   - start(): Initialize a backend (e.g. create keyspace and tables).
   - stop():  Flush all pending work and shut down the backend.
   - store_session_record()/store_event_record():
              Cache/store the corresponding tracing records.
   - flush(): Flush pending tracing records.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-01 20:12:13 +03:00