scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 03:56:42 +00:00

Author	SHA1	Message	Date
Vlad Zolotarov	026061733f	tracing: set a default TTL for system_traces tables when they are created Fixes #1482 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1469104164-4452-1-git-send-email-vladz@cloudius-systems.com> (cherry picked from commit `4647ad9d8a`)	2016-07-25 13:50:43 +03:00
Vlad Zolotarov	a197323b47	tracing::trace_state.hh: Add descriptions for main methods and functions Add a proper description to a tracing::trace() that clarifies that the tracing message string and the positional parameters are going to be copied if tracing state is initialized. Add a description for trace_state::begin() methods and for a tracing::begin() helper function. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:59 +03:00
Vlad Zolotarov	b36b69c1d6	service::storage_proxy: remove a default value for a tracing::trace_state_ptr parameter Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:59 +03:00
Vlad Zolotarov	89a49c346c	tracing::trace_state: add begin() overload for seastar::value_of given as a "request" parameter. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	0689843e79	tracing::trace_state: add method to set the session's "params" map entries Sometimes we want to be able to set "params" map after we started a tracing session, e.g. when the parameters values, like a consistency level parsed from the "options" part of a binary frame, are available only after some heavy part of a flow we would like to trace. This patch includes the following changes: - No longer pass a map to the begin(). - Limit the parameters to the known set. - Define a method to set each such parameter and save its value till the final sstring->sstring map is created. - Construct the final sstring->sstring map in the destructor of the trace_state object in order to defer all the formatting to be after the traced flow. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	9c0a725c56	tracing: add a _local_tracing to an i_tracing_backend_helper A backend helper has to constantly communicate with the corresponding tracing::tracing instance. Saving a reference to the tracing::tracing instance saves us a lot of tracing::get_local_tracing_instance() calls and thus a lot of dereferencing. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	2bb054748e	tracing: record events' time stamps - Extend the i_tracing_backend_helper interface to accept the event record timestamp. - Grab the current timestamp when the event record is taken. - Add the instrumentation to the trace_keyspace_helper to create a unique time-UUID from a given std::chrono::duration object. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	06d4221382	tracing: add tracing::make_trace_info() helper This helper returns an std::experimental::optional<trace_info> which is initialized or not initialized depending on whether a given trace_state_ptr is initialized or not. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	7a5fc9fcdc	tracing::trace_state: add const qualifiers to a trace_state_ptr parameter Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	b0673aabd5	tracing: fix a logger name Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	da4836becc	tracing::trace_state: add support for a formatted message in trace() Add support for passing a format string plus positional parameters for creation of a trace point message. The format string should be given in the fmt library's native format described here: http://fmtlib.net/latest/syntax.html#syntax . Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	ee0e986e96	tracing: make the service shutdown stages stricter kick() the backend during shutdown and restrict access to the backend after that. Flush pending records when the service is being shut down. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	6e38133f82	tracing: prevent a destruction of a tracing::tracing while it's used Prevent the destruction of tracing::tracing instances while there are still tracing::trace_state objects that are using it: - Make tracing::tracing inherit from seastar::async_sharded_service<tracing::tracing>. - Grab a tracing::tracing.shared_from_this() in each tracing::trace_state object using it. - Use a saved pointer to the local tracing::tracing instance in a destructor instead of accessing it via tracing::get_local_tracing_instance() to avoid "local is not initialized" assert when sessions are being destroyed after the service was stopped. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	a5022a09a4	tracing: use 'write' instead of 'flush' and 'store' for consistency with seastar's API In names of functions and variables: s/flush_/write_/ s/store_/write_/ In an i_tracing_backend_helper: s/flush()/kick()/ Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	d3960f0bbb	tracing: rearrange shut down tracing::tracing local instance is dereferenced from a cql_server::connection::process_request(), therefore tracing::tracing service may be stop()ed only after a CQL server service is down. On the other hand it may not be stopped before RPC service is down because a remote side may request a tracing for a specific command too. This patch splits the tracing::tracing stop() into two phases: 1) Flush all pending tracing records and stop the backend. 2) Stop the service. The first phase is called after CQL server is down and before RPC is down. The second phase is called after RPC is down. Fixes #1339 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1465840496-19990-1-git-send-email-vladz@cloudius-systems.com>	2016-06-14 07:58:04 +03:00
Vlad Zolotarov	ce08bc611c	tracing: fix debug compilation Define flush_period as a const and not as constexpr. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1465240516-20128-1-git-send-email-vladz@cloudius-systems.com>	2016-06-06 22:15:27 -04:00
Vlad Zolotarov	905190ac06	tracing: add support for probabilistic tracing Add support for defining a probability (a value in a [0,1] range) for tracing the next CQL request. Traces for requests that are chosen to be traced due to this feature are not going to be flushed immediately. Use the std::subtract_with_carry_engine (implements the "lagged Fibonacci" algorithm) random number engine for the fastest generation of random integer values. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-06 15:41:01 +03:00
Vlad Zolotarov	779ff88c76	tracing: add flush timer Flush pending sessions to the storage every 2 seconds. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-06 14:34:08 +03:00
Vlad Zolotarov	4b008ac5ea	tracing: rework maximum sessions amount back pressure strategy A tracing session life cycle includes 3 stages: 1) Active: when new trace records are being added to this session. 2) Pending for flushing to a storage: when session is over but not yet flushed to the storage ("backend"). 3) Flushing: when session's records are being flushed to the storage and this process is not yet completed. Sessions may accumulate in each of the stages above and we should limit the maximum amount of sessions being accumulated in each of them in order to avoid an OOM situation. Current in-tree implementation only limits the number of tracing sessions accumulated in the first ("Active") stage. Since currently every closing session is being immediately flushed (as long as "settraceprobability" is not implemented) the second stage never accumulates tracing sessions. The third stage is currently not controlled at all and if, for instance, we manage to push enough tracing sessions towards a slow storage backend, they may accumulate there consuming an uncontrolled amount of memory and may eventually consume all of it. This patch fixes this unpleasant situation by applying the following strategy: - Limit the total amount of accumulated tracing sessions in all stages above together by a static value - 2 times "flush threshold". "2 times" is needed to allow new tracing sessions to accumulate in the stage 2 while sessions in the stage 3 are still being processed. - Forcefully flush sessions in the stage 2 to the storage when their count reaches a "flush threshold". This would ensure that there will be no more than (2 * "flush threshold") sessions in total (in any stage) on each shard. An advantage of this strategy is its simplicity - we only need a single threshold to control all stages. If we find that we need finer granularity for each stage we may add separate limits for each of them in the future. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-06 13:50:41 +03:00
Vlad Zolotarov	139fa9d1bd	tracing: minor cleanups - Make small functions on a fast path "inline". - Add "const" qualifier where needed. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-06 13:00:31 +03:00
Vlad Zolotarov	a53d329b25	tracing: add a serializable trace_info object tracing::trace_info is used to pass the tracing information between nodes. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:16:53 +03:00
Vlad Zolotarov	c965528a03	tracing: add a trace_state and tracing classes trace_state: Is a single tracing session. tracing: A sharded service that contains an i_trace_backend_helper instance and is a "factory" of trace_state objects. trace_state main interface functions are: - begin(): Start time counting (should be used via tracing::begin() wrapper). - trace(): Create a tracing event - it's coupled with a time passed since begin() (should be used via tracing::trace() wrapper). - ~trace_state(): Destructor will close the tracing session. "tracing" service main interface functions are: - start(): Initialize a backend. - stop(): Shut down a backend. - create_session(): Creates a new tracing session. (tracing::end_session(): Is called by a trace_state destructor). When trace_state needs to store a tracing event it uses a backend helper from a "tracing" service. A "tracing" service limits the number of open tracing sessions by a static number. If this number is reached - next sessions will be dropped. trace_state implements a similar strategy in regard to tracing events per single session. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:42 +03:00
Vlad Zolotarov	d3988a8113	tracing::trace_keyspace_helper: a keyspace based i_tracing_backend_helper implementation Uses a CQL keyspace system_traces to store tracing information. Uses two tables: CREATE TABLE system_traces.sessions ( session_id uuid, command text, client inet, coordinator inet, duration int, parameters map<text, text>, request text, started_at timestamp, PRIMARY KEY ((session_id))) and CREATE TABLE system_traces.events ( session_id uuid, event_id timeuuid, activity text, source inet, source_elapsed int, thread text, PRIMARY KEY ((session_id), event_id)) system_traces.sessions table contains records of tracing sessions. system_traces.sessions columns description: - session_id: an ID of the session. - command: type of a command this session was created for (currently supported "NONE", "QUERY" and "REPAIR"). - client: IP of the client that issued the command. - coordinator: IP of a coordinator that received the command. - duration: total duration of the tracing session (in us). - parameters: optional parameters for this session, passed to i_trace_state::begin() call. - request: a CQL command this tracing session is created for. - started_at: the time the session has been started at. system_traces.events contains records of separate tracing events. system_traces.events columns description: - session_id: an ID of the session. - event_id: an ID of the event. - activity: the trace point description - a message given to i_trace_state::trace(). - source: IP of the Node where trace event was issued. - source_elapsed: time passed since creation of a tracing session (in us) on the Node where this trace event was issued. - thread: name of the thread in whose context this trace event was issued (currently it's "core N", where 'N' is an index of a shard the trace event was issued on). This class will cache lambdas creating the corresponding mutations for each tracing record requested to be stored till flush() method is called. flush() will merge all pending mutations to "sessions" and "events" tables and then apply a mutation to "events" table and when it completes - to "sessions" table. This way it'll ensure that when some tracing session is visible, all its events are visible too. trace_keyspace_helper exposes a few metrics via collectd: - tracing_error - a total number of errors (not including OOM) - bad_column_family_errors - number of times a tracing record wasn't stored because system_traces tables' schema didn't match the expected value. This may happen if a DB administrator is doing funny things like altering the schemas of the above tables. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:12:19 +03:00
Vlad Zolotarov	a2994ffd7f	tracing: add i_tracing_backend_helper interface This class represents an interface for a specific backend that is going to store tracing information. The specific implementation may, and is expected to, implement caching of pending tracing records. Interface functions are: - start(): Initialize a backend (e.g. create keyspace and tables). - stop(): Flush all pending work and shut down the backend. - store_session_record()/store_event_record(): Cache/store the corresponding tracing records. - flush(): Flush pending tracing records. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:12:13 +03:00
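
The back pressure strategy described in commit 4b008ac5ea (one "flush threshold" bounding all three session stages on a shard) can be sketched roughly as follows. This is only an illustration of the counting logic under stated assumptions, not the code from the patch: the class name `session_budget` and all of its methods are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical per-shard accounting of tracing sessions in the three stages
// named in the commit message: active, pending (closed, not yet flushed)
// and flushing (write to the backend in progress).
class session_budget {
public:
    explicit session_budget(std::size_t flush_threshold)
        : _flush_threshold(flush_threshold) {}

    // Stage 1: a new session is admitted only while the total over all
    // stages is below 2 * flush_threshold; otherwise it is dropped.
    bool try_open() {
        if (total() >= 2 * _flush_threshold) {
            return false;
        }
        ++_active;
        return true;
    }

    // Stage 1 -> 2: the session is over. Returns true when the pending
    // count has reached the flush threshold and a flush should be forced.
    // Precondition: at least one session is active.
    bool close() {
        --_active;
        ++_pending;
        return _pending >= _flush_threshold;
    }

    // Stage 2 -> 3: hand all pending sessions to the backend.
    // Returns how many sessions were moved.
    std::size_t begin_flush() {
        std::size_t moved = _pending;
        _flushing += moved;
        _pending = 0;
        return moved;
    }

    // Stage 3 finished for n sessions: their memory can be released.
    void flush_done(std::size_t n) {
        _flushing -= std::min(n, _flushing);
    }

    std::size_t total() const { return _active + _pending + _flushing; }

private:
    std::size_t _flush_threshold;
    std::size_t _active = 0;
    std::size_t _pending = 0;
    std::size_t _flushing = 0;
};
```

With a threshold of 2, a fifth `try_open()` is refused while four sessions are in flight, and it succeeds again only after `flush_done()` reports that the backend has finished writing some of them.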

24 Commits
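
As a rough illustration of the probabilistic tracing idea in commit 905190ac06, the sketch below decides whether to trace a request by comparing one draw from a `std::subtract_with_carry_engine` against a precomputed integer threshold. The class `trace_sampler`, its methods and the fixed default seed are assumptions made for this example; the patch itself may be structured differently.

```cpp
#include <cmath>
#include <cstdint>
#include <random>
#include <stdexcept>

// Hypothetical sampler: should_trace() returns true with the configured
// probability, using one integer draw and one comparison per request.
class trace_sampler {
public:
    explicit trace_sampler(double probability, std::uint64_t seed = 1)
        : _gen(seed) {
        set_probability(probability);
    }

    void set_probability(double p) {
        // Written this way so that NaN is rejected as well.
        if (!(p >= 0.0 && p <= 1.0)) {
            throw std::invalid_argument("probability must be in [0, 1]");
        }
        // The engine yields values in [0, 2^48 - 1], so scaling p by 2^48
        // gives a threshold of 0 for p == 0 (never trace) and 2^48 for
        // p == 1 (always trace).
        _threshold = static_cast<std::uint64_t>(std::ldexp(p, 48));
    }

    bool should_trace() { return _gen() < _threshold; }

private:
    // std::ranlux48_base is the standard alias for
    // std::subtract_with_carry_engine<std::uint_fast64_t, 48, 5, 12>.
    std::ranlux48_base _gen;
    std::uint64_t _threshold = 0;
};
```

Converting the probability to an integer threshold once keeps floating-point arithmetic off the per-request path, which fits the commit's stated goal of fast random integer generation.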