scylladb

Author	SHA1	Message	Date
Vlad Zolotarov	7e180c7bd3	tracing: introduce the tracing::global_trace_state_ptr class This object, similarly to a global_schema_ptr, allows to dynamically create the trace_state_ptr objects on different shards in a context of the original tracing session. This object would create a secondary tracing session object from the original trace_state_ptr object when a trace_state_ptr object is needed on a "remote" shard, similarly to what we do when we need it on a remote Node. Fixes #1678 Fixes #1647 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1474387767-21910-1-git-send-email-vladz@cloudius-systems.com>	2016-10-02 11:31:37 +03:00
Vlad Zolotarov	a491ac0f18	tracing: introduce a log_slow_query logic The main idea is to log queries that take "too long" to complete. The "too long" is above the given threshold. To achieve the above this patch does the following: - Introduce two new properties to the tracing::trace_state: - "Full tracing": when the tracing of this query was explicitly requested. In this state we will record all possible traces related to this query: both on the coordinator and on any replica involved. - "Log slow query": when slow query logging is enabled. If slow query logging is enabled and a session's "duration" is above the specified threshold we will create a record in the "slow queries log" and write all trace records created on the coordinator and on a replica if a replica's session lasts longer than that threshold. (We will propagate the Coordinator's slow query logging threshold to replicas in the context of a specific tracing/logging session). The properties above are independent, namely they may be enabled and/or disabled independently and any combination of them is legal (naturally, creating a tracing session when both states above are disabled makes no sense). - Instrument the tracing::tracing service to allow the following: - Enable/disable slow query logging. - Set/get the slow query duration threshold (in microseconds). - Set/get the slow query log record TTL value (in seconds). - Instrument the trace_keyspace_helper to write a slow query log entry when requested. - The slow query logging is disabled by default and the threshold is set to half a second. - The TTL of a slow log record is set to 86400 seconds by default. - It makes sense to use the same "slow query logging threshold" and a "slow query record TTL" both on a coordinator and on a replica Nodes in a context of the same tracing session: - Pass both TTL and a threshold to the replica in a trace_info. 
This patch also implements the new slow query logging specific logic: - Don't write the pending tracing records before the end of a tracing session until "duration" reaches the logging threshold. - Don't build the parameters<sstring, sstring> map unless we know we will write it to I/O. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-28 18:28:44 +03:00
Vlad Zolotarov	8609900621	tracing: introduce trace_state capabilities bit field - Instead of keeping separate booleans introduce a trace_state_props_set enum_set and pass it around instead of separate booleans. - Change the trace_info to hold this value in addition to write_on_close. Initialize a corresponding bit in an enum_set based on a write_on_close value in a trace_info constructor for a backward compatibility. - Separate a trace_state constructor into two: - For a primary session object. - For a secondary session object. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-23 18:34:36 +03:00
Vlad Zolotarov	c8cf2ef82c	tracing::trace_state: introduce is_in_state() and set_state() accessors Use these new methods to manipulate trace_state::_state value. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-23 17:58:42 +03:00
Vlad Zolotarov	39b23cd084	tracing::trace_state: rename: get_write_on_close() -> write_on_close() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-23 17:58:42 +03:00
Vlad Zolotarov	09624f704f	tracing::trace_state: rename: get_type() -> type() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-23 17:58:42 +03:00
Vlad Zolotarov	b40a819d1e	tracing::trace_state: rename: get_session_id() -> session_id() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-23 17:58:42 +03:00
Vlad Zolotarov	372da7e71b	tracing: add support for setting a username and a table name parameters - "username" is a name used in the authentication process. - "table name" is a <keyspace>.<cf name> string representing a name of a table used for a query in question. Note that there may be more than one table name in a batch query. Therefore we store an unordered set of tables names. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-23 17:58:42 +03:00
Vlad Zolotarov	37da6f53f8	tracing: fix a session "duration" semantics A session's "duration" should be a time it took to handle a request, which is a time till response to a user. In other words - till a consistency level is reached. Before this patch it was a time that takes a complete handling of a request, which is the time it takes to handle all replicas and not only those required to reach a CL. This patch fixes this situation by extending the trace_state's state values to 3 states: inactive, foreground and background. A primary session may be in 3 states: - "inactive": between the creation and a begin() call. - "foreground": after a begin() call and before a stop_foreground_and_write() call. - "background": after a stop_foreground_and_write() call and till the state object is destroyed. - Traces are not allowed while state is in an "inactive" state. - The time the primary session was in a "foreground" state is the time reported as a session's "duration". - Traces that have arrived during the "background" state will be recorded as usual but their "elapsed" time will be greater or equal to the session's "duration". Secondary sessions may only be in an "inactive" or in a "foreground" states. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-16 12:32:34 +03:00
Vlad Zolotarov	f83e33fc13	tracing: make "elapsed" be std::chrono::duration - Define a tracing::elapsed_clock type (std::chrono::steady_clock). Use it instead of trace_state::clock_type. - Store the "elapsed" information in a form of elapsed_clock::duration. - Make all keyspace_backend specific conversions inside the trace_keyspace_helper class, where they belong. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-16 12:32:34 +03:00
Vlad Zolotarov	ebf13da9c9	tracing::session_record: make start_at to be a time_point Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-16 12:32:34 +03:00
Vlad Zolotarov	67d537ecb5	tracing: issue a write event if a single session creates a lot of events Currently write events are issued every time a trace session is closed. However if a single session creates a lot of events we will start dropping them after the total amount of pending records bypasses the limit. This patch will issue a write event before the session end in that case. Since now new events may be added to the active tracing session while it's scheduled for write we have to ensure the following: - Not to add the already pending for write session to the pending bulk. - Grab all pending data in a specific session in a synchronous way during the write event. - Serialize creation of events mutations - otherwise the "monotonic nanos" logic won't work. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-09 19:00:43 +03:00
Vlad Zolotarov	5391bcc5a9	tracing: improve a back pressure policy Use a per-shard tracing records budget instead of maintaining a fixed-size per-session records budget and a per-shard sessions budget. The original policy could lead to some irrational situations, when we have a single tracing session that creates a substantial amount of records that we can handle but we would start dropping new records after it surpasses the per-session limit. The new policy handles a per-shard trace records budget that is being consumed by each trace() call and by a primary session destructor when a session record is created. Each active record may only be in one of the following states: - cached: stored in its session's object. When record is in this state it's not going to be written to I/O during the next write event. - pending for write: when record is in this state it's going to be written to I/O during the next write event. - flushing: the record is being currently written to the I/O. There are counters of the total amount of records in each state above. Each record may only be in a specific state at every point of time and thereby it must be accounted only in one and only one of the three counters. The sum of all three counters should not be greater than (max_pending_trace_records + write_event_records_threshold) at any time (actually it can get as high as a value above plus (max_pending_sessions) if all sessions are primary but we won't take this into an account for simplicity). The same is about the number of outstanding sessions: it may not be greater than (max_pending_sessions + write_event_sessions_threshold) at any time. If total number of tracing records is greater or equal to the limit above, the new trace point is going to be dropped. If current number of records plus the expected number of trace records per session (exp_trace_events_per_session) is greater than the limit above new sessions will be dropped. 
A new session will also be dropped if there are too many active sessions. When the record or a session is dropped the appropriate statistics counters are updated and there is a rate-limited warning message printed to the log. Every time a number of records pending for write is greater or equal to (write_event_records_threshold) or a number of sessions pending for write is greater or equal to (write_event_sessions_threshold) a write event is issued. Every 2 seconds a timer would write all pending for write records available so far. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-09 19:00:43 +03:00
Vlad Zolotarov	63a0502ed1	tracing: rework the interface between the tracing/trace_state and the backend Before this patch the interaction between the layers above was as follows: - trace_state was passing the trace event data to a backend object every time trace() method was called. - trace_state was passing the session data to a backend object in a destructor. - A backend object was storing this data in a form of lambda where all data above was caught in a capture list. This was primarily done in order to delay the call for make_xxx_mutation(). Lambdas were stored in a map by a session ID and they were executed when a kick() method was called. - A tracing::tracing object was periodically calling a kick() method of a backend that was initiating a write of all pending data to the storage. All backend methods used in the described above interactions were virtual. Thereby, for instance, for each and every trace record we were calling a virtual method that was receiving a significant amount of parameters, store a lambda in a map and return. This is clearly a suboptimal way of using virtual functions since we prevent a compiler from inlining an obviously inlinable operations. This patch changes the interaction scheme to be as follows: - Trace events and session data are stored and passed around in a form of structs that hold all relevant information (no more lambdas). - As long as a trace session is active its data is aggregated inside the corresponding trace_state object. - The object containing all records is passed and stored as a lw_shared_ptr to save extra copies and to shorten capture lists. - All aggregated data is passed to a tracing::tracing object in a trace_state destructor. The data is stored in a std::deque in a tracing::tracing object (instead of a map by a session ID). - A single backend's virtual method call writes all data aggregated so far (kick() method is not needed any more), every time a write event occurs. 
- Backend has only one virtual method now: - Write a bulk of sessions' data aggregated so far. - Backend's virtual method receives a records bulk object by reference. As a result: - A latency of a single trace event that has no formatting improved from 0.2us to 0.1us. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-09 15:25:52 +03:00
Vlad Zolotarov	d7d72c4cd4	tracing: "inline" cleanup - Don't use inline for templates. - Put "inline" qualifier for out-of-class defined methods where they are defined and not where they are declared. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-20 19:49:30 +03:00
Vlad Zolotarov	5376b053f9	tracing: use seastar::format() for formatted trace() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-20 19:49:30 +03:00
Vlad Zolotarov	a197323b47	tracing::trace_state.hh: Add descriptions for main methods and functions Add a proper description to a tracing::trace() that clarifies that the tracing message string and the positional parameters are going to be copied if tracing state is initialized. Add a description for trace_state::begin() methods and for a tracing::begin() helper function. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:59 +03:00
Vlad Zolotarov	89a49c346c	tracing::trace_state: add begin() overload for seastar::value_of given as a "request" parameter. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	0689843e79	tracing::trace_state: add method to set the session's "params" map entries Sometimes we want to be able to set "params" map after we started a tracing session, e.g. when the parameters values, like a consistency level parsed from the "options" part of a binary frame, are available only after some heavy part of a flow we would like to trace. This patch includes the following changes: - No longer pass a map to the begin(). - Limit the parameters to the known set. - Define a method to set each such parameter and save its value till the final sstring->sstring map is created. - Construct the final sstring->sstring map in the destructor of the trace_state object in order to defer all the formatting to be after the traced flow. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	2bb054748e	tracing: record events' time stamps - Extend the i_tracing_backend_helper interface to accept the event record timestamp. - Grab the current timestamp when the event record is taken. - Add the instrumentation to the trace_keyspace_helper to create a unique time-UUID from a given std::chrono::duration object. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	06d4221382	tracing: add tracing::make_trace_info() helper This helper returns an std::experimental::optional<trace_info> which is initialized or not initialized depending on whether a given trace_state_ptr is initialized or not. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	7a5fc9fcdc	tracing::trace_state: add const qualifiers to a trace_state_ptr parameter Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	da4836becc	tracing::trace_state: add support for a formatted message in trace() Add support for passing a format string plus positional parameters for creation of a trace point message. Format string should be given in a fmt library native format described here: http://fmtlib.net/latest/syntax.html#syntax . Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	6e38133f82	tracing: prevent a destruction of a tracing::tracing while it's used Prevent the destruction of tracing::tracing instances while there are still tracing::trace_state objects that are using it: - Make tracing::tracing inherit from seastar::async_sharded_service<tracing::tracing>. - Grab a tracing::tracing.shared_from_this() in each tracing::trace_state object using it. - Use a saved pointer to the local tracing::tracing instance in a destructor instead of accessing it via tracing::get_local_tracing_instance() to avoid "local is not initialized" assert when sessions are being destroyed after the service was stopped. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	a5022a09a4	tracing: use 'write' instead of 'flush' and 'store' for consistency with seastar's API In names of functions and variables: s/flush_/write_/ s/store_/write_/ In a i_tracing_backend_helper: s/flush()/kick()/ Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	139fa9d1bd	tracing: minor cleanups - Make small functions on a fast path "inline". - Add "const" qualifier where needed. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-06 13:00:31 +03:00
Vlad Zolotarov	c965528a03	tracing: add a trace_state and tracing classes trace_state: Is a single tracing session. tracing: A sharded service that contains an i_trace_backend_helper instance and is a "factory" of trace_state objects. trace_state main interface functions are: - begin(): Start time counting (should be used via tracing::begin() wrapper). - trace(): Create a tracing event - it's coupled with a time passed since begin() (should be used via tracing::trace() wrapper). - ~trace_state(): Destructor will close the tracing session. "tracing" service main interface function is: - start(): Initialize a backend. - stop(): Shut down a backend. - create_session(): Creates a new tracing session. (tracing::end_session(): Is called by a trace_state destructor). When trace_state needs to store a tracing event it uses a backend helper from a "tracing" service. A "tracing" service limits a number of opened tracing session by a static number. If this number is reached - next sessions will be dropped. trace_state implements a similar strategy in regard to tracing events per single session. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:42 +03:00
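
The three session states described in commit 37da6f53f8 (inactive, foreground, background) can be illustrated with a small sketch. This is not the ScyllaDB implementation: `session_sketch`, `session_state` and `stop_foreground()` are hypothetical names, and time points are passed in explicitly (an assumption made so that the behaviour is deterministic). Only the `elapsed_clock` alias for `std::chrono::steady_clock` is taken from the log (commit f83e33fc13).

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>

// Alias named in commit f83e33fc13; everything else here is hypothetical.
using elapsed_clock = std::chrono::steady_clock;

enum class session_state { inactive, foreground, background };

class session_sketch {
    session_state _state = session_state::inactive;
    elapsed_clock::time_point _start{};
    elapsed_clock::duration _duration{};  // time spent in "foreground"
public:
    session_state state() const { return _state; }
    elapsed_clock::duration duration() const { return _duration; }

    // inactive -> foreground: start counting time.
    void begin(elapsed_clock::time_point now) {
        if (_state != session_state::inactive) {
            throw std::logic_error("begin() may only be called once");
        }
        _start = now;
        _state = session_state::foreground;
    }

    // foreground -> background: the response was sent, so freeze "duration".
    void stop_foreground(elapsed_clock::time_point now) {
        if (_state != session_state::foreground) {
            throw std::logic_error("session is not in the foreground state");
        }
        assert(now >= _start);  // a steady clock never goes backwards
        _duration = now - _start;
        _state = session_state::background;
    }

    // Returns the "elapsed" value of a trace point.
    // Traces are rejected while the session is still inactive.
    elapsed_clock::duration trace(elapsed_clock::time_point now) const {
        if (_state == session_state::inactive) {
            throw std::logic_error("tracing is not allowed before begin()");
        }
        return now - _start;
    }
};
```

In this sketch a trace point taken after `stop_foreground()` still gets an "elapsed" value, and that value is greater than or equal to the frozen "duration", which matches the behaviour the commit message describes for background traces.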

27 Commits