scylladb

Author	SHA1	Message	Date
Kefu Chai	db77587309	tracing: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16925	2024-01-23 08:57:11 +02:00
Pavel Emelyanov	16e1315eef	tracing: Remove init_session_records() It now does nothing but wraps make_lw_shared<one_session_records>() call. Callers can do it on their own thus facilitating further list-initialization patching Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-12 16:11:18 +03:00
Pavel Emelyanov	dd87adadf3	tracing: List-initialize one_session_records::ttl For that to happen the value evaluation is moved from the init_session_records() into a private trace_state helper as it checks the props values initialized earlier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-12 16:09:05 +03:00
Pavel Emelyanov	b63084237c	tracing: List-initialize one_session_records This touches session_id, parent_id and my_span_id fields Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-12 16:07:24 +03:00
Pavel Emelyanov	944b98f261	tracing: List-initialize session_record This object is constructed via one_session_records thus the latter needs to pass some arguments along Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-12 16:04:01 +03:00
Benny Halevy	959a740dac	utils: to_string: get rid of utils::join Use `fmt::format("{}", fmt::join(...))` instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-05-02 10:59:58 +03:00
Benny Halevy	25ebc63b82	move to_string.hh to utils/ Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-02-15 11:09:04 +02:00
Avi Kivity	0b418fa7cf	cql3, transport, tests: remove "unset" from value type system The CQL binary protocol introduced "unset" values in version 4 of the protocol. Unset values can be bound to variables, which cause certain CQL fragments to be skipped. For example, the fragment `SET a = :var` will not change the value of `a` if `:var` is bound to an unset value. Unsets, however, are very limited in where they can appear. They can only appear at the top-level of an expression, and any computation done with them is invalid. For example, `SET list_column = [3, :var]` is invalid if `:var` is bound to unset. This causes the code to be littered with checks for unset, and there are plenty of tests dedicated to catching unsets. However, a simpler way is possible - prevent the infiltration of unsets at the point of entry (when evaluating a bind variable expression), and introduce guards to check for the few cases where unsets are allowed. This is what this long patch does. It performs the following: (general) 1. unset is removed from the possible values of cql3::raw_value and cql3::raw_value_view. (external->cql3) 2. query_options is fortified with a vector of booleans, unset_bind_variable_vector, where each boolean corresponds to a bind variable index and is true when it is unset. 3. To avoid churn, two compatibility structs are introduced: cql3::raw_value{,_view}_vector_with_unset, which can be constructed from a std::vector<raw_value{,_view/}>, which is what most callers have. They can also be constructed with explicit unset vectors, for the few cases they are needed. (cql3->variables) 4. query_options::get_value_at() now throws if the requested bind variable is unset. This replaces all the throwing checks in expression evaluation and statement execution, which are removed. 5. A new query_options::is_unset() is added for the users that can tolerate unset; though it is not used directly. 6. A new cql3::unset_operation_guard class guards against unsets. 
It accepts an expression, and can be queried whether an unset is present. Two conditions are checked: the expression must be a singleton bind variable, and at runtime it must be bound to an unset value. 7. The modification_statement operations are split into two, via two new subclasses of cql3::operation. cql3::operation_no_unset_support ignores unsets completely. cql3::operation_skip_if_unset checks if an operand is unset (luckily all operations have at most one operand that tolerates unset) and applies unset_operation_guard to it. 8. The various sites that accept expressions or operations are modified to check for should_skip_operation(). These are the loops around operations in update_statement and delete_statement, and the checks for unset in attributes (LIMIT and PER PARTITION LIMIT) (tests) 9. Many unset tests are removed. It's now impossible to enter an unset value into the expression evaluation machinery (there's just no unset value), so it's impossible to test for it. 10. Other unset tests now have to be invoked via bind variables, since there's no way to create an unset cql3::expr::constant. 11. Many tests have their exception message match strings relaxed. Since unsets are now checked very early, we don't know the context where they happen. It would be possible to reintroduce it (by adding a format string parameter to cql3::unset_operation_guard), but it seems not to be worth the effort. Usage of unsets is rare, and it is explicit (at least with the Python driver, an unset cannot be introduced by omission). I tried as an alternative to wrap cql3::raw_value{,_view} (that doesn't recognize unsets) with cql3::maybe_unset_value (that does), but that caused huge amounts of churn, so I abandoned that in favor of the current approach. Closes #12517	2023-01-16 21:10:56 +02:00
Avi Kivity	5937b1fa23	treewide: remove empty comments in top-of-files After `fcb8d040` ("treewide: use Software Package Data Exchange (SPDX) license identifiers"), many dual-licensed files were left with empty comments on top. Remove them to avoid visual noise. Closes #10562	2022-05-13 07:11:58 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Piotr Sarna	ec52e05eab	tracing: unify prepared statement info into a single struct The tracing code assumes that query_option_names and query_option_values vectors always have the same length as the prepared_statements vector, but it's not true. E.g. if one of the statements in a batch is incorrect, it will create a discrepancy between the number of prepared statements and the number of bound names and values, which currently leads to a segmentation fault. To overcome the problem, all three vectors are integrated into a single vector, which makes size mismatches impossible. Tested manually with code that triggers a failure while executing a batch statement, because the Python driver performs driver-side validation and thus it's hard to create a test case which triggers the problem. closes: #9221	2021-10-01 10:57:38 +03:00
Avi Kivity	4d70f3baee	storage_proxy: change unordered_set<inet_address> to small_vector in write path The write paths in storage_proxy pass replica sets as std::unordered_set<gms::inet_address>. This is a complex type, with N+1 allocations for N members, so we change it to a small_vector (via inet_address_vector_replica_set) which requires just one allocation, and even zero when up to three replicas are used. This change is more nuanced than the corresponding change to the read path `abe3d7d7` ("Merge 'storage_proxy: use small_vector for vectors of inet_address' from Avi Kivity"), for two reasons: - there is a quadratic algorithm in abstract_write_response_handler::response(): it searches for a replica and erases it. Since this happens for every replica, it happens N^2/2 times. - replica sets for writes always include all datacenters, while reads usually involve just one datacenter. So, a write to a keyspace that has 5 datacenters will invoke 15*(15-1)/2 = 105 compares. We could remove this by sending the index of the replica in the replica set to the replica and ask it to include the index in the response, but I think that this is unnecessary. Those 105 compares need to be only 105/15 = 7 times cheaper than the corresponding unordered_set operation, which they surely will. Handling a response after a cross-datacenter round trip surely involves L3 cache misses, and a small_vector reduces these to a minimum compared to an unordered_set with its bucket table, linked list walking and management, and table rehashing. Tests using perf_simple_query --write --smp 1 --operations-per-shard 1000000 --task-quota-ms show two allocations removed (as expected) and a nice reduction in instructions executed. before: median 204842.54 tps ( 54.2 allocs/op, 13.2 tasks/op, 49890 insns/op) after: median 206077.65 tps ( 52.2 allocs/op, 13.2 tasks/op, 49138 insns/op) Closes #8847	2021-06-17 13:46:40 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Michał Chojnowski	b9322a6b71	cql3: switch users of cql3::raw_value_view to internals-independent API We want to change the internals of cql3::raw_value{_view}. However, users of cql3::raw_value and cql3::raw_value_view often use them by extracting the internal representation, which will be different after the planned change. This commit prepares us for the change by making all accesses to the value inside cql3::raw_value(_view) be done through helper methods which don't expose the internal representation publicly. After this commit we are free to change the internal representation of raw_value_{view} without messing up their users.	2021-04-01 10:42:04 +02:00
Piotr Sarna	5386739354	tracing: allow providing a custom session record param The mechanism of session record params is currently only used to store query strings and a couple more params like consistency level, but since we now have more frontends than just CQL and Thrift, it would be nice to also allow the users to put custom parameters in there. An immediate first user of this mechanism would be alternator, which is going to put the operation type under the "alternator_op" key. The operation type is not part of the query string due to how DynamoDB's protocol works - the op type is stored separately in the HTTP header. While it's possible to extract the operation type from the session_id, it might not be the case once #2572 is implemented.	2021-03-17 11:14:28 +01:00
Konstantin Osipov	b4f875f08e	uuid: reduce code dependency on UUID_gen.hh Do not include UUID_gen.hh in trace_state.hh and lists.hh to reduce header level dependency on it. Message-Id: <20210127173114.725761-2-kostja@scylladb.com>	2021-01-27 20:08:29 +02:00
Gleb Natapov	4893bc9139	tracing: split adding prepared query parameters from stopping of a trace Currently a query_options object is passed to a trace stopping function, which makes it mandatory to keep it alive until the end of the query. The reason for that is to add prepared statement parameters to the trace. All other query options that we want to put in the trace are copied into trace_state::params_values, so let's copy prepared statement parameters there too. The trace-enabled case will become a little bit more expensive, but on the other hand we can drop a continuation that holds the query_options object alive from a fast path. It is safe to drop the call to stop_foreground_prepared() here since the tracing will be stopped in process_request_one(). Message-Id: <20191205102026.GJ9084@scylladb.com>	2019-12-05 17:00:47 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Duarte Nunes	5de02ab98c	tracing: Pass string_view instead of string to add_query This resulted in superfluous copies. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180812085326.6260-1-duarte@scylladb.com>	2018-08-13 23:57:37 +01:00
Vlad Zolotarov	6db90a2e63	tracing: store a query response size Add a new "response_size" column to system_traces.sessions and store a size of an uncompressed response for a traced query. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-08-03 12:29:36 -04:00
Vlad Zolotarov	05020921bb	tracing: store request size Add a new column "request_size" to system_traces.sessions and store the uncompressed request frame data size. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-08-03 12:29:36 -04:00
Paweł Dziepak	0b9eed72f4	cql3: value_view: switch to fragmented_temporary_buffer::view	2018-07-18 12:28:06 +01:00
Vlad Zolotarov	818b5b75ba	tracing: store the prepared statements parameters values Store the prepared statement positional parameters values in the corresponding system_traces.sessions entry in the 'parameters' column (which has a map<text,text> type). Parameters are stored as a pair of "param[X]" : "value", where X is the index of the parameter starting from 0 and the "value" is the first 64 characters of the parameter's value string representation. If parameters were given with their names attached (see the description on bit 0x40 of QUERY flags in the CQL binary protocol specification) then parameters are going to be stored in the "param[X](<bound variable name>)" : "value" form. If the value's string representation is longer than 64 characters then the "value" will contain only first 64 characters of it and will have the "..." at the end. For a BATCH of prepared statements the parameter "name" will have a form of param[Y][X] where Y is the index of the corresponding prepared statement in the BATCH and X is the index of the parameter. Both X and Y start from 0. Note: Had to switch to boost::range::find() in sstables::big_sstable_set in order to address the "ambiguous overload" compilation error. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-06-12 10:57:05 -04:00
Vlad Zolotarov	a1da285f9e	tracing: store queries statements for BATCH Similarly to the regular QUERY or EXECUTE we want to see the actual query statements that were part of the BATCH. If a traced query has only a single statement to execute then its statement will be stored in a form 'query':'<statement>'. If there are two or more queries (BATCH) then statements of each query in the BATCH will be stored in a form 'query[X]':'<statement>', where X is the index of the query in the BATCH starting from 0. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-06-12 10:57:05 -04:00
Vlad Zolotarov	c0e51c4521	tracing::trace_state: hide the internals of params_values Hide it inside the trace_state.cc in order to avoid future circular dependencies with other .hh files. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-06-12 10:57:05 -04:00
Vlad Zolotarov	fcff872089	tracing: make the session state modifying methods and tracing::trace(...) noexcept Make state session creation, stop_foreground() and tracing::trace(...) methods noexcept. Most of them have already been implemented in a way that they won't throw but this patch makes it official... Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:05:48 -05:00
Vlad Zolotarov	a491ac0f18	tracing: introduce a log_slow_query logic The main idea is to log queries that take "too long" to complete. The "too long" is above the given threshold. To achieve the above this patch does the following: - Introduce two new properties to the tracing::trace_state: - "Full tracing": when the tracing of this query was explicitly requested. In this state we will record all possible traces related to this query: both on the coordinator and on any replica involved. - "Log slow query": when slow query logging is enabled. If slow query logging is enabled and a session's "duration" is above the specified threshold we will create a record in the "slow queries log" and write all trace records created on the coordinator and on a replica if a replica's session lasts longer than that threshold. (We will propagate the Coordinator's slow query logging threshold to replicas in the context of a specific tracing/logging session). The properties above are independent, namely they may be enabled and/or disabled independently and any combination of them is legal (naturally, creating a tracing session when both states above are disabled makes no sense). - Instrument the tracing::tracing service to allow the following: - Enable/disable slow query logging. - Set/get the slow query duration threshold (in microseconds). - Set/get the slow query log record TTL value (in seconds). - Instrument the trace_keyspace_helper to write a slow query log entry when requested. - The slow query logging is disabled by default and the threshold is set to half a second. - The TTL of a slow log record is set to 86400 seconds by default. - It makes sense to use the same "slow query logging threshold" and a "slow query record TTL" both on a coordinator and on a replica Nodes in a context of the same tracing session: - Pass both TTL and a threshold to the replica in a trace_info. 
This patch also implements the new slow query logging specific logic: - Don't write the pending tracing records before the end of a tracing session until "duration" reaches the logging threshold. - Don't build the parameters<sstring, sstring> map unless we know we will write it to I/O. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-28 18:28:44 +03:00
Vlad Zolotarov	8609900621	tracing: introduce trace_state capabilities bit field - Instead of keeping separate booleans introduce a trace_state_props_set enum_set and pass it around instead of separate booleans. - Change the trace_info to hold this value in addition to write_on_close. Initialize a corresponding bit in an enum_set based on a write_on_close value in a trace_info constructor for a backward compatibility. - Separate a trace_state constructor into two: - For a primary session object. - For a secondary session object. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-23 18:34:36 +03:00
Vlad Zolotarov	c8cf2ef82c	tracing::trace_state: introduce is_in_state() and set_state() accessors Use these new methods to manipulate trace_state::_state value. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-23 17:58:42 +03:00
Vlad Zolotarov	b40a819d1e	tracing::trace_state: rename: get_session_id() -> session_id() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-23 17:58:42 +03:00
Vlad Zolotarov	92921fe110	tracing::trace_state: push the UUID to the end of an error message in a destructor Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1471780783-25406-1-git-send-email-vladz@cloudius-systems.com>	2016-08-21 16:50:52 +03:00
Vlad Zolotarov	0683d4bd29	tracing::trace_state: don't throw in a destructor The condition in question is a sanity check for a SW bug. This SW bug (if it occurs) is not critical - there is an additional protection against it in the stop_foreground_and_write(). Having said all that, since we shall not throw from a destructor, replace throwing of a std::logic_error with a logger error message. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1471773320-7398-1-git-send-email-vladz@cloudius-systems.com>	2016-08-21 13:50:52 +03:00
Vlad Zolotarov	37da6f53f8	tracing: fix a session "duration" semantics A session's "duration" should be the time it took to handle a request, which is the time till a response is sent to a user. In other words - till a consistency level is reached. Before this patch it was the time taken to handle a request completely, which is the time it takes to handle all replicas and not only those required to reach a CL. This patch fixes this situation by extending the trace_state's state values to 3 states: inactive, foreground and background. A primary session may be in 3 states: - "inactive": between the creation and a begin() call. - "foreground": after a begin() call and before a stop_foreground_and_write() call. - "background": after a stop_foreground_and_write() call and till the state object is destroyed. - Traces are not allowed while state is in an "inactive" state. - The time the primary session was in a "foreground" state is the time reported as a session's "duration". - Traces that have arrived during the "background" state will be recorded as usual but their "elapsed" time will be greater or equal to the session's "duration". Secondary sessions may only be in an "inactive" or in a "foreground" state. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-16 12:32:34 +03:00
Vlad Zolotarov	5391bcc5a9	tracing: improve a back pressure policy Use a per-shard tracing records budget instead of maintaining a fixed-size per-session records budget and a per-shard sessions budget. The original policy could lead to some irrational situations, when we have a single tracing session that creates a substantial amount of records that we can handle but we would start dropping new records after it surpasses the per-session limit. The new policy handles a per-shard trace records budget that is being consumed by each trace() call and by a primary session destructor when a session record is created. Each active record may only be in one of the following states: - cached: stored in its session's object. When record is in this state it's not going to be written to I/O during the next write event. - pending for write: when record is in this state it's going to be written to I/O during the next write event. - flushing: the record is being currently written to the I/O. There are counters of the total amount of records in each state above. Each record may only be in a specific state at every point of time and thereby it must be accounted only in one and only one of the three counters. The sum of all three counters should not be greater than (max_pending_trace_records + write_event_records_threshold) at any time (actually it can get as high as a value above plus (max_pending_sessions) if all sessions are primary but we won't take this into an account for simplicity). The same is about the number of outstanding sessions: it may not be greater than (max_pending_sessions + write_event_sessions_threshold) at any time. If total number of tracing records is greater or equal to the limit above, the new trace point is going to be dropped. If the current number of records plus the expected number of trace records per session (exp_trace_events_per_session) is greater than the limit above new sessions will be dropped. 
A new session will also be dropped if there are too many active sessions. When the record or a session is dropped the appropriate statistics counters are updated and there is a rate-limited warning message printed to the log. Every time a number of records pending for write is greater or equal to (write_event_records_threshold) or a number of sessions pending for write is greater or equal to (write_event_sessions_threshold) a write event is issued. Every 2 seconds a timer would write all pending for write records available so far. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-09 19:00:43 +03:00
Vlad Zolotarov	63a0502ed1	tracing: rework the interface between the tracing/trace_state and the backend Before this patch the interaction between the layers above was as follows: - trace_state was passing the trace event data to a backend object every time trace() method was called. - trace_state was passing the session data to a backend object in a destructor. - A backend object was storing this data in a form of lambda where all data above was caught in a capture list. This was primarily done in order to delay the call for make_xxx_mutation(). Lambdas were stored in a map by a session ID and they were executed when a kick() method was called. - A tracing::tracing object was periodically calling a kick() method of a backend that was initiating a write of all pending data to the storage. All backend methods used in the described above interactions were virtual. Thereby, for instance, for each and every trace record we were calling a virtual method that was receiving a significant amount of parameters, storing a lambda in a map and returning. This is clearly a suboptimal way of using virtual functions since we prevent a compiler from inlining obviously inlinable operations. This patch changes the interaction scheme to be as follows: - Trace events and session data are stored and passed around in a form of structs that hold all relevant information (no more lambdas). - As long as a trace session is active its data is aggregated inside the corresponding trace_state object. - The object containing all records is passed and stored as a lw_shared_ptr to save extra copies and to shorten capture lists. - All aggregated data is passed to a tracing::tracing object in a trace_state destructor. The data is stored in a std::deque in a tracing::tracing object (instead of a map by a session ID). - A single backend's virtual method call writes all data aggregated so far (kick() method is not needed any more), every time a write event occurs. 
- Backend has only one virtual method now: - Write a bulk of sessions' data aggregated so far. - Backend's virtual method receives a records bulk object by reference. As a result: - A latency of a single trace event that has no formatting improved from 0.2us to 0.1us. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-09 15:25:52 +03:00
Vlad Zolotarov	e1b2926a8d	tracing: add a missing try-catch in params building Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-09 15:21:41 +03:00
Vlad Zolotarov	0689843e79	tracing::trace_state: add method to set the session's "params" map entries Sometimes we want to be able to set "params" map after we started a tracing session, e.g. when the parameters values, like a consistency level parsed from the "options" part of a binary frame, are available only after some heavy part of a flow we would like to trace. This patch includes the following changes: - No longer pass a map to the begin(). - Limit the parameters to the known set. - Define a method to set each such parameter and save its value till the final sstring->sstring map is created. - Construct the final sstring->sstring map in the destructor of the trace_state object in order to defer all the formatting to be after the traced flow. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	b0673aabd5	tracing: fix a logger name Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	da4836becc	tracing::trace_state: add support for a formatted message in trace() Add support for passing a format string plus positional parameters for creation of a trace point message. Format string should be given in a fmt library native format described here: http://fmtlib.net/latest/syntax.html#syntax . Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	6e38133f82	tracing: prevent a destruction of a tracing::tracing while it's used Prevent the destruction of tracing::tracing instances while there are still tracing::trace_state objects that are using it: - Make tracing::tracing inherit from seastar::async_sharded_service<tracing::tracing>. - Grab a tracing::tracing.shared_from_this() in each tracing::trace_state object using it. - Use a saved pointer to the local tracing::tracing instance in a destructor instead of accessing it via tracing::get_local_tracing_instance() to avoid "local is not initialized" assert when sessions are being destroyed after the service was stopped. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	a5022a09a4	tracing: use 'write' instead of 'flush' and 'store' for consistency with seastar's API In names of functions and variables: s/flush_/write_/ s/store_/write_/ In a i_tracing_backend_helper: s/flush()/kick()/ Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Vlad Zolotarov	4b008ac5ea	tracing: rework maximum sessions amount back pressure strategy A tracing session life cycle includes 3 stages: 1) Active: when new trace records are being added to this session. 2) Pending for flushing to a storage: when session is over but not yet flushed to the storage ("backend"). 3) Flushing: when session's records are being flushed to the storage and this process is not yet completed. Sessions may accumulate in each of the stages above and we should limit the maximum amount of sessions being accumulated in each of them in order to avoid OOM situation. Current in-tree implementation only limits the number of tracing sessions accumulated in the first ("Active") stage. Since currently every closing session is being immediately flushed (as long as "settraceprobability" is not implemented) the second stage never accumulates tracing sessions. The third stage is currently not controlled at all and if, for instance, we manage to push enough tracing sessions towards a slow storage backend, they may accumulate there consuming an uncontrolled amount of memory and may eventually consume all of it. This patch fixes this unpleasant situation by applying the following strategy: - Limit the total amount of accumulated tracing sessions in all stages above together by a static value - 2 times "flush threshold". "2 times" is needed to allow new tracing sessions to accumulate in the stage 2 while sessions in the stage 3 are still being processed. - Forcefully flush sessions in the stage 2 to the storage when their count reaches a "flush threshold". This would ensure that there will be no more than (2 * "flush threshold") sessions in total (in any stage) on each shard. An advantage of this strategy is its simplicity - we only need a single threshold to control all stages. If we feel that we need a finer granularity for each stage we may add separate limits for each of them in the future. 
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-06 13:50:41 +03:00
Vlad Zolotarov	c965528a03	tracing: add a trace_state and tracing classes trace_state: Is a single tracing session. tracing: A sharded service that contains an i_trace_backend_helper instance and is a "factory" of trace_state objects. trace_state main interface functions are: - begin(): Start time counting (should be used via tracing::begin() wrapper). - trace(): Create a tracing event - it's coupled with a time passed since begin() (should be used via tracing::trace() wrapper). - ~trace_state(): Destructor will close the tracing session. "tracing" service main interface functions are: - start(): Initialize a backend. - stop(): Shut down a backend. - create_session(): Creates a new tracing session. (tracing::end_session(): Is called by a trace_state destructor). When trace_state needs to store a tracing event it uses a backend helper from a "tracing" service. A "tracing" service limits the number of open tracing sessions by a static number. If this number is reached - next sessions will be dropped. trace_state implements a similar strategy in regard to tracing events per single session. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:13:42 +03:00
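
The three-state session life cycle described in commit 37da6f53f8 (inactive, foreground, background) can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual trace_state code: the names `session_state` and `primary_session`, the explicit `now` arguments (used so the example does not depend on wall-clock time), and the choice to return an empty optional for a rejected trace are all hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical names throughout; this is not ScyllaDB's trace_state.
enum class session_state { inactive, foreground, background };

class primary_session {
public:
    using clock = std::chrono::steady_clock;
    using usec = std::chrono::microseconds;

    // inactive -> foreground: start counting time.
    void begin(clock::time_point now) {
        assert(_state == session_state::inactive);
        _start = now;
        _state = session_state::foreground;
    }

    // Returns the "elapsed" time of a trace point, or nothing when the
    // session is still inactive (traces are not allowed in that state).
    std::optional<usec> trace(clock::time_point now) const {
        if (_state == session_state::inactive) {
            return std::nullopt;
        }
        return std::chrono::duration_cast<usec>(now - _start);
    }

    // foreground -> background: the time spent in the foreground
    // becomes the session's reported "duration".
    void stop_foreground(clock::time_point now) {
        if (_state != session_state::foreground) {
            return;
        }
        _duration = std::chrono::duration_cast<usec>(now - _start);
        _state = session_state::background;
    }

    session_state state() const { return _state; }
    usec duration() const { return _duration; }

private:
    session_state _state = session_state::inactive;
    clock::time_point _start{};
    usec _duration{0};
};
```

With this shape, a trace recorded after `stop_foreground()` still gets an elapsed time, and that time is never smaller than the reported duration, which matches the semantics the commit message describes for the "background" state.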

43 Commits
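
The key and value conventions for the 'parameters' map described in commits 818b5b75ba and a1da285f9e can be sketched as follows. This is a minimal illustration, not the code from those commits: the function names `truncate_value`, `param_key` and `query_key` are hypothetical, and appending the bound variable name to a BATCH key (`param[Y][X](name)`) is an assumption, since the commit message only spells out the named form for the non-batch case.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Values longer than this are cut and suffixed with "...".
constexpr std::size_t max_value_len = 64;

// Hypothetical helper: keep the first 64 characters of a value's
// string representation, marking truncation with "...".
std::string truncate_value(const std::string& v) {
    if (v.size() <= max_value_len) {
        return v;
    }
    std::string out = v.substr(0, max_value_len) + "...";
    assert(out.size() == max_value_len + 3);
    return out;
}

// Hypothetical helper: build "param[X]", "param[X](name)" or, for a
// statement inside a BATCH, "param[Y][X]". Indices start from 0.
std::string param_key(std::optional<std::size_t> stmt_idx,
                      std::size_t param_idx,
                      const std::string& bound_name = "") {
    std::string key = "param";
    if (stmt_idx) {
        key += "[" + std::to_string(*stmt_idx) + "]";
    }
    key += "[" + std::to_string(param_idx) + "]";
    if (!bound_name.empty()) {
        key += "(" + bound_name + ")";  // assumption for the BATCH case
    }
    return key;
}

// Hypothetical helper: "query" for a single statement, "query[X]"
// when the traced request holds two or more statements (BATCH).
std::string query_key(std::size_t stmt_idx, std::size_t stmt_count) {
    if (stmt_count <= 1) {
        return "query";
    }
    return "query[" + std::to_string(stmt_idx) + "]";
}
```

For example, the second parameter of the first statement in a BATCH would be stored under the key built by `param_key(0, 1)`, and a 70-character value would be stored as its first 64 characters followed by "...".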