scylladb/tracing/trace_state.cc
Vlad Zolotarov 4b008ac5ea tracing: rework maximum sessions amount back pressure strategy
A tracing session life cycle includes 3 stages:
   1) Active: when new trace records are being added to this session.
   2) Pending for flushing to a storage: when session is over but not
      yet flushed to the storage ("backend").
   3) Flushing: when session's records are being flushed to the storage
      and this process is not yet completed.

Sessions may accumulate in each of the stages above, and we should limit
the maximum number of sessions accumulated in each of them in order to avoid
an OOM situation.

Current in-tree implementation only limits the number of tracing sessions
accumulated in the first ("Active") stage.

Since every closing session is currently flushed immediately (as long
as "settraceprobability" is not implemented), the second stage never accumulates
tracing sessions.

The third stage is currently not controlled at all. If, for instance, we
manage to push enough tracing sessions towards a slow storage backend, they may
accumulate there, consuming an unbounded amount of memory, and may eventually
consume all of it.

This patch fixes this situation by applying the following strategy:

   - Limit the total number of tracing sessions accumulated in all stages above together
     by a static value: 2 times the "flush threshold". "2 times" is needed to allow new
     tracing sessions to accumulate in stage 2 while sessions in stage 3 are still
     being processed.
   - Forcefully flush sessions in stage 2 to the storage when their count reaches the "flush
     threshold".

This ensures that there will be no more than (2 * "flush threshold") sessions in total (in any stage)
on each shard.
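
The accounting behind this strategy can be sketched roughly as follows. This is
an illustrative model only, not the code from this patch: the class
session_budget and all of its member names are hypothetical, and the real
tracing service keeps its counters elsewhere.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-shard model of the back pressure strategy described above.
// None of these names come from the Scylla tree.
class session_budget {
    std::size_t _flush_threshold;
    std::size_t _active = 0;    // stage 1: sessions still receiving trace records
    std::size_t _pending = 0;   // stage 2: finished, waiting to be flushed
    std::size_t _flushing = 0;  // stage 3: handed to the backend, write not done
public:
    explicit session_budget(std::size_t flush_threshold)
        : _flush_threshold(flush_threshold) {}

    std::size_t total() const { return _active + _pending + _flushing; }
    std::size_t max_sessions() const { return 2 * _flush_threshold; }
    std::size_t pending() const { return _pending; }
    std::size_t flushing() const { return _flushing; }

    // Refuse to start a new session once the combined limit is reached.
    bool try_begin_session() {
        if (total() >= max_sessions()) {
            return false;
        }
        ++_active;
        return true;
    }

    // A session ends: it moves from stage 1 to stage 2. When stage 2
    // reaches the flush threshold, everything pending is force-flushed.
    void end_session() {
        assert(_active > 0);
        --_active;
        ++_pending;
        if (_pending >= _flush_threshold) {
            flush_pending();
        }
    }

    // Move all pending sessions from stage 2 to stage 3.
    void flush_pending() {
        _flushing += _pending;
        _pending = 0;
    }

    // The backend reports that n sessions were written out.
    void flush_completed(std::size_t n) {
        assert(n <= _flushing);
        _flushing -= n;
    }
};
```

With a flush threshold of 2, such a model admits at most 4 sessions per shard
and starts admitting new ones again only after the backend completes a flush.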

An advantage of this strategy is its simplicity: we only need a single threshold to control all stages.
If we find that we need finer granularity for each stage, we may add separate limits for each of them
in the future.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-06-06 13:50:41 +03:00

/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <chrono>
#include "tracing/trace_state.hh"
#include "tracing/trace_keyspace_helper.hh"
#include "service/storage_proxy.hh"

namespace tracing {

static logging::logger logger("keyspace_based_trace_state");

trace_state::~trace_state() {
    if (_tracing_began) {
        if (_primary) {
            // We don't account for the session_record event when checking the
            // limit of maximum events per session because there may be only
            // one such event and we don't want to cripple the primary session
            // by "stealing" one trace() event from it.
            //
            // We do want to report it in statistics however. If, for instance,
            // there are a lot of tracing sessions that only open themselves
            // and then do nothing, they will create a lot of session_record
            // events and we do want to know about it.
            ++_pending_trace_events;
            tracing::get_local_tracing_instance().backend_helper().store_session_record(_session_id, _client, std::move(_params), std::move(_request), _started_at, _type, elapsed(), _ttl);
        }

        tracing::get_local_tracing_instance().end_session();

        if (_flush_on_close) {
            tracing::get_local_tracing_instance().flush_pending_records();
        }

        // update some stats and get out...
        auto& tracing_stats = tracing::get_local_tracing_instance().stats;
        tracing_stats.trace_events_count += _pending_trace_events;

        if (_pending_trace_events >= tracing::max_trace_events_per_session) {
            logger.trace("{}: Maximum number of traces is reached. Some traces are going to be dropped", _session_id);

            // Warn on the 1st hit and then once per max_threshold_hits_warning_period hits.
            if (++tracing_stats.max_traces_threshold_hits % tracing::max_threshold_hits_warning_period == 1) {
                logger.warn("Maximum traces per session limit is hit {} times", tracing_stats.max_traces_threshold_hits);
            }
        }
    }
}

}
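
The warning in the destructor above is throttled with a modulo check on a
running hit counter. A minimal standalone sketch of that pattern is given
below; the struct throttled_hits and its members are hypothetical names, and
the period is passed in because the actual value of
max_threshold_hits_warning_period is not shown here. The sketch assumes a
period greater than 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper illustrating the "warn once per period" pattern.
struct throttled_hits {
    std::uint64_t period;    // assumed to be greater than 1
    std::uint64_t hits = 0;

    explicit throttled_hits(std::uint64_t p) : period(p) {}

    // Counts one hit and returns true when a warning should be logged:
    // on hit number 1, period + 1, 2 * period + 1, and so on.
    bool record_hit() {
        return ++hits % period == 1;
    }
};
```

Comparing against 1 rather than 0 makes the very first hit produce a warning
instead of staying silent until a full period has elapsed.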