Unlike bytes, bytes_ostream supports fragmented buffers, thus reducing
the pressure on the memory allocator caused by large frozen partitions.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Deserialization code is going to use a proxy object that will be cast
to either bytes or bytes_ostream depending on the demand. It cannot be
cast directly to bytes_view, though, as that would not extend the lifetime
of the buffer appropriately. The simplest solution is just to add overloads
that accept const bytes&.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Deserialization code now has two variants. The faster one can be used
only when the source buffer is not fragmented. reduce_chunk_count() aims
to increase the number of cases in which the fast path can be used.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
This patch makes append() and write() limit the maximum size of a single
allocation to bytes_ostream::max_chunk_size.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
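A minimal sketch of the idea behind capping the chunk size, not the actual
bytes_ostream code: a write larger than the cap is split across several
chunks, so no single allocation exceeds it. The class chunked_buffer, its
members, and the 128 KiB value are assumptions made for illustration; only
the name max_chunk_size comes from the message above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Assumed cap for illustration; the real value is defined by bytes_ostream.
constexpr std::size_t max_chunk_size = 128 * 1024;

// Hypothetical stand-in for a fragmented output buffer.
class chunked_buffer {
    struct chunk {
        std::unique_ptr<std::uint8_t[]> data;
        std::size_t capacity;
        std::size_t size;
    };
    std::vector<chunk> _chunks;
public:
    // Appends len bytes, starting a new chunk whenever the last one is full.
    // A new chunk is sized to the remaining input, but never above the cap.
    void write(const std::uint8_t* src, std::size_t len) {
        while (len > 0) {
            if (_chunks.empty() || _chunks.back().size == _chunks.back().capacity) {
                const std::size_t cap = std::min(len, max_chunk_size);
                _chunks.push_back(chunk{
                    std::unique_ptr<std::uint8_t[]>(new std::uint8_t[cap]), cap, 0});
            }
            chunk& c = _chunks.back();
            const std::size_t n = std::min(len, c.capacity - c.size);
            std::memcpy(c.data.get() + c.size, src, n);
            c.size += n;
            src += n;
            len -= n;
        }
    }

    // Total number of bytes stored across all chunks.
    std::size_t size() const {
        std::size_t total = 0;
        for (const chunk& c : _chunks) {
            total += c.size;
        }
        return total;
    }

    std::size_t chunk_count() const { return _chunks.size(); }

    // Capacity of the largest single allocation made so far.
    std::size_t largest_chunk() const {
        std::size_t largest = 0;
        for (const chunk& c : _chunks) {
            largest = std::max(largest, c.capacity);
        }
        return largest;
    }
};
```

With this sketch, writing 300 KiB in one call produces three chunks (128 KiB,
128 KiB and 44 KiB) instead of one 300 KiB allocation.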
input_stream performs type erasure over seastar::simple_input_stream and
fragmented_input_stream. The main goal is to keep the overhead minimal
in the cases where simple_input_stream is used.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
fragmented_input_stream is an input stream usable by IDL-generated
deserializers which can read from fragmented buffers.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
It is easier for the user to figure out the configuration error.
The log looks like:
WARN 2016-08-22 15:04:56,214 [shard 0] gossip - ClusterName mismatch
from 127.0.0.2 test2!=test
WARN 2016-08-22 15:06:16,106 [shard 0] gossip - Partitioner mismatch from 127.0.0.2
org.apache.cassandra.dht.RandomPartitioner!=org.apache.cassandra.dht.Murmur3Partitioner
Fixes: #1587
Message-Id: <745ed8857da6f70745735b94eef7b226d2f22e10.1471849834.git.asias@scylladb.com>
The condition in question is a sanity check for a SW bug.
This SW bug (if it occurs) is not critical - there is additional protection
against it in stop_foreground_and_write().
Having said all that, since we shall not throw from a destructor,
replace the throwing of a std::logic_error with a logger error message.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1471773320-7398-1-git-send-email-vladz@cloudius-systems.com>
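A minimal sketch of the pattern described above, assuming a hypothetical
session_state class and using std::cerr in place of the real logger:
destructors are implicitly noexcept, so a throw there would terminate the
process, whereas reporting the failed sanity check lets destruction finish.
The in_foreground flag and the logged_errors() counter are illustrative
names, not taken from the tracing code.

```cpp
#include <iostream>

// Illustrative counter so that callers can observe that an error was reported.
inline int& logged_errors() {
    static int count = 0;
    return count;
}

// Hypothetical stand-in for the tracing state object.
class session_state {
public:
    // Assumed flag: expected to be cleared before the object is destroyed.
    bool in_foreground = false;

    ~session_state() {
        if (in_foreground) {
            // Sanity check failed: report it instead of throwing
            // std::logic_error, since throwing from here would call
            // std::terminate().
            ++logged_errors();
            std::cerr << "session_state destroyed while still in foreground\n";
        }
    }
};
```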
Glauber "eagle eyes" Costa pointed out that the Scylla logo used in our
Docker image documentation looks broken because it's missing the Scylla
text.
Fix the problem by using the Scylla mascot instead.
Message-Id: <1471525154-2800-1-git-send-email-penberg@scylladb.com>
The bug tracker URL in our Docker image documentation is not clickable
because the URL Markdown extracts automatically is broken.
Fix that and add some more links on how to get help and report issues.
Message-Id: <1471524880-2501-1-git-send-email-penberg@scylladb.com>
Allow the user to use the `supervisorctl` program to start and stop
services. `exec` needed to be added to the scylla and scylla-jmx starter
scripts - otherwise supervisord loses track of the actual process we
want to manage.
Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1471442960-110914-1-git-send-email-yoav@scylladb.com>
"This series includes the time stamp representation changes Avi asked for.
In addition, it fixes the session "duration" semantics to be the time
it took to satisfy the user's request and not the time it took to
achieve the complete replication factor."
* seastar 823a404...81df893 (3):
> memory: Do not increase g_allocs on failure in allocate and allocate_aligned
> memory: Balance the g_frees and g_allocs
> Merge "thread: explicitly yield on get()" from Glauber
Fixes #1586.
Once unlink_leftmost_without_rebalance() has been called on a bi::set, no
other method can be used. This includes clear_and_dispose(), used by the
mutation_partition destructor.
We like unlink_leftmost_without_rebalance() because it is efficient, so
the solution is to manually finish destroying clustering row and range
tombstone sets in the reader destructor using that function.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
The mutation_fragment() constructor allocates memory. If it fails, the
already unlinked parts of the mutation (either rows_entry or range_tombstone)
will be leaked.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
The WorkingDirectory directive does not support environment variables in
the systemd version that is shipped with Ubuntu 16.04. Fortunately, not
setting WorkingDirectory implicitly sets it to the user's home directory,
which is the same thing (i.e. /var/lib/scylla).
Fixes #1319
Signed-off-by: Benoit Canet <benoit@scylladb.com>
Message-Id: <1470053876-1019-1-git-send-email-benoit@scylladb.com>
We have two counters that track how many memtable flushes are in progress and
how much memory they are pinning.
The problem is that, after we revamped the code to limit the number of flushes
in progress, those counters became useless: as they live inside the semaphore
side, they are only incremented once we have passed the semaphore.
One wouldn't notice when working with CPU-bound problems, where memtables don't
pile up. But as soon as they do, those counters will always show the same numbers:
the depth of the semaphore, which doesn't mean much. The problem is poised to
become much worse: once we enable write behind in full and set the semaphore's
depth to one, that's the number we'll see here all the time.
The fix is to move the counters outside the semaphore, which will bring back their
old semantics.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <c5ae6903e170f3f356cdda7ed78a4c9ba8d5f024.1471370504.git.glauber@scylladb.com>
A session's "duration" should be the time it took to
handle a request, which is the time until a response is sent to the user.
In other words - until the consistency level is reached.
Before this patch it was the time taken by the complete
handling of a request, which is the time it takes to handle
all replicas and not only those required to reach the CL.
This patch fixes this situation by extending the trace_state's state
values to 3 states: inactive, foreground and background.
A primary session may be in 3 states:
- "inactive": between the creation and a begin() call.
- "foreground": after a begin() call and before a
stop_foreground_and_write() call.
- "background": after a stop_foreground_and_write() call and till the
state object is destroyed.
- Traces are not allowed while state is in an "inactive" state.
- The time the primary session was in a "foreground" state is the time
reported as a session's "duration".
- Traces that have arrived during the "background" state will be recorded
as usual but their "elapsed" time will be greater than or equal to the
session's "duration".
Secondary sessions may only be in an "inactive" or a "foreground"
state.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
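The three states described above can be sketched as a small state machine.
This is an illustration under stated assumptions, not the trace_state
implementation: the class primary_session_sketch and its members are
hypothetical, time points are passed in explicitly to keep the example
deterministic, and elapsed_clock is simply an alias for
std::chrono::steady_clock.

```cpp
#include <chrono>
#include <stdexcept>

using elapsed_clock = std::chrono::steady_clock;

// Hypothetical model of a primary tracing session.
class primary_session_sketch {
public:
    enum class state { inactive, foreground, background };

    // inactive -> foreground
    void begin(elapsed_clock::time_point now) {
        _start = now;
        _state = state::foreground;
    }

    // foreground -> background; the time spent in the foreground becomes
    // the session's "duration".
    void stop_foreground_and_write(elapsed_clock::time_point now) {
        if (_state != state::foreground) {
            return;
        }
        _duration = now - _start;
        _state = state::background;
    }

    // Returns the "elapsed" time of a trace point. Tracing is rejected
    // while the session is inactive.
    elapsed_clock::duration trace(elapsed_clock::time_point now) const {
        if (_state == state::inactive) {
            throw std::logic_error("tracing is not allowed in the inactive state");
        }
        return now - _start;
    }

    state current_state() const { return _state; }
    elapsed_clock::duration duration() const { return _duration; }

private:
    state _state = state::inactive;
    elapsed_clock::time_point _start{};
    elapsed_clock::duration _duration{};
};
```

A trace recorded after stop_foreground_and_write() still gets an "elapsed"
value, and that value is never smaller than the reported duration.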
- Define a tracing::elapsed_clock type (std::chrono::steady_clock).
Use it instead of trace_state::clock_type.
- Store the "elapsed" information in a form of elapsed_clock::duration.
- Perform all keyspace_backend specific conversions inside the trace_keyspace_helper
class, where they belong.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
events_records is promised to be kept alive until the future returned
by apply_events_mutation() resolves: the caller already wraps it in do_with().
In addition, since it's passed by reference, it is logical to require
the caller to keep it alive until that future resolves.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
"This series changes the tracing back pressure scheme from limiting the number of traces in
a single session by a fixed number to having a per-shard budget consumed by all active tracing
sessions.
It was really easy to cause traces to be dropped even if there weren't too many
active traces: e.g. if there was a single active session which created more traces
than the per-session limit (30), the traces above the 30th were going to be dropped. Namely,
traces were dropped when there were only 30 active traces, which is ridiculous.
This series introduces two main changes:
- Changes the records budgeting from being per-session to being per-shard. This substantially
increases the number of active records after which new records are going to be dropped.
- Introduces a flow in which events' records are written BEFORE the corresponding tracing
session is over (right now traces are written to the I/O back end only when the session object
is destroyed).
The latter is meant to virtually eliminate trace drops in normal situations.
Of course, if a back end is slow or if there are a lot of small sessions that do not complete we would still have
to drop new sessions/records in order to avoid uncontrolled growth of the memory footprint of Tracing.
If we see the latter case happening a lot in the future we may add lowres timers to each session that would
commit the cached records for writing every X time. But let's not try to optimize something that we
are not completely sure has to be optimized... "
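A rough sketch of the per-shard budgeting idea from the series above, under
the assumption that each shard owns one shared counter that every session
draws from. The class shard_record_budget and its methods are hypothetical
names; the real series may account for records differently.

```cpp
#include <cstddef>

// Hypothetical per-shard budget shared by all active tracing sessions.
class shard_record_budget {
    std::size_t _available;
public:
    explicit shard_record_budget(std::size_t total) : _available(total) {}

    // Called when a session wants to cache n more records. Returns false
    // when the shard as a whole is out of budget, in which case the
    // records would be dropped.
    bool try_consume(std::size_t n = 1) {
        if (n > _available) {
            return false;
        }
        _available -= n;
        return true;
    }

    // Called once cached records have been written to the back end.
    void release(std::size_t n) { _available += n; }

    std::size_t available() const { return _available; }
};
```

With a shard budget of 100 records, a single session can cache 40 records
without any drops, which the fixed per-session limit of 30 did not allow, and
writing records out early returns budget to the shard before the session ends.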
The histogram implementation uses sampling to estimate the mean and sum.
This patch adds a method that returns an estimated sum based on the mean
and the total number of events measured.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1467547341-30438-2-git-send-email-amnon@scylladb.com>
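The estimate described above amounts to multiplying the sampled mean by the
total event count. A minimal sketch, assuming a hypothetical histogram that
samples every N-th event (the class name, the sampling rule, and the member
names are illustrative and not the actual histogram code):

```cpp
#include <cstdint>

// Hypothetical sampling histogram: only every sample_every-th event
// contributes to the mean, but every event is counted.
class sampled_histogram_sketch {
    std::uint64_t _sample_every;   // must be greater than zero
    std::uint64_t _count = 0;
    std::uint64_t _sample_count = 0;
    double _sample_sum = 0.0;
public:
    explicit sampled_histogram_sketch(std::uint64_t sample_every)
        : _sample_every(sample_every) {}

    void record(double value) {
        if (_count % _sample_every == 0) {
            _sample_sum += value;
            ++_sample_count;
        }
        ++_count;
    }

    std::uint64_t count() const { return _count; }

    // Mean of the sampled events only.
    double mean() const {
        return _sample_count ? _sample_sum / _sample_count : 0.0;
    }

    // Estimated sum over all events: sampled mean times total event count.
    double estimated_sum() const { return mean() * _count; }
};
```

The estimate is exact only when the sampled events are representative of the
whole; with skewed samples it can differ from the true sum.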
Currently, query_processor.cc code formatting is all over the place,
which makes the file hard to read. Apply some formatting magic to make
it prettier.
Message-Id: <1470832486-26020-2-git-send-email-penberg@scylladb.com>
"Ranges that wrap around are a source of complexity and bugs. This patchset
adds a nonwrapping_range class, which specifies the range can't wrap around.
It is the user of the nonwrapping_range that is required to enforce this
constraint.
The idea is to incrementally disallow ranges that wrap around. We do it
for query::clustering_range in this patchset, and it can be done similarly
for other ranges. This moves the burden of unwrapping ranges to the edges.
Fixes #1544"
This patch changes the type of query::clustering_range to express that
ranges that wrap around are not allowed, and ranges that have the
start bound after the end bound are considered empty.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
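To illustrate the semantics described above, here is a minimal sketch of a
range that never wraps around: when the start bound is after the end bound,
the range is treated as empty rather than as wrapping. The struct
nonwrapping_range_sketch, its inclusive bounds, and its members are
assumptions made for the example; the real nonwrapping_range class is richer.

```cpp
// Hypothetical, simplified range with inclusive bounds.
template <typename T>
struct nonwrapping_range_sketch {
    T start;
    T end;

    // A range whose start is after its end is empty, not wrapping.
    bool empty() const { return end < start; }

    // True when value lies within [start, end]; always false for an
    // empty range.
    bool contains(const T& value) const {
        return !(value < start) && !(end < value);
    }
};
```

For example, a range with start 5 and end 3 contains nothing, whereas a
wrapping interpretation of the same bounds would have covered everything
outside the open interval (3, 5).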