compact_mutation code is going to be shared among queries and sstable
compaction. There are some differences though. Queries don't provide
_max_purgeable and sstable compaction don't need any limits.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
_max_perguable can be different for each partition, since it is computed
using sstables in which that partition is present (or likely to be
present).
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Since decorated keys are already computed it is better to pass more
information than less. Consumers interested just in partition key can
just drop token and the ones requiring full decorated key don't need to
recompute it.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
query::full_slice is a partiton slice which has full clustering row
ranges for all partition keys and no per-partition row limit.
Options and columns are not set.
It is used as a helper object in cases when a reference to
partition_slice is needed but the user code needs just all data there is
(an example of such case would be sstable compaction).
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Currently, each sstable write has its separate thread. However, the goal
is to have compaction use consume_flattened() with a consumer that
creates and writes the sstables. consume_flattened() needs to be executed
inside a thread, since sstable writer may defer.
This patch is a first step in preparations and it just makes whole
compaction logic run inside a thread. That makes little sense now, since
all sstable writes spawn their own threads but that's going to change
in the following patches.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
While limiting the number of concurrently executing sstable readers reduces
our memory load, the queued readers, although consuming a small amount of
memory, can still grow without bounds.
To limit the damage, add two limits on the queue:
- a timeout, which is equal to the read timeout
- a queue length limit, which is equal to 2% of the shard memory divided
by an estimate of the queued request size (1kb)
Together, these limits bound the amount of memory needed by queued disk
requests in case the disk can't keep up.
Message-Id: <1467206055-30769-1-git-send-email-avi@scylladb.com>
At the moment, we only trigger compaction after creating a new
sstable as a result of memtable flush, or some other event such
as changing compaction strategy of a column family.
However, it's important to trigger compaction on boot too.
That will happen after loading all column families.
Fixes#1404.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <54d38a418157454eec97aaba6b8a6b6e51484db4.1467135349.git.raphaelsc@scylladb.com>
The scylla-jmx no longer shutdown itself. A better setup would be that
the it would be started when the scylla-server starts and that it would
shutdown when the scylla-server shutdown.
This patch do the scylla-server part of the change.
The scylla-server definition would Want the scylla-jmx.service so there
is no need to enable the scylla-jmx.service.
A patch to the scylla-jmx would cause it to shutdown when the scylla-jmx
shutsdown.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1467184502-4358-1-git-send-email-amnon@scylladb.com>
Following Cassandra, our default sstable compression chunk size is 64 KB.
The big downside of this default size is that small reads need to read
and uncompress a large chunk, around 32 KB (if compression halves the data
size). In this patch we switch the default chunk size to 4 KB, which allows
faster small reads (the report in issue #1337 was of a 60-fold speedup...).
Since commit 2f56577, large reads will not be signficantly slowed down by
changing to a small chunk size. The remaining potential downside of this
change is lowering of the compression ratio because of the smaller chunks
individually compressed. However, experimentation shows that the compression
ratio is hurt somewhat, but not dramatically, by lowering the chunk size:
A recent survey of Cassandra compression in
https://www.percona.com/blog/2016/03/09/evaluating-database-compression-methods/
reports a compression ratio of 2 for 64 KB chunks, vs. 1.75 for 4 KB chunks.
My own test on a cassandra-stress workload (whose data is relatively hard
to compress), showed compression ratio 1.25 for 64 KB chunk, vs. 1.23 for
4 KB chunks.
Also remember that if a user wants to control the chunk length for a
particular table, he can - the 64 KB or 4 KB sizes are just the default.
Fixes#1337
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1467063335-12096-1-git-send-email-nyh@scylladb.com>
Helps in identifying pointers allocated through seastar
allocator. Shows to which thread the pointer belongs, to which size
class, whether it's live or free, what's the offset realtive to the
live object.
Example:
(gdb) scylla ptr 0x6040abe88170
thread 1, small (size <= 320), live (0x6040abe88140 +48)
Message-Id: <1467047215-1763-1-git-send-email-tgrabiec@scylladb.com>
* seastar 3029ebe...c15055c (5):
> memory: add option to mlock() all memory
> reactor: run idle poll handler with a pure poll function
> ignore all but one failed futures in map_reduce
> tutorial: more general exception printout on startup
> resource: don't abort on too-high io queue count
Fixes#1395.
Fixes#1400.
From Avi:
Both the cql binary transport and the rpc server have protection against
too many concurrent requests overwhelming the database due to transient
allocations. There work by estimating the amount of memory a request
requires, and accounting that against a semaphore. When the semaphore
blocks, we stop dequeing requests from the tcp connection.
Unfortunately, this doesn't work for reads, because we can't estimate the
required memory size. A small read request can require many sstables to be
read, perhaps concurrently, and a large response to be generated.
Fix by limiting the number of concurrent reads in a shard to 100. This
is more than enough concurrency for any reasonable disk, and there is no
network communication at this level, so we're safe from high network
latency requiring high concurrency.
Fixes#1398.
Since reading mutations can consume a large amount of memory, which, moreover,
is not predicatable at the time the read is initiated, restrict the number
of reads to 100 per shard. This is more than enough to saturate the disk,
and hopefully enough to prevent allocation failures.
Restriction is applied in column_family::make_sstable_reader(), which is
called either on a cache miss or if the cache is disabled. This allows
cached reads to proceed without restriction, since their memory usage is
supposedly low.
Reads from the system keyspace use a separate semaphore, to prevent
user reads from blocking system reads. Perhaps we should select the
semaphore based on the source of the read rather than the keyspace,
but for now using the keyspace is sufficient.
A restricting_reader wraps a mutation_reader, and restricts it concurrency
using a provided semaphore; this allows controlling read concurrency, which
is important since reads can consume a lot of resources ((number of
participating sstables) * 128k after we have streaming mutations, and a lot
more before).
This patch adds support for thrift prepared statements. It specializes
the result_message::prepared into two types:
result_message::prepared::cql and result_message::prepared::thrift, as
their identifiers have different types.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
query_options::prepare() changes the values array, but this is not the
one used by query_options internally (e.g., in get_value_at). So we
need to also recalculate the value_views after prepare() is called.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Having both the values and value_views arguments in the query_options
ctor is confusing, since query_options uses only the value_views field
but that is not communicated to the caller.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Similarly to the with_cob functions, this one takes the exn_cob
function and ensures it is called in case of an exception. This
is useful when the return type of the thrift verb is not nothrow
move constructible; by holding on to the cob inside the verb and
calling it directly when we have the result we avoid having to
wrap it in a smart pointer.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
gcc 6 complains that deleting a managed_bytes::external isn't defined
because the size isn't known. I'm not sure it's correct, but there's no
way to tell because flexible arrays aren't standardized.
Fix by using an array of zero size.
Message-Id: <1466715187-4125-1-git-send-email-avi@scylladb.com>
Using a template lambda invokes a bug in Fedora 24's boost where the
lambda's parameter is an internal boost type rather than a range_tombestone.
Constraining the parameter with an explicit type avoids the problem.
Message-Id: <1466844211-17298-1-git-send-email-avi@scylladb.com>
This adds to the definition of the collectd API the ability to turn on
and off specific collectd metrics.
For the GET end point a POST option was added that allow to enable or
disable a metric.
The general GET endpoint now returns the enable flag that indicates if
the metric is enable.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1466932139-19264-2-git-send-email-amnon@scylladb.com>
"This patchset implements the thrib describe verbs:
- describe_keyspace
- describe_keyspaces
- describe_cluster_name
- describe_version
- describe_ring
- describe_local_ring
- describe_token_map
- describe_partitioner
- describe_snitch
- describe_schema_versions
The verbs describe_splits and describe_splits_ex are not implemented
because they are marked as experimentail (Origin's thrift interface has
this to say about them: "experimental API for hadoop/parallel query
support. may change violently and without warning."). Some drivers have
moved away from depending on this verb (SPARKC-94). The correct way to
implement the verbs for us would be to use the size_estimates system table
(CASSANDRA-7688). However, we currently don't populate size_estimates, which
is done by SizeEstimatesRecorder.java in Origin."
This patch removes a conversion function from an internal type
name to Origin's naming, which isn't needed because the
abstract_type hierarchy already keeps that mapping.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>