This patch adds support for parsing legacy compound values by
introducing the composite class, a wrapper around a sequence of bytes
serialized in the legacy format for compounds. Compound values can be
sent though the thrift API.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
centos-master jenkins job failed at building libgo, but we don't need go language, so let's disable it on scylla-gcc package.
Also we never use ada, disable it too.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1468166660-23323-1-git-send-email-syuu@scylladb.com>
"Checking bloom filters of sstables to compute max purgeable timestamp
for compaction is expensive in terms of CPU time. We can avoid
calculating it if we're not about to GC any tombstone.
This patch changes compacting functions to accept a function instead
of ready value for max_purgeable.
I verified that bloom filter operations no longer appear on flame
graphs during compaction-heavy workload (without tombstones).
Refs #1322."
Checking bloom filters of sstables to compute max purgeable timestamp
for compaction is expensive in terms of CPU time. We can avoid
calculating it if we're not about to GC any tombstone.
This patch changes compacting functions to accept a function instead
of ready value for max_purgeable.
I verified that bloom filter operations no longer appear on flame
graphs during compaction-heavy workload (without tombstones).
Refs #1322.
memtable_list::seal_on_overlflow() is called on each mutation to check
if current memtable should be flushed. It will call
memtable_list::seal_active_memtable() when that is the case.
The number of concurrent seals is guarded by a semaphore, starting
from commit 0f64eb7e7d, and allows
at most 4 of them.
If there are 4 flushes already pending, every incoming mutation will
enqueue a new flush task on the semaphore's wait list, without waiting
for it. The wait queue can grow without bounds, eventually leading to
out-of-memory.
The fix is to seal the memtable immediately to satisfy should_flush()
condition, but limit concurrency of actual flushes. This way the wait
queue size on the semaphore is limited by memtables pending a flush,
which is fairly limited.
Message-Id: <1467997652-16513-1-git-send-email-tgrabiec@scylladb.com>
With so many consumer concepts out there, it is confusing to name
parameters using genering "Consumer" name, let's name them after
(already defined) concepts: CompactedMutationsConsumer, FlattenedConsumer.
Previously, same function was used to handle both regular compaction
and cleanup requests. That's bad because a lot of conditions were
added for both compaction types to live in the same function.
Now, cleanup and regular compaction will live in different functions.
They share a lot of code, so helper functions were introduced.
This change is also important for user-initiated compaction that
will go through compaction manager in the future.
Code is also a lot easier to read now.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
There is no longer a need to use gate for regular termination of
fiber that runs compaction. Now, we only set task->stopping to
true, ask for compaction termination, and wait for its future to
resolve. Code is simplified a lot with this change.
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Enable --partitioner option so that user can choose partitioner other
than the default Murmur3Partitioner. Currently, only Murmur3Partitioner
and ByteOrderedPartitioner are supported. When non-supported partitioner
is specifed, error will be propogated to user.
In order to support ByteOrderedPartitioner, we need to implement the
missing describe_ownership and midpoint function in
byte_ordered_partitioner class.
As a starter, this path uses a simple node token distance based method
to calculate ownership. C* uses a complicated key samples based method.
We can switch to what C* does later.
Tests are added to tests/partitioner_test.cc.
Fixes#1378
If a node fails to talk to any seed node, shadow round will fail. We
should exit shadow round state before we continue.
This issue is spotted by
consistency_test.TestConsistency.data_query_digest_test dtest.
Message-Id: <ba0613532a69bac369ca316ab61d907b320c8e68.1467963674.git.asias@scylladb.com>
As Nadav notes we use the chunk length as the buffer size for the compressed
stream too.
Fix by using it only for the outer (uncompressed) stream; the inner
(compressed) stream uses the sstable buffer size, 128 kiB.
Fixes#1402.
Message-Id: <1467910556-5759-1-git-send-email-avi@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Support for streaming of large partitions from Paweł:
This series converts streaming to streaming_mutations so that there
is need to store full mutation in memory in order to send or receive
it.
The first several patches add a way of estimating mutation fragment
memory usage and introduce fragment_and_freeze() which produces
a stream of reasonably sized frozen mutations from a single streamed
mutation.
The second part of this patchset makes sure that streaming mutations
in fragments doesn't break isolation guarantees. This is achieved by
delaying visibility of sstables produced by streaming until the
streaming is completed. However, our current receiving code merges
mutations from all streaming plans together thus making it impossible
to track which data was received from a particular streaming plan.
The solution to that problem is to introduce an additional flag to
STREAM_MUTATION verb which informs the receiver whether the mutation
is fragmented and care must be taken to preserve isolation. Small
mutations behaved as they were, with writes from different stream
plans coalesced while big mutations are handled separately for each
streaming task.
Commit 206955e4 "streaming: Reduce memory usage when sending mutations"
moved streaming mutation limiter from do_send_mutations() to
send_mutations(). The reason for that was that send_mutation() did full
mutation copies. That's no longer the case and streaming limiter should
be moved back to do_send_mutation() in order to provide back pressure to
fragment_and_freeze().
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
If mutations are fragmented during streaming a special care must be
taken so that isolation guarantees are not broken.
Mutations received with flag "fragmented" set are applied to a memtable
that is used only by that particular streaming task and the sstables
created by flushing such memtables are not made visible until the task
is complte. Also, in case the streaming fails all data is dropped.
This means that fragmented mutations cannot benefit from coalescing of
writes from multiple streaming plans, hence separate way of handling
them so that there is no loss of performance for small partitions.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
plan_id is needed to keep track of the origin of mutations so that if
they are fragmented all fragments are made visible at the same time,
when that particular streaming plan_id completes.
Basically, each streaming plan that sends big (fragmented) mutations is
going to have its own memtables and a list of sstables which will get
flushed and made visible when that plan completes (or dropped if it
fails).
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
When flush_streaming_mutations() is called at the end of streaming it is
supposed to flush all data and then invalidate cache. ranges However, if
there are already some memtable flushes in progress it won't wait for them.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
The purpose of this patch is to split the actions of writing sstable and
sealing it. As long as the sstable is unsealed it is considered
incomplete and is going to be removed on reboot.
Such functionality is needed in order to defer visibility of sstables
created during streaming until the streaming is complete.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
* seastar c82c36f...9267dfa (6):
> app_template: Make run() wait for func when reactor exit is triggered externally
> core: Introduce futurize_apply() helper
> rpc: make unexpected eof messages more informative
> Fix boost version check
> reactor: more fix for smp poll with older boost
> reactor: fix build on older boost due to spsc_queue::read_available()
2a46410f4a changed sstable_list from a map
to a set, so it is no longer sorted by generation. The code for finding
the list of sstables not being compacted relied on this sort order, and
now broke, returning a longer list than needed (including some of the
sstables being compacted). As a result, the compaction code preserved
the tombstones, incorrectly thinking there was still live data they
referenced.
Fix by sorting the set explicitly.
Fixes#1429.
Message-Id: <1467793026-6571-1-git-send-email-avi@scylladb.com>
"After this patchset, date tiered compaction strategy is supported by Scylla.
For those who don't know what it is about, the following article may help:
https://labs.spotify.com/2014/12/18/date-tiered-compaction/
It's also nicely explained here by our wiki page:
https://github.com/scylladb/scylla/wiki/SSTable-compaction#date-tiered-compaction
Basically, date tiered strategy was developed to help the database perform better
when facing a time series workload. Date tiered strategy will work to keep data
written at nearly the same time together, such that the number of relevant sstables
for a time-based query is relatively low. We still lacks support to filter out
sstables based on time parameters of a query, but that feature should come ASAP.
The following dtests now pass:
compaction_test.py:TestCompaction_with_DateTieredCompactionStrategy.compaction_delete_test
compaction_test.py:TestCompaction_with_DateTieredCompactionStrategy.compaction_strategy_switching_test
Used cassandra-stress with the parameter '-schema compaction\(strategy=DateTieredCompactionStrategy\)'
to check stability.
Fixes #511."
"Issue 1195 describes a scenario with a fairly easy reproducer in which
we can freeze the database. That involves writing simultaneously to
multiple CFs, such that the sum of all the memory they are using is larger
than the dirty memory limit, without not any of them individually being
larger than the memtable size.
This patchset rewrites the throttling code, including now active flushes
so that this situation cannot happen.
Fixes#1195"