It will be needed to obtain column_translation that will
be added to data_consume_context in the next patch.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
New name describes the states in a better way as those states
will be used both for static and non-static rows.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
"
Main optimization is in the patch titled "lsa: Reduce amount of segment compactions".
I measured 50% reduction of cache update run time in a steady state for an
append-only workload with large partition, in perf_row_cache_update version from:
c3f9e6ce1f/tests/perf_row_cache_update.cc
Other workloads, and other allocation sites probably also could see the
improvement.
"
* tag 'tgrabiec/reduce-lsa-segment-compactions-v1' of github.com:tgrabiec/scylla:
lsa: Expose counters for allocation and compaction throughput
lsa: Reduce amount of segment compactions
lsa: Avoid the call to segment_pool::descriptor() in compact()
lsa: Make reclamation on reserve refill more efficient
Fixes#3339
* seastar 840002c...0a1a327 (7):
> Merge "fix perftune.py issues with cpu-masks on big machines" from Vlad
> Merge 'Handle Intel's NICs in a special way' from Vlad
> reactor: fix calculation of idle ticks
> log: streamline logging internals a little
> Merge "CMake imrovements and compatibility" from Jesse
> iotune: fix typo in property name
> cmake: do not find_package(Boost ...) if Boost is a target
"
This patchset adds support for writing counter cells in SSTables 3.x
format ('m'). The logic of writing counters is almost identical to that
used for the old 2.x format ('k'/'l') with the only difference that the
data length preceding serialised shards is written as a vint.
Tests: unit {release}.
Generated SSTables are verified to be processed fine by sstabledump
(note that sstabledump only outputs the binary data for counters, not
their actual values, same as sstable2json).
Verified with Cassandra 3.11 to get the expected values from the
counters table:
cqlsh> SELECT * from sst3.counter_table;
pk | ck | rc1 | rc2
-----+-----+-----+-----
key | ck1 | 10 | 1
(1 rows)
Verified that the deleted counter can no longer be updated:
cqlsh> use sst3 ;
cqlsh:sst3> UPDATE counter_table SET rc1 = rc1 + 2 WHERE pk = 'key' AND ck = 'ck2';
cqlsh:sst3> SELECT * from sst3.counter_table;
pk | ck | rc1 | rc2
-----+-----+-----+-----
key | ck1 | 10 | 1
(1 rows)
"
* 'projects/sstables-30/write_counters/v1' of https://github.com/argenet/scylla:
tests: Unit tests to cover writing counters in SSTables 3.x format.
sstables: Support writing counters for SSTables 3.x.
sstables: Move code writing counter value into a separate helper.
sstable test fails when running concurrently (for example, release and debug
mode) because it uses a static temporary dir in lots of tests.
Let's fix it by switching to dynamic temporary dir, which is created using
mkdtemp(). Also the sstable tests will now run in /tmp, and so it's made
much faster.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180516042044.15336-1-raphaelsc@scylladb.com>
Reclaiming memory through segment compaction is expensive. For
occupancy of 85%, in order to reclaim one free segment, we need to
compact 7 segments, by migrating 6 segments worth of data. This results
in significant amplification. Compaction involves moving objects,
which in some cases is expensive in itself as well
(See https://github.com/scylladb/scylla/issues/3247).
This patch reduces amount of segment compactions in favor of doing
more eviction. It especially helps workloads in which LRU order
matches allocation order, in which case there will be no segment
compaction, and just eviction.
In perf_row_cache_update test case for large partition with lots of
rows, which simulates appending workload, I measured that for each new
object allocated, 2 need to be migrated, before the patch. After the
patch, only 0.003 objects are migrated. This reduces run time of
cache update part by 50%.
"
The memory estimations we have when using the chunked vector
are usually slightly wrong. We can make them more accurate by
exporting the memory usage directly as a chunked_vector API.
"
* 'chunked_memory-v2' of github.com:glommer/scylla:
large_bitset: be more accurate with memory usage
chunked_vector: exports its current memory usage
We are slightly underestimating the amount of memory we use. Now that
the chunked vector can exports its internal memory usage we can use that
directly.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
There are times in which we would like to estimate how much memory
a chunked_vector is using. We have two strategies to do it:
1) multiply the size by the size of the elements. That is wrong, because
the chunked_vector can allocate larger chunks in anticipation of more
elements to come.
2) multiply the number of chunks by 128kB. That is also wrong, because
the chunk_vector will not always allocate the entire chunk if there are
only a few elements in it.
The best way to deal with it is to allow the chunked_vector to exports
its current memory usage.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Commit 9eb8ea8b11 installed
scylla_blocktune.py as part of preparing the rpm, but forgot
to add it to the installed file list, breaking the rpm build.
Fix by listing the file in the %files section.
Message-Id: <20180506202807.5719-1-avi@scylladb.com>
"
SSTables 3.x (format 'm') use CRC32 instead of Adler32 for calculating
checksums. This patchset introduces support for CRC32 along with Adler32
in checksummed_file_writer to be used for SSTables written in 'mc'
format.
Structures and helpers introduced for CRC32 will be later used for
calculating checksums for compressed files as well (not a part of this
patchset).
Tests: unit {release}
"
* 'projects/sstables-30/write-digest-crc/v3' of https://github.com/argenet/scylla:
tests: Add test covering checksumming SSTables 3.0 with CRC32.
sstables: Support CRC32 checksum for SSTables 3.x.
sstables: Move adler32 routines under the scope of a class.
sstables: Move checksum utils into separate header.
sstables: Remove unused 'checksum_file' flag from checksummed_file_writer.
"
The native protocol server generates mant reactor tasks that
can be easily eliminated. I measured a read workload with 100%
cache hit rate, seeing the number of tasks per request drop
from ~31 to ~27, and an increase of 3% in throughput.
"
* tag 'transport-optimize-1/v1' of https://github.com/avikivity/scylla:
transport: remove unused capture of flags variable
transport: merge response write and error handling continuations
transport: make write_repsonse() return void
transport: de-template a lambda
transport: merge memory-management and logging continuations
transport: remove gate continuation
transport: merge two response processing continuations
transport: simplify response processing continuation
transport: remove gratuitous continuation from process_request_one()
It just schedules the response, and returns immediately.
(I thought about calling it schedule_response(), but usually it will
write the response immediately, since waiting for network writes is
rare in a local network).
with_gate() generates a continuation if the protected function defers.
Avoid that by merging a gate::leave() call with another, preexisting,
continuation.
We have one coninuation transforming the result, and another shutting
down tracing. Since the first cannot defer, we can merge the two, reducing
the number of tasks processed by the reactor.
This patch fixes a bug where queries using a secondary index would, in
some cases, produce the same rows multiple times.
The problem was that the code begins by finding a list of primary keys
that match the search, and then work on the partitions containing them.
If multiple rows matched in the same partition, the partition was considered
multiple times, and the same rows were output multiple times.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180510203141.17157-1-nyh@scylladb.com>
If we're given a single reader (can be common in a low-write-rate table,
where most of the data will be in a single large sstable, or in leveled
tables) then we can avoid the overhead of the combining reader by returning
the single input.
Tests: unit (release)
Message-Id: <20180513130333.15424-1-avi@scylladb.com>