scylla-blocktune is a script that parses scylla.yaml and tunes the data file
and commitlog directories it references.
Tuning includes:
- set the I/O scheduler to noop
- disable merging
- tune dependent devices (like RAID members)
Message-Id: <1476357027-15014-2-git-send-email-avi@scylladb.com>
The current chunk size of 256 gives a 50% probability of a 128k read or
write getting split into two accesses. This reduces efficiency and
increases latency.
Change the chunk size to 1MB, with a 12% probability of cross-member
access.
Message-Id: <1476269082-2473-1-git-send-email-avi@scylladb.com>
Since Debian 8(jessie) does not provides g++-5, we frequently got compile error
because we are using older compiler.
To fix the problem, backport g++-5 from Debian 9(stretch).
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1476694318-10640-3-git-send-email-syuu@scylladb.com>
On Debian, lsb_release -r returns the version number something like '8.6'.
However, on this script we want to check major version only.
Therefore we can use VERSION_ID from /etc/os-release which only contains
major version number.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1476694318-10640-2-git-send-email-syuu@scylladb.com>
Commit 7dcd70124a "tests/sstables: add
test for fast forwarding reader" added a test for skipping parts of
sstable. Unfortunately, it did not include the sstables it was trying to
read.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
"This patchset enables mutation readers to be fast forwarded to a different
partition range. The main reason for introducing such feature are range
queries served from cache. If the cache is partially populated in the
requested range the reader will end up with multiple subranges that have
to be read from the sstables. Originally, each of these subranges would
require a new reader to be created, but with fast forwarding we can have
just one sstable reader. This is better since there is a chance that buffers
kept by the reader may be still useful after fast forwarding it.
In this series there are also patches that clean up cache readers in order
to make integration with fast forwarding easier. Namely, continuity flag is
changed to store information about range before the entry which significantly
simplifies the logic.
Fixes #1299."
* 'pdziepak/fast-forward-mutation-readers/v5' of github.com:cloudius-systems/seastar-dev: (24 commits)
sstables: keep separate stream history for single and range reads
sstables: drop sstable::{lower, upper}_bound()
row_cache: rework cache to use fast forwarding reader
row_cache: put cache entry flags in a struct
row_cache: add do_find_or_create_entry() to reduce code duplication
mutation_reader: forward fast_forward_to() calls
tests/row_cache: add fast_forward_to() to throttled reader
tests/row_cache: count mutations read from _underlying
memtable: add support for fast_forward_to()
drop key readers
tests/mutation_reader: test fast forwarding combined reader
database: enable fast forwarding of range_sstable_reader
combined_mutation_reader: implement fast_forward_to()
mutation_reader: make combinded_reader public
tests/sstables: add test for fast forwarding reader
tests: add more helpers to mutation reader assertions
sstables: enable fast forwarding for range readers
mutation_reader: introduce fast_forward_to()
sstables: implement mutation_reader::impl::fast_forward_to()
sstables: introduce index_reader
...
Single partition and partition range reads are expected to behave
considerably different so it is worth to have them use separate file
stream history. This also makes reads use different history for each
sstable which is also a good thing.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
This uncomfortably large patch overhauls cache range reader so that it
can take advantage of fast forwarding mutation readers.
A significant change in the cache itself is that the continuity flag now
is used to determine whether cache is contiguous between the previous
entry and the current one. This allows for a significant simplification
of the cache code and easier integration with reader fast forwarding.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Flags are easier to manage if they are in a single structure.
Especially, default initialization and move contstructors are simpler
and less error prone.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Originally, cache tests checked how many times a mutation reader was
created from the underlying mutation source to determine whether
continuity flag is working correctly.
This is not going to work with fast forwarding mutation readers so the
test is switched to count number of mutations (+ end of stream markers)
returned from underlying mutaiton readers which is much less fragile.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Fast forwarding of memtable readers is needed only for unit tests which
often use memtables as underlying data source for cache and the cache
readers.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
When fast forwarding a reader that combines sstable reader we must also
remember that the set of sstables for the new range may be different
than for the previous one. The reader introduced in this patch makes
sure that we read from correct sstables.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
We want to be able to fast forward sstable readers. However, just
implementing fast_forward_to() for combined_reader is not enough as the
sstables we are reading from may need to change.
Following patches are going to introduce a combined sstable reader that
derives from combined_reader. To make that possible we first need to
make combined_reader public.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
This patch introduces the interface for fast forwarding mutation
readers. The main user of this feature is going to be cache which, while
serving range query, may need to read multiple small ranges from the
sstables to populate itself with the missing entries.
Fast forwarding is an alternative to recreating a reader with different
range. Its main advantage is fact that it avoids dropping data that has
already been read.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
This patch allows sstable readers to be fast forwarded without making it
necessary to recreate the reader (and dropping all buffers in the
process). It is built on top of index_reader and ability of
data_consume_context to be fast forwarded.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
index_reader is a helper that implements index lookups. Its goal is to
avoid dropping read buffers if they still may be needed (for example to
get end bound of the range or after fast forwarding the reader).
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
That overload was used only by unit test and violated guarantee that
partition range lives until mutation reader is done.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
is_compactible() will pass on very small regions. full_compaction() is
only used in tests to force objects to be moved due to compaction, so
we want all reclaimable regions to be compacted.
The move constructor of partition_version was not invoking move
constructor of anchorless_list_base_hook. As a result, when
partition_version objects were moved, e.g. during LSA compaction, they
were unlinked from their lists.
This can make readers return invalid data, because not all versions
will be reachable.
It also casues leaks of the versions which are not directly attached
to memtable entry. This will trigger assertion failure in LSA region
destructor. This assetion triggers with row cache disabled. With cache
enabled (default) all segments are merged into the cache region, which
currently is not destroyed on shutdown, so this problem would go
unnoticed. With cache disabled, memtable region is destroyed after
memtable is flushed and after all readers stop using that memtable.
Fixes#1753.
Message-Id: <1476778472-5711-1-git-send-email-tgrabiec@scylladb.com>
Since we are exiting Scylla process in engine().at_exit() using
::_exit(0), even verify_seastar_io_scheduler() throwing an exception,
scylla always exit with 0.
Systemd misunderstands scylla-server.service was shutdown successfully
because of this, so we need to pass correct exit code to ::_exit() here.
Fixes#1674
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1475065607-15486-1-git-send-email-syuu@scylladb.com>
* seastar 207bf3d...ccd8649 (3):
> Merge "Augment semaphore with non-blocking operations" from Glauber
> Merge "More dynamic fstream patches" from Paweł
> Merge "fstream: add dynamic adjustments based on stream history" from Paweł
A 1MB response will require 2000 allocations with the current 512-byte
chunk size. Increase it exponentially to reduce allocation count for
larger responses (still respecting the upper limit).
Message-Id: <1476369152-1245-1-git-send-email-avi@scylladb.com>
Memory accounting code was attaching partition_snapshot to
partition_entry in order to calculate the size of partition_version
object. However, it is only allowed if partition_entry doesn't have
any snapshot attached already. In this case it always has one, created
by the flushing reader.
Change the accounting code to reuse existing partition_snapshot reference.
Fixes#1746
Message-Id: <1476449160-9252-1-git-send-email-tgrabiec@scylladb.com>
LSA tries to allocate zones as large as possible (while still leaving
enough free space for the standard allocator). It uses the amount of
free memory in order to guess how much it can get, but that obviously
doesn't account for fragmentation and the allocation attempt may fail.
This patch changes the LSA code so that it doesn't throw in case zone
couldn't be created but just returns a null pointer which should be
more performant if the LSA memory cannot grow any more.
Fixes#1394.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1476435031-5601-1-git-send-email-pdziepak@scylladb.com>
The expected behaviour in the scylla_setup script is that a question
will be followed by the answer.
For example, after asking if the scylla should be run as a service the
relevant actions will be taken before the following question.
This patch address two such mis-orders:
1. the scylla-housekeeping depends on the scylla-server, but the
setup should first setup the scylla-server service and only then ask
(and install if needed) the scylla-housekeeping.
2. The node_exporter should be placed after the io_setup is done.
Fixes#1739
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1476370098-25617-1-git-send-email-amnon@scylladb.com>
Change abstract_replication_strategy::create_replication_strategy() to
throw exceptions::configuration_error if replication strategy class
lookup to make sure the error is converted to the correct CQL response.
Fixes#1755
Message-Id: <1476361262-28723-1-git-send-email-penberg@scylladb.com>
* seastar f937fb0...207bf3d (11):
> Merge "iotune: gracefully exit on predictable exceptions" (Fixes#1623)
> core/semaphore: Add semaphore_units::release()
> Merge "rometheus API with grafana uses labels" from Amnon
> core/thread: Fix stack alloc-dealloc mismatch
> core/thread: Make jmp_buf_link::yield_at use the same time point as thread_scheduling_group
> file: support for XFS on older kernels
> reactor: fix bug when handling EBADF in flush_pending_aio()
> prometheus CPU should start in 0
> Collectd: bytes ordering depends on the type
> tests: Check that backtrace() doesn't corrupt signal mask
> core/thread: Add stack guards to seastar thread stacks