Move most of the body into a non-template overload to reduce dependencies
in the header (and template bloat). The function is not on any fast path,
and noncopyable_function will likely not even allocate anything.
A default parameter of type T (or lw_shared_ptr<T>) requires that T be
defined. Remove the dependency by redefining the default parameter
as an overload, for T = user_types_metadata.
The workload in #3844 has these characteristics:
- very small data set size (a few gigabytes per shard)
- large working set size (all the data, enough for high cache miss rate)
- high overwrite rate (so a compaction results in 12X data reduction)
As a result, the compaction backlog controller assigns very few shares to
compaction (low data set size -> low backlog), so compaction proceeds very slowly.
Meanwhile, we have tons of cache misses, and each cache miss needs to read from a
large number of sstables (since compaction isn't progressing). The end result is
a high read amplification, and in this test, timeouts.
While we could declare that the scenario is very artificial, there are other
real-world scenarios that could trigger it. Consider a 100% write load
(population phase) followed by 100% read. Towards the end of the last compaction,
the backlog will drop more and more until compaction slows to a crawl, and until
it completes, all the data (for that compaction) will have to be read from its
input sstables, resulting in read amplification.
We should probably have read amplification affect the backlog, but for now the
simpler solution is to increase the minimum shares to 50 so that compaction
always makes forward progress. This will result in higher-than-needed compaction
bandwidth in some low write rate scenarios, so we will see fluctuations in request
rate (what the controller was designed to avoid), but these fluctuations will be
limited to 5%.
Since the base class backlog_controller has a fixed (0, 0) point, remove it
and add it to derived classes (setting it to (0, 50) for compaction).
Fixes #3844 (or at least improves it).
Message-Id: <20181231162710.29410-1-avi@scylladb.com>
The current implementation breaks the invariant that
_size_bytes = reduce(_fragments, &temporary_buffer::size)
In particular, this breaks algorithms that check the individual
segment size.
Correctly implement remove_suffix() by destroying superfluous
temporary_buffer's and by trimming the last one, if needed.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190103133523.34937-1-duarte@scylladb.com>
Simplify the fix for memory based eviction introduced by 918d255, so that
there is no need to massage the counters.
Also add a check to `test_memory_based_cache_eviction` that verifies the
fix. While at it, also add a check to
`test_time_based_cache_eviction` for the fix to time based eviction
(e5a0ea3).
Tests: tests/querier_cache:debug
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <c89e2788a88c2a701a2c39f377328e77ac01e3ef.1546515465.git.bdenes@scylladb.com>
"
This series adds staging SSTables support to row level repair.
It was introduced for streaming sessions before, but since row level
repair doesn't leverage sessions at all, it's added separately.
Tests:
unit (release)
dtest (repair_additional_test.py:RepairAdditionalTest,
excluding repair_abort_test, which fails for me locally on master)
"
* 'add_staging_sstables_generation_to_row_level_repair_2' of https://github.com/psarna/scylla:
repair: add staging sstables support to row level repair
main,repair: add params to row level repair init
streaming,view: move view update checks to separate file
When a node is bootstrapping, it will receive data from other nodes
via streaming, including materialized views. Regardless of whether these
views are built on other nodes or not, building them on newly
bootstrapped nodes has no effect - updates were either already streamed
completely (if view building has finished) or will be propagated
via view building, if the process is still ongoing.
So, marking all views as 'built' for the bootstrapped node prevents it
from spawning superfluous view building processes.
Fixes #4001
Message-Id: <fd53692c38d944122d1b1013fdb0aedf517fa409.1546498861.git.sarna@scylladb.com>
Currently queriers evicted due to their TTL expiring are not
unregistered from the `reader_concurrency_semaphore`. This can cause a
use-after-free when the semaphore tries to evict the same querier at
some later point in time, as the querier entry it has a pointer to is
now invalid.
Fix by unregistering the querier from the semaphore before destroying
the entry.
Refs: #4018
Refs: #4031
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4adfd09f5af8a12d73c29d59407a789324cd3d01.1546504034.git.bdenes@scylladb.com>
The check for whether the view update path should be used for sstables
is going to be reused in the row level repair code,
so the relevant functions are moved to a separate header.
In insert_querier(), we may evict older queriers to make room for the new one.
However, we forgot to unregister the evicted queriers from
reader_concurrency_semaphore. As a result, when reader_concurrency_semaphore
eventually wanted to evict something, it saw an inactive_read_handle that was
not connected to a querier_cache::entry, and crashed on use-after-free.
Fix by evicting through the inactive_read_handle associated with the querier
to be evicted. This removes traces of the querier from both
reader_concurrency_semaphore and querier_cache. We also have to massage the
statistics since querier_inactive_read::evict() updates different counters.
Fixes #4018.
Tests: unit(release)
Reviewed-by: Botond Denes <bdenes@scylladb.com>
Message-Id: <20190102175023.26093-1-avi@scylladb.com>
The check in consume_range_tombstone was too late. Before reaching
it, we would fail an assert when calling to_bound_kind.
This moves the check earlier and adds a test case.
Tests: unit (release)
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This moves the predicate functions to the start of the file, renames
is_in_bound_kind to is_bound_kind for consistency with to_bound_kind
and defines all predicates in a similar fashion.
It also uses the predicates to reduce code duplication.
Tests: unit (release)
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
It is useful to adjust the command line when running the docker image,
for example to attach a data volume or a ccache directory. Add a mechanism
to do that.
Message-Id: <20181228163306.19439-1-avi@scylladb.com>
"
When reading mutations during commitlog - or hint - replay, use
fragmented buffers instead of allocating a contiguous
temporary_buffer.
Refs #4020
"
* 'commitlog/fragmented-read/v1' of https://github.com/duarten/scylla:
db/commitlog: Use fragmented buffers to read entries
db/commitlog: Implement skip in terms of input buffer skipping
tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix()
utils/fragmented_temporary_buffer: Add remove_suffix
tests/fragmented_temporary_buffer_test: Add unit test for skip()
utils/fragmented_temporary_buffer: Allow skipping in the input stream
Static class_registries hinder librarification by requiring linking with
all object files (instead of a library from which objects are linked on
demand) and reduce readability by hiding dependencies and by their
horrible syntax. Hide them behind a non-static, non-template tracing
backend registry.
Message-Id: <20181229121000.7885-1-avi@scylladb.com>
This simplifies the code and allows us to get rid of the overload of
advance() taking a temporary_buffer.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
distributed_loader is a sizeable fraction of database.cc, so moving it
out reduces compile time and improves readability.
Message-Id: <20181230200926.15074-1-avi@scylladb.com>
lz4 1.8.3 was released with a fix for data corruption during compression. While
the release notes indicate we aren't vulnerable, be cautious and update anyway.
Message-Id: <20181230144716.7238-1-avi@scylladb.com>
"This series contains a couple of fixes to the
view_update_from_staging_generator, the object responsible for
generating view updates from sstables written through streaming.
Fixes #4021"
* 'materialized-views/staging-generator-fixes/v2' of https://github.com/duarten/scylla:
db/view/view_update_from_staging_generator: Break semaphore on stop()
db/view/view_update_from_staging_generator: Restore formatting
db/view/view_update_from_staging_generator: Avoid creating more than one fiber
If view_update_from_staging_generator::maybe_generate_view_updates()
is called before view_update_from_staging_generator::start(), as can
happen in main.cc, then we can potentially create more than one fiber,
which leads to corrupted state and conflicting operations.
To avoid this, use just one fiber and be explicit about notifying it
that more work is needed, by leveraging a condition-variable.
Fixes #4021
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"
A sharded<database> is not very useful for accessing data since data is
usually distributed across many nodes, while a sharded<database>
contains only a single node's view. So it is really only used for
accessing replicated metadata, not data. As such only the local shard
is accessed.
Use that to simplify query_processor a little by replacing sharded<database>
with a plain database.
We can probably be more ambitious and make all accesses, data and metadata,
go through storage_proxy, but this is a start.
"
* tag 'qp-unshard-database/v1' of https://github.com/avikivity/scylla:
query_processor: replace sharded<database> with the local shard
commitlog_replayer: don't use query_processor
client_state: change set_keyspace() to accept a single database shard
legacy_schema_migrator: initialize with database reference
system_keyspace is an implementation detail for most of its users, not
part of the interface, as it's only used to store internal data. Therefore,
including it in a header file causes unneeded dependencies.
This patch removes a dependency between views and system_keyspace.hh
by moving view_name and view_build_progress into a separate header file,
and using forward declarations where possible. This allows us to
remove an inclusion of system_keyspace.hh from a header file (the last
one), so that further changes to system_keyspace.hh will cause fewer
recompilations.
Message-Id: <20181228215736.11493-1-avi@scylladb.com>
query_processor uses storage_proxy to access data, and the local
database object to access replicated metadata. While it seems strange
that the database object is not used to access data, it is logical
when you consider that a sharded<database> only contains this node's
data, not the cluster data.
Take advantage of this to replace sharded<database> with a single database
shard.
During normal writes, query processing happens before the commitlog, so
logically, replaying the commitlog shouldn't need it. And in
fact the dependency on query_processor can be eliminated; all it needs
is the local node's database.
Provide legacy_schema_migrator with a sharded<database> so it doesn't need
to use the one from query_processor. We want to replace query_processor's
sharded<database> with just a local database reference in order to simplify
it, and this is standing in the way.
When streaming, sstables for which we need to generate view updates
are placed in a special staging directory. However, we only need to do
this for tables that actually have views.
Refs #4021
Message-Id: <20181227215412.5632-1-duarte@scylladb.com>
The version of boost in Fedora 29 has a use-after-free bug that is only
exposed when ./test.py is run with the --jenkins flag. To patch it,
use a fixed version from the copr repository scylladb/toolchain.
Message-Id: <20181228150419.29623-1-avi@scylladb.com>
Implementation of the nodetool toppartitions query, which samples the most frequent PKs in read/write
operations over a period of time.
Content:
- data_listener classes: mechanism that interfaces with mutation readers in database and table classes,
- toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this
interfaces with data_listeners and the REST api),
- REST api for toppartitions query.
Uses a top-k structure for handling stream-summary statistics (based on the implementation in C*, see #2811).
What's still missing:
- JMX interface to nodetool (interface customization may be required),
- Querying #rows and #bytes (currently, only #partitions is supported).
Fixes #2811
* https://github.com/avikivity/scylla rafie_toppartitions_v7.1:
top_k: whitespace and minor fixes
top_k: map template arguments
top_k: std::list -> chunked_vector
top_k: support for appending top_k results
nodetool toppartitions: refactor table::config constructor
nodetool toppartitions: data listeners
nodetool toppartitions: add data_listeners to database/table
nodetool toppartitions: fully_qualified_cf_name
nodetool toppartitions: Toppartitions query implementation
nodetool toppartitions: Toppartitions query REST API
nodetool toppartitions: nodetool-toppartitions script
A Python script mimicking the nodetool toppartitions utility, utilizing the Scylla REST API.
Examples:
$ ./nodetool-toppartitions --help
usage: nodetool-toppartitions [-h] [-k LIST_SIZE] [-s CAPACITY]
keyspace table duration
Samples database reads and writes and reports the most active partitions in a
specified table
positional arguments:
keyspace Name of keyspace
table Name of column family
duration Query duration in milliseconds
optional arguments:
-h, --help show this help message and exit
-k LIST_SIZE The number of the top partitions to list (default: 10)
-s CAPACITY The capacity of stream summary (default: 256)
$ ./nodetool-toppartitions ks test1 10000
READ
Partition Count
30 2
20 2
10 2
WRITE
Partition Count
30 1
20 1
10 1
Signed-off-by: Rafi Einstein <rafie@scylladb.com>