cryptopp's config.h has the following pragma:
#pragma GCC diagnostic ignored "-Wunused-function"
It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.
The issue has been reported as
https://github.com/weidai11/cryptopp/issues/793
To work around it, this patch uses a pimpl to have a single .cc file
that has to include cryptopp headers.
While at it, it also reduces the differences and code duplication
between the md5 and sha1 hashers.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
tmpdir is a helper class representing a temporary directory.
Unfortunately, it suffers for some problems such as lack of proper
encapsulation and weak typing. This has caused bugs in the past when the
user code accidentally modified the member variable with the path to the
directory.
This patch modernises tmpdir and updates its users. The path is stored
in a std::filesystem::path and available read-only to the class users.
mkdtemp and boost are replaced by standard solution.
The users are update to use path more (when it didn't involve too many
changes to their code) and stop using lw_shared_ptr to store the tmpdir
when it wasn't necessary.
tmpdir intentionally doesn't provide any helpers for getting the path as
a string in order to discourage weak types.
Message-Id: <20190207145727.491-1-pdziepak@scylladb.com>
"
This is a first step in fixing #3988.
"
* 'espindola/large-row-warn-only-v4' of https://github.com/espindola/scylla:
Rename large_partition_handler
Print a warning if a row is too large
Remove defaut parameter value
Rename _threshold_bytes to _partition_threshold_bytes
keys: add schema-aware printing for clustering_key_prefix
Committer: Avi Kivity <avi@scylladb.com>
Branch: next
Switch to the the CMake-ified Seastar
This change allows Scylla to be compiled against the `master` branch of
Seastar.
The necessary changes:
- Add `-Wno-error` to prevent a Seastar warning from terminating the
build
- The new Seastar build system generates the pkg-config files (for
example, `seastar.pc`) at configure time, so we don't need to invoke
Ninja to generate them
- The `-march` argument is no longer inherited from Seastar (correctly),
so it needs to be provided independently
- Define `SEASTAR_TESTING_MAIN` so that the definition of an entry
point is included for all unit test compilation units
- Independently link Scylla against Seastar's compiled copy of fmt in
its build directory
- All test files use the (now public) Seastar testing headers
- Add some missing Seastar headers to source files
[avi: regenerate frozen toolchain, adjust seastar submoule]
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <02141f2e1ecff5cbcd56b32768356c3bf62750c4.1548820547.git.jhaberku@scylladb.com>
Many area of the code are splattered with unneeded templates. This patchset replaces
some of them, where the template parameter is a function object, with an std::function
or noncopyable_function (with a preference towards the latter; but it is not always
possible). As the template is compiled for each instantiation (if the function
object is a lambda) while a function is compiled only once, there are significant
savings in compile time and bloat.
text data bss dec hex filename
85160690 42120 284910 85487720 5187068 scylla.before
84824762 42120 284910 85151792 5135030 scylla.after
* https://github.com/avikivity/scylla detemplate/v2:
api/commitlog: de-template acquire_cl_metric()
database: de-template do_parse_schema_tables
database: merge for_all_partitions and for_all_partitions_slow
hints: de-template scan_for_hints_dirs()
schema_tables: partially de-template make_map_mutation()
distributed_loader: de-template
tests: commitlog_test: de-template
tests: cql_auth_query_test: de-template
test: de-template eventually() and eventually_true()
tests: flush_queue_test: de-template
hint_test: de-template
tests: mutation_fragment_test: de-template
test: mutation_test: de-template
At the moment are inefficiencies in how
collection_type_impl::mutation::compact_and_expire( handles tombstones.
If there is a higher-level tombstone that covers the collection one
(including cases where there is no collection tombstone) it will be
applied to the collection tombstone and present in the compaction
output. This also means that the collection tombstone is never dropped
if fully covered by a higher-level one.
This patch fixes both those problems. After the compaction the
collection tombstone is either unchanged or removed if covered by a
higher-level one.
Fixes#4092.
Message-Id: <20190118174244.15880-1-pdziepak@scylladb.com>
* seastar d59fcef...b924495 (2):
> build: Fix protobuf generation rules
> Merge "Restructure files" from Jesse
Includes fixup patch from Jesse:
"
Update Seastar `#include`s to reflect restructure
All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().
Mechanically converted with https://github.com/avikivity/unsprint.
After the new in-memory representation of cells was introduced there was
a regression in atomic_cell_or_collection::operator<< which stopped
printing the content of the cell. This makes debugging more incovenient
are time-consuming. This patch fixes the problem. Schema is propagated
to the atomic_cell_or_collection printer and the full content of the
cell is printed.
Fixes#3571.
Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>
Currently timeout is opt-in, that is, all methods that even have it
default it to `db::no_timeout`. This means that ensuring timeout is used
where it should be is completely up to the author and the reviewrs of
the code. As humans are notoriously prone to mistakes this has resulted
in a very inconsistent usage of timeout, many clients of
`flat_mutation_reader` passing the timeout only to some members and only
on certain call sites. This is small wonder considering that some core
operations like `operator()()` only recently received a timeout
parameter and others like `peek()` didn't even have one until this
patch. Both of these methods call `fill_buffer()` which potentially
talks to the lower layers and is supposed to propagate the timeout.
All this makes the `flat_mutation_reader`'s timeout effectively useless.
To make order in this chaos make the timeout parameter a mandatory one
on all `flat_mutation_reader` methods that need it. This ensures that
humans now get a reminder from the compiler when they forget to pass the
timeout. Clients can still opt-out from passing a timeout by passing
`db::no_timeout` (the previous default value) but this will be now
explicit and developers should think before typing it.
There were suprisingly few core call sites to fix up. Where a timeout
was available nearby I propagated it to be able to pass it to the
reader, where I couldn't I passed `db::no_timeout`. Authors of the
latter kind of code (view, streaming and repair are some of the notable
examples) should maybe consider propagating down a timeout if needed.
In the test code (the wast majority of the changes) I just used
`db::no_timeout` everywhere.
Tests: unit(release, debug)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>
This ensures that row::external_memory_usage() is invariant to
insertion order of cells.
It should be so, so that accounting of a clustering_row, merged from
multiple MVCC versions by the partition_snapshot_flat_reader on behalf
of a memtable flush, doesn't give a greater result than what is used
by the memtable region. Overaccounting leads to assertion failure in
~flush_memory_accounter.
Fixes#3625 (hopefully).
Message-Id: <1535982513-19922-1-git-send-email-tgrabiec@scylladb.com>
Currently we check that the sum of continuities is exactly the same as
expected on failure. Relax this to require that continuity is not
broader, since in some bad_alloc scenarios, or preemption, we will
have to mark some ranges as discontinuous.
Row cache tracker has numerous implicit dependencies on ohter objects
(e.g. LSA migrators for data held by mutation_cleaner). The fact that
both cache tracker and some of those dependencies are thread local
objects makes it hard to guarantee correct destruction order.
Let's deglobalise cache tracker and put in in the database class.
With the introduction of the new in-memory representation changing
column type has become a more complex operation since it needs to handle
switch from fixed-size to variable-size types. This commit adds an
explicit test for such cases.
As a prepratation for the switch to the new cell representation this
patch changes the type returned by atomic_cell_view::value() to one that
requires explicit linearisation of the cell value. Even though the value
is still implicitly linearised (and only when managed by the LSA) the
new interface is the same as the target one so that no more changes to
its users will be needed.
This commit makes database, sstables and tests aware
of which large_partition_handler they use.
Proper large_partition_handler is retrievable from config information
and is based on existing compaction_large_partition_warning_threshold_mb
entry. Right now CQL TABLE variant of large_partition_handler is used
in the database.
Tests use a NOP version of large_partition_handler, which does not
depend on CQL queries at all.
This change is a preparation for introducing row-level eviction, such that entries
can be evicted from older versions without having to touch other versions.
Currently continuity flags on entries are interpreted relative to the
combined view merged from all entries. For example:
v2: <key=2, cont=1>
v1: <key=1, cont=1>
In v2, the flag on entry key=2 marks the range (1, 2) as
continuous. This is problematic because if the old version is evicted, continuity
will change in an incorrect way:
v2: <key=2, cont=1>
Here, the range (-inf, 1) would be marked as continuous, which is not true.
To solve this problem, we change the rules for continuity
interpretation in MVCC. Each version will have its own continuity,
fully specified in that version, independent of continuity of other
versions. Continuity of the snapshot will be a union of continuous
ranges in each version.
It is assumed that continuous intervals in different versions are non-
overlapping, except for points corresponding to complete rows, in
which case a later version may overlap with an older version
(overwrite). We make use of this assumption to make calculation of the
union of intervals on merging easier. I make use of the above
assumption in mutation_partition::apply_monotonically().
MVCC population of incomplete entries already almost maintains the
non-overlapping invariant, because population intervals correspond to
intervals which are incomplete in the old snapshot. The only change
needed is to ensure that both population bounds will have entries in
the latest version. Population from memtables doesn't mark any
intervals as continuous, so also conforms. The only change needed
there is to not inherit continuity flags from the old snapshot,
effectively making the new version internally discontinuous except for
row points.
The example from the beginning will become:
v2: <key=1, cont=0> <key=2, cont=1>
v1: <key=1, cont=1>
When marking a range as continuous with some rows present only in
older versions, we need to insert entries in the latest version, so
that we can mark the range as continuous. The easiest solution is to
copy the entry from the old version. Another option would be to add
support for incomplete rows and insert such instead. This way we would
avoid duplicating row contents. This optimization is deferred.
Introduce class result_options to carry result options through the
request pipeline, which at this point mean the result type and the
digest algorithm. This class allows us to encapsulate the concrete
digest algorithm to use.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"Changes merging in MVCC to apply newer version to older instead of older to
newer.
Before (v0 = oldest):
(((v3 + v2) + v1) + v0)
After:
(v0 + (v1 + (v2 + v3)))
or:
(((v0 + v1) + v2) + v3)
There are several reasons to do this:
1) When continuity merging will change semantics to support eviction
from older versions, it will be easier to implement apply() if we
can assume that we merge newer to older instead of older to
newer, since newer version may have entries falling into a
continuous interval in older, but not the other way around. If we
didn't revert the order, apply() would have to keep track of
lower bound of a continuous interval in the right-hand side
argument (older version) as it is applied and update continuity
flags in the left hand side by scanning all entries overlapping
with it. If order is reversed, merging only needs to deal with
the current entry. Also, if we were to keep the old order, we
cannot simply move entries from the left hand side as we merge
because we need to keep track of the lower bound of a continuous
interval, and we need to provide monotonic exception
guarantees. So merging would be both more complicated and slower.
2) With large partitions older versions are typically larger than
newer versions, and since merging is O(N_right*(1 + log(N_left))),
it's better to merge newer into older.
This fixes latency spikes seen in perf_cache_eviction.
Fixes #2715."
* tag 'tgrabiec/reverse-order-of-mvcc-version-merging-v1' of github.com:scylladb/seastar-dev:
mvcc: Reverse order of version merging
anchorless_list: Introduce last()
mvcc: Implement partition_entry::upgrade() using squashed()
mvcc: Extract version merging functions
mutation_partition: Add rows_entry::set_dummy()
position_in_partition: Introduce after_key()
Change merging to apply newer version to older instead of older to
newer.
Before:
(((v3 + v2) + v1) + v0)
After:
(v0 + (v1 + (v2 + v3)))
or equivalent:
(((v0 + v1) + v2) + v3)
There are several reasons to do this:
1) When continuity merging will change semantics to support eviction
from older versions, it will be easier to implement apply() if we
can assume that we merge newer to older instead of older to
newer, since newer version may have entries falling into a
continuous interval in older, but not the other way around. If we
didn't revert the order, apply() would have to keep track of
lower bound of a continuous interval in the right-hand side
argument (older version) as it is applied and update continuity
flags in the left hand side by scanning all entries overlapping
with it. If order is reversed, merging only needs to deal with
the current entry. Also, if we were to keep the old order, we
cannot simply move entries from the left hand side as we merge
because we need to keep track of the lower bound of a continuous
interval, and we need to provide monotonic exception
guarantees. So merging would be both more complicated and slower.
2) With large partitions older versions are typically larger than
newer versions, and since merging is O(N_right*(1 + log(N_left))),
it's better to merge newer into older.
Fixes#2715.
"This simplifies implementation of mutation_partition merging by relaxing
exception guarantees it needs to provide. This allows reverters to be dropped.
Direct motivation for this is to make it easier to implement new semantics
for merging of clustering range continuity.
Implementation details:
We only need strong exception guarantees when applying to the memtable, which is
using MVCC. Instead of calling apply() with strong exception guarantees on the latest
version, we will move the incoming mutation to a new partition_version and then
use monotonic apply() to merge them. If that merging fails, we attach the version with
the remainder, which cannot fail. This way apply() always succeeds if the allocation
of partition_version object succeeds.
Results of `perf_simple_query_g -c1 -m1G --write` (high overwrite rate):
Before:
101011.13 tps
102498.07 tps
103174.68 tps
102879.55 tps
103524.48 tps
102794.56 tps
103565.11 tps
103018.51 tps
103494.37 tps
102375.81 tps
103361.65 tps
After:
101785.37 tps
101366.19 tps
103532.26 tps
100834.83 tps
100552.11 tps
100891.31 tps
101752.06 tps
101532.00 tps
100612.06 tps
102750.62 tps
100889.16 tps
Fixes #2012."
* tag 'tgrabiec/drop-reversible-apply-v1' of github.com:scylladb/seastar-dev:
mutation_partition: Drop apply_reversibly()
mutation_partition: Relax exception guarantees of apply()
mutation_partition: Introduce apply_weak()
tests: mvcc: Add test for atomicity of partition_entry::apply()
tests: Move failure_injecting_allocation_strategy to a header
tests: mutation_partition: Test exception guarantees of apply_monotonically()
mvcc: Use apply_monotonically() where sufficient
mvcc: partition_version: Use apply_monotonically() to provide atomicity
mvcc: Extract partition_entry::add_version()
mutation_partition: Introduce apply_monotonically()
mutation_partition: Introduce row::consume_with()
The uses which needed strong or weak exception guarantees were
switched to a solution involving apply_monotonically(). All remaining
uses don't need any exception guarantees.