Seastar is an external library from Scylla's point of view so
we should use the angle bracket #include style. Most of the source
follows this, this patch fixes a few stragglers.
Also fix cases of #include which reached out to seastar's directory
tree directly, via #include "seastar/include/sesatar/..." to
just refer to <seastar/...>.
Closes#10433
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.
With changes
real 29m14.051s
user 168m39.071s
sys 5m13.443s
Without changes
real 30m36.203s
user 175m43.354s
sys 5m26.376s
Closes#10194
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Row-level rpair hashes the mutation fragment and wraps this into a
private fragment_hasher class. For some reason it takes ~20 minutes
for clang to compile the row_level.o with -O3 level (release mode).
Putting the whole fragment_hasher into a dedicated file reduces the
compilation times ~9 times.
However, it seems more natural not to move the fragment_hasher around
but to specialize the appending_hash<> for mutation_fragment and make
row_level.cc code just call feed_hash().
Compilation times (release mode):
before after
row_level.o 19m34s 2m4s
mutation_fragment.o 13s 17s
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The clustering_row is a wrapper over the deletable_row and
facilitates the apply-creation of the latter from some other
objects. Soon it will accept the deletable_row itself for
apply()-ing.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Both methods apply a list of tombstones to the stream. One
was unused even before the set, the other one became unused
after previous patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is a revival of #7490.
Quoting #7490:
The managed_bytes class now uses implicit linearization: outside LSA, data is never fragmented, and within LSA, data is linearized on-demand, as long as the code is running within with_linearized_managed_bytes() scope.
We would like to stop linearizing managed_bytes and keep it fragmented at all times, since linearization can require large contiguous chunks. Large contiguous allocations are hard to satisfy and cause latency spikes.
As a first step towards that, we remove all implicitly linearizing accessors and replace them with an explicit linearization accessor, with_linearized().
Some of the linearization happens long before use, by creating a bytes_view of the managed_bytes object and passing it onwards, perhaps storing it for later use. This does not work with with_linearized(), which creates a temporary linearized view, and does not work towards the longer term goal of never linearizing. As a substitute a managed_bytes_view class is introduced that acts as a view for managed_bytes (for interoperability it can also be a view for bytes and is compatible with bytes_view).
By the end of the series, all linearizations are temporary, within the scope of a with_linearized() call and can be converted to fragmented consumption of the data at leisure.
This has limited practical value directly, as current uses of managed_bytes are limited to keys (which are limited to 64k). However, it enables converting the atomic_cell layer back to managed_bytes (so we can remove IMR) and the CQL layer to managed_bytes/managed_bytes_view, removing contiguous allocations from the coordinator.
Closes#7820
* github.com:scylladb/scylla:
test: add hashers_test
memtable: fix accounting of managed_bytes in partition_snapshot_accounter
test: add managed_bytes_test
utils: fragment_range: add a fragment iterator for FragmentedView
keys: update comments after changes and remove an unused method
mutation_test: use the correct preferred_max_contiguous_allocation in measuring_allocator
row_cache: more indentation fixes
utils: remove unused linearization facilities in `managed_bytes` class
misc: fix indentation
treewide: remove remaining `with_linearized_managed_bytes` uses
memtable, row_cache: remove `with_linearized_managed_bytes` uses
utils: managed_bytes: remove linearizing accessors
keys, compound: switch from bytes_view to managed_bytes_view
sstables: writer: add write_* helpers for managed_bytes_view
compound_compat: transition legacy_compound_view from bytes_view to managed_bytes_view
types: change equal() to accept managed_bytes_view
types: add parallel interfaces for managed_bytes_view
types: add to_managed_bytes(const sstring&)
serializer_impl: handle managed_bytes without linearizing
utils: managed_bytes: add managed_bytes_view::operator[]
utils: managed_bytes: introduce managed_bytes_view
utils: fragment_range: add serialization helpers for FragmentedMutableView
bytes: implement std::hash using appending_hash
utils: mutable_view: add substr()
utils: fragment_range: add compare_unsigned
utils: managed_bytes: make the constructors from bytes and bytes_view explicit
utils: managed_bytes: introduce with_linearized()
utils: managed_bytes: constrain with_linearized_managed_bytes()
utils: managed_bytes: avoid internal uses of managed_bytes::data()
utils: managed_bytes: extract do_linearize_pure()
thrift: do not depend on implicit conversion of keys to bytes_view
clustering_bounds_comparator: do not depend on implicit conversion of keys to bytes_view
cql3: expression: linearize get_value_from_mutation() eariler
bytes: add to_bytes(bytes)
cql3: expression: mark do_get_value() as static
managed_bytes has a small overhead per each fragment. Due to that, managed_bytes
containing the same data can have different total memory usage in different
allocators. The smaller the preferred max allocation size setting is, the more
fragments are needed and the greater total per-fragment overhead is.
In particular, managed_bytes allocated in the LSA could grow in
memory usage when copied to the standard allocator, if the standard allocator
had a preferred max allocation setting smaller than the LSA.
partition_snapshot_accounter calculates the amount of memory used by
mutation fragments in the memtable (where they are allocated with LSA) based
on the memory usage after they are copied to the standard allocator.
This could result in an overestimation, as explained above.
But partition_snapshot_accounter must not overestimate the amount of freed
memory, as doing otherwise might result in OOM situations.
This patch prevents the overaccounting by adding minimal_external_memory_usage():
a new version of external_memory_usage(), which ignores allocator-dependent
overhead. In particular, it includes the per-fragment overhead in managed_bytes
only once, no matter how many fragments there are.
When a rows_entry is added to row_cache it's constructed from
clustering_row by unpacking all its internals and putting
them into the rows_entry's deletable_row. There's a shorter
way -- the clustering_row already has the deletale_row onboard
from which rows_entry can copy-construct its.
This lets keeping the rows_entry and deletable_row set of
constructors a bit shorter.
tests: unit(dev)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20201224161112.20394-1-xemul@scylladb.com>
The only purpose of this change is to compile (git-bisect
safety) and thus prove that the next patch is correct.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The memory usage is now maintained and updated on each change to the
mutation fragment, so it needs not be recalculated on a call to
`memory_usage()`, hence the schema parameter is unused and can be
removed.
The memory usage of mutation fragments is now tracked through its
lifetime through a reader permit. This was the last major (to my current
knowledge) untracked piece of the reader pipeline.
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and pass to
the permit.
In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.
Uses where this method was just used to move the fragment away are
converted to use `as_mutation_start() &&`.
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.
Uses where this method was just used to move the fragment away are
converted to use `as_range_tombstone() &&`.
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.
Uses where this method was just used to move the fragment away are
converted to use `as_clustering_row() &&`.
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.
Uses where this method was just used to move the fragment away are
converted to use `as_static_row() &&`.
The clustering_row is deletable_row + clustering_key, all
its internals work exactly as the relevant deletable_row's
ones.
The similar relation is between static_row and row, and
the former wrapes the latter, so here's the same trick
for the non-static row classes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The deletable_row accepts clustering_row in constructor and
.apply() method. The next patch will make clustering_row
embed the deletable_row inside, so those two methods will
violate layering and should be fixed in advance.
The fix is in providing a clustering_row method to convert
itself into a deletable_row. There are two places that need
this: mutation_fragment_applier and partition_snapshot_row_cursor.
Both methods pass temporary clustering_row value, so the
method in question is also move-converter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Seastar recently lost support for the experimental Concepts Technical
Specification (TS) and gained support for C++20 concepts. Re-enable
concepts in Scylla by updating our use of concepts to the C++20
standard.
This change:
- peels off uses of the GCC6_CONCEPT macro
- removes inclusions of <seastar/gcc6-concepts.hh>
- replaces function-style concepts (no longer supported) with
equation-style concepts
- semicolons added and removed as needed
- deprecated std::is_pod replaced by recommended replacement
- updates return type constraints to use concepts instead of
type names (either std::same_as or std::convertible_to, with
std::same_as chosen when possible)
No attempt is made to improve the concepts; this is a specification
update only.
Message-Id: <20200531110254.2555854-1-avi@scylladb.com>
gcc 10 requires a semicolon after every compound requirement,
as per the standard. Add missing semicolons where necessary.
Message-Id: <20200129205805.20928-1-avi@scylladb.com>
When entering a new ck range (of the partition-slice), the partition
snapshot reader will apply to its range tombstones stream all the
tombstones that are relevant to the new ck range. When the partition has
range tombstones that overlap with multiple ck ranges, these will be
applied to the range tombstone stream when entering any of the ck ranges
they overlap with. This will result in the violation of the monotonicity
of the mutation fragments emitted by the reader, as these range
tombstones will be re-emitted on each ck range, if the ck range has at
least one clustering row they apply to.
For example, given the following partition:
rt{[1,10]}, cr{1}, cr{2}, cr{3}...
And a partition-slice with the following ck ranges:
[1,2], [3, 4]
The reader will emit the following fragment stream:
rt{[1,10]}, cr{1}, cr{2}, rt{[1,10]}, cr{3}, ...
Note how the range tombstone is emitted twice. In addition to violating
the monotonicity guarantee, this can also result in an explosion of the
number of emitted range tombstones.
Fix by trimming range tombstones to the start of the current ck range,
thus ensuring that they will not violate mutation fragment monotonicity
guarantees.
Refs: #4104
This is a much simpler fix for the above issue, than the already
committed one (7049cd937A). The latter is reverted by the previous
patch and this patch applies the simpler fix.
When entering a new ck range (of the partition-slice), the partition
snapshot reader will apply to its range tombstones stream all the
tombstones that are relevant to the new ck range. When the partition has
range tombstones that overlap with multiple ck ranges, these will be
applied to the range tombstone stream when entering any of the ck ranges
they overlap with. This will result in the violation of the monotonicity
of the mutation fragments emitted by the reader, as these range
tombstones will be re-emitted on each ck range, if the ck range has at
least one clustering row they apply to.
For example, given the following partition:
rt{[1,10]}, cr{1}, cr{2}, cr{3}...
And a partition-slice with the following ck ranges:
[1,2], [3, 4]
The reader will emit the following fragment stream:
rt{[1,10]}, cr{1}, cr{2}, rt{[1,10]}, cr{3}, ...
Note how the range tombstone is emitted twice. In addition to violating
the monotonicity guarantee, this can also result in an explosion of the
number of emitted range tombstones.
Fix by applying only those range tombstones to the range tombstone
stream, that have a position strictly greater than that of the last
emitted clustering row (or range tombstone), when entering a new ck
range.
Fixes: #4104
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <e047af76df75972acb3c32c7ef9bb5d65d804c82.1547916701.git.bdenes@scylladb.com>
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.
Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.
Scylla now requires GCC 8 to compile.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
After the new in-memory representation of cells was introduced there was
a regression in atomic_cell_or_collection::operator<< which stopped
printing the content of the cell. This makes debugging more incovenient
are time-consuming. This patch fixes the problem. Schema is propagated
to the atomic_cell_or_collection printer and the full content of the
cell is printed.
Fixes#3571.
Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>
Incorrect column_kind was passed, which may cause wrong type to be
used for comparison if schema contains static columns. Affects only
tests.
Spotted during code review.
Message-Id: <1531144991-2658-1-git-send-email-tgrabiec@scylladb.com>
The parameters to the MutationFragmentConsumer concept must be concrete
types, not decltype(auto).
Reported by gcc 8.
Message-Id: <20180421110738.7574-1-avi@scylladb.com>
Moving clustering_row is expensive due to amount of data stored
internally. Adding a mutation_fragment constructor that builds a
clustering_row in-place saves some of that moving.