Make to_range able to handle rvalues. We will pass managed_bytes&& to it
in the next patch to avoid pointless copying.
The public declaration of to_range is changed to a concrete function to avoid
having to explicitly instantiate to_range for all possible reference types of
clustering_key_prefix.
This patch switches the type used to store collection elements inside the
intermediate form used in lists::value, tuples::value etc. from bytes
to managed_bytes. After this patch, tuple and list elements are only linearized
in from_serialized, which will be corrected soon.
This commit introduces some additional copies in expression.cc, which
will be dealt with in a future commit.
We will need them to port the representation of collection types
in cql3/ from bytes to managed_bytes.
The version which takes an iterator of `bytes` as an argument will
be removed after that transition is complete.
We will use them to avoid linearization when going from the intermediate
std::vector<bytes> form in cql3/ to the atomic_cell format, by outputting
managed_bytes instead of bytes in get_with_protocol_version.
We will use it to port the representation of collections in cql3/
from bytes to managed_bytes.
The duplicate version for bytes_view will be removed after that transition
is complete.
The operation::make_*cell functions are useless aliases to methods of
update_parameters, and are used interchangeably with them throughout the code.
Remove them.
Also, remove the now-unused update_parameters::make_cell version for
fragmented_temporary_buffer::view.
As a part of the effort of removing big, contiguous buffers from the codebase,
cql3::raw_value should be made fragmented. Unfortunately the change involves
some nontrivial work, because raw_value must be viewable with raw_value_view,
and raw_value_view must accomodate both raw_value (that's where we store
values in prepared queries) and fragmented_temporary_buffer::view
(because that's the type of values coming from the wire).
This patch makes raw_value fragmented, by changing the backing type from
bytes to managed_bytes. raw_value_view is modified accordingly by changing
the backing type from fragmented_temporary_buffer::view to a variant of
fragmented_temporary_buffer::view and managed_bytes_view.
We have prepared the users of raw_value{_view} for this change in preceding
commits.
It's only used in a single test, and there is no reason why it should ever
be used anywhere else. So let's remove it from the public header and move
it to that test.
We want to change the internals of cql3::raw_value{_view}.
However, users of cql3::raw_value and cql3::raw_value_view often
use them by extracting the internal representation, which will be different
after the planned change.
This commit prepares us for the change by making all accesses to the value
inside cql3::raw_value(_view) be done through helper methods which don't expose
the internal representation publicly.
After this commit we are free to change the internal representation of
raw_value_{view} without messing up their users.
Currently, raw_value_view is backed by a fragmented_temporary_buffer::view,
and many users of this type use it by extracting that internal representation.
However, we want to change raw_value_view so that it can be created both
from fragmented_temporary_buffer and from managed_bytes, so that we can switch
the internals of raw_value from bytes to managed_bytes. To do that we need
to prepare all users for that more general representation.
This commit adds an API which allow using raw_value_view without accessing its
internal representation. In the next commits of this series we will switch all
callers who currently depend on that representation to the new API,
and then we will remove the old accessors and change the internals.
Just for convenience. We will use it in an upcoming patch where we switch
the inner representation of cql3::raw_value from bytes to managed_bytes, and we
will want to construct managed_bytes from fragmented_temporary_buffer::view.
We will need them to replace bytes with managed_bytes in some places in an
upcoming patch.
The change to configure.py is necessary because opearator<< links to to_hex
in bytes.cc.
Tracing is created in two steps and is destroyed in two too.
The 2nd step doesn't have the corresponding stop part, so here
it is -- defer tracing stop after it was started.
But need to keep in mind, that tracing is also shut down on
drain, so the stopping should handle this.
Fixes#8382
tests: unit(dev), manual(start-stop, aborted-start)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210331092221.1602-1-xemul@scylladb.com>
Don't allow users to disable MC sstables format any more.
We would like to retire some old cluster features that has been around
for years. Namely MC_SSTABLE and UNBOUNDED_RANGE_TOMBSTONES. To do this
we first have to make sure that all existing clusters have them enabled.
It is impossible to know that unless we stop supporting
enable_sstables_mc_format flag.
Test: unit(dev)
Refs #8352
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Closes#8360
* seastar 48376c76a...72e3baed9 (3):
> file: Add RFW_NOWAIT detection case for AuFS
> sharded: provide type info on no sharded instance exception
> iotune: Estimate accuarcy of measurement
Added missing include "database.hh" to api/lsa.cc since seastar::sharded<>
now needs full type information.
This patch is another series on removing big allocations from scylla.
The buffers in `compare_visitor` were replaced with `managed_bytes_view`, similiar change was also needed in tuple_deserializing_iterator and listlike_partial_deserializing_iterator, and was applied as well.
Tests:unit(dev)
Closes#8357
* github.com:scylladb/scylla:
types: remove linearization from abstract_type::compare
types: replace buffers in tuple_deserializing_iterator with fragmented ones
types: make tuple_type_impl::split work with any FragmentedViews
types: move read_collection_size/value specialization to header file
To avoid high latencies caused by large contigous allocations
needed by linearizing, work on fragmented buffers instead.
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
In preparation for removing linearization from abstract_type::compare,
add options to avoid linearization in tuple_deserializing_iterator.
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
We may want to store a tuple in a fragmented buffer. To split it
into a vector of optional bytes, tuple_type_impl::split can be used.
To split a contiguous buffer(bytes_view), simply pass
single_fragmented_view(bytes_view).
Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
Addressed https://github.com/scylladb/scylla/pull/8314#issuecomment-803671234
(write issue: "Tracing: slow query fast mode documentation request")
adds a fast slow queries tracing mode documentation to the docs/guide/tracing.md
patch to the scylla-doc will be dup-ed after this one merged
cc @nyh
cc @vladzcloudius
Closes#8373
* github.com:scylladb/scylla:
tracing: api: fast mode doc improvement
tracing: fast slow query tracing mode docs
Following the work started in 253a7640e, a new batch of old features is assumed to be always available. They are all still announced via gossip, but the code assumes that the feature is always true, because we only support upgrades from a previous release, and the release window is considerably smaller than 2 years.
Features picked this time via `git blame`, along with the date of their introduction:
* fe4afb1aa3 (Asias He 2018-09-05 14:52:10 +0800 109) static const sstring ROW_LEVEL_REPAIR = "ROW_LEVEL_REPAIR";
* ff5e541335 (Calle Wilund 2019-02-05 13:06:07 +0000 110) static const sstring TRUNCATION_TABLE = "TRUNCATION_TABLE";
* fefef7b9eb (Tomasz Grabiec 2019-03-05 19:08:07 +0100 111) static const sstring CORRECT_STATIC_COMPACT_IN_MC = "CORRECT_STATIC_COMPACT_IN_MC";
Tests: unit(dev)
Closes#8235
* github.com:scylladb/scylla:
sstables,test: remove variables depending on old features
gms: make CORRECT_STATIC_COMPACT_IN_MC ft unconditionally true
sstables: stop relying on CORRECT_STATIC_COMPACT_IN_MC feature
gms: make TRUNCATION_TABLE feature unconditionally true
gms: make ROW_LEVEL_REPAIR feature unconditionally true
repair: stop relying on ROW_LEVEL_REPAIR feature
Fixes#8369
This was originally found (and fixed) by @gleb-cloudius, but the patch set with
the fix was reverted at some point, and the fix went away. Now the error remains
even in new, nice coroutine code.
We check the wrong var in the inner loop of the pre-fill path of
allocate_segment_ex, often causing us to generate giant writev:s of more or less
the whole file. Not intended.
Closes#8370
8a8589038c ("test: increase quota for tests to 6GB") increased
the quota for tests from 2GB to 6GB. I later found that the increased
requirement is related to the page size: Address Sanitizer allocates
at least a page per object, and so if the page size is larger the
memory requirement is also larger.
Make use of this by only increasing the quota if the page size
is greater than 4096 (I've only seen 4096 and 65536 in the wild).
This allows greater parallelism when the page size is small.
Closes#8371
Tests are short-lived and use a small amount of data. They
are also often run repeatly, and the data is deleted immediately
after the test. This is a good scenario for using the kernel page
cache, as it can cache read-only data from test to test, and avoid
spilling write data to disk if it is deleted quickly.
Acknowledge this by using the new --kernel-page-cache option for
tests.
This is expected to help on large machines, where the disk can be
overloaded. Smaller machines with NVMe disks probably will not see
a difference.
Closes#8347
In order to maintain backward compatibility wrt. cluster features,
two boolean variables were kept in sstable writers:
- correctly_serialize_non_compound_range_tombstones
- correctly_serialize_static_compact_in_mc
Since these features are assumed to always be present now,
the above variables are no longer needed and can be purged.
This pull request adds partial admission control to Thrift frontend. The solution is partial mostly because the Thrift layer, aside from allowing Thrift messages, may also be used as a base protocol for CQL messages. Coupling admission control to this one is a little bit more complicated due to how the layer currently works - a Thrift handler, created once per connection, keeps a local `query_state` instance for the occasion of handling CQL requests. However, `query_state` should be kept per query, not per connection, so adding admission control to this aspect of the frontend is left for later.
Finally, the way service permits are passed from the server, via the handler factory, handler and then to queries is hacky. I haven't figured out how to force Thrift to pass custom context per query, so the way it works now is by relying on the fact that the server does not yield (in Seastar sense) between having read the request and launching the proper handler. Due to that, it's possible to just store the service permit in the server itself, pass the reference (address) to it down to the handler, and then read it back from the handling code and claim ownership of it. It works, but if anyone has a better idea, please share.
Refs #4826Closes#8313
* github.com:scylladb/scylla:
thrift: add support for max_concurrent_requests_per_shard
thrift: add metrics for admission control
thrift: add a counter for in-flight requests
thrift: add a counter for blocked requests
thrift: partially add admission control
service_permit: add a getter for the number of units held
thrift: coroutinize processing a request
memory_limiter: add a missing seastarx include
LCS reshape is currently inefficient for repair-based operation, because
the disjoint run of 256 sstables is reshaped into bigger L0 files, which
will be then integrated into the main sstable set.
On reshape completion, LCS has to compact those big L0 files onto higher
levels, until last level is reached, producing bad write amplification.
A much better approach is to instead compact that disjoint run into the
best possible level L, which can be figured out with:
log (base fan_out) of (total_size / max_sstable_size)
This compaction will be essentially a copy operation. It's important
to do it rather than only mutating the level of sstables because we have
to reshape the input run according to LCS parameters like sstable size.
For repair-based bootstrap/replace, the input disjoint run is now efficiently
reshaped into an ideal level L, so there's no compaction backlog once
reshape completes.
This behavior will manifest in the log as this:
LeveledManifest - Reshaping 256 disjoint sstables in level 0 into level 2
For repair-based decommission/removenode though, which reshape wasn't
wired on yet, level L may temporarily hold 2 disjoint runs, which overlap
one another, but LCS itself will incrementally merge them through either
promotion of L-1 into L, or by detecting overlapping in level L and
merging the overlapping sstables.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210329171826.42873-1-raphaelsc@scylladb.com>
Failures in this test typically happen inside the test consumer object.
These however don't stop the test as the code invoking the consumer
object handles exceptions coming from it. So the test will run to
completion and will fail again when comparing the produced output with
the expected one. This results in distracting failures. The real problem
is not the difference in the output, but the first check that failed,
which is however buried in the noise. To prevent this add an "ok" flag
which is set to false if the consumer fails. In this case the additional
checks are skipped in the end to not generate useless noise.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210326083147.26113-2-bdenes@scylladb.com>
"
When a permit is destroyed we check if it still holds on to any
resources in the destructor. Any resources the permit still holds on are
leaked resources, as users should have released these. Currently we just
invoke `on_internal_error_noexcept()` to handle this, which -- depending
on the configuration -- will result in an error message or an assert. In
the former case, the resources will be leaked for good. This mini-series
fixes this, by signaling back these resources to the semaphore. This
helps avoid an eventual complete dry-up of all semaphore resources and a
subsequent complete shutdown of reads.
Tests: unit(release, debug)
"
* 'reader-permit-signal-leaked-resources/v1' of https://github.com/denesb/scylla:
reader_permit: signal leaked resources
test: test_reader_lifecycle_policy: keep semaphores alive until all ops cease
sstables: generate_summary(): extend the lifecycle of the reader concurrency semaphore
Said test has two separate logical readers, but they share the same
permit, which is illegal. This didn't cause any problems yet, but soon
the semaphore will start to keep score of active/inactive permits which
will be confused by such sharing, so have them use separate permits.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210326083147.26113-1-bdenes@scylladb.com>
Clean up step() function by moving state specific processing into per
state functions. This way it is easier to see how each state handles
individual messages. No functional changes here.
Message-Id: <YGHCiTWjq+L/jVCB@scylladb.com>
The command can be used to inspect IO queues of a local reactor.
Example output:
```
(gdb) scylla io-queues
Dev 0:
Class: |shares: |ptr:
--------------------------------------------------------------------------------
"default" |1 |(seastar::priority_class_data *)0x6000002c6500
"commitlog" |1000 |(seastar::priority_class_data *)0x6000003ad940
"memtable_flush" |1000 |(seastar::priority_class_data *)0x6000005cb300
"streaming" |200 |(seastar::priority_class_data *)0x0
"query" |1000 |(seastar::priority_class_data *)0x600000718580
"compaction" |1000 |(seastar::priority_class_data *)0x6000030ef0c0
Max request size: 2147483647
Max capacity: Ticket(weight: 4194303, size: 4194303)
Capacity tail: Ticket(weight: 73168384, size: 100561888)
Capacity head: Ticket(weight: 77360511, size: 104242143)
Resources executing: Ticket(weight: 2176, size: 514048)
Resources queued: Ticket(weight: 384, size: 98304)
Handles: (1)
Class 0x6000005d7278:
Ticket(weight: 128, size: 32768)
Ticket(weight: 128, size: 32768)
Ticket(weight: 128, size: 32768)
Pending in sink: (0)
```
Created when debugging a core dump. Turned out not to be immediately useful for this use case, but I'm publishing it since it may come in handy in future investigations.
Closes#8362
* github.com:scylladb/scylla:
scylla-gdb: add io-queues command
scylla-gdb.py: add parsing std::priority_queue
scylla-gdb.py: add parsing std::atomic
scylla-gdb.py: add parsing std::shared_ptr
scylla-db.py: add parsing intrusive_slist