Commit Graph

25803 Commits

Author SHA1 Message Date
Michał Chojnowski
6e7e795dfd cql3: expr: expression: make the argument of to_range a forwarding reference
Make to_range able to handle rvalues. We will pass managed_bytes&& to it
in the next patch to avoid pointless copying.
The public declaration of to_range is changed to a concrete function to avoid
having to explicitly instantiate to_range for all possible reference types of
clustering_key_prefix.
2021-04-01 10:44:21 +02:00
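
The effect of taking the argument as a forwarding reference can be illustrated with a minimal sketch. The names `payload`, `single_range` and `to_single_range` are hypothetical stand-ins, not the actual `to_range` signature; the copy counter exists only to make the difference observable.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a value that is expensive to copy.
struct payload {
    std::string data;
    static inline int copies = 0;   // counts copy constructions, for illustration only
    explicit payload(std::string d) : data(std::move(d)) {}
    payload(const payload& o) : data(o.data) { ++copies; }
    payload(payload&& o) noexcept : data(std::move(o.data)) {}
};

// Hypothetical one-element range, loosely modelled on the idea behind to_range.
struct single_range {
    payload bound;
};

// T&& on a deduced template parameter is a forwarding reference:
// lvalue arguments are copied into the range, rvalue arguments are moved.
template <typename T>
single_range to_single_range(T&& v) {
    return single_range{payload(std::forward<T>(v))};
}
```

Calling `to_single_range(p)` with an lvalue increments the copy counter once, while `to_single_range(std::move(p))` leaves it unchanged.
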
Michał Chojnowski
0bb959e890 cql3: don't linearize elements of lists, tuples, and user types
This patch switches the type used to store collection elements inside the
intermediate form used in lists::value, tuples::value etc. from bytes
to managed_bytes. After this patch, tuple and list elements are only linearized
in from_serialized, which will be corrected soon.
This commit introduces some additional copies in expression.cc, which
will be dealt with in a future commit.
2021-04-01 10:44:21 +02:00
Michał Chojnowski
fa2749c2a0 cql3: values: add const managed_bytes& constructor to raw_value_view
Will be used in the next patch. Separated for clarity.
2021-04-01 10:44:21 +02:00
Michał Chojnowski
8927aaf225 cql3: output managed_bytes instead of bytes in get_with_protocol_version
2021-04-01 10:44:21 +02:00
Michał Chojnowski
aab9509775 types: collection: add versions of pack for fragmented buffers
We will need them to port the representation of collection types
in cql3/ from bytes to managed_bytes.
The version which takes an iterator of `bytes` as an argument will
be removed after that transition is complete.
2021-04-01 10:44:21 +02:00
Michał Chojnowski
e9c05582a4 types: add write_collection_{value,size} for managed_bytes_mutable_view
We will use them to avoid linearization when going from the intermediate
std::vector<bytes> form in cql3/ to the atomic_cell format, by outputting
managed_bytes instead of bytes in get_with_protocol_version.
2021-04-01 10:44:21 +02:00
Michał Chojnowski
3387d43a34 cql3: tuples, user_types: avoid linearization in from_serialized() and get()
Deserialize from raw_value_view without linearizing and output managed_bytes
instead of bytes.
2021-04-01 10:44:20 +02:00
Michał Chojnowski
a10a82da30 types: tuple: add build_value_fragmented
A version of build_value which produces fragmented output.
We will use it to avoid linearization in tuples::value and user_types::value.
2021-04-01 10:42:07 +02:00
Michał Chojnowski
9777026e71 cql3: update_parameters: add make_cell version for managed_bytes_view
We will use it to port the representation of collections in cql3/
from bytes to managed_bytes.
The duplicate version for bytes_view will be removed after that transition
is complete.
2021-04-01 10:42:07 +02:00
Michał Chojnowski
c2c6b2abfa cql3: remove operation::make_*cell
The operation::make_*cell functions are useless aliases to methods of
update_parameters, and are used interchangeably with them throughout the code.
Remove them.

Also, remove the now-unused update_parameters::make_cell version for
fragmented_temporary_buffer::view.
2021-04-01 10:42:07 +02:00
Michał Chojnowski
463ec1b082 cql3: values: make raw_value fragmented
As a part of the effort of removing big, contiguous buffers from the codebase,
cql3::raw_value should be made fragmented. Unfortunately the change involves
some nontrivial work, because raw_value must be viewable with raw_value_view,
and raw_value_view must accommodate both raw_value (that's where we store
values in prepared queries) and fragmented_temporary_buffer::view
(because that's the type of values coming from the wire).

This patch makes raw_value fragmented, by changing the backing type from
bytes to managed_bytes. raw_value_view is modified accordingly by changing
the backing type from fragmented_temporary_buffer::view to a variant of
fragmented_temporary_buffer::view and managed_bytes_view.

We have prepared the users of raw_value{_view} for this change in preceding
commits.
2021-04-01 10:42:07 +02:00
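
A view backed by a variant of two fragmented view types, accessed only through a visitor, might look roughly like the sketch below. `wire_view`, `stored_view`, `value_view` and `with_fragments` are hypothetical names chosen for illustration; they are not the actual `raw_value_view` interface.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the two view types named in the commit message.
struct wire_view {      // plays the role of fragmented_temporary_buffer::view
    std::vector<std::string_view> fragments;
};
struct stored_view {    // plays the role of managed_bytes_view
    std::vector<std::string_view> fragments;
};

class value_view {
    std::variant<wire_view, stored_view> _data;
public:
    explicit value_view(wire_view v) : _data(std::move(v)) {}
    explicit value_view(stored_view v) : _data(std::move(v)) {}

    // Callers reach the fragments through a visitor and never name
    // the alternative that happens to back the view.
    template <typename Func>
    decltype(auto) with_fragments(Func&& f) const {
        return std::visit([&] (const auto& v) -> decltype(auto) {
            return f(v.fragments);
        }, _data);
    }

    // Total number of bytes across all fragments.
    std::size_t size_bytes() const {
        return with_fragments([] (const std::vector<std::string_view>& frags) {
            std::size_t n = 0;
            for (std::string_view frag : frags) {
                n += frag.size();
            }
            return n;
        });
    }
};
```

Because users only go through `with_fragments`, the set of alternatives in the variant can change without touching them.
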
Michał Chojnowski
5984d6b2ce cql3: values: remove raw_value_view::operator==
It's only used in a single test, and there is no reason why it should ever
be used anywhere else. So let's remove it from the public header and move
it to that test.
2021-04-01 10:42:07 +02:00
Michał Chojnowski
b9322a6b71 cql3: switch users of cql3::raw_value_view to internals-independent API
We want to change the internals of cql3::raw_value{_view}.
However, users of cql3::raw_value and cql3::raw_value_view often
use them by extracting the internal representation, which will be different
after the planned change.

This commit prepares us for the change by making all accesses to the value
inside cql3::raw_value{_view} be done through helper methods which don't expose
the internal representation publicly.

After this commit we are free to change the internal representation of
raw_value{_view} without messing up their users.
2021-04-01 10:42:04 +02:00
Michał Chojnowski
b3167ac0a6 cql3: values: add an internals-independent API to raw_value_view
Currently, raw_value_view is backed by a fragmented_temporary_buffer::view,
and many users of this type use it by extracting that internal representation.
However, we want to change raw_value_view so that it can be created both
from fragmented_temporary_buffer and from managed_bytes, so that we can switch
the internals of raw_value from bytes to managed_bytes. To do that we need
to prepare all users for that more general representation.

This commit adds an API which allows using raw_value_view without accessing its
internal representation. In the next commits of this series we will switch all
callers who currently depend on that representation to the new API,
and then we will remove the old accessors and change the internals.
2021-04-01 10:39:42 +02:00
Michał Chojnowski
45e0ef26d3 utils: managed_bytes: add a managed_bytes constructor from FragmentedView
Just for convenience. We will use it in an upcoming patch where we switch
the inner representation of cql3::raw_value from bytes to managed_bytes, and we
will want to construct managed_bytes from fragmented_temporary_buffer::view.
2021-04-01 10:39:42 +02:00
Michał Chojnowski
4715268e30 utils: managed_bytes: add operator<< and to_hex for managed_bytes
We will need them to replace bytes with managed_bytes in some places in an
upcoming patch.

The change to configure.py is necessary because operator<< links to to_hex
in bytes.cc.
2021-04-01 10:39:42 +02:00
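
Hex-encoding a fragmented buffer without first joining the fragments can be sketched as follows. The `fragment` alias and the function name `to_hex_fragmented` are assumptions made for this example and do not reflect the real `to_hex` overloads.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical representation of one fragment of a larger buffer.
using fragment = std::vector<std::uint8_t>;

// Appends two lowercase hex digits per byte, walking the fragments in order.
std::string to_hex_fragmented(const std::vector<fragment>& fragments) {
    static constexpr char digits[] = "0123456789abcdef";
    std::string out;
    for (const fragment& frag : fragments) {
        for (std::uint8_t b : frag) {
            out.push_back(digits[b >> 4]);
            out.push_back(digits[b & 0x0f]);
        }
    }
    return out;
}
```
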
Michał Chojnowski
14c4639994 utils: fragment_range: add to_hex
2021-04-01 10:39:42 +02:00
Michał Chojnowski
b6740a01ac configure: remove unused link dependencies from UUID_test
2021-04-01 10:39:42 +02:00
Avi Kivity
bbec43f9a1 Update tools/java submodule
* tools/java ccc4201ded...fb21784b91 (2):
  > fix: Add dummy implementation of getToppartitions
  > nodetool: Make toppartitions call the generic endpoint

Fixes #4520.
2021-03-31 17:38:03 +03:00
Pavel Emelyanov
887a1b0d3d tracing: Stop tracing in main's deferred action
Tracing is created in two steps and is destroyed in two too.
The 2nd step doesn't have the corresponding stop part, so here
it is -- defer tracing stop after it was started.

Keep in mind that tracing is also shut down on drain, so the
stop must handle that case.

Fixes #8382
tests: unit(dev), manual(start-stop, aborted-start)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210331092221.1602-1-xemul@scylladb.com>
2021-03-31 12:28:37 +03:00
Piotr Jastrzebski
57c7964d6c config: ignore enable_sstables_mc_format flag
Don't allow users to disable MC sstables format any more.
We would like to retire some old cluster features that have been around
for years, namely MC_SSTABLE and UNBOUNDED_RANGE_TOMBSTONES. To do this,
we first have to make sure that all existing clusters have them enabled.
It is impossible to know that unless we stop supporting
enable_sstables_mc_format flag.

Test: unit(dev)

Refs #8352

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>

Closes #8360
2021-03-31 12:23:59 +03:00
Avi Kivity
f9244734f9 Update seastar submodule
* seastar 48376c76a...72e3baed9 (3):
  > file: Add RFW_NOWAIT detection case for AuFS
  > sharded: provide type info on no sharded instance exception
  > iotune: Estimate accuarcy of measurement

Added missing include "database.hh" to api/lsa.cc since seastar::sharded<>
now needs full type information.
2021-03-31 10:40:04 +03:00
Avi Kivity
de10a74a84 Merge 'types: remove linearization from abstract_type::compare' from Wojciech Mitros
This patch is another series on removing big allocations from scylla.
The buffers in `compare_visitor` were replaced with `managed_bytes_view`. A similar change was also needed in tuple_deserializing_iterator and listlike_partial_deserializing_iterator, and was applied as well.

Tests:unit(dev)

Closes #8357

* github.com:scylladb/scylla:
  types: remove linearization from abstract_type::compare
  types: replace buffers in tuple_deserializing_iterator with fragmented ones
  types: make tuple_type_impl::split work with any FragmentedViews
  types: move read_collection_size/value specialization to header file
2021-03-31 08:50:52 +03:00
Wojciech Mitros
f57fa935a2 types: remove linearization from abstract_type::compare
To avoid high latencies caused by large contiguous allocations
needed by linearizing, work on fragmented buffers instead.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2021-03-31 06:35:10 +02:00
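
Comparing two fragmented buffers without linearizing them can be sketched as below. This is a generic byte-wise lexicographic compare under the assumption that a buffer is a list of `std::string_view` fragments; it is not the `abstract_type::compare` code, which is type-aware.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical fragmented buffer: the logical bytes are the concatenation of all fragments.
using fragments = std::vector<std::string_view>;

// Lexicographic three-way compare that never copies the data into contiguous
// storage. Returns a negative value, zero or a positive value, like memcmp.
int compare_fragmented(const fragments& a, const fragments& b) {
    std::size_t ai = 0, bi = 0;   // index of the current fragment
    std::size_t ao = 0, bo = 0;   // offset inside the current fragment
    while (true) {
        // Skip fragments that are exhausted or empty.
        while (ai < a.size() && ao == a[ai].size()) { ++ai; ao = 0; }
        while (bi < b.size() && bo == b[bi].size()) { ++bi; bo = 0; }
        const bool a_done = ai == a.size();
        const bool b_done = bi == b.size();
        if (a_done || b_done) {
            // The buffer that ends first sorts first; equal if both end together.
            return a_done ? (b_done ? 0 : -1) : 1;
        }
        // Compare the overlapping part of the two current fragments.
        const std::size_t n = std::min(a[ai].size() - ao, b[bi].size() - bo);
        const int r = a[ai].substr(ao, n).compare(b[bi].substr(bo, n));
        if (r != 0) {
            return r;
        }
        ao += n;
        bo += n;
    }
}
```

The result depends only on the logical byte sequence, so the same data split at different fragment boundaries compares equal.
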
Wojciech Mitros
daa31be37f types: replace buffers in tuple_deserializing_iterator with fragmented ones
In preparation for removing linearization from abstract_type::compare,
add options to avoid linearization in tuple_deserializing_iterator.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2021-03-31 06:35:09 +02:00
Wojciech Mitros
823d4c7529 types: make tuple_type_impl::split work with any FragmentedViews
We may want to store a tuple in a fragmented buffer. To split it
into a vector of optional bytes, tuple_type_impl::split can be used.
To split a contiguous buffer(bytes_view), simply pass
single_fragmented_view(bytes_view).

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2021-03-31 06:34:37 +02:00
Piotr Sarna
6a2377a233 Merge 'Fast slow query trace doc' from Ivan
Addressed https://github.com/scylladb/scylla/pull/8314#issuecomment-803671234
(write issue: "Tracing: slow query fast mode documentation request")

Adds documentation for the fast slow-query tracing mode to docs/guide/tracing.md.

The patch will be duplicated to scylla-doc after this one is merged.

cc @nyh
cc @vladzcloudius

Closes #8373

* github.com:scylladb/scylla:
  tracing: api: fast mode doc improvement
  tracing: fast slow query tracing mode docs
2021-03-30 17:57:04 +02:00
Ivan Prisyazhnyy
778d9217f3 tracing: api: fast mode doc improvement
Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>
2021-03-30 16:22:56 +02:00
Ivan Prisyazhnyy
b3b66fb629 tracing: fast slow query tracing mode docs
Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>
2021-03-30 16:22:56 +02:00
Avi Kivity
d2921b5112 Merge 'Clean up > 2-year-old features' from Piotr Sarna
Following the work started in 253a7640e, a new batch of old features is assumed to be always available. They are all still announced via gossip, but the code assumes that the feature is always true, because we only support upgrades from a previous release, and the release window is considerably smaller than 2 years.

Features picked this time via `git blame`, along with the date of their introduction:

* fe4afb1aa3 (Asias He                  2018-09-05 14:52:10 +0800  109) static const sstring ROW_LEVEL_REPAIR = "ROW_LEVEL_REPAIR";
* ff5e541335 (Calle Wilund              2019-02-05 13:06:07 +0000  110) static const sstring TRUNCATION_TABLE = "TRUNCATION_TABLE";
* fefef7b9eb (Tomasz Grabiec            2019-03-05 19:08:07 +0100  111) static const sstring CORRECT_STATIC_COMPACT_IN_MC = "CORRECT_STATIC_COMPACT_IN_MC";

Tests: unit(dev)

Closes #8235

* github.com:scylladb/scylla:
  sstables,test: remove variables depending on old features
  gms: make CORRECT_STATIC_COMPACT_IN_MC ft unconditionally true
  sstables: stop relying on CORRECT_STATIC_COMPACT_IN_MC feature
  gms: make TRUNCATION_TABLE feature unconditionally true
  gms: make ROW_LEVEL_REPAIR feature unconditionally true
  repair: stop relying on ROW_LEVEL_REPAIR feature
2021-03-30 16:13:35 +03:00
Calle Wilund
c0666ea89b commitlog: Fix inner loop condition in allocation pre-fill
Fixes #8369

This was originally found (and fixed) by @gleb-cloudius, but the patch set with
the fix was reverted at some point, and the fix went away. Now the error remains
even in new, nice coroutine code.

We check the wrong var in the inner loop of the pre-fill path of
allocate_segment_ex, often causing us to generate giant writev:s of more or less
the whole file.  Not intended.

Closes #8370
2021-03-30 12:14:55 +02:00
Avi Kivity
c2866f46b5 test: relax quota for tests on machines with small page size
8a8589038c ("test: increase quota for tests to 6GB") increased
the quota for tests from 2GB to 6GB. I later found that the increased
requirement is related to the page size: Address Sanitizer allocates
at least a page per object, and so if the page size is larger the
memory requirement is also larger.

Make use of this by only increasing the quota if the page size
is greater than 4096 (I've only seen 4096 and 65536 in the wild).
This allows greater parallelism when the page size is small.

Closes #8371
2021-03-30 12:13:42 +02:00
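
The rule described above can be written down as a small sketch. The 2 GB and 6 GB quotas and the 4096-byte threshold come from the commit message; the function names and the way parallelism is derived from total memory are assumptions made for illustration.

```cpp
#include <cstdint>

constexpr std::uint64_t GiB = std::uint64_t(1) << 30;

// Per-test memory quota: the larger quota is only needed when pages are
// bigger than 4096 bytes. Hypothetical helper, not the test runner's code.
constexpr std::uint64_t test_memory_quota(long page_size) {
    return page_size > 4096 ? 6 * GiB : 2 * GiB;
}

// Assumption: the number of tests run in parallel is bounded by how many
// quotas fit into the memory available to the test run.
constexpr std::uint64_t max_parallel_tests(std::uint64_t total_memory, long page_size) {
    return total_memory / test_memory_quota(page_size);
}
```

With 24 GiB available, this sketch allows 12 parallel tests on a 4096-byte-page machine and 4 on a 65536-byte-page machine. On POSIX systems the page size can be obtained with `sysconf(_SC_PAGESIZE)`.
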
Avi Kivity
8785dd62cb tests: use kernel page cache
Tests are short-lived and use a small amount of data. They
are also often run repeatedly, and the data is deleted immediately
after the test. This is a good scenario for using the kernel page
cache, as it can cache read-only data from test to test, and avoid
spilling write data to disk if it is deleted quickly.

Acknowledge this by using the new --kernel-page-cache option for
tests.

This is expected to help on large machines, where the disk can be
overloaded. Smaller machines with NVMe disks probably will not see
a difference.

Closes #8347
2021-03-30 12:04:55 +02:00
Piotr Sarna
6de2691bbd sstables,test: remove variables depending on old features
In order to maintain backward compatibility wrt. cluster features,
two boolean variables were kept in sstable writers:
 - correctly_serialize_non_compound_range_tombstones
 - correctly_serialize_static_compact_in_mc

Since these features are assumed to always be present now,
the above variables are no longer needed and can be purged.
2021-03-30 09:37:41 +02:00
Piotr Sarna
e42dee6afb gms: make CORRECT_STATIC_COMPACT_IN_MC ft unconditionally true
The feature is assumed to be true due to being over 2 years old.
It's still advertised in gossip, but it's assumed to always be present.
2021-03-30 09:37:13 +02:00
Piotr Sarna
28c9af6fa5 sstables: stop relying on CORRECT_STATIC_COMPACT_IN_MC feature
The feature bit is going away because it's over 2 years old,
so the code which depended on it becomes unconditional.
2021-03-30 09:37:04 +02:00
Piotr Sarna
08c4350968 gms: make TRUNCATION_TABLE feature unconditionally true
It turns out the feature is not currently used.
The commit which removed the support is 30a700c5b0.
2021-03-30 09:36:45 +02:00
Piotr Sarna
c070178c7e gms: make ROW_LEVEL_REPAIR feature unconditionally true
The feature is assumed to be true due to being over 2 years old.
It's still advertised in gossip, but it's assumed to always be present.
2021-03-30 09:36:11 +02:00
Piotr Sarna
80ebedd242 repair: stop relying on ROW_LEVEL_REPAIR feature
The feature is going away because it's over 2 years old,
so the code which depended on it becomes unconditional.
2021-03-30 09:35:40 +02:00
Avi Kivity
c1badc6317 noexcept_traits: convert enable_if to concepts
A little easier to read.

Closes #8329
2021-03-30 09:30:23 +02:00
Avi Kivity
405c4e7af1 serializer: replace enable_if in deserialized_bytes_proxy with constraint
Simpler to read and understand.

Closes #8303
2021-03-30 09:30:06 +02:00
Avi Kivity
7c953f33d5 utils: disk-error-handler: replace enable_if with concepts
Simpler, cleaner. We also replace the deprecated std::result_of_t
with std::invoke_result_t.

Closes #8305
2021-03-30 09:29:46 +02:00
Nadav Har'El
115324f71a Merge 'Add partial admission control to Thrift frontend' from Piotr Sarna
This pull request adds partial admission control to Thrift frontend. The solution is partial mostly because the Thrift layer, aside from allowing Thrift messages, may also be used as a base protocol for CQL messages. Coupling admission control to this one is a little bit more complicated due to how the layer currently works - a Thrift handler, created once per connection, keeps a local `query_state` instance for the occasion of handling CQL requests. However, `query_state` should be kept per query, not per connection, so adding admission control to this aspect of the frontend is left for later.

Finally, the way service permits are passed from the server, via the handler factory, handler and then to queries is hacky. I haven't figured out how to force Thrift to pass custom context per query, so the way it works now is by relying on the fact that the server does not yield (in Seastar sense) between having read the request and launching the proper handler. Due to that, it's possible to just store the service permit in the server itself, pass the reference (address) to it down to the handler, and then read it back from the handling code and claim ownership of it. It works, but if anyone has a better idea, please share.

Refs #4826

Closes #8313

* github.com:scylladb/scylla:
  thrift: add support for max_concurrent_requests_per_shard
  thrift: add metrics for admission control
  thrift: add a counter for in-flight requests
  thrift: add a counter for blocked requests
  thrift: partially add admission control
  service_permit: add a getter for the number of units held
  thrift: coroutinize processing a request
  memory_limiter: add a missing seastarx include
2021-03-29 21:36:50 +03:00
Raphael S. Carvalho
a390f4eb61 sstables: optimize LCS reshape for repair-based operations
LCS reshape is currently inefficient for repair-based operation, because
the disjoint run of 256 sstables is reshaped into bigger L0 files, which
will be then integrated into the main sstable set.
On reshape completion, LCS has to compact those big L0 files onto higher
levels, until last level is reached, producing bad write amplification.

A much better approach is to instead compact that disjoint run into the
best possible level L, which can be figured out with:
	log (base fan_out) of (total_size / max_sstable_size)
This compaction will be essentially a copy operation. It's important
to do it rather than only mutating the level of sstables because we have
to reshape the input run according to LCS parameters like sstable size.

For repair-based bootstrap/replace, the input disjoint run is now efficiently
reshaped into an ideal level L, so there's no compaction backlog once
reshape completes.

This behavior will manifest in the log as this:
LeveledManifest - Reshaping 256 disjoint sstables in level 0 into level 2

For repair-based decommission/removenode, though, to which reshape isn't
wired yet, level L may temporarily hold 2 disjoint runs, which overlap
one another, but LCS itself will incrementally merge them through either
promotion of L-1 into L, or by detecting overlapping in level L and
merging the overlapping sstables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210329171826.42873-1-raphaelsc@scylladb.com>
2021-03-29 20:22:04 +03:00
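
The level formula from the commit message can be sketched with integer arithmetic. The function name `ideal_level` is hypothetical, and rounding the logarithm up (so that the chosen level can hold the whole run) is an assumption of this sketch; the commit message does not state the rounding direction.

```cpp
#include <cassert>
#include <cstdint>

// Smallest level L such that fan_out^L >= total_size / max_sstable_size,
// i.e. log base fan_out of the size ratio, rounded up (assumption).
unsigned ideal_level(std::uint64_t total_size, std::uint64_t max_sstable_size, unsigned fan_out) {
    assert(max_sstable_size > 0);
    assert(fan_out > 1);
    // Number of max-sized sstables needed to hold the run, rounded up.
    const std::uint64_t sstables = (total_size + max_sstable_size - 1) / max_sstable_size;
    unsigned level = 0;
    std::uint64_t capacity = 1;   // fan_out^level, in units of max-sized sstables
    while (capacity < sstables) {
        capacity *= fan_out;
        ++level;
    }
    return level;
}
```

For example, with a fan-out of 10, a run worth 10 max-sized sstables maps to level 1, and a run worth 11 maps to level 2.
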
Botond Dénes
3c54c990ab test: view_build_test: test_view_update_generator_buffering: fail gracefully
Failures in this test typically happen inside the test consumer object.
These however don't stop the test as the code invoking the consumer
object handles exceptions coming from it. So the test will run to
completion and will fail again when comparing the produced output with
the expected one. This results in distracting failures. The real problem
is not the difference in the output, but the first check that failed,
which is however buried in the noise. To prevent this add an "ok" flag
which is set to false if the consumer fails. In this case the additional
checks are skipped in the end to not generate useless noise.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210326083147.26113-2-bdenes@scylladb.com>
2021-03-29 17:58:28 +03:00
Avi Kivity
a8463cfb37 Merge "reader_permit: signal leaked resources" from Botond
"
When a permit is destroyed we check if it still holds on to any
resources in the destructor. Any resources the permit still holds on to are
leaked resources, as users should have released these. Currently we just
invoke `on_internal_error_noexcept()` to handle this, which -- depending
on the configuration -- will result in an error message or an assert. In
the former case, the resources will be leaked for good. This mini-series
fixes this, by signaling back these resources to the semaphore. This
helps avoid an eventual complete dry-up of all semaphore resources and a
subsequent complete shutdown of reads.

Tests: unit(release, debug)
"

* 'reader-permit-signal-leaked-resources/v1' of https://github.com/denesb/scylla:
  reader_permit: signal leaked resources
  test: test_reader_lifecycle_policy: keep semaphores alive until all ops cease
  sstables: generate_summary(): extend the lifecycle of the reader concurrency semaphore
2021-03-29 17:57:31 +03:00
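
The idea of signaling leaked resources back to the semaphore on destruction can be sketched as follows. `units_semaphore` and `tracked_permit` are hypothetical, single-threaded stand-ins; they are not the real `reader_permit` or `reader_concurrency_semaphore` classes, and the error reporting is reduced to a comment.

```cpp
#include <cstddef>

// Hypothetical stand-in for the semaphore that hands out resource units.
class units_semaphore {
    std::size_t _available;
public:
    explicit units_semaphore(std::size_t units) : _available(units) {}
    bool try_consume(std::size_t n) {
        if (n > _available) {
            return false;
        }
        _available -= n;
        return true;
    }
    void signal(std::size_t n) { _available += n; }
    std::size_t available() const { return _available; }
};

// Hypothetical permit that tracks how many units it currently holds.
class tracked_permit {
    units_semaphore& _sem;
    std::size_t _held = 0;
public:
    explicit tracked_permit(units_semaphore& sem) : _sem(sem) {}
    tracked_permit(const tracked_permit&) = delete;
    tracked_permit& operator=(const tracked_permit&) = delete;

    bool consume(std::size_t n) {
        if (!_sem.try_consume(n)) {
            return false;
        }
        _held += n;
        return true;
    }
    void release(std::size_t n) {
        if (n > _held) {
            n = _held;   // never give back more than was taken
        }
        _held -= n;
        _sem.signal(n);
    }
    ~tracked_permit() {
        if (_held != 0) {
            // A real implementation would report the leak here. The sketch only
            // returns the units so that the semaphore does not gradually dry up.
            _sem.signal(_held);
        }
    }
};
```

Even if a user forgets to release some units, the semaphore regains its full capacity once the permit is destroyed.
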
Botond Dénes
9e01c4c667 test: view_build_test: test_view_update_generator_buffering: use separate permit for readers
Said test has two separate logical readers, but they share the same
permit, which is illegal. This didn't cause any problems yet, but soon
the semaphore will start to keep score of active/inactive permits which
will be confused by such sharing, so have them use separate permits.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210326083147.26113-1-bdenes@scylladb.com>
2021-03-29 17:35:51 +03:00
Takuya ASADA
6f678ab7ff aws: initialize self._disks['ebs'] when no EBS disks
It seems that aws_instance.ebs_disks() causes a traceback when no EBS disks
are available; self._disks['ebs'] needs to be initialized with an empty list.

Fixes #8365

Closes #8366
2021-03-29 17:21:14 +03:00
Gleb Natapov
13a3cf62bb raft: move incoming message processing into per state functions
Clean up step() function by moving state specific processing into per
state functions. This way it is easier to see how each state handles
individual messages. No functional changes here.

Message-Id: <YGHCiTWjq+L/jVCB@scylladb.com>
2021-03-29 15:48:43 +02:00
Tomasz Grabiec
43fd322856 Merge 'scylla-gdb.py: Add io-queues command' from Piotr Sarna
The command can be used to inspect IO queues of a local reactor.
Example output:
```
    (gdb) scylla io-queues
        Dev 0:
            Class:                  |shares:         |ptr:
            --------------------------------------------------------------------------------
            "default"               |1               |(seastar::priority_class_data *)0x6000002c6500
            "commitlog"             |1000            |(seastar::priority_class_data *)0x6000003ad940
            "memtable_flush"        |1000            |(seastar::priority_class_data *)0x6000005cb300
            "streaming"             |200             |(seastar::priority_class_data *)0x0
            "query"                 |1000            |(seastar::priority_class_data *)0x600000718580
            "compaction"            |1000            |(seastar::priority_class_data *)0x6000030ef0c0

            Max request size:    2147483647
            Max capacity:        Ticket(weight: 4194303, size: 4194303)
            Capacity tail:       Ticket(weight: 73168384, size: 100561888)
            Capacity head:       Ticket(weight: 77360511, size: 104242143)

            Resources executing: Ticket(weight: 2176, size: 514048)
            Resources queued:    Ticket(weight: 384, size: 98304)
            Handles: (1)
                Class 0x6000005d7278:
                    Ticket(weight: 128, size: 32768)
                    Ticket(weight: 128, size: 32768)
                    Ticket(weight: 128, size: 32768)
            Pending in sink: (0)
```

Created when debugging a core dump. Turned out not to be immediately useful for this use case, but I'm publishing it since it may come in handy in future investigations.

Closes #8362

* github.com:scylladb/scylla:
  scylla-gdb: add io-queues command
  scylla-gdb.py: add parsing std::priority_queue
  scylla-gdb.py: add parsing std::atomic
  scylla-gdb.py: add parsing std::shared_ptr
  scylla-db.py: add parsing intrusive_slist
2021-03-29 15:31:48 +02:00