Don't require filtering when a continuous slice of the clustering key
is requested, even if partition is unrestricted. The read command we
generate will fetch just the selected data; filtering is unnecessary.
Some tests needed to update the expected results now that we're not
fetching the extra data needed for filtering. (Because tests don't do
the final trim to match selectors and assert instead on all the data
read.)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
"
The reader concurrency semaphore timing out or its queue being overflown
are fairly common events both in production and in testing. At the same
time it is a hard to diagnose problem that often has a benign cause
(especially during testing), but it is equally possible that it points
to something serious. So when this error starts to appear in logs,
usually we want to investigate and the investigation is lengthy...
either involves looking at metrics or coredumps or both.
This patch intends to jumpstart this process by dumping a diagnostics on
semaphore timeout or queue overflow. The diagnostics is printed to the
log with debug level to avoid excessive spamming. It contains a
histogram of all the permits associated with the problematic semaphore
organized by table, operation and state.
Example:
DEBUG 2020-10-08 17:05:26,115 [shard 0] reader_concurrency_semaphore -
Semaphore _read_concurrency_sem: timed out, dumping permit diagnostics:
Permits with state admitted, sorted by memory
memory count name
3499M 27 ks.test:data-query
3499M 27 total
Permits with state waiting, sorted by count
count memory name
1 0B ks.test:drain
7650 0B ks.test:data-query
7651 0B total
Permits with state registered, sorted by count
count memory name
0 0B total
Total: permits: 7678, memory: 3499M
This allows determining several things at glance:
* What are the tables involved
* What are the operations involved
* Where is the memory
This can speed up a follow-up investigation greatly, or it can even be
enough on its own to determine that the issue is benign.
Tests: unit(dev, debug)
"
* 'dump-diagnostics-on-semaphore-timeout/v2' of https://github.com/denesb/scylla:
reader_concurrency_semaphore: dump permit diagnostics on timeout or queue overflow
utils: add to_hr_size()
reader_concurrency_semaphore: link permits into an intrusive list
reader_concurrency_semaphore: move expiry_handler::operator()() out-of-line
reader_concurrency_semaphore: move constructors out-of-line
reader_concurrency_semaphore: add state to permits
reader_concurrency_semaphore: name permits
querier_cache_test: test_immediate_evict_on_insert: use two permits
multishard_combining_reader: reader_lifecycle_policy: add permit param to create_reader()
multishard_combining_reader: add permit parameter
multishard_combining_reader: shard_reader: use multishard reader's permit
The reader concurrency semaphore timing out or its queue being overflown
are fairly common events both in production and in testing. At the same
time it is a hard to diagnose problem that often has a benign cause
(especially during testing), but it is equally possible that it points
to something serious. So when this error starts to appear in logs,
usually we want to investigate and the investigation is lengthy...
either involves looking at metrics or coredumps or both.
This patch intends to jumpstart this process by dumping a diagnostics on
semaphore timeout or queue overflow. The diagnostics is printed to the
log with debug level to avoid excessive spamming. It contains a
histogram of all the permits associated with the problematic semaphore
organized by table, operation and state.
Example:
DEBUG 2020-10-08 17:05:26,115 [shard 0] reader_concurrency_semaphore -
Semaphore _read_concurrency_sem: timed out, dumping permit diagnostics:
Permits with state admitted, sorted by memory
memory count name
3499M 27 ks.test:data-query
3499M 27 total
Permits with state waiting, sorted by count
count memory name
1 0B ks.test:drain
7650 0B ks.test:data-query
7651 0B total
Permits with state registered, sorted by count
count memory name
0 0B total
Total: permits: 7678, memory: 3499M
This allows determining several things at glance:
* What are the tables involved
* What are the operations involved
* Where is the memory
This can speed up a follow-up investigation greatly, or it can even be
enough on its own to determine that the issue is benign.
This utility function converts a potentially large number to a compact
representation, composed of at most 4 digits and a letter appropriate to
the power of two the number has to multiplied with to arrive to the
original number (with some loss of precision).
The different powers of two are the conventional 2 ** (N * 10) variants:
* N=0: (B)ytes
* N=1: (K)bytes
* N=2: (M)bytes
* N=3: (G)bytes
* N=4: (T)bytes
Examples:
* 87665 will be converted to 87K
* 1024 will be converted to 1K
Instead of a simple boolean, designating whether the permit was already
admitted or not, add a proper state field with a value for all the
different states the permit can be in. Currently there are three such
states:
* registered - the permit was created and started accounting resource
consumption.
* waiting - the permit was queued to wait for admission.
* admitted - the permit was successfully admitted.
The state will be used for debugging purposes, both during coredump
debugging as well as for dumping diagnostics data about permits.
Require a schema and an operation name to be given to each permit when
created. The schema is of the table the read is executed against, and
the operation name, which is some name identifying the operation the
permit is part of. Ideally this should be different for each site the
permit is created at, to be able to discern not only different kind of
reads, but different code paths the read took.
As not all read can be associated with one schema, the schema is allowed
to be null.
The name will be used for debugging purposes, both for coredump
debugging and runtime logging of permit-related diagnostics.
scylla-python3 causes segfault when non-default locale specified.
As workaround for this, we need to set LC_ALL=en_US.UTF_8 on python3 thunk.
Fixes#7408Closes#7414
Make the warning message clearer:
* Include the number of partitions affected by the batch.
* Be clear that the warning is about the batch size in bytes.
Fixes#7367
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Closes#7417
Without this, clang complains that we violate argument dependent
lookup rules:
note: 'operator<<' should be declared prior to the call site or in namespace 'sstables'
std::ostream& operator<<(std::ostream&, const sstables::component_type&);
we can't enforce the #include order, but we can easily move it it
to namespace sstables (where it belongs anyway), so let's do that.
gcc is happy either way.
Closes#7413
bound_kind_m's formatter violates argument dependent lookup rules
according to clang, so fix that. Along the way improve the formatter a
little.
Closes#7412
* git://github.com/avikivity/scylla.git avikivity-bound_kind_m-formatter:
sstables: move bound_kind_m formatter to namespace sstables
sstables: move bound_kind_m formatter to its natural place
sstables: deinline bound_kind_m formatter
Without this, clang complains that we violate argument dependent
lookup rules:
note: 'operator<<' should be declared prior to the call site or in namespace 'sstables'
std::ostream& operator<<(std::ostream&, const sstables::bound_kind_m&);
we can't enforce the #include order, but we can easily move it it
to namespace sstables (where it belongs anyway), so let's do that.
gcc is happy either way.
Move bound_kind_m's formatter to the same header file where
is is defined. This prevents cases where the compiler decays
the type (an enum) to the underlying integral type because it
does not see the formatter declaration, resulting in the wrong
output.
When there are hint files to be sent and the target endpoint is DOWN,
end_point_hints_manager works in the following loop:
- It reads the first hint file in the queue,
- For each hint in the file it decides that it won't be sent because the
target endpoint is DOWN,
- After realizing that there are some unsent hints, it decides to retry
this operation after sleeping 1 second.
This causes the first segment to be wholly read over and over again,
with 1 second pauses, until the target endpoint becomes UP or leaves the
cluster. This causes unnecessary I/O load in the streaming scheduling
group.
This patch adds a check which prevents end_point_hints_manager from
reading the first hint file at all when it is not allowed to send hints.
First observed in #6964
Tests:
- unit(dev)
- hinted handoff dtests
Closes#7407
The test currently uses a single permit shared between two simulated
reads (to wait admission twice). This is not a supported way of using a
permit and will stop working soon as we make the states the permit is in
more pronounced.
Allow the evictable reader managing the underlying reader to pass its
own permit to it when creating it, making sure they share the same
permit. Note that the two parts can still end up using different
permits, when the underlying reader is kept alive between two pages of a
paged read and thus keeps using the permit received on the previous
page.
Also adjust the `reader_context` in multishard_mutation_query.cc to use
the passed-in permit instead of creating a new one when creating a new
reader.
Don't create an own permit, take one as a parameter, like all other
readers do, so the permit can be provided by the higher layer, making
sure all parts of the logical read use the same permit.
Don't create a new permit per shard reader, pass down the multishard
reader's one to be used by each shard reader. They all belong to the
same read, they should use the same permit. Note that despite its name
the shard readers are the local representation of a reader living on a
remote shard and as such they live on the same shard the multishard
combining reader lives on.
Clang parses templates more eagerly than gcc, so it fails on
some forward-declared templates. In this case, value_writer
was forward-declared and then used in data::cell. As it also
uses some definitions local to data::cell, it cannot be
defined before it as well as after it.
To solve the problem, we define it as a nested class so
it can use other local definitions, yet be defined before it
is used. No code changes.
Closes#7401
A few source files (like those generated by antlr) don't build with seastar,
and so don't inherit all of its flags. They then use the compiler default dialect,
not C++20. With gcc that's just fine, since gcc supports concepts in earlier dialects,
but clang requires C++20.
Fix by forcing --std=gnu++20 for all files (same as what Seastar chooses).
Closes#7392
The remains of the defunct #7246.
Fixes#7344Fixes#7345Fixes#7346Fixes#7347
Shard ID length is now within limits.
Shard end sequence number should be set when appropriate.
Shard parent is selected a bit more carefully (sorting)
Shards are filtered by time to exclude cdc generations we cannot get data from (too old)
Shard paging improved
Closes#7348
* github.com:scylladb/scylla:
test_streams: Add some more sanity asserts
alternator::streams: Set dynamodb data TTL explicitly in cdc options
alternator::streams: Improve paging and fix parent-child calculation
alternator::streams: Remove table from shard_id
alternator::streams: Filter our cdc streams older than data/table
alternator::error: Add a few dynamo exception types
"
max_concurrent_for_each was added to seastar for replacing
sstable_directory::parallel_for_each_restricted by using
more efficient concurrency control that doesn't create
unlimited number of continuations.
The series replaces the use of sstable_directory::parallel_for_each_restricted
with max_concurrent_for_each and exposes the sstable_directory::do_for_each_sstable
via a static method.
This method is used here by table::snapshot to limit concurrency
do snapshot operations that suffer from the same unbound
concurrency problem sstable_directory solved.
In addition sstable_directory::_load_semaphore that was used
across calls to do_for_each_sstable was replaced by a static per-shard
semaphore that caps concurrency across all calls to `do_for_each_sstable`
on that shard. This makes sense since the disk is a shared resource.
In the future, we may want to have a load semaphore per device rather than
a single global one. We should experiment with that.
Test: unit(dev)
"
* tag 'max_concurrent_for_each-v5' of github.com:bhalevy/scylla:
table: snapshot: use max_concurrent_for_each
sstable_directory: use a external load_semaphore
test: sstable_directory_test: extract sstable_directory creation into with_sstable_directory
distributed_loader: process_upload_dir: use initial_sstable_loading_concurrency
sstables: sstable_directory: use max_concurrent_for_each
The build with clang fails with
ld.lld: error: undefined symbol: icu_65::Collator::createInstance(icu_65::Locale const&, UErrorCode&)
>>> referenced by like_matcher.cc
>>> build/dev/utils/like_matcher.o:(boost::re_detail_106900::icu_regex_traits_implementation::icu_regex_traits_implementation(icu_65::Locale const&))
>>> referenced by like_matcher.cc
>>> build/dev/utils/like_matcher.o:(boost::re_detail_106900::icu_regex_traits_implementation::icu_regex_traits_implementation(icu_65::Locale const&))
That symbol lives in libicui18n. It's not clear why clang fails to resolve it and gcc succeeds (after all,
both use lld as the linker) but it is easier to add the library than to attempt to figure out the
discrepancy.
Closes#7391
Clang has some difficulty with the boost::cpp_int constructor from string_view.
In fact it is a mess of enable_if<>s so a human would have trouble too.
Work around it by converting to std::string. This is bad for performance, but
this constructor is not going to be fast in any case.
Hopefully a fix will arrive in clang or boost.
Closes#7389
Following ad48d8b43c, fix a similar problem which popped up
with higher inlining thresholds in query-result-writer.hh. Since
idl/query depends on idl/keys, it must follow in definition order.
Closes#7384
We add std::distance(...) + 1 to a vector iterator, but
the vector can be empty, so we're adding a non-zero value
to nullptr, which is undefined behavior.
Rearrange to perform the limit (std::min()) before adding
to the pointer.
Found by clang's ubsan.
Closes#7377
database_test contains several instances of calling do_with_cql_test_env()
with a function that expects to be called in a thread. This mostly works
because there is an internal thread in do_with_cql_test_env(), but is not
guaranteed to.
Fix by switching to the more appropriate do_with_cql_test_env_thread().
Closes#7333
The variable 'observer' (an std::optional) may be left uninitialized
if 'incremental_enabled' is false. However, it is used afterwards
with a call to disconnect, accessing garbage.
Fix by accessing it via the optional wrapper. A call to optional::reset()
destroys the observable, which in turn calls disconnect().
Closes#7380
get_nt_build_id() constructs a pointer by adding a base and an
offset, but if the base happens to be zero, that is undefined
under C++ rules (altough legal ELF).
Fix by performing the addition on integers, and only then
casting to a pointer.
Closes#7379
libstdc++'s std::regex uses recursion[1], with a depth controlled by the
input. Together with clang's debug mode, this overflows the stack.
Use boost::regex instead, which is immune to the problem.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86164Closes#7378
b1e78313fe added a check for ubsan to squelch a false positive,
but that check doesn't work with clang. Relax it to check for debug
mode, so clang doesn't hit the same false positive as gcc did.
Define a SANITIZE macro so we have a reliable way to detect if
we're running with a sanitizer.
Closes#7372
d2t() scales a fraction in the range [0, 1] to the range of
a biased token (same as unsigned long). But x86 doesn't support
conversion to unsigned, only signed, so this is a truncating
conversion. Clang's ubsan correctly warns about it.
Fix by reducing the range before converting, and expanding it
afterwards.
Closes#7376
Some test (run_based_compaction_test at least) use _max_sstable_size = 0
in order to force one partition per sstable. That triggers an overflow
when calculating the expected bloom filter size. The overflow doesn't
matter for normal operation, because the result later appears on a
divisor, but does trigger a ubsan error.
Squelch the error by bot dividing by zero here.
I tried using _max_sstable_size = 1, but the test failed for other
reasons.
Closes#7375
When converting a value to its Lua representation, we choose
an integer type if it fits. If it doesn't, we fall back to a
more expensive type. So we explicitly try to trigger an overflow.
However, clang's ubsan doesn't like the overflow, and kills the
test. Tell it that the overflow is expected here.
Closes#7374
Our AVX2 implementation cannot load a partial vector,
or mask unused elements (that can be done with AVX-512/SVE2),
so it has some restrictions. Document them.
Closes#7385
Offsetting a null pointer is undefined, and clang's ubsan complains.
Rearrange the arithmetic so we never offset a null pointer. A function
is introduced for the remaining contiguous bytes so it can cast the result
to size_t, avoiding a compare-of-different-signedness warning from gcc.
Closes#7373