There are two issues with current implementation of remove/remove_if:
1) If it happens concurrently with get_ptr(), the latter may still
populate the cache using value obtained from before remove() was
called. remove() is used to invalidate caches, e.g. the prepared
statements cache, and the expected semantic is that values
calculated from before remove() should not be present in the cache
after invalidation.
2) As long as there is any active pointer to the cached value
(obtained by get_ptr()), the old value from before remove() will be
still accessible and returned by get_ptr(). This can make remove()
have no effect indefinitely if there is persistent use of the cache.
One of the user-perceived effects of this bug is that some prepared
statements may not get invalidated after a schema change and still use
the old schema (until next invalidation). If the schema change was
modifying UDT, this can cause statement execution failures. CQL
coordinator will try to interpret bound values using old set of
fields. If the driver uses the new schema, the coordinaotr will fail
to process the value with the following exception:
User Defined Type value contained too many fields (expected 5, got 6)
The patch fixes the problem by making remove()/remove_if() erase old
entries from _loading_values immediately.
The predicate-based remove_if() variant has to also invalidate values
which are concurrently loading to be safe. The predicate cannot be
avaluated on values which are not ready. This may invalidate some
values unnecessarily, but I think it's fine.
Fixes#10117
Message-Id: <20220309135902.261734-1-tgrabiec@scylladb.com>
When compiling utils/rjson.cc on GCC, the compilation triggers the
following warning (which becomes a compilation error):
utils/rjson.cc: In function ‘seastar::future<> rjson::print(const value&, seastar::output_stream<char>&, size_t)’:
utils/rjson.cc:239:15: error: typedef ‘using Ch = char’ locally defined but not used [-Werror=unused-local-typedefs]
239 | using Ch = char;
| ^~
This warning is a false positive. 'using Ch' is actually used internally
by rapidjson::Writer. This is a known GCC bug
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61596), which has not been
fixed since 2014.
I disabled this warning only locally as other code is not affected by
this warning and no other code already disables this warning.
Note that there are some GCC compilation problems still left apart
from this one.
Closes#10158
Recently, coordinator_result was introduced as an alternative for
exceptions. It was placed in the main "exceptions/exceptions.hh" header,
which virtually every single source file in Scylla includes.
But unfortunately, it brings in some heavy header files and templates,
leading to a lot of wasted build time - ClangBuildAnalyzer measured that
we include exceptions.hh in 323 source files, taking almost two seconds
each on average.
In this patch, we split the coordinator_result feature into a separate
header file, "exceptions/coordinator_result", and only the few places
which need it include the header file. Unfortunately, some of these
few places are themselves header, so the new header file ends up being
included in 100 source files - but 100 is still much less than 323 and
perhaps we can reduce this number 100 later.
After this patch, the total Scylla object-file size is reduced by 6.5%
(the object size is a proxy for build time, which I didn't directly
measure). ClangBuildAnalyzer reports that now each of the 323 includes
of exceptions.hh only takes 80ms, coordinator_result.hh is only included
100 times, and virtually all the cost to include it comes from Boost's
result.hh (400ms per inclusion).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220228204323.1427012-1-nyh@scylladb.com>
This series contains:
- lister: move to utils
- tidy up the clutter in the root dir
Based on Avi's feedback to `[PATCH 1/1] utils: directory_lister: close: always abort queue` that was sent to the mailing list:
- directory_lister: drop abort method
- lister: do not require get after close to fail
- test: lister_test: test_directory_lister_close simplify indentation
- cosmetic cleanup
Closes#10142
* github.com:scylladb/scylla:
test: lister_test: test_directory_lister_close simplify indentation
lister: do not require get after close to fail
directory_lister: drop abort method
lister: move to utils
"
Some templates put constraints onto the involved types with the help of
static assertions. Having them in form of concepts is much better.
tests: unit(dev)
"
* 'br-static-assert-to-concept' of https://github.com/xemul/scylla:
sstables: Remove excessive type-match assertions
mutation_reader: Sanitize invocable asserion and concept
code: Convert is_future result_of assertions into invoke_result concept
code: Convert is_same+result_of assertions into invocable concepts
code: Convert nothrow construction assertions into concepts
code: Convert is_integral assertions to concepts
Currently, the lister test expected get() to
always fail after close(), but it unexpectedly
succeeded if get() was never called before close,
as seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/4587/artifact/testlog/x86_64_debug/lister_test.test_directory_lister_close.4001.log
```
random-seed=1475104835
Generated 719 dir entries
Getting 565 dir entries
Closing directory_lister
Getting 0 dir entries
Closing directory_lister
test/boost/lister_test.cc(190): fatal error: in "test_directory_lister_close": exception std::exception expected but not raised
```
This change relaxes this requirement to keep
close() simple, based on Avi's feedback:
> The user should call close(), and not do it while get() is running, and
> that's it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Based on Avi's feedback:
> We generally have a public abort() only if we depend on an external
> event (like data from a tcp socket) that we don't control. But here
> there are no such external events. So why have a public abort() at all?
If needed in the future, we can consider adding
get(abort_source&) to allow aborting get() via
an external event.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
There's nothing specific to scylla in the lister
classes, they could (and maybe should) be part of
the seastar library.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
allocated through the region
It indicates alloc-dealloc mismatch, and can cause other problems in
the systems like unable to reclaim memory. We want to catch this at
the deallocation site to be able to quickly indentify the offender.
Misbehavior of this sort can cause fake OOMs due to underflow of
_non_lsa_memory_in_use. When it underflows enough,
shard_segment_pool.total_memory() will become 0 and memory reclamation
will stop doing anything.
Refs #10056
on_evicted() is invoked in the LSA allocator context, set in the
reclaimer callback instaled by the cache_tracker. However,
cached_pages are allocated in the standard allocator context (note:
page content is allocated inside LSA via lsa_buffer). The LSA region
will happilly deallocate these, thinking that they these are large
objects which were delegated to the standard allocator. But the
_non_lsa_memory_in_use metric will underflow. When it underflows
enough, shard_segment_pool.total_memory() will become 0 and memory
reclamation will stop doing anything, leading to apparent OOM.
The fix is to switch to the standard allocator context inside
cached_page::on_evicted(). evict_range() was also given the same
treatment as a precaution, it currently is only invoked in the
standard allocator context.
Fixes#10056
This PR propagates the read coordinator logic so that read timeout and read failure exceptions are propagated without throwing on the coordinator side.
This PR is only concerned with exceptions which were originally thrown by the coordinator (in read resolvers). Exceptions propagated through RPC and RPC timeouts will still throw, although those exceptions will be caught and converted into exceptions-as-values by read resolvers.
This is a continuation of work started in #10014.
Results of `perf_simple_query --smp 1 --operations-per-shard 1000000` (read workload), compared with merge base (10880fb0a7):
```
BEFORE:
125085.13 tps ( 80.2 allocs/op, 12.2 tasks/op, 49010 insns/op)
125645.88 tps ( 80.2 allocs/op, 12.2 tasks/op, 49008 insns/op)
126148.85 tps ( 80.2 allocs/op, 12.2 tasks/op, 49005 insns/op)
126044.40 tps ( 80.2 allocs/op, 12.2 tasks/op, 49005 insns/op)
125799.75 tps ( 80.2 allocs/op, 12.2 tasks/op, 49003 insns/op)
AFTER:
127557.21 tps ( 80.2 allocs/op, 12.2 tasks/op, 49197 insns/op)
127835.98 tps ( 80.2 allocs/op, 12.2 tasks/op, 49198 insns/op)
127749.81 tps ( 80.2 allocs/op, 12.2 tasks/op, 49202 insns/op)
128941.17 tps ( 80.2 allocs/op, 12.2 tasks/op, 49192 insns/op)
129276.15 tps ( 80.2 allocs/op, 12.2 tasks/op, 49182 insns/op)
```
The PR does not introduce additional allocations on the read happy-path. The number of instructions used grows by about 200 insns/op. The increase in TPS is probably just a measurement error.
Closes#10092
* github.com:scylladb/scylla:
indexed_table_select_statement: return some exceptions as exception messages
result_combinators: add result_wrap_unpack
select_statement: return exceptions as errors in execute_without_checking_exception_message
select_statement: return exceptions without throwing in do_execute
select_statement: implement execute_without_checking_exception_message
select_statement: introduce helpers for working with failed results
query_pager: resultify relevant methods
storage_proxy: resultify (do_)query
storage_proxy: resultify query_singular
storage_proxy: propagate failed results through query_partition_key_range
storage_proxy: resultify query_partition_key_range_concurrent
storage_proxy: modify handle_read_error to also handle exception containers
abstract_read_executor: return result from execute()
abstract_read_executor: return and handle result from has_cl()
storage_proxy: resultify handling errors from read-repair
abstract_read_executor::reconcile: resultise handling of data_resolver->done()
abstract_read_executor::execute: resultify handling of data_resolver->done()
result_combinators: add result_discard_value
abstract_read_executor: resultify _result_promise
abstract_read_executor: return result from done()
abstract_read_resolver: fail promises by passing exception as value
abstract_read_resolver: resultify promises
exceptions: make it possible to return read_{timeout,failure}_exception as value
result_try: add as_inner/clone_inner to handle types
result_try: relax ConvertWithTo constraint
exception_container: switch impl to std::shared_ptr and make copyable
result_loop: add result_repeat
result_loop: add result_do_until
result_loop: add result_map_reduce
utils/result: add utilities for checking/creating rebindable results
Adds a helper combinator utils::result_wrap_unpack which, in contrast to
utils::result_wrap, uses futurize_apply instead of futurize_invoke to
call the wrapped callable.
In short, if utils::result_wrap is used to adapt code like this:
f.then([] {})
->
f_result.then(utils::result_wrap([] {}))
Then utils::result_wrap_unpack works for the following case:
f.then_unpack([] (arg1, arg2) {})
->
f_result.then(utils::result_wrap_unpack([] (arg1, arg2) {}))
Adds a utils::result_discard_value, which is an alternative to
future::discard_result which just ignores the "success" value of the
provided result and does not ignore the exception.
Adds two methods to result_try's exception handles:
- as_inner: returns a {l,r}-value reference either to the exception
container, or the exception_ptr. This allows to use them in operations
which work on both types, e.g. logging.
- clone_inner: returns a copy of the underlying exception container or
exception ptr.
Currently, the catch handlers in result_futurize_try are required to
return a future, although they are always being called with
seastar::futurize_invoke, so if their result is not future it could be
converted to one anyway. This commit relaxes the ConvertsWithTo
constraint in order to allow this conversion.
The exception_container is supposed to be a cheaper, but possibly harder
to use alternative to std::exception_ptr. Before this commit, the
exception was kept behind foreign_ptr<std::unique_ptr<>> so that moving
the container is very cheap. However, the original std::exception_ptr
supports copying in a thread-safe manner, and it turns out that some of
the read coordinator logic intentionally copies the pointer in order to
be able to fail two different promises with the same exception.
The pointer type is changed to std::shared_ptr. Although it uses atomics
for reference counting, this is also probably what std::exception_ptr
does, so the performance should not be worse. The exception stored
inside the container is immutable, so this allows for a non-throwing
implementation of copying.
To encourage moves instead of copying, the copy constructor is deleted
and instead the `clone()` method should be used if it is really
necessary.
Adds a result-aware counterpart to seastar::repeat. The new function
does not base on seastar::repeat, but rather is a rewrite of the
original (using a coroutine instead of an open-coded task). The main
consequence of using a coroutine is that exceptions from AsyncAction
need to be thrown once more.
Adds a result-aware counterpart to seastar::do_until. The new function
does not base on seastar::do_until, but rather is a rewrite of the
original (using a coroutine instead of an open-coded task). The main
consequence of using a coroutine is that exceptions from StopCondition
or AsyncAction need to be thrown once more.
Adds result-aware counterparts to all seastar::map_reduce overloads.
Fortunately, it was possible to implement the functions by basing them
on seastar::map_reduce and get the same number of allocation. The only
exception happens when reducer::get() returns a non-ready future, which
doesn't seem to happen on the read coordinator path.
Adds:
- ResultRebindableTo<L, R>: concept which is satisfied by a pair of
results which do not necessarily share the same value, but have the
same error and policy types; a failed result L can be converted to a
failed result R.
- rebind_result<T, R>: given a value type T and another result R,
returns a result which can hold T as value and both the same error and
policy as R.
timestamped_val (and two other type aliases) are nested inside loading_cache,
but indented as if they were top-level names. Adjust the indent to
avoid confusion.
Closes#10118
This commit changes the behavior of `exception_container::accept`. Now,
instead of throwing an `utils::bad_exception_container_access` exception
when the container is empty, the provided visitor is invoked with that
exception instead. There are two reasons for this change:
- The exception_container is supposed to allow handling exceptions
without using the costly C++'s exception runtime. Although an empty
container is an edge case, I think it the new behavior is more aligned
with the class' purpose. The old behavior can be simulated by
providing a visitor which throws when called with bad access
exception.
- The new behavior fixes a bug in `result_try`/`result_futurize_try`.
Before the change, if the `try` block returned a failed result with an
empty exception container, a bad access exception would either be
thrown or returned as an exceptional future without being handled by
the `catch` clauses. Although nobody is supposed to return such
result<>s on purpose, a moved out result can be returned by accident
and it's important for the exception handling logic to be correct in
such a situation.
Tests: unit(dev)
Closes#10086
Adds `utils::result_try` and `utils::result_futurize_try` - functions which allow to convert existing try..catch blocks into a version which handles C++ exceptions, failed results with exception containers and, depending on the function variant, exceptional futures using the same exception handling logic.
For example, you can convert the following try..catch block:
try {
return a_function_that_may_throw();
} catch (const my_exception& ex) {
return 123;
} catch (...) {
throw;
}
...to this:
return utils::result_try([&] {
return a_function_that_may_throw_or_return_a_failed_result();
}, utils::result_catch<my_exception>([&] (const Ex&) {
return 123;
}), utils::result_catch_dots([&] (auto&& handle) {
return handle.into_result();
});
Similarly, `utils::result_futurize_try` can be used to migrate `then_wrapped` or `f.handle_exception()` constructs.
As an example of the usability of the new constructs, two places in the current code which need to simultaneously handle exceptions and failed results are converted to use `result_try` and `result_futurize_try`.
Results of `perf_simple_query --smp 1 --operations-per-shard 1000000 --write`:
```
127041.61 tps ( 67.2 allocs/op, 14.2 tasks/op, 52422 insns/op)
126958.60 tps ( 67.2 allocs/op, 14.2 tasks/op, 52409 insns/op)
127088.37 tps ( 67.2 allocs/op, 14.2 tasks/op, 52411 insns/op)
127560.84 tps ( 67.2 allocs/op, 14.2 tasks/op, 52424 insns/op)
127826.61 tps ( 67.2 allocs/op, 14.2 tasks/op, 52406 insns/op)
126801.02 tps ( 67.2 allocs/op, 14.2 tasks/op, 52420 insns/op)
125371.51 tps ( 67.2 allocs/op, 14.2 tasks/op, 52425 insns/op)
126498.51 tps ( 67.2 allocs/op, 14.2 tasks/op, 52427 insns/op)
126359.41 tps ( 67.2 allocs/op, 14.2 tasks/op, 52423 insns/op)
126298.27 tps ( 67.2 allocs/op, 14.2 tasks/op, 52410 insns/op)
```
The number of tasks and allocations is unchanged. The number of instructions per operations seems similar, it may have increased slightly (by 10-20) but it's hard to tell for sure because of the noisiness of the results.
Tests: unit(dev)
Closes#10045
* github.com:scylladb/scylla:
transport: use result_try in process_request_one
storage_proxy: use result_futurize_try in mutate_end
storage_proxy: temporarily throw exception from result in mutate_end
utils: add result_try and result_futurize_try
It now resembles the original parallel_for_each more, but uses a
coroutine instead of a custom `task` to collect not-ready futures.
Although the usage of a coroutine saves on allocations, the drawback is
that there is currently no way to co_await on a future and handle its
exception without throwing or without unconditionally allocating a
then_wrapped or handle_exception continuation - so it introduces a
rethrow.
Furthermore, now failed results and exceptions are treated as equals.
Previously, in case one parallel invocation returned failed result and
another returned an exception, the exception would always be returned.
Now, the failed result/exception of the invocation with the lowest index
is always preferred, regardless of the failure type.
The reimplementation manages to save about 350-400 instructions, one
task and one allocation in the perf_simple_query benchmark in write
mode.
Results from `perf_simple_query --smp 1 --operations-per-shard 1000000
--write` (before vs. after):
```
126872.54 tps ( 67.2 allocs/op, 14.2 tasks/op, 52404 insns/op)
126532.13 tps ( 67.2 allocs/op, 14.2 tasks/op, 52408 insns/op)
126864.99 tps ( 67.2 allocs/op, 14.2 tasks/op, 52428 insns/op)
127073.10 tps ( 67.2 allocs/op, 14.2 tasks/op, 52404 insns/op)
126895.85 tps ( 67.2 allocs/op, 14.2 tasks/op, 52411 insns/op)
127894.02 tps ( 66.2 allocs/op, 13.2 tasks/op, 52036 insns/op)
127671.51 tps ( 66.2 allocs/op, 13.2 tasks/op, 52042 insns/op)
127541.42 tps ( 66.2 allocs/op, 13.2 tasks/op, 52044 insns/op)
127409.10 tps ( 66.2 allocs/op, 13.2 tasks/op, 52052 insns/op)
127831.30 tps ( 66.2 allocs/op, 13.2 tasks/op, 52043 insns/op)
```
Test: unit(dev), unit(result_utils_test, debug)
Segregates result utilities into:
- result.hh - basic definitions related to results with exception
containers,
- result_combinators.hh - combinators for working with results in
conjunction with futures,
- result_loop.hh - loop-like combinators, currently has only
result_parallel_for_each.
The motivation for the split is:
1. In headers, usually only result.hh will be needed, so no need to
force most .cc files to compile definitions from other files,
2. Less files need to be recompiled when a combinator is added to
result_combinators or result_loop.
As a bonus, `result_with_exception` was moved from `utils::internal` to
just `utils`.
Adds result_try and result_futurize_try - functions which allow to
convert existing try..catch blocks into a version which handles C++
exceptions, failed results with exception containers and, depending on
the function variant, exceptional futures.
Adds a number of utilities for working with boost::outcome::result
combined with exception_container. The utilities are meant to help with
migration of the existing code to use the boost::outcome::result:
- `exception_container_throw_policy` - a NoValuePolicy meant to be used
as a template parameter for the boost::outcome::result. It protects
the caller of `result::value()` and `result::error()` methods - if the
caller wishes to get a value but the result has an error
(exception_container in our case), the exception in the container will
be thrown instead. In case it's the other way around,
boost::outcome::bad_result_access is thrown.
- `result_parallel_for_each` - a version of `parallel_for_each` which is
aware of results and returns a failed result in case any of the
parallel invocations return a failed result.
- `result_into_future` - converts a result into a future. If the result
holds a value, converts it into make_ready_future; if it holds an
exception, the exception is returned as make_exception_future.
- `then_ok_result` takes a `future<T>` and converts it into
a `future<result<T>>`.
- `result_wrap` adapts a callable of type `T -> future<result<T>>` and
returns a callable of type `result<T> -> future<result<T>>`.
Adds `exception_container` - a helper type used to hold exceptions as a
value, without involving the std::exception_ptr.
The motivation behind this type is that it allows inspecting exception's
type and value without having to rethrow that exception and catch it,
unlike std::exception_ptr. In our current codebase, some exception
handling paths need to rethrow the exception multiple times in order to
account it into metrics or encode it as an error response to the CQL
client. Some types of exceptions can be thrown very frequently in case
of overload (e.g. timeouts) and inspecting those exceptions with
rethrows can make the overload even worse. For those kinds of exceptions
it is important to handle them as cheaply as possible, and
exception_container used with conjunction with boost::outcome::result
can help achieve that.
If memory reclamation is triggered inside _cache.emplace(), the _cache
btree can get corrupted. Reclaimers erase from it, and emplace()
assumes that the tree is not modified during its execution. It first
locates the target node and then does memory allocation.
Fix by running emplace() under allocating section, which disables
memory reclamation.
The bug manifests with assert failures, e.g:
./utils/bptree.hh:1699: void bplus::node<unsigned long, cached_file::cached_page, cached_file::page_idx_less_comparator, 12, bplus::key_search::linear, bplus::with_debug::no>::refill(Less) [Key = unsigned long, T = cached_file::cached_page, Less = cached_file::page_idx_less_comparator, NodeSize = 12, Search = bplus::key_search::linear, Debug = bplus::with_debug::no]: Assertion `p._kids[i].n == this' failed.
Fixes#9915
Message-Id: <20220130175639.15258-1-tgrabiec@scylladb.com>
Fixes#9922
storage proxy uses is_timeout_exception to traverse different code paths.
a6202ae079 broke this (because bit rot and
intermixing), by wrapping exception for information purposes.
This adds check of nested types in exception handling, as well as a test
for the routine itself.
Closes#9932
* github.com:scylladb/scylla:
database/storage_proxy: Use "is_timeout_exception" instead of catch match
utils::is_timeout_exception: Ensure we handle nested exception types
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Fixes#9922
storage proxy uses is_timeout_exception to traverse different code paths.
a6202ae079 broke this (because bit rot and
intermixing), by wrapping exception for information purposes.
This adds check of nested types in exception handling, as well as a test
for the routine itself.
This series greatly reduces gossipers' dependence on `seastar::async` (yet, not completely).
`i_endpoint_state_change_subscriber` callbacks are converted to return futures (again, to get rid of `seastar::async` dependency), all users are adjusted appropriately (e.g. `storage_service`, `cdc::generation_service`, `streaming::stream_manager`, `view_update_backlog_broker` and `migration_manager`).
This includes futurizing and coroutinizing the whole function call chain up to the `i_endpoint_state_change_subscriber` callback functions.
To aid the conversion process, a non-`seastar::async` dependent variant of `utils::atomic_vector::for_each` is introduced (`for_each_futurized`). A different name is used to clearly distinguish converted and non-converted code, so that the last step (remove `seastar::async()` wrappers around callback-calling code in gossiper) is easier. This is left for a follow-up series, though.
Tests: unit(dev)
Closes#9844
* github.com:scylladb/scylla:
service: storage_service: coroutinize `set_gossip_tokens`
service: storage_service: coroutinize `leave_ring`
service: storage_service: coroutinize `handle_state_left`
service: storage_service: coroutinize `handle_state_leaving`
service: storage_service: coroutinize `handle_state_removing`
service: storage_service: coroutinize `do_drain`
service: storage_service: coroutinize `shutdown_protocol_servers`
service: storage_service: coroutinize `excise`
service: storage_service: coroutinize `remove_endpoint`
service: storage_service: coroutinize `handle_state_replacing`
service: storage_service: coroutinize `handle_state_normal`
service: storage_service: coroutinize `update_peer_info`
service: storage_service: coroutinize `do_update_system_peers_table`
service: storage_service: coroutinize `update_table`
service: storage_service: coroutinize `handle_state_bootstrap`
service: storage_service: futurize `notify_*` functions
service: storage_service: coroutinize `handle_state_replacing_update_pending_ranges`
repair: row_level_repair_gossip_helper: coroutinize `remove_row_level_repair`
locator: reconnectable_snitch_helper: coroutinize `reconnect`
gms: i_endpoint_state_change_subscriber: make callbacks to return futures
utils: atomic_vector: introduce future-returning `for_each` function
utils: atomic_vector: rename `for_each` to `thread_for_each`
gms: gossiper: coroutinize `start_gossiping`
gms: gossiper: coroutinize `force_remove_endpoint`
gms: gossiper: coroutinize `do_status_check`
gms: gossiper: coroutinize `remove_endpoint`
Refs: #9555
When running the "Kraken" dynamodb streams test to provoke the issued observed by QA, I noticed on my setup mainly two things: Large allocation stalls (+ warnings) and timeouts on read semaphores in DB.
This tries to address the first issue, partly by making query_result_view serialization using chunked vector instead of linear one, and by introducing a streaming option for json return objects, avoiding linearizing to string before wire.
Note that the latter has some overhead issues of its own, mainly data copying, since we essentially will be triple buffering (local, wrapped http stream, and final output stream). Still, normal string output will typically do a lot of realloc which is potential extra copies as well, so...
This is not really performance tested, but with these tweaks I no longer get large alloc stalls at least, so that is a plus. :-)
Closes#9713
* github.com:scylladb/scylla:
alternator::executor: Use streamed result for scan etc if large result
alternator::streams: Use streamed result in get_records if large result
executor/server: Add routine to make stream object return
rjson: Add print to stream of rjson::value
query_idl: Make qr_partition::rows/query_result::partitions chunked
Allows direct stream of object to seastar::stream. While not 100%
efficient, it has the advantage of avoiding large allocations
(long string) for huge result messages.
seastar::later() was recently deprecated and replaced with two
alternatives: a cheap seastar::yield() and an expensive (but more
powerful) seastar::check_for_io_immediately(), that corresponds to
the original later().
This patch replaces all later() calls with the weaker yield(). In
all cases except one, it's unambiguously correct. In one case
(test/perf scheduling_latency_measurer::stop()) it's not so ambiguous,
since check_for_io_immediately() will additionally force a poll and
so will cause more work to be done (but no additional tasks to be
executed). However, I think that any measurement that relies on
the measuring the work on the last tick to be inaccurate (you need
thousands of ticks to get any amount of confidence in the
measurement) that in the end it doesn't matter what we pick.
Tests: unit (dev)
Closes#9904
To emphasize that the function requires `seastar::thread`
context to function properly.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The header file <gm/inet_address.hh> is included, directly or
indirectly, from 291 source files in Scylla. It is hard to reduce this
number because Scylla relies heavily on IP addresses as keys to
different things. So it is important that this header file be fast to
include. Unfortunately it wasn't... ClangBuildAnalyzer measurements
showed that each inclusion of this header file added a whopping 2 seconds
(in dev build mode) to the build. A total of 600 CPU seconds - 10 CPU
minutes - were spent just on this header file. It was actually worse
because the build also spent additional time on template instantiation
(more on this below).
So in this patch we:
1. Remove some unnecessary stuff from gms/inet_address.hh, and avoid
including it in one place that doesn't need it. This is just
cosmetic, and doesn't significantly speed up the build.
2. Move the to_sstring() implementation for the .hh to .cc. This saves
a lot of time on template instantiations - previously every source
file instantiated this to_sstring(), which was slow (that "format"
thing is slow).
3. Do not include <seastar/net/ip.hh> which is a huge file including
half the world. All we need from it is the type "ipv4_address",
so instead include just the new <seastar/net/ipv4_address.hh>.
This change brings most of the performance improvement.
So source files forgot to include various Seastar header files
because the includes-everything ip.hh did it - so we need to add
these missing includes in this patch.
After this patch, ClangBuildAnalyzer's reports that the cost of
inclusion of <gms/inet_address.hh> is down from 2 seconds to 0.326
seconds. Additionally the format<inet_address> template instantiation
291 times - about half a second each - is also gone.
All in all, this patch should reduce around 10 CPU minutes from the build.
Refs #1
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
alloc_buf() calls new_buf_active() when there is no active segment to
allocate a new active segment. new_buf_active() allocates memory
(e.g. a new segment) so may cause memory reclamation, which may cause
segment compaction, which may call alloc_buf() and re-enter
new_buf_active(). The first call to new_buf_active() would then
override _buf_active and cause the segment allocated during segment
compaction to be leaked.
This then causes abort when objects from the leaked segment are freed
because the segment is expected to be present in _closed_segments, but
isn't. boost::intrusive::list::erase() will fail on assertion that the
object being erased is linked.
Introduced in b5ca0eb2a2.
Fixes#9821Fixes#9192Fixes#9825Fixes#9544Fixes#9508
Refs #9573
Message-Id: <20211229201443.119812-1-tgrabiec@scylladb.com>
This change makes row cache support reverse reads natively so that reversing wrappers are not needed when reading from cache and thus the read can be executed efficiently, with similar cost as the forward-order read.
The database is serving reverse reads from cache by default after this. Before, it was bypassing cache by default after 703aed3277.
Refs: #1413
Tests:
- unit [dev]
- manual query with build/dev/scylla and cache tracing on
Closes#9454
* github.com:scylladb/scylla:
tests: row_cache: Extend test_concurrent_reads_and_eviction to run reverse queries
row_cache: partition_snapshot_row_cursor: Print more details about the current version vector
row_cache: Improve trace-level logging
config: Use cache for reversed reads by default
config: Adjust reversed_reads_auto_bypass_cache description
row_cache: Support reverse reads natively
mvcc: partition_snapshot: Support slicing range tombstones in reverse
test: flat_mutation_reader_assertions: Consume expected range tombstones before end_of_partition
row_cache: Log produced range tombstones
test: Make produces_range_tombstone() report ck_ranges
tests: lib: random_mutation_generator: Extract make_random_range_tombstone()
partition_snapshot_row_cursor: Support reverse iteration
utils: immutable-collection: Make movable
intrusive_btree: Make default-initialized iterator cast to false