Adds a number of utilities for working with boost::outcome::result
combined with exception_container. The utilities are meant to help with
migration of the existing code to use the boost::outcome::result:
- `exception_container_throw_policy` - a NoValuePolicy meant to be used
as a template parameter for the boost::outcome::result. It protects
the caller of `result::value()` and `result::error()` methods - if the
caller wishes to get a value but the result has an error
(exception_container in our case), the exception in the container will
be thrown instead. In case it's the other way around,
boost::outcome::bad_result_access is thrown.
- `result_parallel_for_each` - a version of `parallel_for_each` which is
aware of results and returns a failed result in case any of the
parallel invocations return a failed result.
- `result_into_future` - converts a result into a future. If the result
holds a value, converts it into make_ready_future; if it holds an
exception, the exception is returned as make_exception_future.
- `then_ok_result` takes a `future<T>` and converts it into
a `future<result<T>>`.
- `result_wrap` adapts a callable of type `T -> future<result<T>>` and
returns a callable of type `result<T> -> future<result<T>>`.
Adds `exception_container` - a helper type used to hold exceptions as a
value, without involving the std::exception_ptr.
The motivation behind this type is that it allows inspecting exception's
type and value without having to rethrow that exception and catch it,
unlike std::exception_ptr. In our current codebase, some exception
handling paths need to rethrow the exception multiple times in order to
account it into metrics or encode it as an error response to the CQL
client. Some types of exceptions can be thrown very frequently in case
of overload (e.g. timeouts) and inspecting those exceptions with
rethrows can make the overload even worse. For those kinds of exceptions
it is important to handle them as cheaply as possible, and
exception_container used with conjunction with boost::outcome::result
can help achieve that.
If memory reclamation is triggered inside _cache.emplace(), the _cache
btree can get corrupted. Reclaimers erase from it, and emplace()
assumes that the tree is not modified during its execution. It first
locates the target node and then does memory allocation.
Fix by running emplace() under allocating section, which disables
memory reclamation.
The bug manifests with assert failures, e.g:
./utils/bptree.hh:1699: void bplus::node<unsigned long, cached_file::cached_page, cached_file::page_idx_less_comparator, 12, bplus::key_search::linear, bplus::with_debug::no>::refill(Less) [Key = unsigned long, T = cached_file::cached_page, Less = cached_file::page_idx_less_comparator, NodeSize = 12, Search = bplus::key_search::linear, Debug = bplus::with_debug::no]: Assertion `p._kids[i].n == this' failed.
Fixes#9915
Message-Id: <20220130175639.15258-1-tgrabiec@scylladb.com>
Fixes#9922
storage proxy uses is_timeout_exception to traverse different code paths.
a6202ae079 broke this (because bit rot and
intermixing), by wrapping exception for information purposes.
This adds check of nested types in exception handling, as well as a test
for the routine itself.
Closes#9932
* github.com:scylladb/scylla:
database/storage_proxy: Use "is_timeout_exception" instead of catch match
utils::is_timeout_exception: Ensure we handle nested exception types
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Fixes#9922
storage proxy uses is_timeout_exception to traverse different code paths.
a6202ae079 broke this (because bit rot and
intermixing), by wrapping exception for information purposes.
This adds check of nested types in exception handling, as well as a test
for the routine itself.
This series greatly reduces gossipers' dependence on `seastar::async` (yet, not completely).
`i_endpoint_state_change_subscriber` callbacks are converted to return futures (again, to get rid of `seastar::async` dependency), all users are adjusted appropriately (e.g. `storage_service`, `cdc::generation_service`, `streaming::stream_manager`, `view_update_backlog_broker` and `migration_manager`).
This includes futurizing and coroutinizing the whole function call chain up to the `i_endpoint_state_change_subscriber` callback functions.
To aid the conversion process, a non-`seastar::async` dependent variant of `utils::atomic_vector::for_each` is introduced (`for_each_futurized`). A different name is used to clearly distinguish converted and non-converted code, so that the last step (remove `seastar::async()` wrappers around callback-calling code in gossiper) is easier. This is left for a follow-up series, though.
Tests: unit(dev)
Closes#9844
* github.com:scylladb/scylla:
service: storage_service: coroutinize `set_gossip_tokens`
service: storage_service: coroutinize `leave_ring`
service: storage_service: coroutinize `handle_state_left`
service: storage_service: coroutinize `handle_state_leaving`
service: storage_service: coroutinize `handle_state_removing`
service: storage_service: coroutinize `do_drain`
service: storage_service: coroutinize `shutdown_protocol_servers`
service: storage_service: coroutinize `excise`
service: storage_service: coroutinize `remove_endpoint`
service: storage_service: coroutinize `handle_state_replacing`
service: storage_service: coroutinize `handle_state_normal`
service: storage_service: coroutinize `update_peer_info`
service: storage_service: coroutinize `do_update_system_peers_table`
service: storage_service: coroutinize `update_table`
service: storage_service: coroutinize `handle_state_bootstrap`
service: storage_service: futurize `notify_*` functions
service: storage_service: coroutinize `handle_state_replacing_update_pending_ranges`
repair: row_level_repair_gossip_helper: coroutinize `remove_row_level_repair`
locator: reconnectable_snitch_helper: coroutinize `reconnect`
gms: i_endpoint_state_change_subscriber: make callbacks to return futures
utils: atomic_vector: introduce future-returning `for_each` function
utils: atomic_vector: rename `for_each` to `thread_for_each`
gms: gossiper: coroutinize `start_gossiping`
gms: gossiper: coroutinize `force_remove_endpoint`
gms: gossiper: coroutinize `do_status_check`
gms: gossiper: coroutinize `remove_endpoint`
Refs: #9555
When running the "Kraken" dynamodb streams test to provoke the issued observed by QA, I noticed on my setup mainly two things: Large allocation stalls (+ warnings) and timeouts on read semaphores in DB.
This tries to address the first issue, partly by making query_result_view serialization using chunked vector instead of linear one, and by introducing a streaming option for json return objects, avoiding linearizing to string before wire.
Note that the latter has some overhead issues of its own, mainly data copying, since we essentially will be triple buffering (local, wrapped http stream, and final output stream). Still, normal string output will typically do a lot of realloc which is potential extra copies as well, so...
This is not really performance tested, but with these tweaks I no longer get large alloc stalls at least, so that is a plus. :-)
Closes#9713
* github.com:scylladb/scylla:
alternator::executor: Use streamed result for scan etc if large result
alternator::streams: Use streamed result in get_records if large result
executor/server: Add routine to make stream object return
rjson: Add print to stream of rjson::value
query_idl: Make qr_partition::rows/query_result::partitions chunked
Allows direct stream of object to seastar::stream. While not 100%
efficient, it has the advantage of avoiding large allocations
(long string) for huge result messages.
seastar::later() was recently deprecated and replaced with two
alternatives: a cheap seastar::yield() and an expensive (but more
powerful) seastar::check_for_io_immediately(), that corresponds to
the original later().
This patch replaces all later() calls with the weaker yield(). In
all cases except one, it's unambiguously correct. In one case
(test/perf scheduling_latency_measurer::stop()) it's not so ambiguous,
since check_for_io_immediately() will additionally force a poll and
so will cause more work to be done (but no additional tasks to be
executed). However, I think that any measurement that relies on
the measuring the work on the last tick to be inaccurate (you need
thousands of ticks to get any amount of confidence in the
measurement) that in the end it doesn't matter what we pick.
Tests: unit (dev)
Closes#9904
To emphasize that the function requires `seastar::thread`
context to function properly.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The header file <gm/inet_address.hh> is included, directly or
indirectly, from 291 source files in Scylla. It is hard to reduce this
number because Scylla relies heavily on IP addresses as keys to
different things. So it is important that this header file be fast to
include. Unfortunately it wasn't... ClangBuildAnalyzer measurements
showed that each inclusion of this header file added a whopping 2 seconds
(in dev build mode) to the build. A total of 600 CPU seconds - 10 CPU
minutes - were spent just on this header file. It was actually worse
because the build also spent additional time on template instantiation
(more on this below).
So in this patch we:
1. Remove some unnecessary stuff from gms/inet_address.hh, and avoid
including it in one place that doesn't need it. This is just
cosmetic, and doesn't significantly speed up the build.
2. Move the to_sstring() implementation for the .hh to .cc. This saves
a lot of time on template instantiations - previously every source
file instantiated this to_sstring(), which was slow (that "format"
thing is slow).
3. Do not include <seastar/net/ip.hh> which is a huge file including
half the world. All we need from it is the type "ipv4_address",
so instead include just the new <seastar/net/ipv4_address.hh>.
This change brings most of the performance improvement.
So source files forgot to include various Seastar header files
because the includes-everything ip.hh did it - so we need to add
these missing includes in this patch.
After this patch, ClangBuildAnalyzer's reports that the cost of
inclusion of <gms/inet_address.hh> is down from 2 seconds to 0.326
seconds. Additionally the format<inet_address> template instantiation
291 times - about half a second each - is also gone.
All in all, this patch should reduce around 10 CPU minutes from the build.
Refs #1
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
alloc_buf() calls new_buf_active() when there is no active segment to
allocate a new active segment. new_buf_active() allocates memory
(e.g. a new segment) so may cause memory reclamation, which may cause
segment compaction, which may call alloc_buf() and re-enter
new_buf_active(). The first call to new_buf_active() would then
override _buf_active and cause the segment allocated during segment
compaction to be leaked.
This then causes abort when objects from the leaked segment are freed
because the segment is expected to be present in _closed_segments, but
isn't. boost::intrusive::list::erase() will fail on assertion that the
object being erased is linked.
Introduced in b5ca0eb2a2.
Fixes#9821Fixes#9192Fixes#9825Fixes#9544Fixes#9508
Refs #9573
Message-Id: <20211229201443.119812-1-tgrabiec@scylladb.com>
This change makes row cache support reverse reads natively so that reversing wrappers are not needed when reading from cache and thus the read can be executed efficiently, with similar cost as the forward-order read.
The database is serving reverse reads from cache by default after this. Before, it was bypassing cache by default after 703aed3277.
Refs: #1413
Tests:
- unit [dev]
- manual query with build/dev/scylla and cache tracing on
Closes#9454
* github.com:scylladb/scylla:
tests: row_cache: Extend test_concurrent_reads_and_eviction to run reverse queries
row_cache: partition_snapshot_row_cursor: Print more details about the current version vector
row_cache: Improve trace-level logging
config: Use cache for reversed reads by default
config: Adjust reversed_reads_auto_bypass_cache description
row_cache: Support reverse reads natively
mvcc: partition_snapshot: Support slicing range tombstones in reverse
test: flat_mutation_reader_assertions: Consume expected range tombstones before end_of_partition
row_cache: Log produced range tombstones
test: Make produces_range_tombstone() report ck_ranges
tests: lib: random_mutation_generator: Extract make_random_range_tombstone()
partition_snapshot_row_cursor: Support reverse iteration
utils: immutable-collection: Make movable
intrusive_btree: Make default-initialized iterator cast to false
Currently this parse function reads only 100KB worth
of members in eac hiteration.
Since the default max_chunk_capacity is 128KB,
100KB underutilize the chunk capacity, and it could
be safely increased to the max to reduce the number of
allocations and corresponding calls to read_exactly
for large arrays.
Expose utils::chunked_vector::max_chunk_capacity
so that the caler wouldn't have to guess this number
and use it in parse().
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211222103126.1819289-2-bhalevy@scylladb.com>
Seastar moved the read_entire_stream(), read_entire_stream_contiguous()
and skip_entire_stream() from the "httpd" namespace to the "util"
namespace. Using them with their old names causes deprecation warnings
when compiling alternator/server.cc.
This patch fixes the namespace (and adds the new include) to get rid of
the deprecation warnings.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211209132759.1319420-1-nyh@scylladb.com>
The stall report uses the millisecond unit, but actually reports
nanoseconds.
Switch to microseconds (milliseconds are a bit too coarse) and
use the safer "duration / 1us" style rather than "duration::count()"
that leads to unit confusion.
Fixes#9733.
Closes#9734
Provide a template parameter to provide a static callbacks object to
increment a counter of evictions from the unprivileged section.
If entries are evicted from the cache while still in the unprivileged
section indicates a not efficient usage of the cache and should be
investigated.
This patch instruments authorized_prepared_statements_cache and a
prepared_statements_cache objects to provide non-empty callbacks.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
This patch implements a simple variation of LFRU eviction policy:
* We define 2 dynamic cache sections which total size should not exceed the maximum cache size.
* New cache entry is always added to the "unprivileged" section.
* After a cache entry is read more than SectionHitThreshold times it moves to the second cache section.
* Both sections' entries obey expiration and reload rules in the same way as before this patch.
* When cache entries need to be evicted due to a size restriction "unprivileged" section's
least recently used entries are evicted first.
Note:
With a 2 sections cache it's not enough for a new entry to have the latest timestamp
in order not be evicted right after insertion: e.g. if all all other entries
are from the privileged section.
And obviously we want to allow new cache entries to be added to a cache.
Therefore we can no longer first add a new entry and then shrink the cache.
Switching the order of these two operations resolves the culprit.
Fixes#8674
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
* Store a reference to a parent (loading_cache) object instead of holding
references to separate fields.
* Access loading_cache fields via accessors.
* Move the LRU "touch" logic to the loading_cache.
* Keep only a plain "list entry" logic in the lru_entry class.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Hide internal classes inside the loading_cache class:
* Simpler calls - no need for a tricky back-referencing to access loading_cache fields.
* Cleaner interface.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Use std::pmr::polymorphic_allocator instead of
std::allocator - the former allows not to define the
allocated object during the template specification.
As a result we won't have to have lru_entry defined
before loading_cache, which in line would allow us
to rearrange classes making all classes internal to
loading_cache and hence simplifying the interface.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
fmt 8 checks format strings at compile time, and requires that
non-compile-time format strings be wrapped with fmt::runtime().
Do that, and to allow coexistence with fmt 7, supply our own
do-nothing version of fmt::runtime() if needed. Strictly speaking
we shouldn't be introducing names into the fmt namespace, but this
is transitional only.
Closes#9640
We've been observing hard to explain crashes recently around
lsa_buffer destruction, where the containing segment is absent in
_segment_descs which causes log_heap::adjust_up to abort. Add more
checks to catch certain impossible senarios which can lead to this
sooner.
Refs #9192.
Message-Id: <20211116122346.814437-1-tgrabiec@scylladb.com>
We cannot recover from a failure in this method. The implementation
makes sure it never happens. Invariants will be broken if this
throws. Detect violations early by marking as noexcept.
We could make it exception safe and try to leave the data structures
in a consistent state but the reclaimer cannot make progress if this throws, so
it's pointless.
Refs #9192
Message-Id: <20211116122019.813418-1-tgrabiec@scylladb.com>
A config option value is reported as 'text' type and contains
a string as it would looks like in json config.
The table is UPDATE-able. Only the 'value' columnt can be set
and the value accepted must be string. It will be converted into
the option type automatically, however in current implementation
is't not 100% precise -- conversion is lexicographical cast which
only works for simple types. However, liveupdate-able values are
only of those types, so it works in supported cases.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There soon will appear an updateable system.config table that
will push sstrings into names_value-s. Prepare for this change
by adding the respective .set_value() call. Since the update
only works for LiveUpdate-able options, and inability to do it
can be propagated back to the caller make this method return
true/false whether the update took place or not.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When a named_value is .set_value()-d the caller may specify the reason
for this change. If not specified it's set to None, but None means
"it was there by default and was't changed" so it's a bit of a lie.
Add an explicit Internal reason. It's actually used by the directories
thing that update all directories according to --workdir option.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Those named_values that support .empty() check can be "selected"
like this
auto& v = option_a() || option_b() || option_c();
This code will put into v a reference to the first non-empty
named_value out of a/b/c.
This "selection" is actually used on start when scylla decides
which config options to use as listen/broadcact/rpc/etc. addresses.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This PR started by realizing that in the memtable reversing reader, it
never happened on tests that `do_refresh_state` was called with
`last_row` and `last_rts` which are not `std::nullopt`.
Changes
- fix memtable test (`tesst_memtable_with_many_versions_conforms_to_mutation_source`), so that there is a background job forcing state refreshes,
- fix the way rt_slice is computed (was `(last_rts, cr_range_snapshot.end]`, now is `[cr_range_snapshot.start, last_rts)`).
Fixes#9486Closes#9572
* github.com:scylladb/scylla:
partition_snapshot_reader: fix indentation in fill_buffer
range_tombstone_list: {lower,upper,}slice share comparator implementation
test: memtable: add full_compaction in background
partition_snapshot_reader: fix obtaining rt_slice, if Reversing and _last_rts was set
range_tombstone_list: add lower_slice
The rjson::set() *sounds* like it can set any member of a JSON object
(i.e., map), but that's not true :-( It calls the RapidJson function
AddMember() so it can only add a member to an object which doesn't have
a member with the same name (i.e., key). If it is called with a key
that already has a value, the result may have two values for the same
key, which is ill-formed and can cause bugs like issue #9542.
So in this patch we begin by renaming rjson::set() and its variant to
rjson::add() - to suggest to its user that this function only adds
members, without checking if they already exist.
After this rename, I was left with dozens of calls to the set() functions
that need to changed to either add() - if we're sure that the object
cannot already have a member with the same name - or to replace() if
it might.
The vast majority of the set() calls were starting with an empty item
and adding members with fixed (string constant) names, so these can
be trivially changed to add().
It turns out that *all* other set() calls - except the one fixed in
issue #9542 - can also use add() because there are various "excuses"
why we know the member names will be unique. A typical example is
a map with column-name keys, where we know that the column names
are unique. I added comments in front of such non-obvious uses of
add() which are safe.
Almost all uses of rjson except a handful are in Alternator, so I
verified that all Alternator test cases continue to pass after this
patch.
Fixes#9583
Refs #9542
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211104152540.48900-1-nyh@scylladb.com>
This patch fixes a bug in UpdateItem's ReturnValues=ALL_NEW, which in
some cases returned the OLD (pre-modification) value of some of the
attributes, instead of its NEW value.
The bug was caused by a confusion in our JSON utility function,
rjson::set(), which sounds like it can set any member of a map, but in
fact may only be used to add a *new* member - if a member with the same
name (key) already existed, the result is undefined (two values for the
same key). In ReturnValues=ALL_NEW we did exactly this: we started with
a copy of the original item, and then used set() to override some of the
members. This is not allowed.
So in this patch, we introduce a new function, rjson::replace(), which
does what we previously thought that rjson::set() does - i.e., replace a
member if it exists, or if not, add it. We call this function in
the ReturnValues=ALL_NEW code.
This patch also adds a test case that reproduces the incorrect ALL_NEW
results - and gets fixed by this patch.
In an upcoming patch, we should rename the confusingly-named set()
functions and audit all their uses. But we don't do this in this patch
yet. We just add some comments to clarify what set() does - but don't
change it, and just add one new function for replace().
Fixes#9542
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211104134937.40797-1-nyh@scylladb.com>
sprint() uses the printf-style formatting language while most of our
code uses the Python-derived format language from fmt::format().
The last mass conversion of sprint() to fmt (in 1129134a4a)
missed some callers (principally those that were on multiple lines, and
so the automatic converter missed them). Convert the remainder to
fmt::format(), and some sprintf() and printf() calls, so we have just
one format language in the code base. Seastar::sprint() ought to be
deprecated and removed.
Test: unit (dev)
Closes#9529
* github.com:scylladb/scylla:
utils: logalloc: convert debug printf to fmt::print()
utils: convert fmt::fprintf() to fmt::print()
main: convert fprint() to fmt::print()
compress: convert fmt::sprintf() to fmt::format()
tracing: replace seastar::sprint() with fmt::format()
thrift: replace seastar::sprint() with fmt::format()
test: replace seastar::sprint() with fmt::format()
streaming: replace seastar::sprint() with fmt::format()
storage_service: replace seastar::sprint() with fmt::format()
repair: replace seastar::sprint() with fmt::format()
redis: replace seastar::sprint() with fmt::format()
locator: replace seastar::sprint() with fmt::format()
db: replace seastar::sprint() with fmt::format()
cql3: replace seastar::sprint() with fmt::format()
cdc: replace seastar::sprint() with fmt::format()
auth: replace seastar::sprint() with fmt::format()
if mean() is called when there are no elements in the histogram,
a runtime error will happen due to division-by-zero.
approx_exponential_histogram::mean() handles it but for some
reason we forgot to do the same for estimated_histogram.
this problem was found when adding an unit test which calls
mean() in an empty histogram.
Fixes#9531.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211027142813.56969-1-raphaelsc@scylladb.com>
Enable creating shared_ptr<BaseClass> in nonstatic_class_registry
using BaseClass::ptr_type and use that for
abstract_replication_strategy.
While at it, also clean up compressor with that respect
to define compressor::ptr_type as shared_ptr<compressor>
thus simplifying compressor_registry.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Our source base drifted away from gcc compatibility; this mostly
restores the ability to build with gcc. An important exception is
coroutines that have an initializer list [1]; this still doesn't work.
We aim to switch back to gcc 11 if/when this gives us better
C++ compatibility and performance.
Test: unit (dev)
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98056Closes#9459
* github.com:scylladb/scylla:
test: radix_tree_printer: avoid template specialization in class context
test: raft: avoid ignored variable errors
test: reader_concurrency_semaphore_test: isolate from namespace of source_location
test: cql_query_test: drop unused lambda assert_replication_not_contains
test: commitlog_test: don't use deprecated seastar::unaligned_cast
test: adjust signed/unsigned comparisons in loops and boost tests
build: silence some gcc 11 warnings
sstables: processing_result_generator: make coroutine support palatable for C++20 compilers
managed_bytes: avoid compile-time loop in converting constructor
service: service_level_controller: drop unused variable sl_compare
raft: disambiguate promise name in raft::active_read
locator: azure_snitch: use full type name in definition of globals
cql3: statements: create_service_level_statement: don't ignore replace_defaults()
cql3: statement_restrictions: adjust call to std::vector deduction guide
types: remove recursive constraint in deserialize_value
cql3: restrictions: relax constraint on visitor_with_binary_operator_content
treewide: handle switch statements that return
cql3: expr: correct type of captured map value_type
cdc: adjust type of streams_count
alternator: disambiguate attrs_to_get in table_requests
This method sits in dist.loader, but really belongs to util/ as it
just works on an "abstract" path and doesn't need to know what this
path is about. Another sign of layering violation is the inclusion
of dist.loader code into util/ stuf.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
managed_bytes_basic_view is a template with a constructor that
converts from one instantiation of the template to another.
Unfortunately when gcc encounters the associated constraint, it
instantiates the template which forces it to evaluate the constraint
again, sending it into a loop.
Fix that by making the converting constructor a template itself,
delaying instantiation. The constraint is strengthened so the set
of types on which the constructor works is unchanged.