Rename reclaim_timer::_reserve_segments to _segments_to_release
as it is clearer and more suitable for later patches
that will add reclaim_timers in more functions.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bytes_ostream is an incremental builder for a discontiguous byte container.
managed_bytes is a non-incremental (size must be known up front) byte
container, that is also compatible with LSA. So far, conversion between
them involves copying. This is unfortunate, since query_result is generated
as a bytes_ostream, but is later converted to managed_bytes (today, this
is done in cql3::expr::get_non_pk_values() and
compound_view_wrapper::explode(). If the two types could be made compatible,
we could use managed_bytes_view instead of creating new objects and avoid
a copy. It's also nicer to have one less vocabulary type.
This patch makes bytes_ostream use managed_bytes' internal representation
(blob_storage instead of bytes_ostream::chunk) and provides a conversion
to managed_bytes. All bytes_ostream users are left in place, but the goal
is to make bytes_ostream a write-only type with the only observer a conversion
to managed_bytes.
It turns out to be relatively simple. The internal representations were
already similar. I made blob_storage::ref_type self-initializing to
reduce churn (good practice anyway) and added a private constructor
to managed_bytes for the conversion.
Note that bytes_ostream can only be used to construct a non-LSA managed_bytes,
but LSA uses of managed_bytes are very strictly controlled (the entry
points to memtable and cache) so that's not a problem.
A unit test is added.
Closes#10986
This PR gets rid of exception throws/rethrows on the replica side for writes and single-partition reads. This goal is achieved without using `boost::outcome` but rather by replacing the parts of the code which throw with appropriate seastar idioms and by introducing two helper functions:
1.`try_catch` allows to inspect the type and value behind an `std::exception_ptr`. When libstdc++ is used, this function does not need to throw the exception and avoids the very costly unwind process. This based on the "How to catch an exception_ptr without even try-ing" proposal mentioned in https://github.com/scylladb/scylla/issues/10260.
This function allows to replace the current `try..catch` chains which inspect the exception type and account it in the metrics.
Example:
```c++
// Before
try {
std::rethrow_exception(eptr);
} catch (std::runtime_exception& ex) {
// 1
} catch (...) {
// 2
}
// After
if (auto* ex = try_catch<std::runtime_exception>(eptr)) {
// 1
} else {
// 2
}
```
2. `make_nested_exception_ptr` which is meant to be a replacement for `std::throw_with_nested`. Unlike the original function, it does not require an exception being currently thrown and does not throw itself - instead, it takes the nested exception as an `std::exception_ptr` and produces another `std::exception_ptr` itself.
Apart from the above, seastar idioms such as `make_exception_future`, `co_await as_future`, `co_return coroutine::exception()` are used to propagate exceptions without throwing. This brings the number of exception throws to zero for single partition reads and writes (tested with scylla-bench, --mode=read and --mode=write).
Results from `perf_simple_query`:
```
Before (719724e4df):
Writes:
Normal:
127841.40 tps ( 56.2 allocs/op, 13.2 tasks/op, 50042 insns/op, 0 errors)
Timeouts:
94770.81 tps ( 53.1 allocs/op, 5.1 tasks/op, 78678 insns/op, 1000000 errors)
Reads:
Normal:
138902.31 tps ( 65.1 allocs/op, 12.1 tasks/op, 43106 insns/op, 0 errors)
Timeouts:
62447.01 tps ( 49.7 allocs/op, 12.1 tasks/op, 135984 insns/op, 936846 errors)
After (d8ac4c02bfb7786dc9ed30d2db3b99df09bf448f):
Writes:
Normal:
127359.12 tps ( 56.2 allocs/op, 13.2 tasks/op, 49782 insns/op, 0 errors)
Timeouts:
163068.38 tps ( 52.1 allocs/op, 5.1 tasks/op, 40615 insns/op, 1000000 errors)
Reads:
Normal:
151221.15 tps ( 65.1 allocs/op, 12.1 tasks/op, 43028 insns/op, 0 errors)
Timeouts:
192094.11 tps ( 41.2 allocs/op, 12.1 tasks/op, 33403 insns/op, 960604 errors)
```
Closes#10368
* github.com:scylladb/scylla:
database: avoid rethrows when handling exceptions from commitlog
database: convert throw_commitlog_add_error to use make_nested_exception_ptr
utils: add make_nested_exception_ptr
storage_proxy: don't rethrow when inspecting replica exceptions on write path
database: don't rethrow rate_limit_exception
storage_proxy: don't rethrow the exception in abstract_read_resolver::error
utils/exceptions.cc: don't rethrow in is_timeout_exception
utils/exceptions: add try_catch
utils: add abi/eh_ia64.hh
storage_proxy: don't rethrow exceptions from replicas when accounting read stats
message: get rid of throws in send_message{,_timeout,_abortable}
database/{query,query_mutations}: don't rethrow read semaphore exceptions
Recently we noticed a regression where with certain versions of the fmt
library,
SELECT value FROM system.config WHERE name = 'experimental_features'
returns string numbers, like "5", instead of feature names like "raft".
It turns out that the fmt library keep changing their overload resolution
order when there are several ways to print something. For enum_option<T> we
happen to have to conflicting ways to print it:
1. We have an explicit operator<<.
2. We have an *implicit* convertor to the type held by T.
We were hoping that the operator<< always wins. But in fmt 8.1, there is
special logic that if the type is convertable to an int, this is used
before operator<<()! For experimental_features_t, the type held in it was
an old-style enum, so it is indeed convertible to int.
The solution I used in this patch is to replace the old-style enum
in experimental_features_t by the newer and more recommended "enum class",
which does not have an implicit conversion to int.
I could have fixed it in other ways, but it wouldn't have been much
prettier. For example, dropping the implicit convertor would require
us to change a bunch of switch() statements over enum_option (and
not just experimental_features_t, but other types of enum_option).
Going forward, all uses of enum_option should use "enum class", not
"enum". tri_mode_restriction_t was already using an enum class, and
now so does experimental_features_t. I changed the examples in the
comments to also use "enum class" instead of enum.
This patch also adds to the existing experimental_features test a
check that the feature names are words that are not numbers.
Fixes#11003.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11004
Scylla's coding standard requires that each header is self-sufficient,
i.e., it includes whatever other headers it needs - so it can be included
without having to include any other header before it.
We have a test for this, "ninja dev-headers", but it isn't run very
frequently, and it turns out our code deviated from this requirement
in a few places. This patch fixes those places, and after it
"ninja dev-headers" succeeds again.
Fixes#10995
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#10997
An updateable_value() may come without source attached. One of the
options how this can happen is if the value sits on a service config.
It's a good option to make the config have some default initialization
for the option, but in this case observe()ing an option by the service
would step on null pointer dereference.
Said that, if a value without source is tried to be observed -- assume
that it's OK, but the value would never change, so a dummy observer is
to be provided.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Live-updating an option may involve running some action when the option
changes, not just getting its new value into somewhere. The action is
nice to be run as serialized action to batch config updates.
Said that, here's a sugar to write
serialized_action _foo = [this] { return foo(); };
observer<> _o = option.observe(_foo.make_observer());
instead of
serialized_action _foo = [this] { return foo(); };
observer<> _o = option.observe([this] {
// waited with .join on stop
(void)_foo.trigger();
});
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The utils::make_nested_exception_ptr function works similar to
std::throw_with_nested, but instead of storing the currently thrown
exception as the nested exception and then immediately throwing the new
exception, it receives the nested exception as an std::exception_ptr and
also returns an std::exception_ptr.
If the standard library supports it, the function does not perform any
throws. Otherwise the fallback logic performs two throws.
Introduces a utility function which allows obtaining a pointer to the
exception data held behind an std::exception_ptr if the data matches the
requested type. It can be used to implement manual but concise
try..catch chains.
The `try_catch` has the best performance when used with libstdc++ as it
uses the stdlib specific functions for simulating a try..catch without
having to actually throw. For other stdlibs, the implementation falls
back to a throw surrounded by an actual try..catch.
Adds a header for utility functions/structures, based on the Itanium ABI
for C++, necessary for us to inspect exceptions behind
std::exception_ptr without having to actually rethrow the exception.
Currently, for users who have permissions_cache configs set to very high
values (and thus can't wait for the configured times to pass) having to restart
the service every time they make a change related to permissions or
prepared_statements cache (e.g. Adding a user and changing their permissions)
can become pretty annoying.
This patch series make permissions_validity_in_ms, permissions_update_interval_in_ms
and permissions_cache_max_entries live updateable so that restarting the
service is not necessary anymore for these cases.
It also adds an API for flushing the cache to make it easier for users who
don't want to modify their permissions_cache config.
branch: https://github.com/igorribeiroduarte/scylla/tree/make_permissions_cache_live_updateable
CI: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1005/
dtests: https://github.com/igorribeiroduarte/scylla-dtest/tree/test_permissions_cache
* https://github.com/igorribeiroduarte/scylla/make_permissions_cache_live_updateable:
loading_cache_test: Test loading_cache::reset and loading_cache::update_config
api: Add API for resetting authorization cache
authorization_cache: Make permissions cache and authorized prepared statements cache live updateable
auth_prep_statements_cache: Make aut_prep_statements_cache accept a config struct
utils/loading_cache.hh: Add update_config method
utils/loading_cache.hh: Rename permissions_cache_config to loading_cache_config and move it to loading_cache.hh
utils/loading_cache.hh: Add reset method
Currently, for users who have permissions_cache configs set to very high
values (and thus can't wait for the configured times to pass) having to restart
the service every time they make a change related to permissions or
prepared_statements cache(e.g.: Adding a user) can become pretty annoying.
This patch make permissions_validity_in_ms, permissions_update_interval_in_ms
and permissions_cache_max_entries live updateable so that restarting the
service is not necessary anymore for these cases.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
This patch adds an update_config method in order to allow live updating the
config for permissions_cache. This method is going to be used in the next
patches after making permissions_cache config live updateable.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
This patch renames the permissions_cache_config struct to loading_cache_config
and moves it to utils/loading_cache.hh. This will make it easier to handle
config updates to the authorization caches on the next patches
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
- Use `sstables::generation_type` in more places
- Enforce conceptual separation of `sstables::generation_type` and `int64_t`
- Fix `extremum_tracker` so that `sstables::generation_type` can be non-default-constructible
Fixes#10796.
Closes#10844
* github.com:scylladb/scylla:
sstables: make generation_type an actual separate type
sstables: use generation_type more soundly
extremum_tracker: do not require default-constructible value types
chunked_vector was headed by short comment which didn't really explain
why it exists and how and why it really differs from std::dequeue.
Moreover, it made the vague claim that it "limits" contiguous
allocations, which it really doesn't (at least not in the asymptotic
sense).
In this patch I wrote a much longer comment, which I hope will clearly
explain exactly what chunked_vector is, how it really differs in its
contiguous allocations from std::deque, and what it guarantees and
doesn't guarantee.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#10857
This patch adds a reset method which is going to be used in the next patches
for updating the loadind_cache config and also to be able to flush the cache
without having to update scylla config
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
If the len2 argument to crc32_combine() is zero, then the crc2
argument must also be zero.
fast_crc32_combine() explicitly checks for len2==0, in which case it
ignores crc2 (which is the same as if it were zero).
zlib's crc32_combine() used to have that check prior to version
1.2.12, but then lost it, making its necessary for callers to be more
careful.
Also add the len2==0 check to the dummy fast_crc32_combine()
implementation, because it delegates to zlib's.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Closes#10731
The CPU cost of iterating over the relevant ELF structures is probably
negligible (despite the amount of code involved), but there is no need
to keep the containing page mapped in RAM when it doesn't have to be.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Current code already assumes (correctly), that shrink() does not
throw, otherwise we risk leaking memory allocated in get_ptr():
```
ts_value_lru_entry* new_lru_entry = Alloc().template allocate_object<ts_value_lru_entry>();
// Remove the least recently used items if map is too big.
shrink();
```
Let's be explicit and mark shrink() and a few helper methods
that it uses as noexcept. Ultimately they are all noexcept anyway,
because polymorphic allocator's deallocation routines don't throw,
and neither do boost intrusive list iterators.
Closes#10565
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.
Closes#10562
primitive_consumer::read_bytes() destroys and creates a vector for every value it reads.
This happens for every cell.
We can save a bit of work by reusing the vector.
Closes#10512
* github.com:scylladb/scylla:
sstables: consumer: reuse the fragmented_temporary_buffer in read_bytes()
utils: fragmented_temporary_buffer: add release()
This series enforces a minimum size of the unprivileged section when
performing `shrink()` operation.
When the cache is shrunk, we still drop entries first from unprivileged
section (as before this commit), however, if this section is already small
(smaller than `max_size / 2`), we will drop entries from the privileged
section.
This is necessary, as before this change the unprivileged section could
be starved. For example if the cache could store at most 50 entries and
there are 49 entries in privileged section, after adding 5 entries (that would
go to unprivileged section) 4 of them would get evicted and only the 5th one
would stay. This caused problems with BATCH statements where all
prepared statements in the batch have to stay in cache at the same time
for the batch to correctly execute.
To correctly check if the unprivileged section might get too small after
dropping an entry, `_current_size` variable, which tracked the overall size
of cache, is changed to two variables: `_unprivileged_section_size` and
`_privileged_section_size`, tracking section sizes separately.
New tests are added to check this new behavior and bookkeeping of the section
sizes. A test is added, that sets up a CQL environment with a very small
prepared statement cache, reproduces issue in #10440 and stresses the cache.
Fixes#10440.
Closes#10456
* github.com:scylladb/scylla:
loading_cache_test: test prepared stmts cache
loading_cache: force minimum size of unprivileged
loading_cache: extract dropping entries to lambdas
loading_cache: separately track size of sections
loading_cache: fix typo in 'privileged'
This patch enforces a minimum size of unprivileged section when
performing shrink() operation.
When the cache is shrank, we still drop entries first from unprivileged
section (as before this commit), however if this section is already small
(smaller than max_size / 2), we will drop entries from the privileged
section.
For example if the cache could store at most 50 entries and there are 49
entries in privileged section, after adding 5 entries (that would go to
unprivileged section) 4 of them would get evicted and only the 5th one
would stay. This caused problems with BATCH statements where all
prepared statements in the batch have to stay in cache at the same time
for the batch to correctly execute.
New tests are added to check this behavior and bookkeeping of section
sizes.
Fixes#10440.
This patch splits _current_size variable, which tracked the overall size
of cache, to two variables: _unprivileged_section_size and
_privileged_section_size.
Their sum is equal to the old _current_size, but now you can get the
size of each section separately.
lru_entry's cache_size() is replaced with owning_section_size() which
references in which counter the size of lru_entry is currently stored.
Seastar is an external library from Scylla's point of view so
we should use the angle bracket #include style. Most of the source
follows this, this patch fixes a few stragglers.
Also fix cases of #include which reached out to seastar's directory
tree directly, via #include "seastar/include/sesatar/..." to
just refer to <seastar/...>.
Closes#10433
Checking a concept in a requires-expression requires an additional
requires keyword. Moreover, the constraint is incorrect (at least
all callers pass a T, not a result<T>), so remove it.
Found by gcc 12.
Fixes the case of make_room() invoked with last_chunk_capacity_deficit
but _size not in the last reserved chunk.
Found during code review, no user impact.
Fixes#10364.
Message-Id: <20220411224741.644113-1-tgrabiec@scylladb.com>
Fixes the case of make_room() invoked with last_chunk_capacity_deficit
but _size not in the last reserved chunk.
Found during code review, no known user impact.
Fixes#10363.
Message-Id: <20220411222605.641614-1-tgrabiec@scylladb.com>
"
The database::shutdown() and ::drain() methods are called inside the
invoke_on_all()s synchronizing with each other via the cross-shard
_stop_barrier.
If either shard throws in between all others may get stuck waiting for
the barrier to collect all arrivals. To fix it the throwing shard
should wake up others, resolving the wait somehow.
The fix is actually patch #4, the first and the second are the abort()
method for the barrier itself.
Fixes: #10304
tests: unit(dev), manual
"
* 'br-barrier-exception-2' of https://github.com/xemul/scylla:
database: Abort barriers on exception
database: Coroutinize close_tables
test: Add test for cross_shard_barrier::abort()
cross-shard-barrier: Add .abort() method
If reserve() allocates more than one chunk, push_back() should not
work with the last chunk. This can result in items being pushed to the
wrong chunk, breaking internal invariants.
Also, pop_back() should not work with the last chunk. This breaks when
there is more than one chunk.
Currently, the container is only used in the sstable partition index
cache.
Manifests by crashes in sstable reader which touch sstables which have
partition index pages with more than 1638 partition entries.
Introduced in 78e5b9fd85 (4.6.0)
Fixes#10290
Message-Id: <20220407174023.527059-1-tgrabiec@scylladb.com>
The method makes all the .arrive_and_wait()s in the current phase
to resolve with barrier_aborted_exception() exceptional future.
The barrier turns into a broken state and is not supposed to serve
any subsequence arrivals anyhow reasonably.
The .abort() method is re-entrable in two senses. The first is that
more than one shard can abort a barrier, which is pretty natural.
The second is that the exception-safety fuses like that imply that
if the arrive_and_wait() resolves into exception the caller will try
to abort() the barrier as well, even though the phase would be over.
This case is also "supported".
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>