Move the definition of bool_class can_yield to a standalone
header file and define there a maybe_yield(can_yield) helper.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This PR allows changing the hinted_handoff_enabled option in runtime, either by modifying and reloading YAML configuration, or through HTTP API.
This PR also introduces an important change in semantics of hinted_handoff_enabled:
- Previously, hinted_handoff_enabled controlled whether _both writing and sending_ hints is allowed at all, or to particular DCs,
- Now, hinted_handoff_enabled only controls whether _writing hints_ is enabled. Sending hints from disk is now always enabled.
Fixes: #5634
Tests:
- unit(dev) for each commit of the PR
- unit(debug) for the last commit of the PR
Closes#6916
* github.com:scylladb/scylla:
api: allow changing hinted handoff configuration
storage_proxy: fix wrong return type in swagger
hints_manager: implement change_host_filter
storage_proxy: always create hints manager
config: plug in hints::host_filter object into configuration
db/hints: introduce host_filter
hints/resource_manager: allow registering managers after start
hints: introduce db::hints::directory_initializer
directories.cc: prepare for use outside main.cc
std::iterator is deprecated since C++17 so define all the required iterator_traits directly and stop using std::iterator at all.
More context: https://www.fluentcpp.com/2018/05/08/std-iterator-deprecated
Tests: unit(dev)
Closes#7635
* github.com:scylladb/scylla:
log_heap: Remove std::iterator from hist_iterator
types: Remove std::iterator from tuple_deserializing_iterator
types: Remove std::iterator from listlike_partial_deserializing_iterator
sstables: remove std::iterator from const_iterator
token_metadata: Remove std::iterator from tokens_iterator
size_estimates_virtual_reader: Remove std::iterator
token_metadata: Remove std::iterator from tokens_iterator_impl
counters: Remove std::iterator from iterators
compound_compat: Remove std::iterator from iterators
compound: Remove std::iterator from iterator
clustering_interval_set: Remove std::iterator from position_range_iterator
cdc: Remove std::iterator from collection_iterator
cartesian_product: Remove std::iterator from iterator
bytes_ostream: Remove std::iterator from fragment_iterator
scoped_critical_alloc_section was recently introduced to replace
disable_failure_guard and made the old class deprecated.
This patch replaces all occurences of disable_failure_guard with
scoped_critical_alloc_section.
Without this patch the build prints many warnings like:
warning: 'disable_failure_guard' is deprecated: Use scoped_critical_section instead [-Wdeprecated-declarations]
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <ca2a91aaf48b0f6ed762a6aa687e6ac5e936355d.1605621284.git.piotr@scylladb.com>
Currently, the `directories` class is used exclusively during
initialization, in the main() function. This commit refactors this class
so that it is possible to use it to initialize directories much later
after startup.
The intent of this change is to make it possible for hints manager to
create directories for hints lazily. Currently, when Scylla is booted
with hinted handoff disabled, the `hints_directory` config parameter is
ignored and directories for hints are neither created nor verified.
Because we would like to preserve this behavior and introduce
possibility to switch hinted handoff on in runtime, the hints
directories will have to be created lazily the first time hinted handoff
is enabled.
meaningful
When computing moving average rates too early after startup, the
rate can be infinite, this is simply because the sample interval
since the system started is too small to generate meaningful results.
Here we check for this situation and keep the rate at 0 if it happens
to signal that there are still no meaningful results.
This incident is unlikely to happen since it can happen only during a
very small time window after restart, so we add a hint to the compiler
to optimize for that in order to have a minimum impact on the normal
usecase.
Fixes#4469
"
Introduce a gentle (yielding) implementation of reserve for chunked
vector and use it when reserving the backing storage vector for large
bitset. Large bitset is used by bloom filters, which can be quite large
and have been observed to cause stalls when allocating memory for the
storage.
Fixes: #6974
Tests: unit(dev)
"
* 'gentle-reserve/v1' of https://github.com/denesb/scylla:
utils/large_bitset: use reserve_partial() to reserve _storage
utils/chunked_vector: add reserve_partial()
To avoid stalls when reserving memory for a large bloom filter. The
filter creation already has a yielding loop for initialization, this
patch extends it to reservation of memory too.
A variant of reserve() which allows gentle reserving of memory. This
variant will allocate just one chunk at a time. To drive it to
completion, one should call it repeatedly with the return value of the
previous call, until it returns 0.
This variant will be used in the next patch by the large bitset creation
code, to avoid stalls when allocating large bloom filters (which are
backed by large bitset).
This is the continuation of 30722b8c8e, so let me re-cite Rafael:
The constructors of these global variables can allocate memory. Since
the variables are thread_local, they are initialized at first use.
There is nothing we can do if these allocations fail, so use
disable_failure_guard.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20201028140553.21709-1-xemul@scylladb.com>
Currently, we linearize large UTF8 cells in order to validate them.
This can cause large latency spikes if the cell is large.
This series changes UTF8 validation to work on fragmented buffers.
This is somewhat tricky since the validation routines are optimized
for single-instruction-multiple-data (SIMD) architectures.
The unit tests are expanded to cover the new functionality.
Fixes#7448.
Closes#7449
* github.com:scylladb/scylla:
types: don't linearize utf8 for validation
test: utf8: add fragmented buffer validation tests
utils: utf8: add function to validate fragmented buffers
utils: utf8: expose validate_partial() in a header
utils: utf8: introduce validate_partial()
utils: utf8: extract a function to evaluate a single codepoint
Add a function to validate fragmented buffers. We validate
each buffer with SIMD-optimized validate_partial(), then
collect the codepoint that spans buffer boundaries (if any)
in a temporary buffer, validate that too, and continue.
The current validators expect the buffer to contain a full
UTF-8 string. This won't be the case for fragmented buffers,
since a codepoint can straddle two (or more) buffers.
To prepare for that, convert the existing validators to
validate_partial(), which returns either an error, or
success with an indication of the size of the tail that
was not validated and now many bytes it is missing.
This is natural since the SIMD validators already
cannot process a tail in SIMD mode if it's smaller than
the vector size, so only minor rearrangements are needed.
In addition, we now have validate_partial() for non-SIMD
architectures, since we'll need it for fragmented buffer
validation.
Our SIMD optimized validators cannot process a codepoint that
spans multiple buffers, and adapting them to be able to will slow
them down. So our strategy is to special-case any codepoint that
spans two buffers.
To do that, extract an evaluate_codepoint() function from the
current validate_naive() function. It returns three values:
- if a codepoint was successfully decoded from the buffer,
how many bytes were consumed
- if not enough bytes were in the buffer, how many more
are needed
- otherwise, an error happened, so return an indication
The new function uses a table to calculate a codepoint's
size from its first byte, similar to the SIMD variants.
validate_naive() is now implemented in terms of
evaluate_codepoint().
The input range to utils::to_range() should be indeed a range,
but clang has trouble compiling <ranges> which causes it to fail.
Relax the constraint until this is fixed.
Fixes#7424
AWS sdk (kinesis) assumes SequenceNumbers are monotonically
growing bigints. Since we sort on and use timeuuids are these
a "raw" bit representation of this will _not_ fulfill the
requirement. However, we can "unwrap" the timestamp of uuid
msb and give the value as timestamp<<64|lsb, which will
ensure sort order == bigint order.
Currently, the serialized_action error is set to a shared_promise,
but is not returned to the caller, unless there is an
already outstanding action.
Note that setting the exception to the promise when noone
collected it via the shared_future caused 'Exceptional future ignored'
warning to be issued, as seen in #7352.
Fixes#7352
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, if `with_semaphore` returns exceptional future, it is not
propagated to the promise, and other waiters that got a shared
future will not see that.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This utility function converts a potentially large number to a compact
representation, composed of at most 4 digits and a letter appropriate to
the power of two the number has to multiplied with to arrive to the
original number (with some loss of precision).
The different powers of two are the conventional 2 ** (N * 10) variants:
* N=0: (B)ytes
* N=1: (K)bytes
* N=2: (M)bytes
* N=3: (G)bytes
* N=4: (T)bytes
Examples:
* 87665 will be converted to 87K
* 1024 will be converted to 1K
Clang has some difficulty with the boost::cpp_int constructor from string_view.
In fact it is a mess of enable_if<>s so a human would have trouble too.
Work around it by converting to std::string. This is bad for performance, but
this constructor is not going to be fast in any case.
Hopefully a fix will arrive in clang or boost.
Closes#7389
get_nt_build_id() constructs a pointer by adding a base and an
offset, but if the base happens to be zero, that is undefined
under C++ rules (altough legal ELF).
Fix by performing the addition on integers, and only then
casting to a pointer.
Closes#7379
b1e78313fe added a check for ubsan to squelch a false positive,
but that check doesn't work with clang. Relax it to check for debug
mode, so clang doesn't hit the same false positive as gcc did.
Define a SANITIZE macro so we have a reliable way to detect if
we're running with a sanitizer.
Closes#7372
Our AVX2 implementation cannot load a partial vector,
or mask unused elements (that can be done with AVX-512/SVE2),
so it has some restrictions. Document them.
Closes#7385
Offsetting a null pointer is undefined, and clang's ubsan complains.
Rearrange the arithmetic so we never offset a null pointer. A function
is introduced for the remaining contiguous bytes so it can cast the result
to size_t, avoiding a compare-of-different-signedness warning from gcc.
Closes#7373
Clang has a bug processing inline ifuncs with intrinsics[1].
Since ifuncs can't be inlined anyway (they are always dispatched
via a function pointer that is determined based on the CPU
features present), nothing is gained by inlining them. Deinlining
therefore reduces compile time and works around the clang bug.
[1] https://bugs.llvm.org/show_bug.cgi?id=47691Closes#7358
We have templates for multiprecision_int for both sides of the operator,
for example:
template <typename T>
bool operator==(const T& x) const
and
template <typename T>
friend bool operator==(const T& x, const multiprecision_int& y)
Clang considers them equally satisfying when both operands are
multiprecision_int, so provide a disambiguating overload.
Clang dislikes forward-declared functions returning auto, so declare the
type up front. Functions returning auto are a readability problem
anyway.
To solve a circular dependency problem (get_local_injector() ->
error_injection<> -> get_local_injector()), which is further compounded
by problems in using template specializations before they are defined
(which is forbidden), the storage for get_local_injector() was moved
to error_injection<>, and get_local_injector() is just an accessor.
After this, error_injection<> does not depend on get_local_injector().
"
Migration manager installs several cluster feature change listeners.
The listeners will call update_schema_version_and_announce() when cluster
features are enabled, which does this:
return update_schema_version(proxy, features).then([] (utils::UUID uuid) {
return announce_schema_version(uuid);
});
It first updates the schema version and then publishes it via
gossip in announce_schema_version(). It is possible that the
announce_schema_version() part of the first schema change will be
deferred and will execute after the other four calls to
update_schema_version_and_announce(). It will install the old schema
version in gossip instead of the more recent one.
The fix is to serialize schema digest calculation and publishing.
Refs #7200
This problem also brought my attention to initialization code, which could be
prone to the same problem.
The storage service computes gossiper states before it starts the
gossiper. Among them, node's schema version. There are two problems with that.
First is that computing the schema version and publishing it is not
atomic, so is not safe against concurrent schema changes or schema
version recalculations. It will not exclude with
recalculate_schema_version() calls, and we could end up with the old
(and incorrect) schema version being advertised in gossip.
Second problem is that we should not allow the database layer to call
into the gossiper layer before it is fully initialized, as this may
produce undefined behavior.
Maybe we're not doing concurrent schema changes/recalculations now,
but it is easy to imagine that this could change for whatever reason
in the future.
The solution for both problems is to break the cyclic dependency
between the database layer and the storage_service layer by having the
database layer not use the gossiper at all. The database layer
publishes schema version inside the database class and allows
installing listeners on changes. The storage_service layer asks the
database layer for the current version when it initializes, and only
after that installs a listener which will update the gossiper.
Tests:
- unit (dev)
- manual (3 node ccm)
"
* tag 'fix-schema-digest-calculation-race-v1' of github.com:tgrabiec/scylla:
db, schema: Hide update_schema_version_and_announce()
db, storage_service: Do not call into gossiper from the database layer
db: Make schema version observable
utils: updateable_value_source: Introduce as_observable()
schema: Fix race in schema version recalculation leading to stale schema version in gossip
The log-structured allocator (LSA) reserves memory when performing
operations, since its operations are performed with reclaiming disabled
and if it runs out, it cannot evict cache to gain more. The amount of
memory to reserve is remembered across calls so that it does not have
to repeat the fail/increase-reserve/retry cycle for every operation.
However, we currently lack decaying the amount to reserve. This means
that if a single operation increased the reserve in the distant past,
all current operations also require this large reserve. Large reserves
are expensive since they can cause large amounts of cache to be evicted.
This patch adds reserve decay. The time-to-decay is inversely proportional
to reserve size: 10GB/reserve. This means that a 20MB reserve is halved
after 500 operations (10GB/20MB) while a 20kB reserve is halved after
500,000 operations (10GB/20kB). So large, expensive reserves are decayed
quickly while small, inexpensive reserves are decayed slowly to reduce
the risk of allocation failures and exceptions.
A unit test is added.
Fixes#325.
Add new validate_with_error_position function
which returns -1 if data is a valid UTF-8 string
or otherwise a byte position of first invalid
character. The position is added to exception
messages of all UTF-8 parsing errors in Scylla.
validate_with_error_position is done in two
passes in order to preserve the same performance
in common case when the string is valid.
... and tests. Printin a pointer in logs is considered to be a bad practice,
so the proposal is to keep this explicit (with fmt::ptr) and allow it for
.debug and .trace cases.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
The messaging service is (as many other services) present in
the global namespace and is widely accessed from where needed
with global get(_local)?_messaging_service() calls. There's a
long-term task to get rid of this globality and make services
and componenets reference each-other and, for and due-to this,
start and stop in specific order. This set makes this for the
messaging service.
The service is very low level and doesn't depend on anything.
It's used by gossiper, streaming, repair, migration manager,
storage proxy, storage service and API. According to this
dependencies the set consists of several parts:
patches 1-9 are preparatory, they encapsulate messaging service
init/fini stuff in its own module and decouple it from the
db::config
patch 10-12 introduce local service reference in main and set
its init/fini calls at the early stage so that this reference
can later be passed to those depending on it
patches 13-42 replace global referencing of messaging service
from other subsystems with local references initialized from
main.
patch 43 finalizes tests.
patch 44 wraps things up with removing global messaiging service
instance along with get(_local)?_messaging_service calls.
The service's stopping part is deliberately left incomplete (as
it is now), the sharded service remains alive, only the instance's
stop() method is called (and is empty for a while). Since the
messaging service's users still do not stop cleanly, its instances
should better continue leaking on exit.
Once (if) the seastar gets the helper rpc::has_handlers() method
merged the messaging_service::stop() will be able to check if all
the verbs had been unregistered (spoiler: not yet, more fixes to
come).
For debugging purposes the pointer on now-local messaging service
instance is kept in service::debug namespace.
tests: unit(dev)
dtest(dev: simple_boot_shutdown, repair, update_cluster_layout)
manual start-stop
"
* 'br-unglobal-messaging-service-2' of https://github.com/xemul/scylla: (44 commits)
messaging_service: Unglobal messaging service instance
tests: Use own instances of messaging_service
storage_service: Use local messaging reference
storage_service: Keep reference on sharded messaging service
migration_manager: Add messaging service as argument to get_schema_definition
migration_manager: Use local messaging reference in simple cases
migration_manager: Keep reference on messaging
migration_manager: Make push_schema_mutation private non-static method
migration_manager: Move get_schema_version verb handling from proxy
repair: Stop using global messaging_service references
repair: Keep sharded messaging service reference on repair_meta
repair: Keep sharded messaging service reference on repair_info
repair: Keep reference on messaging in row-level code
repair: Keep sharded messaging service in API
repair: Unset API endpoints on stop
repair: Setup API endpoints in separate helper
repair: Push the sharded<messaging_service> reference down to sync_data_using_repair
repair: Use existing sharded db reference
repair: Mark repair.cc local functions as static
streaming: Keep messaging service on send_info
...
Contains patch from Rafael to fix up includes.
* seastar c872c3408c...7f7cf0f232 (9):
> future: Consider result_unavailable invalid in future_state_base::ignore()
> future: Consider result_unavailable invalid in future_state_base::valid()
> Merge "future-util: split header" from Benny
> docs: corrected some text and code-examples in streaming-rpc docs
> future: Reduce nesting in future::then
> demos: coroutines: include std-compat.hh
> sstring: mark str() and methods using it as noexcept
> tls: Add an assert
> future: fix coroutine compilation
Now the init_messaging_service() only deals with messaing service
and related internal stuff, so it can sit in its own module.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
`count` function was often used in various ways.
`contains` does not only express the intend of the code better but also
does it in more unified way.
This commit replaces all the occurences of the `count` with the
`contains`.
Tests: unit(dev)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>