This patchset allows changing the configuration at runtime, The user
triggers this by editing the configuration file normally, then
signalling the database with SIGHUP (as is traditional).
The implementation is somewhat complicated due the need to store
non-atomic mutable state per-shard and to synchronize the values in
all shards. This is somewhat similar to Seastar's sharded<>, but that
cannot be used since the configuration is read before Seastar is
initialized (due to the need to read command-line options).
Tests: unit (dev, debug), manual test with extra prints (dev)
Ref #2689Fixes#2517.
Replace the per-shard value we store with an updateable_value_source, which
allows updating it dynamically and allows users to track changes.
The broadcast_to_all_shards() function is augmented to apply modifications
when called on a live system.
Since some of our values are not atomic (strings) and the administrative
information needed to track references to values is also not atomic, we will
need to store them per-shard. To do that we add a vector of per-shard data
to config_file, where each element is itself a vector of configuration items.
Since we need to operate generically on items (copying them from shard to shard)
we store them in a type-erased form.
Only mutable state is stored per-shard.
The updateable_value and updateable_value_source classes allow broadcasting
configuration changes across the application. The updateable_value_source class
represents a value that can be updated, and updateable_value tracks its source
and reflects changes. A typical use replaces "uint64_t config_item" with
"updateable_value<uint64_t> config_item", and from now on changes to the source
will be reflected in config_item. For more complicated uses, which must run some
callback when configuration changes, you can also call
config_item.observe(callback) to be actively notified of changes.
Currently, we allow adjusting configuration via
cfg.whatever() = 5;
by returning a mutable reference from cfg.whatever(). Soon, however, this operation
will have side effects (updating all references to the config item, and triggering
notifiers). While this can be done with a proxy, it is too tricky.
Switch to an ordinary setter interface:
cfg.whatever.set(5);
Because boost::program_options no longer gets a reference to the value to be written
to, we have to move the update to a notifier, and the value_ex() function has to
be adjusted to infer whether it was called with a vector type after it is
called, not before.
config_file and db::config are soon not going to be copyable. The reason is that
in order to support live updating, we'll need per-shard copies of each value,
and per-shard tracking of references to values. While these can be copied, it
will be an asycnronous operation and thus cannot be done from a copy constructor.
So to prepare for these changes, replace all copies of db::config by references
and delete config_file's copy constructor.
Some existing references had to be made const in order to adapt the const-ness
of db::config now being propagated (rather than being terminated by a non-const
copy).
This change aligns descriptors and values to 8 bytes so that poisoning
a descriptor or value doesn't interfere with other descriptors and
values.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
pow2_rank is undefined for 0.
bucket_of currently works around that by using a bitmask of 0.
To allow asserting that count_{leading,trailing}_zeros are not
called with 0, we want to avoid it at all call sites.
Fixes#4153
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190623162137.2401-1-bhalevy@scylladb.com>
With this patch, when using asan, we poison segment memory that has
been allocated from the system but should not be accessible to user
code.
Should help with debugging user after free bugs.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190607140313.5988-1-espindola@scylladb.com>
A lot of code in scylla is only reachable if SEASTAR_DEFAULT_ALLOCATOR
is not defined. In particular, refill_emergency_reserve in the default
allocator case is empty, but in the seastar allocator case it compacts
segments.
I am trying to debug a crash that seems to involve memory corruption
around the lsa allocator, and being able to use a debug build for that
would be awesome.
This patch reduces the differences between the two cases by having a
common segment_pool that defers only a few operations to different
segment_store implementations.
Tests: unit (debug, dev)
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190606020937.118205-1-espindola@scylladb.com>
The _make_config_values macro reduces duplication (both the item name
and the types need to be available as C++ identifiers and as runtime
strings), but is hard to work with. The macro is huge and editors
don't handle it well, errors aren't identified at the correct
location, and since the macro doesn't have types, it's hard to
refactor.
This series replaces the macro with ordinary C++ code. Some repetition is
introduced, but IMO the result is easier to maintain than the macro. As a
bonus the bulk of the code is moved away from the header file.
Tests: unit (dev), manual testing of the config REST API
* https://github.com/avikivity/scylla config-no-macro/v2
config: make the named_value type name available without requiring
_make_config_values
config: remove value_status from named_value template parameter list
config: add named_value::value_as_json()
api: config: stop using _make_config_values
config: auto-add named_values into config_file
config: add allowed_values parameter to named_value constructor
config: convert _make_config_values to individual named_value member
declarations and initializers
Currently, the REST API does its own conversion of named_value into json.
This requires it to use the _make_config_values macro to perform iteration
of all config items, since it needs to preserve the concrete type of the item
while iterating, so it can select the correct json conversion.
Since we want to remove that macro, we need to provide a different way to
convert a config item to json. So this patch adds a value_as_json().
To hide json_return_value from the rest of the system, we extend config_type
with a conversion function to handle the details. This usually calls
the json_return_type constructor directly, but when it doesn't have default
translation, it interposes a conversion into a type that json recognizes.
I didn't bother maintaining the existing type names, since they're C++
names which don't make sense for the UI.
The value_status is only needed at run-time, and removing it from the
template parameter list reduces type proliferation (which leads to code
bloat) and simplifies the code.
I want to remove the _make_config_values macro, but it is needed now in
api/config.cc to make the type names available. So as a first step, copy the
type names to config_src. Further changes can extract it from there.
Because we want to add more type infomation in following patches, place the type
name in a new config_type object, instead of allocating a string_view in
config_src.
compact_and_evict gets memory_to_release in bytes while
reclamation step is in segments.
Broken in f092decd90.
It doesn't make much difference with the current default step of 1
segment since we cannot reclaim less than that, so shouldn't cause
problems in practice.
Message-Id: <1556013920-29676-1-git-send-email-tgrabiec@scylladb.com>
When we start the LSA reclamation it can be that
segment_pool::_free_segments is 0 under some conditions and
segment_pool::_current_emergency_reserve_goal is set to 1. The
reclamation step is 1 segment, and compact_and_evict_locked() frees 1
segment back into the segment_pool. However,
segment_pool::reclaim_segments() doesn't free anything to the standard
allocator because the condition _free_segments >
_current_emergency_reserve_goal is false. As a result,
tracker::impl::reclaim() returns 0 as the amount of released memory,
tracker::reclaim() returns
memory::reclaiming_result::reclaimed_nothing and the seastar allocator
thinks it's a real OOM and throws std::bad_alloc.
The fix is to change compact_and_evict() to make sure that reserves
are met, by releasing more if they're not met at entry.
This change also allows us to drop the variant of allocate_segment()
which accepts the reclamation step as a means to refill reserves
faster. This is now not needed, because compact_and_evict() will look
at the reserve deficit to increase the amount of memory to reclaim.
Fixes#4445
Message-Id: <1555671713-16530-1-git-send-email-tgrabiec@scylladb.com>
When --abort-on-lsa-bad-alloc is enabled we want to abort whenever
we think we can be out of memory.
We covered failures due to bad_alloc thrown from inside of the
allocation section, but did not cover failures from reservations done
at the beginning of with_reserve(). Fix by moving the trap into
reserve().
Message-Id: <1553258915-27929-1-git-send-email-tgrabiec@scylladb.com>
Rather than {std::experimental,boost,seastar::compat}::filesystem
On Sat, 2019-03-23 at 01:44 +0200, Avi Kivity wrote:
> The intent for seastar::compat was to allow the application to choose
> the C++ dialect and have seastar follow, rather than have seastar choose
> the types and have the application follow (as in your patch).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This patch adds fragmented_temporary_buffer_view::remove_suffix(). It is
also necessary to adjust remove_prefix() since now the total size of all
fragments may be larger than the size of the view if both those
operations are performed.
"
cryptopp's config.h has the following pragma:
#pragma GCC diagnostic ignored "-Wunused-function"
It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.
This patch series introduces a single .cc file that has to include
cryptopp headers.
"
* 'avoid-cryptopp-v3' of https://github.com/espindola/scylla:
Avoid including cryptopp headers
Delete dead code
cryptopp's config.h has the following pragma:
#pragma GCC diagnostic ignored "-Wunused-function"
It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.
The issue has been reported as
https://github.com/weidai11/cryptopp/issues/793
To work around it, this patch uses a pimpl to have a single .cc file
that has to include cryptopp headers.
While at it, it also reduces the differences and code duplication
between the md5 and sha1 hashers.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This code would have be to refactored by the next patch. Since it is
commented out, just delete it.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
allocate_segment() can fail even though we're not out of memory, when
it's invoked inside an allocating section with the cache region
locked. That section may later succeed after retried after memory
reclamation.
We should ignore bad_alloc thrown inside allocating section body and
fail only when the whole section fails.
Fixes#2924
Message-Id: <1550597493-22500-1-git-send-email-tgrabiec@scylladb.com>
read_exactly(), when given a stream that does not contain the amount of data
requested, will loop endlessly, allocating more and more memory as it does, until
it fails with an exception (at which point it will release the memory).
Fix by returning an empty result, like input_stream::read_exactly() (which it
replaces). Add a test case that fails without a fix.
Affected callers are the native transport, commitlog replay, and internal
deserialization.
Fixes#4233.
Branches: master, branch-3.0
Tests: unit(dev)
Message-Id: <20190216150825.14841-1-avi@scylladb.com>
Default constructed extremum_tracker has uninitialised _default_value
which basically makes it never correct to do that. Since this class is a
mechanism and not a value it doesn't really need to be a regular type,
so let's drop the default constructor.
Message-Id: <20190207162430.7460-1-pdziepak@scylladb.com>
extremum_tracker allows choosing a default value that's going to be used
only if no "real" values were provided. Since it is never compared with
the actual input values it can be anything. For instance, if the minimum
tracker default value is 0 and there was one update with the value 1 the
detected minimum is going to be 1 (the default is ignored).
However, this doesn't work when the trackers are merged since that
process always leaves the destination tracker in the "set" state
regardless whether any of the merged trakcers has ever seen any value.
This is fixed by this patch, by properly preserving _is_set state on
merge.
When the reclaim request was satisfied from the pool there's no need
to call compact_and_evict_locked(). This allows us to avoid calling
boost::range::make_heap(), which is a tiny performance difference, as
well as some confusing log messages.
Message-Id: <1548091941-8534-1-git-send-email-tgrabiec@scylladb.com>
The compilation fails on -Warray-bounds, even though the branch is never taken:
inlined from ‘managed_bytes::managed_bytes(bytes_view)’ at ./utils/managed_bytes.hh:195:22,
inlined from ‘managed_bytes::managed_bytes(const bytes&)’ at ./utils/managed_bytes.hh:162:77,
inlined from ‘dht::token dht::bytes_to_token(bytes)’ at dht/random_partitioner.cc:68:57,
inlined from ‘dht::token dht::random_partitioner::get_token(bytes)’ at dht/random_partitioner.cc:85:39:
/usr/include/c++/8/bits/stl_algobase.h:368:23: error: ‘void* __builtin_memmove(void*, const void*, long unsigned int)’ offset 16 from the object at ‘<anonymous>’ is out of the bounds of referenced subobject ‘managed_bytes::small_blob::data’ with type ‘signed char [15]’ at offset 0 [-Werror=array-bounds]
__builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work around by disabling the diagnostic locally.
Message-Id: <1547205350-30225-1-git-send-email-tgrabiec@scylladb.com>
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.
Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.
Scylla now requires GCC 8 to compile.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
In c++17 there are standard ways of requesting aligned memory, so
seastar doesn't need to provide one.
This patch is in preparation for removing with_alignment from seastar.
Tests: unit (debug)
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190107191019.22295-1-espindola@scylladb.com>
The current implementation breaks the invariant that
_size_bytes = reduce(_fragments, &temporary_buffer::size)
In particular, this breaks algorithms that check the individual
segment size.
Correctly implement remove_suffix() by destroying superfluous
temporary_buffer's and by trimming the last one, if needed.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190103133523.34937-1-duarte@scylladb.com>
Implementation of nodetool toppartiotion query, which samples most frequest PKs in read/write
operation over a period of time.
Content:
- data_listener classes: mechanism that interfaces with mutation readers in database and table classes,
- toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this
interfaces with data_listeners and the REST api),
- REST api for toppartitions query.
Uses Top-k structure for handling stream summary statistics (based on implementation in C*, see #2811).
What's still missing:
- JMX interface to nodetool (interface customization may be required),
- Querying #rows and #bytes (currently, only #partitions is supported).
Fixes#2811
* https://github.com/avikivity/scylla rafie_toppartitions_v7.1:
top_k: whitespace and minor fixes
top_k: map template arguments
top_k: std::list -> chunked_vector
top_k: support for appending top_k results
nodetool toppartitions: refactor table::config constructor
nodetool toppartitions: data listeners
nodetool toppartitions: add data_listeners to database/table
nodetool toppartitions: fully_qualified_cf_name
nodetool toppartitions: Toppartitions query implementation
nodetool toppartitions: Toppartitions query REST API
nodetool toppartitions: nodetool-toppartitions script
Replaced std::list with chunked_vector. Because chunked_vector requires
a noexcept move constructor from its value type, change the bad_boy type
in the unit test not to throw in the move constructor.
Signed-off-by: Rafi Einstein <rafie@scylladb.com>
Validate ascii string by ORing all bytes and check if 7-th bit is 0.
Compared with original std::any_of(), which checks ascii string byte
by byte, this new approach validates input in 8 bytes and two
independent streams. Performance is much higher for normal cases,
though slightly slower when string is very short. See table below.
Speed(MB/s) of ascii string validation
+---------------+-------------+---------+
| String length | std::any_of | u64 x 2 |
+---------------+-------------+---------+
| 9 bytes | 1691 | 1635 |
+---------------+-------------+---------+
| 31 bytes | 2923 | 3181 |
+---------------+-------------+---------+
| 129 bytes | 3377 | 15110 |
+---------------+-------------+---------+
| 1039 bytes | 3357 | 31815 |
+---------------+-------------+---------+
| 16385 bytes | 3448 | 47983 |
+---------------+-------------+---------+
| 1048576 bytes | 3394 | 31391 |
+---------------+-------------+---------+
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1544669646-31881-1-git-send-email-yibo.cai@arm.com>
"
This series optimises the read path by replacing some usages of
std::vector by utils::small_vector. The motivation for this change was
an observation that memory allocation functions are pointed out by the
profiler as the ones where we spent most time and while they have a
large number of callers storage allocation for some vectors was close to
the top. The gains are not huge, since the problem is a lot of things
adding up and not a single slow thing, but we need to start with
something.
Unfortunately, the performance of boost::container::small_vector is
quite disappointing so a new implementation of a small_vector was
introduced.
perf_simple_query -c4 --duration 60, medians:
./perf_before ./perf_after diff
read 343086.80 360720.53 5.1%
Tests: unit(release, small_vector in debug)
"
* tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla:
partition_slice: use small_vector for column_ids
mutation_fragment_merger: use small_vector
auth: use small_vector in resource
auth: avoid list-initialisation of vectors
idl: serialiser: add serialiser for utils::small_vector
idl: serialiser: deduplicate vector serialisers
utils: introduce small_vector
intrusive_set_external_comparator: make iterator nothrow move constructible
mutation_fragment_merger: value-initialise iterator