Option names given in db/config.cc are handled for the command line by passing
them to boost::program_options, and by YAML by comparing them with YAML
keys.
boost::program_options has logic for understanding the
long_name,short_name syntax, so for a "workdir,W" option both --workdir and -W
worked, as intended. But our YAML config parsing doesn't have this logic
and expected "workdir,W" verbatim, which is obviously not intended. Fix that.
Fixes#7478Fixes#9500Fixes#11503Closes#11506
(cherry picked from commit af7ace3926)
Scylla's Bloom filter implementation has a minimal false-positive rate
that it can support (6.71e-5). When setting bloom_filter_fp_chance any
lower than that, the compute_bloom_spec() function, which writes the bloom
filter, throws an exception. However, this is too late - it only happens
while flushing the memtable to disk, and a failure at that point causes
Scylla to crash.
Instead, we should refuse the table creation with the unsupported
bloom_filter_fp_chance. This is also what Cassandra did six years ago -
see CASSANDRA-11920.
This patch also includes a regression test, which crashes Scylla before
this patch but passes after the patch (and also passes on Cassandra).
Fixes#11524.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11576
(cherry picked from commit 4c93a694b7)
When cross-shard barrier is abort()-ed it spawns a background fiber
that will wake-up other shards (if they are sleeping) with exception.
This fiber is implicitly waited by the owning sharded service .stop,
because barrier usage is like this:
sharded<service> s;
co_await s.invoke_on_all([] {
...
barrier.abort();
});
...
co_await s.stop();
If abort happens, the invoke_on_all() will only resolve _after_ it
queues up the waking lambdas into smp queues, thus the subseqent stop
will queue its stopping lambdas after barrier's ones.
However, in debug mode the queue can be shuffled, so the owning service
can suddenly be freed from under the barrier's feet causing use after
free. Fortunately, this can be easily fixed by capturing the shared
pointer on the shared barrier instead of a regular pointer on the
shard-local barrier.
fixes: #11303
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#11553
Range tombstones are kept in memory (cache/memtable) in
range_tombstone_list. It keeps them deoverlapped, so applying a range
tombstone which covers many range tombstones will erase existing range
tombstones from the list. This operation needs to be exception-safe,
so range_tombstone_list maintains an undo log. This undo log will
receive a record for each range tombstone which is removed. For
exception safety reasons, before pushing an undo log entry, we reserve
space in the log by calling std::vector::reserve(size() + 1). This is
O(N) where N is the number of undo log entries. Therefore, the whole
application is O(N^2).
This can cause reactor stalls and availability issues when replicas
apply such deletions.
This patch avoids the problem by reserving exponentially increasing
amount of space. Also, to avoid large allocations, switches the
container to chunked_vector.
Fixes#11211Closes#11215
(cherry picked from commit 7f80602b01)
The logger is proof against allocation failures, except if
--abort-on-seastar-bad-alloc is specified. If it is, it will crash.
The reclaim stall report is likely to be called in low memory conditions
(reclaim's job is to alleviate these conditions after all), so we're
likely to crash here if we're reclaiming a very low memory condition
and have a large stall simultaneously (AND we're running in a debug
environment).
Prevent all this by disabling --abort-on-seastar-bad-alloc temporarily.
Fixes#11549Closes#11555
(cherry picked from commit d3b8c0c8a6)
Don't open-code calling the region_impl
_listeners->moved() in region move-constructor
and move-assignment op.
The other._impl->_region might be different then &other
post region::merge so let the region_impl
decide which region* is moved from.
The new_region is also set to region_impl->_region
so need to open-code that either in the said call sites.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The other _impl is presumed to be engaged already,
so just call other.get_impl() once for both use cases.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We can't be sure that the other_impl->_region == &other
since it could be a result of a previous merge,
so don't decide for it which region to unlisten to,
let it use its current _region.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Current ~region and region::operator= open-code
region_impl::unlisten. Just call it so it will be
easier to maintain.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
First check if _impl is engaged before accessing it
to set its _region = this in the move constructor and
move assignment operator.
Add unit test for these odd orner cases.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This series is the first step in the effort to reduce the number of metrics reported by Scylla.
The series focuses on the per-table metrics.
The combination of histograms, per-tables, and per shard makes the number of metrics in a cluster explode.
The following series uses multiple tools to reduce the number of metrics.
1. Multiple metrics should only be reported for the user tables and the condition that checked it was not updated when more non-user keyspaces were added.
2. Second, instead of a histogram, per table, per shard, it will report a summary per table, per shard, and a single histogram per node.
3. Histograms, summaries, and counters will be reported only if they are used (for example, the cas-related metrics will not be reported for tables that are not using cas).
Closes#11058
* github.com:scylladb/scylla:
Add summary_test
database: Reduce the number of per-table metrics
replica/table.cc: Do not register per-table metrics for system
histogram_metrics_helper.hh: Add to_metrics_summary function
Unified histogram, estimated_histogram, rates, and summaries
Split the timed_rate_moving_average into data and timer
utils/histogram.hh: should_sample should use a bitmask
estimated_histogram: add missing getter method
The to_metrics_summary is a helper function that create a metrics type
summary from a timed_rate_moving_average_with_summary object.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Currently, there are two metrics reporting mechanisms: the metrics layer
and the API. In most cases, they use the same data sources. The main
difference is around histograms and rate.
The API calculates an exponentially weighted moving average using a
timer that decays the average on each time tick. It calculates a
poor-man histogram by holding the last few entries (typically the last
256 entries). The caller to the API uses those last entries to build a
histogram.
We want to add summaries to Scylla. Similar to the API rate and
histogram, summaries are calculated per time interval.
This patch creates a unified mechanism by introducing an object that
would hold both the old-style histogram and the new
(estimated_histogram). On each time tick, a summary would be calculated.
In the future, we'll replace the API to report summaries instead of the
old-style histogram and deprecate the old style completely.
summary_calculator uses two estimated_histogram to calculate a summary.
timed_rate_moving_average_summary_and_histogram is a unifed class for
ihistogram, rates, summary, and estimated_histogram and will replace
timed_rate_moving_average_and_histogram.
Follow-up patches would move code from using
timed_rate_moving_average_and_histogram to
timed_rate_moving_average_summary_and_histogram. By keeping the API it
would make the transition easy.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
It was added in d20fae96a2
as a precaution not to invalidate iterators while
traversing _regions. However it is not requried as no allocation
is done on this synchronous path - therefore there is no
point in preventing reclaim.
This will allow making the respective functions const
as they merely return stats and do not modify the tracker impl.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To make the implementation inline and to prepare
for the next patch that adds a const overload of
this method.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Some methods were also marked inline when declared in the class
definition and in the ir definition site to provide a hint to
the compiler to inline them.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
dynamic_bitset allocates only when constructed.
then on it doesn't throw.
Though not that accessing bits out of range
is undefined behavior.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Maintain the const chain by returning a const segment*
from segment_from_idx() const overload.
And add a respective mutable overload to return a mutable segment*.
This is done for a similar change in idx_from_segment.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Add a const noexcept overload of `find_empty()` so that
can_allocate_more_segments can be const noexcept as well.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This patch split the timed_rate_moving_average functionality into two, a
data class: rates_moving_average, and a wrapper class
timed_rate_moving_average that uses a timer to update the rates
periodically.
To make the transition as simple as possible timed_rate_moving_average,
takes the original API.
A new helper class meter_timer was introduced to handle the timer update
functionality.
This change required minimal code adaptation in some other parts of the
code.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch fixes a bug in should_sample that uses its bitmask
incorrectly.
basic_ihistogram has a feature that allows it to sample values instead
of taking a timer each time.
To decide if it should sample or not, it uses a bitmask. The bitmask
is of the form 2^n-1, which means 1 out of 2^n will be sampled.
For example, if the mask is 0x1 (2^2-1) 1 out of 2 will be sampled.
If the mask is 0x7 (2^3-1) 1 out of 8 will be sampled.
There was a bug in the should_sampled() method.
The correct form is (value&mask) == mask
Ref #2747
It does not solve all of #2747, just the bug part of it.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
With the region heap handle removed from logalloc::region, there is
nothing remaining there that needs violation of the abstraction
boundary, so we can drop these hacks.