Commit Graph

434 Commits

Author SHA1 Message Date
Tomasz Grabiec
8d69d217af lsa: Guarantee invalidated references on allocating section retry
There is existing code (e.g. use of partition_snapshot_row_cursor in
cache_streamed_mutation) which assumes that references will be
invalidated when bad_alloc is thrown from allocating_section. That is
currently the case because on retry we will attempt memory reclamation
which will invalidate references either through compaction or
eviction. Make this guarantee explicit.
2017-11-13 20:55:13 +01:00
Daniel Fiala
ce2f010859 utils/big_decimal: Added necessary operators and methods for aggregate functions.
Signed-off-by: Daniel Fiala <daniel@scylladb.com>
2017-11-12 15:51:29 +01:00
Tomasz Grabiec
5348d9f596 managed_bytes: Declare copy constructor as allocation point
Because of the small size optimization, not all copies will call the
allocator, so allocation failure injection may miss this site if the
value is not large enough. Make the testing more effective by marking
this place explicitly as an allocation point.
2017-11-07 15:33:24 +01:00
Tomasz Grabiec
34ccf234ea Integrate with allocation failure injection framework 2017-11-07 15:33:24 +01:00
Calle Wilund
287b6fd8bd config_file: Add optional "error_handler" to yaml parse functions
Allowing parse errors / unknown options to be ignored.
2017-11-06 09:53:05 +00:00
Duarte Nunes
044b8deae4 Merge 'Solves problems related to gossip which can be observed in a large cluster' from Tomasz
"The main problem fixed is slow processing of application state changes.
This may lead to a bootstrapping node not having up to date view on the
ring, and serve incorrect data.

Fixes #2855."

* tag 'tgrabiec/gossip-performance-v3' of github.com:scylladb/seastar-dev:
  gms/gossiper: Remove periodic replication of endpoint state map
  gossiper: Check for features in the change listener
  gms/gossiper: Replicate changes incrementally to other shards
  gms/gossiper: Document validity of endpoint_state properties
  storage_service: Update token_metadata after changing endpoint_state
  gms/gossiper: Process endpoints in parallel
  gms/gossiper: Serialize state changes and notifications for given node
  utils/loading_shared_values: Allow Loader to return non-future result
  gms/gossiper: Encapsulate lookup of endpoint_state
  storage_service: Batch token metadata and endpoint state replication
  utils/serialized_action: Introduce trigger_later()
  gossiper: Add and improve logging
  gms/gossiper: Don't fire change listeners when there is no change
  gms/gossiper: Allow parallel apply_state_locally()
  gms/gossiper: Avoid copies in endpoint_state::add_application_state()
  gms/failure_detector: Ignore short update intervals
2017-10-18 10:13:25 +01:00
Duarte Nunes
c468e59817 Merge 'Extract config file mechanism + allow additional' from Calle
"Extracts the yaml/boost-po aspects of the "self-describing" db::config
into an abstract type.

db::config is then reimplemented in said type, removing some of the
slightly cumbersome entanglement with seastar opts (log).

Adds a main hook for additional configuration files (options + file)"

* 'calle/config' of github.com:scylladb/seastar-dev:
  main/init: Add registerable configuration objects
  db::config: Re-implement on utils/config_file.
  utils::config_file: Abstract out config file to external type
2017-10-18 09:50:53 +01:00
Tomasz Grabiec
f7a7e97095 utils/loading_shared_values: Allow Loader to return non-future result 2017-10-18 08:49:52 +02:00
Tomasz Grabiec
2e2ae4671e utils/serialized_action: Introduce trigger_later()
Can be used instead of trigger() to improve batching.
2017-10-18 08:49:52 +02:00
Calle Wilund
05db87e068 utils::config_file: Abstract out config file to external type
Handling all the boost::commandline + YAML stuff.
This patch only provides an external version of these functions,
it does not modify the db::config object. That is for a follow-up
patch.
2017-10-18 00:51:41 +00:00
Pekka Enberg
ae92055b52 Merge "Bring histogram closer to what Prometheus expects" from Glauber
"Histograms are a native prometheus type, and there are many functions
available that operate on them. There is extensive documentation about
them at https://prometheus.io/docs/practices/histograms/

One example is the function histogram_quantile(), that can extract
useful quantiles from the histograms. Currently, those functions don't
work well.

The reasons are twofold:
1) We are only exporting 16 metrics, starting from 1usec. That means
that the highest latency we can differentiate is 4ms. After that,
everything falls into the same bin.

2) The format that prometheus expects is that each bin will contain
the total number of points seen *up until that bin*, while we
currently export the total number of points that falls between bins.
IOW, it is a cummulative histogram.

About point two, granted it is a bit hidden in their website, but it is
there. The following phrase about a caveat make it clear:

"Note that we divide the sum of both buckets. The reason is that the
histogram buckets are cumulative. The le="0.3" bucket is also contained
in the le="1.2" bucket; dividing it by 2 corrects for that."

It is also not needed to accumulate things that fall over the last bin:
the _count component of the histogram will already account for that."

Acked-by: Amnon Heiman <amnon@scylladb.com>
Acked-by: Gleb Natapov <gleb@scylladb.com>

* 'prometheus-histograms' of github.com:glommer/scylla:
  storage_proxy: change reporting of estimated histograms
  estimated_histogram: bring histogram closer to what prometheus expects.
2017-10-17 20:23:10 +03:00
Tomasz Grabiec
68fe1a5bee utils/loading_cache: Fix compilation on older compilers
Message-Id: <1507728312-10585-1-git-send-email-tgrabiec@scylladb.com>
2017-10-12 14:55:34 +03:00
Botond Dénes
d2b294dc06 loading_cache: prepend this-> to method calls on captured this
To make gcc 6.3 happy.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <849402e20a1ffa6f603eff4fe295981a94b9ca79.1507282527.git.bdenes@scylladb.com>
2017-10-06 12:09:34 +02:00
Vlad Zolotarov
1394e781be utils + cql3: use a functor class instead of std::function
Define value_extractor_fn as a functor class instead of std::function.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1507137724-2408-2-git-send-email-vladz@scylladb.com>
2017-10-05 15:29:51 +01:00
Glauber Costa
fc4416abcc estimated_histogram: bring histogram closer to what prometheus expects.
Histograms are a native prometheus type, and there are many functions
available that operate on them. There is extensive documentation about
them at https://prometheus.io/docs/practices/histograms/

One example is the function histogram_quantile(), that can extract
useful quantiles from the histograms. Currently, those functions don't
work well.

The reasons are twofold:
1) We are only exporting 16 metrics, starting from 1usec. That means
   that the highest latency we can differentiate is 4ms. After that,
   everything falls into the same bin.

2) The format that prometheus expects is that each bin will contain
   the total number of points seen *up until that bin*, while we
   currently export the total number of points that falls between bins.
   IOW, it is a cummulative histogram.

About point two, granted it is a bit hidden in their website, but it is
there. The following phrase about a caveat make it clear:

 "Note that we divide the sum of both buckets. The reason is that the
  histogram buckets are cumulative. The le="0.3" bucket is also contained
  in the le="1.2" bucket; dividing it by 2 corrects for that."

It is also not needed to accumulate things that fall over the last bin:
the _count component of the histogram will already account for that.

This patch changes the histogram format to be more in line with what
prometheus expect.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-10-04 20:01:13 -04:00
Calle Wilund
801ee44cb8 class_registrator: Fix qualified name matching + provider helpers
Should not assume namespace "org", nor should we allow "loose"
substring matching.
2017-10-04 12:43:42 +02:00
Calle Wilund
3c509e0333 class_registrator: Allow different return types
Allows registry to give back, for example, shared_ptr etc instead of
solely unique_ptr. If a registry is defined with seastar/std
shared/lw_shared/unique_ptr as "BaseType", the type will assume
this is the intended result type.
2017-10-04 12:43:42 +02:00
Paweł Dziepak
fdfa6703c3 Merge "loading_shared_values and size limited and evicting prepared statements cache" from Vlad
"
The original motivation for the "utils: introduce a loading_shared_values" series was a hinted handoff work where
I needed an on-demand asynchronously loading key-value container (a replica address to a commitlog instance map).

It turned out that we already have the classes that do almost what I needed:
   - utils::loading_cache
   - sstables::shared_index_lists

Therefore it made sense to find a common ground, unify this functionality and reuse the code both in the classes above and in the
new hinted handoff code.

This series introduces the utils::loading_shared_values that generalizes the sstables::shared_index_lists
API on top of bi::unordered_set with the rehashing logic from the utils::loading_cache triggered by an addition
of an entry to the set (PATCH1).

Then it reworks the sstables::shared_index_lists and utils::loading_cache on top of the new class (PATCH2 and PATCH3).

PATCH4 optimizes the loading_cache for the long timer period use case.

But then we have discovered that we have another "customer" for the loading_cache. Apparently our prepared statements cache
had a birth flaw - it was unlimited in size - unless the corresponding keyspace and/or table are modified/dropped the entries
are never evicted. We clearly need to limit its size and it would also make sense to evict the cache entries that haven't been
used long enough.

This seems like a perfect match for a utils::loading_cache except for prepared statements don't need to be reloaded after
they are created.

Patches starting from PATCH5 are dealing with adding the utils::loading_cache the missing functionality (like making the "reloading"
conditional and adding the synchronous methods like find(key)) and then transitioning the CQL and Thrift prepared statements
caches to utils::loading_cache.

This also fixes #2474."

* 'evict_unused_prepared-v5' of https://github.com/vladzcloudius/scylla:
  tests: loading_cache_test: initial commit
  cql3::query_processor: implement CQL and Thrift prepared statements caches using cql3::prepared_statements_cache
  cql3: prepared statements cache on top of loading_cache
  utils::loading_cache: make the size limitation more strict
  utils::loading_cache: added static_asserts for checking the callbacks signatures
  utils::loading_cache: add a bunch of standard synchronous methods
  utils::loading_cache: add the ability to create a cache that would not reload the values
  utils::loading_cache: add the ability to work with not-copy-constructable values
  utils::loading_cache: add EntrySize template parameter
  utils::loading_cache: rework on top of utils::loading_shared_values
  sstables::shared_index_list: use utils::loading_shared_values
  utils: introduce loading_shared_values
2017-10-04 09:13:32 +01:00
Avi Kivity
a2f26f7b29 log_histogram: rename to log_heap
log_histogram is not really a histogram, it is a heap-like container.
Rename to log_heap in case we do want a log_histogram one day.
Message-Id: <20170916172137.30941-1-avi@scylladb.com>
2017-09-18 12:44:05 +02:00
Vlad Zolotarov
9a43398d6a utils::loading_cache: make the size limitation more strict
Ensure that the size of the cache is never bigger than the "max_size".

Before this patch the size of the cache could have been indefinitely bigger than
the requested value during the refresh time period which is clearly an undesirable
behaviour.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-09-15 20:53:11 -04:00
Vlad Zolotarov
4e72a56310 utils::loading_cache: added static_asserts for checking the callbacks signatures
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-09-15 20:53:11 -04:00
Vlad Zolotarov
a13362e74b utils::loading_cache: add a bunch of standard synchronous methods
Add a few standard synchronous methods to the cache, e.g. find(), remove_if(), etc.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-09-15 20:53:11 -04:00
Vlad Zolotarov
fa2f8162a5 utils::loading_cache: add the ability to create a cache that would not reload the values
Sometimes we don't want the cached values to be periodically reloaded.
This patch adds the ability to control this using a ReloadEnabled template parameter.

In case the reloading is not needed the "loading" function is not given to the constructor
but rather to the get_ptr(key, loader) method (currently it's the only method that is used, we may add
the corresponding get(key, loader) method in the future when needed).

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-09-15 20:53:11 -04:00
Vlad Zolotarov
a60a77dfc8 utils::loading_cache: add the ability to work with not-copy-constructable values
Current get(...) interface restricts the cache to work only with copy-constructable
values (it returns future<Tp>).
To make it able to work with non-copyable value we need to introduce an interface that would
return something like a reference to the cached value (like regular containers do).

We can't return future<Tp&> since the caller would have to ensure somehow that the underlying
value is still alive. The much more safe and easy-to-use way would be to return a shared_ptr-like
pointer to that value.

"Luckily" to us we value we actually store in a cache is already wrapped into the lw_shared_ptr
and we may simply return an object that impersonates itself as a smart_pointer<Tp> value while
it keeps a "reference" to an object stored in the cache.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-09-15 20:53:11 -04:00
Vlad Zolotarov
c24d85f632 utils::loading_cache: add EntrySize template parameter
Allow a variable entry size parameter.
Provide an EntrySize functor that would return a size for a
specific entry.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-09-15 20:53:11 -04:00
Vlad Zolotarov
6024014f92 utils::loading_cache: rework on top of utils::loading_shared_values
Get rid of the "proprietary" solution for asynchronous values on-demand loading.
Use utils::loading_shared_values instead.

We would still need to maintain intrusive set and list for efficient shrink and invalidate
operations but their entry is not going to contain the actual key and value anymore
but rather a loading_shared_values::entry_ptr which is essentially a shared pointer to a key-value
pair value.

In general, we added another level of dereferencing in order to get the key value but since
we use the bi::store_hash<true> in the hook and the bi::compare_hash<true> in the bi::unordered_set
this should not translate into an additional set lookup latency.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-09-15 20:53:11 -04:00
Vlad Zolotarov
ec3fed5c4d utils: introduce loading_shared_values
This class implements an key-value container that is populated
using the provided asynchronous callback.

The value is loaded when there are active references to the value for the given key.

Container ensures that only one entry is loaded per key at any given time.

The returned value is a lw_shared_ptr to the actual value.

The value for a specific key is immediately evicted when there are no
more references to it.

The container is based on the boost::intrusive::unordered_set and is rehashed (grown) if needed
every time a new value is added (asynchronously loaded).

The container has a rehash() method that would grow or shrink the container as needed
in order to get the load factor into the [0.25, 0.75] range.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-09-15 20:53:06 -04:00
Gleb Natapov
31e803a36c storage_proxy: wire up percentile speculative read properly
Collect coordinator side read statistic per CF and use them in percentile
speculative read executor. Getting percentile from estimated_histogram
object is rather expensive, so cache it and recalculate only once per
second (or if requested percentile changes).

Fixes #2757

Message-Id: <20170911131752.27369-3-gleb@scylladb.com>
2017-09-14 10:31:26 +03:00
Gleb Natapov
0842faecef estimated_histogram: fix overflow error handling
Currently overflow values are stored in incorrect bucket (last one
instead of special "overflow" one) and percentile() function throws
if there is overflow value. The patch fixes the code to store overflow
value in corespondent bucket and makes percentile() to take it into
account instead of throwing.

Message-Id: <20170911131752.27369-2-gleb@scylladb.com>
2017-09-14 10:31:21 +03:00
Tomasz Grabiec
87be474c19 lsa: Move reclaim counter concept to allocation_strategy level
So that generic code can detect invalidation of references. Also, to
allow reusing the same mechanism for signalling external reference
invalidation.
2017-09-13 17:38:08 +02:00
Avi Kivity
d9ee2ad9f0 chunked_vector: avoid boost::small_vector with old boost versions
Apparently older boost versions have a bug resulting in a double-free
in boost::container::small_vector. Use std::vector instead.

Fixes #2748.

Tested-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20170903170207.21635-1-avi@scylladb.com>
2017-09-07 09:32:51 +03:00
Tomasz Grabiec
5d2f2bc90b lsa: Mark region::merge() as noexcept
It seems to satisfy this, and row_cache::do_update() will rely on it to simplify
error handling.

Message-Id: <1504023113-30374-1-git-send-email-tgrabiec@scylladb.com>
2017-08-29 19:17:17 +03:00
Paweł Dziepak
d5fa07f6df Merge "sstables: switch from deque<> to a custom container" from Avi
Large deques require contiguous storage, which may not be available (or may
be expensive to obtain).  Switch to new custom container instead, which allocates
less contiguous storage.

Allocation problems were observed with the summary and compression info. While
there is work to reduce compression info contiguous space use, this solves
all std::deque problems (and should not conflict with that work).

Fixes #2708

* tag '2708/v6' of https://github.com/avikivity/scylla:
  sstables: switch std::deque to chunked_vector
  tests: add test for chunked_vector
  utils: add a new container type chunked_vector
2017-08-29 11:11:01 +01:00
Avi Kivity
7234f0f0a0 utils: remove dependency on types.hh
Replace with dependency on much smaller marshal_exception.hh.
2017-08-27 15:16:21 +03:00
Avi Kivity
3ba2c0652d utils: add a new container type chunked_vector
We currently use std::deque<> for when we need large random-access containers,
but deque<> requires nr_items * sizeof(T) / 64 bytes of contiguous memory, which can
exceed our 256k fragmentation unit with large sstables.  The new
container, which is a cross between deque and vector, has much lower
limitations.

Like deque, we allocate chunks of contiguous items, but they are
128k in size instead of 512. The last chunk can be smaller to avoid
allocating 128k for a really small vector.
2017-08-26 16:44:45 +03:00
Avi Kivity
5a2439e702 main: check for large allocations
Large allocations can require cache evictions to be satisfied, and can
therefore induce long latencies. Enable the seastar large allocation
warning so we can hunt them down and fix them.

Message-Id: <20170819135212.25230-1-avi@scylladb.com>
2017-08-21 10:25:40 +03:00
Vlad Zolotarov
4b28ea216d utils::loading_cache: cancel the timer after closing the gate
The timer is armed inside the section guarded by the _timer_reads_gate
therefore it has to be canceled after the gate is closed.

Otherwise we may end up with the armed timer after stop() method has
returned a ready future.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1501603059-32515-1-git-send-email-vladz@scylladb.com>
2017-08-01 17:21:44 +01:00
Avi Kivity
3fe6731436 Merge "educe the effect of the latency metrics" from Amnon
"This series reduce that effect in two ways:
1. Remove the latency counters from the system keyspaces
2. Reduce the histogram size by limiting the maximum number of buckets and
   stop the last bucket."

Fixes #2650.

* 'amnon/remove_cf_latency_v2' of github.com:cloudius-systems/seastar-dev:
  database: remove latency from the system table
  estimated histogram: return a smaller histogram
2017-07-31 15:58:30 +03:00
Duarte Nunes
4e3232fc29 utils/log_histogram: Fix typo when calculating number of buckets
We weren't correctly calculating the number of buckets due to
returning the wrong variable.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170731094733.7746-1-duarte@scylladb.com>
2017-07-31 12:49:11 +03:00
Avi Kivity
85056f3611 log_histogram: fix constexpr-ness of log_histogram_options
1. assert() is not constexpr.
2. can't use static_assert(), because the contructor may be called in a non-constexpr
   environment; moved to log_histogram
3. pow2_rank() uses count_leading_zeros() which is not constexpr; split
   into constexpr and non-constexpr versions
4. duplicated number_of_buckets() because bucket_of() can't be constexpr due to pow2_rank
Message-Id: <20170726105444.32698-1-avi@scylladb.com>
2017-07-31 09:11:40 +01:00
Paweł Dziepak
e62403190b Merge "Introduce perf_cache_eviction test" from Tomasz
Runs appending writes to a single partition, at full speed, and a reader
which selects the head of the partition, with 100ms delay between reads.
Prints latency percentiles and some stats.

Intended to test performance at the transition from non-evicting to
evicting modes.

Currently we can see that after the transition, whole partition gets
evicted and reads constantly miss.

Sample output:

    rd/s: 10, wr/s: 135947, ev/s: 0, pmerge/s: 1, miss/s: 0, cache: 708/778 [MB], LSA: 820/910 [MB], std free: 82 [MB]

    reads : min: 149   , 50%: 179   , 90%: 1331  , 99%: 1331  , 99.9%: 1331  , max: 6866   [us]
    writes: min: 3     , 50%: 4     , 90%: 4     , 99%: 5     , 99.9%: 258   , max: 51012  [us]

    rd/s: 7, wr/s: 93354, ev/s: 9, pmerge/s: 1, miss/s: 3, cache: 0/0 [MB], LSA: 107/128 [MB], std free: 82 [MB]

    reads : min: 179   , 50%: 179   , 90%: 73457 , 99%: 73457 , 99.9%: 73457 , max: 105778 [us]
    writes: min: 3     , 50%: 4     , 90%: 4     , 99%: 5     , 99.9%: 258   , max: 105778 [us]

* tag 'tgrabiec/row-eviction-perf-test' of github.com:scylladb/seastar-dev:
  tests: Introduce perf_cache_eviction
  tests: simple_schema: Add getter for DDL statement
  estimated_histogram: Implement percentile()
  utils: estimated_histogram: Make printable
2017-07-28 09:49:22 +01:00
Tomasz Grabiec
6a3703944b utils: Introduce serialized_action 2017-07-27 20:08:21 +02:00
Tomasz Grabiec
5602be72fa estimated_histogram: Implement percentile() 2017-07-27 17:19:07 +02:00
Tomasz Grabiec
1bc305ed7b utils: estimated_histogram: Make printable 2017-07-27 17:19:03 +02:00
Amnon Heiman
1b05f23d12 estimated histogram: return a smaller histogram
The current histogram contains 91 buckets, this is a very high
resolution with a high upper limit.

To reduce traffic passed, between scylla and the prometheus, this patch
generate a smaller histogram.

It limit the number of buckets (16 by default), set a lower limit to the
lowest bucket, and uses 2 as the bucket coeficient.

Highest empty buckets will not be reported.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

estimated histogram
2017-07-27 11:41:10 +03:00
Vlad Zolotarov
9adabd1bc4 utils::loading_cache: add stop() method
loading_cache invokes a timer that may issue asynchronous operations
(queries) that would end with writing into the internal fields.

We have to ensure that these operations are over before we can destroy
the loading_cache object.

Fixes #2624

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1501096256-10949-1-git-send-email-vladz@scylladb.com>
2017-07-26 21:28:49 +02:00
Vlad Zolotarov
76ea74f3fd utils::loading_cache: arm the timer with a period equal to min(_expire, _update)
Arm the timer with a period that is not greater than either the permissions_validity_in_ms
or the permissions_update_interval_in_ms in order to ensure that we are not stuck with
the values older than permissions_validity_in_ms.

Fixes #2590

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-07-13 10:48:59 -04:00
Vlad Zolotarov
121e3c7b8f utils::loading_cache: make a timer use a loading_cache_clock_type clock as a source
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-07-13 10:42:12 -04:00
Botond Dénes
b1082641f9 Make sure keyspace strategy class is stored in qualified form
Even when it's provided in unqualified (short) form.
Fixes #767

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4379f8864843e64c097d432fd06129ce4025f100.1499322476.git.bdenes@scylladb.com>
2017-07-06 14:50:00 +03:00
Duarte Nunes
d157e4558a utils/log_histogram: Remove largest() function
It should never have existed in the first place, as there are no
legitimate callers and it can be misused.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170630095939.2429-1-duarte@scylladb.com>
2017-07-02 14:29:17 +03:00