scylladb

Author	SHA1	Message	Date
Avi Kivity	a2f26f7b29	log_histogram: rename to log_heap log_histogram is not really a histogram, it is a heap-like container. Rename to log_heap in case we do want a log_histogram one day. Message-Id: <20170916172137.30941-1-avi@scylladb.com>	2017-09-18 12:44:05 +02:00
Gleb Natapov	31e803a36c	storage_proxy: wire up percentile speculative read properly Collect coordinator side read statistics per CF and use them in percentile speculative read executor. Getting percentile from estimated_histogram object is rather expensive, so cache it and recalculate only once per second (or if requested percentile changes). Fixes #2757 Message-Id: <20170911131752.27369-3-gleb@scylladb.com>	2017-09-14 10:31:26 +03:00
Gleb Natapov	0842faecef	estimated_histogram: fix overflow error handling Currently overflow values are stored in incorrect bucket (last one instead of special "overflow" one) and percentile() function throws if there is overflow value. The patch fixes the code to store the overflow value in the corresponding bucket and makes percentile() take it into account instead of throwing. Message-Id: <20170911131752.27369-2-gleb@scylladb.com>	2017-09-14 10:31:21 +03:00
Tomasz Grabiec	87be474c19	lsa: Move reclaim counter concept to allocation_strategy level So that generic code can detect invalidation of references. Also, to allow reusing the same mechanism for signalling external reference invalidation.	2017-09-13 17:38:08 +02:00
Avi Kivity	d9ee2ad9f0	chunked_vector: avoid boost::small_vector with old boost versions Apparently older boost versions have a bug resulting in a double-free in boost::container::small_vector. Use std::vector instead. Fixes #2748. Tested-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20170903170207.21635-1-avi@scylladb.com>	2017-09-07 09:32:51 +03:00
Tomasz Grabiec	5d2f2bc90b	lsa: Mark region::merge() as noexcept It seems to satisfy this, and row_cache::do_update() will rely on it to simplify error handling. Message-Id: <1504023113-30374-1-git-send-email-tgrabiec@scylladb.com>	2017-08-29 19:17:17 +03:00
Paweł Dziepak	d5fa07f6df	Merge "sstables: switch from deque<> to a custom container" from Avi Large deques require contiguous storage, which may not be available (or may be expensive to obtain). Switch to new custom container instead, which allocates less contiguous storage. Allocation problems were observed with the summary and compression info. While there is work to reduce compression info contiguous space use, this solves all std::deque problems (and should not conflict with that work). Fixes #2708 * tag '2708/v6' of https://github.com/avikivity/scylla: sstables: switch std::deque to chunked_vector tests: add test for chunked_vector utils: add a new container type chunked_vector	2017-08-29 11:11:01 +01:00
Avi Kivity	7234f0f0a0	utils: remove dependency on types.hh Replace with dependency on much smaller marshal_exception.hh.	2017-08-27 15:16:21 +03:00
Avi Kivity	3ba2c0652d	utils: add a new container type chunked_vector We currently use std::deque<> for when we need large random-access containers, but deque<> requires nr_items * sizeof(T) / 64 bytes of contiguous memory, which can exceed our 256k fragmentation unit with large sstables. The new container, which is a cross between deque and vector, has much lower limitations. Like deque, we allocate chunks of contiguous items, but they are 128k in size instead of 512. The last chunk can be smaller to avoid allocating 128k for a really small vector.	2017-08-26 16:44:45 +03:00
Avi Kivity	5a2439e702	main: check for large allocations Large allocations can require cache evictions to be satisfied, and can therefore induce long latencies. Enable the seastar large allocation warning so we can hunt them down and fix them. Message-Id: <20170819135212.25230-1-avi@scylladb.com>	2017-08-21 10:25:40 +03:00
Vlad Zolotarov	4b28ea216d	utils::loading_cache: cancel the timer after closing the gate The timer is armed inside the section guarded by the _timer_reads_gate therefore it has to be canceled after the gate is closed. Otherwise we may end up with the armed timer after stop() method has returned a ready future. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1501603059-32515-1-git-send-email-vladz@scylladb.com>	2017-08-01 17:21:44 +01:00
Avi Kivity	3fe6731436	Merge "Reduce the effect of the latency metrics" from Amnon "This series reduces that effect in two ways: 1. Remove the latency counters from the system keyspaces 2. Reduce the histogram size by limiting the maximum number of buckets and stop the last bucket." Fixes #2650. * 'amnon/remove_cf_latency_v2' of github.com:cloudius-systems/seastar-dev: database: remove latency from the system table estimated histogram: return a smaller histogram	2017-07-31 15:58:30 +03:00
Duarte Nunes	4e3232fc29	utils/log_histogram: Fix typo when calculating number of buckets We weren't correctly calculating the number of buckets due to returning the wrong variable. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170731094733.7746-1-duarte@scylladb.com>	2017-07-31 12:49:11 +03:00
Avi Kivity	85056f3611	log_histogram: fix constexpr-ness of log_histogram_options 1. assert() is not constexpr. 2. can't use static_assert(), because the constructor may be called in a non-constexpr environment; moved to log_histogram 3. pow2_rank() uses count_leading_zeros() which is not constexpr; split into constexpr and non-constexpr versions 4. duplicated number_of_buckets() because bucket_of() can't be constexpr due to pow2_rank Message-Id: <20170726105444.32698-1-avi@scylladb.com>	2017-07-31 09:11:40 +01:00
Paweł Dziepak	e62403190b	Merge "Introduce perf_cache_eviction test" from Tomasz Runs appending writes to a single partition, at full speed, and a reader which selects the head of the partition, with 100ms delay between reads. Prints latency percentiles and some stats. Intended to test performance at the transition from non-evicting to evicting modes. Currently we can see that after the transition, whole partition gets evicted and reads constantly miss. Sample output: rd/s: 10, wr/s: 135947, ev/s: 0, pmerge/s: 1, miss/s: 0, cache: 708/778 [MB], LSA: 820/910 [MB], std free: 82 [MB] reads : min: 149 , 50%: 179 , 90%: 1331 , 99%: 1331 , 99.9%: 1331 , max: 6866 [us] writes: min: 3 , 50%: 4 , 90%: 4 , 99%: 5 , 99.9%: 258 , max: 51012 [us] rd/s: 7, wr/s: 93354, ev/s: 9, pmerge/s: 1, miss/s: 3, cache: 0/0 [MB], LSA: 107/128 [MB], std free: 82 [MB] reads : min: 179 , 50%: 179 , 90%: 73457 , 99%: 73457 , 99.9%: 73457 , max: 105778 [us] writes: min: 3 , 50%: 4 , 90%: 4 , 99%: 5 , 99.9%: 258 , max: 105778 [us] * tag 'tgrabiec/row-eviction-perf-test' of github.com:scylladb/seastar-dev: tests: Introduce perf_cache_eviction tests: simple_schema: Add getter for DDL statement estimated_histogram: Implement percentile() utils: estimated_histogram: Make printable	2017-07-28 09:49:22 +01:00
Tomasz Grabiec	6a3703944b	utils: Introduce serialized_action	2017-07-27 20:08:21 +02:00
Tomasz Grabiec	5602be72fa	estimated_histogram: Implement percentile()	2017-07-27 17:19:07 +02:00
Tomasz Grabiec	1bc305ed7b	utils: estimated_histogram: Make printable	2017-07-27 17:19:03 +02:00
Amnon Heiman	1b05f23d12	estimated histogram: return a smaller histogram The current histogram contains 91 buckets, which is a very high resolution with a high upper limit. To reduce the traffic passed between scylla and prometheus, this patch generates a smaller histogram. It limits the number of buckets (16 by default), sets a lower limit for the lowest bucket, and uses 2 as the bucket coefficient. The highest empty buckets will not be reported. Signed-off-by: Amnon Heiman <amnon@scylladb.com> estimated histogram	2017-07-27 11:41:10 +03:00
Vlad Zolotarov	9adabd1bc4	utils::loading_cache: add stop() method loading_cache invokes a timer that may issue asynchronous operations (queries) that would end with writing into the internal fields. We have to ensure that these operations are over before we can destroy the loading_cache object. Fixes #2624 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1501096256-10949-1-git-send-email-vladz@scylladb.com>	2017-07-26 21:28:49 +02:00
Vlad Zolotarov	76ea74f3fd	utils::loading_cache: arm the timer with a period equal to min(_expire, _update) Arm the timer with a period that is not greater than either the permissions_validity_in_ms or the permissions_update_interval_in_ms in order to ensure that we are not stuck with the values older than permissions_validity_in_ms. Fixes #2590 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-07-13 10:48:59 -04:00
Vlad Zolotarov	121e3c7b8f	utils::loading_cache: make a timer use a loading_cache_clock_type clock as a source Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-07-13 10:42:12 -04:00
Botond Dénes	b1082641f9	Make sure keyspace strategy class is stored in qualified form Even when it's provided in unqualified (short) form. Fixes #767 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4379f8864843e64c097d432fd06129ce4025f100.1499322476.git.bdenes@scylladb.com>	2017-07-06 14:50:00 +03:00
Duarte Nunes	d157e4558a	utils/log_histogram: Remove largest() function It should never have existed in the first place, as there are no legitimate callers and it can be misused. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170630095939.2429-1-duarte@scylladb.com>	2017-07-02 14:29:17 +03:00
Avi Kivity	fc966c0c4c	Merge "tombstone removal compaction" from Raphael "This feature is intended to make compaction more efficient at getting rid of droppable tombstones and expired data wasting disk space. So far, people have been dealing with it manually through major compaction. With strategies other than date tiered, large sstables will be left untouched for a long time even though it's all expired. Date tiered suffers from it when mixing data with different TTL because it only includes for compaction sstable that is fully expired. sstables keeps as metadata a histogram which allows us to easily estimate droppable data ratio from gc_before. sstables whose droppable data ratio is above 20% (default value for tombstone_threshold option) will be considered candidates for the operation. Like in C*, we will only do tombstone removal compaction when there's nothing to compact in standard way. It would be interesting to trigger it too when disk usage is above a given threshold, but I decided to leave this for later. Fixes #2306." 'tombstone_removal_compaction_v4' of github.com:raphaelsc/scylla: tests: more testing for tombstone compaction options tests: basic tombstone compaction test for date tiered compaction/dtcs: add support for tombstone compaction tests: basic test of tombstone compaction with lcs compaction/lcs: add support for tombstone compaction tests: basic tombstone compaction test for size tiered compaction/stcs: add support for tombstone compaction tests: add test for estimation of droppable tombstone ratio sstables: introduce function to estimate droppable tombstone ratio compaction_manager: periodically submit cfs for compaction streaming_histogram: fix coding style tests: add streaming_histogram_test streaming_histogram: implement sum tests: add test for sstable with bad tombstone histogram sstables: discard bad streaming histogram for future use tests: add sstable tombstone histogram test streaming_histogram: fix update streaming_histogram: move it to utils streaming_histogram: do not limit it to be used by sstables sstables: update tombstone_histogram for cells with expiration time	2017-06-29 10:19:59 +03:00
Raphael S. Carvalho	719dbf547d	streaming_histogram: fix coding style Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:08:12 -03:00
Raphael S. Carvalho	a65b9eb8b4	streaming_histogram: implement sum This function is used to estimate number of points in interval [-inf,b]. It will be useful for estimating droppable tombstone ratio in a given sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:08:12 -03:00
Raphael S. Carvalho	f35bd66da4	streaming_histogram: fix update This bug was introduced when converting java code. Return value of map::erase() was used as if it were the value of the removed entry, but it's actually the number of removed entries. update() also relies on ordered keys, so map is used instead by histogram. In addition, histograms will be written in sorted order (like C* does) such that we can detect bad histograms, using disk_array. disk_array is also used from now on to read histograms. The conversion from array to map is fine because histograms for sstables are limited to 100 elements. Coming patch will detect bad histograms (generated only by us) and discard them, because we can't rely on their information. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 01:17:26 -03:00
Tomasz Grabiec	3489c68a68	lsa: Fix performance regression in eviction and compact_on_idle Region comparator, used by the two, calls region_impl::min_occupancy(), which calls log_histogram::largest(). The latter is O(N) in terms of the number of segments, and is supposed to be used only in tests. We should call one_of_largest() instead, which is O(1). This caused compact_on_idle() to take more CPU as the number of segments grew (even when there was nothing to compact). Eviction would see the same kind of slow down as well. Introduced in `11b5076b3c`. Message-Id: <1498641973-20054-1-git-send-email-tgrabiec@scylladb.com>	2017-06-28 12:32:43 +03:00
Raphael S. Carvalho	d90f46000d	streaming_histogram: move it to utils It's not specific to sstables. May be needed somewhere else in the future. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-28 01:07:13 -03:00
Tomasz Grabiec	4b4aef789e	utils: Add helpers for dealing with nonwrapping_range<int>	2017-06-24 18:06:11 +02:00
Vlad Zolotarov	1ae40ee91a	utils::timestamped_val: fix the touch() comment The current comment has been written when the function has not been a timestamped_val member. Let's adjust it to the current code. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1495555659-10881-1-git-send-email-vladz@scylladb.com>	2017-05-26 19:26:56 +03:00
Vlad Zolotarov	0619c2cb71	utils::serialization: remove not used deserialization_xxx() functions Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1495556124-16672-1-git-send-email-vladz@scylladb.com>	2017-05-26 19:26:20 +03:00
Paweł Dziepak	3b9c0a6ae2	Merge "loading_cache: fix the known complexity issue in the shrink() method" from Vlad Use the boost::intrusive containers in order to achieve an O(1) complexity for both "LRU list" update and to minimize the memory overhead in the hash table item to "LRU list" item connection: - Make the timestamped_val be both a bi::list and a bi::unordered_set item. - Make a bi::unordered_set be a cache backend instead of the std::unordered_map. As a result dropping k LRU items becomes an O(k) operation instead of O(N log N), where N is a total number of all cached items: - Every time a value is read - move it to the front of the "LRU list" (O(1)). - When we need to remove k LRU items: - Repeat k times: - Take an element from the back of the "LRU list". (O(1)). - Remove it from the bi::unordered_set and dispose. (O(1)). We use an auto-unlink configuration for bi::list, therefore disposing an item is going to auto unlink it from the list. * 'permissions_cache_move_to_intrusive-v1' of github.com:scylladb/seastar-dev: utils::loading_cache: cleanup utils/loading_cache.hh: use intrusive list to store the lru entry utils::loading_cache: implement automatic rehashing utils::loading_cache: make the underlying map to be an intrusive unordered_set	2017-05-23 16:18:16 +01:00
Avi Kivity	fd0e1eb1e2	Merge "Fixes for mutation algebra" from Tomasz "Enforces commutativity of addition: m1 + m2 == m2 + m1 and consistency of difference and addition with equality: m1 + (m2 - m1) == m1 + m2" * tag 'tgrabiec/fix-range-tombstone-commutativity-v2' of github.com:cloudius-systems/seastar-dev: mutation: Make compare_*_for_merge() consistent with equals() tests: mutation: Improve assertion failure message tests: Use default equality in test_mutation_diff_with_random_generator mutation: Make counter cell difference consistent with apply tests: range_tombstone_list_test: Improve error message tests: range_tombstone_list: Check adjacent range merging range_tombstone_list: Merge adjacent range tombstones in apply() tests: mutation: Check commutativity of mutation addition range_tombstone_list: Avoid violating set invariant range_tombstone_list: Make tombstone merging commutative range_tombstone_list: Add erase() operation to the reverter range_tombstone_list: Make all undo operations ordered relative to each other utils: Extract to_boost_visitor() to a separate header allocating_strategy: Introduce alloc_strategy_unique_ptr<>	2017-05-23 15:20:38 +03:00
Vlad Zolotarov	2d4d198fb9	utils::loading_cache: cleanup - Remove "_" at the beginning of the type names. - s/Pred/EqualPred/ Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 23:02:18 -04:00
Vlad Zolotarov	fd59a548c0	utils/loading_cache.hh: use intrusive list to store the lru entry Fix the shrink() O(n log n) complexity issue by constantly pushing the corresponding intrusive list entry to the head of the list every time the values are read. This will keep the list ordered by the last read time from the most recently read to the least recently read entry. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 23:00:18 -04:00
Vlad Zolotarov	0c4e9efce7	utils::loading_cache: implement automatic rehashing - Start the cache with 256 buckets - the minimum number of buckets. - Limit the maximal number of buckets by 1M buckets. - Keep the load factor between 0.25 and 1.0 as long as the number of buckets is between the minimum and the maximum values mentioned above. - Grow and shrink the hash every "refresh" period if needed. - Enable bi::power_2_buckets and bi::compare_hash bi::unordered_set options. - Enable bi::unordered_set_base_hook's bi::store_hash option. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 22:57:44 -04:00
Vlad Zolotarov	2be3596a4f	utils::loading_cache: make the underlying map to be an intrusive unordered_set Make the underlying map to be a boost::intrusive::unordered_set<timestamped_val> instead of std::unordered_set<Key, timestamped_val>. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 18:45:13 -04:00
Tomasz Grabiec	5aeb9eb70c	utils: Extract to_boost_visitor() to a separate header	2017-05-22 19:30:02 +02:00
Tomasz Grabiec	69e2eccf68	allocating_strategy: Introduce alloc_strategy_unique_ptr<>	2017-05-22 19:30:02 +02:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Tomasz Grabiec	cd4d15672b	utils: estimated_histogram: Fix clear() It was a no-op. It doesn't seem currently used, but I will have a use for it soon. Message-Id: <1495198172-1969-1-git-send-email-tgrabiec@scylladb.com>	2017-05-19 14:34:34 +01:00
Vlad Zolotarov	6a63c87a9f	utils::loading_cache: avoid the reads storm when the key is not in the cache Use a mutex to serialize producers when the key is not present in the cache. Fixes #2262 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-18 07:55:48 -04:00
Vlad Zolotarov	1ef22f84c1	utils::loading_cache: cleanup - Fix a callback signature: receive a const ref. - White spaces. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-17 15:03:14 -04:00
Vlad Zolotarov	87ce0b2d47	utils::loading_cache: align the constraints in the constructor with the parameters description According to description of permissions_validity_in_ms the permissions_cache is enabled if this value is set to a non-zero value. Otherwise the permissions_cache is disabled. According to the permissions_update_interval_in_ms description it must have a non-zero value if permissions_cache is enabled. permissions_cache_max_entries description doesn't explicitly state it but it makes no sense to allow it to be zero if permissions_cache is enabled. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-17 15:03:14 -04:00
Vlad Zolotarov	e286828472	utils::loading_cache: refresh in the background This patch changes the way a loading_cache works. Before this patch: 1) If a permissions key is not in the cache it's loaded in the foreground and the original query is blocked till the permissions are loaded. 2) Every _period the timer does the following: 1) If a value was loaded more than _expiry time ago it is removed from the cache. 2) If the cache is too big - the less recently loaded values are removed till the cache fits the requested size. After this patch: 1) If a permissions key is not in the cache it's loaded in the foreground and the original query is blocked till the permissions are loaded. 2) Every _period the timer does the following: 1) If a value in the cache was loaded or read for the last time more than _expiry time ago - it's removed from the cache. 2) If the cache is too big - the less recently read values are removed till the cache fits the requested size. 3) The values that were loaded more than _refresh time ago are re-read in the background. The new implementation allows to minimize the amount of the foreground reads for a frequently used value to a single event (when the value is loaded for the first time). It also ensures we do not reload values we no longer need. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-17 15:03:06 -04:00
Avi Kivity	1c6cecd9d0	utils: introduce div_ceil() Divides integrals but rounds up rather than down.	2017-05-17 12:30:03 +03:00
Vlad Zolotarov	494ea82a88	utils::UUID: align the UUID serialization API with the similar API of other classes in the project The standard serialization API (e.g. in data_value) includes the following methods: size_t serialized_size() const; void serialize(bytes::iterator& it) const; bytes serialize() const; Align the utils::UUID API with the pattern above. The only addition is that we are going to make an output iterator parameter of a second method above a template so that we may serialize into different output sources. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-16 15:56:03 -04:00
Vlad Zolotarov	7706775a63	utils: serialization: unify the variety of serialize_XXX(...) Use the same templated implementation for all different serialize_XXX(...). The chosen implementation is based on the std::copy_n(char*, size, OutputIterator), which is heavily optimized and will be using memcpy/memmove where possible. This patch also removes the not needed specializations that accept signed integer values since we were casting them to unsigned value anyway. The std::ostream based specifications are also removed since they are not used anywhere except for a test-serialization.cc and adjusting the ostream to the iterator is a single-liner. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-16 15:56:03 -04:00

408 Commits
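
The percentile caching described in commit 31e803a36c (recompute at most once per second, or when the requested percentile changes) can be sketched as follows. This is a minimal illustration, not the storage_proxy code: the class name cached_percentile_sketch, the std::function callback standing in for the estimated_histogram lookup, and the explicit now parameter (added so the refresh rule can be tested without sleeping) are all assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical sketch (names are illustrative, not Scylla's API): cache an
// expensive percentile computation and refresh it at most once per second,
// or immediately when a different percentile is requested.
class cached_percentile_sketch {
public:
    using clock = std::chrono::steady_clock;

    explicit cached_percentile_sketch(std::function<long(double)> compute)
        : _compute(std::move(compute)) {}

    long get(double p, clock::time_point now = clock::now()) {
        const bool stale = !_valid || now - _computed_at >= std::chrono::seconds(1);
        if (stale || p != _last_p) {
            _cached = _compute(p);  // the expensive part, e.g. a histogram walk
            _last_p = p;
            _computed_at = now;
            _valid = true;
        }
        return _cached;
    }

private:
    std::function<long(double)> _compute;
    bool _valid = false;
    double _last_p = 0;
    long _cached = 0;
    clock::time_point _computed_at{};
};
```

Passing the time in explicitly keeps the refresh rule deterministic; a production version would more likely read a clock internally.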
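
Commit 0842faecef gives overflow values their own bucket and lets percentile() account for them instead of throwing. A minimal sketch of that idea follows; it is not Scylla's estimated_histogram. The layout (ascending bucket upper bounds plus one trailing overflow slot) follows the commit's description, while the class name and the choice to report the largest finite offset when the requested percentile lands in the overflow bucket are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch, not Scylla's estimated_histogram: a histogram with an
// explicit overflow bucket and a percentile() that does not throw on overflow.
class histogram_sketch {
    std::vector<std::int64_t> _offsets;  // ascending bucket upper bounds
    std::vector<std::int64_t> _counts;   // size == _offsets.size() + 1; last slot = overflow
public:
    explicit histogram_sketch(std::vector<std::int64_t> offsets)
        : _offsets(std::move(offsets)), _counts(_offsets.size() + 1, 0) {
        assert(std::is_sorted(_offsets.begin(), _offsets.end()));
    }
    void add(std::int64_t value) {
        // First offset >= value; values above the last offset map to the overflow slot.
        auto it = std::lower_bound(_offsets.begin(), _offsets.end(), value);
        ++_counts[it - _offsets.begin()];
    }
    std::int64_t overflow_count() const { return _counts.back(); }
    // Upper bound of the bucket containing percentile p (0 <= p <= 1).
    // Assumption: if p falls in the overflow bucket, return the largest finite offset.
    std::int64_t percentile(double p) const {
        assert(p >= 0.0 && p <= 1.0);
        std::int64_t total = 0;
        for (auto c : _counts) total += c;  // overflow values are counted too
        auto target = static_cast<std::int64_t>(std::ceil(total * p));
        if (target == 0) return 0;
        std::int64_t seen = 0;
        for (std::size_t i = 0; i < _offsets.size(); ++i) {
            seen += _counts[i];
            if (seen >= target) return _offsets[i];
        }
        return _offsets.empty() ? 0 : _offsets.back();
    }
};
```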
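
The chunked_vector layout from commit 3ba2c0652d can be illustrated with the simplified sketch below. It assumes each chunk is a std::vector capped at max_chunk_bytes (128 KiB by default) and shows only push_back() and indexing; the class name is hypothetical and the real container manages raw storage with a much fuller interface. What it demonstrates is that only the small array of chunk headers must be contiguous, and that the last chunk grows geometrically up to the cap instead of allocating a full chunk for a small vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical sketch of a chunked vector. Assumption: chunks are
// std::vectors capped at max_chunk_bytes; only push_back/indexing are shown.
template <typename T, std::size_t max_chunk_bytes = 128 * 1024>
class chunked_vector_sketch {
    static constexpr std::size_t items_per_chunk =
        max_chunk_bytes / sizeof(T) > 0 ? max_chunk_bytes / sizeof(T) : 1;
    std::vector<std::vector<T>> _chunks;  // only these chunk headers are contiguous
    std::size_t _size = 0;
public:
    void push_back(const T& value) {
        if (_chunks.empty() || _chunks.back().size() == items_per_chunk) {
            _chunks.emplace_back();  // previous chunk is full: start a new one
        }
        auto& last = _chunks.back();
        if (last.size() == last.capacity()) {
            // Grow the last chunk geometrically, but never past one chunk's
            // worth of items, so a tiny vector does not pay for 128 KiB.
            std::size_t new_cap = last.capacity() == 0 ? 1 : last.capacity() * 2;
            if (new_cap > items_per_chunk) new_cap = items_per_chunk;
            last.reserve(new_cap);
        }
        last.push_back(value);
        ++_size;
    }
    T& operator[](std::size_t i) {
        assert(i < _size);
        // Every chunk except the last is full, so the position is pure arithmetic.
        return _chunks[i / items_per_chunk][i % items_per_chunk];
    }
    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};
```

With 4-byte elements a 128 KiB chunk holds 32768 items, so ten million such elements need 306 chunks, and only those 306 headers have to sit in one contiguous allocation.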
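
Commit a65b9eb8b4 adds a sum() that estimates the number of points in [-inf, b]. The sketch below implements that kind of estimate as described for the Ben-Haim and Tom-Tov streaming histogram, which, to our understanding, is what the Java code streaming_histogram was converted from is based on: interpolate the bin height at b, take the trapezoid between the nearest bin at or below b and b itself, then add the bins further left plus half of that nearest bin. The free-function form and the name estimate_sum are hypothetical; the bins are kept in an ordered std::map, as commit f35bd66da4 describes.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical sketch: estimate how many points fall in [-inf, b], given bins
// that map a point p_i to a count m_i (Ben-Haim & Tom-Tov style "sum").
double estimate_sum(const std::map<double, std::uint64_t>& bins, double b) {
    auto next = bins.upper_bound(b);  // first bin point strictly greater than b
    if (next == bins.begin()) {
        return 0;  // b lies left of every bin
    }
    auto cur = std::prev(next);  // largest bin point <= b
    double sum = 0;
    if (next == bins.end()) {
        // b is at or beyond the last bin: every point is counted.
        for (const auto& bin : bins) sum += bin.second;
        return sum;
    }
    const double p_i = cur->first, m_i = cur->second;
    const double p_n = next->first, m_n = next->second;
    // Linearly interpolate the bin height at b, then take the trapezoid area.
    const double m_b = m_i + (m_n - m_i) / (p_n - p_i) * (b - p_i);
    sum += (m_i + m_b) / 2 * (b - p_i) / (p_n - p_i);
    // Add every bin strictly left of p_i, plus half of bin i itself.
    for (auto it = bins.begin(); it != cur; ++it) sum += it->second;
    sum += m_i / 2;
    return sum;
}
```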
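
The complexity argument in commits 3b9c0a6ae2 and fd59a548c0 (O(1) move-to-front on every read, O(k) removal of the k least recently read entries) can be shown with a small LRU sketch. As an assumption for brevity, it replaces the boost::intrusive list and unordered_set used by loading_cache with std::list plus std::unordered_map, which costs an extra node per entry but keeps the same asymptotics; the class and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical LRU sketch. Assumption: std::list + std::unordered_map stand in
// for the boost::intrusive containers used by utils::loading_cache.
template <typename Key, typename Value>
class lru_sketch {
    using entry = std::pair<Key, Value>;
    std::list<entry> _lru;  // front = most recently read, back = least recently read
    std::unordered_map<Key, typename std::list<entry>::iterator> _index;
public:
    void put(const Key& k, Value v) {
        auto it = _index.find(k);
        if (it != _index.end()) {
            it->second->second = std::move(v);
            _lru.splice(_lru.begin(), _lru, it->second);  // O(1) move to front
            return;
        }
        _lru.emplace_front(k, std::move(v));
        _index.emplace(k, _lru.begin());
    }
    // A read moves the entry to the front of the LRU list in O(1);
    // splice() keeps the iterator stored in _index valid.
    std::optional<Value> get(const Key& k) {
        auto it = _index.find(k);
        if (it == _index.end()) {
            return std::nullopt;
        }
        _lru.splice(_lru.begin(), _lru, it->second);
        return it->second->second;
    }
    // Dropping the k least recently read entries is O(k): pop from the back.
    void shrink_by(std::size_t k) {
        while (k-- > 0 && !_lru.empty()) {
            _index.erase(_lru.back().first);
            _lru.pop_back();
        }
    }
    std::size_t size() const { return _lru.size(); }
};
```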
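
A hypothetical implementation of the div_ceil() helper introduced in commit 1c6cecd9d0 ("divides integrals but rounds up rather than down") is sketched below; the actual signature in the tree may differ. It assumes a non-negative numerator and a positive denominator, and rounds up via the remainder rather than the common (n + d - 1) / d idiom, which can overflow when n is near the type's maximum.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for utils::div_ceil(): integer division rounding up.
// Assumes numerator >= 0 and denominator > 0.
template <typename T>
constexpr T div_ceil(T numerator, T denominator) {
    static_assert(std::is_integral<T>::value, "div_ceil() needs an integral type");
    // Same result as (numerator + denominator - 1) / denominator, without the
    // risk of overflowing when numerator is close to the type's maximum.
    return numerator / denominator + (numerator % denominator != 0 ? 1 : 0);
}
```

A typical use is counting how many fixed-size chunks are needed to hold n items, for example div_ceil(n, items_per_chunk).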
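
Commit 7706775a63 replaces the family of serialize_XXX() overloads with one template built on std::copy_n, and commit 494ea82a88 describes the serialize(bytes::iterator&) pattern where the output iterator is advanced in place. A minimal sketch combining the two ideas follows, assuming big-endian (network) byte order and unsigned inputs only (the commit notes that signed values were being cast to unsigned anyway); the name serialize_int_sketch is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical sketch: one template for every unsigned width. Bytes are laid
// out big-endian in a local buffer, then handed to std::copy_n, which standard
// libraries typically lower to memmove for contiguous destinations. The
// iterator is taken by reference and left pointing past the written bytes.
template <typename T, typename OutputIterator>
void serialize_int_sketch(T value, OutputIterator& out) {
    static_assert(std::is_unsigned<T>::value, "cast signed values to unsigned first");
    unsigned char buf[sizeof(T)];
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        buf[sizeof(T) - 1 - i] = static_cast<unsigned char>(value & 0xff);
        value = static_cast<T>(value >> 8);
    }
    out = std::copy_n(buf, sizeof(T), out);
}
```

Because the destination is any output iterator, the same template can write into a raw buffer, a container via std::back_inserter, or a stream adapter such as std::ostream_iterator.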